glClear() Takes Too Long - Android OpenGL ES 2
I'm developing an Android app using OpenGL ES 2. The problem I am encountering is that the glClear() function takes so long to process that the game appears jittery because frames are delayed. The output of a run of the program with timing probes shows that while setting up the vertices and images from the atlas takes less than 1 millisecond, glClear() takes between 10 and 20 milliseconds. In fact, clearing takes 95% of the total rendering time. My code is based upon common tutorials, and the render function is this:
```java
private void render(float[] m, short[] indices) {
    Log.d("time", "--start render--");

    // Get a handle to the vertex shader's vPosition member
    int mPositionHandle = GLES20.glGetAttribLocation(riGraphicTools.sp_Image, "vPosition");

    // Enable the generic vertex attribute array
    GLES20.glEnableVertexAttribArray(mPositionHandle);

    // Prepare the triangle coordinate data
    GLES20.glVertexAttribPointer(mPositionHandle, 3, GLES20.GL_FLOAT, true, 0, vertexBuffer);

    // Get a handle to the texture coordinates location
    int mTexCoordLoc = GLES20.glGetAttribLocation(riGraphicTools.sp_Image, "a_texCoord");

    // Enable the generic vertex attribute array
    GLES20.glEnableVertexAttribArray(mTexCoordLoc);

    // Prepare the texture coordinates
    GLES20.glVertexAttribPointer(mTexCoordLoc, 2, GLES20.GL_FLOAT, false, 0, uvBuffer);

    // Get a handle to the shape's transformation matrix
    int mtrxhandle = GLES20.glGetUniformLocation(riGraphicTools.sp_Image, "uMVPMatrix");

    // Apply the projection and view transformation
    GLES20.glUniformMatrix4fv(mtrxhandle, 1, false, m, 0);

    // Get a handle to the texture sampler location
    int mSamplerLoc = GLES20.glGetUniformLocation(riGraphicTools.sp_Image, "s_texture");

    // Set the sampler to texture unit 0, where the texture was saved
    GLES20.glUniform1i(mSamplerLoc, 0);

    // Timing probe around the clear call
    long clearTime = System.nanoTime();
    GLES20.glClear(GLES20.GL_COLOR_BUFFER_BIT);
    Log.d("time", "clear time " + (System.nanoTime() - clearTime));

    // Draw the triangles
    GLES20.glDrawElements(GLES20.GL_TRIANGLES, indices.length, GLES20.GL_UNSIGNED_SHORT, drawListBuffer);

    // Disable the vertex arrays
    GLES20.glDisableVertexAttribArray(mPositionHandle);
    GLES20.glDisableVertexAttribArray(mTexCoordLoc);

    Log.d("time", "--end render--");
}
```
I have tried moving the PNG atlas to /drawable-nodpi, but it had no effect.
I have tried using the glFlush() and glFinish() functions as well. Interestingly, if I do not call glClear(), it must be called automatically: the total rendering time is still as high as when it is called, and there are no remnants of the previous frame onscreen. Only the first call to glClear() is time-consuming. If it is called again, the subsequent calls take only 1 or 2 milliseconds.
I have also tried different combinations of parameters (such as GLES20.GL_DEPTH_BUFFER_BIT), and using glClearColor(). The clear time is still high.
Thank you in advance.
You're not measuring what you think you are. Measuring the elapsed time of an OpenGL API call is mostly meaningless.
Asynchronicity
The key aspect to understand is that the OpenGL API is used to pass work to the GPU. The easiest mental model (which largely corresponds to reality) is that when you make OpenGL API calls, you queue up work that will later be submitted to the GPU. For example, if you make a glDraw*() call, picture the call building a work item that gets queued up, and at some point later gets submitted to the GPU for execution.
In other words, the API is highly asynchronous. The work you request by making API calls is not completed by the time the call returns. In most cases, it's not even submitted to the GPU for execution yet. It is only queued up, and will be submitted at some point later, mostly outside your control.
A consequence of this general approach is that the time you measure for the glClear() call has pretty much nothing to do with how long it takes to clear the framebuffer.
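The queuing model can be mimicked without a GPU at all. The following is a minimal sketch in plain Java, not OpenGL code: a single-thread executor stands in for the GPU, and the names AsyncCallDemo, Timing and run, as well as the 200 ms of pretend work, are invented for illustration. It shows that timing the submitting call measures only the cost of queuing the work, not the cost of the work itself.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical stand-in for an asynchronous graphics API; no real GPU is involved.
final class AsyncCallDemo {

    /** Time spent in the submitting call versus the time the queued work really took. */
    static final class Timing {
        final long callNanos;
        final long workNanos;

        Timing(long callNanos, long workNanos) {
            this.callNanos = callNanos;
            this.workNanos = workNanos;
        }
    }

    static Timing run(long workMillis) {
        ExecutorService fakeGpu = Executors.newSingleThreadExecutor();
        // The "clear" work item: sleeps to simulate GPU work and reports its own duration.
        Callable<Long> clearWork = () -> {
            long start = System.nanoTime();
            Thread.sleep(workMillis);
            return System.nanoTime() - start;
        };
        try {
            long callStart = System.nanoTime();
            Future<Long> pending = fakeGpu.submit(clearWork); // returns once the item is queued
            long callNanos = System.nanoTime() - callStart;
            long workNanos = pending.get(); // block here only to read the real duration
            return new Timing(callNanos, workNanos);
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("simulation failed", e);
        } finally {
            fakeGpu.shutdown();
        }
    }

    public static void main(String[] args) {
        Timing t = run(200);
        System.out.printf("API call: %.3f ms, queued work: %.3f ms%n",
                t.callNanos / 1e6, t.workNanos / 1e6);
    }
}
```

In this simulation the measured call time reflects only the hand-off to the queue, which is the same reason a probe around a single OpenGL call says little about the GPU work it requests.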
Synchronization
Now that we have established how the OpenGL API is asynchronous, the next concept to understand is that a certain level of synchronization is necessary.
Let's look at a workload where the overall throughput is limited by the GPU (either by GPU performance, or because the frame rate is capped by the display refresh). If we kept the whole system entirely asynchronous, and the CPU can produce GPU commands faster than the GPU can process them, we would be queuing up a gradually increasing amount of work. This is undesirable for a couple of reasons:
- In the extreme case, the amount of queued work would grow towards infinity, and we would run out of memory just from storing the queued GPU commands.
- In apps that need to respond to user input, like games, we would get increasing latency between user input and rendering.
To avoid this, drivers use throttling mechanisms to prevent the CPU from getting too far ahead. The details of how exactly this is handled can be fairly complex. As a simple model, it might be something like blocking the CPU when it gets more than 1-2 frames ahead of what the GPU has finished rendering. Ideally, you always want some work queued up so that the GPU never goes idle in graphics limited apps, but you want to keep the amount of queued work as small as possible to minimize memory usage and latency.
Meaning of the Measurement
With all this background information explained, your measurements should be much less surprising. By far the most likely scenario is that your glClear() call triggers a synchronization, and the time you measure is the time it takes the GPU to catch up sufficiently, until it makes sense to submit more work.
Note that this does not mean that all the previously submitted work needs to complete. Let's look at a sequence that is somewhat hypothetical, but realistic enough to illustrate what can happen:
- Let's say you make the glClear() call that forms the start of rendering frame n.
- At this time, frame n - 3 is on the display, and the GPU is busy processing rendering commands for frame n - 2.
- The driver decides that you really should not be getting more than 2 frames ahead. Therefore, it blocks in your glClear() call until the GPU has finished the rendering commands for frame n - 2.
- It might also decide that it needs to wait until frame n - 2 is shown on the display, which means waiting for the next beam sync.
- Now that frame n - 2 is on the display, the buffer that previously contained frame n - 3 is not used anymore. It is now ready to be used for frame n, which means that the glClear() command for frame n can now be submitted.
Note that while your glClear() call did all kinds of waiting in this scenario, which you measure as part of the elapsed time spent in the API call, none of this time was used for actually clearing the framebuffer for your frame. You were probably just sitting on some kind of semaphore (or similar synchronization mechanism), waiting for the GPU to complete previously submitted work.
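The scenario above can be imitated in plain Java to see where the waiting shows up. This is only a sketch under loud assumptions: the driver is modelled as a semaphore with two permits, the GPU as a single worker thread that needs 20 ms per frame, and the throttling is placed in the first call of each frame (the pretend clear). Real drivers are far more complex, and names such as ThrottleDemo, Result and MAX_FRAMES_IN_FLIGHT are invented here.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical model of driver throttling; it does not call any OpenGL function.
final class ThrottleDemo {
    static final int MAX_FRAMES_IN_FLIGHT = 2; // assumed driver limit
    static final long GPU_FRAME_MILLIS = 20;   // pretend GPU cost per frame

    static final class Result {
        final long[] clearNanos;   // time spent blocked in the pretend clear, per frame
        final int maxInFlight;     // most frames queued or executing at any time
        final int framesCompleted; // frames the pretend GPU finished

        Result(long[] clearNanos, int maxInFlight, int framesCompleted) {
            this.clearNanos = clearNanos;
            this.maxInFlight = maxInFlight;
            this.framesCompleted = framesCompleted;
        }
    }

    static Result run(int frames) {
        Semaphore slots = new Semaphore(MAX_FRAMES_IN_FLIGHT);
        ExecutorService fakeGpu = Executors.newSingleThreadExecutor();
        AtomicInteger inFlight = new AtomicInteger();
        AtomicInteger completed = new AtomicInteger();
        long[] clearNanos = new long[frames];
        int maxInFlight = 0;
        try {
            for (int n = 0; n < frames; n++) {
                // Pretend clear: the first call of the frame, where the "driver" throttles.
                long start = System.nanoTime();
                slots.acquire(); // blocks while two frames are already in flight
                clearNanos[n] = System.nanoTime() - start;
                maxInFlight = Math.max(maxInFlight, inFlight.incrementAndGet());

                // Pretend draw and swap: only queues the frame and returns immediately.
                fakeGpu.execute(() -> {
                    try {
                        Thread.sleep(GPU_FRAME_MILLIS); // simulated rendering time
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                    completed.incrementAndGet();
                    inFlight.decrementAndGet();
                    slots.release(); // frame done, one slot is free again
                });
            }
            fakeGpu.shutdown();
            fakeGpu.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            throw new IllegalStateException("simulation interrupted", e);
        }
        return new Result(clearNanos, maxInFlight, completed.get());
    }

    public static void main(String[] args) {
        Result r = run(6);
        for (int n = 0; n < r.clearNanos.length; n++) {
            System.out.printf("frame %d: clear blocked for %.2f ms%n", n, r.clearNanos[n] / 1e6);
        }
    }
}
```

In a run of this sketch, the first two frames would typically report a clear time near zero and later frames something close to the simulated 20 ms, even though the pretend clear itself does no work at all.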
Conclusion
Considering that your measurement is not directly helpful after all, what can you learn from it? Unfortunately, not a whole lot.
If you observe that your frame rate does not meet your target, e.g. because you observe stuttering, or even better because you measure the frame rate over a certain time period, the only thing you know for sure is that your rendering is too slow. Going into the details of performance analysis is a topic that is much too big for this format. Just to give you a rough overview of the steps you could take:
- Measure/profile your CPU usage to verify that you are really GPU limited.
- Use GPU profiling tools, which are often available from GPU vendors.
- Simplify your rendering, or skip parts of it, and see how the performance changes. For example, does it get faster if you simplify the geometry? Then you might be limited by vertex processing. Does it get faster if you reduce the framebuffer size, or if you simplify your fragment shaders? Then you're probably limited by fragment processing.
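Measuring the frame rate over a time period, as suggested above, needs very little code. The class below is a minimal sketch, not part of any Android or OpenGL API: the name FrameRateMeter, its methods and the idea of passing timestamps in from the caller are all assumptions made for illustration. It averages over a fixed window instead of timing individual calls.

```java
// Hypothetical helper: averages the frame rate over a fixed time window.
final class FrameRateMeter {
    private final long windowNanos;
    private boolean started;
    private long windowStart;
    private int framesInWindow;
    private double lastFps = Double.NaN;

    FrameRateMeter(long windowNanos) {
        if (windowNanos <= 0) {
            throw new IllegalArgumentException("windowNanos must be positive");
        }
        this.windowNanos = windowNanos;
    }

    /**
     * Call once per rendered frame with a monotonic timestamp in nanoseconds.
     * Returns true when a new average has just been computed.
     */
    boolean onFrame(long nowNanos) {
        if (!started) {
            // The first call only marks the start of the first window.
            started = true;
            windowStart = nowNanos;
            return false;
        }
        framesInWindow++;
        long elapsed = nowNanos - windowStart;
        if (elapsed >= windowNanos) {
            lastFps = framesInWindow * 1e9 / elapsed;
            windowStart = nowNanos;
            framesInWindow = 0;
            return true;
        }
        return false;
    }

    /** Most recent window average, or NaN before the first window has completed. */
    double fps() {
        return lastFps;
    }
}
```

In an Android renderer this could be called once per frame, for example from GLSurfaceView.Renderer.onDrawFrame() with System.nanoTime() as the argument, logging fps() whenever onFrame() returns true. Comparing that average before and after simplifying geometry or shaders gives a more meaningful signal than probes around individual API calls.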