Which OpenGL functions are not GPU-accelerated?

I was shocked when I read this (from the OpenGL wiki):

glTranslate, glRotate, glScale

Are these hardware accelerated?

No, there are no known GPUs that execute this. The driver computes the matrix on the CPU and uploads it to the GPU.

All the other matrix operations are done on the CPU as well: glPushMatrix, glPopMatrix, glLoadIdentity, glFrustum, glOrtho.

This is the reason why these functions are considered deprecated in GL 3.0. You should have your own math library, build your own matrix, upload your matrix to the shader.
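For concreteness, here is a minimal sketch of what "build your own matrix, upload your matrix to the shader" looks like in practice. The GLEW loader, the helper function and the "u_mvp" uniform name are illustrative assumptions on my part, not part of the wiki text:

    /* Sketch: a glTranslate replacement. The matrix is built entirely on the
     * CPU; the only GPU-visible step is the 16-float uniform upload.
     * Assumes a GL 2.0+ context with entry points loaded (e.g. via GLEW)
     * and a linked shader program with a mat4 uniform named "u_mvp". */
    #include <GL/glew.h>

    static void upload_translation(GLuint program, float tx, float ty, float tz)
    {
        /* Column-major 4x4 translation matrix. */
        GLfloat m[16] = {
            1.0f, 0.0f, 0.0f, 0.0f,
            0.0f, 1.0f, 0.0f, 0.0f,
            0.0f, 0.0f, 1.0f, 0.0f,
            tx,   ty,   tz,   1.0f
        };
        glUseProgram(program);
        glUniformMatrix4fv(glGetUniformLocation(program, "u_mvp"),
                           1, GL_FALSE, m);
    }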

For a very, very long time I thought most of the OpenGL functions used the GPU to do their computation. I'm not sure if this is a common misconception, but after a while of thinking, this makes sense: old OpenGL functions (2.x and older) are really not suitable for real-world applications, due to too many state switches.

This makes me realise that, possibly, many OpenGL functions do not use the GPU at all.

So, the question is:

Which OpenGL function calls don't use the GPU?

I believe knowing the answer to the above question would help me become a better programmer with OpenGL. Please do share some of your insights.

Edit:

I know this question easily leads to the level of optimisation tips. That's fine, but it's not the intention of this question.

If anyone knows a set of GL functions on a certain popular implementation (as AshleysBrain suggested, nVidia/ATI, and possibly OS-dependent) that don't use the GPU, that's what I'm after!

Plausible optimisation guides come later. Let's focus on the functions, for this topic.

Edit2:

This topic isn't about how matrix transformations work. There are other topics for that.

Best answer

Boy, is this a big subject.

First, I'll start with the obvious: since you're calling the function (any function) from the CPU, it has to run at least partly on the CPU. So the question really is, how much of the work is done on the CPU and how much on the GPU.

Second, in order for the GPU to execute a command, the CPU has to prepare a command description to pass down. The minimal set here is a command token describing what to do, plus the data for the operation to be executed. How the CPU triggers the GPU to run the command also matters: since this is expensive most of the time, the CPU does not do it often, but rather batches commands in command buffers and simply sends a whole buffer for the GPU to handle.

All this to say that passing work down to the GPU is not a free exercise. That cost has to be pitted against just running the function on the CPU (no matter what we're talking about).

Taking a step back, you have to ask yourself why you need a GPU at all. The fact is, a pure CPU implementation does the job (as AshleysBrain mentions). The power of the GPU comes from its design to handle:

  • specialized tasks (rasterization, blending, texture filtering, blitting, ...)
  • heavily parallel workloads (DeadMG points to that in his answer), whereas a CPU is designed more for single-threaded work.

And those are the guiding principles to follow in order to decide what goes in the chip. Anything that can benefit from those ought to run on the GPU. Anything else ought to be on the CPU.

It's interesting, by the way, that some functionality of the GL (prior to deprecation, mostly) is really not clearly delineated. Display lists are probably the best example of such a feature. Each driver is free to push as much as it wants from the display-list stream to the GPU (typically in some command-buffer form) for later execution, as long as the semantics of GL display lists are kept (and that is somewhat hard in general). So some implementations choose to push only a limited subset of the calls in a display list to a compiled format, and simply replay the rest of the command stream on the CPU.
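As a hedged illustration, the classic recording pattern looks like this (legacy GL entry points); whether glCallList later replays on the GPU, on the CPU, or as a mix is entirely the driver's choice:

    /* Sketch: record a display list, then replay it. The driver may compile
     * some of the recorded calls into a GPU command-buffer form and replay
     * the rest on the CPU. */
    GLuint list = glGenLists(1);
    glNewList(list, GL_COMPILE);        /* start recording; nothing drawn yet */
    glBegin(GL_TRIANGLES);
    glVertex3f(0.0f, 0.0f, 0.0f);
    glVertex3f(1.0f, 0.0f, 0.0f);
    glVertex3f(0.0f, 1.0f, 0.0f);
    glEnd();
    glEndList();

    glCallList(list);                   /* replay: GPU, CPU, or both */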

Selection is another one where it's unclear whether there is value in executing it on the GPU.

Lastly, I have to say that in general, there is little correlation between the API calls and the amount of work on either the CPU or the GPU. A state-setting API call tends only to modify a structure somewhere in the driver data. Its effect is only visible when a Draw, or some such, is called.

A lot of the GL API works like that. At that point, asking whether glEnable(GL_BLEND) is executed on the CPU or GPU is rather meaningless. What matters is whether the blending will happen on the GPU when Draw is called. So, in that sense, most GL entry points are not accelerated at all.
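To make that concrete, here is a hedged sketch of the usual pattern (vertex_count is a placeholder, and a bound program and vertex array are assumed):

    glEnable(GL_BLEND);                                 /* CPU: flips a flag in driver state */
    glBlendFunc(GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA);  /* CPU: more driver state */
    glDrawArrays(GL_TRIANGLES, 0, vertex_count);        /* the accumulated state is only
                                                           validated and encoded for the
                                                           GPU at this draw call */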

I could also expand a bit on data transfer but Danvil touched on it.

I'll finish with the little "s/w path". Historically, GL had to work to spec no matter what the hardware special cases were. This meant that if the h/w was not handling a specific GL feature, it had to be emulated, or implemented fully in software. There are numerous cases of this, but one that struck a lot of people is when GLSL started to show up.

Since there was no practical way to estimate the code size of a GLSL shader, it was decided that the GL was supposed to take any shader length as valid. The implication was fairly clear: either implement h/w that could take arbitrary-length shaders (not realistic at the time), or implement a s/w shader emulation (or, as some vendors chose to, simply fail to be compliant). So, if you triggered this condition on a fragment shader, chances were the whole of your GL ended up being executed on the CPU, even when you had a GPU sitting idle, at least for that draw.

Other answers

The question should perhaps be "What functions eat an unexpectedly high amount of CPU time?"

Keeping a matrix stack for projection and view is not something the GPU can handle better than a CPU would (on the contrary ...). Another example is shader compilation. Why should this run on the GPU? There is a parser, a compiler, and so on, which are just normal CPU programs, like a C++ compiler.
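As a hedged illustration of that point, compiling a shader is a synchronous compiler pass inside the driver, ordinary CPU work from parsing to code generation (the source string here is just a placeholder):

    #include <stdio.h>

    const char *src =
        "#version 330 core\n"
        "void main() { gl_Position = vec4(0.0); }\n";
    GLuint vs = glCreateShader(GL_VERTEX_SHADER);
    glShaderSource(vs, 1, &src, NULL);
    glCompileShader(vs);                 /* parse, optimize, codegen: all on the CPU */

    GLint ok = GL_FALSE;
    glGetShaderiv(vs, GL_COMPILE_STATUS, &ok);
    if (!ok) {
        char log[1024];
        glGetShaderInfoLog(vs, sizeof log, NULL, log);
        fprintf(stderr, "shader compile failed:\n%s\n", log);  /* CPU-side diagnostics */
    }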

Potentially "dangerous" function calls are, for example, glReadPixels, because data is copied back from device (= GPU) memory to host (= CPU) memory over the limited bus. Functions like glTexImage2D or glBufferData, which transfer in the opposite direction, are in this category as well.

So, generally speaking, if you want to know how much CPU time an OpenGL call eats, try to understand its functionality. And beware of all functions that copy data from host to device and back!
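For example, a naive readback like the following first forces the GPU to finish pending work and then pushes the whole framebuffer back over the bus (width and height are assumed to match the current framebuffer):

    #include <stdlib.h>

    /* Sketch: glReadPixels stalls the pipeline, then copies pixel data from
     * device (GPU) memory to host (CPU) memory. */
    unsigned char *pixels = malloc((size_t)width * height * 4);
    glReadPixels(0, 0, width, height, GL_RGBA, GL_UNSIGNED_BYTE, pixels);
    /* ... inspect pixels on the CPU ... */
    free(pixels);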

Typically, if an operation is per-something, it will occur on the GPU. An example is the actual transformation: this is done once per vertex. On the other hand, if it occurs only once per large operation, it'll be on the CPU, such as creating the transformation matrix, which is only done once each time the object's state changes, or once per frame.

That's just a general answer, and some functionality will occur the other way around, as well as being implementation-dependent. However, typically it shouldn't matter to you, the programmer. As long as you allow the GPU plenty of time to do its work while you're off doing the game sim or whatever, or have a solid threading model, you shouldn't need to worry about it that much.

@sending data to GPU: As far as I know (I've only used Direct3D), it's all done in-shader; that's what shaders are for.

glTranslate, glRotate and glScale change the currently active transformation matrix. This is of course a CPU operation. The modelview and projection matrices just describe how the GPU should transform vertices when a rendering command is issued.

So, e.g., calling glTranslate does not translate anything at all yet. Before rendering, the current projection and modelview matrices are multiplied (MVP = projection * modelview); then this single matrix is copied to the GPU, and the GPU does the matrix * vertex multiplication ("T&L") for each vertex. So the translation/scaling/projection of the vertices is done by the GPU.
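A hedged sketch of that split, where mat4_mul is a hypothetical 4x4 multiply from your own math library, mvp_loc is a uniform location fetched during setup, and projection/modelview are assumed CPU-side 16-float arrays:

    /* CPU side: combine the matrices once per object (or per frame). */
    GLfloat mvp[16];
    mat4_mul(mvp, projection, modelview);           /* hypothetical CPU helper */
    glUniformMatrix4fv(mvp_loc, 1, GL_FALSE, mvp);  /* one 16-float upload */

    /* GPU side (GLSL vertex shader): the matrix * vertex multiply runs once
     * per vertex, on the GPU. */
    const char *vs_src =
        "#version 330 core\n"
        "uniform mat4 u_mvp;\n"
        "layout(location = 0) in vec4 a_position;\n"
        "void main() { gl_Position = u_mvp * a_position; }\n";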

Also, you really should not be worried about performance as long as you don't use these functions in an inner loop somewhere. glTranslate boils down to a handful of additions and multiplications on the current matrix; glScale and glRotate are a bit more complex.

My advice is that you should learn a bit more about linear algebra. This is essential for working with 3D APIs.

There are software-rendered implementations of OpenGL, so it's possible that no OpenGL functions run on the GPU at all. There's also hardware that doesn't support certain render states, so if you set such a state, the driver switches to software rendering, and again, nothing will run on the GPU (even though there is one there). So I don't think there's any clear distinction between "GPU-accelerated functions" and "non-GPU-accelerated functions".

To be on the safe side, keep things as simple as possible. Straightforward rendering with vertices and basic features like Z-buffering are most likely to be hardware-accelerated, so if you can stick to that with a minimum of state changing, you'll be most likely to keep things hardware-accelerated. This is also the way to maximise performance of hardware-accelerated rendering: graphics cards like to stay in one state and just crunch a bunch of vertices.




