Question

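While learning OpenGL ES I modified the iPhone SDK's GLSprite example a little, and it turned out to be awfully slow. It is slow even on the simulator (and worst on the hardware), so I must be doing something wrong, since it is only 400 textured triangles.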

const GLfloat spriteVertices[] = {
  0.0f, 0.0f, 
  100.0f, 0.0f,  
  0.0f, 100.0f,
  100.0f, 100.0f
};

const GLshort spriteTexcoords[] = {
  0,0,
  1,0,
  0,1,
  1,1
};

- (void)setupView {
    glViewport(0, 0, backingWidth, backingHeight);
    glMatrixMode(GL_PROJECTION);
    glLoadIdentity();
    glOrthof(0.0f, backingWidth, backingHeight,0.0f, -10.0f, 10.0f);
    glMatrixMode(GL_MODELVIEW);

    glClearColor(0.3f, 0.0f, 0.0f, 1.0f);

    glVertexPointer(2, GL_FLOAT, 0, spriteVertices);
    glEnableClientState(GL_VERTEX_ARRAY);
    glTexCoordPointer(2, GL_SHORT, 0, spriteTexcoords);
    glEnableClientState(GL_TEXTURE_COORD_ARRAY);

    // sprite data is preloaded. 512x512 rgba8888   
    glGenTextures(1, &spriteTexture);
    glBindTexture(GL_TEXTURE_2D, spriteTexture);
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, width, height, 0, GL_RGBA, GL_UNSIGNED_BYTE, spriteData);
    free(spriteData);

    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);

    glEnable(GL_TEXTURE_2D);
    glBlendFunc(GL_ONE, GL_ONE_MINUS_SRC_ALPHA);
    glEnable(GL_BLEND);
} 

- (void)drawView {
  ..
    glClear(GL_COLOR_BUFFER_BIT);
    glLoadIdentity();
    glTranslatef(tx-100, ty-100,10);
    for (int i=0; i<200; i++) { 
        glTranslatef(1, 1, 0);
        glDrawArrays(GL_TRIANGLE_STRIP, 0, 4);
    }
  ..
}

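drawView is called every time the screen is touched or a finger moves on the screen, and tx, ty are set to the x, y coordinates where the touch happened.

I also tried using a GLBuffer, with the translations pre-generated and only a single DrawArrays call, but the performance was the same (~4 FPS).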
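For readers who want to reproduce the "pre-generated translations, single draw call" variant, here is a minimal sketch in C. It is not the asker's code: build_quad_batch and its parameters are hypothetical names, and it assumes each quad is expanded into two independent triangles so the whole batch can be submitted with one glDrawArrays(GL_TRIANGLES, ...) call.

```c
#include <stddef.h>

/* Hypothetical helper (not from the original post).
 * Writes `count` copies of a w x h quad into `out`. Quad i is shifted by
 * step * (i + 1) from (ox, oy), which mirrors the glTranslatef(1, 1, 0)
 * issued before each draw in the question's loop.
 * Each quad becomes two triangles: 6 vertices, 2 floats per vertex.
 * `out` must hold at least count * 12 floats.
 * Returns the number of vertices written. */
size_t build_quad_batch(float *out, size_t count,
                        float ox, float oy,
                        float w, float h, float step)
{
    size_t vertices = 0;
    for (size_t i = 0; i < count; i++) {
        float x0 = ox + step * (float)(i + 1);
        float y0 = oy + step * (float)(i + 1);
        float x1 = x0 + w;
        float y1 = y0 + h;
        const float quad[12] = {
            x0, y0,  x1, y0,  x0, y1,   /* first triangle  */
            x1, y0,  x1, y1,  x0, y1    /* second triangle */
        };
        for (size_t k = 0; k < 12; k++)
            out[vertices * 2 + k] = quad[k];
        vertices += 6;
    }
    return vertices;
}
```

The result would be passed to glVertexPointer(2, GL_FLOAT, 0, out) and drawn with glDrawArrays(GL_TRIANGLES, 0, n), with the texture coordinates replicated per vertex in the same order. This only removes per-quad API overhead; it does not reduce the number of pixels that are filled and blended.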

===EDIT===

Meanwhile I've modified this so that much smaller quads are used (34x20) and there is much less overlap. There are ~400 quads (800 triangles) spread over the whole screen. The texture is a 512x512 RGBA_8888 atlas, and the texture coordinates are floats. The code is very ugly in terms of API efficiency: for each quad there are two MatrixMode changes, two loads, and two translations, followed by a DrawArrays call for one triangle strip. This now runs at ~45 FPS.

Answer 1

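I know this is very late, but I can't resist. I'll post anyway, in case other people come here looking for advice.

This has nothing to do with the texture size. I don't know why people rated Nils up. He seems to have a fundamental misunderstanding of the OpenGL pipeline. He seems to think that for a given triangle, the entire texture is loaded and mapped onto that triangle. The opposite is true.

Once the triangle has been mapped into the viewport, it is rasterized. For every on-screen pixel that the triangle covers, the fragment shader is called. The default fragment shader (OpenGL ES 1.1, which you are using) will look up the texel that most closely maps (GL_NEAREST) to the pixel you are drawing. It might look up 4 texels, since you are using the higher-quality GL_LINEAR method to average the best texels. Still, if the pixel count in your triangle is, say, 100, then the most texture bytes you will have to read is 4 (lookups) * 100 (pixels) * 4 (bytes per texel). That is far, far less than what Nils was saying. It's amazing that he can make it sound like he actually knows what he's talking about.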
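The bound described above is simple enough to write down. The following C sketch is only an illustration of that arithmetic; max_texel_bytes is a hypothetical name, and the result is an upper bound that ignores texture caching, which would normally make the real traffic lower.

```c
#include <stddef.h>

/* Upper bound on texture bytes fetched while rasterizing one primitive:
 * covered pixels * texel lookups per pixel * bytes per texel.
 * Hypothetical helper for illustration; not a measurement. */
size_t max_texel_bytes(size_t covered_pixels,
                       size_t lookups_per_pixel,
                       size_t bytes_per_texel)
{
    return covered_pixels * lookups_per_pixel * bytes_per_texel;
}
```

For the 100-pixel example in this answer, max_texel_bytes(100, 4, 4) gives 1,600 bytes. For one of the question's 100x100 quads (10,000 pixels) the same bound is 160,000 bytes, still well under the 1,048,576 bytes of the whole 512x512 RGBA texture.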

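Regarding the tiled architecture: it is common in embedded OpenGL devices, where it is used to preserve locality of reference. I believe each tile is exposed to every drawing operation and quickly culls most of them. The tile then decides what to draw on itself. This becomes much slower when you have blending turned on, as you do, because you are using large triangles that may overlap and blend with each other across tiles, so the GPU has to do a lot of extra work. If, instead of rendering the example square with alpha edges, you rendered an actual shape (rather than a square picture of the shape), you could turn off blending for that part of the scene, and I bet that would speed things up tremendously.

If you want to try it, just turn off blending and see how much things speed up, even if they don't look right. glDisable(GL_BLEND);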

Answer 2

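Your texture is 512 * 512 pixels at 4 bytes per pixel. That's one megabyte of data. If you render it 200 times per frame, you generate a bandwidth load of 200 megabytes per frame.

At roughly 4 frames per second, you consume 800 MB/second for texture reads alone. Frame buffer and Z-buffer writes need bandwidth as well. Then there is the CPU, and don't underestimate the bandwidth requirements of the display either.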
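The numbers in this answer can be checked with a few lines of C. This sketch only reproduces the arithmetic under the answer's own premise, namely that every draw call reads the entire texture; it says nothing about whether that premise holds on real hardware. The function names are hypothetical.

```c
#include <stdint.h>

/* Size in bytes of an uncompressed texture. */
uint64_t texture_bytes(uint32_t width, uint32_t height,
                       uint32_t bytes_per_pixel)
{
    return (uint64_t)width * height * bytes_per_pixel;
}

/* Bytes per second if each draw call read `bytes_per_draw` bytes.
 * This is the answer's assumption, not a measured figure. */
uint64_t bytes_per_second(uint64_t bytes_per_draw,
                          uint32_t draws_per_frame,
                          uint32_t frames_per_second)
{
    return bytes_per_draw * draws_per_frame * frames_per_second;
}
```

texture_bytes(512, 512, 4) is 1,048,576 bytes, and bytes_per_second(1048576, 200, 4) is 838,860,800 bytes per second, which is the 800 MB/second quoted above when a megabyte is counted as 2^20 bytes.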

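The RAM in embedded systems (such as your iPhone) is not as fast as the RAM in a desktop PC. What you are seeing here is a bandwidth starvation effect: the RAM simply can't deliver the data any faster.

How to fix it:

Pick a sane texture size. On average you should have 1 texel per pixel. This gives crisp textures. I know it's not always possible. Use common sense.
Use mipmaps. They take up 33% extra space, but they allow the graphics chip to pick a lower-resolution mipmap where possible.
Try smaller texture formats. Maybe you can use the ARGB4444 format. This would double the rendering speed. Also take a look at the compressed texture formats. Decompression does not cause a performance drop because it is done in hardware. In fact, the opposite is true: because the data is smaller in memory, the graphics chip can read the texture data faster.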
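As an illustration of the "smaller texture format" suggestion, here is a minimal C sketch that repacks 8-bit-per-channel RGBA data into 16-bit pixels. It assumes the target is the OpenGL ES 1.1 type GL_UNSIGNED_SHORT_4_4_4_4 (components ordered R, G, B, A from the high bits down), which is the closest ES equivalent of what the answer calls ARGB4444. The function name is hypothetical, and the conversion simply drops the low 4 bits of each channel without dithering.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: convert RGBA8888 bytes to packed RGBA4444 pixels.
 * src holds pixel_count * 4 bytes in R, G, B, A order.
 * dst receives pixel_count 16-bit values laid out as RRRRGGGGBBBBAAAA. */
void rgba8888_to_rgba4444(const uint8_t *src, uint16_t *dst,
                          size_t pixel_count)
{
    for (size_t i = 0; i < pixel_count; i++) {
        unsigned r = src[i * 4 + 0] >> 4;  /* keep the top 4 bits */
        unsigned g = src[i * 4 + 1] >> 4;
        unsigned b = src[i * 4 + 2] >> 4;
        unsigned a = src[i * 4 + 3] >> 4;
        dst[i] = (uint16_t)((r << 12) | (g << 8) | (b << 4) | a);
    }
}
```

The converted buffer would then be uploaded with glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, width, height, 0, GL_RGBA, GL_UNSIGNED_SHORT_4_4_4_4, dst). This halves the texture's memory footprint; whether it changes the frame rate depends on where the bottleneck actually is.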

Answer 3

I guess my first try was just a bad (or very good) test. The iPhone has a PowerVR MBX Lite, which is a tile-based graphics processor. It subdivides the screen into smaller tiles and renders them in parallel. In the first case above, the subdivision was probably overwhelmed by the very high degree of overlap. Moreover, the quads could not be culled because they were all at the same distance, so all texture coordinates had to be calculated (this could easily be tested by changing the translation in the loop). Also, because of the overlap, the parallelism could not be exploited: some tiles were sitting idle while the rest (about 1/3) were doing a lot of work.

So I think that, although memory bandwidth can be a bottleneck, that was not the case in this example. The problem was due more to how the graphics hardware works and to the setup of the test.

Answer 4

I'm not familiar with the iPhone, but if it doesn't have dedicated hardware for handling floating point numbers (I suspect it doesn't) then it'd be faster to use integers whenever possible.

I'm currently developing for Android (which uses OpenGL ES as well) and, for instance, my vertex array is int instead of float. I can't say how much of a difference it makes, but I guess it's worth a try.
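One way to get integer vertex data in OpenGL ES 1.1 is the 16.16 fixed-point type GL_FIXED. The sketch below assumes that this is what "int instead of float" refers to; the answer does not say so explicitly. The type and function names are hypothetical.

```c
#include <stdint.h>

/* 16.16 signed fixed point: same size and layout as OpenGL ES's GLfixed. */
typedef int32_t fixed16_16;

/* Convert a float to 16.16 fixed point by scaling with 2^16.
 * Values outside roughly [-32768, 32767] do not fit and must be avoided. */
fixed16_16 float_to_fixed(float value)
{
    return (fixed16_16)(value * 65536.0f);
}
```

An array of such values could be passed with glVertexPointer(2, GL_FIXED, 0, vertices). For example, the 100.0f coordinate from the question becomes 6553600. Whether this is faster than GL_FLOAT depends on the device and on whether it has a hardware floating point unit.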

Answer 5

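Apple is very tight-lipped about the specific hardware specs of the iPhone, which is very strange to those of us coming from a console background. But people have been able to determine that the CPU is a 32-bit RISC ARM1176JZF. The good news is that it has a full floating point unit, so we can continue writing math and physics code the way we do on most platforms.

http://gamesfromwithin.com/?p=239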
