I've been working with cocos2dx for quite some time now. I looked a lot into cocos2d::Renderer and noticed some flaws and possible performance issues here and there.
1.) The Renderer’s VBO gets updated multiple times
If you submit some render commands in an order like this:
renderer->addCommand(TrianglesCommand());
renderer->addCommand(CustomCommand());
renderer->addCommand(TrianglesCommand());
the Renderer under the hood will do something like this:
...
glBindBuffer(GL_ARRAY_BUFFER, vertexVBO);
glBufferData(GL_ARRAY_BUFFER, ...);
glDrawElements(...);
...
glBindBuffer(GL_ARRAY_BUFFER, vertexVBO);
glBufferData(GL_ARRAY_BUFFER, ...);
glDrawElements(...);
...
The problem here is that glBufferData gets called on the same vbo multiple times. Generally there's no problem with that, but in this case you're updating a vbo whose data is still needed by a previous draw call. When that happens - according to this - the current thread halts and waits until all draw commands that could be affected by the update have finished. This can lead to serious performance issues.
As an example, on my Samsung Galaxy S (I know it’s a pretty old device) this leads to an extra frame time of 14ms which is unacceptable.
There are multiple solutions to this. One would be to simply use a new vbo for each glBufferData. Another one would be to just use client-side vertex arrays.
Both of them are okay I guess, but I can think of a better one.
The third approach would be to gather all vertices and indices from all TrianglesCommands and QuadCommands before executing any RenderCommands, and then update the vertex and index vbo with that data once, before any rendering is done. This would also get rid of the overhead of calling glBufferData multiple times. If any flushes are needed between the TrianglesCommands, add a base offset to the glVertexAttribPointer calls, like this:
glVertexAttribPointer(..., offset_in_vbo_in_bytes + offset_of_vertex_attrib);
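To make the third approach a bit more concrete, here is a minimal sketch of the gathering step. It is not the actual Renderer code: Vertex, Geometry, DrawRange, FrameBuffers and gatherGeometry are hypothetical names, and Vertex is only a simplified stand-in for V3F_C4B_T2F. The GL calls are left out so the sketch stays self-contained.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Simplified stand-in for the V3F_C4B_T2F layout (hypothetical).
struct Vertex {
    float x, y, z;
    std::uint8_t r, g, b, a;
    float u, v;
};

// Hypothetical view of the geometry carried by one TrianglesCommand.
struct Geometry {
    std::vector<Vertex> vertices;
    std::vector<std::uint16_t> indices;
};

// Where one command's data landed inside the shared buffers.
struct DrawRange {
    std::size_t vertexByteOffset; // base offset for glVertexAttribPointer
    std::size_t indexByteOffset;  // offset argument for glDrawElements
    std::size_t indexCount;
};

struct FrameBuffers {
    std::vector<Vertex> vertices;
    std::vector<std::uint16_t> indices;
    std::vector<DrawRange> ranges;
};

// Concatenate the geometry of all commands so each vbo needs one upload per frame.
FrameBuffers gatherGeometry(const std::vector<Geometry>& commands) {
    FrameBuffers out;
    for (const Geometry& g : commands) {
        // Indices stay relative to the command's own first vertex,
        // so a single command must fit into 16-bit indices.
        assert(g.vertices.size() <= 65536);
        out.ranges.push_back({out.vertices.size() * sizeof(Vertex),
                              out.indices.size() * sizeof(std::uint16_t),
                              g.indices.size()});
        out.vertices.insert(out.vertices.end(), g.vertices.begin(), g.vertices.end());
        out.indices.insert(out.indices.end(), g.indices.begin(), g.indices.end());
    }
    return out;
}
```

With this layout, each of the two vbos would be filled by a single glBufferData call per frame. Command i would then be drawn with ranges[i].vertexByteOffset added to every attribute offset and ranges[i].indexByteOffset passed as the offset argument of glDrawElements.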
2.) Every vertex submitted via TrianglesCommand gets transformed on the cpu
This one is generally not a bad thing, as it allows you to render lots of simple sprites with different transforms in one batched draw call.
But in a case where you submit a TrianglesCommand with lots of vertices, say 3000, it would be better to let the gpu handle all the vertex transformation, even if this would make batching unavailable for this TrianglesCommand.
A solution would be to add a flag to TrianglesCommand indicating whether the vertices should be transformed on the cpu or not.
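Roughly what I have in mind, as a sketch only. TrianglesCmd, Submission, transformOnCpu and prepare are made-up names for illustration, not existing cocos2dx API. With the flag set, the vertices are transformed up front and the command stays batchable. Without it, the matrix is handed to the shader as a uniform and the command is drawn on its own.

```cpp
#include <array>
#include <vector>

struct Vec3 { float x, y, z; };
using Mat4 = std::array<float, 16>; // column-major, as OpenGL expects

Mat4 identity() {
    Mat4 m{};
    m[0] = m[5] = m[10] = m[15] = 1.0f;
    return m;
}

// Transform a point by a column-major 4x4 matrix (w is assumed to be 1).
Vec3 transformPoint(const Mat4& m, const Vec3& p) {
    return { m[0] * p.x + m[4] * p.y + m[8]  * p.z + m[12],
             m[1] * p.x + m[5] * p.y + m[9]  * p.z + m[13],
             m[2] * p.x + m[6] * p.y + m[10] * p.z + m[14] };
}

// Hypothetical command carrying the proposed flag.
struct TrianglesCmd {
    std::vector<Vec3> positions;
    Mat4 modelView = identity();
    bool transformOnCpu = true; // the proposed flag
};

// What the renderer would hand to the gpu for this command.
struct Submission {
    std::vector<Vec3> positions;
    Mat4 modelViewUniform;
    bool batchable;
};

Submission prepare(const TrianglesCmd& cmd) {
    Submission s;
    if (cmd.transformOnCpu) {
        // Pre-transformed vertices share an identity matrix, so they can be batched.
        s.positions.reserve(cmd.positions.size());
        for (const Vec3& p : cmd.positions) {
            s.positions.push_back(transformPoint(cmd.modelView, p));
        }
        s.modelViewUniform = identity();
        s.batchable = true;
    } else {
        // Vertices go up untouched; the matrix is applied in the vertex shader.
        s.positions = cmd.positions;
        s.modelViewUniform = cmd.modelView;
        s.batchable = false;
    }
    return s;
}
```

The gpu path gives up batching because each command needs its own matrix uniform, which is the trade-off described above.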
3.) The Renderer can only handle vertices of type V3F_C4B_T2F
If you want to render anything using a different vertex layout, you have to wrap it in a CustomCommand and implement batching for it yourself. This can be problematic, especially for OpenGL newbies.
It would be cool if there were a render command like this:
ArbitraryVertexCommand cmd;
cmd.init(glProgramState, texturesUsed, vertexAttribFormat, vertices_as_byte_ptr, size_of_vertices, indices_as_short_ptr, size_of_indices, gl_draw_type);
// gl_draw_type is one of GL_POINTS, GL_TRIANGLES, etc.
It would also be nice if the renderer could automatically batch these where possible.
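For the vertexAttribFormat parameter, something like the following might be enough. This is a sketch under my own assumptions: AttribType, VertexAttrib and VertexFormat are hypothetical names, only two component types are covered, and the attributes are assumed to be tightly packed in declaration order. The renderer could read stride and offsets from it for the glVertexAttribPointer calls, and two commands would only be candidates for batching if their formats compare equal.

```cpp
#include <cstddef>
#include <vector>

enum class AttribType { Float, UnsignedByte };

// One vertex attribute, e.g. "3 floats at shader location 0".
struct VertexAttrib {
    int location;
    int componentCount;
    AttribType type;
    bool normalized;
};

inline bool operator==(const VertexAttrib& a, const VertexAttrib& b) {
    return a.location == b.location && a.componentCount == b.componentCount &&
           a.type == b.type && a.normalized == b.normalized;
}

inline std::size_t attribSize(const VertexAttrib& a) {
    const std::size_t componentSize = (a.type == AttribType::Float) ? sizeof(float) : 1;
    return componentSize * static_cast<std::size_t>(a.componentCount);
}

// Hypothetical description of a tightly packed, interleaved vertex layout.
struct VertexFormat {
    std::vector<VertexAttrib> attribs;

    // Size of one whole vertex in bytes.
    std::size_t stride() const {
        std::size_t total = 0;
        for (const VertexAttrib& a : attribs) total += attribSize(a);
        return total;
    }

    // Byte offset of the attribute at position `index` inside one vertex.
    std::size_t offsetOf(std::size_t index) const {
        std::size_t offset = 0;
        for (std::size_t i = 0; i < index && i < attribs.size(); ++i) {
            offset += attribSize(attribs[i]);
        }
        return offset;
    }

    // Commands with different layouts cannot share one draw call.
    bool sameLayout(const VertexFormat& other) const {
        return attribs == other.attribs;
    }
};
```

Describing V3F_C4B_T2F this way gives three attributes (3 floats, 4 normalized unsigned bytes, 2 floats), so the existing vertex type would just be one format among others.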
4.) GLProgram shader can only use the default glsl version
This one is pretty self-explanatory.
Normally, if you want to use something other than the default glsl version in a shader, you put something like this:
#version 130
at the top of your shader file. But when you use the GLProgram class, cocos2dx puts some pre-defined uniforms before your actual shader source. This is problematic because the shader compiler requires the #version directive to come before anything other than comments and whitespace.
It would be nice if there were a way to set the glsl version of the shader in GLProgram before it gets compiled.
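One possible way to do this without new API would be to look for a leading #version line in the user's source and move it in front of the injected uniforms. A minimal sketch follows. assembleShaderSource is a hypothetical helper, not something GLProgram has, and it only handles a #version line preceded by whitespace, not one preceded by comments.

```cpp
#include <cstddef>
#include <string>

// Build the final shader source. If the user source starts with a #version
// line, that line is emitted first, then the injected header, then the rest.
std::string assembleShaderSource(const std::string& injectedHeader,
                                 const std::string& userSource) {
    const std::size_t start = userSource.find_first_not_of(" \t\r\n");
    if (start == std::string::npos ||
        userSource.compare(start, 8, "#version") != 0) {
        // No version directive: keep the current behaviour.
        return injectedHeader + userSource;
    }
    const std::size_t eol = userSource.find('\n', start);
    const std::size_t lineEnd =
        (eol == std::string::npos) ? userSource.size() : eol + 1;
    std::string versionLine = userSource.substr(start, lineEnd - start);
    if (versionLine.back() != '\n') {
        versionLine += '\n'; // the header must start on its own line
    }
    return versionLine + injectedHeader + userSource.substr(lineEnd);
}
```

For a header of "uniform mat4 CC_MVPMatrix;\n" and a user source beginning with "#version 130\n", the result starts with the version line, followed by the uniform, followed by the rest of the shader.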
5.) GLProgram puts a lot of pre-defined uniforms that you won’t need in most cases on top of the shader
Again, pretty self-explanatory. This feature can be helpful in some cases, but in most cases you won't need it, so it would be good to have an option to turn off specific ones.
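Such an option could be as simple as a bitmask that decides which declarations get prepended. The sketch below is only an illustration: BuiltinUniform and buildUniformHeader are hypothetical, and it covers just three of the built-in uniforms.

```cpp
#include <cstdint>
#include <string>

// Hypothetical flags, one bit per pre-defined uniform.
enum BuiltinUniform : std::uint32_t {
    kMVPMatrix = 1u << 0,
    kTime      = 1u << 1,
    kTexture0  = 1u << 2,
};

// Emit declarations only for the uniforms the shader asked for.
std::string buildUniformHeader(std::uint32_t enabled) {
    std::string header;
    if (enabled & kMVPMatrix) header += "uniform mat4 CC_MVPMatrix;\n";
    if (enabled & kTime)      header += "uniform vec4 CC_Time;\n";
    if (enabled & kTexture0)  header += "uniform sampler2D CC_Texture0;\n";
    return header;
}
```

Passing all bits by default would keep existing shaders working, while a shader that needs nothing could pass 0 and get an empty header.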
6.) GLStateCache doesn’t handle all gl states
I recently played around with Qualcomm's Adreno gpu profiler and noticed a lot of redundant calls (gl calls that don't change any state) and query calls (glIs, glGet). This is because the GLStateCache doesn't keep track of things like depth test, blending, framebuffer binding, etc.
This - especially the number of query calls - could lead to performance issues on mobile devices.
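As an illustration of what tracking these states could look like, here is a small sketch for boolean capabilities such as depth test or blending. CapabilityCache is a hypothetical class, not part of GLStateCache. The actual glEnable/glDisable call is injected as a callback so the sketch compiles without a GL context.

```cpp
#include <functional>
#include <unordered_map>
#include <utility>

// Remembers the last value set for each capability and forwards only real changes.
class CapabilityCache {
public:
    using ApplyFn = std::function<void(unsigned capability, bool enabled)>;

    explicit CapabilityCache(ApplyFn apply) : apply_(std::move(apply)) {}

    void set(unsigned capability, bool enabled) {
        auto it = state_.find(capability);
        if (it != state_.end() && it->second == enabled) {
            return; // redundant call, nothing reaches the driver
        }
        state_[capability] = enabled;
        apply_(capability, enabled); // would be glEnable / glDisable
    }

    // Answered from the cache instead of a glIsEnabled query.
    // Capabilities never set through the cache are reported as disabled,
    // which matches the GL default for most, but not all, of them
    // (GL_DITHER starts out enabled).
    bool isEnabled(unsigned capability) const {
        auto it = state_.find(capability);
        return it != state_.end() && it->second;
    }

private:
    ApplyFn apply_;
    std::unordered_map<unsigned, bool> state_;
};
```

This only works if every state change goes through the cache. Any code that calls gl functions directly, for example inside a CustomCommand, would have to update or invalidate it.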
Okay, that's all I've noticed so far.
I'm going to make a version of the Renderer class that addresses the first three issues I mentioned, so you can see exactly what I mean and how they could be implemented/fixed, and upload it here.