Proposal for v3.0: new rendering pipeline

Hello everyone,
my name is Carlo Morgantini and I work at Zynga with Ricardo (Quesada); we are currently designing a whole new rendering pipeline for cocos2d-x v3.0.
We thought it would be useful to share the "roadmap/design" document with the community to start a conversation around it.
Here is the link to the document on Google Docs:
https://docs.google.com/document/d/1nDX131S-k_XwKHkh7CjP4yC4pmywLk8Do3kMtyrRYwI/edit?usp=sharing

Please note that this is still a work in progress, and we of course strongly encourage constructive collaboration around it, so comments, remarks, suggestions and ideas are welcome! :-)
Thank you very much!
Best.
-- Carlo Morgantini

Hi Carlo, thank you for your contribution.

As you said, there will be an unordered layer. When drawing the nodes of such a layer, a child may be drawn before its parent, right?
Does that mean every node will need to keep its own world transform?

Really excited to see the planned features. Thanks so much! Can’t wait to see details about the new material system.

Is frustum culling only going to be effective for static layers? It feels like it would be difficult to do for moving objects.
Any thoughts on implementing partial redraw? I haven't thought about the details, but it feels like with the help of the stencil buffer and a strict BVH it's quite doable, and it would likely bring a lot of performance to UI-heavy games.

Hi,

I had a look at the Google doc, looks good to me.
Have a look at this thread btw: http://www.cocos2d-x.org/boards/6/topics/30239

I’ve posted this in the Wishlist thread, this comes really close to what I had in mind about splitting the rendering pipeline. The culling and reordering can happen on an extra thread as well so you may have a similar situation as with the panda 3d engine (see http://www.panda3d.org/blog/?p=206)

However, I have one question: how do you handle alpha transparency? Reordering and batching are rather trivial if you have no semi-transparent pixels, but it is quite hard to draw transparent pixels out of order. So I guess for sprites, which may have a semi-transparent outline, reordering will not be possible?

Under the new pipeline, will CCBatchNodes still exist or will all batching be accomplished by either automatic batching in unordered layers or by use of batch layers?

Regarding alpha transparency and unordered layers, will at least the order of nodes associated with the same material be guaranteed to be preserved? Because that way you can think of the sets of nodes associated with specific materials as forming implicit sub-layers, where the z-order of the sub-layers is undefined but the z-order within each sub-layer is well-defined…
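
A minimal sketch of the guarantee in question, assuming the renderer collects per-node draw commands (the DrawCommand type and its fields are hypothetical): a stable sort by material keeps the original scene-graph order among commands that share the same material.

#include <algorithm>
#include <vector>

struct DrawCommand {
    int materialId;   // texture + shader + blend state
    int sceneIndex;   // position in the original scene-graph traversal
};

void sortByMaterialPreservingOrder(std::vector<DrawCommand>& commands)
{
    std::stable_sort(commands.begin(), commands.end(),
                     [](const DrawCommand& a, const DrawCommand& b) {
                         return a.materialId < b.materialId;
                     });
    // Commands with equal materialId still appear in ascending sceneIndex
    // order, i.e. each "implicit sub-layer" keeps its internal z-order.
}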

Andre Rudlaff wrote:

I’ve posted this in the Wishlist thread, this comes really close to what I had in mind about splitting the rendering pipeline. The culling and reordering can happen on an extra thread as well so you may have a similar situation as with the panda 3d engine (see http://www.panda3d.org/blog/?p=206)

Interesting! I like the idea of having 3 threads.

However, I have one question: how do you handle alpha transparency? Reordering and batching are rather trivial if you have no semi-transparent pixels, but it is quite hard to draw transparent pixels out of order. So I guess for sprites, which may have a semi-transparent outline, reordering will not be possible?

If:
* You have many semi-transparent sprites
* The z-order is changing frequently
* and the resources use different texture ids, then…

In this case little optimization can be done; the renderer won't be able to optimize this scenario.

As you mentioned, it is common in 2d engines to have semi-transparent images, and the z-order (in the scene graph, not the OpenGL Z value) is super important.
So the renderer won't sort by "material" in order to batch (since the scene graph z-order is super important) unless the Layer has the Layer::UNORDERER attribute.

In other scenarios, it will be possible to do some optimizations.

Joe Wezorek wrote:

Under the new pipeline, will CCBatchNodes still exist or will all batching be accomplished by either automatic batching in unordered layers or by use of batch layers?

We haven’t decided it yet, but CCBatchNode will work the same as:

// this will be equivalent to CCSpriteBatchNode
auto layer = Layer::create();
layer->setAttribute( Layer::BATCH );

Regarding alpha transparency and unordered layers, will at least the order of nodes associated with the same material be guaranteed to be preserved? Because that way you can think of the sets of nodes associated with specific materials as forming implicit sub-layers, where the z-order of the sub-layers is undefined but the z-order within each sub-layer is well-defined…

If you use the Layer::UNORDERER attribute, then the layer's scene graph z-order won't be honored. The renderer will sort by material.

The difference between Layer::BATCH and Layer::UNORDERER is:

Layer::BATCH
* Works like CCBatchNode: it creates batches "manually". The children of the Layer are going to be batched, but the user is responsible for adding them, and all the children must have the same texture id, blending mode, etc. Otherwise it will crash.
* The batch is created before sending the command to the renderer, i.e. at "scene graph" time.

Layer::UNORDERER
* It means that the z-order of the layer's children is not important.
* Because of this, the renderer will sort the children by material and create batches for them.
* The batch is created at renderer time.
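
A minimal sketch of how the two attributes could be used, based on the snippets in this thread (the API is not final, and the sprite/frame names are just placeholders):

// BATCH: behaves like CCSpriteBatchNode. All children must share the same
// texture and blending mode; the batch is built at "scene graph" time.
auto batchLayer = Layer::create();
batchLayer->setAttribute( Layer::BATCH );
batchLayer->addChild( Sprite::createWithSpriteFrameName("frame_1.png") );
batchLayer->addChild( Sprite::createWithSpriteFrameName("frame_2.png") ); // same atlas

// UNORDERER: the children's z-order is ignored, so the renderer is free to
// sort them by material and build the batches itself at render time.
auto unorderedLayer = Layer::create();
unorderedLayer->setAttribute( Layer::UNORDERER );
unorderedLayer->addChild( Sprite::create("rock.png") );
unorderedLayer->addChild( Sprite::create("tree.png") ); // different textures are fine here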

Ricardo Quesada wrote:

Joe Wezorek wrote:
> Under the new pipeline, will CCBatchNodes still exist or will all batching be accomplished by either automatic batching in unordered layers or by use of batch layers?
>
We haven’t decided it yet, but CCBatchNode will work the same as:
[…]
>
> Regarding alpha transparency and unordered layers, will at least the order of nodes associated with the same material be guaranteed to be preserved? Because that way you can think of the sets of nodes associated with specific materials as forming implicit sub-layers, where the z-order of the sub-layers is undefined but the z-order within each sub-layer is well-defined…
>
If you use the Layer::UNORDERER attribute, then the layer's scene graph z-order won't be honored. The renderer will sort by material.
>
The difference between Layer::BATCH and Layer::UNORDERER is:
>
Layer::BATCH
* Works like CCBatchNode: it creates batches "manually". The children of the Layer are going to be batched, but the user is responsible for adding them, and all the children must have the same texture id, blending mode, etc. Otherwise it will crash.
* The batch is created before sending the command to the renderer, i.e. at "scene graph" time.
>
>
Layer::UNORDERER
* It means that the z-order of the layer's children is not important.
* Because of this, the renderer will sort the children by material and create batches for them.
* The batch is created at renderer time.

Layer::UNORDERER sounds great, but I'm failing to come up with a scenario where the z-order becomes irrelevant. In what real-world scenario would you use Layer::UNORDERER?

Linus L wrote:

Layer::UNORDERER sounds great, but I'm failing to come up with a scenario where the z-order becomes irrelevant. In what real-world scenario would you use Layer::UNORDERER?

You use it when you need to place images that do not overlap, like background images (composed of many mini images), as in a tile map.

For example, one efficient way to simulate a tiled map would be:

auto layer = Layer::create();
layer->setAttribute( Layer::STATIC | Layer::UNORDERER );
for(...) {
  layer->addChild( tile );
}

You could even use it for foreground images, if the game is a platformer like Super Mario.

Ricardo Quesada wrote:

Andre Rudlaff wrote:
> I’ve posted this in the Wishlist thread, this comes really close to what I had in mind about splitting the rendering pipeline. The culling and reordering can happen on an extra thread as well so you may have a similar situation as with the panda 3d engine (see http://www.panda3d.org/blog/?p=206)
>
Interesting! I like the idea of having 3 threads.
>
> However, I have one question: how do you handle alpha transparency? Reordering and batching are rather trivial if you have no semi-transparent pixels, but it is quite hard to draw transparent pixels out of order. So I guess for sprites, which may have a semi-transparent outline, reordering will not be possible?
>
If:
* You have many semi-transparent sprites
* The z-order is changing frequently
* and the resources use different texture ids, then…
>
In this case little optimization can be done; the renderer won't be able to optimize this scenario.
>
>
As you mentioned, it is common in 2d engines to have semi-transparent images, and the z-order (in the scene graph, not the OpenGL Z value) is super important.
So the renderer won't sort by "material" in order to batch (since the scene graph z-order is super important) unless the Layer has the Layer::UNORDERER attribute.
>
>
>
In other scenarios, it will be possible to do some optimizations.
I think I have an idea to optimize the semi-transparent image problem: first we build a DAG (Directed Acyclic Graph) in which each transparent image node is a successor of the nodes that must be drawn before it. Then we do a topological sort on the DAG to get a node sequence, and we can freely reorder the nodes that fall between the transparent image nodes. The problem is how to build the DAG in acceptable time (I have no detailed idea yet).
It may be an immature idea…
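
A minimal sketch of the topological-sort half of this idea, assuming the DAG is already built and given as adjacency lists of "must be drawn after" successors (all names are illustrative; building the DAG efficiently is the open problem):

#include <queue>
#include <vector>

std::vector<int> topologicalOrder(const std::vector<std::vector<int>>& successors)
{
    const int n = static_cast<int>(successors.size());
    std::vector<int> inDegree(n, 0);
    for (const auto& adj : successors)
        for (int v : adj)
            ++inDegree[v];

    std::queue<int> ready;
    for (int v = 0; v < n; ++v)
        if (inDegree[v] == 0)
            ready.push(v);              // nodes with no prerequisites can be drawn first

    std::vector<int> order;
    while (!ready.empty()) {
        int u = ready.front();
        ready.pop();
        order.push_back(u);             // u is safe to draw now
        for (int v : successors[u])
            if (--inDegree[v] == 0)
                ready.push(v);
    }
    return order;                       // a valid draw order if the graph is acyclic
}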

peter yu wrote:

Ricardo Quesada wrote:
> Andre Rudlaff wrote:
> > I’ve posted this in the Wishlist thread, this comes really close to what I had in mind about splitting the rendering pipeline. The culling and reordering can happen on an extra thread as well so you may have a similar situation as with the panda 3d engine (see http://www.panda3d.org/blog/?p=206)
>
> Interesting! I like the idea of having 3 threads.
>
> > However, I have one question: how do you handle alpha transparency? Reordering and batching are rather trivial if you have no semi-transparent pixels, but it is quite hard to draw transparent pixels out of order. So I guess for sprites, which may have a semi-transparent outline, reordering will not be possible?
>
> If:
> * You have many semi-transparent sprites
> * The z-order is changing frequently
> * and the resources use different texture ids, then…
>
> In this case little optimization can be done; the renderer won't be able to optimize this scenario.
>
>
> As you mentioned, it is common in 2d engines to have semi-transparent images, and the z-order (in the scene graph, not the OpenGL Z value) is super important.
> So the renderer won't sort by "material" in order to batch (since the scene graph z-order is super important) unless the Layer has the Layer::UNORDERER attribute.
>
>
>
> In other scenarios, it will be possible to do some optimizations.
I think I have an idea to optimize the semi-transparent image problem: first we build a DAG (Directed Acyclic Graph) in which each transparent image node is a successor of the nodes that must be drawn before it. Then we do a topological sort on the DAG to get a node sequence, and we can freely reorder the nodes that fall between the transparent image nodes. The problem is how to build the DAG in acceptable time (I have no detailed idea yet).
It may be an immature idea…

I have an algorithm for my idea:

  1. We set a flag on every semi-transparent node at build time.
  2. We sort the semi-transparent nodes by z-order (ascending) and call this sequence TNode_list.
  3. We call the remaining nodes ONode_list.

Here is the algorithm:

result_list = [];
foreach t_node in TNode_list              // transparent nodes, sorted by z ascending
{
    foreach o_node in ONode_list          // remaining opaque ("other") nodes
    {
        if (o_node must be drawn before t_node)
        {
            ONode_list.remove(o_node);
            result_list.add(o_node);
        }
    }
    result_list.add(t_node);
}
result_list.add_all(ONode_list);          // any leftover opaque nodes go last

Finally, we get a sequence such as 000001001001000, where 1 is a transparent node and 0 is an other node. We can reorder the 0s freely within each run, but we keep the 1s in the right order.
This may help optimize large groups of opaque background nodes, such as background cells.

A static layer could check its children: if they do not overlap, make it unordered automatically…
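
A rough sketch of that check, assuming cocos2d-x-style Node::getBoundingBox() and Rect::intersectsRect(); the helper function and the automatic behavior are hypothetical:

#include <vector>

bool childrenOverlap(const std::vector<Node*>& children)
{
    for (size_t i = 0; i < children.size(); ++i) {
        Rect a = children[i]->getBoundingBox();
        for (size_t j = i + 1; j < children.size(); ++j) {
            if (a.intersectsRect(children[j]->getBoundingBox()))
                return true;   // at least two children overlap: keep the z-order
        }
    }
    return false;              // no overlaps: drawing order doesn't matter visually
}

// Hypothetical use inside a static layer:
// if (!childrenOverlap(children)) layer->setAttribute( Layer::UNORDERER );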

So will cocos2d-x 3.0 be mostly compatible with existing code without major changes? Or would this only be for new projects/rewritten projects?

I like the idea of static layers, but I'm not sure about unordered and batched layers. From what I understand, batched layers are going to do exactly what sprite batch nodes do without any added convenience (i.e., the scene graph is still coupled to the renderer). I think the optimisations those attributes are meant to provide could be achieved differently.

In this article http://realtimecollisiondetection.net/blog/?p=86#comment-3601 there is a very nice method for sorting rendering commands based on a key that encodes their rendering state. Using a system like that, every command added to the CommandQueue would be a key-value pair. So when a Node sends a rendering command, it would have to create a 64-bit key that encodes the following information, in this order:

• Its viewport (if multiple OpenGL views are used).
• Its translucency type (opaque, normal, additive, or subtractive).
• Its z-order.
• Its material (I guess this could be split into the shader and the texture used).

The value of the pair would be the actual command, which contains information about the graphics state to set, the texture used, the shader used, the buffer to draw, etc. This way commands would always be sorted by z-order first and then by material, which could eliminate the need for an unordered layer.
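
For illustration, a minimal sketch of such a key in C++ (the bit widths and names below are arbitrary choices, not something taken from the article or the engine):

#include <cstdint>

// Field layout: [63..62] viewport | [61..60] translucency | [59..36] z-order | [35..4] material
enum Translucency : uint64_t { OPAQUE = 0, NORMAL = 1, ADDITIVE = 2, SUBTRACTIVE = 3 };

uint64_t makeRenderKey(uint64_t viewport, Translucency type,
                       uint32_t zOrder, uint32_t materialId)
{
    uint64_t key = 0;
    key |= (viewport & 0x3)                    << 62;  // sorted first
    key |= ((uint64_t)type & 0x3)              << 60;
    key |= ((uint64_t)zOrder & 0xFFFFFF)       << 36;  // 24 bits of z-order
    key |= ((uint64_t)materialId & 0xFFFFFFFF) << 4;   // sorted last
    return key;
}

// Sorting the CommandQueue then reduces to sorting (key, command) pairs by key.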

According to the article, this method could be extended so that the command keys also encode non-drawing operations that need to be executed in a specified order (e.g., clearing the depth buffer first).

Some other suggestions:

• The rendering back-end (or in fact any other object in the engine) should not make OpenGL calls directly. Instead, a GraphicsDevice object could be used that wraps all OpenGL functions (and its implementation could be swapped for DirectX functions); see the sketch after this list.
• The rendering back-end should not do the sorting of the rendering commands. This should be handled by a front-end that also handles culling. This way the back-end can focus on sending as many graphics calls as possible to the GPU.
• The rendering front-end could be responsible for creating texture atlases for nodes that use the same texture (auto-batching). If a node changes its texture or its blend settings, it should be removed from the atlas (I'm not sure if this is expensive, though).
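
A minimal sketch of the GraphicsDevice idea from the first bullet, assuming the appropriate OpenGL header is included; all class and method names here are made up for illustration:

class GraphicsDevice {
public:
    virtual ~GraphicsDevice() {}
    virtual void bindTexture(unsigned textureId) = 0;
    virtual void useProgram(unsigned programId) = 0;
    virtual void setBlendFunc(unsigned src, unsigned dst) = 0;
    virtual void drawArrays(unsigned mode, int first, int count) = 0;
};

// OpenGL-backed implementation; a DirectX-backed class could implement the
// same interface and be swapped in when building for Windows.
class GLGraphicsDevice : public GraphicsDevice {
public:
    void bindTexture(unsigned textureId) override          { glBindTexture(GL_TEXTURE_2D, textureId); }
    void useProgram(unsigned programId) override           { glUseProgram(programId); }
    void setBlendFunc(unsigned src, unsigned dst) override { glBlendFunc(src, dst); }
    void drawArrays(unsigned mode, int first, int count) override { glDrawArrays(mode, first, count); }
};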

Peter Yu: Thanks.

Sam Borenstein:
You will need to make some cosmetic changes in your code, like removing the CC prefix from the classes.
However, if you are overriding the draw method, you will need to update that code a bit. The new API for draw is not defined yet, but it won't be backwards compatible.

Nick Verigakis wrote:

I like the idea of static layers, but I'm not sure about unordered and batched layers. From what I understand, batched layers are going to do exactly what sprite batch nodes do without any added convenience

Yes. Static layers will replace sprite batch nodes.

In this article http://realtimecollisiondetection.net/blog/?p=86#comment-3601 there is a very nice method for sorting rendering commands based on a key that encodes their rendering state. Using a system like that, every command added to the CommandQueue would be a key-value pair. So when a Node sends a rendering command, it would have to create a 64-bit key that encodes the following information, in this order:

Interesting article. Thanks.
In the new renderer we will try to adopt the standard 3d best practices, because we want the renderer to scale up to a 3d game. But in 2d games, some of those 3d best practices are not applicable.
For example, most 2d games use Z=0 for all their nodes, but the z-order of the scene graph is super important. And 90% of the objects are semi-translucent. So auto-batching is not trivial for 2d games, and that's why we added some "hints" to the layer.

Some other suggestions:
>
• The rendering back-end (or in fact any other object in the engine) should not make OpenGL calls directly. Instead, a GraphicsDevice object could be used that wraps all OpenGL functions (and its implementation could be swapped for DirectX functions).

We were planning to do something like that. The new renderer should be easy to port to DirectX and other GPUs.

• The rendering back-end should not do the sorting of the rendering commands. This should be handled by a front-end that also handles culling. This way the back-end can focus on sending as many graphics calls as possible to the GPU.

ok.

• The rendering front-end could be responsible for creating texture atlases for nodes that use the same texture (auto-batching). If a node changes its texture or its blend settings, it should be removed from the atlas (I'm not sure if this is expensive, though).

Thanks. We started to evaluate that, but we haven't found a way to auto-generate texture atlases without performance issues.
Our current plan is to have a function that creates a texture atlas from the nodes of the graph (or a sub-graph).

Does a new rendering pipeline mean we're more likely to see a merged-back Metro/WP8 DirectX version?

Adam Legge wrote:

Does a new rendering pipeline mean we're more likely to see a merged-back Metro/WP8 DirectX version?

I do not know if we are going to merge back the old XNA branch, but it does mean it will be easier to support more GPUs: Windows, consoles, etc.

Here is the link to the document on Google Docs:
>
https://docs.google.com/document/d/1nDX131S-k_XwKHkh7CjP4yC4pmywLk8Do3kMtyrRYwI/edit?usp=sharing

I will continue Carlo’s work.
The new document is here: https://docs.google.com/document/d/17zjC55vbP_PYTftTZEuvqXuMb9PbYNxRFu0EGTULPK8/edit

I will start working on this branch:
https://github.com/ricardoquesada/cocos2d-x/tree/new_renderer

Once I have something working, I'll let you know.

Another proposal:

  • remove the “relative z-order” and replace it with an “absolute z-order”.

context:

cocos2d uses the scene graph for 2 things:

  1. parent-child transforms: if the parent is rotated, its children rotate as well; if the parent is scaled, its children scale as well, etc.
  2. a forced draw order: a parent and all its children get drawn after (or before) the rest of the parent's siblings.

It is good to have 1, but why do we need the limitation in 2?
It would be great to be able to render a node at any z-order regardless of its position in the scene graph.

As an example, assuming that we have this scene graph:

    A
   / \
  B   C
 / \ / \
D  E F  G

… it is not possible to render D or E between F and G.
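
For example, a hypothetical API along these lines (the method name below is not final, just an illustration) would make that possible:

// A..G are the nodes from the diagram above. D stays a child of B, so it
// still inherits B's transform, but its draw order comes from a global z
// value instead of its position in the scene graph.
F->setGlobalZOrder(10);
D->setGlobalZOrder(15);   // D is now drawn between F and G
G->setGlobalZOrder(20);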

By supporting a global z-order we will have:

  • a more flexible API
  • faster code
  • code that is easier to maintain
  • but it might break compatibility in some cases, and it won't work with manual batches.

Thoughts?