Rendering performance issue ? - cocos2d-x v3.13

floboc · September 16, 2016, 2:13pm

Hello everybody,
I am experiencing some rendering performance issue with cocos2d-x v3.13 on a simple static scene.
Note that I didn’t test any other cocos2d-x version, so I don’t know if this is the case for previous versions.

Basically, here is what I did:
I created a new HelloWorld project and removed everything on screen (i.e. the default sprite, the text and the close button), except the cocos2d-x statistics (to see the FPS and number of draw calls).

I created a 20x15 tile map using sprites (ie not using the TMX maps). Each tile has about 2 sprites (one on top of another, i.e. the ground and some object, like a bush). So in total I have at most 600 sprites on screen (a little less since sometimes a tile has only one sprite). All the sprites were added to the same layer.
The scene is static, i.e. I only add the sprites once in the init() function of the HelloWorld layer, and there is no update or anything else.

I tried 4 different solutions:

solution 1: put all the sprites in the same spritesheet, and use setZOrder() to control the render order of each object. Thus, rendering is done in only 1 draw call. On my device, I get about 29 fps, which is very low considering that my game is doing nothing but displaying static sprites, all drawn in a single draw call!

Here is a screenshot so that you can have a better idea (note: it is a screenshot when running the win32 app, not the one on the device)

solution 2: using sprites from 2 different spritesheets, still with setZOrder(). I know that my game won’t be able to fit all sprites within the same spritesheet, so I had to test what happens when sprites come from two different spritesheets. As expected, the number of draw calls increased hugely, to about 114 draw calls per frame, since sprites with the same z-order could use different spritesheets. However, I got about 28 fps on my device, so almost identical to the previous one, even though there are 100 times more draw calls.

Here is a screenshot from the win32 app. The sprites with inverted colors are those of the second spritesheet:

solution 3: put all sprites in the same spritesheet, and use setPositionZ() with the depth buffer enabled. So the rendering is still done in a single draw call, and I get a slight improvement with about 34 fps, probably because I didn’t use any alpha test for transparent objects, thus resulting in fewer pixels being drawn. However, this is still far below 60 fps.

Here is a screenshot (you can see black areas since I am using depth buffer with no alpha testing):

solution 4: using sprites from 2 different spritesheets, with setPositionZ() and the depth buffer enabled. This drastically increased the number of draw calls to 233, but had no impact on the fps, which stayed around 35.

And here is a screenshot for the last one:

Moreover, I saw significant fluctuations of the fps. They are quite periodic: roughly every second the fps drops, then comes back up, then drops again, etc. This is very strange since the app is always doing the same thing (i.e. just rendering this static scene).

So my questions are: how can I achieve 60 fps with such a simple scene? Or why can’t I achieve it? There is no game logic yet, so I guess the fps will drop considerably if I add some. Why do I get the same fps with 1 draw call and with 100 or 200 draw calls, while according to everything I read on the internet, batching is the main solution for optimizing the rendering? And why are there these fps fluctuations? Is the engine doing some computation each second, even if nothing changed?
Note: I am not asking why I get black areas when using the depth-buffer, I know this is normal since I didn’t do any alpha testing.

My device is a SAMSUNG phone (GT-S7390G), running Android v4.1.2. No other app was running on the phone while performing the tests; this is a phone I only use for development. Compilation was done in debug mode.

And here is the code I added to the HelloWorld init() method to build the map when using only one sprite sheet:

//Create map
    int map_size_x = 20, map_size_y = 15;
    float offset_x = 103.0f;
    float offset_y = 83.0f;
    float scale = 0.5f;
    int z_order_offset = 5;
    for (int y = 0; y < map_size_y; y++)
    {
        for (int x = 0; x < map_size_x; x++)
        {
            cocos2d::Vec2 pos(50.0f + x * offset_x * scale, 100.0f + y * offset_y * scale);
            int z_order = -y * z_order_offset;

            cocos2d::Sprite* spr = nullptr;
            switch (rand() % 4)
            {
                case 0:
                    spr = cocos2d::Sprite::createWithSpriteFrameName("ground_01.png");
                    break;

                case 1:
                    spr = cocos2d::Sprite::createWithSpriteFrameName("water_01.png");
                    break;

                case 2:
                    spr = cocos2d::Sprite::createWithSpriteFrameName("grass_01.png");
                    break;

                case 3:
                    spr = cocos2d::Sprite::createWithSpriteFrameName("dirt_01.png");
                    break;

            }
            spr->setPosition(pos);
            spr->setScale(scale);

            //if using z-order:
            //addChild(spr, z_order + 0);

            //if using depth-buffer:
            spr->setPositionZ(z_order * 0.1f);
            addChild(spr);

            spr = nullptr; // reset before picking the optional object sprite
            switch (rand() % 12)
            {
                case 0:
                    spr = cocos2d::Sprite::createWithSpriteFrameName("bush_01.png");
                    break;

                case 1:
                    spr = cocos2d::Sprite::createWithSpriteFrameName("bush_02.png");
                    break;

                case 2:
                    spr = cocos2d::Sprite::createWithSpriteFrameName("bush_03.png");
                    break;

                case 3:
                    spr = cocos2d::Sprite::createWithSpriteFrameName("chest_01.png");
                    break;

                case 4:
                    spr = cocos2d::Sprite::createWithSpriteFrameName("cristal_01.png");
                    break;

                case 5:
                    spr = cocos2d::Sprite::createWithSpriteFrameName("rock_01.png");
                    break;

                case 6:
                    spr = cocos2d::Sprite::createWithSpriteFrameName("bloc_02.png");
                    break;

                case 7:
                    spr = cocos2d::Sprite::createWithSpriteFrameName("key_01.png");
                    break;

                case 8:
                    spr = cocos2d::Sprite::createWithSpriteFrameName("star_01.png");
                    break;


                default:
                    break;
            }

            if (spr)
            {
                spr->setPosition(pos);
                spr->setScale(scale);

                //if using z-order:
                //addChild(spr, z_order + 1);

                //if using depth-buffer:
                spr->setPositionZ((z_order + 1) * 0.1f);
                addChild(spr);
            }
        }
    }

I forgot to mention that each spritesheet is in .png format, 1024x1024, and sprite frames are about 100x80px.

Any help is appreciated!

Edit:
Ok, I did some further testing:

First, it seems that my spriteFrames were too big (I am not talking about the spritesheet size, but the size of each sprite in the spritesheet). By dividing the size by two, I could manage 60 fps. Is there any way to do that automatically, for instance using some kind of mipmap? Moreover, can I reasonably expect that a device with twice the resolution will have similar performance if my images are twice as large as well?
I don’t really get this performance difference. I tried using nearest interpolation for the texture but it didn’t change the fps much. Let’s consider that I have two spritesheets of 1024x1024px. In the first one, I have tile sprites of 100x100px and in the second one, tiles of 50x50px. I create two sprites with the same visible size on screen (for instance 63x63px after scaling them), the first one with a sprite frame from the first spritesheet and the second one with a sprite frame from the second spritesheet (thus smaller than the first one). If I use no texture anti-aliasing, why would the first sprite take more time to render than the second one, when the number of drawn pixels is the same, as well as the overall size of the spritesheet each one belongs to?
Finally, it seems that when using the depth buffer, automatic sprite batching is disabled (if you look at solution 4, when using z-position and depth buffer with 2 spritesheets, I got more than 200 draw calls, so clearly sprites are not batched, or not well batched). I manually assigned sprites to two SpriteBatchNode (one for each spritesheet), and it went down to only 3 draw calls. Why is it not batched automatically? Am I missing something?
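To make the draw-call counts reported for the two-spritesheet cases easier to reason about, here is a rough standalone model in plain C++ (it is not cocos2d-x code). It assumes that sprites are drawn in z-order and that a batch can only continue while the texture stays the same; the real renderer has more conditions than that. `SpriteInfo`, `countDrawCalls` and `makeRow` are hypothetical names invented for this sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified description of a sprite for batching purposes.
struct SpriteInfo {
    int zOrder;     // render order (lower is drawn first)
    int textureId;  // which spritesheet the sprite frame comes from
};

// Counts draw calls under the assumption that sprites are drawn in z-order
// and that a new draw call starts whenever the texture changes.
std::size_t countDrawCalls(std::vector<SpriteInfo> sprites) {
    // stable_sort keeps insertion order for sprites sharing a z-order.
    std::stable_sort(sprites.begin(), sprites.end(),
                     [](const SpriteInfo& a, const SpriteInfo& b) {
                         return a.zOrder < b.zOrder;
                     });
    std::size_t calls = 0;
    for (std::size_t i = 0; i < sprites.size(); ++i) {
        if (i == 0 || sprites[i].textureId != sprites[i - 1].textureId) {
            ++calls;
        }
    }
    return calls;
}

// Builds `count` sprites on one z-order whose texture alternates between
// sheet 0 and sheet 1 every `runLength` sprites.
std::vector<SpriteInfo> makeRow(int count, int runLength, int zOrder) {
    assert(runLength > 0);
    std::vector<SpriteInfo> row;
    for (int i = 0; i < count; ++i) {
        row.push_back({zOrder, (i / runLength) % 2});
    }
    return row;
}
```

In this model a row of 20 sprites from one sheet costs 1 draw call, while the same row with the two sheets strictly alternating costs 20. Grouping sprites per texture by hand, as with one SpriteBatchNode per spritesheet, removes the dependence on draw order, which is consistent with the drop to a handful of draw calls described above.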

stevetranby · September 17, 2016, 4:40am

You’ll probably need to look at an implementation similar to the tiled maps or particle systems, where you submit the sprites in one draw call but without the overhead of many Sprite instances (and the transforms and such that go along with them). TMXLayer uses SpriteBatchNode to reduce the load of many sprites. FastTMXLayer uses a single TriangleCommand.

floboc · September 17, 2016, 11:10am

Thanks for your answer, but I found TMXLayer to be quite slow as well.
I couldn’t achieve 60 fps when running the test cases using the same tileset as I used here (both with z-order and vertex z-position), even though they only display 10x7 maps…

However, as I said before, if I just reduce the size of the images in my tileset (but not the displayed size of the sprites, so the number of drawn pixels stays the same), I can achieve 60 fps, even with 100 draw calls. Do you have any idea why the difference is so huge? (Note that in both cases, my spritesheet size was 1024x1024.)

stevetranby · September 17, 2016, 10:13pm

It’s always a bit hard to give any definitive answers on performance advice since it always depends on many things.

In your case it sounds like a fill rate issue. If so I think if you try half the number of sprites at 100x100 it’ll run as fast as the full amount at 50x50.
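A quick way to sanity-check a fill-rate hypothesis is to estimate how many pixels get written per frame. The following is a minimal sketch in plain C++ with hypothetical helper names (`pixelsWritten`, `overdraw`); it ignores clipping and assumes every pixel of every sprite quad is blended.

```cpp
#include <cassert>

// Pixels written per frame by `count` sprites whose on-screen size is
// widthPx x heightPx (clipping against the screen edges is ignored).
long long pixelsWritten(int count, int widthPx, int heightPx) {
    assert(count >= 0 && widthPx >= 0 && heightPx >= 0);
    return static_cast<long long>(count) * widthPx * heightPx;
}

// Average number of times each screen pixel is written (overdraw factor).
double overdraw(long long pixels, int screenW, int screenH) {
    assert(screenW > 0 && screenH > 0);
    return static_cast<double>(pixels) /
           (static_cast<double>(screenW) * screenH);
}
```

With the numbers from the first post (about 600 sprites, frames of about 100x80px drawn at scale 0.5, so roughly 50x40px on screen), `pixelsWritten(600, 50, 40)` gives 1,200,000 pixels per frame. Assuming an 800x480 screen (the thread does not state the resolution), `overdraw(1200000, 800, 480)` comes out a little above 3. Note that this estimate only depends on the on-screen size of the sprites, not on the size of the source frames.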

It’s definitely best to not scale down.

You could try to enable mipmaps, but if your game doesn’t zoom in/out much then upfront multi-resolution assets (-sd, -hd) are probably easier/better.

TankorSmash · February 16, 2017, 5:09am

Did you figure this out? I’m also curious about weird performance problems, seemingly random like this.

floboc · February 16, 2017, 9:23am

Well not really, I just threw away the TMXLayer (which by the way causes issues for alpha blending), and simply used sprites with z-order.
I used spritesheets and adjusted the resolution of the sprites to increase performance, which works fine for now…

Anyway, I still don’t get this issue, since it does not seem to be a fill-rate issue (the number of pixels that are drawn is the same), and I have a huge number of draw calls (above 100) but that seems to have no impact on performance, on the contrary… weird

TankorSmash · February 21, 2017, 3:24am

I’m fairly new to performance optimization, but isn’t below 500 a fairly reasonable amount of draw calls? I was under the impression over 1k was huge, while you’re dealing with 10x less.

nite · February 21, 2017, 10:45pm

Is it possible to share your test case? It would be interesting to look into it.

Darinex · February 23, 2017, 1:37am

The issue you are describing seems to be a good ol’ texture bandwidth limitation.

When reading textures from inside a shader (e.g. by using texture2D), the texture data is read from a texture cache instead of the texture itself. Reading from the texture cache is a lot faster since it avoids real memory reads, which usually take quite some time. However, if the requested texture data is not present in the cache, it must first be read into it from memory, effectively stalling the current shader execution. Depending on the texture bandwidth of your GPU, the time it takes to read texture data from memory varies.
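As a rough illustration of why the size of the sampled region matters even when the on-screen size is fixed, the sketch below counts how many distinct texel blocks a single sprite touches. It is a toy model in plain C++, not a description of any real GPU: it assumes the texture is fetched in square blocks of `blockSize` x `blockSize` texels, uses nearest-neighbour sampling, and only counts first-time fetches. `blocksTouched` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <utility>

// Number of distinct texel blocks read when a screenSize x screenSize quad
// samples a regionSize x regionSize texel region with nearest filtering.
std::size_t blocksTouched(int screenSize, int regionSize, int blockSize) {
    assert(screenSize > 0 && regionSize > 0 && blockSize > 0);
    std::set<std::pair<int, int>> blocks;
    for (int py = 0; py < screenSize; ++py) {
        for (int px = 0; px < screenSize; ++px) {
            // Map the pixel centre into the texel region (integer math).
            int tx = (2 * px + 1) * regionSize / (2 * screenSize);
            int ty = (2 * py + 1) * regionSize / (2 * screenSize);
            blocks.insert({tx / blockSize, ty / blockSize});
        }
    }
    return blocks.size();
}
```

For a 63x63px sprite and an assumed block size of 4x4 texels, `blocksTouched(63, 100, 4)` is 625 while `blocksTouched(63, 50, 4)` is 169, so in this model the 100x100 frame pulls in several times more texture data for the same number of drawn pixels.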

As a rule of thumb, larger texture sizes mean a higher cache miss rate (texture data not present in the cache), and a higher cache miss rate usually means (depending on the texture bandwidth) slower rendering. In your case, reducing the texture region that was mapped onto the sprite did the trick (note that the size of the region in pixels is what matters). You could try changing the tile size back to the original and using mipmapping on the texture. You can generate mipmaps for a texture automatically with Texture2D's generateMipmap function.
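To get a feel for what mipmapping would do in this case, here is a small sketch in plain C++ (it does not call cocos2d-x). It uses the usual approximation that the level of detail is about log2 of the number of texels per screen pixel, and it sums the texel count of a full mip chain for a square power-of-two texture. `mipLod` and `mipChainTexels` are hypothetical helper names.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Approximate level of detail: log2 of texels per screen pixel along one
// axis, clamped at 0 (level 0 is the full-resolution texture).
double mipLod(double texels, double pixels) {
    assert(texels > 0.0 && pixels > 0.0);
    return std::max(0.0, std::log2(texels / pixels));
}

// Total texel count of a full mip chain for a square texture whose side
// is a power of two (baseSize, baseSize/2, ..., 1).
long long mipChainTexels(int baseSize) {
    assert(baseSize > 0);
    long long total = 0;
    for (int size = baseSize; size >= 1; size /= 2) {
        total += static_cast<long long>(size) * size;
    }
    return total;
}
```

For a 100px frame drawn at 63px, `mipLod(100, 63)` is about 0.67, so depending on the minification filter the sampler would read from level 1 (where the frame is 50px wide) or blend it with level 0. For a 50px frame at 63px the result is 0, meaning mipmaps would not change anything. The memory cost is modest: `mipChainTexels(1024)` is 1,398,101 texels, about a third more than the 1,048,576 of the base level.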