BunnyMark poor performance?

For my own amusement I made a BunnyMark type benchmark for Cocos2D-X C++
You can get the code here: BunnyMark.h BunnyMark.cpp

Running on an iPhone6s I get a measly 1000 bunnies before the FPS drops below 60. At this point, the CPU is pegged at 90% usage, though for some reason I was unable to dig into Instruments further to see what was going on in more detail.

This seems very low. Using libSDL - which has a very basic unoptimised renderer - I was able to get 6300 bunnies on the same hardware, and libGDX managed 31000 before slowing down.

What am I doing wrong?

I suspect the problem is that Iā€™m using a Sprite per bunny and the CPU is getting bogged down updating all the Node transforms etc., but my understanding is this is the idiomatic Cocos2D-X way of doing things.
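For context, the per-frame simulation work in a BunnyMark is tiny, which is why the suspicion falls on the scene graph rather than the game logic. Here is a minimal, engine-agnostic sketch of the update step (the `Bunny` struct and `step` function are made-up names for illustration; the actual Cocos2D-X version would additionally call `setPosition` on a `Sprite` per bunny, which is where the Node transform cost comes in):

```cpp
#include <vector>

// Hypothetical per-bunny state for a BunnyMark-style benchmark.
struct Bunny {
    float x, y;   // position
    float vx, vy; // velocity
};

// One simulation step: apply gravity, integrate, bounce off the walls.
void step(std::vector<Bunny>& bunnies, float w, float h, float gravity) {
    for (auto& b : bunnies) {
        b.vy += gravity;
        b.x += b.vx;
        b.y += b.vy;
        if (b.x < 0.0f)    { b.x = 0.0f; b.vx = -b.vx; }
        else if (b.x > w)  { b.x = w;    b.vx = -b.vx; }
        if (b.y < 0.0f)    { b.y = 0.0f; b.vy = -b.vy; }
        (void)h; // bunnies are allowed to fly above the top edge
    }
}
```

A few thousand iterations of this loop is negligible CPU work, so any FPS drop at 1000 bunnies has to be coming from the rendering/scene-graph side.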

Could you share the whole benchmark project?

Download Link:
bunnymark-cocos2d-x.zip (3.7Mb)

I also get an FPS drop on an iPhone 6s after adding 1000 objects. Very frustrating: the sprite is quite small (about 30x30). As the topic starter noticed, it's due to CPU usage (loaded at 90%+).

At first I thought it was mostly due to transforming points into model-view space. But after profiling I found that Renderer::render takes 30% of the CPU time (which is also quite large).
41% of the CPU is taken by Node::visit, mostly by Node::processParentFlags and TrianglesCommand::init.

An FPS drop after 1000 triangle commands drawn in 1 draw call on a top-tier device... seems very bad.

Though on an iPhone 5, which is 32-bit, the FPS drop started after 1700 sprites.

Did you mean 700? I get 700 on an iPod 5.

After removing the code that works with matrices inside Node::visit (_director->pushMatrix, _director->loadMatrix, _director->popMatrix) I got the FPS drop after 2000 sprites (2x better).

I'm not sure it's safe to remove this code, but the comments say it's actually deprecated and exists only to ease migration to v3.0.

In release mode the FPS drop started only after 7k bunnies (which is much better), and it seems the matrix "optimization" doesn't give any benefit in release mode.

Confirmed. Release mode with culling disabled and @MikhailSh's changes to Node::visit gets me 6800 bunnies on a 6s and 2500 on a 5. Much better! It seems what I was doing wrong was running a debug build.

FWIW I wrote a little more on benchmarking.

Thanks for sharing!

Cocos2D-X suffers from having to maintain a scene graph of thousands of nodes. libGDX takes a different approach and scores much higher on this synthetic benchmark as a result, but both methods have their use cases.

libGDX also has a scene-graph implementation ready:

I also recently learned that Eversion and VVVVVV use SDL's built-in rendering APIs, so it's certainly possible to do a commercial release with this technology. Interesting.

Are you referring to the software renderer? SDL has many rendering backends, and they are used very well for the Humble Bundle ports.
There are a lot of very successful games over many years that use SDL, e.g. World of Goo or Angry Birds (on webOS).

Sure, but it's optional (unlike Cocos) and not used in the benchmark. I imagine if you did use libGDX's scene graph then Cocos would win the benchmark, or it would at least be a lot closer.

Well, the two I mentioned use SDL_RenderCopy etc., so I imagine they use whatever hardware-accelerated backend is compiled in on each platform rather than the software renderer.

It was more that I hadn't considered SDL's rendering API something that would still see much use, but really, when I think about it, it's very well suited to the current fashion for pixelated graphics. I bet they could squeeze a lot more performance out of it, though.

Are they using SDL for rendering or just for the other platform abstractions?

I agree. Because the scene graph is optional, such benchmarks should always be taken with a grain of salt.

For sure. The same would apply for cocos2d-x and libGDX.

According to some developer diaries and forum posts they do. But alas, it seems everyone states different things: some say it used SDL rendering, others that it was just used for threading.

Some HB games just use SDL for platform abstraction, like Dungeon Defenders, which is a UE3-based game.

If you read the HB blog, their asm.js stuff uses SDL for rendering:

Regarding emscripten:

and our rendering is currently being handled by the SDL2 renderer API.

Unfortunately it does not say whether they also use the rendering features for all their ports or just the window handling:

At Humble Bundle, we use SDL2 for all of our game ports because it removes a significant amount of prep work in getting a game running on a new platform.

I'm not arguing in favor of someone using Cocos2d-x. We use other engines/frameworks for certain projects now. I'm also not arguing against using libGDX, especially since I have no experience with it.

However, I thought I'd drop my 2c here:


If you use a SpriteBatchNode you can draw ~15,000 bunnies on an iPhone 6 Plus @ 60fps (16ms frame time) and around ~25K @ 30fps (33ms). Using SpriteBatchNode or the more "modern" TrianglesCommand to render this many sprites is essential, and its usage is confirmed on the forums as well as by looking inside the particle rendering and management systems.

In Init

#if defined(USE_SPRITE_BATCH)
    bunnyBatch = SpriteBatchNode::create(filename); // class member
    bunnySprite = Sprite::createWithTexture(bunnyBatch->getTexture());
    addChild(bunnyBatch); 
#else
    bunnySprite = Sprite::create(filename);
#endif

In moreBunnies

#if defined(USE_SPRITE_BATCH)
        bunnyBatch->addChild(bunny.sprite);
#else
        this->addChild(bunny.sprite);
#endif

I have written off libGDX in the past because of Java, but I have since dabbled in Scala and Clojure, and it might be fun to give it a go in a future prototype or game jam.

One thing I didn't realize about libGDX, and found interesting but not surprising given its performance, is that it actually uses a lot of C++ underneath. You could argue that libGDX is more like cocos2d-x + Lua than it is a pure Java framework/engine. That's where it gets a lot of its performance, which is smart.

Every engine/framework has its pros/cons and its default usage (texture "blits", render commands, scene graphs, components, etc.). Cocos2d-x could also definitely use a few versions that only merge performance-improvement PRs.

Cocos2d-x's Node/Sprite classes contain a lot of excess data which could be trimmed or re-architected. Ideally cocos2d-x v4 would incorporate a lower-level API more similar to libGDX, along with a scene-graph API and a component API. I'm not sure we'll see a lot of improvement quickly enough, however :frowning:

p.s. you should probably remove I/O (e.g. printf) from the inner loops of benchmarks...?

Hi @stevetranby, I've read before that version 3.0 should use SpriteBatchNode for better performance. But from version 3.x onwards, the official documentation recommends not using it, so which approach is better?

Yep, use Sprite as the default for normal use cases with reasonable sprite counts. The cocos2d-x engine will auto-batch these sprites if they use the same texture (and shader, uniforms, etc., in other words the "material"), and the CPU cost of this auto-batching is negligible compared to game logic and actions. This is the reason for the "official" documentation's recommendation.
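To illustrate why same-material sprites batch essentially for free: conceptually, the renderer walks the submitted quads in order and only starts a new draw call when the material changes. This is a made-up sketch of that idea (the `Quad` struct and `countDrawCalls` function are not the engine's API, just an illustration of the batching rule):

```cpp
#include <vector>

// Hypothetical material key: in cocos2d-x a "material" is roughly
// texture + shader (program state) + blend function.
struct Quad { int materialKey; };

// Consecutive quads sharing a material key merge into one draw call,
// which is why 1000 bunnies using one texture can be a single draw.
int countDrawCalls(const std::vector<Quad>& quads) {
    int draws = 0;
    int lastKey = -1;  // sentinel: no batch in flight yet
    for (const auto& q : quads) {
        if (q.materialKey != lastKey) {
            ++draws;                 // material changed: start a new batch
            lastKey = q.materialKey;
        }
    }
    return draws;
}
```

The corollary is that interleaving sprites with different textures in draw order breaks the batching and multiplies the draw calls, even when the total sprite count is unchanged.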

This benchmark and discussion is about the abnormal case of 1000s of sprites on screen (not including particle effects).

When dealing with higher counts, if you sit back and think about the design, you often either need much less information (maybe only position + velocity + texture ID), or you can group sprites together so you only have one position/velocity/color/etc. per group, or you can treat them all as a single group like a particle system.

The options for high sprite counts:

  • SpriteBatchNode (deprecated)
  • ParticleSystem (if your needs fit into the restrictions of the particle system)
  • Custom TrianglesCommand (look at the internal code for FastTMXTiledMap, Particle3D, or ParticleSystemQuad)
  • Custom OpenGL using CustomCommand (here you could version out different GL use cases where TrianglesCommand is too limiting, or if you want to do some fancy stuff on OpenGL 4.x desktop systems like instancing).

Ideally cocos2d-x would introduce newer rendering types that wrap up instancing support, GPU particle systems, et al. For now you have to write your own extra performance into the system.

@nivrig I cannot download the code from your link.

Updated the link, sorry.
bunnymark-cocos2d-x.zip (3.7Mb)

Sorry, when I click the link, nothing happens.

Doh, fixed. :grimacing: