We at V-Play want to move our base system from Cocos2d-x 1.0 (Qt Port 0.6) to Cocos2d-x 2.0 (Qt Port 1.0), but several things need to be tested before we switch. One of them is the performance of the particle system. We expected performance to improve when moving from 1.0 to 2.0, but we found the opposite: version 1.0 seems to perform much better than version 2.0. Now we would like to find out whether this result can be right, and we hope someone can give us a hint about what we might be missing (wrong Cocos2d-x settings, other misconfiguration, …).
To test the performance, we created a simple test scenario: a single particle system that is added to the scene again each time the scene is touched. The particle system was tested with textures of two sizes (4x4px and 32x32px) and is a Quad Particle System.
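The following simplified sketch makes the per-frame CPU work of a quad particle system concrete. It is not the actual code of Cocos2d-x's CCParticleSystemQuad. On every frame, each live particle is integrated on the CPU and then expanded into a textured quad of 4 vertices. The types `Particle` and `Vertex` and the function `updateAndFillQuads` are hypothetical names chosen for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified particle state (Cocos2d-x keeps more per-particle data).
struct Particle {
    float x, y;    // position
    float vx, vy;  // velocity in points per second
    float size;    // edge length of the rendered quad
    float life;    // remaining lifetime in seconds
};

// Hypothetical vertex layout: one corner of a textured quad.
struct Vertex {
    float x, y;  // position
    float u, v;  // texture coordinates
};

// Hypothetical per-frame step: advances every live particle by dt and rebuilds the
// vertex array with 4 vertices per live particle. All of this runs on the CPU.
std::size_t updateAndFillQuads(std::vector<Particle>& particles, float dt,
                               std::vector<Vertex>& vertices) {
    assert(dt >= 0.0f);
    vertices.clear();
    std::size_t alive = 0;
    for (Particle& p : particles) {
        p.life -= dt;
        if (p.life <= 0.0f) continue;  // expired particles emit no quad
        p.x += p.vx * dt;
        p.y += p.vy * dt;
        const float h = p.size * 0.5f;
        vertices.push_back({p.x - h, p.y - h, 0.0f, 0.0f});  // bottom-left
        vertices.push_back({p.x + h, p.y - h, 1.0f, 0.0f});  // bottom-right
        vertices.push_back({p.x - h, p.y + h, 0.0f, 1.0f});  // top-left
        vertices.push_back({p.x + h, p.y + h, 1.0f, 1.0f});  // top-right
        ++alive;
    }
    return alive;
}
```

Consider 47 systems of 350 particles that are all alive. A single call then updates 16450 particles and writes 65800 vertices, and it does so every frame before anything is handed to the GPU.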
The test is based on the HelloWorld example of Cocos2d-x. On top of it, we enabled touch input, created the particle batch (for the batched test cases), turned a label into a counter and added the routine that inserts the particle systems. The source code and resources are attached.
We compared the Linux version of Cocos2d-x with the Qt (MeeGo Harmattan) version. The test results are in the spreadsheet. In every test case, version 1.0 performed better than 2.0. Also in every test case, the batched particle system performed the same as the unbatched particle system in Cocos2d-x 2.0. Performance was measured as FPS against the number of particle systems.
We see no performance difference between batched and unbatched systems because every effect uses the same small texture, so there are not many OpenGL state changes. A stronger effect might show up in three cases: with large textures containing a lot of transparency, with many more particles (which is not possible because of the performance drop), or with several particle systems using different textures to force OpenGL state changes.
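The argument about state changes can be illustrated with a simplified, hypothetical model. It does not describe how Cocos2d-x itself renders or counts anything. In the model, each unbatched particle system issues its own draw call. It binds its texture only when that texture differs from the one already bound; this bind cache is an assumption. A batch, by contrast, draws all systems with one shared texture in a single call. `TextureId`, `FrameStats`, `renderUnbatched` and `renderBatched` are illustrative names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical texture handle; in OpenGL this would be a texture name (GLuint).
using TextureId = unsigned int;

// Hypothetical per-frame counters.
struct FrameStats {
    std::size_t drawCalls = 0;
    std::size_t textureBinds = 0;
};

// Unbatched model: one draw call per particle system. A bind is only counted when
// the texture differs from the currently bound one (assumed bind cache).
FrameStats renderUnbatched(const std::vector<TextureId>& systemTextures) {
    FrameStats s;
    bool haveBound = false;
    TextureId bound = 0;
    for (TextureId tex : systemTextures) {
        if (!haveBound || tex != bound) {
            bound = tex;
            haveBound = true;
            ++s.textureBinds;
        }
        ++s.drawCalls;
    }
    return s;
}

// Batched model: all systems must share one texture, so one bind and one draw call.
FrameStats renderBatched(const std::vector<TextureId>& systemTextures) {
    FrameStats s;
    if (systemTextures.empty()) return s;
    for (TextureId tex : systemTextures) {
        assert(tex == systemTextures.front() && "a batch requires one shared texture");
    }
    s.drawCalls = 1;
    s.textureBinds = 1;
    return s;
}
```

When all systems share one texture, both paths in this model end up with a single texture bind, and batching saves only draw calls. When systems alternate between two textures, the unbatched path needs one bind per system. That alternating setup is the kind of test suggested above for forcing state changes.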
So the general performance drop might not come only from the graphics side, but rather from high CPU utilization: the positions of thousands of particles have to be recalculated every frame. For example, 47 particle systems with 350 particles each produce a payload of 47 * 350 = 16450 particles, all of which are recalculated each frame.

With batched particles, this is also the highest number of particles that can be used, because the GL draw function uses unsigned short indices, and that limit is reached at 16450 particles: 16450 * 4 (vertices) = 65800, which exceeds the unsigned short maximum of 65535. The screenshots comparing batched and unbatched particles on a desktop machine show this. After 47 particle systems have been inserted into the batch, new effects are no longer displayed. When more effects are added, performance still keeps decreasing even though the new effects are not drawn. This indicates that the performance loss is mainly on the CPU side rather than on the GPU. The texture-size test points the same way: with 4x4px textures instead of 32x32px, only about 10 more effects can be placed (on desktop).
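The arithmetic behind the unsigned short limit can be checked with a small standalone sketch. It takes two assumptions from the post: 4 vertices per particle quad and 16-bit indices. It also assumes 6 indices per quad (two triangles). The helper names `maxQuads`, `fitsInBatch` and `buildQuadIndices` are hypothetical, not Cocos2d-x functions. Indices 0..65535 can address at most 65536 vertices, so one batch holds at most 65536 / 4 = 16384 quads. That means 46 complete systems of 350 particles (16100 particles) fit, while 47 systems (16450 particles) do not.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

constexpr std::size_t kVerticesPerQuad = 4;  // from the post: 4 vertices per particle
constexpr std::size_t kIndicesPerQuad = 6;   // assumption: two triangles per quad
// 16-bit indices (like GLushort) address vertices 0..65535, i.e. 65536 vertices.
constexpr std::size_t kMaxAddressableVertices =
    static_cast<std::size_t>(std::numeric_limits<std::uint16_t>::max()) + 1;

// Largest number of quads whose vertices are all reachable with 16-bit indices.
constexpr std::size_t maxQuads() { return kMaxAddressableVertices / kVerticesPerQuad; }
static_assert(maxQuads() == 16384, "65536 vertices / 4 vertices per quad");

// Hypothetical helper: do `systems` particle systems still fit into one batch?
constexpr bool fitsInBatch(std::size_t systems, std::size_t particlesPerSystem) {
    return systems * particlesPerSystem <= maxQuads();
}

// Hypothetical index builder: quad i uses vertices 4i..4i+3 as the triangles
// (4i, 4i+1, 4i+2) and (4i+3, 4i+2, 4i+1). Storing 4i+k in a uint16_t wraps
// modulo 65536, so quads past the limit point back at the first vertices.
std::vector<std::uint16_t> buildQuadIndices(std::size_t quads) {
    std::vector<std::uint16_t> indices(quads * kIndicesPerQuad);
    for (std::size_t i = 0; i < quads; ++i) {
        const std::size_t v = i * kVerticesPerQuad;
        const std::size_t n = i * kIndicesPerQuad;
        indices[n + 0] = static_cast<std::uint16_t>(v + 0);
        indices[n + 1] = static_cast<std::uint16_t>(v + 1);
        indices[n + 2] = static_cast<std::uint16_t>(v + 2);
        indices[n + 3] = static_cast<std::uint16_t>(v + 3);
        indices[n + 4] = static_cast<std::uint16_t>(v + 2);
        indices[n + 5] = static_cast<std::uint16_t>(v + 1);
    }
    return indices;
}
```

In this sketch the cast to `uint16_t` does not fail when the limit is crossed. Instead, the indices of quad 16384 wrap back to vertex 0, so quads beyond the limit silently reuse vertices from the start of the buffer.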
The spreadsheet, screenshots and source code are in the attached archive.
Source code (it couldn't be uploaded here because of an error): http://www.fantasyhaze.com/sof/Particle_Perfromance_Tests.zip
Particle_Performance_Tests.ods.zip (20.2 KB)