Six Tips From A Card Battle Developer On Optimizing Your Mobile Game

LukeStapley · November 18, 2022, 9:39am

Introduction: Not long ago at Cocos Star Meetings Shanghai, Tieshu, the front-end lead at OMNIDREAM GAMES, shared his team’s optimization solutions for a card game in development. This article is a transcript of that talk.

Popular strategy card game “New Douluo Continent.”

OMNIDREAM GAMES is a developer whose core team has more than eight years of experience in the card game category and created the “New Douluo Continent” series. The games have earned over 4 billion RMB in total, with more than 40 million users.

Over those years of development, the team has also built up a set of game optimization solutions. Today I will share a few proven card game optimization methods, along with some of my own insights, based on the team’s brand-new card game “Omniheros.” The game is currently in beta, so stay tuned!

In this article, I will focus on the following six optimization methods.

1. Red dot (notification) design: avoid evaluating the same judgment logic repeatedly, support multiple display types, and support periodic refresh.
2. Audio variable speed: support custom playback speeds for different audio.
3. ASTC block compression: reduce memory, improve loading speed, and select the compression block size more precisely.
4. Performance grading: better performance on both low-end and high-end machines.
5. Dynamic drawing: reduce overdraw, reduce drawcall, and increase frame rate.
6. Small package strategy: reduce the first package size and improve smoothness.

1. Red dot (notification) design

Red dots can be seen (usually in the upper left or right of an image) when you have notifications from different apps or within the app.

Problem-solving

Avoid evaluating the same judgment logic repeatedly.
Make the parent-child relationship between red dots clearly visible.
Support multiple display types.
Support periodic refresh.

The red dot tree concept

Each red dot that needs to be displayed in the UI has an id.
A tree structure establishes the hierarchy among the red dots.
Use a data structure that holds the activation state of the red dots.

Red Dot Tree Design

Leaf nodes, and only leaf nodes, have their own judgment logic, and every leaf node must have it.
A leaf node’s judgment method takes either no parameters or fixed parameters.
The red dot display status of a non-leaf node depends on the display status of its children.
The business layer only needs to care about each leaf node’s judgment logic, when to refresh it, and whether parameters need to be passed on refresh.

Red Dot Activation Process

104, 105, 106, 107, and 108 are leaf nodes, each bound to its own check method.
The check method of 104 has a fixed pass parameter 1, which is passed each time it is judged.
When 105 turns from inactive to active under func(2), it passes the state to 101. Because 101 now has an active child, it becomes active itself without checking its other children, and passes the active state on to 100, which also becomes active.
When 107 becomes inactive under the judgment of func(), it passes the state to 103. 103 cannot decide its own state from 107’s inactivity alone; it has to check its other children, in this case 108. If 108 is active, 103 is active. If 108 is not active, 103 becomes inactive and passes the state to 100.
If there is a change in the state before and after 107 is checked, it will pass the state to the parent node 103. If there is no change, it does not need to pass to the parent node.
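
The activation process above can be sketched in a few lines of TypeScript. This is only a minimal illustration, not the team’s actual implementation: the class name RedDotNode, its methods, and the buildExampleTree helper are all hypothetical, and only part of the example tree (100, 101, 103, 105, 107, 108) is wired up.

```typescript
// Minimal red dot tree: leaves own a check method, parents derive state from children.
// All names here are hypothetical; the source does not show its implementation.
type CheckFn = (...args: number[]) => boolean;

class RedDotNode {
  active = false;
  parent: RedDotNode | null = null;
  children: RedDotNode[] = [];
  private check: CheckFn | null = null;
  private fixedArgs: number[] = [];

  constructor(public readonly id: number) {}

  addChild(child: RedDotNode): void {
    child.parent = this;
    this.children.push(child);
  }

  // Only leaf nodes are bound to a check method, optionally with fixed parameters.
  bindCheck(check: CheckFn, ...fixedArgs: number[]): void {
    this.check = check;
    this.fixedArgs = fixedArgs;
  }

  // Re-run the leaf's check; the result travels upward only if the state changed.
  refresh(): void {
    if (this.check === null) return; // non-leaf nodes have no logic of their own
    this.setActive(this.check(...this.fixedArgs));
  }

  private setActive(next: boolean): void {
    if (next === this.active) return; // no change, nothing to pass to the parent
    this.active = next;
    this.parent?.onChildChanged(next);
  }

  private onChildChanged(childActive: boolean): void {
    // One active child is enough; an inactive child forces a look at the siblings.
    this.setActive(childActive || this.children.some((c) => c.active));
  }
}

// Wires up part of the example tree; `flags` stands in for real game data.
function buildExampleTree(flags: { f105: boolean; f107: boolean; f108: boolean }) {
  const n100 = new RedDotNode(100);
  const n101 = new RedDotNode(101);
  const n103 = new RedDotNode(103);
  const n105 = new RedDotNode(105);
  const n107 = new RedDotNode(107);
  const n108 = new RedDotNode(108);
  n100.addChild(n101);
  n100.addChild(n103);
  n101.addChild(n105);
  n103.addChild(n107);
  n103.addChild(n108);
  n105.bindCheck((level) => flags.f105 && level === 2, 2); // fixed parameter 2
  n107.bindCheck(() => flags.f107);
  n108.bindCheck(() => flags.f108);
  return { n100, n101, n103, n105, n107, n108 };
}
```

The business layer would then only call refresh() on the leaf whose data changed; parents never run any judgment logic themselves.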

Tree structure with mixed parameters

The parameters required by the leaf nodes are not fixed, i.e., dynamic parameters are required.
The children of non-leaf nodes require the same dynamic parameters.
The children of non-leaf nodes require different dynamic parameters.

Red Dot Activation Process

104 and 105 require the same dynamic parameter e1, i.e., the same dynamic parameter is required for 101 child nodes.
107 requires dynamic parameter g1, and 108 requires dynamic parameter g2, i.e., the children nodes of 103 require different dynamic parameters.
In a tree structure with dynamic parameters, for leaf nodes 107 or 105, there is no way to determine whether the parameters of other sibling leaf nodes are the same as their own. Hence, a principle must be followed here: only the activation state can be passed to the parent node, and dynamic parameters cannot be passed.
For 103, the child nodes need different dynamic parameters, so for 107 and 108, the only option is to refresh each separately and then pass the state to 103 simultaneously.
For 101, the child nodes need the same dynamic parameters. For 104 and 105, you can choose to refresh each separately, or you can choose to refresh 101 and pass the dynamic parameters to the child nodes 104 and 105.
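
A rough sketch of the rule “only the activation state goes up, dynamic parameters never do” might look as follows. The names StateParent, DynLeaf, and refreshShared are hypothetical, and the check functions are placeholders.

```typescript
// Parents only ever see activation states; they never receive dynamic parameters.
class StateParent {
  active = false;
  readonly children: { active: boolean }[] = [];
  constructor(readonly id: number) {}

  onChildChanged(): void {
    this.active = this.children.some((c) => c.active);
  }
}

// A leaf whose check needs a dynamic parameter of type P.
class DynLeaf<P> {
  active = false;

  constructor(
    readonly id: number,
    private readonly check: (param: P) => boolean,
    private readonly parent: StateParent,
  ) {
    parent.children.push(this);
  }

  refresh(param: P): void {
    const next = this.check(param);
    if (next === this.active) return;
    this.active = next;
    this.parent.onChildChanged(); // the state goes up, `param` does not
  }
}

// Only valid when all leaves expect the same dynamic parameter (like 104 and 105).
function refreshShared<P>(leaves: DynLeaf<P>[], param: P): void {
  for (const leaf of leaves) leaf.refresh(param);
}
```

Leaves with different parameter types, such as 107 and 108, cannot go through refreshShared and have to be refreshed one by one, which matches the constraint described above.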

The list structure of the red dot

Because the elements of a list are not fixed and their number is not fixed either, red dots cannot be bound to them ahead of time. For now, each list item judges its own red dot and does not go through the tree structure.

Red Dot configuration table

Some ids may be dummy ids and have no corresponding interface in the UI.
name is the comment.
parent is the parent node of the red dot; only the root node has no parent.
update_type is the update type.
- The default is 0, front-end judgment, as determined by the bound check method.
- It can be set to 1 for back-end judgment. Between logging in and reaching the main city, the client does not pull the data for some features, so the activation status of their red dots cannot be computed from local data and can only come from a result evaluated by the server. Once the feature’s data has been pulled, the bound check method takes over the judgment.
refresh_type is the refresh type.
- Default 0 refreshes in real time, no need to mark.
- 1 for login refresh. The red dot is shown only once per login. After it is marked, it is not shown again during that login.
- 2 for daily refresh. The red dot is shown only once per day. After it is marked, it is not shown again that day.
- 3 for weekly refresh. The red dot is shown only once per week. After it is marked, it is not shown again that week.
function_id is the function id. It is used to check whether the function is turned on, so that red dots of functions that are not open are skipped and no time is wasted on pointless judgments.
priority is the judgment priority. When update_type is 1, you can set the priority so that the simpler judgment logic runs first, which reduces the time spent on back-end judgment.
show_type is the display type. It works together with a display priority that you can set freely for each display type: when several types compete, the type with the higher display priority is shown.
- Default 0, red dots.
- 1 for green arrows, only green arrows are displayed.
- 2 for full, showing only full.
- 3 for new, showing new only.
- -1 is any type, depending on the judgment logic or the type with the highest display priority passed by the child node.
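
As an illustration, one row of this table and the show_type = -1 rule could be modelled as below. The field names come from the table; the concrete display priority numbers, the sample row, and the function name resolveShowType are assumptions made for the sketch.

```typescript
// 0 = red dot, 1 = green arrow, 2 = full, 3 = new (as listed in the table).
type ShowType = 0 | 1 | 2 | 3;

interface RedDotConfig {
  id: number;
  name: string; // comment only
  parent: number | null; // null for the root node
  update_type: 0 | 1; // 0 = front-end judgment, 1 = back-end judgment
  refresh_type: 0 | 1 | 2 | 3; // real time, per login, daily, weekly
  function_id: number;
  priority: number;
  show_type: ShowType | -1; // -1 = any type
}

// Hypothetical display priorities; the text only says they are freely configurable.
const DISPLAY_PRIORITY: Record<ShowType, number> = { 0: 40, 3: 30, 2: 20, 1: 10 };

// A made-up sample row, for illustration only.
const SAMPLE_ROW: RedDotConfig = {
  id: 101,
  name: "hero panel",
  parent: 100,
  update_type: 0,
  refresh_type: 0,
  function_id: 0,
  priority: 0,
  show_type: -1,
};

// For show_type -1, take the type with the highest display priority among the
// types reported by active children; otherwise the configured type wins.
function resolveShowType(own: ShowType | -1, activeChildTypes: ShowType[]): ShowType | null {
  if (own !== -1) return own;
  let best: ShowType | null = null;
  for (const t of activeChildTypes) {
    if (best === null || DISPLAY_PRIORITY[t] > DISPLAY_PRIORITY[best]) best = t;
  }
  return best; // null means no child is active, so nothing is displayed
}
```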

Red dot tasks in business development

Configure the red dot id relationship table.
Mount the red dot display component on the UI.
Bind the check method for red dot ids.
Refresh the red dot when it needs to be changed.
Mark if the red dot was clicked.

2. Audio variable speed

Problem-solving

The audio can be set to variable speed.
Different sound effects can be customized with different variable speeds.

There are several ways to change the speed of the audio:

Modify the engine’s underlying code to support variable-speed audio.
Integrate the wwise engine.
Integrate the fmod engine.

Modifying the underlying code to support variable-speed audio

Modify AudioMixerController.cpp to support variable-speed audio.

Pros

Fast development, only the underlying code needs to be modified.

Low implementation cost, no additional spending.

Disadvantages

Only one channel: all audio is sped up or slowed down together, never individually.

Integrating the wwise engine

wwise is a comprehensive audio middleware solution for game development that creates sophisticated, rich interactive audio. It consists of tightly integrated design tools and a sound engine that allows sound designers and programmers to produce stunning audio in less time and more cost-effectively.

wwise Official Website.

https://www.audiokinetic.com/zh/products/wwise/

Pros

The audio results are relatively good.
Audio is shared, and audio resources are significantly reduced.
More programmer-friendly: the code just sends events.

Disadvantages

Need to learn the wwise software.
Relatively expensive.
No web support and no direct MP3 playback, which makes development less convenient.
The pre-compiled library is very large.

Integrating the fmod engine

The fmod engine is a cross-platform audio engine that can be used on Windows, Android, iOS, web, and other platforms. Multiple channels can be set up, and variable speed can be applied to individual channels, which helps when, for example, the game only wants variable speed for skill sounds.

fmod Official website.

https://www.fmod.com/

Pros

Supports different speeds on different channels.
You can play MP3 directly.
Cheaper.

Disadvantages

The documentation is all in English. (Editor: Not an issue here)
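
To make the difference between one global speed and per-channel speed concrete, here is a small engine-independent model. It does not use the fmod, wwise, or Cocos APIs; SpeedChannel and Sound are hypothetical names, and a real integration would map them onto whatever per-channel speed control the chosen audio engine offers.

```typescript
// A channel groups sounds that should be sped up or slowed down together.
class SpeedChannel {
  speed = 1; // 1 = normal speed
  constructor(readonly name: string) {}
}

// A playing sound belongs to exactly one channel.
class Sound {
  constructor(
    readonly clip: string,
    readonly channel: SpeedChannel,
    readonly baseRate = 1,
  ) {}

  // The effective rate follows the channel, so changing one channel
  // leaves the sounds on every other channel untouched.
  get playbackRate(): number {
    return this.baseRate * this.channel.speed;
  }
}

const skillChannel = new SpeedChannel("skill");
const musicChannel = new SpeedChannel("music");
const fireball = new Sound("fireball.mp3", skillChannel);
const bgm = new Sound("battle_theme.mp3", musicChannel);

skillChannel.speed = 2; // e.g. battle running at double speed
```

With the single-channel approach from the first option, fireball and bgm would share one speed value; with separate channels only the skill sounds change.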

3. ASTC block compression scheme

Problem-solving

Reduce memory for images.
Improve image loading speed.
More accurate selection of compression block size.

What is ASTC?

Like PVRTC, ETC1, and ETC2, ASTC is a compressed texture format.

Advantages

Higher compression ratio, with a freely selectable compression rate.
Better visual quality.
Compatible with iOS and Android.

Support

ASTC requires OpenGL ES 3.0 or above. According to our survey, 99.2% of overseas players’ phones support it and only 0.8% do not, and the situation in China is even better. For the players whose phones lack support, we can either drop them or fall back to software decoding.

Cocos Creator access to ASTC.

ASTC is supported in Creator 3.0 onwards.
Creator 2.4 requires modifying the underlying C++ logic to load ASTC; guides for this change are readily available online.

Choosing the ASTC block size

After learning about ASTC, we found that this texture format offers many block sizes. As shown in the figure, the larger the block, the higher the compression ratio and, of course, the lower the image quality. So how should we choose the block size?

Many teams use a fairly good approach here: configure a whitelist of block sizes per folder, as shown in the following figure:

This does find a reasonable block size, but consider the case where the following two images are in the same folder.

The picture on the left has more detail and richer colors, so 6x6 blocks give a better display effect. The picture on the right is mostly a single color, so 10x10 or even 12x10 is enough. With the per-folder approach, the right picture also gets 6x6, which compresses it less than it could and wastes memory. A per-folder whitelist therefore struggles to find the most suitable block size for each individual image. For this, we propose a more accurate one: the similarity matching scheme.

Similarity-matching schemes

For each image, compress it at every block size ahead of time.
Compare each compressed result with the original image, going through the block sizes from largest to smallest.
Set a similarity threshold, e.g., 80% counts as a pass.
Some images can still use the whitelist approach.

The idea is to compare the original and compressed images and pick the largest block size whose result we still consider acceptable. As for how to compute the similarity, here is a similarity matching algorithm, for reference only.

Similarity algorithm

Divide the image into blocks according to the selected block size.
For each pixel in each block, take the RGB difference between the original and the compressed image, multiply it by that pixel’s alpha, and compute the luminance of the result (the difference luminance).
Average the difference luminance of all pixels in a block to get that block’s mean.
Compare the means of all blocks and keep the maximum difference luminance.
Convert it to a percentage from 0 to 100 so it can be compared with a configurable threshold.

As shown in the figure, block A contains more color information, while block B is mostly a single color. After compression, the difference of A is the largest of all the blocks, so A’s value is used as the difference for the whole image. Judging by the worst block gives a more reliable result and prioritizes preserving art quality.
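
The steps can be sketched roughly as follows. This is one possible reading of the algorithm, not the team’s tool: the Rec. 601 luminance weights, the mapping to a 0 to 100 score, the candidate block list (a subset of the ASTC block sizes), the fallback to 4x4, and the decodeAt callback (which is assumed to return the image compressed at a given block size and decoded back to RGBA) are all assumptions.

```typescript
// RGBA pixels, 0..255, row-major, 4 numbers per pixel.
interface RgbaImage {
  width: number;
  height: number;
  data: number[];
}

// Assumed luminance weights (Rec. 601); the source does not say which are used.
function luminance(r: number, g: number, b: number): number {
  return 0.299 * r + 0.587 * g + 0.114 * b;
}

// Returns 100 for identical images; lower values mean the worst block differs more.
function similarityPercent(original: RgbaImage, decoded: RgbaImage, blockW: number, blockH: number): number {
  let worst = 0; // largest per-block mean of the difference luminance
  for (let by = 0; by < original.height; by += blockH) {
    for (let bx = 0; bx < original.width; bx += blockW) {
      let sum = 0;
      let count = 0;
      for (let y = by; y < Math.min(by + blockH, original.height); y++) {
        for (let x = bx; x < Math.min(bx + blockW, original.width); x++) {
          const i = (y * original.width + x) * 4;
          const alpha = original.data[i + 3] / 255;
          const dr = Math.abs(original.data[i] - decoded.data[i]) * alpha;
          const dg = Math.abs(original.data[i + 1] - decoded.data[i + 1]) * alpha;
          const db = Math.abs(original.data[i + 2] - decoded.data[i + 2]) * alpha;
          sum += luminance(dr, dg, db);
          count++;
        }
      }
      worst = Math.max(worst, sum / count);
    }
  }
  return 100 * (1 - worst / 255);
}

// Candidate block sizes from largest to smallest (a subset, for brevity).
const BLOCK_SIZES: ReadonlyArray<readonly [number, number]> = [
  [12, 12], [10, 10], [8, 8], [6, 6], [5, 5], [4, 4],
];

// Picks the largest block size whose decoded result passes the threshold.
function pickBlockSize(
  original: RgbaImage,
  decodeAt: (blockW: number, blockH: number) => RgbaImage, // hypothetical lookup into the cached results
  threshold = 80,
): [number, number] {
  for (const [w, h] of BLOCK_SIZES) {
    if (similarityPercent(original, decodeAt(w, h), w, h) >= threshold) return [w, h];
  }
  return [4, 4]; // assumption: fall back to the highest-quality block size
}
```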

Similarity algorithm advantages and disadvantages

Pros

More accurate block selection for images, with separate block selection for each image.
Between game performance and visual quality, visual quality takes priority.

Disadvantages

Need to cache all block images on your computer.
Rather slow (git hooks).

4. Performance rating scheme

Problem Solving

Higher-end machines have better results.
Lower-end machines can also have a better experience.

The grading process

Collect statistics on the top 150 overseas devices of 2021 and grade them (iOS devices are graded directly by whitelist).

Get phone device information (phone model, CPU max frequency, GPU frequency, max memory, remaining external memory, number of cores)

Set the score coefficients corresponding to each parameter and calculate the total score.

score = cpuScore * Pcpu * Rcpu + gpuScore * Pgpu + ramScore * Pram + memScore * Pmem

Make a high, medium, or low grading based on the total score and perform different operations.

LOW: score < Smiddle
MIDDLE: Smiddle <= score < Shigh
HIGH: score >= Shigh
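
The scoring and grading rules can be written down directly. In this sketch the symbol names follow the formula, while the interfaces, the function names, and any concrete numbers you plug in are hypothetical; the talk does not give the actual coefficients or thresholds.

```typescript
interface DeviceScores {
  cpuScore: number;
  gpuScore: number;
  ramScore: number;
  memScore: number;
}

// Coefficients named as in the formula; their values are project-specific.
interface ScoreCoefficients {
  Pcpu: number;
  Rcpu: number;
  Pgpu: number;
  Pram: number;
  Pmem: number;
}

type Grade = "LOW" | "MIDDLE" | "HIGH";

// score = cpuScore * Pcpu * Rcpu + gpuScore * Pgpu + ramScore * Pram + memScore * Pmem
function totalScore(d: DeviceScores, c: ScoreCoefficients): number {
  return d.cpuScore * c.Pcpu * c.Rcpu + d.gpuScore * c.Pgpu + d.ramScore * c.Pram + d.memScore * c.Pmem;
}

// LOW: score < Smiddle, MIDDLE: Smiddle <= score < Shigh, HIGH: score >= Shigh
function gradeOf(score: number, Smiddle: number, Shigh: number): Grade {
  if (score >= Shigh) return "HIGH";
  if (score >= Smiddle) return "MIDDLE";
  return "LOW";
}
```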

Grade-based handling

Preload quantity setting
Set whether the interface is resident
Number of objects to cache
Adjusting the game frame rate
Resource recycling frequency
Special effects playback rating
…

Performance Test Standard Grading

After performance grading, different models have different performance standards.

This grading matches most visual and performance settings to the phone hardware. Still, in the process we observed that some phones have a weaker CPU but plenty of memory, so we made a more fine-grained grading scheme.

Finer grading

The main goal is to take full advantage of every strength of the phone hardware.

CPU grading: frame rate, recycle frequency, effects playback grading …
Memory grading: preload, interface resident, number of object caches …
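
One way to express the finer grading is to keep two independent lookups, one keyed by the CPU grade and one by the memory grade, and merge them. All setting names and values below are invented for the example.

```typescript
type Tier = "LOW" | "MIDDLE" | "HIGH";

// Settings that mostly cost CPU time.
interface CpuSettings {
  frameRate: number;
  recycleIntervalSec: number;
  effectLevel: number;
}

// Settings that mostly cost memory.
interface MemorySettings {
  preloadCount: number;
  residentUi: boolean;
  objectCacheSize: number;
}

// Hypothetical values; real ones would come from performance testing.
const CPU_SETTINGS: Record<Tier, CpuSettings> = {
  LOW: { frameRate: 30, recycleIntervalSec: 30, effectLevel: 0 },
  MIDDLE: { frameRate: 45, recycleIntervalSec: 60, effectLevel: 1 },
  HIGH: { frameRate: 60, recycleIntervalSec: 120, effectLevel: 2 },
};

const MEMORY_SETTINGS: Record<Tier, MemorySettings> = {
  LOW: { preloadCount: 2, residentUi: false, objectCacheSize: 20 },
  MIDDLE: { preloadCount: 5, residentUi: false, objectCacheSize: 50 },
  HIGH: { preloadCount: 10, residentUi: true, objectCacheSize: 100 },
};

// A phone with a weak CPU but plenty of memory gets LOW cpu settings and HIGH memory settings.
function settingsFor(cpuTier: Tier, memoryTier: Tier): CpuSettings & MemorySettings {
  return { ...CPU_SETTINGS[cpuTier], ...MEMORY_SETTINGS[memoryTier] };
}
```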

5. Dynamic drawing

Problem-solving

Reduce overdraw.
Reduce drawcall.
Increase frame rate.

Cocos Rendering Pain Points

The UI is drawn without any culling of content hidden behind opaque elements, so covered pixels are still drawn. The general optimization is to draw less often and save the result of a single draw, and many games use a fairly practical trick for this.

Dynamic drawing of pop-up boxes

When a pop-up box is displayed, place a blurred screenshot behind it as a backdrop.
Then hide the whole scene behind the blurred image. Building on this idea, some further optimizations can be made.

Dynamic Camera Optimization

As shown below, create a new group postRender for the camera.
Assign nodes that need to be optimized, such as more complex lists, to the postRender group so that the main camera does not render the list.
postRender draws the list once, saves the texture postImage, and renders the postImage to the main camera.
With the interface at rest, turn off postRender drawing and turn it back on if there are changes.

The purpose is to have another camera render this complex UI only once, so that the main camera renders a single texture instead of the complex UI. The result is:

One extra postImage texture in memory.
The list section takes only one drawcall.
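
The “draw once, reuse until something changes” logic is independent of the engine and can be sketched with a dirty flag. CachedLayer and its drawToTexture callback are hypothetical; in Cocos Creator the callback would stand for the postRender camera drawing the list into postImage.

```typescript
// Caches the result of an expensive draw and only redraws when marked dirty.
class CachedLayer<T> {
  private cached: T | null = null;
  private dirty = true;
  drawCount = 0; // how many times the expensive draw actually ran

  constructor(private readonly drawToTexture: () => T) {}

  // Call whenever the complex UI changes (scrolling, data update, ...).
  markDirty(): void {
    this.dirty = true;
  }

  // Called every frame by the main pass.
  frame(): T {
    if (this.dirty || this.cached === null) {
      this.cached = this.drawToTexture(); // expensive: draws the whole list
      this.drawCount++;
      this.dirty = false;
    }
    return this.cached; // cheap: the main camera draws one texture
  }
}
```

While the interface is at rest, frame() keeps returning the same texture and the expensive draw does not run again.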

Efficiency improves considerably. As shown below, the drawcall count drops sharply and the fps rises.

Before optimization / After optimization

Dynamic large map optimization

The following figure shows a large map UI. The phone displays only the area inside the blue box in the middle, but the map itself is much larger, has multiple layers and effects, and keeps an extra copy so that scrolling can loop seamlessly. Loading such a large map UI therefore takes a lot of time, and lower-end phones are bound to show a black screen when opening it. To get rid of this black screen time, we made a further optimization.

Optimization steps

Create a new resident cache node preloadNode on the root of the game.

The first time you open the map, when it finishes loading, render a screenshot of the first screen of the scene.
Bind the screenshot to a newly created node, cached on the preload node.
On subsequent opens, the cached node is placed on the map node first for display.
When the big map finishes loading, rebind the newly rendered scene screenshot to the cache node and put it back on the preload node.
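
Stripped of engine details, these steps amount to a small cache keyed by interface. The sketch below uses hypothetical names (ScreenshotCache, placeholderFor, store) and a generic type S standing for whatever node or texture holds the screenshot.

```typescript
// Keeps the last rendered first screen of each heavy interface.
class ScreenshotCache<S> {
  private readonly shots = new Map<string, S>();

  // On open: returns the cached first screen to show while loading,
  // or undefined the first time the interface is opened.
  placeholderFor(viewId: string): S | undefined {
    return this.shots.get(viewId);
  }

  // When loading finishes: store a fresh screenshot for the next open.
  store(viewId: string, shot: S): void {
    this.shots.set(viewId, shot);
  }
}

// Typical flow (capture and display are engine-specific and omitted here):
// 1. const shot = cache.placeholderFor("bigMap"); show it if defined
// 2. load the real map
// 3. cache.store("bigMap", freshly rendered screenshot); hide the placeholder
```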

Optimization results

The cached screenshot effectively avoids the black screen during loading, and the same approach can be applied to other loading scenarios besides large maps. If you are concerned that the cached screenshot costs too much memory, you can reduce the overhead by halving the texture size.

6. Small package solution

Problem-solving

Reduce the size of the first package (under 200 MB).
Resources can be downloaded as you play.
Players play as smoothly as possible.

Small package process

Run through the early part of the game and collect the resources used during it; these form the small package.
Repackage them into the small package, and pack the remaining resources into zip sub-packages that complement it. Whole package = small package + sub-packages.
Upload a file set containing all the loose file resources to the server.
After the small package is installed and started, the sub-packages are downloaded in the background, and each one is unzipped as soon as it finishes.
If the game needs a resource that is not yet on the phone, it is downloaded immediately from the loose file set on the server.

Problems

This process is fairly complete as a whole, but there is one problem: downloading the sub-packages takes time. If a resource inside a sub-package that has not arrived yet is needed during play, it has to be downloaded from the loose file set, and the game waits for that download to finish before continuing, which causes stutter. Since our game already updates through loose files, we optimized the small package process on that basis.

Tianshen package process

When packing, record all the resources of this build as List_all.
Run through the game in advance, mark every resource that is used, and record them as List_small. List_small is ordered.
Build the package according to the package size requirement: pack the resources of List_small into the package in order until the size limit is reached. The resources that did not fit are written to List_pre, which serves as the priority download queue.
When entering the game, the normal forced update runs first (it compares List_all with the resources of the current pre-online version).
During the game, the resources in List_pre are downloaded first. If a resource is needed but not available locally, it is downloaded immediately.
The other resources missing from the small package are downloaded in the background (List_small and the parts missing from the current online version).
Background downloading must limit the number of download threads so that the player can keep playing normally.
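
Two parts of this process lend themselves to a short sketch: splitting the ordered List_small at the size limit, and a download queue with a concurrency cap. The names Resource, splitBySizeLimit, and DownloadQueue are hypothetical, and the actual download is abstracted into a start callback. One simplification to note: an urgently needed resource is put at the front of the queue here rather than bypassing the concurrency limit.

```typescript
interface Resource {
  path: string;
  bytes: number;
}

// Packs List_small in order until the size limit is reached; the rest becomes List_pre.
function splitBySizeLimit(listSmall: Resource[], limitBytes: number): { inPackage: Resource[]; listPre: Resource[] } {
  const inPackage: Resource[] = [];
  let used = 0;
  let i = 0;
  for (; i < listSmall.length; i++) {
    const res = listSmall[i];
    if (used + res.bytes > limitBytes) break; // stop at the first resource that no longer fits
    inPackage.push(res);
    used += res.bytes;
  }
  return { inPackage, listPre: listSmall.slice(i) };
}

// Downloads List_pre in order, lets urgent requests jump the queue,
// and never runs more than maxConcurrent downloads at once.
class DownloadQueue {
  private urgent: string[] = [];
  private pending: string[];
  private active = 0;

  constructor(
    listPre: string[],
    private readonly maxConcurrent: number,
    private readonly start: (path: string, done: () => void) => void, // hypothetical downloader
  ) {
    this.pending = [...listPre];
  }

  // A resource is needed right now but is not available locally.
  requestNow(path: string): void {
    this.pending = this.pending.filter((p) => p !== path);
    this.urgent.push(path);
    this.pump();
  }

  // Starts downloads until the concurrency limit is reached.
  pump(): void {
    while (this.active < this.maxConcurrent) {
      const next = this.urgent.shift() ?? this.pending.shift();
      if (next === undefined) return;
      this.active++;
      this.start(next, () => {
        this.active--;
        this.pump();
      });
    }
  }
}
```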

Improvements

The package size can be freely controlled as required.
List_pre is downloaded in priority order, so the resources that will be used are downloaded in advance.
A controllable number of download threads protects the player experience.

Personal Tips

Every optimization direction means raising the bar above what already works. So we have to try solutions boldly, verify them thoroughly, and finally make trade-offs within a reasonable budget of time and effort to achieve a better optimization result.

In addition, many aspects of performance can be optimized in game development. The ones selected here are a few of the more typical solutions our team has accumulated, and there are others. If you are interested, we would be glad to exchange ideas.