May I suggest that, if you are going to build a number of specific systems (like Particle), you take a very modular approach? The user should be able to swap out any system for their own implementation without any change in downstream code. (An interface-based architecture, perhaps?)
I think I got your idea, but I'm not 100% sure... Let me add more info:
For example, Node will have the particle component... which will be only data... probably a PoD.
And the particle system will transform the particle components according to some algorithm.
If a user doesn't like the algorithm used by the ParticleSystem, they can create a new System... and if that is not enough, they can create their own Particle Components, without the need to change the cocos2d-x code.
Basically users who want to change the default behavior of cocos2d-x will be able to do it without changing the cocos2d-x code.
And having these Systems won't break the "Components" that we are adding in v3.8... in fact we can have a System that calls the v3.8 components.
I guess it's hard to discuss code that's not even written yet. I understand what you are saying; here's what I'm getting at...
Say you have a built-in ParticleSystem "System" that works on ParticleComponents. If I have a case where I want to do some special-case optimization, you say I can create an OptimizedParticleSystem to then work on the ParticleComponents, or even create an OptimizedParticleComponent and work with that.
Depending on the implementation, this would then have the core engine iterating through its built-in systems, and potentially double processing. In this scenario both ParticleSystem (because it's built in) and OptimizedParticleSystem will be processed. Again, depending on the implementation, this may or may not have a performance impact. How would ParticleSystem know that I've replaced it with my own particle renderer when it processes the ParticleComponent?
What I am suggesting is that if there is a built-in ParticleSystem, the engine should define an IParticleSystem interface and then call setParticleSystem(new ParticleSystem()), or allow the end user to call setParticleSystem(new OptimizedParticleSystem()). In the end, as long as both classes implement IParticleSystem the engine can work with either, and the user is completely free to replace it. Along with it should go an IParticleComponent, so that the system can work with any particle component regardless of its base class.
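A minimal sketch of the swap described above. Everything here is hypothetical and not cocos2d-x API: the `Engine` class, the fields of `ParticleComponent`, and the exact `update` signature are assumptions made only to show the shape of the idea.

```cpp
#include <memory>
#include <utility>
#include <vector>

// Hypothetical plain-data particle component (fields are assumptions).
struct ParticleComponent {
    float y;
    float velocityY;
};

// Interface the engine codes against; any implementation can be plugged in.
class IParticleSystem {
public:
    virtual ~IParticleSystem() = default;
    virtual void update(std::vector<ParticleComponent>& particles, float dt) = 0;
};

// Stand-in for the built-in implementation.
class ParticleSystem : public IParticleSystem {
public:
    void update(std::vector<ParticleComponent>& particles, float dt) override {
        for (auto& p : particles) p.y += p.velocityY * dt;
    }
};

// User-supplied replacement; here it simply skips particles that are at rest.
class OptimizedParticleSystem : public IParticleSystem {
public:
    void update(std::vector<ParticleComponent>& particles, float dt) override {
        for (auto& p : particles) {
            if (p.velocityY != 0.0f) p.y += p.velocityY * dt;
        }
    }
};

// Hypothetical engine: it holds exactly one particle system, so replacing it
// cannot result in both the built-in and the custom one being processed.
class Engine {
public:
    void setParticleSystem(std::unique_ptr<IParticleSystem> system) {
        _particleSystem = std::move(system);
    }
    void update(float dt) {
        if (_particleSystem) _particleSystem->update(particles, dt);
    }
    std::vector<ParticleComponent> particles;

private:
    std::unique_ptr<IParticleSystem> _particleSystem;
};
```

Because the engine owns a single slot for the particle system, the double-processing concern goes away: setting a new system drops the old one.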
The interface approach allows for cleaner implementation of custom Systems/Components, not requiring subclassing of ParticleSystem or ParticleComponent to implement customization.
This all hinges on a ton of assumptions of course, just some loose thoughts I'd share. Sorry if I'm off base.
Yeah. Probably we should discuss this in more detail in a few months, once we start designing this feature for v4.
Let me add 2 more things:
The engine should support registering / unregistering systems, so you can replace the "Cocos Particle System" with another one.
I'm not sure if adding interfaces in the Components is a good thing or not... In my mind a "Component" is just a struct with no functions... just Plain-Old-Data.
Yes, using only PoDs is what we should try to achieve, and then store all of them in a contiguous array in order to have good data locality.
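As a rough illustration of "PoDs in a contiguous array": the `PositionComponent` struct and the `moveRight` function below are made-up names, not part of cocos2d-x. The static assertions just check that the component really is plain data.

```cpp
#include <type_traits>
#include <vector>

// Hypothetical POD component: data only, no virtual functions.
struct PositionComponent {
    float x;
    float y;
};

static_assert(std::is_trivial<PositionComponent>::value,
              "component should be trivial");
static_assert(std::is_standard_layout<PositionComponent>::value,
              "component should be standard layout");

// A "system" is then just a function over the contiguous array.
// std::vector guarantees that its elements are stored contiguously.
inline void moveRight(std::vector<PositionComponent>& positions, float dx) {
    for (auto& p : positions) p.x += dx;
}
```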
But what you did is a step in the right direction.
Adding interfaces in Components is a "no-no" for ECS. Anything that leads to putting a virtual table in components makes no sense. (However, you can have a common non-virtual interface by using CRTP and the like.)
I agree with Rick_S's suggestion about extending systems. Define a common interface for all systems; the cocos engine provides some implementations of this interface for common systems, and we should be able to further customize those systems or implement our own.
I think cocos should make use of interfaces. Right now everything is coded against implementations instead of interfaces, and that is bad software engineering practice.
But which approach should it take? In C++, interfaces are often implemented as classes having only pure virtual functions. Therefore the implementation would still rely on inheritance (which we should avoid), if you don't use CRTP.
Relying on something like PIMPL will slow down your code, as you have to de-reference the pointer.
So how can we deal with not having inheritance and no de-referencing?
Yes, interfaces should be defined using pure virtual functions and therefore implemented using inheritance. I don't agree that inheritance should be avoided "per se". It has its use cases and it's not the Evil.
What is bad is having a deep inheritance structure, bloated objects and coupled code.
BTW, I did not mean that customization should be implemented via inheritance. I said that, in case it is decided to customize using inheritance, it should be against interfaces instead of against implementations.
I do prefer using the strategy pattern to inject customized code, but it is not always the best way. For example, if a design determines that there are several customization points in a class, it might be better to use the template method pattern via inheritance. It would reduce verbosity, heap allocations and so on.
For example, in ECS a component of course should not be accessed through a virtual table; that would be against the principles of ECS. It could, though, offer regular functions, which could be inlined, etc.
But what about a system? Performance-wise it does not matter (totally negligible). In my opinion and (short) experience using ECS, there are no deep hierarchies in systems, because a system's purpose should be very VERY specific. So, you have to customize a very specific behavior: I do not see a problem at all with defining an interface and customizing properly for each system. This will also make it easy to manage the systems using all the goods of dynamic polymorphism: add/remove systems at runtime, use dictionaries/service locators for access/discoverability, factories/builders for creation, etc.
Having only POD in components is attractive, but I have a question about it. Because all the logic is put in systems, a system will update all components with the same logic, such as
Each system will care about a set of component types. So, each system must know the layout of each component type it cares about. A system also needs to know how to locate the entities which match the component types it cares about.
Building on your example, you could have (using CRTP):
template<typename T>
struct Component
{
    static int getId() { return T::component_id; }
};
struct HPComponent : public Component<HPComponent>
{
    int hp;
    static int component_id;
};
void HpDecayingSystem::update(float dt)
{
    // _hpcomponents holds pointers to the HP components this system processes
    for (auto hp_comp : _hpcomponents)
        hp_comp->hp -= 10;
}
It is very important to avoid making several indirections to access the components. The example I just used here is a toy example. You can access all components without any indirection; you only have to know the start pointer and how many HPComponents there are.
However, let's say a system cares about all entities which have both a flying component and an hp component. There are flyers without HP and entities which have HP but do not fly. So your hypothetical FlyersHPDecayingSystem could not rely on having all components laid out contiguously in memory; you must make some indirections / maintain a list of the entities the system cares about.
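One possible shape for that "list of entities the system cares about", as a sketch only. It assumes, purely for illustration, that HP components sit in one contiguous array indexed by entity and that something else keeps `flyerIndices` up to date when components are added or removed.

```cpp
#include <cstddef>
#include <vector>

struct HPComponent {
    int hp;
};

// Hypothetical system: it does not own the components, it only remembers
// which slots of the HP array belong to entities that can also fly.
struct FlyersHPDecayingSystem {
    std::vector<std::size_t> flyerIndices;  // maintained when entities change

    void update(std::vector<HPComponent>& hpComponents, int amount) const {
        for (std::size_t index : flyerIndices) {
            // One extra indirection per entity: index -> component.
            hpComponents[index].hp -= amount;
        }
    }
};
```

The HP data stays contiguous, but the accesses are no longer a straight linear walk, which is the trade-off mentioned above.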
Oh, I'm sorry, I did not understand your question!
If you want to impose conditions on some components, that's because the entity has a special flag, right? Let's say there are some entities which must be reduced by 10 hp and others which must be reduced by 20 hp. If this is the case, I would make a special component called flag_reduce_20 and attach it to those special entities.
If the amount of reduced health can vary a lot between different entities, then it would be wise to make another component called HP_reduction_amount and then have the DecayingHPSystem process both.
The two examples would work this way:
a) A system with some exceptions (flags). Example where entities must be reduced by 10, except for some which must be reduced by 20:
struct flagReduce10 : public Component<flagReduce10>
{
    static int component_id;
};
struct flagReduce20 : public Component<flagReduce20>
{
    static int component_id;
};
void HpDecaying10System::update(float dt)
{
    // only HP components of entities carrying flagReduce10
    for (auto hp_comp : _hpcomponents)
        hp_comp->hp -= 10;
}
void HpDecaying20System::update(float dt)
{
    // only HP components of entities carrying flagReduce20
    for (auto hp_comp : _hpcomponents)
        hp_comp->hp -= 20;
}
Each system will deal only with the hp_components belonging to entities which have the corresponding flag component.
Beware: the benefit of this proposal is that each system could be updated separately, BUT in this concrete case, due to false sharing, a multithreaded approach is likely to have some issues.
b) In the general case, you could have two components: HP_Component and Decaying_HP_Component. Then those entities which have a Decaying_HP_Component would see their HP_Component updated:
struct HPComponent : public Component<HPComponent>
{
    int hp;
    static int component_id;
};
struct DecayingHPComponent : public Component<DecayingHPComponent>
{
    int amount;
    static int component_id;
};
void HpDecayingSystem::update(float dt)
{
    // each entry pairs an HPComponent* with its DecayingHPComponent*
    for (auto comp : _components)
        comp.first->hp -= comp.second->amount;
}
Don't define a value directly in the system itself. The system just applies the logic to the components, based on their values.
It will read the healthPoints from the health points component and the hitPoints from the hit points component and apply it to the health point component.
Alternatively you can define another value in the health points component: int hitPointsReceived. The health system will update the values accordingly.
I would just create/call it HealthPointComponent and HitPointComponent.
Another approach would be as mentioned above, by adding a value named hitPoints and hitPointsReceived, which could even be changed/set by a hit-Event. Therefore you will only need one component and one system with even better data locality.
struct HPComponent : public Component<HPComponent>
{
    int healthPoints;
    int hitPoints;
    int hitPointsReceived;
    static int component_id;
};
void HealthSystem::update(float dt)
{
    // _components stores the HPComponents by value, hence '.' and a reference
    for (auto& comp : _components)
        comp.healthPoints -= comp.hitPointsReceived;
}
With that, of course, every entity would have its own specific healthComponent instead of a shared one.
In my eyes healthPoints, hitPoints and hitPointsReceived are separate concepts. What if some other system needs to know only about the remaining hit points? When processing the HPComponents it will have 3x the cache misses.
Also, if the "health reduction" is event based and not continuous, then I would model it as an event. For example, you can have a system which processes all health_reduction events. These events indicate the entity and the hit points to reduce. You can do this in a super-pure fashion by creating a component which represents the event too.
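A small sketch of the event-based variant. The `HealthReductionEvent` struct, the index-based entity lookup and the function name are all assumptions made for illustration; a real engine would use its own entity ids and event queue.

```cpp
#include <cstddef>
#include <vector>

struct HPComponent {
    int hp;
};

// Hypothetical event: which entity was hit and by how much.
struct HealthReductionEvent {
    std::size_t entityIndex;
    int hitPoints;
};

// Applies every queued event exactly once, then clears the queue.
// Events that refer to a non-existent entity are ignored.
inline void processHealthReductionEvents(std::vector<HPComponent>& hpComponents,
                                         std::vector<HealthReductionEvent>& events) {
    for (const auto& e : events) {
        if (e.entityIndex < hpComponents.size()) {
            hpComponents[e.entityIndex].hp -= e.hitPoints;
        }
    }
    events.clear();
}
```

With this, entities that were not hit in a frame cost nothing, and the HP component only needs to carry the remaining hit points.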
This depends on the capacity of one cache line. It will only have two cache misses if one cache line can store only a single int. It will read the full cache line during data transfer.
Each line has 4-64 bytes in it. During data transfer, a whole line is read or written.
Assuming it only has 4 bytes, it will be 2 cache misses, not 3. It will just miss on healthPoints and hitPoints. hitPointsReceived will be a cache hit. At least if I'm not missing something here. Will the static int get arranged as the first entry in the struct?
Besides that, I agree with you. Only keep the data hot that is needed.
Out of curiosity: are there any statistics about cache hits/misses for cocos2d-x or its code samples?
I don't want to be a jerk, but if your data layout is 12B and you only need 4B, you will have triple the cache misses if you access enough elements.
It does not depend on the cache line size; it does not matter if your cache lines are 16B or 64B. If you bring data into the cache that you are not going to use immediately, you are either evicting some useful data or not fully utilizing the current line (which of course will produce other evictions as a side effect). The idea is to fully utilize the cache and not evict data unnecessarily, in order to fully exploit the caches.
I do not think there are statistics about cache hits/misses but my educated guess is that it must be awful for real games
Of course, with large enough caches and small enough code samples the entire working set could fit the cache
Static member variables are not part of the memory layout of object instances, so they will not pollute the cache.
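A quick way to convince yourself of this; the two struct names are made up for the check. The static member is stored once for the whole class, so on mainstream compilers the instance size is the same with or without it.

```cpp
struct WithStatic {
    int hp;
    static int component_id;  // stored once for the class, not per instance
};
int WithStatic::component_id = 0;

struct WithoutStatic {
    int hp;
};

// Compiles only if the static member does not enlarge the instances.
static_assert(sizeof(WithStatic) == sizeof(WithoutStatic),
              "a static data member should not enlarge the instances");
```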
I do not think any mobile device has cache lines that short (actually I do not know of any architecture with cache lines that short; they would only exploit temporal locality and not spatial locality at all, which is the main practical purpose of caches).
So if the cache line had only 4 bytes it would be weird as hell, but in that particular case the bloating of the object would not interfere with the cache at all (leaving prefetching aside), because your cache line would be filled only with data you need.
The problem with the 3 int fields is only for those systems which do not care about 2 of these 3, because when you read any of them, you will bring in an entire cache line.
I will try with an easy and isolated example:
struct A
{
    int a[3];
} as[1000];
struct B
{
    int b;
} bs[1000];
for (int i = 0; i < 1000; ++i)
{
    as[i].a[0] += 1; // #1: uses 4 of every 12 bytes brought into the cache
    bs[i].b += 1;    // #2: uses every byte brought into the cache
}
// #1 will produce 3x the cache misses / pollution / evictions of #2
So, if some system has to access only a part of the component (let's say only the remaining hit points), it would be better for the component to hold only this data, so as not to pollute the cache.
BTW, in my toy example, any compiler with aggressive optimization will perform loop unrolling and likely prefetching. So the code could run equally fast, but you will be using more cache lines (so potentially evicting useful data from elsewhere in your program).
This is not a cache miss, but a waste of the cache lines, or cache pollution. 12B are already in the cache line, and one of the 4B will be hit. The other two are transferred, but not used, so in fact there are zero cache misses.
In the worst-case scenario, where data is read with 4B interleaving (every 4B has to be read with a new cache line), you will miss twice, as the third 4B is the one you need and will result in a hit. Three consecutive reads -> two misses, one hit.
Sure, but this is about utilization. If there are 12B in the cache line, and you only need 4B of them, that is about 33% usage.
If you access the 2nd 4B later, they could still be in the cache line and you don't get a miss.
Yeah, I bet on that!
The basic "Hello, World!"?
Cache line sizes range from 4 to 64 bytes. Of course, the newer architectures all use 64. E.g., the cache line of a Cortex-A8 is 16 words wide.
You will only bring a new cache line on a miss.
It also depends on the type of the cache. You may get misses on type A but not type B and vice versa.
In fact not misses, but cache waste. Whatever, I know what you mean
You probably get a miss for the next data, as the cache is wasted on cold data.
Sure, I agree on that. You should only read data that you need right now. If you keep data hot that might only be needed in the future, you are hampering the cache, and chances are high that you get a lot of misses for other data.
My example assumed that no other system needs the data from that component.
Sure. It's not a problem with modern architectures.
Cache lines: 16 bytes (Intel 80486) and 64 bytes (Sandy Bridge).
It's hard to tell which program will have more misses than the other, as there are also cache hierarchies with inclusive or exclusive caches.
In my example of B and A objects, you will have 3x the cache misses. Because you waste 2/3 of your cache on cold data, the problem is not the current access but the future accesses.
Trivial example: let's suppose 15B cache lines for the sake of simplicity. U means useful, W means waste. You are accessing A objects:
UWWUWWUWWUWWUWW
The first 5 accesses all fall in the same line -> 1 miss
The next 5 accesses need a new line -> 2 misses
The next 5 accesses need another new line -> 3 misses
Instead, with B objects:
UUUUUUUUUUUUUUU
If instead you were accessing 15 type B objects, you would get 1 miss for the first 15 accesses. So you are effectively getting 3x the cache misses, because you waste 2/3 of your cache lines.
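The counting above can be reproduced with a tiny model. This is only a sketch under simplifying assumptions: a cold cache, in-order accesses, an array that starts on a line boundary, and sizes measured in "slots" (one letter in the diagrams above). The function name is made up.

```cpp
#include <cstddef>

// Counts how many distinct cache lines are brought in when reading the
// first slot of each of `count` consecutive objects, in order.
inline std::size_t linesTouched(std::size_t count,
                                std::size_t slotsPerObject,
                                std::size_t slotsPerLine) {
    std::size_t lines = 0;
    std::size_t previousLine = 0;
    for (std::size_t i = 0; i < count; ++i) {
        std::size_t line = (i * slotsPerObject) / slotsPerLine;
        if (i == 0 || line != previousLine) {
            ++lines;  // first access to this line: a miss in a cold cache
            previousLine = line;
        }
    }
    return lines;
}
```

For 15 accesses and 15-slot lines, `linesTouched(15, 3, 15)` models the A objects and `linesTouched(15, 1, 15)` models the B objects, matching the 3 misses versus 1 miss worked out by hand.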
So you see systems being very generic and only hooking into the engine in things like update, onEnter (possibly), etc.? Then the system's update will determine if it needs to do anything that frame?
I could see that working. We might need to work out a priority system, so that, say, the PhysicsSystem would do its update before the RenderSystem or whatever.
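One way such a priority scheme could look, combined with the register/unregister idea from earlier in the thread. All names here (`ISystem`, `SystemManager`, `NamedSystem`) are hypothetical and not taken from cocos2d-x; lower priority values are assumed to update first.

```cpp
#include <algorithm>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical system interface with a priority; lower values update first.
class ISystem {
public:
    virtual ~ISystem() = default;
    virtual int priority() const = 0;
    virtual void update(float dt) = 0;
};

class SystemManager {
public:
    void registerSystem(std::shared_ptr<ISystem> system) {
        _systems.push_back(std::move(system));
        // stable_sort keeps registration order among equal priorities.
        std::stable_sort(_systems.begin(), _systems.end(),
            [](const std::shared_ptr<ISystem>& a, const std::shared_ptr<ISystem>& b) {
                return a->priority() < b->priority();
            });
    }
    void unregisterSystem(const std::shared_ptr<ISystem>& system) {
        _systems.erase(std::remove(_systems.begin(), _systems.end(), system),
                       _systems.end());
    }
    void update(float dt) {
        for (auto& s : _systems) s->update(dt);
    }

private:
    std::vector<std::shared_ptr<ISystem>> _systems;
};

// Example system that only records when it ran (stands in for physics, render, ...).
class NamedSystem : public ISystem {
public:
    NamedSystem(std::string name, int priority, std::vector<std::string>& log)
        : _name(std::move(name)), _priority(priority), _log(log) {}
    int priority() const override { return _priority; }
    void update(float) override { _log.push_back(_name); }

private:
    std::string _name;
    int _priority;
    std::vector<std::string>& _log;
};
```

Registering a "render" system with priority 20 and a "physics" system with priority 10 makes physics run first each frame, regardless of registration order.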
A different direction than I was picturing, but I think better
I know this is probably to maintain compatibility with 3.x, but there seems to be a lot of abstraction to get to the raw physics engine: CCPhysicsSystem depends on CCPhysicsWorld, which depends on Chipmunk. For 4.0 I would think it would make sense to remove the CCPhysicsWorld abstraction and have different physics systems, i.e. CCPhysicsSystemChipmunk, CCPhysicsSystemBox2D. Perhaps I'm missing something.
From HelloWorldScene.cpp:
physicsComponent->setPhysicsBody(physicsBody);
May I suggest not limiting this to 1 body per component (at least in 4.0)?
This. Simply using the inheritance syntax isn't bad; it's inheritance-based design that people look to avoid in these cases. You aren't actually inheriting code, C++ just happens to implement interfaces that way.
I have to admit my low-level C++ knowledge isn't as strong as that of some of the people here, but for the general use cases this ECS will serve, is worrying about vtable lookups and cache misses really important enough to design around? (Honest question.) My gut is telling me these are in the realm of micro-optimizations, but again, I'm not an expert in the lower levels of C++.