Proposal for v3.0: shared_ptr vs. manual retain release

Kevin H wrote:

After doing some tests on a Nexus 7 the average time for dereferencing a shared_ptr was around 2.25x slower than a raw pointer. I’m in agreement with Andre and Linus that the overhead of shared_ptr would likely be small and that specific subsystems could be optimized if required. The only metric I can think of to really give a good idea is to measure the percentage of time spent dereferencing raw pointers in the current build of cocos2d-x.

The overhead becomes barely noticeable once you start passing shared_ptrs by ref :D. I can’t even think of a common scenario where you have to pass shared_ptrs by value. As long as you don’t have another thread changing the validity of the passed-in pointer, its content should almost remain valid in its current call frame.

@Linus

I used the SharedPtr test repository for the testcase. Just past the code I’ve written above somewhere in HelloWorldScene.cpp. And make sure to call someFunc() somewhere. (I created a small loop in an extra thread, but you may as well simply call it in onEnter())

If you want to measure only the time that is used by the active thread, you could change the Profile class. (Note that this code won’t work on all platforms)

//
//  Profile.h
//  SmartPointer
//
//  Created by Walzer on 6/10/13.
//  Copyright (c) 2013 cocos2d-x. All rights reserved.
//

#ifndef SmartPointer_Profile_h
#define SmartPointer_Profile_h

#include 
#include 


class Profile
{
public:
    void start()
    {
        reset();
        clock_gettime(CLOCK_THREAD_CPUTIME_ID, &_TimeStart);
    };

    // returns mili second
    void stop()
    {
        clock_gettime(CLOCK_THREAD_CPUTIME_ID, &_TimeStop);
    }

    double getDuration()
    {
        // return (_TimeStop.tv_sec  * 1000 + _TimeStop.tv_usec  / 1000) -
        //        (_TimeStart.tv_sec * 1000 - _TimeStart.tv_usec / 1000);
        double seconds = 0;
        double nanoseconds = 0;
        if (_TimeStop.tv_nsec < _TimeStart.tv_nsec) {
            seconds = _TimeStop.tv_sec - _TimeStart.tv_sec - 1;
            nanoseconds = 1000000000 - (_TimeStart.tv_nsec - _TimeStop.tv_nsec);
        } else {
            seconds = _TimeStop.tv_sec - _TimeStart.tv_sec;
            nanoseconds = _TimeStop.tv_nsec - _TimeStart.tv_nsec;
        }

        printf("_timeStart.tv_sec = %lu, tv_usec = %lu\n", _TimeStart.tv_sec, _TimeStart.tv_nsec);
        printf("_timeStop.tv_sec = %lu, tv_usec = %lu\n", _TimeStop.tv_sec, _TimeStop.tv_nsec);
        return seconds + (nanoseconds/1000000000.0f);
    }

private:
    void reset()
    {
        _TimeStart = {0,0};
        _TimeStop = {0,0};
    }

    struct timespec _TimeStart;
    struct timespec _TimeStop;
};

#endif

Andre Rudlaff wrote:

@Ricardo
I wouldn’t worry about changing the API. If the names of the object classes or the cocos namespace is about to change, I wouldn’t consider cocos2dx-3 backwards compatible with 2.0 anyway.
And if you want to get away from the ObjC style syntax the API most certainly will be changed.

Yes, v3.0 won’t be backwards compatible. But that doesn’t mean that it should be a “foreign” engine to v2.1 users. We should try to make the engine friendly to c++ users, and whenever possible, easy to use to v2.1 users as well.

Will there be some general discussion about what should change in 3.0 or how the API is about to change?

One of the goals for v3.0 is to get rid of the objective-c patterns. So, if there is a particular objective-c pattern that you would like to see removed or improved, let us know.
We are discussing them in this forum or in this list: https://groups.google.com/forum/#!forum/cocos2d-js-devel

Thanks.

Some thoughts on the matter.

A. Use of autorelease.
Pros:

  • clean pattern, temporary objects clean themselves up.
  • less risk for memory leaks with inexperienced engineers.
  • familiar to cocos users.
    Cons:
  • still an objective-c pattern
  • great difficulty with multithreaded scenarios. What does end of frame even mean in a multithreaded environment. Which auto release pool can be solved using thread local storage, but ensuring the pool is drained is problematic. Possibly an autorelease action that performs after some delay could work, but it’s tricky.
  • memory spikes when instantiating a large number of objects in a single frame, e.g. when loading.

B. shader_ptr
Pros:

  • same pros as A
    Cons:
  • shared_ptr is a heavy weight class with thread safety built in. This can make it perform poorly. Probably not more poorly than objective-c autorelease, but poorer than cocos2d-x autorelease.
  • still need to implement some sort of delayed destruction of a shared pointer when returning objects, so same timing issues arise with multi thread environments.

C. manual retain/release
Pros:

  • simple and clear
  • works in multi threaded environments (can provide separate CCObject and CCObjectThreadSafe as base classes)
    Cons:
  • use cases are ugly.
  • high risk of leaks and over release issues.

I really see A and B as the same thing, unless I am missing something. The issue is when to relinquish control of an object, which in multithreaded environments is a difficult problem.

One possibility is the release action I mentioned earlier. If actions were pushed down to CCObject instead of CCNode, that would allow actions to be used on any object as well as allowing for a delayed release action to be performed. The scheduler would need to be thread safe, but that should happen anyway imo.

Another possibility is to implement our own shared_ptr in cocos and have a lightweight variant for classes that are not intended to be used in separate threads, and the standard version for objects that tend to be used in multiple threads, e.g. socket wrappers, timers, and other asynchronous things.

Another way I have used C is to agree on a convention with regards to receiving pointers from method calls. The convention being that if the pointer is returned by return value, then it is not retained, but if the method takes a pointer reference, then the reference is filled with a retained pointer. The downside of this is that you still have all the over/under release issues unless you do the right thing whereas A and B do the right thing unless you do something, which I think is a better pattern.

To sum up…

I am in favor of A or B provided that…

  1. The threading issues are solved in an elegant fashion.
  2. There exist lightweight implementations for classes that do not need thread safety.

Thanks
Justin

Some random ideas regarding this feature

scripting-like API

One of the strong features of cocos2d, is that you can create actions without worrying about memory.

auto action = Sequence::create(
    MoveBy::create( 2, ccp(240,0)),
    RotateBy::create( 2,  540),
    ScaleTo::create(1,0.1f),
    RemoveSelf::create(),
    NULL);

And I think the API should continue to be like that (either with retain/release/autorelease or with smart_ptrs)
I find super convenient that you can create actions like in C**.
smart_ptr vs retain/release/autorelease
On the one hand:
* std::shared_ptrs are more c** friendly than manual retain/release/autorelease.
* you can use them with any std:: container. That means that we could get rid of all the CC containers (CCDictionary, CCSet, CCArray, CCString… etc.)
* but the API is more complex: you are returning a shared_ptr instead of a ptr. As Andre Rudlaff mentioned, you can simplify it by using typedefs, but still, it adds complexity.

On the other hand:
* retain/release/autorelease is an “imported pattern from objective-c”… in particular the autorelease concept. It is not uncommon for c++ engines to use the retain/release model, but without the autorelease.
* And usually the API is easier to read and understand… you don’t need all the std::shared_ptr<> (or typedef’s for all the them)
* But then you need to have you own class containers (CCDictionary, CCArray, etc…), or you need to use your own smart_ptr to deal with your classes, which would be somewhat confusing

how to proceed

  • I think that we still don’t know all the implications of migrating from retain/release/autorelease to shared_ptr. So, for the moment, at least for v3.0-alpha0 I wouldn’t migrate to shared_ptrs.
  • I would like to understand all the implications: needed API changes, potential bugs and limitations, potential performance issues, understand the shared_ptr best practices, and then make the decision.

UPDATE: Yes, I agree with you Justin.

Regarding multithreading, I agree with Sergey (he mentioned this on the email list) that cocos2d should be able to run “tasks” in different threads (play sounds, effects, update the scene graph, render the scene graph, simulate physics), but we should not aim to have a completely multithreaded engine… that would be really difficult to maintain (it will add a lot of complexity in the code) and will be difficult to debug.

Justin Graham wrote:

Some thoughts on the matter.
>
A. Use of autorelease.
Pros:

  • clean pattern, temporary objects clean themselves up.
  • less risk for memory leaks with inexperienced engineers.
  • familiar to cocos users.
    Cons:
  • still an objective-c pattern
  • great difficulty with multithreaded scenarios. What does end of frame even mean in a multithreaded environment. Which auto release pool can be solved using thread local storage, but ensuring the pool is drained is problematic. Possibly an autorelease action that performs after some delay could work, but it’s tricky.
  • memory spikes when instantiating a large number of objects in a single frame, e.g. when loading.
    >
    B. shader_ptr
    Pros:
  • same pros as A
    Cons:
  • shared_ptr is a heavy weight class with thread safety built in. This can make it perform poorly. Probably not more poorly than objective-c autorelease, but poorer than cocos2d-x autorelease.
  • still need to implement some sort of delayed destruction of a shared pointer when returning objects, so same timing issues arise with multi thread environments.
    >
    C. manual retain/release
    Pros:
  • simple and clear
  • works in multi threaded environments (can provide separate CCObject and CCObjectThreadSafe as base classes)
    Cons:
  • use cases are ugly.
  • high risk of leaks and over release issues.
    >
    >
    I really see A and B as the same thing, unless I am missing something. The issue is when to relinquish control of an object, which in multithreaded environments is a difficult problem.
    >
    One possibility is the release action I mentioned earlier. If actions were pushed down to CCObject instead of CCNode, that would allow actions to be used on any object as well as allowing for a delayed release action to be performed. The scheduler would need to be thread safe, but that should happen anyway imo.
    >
    Another possibility is to implement our own shared_ptr in cocos and have a lightweight variant for classes that are not intended to be used in separate threads, and the standard version for objects that tend to be used in multiple threads, e.g. socket wrappers, timers, and other asynchronous things.
    >
    Another way I have used C is to agree on a convention with regards to receiving pointers from method calls. The convention being that if the pointer is returned by return value, then it is not retained, but if the method takes a pointer reference, then the reference is filled with a retained pointer. The downside of this is that you still have all the over/under release issues unless you do the right thing whereas A and B do the right thing unless you do something, which I think is a better pattern.
    >
    To sum up…
    >
    I am in favor of A or B provided that…
    >
  1. The threading issues are solved in an elegant fashion.
  2. There exist lightweight implementations for classes that do not need thread safety.
    >
    Thanks
    Justin

shared_ptrs are not heavy weight if you pass them by reference. More often than not you would want to do this, not even to avoid the atomic add inside but simply that the context made otherwise unnecessary. If you pass them by reference everywhere, it sort of becomes a lightweight version of itself that ignores thread safety.

Hi Riq,
Since we are moving towards C**, can you replace all those enums with enum classes -

http://www.cprogramming.com/c**11/c++11-nullptr-strongly-typed-enum-class.html

I believe this would improve the overall readability of the code and also helps when using intelisense.
Isn’t TextAlignment::Left better than kCCTextAlignmentLeft?

Please look into this.

regards,
Raghu S

I believe this would improve the overall readability of the code and also helps when using intelisense.
Isn’t TextAlignment::Left better than kCCTextAlignmentLeft?

yes, you are correct. strongly typed enums seems be much better than regular enums.

I think removing the autorelease feature will break the API more as will the introduction of shared_ptr.
Removing autorelease will introduce subtle runtime errors (i.e. memory leaks) on existing code that might get ported to v3. Introducing shared pointers would cause compile time issues only.

I would recommend the following steps:

  • create v3 API with the retain/release/autorelease feature
  • once the alpha version is completed, we should port this to shared_ptr + STL containers (including the Test Application)
  • once done we could check any issues with performance and third party libraries

Some other things I’d like to see changed for v3:
* separate the library from projects/samples (the library should reside in a sub path of the projects and not the other way around)
* separate platform code and create clean interfaces for subsystems like window management, font rendering, input handling. Platform specific project files should be removed from the source tree and put in an extra folder).

Should I open a new thread for these issues or is there an existing place where such things could be discussed? (Not sure if the JS group is the right place)

Should I open a new thread for these issues or is there an existing place where such things could be discussed? (Not sure if the JS group is the right place)

Thanks for the feedback.
Yes, please, open a new thread for that.
Feel free to discuss it on the “cocos2d-js-devel” as well. The name says “js”, but currently we are discussing any kind of cocos2d-x / cocos2d-html5 devel stuff.
Thanks!

Andre Rudlaff wrote:

I think removing the autorelease feature will break the API more as will the introduction of shared_ptr.
Removing autorelease will introduce subtle runtime errors (i.e. memory leaks) on existing code that might get ported to v3. Introducing shared pointers would cause compile time issues only.
>
I would recommend the following steps:

  • create v3 API with the retain/release/autorelease feature
  • once the alpha version is completed, we should port this to shared_ptr + STL containers (including the Test Application)
  • once done we could check any issues with performance and third party libraries
    >
    Some other things I’d like to see changed for v3:
    * separate the library from projects/samples (the library should reside in a sub path of the projects and not the other way around)
    * separate platform code and create clean interfaces for subsystems like window management, font rendering, input handling. Platform specific project files should be removed from the source tree and put in an extra folder).
    >
    Should I open a new thread for these issues or is there an existing place where such things could be discussed? (Not sure if the JS group is the right place)

I our project we already ported big codebase with manual retain/release to automatic reference counting with smart pointers (CCPointer from Igor Zavorotkin and CCWeakPointer inside our tweaked cocos2d-x). So this code can be ported to shared_ptr/weak_ptr without any leaks. I see no actual difficulties, but maybe we should wrote manual about porting code that uses autorelease pool to new automatic and multithreaded mechanism. This manual can be part of cocos2dx 3.0 porting guide.

Andre Rudlaff wrote:

Some other things I’d like to see changed for v3:
* separate the library from projects/samples (the library should reside in a sub path of the projects and not the other way around)
* separate platform code and create clean interfaces for subsystems like window management, font rendering, input handling. Platform specific project files should be removed from the source tree and put in an extra folder).
Also, amount of platform-dependent code can be reduced significantly. At least in CocosDenshion, which can use one OpenAL engine on all platforms except android, marmalade and windows phone / winrt: https://github.com/sergey-shambir/cocos2d-x/commits/pitchpangain

I our project we already ported big codebase with manual retain/release to automatic reference counting with smart pointers (CCPointer from Igor Zavorotkin and CCWeakPointer inside our tweaked cocos2d-x). So this code can be ported to shared_ptr/weak_ptr without any leaks. I see no actual difficulties, but maybe we should wrote manual about porting code that uses autorelease pool to new automatic and multithreaded mechanism. This manual can be part of cocos2dx 3.0 porting guide.

If you ported everything to shared_ptr you won’t have any leaks. I meant leaks will occur if the autorelease feature will be removed and we still need to use retain/release:

This code will work using shared_ptr and retain/release with autorelease.

cocos2d::CCAction* action = cocos2d::CCAction::create();

However if the autorelease feature is removed, this code will introduce a memory leak, as the object is no longer destroyed by draining the autorelease pool. Therefore it would be extremly hard and error-prone to port existing code to cocos2d-v3.
Did you noticed any issues when porting to shared_ptr? Performance/Bugs?

I’ve created another thread for further discussing the project structure/platform issues: http://www.cocos2d-x.org/boards/6/topics/30456

All things being equal, it would be better to use std::shared_ptr going forward than the current memory management system. I think doing so would remove one of the biggest hurdles to lots of people adopting cocos2d-x actually. When you first start using cocos2d-x the retain()/release()/autorelease() stuff is really confusing — it takes a long time and many debugger sessions to convince yourself that you do not have memory leaks — especially for people who are not coming from an Objective-C background, which at this point is pretty much everyone.

Also I agree with Andre Rudlaff above, removing autorelease() functionality but leaving retain()/release() semantics will subtly break just about every project that uses cocos2d-x that has ever been written. It would be better to make a change that causes compile time errors when used naively with old code; otherwise, these forums will be filled with posts about mysterious memory leaks and replies explaining that v3.0 doesn’t do autorelease ad nauseum for the rest of time.

If it is not practical to switch over to shared_ptrs in a reasonable amount of development time then don’t change anything regarding managing memory. Otherwise switch to shared_ptr etc. however don’t get into a situation where the scope of this change holds up the release of v3.0 forever such as what happened to SDL 1.3 and Perl 6.

Joe Wezorek wrote:

Also I agree with Andre Rudlaff above, removing autorelease() functionality but leaving retain()/release() semantics will subtly break just about every project that uses cocos2d-x that has ever been written. It would be better to make a change that causes compile time errors when used naively with old code; otherwise, these forums will be filled with posts about mysterious memory leaks and replies explaining that v3.0 doesn’t do autorelease ad nauseum for the rest of time.

agreed. Either we keep using retain/release/autorelease all together, or with switch to shared pointers. Keeping retain/release without autorlease will bring us many problems.

however don’t get into a situation where the scope of this change holds up the release of v3.0 forever such as what happened to SDL 1.3 and Perl 6.

:slight_smile: Nice advise. Thanks!

Sergey Shambir wrote:

I our project we already ported big codebase with manual retain/release to automatic reference counting with smart pointers (CCPointer from Igor Zavorotkin and CCWeakPointer inside our tweaked cocos2d-x). So this code can be ported to shared_ptr/weak_ptr without any leaks. I see no actual difficulties, but maybe we should wrote manual about porting code that uses autorelease pool to new automatic and multithreaded mechanism. This manual can be part of cocos2dx 3.0 porting guide.

Did you notice any performance issue when you migrated to shared pointers ?
Did you also migrate cocos2d-x to shared pointers, or you just wrapped the cocos2d-x objects in your own custom shared pointer ?

Thanks!

Ricardo Quesada wrote:

Sergey Shambir wrote:
>
> I our project we already ported big codebase with manual retain/release to automatic reference counting with smart pointers (CCPointer from Igor Zavorotkin and CCWeakPointer inside our tweaked cocos2d-x). So this code can be ported to shared_ptr/weak_ptr without any leaks. I see no actual difficulties, but maybe we should wrote manual about porting code that uses autorelease pool to new automatic and multithreaded mechanism. This manual can be part of cocos2dx 3.0 porting guide.
>
Did you notice any performance issue when you migrated to shared pointers ?
Did you also migrate cocos2d-x to shared pointers, or you just wrapped the cocos2d-x objects in your own custom shared pointer ?
>
Thanks!

I’ve not profiled smart pointers separately, but there was no any significant portion (even 1%) when runned profiler to debug other things.

Used shared pointer: https://github.com/ivzave/cocos2dx-ext/blob/9ea6260f740959392446e429af6a32386470c9e9/CCPointer.h (with minimal tweaks like printing stack trace when accessing null pointer via -> or * operator).

Used weak pointer: https://github.com/cocos2d/cocos2d-x/pull/2481 (used for CCNode, CCNode destructor updates weak refs map to reduce perfomance overhead).

Used rules:
* Each CCNode * class member stored by weak reference or as raw pointer, since scene already handles it’s lifetime
* Each data structure member stored by strong reference, since it doesn’t keep big chunks of data like texture
* Http requests used with own domain-specific APIs which wrap around HttpRequest extension and take std::function callback. Each callback should be lambda function which captures strong reference to object:

 so object still exists when callback executed.
});

To be honest, even the performance of shared_ptr is as bad as mentioned by Zhe Wang, nearly 5x slower than raw pointer. it’s still a slight costs on almost all the platforms nowadays. And as a result of this potential modification, the readability and usability of engine interfaces could get a rapid improvement.

Moreover, in case of some heavy situation that never take into my consideration, i want to address some basic practices for shared_ptr.

  1. use make_shared when create a shared_ptr;
  2. call-by-ref;
    *****3. use raw pointer when the operation happens inside the engine if necessary .

Ekaj Mai wrote:

To be honest, even the performance of shared_ptr is as bad as mentioned by Zhe Wang, nearly 5x slower than raw pointer. it’s still a slight costs on almost all the platforms nowadays. And as a result of this potential modification, the readability and usability of engine interfaces could get a rapid improvement.
>
Moreover, in case of some heavy situation that never take into my consideration, i want to address some basic practices for shared_ptr.

  1. use make_shared when create a shared_ptr;
  2. call-by-ref;
    *****3. use raw pointer when the operation happens inside the engine if necessary .

Second that. 5x will be amortized to nonexistence in any decently complicated game, not to mention pass-by-ref can be used to get rid of almost any penalty in most situations. Please avoid explicit retain/release since it’s ugly to a point that it reduces productivity.

Linus L wrote:

Second that. 5x will be amortized to nonexistence in any decently complicated game, not to mention pass-by-ref can be used to get rid of almost any penalty in most situations. Please avoid explicit retain/release since it’s ugly to a point that it reduces productivity.

Interesting. I agree that passing smart pointers as reference will be faster. Carlo and Justin (teammates) did some experiments, and if you pass the smart pointers as reference the compiler won’t accept polymorphism.

// passing by value
// it accepts polymorphism, eg: accepts classes like std::shared_ptr
void Node::addChild( std::shared_ptr child ){ }

// passing by reference
// But in this case, it won't accept std::shared_ptr
void Node::addChild( std::shared_ptr &child ){ }

Any idea how to solve it ? Is it possible at all ?
Thanks!