Proposal for v3.0: shared_ptr vs. manual retain release

ricardo · June 21, 2013, 8:57pm

For cocos2d-x v3.0.

should we use shared_ptr or manual retain / release ?

So, my first thought was to use shared_ptr, but then I realized that it would imply changing the whole API. I thought that manual retain/release would be OK (as in v2.1), so I proposed to keep using manual retain/release.

If we are going to use manual retain/release, should we still use autorelease, or not ?

Let’s see:

A)

// manual retain/release with autorelease
    auto action = Sequence::create(
        MoveBy::create( 2, ccp(240,0)),
        RotateBy::create( 2,  540),
        ScaleTo::create(1,0.1f),
        RemoveSelf::create(),
        NULL);

B)

// same example with shared_ptr
    auto action = Sequence::create(
        MoveBy::create( 2, ccp(240,0)),
        RotateBy::create( 2,  540),
        ScaleTo::create(1,0.1f),
        RemoveSelf::create(),
        NULL);

C)

// same example with manual retain/release, but WITHOUT autorelease
// too long
    auto a1 = MoveBy::create( 2, ccp(240,0));
    auto a2 = RotateBy::create( 2,  540);
    auto a3 = ScaleTo::create(1,0.1f);
    auto a4 = RemoveSelf::create();

    auto action = Sequence::create(a1, a2, a3, a4, NULL);

    a1->release();
    a2->release();
    a3->release();
    a4->release();

And I don’t like the code for retain/release without autorelease.

I would go with A) or B).

What are your thoughts ?

k3vn · June 21, 2013, 11:01pm

I’d like to see the objective-c style retain/release/autorelease code removed in favor of shared_ptr (or something similar to Boost Pointer Container).

In comparison to the C++ engines I’ve used in the past, cocos2d-x uses an extremely Objective-C influenced style, which is understandable considering its roots. However, if the goal is to do a major rewrite of much of the core architecture, it would be wise to break away from the Objective-C paradigm. Gameplay3D is a very good example to look at for influence. Out of curiosity, is there a purpose to having classes such as CCString, CCDictionary, ccArray, CCFloat, CCObject, etc., instead of using the STL? Thanks

ricardo · June 22, 2013, 12:08am

Kevin H wrote:

I’d like to see the objective-c style retain/release/autorelease code removed in favor of shared_ptr (or something similar to Boost Pointer Container).

Yes, that’s what we are trying to do for v3.0. To remove all the objective-c patterns.

Gameplay3D is a very good example to look at for influence.

As far as I know, Gameplay3D doesn’t use smart_ptr; instead it uses grab/release, but without autorelease.

Out of curiosity, is there a purpose to having classes such as CCString, CCDictionary, ccArray, CCFloat, CCObject, etc, instead of using STL? Thanks

They were added, I guess, to be compatible with the objective-C version, and also because they support the “retain/release” model.

k3vn · June 22, 2013, 2:02am

Yes, that’s what we are trying to do for v3.0. To remove all the objective-c patterns.

Ricardo, Thanks for your response. Sorry about the Gameplay3D reference that was a little out of context, since it does indeed use a retain/release model. I’m excited about the direction you plan to take with v3.0 and really appreciate your contributions.

zhangxm · June 22, 2013, 3:04am

Yep.
Manual retain and release without autorelease will produce ugly code.
Using shared_ptr instead of retain/release will be more C++ friendly and make the code cleaner.

I have no idea.
To be honest, I would prefer shared_ptr, if we set aside how big a modification it is.

zhangxm · June 22, 2013, 3:05am

But if our goal is to be c++ friendly, we should do it completely.

hohohmm · June 22, 2013, 6:07am

Glad you are working on getting rid of all that. Right now my eyes just bleed when I see retain/release.
I strongly suggest using shared_ptr, though Walzer (Wang Zhe?) was saying shared_ptr is 5x slower than raw pointers.
I don’t have too many reasons other than that shared_ptr is kind of native to C++, and that it just makes writing code a lot easier, without having to understand autorelease pools. It’s kinda intrusive but I don’t mind that.

quiet_readonly · June 22, 2013, 7:23am

Kevin H wrote:

I’d like to see the objective-c style retain/release/autorelease code removed in favor of shared_ptr (or something similar to Boost Pointer Container).

In comparison to the C++ engines I’ve used in the past, cocos2d-x uses an extremely Objective-C influenced style, which is understandable considering its roots. However, if the goal is to do a major rewrite of much of the core architecture, it would be wise to break away from the Objective-C paradigm. Gameplay3D is a very good example to look at for influence.

I see no benefit in moving from retain/release to shared_ptr, since the retain/release methods allow a shared pointer to be built on top of them. Moreover, shared_ptr has one reference counter per object but keeps it apart from the object, so the user cannot use std::shared_ptr<T>(this). There is a workaround [1], but retain/release, which keeps the reference counter with the object, is more natural than shared_ptr from the STL.

However, the current implementation has limitations:
* The reference counter is not an atomic variable, so multithreaded reference counting causes races.
* The autorelease pool works only in the main thread. This can be fixed in two ways: 1) remove autorelease, or 2) put the autorelease pool in thread-local storage (unfortunately, C++11 TLS still has issues on older iOS/Mac OS X and is implemented properly in clang 3.2; but pthreads also has thread-local storage). Multithreaded autorelease also requires its own threading API, which can ensure that a thread-local autorelease pool is created, updated and destroyed.
* Retain/release also needs its own smart pointer, and maybe its own weak pointer.
* CCObject subclasses are copyable and can be created not only by pointer.

[1] http://en.cppreference.com/w/cpp/memory/enable_shared_from_this
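
For readers who have not used the workaround in [1], a minimal sketch follows. Node and ownersAfterSelf are made-up names for illustration. The point is that shared_from_this() joins the existing control block, whereas constructing a second shared_ptr directly from `this` would create an independent counter and lead to a double delete.

```cpp
#include <cassert>
#include <memory>

// Hypothetical node type used only for illustration.
class Node : public std::enable_shared_from_this<Node> {
public:
    // Returns a shared_ptr that shares ownership with the existing owners.
    std::shared_ptr<Node> self() { return shared_from_this(); }
};

// Returns the use count after a second owner was obtained from inside the object.
long ownersAfterSelf() {
    std::shared_ptr<Node> node = std::make_shared<Node>();
    std::shared_ptr<Node> again = node->self();
    return node.use_count();
}
```

Note that shared_from_this() is only valid on an object that is already owned by a shared_ptr.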

walzer · June 22, 2013, 11:21am

Sergey Shambir, I think the benefit of std::shared_ptr keeping the reference counter outside of the object is std::weak_ptr support.
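
A small sketch of what weak_ptr support buys, assuming a hypothetical Target type: because the control block outlives the object, an observer can safely ask whether the object still exists.

```cpp
#include <cassert>
#include <memory>

// Hypothetical type used only for illustration.
struct Target { int hp = 100; };

// Reads hp through a non-owning observer; returns -1 if the target is gone.
int observedHp(const std::weak_ptr<Target>& observer) {
    if (std::shared_ptr<Target> alive = observer.lock()) return alive->hp;
    return -1;
}
```

An intrusive counter cannot answer that question after the object is destroyed, since the counter is destroyed with it.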

Others:

(A) retain/release + autorelease pool

Using retain/release + autorelease pool will make multi-threading very complicated.
In 2.x, we purge the autorelease pool at the end of each frame; with multiple threads, we don’t know when an object will be released: at the end of each frame drawn in the render thread? At the end of each scheduled call of the physics thread? Or in other threads? So using the v2.x retain/release + autorelease is not a good idea.
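
To make the per-frame purge concrete, here is a simplified sketch of the pattern. Object and AutoreleasePool are hypothetical stand-ins loosely modeled on the v2.x idea, not the actual cocos2d-x classes; the counter is deliberately non-atomic, which is exactly why a single drain point per frame stops being well defined once several threads are involved.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical intrusive ref-counted base, single-threaded only.
class Object {
public:
    Object() : refs_(1) {}
    virtual ~Object() {}
    void retain() { ++refs_; }
    void release() { if (--refs_ == 0) delete this; }
    int refCount() const { return refs_; }
private:
    int refs_;   // not atomic
};

// Hypothetical pool: holds the creator's reference until the frame ends.
class AutoreleasePool {
public:
    Object* add(Object* o) { pending_.push_back(o); return o; }
    // Called once at the end of each frame on the main thread.
    void drain() {
        for (Object* o : pending_) o->release();
        pending_.clear();
    }
    std::size_t size() const { return pending_.size(); }
private:
    std::vector<Object*> pending_;
};
```

An object that somebody retained during the frame survives drain(); one that nobody retained is destroyed there.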

(B) shared_ptr

Is using shared_ptr very slow? No, it’s not very slow, it’s SUPER slow.
Take a look at my performance test: https://github.com/walzer/cocos2d-x/tree/SmartPointerPerformance. In this test, I use retain/release vs. shared_ptr to implement a basic node tree and then update it. I add 10 layers to a scene and 50 nodes to each layer, so in total I visit 50 * 10 + 10 + 1 = 511 nodes in each frame. Then I run it 100 times to get the average time.
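
The node count can be checked with a small sketch of the same tree shape. This is not the code from the linked branch; Node, countNodes and buildScene are hypothetical names, and the tree only mirrors the layout described in the post (1 scene, 10 layers, 50 nodes per layer).

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical tree node used only for illustration.
struct Node {
    std::vector<std::shared_ptr<Node>> children;
};

// Counts a node plus all of its descendants (one visit each).
int countNodes(const std::shared_ptr<Node>& n) {
    int total = 1;
    for (const auto& c : n->children) total += countNodes(c);
    return total;
}

// Builds 1 scene with 10 layers and 50 nodes in each layer.
std::shared_ptr<Node> buildScene() {
    auto scene = std::make_shared<Node>();
    for (int l = 0; l < 10; ++l) {
        auto layer = std::make_shared<Node>();
        for (int i = 0; i < 50; ++i) {
            layer->children.push_back(std::make_shared<Node>());
        }
        scene->children.push_back(layer);
    }
    return scene;
}
```

Calling countNodes(buildScene()) visits 1 + 10 * (1 + 50) nodes.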

|Test Device|CPU|iOS version|build|shared_ptr|retain/release|x times slower|
|iPad Mini|1 GHz dual-core ARM Cortex-A9|iOS 6.1.3|release|13.229430 ms|5.392210 ms|2.45|
|Galaxy Nexus|1.2 GHz dual-core ARM Cortex-A9|Android 4.1.2|debug|5.724440 ms|1.210641 ms|4.72|

As I mentioned before, the roughly 5x slowdown happens on Android.
Please point out what’s wrong in my code: https://github.com/walzer/cocos2d-x/tree/SmartPointerPerformance.
If 2711 times slower on iOS is true, then that’s a big deal.

On the other hand, please take a look at Google V8’s source code: https://code.google.com/p/v8/wiki/Source. V8 heavily uses C++ templates, but in most of the code they use raw C++ pointers and new/delete C++ objects manually, and only use Handle to wrap C++ objects in a few places. So I think that in a performance-critical project we should avoid using shared_ptr. “FAST” has been a selling point of cocos2d for a long time.

(C) retain/release manually

That’s the way I proposed.

ricardo · June 22, 2013, 12:02pm

These performance tests were run in debug builds on both iOS & Android.

Could you run the test again in release mode ?
We should not run performance tests in debug mode since the compiler does not use optimizations. And in the end, users will only submit release builds to the App Store.
Thanks!

IDP, I find it 2711 times slower on iOS 6.

Ouch!
Hopefully in release mode this number is much smaller.

gelldur · June 22, 2013, 12:02pm

I think speed is most important and std::shared_ptr is too slow. Autorelease is needed; without it we get a lot of ugly code :(. Also, retain/release is bad for multithreading, and as we know mobile devices have 2+ cores nowadays, so we should focus on how to make retain/release work on multiple cores.

walzer · June 22, 2013, 12:56pm

Minggo, I don’t think you would call this shared_ptr usage “clean”:

std::shared_ptr CCSequence::create( std::vector>& arrayOfActions);

or even

std::shared_ptr CCSequence::create( std::shared_ptr>> arrayOfActions);

walzer · June 22, 2013, 1:00pm

Ricardo Quesada wrote:

Could you run the test again in release mode ?
We should not run performance tests in debug mode since the compiler does not use optimizations. And in the end, users will only submit release builds to the App Store.
Hopefully in release mode this number is much smaller.
Hey man, you work overnight to 3:00am?

I just tested iOS release mode:
Cocos2d: Avg share_ptr duration 13.229430 ms
Cocos2d: Avg c++ ptr duration 5.392210 ms
2.45 times slower. It’s much better than debug mode.
But obviously something is wrong here: both shared_ptr and the C++ pointer ran slower in release mode than in debug mode on iOS.

fabiobh · June 23, 2013, 12:57pm

If we are going to use manual retain/release, should we still use autorelease, or not ?

I vote for the A style. My first contact with C++ happened when I chose to use cocos2d-x. I like the autorelease, use

Is using shared_ptr very slow? No, it’s not very slow, it’s SUPER slow.

I think it is better to choose a style that makes games run faster than one that is merely easier to code.

Weeds · June 24, 2013, 12:37am

I would prefer using the shared_ptr solution, if the performance hit can be put into an acceptable range and the following assumptions are true:
* shared_ptr can be used on all major platforms (Win/Mac/Linux, iOS/Android)
* shared_ptr works when using third-party (C) libraries (i.e. can I put a shared_ptr in some void* userdata held by a library object?)

@Zhe Wang
How do you profile the code? You should not use gettimeofday or another real-time clock for that purpose. You should use something like:

clock_gettime(CLOCK_THREAD_CPUTIME_ID, &timer);

This only measures the time taken by the current thread; otherwise you may get inaccurate results depending on what else is running on the device.
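
A short sketch of the difference, assuming a POSIX system where clock_gettime and nanosleep are available. readClock, Elapsed and measureSleep are hypothetical helper names. While the thread sleeps, the wall clock keeps advancing but the thread's CPU clock barely moves, which is why the per-thread clock is less sensitive to whatever else the device is doing.

```cpp
#include <cassert>
#include <time.h>

// Seconds read from a POSIX clock.
double readClock(clockid_t id) {
    timespec ts;
    clock_gettime(id, &ts);
    return ts.tv_sec + ts.tv_nsec / 1e9;
}

struct Elapsed { double wall; double cpu; };

// Sleeps for about 50 ms and reports wall-clock time vs. this thread's CPU time.
Elapsed measureSleep() {
    double wall0 = readClock(CLOCK_MONOTONIC);
    double cpu0 = readClock(CLOCK_THREAD_CPUTIME_ID);
    timespec nap = {0, 50 * 1000 * 1000};   // 50 ms
    nanosleep(&nap, nullptr);
    Elapsed e = { readClock(CLOCK_MONOTONIC) - wall0,
                  readClock(CLOCK_THREAD_CPUTIME_ID) - cpu0 };
    return e;
}
```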

For clarity of code, I would assume that there will be typedefs for accessing the object classes. Maybe even the current base name becomes a typedef:

class CCActionType;
typedef std::shared_ptr<CCActionType> CCAction;

class CCActionType {

};

Performance issues:
Do you add the objects to the scene graph somewhere? If not, the shared_ptr implementation may delete the objects, causing an additional destructor call for each object.
The C-Pointer test keeps the references in the autorelease pool, so the objects are not destroyed during your test case.

You may also check function parameters. We should use call-by-reference for shared_ptr to reduce copying these objects.
So instead of writing:

void addChild(shared_ptr<CCNode> node);

It is better to use:

void addChild(const shared_ptr<CCNode>& node);
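
The effect on the reference count can be observed directly. In this sketch Node, countByValue and countByRef are hypothetical names; use_count() is read inside each function to show whether the call itself created an extra owner.

```cpp
#include <cassert>
#include <memory>

// Hypothetical type used only for illustration.
struct Node { int z = 0; };

// By value: the parameter is a new owner, so the count is incremented on
// entry and decremented on exit (typically with atomic operations).
long countByValue(std::shared_ptr<Node> node) { return node.use_count(); }

// By const reference: no new owner is created, so the count is untouched.
long countByRef(const std::shared_ptr<Node>& node) { return node.use_count(); }
```

Passing by value is still appropriate when the callee really wants to keep its own reference, for example to store the node as a child.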

Another thing that might improve performance is using std::make_shared for creating the object.
See http://en.cppreference.com/w/cpp/memory/shared_ptr/make_shared

This function typically allocates memory for the T object and for the shared_ptr’s control block with a single memory allocation (it is a non-binding requirement in the Standard). In contrast, the declaration std::shared_ptr<T> p(new T(Args...)) performs at least two memory allocations, which may incur unnecessary overhead.
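
One way to see the allocation difference is to count calls to the global operator new. This is only a sketch: Payload, g_allocs, g_keep and the two helper functions are hypothetical, and the exact counts depend on the standard library, since the single allocation for make_shared is not required by the standard.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <memory>
#include <new>

static int g_allocs = 0;   // number of calls to the global operator new

// Replace the global allocation functions so every heap allocation is counted.
void* operator new(std::size_t n) {
    ++g_allocs;
    if (void* p = std::malloc(n ? n : 1)) return p;
    throw std::bad_alloc();
}
void operator delete(void* p) noexcept { std::free(p); }
void operator delete(void* p, std::size_t) noexcept { std::free(p); }

// Hypothetical payload type used only for illustration.
struct Payload { int v; explicit Payload(int x) : v(x) {} };

std::shared_ptr<Payload> g_keep;   // keeps results alive so nothing is optimized away

// Allocations made when the object and the control block are created separately.
int allocsWithNew() {
    int before = g_allocs;
    g_keep = std::shared_ptr<Payload>(new Payload(1));
    return g_allocs - before;
}

// Allocations made when make_shared can combine both into one block.
int allocsWithMakeShared() {
    int before = g_allocs;
    g_keep = std::make_shared<Payload>(1);
    return g_allocs - before;
}
```

On the common standard library implementations I would expect allocsWithNew() to report two allocations and allocsWithMakeShared() one, but treat that as an observation rather than a guarantee.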

Using shared_ptr has the benefit of being thread-safe (AFAIK). This will allow us to implement a multi-threaded engine much more easily. And I would guess that splitting game code and rendering code into two different threads will bring a greater performance gain than the hit we suffer from using shared_ptr.
I also wouldn’t consider option C as an alternative here, as it puts the whole burden of synchronizing data access on the application programmer.

@Ricardo
I wouldn’t worry about changing the API. If the names of the object classes or the cocos namespace are about to change, I wouldn’t consider cocos2dx-3 backwards compatible with 2.0 anyway.
And if you want to get away from the ObjC-style syntax, the API will most certainly change.

Will there be some general discussion about what should change in 3.0 or how the API is about to change?

Weeds · June 24, 2013, 1:40am

I’ve written a small test-case for the performance issues. It simply creates 1.000.000 pointer objects, calls a function with the pointer object and accesses data members of the pointer object:

class TestObject {
    public:
        TestObject(int a) : a(a), b(1), c(1){
        }

        virtual ~TestObject() {

        }

        int a;
        int b;
        int c;
};

int t = 0;
void foo1(TestObject* tt) {
    tt->a = tt->a + tt->b + tt->b;
    t += tt->a;
}

void foo2(std::shared_ptr<TestObject> tt) {
    tt->a = tt->a + tt->b + tt->b;
    t += tt->a;
}

void foo3(std::shared_ptr<TestObject> const& tt) {
    tt->a = tt->a + tt->b + tt->b;
    t += tt->a;
}

void foo4(std::shared_ptr<TestObject> const& tt1) {
    TestObject* tt = tt1.get();
    tt->a = tt->a + tt->b + tt->b;
    t += tt->a;
}

void foo5(std::shared_ptr<TestObject> tt1) {
    TestObject* tt = tt1.get();
    tt->a = tt->a + tt->b + tt->b;
    t += tt->a;
}


void somefunc() {
    Profile profile;
    int counter = 1000000;
    t = 0;
    profile.start(); 
    for ( int i = 0; i < counter; i++ ) {
        TestObject* pointer = new TestObject(i);
        foo1(pointer);
        delete pointer;
    }
    profile.stop();
    CCLog("CPointer::update takes %lf s\n", profile.getDuration());

    t = 0;
    profile.start(); 
    for ( int i = 0; i < counter; i++ ) {
        std::shared_ptr<TestObject> pointer(new TestObject(i));
        foo2(pointer);
    }
    profile.stop();
    CCLog("Shared::update takes %lf s\n", profile.getDuration());

    t = 0;
    profile.start(); 
    for ( int i = 0; i < counter; i++ ) {
        std::shared_ptr<TestObject> pointer(new TestObject(i));
        foo3(pointer);
    }
    profile.stop();
    CCLog("Shared(call-by-ref)::update takes %lf s\n", profile.getDuration());

    t = 0;
    profile.start(); 
    for ( int i = 0; i < counter; i++ ) {
        std::shared_ptr<TestObject> pointer = std::make_shared<TestObject>(i);
        foo3(pointer);
    }
    profile.stop();
    CCLog("Shared(call-by-ref + make_shared)::update takes %lf s\n", profile.getDuration());
}

I have tested this on an HTC Sensation (CyanogenMod), using clock_gettime to measure the results:
CPointer::update takes 0.751556 s
Shared::update takes 1.974335 s
Shared(call-by-ref)::update takes 1.717468 s
Shared(call-by-ref + make_shared)::update takes 1.047577 s
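
The Profile helper used in the test case is not shown in the post. A minimal stand-in with the same three calls (start, stop, getDuration) might look like the sketch below; it assumes POSIX clock_gettime with the per-thread CPU clock mentioned earlier, and busyWork is a hypothetical workload added only so there is something to time.

```cpp
#include <cassert>
#include <time.h>

// Hypothetical stand-in for the Profile helper (the real one is not shown).
// Measures CPU time consumed by the calling thread.
class Profile {
public:
    Profile() : start_(0.0), stop_(0.0) {}
    void start() { start_ = now(); }
    void stop()  { stop_ = now(); }
    double getDuration() const { return stop_ - start_; }   // seconds
private:
    static double now() {
        timespec ts;
        clock_gettime(CLOCK_THREAD_CPUTIME_ID, &ts);
        return ts.tv_sec + ts.tv_nsec / 1e9;
    }
    double start_;
    double stop_;
};

// Burns some CPU; volatile keeps the loop from being optimized away.
long long busyWork(int n) {
    volatile long long sum = 0;
    for (int i = 0; i < n; ++i) sum = sum + i;
    return sum;
}
```

Usage mirrors the test case: call start(), run the loop, call stop(), then log getDuration().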

hohohmm · June 24, 2013, 2:57am

Zhe Wang wrote:

Ricardo Quesada wrote:
> Could you run the test again in release mode ?
> We should not run performance tests in debug mode since the compiler does not use optimizations. And in the end, users will only submit release builds to the App Store.
> Hopefully in release mode this number is much smaller.
Hey man, you work overnight to 3:00am?

I just tested iOS release mode:
Cocos2d: Avg share_ptr duration 13.229430 ms
Cocos2d: Avg c++ ptr duration 5.392210 ms
2.45 times slower. It’s much better than debug mode.

But obviously something is wrong here: both shared_ptr and the C++ pointer ran slower in release mode than in debug mode on iOS.

The tests should be taken further, since the numbers are apparently wrong.
I wouldn’t consider 2.45 times much of a slowdown. Most of the time the game is doing a lot more than pointer manipulations. The slowdown is likely to disappear compared to other intense computation.

quiet_readonly · June 24, 2013, 5:36am

Another thing that might improve performance is using std::make_shared for creating the object.
See http://en.cppreference.com/w/cpp/memory/shared_ptr/make_shared
>
> This function typically allocates memory for the T object and for the shared_ptr’s control block with a single memory allocation (it is a non-binding requirement in the Standard). In contrast, the declaration std::shared_ptr<T> p(new T(Args...)) performs at least two memory allocations, which may incur unnecessary overhead.
Thanks for this clarification.

k3vn · June 24, 2013, 5:47am

After doing some tests on a Nexus 7 the average time for dereferencing a shared_ptr was around 2.25x slower than a raw pointer. I’m in agreement with Andre and Linus that the overhead of shared_ptr would likely be small and that specific subsystems could be optimized if required. The only metric I can think of to really give a good idea is to measure the percentage of time spent dereferencing raw pointers in the current build of cocos2d-x.

hohohmm · June 24, 2013, 7:33am

Andre Rudlaff wrote:

I’ve written a small test-case for the performance issues. It simply creates 1.000.000 pointer objects, calls a function with the pointer object and accesses data members of the pointer object:
[…]

I have tested this on an HTC Sensation (CyanogenMod), using clock_gettime to measure the results:
CPointer::update takes 0.751556 s
Shared::update takes 1.974335 s
Shared(call-by-ref)::update takes 1.717468 s
Shared(call-by-ref + make_shared)::update takes 1.047577 s

Andre, could you paste the complete test case so I can try it out myself? As you mentioned, the majority of the shared_ptr overhead comes from incrementing/decrementing the internal counter, and you can almost get rid of that by passing by const ref. As long as pass-by-const-ref is not off by too much compared to raw pointers, it should be good enough.
A post explaining why it’s a good idea to pass by ref in this case.