Segfaults on Android

Hello,

I am developing an Android application (API version 17, NDK version r8e) with the Cocos2D-x framework, version 2.1rc0-x-2.1.3, and am running into segfault issues. The application is nearly complete and works perfectly on iOS. I believe that my installation and setup are working as intended, although I have not found installation instructions that are both comprehensive and up-to-date, so that might be the issue.

The relevant organization works like this: a controller class (call it the Controller) manages a list of Square objects, each of which has a pointer to a Shadow. The Square and Shadow objects are subclasses of CCSprite, and are held in a CCArray in the Controller. The Squares are constructed by the game into a 4x4 grid, and only some Squares have a Shadow (if a Square does not have a Shadow, the pointer to the Shadow is set to NULL). The Controller populates the Squares in its constructor, and by logging each Square’s pointer to its Shadow, I can see that they are valid after they are created. However, very soon afterward, some of the Squares suddenly have very strange pointer locations; some of the non-null pointers have turned to addresses such as 0x11 and some of the NULL pointers have also changed to invalid addresses. What is very strange about this behavior is that it is entirely repeatable: the exact same Squares have the exact same invalid Shadow pointer each execution. Moreover, if I remove all of the references to the Shadows, the application continues to segfault at the address 0xdeadbaad, which I believe is a placeholder pointer that indicates an issue in malloc itself.

Some reference code (I can’t distribute all of the code, unfortunately, but these are the most relevant parts):

Controller

Controller::Controller(CCSize size) {

    // some initialization code here... 

    for (int row = 0; row < kNumRows; row++) {
        CCArray* squareRow = CCArray::create();
        for (int col = 0; col < kNumCols; col++) {
            Square *square = Square::create();

            // initialize the square here; i.e. set its position, texture, etc.

            if (row == 0) { // only squares in the top row get shadows

                // initialize the shadow here

                square->setShadow(shadow);
                square->setHasShadow(true);
                this->addChild(shadow,-1);
            }

            square->retain();

            this->addChild(square);
            squareRow->addObject(square);
        }

        this->squares->addObject(squareRow);
    }

    this->squares->retain();

    // more initialization here...
}

Square

CCSprite* Square::getShadow() {
    CCLog("got shadow: %p", this->shadow);

    if (!this->hasShadow) {
        return NULL;
    }

    return this->shadow;
}

void Square::setShadow(CCSprite* shadow) {
    CCLog("set shadow: %p", shadow);
    this->shadow = shadow;
    this->shadow->retain();
}

bool Square::getHasShadow() {
    return this->hasShadow;
}

void Square::setHasShadow(bool hasShadow) {
    this->hasShadow = hasShadow;
}

Square* Square::create() {
    Square* s = (Square*)CCSprite::create();

    // run some initialization on the square here

    s->hasShadow = false;
    s->shadow = NULL;

    return s;
}

And, finally, a typical stack trace, with some extra debugging information that is not in the code above:

07-16 18:05:30.118: D/cocos2d-x debug info(1332): set shadow: 0x2a14feb0
07-16 18:05:30.118: D/cocos2d-x debug info(1332): set shadow: 0x2a14a9c0
07-16 18:05:30.128: D/cocos2d-x debug info(1332): set shadow: 0x2a14f7e0
07-16 18:05:30.128: D/cocos2d-x debug info(1332): set shadow: 0x2a13b1b0
07-16 18:05:30.138: D/cocos2d-x debug info(1332): square: 0, 0
07-16 18:05:30.138: D/cocos2d-x debug info(1332): got shadow: 0x2a14feb0        // this one is correct
07-16 18:05:30.138: D/cocos2d-x debug info(1332): square: 0, 1
07-16 18:05:30.138: D/cocos2d-x debug info(1332): got shadow: 0x2a14a9c0
07-16 18:05:30.138: D/cocos2d-x debug info(1332): square: 0, 2
07-16 18:05:30.138: D/cocos2d-x debug info(1332): got shadow: 0x11              // should be 0x2a14f7e0
07-16 18:05:30.138: D/cocos2d-x debug info(1332): square: 0, 3
07-16 18:05:30.138: D/cocos2d-x debug info(1332): got shadow: 0x13              // should be 0x2a13b1b0
07-16 18:05:30.138: D/cocos2d-x debug info(1332): square: 2, 0
07-16 18:05:30.138: D/cocos2d-x debug info(1332): got shadow: 0x0
07-16 18:05:30.138: D/cocos2d-x debug info(1332): square: 2, 1
07-16 18:05:30.138: D/cocos2d-x debug info(1332): got shadow: 0x0
07-16 18:05:30.138: D/cocos2d-x debug info(1332): square: 2, 2
07-16 18:05:30.138: D/cocos2d-x debug info(1332): got shadow: 0x0
07-16 18:05:30.138: D/cocos2d-x debug info(1332): square: 2, 3
07-16 18:05:30.138: D/cocos2d-x debug info(1332): got shadow: 0x0
07-16 18:05:30.138: D/cocos2d-x debug info(1332): square: 3, 0
07-16 18:05:30.138: D/cocos2d-x debug info(1332): got shadow: 0x8b              // should be 0x0
07-16 18:05:30.138: D/cocos2d-x debug info(1332): square: 3, 1
07-16 18:05:30.138: D/cocos2d-x debug info(1332): got shadow: 0x0
07-16 18:05:30.158: D/cocos2d-x debug info(1332): square: 3, 2
07-16 18:05:30.158: D/cocos2d-x debug info(1332): got shadow: 0x0
07-16 18:05:30.158: D/cocos2d-x debug info(1332): square: 3, 3
07-16 18:05:30.158: D/cocos2d-x debug info(1332): got shadow: 0x0
07-16 18:05:30.158: A/libc(1332): @@@ ABORTING: LIBC: HEAP MEMORY CORRUPTION IN dlmalloc
07-16 18:05:30.158: A/libc(1332): Fatal signal 11 (SIGSEGV) at 0xdeadbaad (code=1), thread 1346 (Thread-78)

You can see that the error was caused by “heap memory corruption”, but the issue that I’m more worried about is that some of the shadow pointers are incorrectly set when they’re being outputted.

Any suggestions would be greatly appreciated! I’ve been attempting to fix this problem for quite some time, and although I’m not extremely experienced in C++ and this is my first Cocos2d-x project, I have tried everything that I could think of.

Thanks!

Probably you decreased the reference count to zero somewhere else, for either the Square or the Shadow object.

P.S. You can use smart pointers to ensure that this never happens: https://github.com/ivzave/cocos2dx-ext/blob/master/CCPointer.h
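To illustrate the idea behind that suggestion, here is a minimal sketch of a retain/release holder. It does not use the CCPointer API from the link; RefCounted and Holder are hypothetical stand-ins written only for this example, with RefCounted mimicking the retain()/release() convention of cocos2d-x objects.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a cocos2d-x-style reference-counted object.
class RefCounted {
public:
    RefCounted() : refCount(1) {}            // the creator owns one reference
    void retain() { ++refCount; }
    void release() { if (--refCount == 0) delete this; }
    unsigned int retainCount() const { return refCount; }
protected:
    virtual ~RefCounted() {}                 // destroyed only through release()
private:
    unsigned int refCount;
};

// Hypothetical RAII holder: retains when it takes a pointer and releases
// when it goes away, so retain/release calls stay balanced automatically.
template <typename T>
class Holder {
public:
    explicit Holder(T* p = NULL) : ptr(p) { if (ptr) ptr->retain(); }
    Holder(const Holder& other) : ptr(other.ptr) { if (ptr) ptr->retain(); }
    ~Holder() { if (ptr) ptr->release(); }
    Holder& operator=(const Holder& other) {
        if (other.ptr) other.ptr->retain();  // retain first: safe on self-assignment
        if (ptr) ptr->release();
        ptr = other.ptr;
        return *this;
    }
    T* get() const { return ptr; }
    T* operator->() const { return ptr; }
private:
    T* ptr;
};

// Example: the count rises while the holder is alive and drops afterwards.
void holderExample() {
    RefCounted* obj = new RefCounted();      // count == 1
    {
        Holder<RefCounted> h(obj);           // count == 2
        assert(obj->retainCount() == 2);
    }                                        // holder destroyed: count == 1
    assert(obj->retainCount() == 1);
    obj->release();                          // count == 0, object deleted
}
```

With a holder like this as the member type, a setter cannot forget to retain and a destructor cannot forget to release.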

Sergey Shambir wrote:

Probably you decreased the reference count to zero somewhere else, for either the Square or the Shadow object.

P.S. You can use smart pointers to ensure that this never happens: https://github.com/ivzave/cocos2dx-ext/blob/master/CCPointer.h

Hi! Thanks for the quick reply.

I’m afraid I don’t follow you entirely, but I have put checks in place to make sure that I am not modifying the value of any Shadow pointer. There are only two direct references to a Shadow pointer in the Square class (and the pointer is private): in the getter (getShadow()) and setter (setShadow()) methods. In both methods, the pointer value is logged, so you can see in the example output that I am not directly calling the setter with bad pointers. Of course, you could be right. Can you clarify what you mean a bit more?

Thanks for the smart pointer tip - I’ll look into that.

You can try using ndk-stack.

On Windows

  1. Add adb to your PATH if you haven’t.
  2. Connect the device to your machine, test it, then replicate the crash.
  3. Open Command Prompt and type in:

adb logcat | <YOUR_NDK_ROOT_FOLDER>\ndk-stack.exe -sym <YOUR_PROJ.ANDROID_FOLDER>\obj\local\armeabi

  4. A bunch of addresses should pop up. If you read the lines, you can see some .cpp files at the end. That’s where you want to look.

On Mac

  1. Do steps 1 and 2 above.
  2. Open Terminal and type in:

adb logcat | <YOUR_NDK_ROOT_FOLDER>/ndk-stack -sym <YOUR_PROJ.ANDROID_FOLDER>/obj/local/armeabi

  3. See step 4 above.

Lance Gray wrote:

You can try using ndk-stack.

Thanks for the reply Lance!

I have been using ndk-stack, but it has not helped much. The underlying problem is that the shadow pointers are corrupted in the first place; it makes sense that the program then segfaults on their faulty addresses. Do you have any idea how the pointers could be corrupted?

Anyway, here is the output matching the original stack trace:

Build fingerprint: 'generic/sdk/generic:4.2.2/JB_MR1.1/576024:eng/test-keys'
pid: 733, tid: 747, name: UNKNOWN  >>> com.my.project <<<
signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr deadbaad
Stack frame #00  pc 0000efd4  /system/lib/libc.so
Stack frame #01  pc 000100cf  /system/lib/libc.so (dlmalloc+358)
Stack frame #02  pc 0000ceff  /system/lib/libc.so (malloc+10)
Stack frame #03  pc 0046e12c  /data/app-lib/com.my.project-1/libgame.so (operator new(unsigned int)+20): Unable to locate routine information for address 46e12c in module /home/me/cocos2d-x/myproject/android/obj/local/armeabi/libgame.so
Crash dump is completed

When I was not checking and validating each pointer, I would get stack traces that would directly point to the line attempting to access the pointer, which makes sense. When I added

if (!this->hasShadow) {
    return NULL;
}

to Square::getShadow() I started getting this error instead.

I rebuilt the application nearly from scratch with very few features and no dangerous pointers, but it still crashes. However, it now only crashes about 50% of the time, randomly. I have come to the conclusion that the issue lies with the memory allocation on Android: either it has an intrinsic bug (unlikely), I have configured it incorrectly (my best guess) or the application is managing memory incorrectly (unlikely in my opinion since it runs perfectly on iOS). I would greatly appreciate any suggestions as to how I can resolve this problem: either to properly set up memory allocation (if there’s anything specific that I need to do and may have missed) or to debug my code. Again, the NDK stack traces lead nowhere, and seem to mostly point to problems inside cocos2dx.

Thanks in advance for all the help.

I am also facing these seemingly random crashes surrounding malloc. Has anyone made any progress on this issue?

I have had this error

@@@ ABORTING: LIBC: HEAP MEMORY CORRUPTION IN dlmalloc

and other ones in malloc many times when I get messy with my memory.

Like every native system (memory, sound, GL, etc.), it has its quirks. I suspect you’ve got the kind of bugs I often have with leaks under Android, which seems to be much more sensitive than iOS.

I am very new to cocos2d-x, so my answer may be very naive and totally incorrect, but I think the problem is the following:

  1. CCSprite is an object of size N bytes.
  2. Square is a subclass of CCSprite, so its size is N + K bytes, where K is the size of the extra members in Square.
  3. Square::create() calls CCSprite::create() which (presumably) allocates a CCSprite object of size N bytes.
  4. Square::create() then casts the CCSprite pointer to a Square pointer. To me, this does not seem to make sense if the object is really a CCSprite object.
  5. Square::create() then writes to s->hasShadow and s->shadow. Note that this is beyond the end of the CCSprite object which is only N bytes long.

Ultimately, the issue is that the object from CCSprite::create() is only a CCSprite object, but it’s being used as a Square object which is supposed to be bigger. (As well as the fact that the Square constructor never ran on the object because it is just a CCSprite object.)
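The size argument above can be made concrete with a small sketch. Sprite and Square here are hypothetical stand-ins, not the real cocos2d-x classes, and the actual byte counts depend on the platform and compiler; the only point is that the subclass needs more room than a base-sized allocation provides.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for CCSprite (size N).
struct Sprite {
    virtual ~Sprite() {}
    float x, y;
};

// Hypothetical stand-in for Square (size N + K): it adds two members.
struct Square : Sprite {
    Sprite* shadow;
    bool hasShadow;
};

// K: how many bytes a Square needs beyond a Sprite-sized allocation.
// Writing shadow/hasShadow through a Square* that really points at a
// Sprite lands in these bytes, which belong to the allocator or to a
// neighbouring object.
std::size_t bytesPastEnd() {
    return sizeof(Square) - sizeof(Sprite);
}

void sizeExample() {
    assert(sizeof(Square) > sizeof(Sprite));
    assert(bytesPastEnd() >= sizeof(Sprite*));  // at least the shadow pointer
}
```

Those extra bytes are never allocated when the object comes from the base class's factory, which would be consistent with the corrupted shadow values in the log.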

Why does this only sometimes crash? The theoretical answer is because the result of accessing memory outside of allocated space is ‘undefined’. The practical answer is because different memory allocators may have different padding after allocations, so sometimes you’ll write into padding space and sometimes you’ll overwrite some memory allocator internal data structure or whatever happens to be afterwards in memory.

How does one fix this? Allocate the Square object the “proper” way. Unfortunately, I’m too new to cocos2d-x to know what that way is, but I presume this is answered in early tutorials or someone else can chime in.

Thanks. (And my apologies if I’m totally wrong.)
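For reference, the usual fix for the problem described above is to have Square::create() allocate a real Square with new instead of casting the result of CCSprite::create(). Below is a minimal sketch with a hypothetical Sprite class standing in for CCSprite so that it compiles on its own. In cocos2d-x 2.x the same pattern would typically also call autorelease() on success and use CC_SAFE_DELETE on failure; that part is left as a comment because it depends on the framework.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for CCSprite.
class Sprite {
public:
    Sprite() : initialized(false) {}
    virtual ~Sprite() {}
    virtual bool init() { initialized = true; return true; }
    bool isInitialized() const { return initialized; }
private:
    bool initialized;
};

class Square : public Sprite {
public:
    // The constructor runs because the object is created as a Square.
    Square() : shadow(NULL), hasShadow(false) {}

    static Square* create() {
        Square* s = new Square();   // allocates sizeof(Square) bytes
        if (s->init()) {
            // In cocos2d-x, s->autorelease() would go here.
            return s;
        }
        delete s;                   // CC_SAFE_DELETE(s) in cocos2d-x
        return NULL;
    }

    void setShadow(Sprite* sh) { shadow = sh; hasShadow = (sh != NULL); }
    Sprite* getShadow() const { return hasShadow ? shadow : NULL; }

private:
    Sprite* shadow;
    bool hasShadow;
};

void createExample() {
    Square* s = Square::create();
    assert(s != NULL);
    assert(s->isInitialized());
    assert(s->getShadow() == NULL); // members were set by the constructor
    delete s;                       // plain delete only because this is a sketch
}
```

Because the object is allocated at its full size, writing to shadow and hasShadow stays inside the allocation.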

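In C++ this line:
Square* s = (Square*)CCSprite::create();
is called a C-style cast and it’s not the right thing to do.
Malcolm is right on the money, above.
“static_cast<>() gives you a compile time checking ability, C-Style cast doesn’t.”
Note that static_cast<Square*> would still compile here, because a base-to-derived pointer cast is allowed; it is just unchecked. dynamic_cast<Square*> checks the object’s real type at runtime and returns NULL for a plain CCSprite, which would have exposed the problem.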
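One caveat on the casts discussed above: a static_cast from a base pointer to a derived pointer is accepted by the compiler, so on its own it does not catch this bug. A runtime check with dynamic_cast does. Here is a minimal sketch, again with hypothetical Sprite and Square stand-ins and assuming RTTI is enabled in the build:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins for CCSprite and Square.
struct Sprite {
    virtual ~Sprite() {}            // polymorphic, so dynamic_cast is usable
};

struct Square : Sprite {
    Square() : shadow(NULL), hasShadow(false) {}
    Sprite* shadow;
    bool hasShadow;
};

// Returns true only when the object behind p really is a Square.
bool isReallySquare(Sprite* p) {
    return dynamic_cast<Square*>(p) != NULL;
}

void castExample() {
    Sprite plain;
    Square real;

    // static_cast<Square*>(&plain) would compile too, but using the
    // result is undefined behaviour because no Square exists there.
    assert(!isReallySquare(&plain));
    assert(isReallySquare(&real));
}
```

A check like this in a debug build would have flagged the object returned by CCSprite::create() as not being a Square.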