Blur at pause is very slow

I would suggest heavily downsampling the scene image, blurring it, and scaling it back up. Another option would be to try the native iOS blur, which seems to be quite fast; it may be worth checking how it works (exact parameters such as the size of the blur window, the downsampling rate, etc.).

Hi,
try this latest example; it is clearer and the blur looks OK to me.
RenderTexture.zip (46.0 KB)

On my computer it is also about three times faster than the shader you had before. Downscaling the image before applying the blur makes it approximately 9 times faster.

For the record, this is the difference in the image:
Complex blur:

Uglier but faster blur:

@Victor_K have you tried any other fast techniques? Also, do you have any ideas on how to create the same type of blur as in Badland 2?

@zerodarkzone thank you. I’ve just tested it on a device, and the slow part was the code that creates the RenderTexture, so I’ve changed it to create the RenderTexture once at the beginning. I will test it more; it needs more testing on real devices such as the iPad Mini, iPad Air 1 and 2, iPhone 7+ and so on. And, omg, some crappy Android devices)
Current version is here https://github.com/KAMIKAZEUA/RenderTextureBlur

In our game, Run Cow Run (Cocos2d-x 2.1), we do the blur by “cheating”, and it works well:
we blur a smaller texture and resize it with anti-aliasing.

We also use it in Slots Surprise, which uses a newer version of Cocos2d-x (3.x).

First of all, some shader tips:

1.) When possible, don’t use loops or ifs (or anything else that requires branching). GPUs are optimized mostly for a linear flow of execution, meaning no jumps (other than from instruction to instruction :smiley:). Modern desktop GPUs may handle branches better, but embedded and especially older GPUs do not.

2.) Reduce texture cache misses. Memory latency on GPUs is usually high, which means that fetching memory off-chip takes a significant amount of time, although many GPUs try to hide this by doing other work while waiting for the texture data. To reduce off-chip texture memory fetches, the GPU uses a texture data cache. When a cache miss happens (a texel is fetched that is not in the cache), the current GPU thread stalls and waits for the data to be read from memory.

One common cause of cache misses is dependent texture reads/fetches. These happen when the texture coordinate (the second value passed to the texture2D function) is computed in the pixel shader, as in the following example:

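uniform sampler2D tex; // sampler declaration, missing in the original snippet
varying vec2 texCoord;
uniform float value;
void main() {
    // the coordinate is computed per pixel: a dependent texture read
    gl_FragColor = texture2D(tex, vec2(texCoord.x - value * 0.25, texCoord.y - value * 0.25));
}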

Here the texture coordinate is computed for every pixel shader invocation. This is a dependent texture read. An independent texture read (I don’t know if that’s really what it’s called, but I’ll call it that for now :smiley:) looks like this:

uniform sampler2D tex; // sampler declaration, missing in the original snippet
varying vec2 texCoord;
void main() {
    gl_FragColor = texture2D(tex, texCoord);
}

The texture coordinate value passed to the texture2D function comes straight from a varying.

But since some calculation has to be done on the texture coordinate, we can’t simply remove the calculation.

The expression texCoord.x - value * 0.25 is a linear (strictly speaking, affine) function of texCoord.x. For such functions, the following statement holds:

f(x) = x - value * 0.25
lerp(f(x1), f(x2), t) = f(lerp(x1, x2, t))

If you already know this or don’t care, you can skip the proof.

f(x) = x - v * c
lerp(x,y,t) = (y - x) * t + x
f(lerp(x1, x2, t)) = ((x2 - x1) * t + x1) - v * c
                   = (x2 - x1) * t + x1 - v * c
lerp(f(x1), f(x2), t) = (f(x2) - f(x1)) * t + f(x1)
                      = ((x2 - v * c) - (x1 - v * c)) * t + (x1 - v * c)
                      = (x2 - v * c - x1 + v * c) * t + x1 - v * c
                      = (x2 - x1 - v * c + v * c) * t + x1 - v * c
                      = (x2 - x1) * t + x1 - v * c

What the above states is that the result is the same whether you linearly interpolate between f(x1) and f(x2), or interpolate between x1 and x2 and then feed the result into the function. Since varyings are linearly interpolated across the primitive, this behaviour can be exploited to move the calculation into the vertex shader: compute the offset coordinate per vertex, pass it in a varying, and sample with it directly.

(NOTE: In general, varyings are not linearly interpolated across the primitive but interpolated with perspective correction. But since we are drawing a non-perspective rectangle (gl_Position.w = 1.0), the interpolation remains linear.)
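
For anyone who prefers to check the identity numerically rather than read the proof, here is a minimal C++ sketch. The function names (linearMix, shiftCoord, offsetCommutesWithLerp) are made up for this illustration and are not part of any shader or cocos2d-x API.

```cpp
#include <cassert>
#include <cmath>

// Linear interpolation, as in the proof: lerp(x, y, t) = (y - x) * t + x.
double linearMix(double x, double y, double t) { return (y - x) * t + x; }

// The offset function from the shader: f(x) = x - value * 0.25.
double shiftCoord(double x, double value) { return x - value * 0.25; }

// Returns true when applying the offset after interpolation (per-pixel work)
// matches interpolating two already-offset endpoints (per-vertex work).
bool offsetCommutesWithLerp(double x1, double x2, double t, double value) {
    assert(t >= 0.0 && t <= 1.0);
    const double perPixel  = shiftCoord(linearMix(x1, x2, t), value);
    const double perVertex = linearMix(shiftCoord(x1, value), shiftCoord(x2, value), t);
    return std::fabs(perPixel - perVertex) < 1e-12;
}
```

Calling offsetCommutesWithLerp(0.0, 1.0, 0.3, 0.02) compares both orders of evaluation for a coordinate 30% of the way across the quad; the two results agree up to floating-point rounding.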


Now about the blur. The blur used in the video you provided seems to be a Gaussian-type blur. One noticeable thing is that there is almost no detail left after blurring. This is probably the result of either a very large blur radius or a heavily downscaled source image with a smaller blur radius.

Gaussian blur filters are usually very expensive (at least on mobile), so they are avoided when possible. One trick is to downscale the source image in multiple passes to a size of about 16px, then draw it at full screen resolution with linear sampling enabled. This should give a reasonably fast blur effect.
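
To make the multi-pass downscale idea concrete, here is a rough CPU sketch in C++. It is only an illustration of the technique: on the GPU each pass would be a draw into a smaller render target, and the GrayImage type and the halve/downscaleTo names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical single-channel image used only for this sketch.
struct GrayImage {
    std::size_t width;
    std::size_t height;
    std::vector<float> pixels; // row-major, width * height values
};

// One downscale pass: average each 2x2 block into one pixel (a box filter).
// Assumes even width and height.
GrayImage halve(const GrayImage& src) {
    assert(src.width % 2 == 0 && src.height % 2 == 0);
    GrayImage dst{src.width / 2, src.height / 2, {}};
    dst.pixels.resize(dst.width * dst.height);
    for (std::size_t y = 0; y < dst.height; ++y) {
        for (std::size_t x = 0; x < dst.width; ++x) {
            const std::size_t i = (2 * y) * src.width + 2 * x;
            dst.pixels[y * dst.width + x] =
                0.25f * (src.pixels[i] + src.pixels[i + 1] +
                         src.pixels[i + src.width] + src.pixels[i + src.width + 1]);
        }
    }
    return dst;
}

// Repeat the pass until the image is no wider than targetWidth
// (or until a dimension becomes odd and cannot be halved cleanly).
GrayImage downscaleTo(GrayImage img, std::size_t targetWidth) {
    while (img.width > targetWidth && img.width % 2 == 0 && img.height % 2 == 0) {
        img = halve(img);
    }
    return img;
}
```

Each pass already removes fine detail, so by the time the image is around 16px wide most of the blurring has been done by the downscale itself; drawing the result at full size with linear filtering smooths the remaining blockiness.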

That sounds like a very good answer :slight_smile: But is there a real-life working code sample?
I’ve already posted a link to the project created with the help of people here; thanks again for that. It works, but it’s different: the quality is different too, and I believe the speed is not as good as it could be.
And again, I have no idea about shaders; I can’t edit a shader or understand how it works… I don’t know OpenGL at all. But blur is a cool feature. So, as a noob, I just want to add some “blur panel” under the pause popup, or create a blurred sprite with some simple function…

I have used RenderTextures (not with blur) in my projects, and one thing I noticed is that every single draw operation done on one (visit()) is quite slow, even for very small image sizes.

It would be interesting to see whether performance improves if you just load a texture without using RenderTexture (just to check whether the slowness is isolated to RenderTextures).

I don’t think it can be used in visit. Its purpose is to render once… however, I would love for it to be fast enough to be used in visit and to update on a live scene.

Hi, this looks ok to me.

Original Image:

Blurred Image:

I just downscaled the image by a factor of eight, applied a blur with a low radius, and then upscaled the image to fit the screen.
RenderTextureBlur.zip (561.1 KB)

The shader is very fast, but render-to-texture is quite slow on mobile devices, so I don’t know if there is a faster way to do it.

Hey, you could publish your changes in the forked repo: https://github.com/zerodarkzone/RenderTextureBlur. That would be easy to check. Thanks.

Also, I’ve already tried downscaling the image; however, after upscaling, the quality looks ugly (most noticeable on iPhone)… not like a pro… not like Badland.

Maybe the scaling should be smooth? Not just linear, like when you just call setScale()?

And yes, the shader works fast; however, I’ve tried to apply it on the fly, rendering the scene every time in update(). But yes, RenderTexture is probably too slow.
I have a problem because my game moves fast vertically, and I want to apply the blur to gameNode on the fly… Well, I have an idea: render a bigger part of the screen and move the rendered image while the pause popup animates and the game slows down… But RenderTexture has some problems… I’ve created a new topic about it: RenderTexture setPosition()?

Hi,
I just pushed the changes to the GitHub repo.
Did you use “sprite->getTexture()->setAntiAliasTexParameters();”? That way, anti-aliasing is applied to the image when it’s scaled.
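
As far as I know, setAntiAliasTexParameters() switches the texture to linear filtering (GL_LINEAR), so every upscaled pixel is a blend of its four nearest texels. A small CPU sketch of that blend, in C++, may make it clearer what the upscale does; sampleBilinear is a made-up helper for this post, not a cocos2d-x function.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Sample a row-major single-channel image at a fractional texel position,
// blending the four nearest texels (roughly what linear filtering does).
float sampleBilinear(const std::vector<float>& pixels, std::size_t width,
                     std::size_t height, float x, float y) {
    assert(width > 0 && height > 0 && pixels.size() == width * height);
    // Clamp the position to the image, similar to clamp-to-edge wrapping.
    x = std::min(std::max(x, 0.0f), static_cast<float>(width - 1));
    y = std::min(std::max(y, 0.0f), static_cast<float>(height - 1));
    const std::size_t x0 = static_cast<std::size_t>(std::floor(x));
    const std::size_t y0 = static_cast<std::size_t>(std::floor(y));
    const std::size_t x1 = std::min(x0 + 1, width - 1);
    const std::size_t y1 = std::min(y0 + 1, height - 1);
    const float fx = x - static_cast<float>(x0); // horizontal blend factor
    const float fy = y - static_cast<float>(y0); // vertical blend factor
    const float top    = pixels[y0 * width + x0] * (1.0f - fx) + pixels[y0 * width + x1] * fx;
    const float bottom = pixels[y1 * width + x0] * (1.0f - fx) + pixels[y1 * width + x1] * fx;
    return top * (1.0f - fy) + bottom * fy;
}
```

With a 2x2 image holding 0, 1, 2, 3, sampling at (0.5, 0.5) gives the average of all four texels. Linear filtering only ever blends these four neighbours, which is why a heavily downscaled image can still look blocky after upscaling unless it was also blurred first.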

Yes, I tried; the scale-down was only 0.25. The results for my game were not good…
Well, it currently works fast even for the whole screen, so I will probably leave it as it is, without scaling down. The only problem left is that renderTexture.(and actually the main problem is that the image isn’t the same type as in Badland… but I will probably accept it).

Try this branch


It is going to be slower, but it looks better. I modified the code to use the first shader you tried, but with all the changes already made, so it is still a lot faster than the original example the shader came from.

Check this out… it just might suit your needs:

Fragment shader:

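#ifdef GL_ES
// mediump instead of lowp: lowp only guarantees a range of about -2..2,
// which is too small for offsets such as 16.36501 below
precision mediump float;
#endif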

varying vec4 v_fragmentColor;
varying vec2 v_texCoord;

uniform vec2 PixelSize;
uniform vec2 PixelSizeHalf;

// Kernel width 35 x 35
const int stepCount = 9;
float gWeights[stepCount] = float[](
	0.10855,
	0.13135,
	0.10406,
	0.07216,
	0.04380,
	0.02328,
	0.01083,
	0.00441,
	0.00157
);
float gOffsets[stepCount] = float[](
	0.66293,
	2.47904,
	4.46232,
	6.44568,
	8.42917,
	10.41281,
	12.39664,
	14.38070,
	16.36501
);


vec3 GaussianBlur9(sampler2D tex0, vec2 centreUV, vec2 halfPixelOffset, vec2 pixelOffset) {
	vec3 colOut = vec3(0, 0, 0);
	
	vec2 texCoordOffset0 = gOffsets[0] * pixelOffset;
	vec3 col0 = texture(tex0, centreUV + texCoordOffset0).xyz + texture(tex0, centreUV - texCoordOffset0).xyz;
	
	vec2 texCoordOffset1 = gOffsets[1] * pixelOffset;
	vec3 col1 = texture(tex0, centreUV + texCoordOffset1).xyz + texture(tex0, centreUV - texCoordOffset1).xyz;
	
	vec2 texCoordOffset2 = gOffsets[2] * pixelOffset;
	vec3 col2 = texture(tex0, centreUV + texCoordOffset2).xyz + texture(tex0, centreUV - texCoordOffset2).xyz;
	
	vec2 texCoordOffset3 = gOffsets[3] * pixelOffset;
	vec3 col3 = texture(tex0, centreUV + texCoordOffset3).xyz + texture(tex0, centreUV - texCoordOffset3).xyz;
	
	vec2 texCoordOffset4 = gOffsets[4] * pixelOffset;
	vec3 col4 = texture(tex0, centreUV + texCoordOffset4).xyz + texture(tex0, centreUV - texCoordOffset4).xyz;
	
	vec2 texCoordOffset5 = gOffsets[5] * pixelOffset;
	vec3 col5 = texture(tex0, centreUV + texCoordOffset5).xyz + texture(tex0, centreUV - texCoordOffset5).xyz;
	
	vec2 texCoordOffset6 = gOffsets[6] * pixelOffset;
	vec3 col6 = texture(tex0, centreUV + texCoordOffset6).xyz + texture(tex0, centreUV - texCoordOffset6).xyz;
	
	vec2 texCoordOffset7 = gOffsets[7] * pixelOffset;
	vec3 col7 = texture(tex0, centreUV + texCoordOffset7).xyz + texture(tex0, centreUV - texCoordOffset7).xyz;
	
	vec2 texCoordOffset8 = gOffsets[8] * pixelOffset;
	vec3 col8 = texture(tex0, centreUV + texCoordOffset8).xyz + texture(tex0, centreUV - texCoordOffset8).xyz;
	
	colOut += gWeights[0] * col0;
	colOut += gWeights[1] * col1;
	colOut += gWeights[2] * col2;
	colOut += gWeights[3] * col3;
	colOut += gWeights[4] * col4;
	colOut += gWeights[5] * col5;
	colOut += gWeights[6] * col6;
	colOut += gWeights[7] * col7;
	colOut += gWeights[8] * col8;
	
	return colOut;
}

void main() {
	// argument order matches the signature: (tex0, centreUV, halfPixelOffset, pixelOffset)
	gl_FragColor.xyz = GaussianBlur9( CC_Texture0, v_texCoord, PixelSizeHalf, PixelSize );
    gl_FragColor.w = 0.05;
}

Then in cocos2d-x:

	spr = Sprite::create("badland.png");
	spr->retain();
	spr->setPosition(center);
	auto texSize = spr->getTexture()->getContentSizeInPixels();

	std::string fragSource1 = FileUtils::getInstance()->getStringFromFile(FileUtils::getInstance()->fullPathForFilename("frag1.fsh"));
	auto prog1 = GLProgram::createWithByteArrays(ccPositionTextureColor_noMVP_vert, fragSource1.data());
	auto programState1 = GLProgramState::getOrCreateWithGLProgram(prog1);

	auto prog2 = GLProgram::createWithByteArrays(ccPositionTextureColor_noMVP_vert, fragSource1.data());
	auto programState2 = GLProgramState::getOrCreateWithGLProgram(prog2);

	programState1->setUniformVec2("PixelSize", Vec2(1.f / texSize.width, 0));
	programState1->setUniformVec2("PixelSizeHalf", Vec2(0.5 / texSize.width, 0));

	programState2->setUniformVec2("PixelSize", Vec2(0, 1.f / texSize.height));
	programState2->setUniformVec2("PixelSizeHalf", Vec2(0, 0.5 / texSize.height));

	rtA = RenderTexture::create(winSize.width, winSize.height);
	rtA->retain();
	rtA->getSprite()->setGLProgramState(programState1);
	rtA->getSprite()->setPosition(center);
	

	rtB = RenderTexture::create(winSize.width, winSize.height);
	this->addChild(rtB);
	rtB->getSprite()->setGLProgramState(programState2);
	rtB->getSprite()->setPosition(center);


	spr->getTexture()->setAntiAliasTexParameters();
	rtA->getSprite()->getTexture()->setAntiAliasTexParameters();
	rtB->getSprite()->getTexture()->setAntiAliasTexParameters();

	const int downScale = 8;
	spr->setScale(spr->getScale() / downScale);
	rtB->getSprite()->setScale(rtB->getSprite()->getScale() * downScale);

	rtA->beginWithClear(0, 0, 0, 0);
	spr->visit();
	rtA->end();

	rtB->beginWithClear(0, 0, 0, 0);
	rtA->getSprite()->visit();
	rtB->end();

Original img:

Blurred img:

I tried it in an update loop like this and it runs at 60fps!!

void GameScene::update(float dt) {
	rtA->beginWithClear(0, 0, 0, 0);
	spr->visit();
	rtA->end();

	rtB->beginWithClear(0, 0, 0, 0);
	rtA->getSprite()->visit();
	rtB->end();
}

And if you are curious… the shader is from here (zip file):

PS: I’m not an expert in shaders, but I think the fragment shader can be improved even further by making the arrays uniforms instead of initializing them each time.
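
A side note on the gOffsets table in the fragment shader above: the fractional values (0.66293, 2.47904, …) look like the result of a common trick where two neighbouring Gaussian taps are fetched with a single linearly filtered read. That is my assumption about how the table was generated, not something stated in the original source. A minimal C++ sketch of the merge step, with a hypothetical Tap struct and function name:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Tap {
    float offset; // distance from the centre texel, in texels
    float weight; // contribution of this fetch
};

// Merge consecutive pairs of discrete taps into single taps.
// Assumes each pair covers two adjacent texels and that the texture uses
// linear filtering: one fetch at the weighted average offset, scaled by the
// combined weight, then equals the two separate weighted fetches.
std::vector<Tap> mergeForLinearSampling(const std::vector<Tap>& taps) {
    assert(taps.size() % 2 == 0);
    std::vector<Tap> merged;
    for (std::size_t i = 0; i < taps.size(); i += 2) {
        const Tap& a = taps[i];
        const Tap& b = taps[i + 1];
        const float w = a.weight + b.weight;
        assert(w > 0.0f);
        merged.push_back({(a.offset * a.weight + b.offset * b.weight) / w, w});
    }
    return merged;
}
```

For example, taps at offsets 1 and 2 with weights 0.3 and 0.1 merge into one tap at offset 1.25 with weight 0.4. This halves the number of texture reads for the same kernel, which would explain how 9 steps per side can cover a kernel as wide as the 35 texels mentioned in the shader comment.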

Looks very interesting.
60fps on what kind of device? A low-end Android device, or something like an iPad mini or iPhone 4s, which are slow devices…?

I guess someone should write a book on shaders for cocos2d-x with C++, covering how to implement and bind them.
I have many shaders but can’t use them because I just don’t know how to bind them.

I’m on the win32 desktop platform.
But have you tried it first? What were the results? If you did and the performance wasn’t good, you can also try this one; maybe it will perform better:

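#ifdef GL_ES
// mediump instead of lowp: lowp has only about 8 bits of precision,
// which is too coarse for the small texture coordinate offsets used below
precision mediump float;
#endif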

varying vec4 v_fragmentColor;
varying vec2 v_texCoord;

uniform vec2 dir;
uniform float radius;
uniform float resolution;

void main()
{
	//const float radius = 1.0;
	//const float resolution = 1024.0;
	
    vec4 sum = vec4(0.0);
    vec2 tc = v_texCoord;
    float blur = radius/resolution;
    float hstep = dir.x;
    float vstep = dir.y;
    //apply blurring, using a 9-tap filter with predefined gaussian weights

    sum += texture2D(CC_Texture0, vec2(tc.x - 4.0*blur*hstep, tc.y - 4.0*blur*vstep)) * 0.0162162162;
    sum += texture2D(CC_Texture0, vec2(tc.x - 3.0*blur*hstep, tc.y - 3.0*blur*vstep)) * 0.0540540541;
    sum += texture2D(CC_Texture0, vec2(tc.x - 2.0*blur*hstep, tc.y - 2.0*blur*vstep)) * 0.1216216216;
    sum += texture2D(CC_Texture0, vec2(tc.x - 1.0*blur*hstep, tc.y - 1.0*blur*vstep)) * 0.1945945946;

    sum += texture2D(CC_Texture0, vec2(tc.x, tc.y)) * 0.2270270270;

    sum += texture2D(CC_Texture0, vec2(tc.x + 1.0*blur*hstep, tc.y + 1.0*blur*vstep)) * 0.1945945946;
    sum += texture2D(CC_Texture0, vec2(tc.x + 2.0*blur*hstep, tc.y + 2.0*blur*vstep)) * 0.1216216216;
    sum += texture2D(CC_Texture0, vec2(tc.x + 3.0*blur*hstep, tc.y + 3.0*blur*vstep)) * 0.0540540541;
    sum += texture2D(CC_Texture0, vec2(tc.x + 4.0*blur*hstep, tc.y + 4.0*blur*vstep)) * 0.0162162162;
	
    //discard alpha for our simple demo, multiply by vertex color and return
    gl_FragColor = v_fragmentColor * vec4(sum.rgb, 1.0);
}

	spr = Sprite::create("badland.png");
	spr->retain();
	spr->setPosition(center);
	auto texSize = spr->getTexture()->getContentSizeInPixels();

	std::string fragSource1 = FileUtils::getInstance()->getStringFromFile(FileUtils::getInstance()->fullPathForFilename("frag1.fsh"));
	auto prog1 = GLProgram::createWithByteArrays(ccPositionTextureColor_noMVP_vert, fragSource1.data());
	auto programState1 = GLProgramState::getOrCreateWithGLProgram(prog1);

	auto prog2 = GLProgram::createWithByteArrays(ccPositionTextureColor_noMVP_vert, fragSource1.data());
	auto programState2 = GLProgramState::getOrCreateWithGLProgram(prog2);

	programState1->setUniformVec2("dir", Vec2(1, 0));
	programState1->setUniformFloat("radius", 1.f);
	programState1->setUniformFloat("resolution", 1024);

	programState2->setUniformVec2("dir", Vec2(0, 1));
	programState2->setUniformFloat("radius", 1.f);
	programState2->setUniformFloat("resolution", 1024);


	rtA = RenderTexture::create(winSize.width, winSize.height);
	rtA->retain();
	rtA->getSprite()->setGLProgramState(programState1);
	rtA->getSprite()->setPosition(center);
	

	rtB = RenderTexture::create(winSize.width, winSize.height);
	this->addChild(rtB);
	rtB->getSprite()->setGLProgramState(programState2);
	rtB->getSprite()->setPosition(center);


	spr->getTexture()->setAntiAliasTexParameters();
	rtA->getSprite()->getTexture()->setAntiAliasTexParameters();
	rtB->getSprite()->getTexture()->setAntiAliasTexParameters();

	const int downScale = 12;
	spr->setScale(spr->getScale() / downScale);
	rtB->getSprite()->setScale(rtB->getSprite()->getScale() * downScale);

	rtA->beginWithClear(0, 0, 0, 0);
	spr->visit();
	rtA->end();

	rtB->beginWithClear(0, 0, 0, 0);
	rtA->getSprite()->visit();
	rtB->end();
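
In case it helps to see why two program states with dir = (1, 0) and dir = (0, 1) are enough for a 2D blur, here is a CPU sketch of the same two-pass idea in C++. It reuses the nine weights from the shader above; blurPass and blur2D are hypothetical names, and the edge clamping is my own choice for the sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// The nine weights from the fragment shader above, ordered from -4 to +4.
const float kWeights[9] = {0.0162162162f, 0.0540540541f, 0.1216216216f,
                           0.1945945946f, 0.2270270270f, 0.1945945946f,
                           0.1216216216f, 0.0540540541f, 0.0162162162f};

// One blur pass over a row-major image along (dx, dy): (1, 0) is horizontal,
// (0, 1) is vertical. Reads outside the image are clamped to the edge.
std::vector<float> blurPass(const std::vector<float>& src, int width, int height,
                            int dx, int dy) {
    assert(width > 0 && height > 0);
    assert(src.size() == static_cast<std::size_t>(width) * height);
    std::vector<float> dst(src.size());
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            float sum = 0.0f;
            for (int k = -4; k <= 4; ++k) {
                const int sx = std::min(std::max(x + k * dx, 0), width - 1);
                const int sy = std::min(std::max(y + k * dy, 0), height - 1);
                sum += src[sy * width + sx] * kWeights[k + 4];
            }
            dst[y * width + x] = sum;
        }
    }
    return dst;
}

// Horizontal pass followed by a vertical pass, mirroring the two program
// states with dir = (1, 0) and dir = (0, 1).
std::vector<float> blur2D(const std::vector<float>& src, int width, int height) {
    return blurPass(blurPass(src, width, height, 1, 0), width, height, 0, 1);
}
```

Two passes cost 9 + 9 reads per pixel, whereas a single-pass 9x9 kernel would need 81. The weights sum to 1, so a uniformly coloured image comes out unchanged and overall brightness is preserved.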

Update:

I have never seen a high-quality GPU blur shader like the one used by Badland.

The best blur quality I have ever seen comes from a CPU blur called stackblur, but it will never be usable even on desktop platforms, especially in a 60fps update loop, let alone on mobile devices.

Even so, I can’t help but show you how good its quality is:

First, the cocos2d-x code:

spr = Sprite::create("badland.png");
spr->retain();
spr->setPosition(center);


auto rt = RenderTexture::create(winSize.width, winSize.height);
rt->retain();
rt->beginWithClear(0, 0, 0, 0);
spr->visit();
rt->end();
_director->getRenderer()->render();

auto img = rt->newImage();
stackblur(img->getData(), img->getWidth(), img->getHeight(), 127);

auto tx = new Texture2D;
tx->initWithImage(img);
img->release();

spr->release();
spr = Sprite::createWithTexture(tx);
tx->release(); // the sprite retains the texture, so drop our reference to avoid a leak
spr->setPosition(center);
this->addChild(spr);

CPU stackblur with radius 127:

Now that is a blur!! … and we can go even higher on the radius!!

stackblur_sequential_version.cpp (9.6 KB)

It’s from here, if you want to check:
http://vitiy.info/stackblur-algorithm-multi-threaded-blur-for-cpp/

PS: I’ll say it again: I don’t think it will suit your needs; I posted it just for the sake of curiosity and exploration.

Thanks a lot for your posts. The last one looks like what I need! I will try it soon (I just need to get to a PC).
I believe it should work fast when working with just one static Sprite…

What I’m trying to do next is render the scene in update and apply the blur, so the scene can be “alive”, animating under the blur… which means spr has to be re-rendered on the mobile device on each update…

A little update: @Joseph39 I want to take the current scene and apply the blur on the fly. Can you please help with sample code for that? I’m not sure how best to do this: should I just create a new render texture, get the sprite from it, and then render it into rtA? Also, as a test, say the scene has a rotating sprite and it should be blurred while rotating; is that possible :slight_smile: ?

upd2: I tried stackblur. It looks really cool and is what I want! But… it works very slowly on a mobile device…

I just tried this and it looks very strange on the device: some parts of the scene turn black… The previous version looks good and works fast. Thanks anyway. I’m not even using downscaling, just the full-size image.

I think I discovered something that might give you a chance of using stackblur even on mobile phones, if you downscale properly!

The idea is to apply the stackblur function to a heavily downscaled image with a smaller blur radius.

spr = Sprite::create("badland.png");
spr->retain();
spr->setPosition(center);

// remember that you can even downscale more in here
const int downSample = 24;// 16-24-32
rt = RenderTexture::create(winSize.width / downSample, winSize.height / downSample);
rt->retain();
rt->setKeepMatrix(true);

rt->beginWithClear(0, 0, 0, 0);
spr->visit();
rt->end();
_director->getRenderer()->render();

auto img = rt->newImage();
// stackblur now will work on a very downscaled/small texture width and height which will make it much faster!!
stackblur(img->getData(), img->getWidth(), img->getHeight(), 5);

auto tx = new Texture2D;
tx->initWithImage(img);
img->release();

sp2 = Sprite::createWithTexture(tx);
tx->release(); // the sprite retains the texture, so drop our reference to avoid a leak
sp2->setPosition(center);
this->addChild(sp2);

sp2->setScale(spr->getScale() * downSample);

I think you can now even use it in an update loop, combined with Texture2D::updateWithData to update the texture instead of creating a new one each time.
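
To get a feel for why the downscaled version is so much cheaper, here is a back-of-the-envelope estimate as a small C++ sketch. The BlurCost struct, the function name and the 1920x1080 screen size in the usage note are assumptions for illustration only; the real cost also depends on the render-to-texture and read-back overhead, which this does not model.

```cpp
#include <cassert>

// Rough pixel-count estimate for blurring a downsampled copy of the screen.
struct BlurCost {
    int width;            // downsampled width in pixels
    int height;           // downsampled height in pixels
    long pixelRatio;      // full-resolution pixels per downsampled pixel
    int equivalentRadius; // blur radius expressed in full-resolution pixels (approximate)
};

BlurCost estimateBlurCost(int screenWidth, int screenHeight, int downSample, int radius) {
    assert(screenWidth > 0 && screenHeight > 0 && downSample > 0 && radius >= 0);
    BlurCost cost;
    cost.width = screenWidth / downSample;
    cost.height = screenHeight / downSample;
    cost.pixelRatio = static_cast<long>(downSample) * downSample;
    cost.equivalentRadius = radius * downSample;
    return cost;
}
```

For a hypothetical 1920x1080 screen, estimateBlurCost(1920, 1080, 24, 5) gives an 80x45 image, so stackblur touches 576 times fewer pixels, while a radius of 5 at that size corresponds to roughly 120 full-resolution pixels, close to the radius 127 used in the full-size example earlier.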

Results:

First, this is how the image looks when downscaled 24 times:

Blurred and upscaled 24 times:

Update:

This is how you should use stackblur combined with Texture2D::updateWithData in an update loop:

const int downSample = 24;// 16-24-32

spr = Sprite::create("badland.png");
spr->retain();
spr->setPosition(center);

rt = RenderTexture::create(winSize.width / downSample, winSize.height / downSample);
rt->retain();
rt->setKeepMatrix(true);

rt->beginWithClear(0, 0, 0, 0);
spr->visit();
rt->end();
_director->getRenderer()->render();

auto img = rt->newImage();
auto tx = new Texture2D;
tx->initWithImage(img);
img->release();

sp2 = Sprite::createWithTexture(tx);
tx->release(); // the sprite retains the texture, so drop our reference to avoid a leak
sp2->setPosition(center);
this->addChild(sp2);

sp2->setScale(rt->getSprite()->getScale() * downSample);

void GameScene::update(float dt) {
	
	rt->beginWithClear(0, 0, 0, 0);
	spr->visit();
	rt->end();
	_director->getRenderer()->render();
	
	auto img = rt->newImage();
	stackblur(img->getData(), img->getWidth(), img->getHeight(), 5);

	sp2->getTexture()->updateWithData(img->getData(), 0, 0, img->getWidth(), img->getHeight()); 
	img->release();
}

It runs at 60fps, in case you were wondering!!
