Yesterday I attempted to replace a custom GLSL function with a more generic one that uses a for loop over a compile-time range. That should be fine: the GLSL compiler can safely unroll the loop and all is well. Apparently not. The first version was unbearably slow:

const int blockSize1d = 2;
float sampleRecoTex_generic(vec2 pos, float[blockSize1d*blockSize1d] textureData) {
    pos *= blockSize1d;
    float value = 0;
    for(int x = 0; x != blockSize1d; ++x) {
        for(int y = 0; y != blockSize1d; ++y) {
            vec2 center = vec2(x + 0.5, y + 0.5);
            vec2 dist = abs(center - pos);
            vec2 weight = 1 - min(dist, vec2(1,1));
            value += weight.x * weight.y * textureData[y + x * blockSize1d];
        }
    }
    return value;
}

Hmmm. Maybe the compiler doesn't like nested loops, even though a decent compiler should be able to figure out that they can be unrolled completely. Let us remove one loop (never mind the swizzling):

float sampleRecoTex_generic(vec2 pos, float[blockSize1d*blockSize1d] textureData) {
    pos.xy = pos.yx;
    pos *= blockSize1d;
    pos = clamp(pos, vec2(0.5, 0.5), vec2(1.5, 1.5));
    float value = 0;
    for(int texIndex = 0; texIndex != blockSize1d*blockSize1d; ++texIndex) {
        int x = texIndex % blockSize1d;
        int y = texIndex / blockSize1d;
        vec2 center = vec2(x + 0.5, y + 0.5);
        vec2 dist = abs(center - pos);
        vec2 weight = 1 - min(dist, vec2(1,1));
        value += weight.x * weight.y * textureData[texIndex];
    }
    return value;
}

This ran at a decent framerate, but produced consistently incorrect results. After way too much manual tweaking, second opinions and more wasted time, I unrolled the loop myself. It ran fine. I should have suspected the compiler much earlier ...

I double-checked the issue by booting into Windows (and struggling through the usual half hour of updates), where a recent AMD driver was installed. All fine. I rebooted into Ubuntu, removed the repository driver (for reference: fglrx-amdcccle-updates:amd64 (9.000-0ubuntu3), fglrx-updates:amd64 (9.000-0ubuntu3), fglrx-updates-dev:amd64 (9.000-0ubuntu3)), installed the latest non-beta driver from amd.com, and everything ran fine!
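
For reference, a hand-unrolled counterpart of the single-loop version could look roughly like the sketch below. It is not the exact code I ended up with; the name sampleRecoTex_unrolled is made up for illustration, and the sketch assumes blockSize1d is fixed at 2 and a GLSL version that accepts array-typed parameters (1.20 or later).

```glsl
// Hypothetical hand-unrolled sketch of the single-loop version above,
// assuming blockSize1d == 2. The function name is an assumption.
float sampleRecoTex_unrolled(vec2 pos, float[4] textureData) {
    pos.xy = pos.yx;
    pos *= 2.0;
    pos = clamp(pos, vec2(0.5, 0.5), vec2(1.5, 1.5));

    // After the clamp, pos - 0.5 lies in [0, 1], so the per-texel
    // weights 1 - min(abs(center - pos), 1) reduce to bilinear weights.
    vec2 w1 = pos - vec2(0.5);   // weight of the texel centred at 1.5
    vec2 w0 = vec2(1.0) - w1;    // weight of the texel centred at 0.5

    return w0.x * w0.y * textureData[0]   // x = 0, y = 0
         + w1.x * w0.y * textureData[1]   // x = 1, y = 0
         + w0.x * w1.y * textureData[2]   // x = 0, y = 1
         + w1.x * w1.y * textureData[3];  // x = 1, y = 1
}
```

The index order follows x = texIndex % 2 and y = texIndex / 2 from the loop, so comparing the output of this variant against the loop version is one way to tell whether the driver's compiler is mishandling the loop.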