
Normal generation in the pixel shader

Doug Binks - 31 Jan 2015


As usual, whilst working on one aspect of Avoyd I hit a hurdle and decided to take a break by tweaking some visuals - specifically looking at the normals for my surfaces. I added a step to generate face normals in the pixel shader using the derivatives of world space position [see: Normals without normals by Angelo Pesce and the Volumes of Fun wiki on Computing Normals], and immediately noticed precision issues when close to the surface. I'll demonstrate the issue and my quick fix which uses eye relative position instead of world space, before explaining what's happening in full.

Figure 1: Precision issues with normals from position derivatives in the pixel shader. The image on the left shows face normals calculated in the pixel shader using the world space position; on the right we use the eye relative world space position.

Anyone who's spent a fair amount of time with floating point numbers will be familiar with precision issues, so I realised that since I was taking deltas of two values which were close together, I would be best served by having those values near the origin. Rather than use the world space position I should use an eye relative position, i.e. emit world_pos - eye_pos from the vertex shader and use that to generate my normals.

I did this, and made an image for a tweet, getting some replies asking for further details, so I'm writing this up rather than working on my entity collision detection.

So what's happening?

My original shaders (pseudo glsl) were:

// Vertex shader, much removed
layout(std140) uniform uboWanKenobi
{
	mat4 matModelToWorldToViewToProj;
	mat4 matModelToWorld;
};

in  vec3  in_pos;
out vec3  world_pos;

void main(void)
{
	vec4 pos = vec4(in_pos, 1.0);
	gl_Position = matModelToWorldToViewToProj * pos;
	world_pos = (matModelToWorld * pos ).xyz;
}

// fragment shader, much removed
in  vec3 world_pos;
out vec4 fragCol;

void main(void)
{
	vec3 dFdxPos = dFdx( world_pos );
	vec3 dFdyPos = dFdy( world_pos );
	vec3 facenormal = normalize( cross(dFdxPos,dFdyPos ));
	fragCol = vec4(facenormal*0.5 + 0.5,1.0);
}

This outputs the world space position to the pixel shader, which then calculates the screen space derivatives of that to get the face normal.

The 24 bits of mantissa in a 32 bit float give approximately 7 decimal places of precision. It turns out that this isn't enough to represent dFdx( world_pos ) or dFdy( world_pos ) to a precision where the error won't show on an 8-bit monitor (~3 decimal places of precision) for the circumstances I took the screenshot in. For more on floating point do check out Bruce Dawson's blog posts about floating point issues and Tom Forsyth's post on precision.

In the screenshot I'm using a first person camera (as Avoyd is a First Person Editor), and am fairly close to the surface, so the two nearby triangles span a world space distance of about 0.2f across the screen. The camera is at a position in world space of around ( 300.0f, 300.0f, 300.0f ). With the image being around 1000 pixels tall, the distance between each pixel is about 0.0002f. Note that 32 bit floats can represent this number to high accuracy, but they can't represent the difference between 300.00000f and 300.00002f very well, since this difference is at the 7th decimal place when written in floating point format as 3.0000002 * 10^2 (the analysis should really be done in binary, which we do below).

In other words, when you take the gradient of a value emitted by the vertex shader, you're not looking at a gradient derived by taking ( P1 - P0 ) / num_pixels, with P0 and P1 being the world space positions at vertices 0 and 1. You're instead taking the difference between the interpolated positions at neighbouring pixels in the 2x2 quad being rasterized. Naturally I'm talking GPUs here, so there's some divergence in how this is done: some implementations calculate only one gradient for the whole quad, while others do the per-pixel calculation. Recent GLSL additions let you select these coarse and fine derivatives explicitly.

The solution: use eye relative position

The solution is to move the absolute value of the quantity closer to the origin. If I'm comparing 0.20000f and 0.20002f then I have 3 more places of precision. You can do this by calculating world_pos - eye_pos. This works because things which are close take up more space than things which are far away, so you get the accuracy where you need it - close to your viewpoint. If you want even more accuracy, then calculate world_pos - eye_pos - eye_forwards*near_dist so that you get the full possible precision.

The corrected shaders become:

// Vertex shader, much removed
layout(std140) uniform uboWanKenobi
{
	mat4 matModelToWorldToViewToProj;
	mat4 matModelToWorld;
	vec3 eye_pos; // world space eye position
};

in  vec3  in_pos;
out vec3  eye_relative_pos;

void main(void)
{
	vec4 pos = vec4(in_pos, 1.0);
	gl_Position = matModelToWorldToViewToProj * pos;
	eye_relative_pos = (matModelToWorld * pos ).xyz - eye_pos;
}

// fragment shader, much removed
in  vec3 eye_relative_pos;
out vec4 fragCol;

void main(void)
{
	vec3 dFdxPos = dFdx( eye_relative_pos );
	vec3 dFdyPos = dFdy( eye_relative_pos );
	vec3 facenormal = normalize( cross(dFdxPos,dFdyPos ));
	fragCol = vec4(facenormal*0.5 + 0.5,1.0);
}

A visual explanation of what's happening

Armed with some knowledge, runtime reloading of shaders, Runtime Compiled C++ and Mikko Mononen's excellent NanoVG I bring you this image demonstrating the problem of floating point accuracy and per-pixel gradients:

Figure 2: Calculating the precision of eye relative space versus world space for derivatives. A graph of dFdx(Pos) calculated across a Pos width of 0.2f along 1024 pixels (only a few hundred graphed), alongside the normal problem and its solution.

Here I'm displaying a similar view to that from before, with a graph displayed with NanoVG using the values calculated using:

// precision test...
float P = 0.2f; // set to 300.0f for the world space case
float P0 = 000.0f + P;
float P1 = 000.2f + P;
const int N = 1024;
float dFdxPos[N];
float dPA = (P1-P0)/(float)N; // analytic per-step gradient
for(int i = 0; i < N; ++i)
{
	float ti   = (float)i/(float)N;
	float tip1 = (float)(i+1)/(float)N;
	float Pi   = P0+(P1-P0)*ti;
	float Pip1 = P0+(P1-P0)*tip1;
	dFdxPos[i] = Pip1 - Pi;
}

//then render graph of values with height 2.0f * dPA

Here I'm calculating dFdx( Pos ) by taking the finite difference between two values of Pos interpolated between P0 and P1 along 1024 points, and displaying them on a graph with height twice the calculated gradient from the positions at P0 and P1.

You can see that for the case where Pos is 300.0f, the graphed value of dFdx( Pos ) calculated per pixel jumps around the actual value.

Fixed point would help to some extent with these issues (much of the actual computation in the non programmable parts of the GPU is done with fixed point). With 32 bits of precision we get ~9 decimal places, so to get 3 places of accuracy for an 8-bit monitor over 1000 pixels spanning a world space of 0.2 we could have a distance of 10,000.2000 - i.e. about 10km in my mapping of 1.0 unit to a meter. This is less than the distances I need, so the fixed exponent would have to be varied for different draw calls to fit in the entire scene. For large scale scenes this type of solution is required anyway.

In conclusion

  • Use eye relative space where possible if you need floating point position values.

  • 32 bits don't float your boat when the ocean is large.

  • An Interpolation shader stage before the pixel shader would be useful (this is currently possible in a geometry shader but there are issues).

Afterword

You might ask me why I'm bothering to calculate face normals in the pixel shader. Well, I wanted to have non-smooth terrain since it's more readable given the particular voxel polygon generation of the scenery. I could do this by making face normals in my geometry, but this would lead to more vertices since I can't share them. I could generate them in a geometry shader, but the performance of this solution (mainly on Apple OS X) isn't suitable.

