GPU programming notes

I’ve been writing an OpenGL application using OpenGL Shading Language (GLSL) shaders, framebuffer objects (FBOs), and draw buffers (DBs). I’ve come across a few gotchas along the way and made a few notes.

Follow the jump to find out more about how to get started, what to look out for, and a couple of the hacks I’ve used to make my shaders do what I want them to do. (Specifically, I describe how to clear colour attachments to different values and how to store tags in a floating point channel.)

The basics

First, some sparse notes on the basics for people new to FBOs and DBs:

  • The FBO extension lets applications create their own framebuffer objects, which can be rendered into instead of the window-system-provided framebuffer.
  • Texture objects or renderbuffer objects (a new object type introduced by the extension) can be attached to (and detached from) application-created FBOs and rendered into.
  • An FBO is a collection of 0-N colour attachments, an optional depth attachment, and an optional stencil attachment.
  • The DBs extension is used in conjunction with fragment shaders (or programs) to draw to the extra colour attachments.
  • To draw to colour attachment N, a fragment shader simply writes to gl_FragData[N] rather than to gl_FragColor. (A minimal setup sketch follows this list.)
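
Here’s a minimal sketch of the setup described above: create an FBO, attach two colour textures, and select both for drawing. (I’m assuming the EXT_framebuffer_object and ARB_draw_buffers entry points are available; fb, tex0, tex1, width, and height are placeholder names.)

GLuint fb, tex0, tex1;

//create two textures to use as colour attachments
glGenTextures(1, &tex0);
glBindTexture(GL_TEXTURE_2D, tex0);
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA8, width, height, 0, GL_RGBA, GL_UNSIGNED_BYTE, NULL);

glGenTextures(1, &tex1);
glBindTexture(GL_TEXTURE_2D, tex1);
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA8, width, height, 0, GL_RGBA, GL_UNSIGNED_BYTE, NULL);

//create the FBO and attach both textures
glGenFramebuffersEXT(1, &fb);
glBindFramebufferEXT(GL_FRAMEBUFFER_EXT, fb);
glFramebufferTexture2DEXT(GL_FRAMEBUFFER_EXT, GL_COLOR_ATTACHMENT0_EXT, GL_TEXTURE_2D, tex0, 0);
glFramebufferTexture2DEXT(GL_FRAMEBUFFER_EXT, GL_COLOR_ATTACHMENT1_EXT, GL_TEXTURE_2D, tex1, 0);

//select both attachments for drawing (gl_FragData[0] and gl_FragData[1])
GLenum bufs[2] = {GL_COLOR_ATTACHMENT0_EXT, GL_COLOR_ATTACHMENT1_EXT};
glDrawBuffersARB(2, bufs);

GLenum status = glCheckFramebufferStatusEXT(GL_FRAMEBUFFER_EXT); //should be GL_FRAMEBUFFER_COMPLETE_EXT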

Things to look out for

Now a few gotchas to look out for when you get going:

  • Colour attachments are just that: colours. You can’t use textures which store luminance (GL_LUMINANCE) or depth values (GL_DEPTH_COMPONENT).
  • Moreover, all colour attachments must have the same format.
  • There appears to be no way to define a separate clear colour for each colour attachment.

These things seem fairly small, but they can really affect the “elegance” of your program. For example, you may want to write colours to one buffer and surface normals to another. The standard output for a colour image is 24 or 32 bits (GL_RGB or GL_RGBA), but this is too limited for normals (which probably require at least 16 bits each for i, j, and k). Since you can’t have a 24-bit colour buffer and a 48-bit normal buffer, you will probably need to compromise and store colours with 16 bits per channel. More memory = not good. Things can get even worse if you only want to store very basic additional information (such as 8-bit tags).
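
In code, the compromise looks something like this: both attachments get allocated at the higher precision even though only one needs it. (This sketch assumes the ARB_texture_float formats are available; colour_tex and normal_tex are placeholder names.)

//16 bits per channel for both textures, because the formats must match
glBindTexture(GL_TEXTURE_2D, colour_tex);
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA16F_ARB, width, height, 0, GL_RGBA, GL_FLOAT, NULL);

glBindTexture(GL_TEXTURE_2D, normal_tex);
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA16F_ARB, width, height, 0, GL_RGBA, GL_FLOAT, NULL);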

I’ve devised a few workarounds to the problems caused by these gotchas, and I’ll present two of them now.

Clearing colour attachments independently

I’ve used a shader pair to do this. The vertex shader just passes the vertices straight through (without transforming them into eye space), so the vertices are treated as clip-space coordinates and the ±1 rectangle drawn below covers the whole viewport. I’m not sure if this provides any speedup, but it seems nicer to me. The fragment shader writes programmer-defined clear values to each buffer. I’m also not sure if it’s worth clearing depth values in the fragment shader (glClear() is probably faster, but I’m processing fragments anyhow).

Vertex shader:

void main()
{
  gl_Position = gl_Vertex;
}

Fragment shader:

uniform vec4 clear_data_0;
uniform vec4 clear_data_1;
uniform float clear_depth;

void main()
{
  gl_FragData[0] = clear_data_0;
  gl_FragData[1] = clear_data_1;
  gl_FragDepth = clear_depth;
}

Invoking the program (which includes both shaders):

//(this assumes both colour attachments are already selected with glDrawBuffersARB)
glUseProgramObjectARB(clear_pg);

glUniform4fARB(glGetUniformLocationARB(clear_pg, "clear_data_0"), 0.0f, 0.0f, 0.0f, 0.0f);
glUniform4fARB(glGetUniformLocationARB(clear_pg, "clear_data_1"), 1.0f, 1.0f, 1.0f, 1.0f);
glUniform1fARB(glGetUniformLocationARB(clear_pg, "clear_depth"), 1.0f);

//the depth test must be enabled (with depth writes on) for gl_FragDepth to reach the buffer;
//GL_ALWAYS makes sure every fragment lands regardless of what is already there
glDepthFunc(GL_ALWAYS);

glRectf(-1.0, -1.0, 1.0, 1.0); //fullscreen quad

I also noticed that glClearColor() and glClear() affect every colour buffer currently selected for drawing (as far as I can tell, this is the specified behaviour when multiple draw buffers are active). You can use this to clear attachments to different values by selecting one draw buffer at a time:

//bind framebuffer
glBindFramebufferEXT(GL_FRAMEBUFFER_EXT, fb);

//clear first colour attachment to black
glDrawBuffer(GL_COLOR_ATTACHMENT0_EXT);
glClearColor(0.0, 0.0, 0.0, 0.0);
glClear(GL_COLOR_BUFFER_BIT);

//clear second colour attachment to white (and the depth buffer along with it)
glDrawBuffer(GL_COLOR_ATTACHMENT1_EXT);
glClearColor(1.0, 1.0, 1.0, 1.0);
glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);

//select both (cleared) colour buffers for drawing
GLenum drawbuffers[2] = {GL_COLOR_ATTACHMENT0_EXT, GL_COLOR_ATTACHMENT1_EXT};
glDrawBuffersARB(2, drawbuffers);

//use program that writes to gl_FragData[0] and gl_FragData[1]
glUseProgramObjectARB(pg);

//render stuff

I have no idea which of these is faster (or if there is any significant difference).

Storing a tag in a floating point channel

I needed to store some high-precision data, so I decided to use 32-bit floating point buffers. However, I also needed a buffer to store unique identifiers. The problem with using floats for identifiers is that you can’t expect true equality between numbers due to precision errors (so you always have to use an error margin). This means the naive approach of incrementing a float counter by a small amount each time you need a new identifier probably won’t work (or will at least cost an abs(), a subtraction, and a less-than per comparison, as sketched below).
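
To make that concrete, the error-margin test would look something like this in GLSL (the function name and margin are made up for illustration):

bool same_id(float a, float b)
{
  return abs(a - b) < 0.001; //margin must be well below the counter's increment
}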

My dodgy workaround was to use a 32-bit integer counter (to generate unique identifiers) and then treat the space it occupies in memory as a 32-bit float when I send it on to the graphics card. An unsigned integer might seem better, but I just happened to use a signed integer (I hardly need that many identifiers anyway). Here’s some code:

int id; //32-bit integer counter for generating unique identifiers

void
reset_id(void)
{
  id = 0;
}

void
load_new_id(void)
{
  //reinterpret the counter's bits as a float and send them to the "id" uniform
  glUniform1fvARB(glGetUniformLocationARB(tag_pg, "id"), 1, (GLfloat*)&id);
  id++;
}

In the fragment shader, tags are compared using the == operator and everything appears to work fine…
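
For illustration, the shader-side comparison looks something like this (the sampler name and the use of the alpha channel are made up; the details depend on how the tags are written):

uniform float id;          //tag bits sent up from the application
uniform sampler2D tag_tex; //32-bit float buffer holding previously written tags

void main()
{
  float tag = texture2D(tag_tex, gl_TexCoord[0].st).a;

  //exact equality is safe here: the same 32 bits that went in come straight back
  //(provided the buffer is 32-bit float with no filtering or blending)
  if (tag == id)
    gl_FragColor = vec4(1.0);
  else
    gl_FragColor = vec4(0.0);
}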

However, there’s something about this that seems very wrong to me…