{"id":472,"date":"2006-05-31T20:42:19","date_gmt":"2006-05-31T12:42:19","guid":{"rendered":""},"modified":"2010-06-25T11:55:03","modified_gmt":"2010-06-25T03:55:03","slug":"gpu_programming_notes","status":"publish","type":"post","link":"https:\/\/nick.onetwenty.org\/index.php\/2006\/05\/31\/gpu_programming_notes\/","title":{"rendered":"GPU programming notes"},"content":{"rendered":"<p>I&#8217;ve been writing an OpenGL application using <a href=\"http:\/\/www.opengl.org\/documentation\/glsl\/\">OpenGL Shading Language<\/a> (GLSL) shaders, <a href=\"http:\/\/oss.sgi.com\/projects\/ogl-sample\/registry\/EXT\/framebuffer_object.txt\">framebuffer objects<\/a> (FBOs), and <a href=\"http:\/\/oss.sgi.com\/projects\/ogl-sample\/registry\/ARB\/draw_buffers.txt\">draw buffers<\/a> (DBs). I&#8217;ve come across a few gotchas along the way and made a few notes.<\/p>\n<p>Follow the jump to find out more about how to get started, what to look out for, and a couple of the hacks I&#8217;ve used to make my shaders do what I want them to do. 
(Specifically, I describe how to clear colour attachments to different values and how to store tags in a floating-point channel.)<\/p>\n<p><!--more--><\/p>\n<p><strong>The basics<\/strong><\/p>\n<p>First some sparse notes on the basics for people new to FBOs and DBs:<\/p>\n<ul>\n<li>The FBO extension enables application-created FBOs that can be used rather than window-system-created framebuffers.<\/li>\n<li>Texture objects or render buffer objects (new) can be attached to (and detached from) application-created FBOs and rendered into.<\/li>\n<li>FBOs are defined as a collection of 0-N colour attachments, an optional depth attachment, and an optional stencil attachment.<\/li>\n<li>The DBs extension is used in conjunction with fragment shaders (or programs) to draw to the extra colour attachments.<\/li>\n<li>To draw to colour attachment N, a fragment shader simply writes to gl_FragData[N] rather than to gl_FragColor.<\/li>\n<\/ul>\n<p><strong>Things to look out for<\/strong><\/p>\n<p>Now a few gotchas to look out for when you get going:<\/p>\n<ul>\n<li>Colour attachments are just that: colours. You can&#8217;t use textures which store luminance (GL_LUMINANCE) or depth values (GL_DEPTH_COMPONENT).<\/li>\n<li>Moreover, <em>all<\/em> colour attachments must be the same format.<\/li>\n<li>There appears to be no method to define separate clear colours for each colour attachment.<\/li>\n<\/ul>\n<p>These things seem fairly small, but they can really affect the &#8220;elegance&#8221; of your program. For example, you may want to write colours to one buffer and surface normals to another. The standard output for a colour image is 24 or 32 bits (GL_RGB or GL_RGBA), but this is too limited for normals (which probably require at least 16 bits each for i, j, and k). Since you can&#8217;t have a 24-bit colour buffer and a 48-bit normal buffer, you will probably need to compromise and store colours with 16 bits per channel. More memory = not good. 
Things can get even worse if you only want to store very basic additional information (such as 8-bit tags).<\/p>\n<p>I&#8217;ve devised a few workarounds to problems caused by these gotchas, and I&#8217;ll present two of them now.<\/p>\n<p><strong>Clearing colour attachments independently<\/strong><\/p>\n<p>I&#8217;ve used a shader pair to do this. The vertex shader just passes the vertices straight through (without applying the modelview or projection transforms). I&#8217;m not sure if this provides any speedup, but it seems nicer to me. The fragment shader writes programmer-defined clear values to each buffer. I&#8217;m also not sure if it&#8217;s worth clearing depth values in the fragment shader (glClear() is probably faster, but I&#8217;m processing fragments anyhow).<\/p>\n<p>Vertex shader:<\/p>\n<blockquote>\n<pre><code>void main()\r\n{\r\n  gl_Position = gl_Vertex;\r\n}<\/code><\/pre>\n<\/blockquote>\n<p>Fragment shader:<\/p>\n<blockquote>\n<pre><code>uniform vec4 clear_data_0;\r\nuniform vec4 clear_data_1;\r\nuniform float clear_depth;\r\n\r\nvoid main()\r\n{\r\n  gl_FragData[0] = clear_data_0;\r\n  gl_FragData[1] = clear_data_1;\r\n  gl_FragDepth = clear_depth;\r\n}<\/code><\/pre>\n<\/blockquote>\n<p>Invoking the program (which includes both shaders):<\/p>\n<blockquote>\n<pre><code>\r\nglUseProgramObjectARB(clear_pg);\r\n\r\nglUniform4fARB(glGetUniformLocationARB(clear_pg, \"clear_data_0\"), 0.0f, 0.0f, 0.0f, 0.0f);\r\nglUniform4fARB(glGetUniformLocationARB(clear_pg, \"clear_data_1\"), 1.0f, 1.0f, 1.0f, 1.0f);\r\nglUniform1fARB(glGetUniformLocationARB(clear_pg, \"clear_depth\"), 1.0f);\r\n\r\nglDepthFunc(GL_ALWAYS);\r\n\r\nglRectf(-1.0, -1.0, 1.0, 1.0); \/\/fullscreen quad<\/code><\/pre>\n<\/blockquote>\n<p>I also noticed that the glClearColor() and glClear() functions affect all attached colour buffers (I&#8217;m not sure if this is consistent or portable).  
However, I think that you can use these functions safely if you select the draw buffers accordingly:<\/p>\n<blockquote>\n<pre><code>\/\/bind framebuffer\r\nglBindFramebufferEXT(GL_FRAMEBUFFER_EXT, fb);\r\n\r\n\/\/clear first color attachment to black\r\nglDrawBuffer(GL_COLOR_ATTACHMENT0_EXT);\r\nglClearColor(0.0, 0.0, 0.0, 0.0);\r\nglClear(GL_COLOR_BUFFER_BIT);\r\n\r\n\/\/clear second color attachment to white\r\nglDrawBuffer(GL_COLOR_ATTACHMENT1_EXT);\r\nglClearColor(1.0, 1.0, 1.0, 1.0);\r\nglClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);\r\n\r\n\/\/enable both (cleared) color buffers for use\r\nGLenum drawbuffers[2] = {GL_COLOR_ATTACHMENT0_EXT, GL_COLOR_ATTACHMENT1_EXT};\r\nglDrawBuffersARB(2, drawbuffers);\r\n\r\n\/\/use program that writes to gl_FragData[0] and gl_FragData[1]\r\nglUseProgramObjectARB(pg);\r\n\r\n\/\/render stuff<\/code><\/pre>\n<\/blockquote>\n<p>I have no idea which of these is faster (or if there is any significant difference).<\/p>\n<p><strong>Storing a tag in a floating point channel<\/strong><\/p>\n<p>I needed to store some high-precision data so I decided to use 32-bit floating point buffers. However, I also needed a buffer to store unique identifiers. The problem with using floats for identifiers is that you can&#8217;t expect true equality between numbers due to precision errors (so you always have to use an error margin). So the naive approach of incrementing a float counter by a small amount each time you need a new identifier probably won&#8217;t work (or will at least require an abs(), a subtraction, and a less-than comparison).<\/p>\n<p>My dodgy workaround was to use a 32-bit integer counter (to generate unique identifiers) and then treat the space it occupies in memory as a 32-bit float when I send it on to the graphics card. An unsigned integer might seem better, but I just happened to use a signed integer (I hardly need that many identifiers anyway). 
Here&#8217;s some code:<\/p>\n<blockquote>\n<pre><code>int id;\r\n\r\nvoid\r\nreset_id(void)\r\n{\r\n id = 0;\r\n}\r\n\r\nvoid\r\nload_new_id()\r\n{\r\n glUniform1fvARB(glGetUniformLocationARB(tag_pg, \"id\"), 1, (GLfloat*)&amp;id);\r\n id++;\r\n}<\/code><\/pre>\n<\/blockquote>\n<p>In the fragment shader, tags are compared using the == operator and everything appears to work fine&#8230;<\/p>\n<p>However, there&#8217;s something about this that seems very wrong to me&#8230;<\/p>\n","protected":false},"excerpt":{"rendered":"<p>I&#8217;ve been writing an OpenGL application using OpenGL Shading Language (GLSL) shaders, framebuffer objects (FBOs), and draw buffers (DBs). I&#8217;ve come across a few gotchas along the way and made a few notes. Follow the jump to find out more about how to get started, what to look out for, and a couple of the &hellip; <a href=\"https:\/\/nick.onetwenty.org\/index.php\/2006\/05\/31\/gpu_programming_notes\/\" class=\"more-link\">Continue reading<span class=\"screen-reader-text\"> &#8220;GPU programming 
notes&#8221;<\/span><\/a><\/p>\n","protected":false},"author":67,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"spay_email":"","jetpack_publicize_message":"","jetpack_is_tweetstorm":false},"categories":[1],"tags":[],"jetpack_featured_media_url":"","jetpack_publicize_connections":[],"jetpack_sharing_enabled":true,"jetpack_shortlink":"https:\/\/wp.me\/paLsRH-7C","_links":{"self":[{"href":"https:\/\/nick.onetwenty.org\/index.php\/wp-json\/wp\/v2\/posts\/472"}],"collection":[{"href":"https:\/\/nick.onetwenty.org\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/nick.onetwenty.org\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/nick.onetwenty.org\/index.php\/wp-json\/wp\/v2\/users\/67"}],"replies":[{"embeddable":true,"href":"https:\/\/nick.onetwenty.org\/index.php\/wp-json\/wp\/v2\/comments?post=472"}],"version-history":[{"count":1,"href":"https:\/\/nick.onetwenty.org\/index.php\/wp-json\/wp\/v2\/posts\/472\/revisions"}],"predecessor-version":[{"id":3837,"href":"https:\/\/nick.onetwenty.org\/index.php\/wp-json\/wp\/v2\/posts\/472\/revisions\/3837"}],"wp:attachment":[{"href":"https:\/\/nick.onetwenty.org\/index.php\/wp-json\/wp\/v2\/media?parent=472"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/nick.onetwenty.org\/index.php\/wp-json\/wp\/v2\/categories?post=472"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/nick.onetwenty.org\/index.php\/wp-json\/wp\/v2\/tags?post=472"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}