OpenGL SuperBible 6th Edition

This book is designed both for people who are learning computer graphics through OpenGL and for people who may already know about graphics but want to learn about OpenGL.


Graham Sellers, Richard S. Wright, Jr., Nicholas Haemel


853 Pages

English

PDF Format

87.7 MB

Game Development


  • Page - 30

    Listing 8.33 Drawing a face normal in the geometry shader (p. 327)
    Listing 8.34 Geometry shader for rendering quads (p. 335)
    Listing 8.35 Fragment shader for rendering quads (p. 336)
    Listing 8.36 Rendering to multiple viewports in a geometry shader (p. 338)
    Listing 9.1 Setting up scissor rectangle arrays (p. 346)
    Listing 9.2 Example stencil buffer usage, …

  • Page - 31

    Listing 9.32 Fragment shader for the star field effect (p. 423)
    Listing 9.33 Fragment shader for generating shaped points (p. 425)
    Listing 9.34 Naïve rotated point sprite fragment shader (p. 427)
    Listing 9.35 Rotated point sprite vertex shader (p. 427)
    Listing 9.36 Rotated point sprite fragment shader (p. 427)
    Listing 9.37 Taking a screenshot with glReadPixels() (p. 430)
    …

  • Page - 32

    Listing 12.5 Blinn-Phong fragment shader (p. 514)
    Listing 12.6 Rim lighting shader function (p. 516)
    Listing 12.7 Vertex shader for normal mapping (p. 520)
    Listing 12.8 Fragment shader for normal mapping (p. 521)
    Listing 12.9 Spherical environment mapping vertex shader (p. 523)
    Listing 12.10 Spherical environment mapping fragment shader (p. 524)
    Listing 12.11 …

  • Page - 33

    Listing 14.1 Registering a window class (p. 628)
    Listing 14.2 Creating a simple window (p. 629)
    Listing 14.3 Declaration of PIXELFORMATDESCRIPTOR (p. 631)
    Listing 14.4 Choosing and setting a pixel format (p. 632)
    Listing 14.5 Windows main message loop (p. 633)
    Listing 14.6 Finding a pixel format with wglChoosePixelFormatARB() (p. 639)
    Listing 14.7 …

  • Page - 34

    Foreword

    OpenGL® SuperBible has long been an essential reference for 3D graphics developers, and this new edition is more relevant than ever, particularly given the increasing importance of multi-platform deployment. In our line of work, we spend a lot of time at the interface between high-level rendering algorithms and fast-moving GPU and API targets. Even though, between us, we …

  • Page - 35

    leadership. In fact, there has been so much progress in the past five years that OpenGL has reached a tipping point and is again viable for game development, particularly as more and more developers are adopting a multiplatform strategy that includes OS X and Linux. OpenGL even has advantages for developers primarily targeting Windows, allowing them to access the very latest GPU …

  • Page - 36

    Preface

    About This Book

    This book is designed both for people who are learning computer graphics through OpenGL and for people who may already know about graphics but want to learn about OpenGL. The intended audience is students of computer science, computer graphics, or game design; professional software engineers; or simply hobbyists and people who are interested in learning …

  • Page - 37

    OpenGL on their own machines and using extensions (bonus features that add capabilities to OpenGL not required by the main specification).

    The Architecture of the Book

    This book breaks down roughly into three major parts. In the first part, we explain what OpenGL is, how it connects to the graphics pipeline, and give minimal working examples that are sufficient to demonstrate …

  • Page - 38

    • Chapter 3, “Following the Pipeline,” takes a more careful look at OpenGL and its various components, introducing each in a little more detail and adding to the simple example presented in the previous chapter.

    • Chapter 4, “Math for 3D Graphics,” introduces the foundations of math that will be essential for effective use of OpenGL and the creation of interesting 3D …

  • Page - 39

    applications that touch on multiple aspects of OpenGL. We also get into the practicalities of building larger OpenGL applications and deploying them across multiple platforms. In this part, you will find

    • Chapter 12, “Rendering Techniques,” covers several applications of OpenGL for graphics rendering, from simulation of light to artistic methods and even some non-traditional …

  • Page - 40

    In this edition, the printed copy of the OpenGL reference pages, or “man” pages, is gone. The reference pages are available online at http://www.opengl.org/sdk/docs/man4/ and, as a live document, are kept up to date. A printed copy of those pages is somewhat redundant and leads to errors — several were found in the reference pages after the fifth edition went to print with …

  • Page - 41

    where possible, we’ve made minor modifications to the sample applications to allow them to run on earlier versions of OpenGL.

    • There were several months between when this book’s text was finalized for printing and when the sample applications were packaged and posted to the Web. In that time, we discovered opportunities for improvement, whether that was uncovering new bugs, …

  • Page - 42

    Acknowledgments

    First and foremost, I would like to thank my wife, Chris, and my two wonderful kids, Jeremy and Emily. For the never-ending evenings, weekends, and holidays that I spent holed up in my office or curled up with a laptop instead of hanging out with you guys.... I appreciate your patience. I’d like to extend a huge thank you to our tech reviewers, Piers …

  • Page - 43

    move years later. For more than fifteen years, countless editors, reviewers, and publishers have made me look good and smarter than I am. There are too many to name, but I have to single out Debra Williams-Cauley for braving more than half this book’s lifetime, and, yes, thank you Laura Lewin for taking over for Debra.... You are a brave soul! Thanks to Full Sail …

  • Page - 44

    on track and making sure we get all the details right. Special thanks to Xi Chen at NVIDIA for all your help on Android sample code. And of course, I couldn’t have completed yet another project without the support of my family and friends. To my wife, Anna: You have put up with all of my techno mumbo jumbo all these years while at the same time saving lives and …

  • Page - 46

    About the Authors

    Graham Sellers is a classic geek. His family got their first computer (a BBC Model B) right before his sixth birthday. After his mum and dad stayed up all night programming it to play “Happy Birthday,” he was hooked and determined to figure out how it worked. Next came BASIC programming and then assembly language. His first real exposure to graphics …

  • Page - 47

    system simulator and their full-dome theater planetarium products, and works on their mobile products and scientific imaging applications. Previously with Real 3D/Lockheed Martin, Richard was a regular OpenGL ARB attendee and contributed to the OpenGL 1.2 specification and conformance tests back when mammoths still walked the earth. Since then, Richard has worked in multi-dimensional …

  • Page - 48

    control unit for robotic arms and other remotely programmable devices. Fast-forward twenty-five years, and the devices being controlled are GPUs and SoCs smaller than a fingernail but with more than eight billion transistors. Nick’s interests also extend to business leadership and management, bolstered by an MBA from the University of Wisconsin–Madison. Nick currently …

  • Page - 50

    Part I: Foundations

  • Page - 52

    Chapter 1: Introduction

    WHAT YOU’LL LEARN IN THIS CHAPTER

    • What the graphics pipeline is and how OpenGL relates to it
    • The origins of OpenGL and how it came to be the way that it is today
    • Some of the fundamental concepts that we’ll be building on throughout the book

    This book is about OpenGL. OpenGL is an interface that your application can use to access …

  • Page - 53

    OpenGL and the Graphics Pipeline

    Generating a product at high efficiency and volume generally requires two things: scalability and parallelism. In factories, this is achieved by using production lines. While one worker installs the engine in a car, another can be installing the doors, and yet another can be installing the wheels. By overlapping the phases of production of the …

  • Page - 54

    resolution, processor architecture, installed operating system, and so on. On the other hand, the level of abstraction must be low enough that programmers can gain access to the underlying hardware and make best use of it. If OpenGL presented too high an abstraction level, then it would be easy to create programs that fit the model, but very hard to use advanced features …

  • Page - 55

    [Figure 1.1: Simplified graphics pipeline. The stages, in order: vertex fetch, vertex shader, tessellation control shader, tessellation, tessellation evaluation shader, geometry shader, rasterization, fragment shader, framebuffer operations.]

    manufacturer would generally supply it as part of a driver, firmware, or other system software.

    The Origins and Evolution of OpenGL

    OpenGL has its origins at Silicon Graphics, Inc. (SGI), and their …

  • Page - 56

    That year, SGI was also instrumental in establishing the OpenGL Architecture Review Board (ARB), the original members of which included companies such as Compaq, DEC, IBM, Intel, and Microsoft. Soon, other companies such as Hewlett-Packard, Sun Microsystems, Evans & Sutherland, and Intergraph joined the group. The OpenGL ARB is the standards body that designs, governs, and produces …

  • Page - 57

    Core Profile OpenGL

    Twenty years is a long time in the development of cutting-edge technology. In 1992, the top-of-the-line Intel CPU was the 80486, math co-processors were still optional, and the Pentium had not yet been invented (or at least released). Apple computers were still using Motorola 68K-derived processors, and the PowerPC processors to which they would later switch …

  • Page - 58

    invented, they were simply added to OpenGL, resulting in it having multiple ways of doing the same thing. For many years, the ARB held a strong position on backwards compatibility, as it still does today. However, this backwards compatibility comes at a significant cost. Best practices have changed — what may have worked well or was not really a significant bottleneck on …

  • Page - 59

    This book covers only the core profile of OpenGL, and this is the last time we will mention the compatibility profile.

    Primitives, Pipelines, and Pixels

    As discussed, the model followed by OpenGL is that of a production line, or pipeline. Data flow within this model is generally one way, with data formed from commands called by your programs entering the front of the …

  • Page - 60

    is essentially a vector representation, into a large number of independent pixels. These are handed off to the back end, which includes depth and stencil testing, fragment shading, blending, and updating the output image. As you progress through this book, you will see how to tell OpenGL to start working for you. We’ll go over how to create buffers and textures and hook them …

  • Page - 62

    Chapter 2: Our First OpenGL Program

    WHAT YOU’LL LEARN IN THIS CHAPTER

    • How to create and compile shader code
    • How to draw with OpenGL
    • How to use the book’s application framework to initialize your programs and clean up after yourself

    In this chapter, we introduce the simple application framework that is used for almost all of the samples in this book. This shows …

  • Page - 63

    Creating a Simple Application

    To introduce the application framework that’ll be used in the remainder of this book, we’ll start with an extremely simple example application. The application framework is brought into your application by including sb6.h in your source code. This is a C++ header file that defines a namespace called sb6 that includes the declaration of an application …
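    As a rough sketch, an application built on this framework looks something like the following (illustrative only; the render() override and the DECLARE_MAIN macro follow the pattern used throughout the book’s samples):

    #include "sb6.h"

    // Derive from sb6::application and override the hooks you need
    class my_application : public sb6::application
    {
    public:
        // Called once per frame by the framework
        void render(double currentTime)
        {
            // per-frame drawing goes here
        }
    };

    // Expands to an entry point that creates and runs the application
    DECLARE_MAIN(my_application);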

  • Page - 64

    [Figure 2.1: The output of our first OpenGL application]

    The example shown in Listing 2.1 simply clears the whole screen to red. This introduces our first OpenGL function, glClearBufferfv(). The prototype of glClearBufferfv() is

    void glClearBufferfv(GLenum buffer,
                         GLint drawBuffer,
                         const GLfloat * value);

    All OpenGL functions start with gl and follow a number of naming conventions such as …
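    For instance, a render() function that clears the whole screen to red, as Listing 2.1 does, can be sketched as:

    // Clear the first (index 0) color buffer to opaque red. GL_COLOR
    // selects the color buffers; the last argument points to four
    // floats: red, green, blue, and alpha.
    void render(double currentTime)
    {
        static const GLfloat red[] = { 1.0f, 0.0f, 0.0f, 1.0f };
        glClearBufferfv(GL_COLOR, 0, red);
    }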

  • Page - 65

    fourth component that is associated with a color and is often used to encode the opacity of a fragment. When used this way, setting alpha to zero will make the fragment completely transparent, and setting it to one will make it completely opaque. The alpha value can also be stored in the output image and used in some parts of OpenGL’s calculations, even though you …

  • Page - 66

    inputs and outputs along the pipeline until pixels come out the end. In order to draw anything at all, you’ll need to write at least a couple of shaders. OpenGL shaders are written in a language called the OpenGL Shading Language, or GLSL. This is a language that has its origins in C, but has been modified over time to make it better suited to running on graphics …

  • Page - 67

    #version 430 core

    void main(void)
    {
        gl_Position = vec4(0.0, 0.0, 0.5, 1.0);
    }

    Listing 2.3: Our first vertex shader

    Next, our fragment shader is given in Listing 2.4. Again, this is extremely simple. It too starts with a #version 430 core declaration. Next, it declares color as an output variable using the out keyword. In fragment shaders, the value of output variables will be …

  • Page - 68

    "}                                    \n"
    };

    // Source code for fragment shader
    static const GLchar * fragment_shader_source[] =
    {
        "#version 430 core                    \n"
        "                                     \n"
        "out vec4 color;                      \n"
        "                                     \n"
        "void main(void)                      \n"
        "{                                    \n"
        "    color = vec4(0.0, 0.8, 1.0, 1.0);\n"
        "}                                    \n"
    };

    // Create and compile vertex shader
    vertex_shader = glCreateShader(GL_VERTEX_SHADER);
    …

  • Page - 69

    • glLinkProgram() links all of the shader objects attached to a program object together.

    • glDeleteShader() deletes a shader object. Once a shader has been linked into a program object, the program contains the binary code and the shader is no longer needed.

    The shader source code from Listing 2.3 and Listing 2.4 is included in our program as constant strings that are passed …
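    Taken together, the usual compile-and-link sequence looks roughly like this (a sketch of the pattern described above; error checking omitted):

    GLuint program = glCreateProgram();

    // Create and compile the vertex shader
    GLuint vs = glCreateShader(GL_VERTEX_SHADER);
    glShaderSource(vs, 1, vertex_shader_source, NULL);
    glCompileShader(vs);

    // Create and compile the fragment shader
    GLuint fs = glCreateShader(GL_FRAGMENT_SHADER);
    glShaderSource(fs, 1, fragment_shader_source, NULL);
    glCompileShader(fs);

    // Attach both shaders to the program, link, and delete the
    // now-unneeded shader objects
    glAttachShader(program, vs);
    glAttachShader(program, fs);
    glLinkProgram(program);
    glDeleteShader(vs);
    glDeleteShader(fs);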

  • Page - 70

    The vertex array object maintains all of the state related to the input to the OpenGL pipeline. We will add calls to glGenVertexArrays() and glBindVertexArray() to our startup() function. In Listing 2.6, we have overridden the startup() member function of the sb6::application class and put our own initialization code in it. Again, as with render(), the startup() function is defined as …
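    The two calls amount to just this (sketch):

    // Create a vertex array object and bind it; the core profile
    // requires a VAO to be bound before drawing, even when no
    // vertex attributes are used
    GLuint vao;
    glGenVertexArrays(1, &vao);
    glBindVertexArray(vao);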

  • Page - 71

    // Our rendering function
    void render(double currentTime)
    {
        const GLfloat color[] = { (float)sin(currentTime) * 0.5f + 0.5f,
                                  (float)cos(currentTime) * 0.5f + 0.5f,
                                  0.0f, 1.0f };
        glClearBufferfv(GL_COLOR, 0, color);

        // Use the program object we created earlier for rendering
        glUseProgram(rendering_program);

        // Draw one point
        glDrawArrays(GL_POINTS, 0, 1);
    }

    Listing 2.7: Rendering a single point

    The glDrawArrays() …

  • Page - 72

    [Figure 2.2: Rendering our first point]
    [Figure 2.3: Making our first point bigger]

    implementation defined, but OpenGL guarantees that it’s at least 64 pixels. By adding the following line

    glPointSize(40.0f);

    to our rendering function in Listing 2.7, we set the diameter of points to 40 pixels, and are presented with the image in Figure 2.3.

  • Page - 73

    Drawing Our First Triangle

    Drawing a single point is not really that impressive (even if it is really big!) — we already mentioned that OpenGL supports many different primitive types, and that the most important are points, lines, and triangles. In our toy example, we draw a single point by passing the token GL_POINTS to the glDrawArrays() function. What we really want to do …

  • Page - 74

    array vertices form a triangle, and if we modify our rendering function to pass GL_TRIANGLES to glDrawArrays() instead of GL_POINTS, as shown in Listing 2.9, then we obtain the image shown in Figure 2.4.

    // Our rendering function
    void render(double currentTime)
    {
        const GLfloat color[] = { 0.0f, 0.2f, 0.0f, 1.0f };
        glClearBufferfv(GL_COLOR, 0, color);

        // Use the program object we …

  • Page - 75

    In this chapter, you have been briefly introduced to the sb6 application framework, compiled a shader, cleared the window, and drawn points and triangles. You have seen how to change the size of points using the glPointSize() function and have seen your first drawing command — glDrawArrays().

  • Page - 76

    Chapter 3: Following the Pipeline

    WHAT YOU’LL LEARN IN THIS CHAPTER

    • What each of the stages in the OpenGL pipeline does
    • How to connect your shaders to the fixed-function pipeline stages
    • How to create a program that uses every stage of the graphics pipeline simultaneously

    In this chapter, we will walk all the way along the OpenGL pipeline from start to finish, …

  • Page - 77

    Passing Data to the Vertex Shader

    The vertex shader is the first programmable stage in the OpenGL pipeline and has the distinction of being the only mandatory stage in the pipeline. However, before the vertex shader runs, a fixed-function stage known as vertex fetching, or sometimes vertex pulling, is run. This automatically provides inputs to the vertex shader.

    Vertex Attributes …

  • Page - 78

    functions, glVertexAttrib*(). The prototype for glVertexAttrib4fv(), which we use in this example, is

    void glVertexAttrib4fv(GLuint index,
                           const GLfloat * v);

    Here, the parameter index is used to reference the attribute and v is a pointer to the new data to put into the attribute. You may have noticed the layout (location = 0) code in the declaration of the offset attribute. This …
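    Putting the two halves together, the pattern looks something like this (a sketch; the per-frame values are illustrative):

    // In the vertex shader:
    //     layout (location = 0) in vec4 offset;

    // In the application, update attribute 0 every frame:
    GLfloat attrib[] = { (float)sin(currentTime) * 0.5f,
                         (float)cos(currentTime) * 0.6f,
                         0.0f, 0.0f };
    glVertexAttrib4fv(0, attrib);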

  • Page - 79

    shader using the out keyword. However, it’s also possible to send your own data from shader stage to shader stage using the same in and out keywords. Just as you used the out keyword in the fragment shader to create the output variable that it writes its color values to, you can create an output variable in the vertex shader by using the out keyword as well. Anything …

  • Page - 80

    #version 430 core

    // Input from the vertex shader
    in vec4 vs_color;

    // Output to the framebuffer
    out vec4 color;

    void main(void)
    {
        // Simply assign the color we were given by the vertex shader
        // to our output
        color = vs_color;
    }

    Listing 3.4: Fragment shader with an input

    Interface Blocks

    Declaring interface variables one at a time is possibly the simplest way to communicate …

  • Page - 81

    Note that the interface block in Listing 3.5 has both a block name (VS_OUT, uppercase) and an instance name (vs_out, lowercase). Interface blocks are matched between stages using the block name (VS_OUT in this case), but are referenced in shaders using the instance name. Thus, modifying our fragment shader to use an interface block gives the code shown in Listing 3.6. …
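    The corresponding declarations look roughly like this (a sketch of the Listing 3.5 and 3.6 pattern just described):

    // Vertex shader: output interface block
    out VS_OUT
    {
        vec4 color;    // sent to the next stage
    } vs_out;

    // Fragment shader: matching input block; the block name VS_OUT
    // must match, but the instance name may differ
    in VS_OUT
    {
        vec4 color;
    } fs_in;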

  • Page - 82

    Tessellation Control Shaders

    The first of the three tessellation phases is the tessellation control shader (sometimes known as simply the control shader, or abbreviated to TCS). This shader takes its input from the vertex shader and is primarily responsible for two things: the first being the determination of the level of tessellation that will be sent to the tessellation engine, …

  • Page - 83

    Listing 3.7 shows a simple tessellation control shader. It sets the number of output control points to three (the same as the default number of input control points) using the layout (vertices = 3) out; layout qualifier, copies its input to its output (using the built-in variables gl_in and gl_out), and sets the inner and outer tessellation levels to 5. The built-in input …
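    A control shader along these lines can be sketched as follows (matching the description of Listing 3.7):

    #version 430 core

    layout (vertices = 3) out;

    void main(void)
    {
        // Set the tessellation levels; this only needs to happen
        // once per patch
        if (gl_InvocationID == 0)
        {
            gl_TessLevelInner[0] = 5.0;
            gl_TessLevelOuter[0] = 5.0;
            gl_TessLevelOuter[1] = 5.0;
            gl_TessLevelOuter[2] = 5.0;
        }

        // Copy the input control point to the output
        gl_out[gl_InvocationID].gl_Position =
            gl_in[gl_InvocationID].gl_Position;
    }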

  • Page - 84

    vertex produced by the tessellator. When the tessellation levels are high, this means that the tessellation evaluation shader could run an extremely large number of times, and so you should be careful with complex evaluation shaders and high tessellation levels. Listing 3.8 shows a tessellation evaluation shader that accepts input vertices produced by the tessellator as a result of …
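    An evaluation shader of the kind described can be sketched as follows (in the spirit of Listing 3.8; gl_TessCoord holds the barycentric coordinate of the generated vertex):

    #version 430 core

    layout (triangles, equal_spacing, cw) in;

    void main(void)
    {
        // Weight the patch’s three corners by the barycentric
        // coordinate of the vertex produced by the tessellator
        gl_Position = (gl_TessCoord.x * gl_in[0].gl_Position +
                       gl_TessCoord.y * gl_in[1].gl_Position +
                       gl_TessCoord.z * gl_in[2].gl_Position);
    }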

  • Page - 85

    modes will be explained shortly. mode says how we want our polygons to be rendered. As we want to render in wireframe mode (i.e., lines), we set this to GL_LINE. The result of rendering our one-triangle example with tessellation enabled and the two shaders of Listing 3.7 and Listing 3.8 is shown in Figure 3.1.

    [Figure 3.1: Our first tessellated triangle]

    Geometry Shaders

    The …
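    The call in question is (sketch):

    // Render both front- and back-facing polygons as outlines
    glPolygonMode(GL_FRONT_AND_BACK, GL_LINE);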

  • Page - 86

    Another unique feature of geometry shaders is that they can change the primitive mode mid-pipeline. For example, they can take triangles as input and produce a bunch of points or lines as output, or even create triangles from independent points. An example geometry shader is shown in Listing 3.9.

    #version 430 core

    layout (triangles) in;
    layout (points, max_vertices = 3) out;

    void main(void)
    {
        int i;

        for (i = 0; i < gl_in.length(); i++)
        {
            gl_Position = gl_in[i].gl_Position;
            EmitVertex();
        }
    }

  • Page - 87

    [Figure 3.2: Tessellated triangle after adding a geometry shader]

    Primitive Assembly, Clipping, and Rasterization

    After the front end of the pipeline has run (which includes vertex shading, tessellation, and geometry shading) comes a fixed-function part of the pipeline that performs a series of tasks that take the vertex representation of our scene and convert it into a series of …

  • Page - 88

    type and that the positions that we have produced by writing to it are all four-component vectors. This is what is known as a homogeneous coordinate. The homogeneous coordinate system is used in projective geometry, as much of the math ends up simpler in homogeneous coordinate space than it does in a regular Cartesian space. Homogeneous coordinates have one more component than …

  • Page - 89

    device coordinates. However, the window that you’re drawing to has coordinates that start from (0, 0) at the bottom left and range to (w − 1, h − 1), where w and h are the width and height of the window in pixels, respectively. In order to place your geometry into the window, OpenGL applies the viewport transform, which applies a scale and offset to the vertices’ …
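    Concretely, for a viewport with origin (x, y) and size (w, h), the standard OpenGL viewport mapping from normalized device coordinates (xd, yd) to window coordinates (xw, yw) is:

    xw = (xd + 1) * (w / 2) + x
    yw = (yd + 1) * (h / 2) + y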

  • Page - 90

    reversed by calling glFrontFace() with dir set to either GL_CW or GL_CCW (where CW and CCW stand for clockwise and counterclockwise, respectively). This is known as the winding order of the triangle, and the clockwise or counterclockwise terms refer to the order in which the vertices appear in window space. By default, this state is set to GL_CCW, indicating that triangles …
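    For example, to treat counterclockwise triangles as front facing and discard back faces (sketch):

    glFrontFace(GL_CCW);       // counterclockwise winding is front facing
    glEnable(GL_CULL_FACE);    // turn face culling on
    glCullFace(GL_BACK);       // cull back-facing triangles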

  • Page - 91

    for doing this, but most OpenGL systems will settle on a half-space-based method for triangles, as it lends itself well to parallel implementation. Essentially, OpenGL will determine a bounding box for the triangle in window coordinates and test every fragment inside it to determine whether it is inside or outside the triangle. To do this, it treats each of the triangle’s three …

  • Page - 92

    #version 430 core

    out vec4 color;

    void main(void)
    {
        color = vec4(sin(gl_FragCoord.x * 0.25) * 0.5 + 0.5,
                     cos(gl_FragCoord.y * 0.25) * 0.5 + 0.5,
                     sin(gl_FragCoord.x * 0.15) * cos(gl_FragCoord.y * 0.15),
                     1.0);
    }

    Listing 3.10: Deriving a fragment’s color from its position

    As you can see, the color of each pixel in Figure 3.4 is now a function of its position, and a simple …

  • Page - 93

    The inputs to the fragment shader are somewhat unlike inputs to other shader stages in that OpenGL interpolates their values across the primitive that’s being rendered. To demonstrate, we take the vertex shader of Listing 3.3 and modify it to assign a different, fixed color for each vertex, as shown in Listing 3.11.

    #version 430 core

    // "vs_color" is an output that …

  • Page - 94

    [Figure 3.5: Result of Listing 3.12]

    Framebuffer Operations

    The framebuffer represents the last stage of the OpenGL graphics pipeline. It can represent the visible content of the screen and a number of additional regions of memory that are used to store per-pixel values other than color. On most platforms, this means the window you see on your desktop (or possibly the whole screen …

  • Page - 95

    whether it even belongs in the window. Each of these things may be turned on or off by your application. The first thing that could happen is the scissor test, which tests your fragment against a rectangle that you can define. If it’s inside the rectangle, then it’ll be processed further; if it’s outside, it’ll be thrown away. Next comes the stencil test. …
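    Enabling and defining the scissor rectangle is as simple as this (sketch; the rectangle values are illustrative):

    glEnable(GL_SCISSOR_TEST);
    // x, y, width, height in window coordinates; fragments outside
    // this rectangle are discarded
    glScissor(100, 100, 256, 256);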

  • Page - 96

    framebuffer and calculate new values that are written back to the framebuffer. If the framebuffer contains unnormalized integer values, then logical operations such as AND, OR, and XOR can be applied to the output of your shader and the value currently in the framebuffer to produce a new value that will be written back into the framebuffer.

    Compute Shaders

    The first …

  • Page - 97

    The shader in Listing 3.13 tells OpenGL that the size of the local workgroup is going to be 32 by 32 work items, but then proceeds to do nothing. In order to make a compute shader that actually does something useful, you’re going to need to know a bit more about OpenGL, and so we’ll revisit this later in the book.

    Summary

    In this chapter, you have taken a …
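    The do-nothing compute shader being described looks like this (a sketch matching the description of Listing 3.13):

    #version 430 core

    // Declare a 32 x 32 local workgroup and then do nothing
    layout (local_size_x = 32, local_size_y = 32) in;

    void main(void)
    {
    }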

  • Page - 98

    Chapter 4: Math for 3D Graphics

    WHAT YOU’LL LEARN IN THIS CHAPTER

    • What a vector is, and why you should care
    • What a matrix is, and why you should care more
    • How we use matrices and vectors to move geometry around
    • The OpenGL conventions and coordinate spaces

    So far, you have learned to draw points, lines, and triangles and have written simple shaders that …

  • Page - 99

    Is This the Dreaded Math Chapter?

    In most books on 3D graphics programming, yes, this would be the dreaded math chapter. However, you can relax; we take a more moderate approach to these principles than some texts. One of the fundamental mathematical operations that will be performed by your shaders is the coordinate transform, which boils down to multiplying matrices with vectors …

  • Page - 100

    manipulation yourself, it’s still a good idea to know what they are and how to apply them. See — you can eat your cake and have it too!

    A Crash Course in 3D Graphics Math

    There are a good many books on the math behind 3D graphics, and a few of the better ones that we have found are listed in Appendix A, “Further Reading.” We do not pretend here that we …

  • Page - 101

    where we are going, for example, which way is the camera pointing, or in which direction do we want to move to get away from that crocodile! The vector is so fundamental to the operation of OpenGL that vectors of various sizes are first-class types in GLSL and are given names such as vec3 and vec4 (representing 3- and 4-element vectors, respectively).

    [Figure: a point (X, Y, Z) plotted on the X, Y, and Z axes] …

  • Page - 102

    your shaders. However, this is not so in languages like C++. To allow you to use them in your C++ programs, the vmath library that is provided with this book’s source code contains classes that can represent vectors and matrices that are named similarly to their GLSL counterparts: For instance, vmath::vec3 can represent a three-component floating-point vector (x, y, z), and …

  • Page - 103

    We need to be careful here not to gloss over that fourth W component too much. Most of the time when you specify geometry with vertex positions, a three-component vertex is all you want to store and send to OpenGL. For many directional vectors, such as a surface normal (a vector pointing perpendicular to a surface that is used for lighting calculations), again, a …

  • Page - 104

    vectors, you’d need to take the inverse cosine (or arc-cosine) of this value. The dot product is used extensively during lighting calculations and is taken between a surface normal vector and a vector pointing toward a light source in diffuse lighting calculations. We will delve deeper into this type of shader code in the section “Lighting Models” in Chapter 12. Figure 4.2 …
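    In GLSL, that computation is a one-liner (sketch; v1 and v2 are arbitrary vectors):

    // Normalize first so the dot product is exactly cos(theta)
    float cos_theta = dot(normalize(v1), normalize(v2));
    float theta = acos(cos_theta);    // angle in radians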

  • Page - 105

    Cross Product

    Another useful mathematical operation between two vectors is the cross product, which is also sometimes known as the vector product. The cross product between two vectors is a third vector that is perpendicular to the plane in which the first two vectors lie. The cross product of two vectors v1 and v2 is defined as

    v1 × v2 = |v1| |v2| sin(θ) n

    where …

  • Page - 106

    vec3 a(...);
    vec3 b(...);
    vec3 c = a.cross(b);
    vec3 d = cross(a, b);

    Unlike the dot product, the order of the vectors is important. In Figure 4.3, v3 is the result of v2 cross v1. If you were to reverse the order of v1 and v2, the resulting vector v3 would point in the opposite direction. Applications of the cross product are numerous, from finding surface normals of …

  • Page - 107

    [Figure 4.4: Reflection and refraction. An incoming vector Rin strikes a surface with normal N at angle Θ, producing a reflected vector Rreflect and refracted vectors Rrefract for several indices of refraction η.]

    Although Figure 4.4 shows the system in only two dimensions, we are interested in computing this in three dimensions (this is a 3D graphics book, after all). The math for calculating Rreflect is

    Rreflect = Rin − 2(N · Rin)N

    and the math for calculating Rrefract for a given …

  • Page - 108

    z coordinates were as well. This kind of dependency between the variables and solution is just the sort of problem that matrices excel at. For fans of the Matrix movies who have a mathematical inclination, the term matrix is indeed an appropriate title. Mathematically, a matrix is nothing more than a set of numbers arranged in uniform rows and columns — in programming terms, a …

  • Page - 109

    library highly portable and easy to understand. You’ll also find it has a very GLSL-like syntax. In your 3D programming tasks with OpenGL, you will use three sizes of matrix extensively: 2 × 2, 3 × 3, and 4 × 4. The vmath library has matrix data types that match those defined by GLSL, such as

    vmath::mat2 m1;
    vmath::mat3 m2;
    vmath::mat4 m3;

    As in GLSL, the matrix …

  • Page - 110

    Representing the above matrix in column-major order in memory produces an array as follows:

    static const float A[] =
    {
        A00, A01, A02, A03,
        A10, A11, A12, A13,
        A20, A21, A22, A23,
        A30, A31, A32, A33
    };

    Whereas representing it in row-major order would require a layout such as

    static const float A[] =
    {
        A00, A10, A20, A30,
        A01, A11, A21, A31,
        A02, A12, A22, A32,
        A03, A13, A23, A33
    };
    …

  • Page - 111

    [Figure 4.5: A 4 × 4 matrix representing rotation and translation. The upper-left 3 × 3 block of α values encodes the rotation, the rightmost column (β0, β1, β2) holds the translation, and the bottom row is (0.0, 0.0, 0.0, 1.0).]

    transformed to the new coordinate system. This means that any position in space and any desired orientation can be uniquely defined by a 4 × 4 matrix, and if you multiply all of an object’s vertices by this …

  • Page - 112

    respect to the last transformation performed. On the top of Figure 4.6, the square is rotated with respect to the origin first. On the bottom of Figure 4.6, after the square is translated, the rotation is performed around the newly translated origin.

    [Figure 4.6: Modeling transformations: rotation then translation, …]

  • Page - 113

    commonly used in OpenGL programming. Any number of geometric transformations can occur between the time you specify your vertices and the time they appear on the screen, but the most common are modeling, viewing, and projection. In this section, we examine each of the coordinate spaces commonly used in 3D computer graphics (and summarized in Table 4.1) and the transforms used to …

  • Page - 114

    model, as rotating the object about that point would apply significant translation as well as rotation.

    World Coordinates

    The next common coordinate space is world space. This is where coordinates are stored relative to a fixed, global origin. To continue the spaceship analogy, this could be the center of a play-field or other fixed body such as a nearby planet. Once in world …

  • Page - 115

    When you draw in 3D with OpenGL, you use the Cartesian coordinate system. In the absence of any transformations, the system in use is identical to the eye coordinate system just described.

    Clip and Normalized Device Space

    Clip space is the coordinate space in which OpenGL performs clipping. When your vertex shader writes to gl_Position, this coordinate is considered to be in …

  • Page - 116

    [Figure 4.8: The modeling transformations (a), (b), (c)]

    identity matrix. As shown below, the identity matrix contains all zeros except a series of ones that traverse the matrix diagonally. The 4 × 4 identity matrix looks like this:

    [ 1.0 0.0 0.0 0.0 ]
    [ 0.0 1.0 0.0 0.0 ]
    [ 0.0 0.0 1.0 0.0 ]
    [ 0.0 0.0 0.0 1.0 ]

    Multiplying a vertex by the identity matrix is equivalent to multiplying it by …

  • Page - 117

    Obviously, identity matrices for 2 × 2 matrices, 3 × 3 matrices, and matrices of other dimensions exist and simply have ones in their diagonal, as you can see above. All identity matrices are square. There are no non-square identity matrices. Any identity matrix is its own transpose. You can make an identity matrix for OpenGL in C++ code like this:

    // Using a raw array:
    …
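    A sketch of the two idioms being introduced (the static identity() constructor is the one the vmath library provides, but treat the exact name as an assumption):

    // Using a raw array (column-major, though the identity is symmetric):
    static const GLfloat identity_matrix[] =
    {
        1.0f, 0.0f, 0.0f, 0.0f,
        0.0f, 1.0f, 0.0f, 0.0f,
        0.0f, 0.0f, 1.0f, 0.0f,
        0.0f, 0.0f, 0.0f, 1.0f
    };

    // Using the vmath library:
    vmath::mat4 identity = vmath::mat4::identity();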

  • Page - 118

    [Figure 4.9: A cube translated ten units in the positive y direction]

    of the reasons why we need to use four-dimensional homogeneous coordinates to represent positions in 3D graphics. Consider the position vector v, whose w component is 1.0. Multiplying by a translation matrix of the form above yields

    [ 1.0 0.0 0.0  tx ]
    [ 0.0 1.0 0.0  ty ]
    [ 0.0 0.0 1.0  tz ]
    [ 0.0 0.0 0.0 1.0 ] …

  • Page - 119

    construct a 4 × 4 translation matrix for you from either three separate components or from a 3D vector:

    template <typename T>
    static inline Tmat4<T> translate(T x, T y, T z) { ... }

    template <typename T>
    static inline Tmat4<T> translate(const vecN<T,3>& v) { ... }

    Rotation Matrices

    To rotate an object about one of the three coordinate axes, or indeed …

  • Page - 120

    You can also perform a rotation around an arbitrary axis by specifying x, y, and z values for that vector. To see the axis of rotation, you can just draw a line from the origin to the point represented by (x, y, z). The vmath library also includes code to produce this matrix from an angle-axis representation:

    template <typename T>
    static inline Tmat4<T> rotate(T …

  • Page - 121

    Notice in this example the use of degrees. This function internally converts degrees to radians because, unlike computers, many programmers prefer to think in terms of degrees.

    Euler Angles

    Euler angles are a set of three angles that represent orientation in space. Each angle represents a rotation around one of three orthogonal vectors that define our frame (for example, the x, …

  • Page - 122

    the vertices along the three axes by the factors specified. A scaling matrix has the form

    [  sx 0.0 0.0 0.0 ]
    [ 0.0  sy 0.0 0.0 ]
    [ 0.0 0.0  sz 0.0 ]
    [ 0.0 0.0 0.0 1.0 ]

    Here, sx, sy, and sz represent the scaling factors in the x, y, and z dimensions, respectively. Creating a scaling matrix with the vmath library is similar to the method for creating a translation or …

  • Page - 123

    [Figure 4.11: A non-uniform scaling of a cube]

    transformations in reverse order. For example, consider the following code sequence:

    vmath::mat4 translation_matrix = vmath::translate(4.0f, 10.0f, -20.0f);
    vmath::mat4 rotation_matrix = vmath::rotate(45.0f,
                                                vmath::vec3(0.0f, 1.0f, 0.0f));
    vmath::vec4 input_vertex = vmath::vec4(...);
    vmath::vec4 transformed_vertex = translation_matrix *
                                     rotation_matrix *
                                     …

  • Page - 124

    Here, composite_matrix is formed by multiplying the translation matrix by the rotation matrix, forming a composite that represents the rotation followed by the translation. This matrix can then be used to transform any number of vertices or other vectors. If you have a lot of vertices to transform, this can greatly speed up your calculation. Each vertex now takes only one …
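    In code, the composition described above amounts to (a sketch continuing the previous fragment):

    // Build the composite once: rotation first, then translation
    vmath::mat4 composite_matrix = translation_matrix * rotation_matrix;

    // Reuse it for every vertex - one matrix-vector multiply each
    vmath::vec4 transformed_vertex = composite_matrix * input_vertex;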

  • Page - 125

    As with complex numbers, multiplication of quaternions is non-commutative. Addition and subtraction for quaternions are defined as simple vector addition and subtraction, with the terms being added or subtracted on a component-by-component basis. Other functions such as unary negation and magnitude also behave as expected for a four-component vector. Although a quaternion is a four-component …

  • Page - 126

    Because this transform takes vertices from model space (which is also sometimes known as object space) directly into view space and effectively bypasses world space, it is often referred to as the model-view transform, and the matrix that encodes this transformation is known as the model-view matrix. The model transform essentially places objects into world space. Each object is likely …

  • Page - 127

    then point it in the right direction. In order to orient the camera correctly, you also need to know which way is up. Otherwise, the camera could spin around its forward axis, and even though it would still technically be pointing in the right direction, this is almost certainly not what you want. So, given an origin, a point of interest, and a direction that we …

  • Page - 128

    Next, take the cross product of f and u to construct a side vector s:

    s = f × u

    Now, construct a new up vector u′ in our camera’s reference frame:

    u′ = s × f

    Finally, we construct a rotation matrix representing a reorientation into our newly constructed orthonormal basis:

    R = [ s.x u′.x f.x 0.0 ]
        [ s.y u′.y f.y 0.0 ]
        [ s.z u′.z f.z 0.0 ]
        [ 0.0 0.0  0.0 1.0 ]

    Right, we’re …
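    The vmath library wraps this construction up; a typical call can be sketched as follows (the lookat helper with eye, target, and up parameters is the one the book’s samples use, but treat the details as illustrative):

    // View matrix for a camera at (0, 0, 5), looking at the origin,
    // with +y as up
    vmath::mat4 view_matrix =
        vmath::lookat(vmath::vec3(0.0f, 0.0f, 5.0f),   // eye position
                      vmath::vec3(0.0f, 0.0f, 0.0f),   // point of interest
                      vmath::vec3(0.0f, 1.0f, 0.0f));  // up direction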

  • Page - 129

    volume and establishes clipping planes. The clipping planes are plane equations in 3D space that OpenGL uses to determine whether geometry can be seen by the viewer. More specifically, the projection transformation specifies how a finished scene (after all the modeling is done) is projected to the final image on the screen. You will learn more about two types of projections …

  • Page - 130

    [Figure 4.12: A side-by-side example of an orthographic versus perspective projection]

    Perspective Matrices

    Once your vertices are in view space, we need to get them into clip space, which we do by applying our projection matrix, which may represent a perspective or orthographic projection (or some other projection altogether). A commonly used perspective matrix is called a frustum …

  • Page - 131

    simpler to specify, and only produces symmetric frusta. However, this is almost always what you’ll want. The vmath function to do this is vmath::perspective:

    static inline mat4 perspective(float fovy /* in degrees */,
                                   float aspect,
                                   float n,
                                   float f) { ... }

    Orthographic Matrices

    If you wish to use an orthographic projection for your scene, then you can construct a (somewhat …

  • Page - 132

    It is easy to see that when t is zero, P is equal to A, and when t is one, P is equal to A + B − A, which is simply B. Such a line is shown in Figure 4.13.

    [Figure 4.13: Finding a point on a line]

    If t lies between 0.0 and 1.0, then P is going to end up somewhere between A and B. Values of t outside this range will push P off the ends …

  • Page - 133

    [Figure 4.14: A simple Bézier curve]

    The curve shown in Figure 4.14 has three control points, A, B, and C. A and C are the end points of the curve, and B defines the shape of the curve. If we join points A and B with one line and points B and C with another line, then we can interpolate along the two lines using a simple linear …

  • Page - 134

    You should recognize this as a quadratic equation in t. The curve that it describes is known as a quadratic Bézier curve. We can actually implement this very easily in GLSL using the mix function, as all we’re doing is linearly interpolating (mixing) the results of two previous interpolations.

    vec4 quadratic_bezier(vec4 A, vec4 B, vec4 C, float t)
    {
        vec4 D = mix(A, B, t);    // D = A + t(B - A)
        vec4 E = mix(B, C, t);    // E = B + t(C - B)
        vec4 P = mix(D, E, t);    // P = D + t(E - D)
        return P;
    }

  • Page - 135

    points H and I , between which we can interpolate to find our final point, P . Therefore, we have the equations shown below. E = A + t (B − A ) F = B + t (C − B ) G = C + t (D − C ) H = E + t (F − E ) I = F + t (G − F ) P = H + t (I − H ) If you think the equations above look familiar, you’d be right — our points E , F , and read more..

  • Page - 136

    Now that we see this pattern, we can take it further and produce even higher order curves. For example, a quintic Bézier curve (one with five control points) can be implemented as

    vec4 quintic_bezier(vec4 A, vec4 B, vec4 C, vec4 D, vec4 E, float t)
    {
        vec4 F = mix(A, B, t);    // F = A + t(B - A)
        vec4 G = mix(B, C, t);    // G = B + t(C - B)
        vec4 H = mix(C, D, t);    // H = C + t(D - C)
        vec4 I = mix(D, E, t);    // I = D + t(E - D)
        …

  • Page - 137

    [Figure 4.16: A cubic Bézier spline]

    Thus, the integer part of t determines the curve segment along which we are interpolating, and the fractional part of t is used to interpolate along that segment. Of course, we can scale t as we wish. For example, if we take a value between 0.0 and 1.0 and multiply it by the number of segments in the curve, we …

  • Page - 138

    If we use a spline to determine the position or orientation of an object, we will find that we must be very careful about our choice of control point locations in order to keep motion smooth and fluid. The rate of change in the value of our interpolated point P (i.e., its velocity) is the differential of the equation of the curve with respect to t. If this …

  • Page - 139

    simply a cspline. The cspline is an extremely useful tool for producing smooth and natural animations.

    Summary

    In this chapter, you learned some mathematical concepts crucial to using OpenGL for the creation of 3D scenes. Even if you can’t juggle matrices in your head, you now know what matrices are and how they are used to perform the various transformations. You also learned …

  • Page - 140

    Chapter 5: Data

    WHAT YOU’LL LEARN IN THIS CHAPTER

    • How to create buffers and textures that you can use to store data that your program can access
    • How to get OpenGL to supply the values of your vertex attributes automatically
    • How to access textures from your shaders for both reading and writing

    In the examples you’ve seen so far, we have either used hard-coded …

  • Page - 141

    Buffers

    In OpenGL, buffers are linear allocations of memory that can be used for a number of purposes. They are represented by names, which are essentially opaque handles that OpenGL uses to identify them. Before you can start using buffers, you have to ask OpenGL to reserve some names for you and then use them to allocate memory and put data into that memory. The memory …

  • Page - 142

    Table 5.1: Buffer Object Usage Models

    • GL_STREAM_DRAW: Buffer contents will be set once by the application and used infrequently for drawing.
    • GL_STREAM_READ: Buffer contents will be set once as output from an OpenGL command and read back infrequently by the application.
    • GL_STREAM_COPY: Buffer contents will be set once as output from an OpenGL command and used infrequently for drawing. …

  • Page - 143

    // The type used for names in OpenGL is GLuint
    GLuint buffer;

    // Generate a name for the buffer
    glGenBuffers(1, &buffer);

    // Now bind it to the context using the GL_ARRAY_BUFFER binding point
    glBindBuffer(GL_ARRAY_BUFFER, buffer);

    // Specify the amount of storage we want to use for the buffer
    glBufferData(GL_ARRAY_BUFFER, 1024 * 1024, NULL, GL_STATIC_DRAW);

    Listing 5.1: Generating, …

  • Page - 144

    Another method for getting data into a buffer object is to ask OpenGL for a pointer to the memory that the buffer object represents and then copy the data there yourself. Listing 5.3 shows how to do this using the glMapBuffer() function.

    // This is the data that we will place into the buffer object
    static const float data[] =
    {
         0.25, -0.25, 0.5, 1.0,
        -0.25, -0.25, 0.5, …

  • Page - 145

    void glClearBufferSubData(GLenum target,
                              GLenum internalformat,
                              GLintptr offset,
                              GLsizeiptr size,
                              GLenum format,
                              GLenum type,
                              const void * data);

    The glClearBufferSubData() function takes a pointer to a variable containing the values that you want to clear the buffer object to and, after converting it to the format specified in internalformat, replicates it across the range of the buffer’s …

  • Page - 146

    The readtarget and writetarget are the targets where the two buffers you want to copy data between are bound. These can be buffers bound to any of the available buffer binding points. However, since buffer binding points can only have one buffer bound at a time, you couldn’t copy between two buffers both bound to the GL_ARRAY_BUFFER target, for example. This means that when …
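    This is exactly what the GL_COPY_READ_BUFFER and GL_COPY_WRITE_BUFFER targets exist for; a typical copy can be sketched as follows (the buffer names and size are illustrative):

    // Bind source and destination to the dedicated copy targets so
    // no other binding point is disturbed
    glBindBuffer(GL_COPY_READ_BUFFER, source_buffer);
    glBindBuffer(GL_COPY_WRITE_BUFFER, destination_buffer);

    // Copy 'size' bytes from offset 0 of the source to offset 0 of
    // the destination
    glCopyBufferSubData(GL_COPY_READ_BUFFER, GL_COPY_WRITE_BUFFER,
                        0, 0, size);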

  • Page - 147

    fill it automatically using the data stored in a buffer object that we supply. To tell OpenGL where in the buffer object our data is, we use the glVertexAttribPointer() function to describe the data, and then enable automatic filling of the attribute by calling glEnableVertexAttribArray(). The prototypes of glVertexAttribPointer() and glEnableVertexAttribArray() are

    void glVertexAttribPointer(GLuint index,
                               GLint size,
                               GLenum type,
                               GLboolean normalized,
                               GLsizei stride,
                               const GLvoid * pointer);

    void glEnableVertexAttribArray(GLuint index);

  • Page - 148

    Finally, pointer is, despite its name, the offset into the buffer that is currently bound to GL_ARRAY_BUFFER where the vertex attribute’s data starts. An example showing how to use glVertexAttribPointer() to configure a vertex attribute is shown in Listing 5.4. Notice that we also call glEnableVertexAttribArray() after setting up the pointer. This tells OpenGL to use the data in the …
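    The setup follows this pattern (a sketch in the spirit of Listing 5.4; attribute 0 is assumed to hold four tightly packed floats per vertex starting at the beginning of the buffer):

    // Bind the buffer that holds the vertex data
    glBindBuffer(GL_ARRAY_BUFFER, buffer);

    // Attribute 0: 4 floats per vertex, not normalized, tightly
    // packed (stride 0), starting at offset 0 within the buffer
    glVertexAttribPointer(0, 4, GL_FLOAT, GL_FALSE, 0, NULL);
    glEnableVertexAttribArray(0);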

  • Page - 149

    If you are done using data from a buffer object to fill a vertex attribute, you can disable that attribute again with a call to glDisableVertexAttribArray(), whose prototype is

    void glDisableVertexAttribArray(GLuint index);

    Once you have disabled the vertex attribute, it goes back to being static and passing the value you specify with glVertexAttrib*() to the shader.

    Using Multiple Vertex …

  • Page - 150

    When attributes are separate, that means that they are either located in different buffers, or at least at different locations in the same buffer. For example, if you want to feed data into two vertex attributes, you could create two buffer objects, bind the first to the GL_ARRAY_BUFFER target and call glVertexAttribPointer(), then bind the second buffer to the GL_ARRAY_BUFFER …

  • Page - 151

    Now we have two inputs to our vertex shader (position and color) interleaved together in a single structure. Clearly, if we make an array of these structures, we have an array-of-structures (AoS) layout for our data. To represent this with calls to glVertexAttribPointer(), we have to use its stride parameter. The stride parameter tells OpenGL how far apart in bytes the start …
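    For an interleaved layout like the one described, the calls might look like this (a sketch; the position-plus-color vertex structure is hypothetical):

    #include <cstddef>    // for offsetof

    // Hypothetical interleaved vertex: position followed by color
    struct vertex
    {
        float x, y, z;    // position
        float r, g, b;    // color
    };

    glBindBuffer(GL_ARRAY_BUFFER, buffer);

    // Both attributes use the same stride - the size of one vertex
    glVertexAttribPointer(0, 3, GL_FLOAT, GL_FALSE, sizeof(vertex),
                          (void *)offsetof(vertex, x));
    glVertexAttribPointer(1, 3, GL_FLOAT, GL_FALSE, sizeof(vertex),
                          (void *)offsetof(vertex, r));
    glEnableVertexAttribArray(0);
    glEnableVertexAttribArray(1);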

  • Page - 152

    being either too simple or too over-engineered. Complete documentation for the format is contained in Appendix B. The sb6 framework also includes a loader for this model format, called sb6::object. To load an object file, create an instance of sb6::object and call its load function as follows:

    sb6::object my_object;
    my_object.load("filename.sbm");

    If successful, the model will be …

  • Page - 153

    shaders to manipulate vertex positions and other vectors. Any shader variable can be specified as a uniform, and uniforms can be in any of the shader stages (even though we only talk about vertex and fragment shaders in this chapter). Making a uniform is as simple as placing the keyword uniform at the beginning of the variable declaration:

    uniform float fTime;
    uniform int …

  • Page - 154

    to get the location of a uniform variable named vColorValue, we would do something like this:

    GLint iLocation = glGetUniformLocation(myProgram, "vColorValue");

    In the previous example, passing "myUniform" to glGetUniformLocation() would result in the value 17 being returned. If you know a priori where your uniforms are because you assigned locations to them in your …

  • Page - 155

    For example, consider the following four variables declared in a shader:

    uniform float fTime;
    uniform int iIndex;
    uniform vec4 vColorValue;
    uniform bool bSomeFlag;

    To find and set these values in the shader, your C/C++ code might look something like this:

    GLint locTime, locIndex, locColor, locFlag;

    locTime = glGetUniformLocation(myShader, "fTime");
    locIndex = glGetUniformLocation(myShader, …

  • Page - 156

    then in C/C++, you could represent this as an array of floats:

    GLfloat vColor[4] = { 1.0f, 1.0f, 1.0f, 1.0f };

    But this is a single array of four values, so passing it into the shader would look like this:

    glUniform4fv(iColorLocation, 1, vColor);

    On the other hand, if you had an array of color values in your shader,

    uniform vec4 vColors[2];

    then in C++, you could …

  • Page - 157

    already stored in column-major ordering (the way OpenGL prefers). Setting this value to GL_TRUE causes the matrix to be transposed when it is copied into the shader. This might be useful if you are using a matrix library that uses a row-major matrix layout instead (for example, some other graphics APIs use row-major ordering, and you may wish to use a library designed for one …

  • Page - 158

    uniform TransformBlock
    {
        float scale;              // Global scale to apply to everything
        vec3 translation;         // Translation in X, Y, and Z
        float rotation[3];        // Rotation around X, Y, and Z axes
        mat4 projection_matrix;   // A generalized projection matrix to
                                  // apply after scale and rotate
    } transform;

    Listing 5.9: Example uniform block declaration

    This code declares a uniform block whose name is …

  • Page - 159

    Another alternative is to let OpenGL decide where it would like the data. This can produce the most efficient shaders, but it means that your application needs to figure out where to put the data so that OpenGL can read it. Under this scheme, the data stored in uniform buffers is arranged in a shared layout. This is the default layout and is what you get if you …

  • Page - 160

    which is eight bytes long in memory, always starts on an eight-byte boundary. Three- and four-element vectors always start on a 4N-byte boundary; vec3 and vec4 types start on 16-byte boundaries, for instance. Each member of an array of scalar or vector types (ints or vec3s, for example) always starts on a boundary defined by these same rules, but rounded up to the alignment …

  • Page - 161

    If you really want to use the shared layout, you can determine the offsets that OpenGL assigned to your block members. Each member of a uniform block has an index that is used to refer to it to find its size and location within the block. To get the index of a member of a uniform block, call

    void glGetUniformIndices(GLuint program,
                             GLsizei uniformCount,
                             const GLchar ** uniformNames,
                             GLuint * uniformIndices);

  • Page - 162

    GL_UNIFORM_ARRAY_STRIDE, and GL_UNIFORM_MATRIX_STRIDE, respectively. Listing 5.13 shows what the code looks like.

    GLint uniformOffsets[4];
    GLint arrayStrides[4];
    GLint matrixStrides[4];

    glGetActiveUniformsiv(program, 4, uniformIndices,
                          GL_UNIFORM_OFFSET, uniformOffsets);
    glGetActiveUniformsiv(program, 4, uniformIndices,
                          GL_UNIFORM_ARRAY_STRIDE, arrayStrides);
    glGetActiveUniformsiv(program, 4, uniformIndices,
                          GL_UNIFORM_MATRIX_STRIDE, matrixStrides);
    …

  • Page - 163

    element of our uniformIndices array. Listing 5.14 shows how to set the value of the single float.

    Table 5.3: Uniform Parameter Queries via glGetActiveUniformsiv()

    • GL_UNIFORM_TYPE: The data type of the uniform as a GLenum.
    • GL_UNIFORM_SIZE: The size of arrays, in units of whatever GL_UNIFORM_TYPE gives you. If the uniform is not an array, this will always …

  • Page - 164

    Next, we can initialize data for TransformBlock.translation. This is a vec3, which means it consists of three floating-point values packed tightly together in memory. To update this, all we need to do is find the location of the first element of the vector and store three consecutive floats in memory starting there. This is shown in Listing 5.15.

    // Put three consecutive …

  • Page - 165

    was used to initialize the vec3 TransformBlock.translation. This setup code is given in Listing 5.17.

    // The first column of TransformBlock.projection_matrix is at
    // uniformOffsets[3] bytes into the buffer. The columns are
    // spaced matrixStride[3] bytes apart and are essentially vec4s.
    // This is the source matrix - remember, it's column major so
    const GLfloat matrix[] =
    {
        1.0f, …

  • Page - 166

    GL_MAX_TESS_CONTROL_UNIFORM_BUFFERS, GL_MAX_TESS_EVALUATION_UNIFORM_BUFFERS, or GL_MAX_FRAGMENT_UNIFORM_BUFFERS for the vertex, tessellation control and evaluation, geometry, and fragment shader limits, respectively. To find the index of a uniform block in a program, call

    GLuint glGetUniformBlockIndex(GLuint program,
                                  const GLchar * uniformBlockName);

    This returns the index of the named uniform block. In our …

  • Page - 167

    Notice that the binding layout qualifier can be specified at the same time as the std140 (or any other) qualifier. Assigning bindings in your shader source code avoids the need to call glUniformBlockBinding(), or even to determine the block’s index from your application, and so is usually the best method of assigning block locations. Once you’ve assigned binding points to the …
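    The two halves of that arrangement can be sketched as follows (the block name and binding index are illustrative):

    // In the shader: assign the block to binding point 2 directly
    //     layout (binding = 2, std140) uniform TransformBlock { ... };

    // In the application: attach a buffer to binding point 2 so the
    // block sources its data from that buffer
    glBindBufferBase(GL_UNIFORM_BUFFER, 2, buffer);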

  • Page - 168

    That doesn’t matter. There could be a buffer bound there, but the program doesn’t use it. The code to set this up is simple and is given in Listing 5.18.

    // Get the indices of the uniform blocks using glGetUniformBlockIndex
    GLuint harry_index = glGetUniformBlockIndex(program, "Harry");
    GLuint bob_index = glGetUniformBlockIndex(program, "Bob");
    GLuint susan_index = …

  • Page - 169

    in the shader can be useful for a number of reasons. First is that it reduces the number of calls to OpenGL that your application must make. Second, it allows the shader to associate a uniform block with a particular binding point without the application needing to know its name. This can be helpful if you have some data in a buffer with a standard layout, but want …

  • Page - 170

multiple surfaces with different materials using a single drawing command.

Using Uniforms to Transform Geometry

Back in Chapter 4, “Math for 3D Graphics,” you learned how to construct matrices that represent several common transformations including scale, translation, and rotation, and how to use the sb6::vmath library to do the heavy lifting for you. You also saw how to multiply

  • Page - 171

     0.25f, -0.25f, -0.25f,
     0.25f,  0.25f, -0.25f,
    -0.25f,  0.25f, -0.25f,

    /* MORE DATA HERE */

    -0.25f,  0.25f, -0.25f,
     0.25f,  0.25f, -0.25f,
     0.25f,  0.25f,  0.25f,

     0.25f,  0.25f,  0.25f,
    -0.25f,  0.25f,  0.25f,
    -0.25f,  0.25f, -0.25f
};

// Now generate some data and put it in a buffer object
glGenBuffers(1, &buffer);
glBindBuffer(GL_ARRAY_BUFFER, buffer);
glBufferData(GL_ARRAY_BUFFER,
             sizeof(vertex_positions),
             vertex_positions,

  • Page - 172

to update the projection matrix is shown in Listing 5.22 and the remainder of the rendering loop is shown in Listing 5.23.

void onResize(int w, int h)
{
    sb6::application::onResize(w, h);

    aspect = (float)info.windowWidth / (float)info.windowHeight;
    proj_matrix = vmath::perspective(50.0f, aspect, 0.1f, 1000.0f);
}

Listing 5.22: Updating the projection matrix for the spinning cube

// Clear the

  • Page - 173

#version 430 core

out vec4 color;

in VS_OUT
{
    vec4 color;
} fs_in;

void main(void)
{
    color = fs_in.color;
}

Listing 5.25: Spinning cube fragment shader

A few frames of the resulting application are shown in Figure 5.2.

Figure 5.2: A few frames from the spinning cube application

Of course, now that we have our cube geometry in a buffer object and a model-view matrix in a uniform,

  • Page - 174

because we’re going to render many cubes in this example, we’ll need to clear the depth buffer before rendering the frame. Although not shown here, we also modified our startup function to enable depth testing and set the depth test function to GL_LEQUAL. The result of rendering with our modified program is shown in Figure 5.3.

Figure 5.3: Many cubes!

// Clear the

  • Page - 175

                      sinf(1.3f * f) * cosf(1.5f * f) * 2.0f);

    // Update the uniform
    glUniformMatrix4fv(mv_location, 1, GL_FALSE, mv_matrix);

    // Draw - notice that we haven’t updated the projection matrix
    glDrawArrays(GL_TRIANGLES, 0, 36);
}

Listing 5.26: Rendering loop for the spinning cube

Shader Storage Blocks

In addition to the read-only access to buffer objects that is provided by uniform blocks, buffer

  • Page - 176

layout (binding = 0, std430) buffer my_storage_block
{
    vec4 foo;
    vec3 bar;
    int baz[24];
    my_structure veggies;
};

Listing 5.27: Example shader storage block declaration

The members of a shader storage block can be referred to just as any other variable. To read from them, you could, for example, use them as a parameter to a function, and to write into them you simply assign to them.

  • Page - 177

out VS_OUT
{
    vec3 color;
} vs_out;

void main(void)
{
    gl_Position = transform_matrix * vertices[gl_VertexID].position;
    vs_out.color = vertices[gl_VertexID].color;
}

Listing 5.28: Using a shader storage block in place of vertex attributes

Although it may seem that shader storage blocks offer so many advantages that they almost make uniform blocks and vertex attributes redundant, you should be aware

  • Page - 178

To get around this problem, atomic operations cause the complete read-modify-write cycle to complete for one invocation before any other invocation gets a chance to even read from memory. In theory, if multiple shader invocations perform atomic operations on different memory locations, then everything should run nice and fast and work just as if you had written the naïve m = m + 1;

  • Page - 179

Table 5.4: Atomic Operations on Shader Storage Blocks

Atomic Function        Behavior
atomicAdd(mem, data)   Reads from mem, adds it to data, writes the result back to mem, and then returns the value originally stored in mem.
atomicAnd(mem, data)   Reads from mem, logically ANDs it with data, writes the result back to mem, and then returns the value originally stored in mem.

  • Page - 180

• A Write-After-Write (WAW) hazard can occur when a program performs a write to the same memory location twice in a row. You might expect that whatever data was written last would overwrite the data written first and be the values that end up staying in memory. Again, on some architectures this is not guaranteed, and in some circumstances the first data written by the

  • Page - 181

that you can add together to be more precise about what you want to synchronize. A few examples are listed below:

• Including GL_SHADER_STORAGE_BARRIER_BIT tells OpenGL that you want it to let any accesses (writes in particular) performed by shaders that are run before the barrier complete before letting any shaders access the data after the barrier. This means that if you

  • Page - 182

If you call memoryBarrier() from your shader code, any memory reads or writes that you might have performed will complete before the function returns. This means that it’s safe to read back data that you might have just written. Without a barrier, it’s even possible that when you read from a memory location that you just wrote to, OpenGL will return old data to
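The following is a sketch of the application-side counterpart to this; the two draw calls are placeholders for any commands that write to and then read from a shader storage buffer:

// First pass writes to a shader storage buffer...
glDrawArrays(GL_POINTS, 0, 1024);
// ...so wait for those writes to complete before any shader that
// runs after the barrier reads the data.
glMemoryBarrier(GL_SHADER_STORAGE_BARRIER_BIT);
// The second pass can now safely read what the first pass wrote
glDrawArrays(GL_TRIANGLES, 0, 36);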

  • Page - 183

bound to the buffer bound to atomic counter binding point 3, then we could write

layout (binding = 3, offset = 8) uniform atomic_uint my_variable;

In order to provide storage for the atomic counter, we can now bind a buffer object to the GL_ATOMIC_COUNTER_BUFFER indexed binding point. Listing 5.29 shows how to do this.

// Generate a buffer name
GLuint buf;
glGenBuffers(1, &buf);
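Once the counter has been used during rendering, the application can reset it and read the result back from the buffer. Here is a sketch reusing the buf object above; it assumes the counter lives at offset zero in the buffer:

const GLuint zero = 0;
glBindBuffer(GL_ATOMIC_COUNTER_BUFFER, buf);
// Reset the counter to zero before rendering
glBufferSubData(GL_ATOMIC_COUNTER_BUFFER, 0, sizeof(GLuint), &zero);
// ... render something that increments the counter ...
// Read the final value back into client memory
GLuint counter_value = 0;
glGetBufferSubData(GL_ATOMIC_COUNTER_BUFFER, 0,
                   sizeof(GLuint), &counter_value);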

  • Page - 184

Now that you have created a buffer and bound it to an atomic counter buffer target, and you have declared an atomic counter uniform in your shader, you are ready to start counting things. First, to increment an atomic counter, call

uint atomicCounterIncrement(atomic_uint c);

This function reads the current value of the atomic counter, adds one to it, writes the new value back to

  • Page - 185

qualifier) and won’t write any data into the framebuffer. In fact, we’ll disable writing to the framebuffer while we run this shader. To turn off writing to the framebuffer, we can call

glColorMask(GL_FALSE, GL_FALSE, GL_FALSE, GL_FALSE);

To turn framebuffer writes back on again, we can call

glColorMask(GL_TRUE, GL_TRUE, GL_TRUE, GL_TRUE);

Because atomic counters are stored in buffers,

  • Page - 186

Synchronizing Access to Atomic Counters

Atomic counters represent locations in buffer objects. While shaders are executing, their values may well reside in special memory inside the graphics processor (which is what makes them faster than simple atomic memory operations on members of shader storage blocks, for example). However, when your shader is done executing, the values of the

  • Page - 187

similar to binding a buffer object to one of the buffer binding points. However, once you bind a texture name to a texture target, it takes the type of that target until it is destroyed.

Creating and Initializing Textures

The full creation of a texture involves generating a name and binding it to one of the texture targets, and then telling OpenGL what size image you

  • Page - 188

generate_texture(data, 256, 256);

// Assume the texture is already bound to the GL_TEXTURE_2D target
glTexSubImage2D(GL_TEXTURE_2D,  // 2D texture
                0,              // Level 0
                0, 0,           // Offset 0, 0
                256, 256,       // 256 x 256 texels, replace entire image
                GL_RGBA,        // Four channel data
                GL_FLOAT,       // Floating-point data
                data);          // Pointer to data

// Free the memory we allocated before - OpenGL now has our data

  • Page - 189

The GL_TEXTURE_2D texture target is probably the one you will deal with the most. This is our standard, two-dimensional image that you imagine could be wrapped around objects. The GL_TEXTURE_1D and GL_TEXTURE_3D types allow you to create one-dimensional and three-dimensional textures, respectively. A 1D texture behaves just like a 2D texture with a height of 1, for the most part. A

  • Page - 190

Reading from Textures in Shaders

Once you’ve created a texture object and placed some data in it, you can read that data in your shaders and use it to color fragments, for example. Textures are represented in shaders as sampler variables and are hooked up to the outside world by declaring uniforms with sampler types. Just as there can be textures with various

  • Page - 191

Figure 5.4: A simple textured triangle

them. Table 5.6 lists the basic texture types and the sampler that should be used in shaders to access them.

Table 5.6: Basic Texture Targets and Sampler Types

Texture Target          Sampler Type
GL_TEXTURE_1D           sampler1D
GL_TEXTURE_2D           sampler2D
GL_TEXTURE_3D           sampler3D
GL_TEXTURE_RECTANGLE    sampler2DRect
GL_TEXTURE_1D_ARRAY     sampler1DArray
GL_TEXTURE_2D_ARRAY     sampler2DArray

  • Page - 192

GL_TEXTURE_1D target and then use a sampler1D variable in your shader to read from it. Likewise, for 2D textures, you’d use GL_TEXTURE_2D and sampler2D, and for 3D textures, you’d use GL_TEXTURE_3D and sampler3D, and so on. The GLSL sampler types sampler1D, sampler2D, and so on represent floating-point data. It is also possible to store signed and unsigned integer data in

  • Page - 193

All of the texture functions return a four-component vector, regardless of whether that vector is floating point or integer, and independently of the format of the texture object bound to the texture unit referenced by the sampler variable. If you read from a texture that contains fewer than four channels, the default value of zero will be filled in for the green and blue

  • Page - 194

    unsigned int arrayelements;
    unsigned int faces;
    unsigned int miplevels;
    unsigned int keypairbytes;
};

Listing 5.36: The header of a .KTX file

In this header, identifier contains a series of bytes that allow the application to verify that this is a legal .KTX file, and endianness contains a known value that will be different depending on whether a little-endian or big-endian machine created

  • Page - 195

of the texture you passed in (or the one it generated for you), and if it fails for some reason, it will return zero. After the loader returns, it leaves the texture bound to the texture unit that was active when it was called. That means that you can call glActiveTexture(), then call sb6::ktx::file::load(), and the texture will be left bound to your selected texture unit.
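Putting that together, loading a texture onto a chosen texture unit might look like this (the filename is a made-up example):

// Select texture unit 1, then load - the loaded texture stays
// bound to unit 1 when the loader returns
glActiveTexture(GL_TEXTURE1);
GLuint tex = sb6::ktx::file::load("media/textures/example.ktx");
if (tex == 0)
{
    // Loading failed
}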

  • Page - 196

A simple vertex shader that accepts a single texture coordinate and passes it through to the fragment shader is shown in Listing 5.38, with the corresponding fragment shader shown in Listing 5.39.

#version 430 core

uniform mat4 mv_matrix;
uniform mat4 proj_matrix;

layout (location = 0) in vec4 position;
layout (location = 4) in vec2 tc;

out VS_OUT
{
    vec2 tc;
} vs_out;

void main(void)
{

  • Page - 197

By passing a texture coordinate with each vertex, we can wrap a texture around an object. Texture coordinates can then be generated procedurally offline or assigned by hand by an artist using a modeling program and stored in an object file. If we load a simple checkerboard pattern into a texture and apply it to an object, we can see how the texture is wrapped around

  • Page - 198

void glSamplerParameteri(GLuint sampler,
                         GLenum pname,
                         GLint param);

and

void glSamplerParameterf(GLuint sampler,
                         GLenum pname,
                         GLfloat param);

Notice that glSamplerParameteri() and glSamplerParameterf() both take the sampler object name as the first parameter. This means that you can directly modify a sampler object without binding it to a target first. You will need to bind a sampler object to
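Here is a sketch of the sampler-object workflow (the unit number and filter choices are arbitrary for the example):

GLuint sampler;
glGenSamplers(1, &sampler);
// Modify the sampler object directly - no binding required
glSamplerParameteri(sampler, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
glSamplerParameteri(sampler, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
// Attach the sampler to texture unit 0 so it overrides the
// parameters of whatever texture is bound there
glBindSampler(0, sampler);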

  • Page - 199

or

void glTexParameteri(GLenum target,
                     GLenum pname,
                     GLint param);

In these cases, the target parameter specifies the target to which the texture you want to access is bound, and pname and param have the same meanings as for glSamplerParameteri() and glSamplerParameterf().

Using Multiple Textures

If you want to use multiple textures in a single shader, you will need to create multiple

  • Page - 200

Once you have bound multiple textures to your context, you need to make the sampler uniforms in your shaders refer to the different units. There are two ways to do this; the first is to use the glUniform1i() function to set the value of the sampler uniform directly from your application’s code. Because samplers are declared as uniforms in your shader code, you can call
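A sketch of that first method follows; the uniform names and texture objects are assumptions for the example:

// Bind two textures to two different texture units
glActiveTexture(GL_TEXTURE0);
glBindTexture(GL_TEXTURE_2D, color_texture);
glActiveTexture(GL_TEXTURE1);
glBindTexture(GL_TEXTURE_2D, normal_texture);

// Point each sampler uniform at its unit
glUseProgram(program);
glUniform1i(glGetUniformLocation(program, "tex_color"), 0);
glUniform1i(glGetUniformLocation(program, "tex_normal"), 1);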

  • Page - 201

that is not an integer, this function isn’t going to cut it. Here, we need a more flexible function, and that function is simply called texture(). Like texelFetch(), it has several overloaded prototypes:

vec4 texture(sampler1D s, float P);
vec4 texture(sampler2D s, vec2 P);
ivec4 texture(isampler2D s, vec2 P);
uvec4 texture(usampler3D s, vec3 P);

As you might have noticed, unlike

  • Page - 202

average of the texels surrounding the texture coordinate (a linear interpolation). For this interpolated fragment to match the texel color exactly, the texture coordinate needs to fall directly in the center of the texel. Linear filtering is characterized by “fuzzy” graphics when a texture is stretched. This fuzziness, however, often lends a more realistic and less artificial look

  • Page - 203

texture space. This causes texturing performance to suffer greatly as the size of the texture increases and the sparsity of access becomes greater. The solution to both of these problems is to simply use a smaller texture map. However, this solution then creates a new problem: when near the same object, it must be rendered larger, and a small texture map will then be

  • Page - 204

Figure 5.8: A series of mipmapped images

Mipmap Filtering

Mipmapping adds a new twist to the two basic texture filtering modes GL_NEAREST and GL_LINEAR by giving four permutations for mipmapped filtering modes. They are listed in Table 5.7.

  • Page - 205

Table 5.7: Texture Filters, Including Mipmapped Filters

Constant                    Description
GL_NEAREST                  Perform nearest neighbor filtering on the base mip level.
GL_LINEAR                   Perform linear filtering on the base mip level.
GL_NEAREST_MIPMAP_NEAREST   Select the nearest mip level, and perform nearest neighbor filtering.
GL_NEAREST_MIPMAP_LINEAR    Perform a linear interpolation between mip levels, and perform nearest neighbor

  • Page - 206

Using nearest as the mipmap selector (as in both examples in the preceding paragraph), however, can also leave an undesirable visual artifact. For oblique views, you can often see the transition from one mip level to another across a surface. It can be seen as a distortion line or a sharp transition from one level of detail to another. The GL_LINEAR_MIPMAP_LINEAR and

  • Page - 207

make up the textures are stored in the .KTX files containing the texture data. The tunnel has a brick wall pattern with different materials on the floor and ceiling. The output from tunnel is shown in Figure 5.9 with the texture minification mode set to GL_LINEAR_MIPMAP_LINEAR. As you can see, the texture becomes blurrier as you get further down the tunnel.

Figure 5.9: A tunnel
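As a sketch, selecting trilinear filtering (GL_LINEAR_MIPMAP_LINEAR) for a texture with a full mip chain might look like this; the texture name and sizes are illustrative:

glBindTexture(GL_TEXTURE_2D, tex);
// A 256 x 256 base level needs 9 mipmap levels (256 down to 1)
glTexStorage2D(GL_TEXTURE_2D, 9, GL_RGBA8, 256, 256);
// ... fill level 0 with glTexSubImage2D(), then ...
glGenerateMipmap(GL_TEXTURE_2D);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER,
                GL_LINEAR_MIPMAP_LINEAR);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);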

  • Page - 208

The GL_REPEAT wrap mode simply causes the texture to repeat in the direction in which the texture coordinate has exceeded 1.0. The texture repeats again for every integer texture coordinate. This mode is useful for applying a small tiled texture to large geometric surfaces. Well-done seamless textures can lend the appearance of a seemingly much larger texture, but at the cost of

  • Page - 209

GL_CLAMP_TO_EDGE mode is used. In this case, the bright band is continued to the top and right of the texture data.

Figure 5.10: Example of texture coordinate wrapping modes

The bottom right square is drawn using the GL_REPEAT mode, which wraps the texture over and over. As you can see, there are several copies of our arrow texture, and all the arrows are pointing in the

  • Page - 210

image, and with cube mapping, where each face of the cube map has its own image and even its own set of mip levels. With texture arrays, however, you can have a whole array of texture images bound to a single texture object and then index through them in the shader, thus greatly increasing the amount of texture data available to your application at any one time. Most

  • Page - 211

for (int i = 0; i < 100; i++)
{
    glTexSubImage3D(GL_TEXTURE_2D_ARRAY,
                    0,
                    0, 0, i,
                    256, 256, 1,
                    GL_RGBA, GL_UNSIGNED_BYTE,
                    image_data[i]);
}

Listing 5.40: Initializing an array texture

Conveniently, the .KTX file format supports array textures, and so the book’s loader code can load them directly from disk. Simply use sb6::ktx::file::load to load an array texture from a file. To demonstrate texture

  • Page - 212

layout (std140) uniform droplets
{
    droplet_t droplet[256];
};

void main(void)
{
    const vec2[4] position = vec2[4](vec2(-0.5, -0.5),
                                     vec2( 0.5, -0.5),
                                     vec2(-0.5,  0.5),
                                     vec2( 0.5,  0.5));
    vs_out.tc = position[gl_VertexID].xy + vec2(0.5);
    float co = cos(droplet[alien_index].orientation);
    float so = sin(droplet[alien_index].orientation);
    mat2 rot = mat2(vec2(co, so),
                    vec2(-so, co));
    vec2 pos = 0.25 * rot *

  • Page - 213

texture function as normal, but pass in a three-component texture coordinate. The first two components of this texture coordinate, the s and t components, are used as typical two-dimensional texture coordinates. The third component, the p element, is actually an integer index into the texture array. Recall that we set this in the vertex shader, and it is going to vary from 0 to 63,

  • Page - 214

the 1i variant of the glVertexAttribI*() functions (rather than glVertexAttrib*()), as we are using an integer input to our vertex shader. The final output of the alien rain sample program is shown in Figure 5.11.

Figure 5.11: Output of the alien rain sample

Writing to Textures in Shaders

A texture object is a collection of images that, when the mipmap chain is included, support filtering, texture coordinate

  • Page - 215

Table 5.8: Image Types

Image Type        Description
image1D           1D image
image2D           2D image
image3D           3D image
imageCube         Cube map image
imageCubeArray    Cube map array image
imageRect         Rectangle image
image1DArray      1D array image
image2DArray      2D array image
imageBuffer       Buffer image
image2DMS         2D multi-sample image
image2DMSArray    2D multi-sample array image

Once you have an image variable, you can read from it using the

  • Page - 216

ivec4 imageLoad(readonly iimage2D image, ivec2 P);
void imageStore(iimage2D image, ivec2 P, ivec4 data);
uvec4 imageLoad(readonly uimage2D image, ivec2 P);
void imageStore(uimage2D image, ivec2 P, uvec4 data);

To bind a texture for load and store operations, you need to bind it to an image unit using the glBindImageTexture() function, whose prototype is

void glBindImageTexture(GLuint unit,
                        GLuint texture,
                        GLint level,
                        GLboolean layered,
                        GLint layer,
                        GLenum access,
                        GLenum format);
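For instance, here is a sketch of binding level 0 of a 2D texture (the name is illustrative) to image unit 0 for read-write access as unsigned-integer data:

glBindImageTexture(0,               // image unit 0
                   tex,             // texture name
                   0,               // mipmap level 0
                   GL_FALSE,        // not a layered binding
                   0,               // layer (ignored when not layered)
                   GL_READ_WRITE,   // allow loads and stores
                   GL_RGBA32UI);    // format used in the shader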

  • Page - 217

Table 5.9: Image Data Format Classes

Format              Class
GL_RGBA32F          4x32
GL_RGBA32I          4x32
GL_RGBA32UI         4x32
GL_RGBA16F          4x16
GL_RGBA16UI         4x16
GL_RGBA16I          4x16
GL_RGBA16_SNORM     4x16
GL_RGBA16           4x16
GL_RGBA8UI          4x8
GL_RGBA8I           4x8
GL_RGBA8_SNORM      4x8
GL_RGBA8            4x8
GL_R11F_G11F_B10F   (a)
GL_RGB10_A2UI       (b)
GL_RGB10_A2         (b)
GL_RG32F            2x32
GL_RG32UI           2x32
GL_RG32I            2x32
GL_RG16F            2x16
GL_RG16UI           2x16
GL_RG16I            2x16
GL_RG16_SNORM       2x16
GL_RG16             2x16
GL_RG8UI            2x8

  • Page - 218

Table 5.9: Continued

Format          Class
GL_R16          1x16
GL_R8UI         1x8
GL_R8I          1x8
GL_R8           1x8
GL_R8_SNORM     1x8

Referring to Table 5.9, you can see that the GL_RGBA32F, GL_RGBA32I, and GL_RGBA32UI formats are in the same format class (4x32), which means that you can take a texture that has a GL_RGBA32F internal format and bind one of its levels to an image unit using the GL_RGBA32I or GL_RGBA32UI

  • Page - 219

Table 5.10: Continued

Format              Format Qualifier
GL_RGBA16           rgba16
GL_RGB10_A2UI       rgb10_a2ui
GL_RGB10_A2         rgb10_a2
GL_RGBA8UI          rgba8ui
GL_RGBA8I           rgba8i
GL_RGBA8_SNORM      rgba8_snorm
GL_RGBA8            rgba8
GL_R11F_G11F_B10F   r11f_g11f_b10f
GL_RG32F            rg32f
GL_RG32UI           rg32ui
GL_RG32I            rg32i
GL_RG16F            rg16f
GL_RG16UI           rg16ui
GL_RG16I            rg16i
GL_RG16_SNORM       rg16_snorm
GL_RG16             rg16
GL_RG8UI            rg8ui
GL_RG8I             rg8i
GL_RG8_SNORM        rg8_snorm
GL_RG8              rg8
GL_R32F             r32f
GL_R32UI            r32ui

  • Page - 220

#version 430 core

// Uniform image variables:
// Input image - note use of format qualifier because of loads
layout (binding = 0, rgba32ui) readonly uniform uimage2D image_in;
// Output image
layout (binding = 1) uniform writeonly uimage2D image_out;

void main(void)
{
    // Use fragment coordinate as image coordinate
    ivec2 P = ivec2(gl_FragCoord.xy);
    // Read from input image
    uvec4 data =

  • Page - 221

Table 5.11: Atomic Operations on Images

Atomic Function   Behavior
imageAtomicAdd    Reads from image at P, adds it to data, writes the result back to image at P, and then returns the value originally stored in image at P.
imageAtomicAnd    Reads from image at P, logically ANDs it with data, writes the result back to image at P, and then returns the value originally stored in

  • Page - 222

For example, we have

uint imageAtomicAdd(uimage1D image, int P, uint data);
uint imageAtomicAdd(uimage2D image, ivec2 P, uint data);
uint imageAtomicAdd(uimage3D image, ivec3 P, uint data);

and so on. The imageAtomicCompSwap function is unique in that it takes an additional parameter, comp, which it compares with the existing content in memory. If the value of comp is equal to the value

  • Page - 223

2. Use imageAtomicExchange to exchange the updated counter value with the current head pointer.

3. Store your data into your data store. The structure for each element includes a next index, which you fill with the previous value of the head pointer retrieved in step 2.

If the “head pointer” image is a 2D image the size of the framebuffer, then you can use this method

  • Page - 224

You might notice the use of the gl_FrontFacing built-in variable. This is a Boolean input to the fragment shader whose value is generated by the back-face culling stage that is described in “Primitive Assembly, Clipping, and Rasterization” back in Chapter 3, “Following the Pipeline.” Even if back-face culling is disabled, this variable will still contain true if the polygon is

  • Page - 225

{
    list_item item[];
};

layout (location = 0) out vec4 color;

const uint max_fragments = 10;

void main(void)
{
    uint frag_count = 0;
    float depth_accum = 0.0;
    ivec2 P = ivec2(gl_FragCoord.xy);

    uint index = imageLoad(head_pointer, P).x;

    while (index != 0xFFFFFFFF && frag_count < max_fragments)
    {
        list_item this_item = item[index];

        if (this_item.facing != 0)
        {
            depth_accum -= this_item.depth;
        }

  • Page - 226

Figure 5.12: Resolved per-fragment linked lists

You should call glMemoryBarrier() with GL_SHADER_IMAGE_ACCESS_BIT set when something has written to an image that you want to read from later — including other shaders. Similarly, there is a version of the GLSL memoryBarrier() function, memoryBarrierImage(), that ensures that operations on images from inside your shader are completed

  • Page - 227

the graphics processor needs to read less data when fetching from a compressed texture, less memory bandwidth is required when compressed textures are used. There are a number of compressed texture formats supported by OpenGL. All OpenGL implementations support at least the compression schemes listed in Table 5.12.

Table 5.12: Native OpenGL Texture Compression Formats

  • Page - 228

that it is implementation specific, and although your code will work on many platforms, the result of rendering with them might not be the same. The RGTC (Red-Green Texture Compression) format breaks a texture image into 4 × 4 texel blocks, compressing the individual channels within that block using a series of codes. This compression mode works only for one- and two-channel

  • Page - 229

Using Compression

You can ask OpenGL to compress a texture in some formats when you load it, although it’s strongly recommended to compress textures yourself and store the compressed texture in a file. If OpenGL does support compression for your selected format, all you have to do is request that the internal format be one of the compressed formats, and OpenGL will take your

  • Page - 230

glCompressedTexSubImage3D() to upload data into it. When you do this, you need to ensure that the xoffset, yoffset, and other parameters obey texture-format-specific rules. In particular, most texture compression formats compress blocks of texels. These blocks usually have sizes such as 4 × 4 texels. The regions that you update with glCompressedTexSubImage2D() need to line up on

  • Page - 231

memory. For example, you might take a texture with an internal format of GL_RGBA32F (i.e., four 32-bit floating-point components per texel) and create a view of it that sees them as GL_RGBA32UI (four 32-bit unsigned integers per texel) so that you can get at the individual bits of the texels. Of course, you can do both of these things at the same time — that is, take

  • Page - 232

first mipmap level and number of mipmap levels to include in the view. This allows you to create a texture view that represents part of an entire mipmap pyramid of another texture. For example, to create a texture that represented just the base level (level 0) of another texture, you can set minlevel to 0 and numlevels to 1. To create a view that represented the 4
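For example, here is a sketch of creating a view of the first four mipmap levels of an existing 2D texture (parent_tex is a placeholder name; the two formats must be compatible as described next):

GLuint view;
glGenTextures(1, &view);        // generate a name, but do not bind it
glTextureView(view,             // the new view
              GL_TEXTURE_2D,    // target for the view
              parent_tex,       // texture to share storage with
              GL_RGBA32UI,      // reinterpreted internal format
              0, 4,             // minlevel = 0, numlevels = 4
              0, 1);            // minlayer = 0, numlayers = 1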

  • Page - 233

compatible with one another. To be compatible, two formats must be in the same class. There are several format classes, and they are listed, along with the internal formats that are members of that class, in Table 5.14.

Table 5.14: Texture View Format Compatibility

Format Class   Members of the Class
128-bit        GL_RGBA32F, GL_RGBA32UI, GL_RGBA32I
96-bit         GL_RGB32F, GL_RGB32UI, GL_RGB32I

  • Page - 234

you create a 2D non-array texture view of one of its layers, you can call glTexSubImage2D() to put data into the view, and the same data will end up in the corresponding layer of the array texture. As another example, you can create a 2D non-array texture view of a single layer of a 2D array texture and access it from a sampler2D uniform in a shader. Likewise, you

  • Page - 235


  • Page - 236

Chapter 6
Shaders and Programs

WHAT YOU’LL LEARN IN THIS CHAPTER

• The fundamentals of the OpenGL shading language
• How to find out if your shaders compiled, and what went wrong if they didn’t
• How to retrieve and cache binaries of your compiled shaders and use them later for rendering

By this point in the book, you have read about the OpenGL pipeline, written

  • Page - 237

Language Overview

GLSL is in the class of languages that can be considered “C-like.” That is, its syntax and model are much like those of C, with a number of differences that make it more suitable for graphics and parallel execution in general. One of the major differences between C and GLSL is that matrix and vector types are first-class citizens. That means that they

  • Page - 238

Signed and unsigned integers behave as would be expected in a C program. That is, signed integers are stored as two’s complement and have a range from -2,147,483,648 to 2,147,483,647, and unsigned integers have a range from 0 to 4,294,967,295. If you add numbers together such that they overflow their ranges, they will wrap around. Floating-point numbers are effectively defined

  • Page - 239

Vectors and Matrices

Vectors of all supported scalar types and matrices of single- and double-precision floating-point types are supported by GLSL. Vector and matrix type names are decorated with their underlying scalar type’s name, except for floating-point vectors and matrices, which have no decoration. Table 6.2 shows all of the vector and matrix types in GLSL.

Table 6.2: Vector

  • Page - 240

In addition to being accessed as an array, vectors may be accessed as if they were structures with fields representing their components. The first component can be accessed through the .x, .s, or .r field. The second component is accessed through the .y, .t, or .g field. The third is accessed through the .z, .p, or .b field, and finally, the fourth component can be

  • Page - 241

that array (which is therefore a vector) represents a column of the matrix. Because each of those vectors can also be treated like an array, a column of a matrix behaves as an array, effectively allowing matrices to be treated like two-dimensional arrays. For example, if we declare bar as a mat4 type, then bar[0] is a vec4 representing its first column, and bar[0][0] is the

  • Page - 242

To a C programmer, this may seem odd. However, it’s actually a very powerful feature as it allows types to be implicitly defined without the typedef keyword, which GLSL lacks. One example use of this is to declare a function that returns an array:

vec4[4] functionThatReturnsArray()
{
    vec4[4] foo = ...
    return foo;
}

Declaring array types in this form also implicitly defines the

  • Page - 243

to note that because there is a duality between vectors and arrays in GLSL, the .length() function works on vectors (giving their size, naturally) and that because matrices are essentially arrays of vectors, .length() when applied to a matrix gives you the number of columns it has. The following are a few examples of applications of the .length() function:

float a[10];    // Declare

  • Page - 244

multiple definitions, each with a different set of parameters. Rather than enumerate all of the types supported for each of the functions, some standard terminology is used in the GLSL Specification to group classes of data types together such that families of functions can be referred to more concisely. We will sometimes use those terms here to refer to groups of types also.

  • Page - 245

that finding the inverse of a matrix is fairly expensive, and so if the matrix is likely to be constant, calculate the inverse in your application and load it into your shader as a uniform. Non-square matrices do not have inverses and so are not supported by the inverse() function. Similarly, the determinant() function calculates the determinant of any square matrix. For

  • Page - 246

Likewise, the faceforward() function takes an input vector and two surface normals — if the dot product of the input vector and the second normal vector is negative, then it returns the first normal vector; otherwise, it returns the negative of the first normal vector. As you might have guessed from its name, this can be used to determine whether a plane is front- or

  • Page - 247

The step() function generates a step function (a function that has a value of either 0.0 or 1.0) based on its two inputs. It is defined as

vec4 step(vec4 edge, vec4 x);

and it returns 0.0 if x < edge and 1.0 if x >= edge. The smoothstep() function is not as aggressive and produces a smooth fade between two of its inputs based on where the value of its

  • Page - 248

graphics processors, the fused multiply-add function may be more efficient than a sequence of a multiplication followed by a separate addition operation. Most of the math functions in GLSL presume that you are using floating-point numbers in the majority of your shader code. However, there are a few cases where you might be using integers, and GLSL includes a handful of

  • Page - 249

this, however, as not all bit combinations form valid floating-point numbers, and it’s quite possible to generate NaNs (Not-a-Number), denormals, or infinities. To test whether a floating-point number represents a NaN or an infinity, you can call isnan() or isinf(). In addition to being able to tear apart floating-point numbers and then put them back together again, GLSL

  • Page - 250

Other bitfield operations supported by GLSL include the bitfieldReverse(), bitCount(), findLSB(), and findMSB() functions, which reverse the order of a subset of bits in an integer, count the number of set bits in an integer, and find the index of the least significant or most significant bit that is set in an integer, respectively.

Compiling, Linking, and Examining Programs

  • Page - 251

Other values for pname that can be passed to glGetShaderiv() are

• GL_SHADER_TYPE, which returns the type of shader that the object is (GL_VERTEX_SHADER, GL_FRAGMENT_SHADER, etc.),
• GL_DELETE_STATUS, which will return GL_TRUE or GL_FALSE to indicate whether glDeleteShader() has been called on the shader object,
• GL_SHADER_SOURCE_LENGTH, which returns the total length of the source

  • Page - 252

// Allocate storage for it... (a writable buffer - writing into the
// pointer returned by std::string::c_str() is not legal)
std::vector<char> str(log_length);
// Get the log...
glGetShaderInfoLog(fs, log_length, NULL, str.data());

Listing 6.1: Retrieving the compiler log from a shader

If your shader contains errors or suspect code that might generate compiler warnings, then OpenGL’s shader compiler will tell you about it in the log. Consider the following shader, which

  • Page - 253

It seems that we have forgotten the type of the scale uniform. We can fix that by giving scale a type (it’s supposed to be vec4). The next three issues are on the same line:

ERROR: 0:10: error(#143) Undeclared identifier: scale
WARNING: 0:10: warning(#402) Implicit truncation of vector from size: 4 to size: 3
ERROR: 0:10: error(#162) Wrong operand types: no operation

  • Page - 254

has quite a bit more status than a compiled shader, and you can retrieve it all by using glGetProgramiv(), whose prototype is

void glGetProgramiv(GLuint program,
                    GLenum pname,
                    GLint * params);

You’ll notice that glGetProgramiv() is very similar to glGetShaderiv(). The first parameter, program, is the name of the program object whose information you want to retrieve, and the last

  • Page - 255

The parameters to glGetProgramInfoLog() work just the same as they do for glGetShaderInfoLog(), except that in place of shader, we have program, which is the name of the program object whose log you want to read. Now, consider the shader shown in Listing 6.2.

#version 430 core

layout (location = 0) out vec4 color;

vec3 myFunction();

void main(void)
{
    color = vec4(myFunction(),

  • Page - 256

a vertex shader that contributes to an output that is never used by the subsequent fragment shader, for example. However, this scheme comes at a potential cost of flexibility and possibly performance to the application. For every combination of vertex, fragment, and possibly other shaders, you need to have a unique program object, and linking all those programs doesn’t come

  • Page - 257

// Create a vertex shader
GLuint vs = glCreateShader(GL_VERTEX_SHADER);

// Attach source and compile
glShaderSource(vs, 1, vs_source, NULL);
glCompileShader(vs);

// Create a program for our vertex stage and attach the vertex shader to it
GLuint vs_program = glCreateProgram();
glAttachShader(vs_program, vs);

// Important part - set the GL_PROGRAM_SEPARABLE flag to GL_TRUE *then* link

  • Page - 258

The glCreateShaderProgramv() function takes the type of shader you want to compile (GL_VERTEX_SHADER, GL_FRAGMENT_SHADER, etc.), the number of source strings, and a pointer to an array of strings (just like glShaderSource()), and compiles those strings into a new shader object. Then, it internally attaches that shader object to a new program object, sets its separable hint to true, and links the program object.
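A sketch of the whole separable workflow, mixing the two stages on a program pipeline object, follows; vs_src and fs_src are assumed to hold GLSL source strings:

GLuint vs_prog = glCreateShaderProgramv(GL_VERTEX_SHADER, 1, &vs_src);
GLuint fs_prog = glCreateShaderProgramv(GL_FRAGMENT_SHADER, 1, &fs_src);

GLuint pipeline;
glGenProgramPipelines(1, &pipeline);
// Pick the vertex stage from one program and the fragment stage
// from the other
glUseProgramStages(pipeline, GL_VERTEX_SHADER_BIT, vs_prog);
glUseProgramStages(pipeline, GL_FRAGMENT_SHADER_BIT, fs_prog);
glBindProgramPipeline(pipeline);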

  • Page - 259

If you link shaders for multiple stages together in a single program object, OpenGL may realize that an interface member isn’t required and that it can eliminate it from the shader(s). As an example, if the vertex shader only writes a constant to a particular output and the fragment shader then consumes that data as an input, OpenGL might remove the code to produce that

  • Page - 260

parameter of glGetProgramResourceiv(). glGetProgramResourceiv() returns multiple properties in a single function call, and the number of properties to return is given in propCount. props is an array of tokens specifying which properties you’d like to retrieve. Those properties will be written to the array whose address is given in params, and the size of which (in elements) is given

  • Page - 261

Again, program, programInterface, and index have the same meaning as they do for glGetProgramResourceiv(). bufSize is the size of the buffer pointed to by name, and, if it is not NULL, length points to a variable that will have the actual length of the name written into it. As an example, Listing 6.4 shows a simple program that will print information about the active

  • Page - 262

Notice that the listing of the active outputs appears in the order that they were declared in. However, since we explicitly specified output location 2 for data, the GLSL compiler went back and used location 1 for extra. We are also able to correctly tell the types of the outputs using this code. Although in your applications you will likely know the types and names of

  • Page - 263

When you link a program that includes subroutines, each subroutine in each stage is assigned an index. If you are using version 430 of GLSL or newer (this is the version shipped with OpenGL 4.3), you can assign the indices yourself in shader code using the index layout qualifier. So, we could declare the subroutines from Listing 6.5 as follows:

layout (index = 2) subroutine

  • Page - 264

in a particular stage of a program can be determined by calling glGetProgramStageiv():

void glGetProgramStageiv(GLuint program,
                         GLenum shadertype,
                         GLenum pname,
                         GLint * values);

Again, program is the name of the program object containing the shader, and shadertype indicates which stage of the program you’re asking about. To get the number of active subroutines in the relevant stage of the

  • Page - 265

In our simple example, after linking our program object, we can run the following code to determine the indices of our subroutine functions, as we haven’t assigned explicit locations to them in our shader code:

subroutines[0] = glGetProgramResourceIndex(render_program,
                                           GL_FRAGMENT_SHADER_SUBROUTINE,
                                           "myFunction1");
subroutines[1] = glGetProgramResourceIndex(render_program,
                                           GL_FRAGMENT_SHADER_SUBROUTINE,
                                           "myFunction2");
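With the indices in hand, here is a sketch of selecting which subroutine to run; it assumes the example has a single subroutine uniform (the count passed must cover every active subroutine uniform location in the stage):

GLuint indices[1] = { subroutines[0] }; // use myFunction1
glUseProgram(render_program);
glUniformSubroutinesuiv(GL_FRAGMENT_SHADER, 1, indices);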

  • Page - 266

the program. At some point in the future, your application can hand that binary back to OpenGL and bypass the compiler and linker. If you wish to use this feature, you should call glProgramParameteri() with pname set to GL_PROGRAM_BINARY_RETRIEVABLE_HINT and the value set to GL_TRUE before calling glLinkProgram(). This tells OpenGL that you plan to get the binary data back from it and that it

  • Page - 267

GLuint program;
program = glCreateProgram();
glAttachShader(program, shader);

// Set the binary retrievable hint and link the program
glProgramParameteri(program,
                    GL_PROGRAM_BINARY_RETRIEVABLE_HINT,
                    GL_TRUE);
glLinkProgram(program);

// Get the expected size of the program binary
GLint binary_size = 0;
glGetProgramiv(program, GL_PROGRAM_BINARY_LENGTH, &binary_size);

// Allocate some memory to store the program binary

  • Page - 268

assuming that is the way it will be used. If it is used in a way that is not handled by this default compilation of the shaders, the OpenGL implementation may need to at least partially recompile parts of the shader to deal with the changes. That can cause a noticeable stutter in the execution of the application. For this reason, it’s strongly recommended that you

  • Page - 269


  • Page - 270

Part II: In Depth

  • Page - 271


  • Page - 272

Chapter 7
Vertex Processing and Drawing Commands

WHAT YOU’LL LEARN IN THIS CHAPTER

• How to get data from your application into the front of the graphics pipeline
• What the various OpenGL drawing commands are and what their parameters do
• How your transformed geometry gets into your application’s window

In Chapter 3, we followed the OpenGL pipeline from start to

  • Page - 273

Vertex Processing

The first programmable stage in the OpenGL pipeline (i.e., one that you can write a shader for) is the vertex shader. Before the shader runs, OpenGL will fetch the inputs to the vertex shader in the vertex fetch stage, which we will describe first. Your vertex shader’s responsibility is to set the position of the vertex that will be fed to the next

  • Page - 274

In order to understand how these functions work, first let’s consider a simple vertex shader fragment that declares a number of inputs. In Listing 7.1, notice the use of the location layout qualifier to set the locations of the inputs explicitly in the shader code.

#version 430 core

// Declare a number of vertex attributes
layout (location = 0) in vec4 position;
layout

  • Page - 275

GL_UNSIGNED_BYTE. This is where the normalized parameter comes in. As you probably know, the range of values representable by an unsigned byte is 0 to 255. However, that’s not what we want in our vertex shader. There, we want to represent colors as values between 0.0 and 1.0. If you set normalized to GL_TRUE, then OpenGL will automatically divide through each component of

  • Page - 276

In addition to the scalar types shown in Table 7.1, glVertexAttribFormat() also supports several packed data formats that use a single integer to store multiple components. The two packed data formats supported by OpenGL are GL_UNSIGNED_INT_2_10_10_10_REV and GL_INT_2_10_10_10_REV, which both represent four components packed into a single 32-bit word. The GL_UNSIGNED_INT_2_10_10_10_REV format

  • Page - 277

for integer types — type must be one of the integer types (GL_BYTE, GL_SHORT, GL_INT, one of their unsigned counterparts, or one of the packed data formats), and integer inputs to a vertex shader are never normalized. Thus, the complete code to describe our vertex format is

// position
glVertexAttribFormat(0, 4, GL_FLOAT, GL_FALSE,
                     offsetof(VERTEX, position));
// normal

  • Page - 278

However, we could establish a more complex binding scheme. Let’s say, for example, that we wanted to store position, normal, and tex_coord in one buffer, color in a second, and material_id in a third. We could set this up as follows:

glVertexAttribBinding(0, 0);    // position
glVertexAttribBinding(1, 0);    // normal
glVertexAttribBinding(2, 0);    // tex_coord

  • Page - 279

Variable Point Sizes

By default, OpenGL will draw points with a size of a single fragment. However, as you saw way back in Chapter 2, you can change the size of points that OpenGL draws by calling glPointSize(). The maximum size that OpenGL will draw your points is implementation defined, but it will be at least 64 pixels. You can find out what the actual upper limit is by
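If the point size should vary per vertex instead of being fixed by glPointSize(), a sketch of handing control to the shader looks like this:

// Let the vertex shader decide the point size...
glEnable(GL_PROGRAM_POINT_SIZE);
// ...and in the vertex shader, write the built-in output:
//     gl_PointSize = computed_size_in_pixels;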

  • Page - 280

Drawing Commands

Until now, we have written every example using only a single drawing command — glDrawArrays(). OpenGL includes many drawing commands, however, and while some could be considered supersets of others, they can be generally categorized as either indexed or non-indexed and direct or indirect. Each of these will be covered in the next few sections.

Indexed Drawing

  • Page - 281

Figure 7.1: Indices used in an indexed draw

of OpenGL. The set of the most generalized OpenGL drawing commands is given in Table 7.2 — all other OpenGL drawing commands can be expressed in terms

  • Page - 282

set of 36 indices that tell OpenGL which corner to use for each vertex of each triangle. The new setup code looks like this:

static const GLfloat vertex_positions[] =
{
    -0.25f, -0.25f, -0.25f,
    -0.25f,  0.25f, -0.25f,
     0.25f, -0.25f, -0.25f,
     0.25f,  0.25f, -0.25f,
     0.25f, -0.25f,  0.25f,
     0.25f,  0.25f,  0.25f,
    -0.25f, -0.25f,  0.25f,
    -0.25f,  0.25f,  0.25f,
};

static const GLushort vertex_indices[] =
{
    0,

  • Page - 283

vertex_indices, we need to bind a buffer to the GL_ELEMENT_ARRAY_BUFFER target and put the indices in it just as we did with the vertex data. In Listing 7.2, we do that right after we set up the buffer containing vertex positions. Once you have a set of vertices and their indices in memory, you’ll need to change your rendering code to use glDrawElements() (or one of the

  • Page - 284

Figure 7.2: Base vertex

  • Page - 285

functionality that is about to be introduced is used, a set of strips needs to be rendered with separate calls to OpenGL. This means that there are likely to be many more function calls in a program that uses stripified geometry, and if the stripping application hasn’t done a decent job or if the model just doesn’t lend itself well to stripification, this can eat any

  • Page - 286

GL_UNSIGNED_INT, 0xFFFF for GL_UNSIGNED_SHORT, and 0xFF for GL_UNSIGNED_BYTE) because you can be almost certain that it will not be used as a valid index of a vertex. Many stripping tools have an option to either create separate strips or to create a single strip with the restart index in it. The stripping tool may use a predefined index or output the index it used
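A sketch of enabling primitive restart for 16-bit indices follows; the draw parameters are placeholders:

glEnable(GL_PRIMITIVE_RESTART);
glPrimitiveRestartIndex(0xFFFF);
// Every 0xFFFF in the element array now begins a new strip
glDrawElements(GL_TRIANGLE_STRIP, index_count,
               GL_UNSIGNED_SHORT, NULL);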

  • Page - 287

modified only slightly from instance to instance. A simple application might just loop over all of the individual blades of grass in a field and render them separately, calling glDrawArrays() once for each blade and perhaps updating a set of shader uniforms on each iteration. Supposing each blade of grass were made up of a strip of four triangles, the code might look

  • Page - 288

count for glDrawArraysInstanced(), and mode, count, type, and indices for glDrawElementsInstanced()) take the same meaning as in the regular, non-instanced versions of the functions. When you call one of these functions, OpenGL makes any preparations it needs to draw your geometry (such as copying vertex data to the graphics card’s memory, for example) only once and then

  • Page - 289

If all these functions did was send many copies of the same vertices to OpenGL as if glDrawArrays() or glDrawElements() had been called in a tight loop, they wouldn’t be very useful. One of the things that makes instanced rendering usable and very powerful is a special built-in variable in GLSL named gl_InstanceID. The gl_InstanceID variable appears in the vertex shader as if

  • Page - 290

The value of gl_InstanceID can be used directly as a parameter to a shader function or to index into data such as textures or uniform arrays. To return to our example of the field of grass, let’s figure out what we’re going to do with gl_InstanceID to make our field not just be thousands of identical blades of grass growing out of a single point. Each of our
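For the grass example, the whole field can then be drawn with a single call; here is a sketch assuming the six-vertex blade described in the text and a hypothetical grass_vao:

glBindVertexArray(grass_vao);
// One blade is a 6-vertex triangle strip; draw 1,000,000 instances
// and let gl_InstanceID tell the blades apart in the vertex shader
glDrawArraysInstanced(GL_TRIANGLE_STRIP, 0, 6, 1000 * 1000);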

  • Page - 291

Our uniform grid of grass probably looks a little plain, as if a particularly attentive groundskeeper hand-planted each blade. What we really need to do is displace each blade of grass by some random amount within its grid square. That’ll make the field look a little less uniform. A simple way of generating random numbers is to multiply a seed value by a large number

  • Page - 292

probably want to have control over, so we use a texture to hold information about blades of grass. You have an x and a z coordinate for each blade of grass that was calculated by generating a grid coordinate directly from gl_InstanceID and then generating a random number and displacing the blade within the xz plane. That coordinate pair can be used as a coordinate to look

  • Page - 293

Our field is still looking a little bland. The grass just sticks straight up and doesn’t move. Real grass sways in the wind and gets flattened when things roll over it. We need the grass to bend, and we’d like to have control over that. Why not use another channel from the parameter texture (the blue channel) to control a bend factor? We can use that as another

  • Page - 294

Now, our final field has a million blades of grass, evenly distributed, with application control over length, “flatness,” direction of bend, or sway and color. Remember, the only input to the shader that differentiates one blade of grass from another is gl_InstanceID, the total amount of geometry sent to OpenGL is six vertices, and the total amount of code required to

  • Page - 295

Pass the index of the attribute to the function in index and set divisor to the number of instances you’d like to pass between each new value being read from the array. If divisor is zero, then the array becomes a regular vertex attribute array with a new value read per vertex. If divisor is non-zero, however, then new data is read from the array once every divisor

  • Page - 296

two, a new value of color will be presented every second instance; if the divisor is three, color will be updated every third instance; and so on. If we render geometry using this simple shader, each instance will be drawn on top of the others. We need to modify the position of each instance so that we can see each one. We can use another instanced array for this.

  • Page - 297

attach it to a vertex array object. Some of the data is used as per-vertex positions, but the rest is used as per-instance colors and positions.

static const GLfloat square_vertices[] =
{
    -1.0f, -1.0f, 0.0f, 1.0f,
     1.0f, -1.0f, 0.0f, 1.0f,
     1.0f,  1.0f, 0.0f, 1.0f,
    -1.0f,  1.0f, 0.0f, 1.0f
};

static const GLfloat instance_colors[] =
{
    1.0f, 0.0f, 0.0f, 1.0f,
    0.0f, 1.0f, 0.0f, 1.0f,
    0.0f,

  • Page - 298

Now all that remains is to set the vertex attrib divisors for the instance_color and instance_position attribute arrays:

glVertexAttribDivisor(1, 1);
glVertexAttribDivisor(2, 1);

Now we draw four instances of the geometry that we previously put into our buffer. Each instance consists of four vertices, each with its own position, which means that the same vertex in each instance has the

  • Page - 299

When you have instanced vertex attributes, you can use the baseInstance parameter to drawing commands such as glDrawArraysInstancedBaseInstance() to offset where in their respective buffers the data is read from. If you set this to zero (or call one of the functions that lacks this parameter), the data for the first instance comes from the start of the array. However, if you

  • Page - 300

void glDrawArraysIndirect(GLenum mode,
                          const void * indirect);

and

void glDrawElementsIndirect(GLenum mode,
                            GLenum type,
                            const void * indirect);

For both functions, mode is one of the primitive modes such as GL_TRIANGLES or GL_PATCHES. For glDrawElementsIndirect(), type is the type of the indices to be used (just like the type parameter to glDrawElements()) and should be set to

  • Page - 301

However, the one difference here is that the firstIndex parameter is in units of indices rather than bytes, and so is multiplied by the size of the index type to form the offset that would have been passed in the indices parameter to glDrawElements(). As handy as it may seem to be able to do this, what makes this feature particularly powerful is the multi versions of

  • Page - 302

Listing 7.10 shows a simple example of how glMultiDrawArraysIndirect() might be used.

typedef struct {
    GLuint vertexCount;
    GLuint instanceCount;
    GLuint firstVertex;
    GLuint baseInstance;
} DrawArraysIndirectCommand;

DrawArraysIndirectCommand draws[] =
{
    {
        42,     // Vertex count
        1,      // Instance count
        0,      // First vertex
        0       // Base instance
    },
    {
        192,
        1,
        327,
        0,
    },
    {
        99,
        1,
        901,
        0
    }
};

// Put "draws[]"

  • Page - 303

used to describe it. We can retrieve these from the object loader by calling sb6::object::get_sub_object_info(). The total number of sub-objects in the .sbm file is made available through the sb6::object::get_sub_object_count() function. Therefore, we can construct an indirect draw buffer for our asteroid field using the code shown in Listing 7.11.

  • Page - 304

#version 430 core

layout (location = 0) in vec4 position;
layout (location = 1) in vec3 normal;
layout (location = 10) in uint draw_id;

Listing 7.12: Vertex shader inputs for asteroids

As usual, we have a position and normal input. However, we’ve also used an attribute at location 10, draw_id, to store our draw index. This attribute is going to be instanced and associated

  • Page - 305

layout (location = 10) in uint draw_id;

out VS_OUT
{
    vec3 normal;
    vec4 color;
} vs_out;

uniform float time = 0.0;
uniform mat4 view_matrix;
uniform mat4 proj_matrix;
uniform mat4 viewproj_matrix;

const vec4 color0 = vec4(0.29, 0.21, 0.18, 1.0);
const vec4 color1 = vec4(0.58, 0.55, 0.51, 1.0);

void main(void)
{
    mat4 m1;
    mat4 m2;
    mat4 m;
    float t = time * 0.1;
    float f = float(draw_id) /

  • Page - 306

    m = m * m1;

    // Non-uniform scale
    float f1 = 0.65 + cos(f * 1.1) * 0.2;
    float f2 = 0.65 + cos(f * 1.1) * 0.2;
    float f3 = 0.65 + cos(f * 1.3) * 0.2;

    m1[0] = vec4(f1, 0.0, 0.0, 0.0);
    m1[1] = vec4(0.0, f2, 0.0, 0.0);
    m1[2] = vec4(0.0, 0.0, f3, 0.0);
    m1[3] = vec4(0.0, 0.0, 0.0, 1.0);
    m = m * m1;

    gl_Position = viewproj_matrix * m * position;
    vs_out.normal = mat3(view_matrix

  • Page - 307

                        first, count, 1, j);
    }
}

Listing 7.15: Drawing asteroids

As you can see from Listing 7.15, we first bind the object’s vertex array object by calling object.get_vao() and passing the result to glBindVertexArray(). When mode is MODE_MULTIDRAW, the entire scene is drawn with a single call to glMultiDrawArraysIndirect(). However, if mode is MODE_SEPARATE_DRAWS, we loop over all of the

  • Page - 308

500 vertices, which means that we’re rendering almost a billion vertices per second, and our bottleneck is almost certainly not the rate at which we are submitting drawing commands. With clever use of the draw_id input (or other instanced vertex attributes), more interesting geometry with more complex variation could be rendered. For example, we could use texture mapping to apply

  • Page - 309

Using Transform Feedback

To set up transform feedback, we must tell OpenGL which of the outputs from the front end we want to record. The outputs from the last stage of the front end are sometimes referred to as varyings. The function to tell OpenGL which ones to record is glTransformFeedbackVaryings(), and its prototype is

void glTransformFeedbackVaryings(GLuint program,
                                 GLsizei

  • Page - 310

glTransformFeedbackVaryings(program,
                            num_varyings,
                            varying_names,
                            GL_INTERLEAVED_ATTRIBS);

Not all of the outputs from your vertex (or geometry) shader need to be stored into the transform feedback buffer. It is possible to save a subset of the vertex shader outputs to the transform feedback buffer and send more to the fragment shader for interpolation. Likewise, it is also possible to save
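A sketch of the end-to-end setup follows; the varying names, buffer object, and vertex count are assumptions for the example. Note that the varying selection only takes effect when the program is next linked:

static const char * varyings[] = { "position_out", "velocity_out" };
glTransformFeedbackVaryings(program, 2, varyings,
                            GL_INTERLEAVED_ATTRIBS);
glLinkProgram(program);

// Attach a buffer to transform feedback binding point 0 and capture
glBindBufferBase(GL_TRANSFORM_FEEDBACK_BUFFER, 0, tf_buffer);
glBeginTransformFeedback(GL_POINTS);
glDrawArrays(GL_POINTS, 0, point_count);
glEndTransformFeedback();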

  • Page - 311

one of the indexed transform feedback binding points. There are actually multiple GL_TRANSFORM_FEEDBACK_BUFFER binding points for this purpose, which are conceptually separate from, but related to, the general GL_TRANSFORM_FEEDBACK_BUFFER binding point. A schematic of this is shown in Figure 7.10.

Figure 7.10: Transform feedback buffer binding points

  • Page - 312

void glBindBufferRange(GLenum target,
                       GLuint index,
                       GLuint buffer,
                       GLintptr offset,
                       GLsizeiptr size);

The glBindBufferRange() function allows you to bind a section of a buffer to an indexed binding point, whereas glBindBuffer() and glBindBufferBase() can only bind the whole buffer at once. The first three parameters (target, index, and buffer) have the same meanings as in glBindBufferBase(). The offset and size parameters specify the start and length, in bytes, of the section of the buffer to bind.

  • Page - 313

If you need to, you can leave gaps in the output structures stored in the transform feedback buffer. When you do this, OpenGL will write a few elements, then skip some space in the output buffer, then write a few more components, and so on, leaving the unused space in the buffer unmodified. To do this, you can include one of the "virtual" varying names gl_SkipComponents1, gl_SkipComponents2, gl_SkipComponents3, or gl_SkipComponents4 in the list passed to glTransformFeedbackVaryings() to skip one, two, three, or four components' worth of space.

  • Page - 314

drawing function, the basic geometric type must match what you have specified as the transform feedback primitive mode, or you must have a geometry shader that outputs the appropriate primitive type. For example, if primitiveMode is GL_TRIANGLES, then the last stage of the front end must produce triangles. This means that if you have a geometry shader, it must declare triangle_strip as its output primitive type.

  • Page - 315

currently bound transform feedback buffers. Each time glBeginTransformFeedback() is called, OpenGL starts writing data at the beginning of the buffers bound for transform feedback, overwriting what might be there already. Some care should be taken while transform feedback is active, as changing transform feedback state between calls to glBeginTransformFeedback() and glEndTransformFeedback() is not allowed.
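A minimal capture therefore looks something like the following sketch (assuming a program whose varyings have already been declared and a buffer bound to index 0; enabling GL_RASTERIZER_DISCARD is optional and simply stops the captured vertices from also being rasterized):

    glEnable(GL_RASTERIZER_DISCARD);      // compute-only pass: skip rasterization

    glBindBufferBase(GL_TRANSFORM_FEEDBACK_BUFFER, 0, capture_buffer);

    glBeginTransformFeedback(GL_POINTS);  // must match the produced primitive type
    glDrawArrays(GL_POINTS, 0, vertex_count);
    glEndTransformFeedback();

    glDisable(GL_RASTERIZER_DISCARD);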

  • Page - 316

use transform feedback to store the positions and velocities of each of the masses between each iteration of the algorithm. For each vertex, we need a position, velocity, and mass. We can pack the positions and masses into one vertex array and pack the velocities into another. Each element of the position array is actually a vec4, with the x, y, and z components holding the position of the mass and the w component holding its mass.

  • Page - 317

Because for each of the connection vectors we either store the index of the vertex to which we are connected or -1 to indicate that no connection is present, we know that by storing a -1 in each of the connection vector components, we can fix that vertex in place. No matter what forces are acting on it, its position won't be updated. This allows us to fix the ...

  • Page - 318

glBindVertexArray(m_vao[i]);

glBindBuffer(GL_ARRAY_BUFFER, m_vbo[POSITION_A + i]);
glBufferData(GL_ARRAY_BUFFER,
             POINTS_TOTAL * sizeof(vmath::vec4),
             initial_positions, GL_DYNAMIC_COPY);
glVertexAttribPointer(0, 4, GL_FLOAT, GL_FALSE, 0, NULL);
glEnableVertexAttribArray(0);

glBindBuffer(GL_ARRAY_BUFFER, m_vbo[VELOCITY_A + i]);
glBufferData(GL_ARRAY_BUFFER,
             POINTS_TOTAL * sizeof(vmath::vec3),
             initial_velocities, GL_DYNAMIC_COPY);
...

  • Page - 319

where F is the force exerted by the spring, k is the spring constant (how stiff the spring is), and x is the extension of the spring. The spring's extension is relative to its resting length. For our system, we keep the rest length of the springs the same and store it in a uniform. Any stretching of the spring produces a positive value of x, and any compression produces a negative value.

  • Page - 320

Here, F is the force we just calculated using gravity, the damping coefficient, and Hooke's law; m is the mass of the vertex (stored in the w component of the position attribute); and a is the resulting acceleration. Given the initial velocity (which we get from our other attribute array), we can plug it into the following equations of motion to find out what our final velocity and position will be.
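For reference, these are the standard constant-acceleration equations of motion, with u the initial velocity, v the final velocity, a the acceleration, t the time step, and s the displacement over that step:

    v = u + a * t
    s = u * t + (1/2) * a * t^2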

  • Page - 321

vec3 F = gravity * m - c * u;   // F is the force on the mass
bool fixed_node = true;         // Becomes false when force is applied

for (int i = 0; i < 4; i++)
{
    if (connection[i] != -1)
    {
        // q is the position of the other vertex
        vec3 q = texelFetch(tex_position, connection[i]).xyz;
        vec3 d = q - p;
        float x = length(d);
        F += -k * (rest_length - x) * normalize(d);
        fixed_node = false;
    }
}

// If ...

  • Page - 322

buffer via two different methods. To set this up, we generate two textures, bind them to the GL_TEXTURE_BUFFER binding point, and attach the buffers to them using glTexBuffer(), as explained earlier in this book. When we bind VAO A, we also bind texture A. When we bind VAO B, we bind texture B. That way, the same data appears in both the position vertex attribute and in the position texture buffer.
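Setting one of these up looks roughly like this (a sketch; m_pos_tbo and m_vbo follow the naming used in the listings above):

    glGenTextures(2, m_pos_tbo);

    for (int i = 0; i < 2; i++)
    {
        glBindTexture(GL_TEXTURE_BUFFER, m_pos_tbo[i]);

        // Expose the position VBO as a texture buffer so that the
        // update shader can read any vertex with texelFetch().
        glTexBuffer(GL_TEXTURE_BUFFER, GL_RGBA32F, m_vbo[POSITION_A + i]);
    }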

  • Page - 323

int i;

glUseProgram(m_update_program);

glEnable(GL_RASTERIZER_DISCARD);

for (i = iterations_per_frame; i != 0; --i)
{
    glBindVertexArray(m_vao[m_iteration_index & 1]);
    glBindTexture(GL_TEXTURE_BUFFER, m_pos_tbo[m_iteration_index & 1]);
    m_iteration_index++;
    glBindBufferBase(GL_TRANSFORM_FEEDBACK_BUFFER, 0,
                     m_vbo[POSITION_A + (m_iteration_index & 1)]);
    glBindBufferBase(GL_TRANSFORM_FEEDBACK_BUFFER, 1,
                     m_vbo[VELOCITY_A + (m_iteration ...

  • Page - 324

The image in Figure 7.12 is not particularly interesting, but it does demonstrate that our simulation is running correctly. To make the visual result more appealing, we can set the point size to a larger value, and we can also issue a second, indexed draw using glDrawElements() and GL_LINES primitives to visualize the connections between nodes. Note that the same vertex positions are shared by both draws; only the connectivity differs.

  • Page - 325

Clipping

As explained in Chapter 3, "Following the Pipeline," clipping is the process of determining which primitives may be fully or partially visible and constructing a set of primitives from them that will lie entirely inside the viewport. For points, clipping is trivial — if the coordinate of the point is inside the region, it should be processed further, whereas if it is outside, it should be discarded.

  • Page - 326

rectangle). The line marked B is trivially rejected because both of its endpoints are outside the left edge of the viewport. Line C is clipped against the top edge of the viewport, and line D is clipped against the left and bottom edges of the viewport. This is non-trivial clipping and results in vertices being moved along the line to make it fit into the viewport. Line ...

  • Page - 327

As you can see, the triangle marked A in Figure 7.15 is trivially accepted because all three of its vertices lie inside the viewport. Triangle B is trivially discarded because all three of its vertices lie outside of the same edge of the viewport. Triangle C crosses the left edge of the viewport and must be clipped. An additional vertex is generated by OpenGL, and the original triangle is broken into smaller triangles that together cover only its visible portion.

  • Page - 328

The presence of a guard band does not affect trivially accepted or trivially rejected triangles — those are either passed through or thrown away as they were before. However, triangles that cross one or more edges of the viewport but otherwise fall inside the guard band are also considered to be trivially accepted and are not broken up. Only triangles that extend beyond the guard band itself must actually be ...

  • Page - 329

glEnable(GL_CLIP_DISTANCE0 + n);

Here, n is the index of the clip distance to enable. The tokens GL_CLIP_DISTANCE1, GL_CLIP_DISTANCE2, and so on up to GL_CLIP_DISTANCE7 are usually defined in standard OpenGL header files. However, the maximum value of n is implementation defined and can be found by calling glGetIntegerv() with the token GL_MAX_CLIP_DISTANCES. You can disable the user-defined clip distance again by passing the same token to glDisable().
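In the shader, you then write the distance of each vertex from your clip plane into the built-in gl_ClipDistance[] array. A minimal vertex shader sketch (the clip_plane uniform here is an assumption for illustration):

    #version 430 core

    layout (location = 0) in vec4 position;

    uniform mat4 mvp;
    uniform vec4 clip_plane;   // plane equation, in the same space as position

    void main(void)
    {
        gl_Position = mvp * position;

        // Positive distances are kept; negative distances are clipped
        // away (when GL_CLIP_DISTANCE0 is enabled).
        gl_ClipDistance[0] = dot(position, clip_plane);
    }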

  • Page - 330

moving slowly toward and eventually off the edge of the screen, it will suddenly disappear when the center of the point exits the view volume and the vertex representing that point is clipped. Likewise, OpenGL can render wide lines. If a line is drawn whose vertices are both outside one of the clipping planes but would otherwise be visible, nothing will be drawn. This can produce noticeable popping artifacts as primitives move past the edges of the screen.

  • Page - 331

Figure 7.17: Rendering with user clip distances

Summary

This chapter covered in some detail the mechanisms by which OpenGL reads vertex data from the buffers that you provide and how the inputs to your vertex shader are mapped to that data. We've also discussed the responsibilities of the vertex shader and the built-in output variables that it can write. You have seen how ...

  • Page - 332

Chapter 8
Primitive Processing

WHAT YOU'LL LEARN IN THIS CHAPTER

• How to use tessellation to add geometric detail to your scenes
• How to use geometry shaders to process whole primitives and create geometry on the fly

In the previous chapters, you've read about the OpenGL pipeline and have been at least briefly introduced to the functions of each of its stages.

  • Page - 333

Tessellation

As introduced in the section "Tessellation" in Chapter 3, tessellation is the process of breaking a large primitive, referred to as a patch, into many smaller primitives before rendering them. There are many uses for tessellation, but the most common application is to add geometric detail to otherwise lower fidelity meshes. In OpenGL, tessellation is produced using three pipeline stages: the tessellation control shader, the fixed-function tessellation engine, and the tessellation evaluation shader.

  • Page - 334

coordinates. When the tessellation engine is generating lines or triangles, those coordinates are simply a pair of normalized values indicating the relative position of the vertex. This is stored in the gl_TessCoord input variable. This setup is shown in the schematic of Figure 8.1.

[Figure 8.1: The tessellation stages. The tessellation control shader writes gl_TessLevelInner[] and gl_TessLevelOuter[], which drive the tessellation engine, whose generated vertices are consumed by the tessellation evaluation shader.]

  • Page - 335

[Figure 8.2: Tessellation factors for quad tessellation. The four gl_TessLevelOuter[] factors control the subdivision of the quad's edges, and the two gl_TessLevelInner[] factors control its interior in the u and v directions over the normalized (0,0) to (1,1) domain.]

When the quad is tessellated, the tessellation engine generates vertices across a two-dimensional domain normalized within the quad. The value stored in the gl_TessCoord input variable gives the position of each generated vertex within that normalized domain.

  • Page - 336

In Figure 8.3, the inner tessellation factors in the u and v directions were set to 9.0 and 7.0, respectively. The outer tessellation factors were set to 3.0 and 5.0 in the u and v directions. This was accomplished using the very simple tessellation control shader shown in Listing 8.1.

#version 430 core

layout (vertices = 4) out;

void main(void)
{
    if (gl_InvocationID == 0)
    {
        ...

  • Page - 337

                  gl_in[3].gl_Position,
                  gl_TessCoord.x);

    // Now interpolate those two results using the y component
    // of tessellation coordinate
    gl_Position = mix(p1, p2, gl_TessCoord.y);
}

Listing 8.2: Simple quad tessellation evaluation shader example

Tessellation Using Triangles

When the tessellation mode is set to triangles (again, using an input layout qualifier in the tessellation control shader), the ...

  • Page - 338

Figure 8.5: Triangle tessellation example

constants into the inner and outer tessellation levels and pass through the control point positions unmodified.

#version 430 core

layout (vertices = 3) out;

void main(void)
{
    if (gl_InvocationID == 0)
    {
        gl_TessLevelInner[0] = 5.0;
        gl_TessLevelOuter[0] = 8.0;
        gl_TessLevelOuter[1] = 8.0;
        gl_TessLevelOuter[2] = 8.0;
    }
    gl_out[gl_InvocationID].gl_Position = gl_in[gl_InvocationID].gl_Position;
}

  • Page - 339

#version 430 core

layout (triangles) in;

void main(void)
{
    gl_Position = (gl_TessCoord.x * gl_in[0].gl_Position) +
                  (gl_TessCoord.y * gl_in[1].gl_Position) +
                  (gl_TessCoord.z * gl_in[2].gl_Position);
}

Listing 8.4: Simple triangle tessellation evaluation shader example

Again, to produce a position for each vertex generated by the tessellation engine, we simply calculate a weighted sum of the input vertices, using the three components of gl_TessCoord as barycentric weights.

  • Page - 340

The tessellation control shader shown in Listing 8.5 simply sets both of the outer tessellation levels to 5.0 and doesn't write to the inner tessellation levels. The corresponding tessellation evaluation shader is shown in Listing 8.6.

#version 430 core

layout (vertices = 4) out;

void main(void)
{
    if (gl_InvocationID == 0)
    {
        gl_TessLevelOuter[0] = 5.0;
        gl_TessLevelOuter[1] = 5.0;
    }
    ...

  • Page - 341

Figure 8.7: Isoline tessellation example

If, however, we change the tessellation evaluation shader to that shown in Listing 8.7, we can generate the image shown in Figure 8.8.

#version 430 core

layout (isolines) in;

void main(void)
{
    float r = (gl_TessCoord.y + gl_TessCoord.x / gl_TessLevelOuter[0]);
    float t = gl_TessCoord.x * 2.0 * 3.14159;
    gl_Position = vec4(sin(t) * r, cos(t) * r, 0.5, 1.0);
}

  • Page - 342

Figure 8.8: Tessellated isoline spirals example

points. This is known as point mode and is enabled using the point_mode input layout qualifier in the tessellation evaluation shader just like any other tessellation mode. When you specify that point mode should be used, the resulting primitives are points. However, this is somewhat orthogonal to the use of the quads, triangles, or isolines modes.

  • Page - 343

Figure 8.9: Triangle tessellated using point mode

Tessellation Subdivision Modes

The tessellation engine works by generating a triangle or quad primitive and then subdividing its edges into a number of segments determined by the inner and outer tessellation factors produced by the tessellation control shader. It then groups the generated vertices into points, lines, or triangles and sends them on down the pipeline.

  • Page - 344

With fractional even spacing, the tessellation factor is rounded up to the next even integer and the edge subdivided as if that were the tessellation factor. With fractional odd spacing, the tessellation factor is rounded up to the next odd number and the edge subdivided as if that were the tessellation factor. Of course, with either scheme, there is a small ...

  • Page - 345

Controlling the Winding Order

In Chapter 3, "Following the Pipeline," we introduced culling and explained how the winding order of a primitive affects how OpenGL decides whether to render it. Normally, the winding order of a primitive is determined by the order in which your application presents vertices to OpenGL. However, when tessellation is active, OpenGL generates all of the vertices and primitives for you, so the winding order is instead controlled with the cw or ccw input layout qualifier in the tessellation evaluation shader.

  • Page - 346

The tessellation control shader processes this group of control points and produces a new group of control points that may or may not have the same number of elements in it as the original group. The tessellation control shader actually runs once for each control point in the output group, but each invocation of the tessellation control shader has access to all of the input control points in the patch.

  • Page - 347

As the outputs of the tessellation control shader are arrays, the inputs to the tessellation evaluation shader are similarly sized arrays. The tessellation evaluation shader runs once per generated vertex and, like the tessellation control shader, has access to all of the data for all of the vertices in the patch. In addition to the per-vertex data passed from the tessellation control shader, per-patch data can be passed using variables declared with the patch qualifier.

  • Page - 348

GL_PATCH_VERTICES. In this case, the input to the tessellation evaluation shader comes directly from the vertex shader. That is, the input to the tessellation evaluation shader is an array formed from the outputs of the vertex shader invocations that generated the patch.

Communication between Shader Invocations

Although the primary purpose of output variables in tessellation control shaders is to pass data to the tessellation evaluation shader, they can also be used to communicate between the control shader invocations for a patch.

  • Page - 349

Tessellation Example — Terrain Rendering

To demonstrate a potential use for tessellation, we will cover a simple terrain rendering system based on quadrilateral patches and displacement mapping. The code for this example is part of the dispmap sample. A displacement map is a texture that contains the displacement from a surface at each location. Each patch represents a small region of the terrain.

  • Page - 350

shader is shown in Listing 8.8. The shader uses the instance number (stored in gl_InstanceID) to calculate an offset for the patch, which is a one-unit square in the xz plane, centered on the origin. In this application, we will render a grid of 64 × 64 patches, and so the x and y offsets for the patch are calculated by taking gl_InstanceID modulo 64 and gl_InstanceID divided by 64, respectively.
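In GLSL, that calculation can be as simple as the following sketch (the bit operations are an equivalent form of remainder and division for a 64-wide grid):

    // Per-instance offset of this patch within the 64 x 64 grid
    int x = gl_InstanceID & 63;    // gl_InstanceID % 64
    int y = gl_InstanceID >> 6;    // gl_InstanceID / 64

    vec2 offs = vec2(x, y);        // used to translate the patch in the xz plane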

  • Page - 351

minimum of the outer tessellation factors calculated from the edge lengths in the horizontal or vertical directions. You may also have noticed a piece of code in Listing 8.9 that checks whether all of the z coordinates of the projected control points are less than zero and then sets the outer tessellation levels to zero if this happens. This is an optimization that culls patches lying behind the viewer before any tessellation work is done for them.

  • Page - 352

        gl_TessLevelOuter[1] = l1;
        gl_TessLevelOuter[2] = l2;
        gl_TessLevelOuter[3] = l3;
        gl_TessLevelInner[0] = min(l1, l3);
        gl_TessLevelInner[1] = min(l0, l2);
        }
    }
    gl_out[gl_InvocationID].gl_Position = gl_in[gl_InvocationID].gl_Position;
    tcs_out[gl_InvocationID].tc = tcs_in[gl_InvocationID].tc;
}

Listing 8.9: Tessellation control shader for terrain rendering

Once the tessellation control shader has calculated the tessellation ...

  • Page - 353

The tessellation evaluation shader shown in Listing 8.10 first calculates the texture coordinate of the generated vertex by linearly interpolating the texture coordinates passed from the tessellation control shader of Listing 8.9 (which were in turn generated by the vertex shader of Listing 8.8). It then applies a similar interpolation to the incoming control point positions to produce the position of the generated vertex.

  • Page - 354

Figure 8.12: Terrain rendered using tessellation

Figure 8.13: Tessellated terrain in wireframe

to increase the number of polygons in the scene. This is a type of brute-force, data-driven approach to geometric complexity. In the cubicbezier example described here, we will use math to drive geometry — we're going

  • Page - 355

to render a cubic Bézier patch. If you look back to Chapter 4, you'll see that we've covered all the number crunching we'll need here. A cubic Bézier patch is a type of higher order surface and is defined by a number of control points that provide input to a number of interpolation functions that define the surface's shape. A Bézier patch has 16 control points, laid out in a 4 × 4 grid.

  • Page - 356

Once our control points are in view space, they are passed to our tessellation control shader. In a more advanced algorithm, we could project the control points into screen space, determine the length of the curve, and set the tessellation factors appropriately. However, in this example, we'll settle for a simple fixed tessellation factor. As in previous examples, we set the ...

  • Page - 357

out TES_OUT
{
    vec3 N;
} tes_out;

vec4 quadratic_bezier(vec4 A, vec4 B, vec4 C, float t)
{
    vec4 D = mix(A, B, t);
    vec4 E = mix(B, C, t);

    return mix(D, E, t);
}

vec4 cubic_bezier(vec4 A, vec4 B, vec4 C, vec4 D, float t)
{
    vec4 E = mix(A, B, t);
    vec4 F = mix(B, C, t);
    vec4 G = mix(C, D, t);

    return quadratic_bezier(E, F, G, t);
}

vec4 evaluate_patch(vec2 at)
{
    vec4 P[4];
    ...

  • Page - 358

vectors that lie on the patch and then taking their cross product. This is passed to the fragment shader shown in Listing 8.15.

#version 430 core

out vec4 color;

in TES_OUT
{
    vec3 N;
} fs_in;

void main(void)
{
    vec3 N = normalize(fs_in.N);

    vec4 c = vec4(1.0, -1.0, 0.0, 0.0) * N.z + vec4(0.0, 0.0, 0.0, 1.0);

    color = clamp(c, vec4(0.0), vec4(1.0));
}

Listing 8.15: Cubic Bézier patch fragment shader

  • Page - 359

Because the rendered patch shown in Figure 8.14 is smooth, it is hard to see the tessellation that has been applied to the shape. The left of Figure 8.15 shows a wireframe representation of the tessellated patch, and the right side of Figure 8.15 shows the patch's control points and the control cage, which is formed by creating a grid of lines between the control points.

  • Page - 360

shader become the inputs to the geometry shader, and the outputs of the geometry shader are what are interpolated and fed to the fragment shader. The geometry shader can further process the output of the vertex or tessellation evaluation shader, and if it is generating new primitives (this is called amplification), it can apply different transformations to each primitive as it creates them.

  • Page - 361

triangle_strip for the output. Other primitive types, along with the layout qualifier, are covered later. For the geometry shader's output, not only do we specify the primitive type, but also the maximum number of vertices expected to be generated by the shader (through the max_vertices qualifier). This shader produces individual triangles (generated as very short triangle strips), so we set max_vertices to 3.

  • Page - 362

unconnected triangle strips (remember, geometry shaders can create new geometry or amplify existing geometry), we could call EndPrimitive() between each one to mark their boundaries. If you don't call EndPrimitive() somewhere in your shader, the primitive is automatically ended when the shader ends.

Using Geometry Shaders in an Application

Geometry shaders, like the other shader types, are created by calling glCreateShader() with GL_GEOMETRY_SHADER as the shader type.

  • Page - 363

When tessellation is active, the mode you use in your drawing commands should always be GL_PATCHES, and OpenGL will convert the patches into points, lines, or triangles during the tessellation process. In this case, the input primitive mode of the geometry shader should match the tessellation primitive mode. The input primitive type is specified in the body of the geometry shader using an input layout qualifier.

  • Page - 364

The corresponding input to the geometry shader would be

in vec4 color[];
in vec3 normal[];

Notice that both the color and normal varyings have become arrays in the geometry shader. If you have a large amount of data to pass from the vertex to the geometry shader, it can be convenient to wrap per-vertex information passed from the vertex shader to the geometry shader into an interface block.

  • Page - 365

You also need to specify the primitive type that will be generated by the geometry shader. Again, this is determined using a layout qualifier, like so:

layout (primitive_type) out;

This is similar to the input primitive type layout qualifier, the only difference being that you are declaring the output of the shader using the out keyword. The allowable output primitive types are points, line_strip, and triangle_strip.

  • Page - 366

EmitVertex() tells the geometry shader that you've finished filling in all of the information for this vertex. Setting up the vertex works much like the vertex shader. You need to write into the built-in variable gl_Position. This sets the clip-space coordinates of the vertex that is produced by the geometry shader, just like in a vertex shader. Any other attributes that you want to pass along must be written to their output variables before EmitVertex() is called.

  • Page - 367

EmitVertex() and EndPrimitive() allow you to programmatically append new vertices to your triangle or line strip and to start new strips. You can call them as many times as you want (until you reach the maximum defined by your implementation). You're also allowed to not call them at all. This allows you to clip geometry away and discard primitives. If your geometry shader exits without emitting any vertices, no output primitive is produced.

  • Page - 368

world space. Assuming we have the model-view matrix in a uniform, simply multiply the normal by this matrix. To be more accurate, we should multiply the vector by the inverse of the transpose of the upper-left 3 × 3 submatrix of the model-view matrix. This is known as the normal matrix, and you're free to implement this and put it in its own uniform if you like.
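Computing it on the fly in GLSL is a one-liner, as in the sketch below (fine for illustration, though inverse() per vertex is wasteful; in practice you would compute the matrix once on the CPU and upload it as its own uniform):

    #version 430 core

    layout (location = 0) in vec4 position;
    layout (location = 1) in vec3 normal;

    uniform mat4 mv_matrix;
    uniform mat4 proj_matrix;

    out vec3 vs_normal;

    void main(void)
    {
        // Normal matrix: inverse transpose of the upper-left 3x3
        // submatrix of the model-view matrix
        mat3 normal_matrix = transpose(inverse(mat3(mv_matrix)));

        vs_normal = normalize(normal_matrix * normal);
        gl_Position = proj_matrix * mv_matrix * position;
    }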

  • Page - 369

geometry shader is a triangle strip, our strips only contain a single triangle. Therefore, there doesn't strictly need to be a call to EndPrimitive(). We just leave it there for completeness. Figure 8.16 shows the result of this shader.

Figure 8.16: Geometry culled from different viewpoints

In Figure 8.16, the virtual viewer has been moved to different positions. As you can ...

  • Page - 370

from a glDrawArrays() or a glDrawElements() function call, or whether the primitive type was GL_TRIANGLES, GL_TRIANGLE_STRIP, or GL_TRIANGLE_FAN. Unless the geometry shader outputs more than three vertices, the result is independent, unconnected triangles. In this next example, we "explode" a model by pushing all of the triangles out along their face normals. It doesn't matter whether the triangles in the original model share vertices; the geometry shader processes each triangle independently.

  • Page - 371

Figure 8.17: Exploding a model using the geometry shader

Generating Geometry in the Geometry Shader

Just as you are not required to call EmitVertex() or EndPrimitive() at all if you don't want to produce any output from the geometry shader, it is also possible to call EmitVertex() and EndPrimitive() as many times as you need to produce new geometry. That is, until you reach the limit declared through the max_vertices layout qualifier.

  • Page - 372

#version 330

in vec4 position;

void main(void)
{
    gl_Position = position;
}

Listing 8.25: Pass-through vertex shader

This shader only passes the vertex position to the geometry shader. If you have other attributes associated with the vertices, such as texture coordinates or normals, you need to pass them through the vertex shader to the geometry shader as well. As in the previous ...

  • Page - 373

// Find a scaled version of their midpoints
vec3 d = (a + b) * stretch;
vec3 e = (b + c) * stretch;
vec3 f = (c + a) * stretch;

// Now, scale the original vertices by an inverse of the midpoint
// scale
a *= (2.0 - stretch);
b *= (2.0 - stretch);
c *= (2.0 - stretch);

Listing 8.27: Generating new vertices in a geometry shader

Because we are going to generate several triangles using ...

  • Page - 374

Figure 8.18: Basic tessellation using the geometry shader

Note that using the geometry shader for heavy tessellation may not produce the most optimal performance. If something more complex than that shown in this example is desired, it's best to use the hardware tessellation functions of OpenGL. However, if simple amplification of between two and four output primitives for each ...

  • Page - 375

For our geometry shader, in addition to the members of the gl_in structure, we need the per-vertex normal, and that will have to be passed through the vertex shader. An updated version of the pass-through vertex shader from Listing 8.25 is given in Listing 8.30.

#version 330

in vec4 position;
in vec3 normal;

out Vertex
{
    vec3 normal;
} vertex;

void main(void)
{
    gl_Position = ...

  • Page - 376

// Uniform to hold the model-view-projection matrix
uniform mat4 mvp;

// Uniform to store the length of the visualized normals
uniform float normal_length;

Listing 8.31: Setting up the "normal visualizer" geometry shader

Each input vertex is transformed into its final position and emitted from the geometry shader, and then a second vertex is produced by displacing the input vertex ...

  • Page - 377

gl_Position = mvp * tri_centroid;
gs_out.normal = gs_in[0].normal;
gs_out.color = gs_in[0].color;
EmitVertex();

gl_Position = mvp * (tri_centroid +
                     vec4(face_normal * normal_length, 0.0));
gs_out.normal = gs_in[0].normal;
gs_out.color = gs_in[0].color;
EmitVertex();

EndPrimitive();

Listing 8.33: Drawing a face normal in the geometry shader

Now when we render a model, we get the image shown in Figure ...

  • Page - 378

major limitations when multiple output streams are used in a geometry shader: First, the output primitive mode from the geometry shader for all streams must be set to points. Second, although it's possible to simultaneously render geometry and store data into transform feedback buffers, only the first stream may be rendered — the others are for storage only. If your ...

  • Page - 379

GL_TRIANGLES_ADJACENCY, and GL_TRIANGLE_STRIP_ADJACENCY. These primitive types are really only useful when rendering with a geometry shader active. When the new adjacency primitive types are used, for each line or triangle passed into the geometry shader, it not only has access to the vertices defining that primitive, but it also has access to the vertices of the primitive that is ...

  • Page - 380

[Figure 8.20: Lines produced using lines with adjacency primitives.]

vertices. This means that the inputs to the geometry shader are six-element arrays. As before, you can do anything you want to the vertices using the geometry shader; GL_TRIANGLES_ADJACENCY is a good way to get arbitrary six-vertex primitives into the geometry shader. Figure 8.21 shows this.

Figure 8.21: ...

  • Page - 381

Figure 8.22: Triangles produced using GL_TRIANGLE_STRIP_ADJACENCY

Figure 8.23: Ordering of vertices for GL_TRIANGLE_STRIP_ADJACENCY

Rendering Quads Using a Geometry Shader

In computer graphics, the word quad is used to describe a quadrilateral – a shape with four sides. Modern graphics APIs do not support rendering quads directly, primarily because modern graphics hardware does not support ...

  • Page - 382

In many cases, breaking a quad into a pair of triangles works out just fine, and the visual image isn't much different than what would have been rendered had native support for quads been present. However, there is a large class of cases where breaking a quad into a pair of triangles doesn't produce the correct result. Take a look at Figure 8.24.

Figure 8.24: ...

  • Page - 383

Next, we need to deal with the rasterizer. Recall that the output of the geometry shader can only be points, lines, or triangles, and so the best we can do is to break each quad (represented by a lines_adjacency primitive) into a pair of triangles. You might think this leaves us in the same spot as we were before. However, we now have the advantage that we can pass ...

  • Page - 384

any interpolant will move smoothly between vertex A and B and between C and D with the x component of the vector. Likewise, a value along the edge AB will move smoothly to the corresponding value on edge CD. Thus, given the values of the attributes at the vertices A through D, we can use the domain parameter to interpolate a value of each attribute at any point inside ...

  • Page - 385

EndPrimitive();

gl_Position = gl_in[0].gl_Position;
gs_out.uv = vec2(0.0, 0.0);
EmitVertex();

gl_Position = gl_in[2].gl_Position;
gs_out.uv = vec2(1.0, 1.0);
EmitVertex();

gl_Position = gl_in[3].gl_Position;
gs_out.uv = vec2(0.0, 1.0);

// Again, only write the output color for the last vertex
gs_out.color[0] = gs_in[1].color;
gs_out.color[1] = gs_in[0].color;
gs_out.color[2] = gs_in[2].color;
gs_out.color[3] = ...

  • Page - 386

Figure 8.26: Quad rendered using a geometry shader

multiple virtual windows within a single larger framebuffer. Furthermore, OpenGL also allows you to use multiple viewports at the same time. This feature is known as viewport arrays. To use a viewport array, we first need to tell OpenGL what the bounds of the viewports we want to use are. To do this, call glViewportIndexedf() ...

  • Page - 387

Likewise, each viewport also has its own depth range, which can be specified by calling glDepthRangeIndexed(), whose prototype is

void glDepthRangeIndexed(GLuint index,
                         GLdouble n,
                         GLdouble f);

Again, index may be between 0 and 15. In fact, glViewport() really sets the extent of all of the viewports to the same range, and glDepthRange() sets the depth range of all viewports to the same values.
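As a quick sketch, splitting a window into four equal viewports might look like this (the width and height variables are assumed to hold the window dimensions):

    float w = (float)width * 0.5f;
    float h = (float)height * 0.5f;

    // Viewports 0..3: lower-left, lower-right, upper-left, upper-right
    glViewportIndexedf(0, 0.0f, 0.0f, w, h);
    glViewportIndexedf(1,    w, 0.0f, w, h);
    glViewportIndexedf(2, 0.0f,    h, w, h);
    glViewportIndexedf(3,    w,    h, w, h);

    // All four share the default depth range
    for (GLuint i = 0; i < 4; i++)
        glDepthRangeIndexed(i, 0.0, 1.0);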

  • Page - 388

void main(void)
{
    for (int i = 0; i < gl_in.length(); i++)
    {
        gs_out.color = gs_in[i].color;
        gl_Position = mvp_matrix[gl_InvocationID] *
                      gl_in[i].gl_Position;
        gl_ViewportIndex = gl_InvocationID;
        EmitVertex();
    }
    EndPrimitive();
}

Listing 8.36: Rendering to multiple viewports in a geometry shader

When the shader of Listing 8.36 executes, it produces four invocations of the shader. On each invocation, it sets ...

  • Page - 389

You can clearly see the four copies of the cube rendered by Listing 8.36 in Figure 8.27. Because each was rendered into its own viewport, it is clipped separately, and so where the cubes extend past the edges of their respective viewports, their corners are cut off by OpenGL's clipping stage.

Summary

In this chapter, you have read about the two tessellation shader stages, ...

  • Page - 390

Chapter 9
Fragment Processing and the Framebuffer

WHAT YOU'LL LEARN IN THIS CHAPTER

• How data is passed into fragment shaders, how to control the way it's sent there, and what to do with it once it gets there
• How to create your own framebuffers and control the format of data that they store
• How to produce more than just one output from a single fragment shader

  • Page - 391

Fragment Shaders

You have already been introduced to the fragment shader stage. It is the stage in the pipeline where your shader code determines the color of each fragment before it is sent for composition into the framebuffer. The fragment shader runs once per fragment, where a fragment is a virtual element of processing that might end up contributing to the final color of a pixel.

  • Page - 392

flat in vec4 foo;
flat in int bar;
flat in mat3 baz;

You can also apply interpolation qualifiers to input blocks, which is where the smooth qualifier comes in handy. Interpolation qualifiers applied to blocks are inherited by their members — that is, they are applied automatically to all members of the block. However, it's possible to apply a different qualifier to individual members of a block.

  • Page - 393

Interpolating without Perspective Correction

As you have learned, OpenGL interpolates the values of fragment shader inputs across the face of primitives, such as triangles, and presents a new value to each invocation of the fragment shader. By default, the interpolation is performed smoothly in the space of the primitive being rendered. That means that if you were to look at the ...

  • Page - 394

Figure 9.1: Contrasting perspective-correct and linear interpolation

The top image of Figure 9.1 shows perspective-correct interpolation applied to a pair of triangles as its angle to the viewer changes. Meanwhile, the bottom image of Figure 9.1 shows how the noperspective storage qualifier has affected the interpolation of texture coordinates. As the pair of triangles moves to a more ...

  • Page - 395

Unlike the viewport, geometry is not clipped directly against the scissor rectangle; rather, individual fragments are tested against the rectangle as part of post-rasterization processing. As with viewport rectangles, OpenGL supports an array of scissor rectangles. To set them up, you can call glScissorIndexed() or glScissorIndexedv(), whose prototypes are

void glScissorIndexed(GLuint index,
                      ...

  • Page - 396

int scissor_width = (7 * info.windowWidth) / 16;
int scissor_height = (7 * info.windowHeight) / 16;

// Four rectangles - lower left first...
glScissorIndexed(0,
                 0, 0,
                 scissor_width, scissor_height);

// Lower right...
glScissorIndexed(1,
                 info.windowWidth - scissor_width, 0,
                 scissor_width, scissor_height);

// Upper left...
glScissorIndexed(2,
                 0, info.windowHeight - scissor_height,
                 ...

  • Page - 397

also lead to errors if you leave the scissor test enabled at the end of a frame and then try to clear the framebuffer ready for the next frame.

Stencil Testing

The next step in the fragment pipeline is the stencil test. Think of the stencil test as cutting out a shape in cardboard and then using that cutout to spray-paint the shape on a mural. The spray paint only reaches the wall where the cardboard has been cut away.

  • Page - 398

buffer are compared. In pseudo-code, the operation of the stencil test is effectively implemented as

GLuint current = GetCurrentStencilContent(x, y);
if (compare(current & mask, ref & mask,
            front_facing ? front_func : back_func))
{
    passed = true;
}
else
{
    passed = false;
}

Table 9.1: Stencil Functions

Function      Pass Condition
GL_NEVER      Never pass test.
GL_ALWAYS     Always pass test.
GL_LESS       Reference ...

  • Page - 399

drawBuffer set to 0, and value pointing to a variable containing zero. Next, a window border is drawn that may contain details such as a player's score and statistics. Set up the stencil test to always pass with the reference value being 1 by calling glStencilFuncSeparate(). Next, tell OpenGL to replace the value in the stencil buffer only when the depth test passes by calling glStencilOpSeparate().
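In code, that setup might look like the following sketch (GL_FRONT_AND_BACK keeps both faces consistent, and the mask of ~0 leaves all stencil bits enabled):

    // Pass the stencil test unconditionally, with reference value 1
    glEnable(GL_STENCIL_TEST);
    glStencilFuncSeparate(GL_FRONT_AND_BACK, GL_ALWAYS, 1, ~0);

    // Write 1 into the stencil buffer wherever the border is drawn,
    // but only where the depth test also passes
    glStencilOpSeparate(GL_FRONT_AND_BACK, GL_KEEP, GL_KEEP, GL_REPLACE);

    // ... draw the border geometry here ...

    // Then draw the scene only where the stencil buffer is still 0
    glStencilFuncSeparate(GL_FRONT_AND_BACK, GL_EQUAL, 0, ~0);
    glStencilOpSeparate(GL_FRONT_AND_BACK, GL_KEEP, GL_KEEP, GL_KEEP);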

  • Page - 400

There are also two other stencil functions: glStencilFunc() and glStencilOp(). These behave just as glStencilFuncSeparate() and glStencilOpSeparate() would if you were to set the face parameter to GL_FRONT_AND_BACK.

Controlling Updates to the Stencil Buffer

By clever manipulation of the stencil operation modes (setting them all to the same value, or judicious use of GL_KEEP, for example), ...

  • Page - 401

the depth buffer. If depth writes are also enabled and the fragment has passed the depth test, the depth buffer is updated with the depth value of the fragment. If the depth test fails, the fragment is discarded and does not pass to the following fragment operations. The input to primitive assembly is a set of vertex positions that make up primitives. Each has a z ...

  • Page - 402

you don't actually need depth testing and only wish to update the depth buffer). The glDepthMask() function takes a Boolean flag that turns writes to the depth buffer on if it's GL_TRUE and off if it's GL_FALSE. For example,

glDepthMask(GL_FALSE);

will turn writes to the depth buffer off, regardless of the result of the depth test. You can use this, for example, to draw ...

  • Page - 403

Depth Clamping

OpenGL represents the depth of each fragment as a finite number, scaled between zero and one. A fragment with a depth of zero is intersecting the near plane (and would be jabbing you in the eye if it were real), and a fragment with a depth of one is at the farthest representable depth but not infinitely far away. To eliminate the far plane and draw ...

  • Page - 404

into the depth buffer. Figure 9.4 shows how this translates to a real application.

Figure 9.4: A clipped object with and without depth clamping

In the left image of Figure 9.4, the geometry has become so close to the viewer that it is partially clipped against the near plane. As a result, the portions of the polygons that would have been behind the near plane are simply clipped away, leaving a hole in the object.

  • Page - 405

variable, the interpolated depth generated by OpenGL is used as the fragment's depth value. Your fragment shader can either calculate an entirely new value for gl_FragDepth, or it can derive one from the value of gl_FragCoord.z. This new value is subsequently used by OpenGL both as the reference for the depth test and as the value written to the depth buffer, should the depth test pass.

  • Page - 406

would have been otherwise. In this case, results from the GL_LESS and GL_LEQUAL comparison functions remain valid. Similarly, using depth_greater indicates that your shader will only make the fragment's depth greater than it would have been and, therefore, the results of the GL_GREATER and GL_GEQUAL tests remain valid. The final qualifier, depth_unchanged, is somewhat unique. This tells OpenGL that although your shader may write to gl_FragDepth, the value it writes will not change the outcome of the depth test, and so depth testing may still be performed early.

  • Page - 407

and disabled by calling

glDisable(GL_BLEND);

The blending functionality of OpenGL is powerful and highly configurable. It works by multiplying the source color (the value produced by your shader) by the source factor, then multiplying the color in the framebuffer by the destination factor, and then combining the results of these multiplications using an operation that you can choose ...
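The most common configuration is classic back-to-front alpha blending, which as a sketch looks like this:

    // result = src.rgb * src.a + dst.rgb * (1 - src.a)
    glEnable(GL_BLEND);
    glBlendFunc(GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA);
    glBlendEquation(GL_FUNC_ADD);   // the default combining operation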

  • Page - 408

Table 9.4: Blend Functions

Blend Function            RGB                           Alpha
GL_ZERO                   (0, 0, 0)                     0
GL_ONE                    (1, 1, 1)                     1
GL_SRC_COLOR              (Rs0, Gs0, Bs0)               As0
GL_ONE_MINUS_SRC_COLOR    (1, 1, 1) - (Rs0, Gs0, Bs0)   1 - As0
GL_DST_COLOR              (Rd, Gd, Bd)                  Ad
GL_ONE_MINUS_DST_COLOR    (1, 1, 1) - (Rd, Gd, Bd)      1 - Ad
GL_SRC_ALPHA              (As0, As0, As0)               As0
GL_ONE_MINUS_SRC_ALPHA    (1, 1, 1) - (As0, As0, As0)   1 - As0
GL_DST_ALPHA              (Ad, Ad, Ad)                  Ad
...

  • Page - 409

    GL_SRC1_ALPHA,
    GL_ONE_MINUS_SRC1_ALPHA
};

static const int num_blend_funcs = sizeof(blend_func) /
                                   sizeof(blend_func[0]);
static const float x_scale = 20.0f / float(num_blend_funcs);
static const float y_scale = 16.0f / float(num_blend_funcs);
const float t = (float)currentTime;

glEnable(GL_BLEND);
glBlendColor(0.2f, 0.5f, 0.7f, 0.5f);

for (j = 0; j < num_blend_funcs; j++)
{
    for (i = 0; i < num_blend_funcs; i++)
    {
        vmath::mat4 ...

  • Page - 410

Dual-Source Blending

You may have noticed that some of the factors in Table 9.4 use source 0 colors (Rs0, Gs0, Bs0, and As0), and others use source 1 colors (Rs1, Gs1, Bs1, and As1). Your shaders can export more than one final color for a given color buffer by setting up the outputs used in your shader and assigning them indices using the index layout qualifier. An ...
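The declaration looks like the following sketch: two fragment shader outputs bound to the same color buffer but with different source indices.

    #version 430 core

    // Source 0 color for color attachment 0
    layout (location = 0, index = 0) out vec4 color0;

    // Source 1 color for the same attachment; this is the value
    // consumed by the GL_SRC1_* blend factors
    layout (location = 0, index = 1) out vec4 color1;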

  • Page - 411

Table 9.5: Blend Equations

Equation                  RGB                         Alpha
GL_FUNC_ADD               Srgb * RGBs + Drgb * RGBd   Sa * As + Da * Ad
GL_FUNC_SUBTRACT          Srgb * RGBs - Drgb * RGBd   Sa * As - Da * Ad
GL_FUNC_REVERSE_SUBTRACT  Drgb * RGBd - Srgb * RGBs   Da * Ad - Sa * As
GL_MIN                    min(RGBs, RGBd)             min(As, Ad)
GL_MAX                    max(RGBs, RGBd)             max(As, Ad)

Here, Srgb and Drgb represent the source and destination blend factors, and Sa and Da represent the source and destination alpha blend factors.

  • Page - 412

Table 9.6: Logic Operations

Operation          Result
GL_CLEAR           Set all values to 0
GL_AND             Source & Destination
GL_AND_REVERSE     Source & ~Destination
GL_COPY            Source
GL_AND_INVERTED    ~Source & Destination
GL_NOOP            Destination
GL_XOR             Source ^ Destination
GL_OR              Source | Destination
GL_NOR             ~(Source | Destination)
GL_EQUIV           ~(Source ^ Destination)
GL_INVERT          ~Destination
GL_OR_REVERSE      Source | ~Destination
GL_COPY_INVERTED   ~Source
...

  • Page - 413

the color buffer. You can pass GL_TRUE to one of these parameters to allow writes for the corresponding channel to occur, or GL_FALSE to mask those writes off. The first function, glColorMask(), allows you to mask all buffers currently enabled for rendering, while the second function, glColorMaski(), allows you to set the mask for a specific color buffer (there can be more than one attached to a framebuffer at a time).

  • Page - 414

output of your fragment shader goes into the back buffer, which is normally owned by the operating system or window system that your application is running on, and is eventually displayed to the user. Its parameters are set when you choose a format for the rendering context. As a platform-specific operation, this means that you have little control over what the underlying format of these buffers is.

  • Page - 415

To bind a framebuffer for reading only, set target to GL_READ_FRAMEBUFFER. Likewise, to bind a framebuffer just for rendering to, set target to GL_DRAW_FRAMEBUFFER. The framebuffer bound for drawing will be the destination for all of your rendering (including stencil and depth values used during their respective tests and colors read during blending). The framebuffer bound for reading is the source for pixel read operations such as glReadPixels().

  • Page - 416

to the framebuffer, and level is the mipmap level of the texture you want to render into. Listing 9.4 shows a complete example of setting up a framebuffer object with a depth buffer and a texture to render into.

// Create a framebuffer object and bind it
glGenFramebuffers(1, &fbo);
glBindFramebuffer(GL_FRAMEBUFFER, fbo);

// Create a texture for our color buffer
glGenTextures(1, ...

  • Page - 417

// Set our uniforms and draw the cube.
glUniformMatrix4fv(proj_location, 1, GL_FALSE, proj_matrix);
glUniformMatrix4fv(mv_location, 1, GL_FALSE, mv_matrix);

glDrawArrays(GL_TRIANGLES, 0, 36);

// Now return to the default framebuffer
glBindFramebuffer(GL_FRAMEBUFFER, 0);

// Reset our viewport to the window width and height, clear the
// depth and color buffers.
glViewport(0, 0, info.windowWidth, ...

  • Page - 418

Figure 9.6: Result of rendering into a texture

Another extremely useful feature of user-defined framebuffers is that they support multiple attachments. That is, you can attach multiple textures to a single framebuffer and render into them simultaneously with a single fragment shader. Recall that to attach your texture to your FBO, you called glFramebufferTexture() and passed GL_COLOR_ATTACHMENT0 as the attachment point.

  • Page - 419

glBindTexture(GL_TEXTURE_2D, color_texture[i]);
glTexStorage2D(GL_TEXTURE_2D, 9, GL_RGBA8, 512, 512);

// Set its default filter parameters
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);

// Attach it to our framebuffer object as color attachments
glFramebufferTexture(GL_FRAMEBUFFER, draw_buffers[i],
                     color_texture[i], 0);
}

// Now create a depth ...

  • Page - 420

of layers that you can index into in a shader. It's also possible to render into array textures by attaching them to a framebuffer object and using a geometry shader to specify which layer you want the resulting primitives to be rendered into. Listing 9.8 is taken from the gslayered sample and illustrates how to set up a framebuffer object that uses a 2D array texture as its color attachment.

  • Page - 421

each with a different model-view matrix, into an array texture and passes a per-invocation color along to the fragment shader.

#version 430 core

// 16 invocations of the geometry shader, triangles in
// and triangles out
layout (invocations = 16, triangles) in;
layout (triangle_strip, max_vertices = 3) out;

in VS_OUT
{
    vec4 color;
    vec3 normal;
} gs_in[];

out GS_OUT
{
    vec4 color;
    vec3 normal;
    ...

  • Page - 422

The result of running the geometry shader shown in Listing 9.9 is that we have an array texture with a different view of a model in each slice. Obviously, we can't directly display the contents of an array texture, so we must now use our texture as the source of data in another shader. The vertex shader in Listing 9.10, along with the corresponding fragment shader in Listing 9.11, ...

  • Page - 423

of quads. Finally, it also produces a texture coordinate using the x and y components of the vertex along with the instance index as the third component. Because we will use this to fetch from an array texture, this third component will select the layer. The fragment shader in Listing 9.11 simply reads from the array texture using the supplied texture coordinates and sends ...

  • Page - 424

The glFramebufferTextureLayer() function works just like glFramebufferTexture(), except that it takes one additional parameter, layer, which specifies the layer of the texture that you wish to attach to the framebuffer. For instance, the code in Listing 9.12 creates a 2D array texture with eight layers and attaches each of the layers to the corresponding color attachment of a framebuffer object.

  • Page - 425

you write 0 into gl_Layer in your geometry shader, rendering will go to the positive x face of the cube map. Writing 1 into gl_Layer sends output to the negative x face, writing 2 sends output to the positive y face, and so on, until eventually, writing 5 sends output to the negative z face. If you create a cube map array texture and attach it to a framebuffer object, ...

  • Page - 426

There are two categories of completeness: attachment completeness and whole framebuffer completeness.

Attachment Completeness

Each attachment point of an FBO must meet certain criteria to be considered complete. If any attachment point is incomplete, the whole framebuffer will also be incomplete. Some of the cases that cause an attachment to be incomplete are

• No image is associated with the attachment point.
...

  • Page - 427

Many of these return values are helpful when debugging an application but are less useful after an application has shipped. Nonetheless, the first sample application checks to make sure none of these conditions occurred. It pays to do this check in applications that use FBOs, making sure your use case hasn't hit some implementation-dependent limitation. An example of how this check might be performed is shown in Listing 9.13.

  • Page - 428

    case GL_FRAMEBUFFER_UNSUPPORTED:
        // Reconsider formats used for attached buffers
        break;
    case GL_FRAMEBUFFER_INCOMPLETE_MULTISAMPLE:
        // Make sure the number of samples for each
        // attachment is the same
        break;
    case GL_FRAMEBUFFER_INCOMPLETE_LAYER_TARGETS:
        // Make sure the number of layers for each
        // attachment is the same
        break;
    }
}

Listing 9.13: Checking completeness of a framebuffer object

If you ...

  • Page - 429

doesn't really care about how the image is displayed, only that you wish to render two views of the scene — one for the left eye and one for the right. To display images in stereo requires some cooperation from the windowing or operating system, and therefore the mechanism to create a stereo display is platform specific. The gory details of this are covered for a number of platforms in Chapter 14.

  • Page - 430

The simplest form of stereo view matrix pair simply translates the left and right views away from each other along the horizontal axis. Optionally, you can also rotate the view matrices inwards towards the center of view. Alternatively, you can use the vmath::lookat function to generate your view matrices for you. Simply place your eye at the left eye location (slightly left of ...

  • Page - 431

Clearly, the code in Listing 9.15 renders the entire scene twice. Depending on the complexity of your scene, that could be very, very expensive — literally doubling the cost of rendering the scene. One possible tactic is to switch between the GL_BACK_LEFT and GL_BACK_RIGHT draw buffers between each and every object in your scene. This can mean that updates to state (such as ...

  • Page - 432

    mat4 model_matrix;
    mat4 view_matrix[2];
    mat4 projection_matrix;
};

in VS_OUT
{
    vec4 color;
    vec3 normal;
    vec2 texture_coord;
} gs_in[];

out GS_OUT
{
    vec4 color;
    vec3 normal;
    vec2 texture_coord;
} gs_out;

void main(void)
{
    // Calculate a model-view matrix for the current eye
    mat4 model_view_matrix = view_matrix[gl_InvocationID] *
                             model_matrix;

    for (int i = 0; i < gl_in.length(); i++)
    {
        // Output layer is invocation ...

  • Page - 433

void main(void)
{
    color_left = texture(back_buffer, vec3(tex_coord, 0.0));
    color_right = texture(back_buffer, vec3(tex_coord, 1.0));
}

Listing 9.17: Copying from an array texture to a stereo back buffer

A photograph of this application running is shown in Figure 9.8. A photograph is necessary here because a screenshot would not show both of the images in the stereo pair. However, the double image ...

  • Page - 434

There are two main approaches to dealing with aliasing. The first is filtering, which removes high-frequency content from the signal before or during sampling. The second is increasing the sampling rate, which allows higher frequency content to be recorded. The additional samples captured can then be processed for storage or reproduction. Methods for reducing or eliminating aliasing ...

  • Page - 435

and blending are enabled, but the scene is otherwise unchanged. Notice how the lines appear much smoother and the jagged edges are much reduced. Zooming into the inset, we see that the lines have been blurred slightly. This is the effect of filtering that is produced by calculating the coverage of the lines and using it to blend them with the background color. The code ...

  • Page - 436

half white and half black, producing a mid-gray pixel. Next, our second, adjacent triangle comes along and covers the other half of the pixel. Again, OpenGL figures that half the pixel is covered by the new triangle and mixes the white of the triangle with the existing framebuffer content... except now the framebuffer is 50% gray! Mixing white and 50% gray produces 75% gray, not the solid white we would expect.

  • Page - 437

the default framebuffer when you set up your rendering window. In the sample programs included with this book, the application framework takes care of this for you. To enable multi-sampling with the sb6::application framework, simply override the sb6::application::init() function, call the base class method, and then set the samples member of the info structure to the desired sample count.
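For example (a sketch following the framework conventions used by the book's samples; the member names mirror those used elsewhere in the listings):

    class my_application : public sb6::application
    {
    protected:
        void init()
        {
            sb6::application::init();   // fill in the defaults first
            info.samples = 8;           // then request an 8-sample back buffer
        }
    };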

  • Page - 438

and of course, you can turn it back on again by calling

glEnable(GL_MULTISAMPLE);

When multi-sampling is disabled, OpenGL proceeds as if the framebuffer were a normal single-sample framebuffer and samples each fragment once, the only difference being that the shading results are written to every sample in the pixel.

Multi-sample Textures

You have already learned about how to render ...

  • Page - 439

object being rendered in exactly the same way regardless of where it is in the framebuffer. Once you have allocated storage for your texture, you can attach it to a framebuffer with glFramebufferTexture() as normal. An example of creating a depth and a color multi-sample texture is shown in Listing 9.20.

GLuint color_ms_tex;
GLuint depth_ms_tex;

glGenTextures(1, &color_ms_tex);
...

  • Page - 440

samples contributing to a pixel to produce its final color. However, if you render into a multi-sample texture and then draw a full-screen quad using a fragment shader that samples from that texture and combines its samples with code you supply, then you can implement any algorithm you wish. The example shown in Listing 9.21 demonstrates taking the brightest sample of those ...

  • Page - 441

of 40%, then it will produce an output sample mask of 40% × 66%, which is roughly 25%. Thus, for an 8-sample MSAA buffer, two of that pixel's samples would be written to. Because the alpha value was already used to decide how many subsamples should be written, it wouldn't make sense to then blend those subsamples with the same alpha value. To help prevent these ...

  • Page - 442

coverage information generated by OpenGL during rasterization. The second variable is an output that you can write to in the shader to update coverage. Each bit of each element of the arrays corresponds to a single sample (starting from the least significant bit). If the OpenGL implementation supports more than 32 samples in a single framebuffer, then the first element of the array covers samples 0 through 31, the second element covers samples 32 through 63, and so on.

  • Page - 443

{
    float val = abs(fs_in.tc.x + fs_in.tc.y) * 20.0f;
    color = vec4(fract(val) >= 0.5 ? 1.0 : 0.25);
}

Listing 9.22: Fragment shader producing high-frequency output

This extremely simple shader produces stripes with hard edges (which produce a high-frequency signal). For any given invocation of the shader, the output will either be bright white or dark gray, depending on the incoming ...

  • Page - 444

For example, if you want OpenGL to run your shader for at least half of the samples in the framebuffer, set the value parameter to 0.5f. To uniquely shade every sample hit by the geometry, set value to 1.0f. As you can see from the right image of Figure 9.13, the jaggies on the interior of the cube have been eliminated. We set the minimum sampling fraction to 1.0 ...
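Enabling this looks like the following sketch; note that sample shading must be enabled before the minimum fraction takes effect:

    // Request per-sample execution of the fragment shader
    glEnable(GL_SAMPLE_SHADING);

    // Shade at least half of the samples in each covered pixel...
    glMinSampleShading(0.5f);

    // ...or shade every covered sample individually:
    glMinSampleShading(1.0f);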

  • Page - 445

Figure 9.14: Partially covered multi-sampled pixels

Take a look at the left of Figure 9.14. It shows the edge of a triangle passing through several pixels. The solid dots represent samples that are covered by the triangle, and the clear dots represent those that are not. OpenGL has chosen to interpolate the fragment shader inputs to the sample closest to the pixel's center.

  • Page - 446

Now look at the right side of Figure 9.14. OpenGL has still chosen to interpolate the fragment shader inputs to the samples closest to the pixel centers for fully covered pixels. However, for those pixels that are partially covered, it has instead chosen another sample that lies within the triangle (marked with larger arrows). This means that the inputs presented to the ...

  • Page - 447

storage qualifier was not used. You can use this knowledge to your advantage. To extract edge information from this, declare two inputs to your fragment shader, one with and one without the centroid storage qualifier, and assign the same value to each of them in the vertex shader. It doesn't matter what the values are, so long as they are different for each vertex. The ...

  • Page - 448

ones wherever there was an edge in the scene and zeros wherever there was no edge. Later, you can render a full-screen quad with an expensive fragment shader that only runs for pixels that represent the edges of geometry (where a sample would have been chosen that was outside the triangle) by enabling the stencil test, setting the stencil function to GL_EQUAL, and leaving ...

  • Page - 449

Normally, when a framebuffer object has one or more attachments, it derives its maximum width and height, layer count, and sample count from those attachments. These properties define the size to which the viewport will be clamped and so on. When a framebuffer object has no attachments, limits imposed by the amount of memory available for textures, for example, are removed.


// Generate a framebuffer name and bind it.
GLuint fbo;
glGenFramebuffers(1, &fbo);
glBindFramebuffer(GL_FRAMEBUFFER, fbo);

// Set the default width and height to 10000
glFramebufferParameteri(GL_FRAMEBUFFER, GL_FRAMEBUFFER_DEFAULT_WIDTH, 10000);
glFramebufferParameteri(GL_FRAMEBUFFER, GL_FRAMEBUFFER_DEFAULT_HEIGHT, 10000);

Listing 9.23: A 100-megapixel virtual framebuffer

If you render with the framebuffer object created in Listing 9.23…


    monitors or displays6 that can understand and display floating-point data, you are still limited by the final output device. That doesn’t mean floating-point rendering isn’t useful though. Quite the contrary! You can still render to textures in full floating-point precision. Not only that, but you have complete control over how floating-point data gets mapped to a fixed output read more..


    As you can see, there are 16- and 32-bit floating-point formats with one, two, three, and four channels. There is also a special format, GL_R11F_G11F_B10F , that contains two 11-bit floating-point components and one 10-bit component, packed together in a single 32-bit word. These are special, unsigned floating-point formats7 with a 5-bit exponent and a 6-bit mantissa in the 11-bit read more..


    images are also shown in Color Plate 2). The top left image is rendered at a very low exposure and shows all of the detail of lights even though they are very bright. The top right image increases the exposure such that you start to see details in the ribbon. On the bottom left, the exposure is increased to the level that you can see details in the pine cones, read more..


The first sample program, hdrtonemap, uses three approaches to map the high-definition output to the low-definition screen. The first method, enabled by pressing the 1 key, is a simple and naïve direct texturing of the floating-point image to the screen. The histogram of the HDR image in Figure 9.15 is shown in Figure 9.16. From the graph, it is clear that while most…


Figure 9.17: Naïve tone mapping by clamping

#version 430 core

layout (binding = 0) uniform sampler2D hdr_image;

uniform float exposure = 1.0;

out vec4 color;

void main(void)
{
    vec4 c = texelFetch(hdr_image, ivec2(gl_FragCoord.xy), 0);
    c.rgb = vec3(1.0) - exp(-c.rgb * exposure);
    color = c;
}

Listing 9.24: Applying simple exposure coefficient to an HDR image

In the sample application, you…


current texel. All of the surrounding samples are then converted to luminance values, which are then weighted and added together. The sample program uses a non-linear function to convert the luminance to an exposure. In this example, the default curve is defined by the function y = sqrt(8.0 / (x + 0.25)). The shape of the curve is shown in Figure 9.18.


for (i = 0; i < 25; i++)
{
    vec2 tc = (2.0 * gl_FragCoord.xy +
               3.5 * vec2(i % 5 - 2, i / 5 - 2));
    vec3 col = texture(hdr_image, tc * tex_scale).rgb;
    lum[i] = dot(col, vec3(0.3, 0.59, 0.11));
}

// Calculate weighted color of region
vec3 vColor = texelFetch(hdr_image, 2 * ivec2(gl_FragCoord.xy), 0).rgb;

float kernelLuminance = (
    (1.0 * (lum[0] + lum[4] + lum[20] + lum[24])) +
    (4.0 * (lum[1] + lum[3] +…


Figure 9.19: Result of adaptive tone mapping program

Making Your Scene Bloom

One of the effects that works very well with high dynamic range images is the bloom effect. Have you ever noticed how the sun or a bright light can sometimes engulf tree branches or other objects between you and the light source? That’s called light bloom. Figure 9.20 shows how light bloom can…


    Notice how you can see all the detail in the lower exposure of the left side of Figure 9.20. The right side is a much higher exposure, and the grid in the stained glass is covered by the light bloom. Even the wooden post on the bottom right looks smaller as it gets covered by bloom. By adding bloom to a scene you can enhance the sense of brightness in certain read more..


layout (binding = 1, std140) uniform MATERIAL_BLOCK
{
    material_t material[32];
} materials;

void main(void)
{
    // Normalize the incoming N, L, and V vectors
    vec3 N = normalize(fs_in.N);
    vec3 L = normalize(fs_in.L);
    vec3 V = normalize(fs_in.V);

    // Calculate R locally
    vec3 R = reflect(-L, N);

    material_t m = materials.material[fs_in.material_index];

    // Compute the diffuse and specular components for…


    Figure 9.21: Original and thresholded output for bloom example each dimension, sampling from the 25 samples around the center of the filter and multiplying each texel by a fixed set of weights. To apply a separable filter, we make two passes. In the first pass, we filter in the horizontal dimension. However, you may notice that we use gl_FragCoord.yx to determine the center read more..


                             0.0043538453346397, 0.0024499299678342);

void main(void)
{
    vec4 c = vec4(0.0);
    ivec2 P = ivec2(gl_FragCoord.yx) - ivec2(0, weights.length() >> 1);
    int i;

    for (i = 0; i < weights.length(); i++)
    {
        c += texelFetch(hdr_image, P + ivec2(0, i), 0) * weights[i];
    }

    color = c;
}

Listing 9.27: Blur fragment shader

The result of applying blur to the thresholded image shown on the right of…


The exposure shader shown in Listing 9.28 is used to draw a screen-sized textured quad to the window. That’s it! Dial the bloom effect up and down to your heart’s content. Figure 9.23 shows the hdrbloom sample program with a high bloom level.

#version 430 core

layout (binding = 0) uniform sampler2D hdr_image;
layout (binding = 1) uniform sampler2D bloom_image;

uniform float exposure…


    Integer Framebuffers By default, the window system will provide your application with a fixed-point back buffer. When you declare a floating-point output from your fragment shader (such as a vec4 ), OpenGL will convert the data you write into it into a fixed-point representation suitable for storage in that framebuffer. In the previous section we covered floating-point framebuffer read more..


    GL_FRAMEBUFFER_ATTACHMENT_COMPONENT_TYPE . The value returned in params will be GL_FLOAT , GL_INT , GL_UNSIGNED_INT , GL_SIGNED_NORMALIZED ,or GL_UNSIGNED_NORMALIZED depending on the internal format of the color attachments. There is no requirement that the attachments to a framebuffer object all be of the same type. This means that you can have a combination of attachments, some of which are read more..
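For example, a minimal sketch of querying the first color attachment of the currently bound draw framebuffer:

GLint component_type;
glGetFramebufferAttachmentParameteriv(GL_DRAW_FRAMEBUFFER,
                                      GL_COLOR_ATTACHMENT0,
                                      GL_FRAMEBUFFER_ATTACHMENT_COMPONENT_TYPE,
                                      &component_type);
if (component_type == GL_INT || component_type == GL_UNSIGNED_INT)
{
    // This attachment has an integer internal format, so the fragment
    // shader should declare ivec4 or uvec4 outputs rather than vec4.
}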


To make matters worse, γ didn’t always take the same value. For NTSC systems (the television standard used in North America, much of South America, and parts of Asia), γ was about 2.2. However, SECAM and PAL systems (the standards used in Europe, Australia, Africa, and other parts of Asia) used a γ value of 2.8. That means that if you put a voltage of half the…


that some implementations will do this. Figure 9.24 shows the transfer functions of linear to sRGB and sRGB back to linear on the left, and a pair of simple power curves using the powers 2.2 and 0.45454 on the right. You should notice that the shapes of these curves are so close as to be almost indistinguishable.

Figure 9.24: The sRGB transfer functions (left) and the power curves y = x^2.2 and y = x^0.454545 (right)
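For reference, the exact piecewise curves from the sRGB specification can be written as GLSL functions like this (this is the standard definition, not code from the book's sample programs):

float srgb_to_linear(float s)
{
    return (s <= 0.04045) ? s / 12.92
                          : pow((s + 0.055) / 1.055, 2.4);
}

float linear_to_srgb(float l)
{
    return (l <= 0.0031308) ? l * 12.92
                            : 1.055 * pow(l, 1.0 / 2.4) - 0.055;
}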


Point Sprites

The term point sprites is usually used to refer to textured points. OpenGL represents each point by a single vertex, and so there is no opportunity to specify texture coordinates that can be interpolated as there is with the other primitive types. To get around this, OpenGL will generate an interpolated texture coordinate for you with which you can do anything…


Texturing Points

Point sprites are easy to use. On the application side, the only thing you have to do is bind a 2D texture and read from it in your fragment shader using a built-in variable called gl_PointCoord, which is a two-component vector that interpolates the texture coordinates across the point. Listing 9.30 shows the fragment shader for the PointSprites example…
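A minimal fragment shader of that shape (the texture binding point here is an assumption) looks like this:

#version 430 core

layout (binding = 0) uniform sampler2D sprite_texture;

out vec4 color;

void main(void)
{
    // gl_PointCoord sweeps from 0.0 to 1.0 across the point
    color = texture(sprite_texture, gl_PointCoord);
}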


    2D texture so far. Points can also be mipmapped, and because they can range from very small to very large, it’s probably a good idea to do so. Figure 9.26: The star texture map We are not going to cover all of the details of setting up the star field effect, as it’s pretty routine and you can check the source yourself if you want to see how we pick random read more..


We are going to use additive blending to blend our stars with the background. Because the dark area of our texture is black (zero in color space), we can get away with just adding the colors together as we draw. Transparency with alpha would require that we depth-sort our stars, and that is an expense we certainly can do without. After turning on program point size mode…


way, they fade into view instead of just popping up near the far clipping plane. The star color is passed to the fragment shader shown in Listing 9.32, which simply fetches from our star texture and multiplies the result by the computed star color.

#version 430 core

layout (location = 0) out vec4 color;

uniform sampler2D tex_star;
flat in vec4 starColor;

void main(void)
{
    color…


    point sprite. On the left, we see the origin on the upper left of the point sprite, and on the right, we see the origin as the lower left. Figure 9.28: Two potential orientations of textures on a point sprite The default orientation for point sprites is GL_UPPER_LEFT . Setting the GL_POINT_SPRITE_COORD_ORIGIN parameter to GL_LOWER_LEFT places the origin of the texture coordinate read more..
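For example:

// Place the texture coordinate origin at the lower left of each point
// sprite (the default is GL_UPPER_LEFT).
glPointParameteri(GL_POINT_SPRITE_COORD_ORIGIN, GL_LOWER_LEFT);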


Or perhaps an interesting flower shape:

vec2 temp = gl_PointCoord * 2.0 - vec2(1.0);
if (dot(temp, temp) > sin(atan(temp.y, temp.x) * 5.0))
    discard;

These are simple code snippets that allow arbitrarily shaped points to be rendered. Figure 9.29 shows a few more examples of interesting shapes that can be generated this way.

Figure 9.29: Analytically generated point sprite shapes

To…


        if (dot(p, p) > 1.0)
            discard;
    }
    else if (shape == 1)
    {
        // Hollow circle
        if (abs(0.8 - dot(p, p)) > 0.2)
            discard;
    }
    else if (shape == 2)
    {
        // Flower shape
        if (dot(p, p) > sin(atan(p.y, p.x) * 5.0))
            discard;
    }
    else if (shape == 3)
    {
        // Bowtie
        if (abs(p.x) < abs(p.y))
            discard;
    }
}

Listing 9.33: Fragment shader for generating shaped points

The advantage of calculating…


#version 430

uniform sampler2D sprite_texture;

in float angle;

out vec4 color;

void main(void)
{
    const float sin_theta = sin(angle);
    const float cos_theta = cos(angle);
    const mat2 rotation_matrix = mat2(cos_theta, sin_theta,
                                      -sin_theta, cos_theta);
    const vec2 pt = gl_PointCoord - vec2(0.5);
    color = texture(sprite_texture,
                    rotation_matrix * pt + vec2(0.5));
}

Listing 9.34: Naïve rotated point sprite fragment shader


flat in float cos_theta;

out vec4 color;

void main(void)
{
    mat2 m = mat2(cos_theta, sin_theta,
                  -sin_theta, cos_theta);
    const vec2 pt = gl_PointCoord - vec2(0.5);
    color = texture(sprite_texture, m * pt + vec2(0.5));
}

Listing 9.36: Rotated point sprite fragment shader

As you can see, the potentially expensive sin and cos functions have been moved out of the fragment shader and…


Reading from a Framebuffer

To allow you to read pixel data from the framebuffer, OpenGL includes the glReadPixels() function, whose prototype is

void glReadPixels(GLint x, GLint y,
                  GLsizei width, GLsizei height,
                  GLenum format, GLenum type,
                  GLvoid * data);

The glReadPixels() function will read the data from a region of the framebuffer currently bound to the GL_READ_FRAMEBUFFER target, or from…


    attachments, you need to specify which attachment you want to read from, and so you must call glReadBuffer() if you are using your own framebuffer object. When you call glReadPixels() with the format parameter set to GL_DEPTH_COMPONENT , the data read will come from the depth buffer. Likewise, if format is GL_STENCIL_INDEX , then the data comes from the stencil buffer. The special read more..


#pragma pack (push, 1)
struct
{
    unsigned char identsize;   // Size of following ID field
    unsigned char cmaptype;    // Color map type 0 = none
    unsigned char imagetype;   // Image type 2 = rgb
    short cmapstart;           // First entry in palette
    short cmapsize;            // Number of entries in palette
    unsigned char cmapbpp;     // Number of bits per palette entry
    short xorigin;             // X origin
    short yorigin;             // Y…


to direct, efficient bit-level data/memory copies. There are many theories of the origin of this term, but the most likely candidates are Bit-Level-Image-Transfer or Block-Transfer. Whatever the etymology of blit may be, the action is the same. Performing these copies is simple; the function looks like this:

void glBlitFramebuffer(GLint srcX0, GLint srcY0,
                       GLint srcX1, GLint srcY1,
                       GLint dstX0, GLint dstY0,
                       GLint dstX1, GLint dstY1,
                       GLbitfield mask, GLenum filter);
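As a concrete sketch (the framebuffer names readFBO and drawFBO and the 800 × 600 attachment size are assumptions for illustration):

glBindFramebuffer(GL_READ_FRAMEBUFFER, readFBO);
glBindFramebuffer(GL_DRAW_FRAMEBUFFER, drawFBO);
glReadBuffer(GL_COLOR_ATTACHMENT0);
glDrawBuffer(GL_COLOR_ATTACHMENT0);

glBlitFramebuffer(0, 0, 800, 600,     // the entire 800 x 600 source
                  0, 120, 640, 600,   // a 640 x 480 region, upper left
                  GL_COLOR_BUFFER_BIT, GL_LINEAR);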


Assume the width and height of the attachments of the FBO bound in the preceding code are 800 and 600. This code creates a copy of the whole of the first color attachment of readFBO, scales it down to 80% of the total size, and places it in the upper-left corner of the first color attachment of drawFBO.

Copying Data into a Texture

As you read in the last…


void glCopyImageSubData(GLuint srcName,
                        GLenum srcTarget,
                        GLint srcLevel,
                        GLint srcX, GLint srcY, GLint srcZ,
                        GLuint dstName,
                        GLenum dstTarget,
                        GLint dstLevel,
                        GLint dstX, GLint dstY, GLint dstZ,
                        GLsizei srcWidth,
                        GLsizei srcHeight,
                        GLsizei srcDepth);

Unlike many of the other functions in OpenGL, this function operates directly on the texture objects you specify by name, rather than on objects bound…


    The glGetTexImage() function works similarly to glReadPixels() , except that it does not allow a small region of a texture level to be read — instead, it only allows the entire level to be retrieved in one go. The format and type parameters have the same meanings as in glReadPixels() , and the img parameter is equivalent to the data parameter to glReadPixels() , including its read more..


    Finally, we covered ways to get at the data you have rendered. Putting data into textures falls out naturally from attaching them to framebuffers and rendering directly to them. However, we also showed how you can copy data from a framebuffer into a texture, from framebuffer to framebuffer, from texture to texture, and from the framebuffer to your application’s own memory or read more..


Chapter 10
Compute Shaders

WHAT YOU’LL LEARN IN THIS CHAPTER

• How to create, compile, and dispatch compute shaders

• How to pass data between compute shader invocations

• How to synchronize compute shaders and keep their work in order

Compute shaders are a way to take advantage of the enormous computational power of graphics processors that implement OpenGL. Just like all…


    Using Compute Shaders Modern graphics processors are extremely powerful devices capable of performing a huge amount of numeric calculation. You were briefly introduced to the idea of using compute shaders for non-graphics work back in Chapter 3, but there we only really skimmed the surface. In fact, the compute shader stage is effectively its own pipeline, somewhat disconnected from read more..


    "void main(void) \n" "{ \n" " // Do nothing \n" "} \n" }; // Create a shader, attach source, and compile. compute_shader = glCreateShader(GL_COMPUTE_SHADER); glShaderSource(compute_shader, 1, compute_source, NULL); glCompileShader(compute_shader); // Create a program, attach shader, link. compute_program = glCreateProgram(); glAttachShader(compute_program, compute_shader); read more..


    However, we need to understand how these parameters are interpreted in order to use them effectively. Global and Local Work Groups Compute shaders execute in what are called work groups. A single call to glDispatchCompute() or glDispatchComputeIndirect() will cause a single global work group1 to be sent to OpenGL for processing. That global work group will then be subdivided into a read more..
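As a minimal sketch of the dispatch itself (compute_program is an assumed name for a linked compute program):

glUseProgram(compute_program);

// Launch a 16 x 16 x 1 global work group. Each element of the global
// work group is one local work group, whose size is declared in the
// shader itself.
glDispatchCompute(16, 16, 1);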


    layout (local_size_x = 512) in; will create a 1D local work group of 512 ( × 1 × 1) items and layout (local_size_x=64, local_size_y=64) in; will create a 2D local work group of 64 × 64 ( × 1) items. The local work group size is used when you link the program to determine the size and dimensions of the work groups executed by the program. You can find the local read more..


    than the local work group size in the corresponding dimension (x, y ,or z ). The local work group size is stored in the gl_WorkGroupSize variable, which is also implicitly declared as a uvec3 type. Again, even if you only declared your local work group size to be 1D or 2D, the work group will still essentially be 3D, but with the size of the unused dimensions set to read more..


Figure 10.1: Global and local compute work group dimensions

The values stored in these variables allow your shader to know where it is in the local and global work groups and can then be used as indices into arrays of data, texture coordinates, random seeds, or for any other purpose. Now we come to outputs. We started this…


#version 430 core

layout (local_size_x = 32, local_size_y = 32) in;

layout (binding = 0, rgba32f) uniform image2D img_input;
layout (binding = 1) uniform writeonly image2D img_output;

void main(void)
{
    vec4 texel;
    ivec2 p = ivec2(gl_GlobalInvocationID.xy);

    texel = imageLoad(img_input, p);
    texel = vec4(1.0) - texel;
    imageStore(img_output, p, texel);
}

Listing 10.2: Compute shader image inversion

In order to execute this…


    single patch, tessellation control shaders can write to variables qualified with the patch storage qualifier and, if they are synchronized correctly, read the values that other invocations in the same patch wrote to them. As such, this allows a limited form of communication between the tessellation control shader invocations in a single patch. However, this comes with substantial read more..


    that a chunk of invocations is completed before any more chunks from the same local work group begin, but more than likely there will be many “live” chunks present on the processor at any given time. Because these chunks can effectively run out of order but are allowed to communicate, we need a way to ensure that messages received by a recipient are the most recent read more..


#version 430 core

layout (local_size_x = 1024) in;

layout (binding = 0, r32ui) uniform uimageBuffer image_in;
layout (binding = 1) uniform writeonly uimageBuffer image_out;

shared uint temp_storage[1024];

void main(void)
{
    // Load from the input image
    uint n = imageLoad(image_in, int(gl_LocalInvocationID.x)).x;

    // Store into shared storage
    temp_storage[gl_LocalInvocationID.x] = n;

    // Uncomment this to avoid the race…


    Figure 10.2: Effect of race conditions in a compute shader This is known as a race condition. The shader invocations race each other to the same point in the shader, and some invocations will read from the temp_storage shared variable before others have written their data into it. The result is that they pick up stale data that then gets written into the output buffer read more..


    Figure 10.3: Effect of barrier() on race conditions and then C and D both store their results to the image. Finally, invocations A and B read from the shared storage and write their results out to memory. As you can see, no invocation tried to read data that hasn’t been written yet. The presence of the barrier() functions affected the scheduling of the invocations with read more..


Compute Shader Parallel Prefix Sum

A prefix sum operation is an algorithm that, given an array of input values, computes a new array where each element of the output array is the sum of all of the values of the input array up to (and optionally including) the current array element. A prefix sum operation that includes the current element is known as an inclusive prefix sum, and one that excludes it is known as an exclusive prefix sum.
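As a concrete illustration: for the input {3, 1, 4, 1, 5}, an inclusive prefix sum produces {3, 4, 8, 9, 14}, while an exclusive prefix sum produces {0, 3, 4, 8, 9}. A minimal sequential sketch (not the parallel version developed below) is:

void prefix_sum(const float * in_array,
                float * out_array,
                int count)
{
    float sum = 0.0f;

    for (int i = 0; i < count; i++)
    {
        sum += in_array[i];   // include the current element...
        out_array[i] = sum;   // ...making this an inclusive prefix sum
    }
}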


    You should appreciate that as the number of elements in the input and output arrays grows, the number of addition operations grows too and can become quite large. Also, as the result written to each element of the output array is the sum of all elements before it (and therefore dependent on all of them), it would seem at first glance that this type of algorithm does read more..


    Figure 10.5: Breaking a prefix sum into smaller chunks The recursive nature of this algorithm is apparent in Figure 10.5. The number of additions required by this method is actually more than the sequential algorithm for prefix sum calculation would require. In this example, we would require 15 additions to compute the prefix sum with a sequential algorithm, whereas here we read more..


#version 430 core

layout (local_size_x = 1024) in;

layout (binding = 0) coherent buffer block1
{
    float input_data[gl_WorkGroupSize.x];
};

layout (binding = 1) coherent buffer block2
{
    float output_data[gl_WorkGroupSize.x];
};

shared float shared_data[gl_WorkGroupSize.x * 2];

void main(void)
{
    uint id = gl_LocalInvocationID.x;
    uint rd_id;
    uint wr_id;
    uint mask;

    // The number of steps is the log base 2 of the…


    The shader shown in Listing 10.6 has a local workgroup size of 1024, which means it will process arrays of 2048 elements, as each invocation computes two elements of the output array. The shared variable shared_data is used to store the data that is in flight, and at the start of execution, the shader loads two adjacent elements from the input arrays into the array. Next, read more..


    columns of the intermediate image, producing an output containing the 2D prefix sum of the original image, shown in Figure 10.6 (c). Such an image is called a summed area table, and is an extremely important data structure with many applications in computer graphics. We can modify our shader of Listing 10.6 to compute the prefix sums of the rows of an image variable read more..


        memoryBarrierShared();
    }

    imageStore(output_image, P.yx,
               vec4(shared_data[id * 2]));
    imageStore(output_image, P.yx + ivec2(0, 1),
               vec4(shared_data[id * 2 + 1]));
}

Listing 10.7: Compute shader to generate a 2D prefix sum

Each local work group of the shader in Listing 10.7 is still one dimensional. However, when we launch the shader for the first pass, we create a one-dimensional global work…


Now, the number of pixels contained in any given rectangle of the summed area table is simply the rectangle’s area. Given this, we know that if we take the sum of all the elements contained within the rectangle and divide this through by its area, we will be left with the average value of the elements inside the rectangle. Averaging a number of values together is a…
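Expressed as a sketch (the sampler name and texel-space corners are assumptions), the average over a rectangle takes just four taps into the summed area table; this is exactly the f = a - b - c + d step that appears in Listing 10.8 below:

// sat holds the summed area table; p0 and p1 are opposite corners
// of the rectangle, in texels.
vec3 rect_average(sampler2D sat, ivec2 p0, ivec2 p1)
{
    vec3 a = texelFetch(sat, ivec2(p0.x, p0.y), 0).rgb;
    vec3 b = texelFetch(sat, ivec2(p0.x, p1.y), 0).rgb;
    vec3 c = texelFetch(sat, ivec2(p1.x, p0.y), 0).rgb;
    vec3 d = texelFetch(sat, ivec2(p1.x, p1.y), 0).rgb;

    // Sum over the rectangle...
    vec3 sum = d - b - c + a;

    // ...divided by its area gives the average.
    ivec2 size = p1 - p0;
    return sum / float(size.x * size.y);
}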


    An example of this is seen in the photograph4 shown in Figure 10.9. The glass closest to the camera is in sharp focus. However, as the row of glasses progresses from front to back, they become successively less well defined. The basket of oranges in the background is quite out of focus. The true blur of an image due to out of focus lenses is caused by a number read more..


current pixel and uses it to build a filter width (m), reading data from the summed area table to produce blurry pixels.

#version 430 core

layout (binding = 0) uniform sampler2D input_image;

layout (location = 0) out vec4 color;

uniform float focal_distance = 50.0;
uniform float focal_depth = 30.0;

void main(void)
{
    // s will be used to scale our texture coordinates before…


    vec3 b = textureLod(input_image, P1, 0).rgb;
    vec3 c = textureLod(input_image, P2, 0).rgb;
    vec3 d = textureLod(input_image, P3, 0).rgb;

    // Calculate the sum of all pixels inside the kernel.
    vec3 f = a - b - c + d;

    // Scale radius -> diameter.
    m *= 2;

    // Divide through by area
    f /= float(m * m);

    // Output final color
    color = vec4(f, 1.0);
}

Listing 10.8: Depth of field using summed area tables


Figure 10.10: Applying depth of field to an image

Figure 10.11: Effects achievable with depth of field

the magnitude of the data gets higher, summed area tables can suffer from precision loss. As the values of all of the pixels in the image are summed together, the values stored in the summed area tables can become very large. Then, as the output image is reconstructed,…


• Pre-bias our rendered image by −0.5, which keeps the summed area table values closer to zero even for larger images, thereby improving precision.

Compute Shader Flocking

The following example uses a compute shader to implement a flocking algorithm. Flocking algorithms show emergent behavior within a large group by updating the properties of individual members independently of all…


    Figure 10.12: Stages in the iterative flocking algorithm On the top left, we perform the update for an even frame. The first buffer containing position and velocity is bound as a shader storage buffer that can be read by the compute shader, and the second buffer is bound such that it can be written by the compute shader. Next we render, on the top right of Figure read more..


The code to set all that up is shown in Listing 10.9. It isn’t particularly complex, but there is a fair amount of repetition, making it long. The listing contains the bulk of the initialization.

glGenBuffers(2, flock_buffer);
glBindBuffer(GL_SHADER_STORAGE_BUFFER, flock_buffer[0]);
glBufferData(GL_SHADER_STORAGE_BUFFER,
             FLOCK_SIZE * sizeof(flock_member),
             NULL, GL_DYNAMIC_COPY);…


    velocities of the flock members. The position of the goal is updated, the storage buffers are bound to the first and second GL_SHADER_STORAGE_BUFFER binding points for reading and writing, and then the compute shader is dispatched. Next, the window is cleared, the rendering program is activated, and we update our transform matrices, bind our VAO, and draw. The number of instances read more..


    direction to travel in. Each rule considers the current properties of the flock member and the properties of the other members of the flock as perceived by the individual being updated. Most of the rules require access to the other member’s position and velocity data, so update_program uses a shader storage buffer containing that information. Listing 10.11 shows the start of read more..


• Members of the flock try to reach a common goal.

• Members try to keep with the rest of the flock. They will fly toward the center of the flock.

The first two rules are the intra-member rules. That is, the effect of each of the members on each other is considered individually. Listing 10.12 contains the shader code for the first rule. If we’re closer to…


    workgroup (the size of which we have defined as 256 elements). Because every member of the flock needs to interact in some way with every other member of the flock, this algorithm is considered an O (N2) algorithm. This means that each of the N flock members will read all of the other N members’ positions and velocities, and that each of the N members’ positions and read more..


            acceleration += rule2(me.position, me.velocity,
                                  them.position, them.velocity) *
                            rule2_weight;
        }
    }
    barrier();
}

flock_center /= float(gl_NumWorkGroups.x * gl_WorkGroupSize.x);

new_me.position = me.position + me.velocity * timestep;
acceleration += normalize(goal - me.position) * rule3_weight;
acceleration += normalize(flock_center - me.position) * rule4_weight;
new_me.velocity = me.velocity + acceleration * timestep;

if…


    In this shader, position and normal are regular inputs from our geometry buffer, which in this example contains a simple model of a paper airplane. The bird_position and bird_velocity inputs will be the instanced attributes, provided by the compute shader and whose instance divisor is set with the glVertexAttribDivisor() function. The body of our shader (given in Listing 10.16) uses the read more..


    Figure 10.13: Output of compute shader flocking program A possible enhancement that could be made to this program is to calculate the lookat matrix in the compute shader. Here, we calculate it in the vertex shader and therefore redundantly calculate it for every vertex. It doesn’t matter so much in this example because our mesh is small, but if our instanced mesh were read more..


    image processing, which is an obvious fit for computer graphics. Next, we showed you how you might use compute shaders for physical simulation when we implemented the flocking algorithm. This should have allowed you to imagine some of the possibilities for the use of compute shaders in your own applications — from artificial intelligence, pre- and post-processing, or even audio read more..


Chapter 11
Controlling and Monitoring the Pipeline

WHAT YOU’LL LEARN IN THIS CHAPTER

• How to ask OpenGL about the progress of your commands down the graphics pipeline

• How to measure the time taken for your commands to execute

• How to synchronize your application with OpenGL and how to synchronize multiple OpenGL contexts with each other

This chapter is about the…


    Queries Queries are a mechanism to ask OpenGL what’s happening in the graphics pipeline. There’s plenty of information that OpenGL can tell you; you just need to know what to ask — and how to ask the question. Remember way back to your early days in school. The teacher wanted you to raise your hand before asking a question. This was almost like reserving your place read more..


more for the application later. To return the resources to OpenGL, call glDeleteQueries():

void glDeleteQueries(GLsizei n, const GLuint *ids);

This works similarly to glGenQueries() — it takes the number of query objects to delete and the address of a variable or array holding their names:

glDeleteQueries(10, ten_queries);
glDeleteQueries(1, &one_query);

After the queries are deleted, they…


    rendered since you told it to start counting, you tell it to stop by calling glEndQuery() : glEndQuery(GL_SAMPLES_PASSED); This tells OpenGL to stop counting samples that have passed the depth test and made it through the fragment shader without being discarded. All the samples generated by all the drawing commands between the call to glBeginQuery() and glEndQuery() are added up. read more..


    If the result of the query object is not immediately available and trying to retrieve it would cause your application to have to wait for OpenGL to finish what it is working on, the result becomes GL_FALSE . If OpenGL is ready and has your answer, the result becomes GL_TRUE . This tells you that retrieving the result from OpenGL will not cause any delays. Now you can read more..


glBeginQuery(GL_SAMPLES_PASSED, the_query);
RenderSimplifiedObject(object);
glEndQuery(GL_SAMPLES_PASSED);
glGetQueryObjectuiv(the_query, GL_QUERY_RESULT, &the_result);
if (the_result != 0)
    RenderRealObject(object);

Listing 11.1: Getting the result from a query object

RenderSimplifiedObject is a function that renders the low-fidelity version of the object, and RenderRealObject renders the object with all of its…


    is another way for the application to avoid having to wait for OpenGL. OpenGL can only count and add up results into one query object at a time, but it can manage several query objects and perform many queries back-to-back. We can expand our example to render multiple objects with multiple occlusion queries. If we had an array of ten objects to render, each with a read more..


int n;

for (n = 0; n < 10; n++)
{
    glBeginQuery(GL_SAMPLES_PASSED, ten_queries[n]);
    RenderSimplifiedObject(&object[n]);
    glEndQuery(GL_SAMPLES_PASSED);
}

for (n = 0; n < 10; n++)
{
    glGetQueryObjectuiv(ten_queries[n],
                        GL_QUERY_RESULT_AVAILABLE,
                        &the_result);
    if (the_result != 0)
        glGetQueryObjectuiv(ten_queries[n],
                            GL_QUERY_RESULT,
                            &the_result);
    else
        the_result = 1;
    if (the_result != 0)
        RenderRealObject(&object[n]);
}

Listing 11.4:…


    object says it should. This is called predication, and fortunately, it is possible through a technique called conditional rendering. Conditional rendering allows you to wrap up a sequence of OpenGL drawing commands and send them to OpenGL along with a query object and a message that says “ignore all of this if the result stored in the query object is zero.” To mark the read more..
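The pair of functions that delimit such a section are

void glBeginConditionalRender(GLuint id, GLenum mode);
void glEndConditionalRender(void);

and a minimal sketch of their use with a previously issued occlusion query looks like this:

glBeginConditionalRender(the_query, GL_QUERY_WAIT);
RenderRealObject(object);   // skipped if the query result is zero
glEndConditionalRender();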


    for — after all, the application doesn’t have to wait for results to be ready any more. As mentioned earlier, OpenGL operates as a pipeline, which means that it may not have finished dealing with RenderSimplifiedObject before your call to glBeginConditionalRender() or before the first drawing function called from RenderRealObject reaches the beginning of the pipeline. In this case, read more..


// Render the more complex versions of the objects, skipping them
// if the occlusion query results are available and zero
for (n = 0; n < 10; n++)
{
    glBeginConditionalRender(ten_queries[n], GL_QUERY_NO_WAIT);
    RenderRealObject(&object[n]);
    glEndConditionalRender();
}

Listing 11.6: A more complete conditional rendering example

In this example, simplified versions of ten objects are rendered first, each…


    In particular, it will count as soon as a sample might pass the depth and stencil tests. Many implementations of OpenGL implement some form of hierarchical depth testing, where the nearest and furthest depth values for a particular region of the screen are stored, and then as primitives are rasterized, the depth values for large blocks of them are tested against this read more..


// Stop the last query
glEndQuery(GL_TIME_ELAPSED);

// Now, we can retrieve the results from the three queries.
glGetQueryObjectuiv(queries[0], GL_QUERY_RESULT, &world_time);
glGetQueryObjectuiv(queries[1], GL_QUERY_RESULT, &objects_time);
glGetQueryObjectuiv(queries[2], GL_QUERY_RESULT, &HUD_time);

// Done. world_time, objects_time, and hud_time contain the values we want.

// Clean up after ourselves.…


// Create four query objects
glGenQueries(4, queries);

// Get the start time
glQueryCounter(GL_TIMESTAMP, queries[0]);

// Render the world
RenderWorld();

// Get the time after RenderWorld is done
glQueryCounter(GL_TIMESTAMP, queries[1]);

// Render the objects in the world
RenderObjects();

// Get the time after RenderObjects is done
glQueryCounter(GL_TIMESTAMP, queries[2]);

// Render the HUD
RenderHUD();…


small amount of time. A single, unsigned 32-bit value can count to a little over 4 seconds’ worth of nanoseconds. If you expect to time operations that take longer than this (hopefully over the course of many frames!), you might want to consider retrieving the full 64-bit results that query objects keep internally. To do this, call

void glGetQueryObjectui64v(GLuint id, GLenum pname,
                           GLuint64 * params);


Query objects were introduced earlier in this chapter in the context of occlusion queries. It was stated that there are many questions that can be asked of OpenGL. Both the number of primitives generated and the number of primitives actually written to the transform feedback buffers are available as queries. As before, to generate a query object, call

GLuint one_query;
glGenQueries(1, &one_query);…


    There are a couple of subtle differences between the GL_PRIMITIVES_GENERATED and GL_TRANSFORM_FEEDBACK_PRIMITIVES_WRITTEN queries. The first is that the GL_PRIMITIVES_GENERATED query counts the number of primitives emitted by the front end, but the GL_TRANSFORM_FEEDBACK_PRIMITIVES_WRITTEN query only counts primitives that were successfully written into the transform feedback buffers. The primitive count read more..


can call glBeginQueryIndexed() and glEndQueryIndexed(), whose prototypes are

void glBeginQueryIndexed(GLenum target,
                         GLuint index,
                         GLuint id);
void glEndQueryIndexed(GLenum target,
                       GLuint index);

These two functions behave just like their non-indexed counterparts, and the target and id parameters have the same meaning. In fact, calling glBeginQuery() is equivalent to calling glBeginQueryIndexed() with index…


// We have two buffers, buffer1 and buffer2. First, we’ll bind buffer1 as the
// source of data for the draw operation (GL_ARRAY_BUFFER), and buffer2 as
// the destination for transform feedback (GL_TRANSFORM_FEEDBACK_BUFFER).
glBindBuffer(GL_ARRAY_BUFFER, buffer1);
glBindBuffer(GL_TRANSFORM_FEEDBACK_BUFFER, buffer2);

// Now, we need to start a query to count how many vertices get written to…


GL_TRANSFORM_FEEDBACK and whose second parameter, id, is the name of the transform feedback object to bind. You can delete transform feedback objects using glDeleteTransformFeedbacks(), and you can determine whether a given value is the name of a transform feedback object by calling glIsTransformFeedback():

void glDeleteTransformFeedbacks(GLsizei n, const GLuint * ids);
GLboolean glIsTransformFeedback(GLuint id);


• Calling glDrawTransformFeedbackStream() is equivalent to calling glDrawTransformFeedback(), except that the stream given in stream is used as the source of the count.

• Calling glDrawTransformFeedbackStreamInstanced() is equivalent to calling glDrawTransformFeedbackInstanced(), except that the stream given in stream is used as the source of the count.

When you use one of the functions that…


processed, it will empty the OpenGL pipeline, causing a bubble and reducing performance, sometimes drastically. In general, it is recommended that you don’t call glFinish() for any reason.

Synchronization and Fences

Sometimes it may be necessary to know whether OpenGL has finished executing commands up to some point without forcing it to empty the pipeline. This is especially useful when…
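The functions involved here are glFenceSync() and glClientWaitSync(); a minimal sketch of the pattern looks like this:

// Insert a fence into the command stream...
GLsync fence = glFenceSync(GL_SYNC_GPU_COMMANDS_COMPLETE, 0);

// ... issue more commands or do other application work, then wait up
// to one second for everything before the fence to complete.
GLenum status = glClientWaitSync(fence, GL_SYNC_FLUSH_COMMANDS_BIT,
                                 1000000000);   // timeout in nanoseconds
if (status == GL_ALREADY_SIGNALED || status == GL_CONDITION_SATISFIED)
{
    // Everything issued before the fence has completed.
}

glDeleteSync(fence);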


    When glGetSynciv() returns, result (which is a GLint ) will contain GL_SIGNALED if the sync object was in the signaled state and GL_UNSIGNALED otherwise. This allows the application to poll the state of the sync object and use this information to potentially do some useful work while the GPU is busy with previous commands. For example, consider the code in Listing 11.10. GLint read more..


    Without this bit, there is a possibility that OpenGL could watch for a sync object that hasn’t been sent down the pipeline yet, and the application could end up waiting forever and hang. It’s a good idea to set this bit unless you have a really good reason not to. The third parameter is a timeout value in nanoseconds to wait. If the sync object doesn’t become read more..


    For glWaitSync() , the behavior is slightly different. The application won’t actually wait for the sync object to become signaled, only the GPU will. Therefore, glWaitSync() will return to the application immediately. This makes the second and third parameters somewhat irrelevant. Because the application doesn’t wait for the function to return, there is no danger of your application read more..


    current and calls glWaitSync() to wait for the sync object to become signaled. It can then issue more commands to OpenGL (on the new context), and those are queued up by the drivers, ready to execute. Only when the GPU has finished recording data into the transform feedback buffers with the first context does it start to work on the commands using that data in the read more..


    have the tools necessary to measure the latency of the graphics pipeline. This, in turn, allows you to alter your application’s complexity to suit the system it’s running on and the performance targets you’ve set for it. We will use these tools for real-world performance tuning exercises in Chapter 13, “Debugging and Performance Optimization.” You also saw how it is read more..




    Part III In Practice read more..




Chapter 12
Rendering Techniques

WHAT YOU’LL LEARN IN THIS CHAPTER

• How to light the pixels in your scene

• How to delay shading until the last possible moment

• How to render an entire scene without a single triangle

By this point in the book, you should have a good grasp of the fundamentals of OpenGL. You have been introduced to most of its features and should…


    Lighting Models Arguably, the job of any graphics rendering application is the simulation of light. Whether it be the simplest spinning cube, or the most complex movie special effect ever invented, we are trying to convince the user that they are seeing the real world, or an analog of it. To do this, we must model the way that light interacts with surfaces. Extremely read more..


    Diffuse Light Diffuse light is the directional component of a light source and was the subject of our previous example lighting shader. In the Phong lighting model, the diffuse material and lighting values are multiplied together, as is done with the ambient components. However, this value is then scaled by the dot product of the surface normal and light vector, which is the read more..


    The shininess parameter could easily be a uniform just like anything else. Traditionally (from the fixed-function pipeline days), the highest specular power is set to 128. Numbers greater than this tend to have a diminishingly small effect. Now, we have formed a complete equation for modeling the effect of lighting on a surface. Given material with ambient term ka, diffuse term read more..
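In its standard form (the usual statement of the Phong model; the truncated original presents the same terms), the equation is

I = ka · ia + kd · id · (N · L) + ks · is · (R · V)^α

where ia, id, and is are the ambient, diffuse, and specular intensities of the light, the dot products are clamped to zero, and α is the specular power (shininess) discussed above.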


    The effect of diffuse shading also becomes clearer from Figure 12.1. When the light source shines directly on the surface, the vector L will be perpendicular to the surface and therefore be colinear with N , where the dot product between N and L is greatest. When the light strikes the surface at a grazing angle, L and N will be almost perpendicular to one another, and their read more..


// Calculate view-space light vector
vec3 L = light_pos - P.xyz;

// Calculate view vector (simply the negative of the
// view-space position)
vec3 V = -P.xyz;

// Normalize all three vectors
N = normalize(N);
L = normalize(L);
V = normalize(V);

// Calculate R by reflecting -L around the plane defined by N
vec3 R = reflect(-L, N);

// Calculate the diffuse and specular contributions…


efficient, as all the computations are done only once per vertex. Figure 12.2 shows the output of the phonglighting example program.

Figure 12.2: Per-vertex lighting (Gouraud shading)

Phong Shading

One of the drawbacks to Gouraud shading is clearly apparent in Figure 12.2. Notice the starburst pattern of the specular highlight. On a still image, this might almost pass as an…


switched between evaluating the lighting equations per vertex (and therefore implementing Gouraud shading) and evaluating them per fragment (implementing Phong shading). Figure 12.3 shows the output from the phonglighting sample program performing shading per fragment.

Figure 12.3: Per-fragment lighting (Phong shading)

The trade-off is, of course, that we are now doing significantly more work in the…


{
    vec3 N;
    vec3 L;
    vec3 V;
} vs_out;

// Position of light
uniform vec3 light_pos = vec3(100.0, 100.0, 100.0);

void main(void)
{
    // Calculate view-space coordinate
    vec4 P = mv_matrix * position;

    // Calculate normal in view-space
    vs_out.N = mat3(mv_matrix) * normal;

    // Calculate light vector
    vs_out.L = light_pos - P.xyz;

    // Calculate view vector
    vs_out.V = -P.xyz;

    // Calculate the clip-space…


    // Calculate R locally
    vec3 R = reflect(-L, N);

    // Compute the diffuse and specular components for each
    // fragment
    vec3 diffuse = max(dot(N, L), 0.0) * diffuse_albedo;
    vec3 specular = pow(max(dot(R, V), 0.0), specular_power) *
                    specular_albedo;

    // Write final color to the framebuffer
    color = vec4(diffuse + specular, 1.0);
}

Listing 12.4: The Phong shading fragment shader

On today’s…


Figure 12.4: Varying specular parameters of a material

of the light by the diffuse and specular components of each fragment’s color.

Blinn-Phong Lighting

The Blinn-Phong lighting model could be considered an extension to, or possibly an optimization of, the Phong lighting model. Notice that in the Phong lighting model, we calculate R · V at each shaded point (either per vertex or…


    avoiding the call to the reflect function. Modern graphics processors are generally powerful enough that the difference in cost between the vector normalization required to calculate H and the call to reflect is negligible. However, if the curvature of the underlying surface represented by a triangle is relatively small and if the triangle is small relative to the distance from the read more..
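Expressed in shader code, the substitution is small (variable names follow the chapter's Phong shaders; this is a sketch, not the book's exact listing):

// Phong: reflect L about N, then compare R with the view vector.
vec3 R = reflect(-L, N);
vec3 specular_phong = pow(max(dot(R, V), 0.0), specular_power) *
                      specular_albedo;

// Blinn-Phong: compare N with the half vector between L and V.
vec3 H = normalize(L + V);
vec3 specular_blinn = pow(max(dot(N, H), 0.0), specular_power) *
                      specular_albedo;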


    used for the Phong rendering is 128, whereas the specular exponent used for the Blinn-Phong rendering is 200. As you can see, after adjustment of the specular powers, the results are very similar. Figure 12.5: Phong lighting (left) vs. Blinn-Phong lighting (right) Rim Lighting Rim lighting, which is also known as back-lighting, is an effect that simulates the bleeding of light read more..


Figure 12.6: Rim lighting vectors

A quantity that is easy to calculate and is proportional to the angle between two vectors is the dot product. When two vectors are colinear, the dot product between them will be one. As the two vectors become closer to orthogonal, the dot product becomes closer to zero. Therefore, we can produce a rim light effect by taking…
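A rim term along those lines might be implemented like this (the uniform names are assumptions; the book's rimlight sample uses its own equivalents):

uniform vec3  rim_color;
uniform float rim_power;

vec3 calculate_rim(vec3 N, vec3 V)
{
    // The rim factor is largest where the surface faces away from the viewer
    float f = 1.0 - dot(N, V);

    // Constrain it to the range 0 to 1
    f = smoothstep(0.0, 1.0, f);

    // Sharpen the falloff and apply the rim color
    f = pow(f, rim_power);

    return f * rim_color;
}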


    Figure 12.7 shows a model illuminated with a Phong lighting model as described earlier in this chapter, but with a rim light effect applied. The code to produce this image is included in the rimlight example program. The top-left image has the rim light disabled for reference. The top-right image applies a medium strength rim light with a moderate fall-off exponent. The read more..


    Normal Mapping In the examples shown so far, we have calculated the lighting contributions either at each vertex in the case of Gouraud shading, or at each pixel, but with vectors derived from per-vertex attributes that are then smoothly interpolated across each triangle in the case of Phong shading. To really see surface features, that level of detail must be present in the read more..


    The most common coordinate space used for normal maps is tangent space, which is a local coordinate system where the positive z axis is aligned with the surface normal. The other two vectors in this coordinate space are known as the tangent and bitangent vectors, and for best results, these vectors should line up with the direction of the u and v coordinates used in the read more..


along with the rest of the code for this example is included in the bumpmapping sample application.

#version 420 core

layout (location = 0) in vec4 position;
layout (location = 1) in vec3 normal;
layout (location = 2) in vec3 tangent;
layout (location = 4) in vec2 texcoord;

out VS_OUT
{
    vec2 texcoord;
    vec3 eyeDir;
    vec3 lightDir;
} vs_out;

uniform mat4 mv_matrix;
uniform mat4…


is shown in Listing 12.8, we simply fetch a per-fragment normal from the normal map and use it in our shading calculations.

#version 420 core

out vec4 color;

// Color and normal maps
layout (binding = 0) uniform sampler2D tex_color;
layout (binding = 1) uniform sampler2D tex_normal;

in VS_OUT
{
    vec2 texcoord;
    vec3 eyeDir;
    vec3 lightDir;
} fs_in;

void main(void)
{
    // Normalize our incoming view and light…


normals that are interpolated by OpenGL and does not use the normal map. It should be clear from contrasting the bottom-left and bottom-right images that normal mapping can add substantial detail to an image. The bottom-left image from Figure 12.9 is also shown in Color Plate 7.

Figure 12.9: Result of normal mapping example

Environment Mapping

In the previous few subsections, you…


    three methods of simulating an environment in the next couple of subsections. Spherical Environment Maps As noted, a spherical environment map is a texture map that represents the lighting produced by the simulated surroundings on a sphere made from the material being simulated. This works by taking the view direction and surface normal at the point being shaded and using these two read more..


    vec3 view;
} vs_out;

void main(void)
{
    vec4 pos_vs = mv_matrix * position;

    vs_out.normal = mat3(mv_matrix) * normal;
    vs_out.view = pos_vs.xyz;

    gl_Position = proj_matrix * pos_vs;
}

Listing 12.9: Spherical environment mapping vertex shader

Now, given the per-fragment normal and view direction, we can calculate the texture coordinates to look up into our environment map. First, we reflect the…


    example program, using the environment map in the rightmost image of Figure 12.10. Figure 12.11: Result of rendering with spherical environment mapping Equirectangular Environment Maps The equirectangular environment map is similar to the spherical environment map except that it is less susceptible to the pinching effect sometimes seen when the poles of the sphere are sampled from. An read more..


Figure 12.12: Example equirectangular environment map

Listing 12.11. The result of rendering an object with this shader is shown in Figure 12.13.

#version 420 core

layout (binding = 0) uniform sampler2D tex_envmap;

in VS_OUT
{
    vec3 normal;
    vec3 view;
} fs_in;

out vec4 color;

void main(void)
{
    // u will be our normalized view vector
    vec3 u = normalize(fs_in.view);

    // Reflect u about the…


Figure 12.13: Rendering result of equirectangular environment map

Cube Maps

A cube map is treated as a single texture object, but it is made up of six square (yes, they must be square!) 2D images that make up the six sides of a cube. Applications of cube maps range from 3D light maps to reflections and highly accurate environment maps. Figure 12.14 shows the layout of…


Figure 12.14: The layout of six cube faces in the Cubemap sample program

this order, and so we can simply create a loop and update each face in turn. Example code to do this is shown in Listing 12.12.

GLuint texture;

glGenTextures(1, &texture);
glBindTexture(GL_TEXTURE_CUBE_MAP, texture);
glTexStorage2D(GL_TEXTURE_CUBE_MAP, levels, internalFormat, width, height);

for (face = 0; face <…


                    0, 0,
                    width, height,
                    format, type,
                    data + face * face_size_in_bytes);
}

Listing 12.12: Loading a cube map texture

Cube maps also support mipmaps, and so if your cube map has mipmap data, the code in Listing 12.12 would need to be modified to load the additional mipmap levels. The Khronos Texture File format has native support for cube map textures, and so the book’s .KTX…


    submatrix) to orient them in the right direction, and render the cube in world space. In world space, the only face we’d see is the one we are looking directly at. Therefore, we can render a full-screen quad, and transform its corners by the view matrix in order to orient it correctly. All this occurs in the vertex shader, which is shown in Listing 12.13. #version 420 read more..


    Once we’ve rendered our sky box, we need to render something into the scene that reflects the sky box. The texture coordinates used to fetch from a cube map texture are interpreted as a vector pointing from the origin outwards towards the cube. OpenGL will determine which face this vector eventually hits, and the coordinate within the face that it hits and then retrieve read more..


out vec4 color;

void main(void)
{
    // Reflect view vector about the plane defined by the normal
    // at the fragment
    vec3 r = reflect(fs_in.view, normalize(fs_in.normal));

    // Sample from the cube map using the reflection vector
    color = texture(tex_cubemap, r);
}

Listing 12.16: Fragment shader for cube map environment rendering

The result of rendering an object surrounded by a sky box using the…


    shiny, and our ladybug looks somewhat plastic. However, there is no reason that every part of our models must be made from the same material. In fact, we can assign material properties per surface, per triangle, or even per pixel by storing information about the surface in a texture. For example, the specular exponent can be stored in a texture and applied to a model when read more..


#version 420 core

layout (binding = 0) uniform sampler3D tex_envmap;
layout (binding = 1) uniform sampler2D tex_glossmap;

in VS_OUT
{
    vec3 normal;
    vec3 view;
    vec2 tc;
} fs_in;

out vec4 color;

void main(void)
{
    // u will be our normalized view vector
    vec3 u = normalize(fs_in.view);

    // Reflect u about the plane defined by the normal at the fragment
    vec3 r = reflect(u, normalize(fs_in.normal));…


Figure 12.17: Result of per-pixel gloss example

must determine whether there is line of sight from the point being shaded to a light and, therefore, from the light to the point being shaded. This turns out to be a visibility calculation, and as luck might have it, we have extremely fast hardware to determine whether a piece of geometry is visible from a given vantage…


declared as a variable with a sampler2DShadow type for 2D textures, which we’ll be using in this example. You can also create shadow samplers for 1D textures (sampler1DShadow), cube maps (samplerCubeShadow), and rectangle textures (sampler2DRectShadow), and for arrays of these types (except, of course, rectangle textures). Listing 12.18 shows how to set up a framebuffer object with only a…


Rendering the scene from the light’s position results in a depth buffer that contains the distance from the light to each pixel in the framebuffer. This can be visualized as a grayscale image with black being the closest possible depth value (zero) and white being the furthest possible depth value (one). Figure 12.18 shows the depth buffer of a simple scene rendered with…


axis. The matrix that transforms vertices from object space into the light’s clip space is known as the shadow matrix, and the code to calculate it is shown in Listing 12.20.

const vmath::mat4 scale_bias_matrix =
    vmath::mat4(vmath::vec4(0.5f, 0.0f, 0.0f, 0.0f),
                vmath::vec4(0.0f, 0.5f, 0.0f, 0.0f),
                vmath::vec4(0.0f, 0.0f, 0.5f, 0.0f),
                vmath::vec4(0.5f, 0.5f, 0.5f, 1.0f));
vmath::mat4 shadow_matrix…


    component and then uses the resulting x and y components to fetch a value from the texture. It then compares the returned value against the computed z component using the chosen comparison function, producing a value 1.0 or 0.0 depending on whether the test passed or failed, respectively. If the selected texture filtering mode for the texture is GL_LINEAR or would otherwise require read more..


Figure 12.19: Results of rendering with shadow maps

also requires a pass over the scene, which costs performance. This can quickly add up and slow your application down. The shadow maps must be of a very high resolution as what might have mapped to a single texel in the shadow map may cover several pixels in screen space, which is effectively where the lighting calculations…

  • Page - 590

travels. We use this scattering and absorption to gauge depth and infer distance as we look out into the world. Modeling it, even approximately, can add quite a bit of realism to our scenes.

Fog

We are all familiar with fog. On a foggy day, it might be impossible to see more than a few feet in front of us, and dense fog can present danger. However, even when fog …

  • Page - 591

} tes_in[];

out TES_OUT
{
    vec2 tc;
    vec3 world_coord;
    vec3 eye_coord;
} tes_out;

void main(void)
{
    vec2 tc1 = mix(tes_in[0].tc, tes_in[1].tc, gl_TessCoord.x);
    vec2 tc2 = mix(tes_in[2].tc, tes_in[3].tc, gl_TessCoord.x);
    vec2 tc = mix(tc2, tc1, gl_TessCoord.y);

    vec4 p1 = mix(gl_in[0].gl_Position, gl_in[1].gl_Position, gl_TessCoord.x);
    vec4 p2 = mix(gl_in[2].gl_Position, gl_in[3].gl_Position, gl_TessCoord.x);
    vec4 p = …

  • Page - 592

Figure 12.20: Graphs of exponential decay

The modified fragment shader that applies fog is shown in Listing 12.24.

    #version 420 core

    out vec4 color;

    layout (binding = 1) uniform sampler2D tex_color;

    uniform bool enable_fog = true;
    uniform vec4 fog_color = vec4(0.7, 0.8, 0.9, 0.0);

    in TES_OUT
    {
        vec2 tc;
        vec3 world_coord;
        vec3 …

  • Page - 593

        color = fog(landscape);
    }
    else
    {
        color = landscape;
    }
}

Listing 12.24: Application of fog in a fragment shader

In our fragment shader, the fog function applies fog to the incoming fragment color. It first calculates the fog factor for the extinction and inscattering components of the fog. It then multiplies the original fragment color by the extinction term. As the extinction term …
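The body of the fog function itself is cut off in this excerpt. A minimal sketch of an exponential extinction/inscattering fog function of the kind described here might read as follows, assuming the input block instance is named fs_in and carries the eye_coord and world_coord values from the listing above; the density constants and height modulation are illustrative assumptions:

    vec4 fog(vec4 c)
    {
        float z = length(fs_in.eye_coord);        // distance from the eye

        // Density terms; making them fall off with height is an assumption here
        float de = 0.025 * smoothstep(0.0, 6.0, 10.0 - fs_in.world_coord.y);
        float di = 0.045 * smoothstep(0.0, 40.0, 20.0 - fs_in.world_coord.y);

        float extinction   = exp(-z * de);        // how much original color survives
        float inscattering = exp(-z * di);        // how much fog color is added in

        return c * extinction + fog_color * (1.0 - inscattering);
    }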

  • Page - 594

Cell Shading — Texels as Light

Many of our examples of texture mapping in the last few chapters have used 2D textures. Two-dimensional textures are typically the simplest and easiest to understand. Most people can quickly get the intuitive feel for putting a 2D picture on the side of a piece of 2D or 3D geometry. Let’s take a look now at a one-dimensional texture …

  • Page - 595

model file, which we use to create the torus, supplies a set of two-dimensional texture coordinates, we ignore them in our vertex shader, which is shown in Listing 12.25, and only use the incoming position and normal.

    #version 420 core

    uniform mat4 mv_matrix;
    uniform mat4 proj_matrix;

    layout (location = 0) in vec4 position;
    layout (location = 1) in vec3 normal;

    out VS_OUT
    {
        vec3 …

  • Page - 596

    // Simple N dot L diffuse lighting
    float tc = pow(max(0.0, dot(N, L)), 5.0);

    // Sample from cell shading texture
    color = texture(tex_toon, tc) * (tc * 0.8 + 0.2);
}

Listing 12.26: The toon fragment shader

The fragment shader for our toon shader calculates the diffuse lighting coefficient as normal, but rather than using it directly, it uses it to look up into a texture …
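The 1D ramp texture that tex_toon samples from has to be created by the application. A hedged sketch of that setup, assuming a four-band red ramp (the exact colors and names are illustrative):

    // Four bands of increasing brightness; each texel is one band of the ramp
    static const GLubyte toon_tex_data[] =
    {
        0x44, 0x00, 0x00, 0x00,
        0x88, 0x00, 0x00, 0x00,
        0xCC, 0x00, 0x00, 0x00,
        0xFF, 0x00, 0x00, 0x00
    };

    GLuint tex_toon;
    glGenTextures(1, &tex_toon);
    glBindTexture(GL_TEXTURE_1D, tex_toon);
    glTexStorage1D(GL_TEXTURE_1D, 1, GL_RGB8, 4);
    glTexSubImage1D(GL_TEXTURE_1D, 0, 0, 4,
                    GL_RGBA, GL_UNSIGNED_BYTE, toon_tex_data);
    // GL_NEAREST preserves the hard banding that gives the cartoon look
    glTexParameteri(GL_TEXTURE_1D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
    glTexParameteri(GL_TEXTURE_1D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);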

  • Page - 597

Alternative Rendering Methods

Traditional forward rendering executes the complete graphics pipeline, starting with a vertex shader and following through with any number of subsequent stages, most likely terminating with a fragment shader. That fragment shader is responsible for calculating the final color of the fragment, and after each drawing command, the content of the framebuffer becomes …

  • Page - 598

    for geometry as it stores information about the geometry at that point rather than image properties. Once the G-buffer has been generated, it is possible to shade each and every point on the screen using a single full-screen quad. This final pass will use the full complexity of the final lighting algorithms, but rather than being applied to each pixel of each triangle, it read more..

  • Page - 599

world-space coordinate of the fragment, and a 32-bit integer component to store a per-pixel object or material index, and a 32-bit component to store the per-pixel specular power factor. In total, that makes six 16-bit components and five 32-bit components. How on earth will we represent this with a single framebuffer? Actually, it’s fairly simple. For the six 16-bit …
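However the components end up packed, the allocation side of such a G-buffer is conventional framebuffer setup. A hedged sketch (MAX_W and MAX_H are hypothetical size constants, and the formats follow the packing just described):

    // Two color attachments - one unsigned-integer RGBA texture to hold the
    // packed 16-bit quantities, one RGBA32F texture for the rest - plus depth.
    GLuint gbuffer, gbuf_tex[3];
    glGenFramebuffers(1, &gbuffer);
    glBindFramebuffer(GL_FRAMEBUFFER, gbuffer);
    glGenTextures(3, gbuf_tex);

    glBindTexture(GL_TEXTURE_2D, gbuf_tex[0]);
    glTexStorage2D(GL_TEXTURE_2D, 1, GL_RGBA32UI, MAX_W, MAX_H);
    glBindTexture(GL_TEXTURE_2D, gbuf_tex[1]);
    glTexStorage2D(GL_TEXTURE_2D, 1, GL_RGBA32F, MAX_W, MAX_H);
    glBindTexture(GL_TEXTURE_2D, gbuf_tex[2]);
    glTexStorage2D(GL_TEXTURE_2D, 1, GL_DEPTH_COMPONENT32F, MAX_W, MAX_H);

    glFramebufferTexture(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0, gbuf_tex[0], 0);
    glFramebufferTexture(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT1, gbuf_tex[1], 0);
    glFramebufferTexture(GL_FRAMEBUFFER, GL_DEPTH_ATTACHMENT,  gbuf_tex[2], 0);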

  • Page - 600

necessary input information, it can export all of the data it needs into two color outputs as seen in Listing 12.28.

    #version 420 core

    layout (location = 0) out uvec4 color0;
    layout (location = 1) out vec4 color1;

    in VS_OUT
    {
        vec3 ws_coords;
        vec3 normal;
        vec3 tangent;
        vec2 texcoord0;
        flat uint material_id;
    } fs_in;

    layout (binding = 0) uniform sampler2D tex_diffuse;

    void main(void)
    {
        uvec4 …

  • Page - 601

convert the integer data stored in our textures into the floating-point data we need. The unpacking code is shown in Listing 12.29.

    layout (binding = 0) uniform usampler2D gbuf0;
    layout (binding = 1) uniform sampler2D gbuf1;

    struct fragment_info_t
    {
        vec3 color;
        vec3 normal;
        float specular_power;
        vec3 ws_coord;
        uint material_id;
    };

    void unpackGBuffer(ivec2 coord, out fragment_info_t fragment)
    {
        uvec4 …
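Since the body of unpackGBuffer is cut off in this excerpt, the following hedged reconstruction shows how the packed data could be recovered with GLSL’s standard unpacking functions; the component layout is assumed to match the packing described earlier:

    void unpackGBuffer(ivec2 coord, out fragment_info_t fragment)
    {
        // texelFetch avoids filtering - we want the raw stored bits
        uvec4 data0 = texelFetch(gbuf0, coord, 0);
        vec4  data1 = texelFetch(gbuf1, coord, 0);

        // Each 32-bit uint holds two packed 16-bit half floats
        vec2 temp = unpackHalf2x16(data0.y);
        fragment.color  = vec3(unpackHalf2x16(data0.x), temp.x);
        fragment.normal = normalize(vec3(temp.y, unpackHalf2x16(data0.z)));
        fragment.material_id = data0.w;

        fragment.ws_coord       = data1.xyz;
        fragment.specular_power = data1.w;
    }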

  • Page - 602

Figure 12.24: Visualizing components of a G-buffer

    vec4 light_fragment(fragment_info_t fragment)
    {
        int i;
        vec4 result = vec4(0.0, 0.0, 0.0, 1.0);

        if (fragment.material_id != 0)
        {
            for (i = 0; i < num_lights; i++)
            {
                vec3 L = fragment.ws_coord - light[i].position;
                float dist = length(L);
                L = normalize(L);
                vec3 N = normalize(fragment.normal);
                vec3 R = reflect(-L, N);
                float NdotR = max(0.0, dot(N, R));
                …

  • Page - 603

    The final result of lighting a scene using deferred shading is shown in Figure 12.25. In the scene, over 200 copies of an object are rendered using instancing. Each pixel in the frame has some overdraw. The final pass over the scene calculates the contribution of 64 lights. Increasing and decreasing the number of lights in the scene has little effect on performance. In read more..

  • Page - 604

vertex shader, transforming them into tangent space using the TBN matrix, and passing them to the fragment shader where lighting calculations are performed. However, in deferred renderers, the normals that you store in the G-buffer are generally in world or view space. In order to generate view-space normals that can be stored into a G-buffer for deferred shading, we need to …

  • Page - 605

    layout (binding = 0) uniform sampler2D tex_diffuse;
    layout (binding = 1) uniform sampler2D tex_normal_map;

    void main(void)
    {
        vec3 N = normalize(fs_in.normal);
        vec3 T = normalize(fs_in.tangent);
        vec3 B = cross(N, T);
        mat3 TBN = mat3(T, B, N);

        vec3 nm = texture(tex_normal_map, fs_in.texcoord0).xyz * 2.0 - vec3(1.0);
        nm = TBN * normalize(nm);

        uvec4 outvec0 = uvec4(0);
        vec4 outvec1 = vec4(0);
        vec3 …

  • Page - 606

    solve all of your problems. Besides being very bandwidth heavy and requiring a lot of memory for all of the textures you attach to your G-buffer, there are a number of other downsides to deferred shading. With a bit of effort, you might be able to work around some of them, but before you launch into writing a shiny new deferred renderer, you should consider the read more..

  • Page - 607

    meta-data such as material IDs just doesn’t work. So, if you want to implement antialiasing, you’ll need to use multi-sampled textures for all of the off-screen buffers attached to your G-buffer. What’s worse, because the final pass consists of a single large polygon (or possibly two) that covers the entire scene, none of the interior pixels will be considered edge pixels, read more..

  • Page - 608

from object to object in a scene such that surfaces are lit indirectly by the light reflected from nearby surfaces. Ambient light is an approximation to this scattered light and is a small, fixed amount added to lighting calculations. However, in deep creases or gaps between objects, less light reaches them because the nearby surfaces occlude the light sources — hence …

  • Page - 609

of the lights. However, it should be clear that a point at the top of a peak should be able to see most, if not all of the lights. The bumps in the surface occlude the lights from points at the bottom of valleys, and therefore, they will receive less ambient light. In a full global illumination simulation, we would literally trace lines (or rays) from each point being …

  • Page - 610

Figure 12.28: Selection of random vector in an oriented hemisphere

However, V2 and V3 lie outside the desired hemisphere, and it should be clear that the dot product between either of these two vectors and N will be negative. In this case, we simply negate V2 and V3, reorienting them into the correct hemisphere. Once we have our random set of vectors, …

  • Page - 611

Figure 12.29: Effect of increasing direction count on ambient occlusion

Figure 12.30: Effect of introducing noise in ambient occlusion

As you can see in Figure 12.30, the introduction of randomness in the step rate along the occlusion rays has improved image quality substantially. Again, from left to right, top to bottom, we have taken 1, 4, 16, and 64 …

  • Page - 612

    directions, respectively. With random ray step rates, the image produced by considering only a single ray direction has gone from looking quite corrupted to looking noisy, but correct. Even the 4-direction result (shown on the top right of Figure 12.30) has acceptable quality, whereas the equivalent image in Figure 12.29 still exhibits considerable banding. The 16-sample image on the read more..

  • Page - 613

part — this is where we apply the ambient occlusion effect. It is shown in its entirety in Listing 12.32, which is part of the ssao sample application.

    #version 430 core

    // Samplers for pre-rendered color, normal, and depth
    layout (binding = 0) uniform sampler2D sColor;
    layout (binding = 1) uniform sampler2D sNormalDepth;

    // Final output
    layout (location = 0) out vec4 color;

    // …

  • Page - 614

    // For each random point (or direction)...
    for (i = 0; i < point_count; i++)
    {
        // Get direction
        vec3 dir = points.pos[i].xyz;

        // Put it into the correct hemisphere
        if (dot(N, dir) < 0.0)
            dir = -dir;

        // f is the distance we’ve stepped in this direction
        // z is the interpolated depth
        float f = 0.0;
        float z = my_depth;

        // We’re going to take 4 steps - we could make this
        // …

  • Page - 615

over geometry that’s already been rendered. In this section, we take it one step further and demonstrate how it’s possible to render entire scenes with a single full-screen quad.

Rendering Julia Fractals

In this next example, we render a Julia set, creating image data from nothing but the texture coordinates. Julia sets are related to the Mandelbrot set — the iconic …

  • Page - 616

level of detail in the resulting image. Listing 12.33 shows the setup for our Julia renderer’s fragment shader.

    #version 430 core

    in Fragment
    {
        vec2 tex_coord;
    } fragment;

    // Here’s our value of c
    uniform vec2 c;

    // This is the color gradient texture
    uniform sampler1D tex_gradient;

    // This is the maximum iterations we’ll perform before we consider
    // the point to be outside …

  • Page - 617

operations are equivalent, but this way avoids a square root in the shader, improving performance. If, at the end of the loop, iterations is equal to max_iterations, we know that we ran out of iterations and the point is inside the set — we color it black. Otherwise, our point left the set before we ran out of iterations, and we can color the point accordingly. To …
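The iteration loop being described is cut off in this excerpt; a minimal sketch of the classic Julia iteration it refers to (z → z² + c, with the squared-magnitude test standing in for the square root) might read:

    int iterations = 0;
    vec2 z = fragment.tex_coord;   // starting point taken from the texture coordinate

    // Iterate z = z^2 + c using complex arithmetic on a vec2
    while (iterations < max_iterations && dot(z, z) < 4.0)
    {
        vec2 z_squared;
        z_squared.x = z.x * z.x - z.y * z.y;
        z_squared.y = 2.0 * z.x * z.y;
        z = z_squared + c;
        iterations++;
    }

Comparing dot(z, z) against 4.0 is the same test as comparing the magnitude of z against 2.0, the escape radius of the set.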

  • Page - 618

    Ray Tracing in a Fragment Shader OpenGL usually works by using rasterization to generate fragments for primitives such as lines, triangles, and points. This should be obvious to you by now. We send geometry into the OpenGL pipeline, and for each triangle, OpenGL figures out which pixels it covers, and then runs your shader to figure out what color it should be. Ray tracing read more..

  • Page - 619

Figure 12.33: Simplified 2D illustration of ray tracing

… textures, and so on. However, we also consider the contribution of the rays that we shoot in other directions. So, for I0, we’ll shade it using Rprimary as our view vector, N as our normal, Rshadow as our light vector, and so on. Next, we’ll shoot a ray off towards I1 (Rreflected), shade the surface there, …

  • Page - 620

Expanding this gives us a quadratic equation in t:

    (D · D)t² + 2(O − C) · D t + (O − C) · (O − C) − r² = 0

To write this in the more familiar form of At² + Bt + C = 0:

    A = D · D
    B = 2(O − C) · D
    C = (O − C) · (O − C) − r²

As a simple quadratic equation, we can solve for t, knowing that there are either zero, one, or two …

  • Page - 621

    float intersect_ray_sphere(ray R,
                               sphere S,
                               out vec3 hitpos,
                               out vec3 normal)
    {
        vec3 v = R.origin - S.center;
        float B = 2.0 * dot(R.direction, v);
        float C = dot(v, v) - S.radius * S.radius;
        float B2 = B * B;
        float f = B2 - 4.0 * C;

        if (f < 0.0)
            return 0.0;

        float t0 = -B + sqrt(f);
        float t1 = -B - sqrt(f);
        float t = min(max(t0, 0.0), max(t1, 0.0)) * 0.5;

        if (t == 0.0)
            return 0.0;
        …

  • Page - 622

    float min_t = 1000000.0f;
    float t;

    // For each sphere...
    for (i = 0; i < num_spheres; i++)
    {
        // Find the intersection point
        t = intersect_ray_sphere(R, S[i], hitpos, normal);

        // If there is an intersection
        if (t != 0.0)
        {
            // And that intersection is less than our current best
            if (t < min_t)
            {
                // Record it.
                min_t = t;
                hit_position = hitpos;
                hit_normal = normal;
                sphere_index = i;
            }
        }
    }
    …

  • Page - 623

    However, this isn’t particularly interesting — we’ll need to light the point. The surface normal is important for lighting calculations (as you have read already in this chapter), and this is returned by our intersection function. We perform lighting calculations as normal in the ray tracer — taking the surface normal, the view-space coordinate (calculated during the intersection read more..

  • Page - 624

    However, it doesn’t end there. Just as we constructed a new ray starting from our intersection and pointing in the direction of our light source, we can construct a ray pointing in any direction. For example, given that we know the surface normal at the ray’s intersection with the sphere, we can use GLSL’s reflect to reflect the incoming ray direction around the plane read more..

  • Page - 625

    a full-screen quad once for each bounce of the rays we want to trace. On each pass, we bind the origin, direction, and reflected color textures from the previous pass. We also bind a framebuffer that has the outgoing origin, direction, and reflection textures attached to it as color attachments — these textures will be used in the next pass. Then, for each pixel, the read more..

  • Page - 626

    As you can see in Figure 12.37, the top-left image (which includes no secondary rays) is pretty dull. As soon as we introduce the first bounce in the top-right image, we begin to see reflections of the spheres. Adding a second bounce in the bottom left, we can see reflections of spheres in the reflections of the spheres... in the third bounce on the lower right, read more..

  • Page - 627

Otherwise, we can find a real value for t. Again, once we know the value of t, we can substitute it back into our ray equation, P = O + tD, to retrieve our intersection point. If t is less than zero, then we know that the ray intersects the plane behind the viewer, which we consider here to be a miss. Code to perform this intersection test is shown in …
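The referenced listing is truncated here; a hedged sketch of a ray–plane intersection function consistent with this derivation, assuming a plane stored as a normal N and offset d so that P · N + d = 0 (the plane struct is an assumption, not the book’s code):

    // Hypothetical 'plane' struct: vec3 normal; float d;
    float intersect_ray_plane(ray R, plane P)
    {
        float denom = dot(R.direction, P.normal);

        // Ray parallel to the plane - no intersection
        if (denom == 0.0)
            return 0.0;

        float t = -(dot(R.origin, P.normal) + P.d) / denom;

        // Intersections behind the ray origin count as misses
        if (t < 0.0)
            return 0.0;

        return t;
    }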

  • Page - 628

    Now, if we add a few more planes, we can enclose our scene in a box. The resulting image is shown on the top left of Figure 12.39. However, now, when we bounce the rays further, the effect of reflection becomes more and more apparent. You can see the result of adding more bounces as we progress from left to right, top to bottom in Figure 12.39 with no bounces, read more..

  • Page - 629

    planes in real time. Current research in ray tracing is almost entirely focused on efficient acceleration structures and how to generate them, store them, and traverse them. Summary In this chapter, we have applied the fundamentals that you have learned throughout the book to a number of rendering techniques. At first, we focused heavily on lighting models and how to shade the read more..

  • Page - 630

Chapter 13
Debugging and Performance Optimization

WHAT YOU’LL LEARN IN THIS CHAPTER

• How to figure out what’s wrong when your application isn’t doing what you want it to

• How to achieve the highest possible performance

• How to make sure you’re making the best use of OpenGL that you can

By now, you’ve learned a lot about OpenGL. You’ll probably have started …

  • Page - 631

Debugging Your Applications

It is an all-too-common scenario that you’ll invent a nifty new algorithm for rendering something; set up all your textures, vertices, framebuffers, and other data that you’ll need; start calling drawing commands; and either see nothing, or see something other than what you wanted. In this section we’ll cover two very powerful assets that are available …

  • Page - 632

Once you have created a debug context, you need to give it a way to notify your application when something goes wrong. To do this, OpenGL uses a callback function that is specified using a function pointer. The definition of the callback function pointer type is

    typedef void (APIENTRY * GLDEBUGPROC)(GLenum source,
                                          GLenum type,
                                          GLuint id,
                                          GLenum severity,
                                          GLsizei length,
                                          const GLchar * message,
                                          const void * userParam);
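To connect a callback of this type to the context, you pass it to glDebugMessageCallback(). A minimal sketch (the function name debug_callback is illustrative, and <stdio.h> is assumed to be included):

    // A trivial callback that forwards every message to stderr
    static void APIENTRY debug_callback(GLenum source, GLenum type, GLuint id,
                                        GLenum severity, GLsizei length,
                                        const GLchar * message,
                                        const void * userParam)
    {
        fprintf(stderr, "GL DEBUG: %s\n", message);
    }

    // ... after creating the debug context ...
    glDebugMessageCallback(debug_callback, NULL);
    glEnable(GL_DEBUG_OUTPUT_SYNCHRONOUS);  // deliver messages on the calling thread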

  • Page - 633

    cause an infinite loop and crash your program. In the simple example of Listing 13.2, we just print the message along with the raw values of several of the parameters using the C function printf . Again, in debug builds, the sb6 application framework installs a default debug callback function that simply prints the received message. However, if you want more advanced control read more..

  • Page - 634

• GL_DEBUG_TYPE_DEPRECATED_BEHAVIOR means that you’ve attempted to use features that are marked for deprecation (which means that they will be removed from future versions of OpenGL).

• GL_DEBUG_TYPE_UNDEFINED_BEHAVIOR indicates that something your application is trying to do will produce undefined behavior, and that even if it might work on this particular OpenGL implementation, this is not …

  • Page - 635

You can tell OpenGL which types of messages you want to receive by calling the glDebugMessageControl() function. Its prototype is

    void glDebugMessageControl(GLenum source,
                               GLenum type,
                               GLenum severity,
                               GLsizei count,
                               const GLuint * ids,
                               GLboolean enabled);

The source, type, and severity parameters together form a filter that is used to select the group of debugging messages that the …
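As a usage sketch, the following pair of calls would first silence everything and then re-enable only high-severity error messages; this particular combination is an illustrative choice, not one prescribed by the book:

    // Disable all messages...
    glDebugMessageControl(GL_DONT_CARE, GL_DONT_CARE, GL_DONT_CARE,
                          0, NULL, GL_FALSE);
    // ...then re-enable only high-severity errors from any source
    glDebugMessageControl(GL_DONT_CARE, GL_DEBUG_TYPE_ERROR,
                          GL_DEBUG_SEVERITY_HIGH, 0, NULL, GL_TRUE);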

  • Page - 636

so you can record these messages using the same logging mechanisms you might implement for regular debugging messages. To inject your own message into the debug output log, call

    void glDebugMessageInsert(GLenum source,
                              GLenum type,
                              GLuint id,
                              GLenum severity,
                              GLsizei length,
                              const char * message);

Again, the source, type, id, and severity parameters have the same meanings as they do …

  • Page - 637

    When you want to leave a debug group, call void glPopDebugGroup(void); Again, glPopDebugGroup() will produce another debug message, this time with the type parameter set to GL_DEBUG_TYPE_POP_GROUP but with all the other parameters set to the same thing as the corresponding message from when the group was pushed. When OpenGL produces debug messages, it will usually refer to objects such read more..

  • Page - 638

• GL_TRANSFORM_FEEDBACK if name is the name of a transform feedback object.

• GL_VERTEX_ARRAY if name is the name of a vertex array object.

For glObjectPtrLabel(), the object is identified by a pointer type. This function is used for objects that have pointer types in OpenGL, which is currently only sync objects. For both functions, the label and length parameters specify the …

  • Page - 639

    download and install them right now! The first of these tools is GPUView, which is part of the Windows Performance Toolkit by Microsoft. The second is AMD’s GPU PerfStudio 2. Both of these tools are available for download from their respective vendors’ Web sites. Windows Performance Toolkit and GPUView Microsoft’s Windows Performance Toolkit (WPT) is a suite of tools for read more..

  • Page - 640

Figure 13.1: GPUView in action

The application under analysis in Figure 13.1 is the asteroid field example from Chapter 7, a screenshot of which is shown in Figure 7.9. This particular application uses almost all of the available GPU time. The system used to capture this trace contained an AMD Phenom X6 1050T processor with six CPU cores and an NVIDIA GeForce GTX 560 SE …

  • Page - 641

    Figure 13.2: VSync seen in GPUView shows up as gaps in the hardware queue. This is effectively wasted time. Here, we have wasted time on purpose in order to not allow the application to get too far ahead of the display (and to show what this looks like in the tool). However, anything that causes the GPU to have to wait will waste GPU time. When you install the WPT, read more..

  • Page - 642

    DMA packets once they reach the hardware), the tool can show you a number of other events that might be inserted into the graphics pipeline. For example, present packets are events that instruct the operating system to display the results of rendering (triggered by the SwapBuffers() command) and are displayed with a crosshatch pattern by GPUView. Clicking on a packet brings up a read more..

  • Page - 643

    to execute the command buffer, and the sum of these packets for a given frame places the upper limit on the frame rate of your application. GPUView can show you quite a bit more information than this about your application’s use of the graphics processor. As your applications become more and more complex, they will start to exhibit behavior that only a tool such as read more..

  • Page - 644

    As you can see, the GPU PerfStudio has captured all of the OpenGL calls made by the application and has produced a timeline of the application making those calls. Along with each OpenGL command, the amount of CPU time taken to execute the call is shown in both the timeline and the function call list. The function call list also logs the parameters sent to each command. read more..

  • Page - 645

    Figure 13.6: GPU PerfStudio 2 HUD control window Figure 13.7: GPU PerfStudio 2 overlaying information Chapter 8 with the in-use textures is shown in Figure 13.7. You can see in the figure that at the top left, the height map used by the tessellation evaluation shader is visible. On the top right of the screenshot is the depth buffer (pure white because it’s been cleared read more..

  • Page - 646

    If you happen to have access to AMD hardware, GPU PerfStudio 2 can read a number of hardware performance counters from OpenGL to measure the impact of the drawing commands that your application makes. This includes measurements of things like primitives processed, the amount of texture data read, the amount of information written to the framebuffer, and so on. This feature is read more..

  • Page - 647

framebuffer using glReadPixels(); reading the results of occlusion queries, transform feedback queries, or other objects whose results depend on rendering; or performing a wait on a fence that is unlikely to have completed. In particular, it should never be necessary to call glFinish(). Furthermore, cases that might be less obvious can be avoided. For example, functions such as …

  • Page - 648

Figure 13.9: GPUView showing the effect of glReadPixels() into system memory

… to the GL_PIXEL_PACK_BUFFER target before calling glReadPixels() to retrieve data into a pixel pack buffer, which is what we’re doing towards the end of the trace in Figure 13.9. However, although there seems to be a significant change in activity, there are still gaps in the queue, which is not what we …

  • Page - 649

    Figure 13.10: GPUView showing the effect of glReadPixels() into a buffer Effective Buffer Mapping Once you have a buffer object whose data store has been allocated using a call to glBufferData() , you can map the entire buffer into the application’s memory by calling glMapBuffer() . However, there are several caveats to the use of this function. First, if you only want to read more..

  • Page - 650

    with offset zero being the first byte in the buffer and length being the size of the mapped range, in bytes. Besides being able to map a small part of the buffer, the additional power of glMapBufferRange() comes from the last parameter, access , which is used to specify a number of flags that control how the mapping is performed. Table 13.1 shows the possible bitfield read more..

  • Page - 651

    Of course, you can specify both the GL_MAP_READ_BIT and GL_MAP_WRITE_BIT flags at the same time by simply ORing them together. The GL_MAP_INVALIDATE_RANGE_BIT and GL_MAP_INVALIDATE_BUFFER_BIT tell OpenGL that you don’t care about the data in the buffer anymore and that it’s free to throw it out if it wishes. If you don’t need the old contents of the buffer after the mapping read more..

  • Page - 652

    Finally, GL_MAP_UNSYNCHRONIZED_BIT tells OpenGL not to wait until it’s done using the data in a buffer before giving you a pointer to the buffer’s memory. If you don’t set this flag and OpenGL is planning to give you a pointer to the same memory that’s about to be used by a previously issued command, it will wait for that command to finish executing before read more..
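Putting several of these flags together, a hedged sketch of an efficient streaming update might look like this (buffer, new_data, and the 1 KB size are illustrative, and <string.h> is assumed for memcpy):

    // Map 1 KB of a buffer for writing, telling OpenGL that we don't need the
    // old data in the range and that it shouldn't wait for in-flight commands.
    glBindBuffer(GL_ARRAY_BUFFER, buffer);
    void * ptr = glMapBufferRange(GL_ARRAY_BUFFER,
                                  0, 1024,
                                  GL_MAP_WRITE_BIT |
                                  GL_MAP_INVALIDATE_RANGE_BIT |
                                  GL_MAP_UNSYNCHRONIZED_BIT);
    memcpy(ptr, new_data, 1024);        // fill in the fresh data
    glUnmapBuffer(GL_ARRAY_BUFFER);

Use the unsynchronized flag only when you can guarantee the GPU is not still reading the range you are overwriting; otherwise you trade a stall for corruption.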

  • Page - 653

    sb6::object::render , it simply binds the vertex array object and calls the appropriate drawing command, giving it extremely low software overhead. Likewise, for framebuffer state, framebuffer objects wrap up all of the parameters describing the color, depth, and stencil attachments of the current framebuffer. It is far more efficient to create a framebuffer once at initialization time read more..

  • Page - 654

    glVertexAttribPointer() and setting the normalized parameter to GL_TRUE . You can then include a scale factor in any model matrices to return the object to its original scale, for free. This allows you to use only 16 bits per component rather than the 32 that would be needed for full-precision floating-point data, and at the same time provides for more precision than would be read more..

  • Page - 655

    to be a net win to do more math in your shader if it can avoid consumption of memory bandwidth. A reasonable format for normal maps is a two-component, 8-bit signed normalized format. Here, x and y are stored with 7 bits of precision (plus the sign bit), and the z component is reconstructed from the x and y components on use. Applying texture compression to normal maps read more..
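A quick sketch of the z reconstruction mentioned above, on the shader side, assuming the two stored components have already been expanded to the [-1, 1] range by the signed normalized format (tex_normal_map and tc are illustrative names):

    // n.xy comes straight from the two-component normal map; rebuild z
    // from the unit-length constraint x*x + y*y + z*z = 1.
    vec3 n;
    n.xy = texture(tex_normal_map, tc).xy;
    n.z  = sqrt(max(0.0, 1.0 - dot(n.xy, n.xy)));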

  • Page - 656

    First, and most obviously, the startup time of your application is going to be affected by how quickly you can get all the shaders ready for running. Some OpenGL implementations may use additional CPU threads to compile your shaders and may even be able to compile multiple shaders in parallel. However, just as you should consider the OpenGL pipeline as something that read more..

  • Page - 657

    There may be some GPU-side performance advantages to using large, monolithic program objects, but more often than not, well-written shaders don’t see much gain from this. Therefore, if you have a large number of shaders, you might want to consider compiling and linking them into separable program objects. You might want to have a program object for each combination of front-end read more..

  • Page - 658

    taken many passes over the shader to reach this point, increasing optimization time. • If the optimizer stops because it has run out of available passes, it’s quite possible that the code is not as optimal as it could be. Plus, the optimizer has burned all of the time that it has allotted to it. To cope with this, it is in your best interest as a developer to read more..

  • Page - 659

    in what is known as alternate frame rendering (or AFR) mode, where one GPU renders a frame, the next GPU renders the next frame, and so on for as many GPUs as there are in the system. Most such systems have only two GPUs in them, but some may have three, four, or even more present. Also, AFR isn’t the only way to achieve scaling using multiple GPUs, but it read more..

  • Page - 660

    likely turn off compression and make your application run slower, even on single-GPU systems. On multi-GPU systems, the issue is more severe. If you don’t clear the framebuffer, then OpenGL doesn’t know that when you start drawing into it again that you’re going to overwrite everything, which means that before it can execute the first drawing command, it must wait for the read more..

  • Page - 661

    equivalent for your platform from that thread. Once contexts are current for a thread, the thread can create objects, compile shaders, load textures, and even render into windows at the same time. This is in addition to any multi-threading that OpenGL drivers may implement internally on your behalf. In fact, if you look at your application running inside a debugger or other read more..

  • Page - 662

    thread, signaling back to the main thread when it has finished loading the texture. At this point, the main thread can call glTexSubImage2D() to copy the now loaded data from the buffer into the target texture object. This same technique can be applied to any data that’s stored in buffer objects, including vertex and index data, shader constants stored in uniform blocks, read more..
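A hedged sketch of the main thread’s half of this handoff, assuming a worker thread has already filled a pixel unpack buffer (pbo, tex, width, and height are illustrative names):

    // Main thread: the worker has finished writing pixel data into 'pbo'
    glBindBuffer(GL_PIXEL_UNPACK_BUFFER, pbo);
    glBindTexture(GL_TEXTURE_2D, tex);
    // With a pixel unpack buffer bound, the last parameter is an offset
    // into the buffer rather than a pointer to client memory.
    glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, width, height,
                    GL_RGBA, GL_UNSIGNED_BYTE, (void *)0);
    glBindBuffer(GL_PIXEL_UNPACK_BUFFER, 0);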

  • Page - 663

    recommended that you always clear a framebuffer before you start rendering to it. This is partly so that optimizations such as framebuffer compression can be effective. It’s also a signal to OpenGL that you’re done with the contents of the framebuffer and that it should be free to reuse that memory for something else. After all, it’s pretty easy for OpenGL to recreate read more..

  • Page - 664

    As with glInvalidateTexImage() , glInvalidateBufferData() throws out any data contained in the buffer object whose name you pass in buffer . After you call this function, the entire contents of the buffer become undefined, but are still allocated and owned by OpenGL. You might call this function, for example, if you store data into an intermediate buffer using transform feedback and read more..

  • Page - 665

• It may be able to reclaim memory for buffers or textures that have been invalidated and are no longer in use.

• It can avoid copying data from resource to resource, especially in multi-GPU systems.

• It can return framebuffer attachments to a compressed state without necessarily making their contents valid.

In general, you should call one of the invalidation functions …

  • Page - 666

Chapter 14
Platform Specifics

WHAT YOU’LL LEARN IN THIS CHAPTER

• How OpenGL interacts with major operating systems and window systems

• How to create an application without using the book’s framework

• How OpenGL translates onto mobile devices such as tablets and smart phones

Throughout the book, we’ve been using a simple application framework to allow our example programs …

  • Page - 667

Using Extensions in OpenGL

All of the examples shown in this book so far have relied on the core functionality of OpenGL. However, one of OpenGL’s greatest strengths is that it can be extended and enhanced by hardware manufacturers, operating system vendors, and even publishers of tools and debuggers. Extensions can have many different effects on OpenGL functionality. An extension …

  • Page - 668

    constructed from extensions programmers have found useful. In this way each extension gets its time in the sun. The ones that shine can be promoted to core; the ones that are less useful are not considered. This “natural selection” process helps to ensure only the most useful and important new features make it into a core version of OpenGL. A useful tool to determine read more..

  • Page - 669

    You should pass GL_EXTENSIONS as the name parameter, and a value between zero and one less than the number of supported extensions in index . The function returns the name of the extension as a string. To see if a specific extension is supported, you can simply query the number of extensions, and then loop through each supported extension and compare its name to the one read more..
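A sketch of that loop, using only glGetIntegerv and glGetStringi (the helper name is_extension_supported is an invention for this example, and <string.h> is assumed):

    // Returns 1 if 'name' appears in the context's extension list, 0 otherwise
    int is_extension_supported(const char * name)
    {
        GLint num_extensions, i;

        glGetIntegerv(GL_NUM_EXTENSIONS, &num_extensions);
        for (i = 0; i < num_extensions; i++)
        {
            const char * ext =
                (const char *)glGetStringi(GL_EXTENSIONS, i);
            if (strcmp(ext, name) == 0)
                return 1;
        }
        return 0;
    }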

  • Page - 670

    GL_ABC_foobar_feature extension in GLSL, include the following in the beginning of your shader: #extension GL_ABC_foobar_feature : enable This tells the compiler that you intend to use the extension in your shader. If the compiler knows about the extension, it will let you compile the shader, even if the underlying hardware doesn’t support the feature. If this is the case, the read more..

  • Page - 671

function pointer that represents the function you want to call. Function pointers are generally declared in two parts: The first is the definition of the function pointer type, and the second is the function pointer variable itself. Consider this code as an example:

    typedef void (APIENTRYP PFNGLDRAWTRANSFORMFEEDBACKPROC) (GLenum mode,
                                                             GLuint id);
    PFNGLDRAWTRANSFORMFEEDBACKPROC …

  • Page - 672

OpenGL on Windows

OpenGL is a powerful API. Its low-level nature leaves all of the control in the hands of application developers. Additionally, the core OpenGL code is portable across many different platforms and operating systems. Because every operating system has a different means of window management, each operating system has a different layer to help applications interface with …

  • Page - 673

    operating system with the OSR2 release. OpenGL is now a native API on any full Windows platform (Windows XP, Vista, Win 7, Server 2003, Server 2008, and so on), with its functions exported through the opengl32.dll library and supporting components in user32.dll . Many different levels of OpenGL hardware are available for Windows platforms, from chipsets with part of OpenGL read more..

  • Page - 674

    The ICD is actually a part of the display driver and does not affect the existing opengl32.dll system DLL. The name of the driver is completely up to the hardware vendor, and other vendors will use their own naming conventions. For example, AMD’s OpenGL driver for Windows is packaged in atioglxx.dll , and NVIDIA’s OpenGL driver is packaged in nvoglv32.dll . The name of the read more..

  • Page - 675

    Because a display driver cannot modify the opengl32.dll to add new features for the current version, OpenGL needed a way to allow applications to access parts that were not exposed by the opengl32.dll . This is done through the extension mechanism and an interface that allows applications to get the address of the functions for any supported interfaces. Not only does this work read more..

  • Page - 676

Basic Window Setup

Now it’s time to get back to setting up your application using WGL. The book’s application framework provides only one window, and OpenGL function calls always produced output in that window. (Where else would they go?) Your own real-world Windows applications, however, will often have more than one window. In fact, dialog boxes, controls, and even menus are …

  • Page - 677

    as an argument to indicate which context you want the function to affect. You can have multiple device contexts, but only one for each window. Before you jump to the conclusion that OpenGL should work in a similar way, remember that GDI is Windows specific. OpenGL was designed to be completely portable across environments and hardware platforms (and it didn’t start on read more..

  • Page - 678

    In Listing 14.1, we first initialize the contents of the structure to zero using ZeroMemory (this is similar to memset , but does not depend on the C runtime). This means that any fields of the structure we don’t otherwise fill in will be zeros — which is what we want for most of them. We set the structure’s style member to CS_HREDRAW | CS_VREDRAW | CS_OWNDC , read more..

  • Page - 679

                           0, 0, 800, 600,
                           NULL, NULL,
                           hInstance, NULL);

    HDC dc = ::GetDC(hWnd);

Listing 14.2: Creating a simple window

In Listing 14.2, once we’ve created our window, we get its device context using the GetDC function. Now we’re ready to set up the DC for rendering with OpenGL.

Pixel Formats

The Windows concept of the GDI device context is limited for 3D graphics because it was designed …

  • Page - 680

    has the characteristics and capabilities that match the needs of your application. This pixel format is then used to create an OpenGL rendering context. There are two ways to go about looking for a pixel format. The first method is the more preferred and capable mechanism exposed by OpenGL directly. The second method uses the original Windows interfaces, which have been around read more..

  • Page - 681

    For a given OpenGL device (hardware or software), the values of these members are not arbitrary. Only a limited number of pixel formats is available for a specific window. Pixel formats are said to be exported by the OpenGL driver. To find a format that suits your needs, you should create an instance of this structure, fill in as many fields as you need (setting the read more..

  • Page - 682

    you want! When you call an OpenGL command, how does the driver know which window to send its output to? In the previous chapters, we used the book’s application framework, which provided a single window to display OpenGL output. Recall that with normal Windows GDI-based drawing, each window has its own device context. To accomplish the portability of the core OpenGL functions, read more..

  • Page - 683

Listing 14.5. These in turn call the function associated with the window’s class (the lpfnWndProc member of the window class structure that we set to WindowProc earlier).

Double Buffering

The example program in the previous section requests a double buffered pixel format by specifying PFD_DOUBLEBUFFER in the PIXELFORMATDESCRIPTOR when searching for a pixel format using ChoosePixelFormat() …
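For reference, the classic ChoosePixelFormat() path being described fits in a few lines. The sketch below is a minimal, hedged version of that flow; error handling is omitted and the field values are typical choices, not the book’s exact listing:

    PIXELFORMATDESCRIPTOR pfd = { 0 };
    pfd.nSize      = sizeof(pfd);
    pfd.nVersion   = 1;
    pfd.dwFlags    = PFD_DRAW_TO_WINDOW | PFD_SUPPORT_OPENGL | PFD_DOUBLEBUFFER;
    pfd.iPixelType = PFD_TYPE_RGBA;
    pfd.cColorBits = 32;
    pfd.cDepthBits = 24;

    int format = ChoosePixelFormat(dc, &pfd);   // find the closest match
    SetPixelFormat(dc, format, &pfd);           // lock it in for this window

    HGLRC rc = wglCreateContext(dc);            // legacy context creation
    wglMakeCurrent(dc, rc);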

  • Page - 684

    functions to OpenGL, and this is true for WGL too. The Windows OpenGL implementation has a function named wglGetProcAddress() that allows you to retrieve a pointer to an OpenGL function supported by the driver, and its prototype is PROC wglGetProcAddress(LPSTR lpszProc); This function takes the name of an OpenGL function and returns a function pointer that you can use to call it read more..

  • Page - 685

Table 14.1: Pixel Format Attributes

Constant (WGL_*)              Description
NUMBER_PIXEL_FORMATS_ARB      The number of pixel formats for this device.
DRAW_TO_WINDOW_ARB            Non-zero if the pixel format can be used with a window.
DRAW_TO_BITMAP_ARB            Non-zero if the pixel format can be used with a memory Device Independent Bitmap (DIB).
DEPTH_BITS_ARB                The number of bits in the depth buffer.
STENCIL_BITS_ARB              The number of …

  • Page - 686

Table 14.1: Continued

Constant (WGL_*)              Description
SHARE_STENCIL_ARB             Non-zero if layer planes share a stencil buffer with the main plane.
SHARE_ACCUM_ARB               Non-zero if layer planes share an accumulation buffer with the main plane.
SUPPORT_GDI_ARB               Non-zero if GDI rendering is supported (front buffer only).
SUPPORT_OPENGL_ARB            Non-zero if OpenGL is supported.
DOUBLE_BUFFER_ARB             Non-zero if double buffered. …

  • Page - 687

The function wglChoosePixelFormatARB() is a more advanced version of ChoosePixelFormat() that can be used to find pixel formats that match requirements using the attributes in Table 14.1. Its prototype is

    BOOL wglChoosePixelFormatARB(HDC hdc,
                                 const int *piAttribIList,
                                 const float *pfAttribFList,
                                 UINT nMaxFormats,
                                 int *piFormats,
                                 UINT *nNumFormats);

It’s important to notice the “ARB” suffix on …

  • Page - 688

    The results returned by wglChoosePixelFormatARB() in the piFormats attribute are sorted with the “best” matching formats at the start of the list. The “best” match is defined by the implementation and is device dependent. It is usually advantageous to pick formats that the implementation thinks are the best match as long as they meet the requirements of your application. Some read more..

  • Page - 689

Enumerating Pixel Formats

Although wglChoosePixelFormatARB() can choose a pixel format that matches your requirements, sometimes it is necessary to ask the OpenGL driver for a list of all of the formats that it supports and query their properties. The wglGetPixelFormatAttribivARB() and wglGetPixelFormatAttribfvARB() functions can be used for this purpose, and their prototypes are

    BOOL …

  • Page - 690

        WGL_BLUE_BITS_ARB,
        WGL_ALPHA_BITS_ARB
    };

    int nPixelFormatCount = 0;
    wglGetPixelFormatAttribivARB(g_hDC, 1, 0, 1,
                                 pfAttribCount, &nPixelFormatCount);

    for (int i = 0; i < nPixelFormatCount; i++)
    {
        GLint results[10];
        printf("Pixel format %d details:\n", i);
        wglGetPixelFormatAttribivARB(g_hDC, i, 0, 10, pfAttribList, results);
        printf("    Draw to Window = %d:\n", results[0]);
        printf("    HW Accelerated = …

  • Page - 691

    the value for the attribute. The attributes WGL_CONTEXT_MAJOR_VERSION_ARB and WGL_CONTEXT_MINOR_VERSION_ARB are used to explicitly ask for a specific context version of OpenGL. If your application was written for OpenGL 3.3, for example, you would pass in 3 as the major version and 3 as the minor version. Similarly, if your application needed an OpenGL 4.0 context, you could ask for read more..
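A hedged sketch of such an attribute list and the context creation call that consumes it (requesting a 4.3 core profile here is just an example):

    static const int attribs[] =
    {
        WGL_CONTEXT_MAJOR_VERSION_ARB, 4,
        WGL_CONTEXT_MINOR_VERSION_ARB, 3,
        WGL_CONTEXT_PROFILE_MASK_ARB,  WGL_CONTEXT_CORE_PROFILE_BIT_ARB,
        0   // attribute lists are zero-terminated
    };

    // dc is the window's device context; NULL means no shared context
    HGLRC rc = wglCreateContextAttribsARB(dc, NULL, attribs);
    wglMakeCurrent(dc, rc);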

  • Page - 692

    not a valid OpenGL version. If any of the bits specified for WGL_CONTEXT_PROFILE_MASK_ARB are not supported, the error WGL_ERROR_INVALID_PROFILE_ARB is thrown. OpenGL can share objects (textures, buffers, sync objects, and so on) between contexts. If you want to share objects between two or more contexts, create the first context, and then pass its handle in the hShareContext parameter read more..

  • Page - 693

attribute of a pixel format by using the wglGetPixelFormatAttribivARB() function. There is, however, a catch-22 to these and all other OpenGL extensions. You must have a valid OpenGL rendering context before you can call glGetString(), wglGetProcAddress(), or most other OpenGL functions. This means that you must first create a temporary window, set a pixel format (we can actually …

  • Page - 694

    other sample in this book, but OpenGL will do what it needs to do to make your application run in full-screen mode. Even though this issue isn’t strictly related to OpenGL, it is of enough interest to a wide number of our readers that we give this topic some coverage here. Creating a full-screen window is almost as simple as creating a regular window the size of the read more..

  • Page - 695

Eliminating Visual Tearing

If your application is able to draw quickly and call SwapBuffers at a faster rate than the refresh rate of the monitor, an ugly effect called tearing can occur. If your application calls SwapBuffers before the previous frame is finished being scanned out, someone using your application will see part of one frame and part of the next. The widely …
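The truncated sentence above is almost certainly introducing the swap interval; on Windows this is exposed by the WGL_EXT_swap_control extension. A minimal sketch of using it, with the typedef following the convention of wglext.h:

    typedef BOOL (WINAPI * PFNWGLSWAPINTERVALEXTPROC)(int interval);

    PFNWGLSWAPINTERVALEXTPROC wglSwapIntervalEXT =
        (PFNWGLSWAPINTERVALEXTPROC)wglGetProcAddress("wglSwapIntervalEXT");

    if (wglSwapIntervalEXT != NULL)
        wglSwapIntervalEXT(1);   // wait for one vertical retrace per SwapBuffers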

  • Page - 696

    Then, we delete the window using DestroyWindow : BOOL DestroyWindow(HWND hWnd); Finally, we can unregister the window class using UnregisterClass : BOOL UnregisterClass(LPCTSTR lpClassName, HINSTANCE hInstance); By calling each of these functions in the reverse order to which their corresponding setup functions were called, we return resources to the operating system and effectively clean up read more..

  • Page - 697

    true OpenGL 3.2, with shader enhancements to boot, then you must create and use a core profile rendering context. Since this book has left the compatibility context behind, we focus exclusively on using OpenGL 3.2 with Apple technologies. Many, but not all, of the examples elsewhere in the book can be made to run on OS X with some modification. OpenGL is a C API, and read more..

  • Page - 698

    We use these interfaces to do the setup for OpenGL in a window or on a display device. After that is out of the way, OpenGL is just OpenGL! GLUT is really a legacy framework (it was used for all sample programs for previous editions of this and many other OpenGL books), and we will talk about it only briefly last in this chapter. Our primary focus, then, for read more..

  • Page - 699

    Figure 14.3: The OpenGL Extensions Viewer is free on the Mac App Store. events. Objective-C classes are sub-classed from controls or are created from scratch to add application functionality. Fortunately, OpenGL is a first-class citizen in this development environment. Creating a Cocoa Program A Cocoa-based program can be created using the New Project Assistant in XCode. Figure 14.4 read more..

  • Page - 700

    Figure 14.4: The initial CocoaGL project Figure 14.5: Interface Builder is ready to build your OpenGL app. .xib file. In the object library, scroll down until you see the OpenGL View object. Click and drag this view over to the main window, and resize it to fill the main window. You can also resize the main window to taste. As shown in Figure 14.6, we now have an read more..

  • Page - 701

    from NSOpenGLView . Couldn’t be easier right? Well, in the words of the late Amelia Pond, “Okay Kid, THIS, is where it gets complicated.” Figure 14.6: The OpenGL window ready to go... or is it? Core Profile Support in Cocoa If you bring up the attributes inspector for the OpenGL view, you will find all sorts of nice settings and checkboxes that will allow you to read more..

  • Page - 702

    created to be derived from NSOpenGLView. The latter choice requires that less functionality of the base class be reimplemented, and in the likely event that Apple adds core profile functionality in the future, this choice will require less refactoring should we need to modernize the code later. Figure 14.7: Creating the basic NSView view class Listing 14.10 shows the definition read more..

  • Page - 703

overridden methods, the most important of which is initWithCoder, the method that initializes our view and OpenGL context, shown in Listing 14.11.

    -(id)initWithCoder:(NSCoder *)aDecoder
    {
        NSOpenGLPixelFormatAttribute pixelFormatAttributes[] =
        {
            NSOpenGLPFAColorSize, 32,
            NSOpenGLPFADepthSize, 24,
            NSOpenGLPFAStencilSize, 8,
            NSOpenGLPFAAccelerated,
            NSOpenGLPFAOpenGLProfile, NSOpenGLProfileVersion3_2Core,
            0
        };

        NSOpenGLPixelFormat …

  • Page - 704

        NSOpenGLPFAStencilSize, 8,
        NSOpenGLPFAAccelerated,
        NSOpenGLPFAOpenGLProfile, NSOpenGLProfileVersion3_2Core,
        0
    };

Note that you must terminate the array with 0 or nil. Next, you allocate the pixel format using this array of attributes. If the pixel format cannot be created, the allocation routine returns nil, and you should do something appropriate because as far as your OpenGL rendering is …

  • Page - 705

Table 14.4: Continued

Attribute (NSOpenGLPFA*)   Description
StencilSize                A numeric attribute specifying the desired depth of the stencil buffer.
MinimumPolicy              A Boolean attribute that indicates the pixel format choosing policy should select color, depth, and stencil buffers equal or greater than the sizes specified by the previous attributes.
MaximumPolicy              A Boolean attribute that indicates for the …

  • Page - 706

Table 14.4: Continued

Attribute (NSOpenGLPFA*)   Description
SampleAlpha                A Boolean attribute that, when used with NSOpenGLPFASampleBuffers and NSOpenGLPFASamples, hints to OpenGL that alpha values should be included in multi-sampling operations.
RendererID                 A numeric attribute that specifies a specific OpenGL renderer ID. A notable example is kCGLRendererGenericID, which selects the Apple software renderer. …

  • Page - 707

Table 14.4: Continued

Attribute (NSOpenGLPFA*)   Description
Compliant                  A Boolean attribute that requires only OpenGL-compliant renderers be considered. This is implied unless the NSOpenGLPFAAllRenderers attribute has been specified.
ScreenMask                 A numeric attribute that is a bit mask of supported physical screens.
AllowOfflineRenderers      A Boolean attribute that indicates offline renderers may be considered. …

  • Page - 708

background thread loading textures and other data into a shared OpenGL context, which can then be used by another thread controlling a foreground context.

A Couple More Wires

Before any of our code in our custom derived class will be called, we have to actually connect our class to this view in Interface Builder. Do this in the identity inspector with the OpenGL window …

  • Page - 709

Do Me First!

Typical OpenGL rendering tasks usually require some one-time setup — perhaps preloading all the textures, shaders, geometry, and so on that will be used during repeated rendering operations. The NSOpenGLView class has a method called prepareOpenGL that is invoked before any other rendering operations occur. Listing 14.12 shows the body from our example, which merely prints to …

  • Page - 710

Draw Your Stuff!

Finally, we get to where all the action takes place. The typical NSView (or derived class such as NSOpenGLView) calls the drawRect method to fill the view. This is where we can put our OpenGL rendering code. Listing 14.14 shows our short example rendering code, which does nothing more than clear the color buffer.

    -(void)drawRect:(NSRect)bounds
    {
        …

  • Page - 711

show you how to create genuine double-buffered contexts with real buffer swaps when we get to the full-screen section a little later.

Introducing GLKit

GLKit is a helper framework intended to ease the transition from the OpenGL ES 1.x fixed-function pipeline to the new OpenGL ES 2.0 shader-based pipeline. Originally available on iOS 5.0, GLKit migrated to the desktop with OS X …

  • Page - 712

    work asynchronously loading textures in the background using a shared OpenGL context on another thread. The GLKTextureInfo class contains all the useful information to know about a loaded texture, its size, OpenGL target type, and so on. These read-only properties are listed and described in Table 14.5 below. Table 14.5: Read-Only Properties of the GLKTextureInfo Class Method Description read more..

  • Page - 713

    and plain demo by today’s standards. In the spirit of this original demonstration of OpenGL on a non-SGI platform, the rendering example program for this chapter is a recreation of the original Stonehenge. Largely an artist’s conception, the model does attempt to hold to what the original structure was thought at least by some to be. Having a real example program then gives read more..

  • Page - 714

Call this anytime the window changes size for the correct viewport and projection matrix settings:

    void GLStonehenge::resized(int w, int h);

Call this to render the scene from the current camera position:

    void GLStonehenge::render(void);

Move the camera position forward within the environment:

    void GLStonehenge::moveForward(float distance);

Rotate the camera left/right (in radians):

    …

  • Page - 715

    For a larger or more involved environment, you might well consider making an array of GLKTextureInfo pointers, but for the purposes of demonstration code, this makes the code easier to follow. In the GLStonehenge.mm file, the member function initModels is called to load all the model information for the environment. Typically on the Mac, we store application resources in the app read more..

  • Page - 716

    Whenever we need to reactivate this texture during rendering, we’ll again use the texture object name supplied by the GLKTextureInfo object: glBindTexture(GL_TEXTURE_2D, textureStones.name); 3D Math with GLKit Our 3D math needs for the Stonehenge demo are relatively simple. For our shaders, we need a perspective projection matrix, a camera transform, and a normal matrix for lighting read more..

  • Page - 717

    mCamera = GLKMatrix4MakeLookAt( cameraFrame.vWhere.x, cameraFrame.vWhere.y, cameraFrame.vWhere.z, vLooking.x, vLooking.y, vLooking.z, cameraFrame.vUp.x, cameraFrame.vUp.y, cameraFrame.vUp.z); Note that the GLKMatrix4MakeLookAt wants a point where the camera is looking, not the vector. We calculate this vector ourselves by simply adding the location to the direction in which we are looking (assuming a vector read more..

  • Page - 718

    accomplished by rotating the forward vector appropriately. We can create an appropriate rotation matrix with GLKMatrix4MakeRotation , and then transform our vector with it using GLKMatrix4MultiplyVector3 . The complete source for this short function is shown here, also in its entirety: /////////////////////////////////////////////////////////////// // The Camera can turn left or right only void read more..

  • Page - 719

Figure 14.10: The Cocoa sample with the supporting files updated

In this case, we set the update interval to 0.0 seconds to get the highest frame rate possible. Setting this to 1.0/60.0 would attempt to render at 60 fps. We’ll see later another way to limit the frame rate, and why you might want to, but we want to see the frame rate change as we monkey with this …

  • Page - 720

    NSTimer *pTimer = [NSTimer timerWithTimeInterval: 0.0f
                                target:self
                                selector:@selector(idle:)
                                userInfo:nil
                                repeats:YES];

    [[NSRunLoop currentRunLoop] addTimer:pTimer
                                 forMode:NSDefaultRunLoopMode];
    }

    -(void)idle:(NSTimer*)pTimer
    {
        [self drawRect:[self bounds]];
    }

Next, the reshape function tells the engine that the screen has changed size:

    -(void) reshape
    {
        NSRect bounds = [self bounds];
        stonehenge.resized(NSWidth(bounds), …

  • Page - 721

acceptsFirstResponder and return TRUE. It must also register itself as the new first responder. The first responder is simply the first view in a hierarchy that is given the opportunity to respond to window events, such as keystrokes:

    - (BOOL)acceptsFirstResponder
    {
        [[self window] makeFirstResponder:self];
        return YES;
    }

Next, we need to respond to the keyUp and keyDown messages, and …

  • Page - 722

            moveFlags |= MOVE_RIGHT_BIT;
            break;
        }
    }

    -(void)drawRect:(NSRect)bounds
    {
        static float fDistance = 0.025f;
        static CStopWatch cameraTimer;
        float deltaT = cameraTimer.GetElapsedSeconds();
        cameraTimer.Reset();

        if (moveFlags & MOVE_FORWARD_BIT)
            stonehenge.moveForward(fDistance * deltaT);

        if (moveFlags & MOVE_BACKWARD_BIT)
            stonehenge.moveForward(fDistance * -deltaT);

        if (moveFlags & MOVE_LEFT_BIT)
            stonehenge.rotateLocalY(fDistance * …

  • Page - 723

    device independent coordinates are actually half the values they would be if the coordinate system were based on pixels. Fonts and GUI elements are then rendered at full resolution and everything looks “normal”...yet crisper because of the additional pixel density. An OpenGL application rendering at full-pixel resolution would then have considerably more pixels to fill, taking quite read more..

  • Page - 724

    technologies and APIs as well, and we can use it for our Stonehenge example rendering as well. We cover here just a few quick and easy but useful recipes for using CGL in our Cocoa-based application. There may be Cocoa equivalents to some of these, but the CGL version will also work with your GLUT-based or any other higher-level third-party applications frameworks you might read more..

  • Page - 725

    GLCoreProfileView class is still used and is identical except for two changes. First, we can remove the initWithCoder method completely from the class, as we will be setting the pixel format from the outside this time. Next, in the drawRect method, we are going to replace the glFlush() call with a bona-fide buffer swap: [[self openGLContext] flushBuffer]; Our OpenGL view is now read more..

  • Page - 726

    [fullScreenWindow makeKeyAndOrderFront:self];
    [fullScreenWindow makeFirstResponder:fullScreenView];
    }

Listing 14.16: Creating and initializing the full-screen window

Listing 14.16 is taken almost verbatim from the official Apple OpenGL Programming guide. OS X automatically detects the full-screen window and will optimize the rendering and buffer swaps for best performance of a full screen application or game. The …

  • Page - 727

Figure 14.11: Tearing caused by an unsynced buffer swap

… than one frame per vertical retrace, while setting it to two allows two vertical retraces between buffer swaps. For example, if the swap interval was set to one, and the display refresh rate was 60 (about typical), you would get no more than 60 fps. For a swap interval of two, you’d get a maximum of 30 fps, and …
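For reference, the CGL side of setting a swap interval is a two-liner; a minimal sketch:

    // Ask for one vertical retrace per buffer swap on the current context
    GLint swap_interval = 1;
    CGLSetParameter(CGLGetCurrentContext(), kCGLCPSwapInterval, &swap_interval);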

  • Page - 728
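A minimal sketch of turning on vsync with CGL looks like the following; kCGLCPSwapInterval is the real CGL context parameter for this, and error handling is omitted for brevity:

    #include <OpenGL/OpenGL.h>

    GLint sync = 1;   /* 1 = wait for one vertical retrace per swap; 0 = never wait */
    CGLSetParameter(CGLGetCurrentContext(), kCGLCPSwapInterval, &sync);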

One traditional approach was to change the screen resolution to a smaller value. Before Snow Leopard, it was not uncommon for a full-screen OpenGL game, for example, to change the screen resolution before running, capture the display, and so on. Now that we no longer need a display-capturing solution, we can make use of CGL's ability to change the size of the back buffer instead of changing the screen resolution.
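A sketch of the back-buffer resize follows, under the assumption that the context is already current; kCGLCPSurfaceBackingSize and kCGLCESurfaceBackingSize are the CGL tokens for this feature:

    GLint dims[2] = { 1280, 720 };              /* desired back buffer size, in pixels */
    CGLContextObj ctx = CGLGetCurrentContext();

    CGLSetParameter(ctx, kCGLCPSurfaceBackingSize, dims);
    CGLEnable(ctx, kCGLCESurfaceBackingSize);   /* scale the smaller buffer up to the window */

The window still covers the whole screen; OpenGL simply renders fewer pixels and the result is scaled up on presentation.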

Apple provides a multi-threaded version of the OpenGL engine that offloads some of these tasks to another thread. On a multi-core system, this can have a positive performance impact. You can enable this feature by calling CGLEnable with the kCGLCEMPEngine flag:

    CGLEnable(CGLGetCurrentContext(), kCGLCEMPEngine);

This does not always improve performance; in fact, sometimes it can reduce performance! If your OpenGL code is not CPU-bound, the extra synchronization overhead may cost more than the second thread saves.

Remove the AppDelegate.m/.h and main.m files, and then add the OpenGL and GLUT frameworks (you should know how to do this by now). Add your GLUT C and/or C++ files, and you are ready to go. Let's go over a quick example. The body of main in a typical GLUT program is shown in Listing 14.17.
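A typical GLUT main looks roughly like the following sketch; the callback names SetupRC, ChangeSize, and RenderScene are placeholders, not necessarily those used in the book's sample:

    #include <GL/glut.h>

    void SetupRC(void);              /* hypothetical: one-time GL state setup */
    void ChangeSize(int w, int h);   /* hypothetical: window resize callback */
    void RenderScene(void);          /* hypothetical: redraw callback */

    int main(int argc, char *argv[])
    {
        glutInit(&argc, argv);
        glutInitDisplayMode(GLUT_DOUBLE | GLUT_RGBA | GLUT_DEPTH);
        glutInitWindowSize(800, 600);
        glutCreateWindow("GLUT Sample");

        glutReshapeFunc(ChangeSize);
        glutDisplayFunc(RenderScene);

        SetupRC();
        glutMainLoop();              /* never returns */
        return 0;
    }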

OpenGL on Linux

One great thing about OpenGL is that it's supported on so many different platforms. We have looked at how to use OpenGL on Windows and on the Mac. Now let's dig into 3D rendering on one of the most popular open source platforms: Linux. This section looks at how Linux supports OpenGL, how to pick a specific version of OpenGL, and what interfaces are available.

Proprietary drivers are available for most modern GPUs that support OpenGL 4.3. The most recent version of Mesa at the time of publication (9.0.x) supports OpenGL 3.1.

What Is X?

The X Window System is a graphical user interface that provides a more intuitive environment for users than a command prompt, similar to Microsoft Windows and Mac OS. X Window sessions are not restricted to use on local systems. For instance, you can run an application on one machine while displaying it on another.

Software-only implementations do not accelerate OpenGL and are considerably slower. You also need the header files and libraries for OpenGL and GLX; these are necessary for compiling your own applications.

Checking for OpenGL

Let's quickly look at how you can make sure OpenGL is supported on your system. Without that, the rest of this chapter is pretty meaningless. Try running the glxinfo command, as shown here:

    glxinfo | grep "OpenGL version"

These tools are usually available as packages in your distribution, permitting easy installation. However, package-distributed versions of these tools may be outdated compared to those available by direct download from the project's Web site.

Setting Up Mesa

The latest version of Mesa can be downloaded from the Mesa3D Web site; a link is provided in Appendix A, "Further Reading." There you will find the download packages along with build and installation instructions.

Some hardware vendors may also provide an open source version of their display drivers. Although it is often nice to have the source for the driver build, these drivers are often slower, updated less frequently, and have fewer features or more limitations than their proprietary counterparts. It's worth noting that some distros may have drivers prepackaged. These can be outdated, so check the vendor's Web site for the latest version.

The first command creates the make files you use to compile the code. The make files are custom made for each system because different resources may be located in different places on each system. The second command actually compiles the code, and the third installs the result. To use GLFW in your applications, you need to add the GLFW library to your link command:

    -lglfw

The first line creates a variable that contains the link parameters for libraries to be included. The one used here looks in both the standard lib directory for X11 as well as the directory for 64-bit specific libraries. The second line lists the include paths the compiler should use when trying to find header files. CC = gcc selects the compiler to use. The next line names the target to be built.

This call would look like

    int majorVer, minorVer;
    glXQueryVersion(dpy, &majorVer, &minorVer);

(Note that glXQueryVersion() takes pointers to the variables that receive the version numbers.)

Displays and X Windows

Before we get too far into using GLX, there are a few prerequisites for understanding how GLX works on Linux (or many other UNIX derivatives, for that matter). An OpenGL application runs inside a window on the X server. We mentioned earlier that X Window sessions need not be local.

Use the display handle that you got from calling XOpenDisplay(). For our purposes, we can use the default screen for the screen parameter. When the call returns, nelements tells you how many configs were returned. There's more to each config than its index: Each config has a unique set of attributes that represent the functionality of that config. These attributes and their meanings are listed in Table 14.6.

Table 14.6: Continued

    Attribute (GLX_*)   Description
    FBCONFIG_ID         The XID for the GLXFBConfig.
    LEVEL               The frame buffer level.
    DOUBLEBUFFER        Is TRUE if color buffers are double buffered.
    STEREO              Is TRUE if color buffers support stereo rendering.
    SAMPLE_BUFFERS      Number of multi-sample buffers. Must be 0 or 1.
    SAMPLES             Number of samples per pixel for multi-sample buffers.
                        Will be 0 if SAMPLE_BUFFERS is 0.

You can query any config to find the value of each of these attributes by using the glXGetFBConfigAttrib() command:

    int glXGetFBConfigAttrib(Display *dpy,
                             GLXFBConfig config,
                             int attribute,
                             int *value);

Set the config parameter to the config you are interested in querying and the attribute parameter to the attribute you would like to query. The result is returned in the variable pointed to by value.

color, depth, and stencil channels meet minimum requirements. The pBuffer, accumulation, and transparency values are less commonly used. For attributes you don't specify, the glXChooseFBConfig() command uses default values implicitly; these are listed in the GLX specification. The sort mechanism automatically sorts the list of returned configs by attribute priority, and the sort order for each attribute is also defined in the GLX specification.
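A hedged sketch of a typical selection call follows; the attribute values are illustrative, and dpy is assumed to be an open display connection:

    #include <GL/glx.h>

    static const int attribs[] =
    {
        GLX_RENDER_TYPE,   GLX_RGBA_BIT,
        GLX_DRAWABLE_TYPE, GLX_WINDOW_BIT,
        GLX_DOUBLEBUFFER,  True,
        GLX_RED_SIZE, 8, GLX_GREEN_SIZE, 8, GLX_BLUE_SIZE, 8,
        GLX_DEPTH_SIZE, 24,
        None                               /* attribute list is None-terminated */
    };

    int nelements;
    GLXFBConfig *fbConfigs = glXChooseFBConfig(dpy, DefaultScreen(dpy),
                                               attribs, &nelements);
    /* fbConfigs[0] is the best match according to the GLX sort rules;
       free the array with XFree() when you are done with it. */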

    Window XCreateWindow(Display *display,
                         Window parent,
                         int x, int y,
                         unsigned int width, unsigned int height,
                         unsigned int border_width,
                         int depth,
                         unsigned int class,
                         Visual *visual,
                         unsigned long valuemask,
                         XSetWindowAttributes *attributes);

After choosing good values for creating your window and calling XCreateWindow(), the handle to the new window is returned. This window handle can then be used to create a corresponding GLX window. When creating the GLX window, the config you use must be compatible with the visual used to create the X window.

GLX Strings

You can query various GLX strings to get more information on what your system can do. One of the most important strings is the extension string: a list of all the extensions the current implementation of GLX supports. To get the extension string, use

    const char *glXQueryExtensionsString(Display *dpy, int screen);

The returned string, or character array, is a space-separated list of extension names.

Creating Contexts

One way you can create a new context is with the glXCreateNewContext() command:

    GLXContext glXCreateNewContext(Display *dpy,
                                   GLXFBConfig config,
                                   int render_type,
                                   GLXContext share_list,
                                   bool direct);

When successful, this function returns a context handle that you can use to tell GLX which context you want to render with. The config used to create the context must be compatible with the config of any drawable the context is made current with.

    GLint attribs[] =
    {
        GLX_CONTEXT_MAJOR_VERSION_ARB, 3,
        GLX_CONTEXT_MINOR_VERSION_ARB, 3,
        0
    };

    rcx->ctx = glXCreateContextAttribsARB(rcx->dpy, fbConfigs[0],
                                          0, True, attribs);
    glXMakeCurrent(rcx->dpy, rcx->win, rcx->ctx);

The new method, glXCreateContextAttribsARB(), takes an additional parameter and allows you to select exactly the context you want:

    GLXContext glXCreateContextAttribsARB(Display *dpy,
                                          GLXFBConfig config,
                                          GLXContext share_context,
                                          Bool direct,
                                          const int *attrib_list);

Setting the GLX_CONTEXT_CORE_PROFILE_BIT_ARB bit causes the driver to return a context containing only core functionality, with no deprecated OpenGL functionality. Using this bit is a good way to prepare an application for the next revision of OpenGL, where deprecated functionality may be removed. Setting the GLX_CONTEXT_COMPATIBILITY_PROFILE_BIT_ARB bit asks the driver to create a context that also supports functionality that has been deprecated or removed from the core profile.

In GLX, a direct context is one that supports direct rendering to a local X server. To find out if an existing context is a direct context, call glXIsDirect(); it returns True if the context is a direct rendering context:

    Bool glXIsDirect(Display *dpy, GLXContext ctx);

Using Contexts

To use a context you have created, call glXMakeContextCurrent():

    Bool glXMakeContextCurrent(Display *dpy,
                               GLXDrawable draw,
                               GLXDrawable read,
                               GLXContext ctx);

Likewise, a call to glXWaitX() ensures that all native rendering made before the call completes before any OpenGL rendering after the call is allowed to happen. To present the back buffer, call glXSwapBuffers():

    void glXSwapBuffers(Display *dpy, GLXDrawable draw);

When using a double-buffered config, a call to glXSwapBuffers() presents the contents of the back buffer to the window. The call also performs an implicit glFlush().

    void glXQueryDrawable(Display *dpy,
                          GLXDrawable draw,
                          int attribute,
                          unsigned int *value);

There is also a set of functions for creating, dealing with, and deleting pixmaps and pBuffers. These are not covered here because we are not using, and do not recommend you use, pixmaps or pBuffers.

Putting It All Together

Now, for the fun part! Let's put all this GLX stuff together into a working application.

We also need a visual to create the X window. Once we have a config, we can get the corresponding visual from it:

    XVisualInfo *visualInfo;
    visualInfo = glXGetVisualFromFBConfig(rcx->dpy, fbConfigs[0]);

After we have a visual, we can use it to create a new X window. Before calling XCreateWindow(), we have to figure out what we want the window to do: Pick the window attributes and the events it should receive.

This little demo application, shown in Figure 14.12, just draws two eyeballs that do their best to follow your mouse pointer around the window. Some math is done to figure out where to put the eyeballs, where the mouse pointer is, and where the eyeballs should be looking. You can take a look at the rest of the GLXBasics sample program to see how all this works together.

    XDestroyWindow(rcx->dpy, rcx->win);
    rcx->win = (Window)NULL;
    XCloseDisplay(rcx->dpy);
    rcx->dpy = 0;

Going Full Screen on X

Just as with the other platforms, X-based systems such as most Linux desktops also support applications taking control of an entire screen. Modern Linux distributions ship with some form of intelligent window manager, and so it's best to cooperate with it rather than try to bypass it.
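One cooperative route, sketched below, assumes the window manager implements the EWMH specification: You ask for the _NET_WM_STATE_FULLSCREEN state with a client message. This is the standard EWMH protocol rather than the book's own listing; dpy and win are the display and window from the sample above:

    #include <X11/Xlib.h>

    /* Ask an EWMH-compliant window manager to make the window full screen. */
    Atom wmState    = XInternAtom(dpy, "_NET_WM_STATE", False);
    Atom fullScreen = XInternAtom(dpy, "_NET_WM_STATE_FULLSCREEN", False);

    XEvent xev = {0};
    xev.type                 = ClientMessage;
    xev.xclient.window       = win;
    xev.xclient.message_type = wmState;
    xev.xclient.format       = 32;
    xev.xclient.data.l[0]    = 1;            /* 1 = _NET_WM_STATE_ADD */
    xev.xclient.data.l[1]    = fullScreen;

    XSendEvent(dpy, DefaultRootWindow(dpy), False,
               SubstructureRedirectMask | SubstructureNotifyMask, &xev);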

OpenGL on Mobile Platforms

This section peeks into the world of OpenGL ES rendering. This set of APIs is intended for use in embedded environments, where resources have traditionally been much more limited. OpenGL ES dares to go where other rendering APIs can only dream of. There is a lot of ground to cover, so we will focus on the basics of getting started.

Every redundant path requires the driver to support it, and special hardware is often required to make each path efficient and fast. The OpenGL ES APIs streamline the feature set, including only a subset of the most common and useful portions of the related OpenGL APIs. Recent versions of OpenGL have drastically reduced the functionality overlap, but these revisions include features and functionality that most embedded hardware does not yet support.

The group set out to define interfaces that were more applicable to devices beyond the personal computer; the first embedded API it developed was OpenGL ES. Khronos consists of many industry leaders in both hardware and software. Some of the current members are AMD, Apple, ARM, Intel, Google, NVIDIA, and Qualcomm. The complete list is long and distinguished; you can visit the Khronos Web site for more information.

The fixed-function pipeline no longer encumbers the API. This means applications can implement and use only the methods they need in their own shaders. The latest version of OpenGL ES is 3.0. It is based on, and is backwards compatible with, OpenGL ES 2.0. This version adds a laundry list of features and formats lifted from various versions of the full OpenGL spec, bringing the mobile version a step closer to its desktop cousin.

This chapter is more about showing you the major differences between regular OpenGL and OpenGL ES, and less about describing each feature again in detail.

OpenGL ES 3.0

OpenGL ES 3.0 and OpenGL 4.3 are surprisingly similar at the API level. Both have slimmed-down interfaces that have removed old cruft. However, OpenGL 4.3 has added many new features not yet available on most embedded hardware.

Shaders

OpenGL ES 2.0 and 3.0 use programmable shaders in much the same way as OpenGL 4.3. However, the only two supported shader stages are vertex and fragment processing. OpenGL ES 2.0 and 3.0 use a shading language similar to the GLSL language specification, called the OpenGL ES Shading Language. This version has changes that are specific to embedded environments and the hardware they run on.

    void glShaderBinary(GLsizei count,
                        const GLuint *shaders,
                        GLenum binaryformat,
                        const void *binary,
                        GLsizei length);

All platforms supporting OpenGL ES must accept either source or binary shaders. OpenGL ES 3.0 requires that a runtime compiler be present, while binary shader support is optional. Check your device documentation to see which option works best for your platform. If you are targeting multiple platforms, source shaders are the most portable choice.

It can be handy to develop and debug shaders on a PC or Mac and then transfer them over to ES once things work as you expect. While OpenGL ES 3.0 does not natively support geometry or tessellation shaders, it does support transform feedback mode. This rendering mode allows you to capture the output of the vertex shader directly into a buffer object. This might allow you to run only the vertex shader over a set of vertices, capture the results, and reuse them in later rendering.

Framebuffers

Similar to OpenGL 4.3, OpenGL ES 3.0 also supports framebuffer and renderbuffer objects. Applications can create and bind their own framebuffer objects, attaching renderbuffers or textures to do off-screen rendering. As an improvement over OpenGL ES 2.0, OpenGL ES 3.0 allows multi-sampled renderbuffers and depth textures to be used with framebuffer objects. You can also render to multiple color attachments at once.

Figure 14.13 shows an example of OpenGL ES running in a game on a cell phone. This figure is also shown in Color Plate 15. But before we get there, there are a few issues unique to embedded systems that you should keep in mind while working with OpenGL ES and targeting embedded environments.

Figure 14.13: OpenGL ES rendering on a cell phone

Application Design Considerations

For first-timers to embedded development, the resource constraints of these platforms deserve special attention.

ARM CPUs dominate most of the embedded environment and are part of nearly every mobile phone or tablet. This can ease the burden when porting between mobile platforms, but it is also a challenge, as the instruction set and performance profile are different from desktop systems. ARM processors and mobile systems are typically regarded as more power efficient than traditional desktop processors.

Vertex storage can also impact memory, similar to textures. In addition to setting a cap on the total memory used for vertices, it may also be helpful to decide which parts of a scene are important and divide up the vertex allotment along those lines. One trick for keeping rendering smooth while many objects are on screen is to change the vertex counts for objects based on their distance from the viewer, a technique known as level of detail.

Floating-point numbers use a mantissa and an exponent to represent very large and very small numbers; the parts are related by m × 2^e, where m is the mantissa and e is the exponent. Fixed-point representation is different: It looks more like a normal integer. The bits are divided into two parts, with one part being the integer portion and the other the fractional portion. The position between the integer and fractional components is the "imaginary point."
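As a concrete illustration, here is a minimal 16.16 fixed-point sketch in C; the type and helper names are our own, not from the book's source:

    #include <stdint.h>

    typedef int32_t fixed16_16;                /* 16 integer bits, 16 fractional bits */
    #define FIXED_ONE (1 << 16)

    static inline fixed16_16 to_fixed(float f) { return (fixed16_16)(f * FIXED_ONE); }
    static inline float to_float(fixed16_16 x) { return (float)x / FIXED_ONE; }

    /* Multiply in 64 bits, then shift the extra 16 fractional bits back out. */
    static inline fixed16_16 fixed_mul(fixed16_16 a, fixed16_16 b)
    {
        return (fixed16_16)(((int64_t)a * b) >> 16);
    }

Addition and subtraction work directly on the raw representation; only multiplication and division need the widening and shifting shown above.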

Be aware that precision can be lost in the fractional component when the result is converted back to one of the operand formats. There are also math packages available to help you convert to and from fixed-point formats, as well as perform math functions. This is probably the easiest way to handle fixed-point math if you need to use it throughout an entire application. That's it! Now you have an idea of how to do basic fixed-point math.

Figure 14.14: A typical embedded system diagram (a 3D application sits on top of OpenGL ES and EGL, which in turn sit on the OS and the graphics processor)

On Windows, the display_id parameter you pass would be the device context. You can also pass EGL_DEFAULT_DISPLAY if you don't have the display ID and just want to render on the default device. If EGL_NO_DISPLAY is returned, an error occurred. Now that you have a display, it must be initialized with eglInitialize() before you can use it.
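A minimal sketch of getting and initializing a display follows; all calls shown are core EGL, with error handling abbreviated:

    #include <EGL/egl.h>

    EGLDisplay dpy = eglGetDisplay(EGL_DEFAULT_DISPLAY);
    if (dpy == EGL_NO_DISPLAY) { /* no display available */ }

    EGLint major, minor;
    if (!eglInitialize(dpy, &major, &minor)) { /* initialization failed */ }
    /* major and minor now hold the EGL version, such as 1.4 */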

EGL also provides a method to query the current API, eglQueryAPI(). This interface returns one of the three EGLenum values previously listed: EGL_OPENGL_API, EGL_OPENGL_ES_API, or EGL_OPENVG_API:

    EGLenum eglQueryAPI(void);

On exit of your application, or after you are done rendering, a call must be made to EGL to clean up all allocated resources. After this call is made, no EGL resources remain associated with the display:

    EGLBoolean eglTerminate(EGLDisplay dpy);

    EGLBoolean eglChooseConfig(EGLDisplay dpy,
                               const EGLint *attrib_list,
                               EGLConfig *configs,
                               EGLint config_size,
                               EGLint *num_configs);

Table 14.8: EGL Config Attribute List

    Attribute (EGL_*)   Description
    BUFFER_SIZE         Total depth in bits of the color buffer.
    RED_SIZE            Number of bits in the red channel of the color buffer.
    GREEN_SIZE          Number of bits in the green channel of the color buffer.
    BLUE_SIZE           Number of bits in the blue channel of the color buffer.

Table 14.8: Continued

    Attribute (EGL_*)   Description
    RENDERABLE_TYPE     Native type of visual. May be EGL_OPENGL_ES_BIT or
                        EGL_OPENVG_BIT.
    SURFACE_TYPE        Valid surface targets supported. May be any or all of
                        EGL_WINDOW_BIT, EGL_PIXMAP_BIT, or EGL_PBUFFER_BIT.
    COLOR_BUFFER_TYPE   Type of color buffer. May be EGL_RGB_BUFFER or
                        EGL_LUMINANCE_BUFFER.
    MIN_SWAP_INTERVAL   Smallest swap interval value accepted by eglSwapInterval().

First, decide how many matches you are willing to look through, then allocate memory to hold the returned config handles. The matching config handles are returned through the configs pointer, and the number of configs through the num_configs pointer. Next comes the tricky part: You have to decide which attributes are important to you in a functional config.
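A hedged example of asking for a windowed, double-buffered OpenGL ES 2.0 config follows; the attribute values are illustrative, and dpy is the initialized display from earlier:

    static const EGLint attribs[] =
    {
        EGL_RENDERABLE_TYPE, EGL_OPENGL_ES2_BIT,
        EGL_SURFACE_TYPE,    EGL_WINDOW_BIT,
        EGL_RED_SIZE, 8, EGL_GREEN_SIZE, 8, EGL_BLUE_SIZE, 8,
        EGL_DEPTH_SIZE, 16,
        EGL_NONE                        /* attribute list is EGL_NONE-terminated */
    };

    EGLConfig configs[16];
    EGLint numConfigs;
    eglChooseConfig(dpy, attribs, configs, 16, &numConfigs);
    /* examine configs[0..numConfigs-1] with eglGetConfigAttrib() and pick one */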

Table 14.9: Continued

    Attribute (EGL_*)        Comparison Operator   Default
    NATIVE_VISUAL_TYPE       Equal                 EGL_DONT_CARE
    RENDERABLE_TYPE          Mask                  EGL_OPENGL_ES_BIT
    SURFACE_TYPE             Equal                 EGL_WINDOW_BIT
    COLOR_BUFFER_TYPE        Equal                 EGL_RGB_BUFFER
    MIN_SWAP_INTERVAL        Equal                 EGL_DONT_CARE
    MAX_SWAP_INTERVAL        Equal                 EGL_DONT_CARE
    SAMPLE_BUFFERS           Minimum               0
    SAMPLES                  Minimum               0
    ALPHA_MASK_SIZE          Minimum               0
    TRANSPARENT_TYPE         Equal                 EGL_NONE
    TRANSPARENT_RED_VALUE    Equal                 EGL_DONT_CARE

preallocated based on the expected number of formats. After you have the list, it is up to you to pick the best option, examining each with eglGetConfigAttrib(). It is unlikely that different platforms will have the same configs or will list configs in the same order, so it is important to properly select a config instead of blindly using the first config handle.

The share_context parameter is used to share objects like textures and shaders between contexts; pass in the context you want to share with. Normally you pass EGL_NO_CONTEXT, given that sharing is not necessary. The context handle is passed back if the context was successfully created; otherwise, EGL_NO_CONTEXT is returned. Now that you have a rendering surface and a context, you're ready to make them current and start rendering.
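Pulling the pieces together, and under the assumption that a native window handle is available (here called nativeWindow), surface and context creation look roughly like this; every EGL entry point shown is core API:

    static const EGLint ctxAttribs[] =
    {
        EGL_CONTEXT_CLIENT_VERSION, 2,      /* request an OpenGL ES 2.0 context */
        EGL_NONE
    };

    EGLSurface surf = eglCreateWindowSurface(dpy, config, nativeWindow, NULL);
    EGLContext ctx  = eglCreateContext(dpy, config, EGL_NO_CONTEXT, ctxAttribs);

    eglMakeCurrent(dpy, surf, surf, ctx);   /* draw and read from the same surface */

    /* ... render a frame with OpenGL ES ... */
    eglSwapBuffers(dpy, surf);              /* present the back buffer */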

If you plan to render to your window using other APIs besides OpenGL ES and EGL, there are some things you can do to ensure that rendering is posted in the right order:

    EGLBoolean eglWaitGL(void);
    EGLBoolean eglWaitNative(EGLint engine);

Use eglWaitGL() to prevent other API rendering from operating on a window surface before OpenGL ES rendering completes. Use eglWaitNative() to prevent OpenGL ES rendering from proceeding before native rendering completes.

Extending EGL

Like OpenGL, EGL provides support for various extensions. These are often specific to the current platform and can provide functionality beyond that of the core specification. To find out what extensions are available on your system, use the eglQueryString() function discussed earlier. To get more information on specific extensions, you can consult the extension specifications in the Khronos registry.

For the Home Gamer

For those of us not lucky enough to be working on a hardware emulator or the hardware itself, there are other options if you still want to try your hand at OpenGL ES. Several OpenGL ES implementations are available that execute on desktop operating systems; these are also great for doing initial development. NVIDIA and AMD allow you to create ES profiles on their desktop parts.

The NDK is helpful for bringing code to Android that is already written in C or C++. This can be an ideal method for larger game engines, especially if they use existing non-Java libraries. There may also be some performance advantages to going this route. You also have more control over the windowing system and over setting up EGL to match the needs of your application. On the other hand, using the NDK can add to code complexity.

Once your hardware is set up, you are ready to run the app. Make sure your device is plugged into your computer via USB. Select Run in Eclipse and select your device from the list. That's it! You should now see the StonehengeES app render using OpenGL ES on your device, as shown in Figure 14.15.

Figure 14.15: StonehengeES rendered on an Android phone

Setting Up GLSurfaceView

Listing 14.18 runs through the basics of how we set up GLSurfaceView in GLview.java. In the constructor, the context is created first. Then the GLSurfaceView.Renderer is created, and setRenderer is called. At this point, a rendering thread is created and rendering is kicked off. Further down in Listing 14.18, we handle touch events in the onTouchEvent() method, which receives the touch input we use to drive the camera.

Listing 14.19 covers some of the interesting parts of the initialization of our main class, GLStoneHenge. Note that parts of the function have been removed to make room for the members we are talking about without having the listing run on for pages. The constructor allocates the models that hold the stone textures. initModels loads the textures, sets up the vertex arrays, and prepares the remaining OpenGL state.

This should give you a feel for the pieces we haven't had space to cover here. Developing OpenGL ES applications for Android is surprisingly easy, and Android devices are readily available. Enjoy bringing your OpenGL ES projects to a mobile device near you!

iOpenGL

Apple has three mainstream devices that are powered by OpenGL ES: the iPhone, the iPod Touch, and the iPad. All three devices support OpenGL ES 2.0 and are programmed in essentially the same way.

Figure 14.16: The Xcode welcome screen

Figure 14.17: Selecting an OpenGL-ES-based game (application) template

You'll see a number of templates in the upper pane, one of which is OpenGL Game. Even though we aren't necessarily going to build a game, select this by clicking on it, select Next to specify a project folder where the new project will be created, and click the Create button. Once the project has been created, you are ready to build and run it.

Figure 14.18: The starter OpenGL ES application

Make sure you have the combo box in the upper left set to one of the Simulator options and not one of the device options. Getting your app on the device and configuring your hardware certificate is well beyond the scope of this book, so we will restrict ourselves to using the simulator. As is typical for Xcode, just press Run, and the starter application builds and launches in the simulator.

Using C++ on iOS

The native iOS programming environment uses the Objective-C programming language. There is a good bit of passion, and sometimes vitriol, about this, as the majority of non-Mac programmers in the world would much rather use C or C++. In fact, a good number of Mac programmers would rather use C++ too, as it turns out. Other than when making use of Apple's frameworks, there is nothing stopping you: Objective-C++ lets you freely mix C++ with Objective-C code.

Listing 14.20 shows the construction of the GLKView object, which contains the actual OpenGL ES context that we are rendering with. This object is created when the nib is loaded initially, and by default contains only a color buffer and a 24-bit depth buffer.

    -(void)viewDidLoad
    {
        [super viewDidLoad];

        self.context = [[[EAGLContext alloc]
                         initWithAPI:kEAGLRenderingAPIOpenGLES2] autorelease];

        if (!self.context) {
            NSLog(@"Failed to create ES context");
        }

Core to ES

Moving our core profile Stonehenge example from the desktop to an iOS device is very straightforward. We'll begin with the client code, then talk about the shader differences. We begin by adding the needed resources and source code to the project, using nearly the same code as we did in the Mac chapter. Our Xcode project after this process is shown in Figure 14.19.

On ES 2.0, the OES_mapbuffer extension provides glMapBufferOES() for mapping a buffer. We of course have to use the GL_WRITE_ONLY_OES enumerant as well:

    float *pData = (float *)glMapBufferOES(GL_ARRAY_BUFFER, GL_WRITE_ONLY_OES);
    ...
    glUnmapBufferOES(GL_ARRAY_BUFFER);

GLSL on iOS

Shaders on OpenGL ES 2.0 have a few quirks compared to their desktop OpenGL core profile equivalents. Although OpenGL ES 3.0 GLSL code requires #version 300 es as the first line of the shader, OpenGL ES 2.0 shaders need no version directive at all.

Finally, we need to set the default precision for floating-point variables in the fragment program. Desktop GLSL also now has precision qualifiers, but this feature first debuted in OpenGL ES 2.0, and fragment programs for handheld devices require a default precision to be specified for floats. Typically, this is just a single line of code at the top of the fragment shader:

    precision mediump float;

The reverse of this allows you to free up any dynamically allocated memory or resources associated with the OpenGL context:

    -(void)tearDownGL
    {
        [EAGLContext setCurrentContext:self.context];
        stoneHenge.cleanupModels();
    }

Note that the context is made current before any operations that free OpenGL resources; otherwise, those operations would not be valid. From frame to frame, you are given a chance to update your model in the view controller's update method.

    c = (char *)szParentDirectory;
    while(*c != '\0')      // go to end
        c++;
    while(*c != '/')       // back up to parent
        c--;
    *c++ = '\0';           // cut off last part (binary name)

    /////////////////////////////////////////////////////////////
    // Change to directory. Any data files added to the project
    // will be placed here.
    chdir(szParentDirectory);

    @autoreleasepool {
        return UIApplicationMain(argc, argv, nil,
                                 NSStringFromClass([AppDelegate class]));
    }

We'll leave it to you to explore the project in its entirety to see how the touch events are used to move the camera and navigate the model.

Summary

This chapter covered how to build native applications that use OpenGL on Windows with nothing but calls to the Win32 API, on Mac OS X with Interface Builder and Cocoa, and with direct calls to X on Linux. On Windows, you saw how to choose a pixel format and create both basic and modern contexts with WGL.

On each platform, you learned how to create a context with a specific version of OpenGL, and you saw how to clean up window system state after your application is finished. Finally, we touched on OpenGL's leaner cousin, OpenGL ES, which is the dominant graphics API on mobile platforms, and presented an example of porting an application from the Mac to iOS (which powers Apple's mobile devices) and then to Android.


Appendix A
Further Reading

Real-time 3D graphics and OpenGL are popular topics. More information is available, and more techniques are in practice, than can ever be published in a single book. You might find the following resources helpful as you further your knowledge and experience.

Other Good OpenGL Books

McReynolds, T., and Blythe, D. (2005). Advanced Graphics Programming Using OpenGL. Morgan Kaufmann.

Wolff, D. (ed.) (2011). OpenGL 4.0 Shading Language Cookbook. Packt Publishing.

3D Graphics Books

Watt, A. (1999). 3D Computer Graphics, 3rd Edition. Addison-Wesley.

Dunn, F., and Parberry, I. (2011). 3D Math Primer for Graphics and Game Development, 2nd Edition. A.K. Peters / CRC Press.

Van Verth, J., and Bishop, L. (2008). Essential Mathematics for Games and Interactive Applications, 2nd Edition. Morgan Kaufmann.

These sites expand on the information covered in this book and offer vendor-specific OpenGL support, tutorials, demos, and news.

  • The Khronos Group OpenGL ES home page: http://www.khronos.org/opengles/
  • The OpenGL Extension Registry: http://www.opengl.org/registry/
  • AMD's developer home page: http://www.amd.com/developer/
  • NVIDIA's developer home page: http://developer.nvidia.com/
  • The Mesa 3D OpenGL library: http://www.mesa3d.org/


Appendix B
The SBM File Format

The SBM model file format is a simple geometry data file format devised specifically for this book. The format is chunk based and extensible, with several chunk types defined for use in the book's examples. This appendix documents the file format. SBM files begin with a file header, followed by a number of chunks, each of which starts with a chunk header.

The following field, size, encodes the size of the file header, in bytes. This represents the offset in bytes from the start of the file header to the start of the first chunk header, described in the next section. The size of the SB6_HEADER structure as defined is 16 bytes, and so size will normally be 0x10. However, it is legal to store data between the header and the first chunk, in which case size will be larger.
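Given only what this appendix defines (a file header whose size field gives the offset to the first chunk, and chunks that each begin with a header carrying its own size), a loader's chunk walk might be sketched like this; the variable names are ours, not the book's:

    /* data points at the whole file in memory; file_size is its length;
       header points at the SB6_HEADER at the start of data */
    unsigned char *ptr = data + header->size;     /* first chunk follows the file header */

    while (ptr < data + file_size)
    {
        SB6M_CHUNK_HEADER *chunk = (SB6M_CHUNK_HEADER *)ptr;

        /* ... dispatch on chunk->chunk_type ('INDX', 'VRTX', and so on) ... */

        ptr += chunk->size;                       /* chunk size skips to the next chunk */
    }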

Index Data Chunk

The index data chunk encodes a reference to index data stored in the file's data chunk (which follows the last chunk in the file). Its structure is as follows:

    typedef struct SB6M_CHUNK_INDEX_DATA_t
    {
        SB6M_CHUNK_HEADER header;
        unsigned int index_type;
        unsigned int index_count;
        unsigned int index_data_offset;
    } SB6M_CHUNK_INDEX_DATA;

The first member of the chunk (as with all chunks) is the standard chunk header.

The header of a vertex data chunk has the chunk_type 0x58545256, which corresponds to a chunk_name of {'V', 'R', 'T', 'X'}. The size of a vertex data chunk is expected to be 20 (0x14) bytes. The data_size member contains the raw size, in bytes, of the vertex data, and the data_offset field contains the offset, in bytes, from the start of the first data chunk to the vertex data.

The type field defines the data type of the attribute. Examples are 0x1406 (GL_FLOAT), 0x1400 (GL_BYTE), and 0x140B (GL_HALF_FLOAT), although any legal OpenGL type token may be used here. It is expected that loaders will cast this field to a GLenum token and pass it to OpenGL unmodified. The stride field encodes the number of bytes between the start of each element. As with OpenGL, a stride of zero indicates that the data is tightly packed.

Object List Chunk

Object list chunks represent sub-objects within a single SBM file. Each SBM file may contain many sub-objects. Sub-objects share a single vertex declaration, and their vertex and index data are contained within the same buffers.

    typedef struct SB6M_CHUNK_SUB_OBJECT_LIST_t
    {
        SB6M_CHUNK_HEADER header;
        unsigned int count;
        SB6M_SUB_OBJECT_DECL sub_object[1];
    } SB6M_CHUNK_SUB_OBJECT_LIST;

The count member holds the number of sub-objects in the list.

Example

    00000000 | 4D364253 00000010 00000004 00000000 | SB6M............
    00000010 | 544E4D43 00000020 65724300 64657461 | CMNT ....Created
    00000020 | 20796220 6D366273 6C6F6F74 00000000 |  by sb6mtool....
    00000030 | 54534C4F 0000032C 00000064 00000000 | OLST,...d.......
    00000040 | 000001B0 000001B0 00000240 000003F0 | ........@.......
    00000050 | 00000240 00000630 00000240 00000870 | @...0...@...p...


Appendix C
The SuperBible Tools

This book's source code not only includes most of the examples from the book in compilable form for many platforms, but also a number of tools that were used to create the .SBM and .KTX files used by those examples. You can use these tools to create and manipulate .SBM and .KTX files for your own applications.

    pixelheight   = 0x00000100
    pixeldepth    = 0x00000000
    arrayelements = 0x00000040
    faces         = 0x00000000
    miplevels     = 0x00000001
    keypairbytes  = 0x00000000

As we can see from the output of ktxtool, the aliens.ktx file is an array texture containing GL_BGRA data stored in unsigned bytes. It is 0x100 × 0x100 (256 × 256) texels in size, and there are 0x40 (64) slices in the array. The texture does not include a mipmap chain, as miplevels is 1.

The --toraw option takes data the other way, simply stripping the .KTX header from the file and writing the raw data into the output. Next, we come to the --makearray, --make3d, and --makecube options, which allow you to construct array textures, 3D textures, and cube maps from separate .KTX files. To use these options, the input textures must be compatible with one another.

It decodes the .DDS file header, translates the parameters to a .KTX file header, and then dumps the data from the .DDS file into the .KTX file. It does very little error checking or sanity checking. However, it does allow common content creation tools, including several texture compressors, to produce .DDS files that can then be converted to .KTX files for use with this book's examples.

    Sub-object 92: first 40896, count 432
    Sub-object 93: first 41328, count 288
    Sub-object 94: first 41616, count 504
    Sub-object 95: first 42120, count 432
    Sub-object 96: first 42552, count 432
    Sub-object 97: first 42984, count 504
    Sub-object 98: first 43488, count 288
    Sub-object 99: first 43776, count 576

As we can see, the asteroids.sbm file contains roughly 850K of raw data, divided into 100 sub-objects.

        --input rock5.sbm \
        --input rock6.sbm \
        --input rock7.sbm \
        --output asteroids.sbm --makesubobj

The tool takes all of the sub-objects in each of the files, in the order they're specified, and stuffs them all into one big output file. You can even keep reading and outputting to the same file to append more and more data onto the end of it. This is exactly how the asteroids.sbm file used by the book's examples was created.

Glossary

Aliasing: Technically, the loss of signal information in an image reproduced at some finite resolution. It is most often characterized by the appearance of sharp, jagged edges along points, lines, or polygons due to the nature of having a limited number of fixed-sized pixels.

Alpha: A fourth color value added to provide a degree of transparency to the color of an object.

Associativity: A sequence of operations is said to be associative if changing the order of the operations (but not the order of the arguments) does not affect the result. For example, addition is associative because a + (b + c) = (a + b) + c.

Atomic operation: A sequence of operations that must be indivisible for correct operation. Usually refers to a read-modify-write sequence on a single memory location.

Commutative: An operation is said to be commutative if changing the order of its operands does not change its result. For example, addition is commutative whereas subtraction is not.

Compute shader: A shader that executes a work item per invocation as part of a local work group, a number of which may be grouped together into a global work group.

Concave: A reference to the shape of a polygon. A concave polygon has at least one indentation, so that a line segment joining two interior points may pass outside the polygon.

Eye coordinates: The coordinate system based on the position of the viewer. The viewer's position is placed along the positive z axis, looking down the negative z axis.

FMA: Fused multiply-add. An operation, commonly implemented in a single piece of hardware, that multiplies two numbers together and adds a third, with the intermediate result generally computed at higher precision than a separate multiply followed by an add.

Invocation: A single execution of a shader. Most commonly used to describe compute shaders, but applicable to any shader stage.

Khronos Group: The industry consortium that manages the maintenance and promotion of the OpenGL specification.

Literal: A value, not a variable name. A specific string or numeric constant embedded directly in source code.

Matrix: A 2D array of numbers. Matrices are used in computer graphics to perform transformations of coordinates.

Pixel: Condensed from the words "picture element." This is the smallest visual division available on the computer screen. Pixels are arranged in rows and columns and are individually set to the appropriate color to render any given image.

Pixmap: A two-dimensional array of color values that comprise a color image. Pixmaps are so called because each picture element corresponds to a pixel on the screen.

Specification: The design document that specifies OpenGL operation and fully describes how an implementation must work.

Spline: A general term used to describe any curve created by placing control points near the curve, which have a pulling effect on the curve's shape. This is similar to the reaction of a piece of flexible material when pressure is applied at various points along its length.

Token: A constant value used by OpenGL to represent parameters. Examples are GL_RGBA and GL_COMPILE_STATUS.

Transformation: The manipulation of a coordinate system. This can include rotation, translation, scaling (both uniform and non-uniform), and perspective division.

Translucence: A degree of transparency of an object. In OpenGL, this is represented by an alpha value ranging from 1.0 (fully opaque) to 0.0 (fully transparent).


Color Plates

Color Plate 1: All possible combinations of blend functions
Color Plate 2: Different views of an HDR image
Color Plate 3: Adaptive tone mapping
Color Plate 4: Bloom filtering: no bloom (left) and bloom (right)
Color Plate 5: Varying specular parameters of a material
Color Plate 6: Result of rim lighting example
Color Plate 7: Normal mapping in action
Color Plate 8: Depth of field applied to an image
Color Plate 9: A selection of spherical environment maps
Color Plate 10: A golden environment-mapped dragon
Color Plate 11: Result of per-pixel gloss example
Color Plate 12: Toon shading output with color ramp
Color Plate 13: Real-time rendering of the Julia set
Color Plate 14: Ray tracing with four bounces
Color Plate 15: OpenGL ES rendering on a cell phone
