This week, we will learn more about the basic data structures needed to support 3d graphics. We'll assign a first OpenGL project, and gradually provide hints about how to carry it out. We begin by walking through the Graph2D program, to get familiar with how matrices etc. are used in real software. Then we look at some 3d data structures.
The first thing to know about Graph2D is that it's not really an OpenGL program. It just uses the Microsoft Foundation Classes' simple graphics output functions to get us into Windows programming. You can get this source code from the Reserve section of the library, or from the Zipped version of the code which is pointed to in the Week 2 lecture notes.
What does it do? When you run Graph2D, you see a small irregular four sided object in the lower left side of the screen. When you pull down the TRANSFORM menu, the three choices are TRANSLATE, ROTATE and SCALE. No matter which of these you select, you always get the same dialog box, called "Transform." This box asks you for two values named "X axis" and "Y axis". If we're doing a translation or a scaling, their uses are obvious. For a rotation, the X axis number is used to enter the number of degrees to rotate the object. If you're not careful, you'll rotate the object right out of the picture.
A series of transformations can be applied to the data; nothing changes in the visible picture. Then if you right-click inside the picture, the image is redrawn showing the cumulative effect of your transformations.
How does it do it? We can expect some data structures and some matrices to multiply into them. We also have to deal with a coordinate system that's upside down. In Windows, the upper left corner is (0,0); x runs west-to-east, and y runs north-to-south. To get to Cartesian coordinates with origin in lower left, we need to know MaxY. This depends on the screen, so we ask:
RECT clientRect;
GetClientRect(&clientRect);
int maxY = clientRect.bottom;
We also need a "device context", which is a way of talking to the window, so we use
CClientDC dc(this);
We then transform points from cartesian to screen by applying
newX = x;
newY = maxY - y;
dc.MoveTo(newX, newY);
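The Y flip above can be wrapped in a small helper. This is only a sketch: the ScreenPoint type and the cartesianToScreen name are hypothetical and do not appear in Graph2D, which works with plain ints.

```cpp
#include <cassert>

// Hypothetical helper type; Graph2D itself passes plain ints around.
struct ScreenPoint {
    int x;
    int y;
};

// Convert a Cartesian point (origin at lower left, y running up) into
// window coordinates (origin at upper left, y running down).
// maxY is the client-area height, e.g. clientRect.bottom.
ScreenPoint cartesianToScreen(int x, int y, int maxY) {
    return ScreenPoint{x, maxY - y};
}
```

A call such as cartesianToScreen(10, 0, 300) puts a point on the Cartesian x axis at the bottom edge of a 300-pixel-high client area.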
Walking through the CGraph2dView Header File.
We see 3 typedefs for MATRIX3X3, VECTOR and SHAPE. We see that CGraph2dView has one public attribute, which is a document named CGraph2dDoc. It has some protected attributes including m_polygon, m_vectors[4], m_rotate, m_xScale, m_yScale, m_xTranslate, m_yTranslate.
The polygon consists of an integer (nr. of vertices) followed by an array of vertex structures. Each Vertex structure has an x and a y member.
The functions provided are all obvious ones: DrawShape, Translate, Scale, Rotate and Transform. Service routines include MultMatrix, InitMatrix and CopyMatrix. We'll see how these all fit together in a moment.
We're generally ignoring the stuff that is outside of the "START CUSTOM CODE" and "END CUSTOM CODE" markers; it's provided by the Windows Wizards. Some of it will make sense later.
Walking through the graphvw.cpp implementation file.
The Constructor.(page 57, 58). We have to set up our data structures with a simple set of assignments to the vectors and polygons.
OnDraw is a required function of every window document. All it does is call DrawShape whenever the right mouse is clicked in the window.
OnTransformRotate is one of three routines that were built specifically to respond to events from the menu. It does the following things:
CTransformDlg dlg;
dlg.m_xAxis = 0;
dlg.m_yAxis = 0;
int response = dlg.DoModal();
if (response == IDOK)
m_rotate = (int) dlg.m_xAxis;
This means: create a dialog box of the class we created with the wizard. Set its two displayable data members to 0. Then we tell that object to "Do modal". Modal interaction means that this object has the focus and will keep it until you click an "OK" button. If you don't click OK, it doesn't send the message IDOK. If you do, then your specialized code runs. In this case, it just sets a protected variable value, namely m_rotate.
OnTransformTranslate and OnTransformScale are very similar.
OnRButtonDown is another wizard-supplied function. The authors added the following custom code:
MATRIX3X3 m;
InitMatrix(m); // call to our own routine; makes an identity matrix.
Translate(m,(int) m_xTranslate, (int) m_yTranslate);
Scale (m, m_xScale, m_yScale);
Rotate(m,m_rotate);
Transform(m_polygon, m);
Then it just sets the scalars m_rotate, etc. back to their starting values.
The first three routines just apply the appropriate changes to the new matrix m. Then Transform applies that matrix to the data in the polygon.
InitMatrix just sets the array to an identity matrix (diagonals = 1, rest = 0);
CopyMatrix copies one matrix to another.
MultMatrix is a three-deep loop that multiplies two matrices and puts the results into a third.
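Here is a minimal sketch of what those three service routines might look like. The element type (double) and the argument order of MultMatrix (product first) are assumptions; check them against the Graph2D source before relying on them.

```cpp
#include <cassert>

// Assumed layout: a plain 3x3 array of doubles.
typedef double MATRIX3X3[3][3];

// Set m to the identity matrix (diagonals = 1, rest = 0).
void InitMatrix(MATRIX3X3& m) {
    for (int i = 0; i < 3; ++i)
        for (int j = 0; j < 3; ++j)
            m[i][j] = (i == j) ? 1.0 : 0.0;
}

// Copy src into dst, element by element.
void CopyMatrix(MATRIX3X3& dst, const MATRIX3X3& src) {
    for (int i = 0; i < 3; ++i)
        for (int j = 0; j < 3; ++j)
            dst[i][j] = src[i][j];
}

// product = a * b, using the classic three-deep loop.
// product must not be the same matrix as a or b.
void MultMatrix(MATRIX3X3& product, const MATRIX3X3& a, const MATRIX3X3& b) {
    for (int i = 0; i < 3; ++i) {
        for (int j = 0; j < 3; ++j) {
            double sum = 0.0;
            for (int k = 0; k < 3; ++k)
                sum += a[i][k] * b[k][j];
            product[i][j] = sum;
        }
    }
}
```

Because MultMatrix cannot safely write its result on top of one of its inputs, a caller that wants to update a matrix in place needs a scratch matrix plus a CopyMatrix call.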
Some Graphical Routines. Now we see how Translate works.
void CGraph2dView::Translate(MATRIX3X3& m, int xTrans, int yTrans)
{
MATRIX3X3 m1, m2;
// ... here it sets the matrix m1 up as a translation matrix, i.e. an
// identity matrix with xTrans and yTrans placed in the bottom row ...
MultMatrix(m2,m1,m);
CopyMatrix(m,m2);
}
When we see the translate terms at the bottom of the matrix, it is apparent that they use the vector*matrix model, whereas we used the matrix*vector model in our class lecture. Rotate and Scale are very similar to Translate.
Transform just multiplies each vertex in the shape by the compound matrix m, overwriting the vertex with the result. Note that this code violates the principle I introduced in class, to wit: always preserve your original data!
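For comparison, here is a sketch of a transform that leaves the original data alone and returns a transformed copy. It uses the vector*matrix model, so the translation terms sit in the bottom row. MakeTranslation and TransformCopy are hypothetical names, and the two-int VERTEX and double-valued matrix are assumptions, not the Graph2D definitions.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

typedef double MATRIX3X3[3][3];  // assumed element type

struct VERTEX {
    int x, y;
};

// Build a translation matrix for the row-vector model [x y 1] * m:
// identity, with the translation terms in the bottom row.
void MakeTranslation(MATRIX3X3& m, double tx, double ty) {
    for (int i = 0; i < 3; ++i)
        for (int j = 0; j < 3; ++j)
            m[i][j] = (i == j) ? 1.0 : 0.0;
    m[2][0] = tx;
    m[2][1] = ty;
}

// Apply m to every vertex and return the results in a new vector,
// so the caller's original shape is preserved.
std::vector<VERTEX> TransformCopy(const std::vector<VERTEX>& shape,
                                  const MATRIX3X3& m) {
    std::vector<VERTEX> out;
    out.reserve(shape.size());
    for (const VERTEX& v : shape) {
        double x = v.x * m[0][0] + v.y * m[1][0] + m[2][0];
        double y = v.x * m[0][1] + v.y * m[1][1] + m[2][1];
        out.push_back(VERTEX{static_cast<int>(std::lround(x)),
                             static_cast<int>(std::lround(y))});
    }
    return out;
}
```

Keeping the untransformed vertices around means repeated redraws do not accumulate rounding error in the stored shape.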
Finally, DrawShape. This routine incorporates the stuff previously seen about inverting the Y coordinates. It gets the maxY, then loops through the vertices, applying pDC->MoveTo(newX, newY) to the first point, and LineTo to all the others. Finally it calls LineTo to get back to the start.
Enough basics. Let's go do some 3d stuff!
Vertex. (plural = "vertices").
typedef struct vertex
{
int x, y, z, w;
} VERTEX;
The "w" is our familiar homogeneous coordinate-helper. For now, just think of it as '1'. You can't actually see a vertex, because it's a mathematical point of zero size. But they're useful to define edges.
Edge. (plural = "edges". Not edgices. But you knew that.)
typedef struct edge
{
UINT vertex1, vertex2;
} EDGE;
Why would anyone want vertices to be represented by unsigned integers? That mystery is solved in the next section.
Wireframe model. Here we have a data structure that can support any number of edges.
typedef struct model
{
UINT numVerts;
VERTEX* vertices;
UINT numEdges;
EDGE* edges;
} MODEL;
Now the nature of an 'edge' is clear: it consists of the two indices
(that's "indexes" properly spelled) into the vertex
list, of the vertices that define a given edge. This data structure
is economical and easy to use, because if I change one of the
vertices, ALL the edges attached to it automatically change.
Now we declare some objects.
VERTEX cubeVerts[8] = { ... };
EDGE cubeEdges[12] = { ... };
Simple, huh? A cube is well known to have twelve edges. I gotta get me some software that can draw in GIFs, so I can put more pictures in these lecture notes.
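To make the indexing idea concrete, here is one way the cube could be filled in. The coordinates (a cube of side 100 with one corner at the origin) and the zero-based edge indices are my own assumptions for illustration, not the values from the handout. The UINT typedef stands in for the Windows type so the sketch compiles anywhere.

```cpp
#include <cassert>

typedef unsigned int UINT;  // stand-in for the Windows UINT type

typedef struct vertex {
    int x, y, z, w;
} VERTEX;

typedef struct edge {
    UINT vertex1, vertex2;  // indices into the vertex list
} EDGE;

typedef struct model {
    UINT numVerts;
    VERTEX* vertices;
    UINT numEdges;
    EDGE* edges;
} MODEL;

// Hypothetical coordinates: side 100, one corner at the origin, w = 1.
VERTEX cubeVerts[8] = {
    {0, 0, 0, 1},   {100, 0, 0, 1},   {100, 100, 0, 1},   {0, 100, 0, 1},
    {0, 0, 100, 1}, {100, 0, 100, 1}, {100, 100, 100, 1}, {0, 100, 100, 1}
};

// Zero-based indices into cubeVerts (an assumption of this sketch).
EDGE cubeEdges[12] = {
    {0, 1}, {1, 2}, {2, 3}, {3, 0},  // face at z = 0
    {4, 5}, {5, 6}, {6, 7}, {7, 4},  // face at z = 100
    {0, 4}, {1, 5}, {2, 6}, {3, 7}   // edges joining the two faces
};

MODEL cube = {8, cubeVerts, 12, cubeEdges};
```

Moving cubeVerts[2] moves all three edges that mention index 2, with no change to the edge list.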
From World to Screen Coordinates. If we draw with parallel projection, we just throw away the Z values and get a square. Actually if we traverse all the edges, we draw the square twice and we also draw four trivial lines from the corner vertices to themselves (with "different z values"). The code for this algorithm is on page 70.
The more interesting case is the perspective projection. Here's the basic idea. We assume the viewer is at the origin.
The arbitrary point P (x,y,z) is to be projected onto the screen which is at distance d from the origin, along the Z axis (this is a left-handed system). Consider two similar triangles in the (x,z) plane: the one formed from (x,0,z), (0,0,z) and the origin is the "big triangle". The line from (x,0,z) to the origin meets the screen directly under the desired screen-image of P, which is at (Xs, Ys). So the smaller triangle is formed by the points (Xs, 0, d), (0, 0, d) and the origin.
These two triangles are similar. Therefore, (x/z) = (Xs/d). We can quickly solve and see that
Xs = x*(d/z).
and by identical arguments,
Ys = y*(d/z)
This makes sense. As z gets larger (point moves away from the viewer at the origin) we expect Xs and Ys to get smaller, and they do. If d gets larger, which means pushing the projection screen away from the viewer, the image gets larger.
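The two equations translate directly into code. This is a bare sketch with hypothetical type and function names (Point3, Point2, projectPerspective); it assumes the viewer is at the origin and that z is positive, and it does no clipping.

```cpp
#include <cassert>
#include <cmath>

struct Point3 {
    double x, y, z;
};

struct Point2 {
    double x, y;
};

// Perspective projection with the viewer at the origin and the screen
// at distance d along the Z axis: Xs = x*(d/z), Ys = y*(d/z).
// The caller must make sure p.z is greater than zero.
Point2 projectPerspective(const Point3& p, double d) {
    double scale = d / p.z;
    return Point2{p.x * scale, p.y * scale};
}
```

For example, the point (4, 2, 8) with d = 2 has d/z = 0.25, so it lands at (1, 0.5) on the screen.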
Often, we don't have the luxury of putting the viewer at the origin.
Similar logic can be used to justify a number of other equations.
Here's one of them:
t = 1/(1 - z/e)
Xs = x * t;
Ys = y * t;
In this case, e describes how far the eye is from the model (i.e. a data structure containing our point P).
A1.
Find a geometric situation which corresponds to the above formula,
and be prepared to explain it. (Hint: you DO own a large and expensive
graphics textbook!)
The Synthetic Camera Problem. The simple perspective trick described above would only work if your world data were so nice as to sit right out there behind the screen; i.e. in the positive octant (where x, y and z are all positive) and behind the viewing screen (i.e. with z greater than d). In fact the perspective projection works even if z<d, but our picture gets harder to understand.
SO: How can we take an arbitrary set of data (like that cube we
defined above), and get it into the "gunsights" of our
synthetic camera, so we can reduce it to 2d screen coordinates?
To do that, we need a basic rule that helps us move data about,
from one coordinate system to another.
Imagine Disney World's Cinderella castle below, as we fly over in our helicopter (if you think I'm gonna try to draw it, you're more ambitious than I am!) Its data is described in world coordinates. The origin is down there in the middle of the town square, with x running off toward Tomorrowland and Y running right through the castle's front door. Z points up to the sky.
However, our camera is up in the helicopter. We want to point the camera's Z axis just off to the left of the castle, and the Y axis upward, and the X axis to the right. Then we want to somehow "map" all that data onto the (Xs, Ys) coordinate reference system. In fact it's a two step process:
1) Map the 3d data from world coordinates into 3d data in the camera coordinate system.
2) Project that 3d data into 2d screen coordinates. (this is the
easy part.)
Here's the rule: The transformation which moves coordinate system B's axes exactly onto the axes of system A, will also transform data from coordinate system A to coordinate system B.
That is, if we can find the right amount of translation, rotation and scaling to apply to the camera's coordinates, and put them into a matrix, then we will be able to apply that same matrix to the castle's world coordinate data and get camera-coordinate 3d data.
In class, we talked through how this can be done, and how an orthogonal matrix is constructed by using cross products. The essentials of the technique are described in FVD chapter 5, page 220 through 222.
The middle of page 222 has a paragraph that begins "The second way to obtain the matrix R". This is THE WAY TO GO; ignore the stuff on pages 217-221 unless you REALLY like being confused.
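As a rough sketch of the cross-product construction, the code below builds the three rows of a rotation matrix from three points, so that P1P2 ends up along the positive Z axis and P1P3 ends up in the (y,z) plane on the positive-y side. The order of the cross products is written from memory of that passage and should be checked against the text; all names here (Vec3, Basis, basisFromPoints, rotate) are hypothetical. The translation that first moves P1 to the origin is left out.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 {
    double x, y, z;
};

Vec3 sub(const Vec3& a, const Vec3& b) {
    return Vec3{a.x - b.x, a.y - b.y, a.z - b.z};
}

double dot(const Vec3& a, const Vec3& b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

Vec3 cross(const Vec3& a, const Vec3& b) {
    return Vec3{a.y * b.z - a.z * b.y,
                a.z * b.x - a.x * b.z,
                a.x * b.y - a.y * b.x};
}

// Assumes v is not the zero vector.
Vec3 normalize(const Vec3& v) {
    double len = std::sqrt(dot(v, v));
    return Vec3{v.x / len, v.y / len, v.z / len};
}

// The three rows of an orthogonal (rotation) matrix R.
struct Basis {
    Vec3 rx, ry, rz;
};

// Assumes p1, p2, p3 are not collinear.
Basis basisFromPoints(const Vec3& p1, const Vec3& p2, const Vec3& p3) {
    Vec3 p1p2 = sub(p2, p1);
    Vec3 p1p3 = sub(p3, p1);
    Vec3 rz = normalize(p1p2);               // goes to the Z axis
    Vec3 rx = normalize(cross(p1p3, p1p2));  // perpendicular to both
    Vec3 ry = cross(rz, rx);                 // completes the basis
    return Basis{rx, ry, rz};
}

// Apply R to v: each output component is a dot product with one row.
Vec3 rotate(const Basis& r, const Vec3& v) {
    return Vec3{dot(r.rx, v), dot(r.ry, v), dot(r.rz, v)};
}
```

Since the rows are unit length and mutually perpendicular, no angles or trigonometry are needed, which is the attraction of this route.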
The motivation for wanting to move those three points (P1 P2 P3) onto the axes as specified, comes from Chapter 6 where the Synthetic Camera is discussed. We'll revisit it later. Now, we need to get some practical programming issues settled before you folks lynch me.
You have two choices here. You can pursue the AUX-based example I gave you previously, (which is probably easier if you're using C) or you can pursue the C++ example given. These notes follow the C++ track, in the hopes that we can all understand something of Windows by the end of the semester. The AUX example doesn't provide step-by-step instructions for compiling the code, and the C++ one does. But it means we have to wade through a bunch of confusion. I'll try to make it brief.
Here are the links for this lecture's example software (zipped).
Download the Chapter 4 Example Folder
Download the Chapter 5 Example Folder
Essential ideas.
In the Reserve collection you will find copies of Chapter 4 and 5 of the Walnum book. These are the materials we're working with today. We have to push through a series of details that the AUX library would hide from us. Here goes.
0. The role of MFC (Microsoft Foundation Classes). This is the heart of Windows, and it's going to be calling all our stuff. MFC has specific methods that it executes, over and over. It must create a window (so your program must always have an OnCreate method); do something with it (so you need an OnDraw method); and get rid of it afterwards, with an OnDestroy method.
The AppWizard guides us through the construction of these necessary
things, and also kinda makes it harder to see what they all mean.
The forest and the trees, as usual.
1. OpenGL Data Types. These are the essential data types
defined by OpenGL.
| GLbyte | signed char |
| GLshort | short |
| GLint | long |
| GLsizei | long |
| GLfloat | float |
| GLclampf | float |
| GLdouble | double |
| GLclampd | double |
| GLubyte | unsigned char |
| GLboolean | unsigned char |
| GLushort | unsigned short |
| GLuint | unsigned long |
| GLenum | unsigned long |
| GLbitfield | unsigned long |
| GLvoid | void |
| HGLRC | HGDIOBJ |
We'll discuss their uses as we go along. The "GL" beginning is typical of most things in OpenGL.
2. Contexts. A context is the "environment" in which something happens. The basic Windows "setup" for making things appear is called a Device Context (DC). Before you can print to a window, you must have created a DC. This "bundle of tools" has a handle which is provided to every function that draws in the window.
OpenGL has to use DC's, but it also needs another kind of context - a rendering context (RC). Just before OpenGL draws, the RC is "turned on" - which is called making it current. (Strange technology.) Making something current, means "making it be in effect right now." Right after OpenGL draws to a window, we must also make the RC not current. Finally, before the program ends, the RC must be deleted.
We always have to pass a DC to a function that draws on the screen; but the RC is part of the system's state. Here are the functions that manage RC's:
| wglCreateContext() | creates a rendering context |
| wglDeleteContext() | obviously, gets rid of an RC |
| wglGetCurrentContext() | returns a handle to the current RC |
| wglGetCurrentDC() | gets a handle to the DC that's associated with the RC |
| wglMakeCurrent() | makes the RC current |
3. What's in a Rendering Context?
Answer: several things:
pixelFormatDescriptor. Well, you render with PIXELS, so most of the stuff (on pages 112 through 115) sets up specific details of what pixels your hardware supports, how you want them handled. For instance one of the elements in the pixelFormatDescriptor structure specifies the number of bits used to represent a color. Let's use the default values on page 115-116 for now.
MEANING: You use the pixelFormatDescriptor to tell the system what you WANT in the way of pixels in your rendering context. (It may not be exactly what you GET.)
ChoosePixelFormat. You then get a DC and use this procedure to ask the system software for a number that tells which (of the available) pixel formats to use. This is called a pixel format index.
MEANING: The OpenGL system is saying "we'll give you the nearest thing we have to the kind of pixels you asked for. Please refer to it by this index number from now on."
SetPixelFormat. When you get such an index, you still have to say "all right, now USE IT!". You do this with the SetPixelFormat command.
MEANING: I ask the druggist for something for my headache. He hands me a bottle and says "this might solve your problem." I say "OK, I'll buy it." Three steps.
DescribePixelFormat is a way to force the system to tell you what pixel format you're actually getting. It copies the internal specifications out into a pixelFormatDescriptor structure, for you to look at. We don't really need this right now.
MEANING: You can examine what OpenGL gave you, to see in what ways it matched the pixelFormatDescriptor you asked for. In our drugstore story, we are opening the bottle to see what's inside. This could happen before, or after, we buy it (execute SetPixelFormat.)
4. Managing a Rendering Context.
Using Visual C++, the recommended method is to create the window's DC each time the program draws to the window. As part of creating that DC we will make the RC current (turn it on.). The code is on pages 119-120 of the handout.
In the OnCreate method of the COpenglView class, we will do these things:
0. call CClientDC to create a temporary (but typical) Device Context;
1. Create a pixelFormatDescriptor;
2. call ChoosePixelFormat to see what is available for this DC;
3. call SetPixelFormat to put that option into effect in this DC;
4. call wglCreateContext to provide a handle (m_hRC) for this RC.
The temporary DC evaporates at the end of this OnCreate program segment. Its job is over.
Then, when we actually draw something (with the OnDraw method of the same class)
1. wglMakeCurrent(pDC->m_hDC, m_hRC); // just like we said we would...
2. DrawWithOpenGl(); // We call the function that we ourselves will have to write
3. wglMakeCurrent(pDC->m_hDC, NULL); // kill that sucker (turn off the RC.)
Fairly simple to do, messy to look at. pDC is the parameter for the current DC, which was passed to OnDraw. Like all good Windows methods associated with the screen image, OnDraw has to receive a DC so it knows where to do its stuff.
5. Turning off the lights afterward
The OnDestroy method on page 120 has the simple job of calling wglDeleteContext(m_hRC), which deallocates the memory used by the RC. The DC that was using it is destroyed at a higher level, in the cleanup of the WM_PAINT message handling. (The WM_PAINT handling process called OnDraw, and it finishes up after OnDraw returns.)
6. DOING IT. Please work through the explicit instructions on pages 122-130, to construct your minimum-GL application and get it running! Do it soon, because any second now a Project Assignment is going to arrive.
Ah, I hear it now!
P1.
(Project 1. Due 2 October 96.) Construct an OpenGL program which
produces a perspective view of a letter of the alphabet, such
as your first initial. (I is easy, M is harder!) Your letter is
standing on a six sided horizontal polygonal base, and the base
can be rotated around its vertical axis of symmetry. You rotate
the platform ten degrees clockwise (seen from above) by pressing
the left arrow or L key ("move the front of the model to
the left"), and ten degrees counterclockwise by pressing
the right arrow or R key.
If you want to get fancy, you can produce your whole name
or some other short word, if you have a long name.
Let's leap right into Chapter Five so that we can put some imagery into our example, and begin to move toward the necessary knowledge to carry out Project 1.
The essential concepts and commands introduced in Chapter 5 are:
1. The last digit and character of OpenGL function calls look like 2d or 3f; the integer tells how many arguments, and the character tells what data types the arguments are. b=byte, d=double, f=float, i=integer, s=short, bv=byte vector; dv=double vector; fv=float vector; iv=integer vector, sv=short vector.
2. There are STATE variables - that is, global "facts" that remain true until you change them. For instance, the GL_COLOR_CLEAR_VALUE constant denotes a global clear color; GL_CURRENT_COLOR denotes the drawing color to be used. We'll see these constants used below.
3. Some useful functions:
glClearColor() takes four floating arguments; sets the RGBA
color for clearing the background. (Wonder why no '3f'?)
glClear(GL_COLOR_BUFFER_BIT) applies the above color, to
the pixel buffer
glColor3f() sets the drawing color.
glFlush() forces the system to draw what you ordered.
4. The essential structure of drawing operations:
glBegin(GL_LINES);
glVertex2f(0.25f, 0.25f);
glVertex2f(0.75f, 0.25f);
glEnd();
GL_LINES consumes the vertices in pairs, as end-points of unconnected
lines.
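To see how the vertices get consumed, the sketch below lists which pairs of vertex indices become line segments. It covers GL_LINES and, for comparison, the strip and loop variants described in item 5. It is a model of the pairing rule only and makes no OpenGL calls; the LineMode enum and segmentsFor function are hypothetical stand-ins.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-ins for GL_LINES, GL_LINE_STRIP and GL_LINE_LOOP.
enum LineMode { LINES, LINE_STRIP, LINE_LOOP };

typedef std::pair<std::size_t, std::size_t> Segment;

// Given n vertices issued between glBegin and glEnd, return the index
// pairs that would be joined by a line segment in each mode.
std::vector<Segment> segmentsFor(LineMode mode, std::size_t n) {
    std::vector<Segment> out;
    if (mode == LINES) {
        // Vertices are used up in pairs; a leftover odd vertex is ignored.
        for (std::size_t i = 0; i + 1 < n; i += 2)
            out.push_back(Segment(i, i + 1));
    } else {
        // Each vertex after the first is joined to its predecessor.
        for (std::size_t i = 0; i + 1 < n; ++i)
            out.push_back(Segment(i, i + 1));
        // A loop adds one closing segment back to the first vertex.
        if (mode == LINE_LOOP && n >= 2)
            out.push_back(Segment(n - 1, 0));
    }
    return out;
}
```

With four vertices, LINES gives two separate segments, LINE_STRIP gives three connected ones, and LINE_LOOP gives four.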
5. There are many drawing types that can be put in place of GL_LINES. These include
GL_POINTS - draws individual points, whose size in pixels
is set by
glPointSize(4.0f) which would set them to 4 pixels.
GL_LINE_STRIP is like a MOVETO followed by a series of DRAWTO
commands.
GL_LINE_LOOP is like a strip in which the line is automatically
closed.
GL_POLYGON draws a polygon. (But isn't that a line-loop?)
Polygons are considerably more complex than just line loops,
because the software is going to have to deal with filling and
shading them. So, to tell the system we're about to do polygons,
we must first call this:
glPolygonMode(GL_FRONT_AND_BACK,GL_LINE)
The first argument means "which side's drawing mode
am I setting now?" and can be GL_FRONT_AND_BACK, GL_FRONT
or GL_BACK.
The second argument means what to draw: GL_POINT (just the
vertices), GL_LINE (the edges) or GL_FILL (make it look solid.)
If you set one side of a polygon to be 'line' mode and the other
to be 'fill' mode, you get what amounts to a one-way mirror. You
can see out of it but not into it.
The Front of a Polygon is defined by the order the points are presented. Normally this is in the counterclockwise direction as seen from the front. However, we can use glFrontFace(GL_CW) which means that we're going to go around the wrong way. (Let's not do that.)
Stippling means making dots in a pattern. Chapter 5 tells you how, but I don't want to do that just now.
We should know that polygons must be convex. If you want to make
a concave polygon, you can make it from two or more convex ones.
The interior edge can be turned off so that (even if your edges
aren't the same color as your fill) the two adjacent convex polygons
look like one concave one. This is done via the glEdgeFlag command
as follows:
glBegin(GL_POLYGON);
glEdgeFlag(TRUE);
glVertex2f(-0.2f, 0.3f);
glEdgeFlag(FALSE);
glVertex2f(-0.2f, -0.1f);
glEdgeFlag(TRUE);
etc...
glEnd();
That tells the system which edge(s) to make invisible: the flag in effect when a vertex is issued applies to the edge that starts at that vertex.
GL_TRIANGLES is one of the most useful kinds of polygons, because most hardware is optimized for triangles. This option digests vertices three at a time.
GL_TRIANGLE_STRIP is in fact even more useful. After the first
three vertices, each successive one is considered as the remote
point of a new triangle whose base was defined by the previous
two vertices.
GL_TRIANGLE_FAN is almost the same as the strip, but it continues to share the first vertex. Thus each new triangle is defined by the first vertex issued, the most recent vertex, and the new vertex. This overcomes an annoyance of triangle strips when you try to tile an entire surface; they don't actually turn back upon themselves very gracefully. Try it, you'll see what I mean.
GL_QUADS does quadrilaterals. Quads and triangles can be Gouraud-shaded
(that is, smoothly shaded by a relatively cheap mechanism) whereas
polygons of higher complexity must be cut up into smaller pieces.
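The difference between the three triangle modes is easiest to see as lists of vertex indices. The sketch below only models which vertices form each triangle; it makes no OpenGL calls, and the TriMode enum, Tri struct and trianglesFor function are hypothetical names. The swap on odd-numbered strip triangles reflects how OpenGL keeps every triangle in a strip wound the same way.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for GL_TRIANGLES, GL_TRIANGLE_STRIP and
// GL_TRIANGLE_FAN.
enum TriMode { TRIANGLES, TRIANGLE_STRIP, TRIANGLE_FAN };

struct Tri {
    std::size_t a, b, c;  // indices of the three vertices
};

// Given n vertices issued between glBegin and glEnd, return the index
// triples that make up each triangle in the chosen mode.
std::vector<Tri> trianglesFor(TriMode mode, std::size_t n) {
    std::vector<Tri> out;
    if (mode == TRIANGLES) {
        // Three fresh vertices per triangle; leftovers are ignored.
        for (std::size_t i = 0; i + 2 < n; i += 3)
            out.push_back(Tri{i, i + 1, i + 2});
    } else if (mode == TRIANGLE_STRIP) {
        // Each new vertex forms a triangle with the previous two.
        for (std::size_t i = 0; i + 2 < n; ++i) {
            if (i % 2 == 0)
                out.push_back(Tri{i, i + 1, i + 2});
            else
                out.push_back(Tri{i + 1, i, i + 2});  // keep winding consistent
        }
    } else {
        // Fan: every triangle shares vertex 0.
        for (std::size_t i = 0; i + 2 < n; ++i)
            out.push_back(Tri{0, i + 1, i + 2});
    }
    return out;
}
```

Five vertices give one triangle in TRIANGLES mode but three in either the strip or the fan, which is why the shared-vertex modes are so economical.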
Wrapping it up. It's a good idea to work through the example
in Chapter 5, if you haven't yet mastered Visual C++. It will
firm up your notions of what is connected to what.