Viewing Systems For 3D Engines

Introduction

OK, so perhaps you have an engine running that has a duck, rotating in the middle of the screen. Very nice, but not much use for doing 'real' things with. You need a camera. You need to be able to walk around, to view the duck at different angles, to zoom in. You need a proper viewing system.

The XYZ Rotation Angle System

So you think, I know, I'll make a system where my camera has 3 angles, X, Y and Z rotation. Then I just add on its position. So you build a matrix, apply that transformation to a model, then put it into camera space. It seems to work well... but it's rather limited. Specifying an orientation in terms of X Y and Z angles is tricky, and can cause problems. For example, once you start rotating around a little, a rotation in the X axis may not have yield the results you expected. You may also experience "Gimbal Lock", where rotations can cancel each other out, and so on.

However, all is not lost. It is a very simple system to implement, and you can make a simple, effective system using it. For example, you can easily construct a system that has viewing parameters of Yaw, Roll and Tilt. All you have to do is make a Y rotation matrix, about angle Yaw, a Z rotation matrix about angle Roll, and an X rotation matrix about angle Tilt. Multiply them in the order of YXZ, and you now have an easy way to move around in 3D space, changing the angles in an easy way. Simple, but it works

The UVN Vectors System

This is quite a good system. Its from 3D Computer Graphics, by Alan Watt. It works by defining 3 vectors, U, V and N. These vectors are right, up, and viewing direction, respectively. If you take your left hand, point your index finger forwards, stick your thumb up to the sky, and point your middle finger to the right, then your middle finger is U, your thumb is V, and your index finger is N. Imagine that these vectors orientate the viewing system. So that to rotate your 'head' to the left, you would rotate about the V vector, to roll, you would rotate about N, and so on. Also needed to use this system is an origin, the point of the camera. To use this system, you will need to build a viewing matrix. This is calculated by multiplying together two matrices. The system is given by:

Translation matrix =

| 1  0  0  0 |

| 0  1  0  0 |

| 0  0  1  0 |

| -x -y -z 0 |



Rotation matrix =

| Ux Vx Nx 0 |

| Uy Vy Ny 0 |

| Uz Vz Nz 0 |

|  0  0  0 1 |



CameraMatrix = Translation*Rotation

N can also be calculated as:

a = azimuth angle

e = elevation angle



Nx = sin(e)*cos(a)

Ny = sin(e)*sin(a)

Nz = cos(e)

To point at an arbitrary position, you will then need to define N as the unit vector between the camera and the point. U and V will be guessed at, and adjusted. To adjust the vector, simply do:

V = V' - (V'*N)*N

Where V' is your guess at V. U is the cross product of N and V.

Understanding How The Camera Model Works

I've had a few people talking to me on IRC, with difficulty understanding how this system works. Well, I'll give you a full explanation, by breaking it all down into small pieces, and re-assembling it into a way you can understand.

Part #1: UVN Vectors

This part alone has caused quite a lot of confusion. In short, UV and N are 3 vectors that are used to orientate the XY and Z values of your points. U is for the X axis, V is for the Y axis, N is for the Z axis.

Now, we all may go around, merrily saying that a point is at (1, 2, 3). But what do we really mean when we say this? Simple. The point is 1 unit from the origin in the X AXIS, the point is 2 units from the origin in the Y AXIS, and the point is 3 units from the origin in the Z AXIS. Why have I emphasised the term axis? Because that is the key to understanding the rotation.

When I say "X axis", what does that mean? Well, an axis can be considered as a unit vector - a direction. An axis is infinite is usually infinite in each direction. The X axis has the value (1, 0, 0). Now, imagine that to get a points X co-ordinate in camera space, we have to map a point onto the X axis. We would end up with something like:

NewPoint.X = XAxis.x*Point.x + XAxis.y*Point.y + XAxis.z*Point.z

Which simplfies down to:

NewPoint.X = 1*Point.x + 0*Point.y + 0*Point.z

Can you see what has occured? We don't just take the X value, we take the distance from the origin along the X axis. Now, as the X axis can be orientated in an arbitrary manner, then as the axis rotates, the point takes on new values: XY and Z are being combined in different ratios.

Recall that we have similar vectors from V and N - the Y and Z axes. Again, a similar approach is taken to them: not simply taking the Y or Z co-ordinate, but taking the distance from the origin along an axis. It is fundamental that you understand that part.

Now, UV and N are always perpendicular to each other, ie they will form the corner of a cube, regardless of rotation, the angles between each other will remain the same; just the orientation changes. To visualise this, take your left hand. Point out your index finger, stick your thumb upwards to the sky, and stick your middle finger out to the right. Your index finger is N, your thumb V, and your middle finger U. Rotate your hand a bit. As you can see, the vectors rotate around, changing the rotation of the camera, but still remain at the same angle to each other.

Camera Position

Camera position has also given a little trouble. What you have to remember is that camera position is specified in world co-ordinates, and when we perform the translation, we map the world back to the camera, not map the camera into the world. So, when we translate, the world will be moved towards the camera. To do that, we need to translate by a negative vector. And that vector is the camera position. Going back to my "hand" metaphor, imagine that the position of your hand is the position of the camera

Space, (0, 0, 0), And Other Such Things

Consider what happens when we apply -camera.pos to every co-ordinate in the space. They all get translated by the same amount, including the camera, which gets placed at (0, 0, 0) -- the origin. So, you can think of the camera position as being the origin of the world. However world space co-ordinates are not defined relative to that point, they are defined relative to (0, 0, 0) (NOT THE CAMERA POINT MIND!). So, the translation by -camera.pos moves the co-ordinates to be relative to the camera; in fact, you could make any point the centre of the world using such a system.

Also, consider what the rotation is doing. Try to think not so much of the camera itself rotating; rather the world surrounding it rotating, being backward mapped into the universe. IE we don't take the horse to market, we bring the market to the horse.

I hope this explains it clearly enough for you; bear in mind though that I can't put the thoughts directly into your brain, you have to work out the bits inbetween for yourself!

Fast Phong Shading

Introduction

Fast phong shading is always a goal of anyone writing a 3D engine. Despite some of its failings, Phong shading is still common in 3D systems, and people are always looking for faster and easier ways to approximate it. In this page, I'll discuss firstly real phong shading, then some of the approximations of it. I'll also explain why the document OTMPHONG.DOC is utter nonsense, and should be avoided at all costs.

Real phong shading

Real phong shading is done by interpolating the vertex normals across the surface of a polygon, or triangle, and illuminating the pixel at each point, usually using the phong lighting model. At each pixel, you need to re-normalize the normal vector, and also calculate the reflection vector. The reflection vector is calculated by:

R = 2.0*(N.L)*N - L

This vector does not need to be normalized. Then, we feed these vectors into the illumination equation for the phong lighting model, which is

I = Ia*ka*Oda + fatt*Ip[kd*Od(N.L) + ks(R.V)^n]

Here, the variables are:

Ia is the ambient intensity
ka is the ambient co-efficient
Oda is the colour for the ambient
fatt is the atmospheric attenuation factor, ie depth shading
Ip is the intensity of the point light source
kd is the diffuse co-efficient
Od is the objects colour
ks is the specular co-efficient
n is the objects shinyness
N is the normal vector
L is the lighting vector
R is the reflection vector
V is the viewing vector

To do multiple light sources, you sum the term fatt*Ip[kd*Od(N.L) + ks(R.V)^n] for each light source. Also, you need to repeat this equation for each colour band you are interested in.

Such shading is incredibly expensive, and cannot be done in realtime on common hardware. So, people have been looking into optimizations and approximations of this for a long time.

Angular Interpolation

One idea is to avoid the expensive dot-product and normalization operations by interpolating the angles. This sounds good in theory, but has a couple of problems:

Choosing a function to do the interpolation
Perspective problems

In the file OTMPHONG.DOC, the idea was to interpolate the angle, or cosine of the angle (I can't remember which) across the surface of the polygon. This doesn't work, because:

Cosine is NOT linear
Using linear interpolation you cannot get a value outside of your start value, and end value. So no highlight is possible.

OTMPHONG.DOC just re-invented Gouraud shading, and tried to turn it into Phong shading. It doesn't really work very well. Yes, I have tried it, I've tried a great deal of methods of implementing it, but it just doesn't work!

Another way I've heard of is using a Taylor series approximation. Admittedly, I haven't tried this yet; from what I gather, the idea here is to use a polynomial to approximate the cosine. People have used this in the past to generate sin tables, in 4k intros and what have you, so it should be possible. Again, I haven't tried it yet, if I ever do, I'll tell you what I find.

A halfway-house is the highlight test. Here, a polygon is tested, to see if the highlight falls on the polygon. If it doesn't, then you just use plain gouraud shading. If it does, you use your phong shader. The test follows:

For any vertex, is N*H >= t (threshold) ? If so, highlight = TRUE

For any edge, is N*H >= t at any point on that edge? If so, highlight = TRUE

For all edge, does N*H exhibit a maximum? If so, highlight = TRUE

If none of the above conditions, highlight = FALSE

H is the halfway vector, (L + V) / 2. I found this algorithmn in the book "3D Computer Graphics" by Alan Watt. I have yet to try this one, but it sounds workable. The only problem might be slight discontinuities in the shading. Again, this is guesswork, you'd have to implement it to see.

The Fake Method

Now I'll describe a method that is a bit odd, yet works very well in practice. Its the method most demos use to do their 'phong' shading, and is very fast.

First, we need to generate a 'highlight map'. This is a texture, which consists of concentric circles, the brightest in the middle, decreasing in brightness as they go outwards. So it looks like a highlight. This is easy enough to generate, a simple way being to take the distance from the centre, and get it into the range 0..256. Also, this map should be 256x256, for efficiency reasons.

Now we need to map the highlight onto the object. Again, this is easy to do. Take your vertex normal X value, multiply it by 128, and add 127 to it. Do the same for Y. Call these values P, Q (save confusion with U,V - texture co-ords). These are our texture co-ordinates! Now simply map the texture to the object, and hey presto, one nicely shaded object. As you rotate the object, the highlights will also move. Its a very handy algorithm I think, if a little odd. Its based on a trick called environment mapping, which you will find described elsewhere in these pages. You'll also notice that if you go behind your object, the lighting is the same. The lighting is effectively double sided. Rather odd. Another problem with the algorithm. Take it as an extra light source for free. Also, as this method is based on environment mapping, its only correct for parallel projection. But don't let that spoil your fun!

Extending this to multiple light sources shouldn't be too hard. For 2 light sources, you would have 2 maps, each having values in the range 0..128. You would then add them together, to get a colour value. Moving the light source around involves changing the P and Q values; using the normals sort of works, but after about 90 degrees, weird things start to happen.

Also one other thought. The phong map is basically a bunch of circles:

So, it is symmetrical about the centre. Why not just use one quadrant of the map, then use sign and quadrant information to convert the u,v co-ordinates? Would say some memory... Like so:

Well, I hope I helped to demistify Phong shading, and disprove some of the myths surrounding the subject. So please, don't come onto #coders, asking how to implement phong shading, or asking about that gibberish otmphong.doc file, you have all the information you need right here.

[Refences: Proper phong shading, 3D Computer Graphics 2e].

Lighting Your Texture Maps

Introduction

So, you've got your engine running, you have texture mapping, and some kind of shading in place, it runs fast, its smooth, but you feel somethings missing. You need to shade your texture maps. But you haven't got a clue how to do it! Well, I shall now explain the simple process of adding shading or transparency to your textures, in 256 colour mode.

16*16 Linear Palette

The easiest way to make primitive shading is to change your palette. Firstly, you will only be able to have 256 / n shades, where n is the number of colours you will be using. Secondly, the image quality is very poor. But, I'll explain it anyway, and you should at least implement it, if at least just to sharpen your coding skills. You'll find it leads on well to the better method...

Ok, the idea here is that instead of having just one big lump of 256 colours, you organise the colours into a table. Say a 16*16 table. Going down the rows you have increasing shade, and going across the table you have colours. Building this table is easy. You start at black for the first row, for every colour. Then, you fade up to the correct RGB in each successive row. So for colour 1:


R, G, B = RGB values for the colour

DeltaR, DeltaG, DeltaB = Amount to step at each row

DeltaR = (R << 8) / numshades

DeltaG = (B << 8) / numshades

DeltaB = (B << 8) / numshades

CurR, CurG, CurB = current red, green, blue

CurR = CurG = CurB = 0



for row = 0 to numshades - 1

        for colour = 0 to numcolours - 1

                palette[row*16+colour].red = CurR[colour] >> 8

                palette[row*16+colour].green = CurG[colour] >> 8

                palette[row*16+colour].blue = CurB[colour] >> 8



                CurR[colour] += DeltaR[colour]

                CurG[colour] += DeltaG[colour]

                CurB[colour] += DeltaB[colour]

        end

end

This is only pseudo-code. Coding this up is a 15-min maximum job. OK, now you've got that going, you need to light your textures. Again, this is easy. Simply do:

PutPixel(x, y, shade*16+texture[v][u])

Simple! Hold on to this idea of using a lookup table to get your colour, its a very important concept.

Using A Seperate Look-up Table

In the previous section, the lookup table was your palette. Now, we will create a seperate lookup table, which will let us do better shading, and improve the versatility of our code.

Allocate a 256*256 block of memory. If you're still in real-mode, and can't allocate this much memory, stop for a minute, and ask yourself what the hell you're playing at? You're using a computer that has somewhere between 4 and 32 megs of RAM installed, and you're constricting yourself to 1 meg? Get yourself a new compiler! If you're a C/C++ man, get DJGPP or Watcom C++. If Pascal is your thing, go for TMT Pascal, or perhaps GNU Pascal under Linux. Finally, if you're a masochist, and use 100% assembly, get yourself DOS32, or Trans PMODE extender. Personally I prefer DOS32 and Watcom.

Okay, that block of memory is going to be used for a lookup table. As before, you're going to increasing shade with increasing row, and colour running across the columns:

          Colour -->

        +----------------------+

Shade 0 |                      |

      1 |                      |

     ... ~~~~~~~~~~~~~~~~~~~~~~



    256 |                      |

        +----------------------+

That really should have been a .gif picture, but what the hell, you can see what I mean. Now you'll need to fill that table. And this is the tricky, time consuming bit.

Finding The Colour Of Best Fit

We only have a finite number of colours in our palette. And to fully shade the texture, we would need 256*256 colours. So, we need to find the best fitting colour for a given shade. The way this is done is pretty simple. You need to find the 'distance' between the target colour, and the colour in the palette. This gives you a measure of how close the colour is to the desired colour. Then return the colour with the minimum distance. Distance is found by :

Dist = sqrt((TargetR - R) ^ 2 + (TargetG - G) ^ 2 + (TargetB - B) ^ 2)

Set your best-fit distance variable to some high, impossible number, say a million. Loop for every colour, and if dist < best-dist, then set best-dist to dist, and best-colour to current-colour. If distance is 0, then just return the colour; you can't do better than a perfect fit. Otherwise, at the end of the loop, just return your best fitting colour. Your code should look something like :

int BestColour(float r, float g, float b, char *palette)

{

	int             n;

	int             bestcol = 0;

	int             bestdist = 256000;

	float           dist, rdist, gdist, bdist;

	float           red, green, blue;



	for(n=0; n<256; n++) {

		red   = (float) palette[0];

		green = (float) palette[1];

		blue  = (float) palette[2];

		rdist = red   - r;

		gdist = green - g;

		bdist = blue  - b;

		dist = sqrt(rdist * rdist +

			    gdist * gdist +

			    bdist * bdist);



		if(dist <= bestdist) {

			bestdist = dist;

			bestcol = n;

		}

		if(dist == 0) 

			return n;                        

		palette += 3;

	}



	return bestcol;

}

Its also a good idea to weight the rgb distances differently. Your eyes are generally more sensitive to the green band of colour, then red, then blue. So weight them perhaps:

0.30*Red

0.59*Green

0.11*Blue

This usually improves quality a bit. Also, you can avoid the sqrt() function call, by just comparing the squared distance. I'm not sure if this works perfectly as a replacement, but it seems fine to me. Please note that this is a pre-process, NOT to be done at rendertime (unless palette changes)

Calculating The Colour For A Given Shade

Calculating what colour to search for is simple enough. You can use any formulae you like for this, a phong lighting model, a transparency, a depth cue formula. The point is you need to calculate RGB for a given colour, and a given 'shade'.

Phong Lighting Model

The formula for this is:

Ambient + Diffuse + Specular



Which becomes:

Ia*ka*Oa + I*(Od*kd*(N.L) + Os*ks*(R.V)^n)



Ia is intensity of ambient

Ka is ambient co-ef

Oa is ambient colour. These are constant, set to whatever you like

I is intensity of light. Say 1.0

Od is colour for diffuse, replace with colour[n]

Kd is diffuse co-efficient. About 0.95 - 1.0 is good

(N.L) is angle of incidence, replace shade[n]

Os is specular colour, say 255

Ks is specular co-ef, say 0.75

(R.V) is angle of reflection. Replace with spec[n]

N is the shinyness. A value of around 20 looks fine.

You'll need to calculate this for R G and B seperately. Make the N.L and (R.V)^n terms into lookup tables. Then search for the value in the palette, and you're done.

Depth cueing

A simple approximation can be done with

k*Colour

Where k is shade/256. Its also a good idea to make the colour fade to white, instead of black, which gives the illusion of fogging.

Transparecy

Transparecy is slightly different. Instead of shade, you will have background-colour, and colour will be replaced with foreground colour. Formulae for this:

background * transparency*foreground + (1.0 - transparency)*background

Repeated for RGB. Its also worth considering the shape of the surface. For example, a sphere could be more transparent at the centre, becoming less transparent to the edges. Or you could fade transparencies. This will take quite a bit of memory though.

Putting It All Together

Now you need to put all this together. Your code might look something like:

Precalculate as much as possible



For every shade

        For every colour

                Determine target colour

                Colour = BestFit(target colour)

                Store Colour

        End

End

Its a good idea to pre-calculate this, and store it to a file, because it can take quite a while to run through.

Also, consider the alignment of such tables, and your texture maps. If they are aligned to 64k boundaries, we can improve our routine greatly. We can use the upper 16 bits of the address as the address, and the lower 16 bits as the index:

        mov     ebx, [pointer]

        mov     bl, [u]

        mov     bh, [v]

Loading values into BH has the effect of an automatic multiplication by 256. Remember to insert 3 instructions between loading BL/BH, and actually using them - or you will cause an AGI.

Realtime Bezier Curves And Surfaces

Introduction

One of the most visually displeasing features of polygon/triangle based 3D engines is that everything is flat, square, and jaggy. Fine for walls, floors, ceilings and so on. But not much good for things such as spheres, torii, cones and so on. Bezier curves are a common way of providing curves. And now, with forward differencing, its even possible to calculate and display realtime triangulated surfaces. There are endless applications for such surfaces. For example, smooth landscapes that deform in a nice, fluid way. This file will help you understand the theory and implementation of said curves

The Polynomial

A polynomial is a set of powers of a given number, x, each having a co-efficient. x is usually in the range 0 .. 1. For example, a quadratic is a polynomial of degree 2:

Ax^2 + Bx + C

Where the powers are 2, 1, 0. x^1 is x, and x^0 is 1. A polynomial of degree 5 would be:

Ax^5 + Bx^4 + Cx^3 + Dx^2 + Ex + F

For shorthand, we often write Q(t), P(x) or something similar, to represent the polynomial. Beziers are typically made from cubic polynomials:

At^3 + Bt^2 + Ct + D

With a seperate polynomial for X, Y and Z. Though we can make curves of a higher degree, cubic polynomials are usually the best compromise. The co-efficients for each polynomial are stored in a matrix:

     | ax ay az |

C =  | bx by bz |

     | cx cy cz |

     | dx dy dz |

These are NOT a*x, b*x and so on. 'ax' = 'a' co-ef for 'x', and so on. The matrix for a bezier curve is:

     

     | -1   3 -3  1 |

Mb = |  3  -6  3  0 |

     | -3   3  0  0 |

     |  1   0  0  0 |

However, these formulae and matrices on their own are not enough to make a curve, we need control points.

Control Points

Control points are points that control the shape of a curve. We will consider bezier curves using 4 control points: P1, P2, P3, P4. These 4 points form a convex hull, that the curve lies in. A convex hull is an enclosed, convex area of space. In this case, it will be a polygon, of 4 sides. P1 is the start point of the curve. P4 is the end point of the curve. P2 and P3 control the path of the curve. If you like, we tell the curve to go from P1 to P4 via P2 and P3. Note the curve does not intersect P2 and P3, but it will intersect P1 and P4.

Now we need to feed those control points to the polynomial. The way this is done is by making a geometry vector, Gb. Gb is defined as:

 | P1 |

 | P2 |

 | P3 |

 | P4 |

You will have 3 seperate vectors for X Y and Z. The vectors are then multiplied my Mb, the basis matrix:


                  | -1   3 -3  1 |   | P1 |

                  |  3  -6  3  0 |   | P2 |

| t^3 t^2 t 1 | * | -3   3  0  0 | * | P3 |

                  |  1   0  0  0 |   | P4 |

Here, the row matrix consisting of powers of t are the weights, previously seen in the cubic polynomial. The square matrix, and the column matrix are the co-efficients, when multiplied together. The product of this equation gets us back to the cubic polynomial, albeit in slightly modified format:

Q(t) = ((1 - t)^3)P1 + (3t(1-t)^2)P2 + 3(t^2)(1 - t)P3 + (t^3)P4

I put brackets around some of the terms to avoid confusion. So now, if we wanted to plot a curve, approximating it using 'n' line segments, we could just iterate t from 0 to 1, in steps of 1 / n, draw lines between the points, and we'd have a curve. There is, however, a faster way of doing it. I'll go into that later.

Bezier Surfaces

Now that you have your curves working, bopping around the screen, you might want to try your hand at making surfaces. You can build a surface by having 16 control points, which define 4 bezier curves. Instead of calling the curve Q(t), we will now call it Q(s, t). This diagram should help clarify things:

Note that the surface needn't be the above shape; thats just for visualization purposes. In any practical situation, the control points will be moved around, and won't look like that.

Making The Surface

Remember in the previous section, I talked about called the polynomial function P(t)? Well, when we're making bezier surfaces, we need to deal with two variables, effectively bringing us up a dimension. So, the format for a surface would be P(s, t). We can indeed go even higher, to 3 dimensions. 3 dimensional patches are used for a technique known as FFD, where we have a hyperpatch, that loosely resembles a convex polytope, such as a cube. We can bend and deform points lying inside that volume by moving around the control points of the hyperpatch. However, this is advanced stuff, I won't be covering that, just yet.

Now, to build the points for a surface, we will need to interpolate in two variables: one outer loop interpolating 't', plus one inner loop interpolating 's'.

You will need to interpolate each of the four curves seperately. Then, your current point on each of the curves parameterised by 't' becomes your 4 control points interpolated with 's'. Effectively, we are interpolating a curve along a curve. In the previous diagram, imagine that the 4 vertical lines each define a curve. In the outer loop, we shall take the current point along each of these 4 curves, and interpolate another curve going inbetween them.

If we wanted to plot the points of a bezier surface, giving n points in t, and m points in s, the following code would work:


For t=0 to 1 in steps of 1/n

        Get P1, P2, P3, P4 from Curve1(t), Curve2(t), Curve3(t), Curve4(t)

        Use P1, P2, P3, P4 to construct a curve, Q(s)



        For s=0 to 1 in steps of 1/m

                Plot Point Q(s)

        End

End

By now, you're thinking something like "Wait a minute, interpolating two curves, which use cubic polynomials = lots of muls. How the hell can that be realtime?". Well, the secret ingredient is something called Forward Differencing. This allows us to interpolate the curve using just adds. Much faster.

Forward Differencing

Forward differencing allows us to evaluate a polynomial using deltas. It is done by differentiating the basic expression:

a*t^3 + b*t^2 + c*t + d

Now, what we want to do, is to succesively differentiate the expression, until there are no multiplies left in it; just additions. That way we can get a quick inner loop. Recalling the basic trick for differentiating a power of 'x', which is:

             n

Function: a*x



        d         n-1

        -- = n*a*x

        dx

We can apply that to our basic cubic polynomial expression. Its pretty easy to do, I'll leave it to you as an exercise. All you have to do is differentiate the expression 3 times. The resulting code for interpolating along a 3D curve would look something like:

Point = P1

For n steps

        Plot Point

        Point.x += deltax;      deltax += deltax2;      deltax2 += deltax3;

        Point.y += deltay;      deltay += deltay2;      deltay2 += deltay3;

        Point.z += deltaz;      deltaz += deltaz2;      deltaz2 += deltaz3;

End

So we add deltax to point.x. Then we add deltax2 to deltax. Then we add deltax3 to deltax2. A very clever trick.

Back to the problem in hand. Your deltas will be:

Delta1 = a*(delta^3) + b*(delta^2) + c*delta

Delta2 = 6*a*(delta^3) + a*b*(delta^2)

Delta3 = 6*a*(delta^3)



Or in matrix form:



    |         0         0      0  1 |   | a |

D = |   delta^3   delta^2  delta  0 | * | b |

    | 6*delta^3 2*delta^2      0  0 |   | c |

    | 6*delta^3         0      0  0 |   | d |

With seperate deltas X Y and Z. A, b and c are taken from the bezier calculations. Heres my bezier plotter code for those of you who are stumped:

void DrawBezierCurve(VECTOR p1, VECTOR p2, VECTOR p3, VECTOR p4,        

                     BYTE colour, LONG numsteps)

{

        NUMBER          xdelta, xdelta2, xdelta3;

        NUMBER          ydelta, ydelta2, ydelta3;

        NUMBER          zdelta, zdelta2, zdelta3;

        NUMBER          deltastep, deltastep2, deltastep3;

        NUMBER          ax, ay, az;

        NUMBER          bx, by, bz;

        NUMBER          cx, cy, cz;

        NUMBER          i;

        VECTOR          v1, v2;



        deltastep = 1.0 / (NUMBER) numsteps;

        deltastep2 = deltastep*deltastep;

        deltastep3 = deltastep2*deltastep;



        ax = -p1.x + 3.0*p2.x - 3.0*p3.x + p4.x;

        ay = -p1.y + 3.0*p2.y - 3.0*p3.y + p4.y;

        az = -p1.z + 3.0*p2.z - 3.0*p3.z + p4.z;



        bx = 3.0*p1.x - 6.0*p2.x + 3.0*p3.x;

        by = 3.0*p1.y - 6.0*p2.y + 3.0*p3.y;

        bz = 3.0*p1.z - 6.0*p2.z + 3.0*p3.z;



        cx = -3.0*p1.x + 3.0*p2.x;

        cy = -3.0*p1.y + 3.0*p2.y;

        cz = -3.0*p1.z + 3.0*p2.z;



        xdelta = ax*deltastep3 + bx*deltastep2 + cx*deltastep;

        ydelta = ay*deltastep3 + by*deltastep2 + cy*deltastep;

        zdelta = az*deltastep3 + bz*deltastep2 + cz*deltastep;



        xdelta2 = 6.0*ax*deltastep3 + 2.0*bx*deltastep2;

        ydelta2 = 6.0*ay*deltastep3 + 2.0*by*deltastep2;

        zdelta2 = 6.0*az*deltastep3 + 2.0*bz*deltastep2;



        xdelta3 = 6.0*ax*deltastep3;

        ydelta3 = 6.0*ay*deltastep3;

        zdelta3 = 6.0*az*deltastep3;



        v1 = p1;



        for(i=0; i<numsteps; i++) {

                v2.x = v1.x + xdelta; 

                xdelta += xdelta2; 

                xdelta2 += xdelta3;                



                v2.y = v1.y + ydelta; 

                ydelta += ydelta2; 

                ydelta2 += ydelta3;                



                v2.z = v1.z + zdelta; 

                zdelta += zdelta2; 

                zdelta2 += zdelta3;                



                DrawLine3D(v1, v2, colour);

                

                v1 = v2;

        }

}

The typedef NUMBER is a float. I use 'NUMBER' because I find it nice and easy to read. Also it allows be to change my floating point precision at the drop of a hat. The rest should be self explanatory.

Surfaces And Forward Differences

Coding surfaces using foward differences is a little tricky, but it works along the same principals. You will have 4 sets of deltas, for each of the 4 curves that we take our control points from. A good idea is to typedef a struct, and keep them in there. Also when you interpolate in S, use another function to do the work. Your code will then look something like:

Calculate forward differences for curves 1-4



For n steps

        Build a curve from current points on curves 1-4

        InterpolateCurve(p1, p2, p3, p4, m, buffer)

        Interpolate curves 1-4 using foward deltas

End

If you code all this correctly, you will then have a realtime bezier surface. And they're great fun! Download the demo from page 1 of this site, that shows you just what can be achieved. I may add textures + shading to that soon, I left it in wireframe because :

normals take ages to calculate
Sometimes produces backfaces, which are then removed
Wireframe makes it easy to see what is going on

There is also a way to calculate the normals to the surface. I haven't got round to implementing this yet though. When I do, I'll update this page.

ZBuffering

Introduction

ZBuffering is a very handy graphics algorithm. Invented by Catmull in '79, it allows us to paint objects to the screen without sorting, without performing intersection calculations where objects interpenetrate, to paint in whatever order we like, and to paint any kind of object we like. But there is one big problem with it: unless in hardware, its slow and inefficient. But you might as well add it to your engine, its very useful. A ZBuffer type system is also used in Radiosity, if you are calculating form-factors in the hemicube algorithm.

How It Works

The basic idea behind zbuffer is very simple: If the pixel we are currently painting is the closest yet painted, write it. If not, don't write it. To identify if the pixel is the nearest, we need a memory buffer - the Z buffer. Here we store the Z of each pixel painted so far. Pseudo-code for a naive implementation of this would be:

Clear Z-Buffer to a certain value

For every polygon

        For every scanline of that polygon

                For every pixel

                        If this is nearest pixel {

                                Write Pixel

                                Write Z into zbuffer

                        }

                End

        End

End

If you are wondering where to get your Z from, then its simply a case of linear interpolation. There are lots of documents available to cover that, so I won't go into that right now. Simply take the Z values at the vertices, and interpolate across the edges. Then, take the Z value at either edge, and interpolate that, getting you a Z value for each pixel.

Note that we needn't be using solely polygons. The implementation could equally be:

Clear Z-Buffer to a certain value

For every primitive

        For every scanline of that primitive

                For every pixel

                        If this is nearest pixel {

                                Write Pixel

                                Write Z into zbuffer

                        }

                End

        End

End

So we could build a graphics engine that handles things such as real spheres, torii etc. Such engines are very versatile.

However, there are two main problems with the z-buffer:

Clearing the zbuffer
Re-writing pixels that have already been written (overdraw)
Z-compression with distant objects
Jumping at every pixel!

Improvements

There are a number of improvements we can make to the z-buffer algorithm. The first is to use 1/z values. Such values are linear in screen space, are always in the range -1 to +1, and do not suffer from z-compression. Clearing the zbuffer simply becomes a case of memset(zbuffer, 0, zbuffersize), because 0 in floating point is 00000000h. We must however invert our condition, from:

If CurrentZ <= ZBuffer[y][x]

To:

If CurrentZ >= ZBuffer[y][x]

An advantage of using 1/z is that operations can be done in parallel with the FPU. So for example while you are stepping through your Z in the FPU, you can be calculating the next address, decrementing your counters etc in the IPU.

The penalty of clearing the Z-buffer can be improved with dirty rectangles. The idea behind this is that you just clear the part of your screen (and Z-buffer) that you have written to. However, this is not much use if you're doing full screen animation. Alternatively, you can try adding some value to your Z every frame. Eg if you are using a 32-bit integer zbuffer, try adding on a value greater than the far clipping limit; for example 65536. Apparently, using this technique, you can avoid clearing the frame buffer. I think pseudo code would go something like:

// Init Stuff...

        zadd = 0

        ClearZBuffer()



        for(;;) { // Render loop

                Project, Transform, Light Polygons etc...

                Clear Screen

                Add "zadd" to every Z value to be interpolated

                Render to zbuffer

                Add 65536 to zadd

                if zadd overflows {

                        ClearZBuffer

                        zadd = 0

                }

        }

Using this method, you can avoid clearing your buffer for literally hours, if you select the correct value. For those of you (like me) who use a floating point 1/z representation, I think this may still work; but I'm not sure if you have to do 1 / (z + zadd) or (1 / z) + zadd. Here I would expect a zadd increment of 1 should work well.

There is also a variation on Z-Buffer, called S-buffer. The idea being that we deal with segments, rather than pixels. A system similar to this is described in another document on this web page.

The Basics Of Raytracing

Introduction

Raytracing is an ever popular form of rendering, mainly due to its accurate modelling of phenomona such as reflections, refractions, specular highlights and so on. Its an elegant algorithm, providing simple solutions to many of the problems that plague traditional rendering methods. It also happens to be extremely slow. In this page, I'll run through a the basics of the algorithm, plus a little information on how to implement the simpler effects.

Algorithm

The algorithm is pretty straightforward. The idea is that we trace rays through a scene, adjusting colour information and so on as we go. We fire a ray for each pixel, trace it through the scene, and evaluate its colour. Light bounces off primitives, possibly spawning new rays, shadows are checked for, lighting calculations are performed. Everything is handled it one simple, slow algorithm. Clipping, projection to screen, lighting, shadows, the lot.

Rays

The basic equation of a ray is:

Origin + t*Direction

Origin is the point of origin for the ray, not the world. Direction is a normalized vector, pointing in the direction of the ray. t is a parameter, specifying a point along the path of the ray. Sometimes the equation is written as:

v1 + (v2 - v1)*t

Here, t is in the range 0-1. The direction vector is not normalized. Obviously here we can use this to determine if a point is on a ray by finding t, and checking to see if 0 <= t <= 1.

Shooting a ray for a given pixel is simple. The origin is your camera position, the direction is built from the screens pixel, and a Z component. The Z component seems to behave like d in the perspective projection equation. Anyway, I use the following:


origin.x = origin.y = 0

origin.z = -256

dir.x = pixel.x

dir.y = pixel.y

dir.z = 256



Normalize dir

Note that pixel must not be in range 0..width, 0..height, but in the range -width/2..+width/2, -height/2..+height/2. Otherwise your routine will not work correctly.

Intersecting Spheres

A common primitive in raytraced scenes is the sphere. Its pretty easy to code, looks nice, useful for debugging. The equation for a sphere is

(x - a)^2 + (y - b)^2 + (z - c)^2 = r^2

Where X-Y-Z are a point on the sphere, and A-B-C is the centre of the sphere. What we are doing here is basically specifying a point of constant distance from another point, the centre. Any point on a sphere is always distance R from the centre. Now on to intersection. Firstly, subsitute in the ray equation:

((o.x + dir.x*t) - a)^2 + ((o.y + dir.y*t) - b)^2 + ((o.z + dir.z*t) - c)^2 = r^2

We need to re-arrange to get the equation in terms of t. You can do the maths yourself if you want, and I would recommend that you do, but for the more mathematically challenged amongst us, the solution is:


A = (dir.x^2 + dir.y^2 + dir.z ^2)

B = 2*(dir.x*(o.x - a) + dir.y*(o.y - b) + dir.z*(o.z - c))

C = (o.x - a)^2 + (o.y - b)^2 + (o.z - c)^2 - r^2



Giving the quadratic equation:



A*t^2 + B*t + C

I laid the formula out so you can see it is a quadratic. A quadratic is a 2nd degree polynomial, Ax^2 + Bx + C. Now, we can find t by using the equations:

Discriminant = b^2 - 4*a*c



t = (1/2a) * [-b +/- sqrt(discriminant)]

And this equation can be simplified even more, because some of the co-efficients work out to be 1. So we have:

t = (-b +/- sqrt(b*b - 4*c)) / 2

The discriminant can be used to find information about the roots of the equation. If we call the discriminant 'd', then:

d > 0, we have 2 different real roots

d = 0, 2 equal real roots

d < 0, roots are complex

We're only interested in real roots, so we can bail out if d < 0. If d = 0, then the ray just grazes the sphere. If d > 0 though, we have 2 intersections. -b - sqrt(d) gives the nearer one, -b + sqrt(d) gives the further one. So when you put all this into code, you will have to:

Calculate d
Check to see if d is legal
If not, exit
If so, find closer of two solutions
Return that to some procedure

Once we have determined the closest intersection, we need the normal, for calculation. Again, simple enough for a sphere:

(Intersection - Centre) / Radius

The length of the vector is constant for a point on the sphere, because all points on a sphere are the same distance from the centre. So calculate this, feed it to a lighting formula.

Polygons

Polygons are a little more tricky. First, you'll need the plane equation, which is:

Ax + By + Cz + D = 0

Then, subsitute in the ray equation for x-y-z:

A(o.x + dir.x*t) + B(o.y + dir.y*t) + C(o.z + dir.z*t) + D = 0

Factorise for t:

t*(A*dir.x + B*dir.y + C*dir.z) + (A*o.x + B*o.y + C*o.z + D) = 0

And finally re-arrange for t:

        (A*o.x + B*o.y + C*o.z + D)

t = -  -----------------------------

       (A*dir.x + B*dir.y + C*dir.z)

This will give you a point on the plane. A denominator of 0 means the ray and plane are parallel - so chuck it out. Also if t < 0 then its behind the origin of the ray - chuck it out! A simple way of checking to see if the point is contained in the polygon is to then project down to 2-d, and do a 2-d point-in-polygon check. Use the ABC co-efficients to determine which axis to project down, if A is greatest, use X, if B is greatest, use Y and so on.

There is another algorithm however, but this works fore triangles. The idea is to use barycentric co-ordinates. I found this from an article in the Ray Tracing News, so credit goes to that author. Barycentric co-ordinates can map one triangle onto another, via a 1x1x1 cube. They are specified as 3 co-efficients, abc, rst, whatever. a+b+c always equals 1. From this, we can derive the following triangle: The idea here is that we split the triangle into another 3 triangles. If the total area of the triangle is A, then triangle 1-2-P has area rA, triangle P-2-3 has area sA, triangle 1-P-3 has area tA. So the summation of the area must equal A. If it doesn't, then we're not in the triangle.

So we need to calculate rst. This is done like so:

N = surface normal (you should know how to do this)

s = ((P - 1) x (3 - 1))*N

    ---------------------

            |N|



t = ((2 - 1) x (P - 1))*N

    ---------------------

            |N|



r = 1 - (s + t)



Recalling that of course x = cross product, * = dot product

So now you can find the intersection. rst can also be used to interpolate the vertex normals, to find the normal at intersection. This is done like so:

NormalX = r*vnorm[0].x + s*vnorm[1].x + t*vnorm[2].x

NormalY = r*vnorm[0].y + s*vnorm[1].y + t*vnorm[2].y

NormalZ = r*vnorm[0].z + s*vnorm[1].z + t*vnorm[2].z



vnorm = array of vertex normals

So back to the polygons - you could test for containment in n - 2 triangles, where n is the number of points in the polygon. Heres some example C code I knocked up late one night that works for this:

int TestTriangle(VECTOR *tri1, VECTOR *tri2, VECTOR *tri3, VECTOR *point)

{

	VECTOR normal, spana, spanb, vec;

	double len, len2;

	double r, s, t;



	spana.x = tri2->x - tri1->x;

	spana.y = tri2->y - tri1->y;

	spana.z = tri2->z - tri1->z;

	spanb.x = tri3->x - tri1->x;

	spanb.y = tri3->y - tri1->y;

	spanb.z = tri3->z - tri1->z;

	

	normal.x = spana.y * spanb.z - spana.z * spanb.y;

    	normal.y = spana.z * spanb.x - spana.x * spanb.z;

    	normal.z = spana.x * spanb.y - spana.y * spanb.x;

	len = sqrt(normal.x * normal.x + normal.y * normal.y + normal.z * normal.z);



	spana.x = point->x - tri1->x;

	spana.y = point->y - tri1->y;

	spana.z = point->z - tri1->z;

	spanb.x = tri3->x - tri1->x;

	spanb.y = tri3->y - tri1->y;

	spanb.z = tri3->z - tri1->z;

	vec.x = spana.y * spanb.z - spana.z * spanb.y;

    	vec.y = spana.z * spanb.x - spana.x * spanb.z;

    	vec.z = spana.x * spanb.y - spana.y * spanb.x;

	len2 = sqrt(vec.x * vec.x + vec.y * vec.y + vec.z * vec.z);

	s = (vec.x * normal.x + vec.y * normal.y + vec.z * normal.z) / len;



	spana.x = tri2->x - tri1->x;

	spana.y = tri2->y - tri1->y;

	spana.z = tri2->z - tri1->z;

	spanb.x = point->x - tri1->x;

	spanb.y = point->y - tri1->y;

	spanb.z = point->z - tri1->z;

	vec.x = spana.y * spanb.z - spana.z * spanb.y;

    	vec.y = spana.z * spanb.x - spana.x * spanb.z;

    	vec.z = spana.x * spanb.y - spana.y * spanb.x;

	len2 = sqrt(vec.x * vec.x + vec.y * vec.y + vec.z * vec.z);

	t = (vec.x * normal.x + vec.y * normal.y + vec.z * normal.z) / len;



	r = 1.0 - (s + t);



	printf("(%f %f %f)\n", r, s, t);

	if((r + s + t) == 1.0)

		return 1;

	else

		return 0;

}

Its not the most efficient piece of code, but it should provide a working example of this.

Reflections

These are pretty straight forward to do. You need to mirror the rays direction about the point of intersection. And intersection normal. The formula for this is:

2*(N.D)*N - D

Where N is the normal, and D is -direction. Note it must be -direction. This particular little catch up took a while to figure out ...Set your rays origin to be the point of intersection, and set the direction to be the above formula. Trace the ray, and multiply the returned colour by some factor, the reflectivity level. Then just add it to the colour. Simple enough.

Refractions

This works in a similar way to reflection, but its a lot harder and slower. To get the direction of refraction, we need to use the following formula:




        _   /    _ _              2       _ _ 2  \ _     _

        T = | n*(N.I) - sqrt(1 - n  (1 - (N.I)  )|*N - n I

            \  r                  r              /      r



Where:

        T is refracted direction

        n is eta, index of refraction

        I is incident light

        N is normal



        n  is given by:

         r



         n    = n  / n

          r      i    t



        Thats eta incident / eta refracted.

I won't give the full working of the formula; you'll wake up in the middle of the night screaming. If you want to look it up, its on pages 757/758 of Foley & van Damm. You then have to do a similar thing to your reflections. Also note that there is a sqrt in that formula. You may get values put into the sqrt that are < 0. As you may now, you cannot sqrt a value < 0. To get around this, I use the following (C++) code:

Vector incident = -dir, tr;

Real n = medium / nearest->refindex;

Real c1 = -incident*inormal;

Real c2;



c2 = 1 - n*n*(1 - c1*c1);

if(c2 > 0.0) {

        c2 = sqrt(c2);

        tr = n*incident + (n*c1 - c2)*inormal;

        tr.Normalize();



        Ray trans(ipoint, tr);

        transcol = trans.TraceRay(depth + 1, nearest->refindex);

        finalcol += transcol*nearest->trans;

}

Shadows

Shadows are very easily incorporated into a raytracer. A shadow, is a point that is blocked from a light source. So, all you have to do, is track a ray from the point of intersection, to the light source. Very simple. If a 100% opaque object intersects the ray, then simply set the colour to black. If no intersection, then the point is in light, so carry on as normal. However, if the ray intersects 1 or more (semi)transparent objects, then you will need to return some fraction 0 <= f <= 1, and multiply the colour by 'f'. A way to get soft shadows is by using distributed raytracing. However, I haven't tried this myself yet, so I can't report my findings.

Code layout

Raytracer code is usually structured recursively. This allows us to build a ray tree without additional coding. You may want to try avoiding the recursion, but you'd probably be better off spending your time doing more enjoyable things, such as sandpapering your teeth. Recursion is an elegant solution for this; despite possible stack problems I believe its the best solution. Your basic layout will look something like this:

Colour TraceRay(Ray ray, int depth)

{

        Find nearest object



        If found an intersection

                Shade and texture that object

                Calculate reflected ray

                If Depth < MaxDepth

                        Reflected-Colour = TraceRay(reflected-ray, depth+1)

                        Add Reflected-Colour*reflectivity to Colour

                End



                Calculate refracted ray

                If Depth < MaxDepth

                        Refracted-Colour = TraceRay(Refracted-ray, depth+1)

                        Add Refracted-Colour*refractivity to Colour

                End

        Else

                Colour = Black

        End



        Return Colour

}

This provides a clear simple solution; we use the MaxDepth parameter to bail out if its getting out of hand. Obviously, you put in a condition, and only spawn child rays if the object is reflective and if the object is refractive. Now begins the task of coding, and one hell of a load of optimizing...

[Ref: CGPP, Advanced Animation + Rendering]

Frustum Clipping

Introduction

Frustum clipping is a vital, yet neglected subject. I haven't read any documents on it yet on the web. So, now I will show you my findings on the subject, which should prove to be enough for you to implement frustum clipping in your engine.

Clipping a polygon

To clip the polygon, I will be using an algorithm known as the Sutherland-Hodgman polygon clipping algorithm. Its pretty simple to implement, and easy enough to understand.

The algorithm works by clipping a polygon against n planes, one after the other. The output from one clip is used as the input to another. Imagine you have a square piece of wood, say NxM units in size. Now on top of that piece of wood, you put a shape, say a triangle, made of paper. The most obvious way to clip it to the square would be to work your way around the edges, cutting off excess paper as you go. With me so far? So, you would clip against the lines:

(0, 0) - (N, 0)

(N, 0) - (N, M)

(N, M) - (0, M)

(0, M) - (0, 0)

Recalling that the square was NxM units in size. This is effectively 2D clipping to the 2D viewport. However, in 3D, we need to clip to a frustum. A frustum is basically a 3d pyramid, with the top lopped off. We store and clip against either 5 or 6 planes, depending on whether you want back plane clipping. The frustum is a convex volume in space, and we can also use it to check whether or not an item is viewable.

There are 4 cases that could occur when clipping the shape. These are:

Both endpoints inside region
Both endpoints outside region
Line is entering region
Line is leaving region

Obviously, for case 1 we do not insert any points into the final polygon. For case 2 we insert both points. Cases 3 and 4 are a little more complex however. We need to find out whether we are leaving or entering the region. Once we know this, we can then perform the actual clipping. If we are leaving the region, we must insert one point. If we are entering the region, we must insert two points. If you are leaving the region, simply insert the clipped point. If you are entering, insert the clipped point, and the point inside the region. Classifying whether you are entering or leaving a region can be done using comparions of the two points against the clipping boundary.

The Clipping Calculation

The calculation in itself is easy enough to do. Its linked to rays. A ray is defined as:

Origin + t*Direction

Origin will be point 1 on our line, Direction will be (Point2 - Point1). t is a parameter that specifies a point on the line. For this code, t will be in the range 0..1.

t is easy enough to calculate. Firstly, we find the distance travelled in the same axis as we want to clip, for example if we are clipping against Z, then we find Dz, distance-Z. The same could be done for X or Y. Now, if this value is zero, then t is equal to clipping-limit - point1. For example zminclip - point1.z. If dz is not zero, then we take zminclip - point1.z, and divide it by the distance-z. So t now tells us the position of the line we are clipping against, the amount of the line that is behind the clipping limit.

Now calculate similar distance values for x and y. So dx = p2.x - p1.x, and the same for y. Recall the equation for a ray:

Origin + t*Direction

Its now a piece of cake to calculate the missing values! Simply write:

NewX = p1.x + t*dx

NewY = p1.y + t*dy

And theres your clipped point! You can perform similar clippings for u,v, gouraud colour, phong lighting map co-ordinates, anything. But one word of warning: Beware of 1/n values. They can easily underflow. So clip in n values, then take the reciprocal of them. I found this out the hard way, with perspective texturing. After a certain point, the texture would just go mad, swimming and sliding and distorting all over the place. Clipping to u, v then taking 1/u, 1/v solved this problem

The above code will work fine for straight clipping boundaries. But in 3D engines, where we use a perspective projection, the view volume is not flat. It's sloped. This is a problem. Theres two ways around it:

Calculate the 3D x and y limits for a given Z. Done by rearranging your perspective projection. Then you plug these values into a 'straight' clipper.
Clipping against the planes, by finding the intersection of a line and a plane.

No. 1 is easier to code. If your perspective projection is (for x):

i = xd

    -- + xcentre

    z

Then your clipping limit for a given z would be:

(+/- xcentre)*z

---------------

       d

A similar thing can be done for Y. But, this is a less elegant solution; we have made special cases, which makes our code more complex, and harder to maintain/code/debug. What we need is a general solution, to clip a line against any arbitrary plane. And such a thing can be done!

Clipping Against An Arbitrary Plane

If you don't yet know, the equation for a plane is:

Ax + By + Cz + D = 0

Where A B C D are the co-efficients of the plane, and xyz is a point on the plane. Recalling that the equation of a ray is:

Origin + t*Direction

It becomes a case of substituting in the ray equation for x y and z, then re-arranging to give t. The solution is:

      (A*origin.x + B*origin.y + C*origin.z + D)

t = - ------------------------------------------

            (A*dir.x + B*dir.y + C*dir.z)

If the denominator (lower half) of the above equation is zero, then the line and plane are parallel; they never touch. Obviously, don't try to clip parallel lines and planes. Your code shouldn't even try to do that. So now we have t, ready to calculate the point of intersection

But how do I find what side of a plane a point is one? Easy! The plane equation. Feed the points points X Y and Z into the equation

Ax + By + Cz + D

(yes, the plane equation). If the value is zero, the point is one the plane. If the value is > zero, then the point is on the side of the plane pointed to by the planes normal. If the value is < 0, then the point is on the other side. Ok, to visualise this, do the following. Go into your back garden. Get a pole, a stick, whatever, and stick it in the ground. This is your plane normal. Imagine that is points to the sun. Now, the sky is the area pointed to by the planes normal. Say this plane is your lower Y clipping limit (the bottom of the screen). So if your value is > 0, then you're inside the view volume. If you value is < 0, you're outside, and need to be clipped. Simple eh?

Putting it all together

Now you're ready to code it. The following pseudo-code should explain how to implement the algorithm:

void ClipToPlane(POLYGON polyin, POLYGON polyout, PLANE plane)

{

        curvert = 0

        vert1 = polyin.numpoints - 1



        for(vert2 = 0 to polyon.numpoints) {

                classify vert1 and vert2 with respect to plane



                if(both are inside) {

                        insert(vert2) in polyout[curvert]

                        curvert++

                }



                if(entering) {

                        Calculate t

                        Calculate new vertex

                        insert(newvertex) in polyout[curvert]

                        curvert++

                }



                if(leaving) {

                        Calculate t

                        Calculate new vertex

                        insert(newvertex) in polyout[curvert]

                        curvert++

                        insert(vert2) in polyout[curvert]

                        curvert++

                }



                vert1 = vert2

        }



        polyout.numpoints = curvert

        polyout.points[curvert] = polyout.points[0]

}

You'll need to fill in the blanks. Then, you could store a set of clipping planes in an array, then just feed the planes to the above code one after the other.

My 3D clipping algorithm

Seems a lot of you were puzzled when I told you my clipping system didn't need to allocate memory on fly, make new polygons etc... well, to clarify how I do it, I'll briefly describe the way its done in my engine.

Firstly, some notes: this system doesn't generate extra triangles to be sorted, it doesn't allocate memory at runtime, and it projects triangles *before* clipping. Heres how it works:

At init-time, allocate a buffer of vertices, say 50. These will be used to store the temporary polygon that results after clipping. This saves allocating data at render time.
Do you usual transforms etc...
When rendering, first tag all vertices as invisible. Then, check each vertex against the frustum. If it is inside, mark it as visible. This flag will later be used to check for projection, lighting etc...
Now its time to check, cull, accept/reject triangles. If a triangle is fully within the frustum, *and* is front-facing, mark it as "visible" and "no clip". Determining whether or not it is inside the frustum can be done using the flags you set up for the vertices. If a triangle is front facing and intersects the frustum, mark it as "visible" and "clip". Mark which planes you need to clip against, using a byte, and a bit for each plane, eg:
```
Bit # | Plane

------+------

    0 | Front

    1 | Back

    2 | Left

    3 | Right

    4 | Top

    5 | Bottom
```

You don't need to use this exact scheme, anything will do -- but bear in mind that this will be used in a Sutherland-Cohen style marker. You have 1 byte for each edge of the triangle, and using logical ORs and ANDs, you can determine whether or not the triangle is within frustum. Look into line clipping algorithms for more info on this. Key points are:

Backface tris are rejected with no checking against frustum
Frontface tris are checked against frustum by a sutherland-cohen test:
- If inside, mark as "visible" and "no clip"
- If outside, reject
- If intersects, mark as "visible" and "clip". Set up a byte which tells you which planes to clip against.

When this is done, insert visible triangles into a buffer of pointers to triangles: Triangle *tribuffer[MAXTRI] or as I use it Triangle **tribuffer, and tribuffer is allocated at init-time:

tribuffer = (Triangle *) malloc(sizeof(Triangle *) * max_tris_in_scene);

This buffer is then sorted back->front or front->back, and then passes each triangle to a rendering function. Now I project all my vertices, perform lighting etc..

Time to render it

The rendering function is not called directly; instead I have a global function pointer, which points to the current routine to use, for the current rendering system. This way I can switch rendering types on the fly, without having things like switch(rendertype) or big if--then trees. Quite nice...

Firstly this copies in data that may be needed to render the triangle, such as uv co-ords etc into a structure which is passed to the triangle painter. The structure is just Vertex[3], same structure as is used for object verts. Just my uv co-ords etc are stored in triangle structure, so they need to be copied. Also u/z v/z are calculated: u/z = u*(vert->1/z) etc...

Anyway the render function just checks the clip flag: if no clipping, it just calls DrawTriangle(blah) or something, depends on your routine + engine

If it does need to be clipped however, it calls a clipping routine, for the related rendering mode. Eg if rendering for texture, it calls ClipTexture, for gouraud, it calls ClipGouraud, and so on. This routine looks something like:

void ClipTriangle(Triangle *tri)

{

        PlaneClip(tri);

        for(n=0; n<numclipvert-2; n++)

                DrawTriangle(outvert[0], outvert[n+1], outvert[n+2])

}

PlaneClip is just a standard Sutherland-Hodgman plane clipper. It is used for all clipping of triangles. "Wait a minute", I hear you say. Thats inefficient, you're clipping gouraud, texture etc, when you might not need it. Wrong. PlaneClip relies on *another* function pointer, which points to a routine that calculates a new point, given two endpoints of a line, and t, point on line. It writes to a pointer that is also passed to it. Clipfunc just looks something like:

void ClipLine(Vertex *vert1, Vertex *vert2, Vertex *outvert, Real t)

{

        *outvert = vert1 + t*(vert2 - vert1)

        ProjectVertex(outvert)  // project to 2D

}

Obviously I don't have a function call to project the vertex to 2d inside ClipLine, thats done inline. Y'see this allows me to project new vertices as they are created, without having to malloc new ones as I go along, or have a big pass where I project them all. [Tip] when projecting your vertices, and you're sitting waiting for fdiv to complete, try pre-warming your cache. Works very nicely...

Also I can now use one routine to do all my clipping, by using function pointers I can create a psuedo-self-modifying-code type system. All these pointers are set beforehand, with a function called something like SetRenderType, eg SetRenderType(RENDER_GOURAUD_TEXTURE).

Understand now? Its quite a complex system, very difficult for me to explain, difficult for you to understand. To really understand this, you need to be able to see my whole rendering pipeline, which I'm not going to give away right now ...

Rejecting polygons that are not inside the frustum is easy enough to do. If you place a bounding sphere around a polygon, or indeed a polyhedron, or any polytope in 3D, you can do a simple check to see whether or not it is inside the viewing frustum. What you do is simple: Find the distance from the plane to the centre of the sphere. Now, if this distance is < 0, and greater than the radius of the sphere, ie (dist > -radius), then you can chuck the volume out. It only needs to be outside of one plane to be discarded. Bounding boxes are also pretty easy to do. You can just check the 8 vertices of the box, and see if at least one of them is on the positive side of the plane. Or, you can check the diagonally opposite corners, eg the top left and bottom right corners, etc. That way you can get away with less checks, if you're clever.

I hope this web page has provided you with enough information for clipping polygons against planes, specifically the viewing frustum. If you have any questions, mail me. [References: Computer Graphics, Principles + Practice]

Shadows

Introduction

Shadows add a lot of realism to a 3D engine. They help to impart a good deal of information about movement, lighting and shape. Shadows are your friend. Use them wisely.

Fake Shadows

Perhaps the easiest shadows to make are fake shadows. Amongst the easiest are casting them to the floor. An easy method is to project your triangle to the floor (Y = 0 in most 3D engines). Then do a simple divide by Y, so the higher an object is, the smaller the shadow. Simple, but effective. This doesn't take into account the direction of the light source. Again, this is easy to do:

        s.x = p.x - (p.y / l.y)*l.x;

        s.z = p.z - (p.y / l.y)*l.z;

Where s is the shadows vertex, p is the point, and l is the light source. Very easy to code, and it works well. However, it doesn't really work well for much else than flat planes.

Shadow Z-Buffer

I also know another method of generating shadows. This requires the use of 2 z-buffers though. The basic idea is that you generate two z-buffers: one from the point of view of the camera, and one from the point of view of the light. When you come to render, you need to do the following:

If point is not visible, simply move on the the next pixel

Map co-ordinate from 'camera space' into 'light space'

Project down to 2-D again.

Use x' and y' to index into the shadow Z-buffer

If z is greater than shadow zbuffer, then a surface is nearer to the point

than the light - so shadow it, using a 'shadow intensity'

However, this method is pretty damn slow, as you might imagine.

Shadow Volumes

Another method of implementing shadows is by the use of a shadow volume. A shadow volume is an infinite volume of space, where light cannot be seen, because it is being blocked by a polygon. Making the volume is simple enough: Make vectors from the origin of the light, through the vertices of a polygon. Normalize them, and hey presto, a simple, infinite volume. These are now rays. Their equation would be:

D = Direction

O = origin of light

L = light source point

Vertex = polygon vertex



D = vertex - light

O = vertex

Ray = O + D*infinity

For this to be useful, it needs to lie withing the view volume. So, clip it to the view volume. Clipping lines against planes is covered somewhere within these pages. You won't be able to classify the two endpoints against the plane however: you'll have to find the intersection of the line and plane, and find out whether that is a valid part of the volume. I.E. the volume cannot be in the portion of space between the polygon and the light source, can it? Once the volume is clipped, you'll end up with a set of polygons, which will define the shadow volume.

When it comes to rendering, it becomes more interesting. Consider the interation of the shadow volumes, with a ray shooting from the viewers position, for a given pixel. If a point on that ray is withing a shadow volume, then the point is clearly in shadow. But what if the point is between two shadow volumes? Then it is not in shadow! So, you will need some kind of flag. The flag will start at FALSE (point not in shadow). When it enters a shadow volume, it will become TRUE, and when it leaves, it will become FALSE again.

Still, for a complex scene, this system will also be quite slow. The number of shadow polygons increases sharply as the number of polygons and light sources increases. Perhaps such issues are why realtime systems still only use fake shadows ...

"Shadow Slabs"

This is a little idea I've been brewing in my mind... it works similar to the shadow volumes above, but you can have a model half in shadow, half out of shadow.

The idea is that we perform an extensive pre-process, and generate "slabs", which define where an area comes into shadow/goes out of shadow. This would be an extensive pre-process, calculating for all of the lights and polygons in the scene. However, when it is complete, you would have a list of polygons ("Shadow Slabs"), which define the borders of the shadow. Very similar to shadow volumes.

Then, when rendering, you would clip against these slabs, in 3D. Then, the one half of the object will be considered as "illuminated", so its just rendered as normal. The other half will be considered as "shadowed", and this is where you can choose what to do next. If there shadowed area has zero lighting, then you can just discard the polygons -- a crafty piece of culling. However, if the area is lit, then you can just darken the polygon, say divide the colours at the triangle vertices by 2, 4, 8 whatever, for gouraud shading. The advantage of this is that you can have models emerge/submerge into shadowed volumes, with little extra processor power. With a well designed engine structure, I think it could most definitely be done real time. Any thoughts? Has this already been done? It wouldn't suprise me if it had. I'd be interested if anyone implemented such a system though.

Realtime volumetric lighting

Heres a quick and easy way of doing volumetric lighting in realtime. For those who don't know, volumetric lighting works along the lines of calculating the volume of space that is illuminated by a given light. So, when an objects falls into that volume, it is illuminated by the light, and when it comes out, it is not.

The algorithm is very simple. We just invert shadow z-buffers. What we do is for each directional light source, we render a z-buffer from its point of view. The frustum for this view will define the volume of light, and the z-buffer will carry the depth information needed to correctly identify whether or not a vertex is lit. Then, when we come to light a vertex, we simple transform it into the camera space for the light (light-space), perspective project it, if it lies within the lights frustum. Then, we simply do a z-test against the buffer. If the vertex Z is less than the corresponding entry in the z-buffer, then we can light it. If it is not, then the vertex is not visible, so we must not light it. The z-buffer for the light needn't be rendered at too high a resolution, say maybe 64x64. In addition, this trick can be modified to run in realtime, for dynamic lights. Additonally, if you rasterise the polygon onto the z-buffer for the light, you can also work out good shadows casting onto walls and floors. If the polygon over-writes light z-buffer pixel i, then you can transform a polygon corresponding to that pixel, into world space, and draw it, using the z information. So, we'd take the pixel z, and its neighbours z. We also take the 2d co-ords of those points in light z-buffer space. We can now project back into 3d co-ords, then back into world co-ords via the camera matrix for the light. We could then use these 4 points to define a shadow polygon, and fill it as normal. The quality may not be that high, because the shadows will be staircased, but it should work.

Perspective Corrected Texturing

Introduction

Perspective texturing is an almost essential addition to any 3D engine nowadays. People are sick of watching affine texture slip and slide all over their polygons, they want something better. And perspective texturing solves this problem. I'll be describing Chris Hecker-style perspective texturing here, rather than the 3 "Magic Vectors" method.

Affine Texturing

Ordinary texturing is very simple to code. We specify u, v at each vertex of our polygon (I'll be describing triangles however). We interpolate the U and V across the triangle edges, and then across scanlines. With triangles, the delta for the scanline is constant, which speeds things up a lot. Then its just a case of interpolating texture co-ords across a scanline, and sampling the texture map. Very simple, very fast. And very limited. If your triangles get too big, too much difference in Zs at vertices etc, the texture starts to slide. It looks horrible. We need perspective correction.

Perspective Texturing

Probably the most common form of perspective texturing is done via a divide by Z. Its a very simple algorithm. Instead of interpolate U and V, we instead interpolate U/Z and V/Z. 1/Z is also interpolated. At each pixel, we take our texture co-ords, and divide them by Z. Hang on, you're thinking - if we divide by the same number twice (Z) don't we get back to where we started - like a double reciprocal? Well, sort of. Z is also interpolated, so we're not dividing by the same Z twice. We then take the new U and V values, index into our texture map, and plot the pixel. Pseudo-code might be:

su = Screen-U = U/Z

sv = Screen-V = V/Z

sz = Screen-Z = 1/Z



for x=startx to endx

        u = su / sz

        v = sv / sz

        PutPixel(x, y, texture[v][u])

        su += deltasu

        sv += deltasv

        sz += deltasz

end

Very simple, and very slow.

Speeding Up The Routine

The first thing that comes to mind when speeding up this routine is the two divides - divides are a slow operation, and should be avoided. So, we'll turn those 2 divides into a reciprocal and a multiply:

for x=startx to endx

        recip = 1.0 / sz

        u = su * recip

        v = sv * recip

        PutPixel(x, y, texture[v][u])

        su += deltasu

        sv += deltasv

        sz += deltasz

end

This helps things a little. The second big way of speeding it up is to lerp (linear interpolate) between sets of 'correct' u, v. We calculate correct u, v every n pixels, and interpolate between them. This cuts down on the divides overall, but it can lead to problems: if your correction value is too high for your resolution, the texture will 'wiggle' - the sample rate is too low. If your correction value is too high, you'll see all sorts of weird bendy patterns at certain viewing angles. It takes a little time to find the best correction level for a given resolution. Pseudo-code for this would look something like:

zinv = 1.0 / sz;        // do the divide here

width++;

oddedge = width & cormask; // test for case of raggy-edge



zinv *= 65536.0;

u = su * zinv;

v = sv * zinv;          // reciprocal then multiply

RoundToInt(logu1, u);

RoundToInt(logv1, v);   

sv += cordvdelta;       // cordvdelta etc are deltasv*correction

su += cordudelta;

sz += cordzdelta;



zinv = (1.0 / sz) * 65536.0; // muls by 65536 are used to do

                             // u << 16 a little better

                             // one fmul = 3 clocks

                             // 2 shls = 2*2 clocks



while(width > 0) {

        if(width >= correction)

                pixels = correction;

        else

                pixels = oddedge;       // we have a raggy edge



        width -= correction;    // even if edge is raggy loop will

                                // still terminate due to this



        u = su * zinv;

        v = sv * zinv;

        RoundToInt(&logu2, u);

        RoundToInt(&logv2, v);



        luadd = (logu2 - logu1) >> corshift; // deltas for linear

        lvadd = (logv2 - logv1) >> corshift; // pass



        logu = logu1;   // 'logical' u and v

        logv = logv1;



        sv += cordvdelta;

        su += cordudelta;

        sz += cordzdelta;



        zinv = 1.0 / sz;        // again, do divide in parallel

        while(pixels--) {

                index = ((logv >> 8) & 0xFF00) +

                        ((logu >> 16) & 0xFF);

                PutPixel(x, y, texture[index]);

                logu += luadd;

                logv += lvadd;

        }



        zinv *= 65536.0;

        logu1 = logu2;

        logv1 = logv2;

}

This is based on the loop I use. I use the idea of doing floating point operations in parallel a lot here, because it means we can effectively get them for free. However it is often quite hard to persuade the compiler that this is what you want to do; it'll take a little experimentation. Note also that this loop doesn't have a seperate if () {} statement to cover the case of a 'raggy-edge', like most perspective texturers do. I see that in a lot of code, this way is smaller, and easier to maintain and optimize.

Note that you will need to scale the logu, logv values by 65536. Then you will have to divide (shr 16) them to get the U, V values. Note that you should something like:

zinv = 65536.0 / sz

RoundToInt(&logu, su*zinv);

RoundToInt(&logv, sv*zinv);

Rather than:

zinv = 1.0 / sz

RoundToInt(&logu, su*zinv);

RoundToInt(&logv, sv*zinv);

logu <<= 16;

logv <<= 16;

Because the latter will lose the fractional part of the value.

Other Considerations

A lot of people take a religious hatred to floating point calculations, because they think they are slow. Well, they may have been slow in times past, but now CPUs can do them very quickly indeed; they can be up to 29% faster on a 486DX, and 40% faster on a 586 (intel). I found these figures by doing 1,000,000 matrix muls, 1,000,000 * (Add/Sub/Div/Mul). Note that conversion to integer however, is still slow.

I know one person in particular was adamant that FPU was slower in his tests. What was his test? Something like:

for(x=0;x<65536; x++)

        array[x] = 1.0 * x;

What a stupid test! For a number of reasons:

Its not representative of the kind of work you'd do in a 3D engine
Most compilers would optimize away a mutiplication by 1.0.

Conversion to integer can be slow; especially in Watcom C 10.6 for DOS. Did you know that to convert to integer, it calls a function __CHP, which contains the following code:

__CHP:          push    eax

                fstcw   dword ptr [esp]

                wait    

                push    dword ptr [esp]

                mov     byte ptr +1H[esp],1fH

                fldcw   dword ptr [esp]

                frndint 

                fldcw   dword ptr +4H[esp]

                wait    

                lea     esp,+8H[esp]

                ret

Its amazing what can be done with WDISASM, and WLIB, and a little lateral thought...

FPU operations can also be done in parallel with the integer unit on Intel chips. I don't think this can be done on Cyrix. Certainly not the Cyrix 5x86. Thats just a souped up 486. Thats not worth worrying about. Despite all you might hate intels monopoly, Cyrix have no real chance of ever breaking it. So you may as well optimize with an Intel chip in mind.

The 1/z values can also be used for Z-buffering. Which is very handy. You can then have perspective correct texturing, and perspective correct Z-Buffering, at little speed cost. See my page on Z-Buffering for more information on that.

I also toyed with the idea of pre-perspective correcting textures. I heard that in Quake, textures are lit in a seperate pass to the texturing, due to the lack of registers on the 586 (can Intel count higher than the number of fingers they posess?). I wonder if it would be possible to do a similar thing with perspective texturing? Theoretically, it shouldn't work, because I think that the routine is dependant on the shape of the polygon being mapped to. However, I realised, this doesn't work! The perspective is dependant on the shape of the triangle...

Another possible speed up would be to use an affine texture where the change in z is very little, and a perspective texturer where the change is large. Hmm.. what would this look like in code?

average-z = (z1 + z2 + z3) / 3



zdiff = 0

for n=1 to 3

        zdiff += (z(n) - average-z)**2

end



if zdiff < z-threshold**2

        Affinetexture(polygon)

else

        PerspectiveTexture(polygon)

end

Sound about right? Maybe I'll try this one day. Idea here is to find difference between average Z and Z of each triangle. Distance is not square -rooted, just kept as a square, then compared against squared threshold. If too much Z change is present, then perspective is used. This however may fail with large triangles.

Environment Mapping

Introduction

Ever seen those effects in demos where an object seems to reflect its background? Wish you could do it too? Well, now I'll explain how to do it.

The Original Method

Environment mapping was originally developed by Blinn and Newell. Blinn is one seriously devoted guy! You can't pick up a single book on 3D graphics without finding some of his work in it! You can find the exact workings of his original algorithm in CGPP. Back to the point. Their basic method was to reflect V (viewing vector) about N (normal) to generate a vector which pointed to an environment map. The environment map is basically a big textured sphere, which surrounds the object. Its generated by rendering the scene from different views. Now, as you might guess, to index into the sphere, we will need spherical co-ordinates. Spherical co-ordinates are obtained with the following equations:

        theta = arctan (y/x). 

        rho = arccos (z/R). 

        R = sqrt (x^2 + y^2 + z^2)

If you use unit vectors, you lose R, which is handy. A good variation on this system is to modulate the co-ordinates in the Y axis by a sinewave.

Another method suggests using a cube. Here, you have 6 texture maps, surrounding your object, forming a cube. To chose which texture map to use, you would check the normalized reflection vector: if the X co-ordinate is the greatest, index the texture map to your right. If the Z is greatest, index into the texture map to your front, and so on. Negative co-ordinates mean to index the opposite: eg left instead of right. Again, the problem here is speed. Generating all those environment maps takes time. Not quick enough for realtime applications.

Demo Style Environment Mapping

The way demos do environment mapping is very simple. Take the X and Y components of your vertices, and use that to index your texture map!. Very simple indeed. Your formulae would be:

U = N.x*128 + 127

V = N.y*128 + 127



Or in general



U = N.x * (width / 2) + ((width / 2) - 1)

V = N.y * (height / 2) + ((height / 2) - 1)

Assuming 256x256 texture maps. Some even normalize their vectors to length 128, to avoid the multiplication. Very clever, however can get in the way. I''ve also heard of people storing them as spherical co-ordinates, meaning only addition/subtraction is needed, when you rotate. I haven't tried this one myself - it sounds too much of a pain in the neck.

A useful side-effect here is that you get your 'phong' highlight texture co-ordinates for free. A useful speedup technique.

Bilinear Interpolation Of Texture Maps

Introduction

Do you sometimes notice that when you texture map, the screen looks a little jaggy? Do your texture look pixellated? Well, there is a way around that. You filter your textures. In this file, I'll explain some of the more common techniques, and fast implementations of them.

Bilinear Interpolation

Bilinear interpolation (now on called 'bilerp' for short) is a process of filtering the surrounding texels, to smooth out any jaggies occuring between pixels, and giving the screen a smoother look. The basic calculation is as follows (from a post to usenet, auther unknown):

        double texture[N][M];  // 0 <= x < N, 0 <= y < M

        double xReal;  // 0 <= xReal <= N-1

        double yReal;  // 0 <= yReal <= M-1

 

        int x0 = int(xReal), y0 = int(yReal);

        double dx = xReal-x0, dy = yReal-y0, omdx = 1-dx, omdy = 1-dy;

 

        double bilinear = omdx*omdy*texture[x0][y0] +

                          omdx*dy*texture[x0][y0+1] +

                          dx*omdy*texture[x0+1][y0] +

                          dx*dy*texture[x0+1][y0+1];

What we are effectively doing here is using the decimal, fractional portion of the texture co-ordinates to interpolate between 4 texels. Each of the weights are the areas adjacent to the texel. So, as the fractioanl 'u' texture co-ordinate increases, less weight is given to the texels on the left, and increasing weight is given to those on the right. A similar effect can be seen as we move vertically.

I found a simple way to speed it up. This one works in 256-colours, but its perfectly possible to adapt it to other modes I would imagine. First thing to do is build a table. This table will contain colour A mixed with colour B, in the ratio x*A + (1-x)*B. The table is built in a similar way to colour lookup tables, described elsewhere. With this table you can now find the mix of two colours, very quickly. Now for the filtering. The way I do it is quite simple. I use a filter shaped like this:

Where T is the current texel, and 1-2-3-4 are the surrounding texels. Next, I mix 1 with 3, and 2 with 4. Then I mix both the results together. Finally I mix that with T, and plot it to the screen. Works quite well in practice, and is quite speedy. You'll most likely need to write this in assembly; to take advantage of the EBX addressing trick described elsewhere. Maybe this isn't true bilerp texturing, but it works well. I actually tried approximating bilinear texturing in direct RGB mode by using a table. It worked, but it didn't look good. The image was very grainy, and bandy.

Mip-mapping

Mip-mapping is becoming an increasingly common trick nowadays. The name mip is derived from multum in parvo, many things in small places. It was invented by Williams, in 1983. The idea is that we construct a set of textures, each being 1/4 (half-width*half-height) the size of its parent, and filtering them down. Then, you select a texture, based on distance, or rather compression on the screen, and paint it to the screen. Very simply, pretty efficient. You'll need 1.3 times as much memory to do this. The advantage of this is it reduces aliasing on the textures, by mapping a smaller texture to a smaller polygon. You see you'll sometimes notice that if you have say a 256x256 texture, as it moves away from the viewer, it will begin to flicker and shake. This is because we're mapping a large texture to a small area, meaning we don't have much coherency in the texels we chose. For example, at frame i, we may choose texel (50, 20). However, at frame i+1, we may choose texel (100, 50). If that texels contains sufficiently different colour information to the previous one, you'll get that odd effect. We are jumping around texel co-ordinates too much, both damaging image quality, and also thrashing the cache.

However, on the PC, this gives us a big headache - 256x256 textures. We can either:

Use smaller textures - 64x64, 32x32 etc, but lose speed of routine
Scale textures up to 256x256, and use up lots more memory.
Place multiple textures with a single 256x256 texture

(1) Would be the most tempting option, but would lose a little speed - do we make a set of routines for a given width texture? Do we have a pre-calculated Y offset table? Do we have an array of shift values, used to calculate the address? Tricky stuff.

Or we could go for (2). But there will be *very* high memory costs for this. Plus we will still be mapping a large texture to a small triangle, which will cause problems.

Perhaps 3? Well, we can still use a super-duper 256x256 ebx addressing mapper. But we'll be wasting space. Perhaps not much, but it'll soon add up. Plus it makes calculating co-ordinates perhaps a little tricker, but nothing a LUT can't solve.

I think the best solution is to leave this to hardware texture mappers, and use option 1. Yes, yes, I know not everyone has a 3D graphics card. I certainly don't. But I can't see any serious PC owner without one in 18 months when the market stabilizes, prices come down etc, Microsoft finally make D3D (perhaps not so likely as the first two) usable to coders.

Tri-linear Interpolation

This is a mix between the previous 2 techniques. The idea is we mix together a number of textures - the texture that is smaller than the current texture, the current texture, and the texture that is larger. We then filter these together, to produce a final image. The idea being that the viewer won't notice the sudden switch in mipmaps, instead giving a smooth blend as the texture is enlarged. This will be *very* slow in software though, because of coding complexity, lack of regs, (relatively) slow CPUs, caching problems..... This one is certainly best left to hardware.

Implementing Bi-linear texturing

Now that you have all this information, you can implement a bilinear texturing system. However, this can be a little tricky at first, so I'll just give you some pseudo code to get you start on it:

{All your normal setup stuff here etc, we're only interested in texturing part}



Get decimal u and decimal v. Assumed 16.16 fixed point texture index



du = (U & 0xFFFF) / 65536.0

dv = (V & 0xFFFF) / 65536.0

invdu = 1.0 - du

invdv = 1.0 - dv

Weight1 = invdu*invdv

Weight2 = invdu*dv

Weight3 = du*invdv

Weight4 = du*dv



r00 = Texture[V >> 16][U >> 16].Red

g00 = Texture[V >> 16][U >> 16].Green

b00 = Texture[V >> 16][U >> 16].Blue



r01 = Texture[(V >> 16) + 1][U >> 16].Red

g01 = Texture[(V >> 16) + 1][U >> 16].Green

b01 = Texture[(V >> 16) + 1][U >> 16].Blue



r10 = Texture[V >> 16][(U >> 16) + 1].Red

g10 = Texture[V >> 16][(U >> 16) + 1].Green

b10 = Texture[V >> 16][(U >> 16) + 1].Blue



r11 = Texture[(V >> 16) + 1][(U >> 16) + 1].Red

g11 = Texture[(V >> 16) + 1][(U >> 16) + 1].Green

b11 = Texture[(V >> 16) + 1][(U >> 16) + 1].Blue



Red = Weight1*r00 + Weight2*r01 + Weight3*r10 + Weight4*r11

Green = Weight1*g00 + Weight2*g01 + Weight3*g10 + Weight4*g11

Blue = Weight1*b00 + Weight2*b01 + Weight3*b10 + Weight4*b11



PutPixel(X, Y, Pack(Red, Green, Blue))

That is as good as a straight rip out of some code from my House project. (But obviously I don't use a putpixel function!).

Speed-up Techniques

Introduction

On this page I'll collect together various tricks I find about speeding up 3D engines. I'll present some of the more obvious ones first, because some people may have missed them, and then move on to some of the finer points. If you have any other speed up tricks/algorithms, feel free to send them to me.

Float-->int conversion

Always worth experimenting with, as this can often be slower. 2 methods I know of are using a #pragma, and an in-line function. Firstly, the #pragma:

#pragma aux RoundToInt=\

        "fistp DWORD [eax]"\

        parm nomemory [eax] [8087]\

        modify exact [8087];

Works very nicely, a WDISASM shows that the compiler drops the call to __CHP, and just in-lines the conversion.

There is another method, which doesn't use #pragmas, so it should be portable to other compilers. This was pointed out by "InnerSect":

#define FIST_MAGIC ((((65536.0 * 65536.0 * 16)+(65536.0 * 0.5))* 65536.0))



int32 QuickFist(float inval)

{

        double dtemp = FIST_MAGIC + inval;

        return ((*(int32 *)&dtemp) - 0x80000000);

}

He did some benchmarks, if I remember rightly, he found the above to be faster, which is interesting, to say the least.

Pre-calculation

Pre-calculate virtually everything! You can't precalculate too much. However don't go overboard to the point where everything is table-driven and you don't have any freedom. Always keep an eye to versatility. First things to precalculate are your normals. Surface normals and vertex normals. Very important for debugging - these things can take ages to calculate. Once you have these pre-calculated, you only need to rotate them as you do the rest of your object. Do NOT translate or scale them though. They need to stay at a unit length of 1. If you do happen to scale them though, then multiply them by inverse scale: Eg:

                   | Sx  0  0  0 |

Transform Matrix = |  0 Sy  0  0 |

                   |  0  0 Sz  0 |

                   |  0  0  0  0 |



Then you will need to do:



NormalX *= 1 / Sx

NormalY *= 1 / Sy

NormalZ *= 1 / Sz

Also you don't need to transform your normals when you perform culling or lighting. Simply back-transform the other vector you are considering, and avoid a matrix/vector mul. You back-transform simply by transposing the upper 3x3 matrix:

For every row

        For every column

                output[row][column] = input[column][row]

        End

End

Transform the light, viewing vector, whatever against this matrix, and you've saved a lot of work. You them simply need to do the usual calculations, with the new vector, and the old normal.

Reciprocals

If you're going to do multiple divides by the same number, take the reciprocal of the number, and multiply by it. The reciprocal is simply 1/n, where n is your number. If you're really clever, you'll find something to do while the fdiv executes. A nice little trick is to warm the cache up; a cache miss can take quite a while to get the value loaded in from memory etc. So, while you're sitting there waiting for your fdiv to complete, why not read from a bunch of memory addresses, so they are ready in the cache? Seems to work well for me, inside my screen projection code. You can apply this technique well to a perspective texture mapper. It also helps for your perspective transform. And to convert back to the number given the reciprocal, do another 1/n. Extremely nice. If you're really adventurous, you could even try storing data as reciprocals, such as Z. I haven't tried this one though - it could get very hard to debug.

Vertex Tagging

Given a large list of vertices to handle, you're going to find that a lot - say half - are not visible, out of the frustum, whatever. You need to discard those quickly. An easy way of doing this is to add a 'visible' flag to the vertices.

Before you enter your processing loops, during setup, set this flag to false. Then, as you process your loops, if you find the vertex is needed, set it to true. For example if you find a triangle is visible, set all of its vertices visible flags to true. Then later on, you can simply ignore or discard these values if the flag is not set. This comes in handy for complex models, with a lot of vertices to project. Equally handy on large scenes, where a lot of geometry is not visible, and you can skip projection. If you apply this to lighting, you have to be careful though. If you clip to the view-volume, and you haven't calculated lighting, then you may find that around the edges, your routine goes crackers. You're trying to find the intersection of a defined and an undefined value. This needs a little thought...

An alternative to tagging can be done using a counter. The idea is simple. At initialization, you set a counter in the vertices to be some illegal value, such as -1 (0xFFFFFFFF). You also set another value, the frame counter, to zero (0). Then, as you process your object, if you notice that a face/vertex is to be used, you set its counter equal to the frame counter. If not, nothing is set or cleared. Then, when it comes to processing, you compare the counter to the frame counter. Those vertices/ faces that have been tagged with the current frame counter will be used, those which have not will contain some undefined value (less than current frame counter though) and will be ignored. This becomes handy in large data sets, where clearing a flag every frame can get expensive, if not done correctly.

Pointers

Pointers are very handy things. You can use them to make nice data structures, like linked lists, binary trees etc. You can also use them to address data very quickly. Say you have an array of items, each item is 27 bytes in size. You could pad that item to 32-bytes, and use a shift to calculate the address. But, you'd waste 5 bytes of memory for every array item. That'll soon add up! So, use pointers to address it. Say that 27 byte structure was your vertex. In your triangle structure, don't have int vertindex[3], have vertex *vertptr[3]. Then simply load in the pointer, address the data, and you're away.

Triangles vs Polygons

Tricky. Triangles are very nice to deal with, fast to render. But if you have say a set of 6 co-planar triangles forming a surface, and you could easily use a polygon instead, then you are doing 6 times as much work with a triangle renderer. But the triangles will be faster to render. Personally, I prefer triangles, much easier to work in a pure triangle based environment. Polygons do have their advantages, it may be worth playing with them. And if anyone out there has an algorithm to do constant delta texturing on convex polygons, I'd be very interested.

Overdraw / Underdraw

Overdraw is a speed-killer. Especially with complex rendering code. You're painting, re-painting, re-re-painting, re-re-re-painting, and so on. Losing time, losing speed. An easy way to eliminate overdraw is to front-back sort your triangles, and feed them into a Z-Buffer/S-Buffer. This causes problems though. A fully obscured triangle rendered through a Z-Buffer will be a complete waste of time. Scanning all those pixels, with nothing to render! Likewise S-Buffers suffer from complex segment insertion code, which may take its toll. For large triangles S-Buffer seems to work nicely. But, for many small triangles, its not so hot; for example, an sbuffer algorithm I made took ~30 seconds to render a ~3k triangle head, with just lambert shading. Clearly the volume of triangles overloaded the segment insertion routine. I think the solution here is to develop some efficient occlusion and VSD algorithms. A great many possibilities for work to be done here in the future.

Another problem may be "underdraw" (my little term). This is when you spend too much time processing offscreen polygons. Some helping factors here may be the fact that typically the visible polygons in a scene doesn't change dramatically. So for example in a first-person shooter, you could perhaps calculate the visible polygon set, then use the player direction to find some polygons to 'keep an eye on', making them critical if the player gets too near. Bounding volumes such as spheres/boxes help here. Hierarchial modelling may help here too, in classifying which portions of a model you can throw away.

Summary

So, heres a small set of rules to summarise how to make your engine faster:

Precalculate as much as possible
Don't calculate anything you don't need
Don't calculate anything twice
Try to use results of previous calculations again, if possible
Look out for occaisions where you can 'spend a penny to save a pound'
Before you code that function in assembly, ask yourself "Have I really got the best solution?"
Experiment!
Take risks
Don't write anything off because it "looks slow"
Explore your target architecture, know its quirks

3D Bump mapping

Introduction

OK, I've been getting lotsa replies on the bump mapping scene, so I think its time to report on what you've been passing on. I'll try and give credit where its due (if I can find your addresses...). Great reponse, keep it up...

#1: Jean-Francois Dube

Ok, Jean (deks) was first off the mark. His scheme is quite complex, reminds me a lot of water in some places... The basic idea however is very similar to other schemes. I'll start of with a modified sample of code he sent me, I've made it into pseudo-code:

        for x1 to x2 do

                normal-x = interpolated-normal-x >> 20

                normal-y = interpolated-normal-y >> 20

                index = (v >> 16) << 8) + (u >> 16)

                texel = texture[index]

                bump-normal-x = bumpmap[index+1] - bumpmap[index - 1]

                bump-normal-y = bumpmap[index+256 - bumpmap[index - 256]

                p = 127 - (bump-normal-x - normal-x)

                q = 127 - (bump-normal-y - normal-y)



                if(p < 0) then p = 0

                if(q < 0) then q = 0

        

                colour = colourtable[texel][phongmap[p*256+q]]

                putpixel(x, y, colour)



                (and just interpolate normal + texture here...)

        end

Quite a nice method, but perhaps the inner loop is a little complex.

#2: Paul Adams

Next up is Paul (aka Frenzy). Well.. this guy lives in the same town as me, yet I still haven't been able to spot him... Paul, if you're reading, show yourself... anyway back to the code. His suggestion is to have a 'heightfield' map (could be same map as texture if working in monochrome, colour paletted images may freak out...). You then take:

nx = map[u-1] - map[u+1]

ny = map[v-1] - map[v+1]

Which sounds a little odd... I think he means...

nx = map[u-1] - map[u+1]

ny = map[v-256] - map[v+256]

Which would make a little more sense... Then just do the old ny*256+nx, to get the index into your phongmap. Not too bad..

#3: Jacco Bikker

This is the third and final entry. I just chose 3 to keep it simple, a lot of stuff was duplicated. His idea revolves around the idea of a relief map. This is an array of 8-bit bytes, broken up into 2 4-bit halfs. The upper 4 bits are filled with the X difference between the pixels neighbours, and the lower 4 are filled with the Y difference. With 4 bits, we have a range of (he said) -7 to +8 (I think its -8 to +7 though). You have to clamp to this range, to avoid problems later on ...

Next you have to create a 16x16 array. In this array, we interpolate a normal vector, I'd imagine this would be from (-0.5, -0.5, 0) to (0.5, 0.5, 0). Now, don't store this as array[16][16], store it as array[256]. This is because we shall now index into that array. And what do we use to index with? The value calculated in the paragraph above. Then, mix these values together with lighting, and texture, and plot. Quite a complex system, I have to say.

Summary

I have to say #2 is by far the simplest to code. It may also be the quickest, if when you find the nx and ny values you just clip them to 8 bits (by storing them in a byte register) then we can speed it up nicely. It'll also fit quite nicely into common 256x256 mappers. Anyway, have a play with them, I'm sure you'll find this lot handy.

OK, now if you want to send me something, then make it cells+portals this time, I've already had one reply (Jacco Bikker) which was interesting, I'd like to hear other views on the subject. I have some ideas of my own on this (which are quite complex, but sounds reasonable, to me at least...). Again, any tidbits or code samples will be nice, lets see what we can learn about them.

Quadratic 24-bit Gouraud Shading

Introduction

24-bit gouraud shades look very nice. However, doing it linearly is a little limited, because a) sometimes you get bands of colour across the middle scanline, and b) its just not complicated enough to be technically satisfying :). A nice little trick is to do it by the use of a quadratic equation.

Quadratic Equations

A Quadratic Equation is a second degree polynomial, and can be used to represent curves. It has 3 parts, and is written:

  2

ax  + bx + c

A, B and C are the 3 co-efficients of the equation, which scale, translate and map the curve. X is a variable we use to specify a point along the curve. Now, just for a little bit of fun, go and write yourself a little program that plots a quadratic equation given ABC and a range of X. You'll soon get a feel for how it works.

Reformulating The Equation

OK, now that we have ourselves an equation, we need to re-arrange it into a form we can use; specifically we need to obtain the A, B and C co-efficients, for 3 given points on a data set. These 3 points shall be the beginning, middle, and end, as these are the easiest to calculate. (Note, the following working originally came from Mark Feldmans Texture Mapping tutorial, a very nice source of information.)

       2

u  = ax  + bx  + c

 0     0     0



       2

u  = ax  + bx  + c

 1     1     1



       2

u  = ax  + bx  + c

 2     2     2

So, we need to find the solution to these 3 simultaneous equations. Tricky. However, we can manipulate the x variables, to give us a simpler problem:

x  = leftx

 0



x  = (x0 + x2)

 1   ---------

         2



x  = rightx

 2



Subtract x  from all 3...

          0



x  - x  = 0

 0    0



x  - x  = x0 + x2 - x0

 1    0   ------------

               2



        = 0.5*x

               2



x  - x = rightx - leftx                 (width of scanline ...)

 2    0

With me so far? Then, when we put these back into the equations...

       2

u  = a0  + b0 + c

 0



             2

u1 = a(0.5*x)  + b*0.5*x  + c

            2           2



        2

u2 = a*x  + b*x  + 

        2      2

Which helps us a great deal. So, we already have the co-ef C

u  = c

 0



c = u

     0

And with this, we can place it into the equation for u2:

        2

u  = a*x  + bx  + u0

 2      2     2



And we want B out of it, so ...



             2

u  - u  = a*x  + bx

 2    0      2     2



then

             2

u  - u  - a*x  = bx

 2    0      2     2



Swap sides:

                  2

bx = u  - u  - a*x

  2   2    0      2



Divide by x ...

           2



                 2

b = u  - u  - a*x

     2    0      2

    --------------

          x

           2

So we now have B and C. Now its just a case of putting these back into an equation, to get us A:

             2

u1 = a(0.5*x)  + b*0.5*x  + c

            2           2



Meddle with those 0.5s ..



            2

u1 = a*(x/2)  + b*(x/2) + c

         2          2



        2

u1 = a*x / 4 + bx / 2 + c

        2        2



Put in our B and C ...

First c:



        2

u1 = a*x / 4 + bx / 2 + u

        2        2       0



And now b:



        2     |           2 |

u1 = a*x / 4 +| u - u - ax  |*x + u0

        2     |  2   0    2 |

              | ----------- |

              |      x      |



large equation :)



First, eliminate mul/div by X:



        2                2 

u1 = a*x / 4 + u - u - ax   + u0

        2       2   0    2 



We have a -u  and a +u  so drop them (cancels out...)

            0         0



        2            2 

u1 = a*x / 4 + u - ax

        2       2    2



Finally, after much work, the solution for a is:



a = 2*u  + 2*u  - 4*u

       0      2      1

    ------------------

             2

            x

             2

So, now we have ABC co-efficients. Now, obviously, we don't want to do doing ax^2 + bx + c in our inner loop! But, as you may know from Bezier curves, a polynomial of degree n can be interpolated with n deltas. So, now to find those elusive deltas, we need to differentiate the quadratic equation.

To do this, you only really need to know a couple of rules of differentiation:


  n      (n-1)

ax  = anx



And, drop a constant that isn't multiplied by X :)

So, lets see, we'll need 2 deltas. So, we'll have to differentiate *twice*. First, differentiate ax^2 + bx + c...

  2

ax  + bx + c



   (2 - 1)      (1 - 1)

2ax        + 1bx                        (drop c)



     1      0

= 2ax  + 1bx



= 2ax + 1b1



= 2ax + b

du = 2ax + b

I deliberatly did "b" rather verbosely, to keep things clear: remember, n^1 = n, and n^0 = 1. Now, we're still left with a term of x, so lets differentiate again:

2ax + b



   1

2ax + b



     0

1*2ax                   (drop b)



1*2*a*1



= 2a



ddu = 2a

Coding it

Now, with those variables, we just about ready to set up an inner loop. Remember that u0 has an x value of 0:

      2

u = a0  + b0 + c



u = u

     0



For width of scanline

        PlotPixel(x, y, u)

        u += du

        du += ddu

End For

Almost there! Now you just set up 3 quadratics, for red, green and blue. You could also set up quadratics for specular-red, specular-green, and specular-blue. But, with the already expensive loop setups, that may get too much.

You should be able to work out the rest for yourself. Once you've got it working, you can start to think about optimizations. Plenty of simple terms in it, eg x / 2 etc, they're easily solved. Also if you have a bunch of divides, you'd best find the reciprocals. It should also be able to store the reciprocals of x and x^2 in a table. That should help too. The inner loop shouldn't take much optimizing, its just adds. Anyway, thats about it for now, have fun with it. Tell me how you get on with it, might be interesting if someone tried this with texturing, a quadratic gouraud-texturemapper perhaps? Perhaps a quadratic interpolation along the *edges* as well? Plenty of possibilities, with this kind of interpolation.

Texture Mapping Tutorial

Introduction

I've been getting quite a few requests lately from people wanting to know some of the more basic techniques in graphics programming, so I'll start with perhaps one of the more common requests, Texture Mapping.

Basics

The basic idea of texture mapping is to map a texture, be it an image, or procedurally generated, onto the surface of a triangle/polygon, to make that surface look better without the need for geometric transformations. There are a number of techniques that fall under this matter. However, to limit the discussion, this will concentrate on linear (affine) mapping, as this forms the basis of a number of other techniques. Information on further types can be found on other websites/books, such as Mark Feldmans homepage (Win95GPE).

Filling A Triangle

Firstly, to understand texture mapping, you must know how to fill a triangle. I'll limit the disussion to triangles, as they are both the easiest and fastest to map, and are easy to understand.

To fill a triangle, we basically need two steps:

Find the left and right edges of the scanlines in the triangle
Fill them

Simple enough really. Heres a diagram to illustrate what I mean:


                Vertex 1

                  / \

                /     \

              /         \

            /*************\             Scan line Y

          /                 \

        /                     \

Vertex 2------------------------\                <-- Middle of triangle

          ______                  \

                ______              \

                      ______          \

                            ______      \

                                  ______  \

                                        ____\    Vertex 3

So, to fill the triangle, we would have to first fill the set of scanlines above the centre of the triangle:

For y = y1 to y2 do

        FillScanLine(left[y], right[y], colour)

End For

And then from the middle line to vertex 3:

For y = y2 to y3 do

        FillScanLine(left[y], right[y], colour)

End For

Now, you're probably wondering where the left[y] and right[y] come from. Well, they're not arrays (I just written it so to indicate for every scan line we will have a left and right X). They are interpolated as we scan along. The following code should illustrate what I mean (for top half of triangle):

TopX = v1.x

LeftX = TopX

RightX = TopX



DeltaLeftX = (MiddleLeftX - TopX) / (MiddleY - TopY)

DeltaRightX = (MiddleRightX - TopX) / (MiddleY - TopY)



or perhaps more efficiently:



Reciprocal = 1.0 / (MiddleY - TopY)

DeltaLeftX = (MiddleLeftX - TopX) * Reciprocal

DeltaRightX = (MiddleRightX - TopX) * Reciprocal



Then in your loop:



For y = y1 to y2 do

        FillScanLine(LeftX, RightX, Colour)

        LeftX += DeltaLeftX

        RightX += DeltaRightX

End For

So, we interpolate the start and end of the edges, and then fill inbetween them. Now, the same thing is done for the bottom half of the triangle. But -- the key is, we have to recalculate *either* DeltaLeftX, *or* DeltaRightX. We decide this by checking if the "point" on the middle of the triangle points to the left or the right:

                  / \ t = 0

                /     \

              /         \

            /             \

          /                 \

        /                     \

Point   ------------------------\  t = 0.5 (arbitrary figure: doesn't always

          ______                  \         occur at 0.5!)

                ______              \

                      ______          \

                            ______      \

                                  ______  \

                                        ____\ t = 1

Here, the "Point" is on the left, so DeltaLeftX must be recalculated. However, it could just have easily been on the right, in which case DeltaRightX would need to be recalculated. So what must you do? You must calculate the width of the middle scanline of the triangle, and depending on its sign, you must set some kind of flag to either recalculate either DeltaLeftX, or DeltaRightX. But how to calculate it? Simple. We need to obtain a value 't' which indicates how far along the line on the *opposite* side of the "Point" that the break occurs, as shown on the diagram. We then need to obtain a value for X on that side. By then performing a subtraction, we can calculate what side it is on:

TopX, TopY = co-ords of top and bottom vertices in triangle

BotX, BotY = co-ords of top and bottom vertices in triangle

MidX, MidY = co-ords of "Point" of the triangle

0 <= t <= 1 : Point where the opposite scan line is broken.



Dx = BotX - TopX

  1



Dy = BotY - TopY

  1



Dy  = MidY - TopY               (portion of opposite side in top half)

  2



    Dy

      2

t = ---

    Dy

      1



t = (MidY - TopY) / (BotY - TopY)



So we now have a value telling us the point along the opposite edge.



If we recall that the equation of a ray is:



Point = Origin + t*Direction



We can use a 2D ray to calculate the X position



Point2.X = TopX + t*(BotX - TopX)



And now he have the right and left points on the centre scanline. However,

there is a problem: we do not yet know *which* is the right, and *which*

is the left. We must perform a subtraction to tell us this, or a comparison.



Width = Point2.X - MidX

If Width > 0 Then       // Point2.x was > MidX, ie on the right side

        SetFlag RecalculateDeltaLeftX

Else If Width < 0       // Point2.x was < MidX, ie point2.x was the left side

        SetFlag RecalculateDeltaRightX

Else

        Degenerate triangle; abort!

End If

You should now have the information you need to make a simplistic triangle filler. However, its not optimal; it'll run slowly. I'll explain in the next section how to get it running better.

Vertex Tables, Indices, Etc

Now, you may be wondering about all this flag setting, breaking the triangle into two halves and so on. Doesn't sound too efficient does it? Well, no, but thats not the real way it should be implemented. I've been explaining things in a purely *logical* fashion, not code wise. Its important to avoid discussing issues of code when explaining an algorithm. Now I'll deal with how it should really be implemented.

Firstly, the main point is that we do *NOT* have two seperate loops. We have *one* loop, and at the end, we decrement two heights: one for the left hand side, the other for the right. If one height runs out, ie <= 0, then we check to see if we have more work to do: if not, we then exit. The following pseudo code should help illustrate:

Loop {

        Width = RightX - LeftX



        If Width > 0 Then

                FillScanLine(LeftX, RightX, Colour)

        End If



        Decrement LeftHeight

        If LeftHeight <= 0 Then         // out of "edge"

                If Point Of Triangle On Left Side

                        Recalculate DeltaLeftX

                Else

                        Return          // hit bottom of triangle

                End If

        Else

                LeftX += DeltaLeftX

        End If



        Decrement RightHeight

        If RightHeight <= 0 Then         // out of "edge"

                If Point Of Triangle On Right Side

                        Recalculate DeltaRightX

                Else

                        Return          // hit bottom of triangle

                End If

        Else

                RightX += DeltaRightX

        End If

}

This will ensure we can code the filler as one loop, rather than the pair of loops previously. Now, you will often *not* see this algorithm implemented in this manner. Many coders prefer to use two tables and two indices to work out whats going on with respect to the inner 'If' statement above (If Point Of Triangle...). Its often coded as:

(Just the if statement)



Decrement LeftHeight

If LeftHeight <= 0 Then         // out of "edge"

        Decrement LeftIndex

        If LeftIndex < 1 Then   // dropped off bottom of table

                Return

        Else                    // find new delta (now function call)

                If RecalculateDeltaLeftX(LeftIndex - 1, LeftIndex) <= 0 Then

                        Return

                End If

        End If

Else

        LeftX += DeltaLeftX     // simply interpolate

End If

Further to this, we need a similar statement for the right section of triangle. Also, we will need a table and an index for the left and right side of triangle respectively:

Vertex Pointer LeftArray[3]

Vertex Pointer RightArray[3]

Array Index LeftIndex

Array Index RightIndex

We will also need to fill and set these tables and indices correctly, according to what side of the triangle the point is on. That decision is based on the width of the largest scan line. If you recall, a positive width meant that the "Point" of the triangle was on the left; negative width meant it was to the right. This gives rise to the following code:

(For the purposes of this pseudo-code, arrays are 1-based)



VertexPtr1, VertexPtr2, VertexPtr3 are 3 vertex pointers, sorted top->bottom,

used for quick lookup of data.



If Width > 0 Then       // Point is to the left...

        RightIndex = 2  // only one edge to the right

        LeftIndex = 3   // two edges to the left

        LeftArray[1] = VertexPtr1

        LeftArray[2] = VertexPtr2

        LeftArray[3] = VertexPtr3

        RightArray[1] = VertexPtr1

        RightArray[2] = VertexPtr3      // will only use 2 entries



        (Calculate initial deltas, etc)

Else If Width < 0 Then  // reverse situation

        LeftIndex = 2   // only one edge to the left

        RightIndex = 3  // two edges to the right

        RightArray[1] = VertexPtr1

        RightArray[2] = VertexPtr2

        RightArray[3] = VertexPtr3

        LeftArray[1] = VertexPtr1

        LeftArray[2] = VertexPtr3



        (Calculate initial deltas, etc)

Else

        Degenerate triangle; reject

End If

That *almost* completes the basic skeleton of the code. One last detail: The calculation of the initial deltas. You may think that could just be done with a call to a function "CalculateLeftDelta" etc. However there is a special case involved. Triangles with flat bottoms/tops. If this occurs, you have to decrement the array index, calculate the delta *again*, and check for zero. Two zeros in a row indicate a degenerate triangle. Code may look like:

(For first branch of above if statement (Width > 0))



...

If CalculateRightDelta(RightIndex - 1, RightIndex) <= 0 Then

        Return          // error occured, major side of triangle is flat

End If



If CalculateLeftDelta(LeftIndex - 1, LeftIndex) <= 0 Then

        Decrement LeftIndex

        If CalculateLeftIndex(LeftIndex - 1, LeftIndex) <= 0 Then

                Degenerate triangle; reject

                Return

        End If

End If

That should be enough information; you'll need to do the coding yourself, but this should prove sufficient for coding a flat-colour filler. Obviously you'll need to reverse the above code for the opposite case in the 'If' statement.

Adding Texturing

Now the serious begins; how to map a texture onto that triangle. In all honesty, it really is pretty simple; however the maths involved can become a little involved.

Firstly, we need to interpolate u,v. U and V are our texture co-ordinates, such that:

1 <= u <= TextureWidth

1 <= v <= TextureHeight

Assuming that a 1-based array is used. Interpolating U, V is done in the same fashion for the LeftX and RightX, except we only interpolate for *ONE* side of the triangle. The reason for this is that we only need a starting point; we move along the texture by adding DeltaU to U and DeltaV to V. We don't need to know a pair of [U,V], as we are not filling between them. Also for a triangle DeltaU and DeltaV are *constant*, which again simplifies our task. More on that later. Also we need to add code to our CalculateLeftDelta to calculate our deltas along the edge.

In summary, our task is to:

Calculate constant deltas
Interpolate [U, V] along the left triangle side
Interpolate another pair of [U, V] along the scanline, starting from our left [U, V], adding a constant delta at each pixel
Grab the pixel colour by looking it up in the texturemap: Colour = TextureMap[V][U]
Keep track of and recalculate left edge [U, V] deltas where needed

Seems quite a complex task, however it is quite simple in reality.

Constant Deltas

The calculation for the constant deltas is fairly simple: its linked to the one we used someway above to calculate the other x value for the centre scan line. The basic calculation is:

da = (a*(a3 - a1) + (a1 - a2)) / width

I used the value 'a' simply to emphasize that this calculation is not unique to texture co-ordinates: it can be used for other values, such as colour, specular colour, and so on (but please not angle!). What the calculation basically does is find the step between the values of 'a' for the widest scan line, then divide it, to give an interpolation delta. For texturing, we would write:

du = (u*(u3 - u1) + (u1 - u2)) / width

dv = (v*(v3 - v1) + (v1 - v2)) / width



Or if we wanted to be a little more stylish:



Recip = 1.0 / width

du = (u*(u3 - u1) + (u1 - u2)) * Recip

dv = (v*(v3 - v1) + (v1 - v2)) * Recip



Or if we really want to be clever:



Recip = RecipTable[Width]

du = (u*(u3 - u1) + (u1 - u2)) * Recip

dv = (v*(v3 - v1) + (v1 - v2)) * Recip

These values du, dv will then be used for interpolation across the scan line.

Putting It All Together

Right now your brain is probably cooked with all of this. So, lets now put all this code together, to give us a framework for a texture mapper. I'll write in pseudo code, and then you can implement it in your favourite language. Good luck!

Vertex Pointer LeftArray[3]

Vertex Pointer RightArray[3]

Real DeltaLeftX, DeltaRightX, DeltaLeftU, DeltaLeft V

Real LeftX, RightX, LeftU, LeftV

Integer leftheight, rightheight



// This function calculates our deltas for interpolating along the left

// edge. On error, it returns zero. On success, it returns the height of the

// section, and updates the global variables accordingly



Integer Function CalculateLeftDelta(Integer index1, Integer index2)

Begin

        Vertex Pointer Vertex1 = LeftArray[index1]

        Vertex Pointer Vertex2 = LeftArray[index2]

        Integer Height

        Real recip



        // Get height of this section. If its an illegal value, return 0

        // On catching the value zero, the function will then adjust its

        // variables, and attempt a recalculation



        Height = Vertex2.y - Vertex1.y

        If Height == 0 Then

                Return 0

        End If



        // RecipTable holds values of 1.0 / x for a series of integers

        // Size of table is not defined; you will decide that based on the

        // size of your screen, as only values that are within those limits

        // will be used



        recip = RecipTable[Height]

        DeltaLeftX = (Vertex2.x - Vertex1.x) * Recip

        DeltaLeftU = (Vertex2.u - Vertex1.u) * Recip

        DeltaLeftV = (Vertex2.v - Vertex1.v) * Recip



        // Initial values; these will then be interpolated



        LeftX = Vertex1.x

        LeftU = Vertex1.u

        LeftV = Vertex1.v

        leftheight = Height



        Return Height

End



// This function calculates our deltas for interpolating along the right

// edge. On error, it returns zero. On success, it returns the height of the

// section, and updates the global variables accordingly



Integer Function CalculateRightDelta(Integer index1, Integer index2)

Begin

        Vertex Pointer Vertex1 = RightArray[index1]

        Vertex Pointer Vertex2 = RightArray[index2]

        Integer Height

        Real recip



        // Get height of this section. If its an illegal value, return 0

        // On catching the value zero, the function will then adjust its

        // variables, and attempt a recalculation



        Height = Vertex2.y - Vertex1.y

        If Height == 0 Then

                Return 0

        End If



        recip = RecipTable[Height]

        DeltaRightX = (Vertex2.x - Vertex1.x) * Recip



        // Initial values; these will then be interpolated



        RightX = Vertex1.x

        RightHeight = Height



        Return Height

End



// This is the main function for performing the texture mapping. It accepts

// an array of vertices, and a texture. It will then paint a textured

// triangle onto the screen.



Void Function TextureTriangle(Vertex Pointer vertarray[3], Texture texture)

Begin

        Vertex Pointer vert1, vert2, vert3, verttemp

        Integer TopY, MidY, BotY, cury

        Integer height, widest, x, x1, x2, width

        Integer leftindex, rightindex

        Colour colour

        Real t, deltau, deltav recip, curu, curv



        // Get the pointers to the 3 vertices



        vert1 = vertarray[1]

        vert2 = vertarray[2]

        vert3 = vertarray[3]



        // Now sort them based on ascending Y



        If vert1.y > vert2.y Then

                Swap(vert1, vert2)

        End If



        If vert1.y > vert3.y Then

                Swap(vert1, vert3)

        End If



        If vert2.y > vert3.y Then

                Swap(vert2, vert3)

        End If



        // Calculate the value of t, described above, and the widest width



        t = (MidY - TopY) / (BotY - TopY)

        widest = (vert1.x + t*(vert3.x - vert1.x)) - vert2.x



        If widest > 0 Then

                rightindex = 2

                leftindex = 3

                LeftArray[1] = vert1

                LeftArray[2] = vert2

                LeftArray[3] = vert3

                RightArray[1] = vert1

                RightArray[2] = vert3

        

                If CalculateRightDelta(rightindex - 1, rightindex) <= 0 Then

                        Return

                End If



                If CalculateLeftDelta(leftindex - 1, leftindex) <= 0 Then

                        Decrement leftindex

                        if(CalculateLeftDelta(leftindex - 1, leftindex) <= 0 Then

                                Return

                        End If

                End If

        Else If widest < 0 Then

                leftindex = 2

                rightindex = 3

                RightArray[1] = vert1

                RightArray[2] = vert2

                RightArray[3] = vert3

                LeftArray[1] = vert1

                LeftArray[2] = vert3

        

                If CalculateLeftDelta(leftindex - 1, leftindex) <= 0 Then

                        Return

                End If



                If CalculateRightDelta(rightindex - 1, rightindex) <= 0 Then

                        Decrement rightindex

                        if(CalculateRightDelta(rightindex - 1, rightindex) <= 0 Then

                                Return

                        End If

                End If

        Else

                Return

        End If

                

        // Now we must calculate our constant texture deltas.



        recip = RecipTable[widest]        

        deltau = (vert1.u + t*(vert3.u - vert1.u)) - vert2.u)*recip

        deltav = (vert1.v + t*(vert3.v - vert1.v)) - vert2.v)*recip

        cury = vert1.y



        InfiniteLoop

        Begin

                x1 = LeftX

                x2 = RightX

                width = x2 - x1

                If width > 0 Then

                        curu = LeftU

                        curv = LeftV

                        For x = x1 to x2 do

                                colour = Texture[curv][curv]

                                PutPixel(x, cury, colour)

                                curu += deltau

                                curv += deltav

                        End For

                End If



                cury++

                Decrement LeftHeight

                If LeftHeight <= 0 Then

                        Decrement LeftIndex

                        If LeftIndex < 1 Then

                                Return

                        Else

                                If CalculateLeftDelta(LeftIndex - 1, LeftIndex) <= 0 Then

                                        Return

                                End If

                        End If

                Else

                        LeftX += DeltaLeftX

                        LeftU += DeltaLeftU

                        LeftV += DeltaLeftV

                End If



                Decrement RightHeight

                If RightHeight <= 0 Then

                        Decrement RightIndex

                        If RightIndex < 1 Then

                                Return

                        Else

                                If CalculateRightDelta(RightIndex - 1, RightIndex) <= 0 Then

                                        Return

                                End If

                        End If

                Else

                        RightX += DeltaRightX

                End If

        End

End

S-Buffering; The Latest Fad In Software Rendering

Introduction

S-Buffering is pretty much one of the latest crazes in software rendering, especially since the release of Quake. (Update: I'm not sure if Quake uses S-Buffers exactly, or if its a variation on Edge Tables. I'll try and find out ... ) But what is it? It was originally described in a FAQ by Paul Nettle. However, I have seen literature being referenced going back much further than that. In simple, S-Buffering is used to reduce overdraw, by sorting and splitting spans. Hence Span-Buffering. Its often used where there is a large overhead when writing a pixel; for example perspective texture mapping, or true phong shading. It works best with systems dealing with a small-medium polygon load, and a large per-pixel overhead, with large polygons.

Fundamental Concepts

Span buffering is built about the concept of a span. But what is a span? A span is simply a horizontal row of pixels, all on the same scanline (Y), with a start, an end, and some fill information:


X                               <- Pixel

XXXXXXX                         <- Span

AAAAABBCCCCCDDDEE               <- Row of screen built from multiple spans

When rasterising our polygons, we convert them to spans, and insert them to some data structure. Commonly, this data structure is a linked list, which, has its benefits. However, I feel that a better structure for this is a binary tree (greets Jazzvibe :). You'll soon realize why later on.

Also we shall present spans to the renderer in front->back order. This means that we must clip new spans against existing spans; so that the new spans only fill "new" portions of screen. For example:

C = current span

N = new span



        CCCCCCC

            NNNNNNN

If we were to insert that span, we would first clip its left edge against the "current" span:

        CCCCCCC

               NNNN

Then we would insert it to the right branch of "current"s binary tree; or, if a branch already exists, we would then traverse that sub-branch.

This presents us with the problem of working out how to handle each case and sub-case of span-overlap; its quite an extensive problem, and is the key to obtaining fast performance from an s-buffer.

Span Overlap

There are a number of cases that can occur when inserting spans; however a lot of them are similar, and so we can build an if() tree to handle them.

C = Current

N = New



1)      CCCCCCCCCCC

         NNNNNNNNN



2)          CCCC

        NNNNNNNNNNN



3)      CCCCC

             NNNNN



4)           CCCCC

        NNNNN             



5)      CCCCCC

           NNNNNNN



6)          CCCCCC

        NNNNNNN



7)       (no span)

        NNNNNNNNNNN

Now, most of these are similar, and easy to solve. Lets see what we need to do for each case:

Reject the new span, totally obscured. Trivial reject
Break the new span into two pieces, and recur with them, or build new tree branches with them
Either insert the new span to the right tree branch, or continue processing with curr->right tree branch. Trivial accept/loop cycle
Either insert the new span to the left tree branch, or continue processing with curr->left tree branch. Trivial accept/loop cycle
Trim off the portion of span thats obscured, and then perform (3) with the resulting piece. Note you will have to adjust texture pos etc
Trim off the portion to the right, and then perform (4) with the resulting pieces.
Simply use this span to root the tree

Data Structures

Now, you may be wondering what kind of data structures we will need for this. Well, two things are needed; a table of span pointers for every scanline, and a span structure. Something like:

Structure Span

        Integer x1

        Integer x2

        Integer Width

        Colour colour

        Texture Pointer texture

        Integer u

        Integer v

        Integer du

        Integer dv

        Span Pointer left

        Span Pointer right

End Structure



Span Pointer spantable[YResolution]

Initially, spantable will all be set to NULL. Also, as each new span is allocated/freed, its left and right members will also be set to NULL. These pointers will then be updated as we go. When we are complete, we will have a binary tree, storing that scanline. And, with this tree, we can traverse it, to give us scanline order - more on that later.

Now, some notes on inserting spans. Where above I said "insert" the span, I meant insert it to the part of the tree, so if you have a span that is totally to the right of the current span, you would do something like:

If Span.x1 > Current.x2 Then (totally to the right)

        If Current.right == NULL Then

                Current.right = Span

                Return

        Else

                Current = Current.right

                Next Loop

        End If

End If

A similar piece of code would be used for the left. Note that in the above cases, span overlap cases that are not trivial accept/reject will be reduced to that by the use of clipping. Then it will simply become a case of inserting the span, or traversing the corresponding branch.

Pseudo Code For Insert Routine

The insert routine is perhaps the most critical routine in an S-Buffer engine; every span must pass through it, both its coding and design must provide for efficient operation. If the routine is slow, then inserting the span will take longer than the overdraw would have cost. Likewise if a very large number of polygons are processed, the benefits will disappear, as insert time rises sharply with the number of polygons, and this growth is only compensated for by the level of overdraw; too little overdraw, and it'll work *slower* than painters. With plenty of overdraw, it'll give speed gains.

A general "rule of thumb" for working out the efficiency is quite simply; the efficiency is the average time taken to insert a span, multiplied by the number of spans, divided by the level of overdraw. Its not very accurate, but it gives a crude estimate of the efficiency.

This should insert a span to the span tree. Note it doesn't handle case (7),

that is simple enough to do.



Subroute InsertSpan(Span Pointer span, Span Pointer current)

        While((current != NULL) And (span != NULL))

                If span.x1 > current.x2 Then

                        If current.right == NULL Then

                                current.right = span

                                Return

                        Else

                                current = current.right

                                Next While

                        End If

                Else If span.x2 < current.x1 Then

                        If current.left == NULL Then

                                current.left = span

                                Return

                        Else

                                current = current.left

                                Next While

                Else If span.x1 >= current.x1 Then

                        If span.x2 <= current.x2 Then

                                Free(span)

                                Return

                        End If



                        If span.x1 <= current.x2 Then

                                (you should adjust u, v here)

                                span.x1 = current.x2

                                span.width = span.x2 - span.x1



                                If current.right == NULL Then

                                        current.right = span

                                        Return

                                Else

                                        current = current.right

                                        Next While

                                End If

                        End If          

                Else If span.x1 < current.x1 Then

                        If span.x2 > current.x2 Then

                                newspan = NewCopyOfSpan(span)



                                span.x2 = current.x1

                                span.width = span.x2 - span.x1



                                newspan.x1 = current.x2

                                newspan.width = newspan.x2 - newspan.x1



                                If current.left == NULL Then

                                        current.left = span

                                        span = NULL

                                Else

                                        InsertSpan(span, current.left)

                                End If



                                If current.right == NULL Then

                                        current.right = newspan

                                        Return

                                Else

                                        InsertSpan(newspan, current.right)

                                End If

                        Else If span.x2 <= current.x2 Then

                                span.x2 = current.x1

                                span.width = span.x2 - span.x1



                                If current.left == NULL Then

                                        current.left = span

                                        Return

                                Else

                                        current = current.left

                                        Next While

                                End If

                        End If

                End If

        End While

End Subroutine

Painting The Span Tree

Painting the span tree is simple enough, just a recursive process. However, recursion may not be the most efficient process for this; I've been toying with the idea of including a span pointer called "parent", to let me climb back up the tree, without using recursion. Haven't tried it yet, but I might do soon. But, for now, heres pseudo code for a function to draw the span tree:

Subroutine DrawSpanTree(Span Pointer root)

        If root.left != NULL Then

                DrawSpanTree(root.left)

        End If



        DrawSpan(root)



        If root.right != NULL Then

                DrawSpanTree(root.right)

        End If

End Subroutine

This routine is quite special; it gives us ascending X order. This is handy, because it will maximize cache access. If you consider that your painters algorithm or Z-Buffer render may be passing it polygons that could appear anywhere. You could have one in the top left corner, then one in the bottom right, then one in the centre, etc, etc. With S-Buffer, we are going from top->bottom, then left->right. Very handy.

Again, this function needs to be optimized for fast performance. Also, I think it might be interesting to see if you can come up with a way of balancing the tree, so that both less recursion is used, and also the insert time should be reduced. If you consider the tree:

                        A

                 /------|-------\

                B       B        B

              /

            C

          /

        D

      /

    E

Then inserting to (E) will be fairly expensive, as you have to go further down the tree, examine more spans, and so on. But inserting to (B) will be quick. However, the tree:

                        A1

                 /------^------\

                B1              B2

              /   \           /   \

            C1      C2      C3       C4

          /   \   /   \   /   \    /    \

         D1    D2D3    D4D5    D6 D7     D8

Will, on average, have a roughly similar insert time for each level of the tree. Inserting to any (C) will be a similar speed, as will (D) or (B). Note that I say similar; tree structure is just one part of getting increased speed; organizing the tree to have the minimum number of clipped spans will also help matters, and even more so if you reduce the number of broken spans. Coming back to this tree though, run the DrawSpanTree pseudo-code through you head. You should find that we get the order: [D1, C1, D2, B1, D3, C2, D4(, etc...)]. Thats the order of increasing X, another benefit.

Also note that polygons over triangles will give an increased speed using S-Buffers, due to the reduction in the number of spans to process. Consider:

|------------|                          |------------|

|AAAAAAAAAAAA|                          |AA\BBBBBBBBB|

|AAAAAAAAAAAA|                          |AAAAA\BBBBBB|

|AAAAAAAAAAAA|(1)               vs      |AAAAAAAA\BBB|(2)

|AAAAAAAAAAAA|                          |AAAAAAAAAAA\|

|------------|                          |------------|

Case (2) will give us twice as many spans to insert as case (1). Similar increases may be found as the number of vertices increases.

Another point to consider is that of trivial rejection; if we could somehow build a structure containing the bounding spans of spans, then we could further increase the speed of trivial rejection. For example:

        AAABBB  CCCCCCCC  DDEEFF        GGGGGGGGGGGGGGGGGG



Could have a structure, stored in addition to the span tree, that stores:



        AAABBB  CCCCCCCC  DDEEFF        GGGGGGGGGGGGGGGGGG

        111111  22222222  333333        444444444444444444



So that if he tried to insert a span Z:



        AAABBB  CCCCCCCC  DDEEFF        GGGGGGGGGGGGGGGGGG

        111111  22222222  333333        444444444444444444

                                          ZZZZZZZZZZZZZ



It could be quickly rejected, as long as G was not the tree root, say a part

of the tree was

                D

                  \

                    F

                      \

                        G

I also tried a "span mask" to try and reject spans quickly. What I did was keep a bit mask of the pixels that were currently covered by spans, updating it as new spans were inserted. However, it had a flaw: It was crap.

Well, thats all I can think of for now. I'm going to explore the concept of spans a little further though, they seem pretty useful in a non-3D-accelerated system.

Introductory Radiosity

Radiosity is one method in current use that attempts to solve the Global Illumination problem. The Global Illumination problem is concerned with the interaction of polygons (aka patches) within a system, attempting to calculate the various ways light interacts within a scene. Radiosity is one of the more popular methods, due to its ability to model diffuse interaction, producing attractive and realistic images. In addition, it can be used as a pre-process, which makes it ideal for realtime 3D engine lighting.

Introduction

Radiosity was originally used for calculating heat transfer within closed environments. Later, researchers at Cornell University adapted it to lighting, in Computer Graphics. Much work has been done on the technique at Cornell University, as well as a number of other places. The most important thing about it is that it calculates the enrgy transfer between surfaces, rather than just the direct lighting by a light source. For example, take a scene with just a square box-type room, that is completely enclosed, so that no light can enter or escape the scene. Now, imagine that room has a light source at the top of the room, with a positive emittance value.

Now, in conventional rendering, only the floor directly opposite the light would be illuminated. But we know from real life that the other walls too are illuminated, even though they may only see the light source partially or perhaps not at all. So where does the light come from? The light is transferred to these surfaces by the surfaces that originally received the light from the source. This is secondary illumination. And, then the light from these surfaces will then bounce off into the scene, hitting more surfaces, and illuminating them, until all the energy within the surface has been used up. Often you will notice that a lot of Radiosity demonstration images have such box-like scenes, perhaps with the addition of a few more boxes inside to demonstrate shadows.

Shadows are incorporated into this system automatically. Any surface that is not a light source, does not emit light. And it it does not emit light, then the only way it will be illuminated is if light strikes it. Now, for light to illuminate this surface, then it either needs to be in a position where light can be transferred to it directly from a light source, or indirectly via a secondary polygon. However, if a surface is not in a position for this interaction to take place, them it will not get any light, and it will still be in shadow. In a similar way, a surface will appear darker if it is in a place where little light arrives at, either because not much light can arrive there due to intervening patches, or because most of the light energy will be used up before it arrives there.

Mathematical Foundation

To calculate radiosity, we are concerned with 3 variables:

Reflectance. How much energy does a patch reflect?
Emittance. How much energy does a patch put into the system?
Radiosity. What is the final amount of energy incident at a patch?

Each variable has its own symbol. Reflectance is represented either by a capital 'R', or the greek symbol "rho". Rho looks like a p, except the tail is a little difference. I'll refer to reflectance by R, however in some books and papers you will see it referred to as rho. Radiositity is represented by a 'B' (Presumably to avoid confusion with Reflectance). Emittance is represented by an 'E'.

We need to find the total amount of energy at a patch. This is found by summing the emittance value of the patch, plus all the light arriving at that patch from other patches. To calculate the amount of light transferred between two patches, we use a value known as the form-factor, which is the fraction of energy that leaves one patch, travels through space, and hits another patch. Now, when dealing with form factors, you'll often see notation such as dA, and dFij. The 'd' means differential. This means, for example, that dAi, is the area of a small piece of patch i. Also, theres a lot of integration involved with these calculations. But, the form factors are as follows (ref. Computer Graphics, Principles + Practice):

Form factor from differential area of patch i to differential area of patch j



          cos(thetai)*cos(thetaj)

dF      = ----------------------- * H   * dAj

  di-dj         PI * r * r           ij





Where





cos(thetai / thetaj) is the angle between the ray between the centres of

two patches, times by either the normal of patch i, or patch j



r = Distance between the two patches

H(i, j) = Visibility function: Can the two differential areas see each other?

dF      = Differential form factor from i-j

  di-dj

dAj     = Differential area of j



Now, to get F    

             di-j



(Which is the form factor between the differential area of i, and the area

 of j)



        /

        | cos(thetai)*cos(thetaj)

F     = | ----------------------- * H  * dAj

 di-j   |       PI * r * r           ij

       /

       Aj



Which means we have to integrate over the area of patch j.



And finally, to find the form factor between I and J, we need another integral,

this time over the area of j...



       /  /

       |  | cos(thetai)*cos(thetaj)

F    = |  | ----------------------- H   * dAj * dAi

 i-j   |  |      PI * r * r          ij

       /  /

       Ai Aj

So what the hell does all that mean? Well, lets dissect it piece by piece.

The product of the two cosines calculates how much they face each other. Remember, cos(0) = 1, through to cos(90) = 0. So, if the ray between the two patches is the same as one of the normals, cos(theta) = 1. Now, as the ray tilts away from the normal, cos(theta) decreases, down to zero. Now, this occurs for both normals, and so the product of the two cosines tells us how much the patches face each other.

(PI * r * r) is used to give us the distance between two patches, which tells us how strong the light is when it travels from one patch to another. The closer to patches are, the greater the intensity, because the light has less distance to travel. However, the further apart they are, the less the intensity, because the light has further to travel.

Hij tells us if the two patches are visibile. It return 0 if they are not visible, because somethings in the way, or 1 if they are both visibile. In a practical implementation, you may want Hij to vary between 0 to 1, because you might have a situation where say 50% of one patch is visible from the other, and doesn't fit into either category of visible or invisibile very well.

Radiosities

Now we know wha values we are dealing with, we need to know how to combine them, to form the solution. The equation is simple enough:

                     \---              Aj

        B  = E + R *  >     B  * F   * --

         i    i   i  /---    j    j-i  Ai

                   1<=j<=n

Which translates as "The radiosity of patch i is equal to the amount of light patch i emits, plus all the light incident on patch i, times by the reflectivity of i". However, theres is a catch. That equation is circular, ie it relies on the result of itself.

The original method used to solve this equation was to re-arrange it, then construct a big matrix of simultaenous equations, and then solve that matrix. However, this method consumes a large amount of memory, and is not really practical. I've implemented it, and it causes Windows 95 some serious disk thrashing problems :). However, if you are interested in doing it, then "3D Computer Graphics 2e" supplies all the information you need to do it. The bit about the Gauss-Seidel at the end of the chapter, while it looks hard, is actually very easy to solve, but if you get stuck, I can help you out with it.

Nowadays, the preferred method is to use a method known as Progressive Refinement. The idea here is that rather than doing one huge piece of calculation to find our solution, we find the solution in small increments, slowly arriving at the final solution. So, we can now lose the large memory and precalculation costs, at the expense of having a slightly more complex implementation.

The fundamental idea is a switch, between "gathering" light from patches, and "shooting" light to patches. In gathering, the process is very much like traditional ray tracing. In ray tracing, we take the pixel, and shoot rays back from the pixel into the environment, and work out how much light arrives at the point. However, this can cause problems. We're doing things backwards. For example, traditional raytracing won't pick up things like those high intensity patches of light you can create using a magnifying glass.

Gauss-Seidel, or "gathering" works in this backwards manner. Given a patch, we're trying to work out what patches contribute light to it. However, there is a problem. Remember how the radiosity equation was circular, ie it depended on itself? Well, we can't work out the energy passed to a patch until we've worked out the radiosity of that patch, and we can't work out the radiosity of that patch until we've worked out the radiosity of the patches that contribute light to that patch, and so on. We get a big system of simultaneous equations. A rough way of thinking of simultaenous equations is 2 or more equations, each sharing the same variables, but you're not given enough information to solve them on site. You have to manipulate them, and re-arrange them to try and get a solution.

And this is done using a Gauss-Seidel iteration in Radiosity. However, as I said before, it is difficult, and the alternative is much easier.

Progressive Refinement

In this method, we don't gather light, we shoot it out into the environment. At each stage, we need to find the patch with the most potential energy. A patch can hold energy either because it is a light source, and so emits energy, or because energy has been passed to it from some other patch. So, we select the patch that will give the most energy. We then work out how much energy is passed from that patch to all the other patches that it can see, and distribute the energy amongst those patches accordingly. Once energy has been passed to those patches, whether they are light sources or not, they too then have potential energy to shoot into the environment. We also then zero out the potential energy in the patch that shot its energy, to avoid processing it twice. We then re-test all the patches, with their new potential energies, and distribute the energy of the highest patch into the environment. This then forms a loop, which continues until the amount of energy being passed around is so small that it won't make much difference, and we can just halt the loop.

To calculate the amount of energy shot from one patch to another, we need to reverse the form factor. Given a form factor Fij, we re-arrange it to get Fij * Ai / Aj, where A is the area of a given patch. Given this value, we can now calculate the amount of energy shot between two patches, rather than the amount of energy coming in to a patch, as we previously had.

We also need to be able to distribute the radiosity about the patches in the environment. To do this, we'll use values known as "deltarad". These values are the energy a patch will shoot out into the environment, not its overall radiosity. Initially, in the start-up section of our code, these values are set to the emittance values of the patch. If a patch doesn't emit light, then its emitter will be set to zero, and so its deltarad value will be zero. If however it does emit light, then the deltarad value will be set to the strength of that light. And so, when we enter the loop, those patches which emit light will be processed first.

To transfer the energy, we use some simple formulae. We calculate a value called 'energy', which is the energy passed from patch i to patch j, weighted accordingly by the reflectivities, the areas, and the form factor. We then add this energy onto both the radiosity and the deltarad values, so incrementing the energy incident on a patch, and also incrementing the amount of energy a patch will shoot out back into the environment. 'Energy' is calculated using the simple formula:

energy = Rj * DeltaRadi * Fij * Ai / Aj

Now, remember that patch j is the receiver, and patch i is the emitter. Now, lets look at the formula term by term. Rj is the reflectivity of patch j, which could essentialy be viewed as its colour. Blue patches reflect blue light, red patches reflect red light. So, that term will filter out the light components we're interested in, for our emitter. Secondly, the value DeltaRadi is simply the amount of energy that the patch is emitting. The final piece, Fij * Ai / Aj is the reversed form factor, telling us what fraction of energy leaving patch is strikes the surface of patch j.

Now, thats all the information we need to code our loop. The basic algorithm of the loop will be as follows:

Set up all deltarad values to the emittance value of the patch



While not converged

        Choose a patch that will give us the most energy

        Determine all the form factors

        For every patch that is visible

                Calculate 'energy'

                Add 'energy' to both Rad and DeltaRad

        End For

        Set patches DeltaRad to zero

End While

Now, most of this is pretty simple coding, apart from determing the form factors, which I'll come to a little later. However, choosing the patch to shoot its radiosity can cause a little problem. See, when I first wrote this, I decided that I would be clever, and precalculate the shooting order of the patches, by sorting them based on emit * area. However, this doesn't work, because the deltarad values change dynamically. Stupid mistake! So, what you have to do instead is do one pass through the patch list, and find the patch with the greatest deltarad * area. Then, if the potential energy of this patch is still greater than some tolerance value, shoot the patch, else return a code to break the loop. When all this is in place, you'll have a basic Radiosity engine, missing one part: Form Factor calculation.

Form Factors: Hemicube

The hemicube is perhaps the simplest algorithm to code for your first radiosity processer. You will have most of the code required already, somewhere on your hard disk. Its just a case of dropping in some code to calculate the delta form factor buffer, and summing them up. The basic idea of the hemicube is that we take a hemicube, ie half of a cube, drop it on top of the patch, line it up so that it lies on the patch correctly, ie the top face is parallel to the plane of the patch, and its centered on the patch. We then render the scene, from each of the faces of the hemicube, and for every pixel of that face, we calculate the patch which is visible from that pixel, and add the delta-form-factor for that pixel, to the form factor for the corresponding patch.

Determing which patch is visible at a given pixel can be done using a zbuffer. In addition to storing the Z value, we also store the patch ID. However, since my engine uses a BSP tree, I can simply use a system similar to a 1-bit draw flag. In my hemicube code, I simply store an array of pointers to patches. These pointers are set to NULL before rendering the face. Then I do a front-to-back traversal of the BSP tree. When filling the pixel, I check to see if the value of the pixel is NULL. If not, I set the value of the pixel to a pointer to the patch being rendered. If it is NULL, then I simply skip that pixel. It works very quickly.

If you have a hardware 3D accelerator, you're even luckier. All you have to do in this case, is clear the buffer to zero. Then, you just have to assign a colour to each patch. Don't do something stupid like saying "oh this patch can be blue, and this one can be red". Just use a counter, starting from 1, going up to the number of the patches. This counter will then be your colour. Render the scene, then scan through the screen, and if the pixel is not black, then the value is simply your index into your patch array. (Note that if you're using C, you will have to subtract one from the colour, because C array indexing starts at zero). If you're really crafty, you may also realize that the form factor gives a value approximating the visibility of one patch from another, which might be of use in some kind of visibility pre-calculation operation ... :)

The last thing you need to know is how to calculate the delta-form-factor buffers. We store 2 buffers, one for the top face, and one for the side face. We use two formulae to calculate the values for the buffer:


                         1

TopFormFactor = ----------------- * DeltaArea

                       2   2     2

                PI * (x + y  + 1)





                         z

SideFormFactor = ------------------ * DeltaArea

                        2    2    2

                 PI * (y  + z + 1)

In those formulae, x, y and z were to corresponding co-ordinates on the hemicube, and DeltaArea is the area of pixel on the hemicube. Now, in the hemicube, x and y will range from -1 to 1, and z will range from 0 to 1. So, for your top face, you will iterate x and y from -1 to 1. Your DeltaArea will be dx * dy, where dx and dy are 2 / res, where res is the resolution of the face. For the side, y will iterate from -1 to 1, and z will iterate from 0 to 1. DeltaArea in this case will be dy * dz. Save these values into two buffers.

Now to get the form factors, we need to work out the total form factor for a patch. This is given by summing the delta-form-factor value for every pixel that the patch projects on to. So, what we do is simple. Set all the initial form factor values to zero. Now, loop through every hemicube face, and every pixel on that hemicube face. If there is a patch projected onto that face, ie the patch ID for that pixel is not NULL, then we add the corresponding delta-form-factor value for that pixel, onto the form factor for the patch. When we have completed the loops, we then have some form factors to use for our calculations.

Right, well that should do to serve as an introduction. You should be able to code a basic system from that information, however it won't be perfect. You still need things such as patch subdivision, more accurate form factor computations etc to get a nice system. However, I haven't finished with Radiosity myself, I'm still pretty much researching into it at the moment.

References:

Computer Graphics, Principles + Practice
3D Computer Graphics 2e
Advanced Aniamtion And Rendering

Particle Systems

Introduction

Particle systems are a technique for modelling things that can't really be modelled with other approaches. Things such as explosions fall into this category. And this file will be concerned with modelling explosions using particle systems

The Basics

A particle system is a collection of independant entities, known as particles, which are animated using a set of rules, with the intention of modelling some effect. The specific way in which particles are displayed, moved, and interact is generally specific to the particle application that they are being used for.

A Particle is in general a single point in space. This point is assigned some attributes or characteristics, and animated over time. Common characteristics for a particle are:

Position
Speed
Energy
Direction

As the animation progresses, these characteristics are used, modified, and updated. For example, at time = 0, a particle may have a very high velocity, and a lot of energy. As time progresses, this velocity and energy will decrease, and its path will change.

The general algorithm for a particle will be:

Set up particle



While Animation In Progress

    If Particle Not Dead Then

        Add Particle Direction * Speed To Particle Position

        Add Particle Acceleration To Particle Speed

        Modify Particles Speed

        Modify Particles Energy



        If Particles Energy < Threshold Then

            Mark Particle As Dead

        End If



	If Particle Hits Object Then

	    Modify Particles Position, Direction, Speed and Energy

	End If



	Display Particle

    End If

End While

When it comes to displaying a particle, we can choose many approaches. We can set a particle to be a single pixel. We can scale a particle according to its distance from the viewer. We can make a particle illuminate its environment. We can even place a sprite in the particles position, or a 3D model. My personal favourite is to add the colour of each particle to the screen, clipping it into the legal range, then blur the screen afterwards. This gives the effect of many close particles illuminating a region, and it also smooths out the pixellated effect of the screen. The blur will also give a primitive trail to the particles.

Explosions

Explosions are one of the easier, and most impressive things to model using particle systems. To do an explosion, we generally set all particles to have some common point of origin, either a single, finite point, or randomly distributed within some sphere. We set them all to have directions and velocities which will shoot them away from the point of explosion. We also set them all to have high energy. We then set the system running...

In general, there are two kinds of explosions we can model. These are airborne explosions, and explosions at some point of impact. The two different kinds behave very differently. In general, an airborne explosions particle directions will be distributed in a sphere, centred at the point of explosion. An explosion based on a point of explosion will have a particles distributed in a hemi-spherical direction, where the sphere is cut by the plane where the explosion occured, such as a ground, wall, or model plane. The particles will travel generally in the direction of the planes normal, but randomly spread out.

Airborne Explosions

For an airborne explosion, we set the point of origin to be the point of explosion, perhaps randomly distributed within a sphere. We set all directions to be randomly distributed about a sphere. We then normalize the directions, and find some random number, which will be our speed. This poorly-drawn diagram should help clarify this:

To set up the particle, the following pseudo-code will create the particle, in 2D space. I use 2D for simplicity of explanation: 3D is just a case of adding an extra dimension:

randomvec.x = random(distribution) - distribution / 2

randomvec.y = random(distribution) - distribution / 2

Normalize(randomvec)

dist = random(distribution)



particle.x = explosion-x + randomvec.x * dist

particle.y = explosion-y + randomvec.y * dist



dir.x = random(width) - width / 2

dir.y = random(height) - height / 2

Normalize(dir)

dist = random(size)



particle.dir.x = dir.x

particle.dir.y = dir.y;

particle.speed = dist + 1

particle.energy = 1.0

This is an adapation of the code that I use. Here, distribution is some value used to spread the origins out about the centre of the explosion. Width and Height are used to vary the shape of the explosion. If width == height, then a spherical explosion is generated. Else, the explosion is elliptical. Elliptical explosions generally look better, in my opinion, because they look more natural, random, and less artificial.

Impact Explosions

I'll limit the discussion of these kinds to a simple ground-based type, because arbitrary types tend to complicate matters. In general, these kinds of explosions work the same as air-based, except for the fact that we are confined to one half-space of a plane. In this case, this half-space will be space above the ground. Things don't explode underground in solid soil. Our code will generally be the same as the first type, except that dir.y will be confined to one direction, the positive direction. So all particles will be forced to go upwards, though the amount will vary. So our code will look nearly the same:

randomvec.x = random(distribution) - distribution / 2

randomvec.y = random(distribution)

Normalize(randomvec)

dist = random(distribution)



particle.x = explosion-x + randomvec.x * dist

particle.y = explosion-y + randomvec.y * dist



dir.x = random(width) - width / 2

dir.y = random(height)

Normalize(dir)

dist = random(size)



particle.dir.x = dir.x

particle.dir.y = dir.y;

particle.speed = dist + 1

particle.energy = 1.0

Simply note that we don't subtract height / 2 or distribution / 2 from the Y components of the vectors.

Running The System

Once you have set up your system, you need to run it through. First, I'll discuss the movement of the particle system, then the rendering of it.

Movement

Generally, a basic movement system is all we need. We need to account for the following effects:

Movement due to particles own individual force
Movement due to gravity
Movement due to collision with objects

We ignore movement due to particles impacting with each other, to keep the speed up, and simplify the code. In any case, it wouldn't add much to the simulation.

Movement due the particles own force is easy to simulate. We just add the particles direction, multiplied by the particles speed, to the particles position. You can also decrease the speed at each frame. You have to be careful about that though, because if you reduce it too quickly, the particle will stop dead, then fall to the floor. The code is simple enough:

Particle.x = Particle.x + Particle.Speed * Particle.Direction.x

Particle.y = Particle.y + Particle.Speed * Particle.Direction.y

Movement due to gravity is a piece of cake. You simply have to make the particle acclerate towards the floor. If I remember rightly, the speed of gravity is about 31 feet per second squared. And remember, thats *acceleration*, not a constant speed. Thats easy enough to test. Just drop some stuff. Objects don't float, they speed up. Though the increase in speed is hard to notice because in general objects fall to earth quickly, it is present.

Movement due to collision with objects is easy to simulate too. If a particle impacts an object, such as the floor, you have to reflect its direction about the normal of the object it hit, and reduce its speed. To recap, the formula for reflecting a vector is: R = 2.0 * (N * L) * N - L, where N is the normal, and L is the negative particle direction. In cases such as walls, or the ground, then only one component will be non-zero, such as Y for the floors. In this case, you can get away with just negating the Y direction. Also, be sure to decrease the particle energy. It makes sense that some energy will be absorbed into the ground. If you use blurring, you will also notice that you get this strange kind of flame whoosh up into the sky when a lot of closely spaced particles bounce off the ground. That looks very nice.

Rendering the Particle System

We'll only need two routines here: One to draw the particle, and one to blur the screen. The routine to paint a particle is simple enough. Just take the colour of the background, add the particles colour, clip to the valid range (eg 0..255), and put the pixel back to the screen. This will make areas of the screen light up, according to how many particles cover that pixel.

The blur routine is simple enough too. Just take a pixels 4 surrounding neighbours, average them, and subtract some damping constant. Because of the subtraction, the value may go below zero. No problem, just clip to zero. And thats it! Just create the particles, the enter a loop, moving, painting and blurring, and you've got an explosion. Set off a new explosion when the current one dies out, and thats it. So simple...

Surface Caching

Surface Caching is a new technique used to speed up lighting of faces in engines. It allows the engine to use a much-simplified texture mapping routine, so increasing the speed of the engine. In addition, it provides a second layer between the rasteriser and the texture data, meaning that textures of different formats or bit depths can be displayed, using the same single rasterisation routine.

Introduction

Central to all caching algorithms is time. Caching is used to speed things up. Caches operate using time as a basis for cache replacement. And a caches data will only be relevant for a given instance in time.

A surface is considered to be a collection of joined, adjacent, coplanar polygons. A texture map is then stretched to fit across the surface. This texture map is then mixed and blended with a light map, giving us a surface texture for the surface. This map will then be applied to the surface at rendering time.

A cache is considered to be a collection of similar data structures, each holding a piece of information. Each cache slot has a variable telling us how old its data is. The memory allocated to the cache will be reused, so reducing the overall memory requirements.

Surfaces

I use the concept of a surface for a number of reasons. Firstly, it is a natural adaptation of what we see in real life. We don't think of a table top as being a triangulated mesh, we think of it being a surface. Grouping together faces in this manner also produces a data structure, which is coherent, and that we can use as a bounding volume. If a surface fails to intersect the frustum, all polygons belonging to that surface can be discarded. Also, a single light map can be calculated and allocated to one surface. This brings down the volume of calculations needed, brings down the memory requirements, and also ensures that lighting and texturing is smooth across faces.

Building surfaces is a simple trick. What we need to do, is to find all the polygons lying on a plane. We then pick a starting polygon. From this polygon, we walk through its vertex list, obtaining all the polygons connected to this vertex. If a connected polygon has no surface, then we flood to that polygon, set its surface to be our current, new surface, and continue. If, when processing is complete, we find that there are still polygons left, on this plane, with no surfaces, then we simply make a new surface, and begin with a polygon from the set. We repeat this until no polygons are left, globally. When this is complete, we are left with a list of surfaces, with each surface having a list of polygons on that surface.

It is worthwhile to note that when comparing plane equations, some kind of epsilon or tolerance value is needed. This is because when you calculate the plane equations, adjacent polygons can have approximately the same normal, but with slight, very small variations. This needs to be compensated for, by use of an epsilon variable.

After constructing the surface, we need to find its extents. This is simply a parallel projection down to 2D, by find the most dominant axis of the normal, and discarding the corresponding vector component from the vertices. For example, if the Y component was most dominant, we would discard it, giving us X and Z for our 2D X and Y. We then stretch a 2D bounding box to fit the dataset. Because we are working in a 3D world, we then need to adjust this box to fit the plane of the surface. This is easy to do:

If X is the most dominant axis:

Point2D.x = Point3D.y

Point2D.y = Point3D.z



Ax + By + Cz + D = 0

    Ax + By + Cz = -D

         Ax + By = -D - Cz

              Ax = -D - Cz - By

	       x = -D - Cz - By

                   ------------

			A

Similar equations can be found for Y and Z. Using this, we can also easily construct a rectangular grid of points lying on the surface. This is very handy for lighting...

Surface Textures

If we wish to apply a texture to a surface, we cannot do it in the usual manner of assigning (U, V) co-ordinates to vertices, then mapping in the texture. Instead, we have to define two texture spaces. One texture space is the surface texture space, defined using two vectors. The other texture space is used for the rasteriser, and defines the position of a vertex, or general point in 3D space, on the surface texture. The latter can be done using either texture vertices, or more economically with the 3 magic vectors texture mapping system.

We define a surface texture to be a map, that is stretched over the 2D bounding box of the surface. This contains a material texture, blended with a lightmap. This surface texture is then rendered to the screen. Material textures are not rendered to the screen. The following diagram should help clarify this:

|------------------|

|                  |

| Material Texture |

|                  |	 |------------------|	 |-------------|

|------------------|---->|		    |	 |	       |

		         |  Surface Cacher  |--->| Span Filler |

|------------------|---->|                  |    |             |

|                  | 	 |------------------|    |-------------|

| Surface Lightmap |

|                  |

|------------------|

We feed in materials, and lightmaps to the cacher. The cacher spits out surface textures, which are then passed to the span filler, and applied to the screen.

Generating the surface textures is simple enough, once you realise that you cannot easily and efficiently map in the material texture to the surface texture using (U, V) vertices. Instead, we define two axes: A "right" axis, and a "down" axis. We then use a technique similar to the "RotoZoomer" effect. We move across the surface texture map from top to bottom, left to right. Our texture co-ordinates start at some origin, perhaps (0, 0). As we move down, we add the values of the down axis to our current position on the map. As we move from left to right, we start with our current "down" position, and add on the left axis as we move. As we move, we pluck texels from the material map, and drop them onto the surface texture map. Using this, we can scale, offset, rotate, and tile our texture. Some pseudo-code should help to clarify this:

Function MapTexture(Texture surface_tex, Texture material_tex,

		    2DVector right_axis, 2DVector down_axis, 2DVector origin)

	2DVector texpos, curtexpos

	Integer i, j



	texpos = (0, 0)



	For i = 0 to surface_tex.Height 

		curtexpos = texpos

		For j = 0 to surface_tex.Width

			Texel = material_tex(curtexpos)

			surface_tex(i, j) = Texel

			curtexpos += right_axis

		End For

		texpos += down_axis

	End For

End Function

Illuminating The Surface

As previously stated, we also have a second texture map, a lightmap, which stores the illumination of the surface. We now want to blend this map with material map, shading the surface. Again, this is simple. If we choose a lightmap with dimensions A x A, where A is a power of two, and we choose our surfaces textures to have dimensions B x B, again with B a power of two, we can very quickly map from surface texture co-ords, to lightmap co-ords, with a right-shift. We pick the lightmap texel, mix it with the surface texel, and write the result back to the surface texture. Typical sizes for lightmaps are 8 x 8, or 16 x 16. Good sizes for surface textures range between 32 x 32, to 128 x 128.

We can also use bilinear interpolation on the lightmap to improve the shading quality of our image. Our calculations will also be greatly simplified, because we will be sampling in a fixed, constant direction. We can then produces precalculated weighting tables, and also minimise the number of reads from the lightmap, because we only need to read when the fractional part of the lightmap u or v index is zero. This will produce a smoothly shaded texture. However, it does have one flaw: Because the shape of the lightmap is rectangular, and because bilinear interpolation only interpolates linear between pixels, we will end up with a squarish lighting effect. This is not really a bad thing, but it reduces our ability to capture curved changes in light, such as specular highlights. The bilinear interpolation will approximate it, but not to a high level. A spline interpolation may help here, but will be CPU-intensive.

Caching The Surfaces

One piece is missing from the puzzle: The Cache. The cache itself is an array of slots, each slot being able to hold one surface texture. Slots are re-used, so that we can minimize the amount of memory consumed. We use a global frame counter to calculate the age of a cache slot. At start up, this counter is usually set to 1. As the engine runs, if a surface is visible, and not in the cache, we find a cache slot, light the surface, and insert it into the cache. We set the cache slots frame counter to the current counter. If a surface is visible and in the cache, we simply update its frame counter to be the current frame counter. If a surface is not visible, we ignore it. If a surface is not visible, and is in the cache, again, we ignore it: the cache slot will automatically age. If we last set the slots frame counter at frame "A", to "A", and we are now at frame "B", then there will be a difference in time, of "B" - "A". When this difference exceeds some threshold, the cache slot can then be re-used. It is important that we do not chuck out surface as soon as they become invisible. If we do so, we may end up having to relight them the next frame. Only when a surface has been invisible for a *period* of time can it be discarded.

Finding a free cache slot is easy enough. Firstly, we scan through the cache, to see if we have any pure, free, unused slots. If so, we grab the slot, fill it, mark it, and return. If not, the situation is more complex: something has to give. So, we scan through the cache, and find the oldest surface. This surface is then replaced. The age of a surface is simply calculated using the current frame counter minus the slots frame counter. We must also be careful that we do not replace surfaces with age 0. These surfaces will be surfaces that have been lit *this* frame. If we replace those surfaces, then next frame, the surface will be re-lit, and then possibly replaced. This will jam the frame rate. If we can't find any usable slots at all, then we ignore the surface. This is no real problem. In a front-to-back rendering system, the first surfaces to be cached will be the nearest ones, and the last to be cached will be the farthest ones. As more and more surfaces are cached, the importance of an individual surface to an image decreases, because it becomes further away, and so becomes smaller on the screen, possibly being occluded.

With all this correctly implemented and in place, you should now have a fully operation surface caching system. Some notes on surface caches, their performance, and characteristics:

If you get discontinuities in your texturing, then you are not calculating your surfaces correctly. An easy way to test this is to set the lightmap for a surface to some random colour. The surfaces will then show up.
The 3 magic vectors texture mapping algorithm is perfectly suited to this task. Using this algorithm speeds up span-buffering routines greatly, because no values need to be clipped or interpolated. We also only need to calculate the magic vectors once per surface. The size of the span structure can also be greatly reduced. My span is 20 bytes.
Invisible surfaces kill the cache. Dead. Visibility culling is of vital importance to keeping a surface cache healthy. If you thrash it with invisible surfaces, your frame rate will fall through the floor. Span buffer algorithms are handy here, because your system can be used to return visibility information.
MIP-Mapping can also be integrated to the surface cacher. This gives you high-quality mip-mapping and lighting very cheaply. Also, MIP-Mapping will speed up rendering of small, distant surfaces. In addition, because succesive maps are twice the size of their children, you can make a very quick routine to enlarge a given map.

Water

Introduction

Water is a really nice effect, one of the better tricks around. Its also pretty simple to code, should take no more than an afternoons work to get a good water routine.

Basics

First thing you'll need is 2 buffers, for the water. This needs to be an array of ints, same size as destination buffer. Arrange these in a 2-element array, to ease the flipping. Clear these to zero, and you're ready to start.

Calculate The New Water

Calculating the new water is pretty simple. You'll need a loop that execute from 1 to height-1, then 1 to width - 1. At each element for the water, you'll need to sum the North, South, East, West points from the current water, and divide by 2 Not 4, 2. Then subtract the new water for [y][x] from this. This is your basic smooth function. Now you need to add the 'harmonic motion' to it. Take the new-water[y][x], shift it right x places (x could be 4) and subtract it from the new-water[y][x]. Pseudo-code for this might look like:

for y := 1 to height - 1

        for x := 1 to width - 1

                new-water[y][x] = ((old-water[y-1][x] +

                                    old-water[y+1][x] +

                                    old-water[y][x-1] +

                                    old-water[y][x+1]) / 2) -

                                    new-water[y][x])

                new-water[y][x] -= new-water[y][x] shr x

        end

end

Not too hard to make into working code. Because there are 2 pages, and we are flipping between them, the references to what seem like uncalculated water ( - new-water[y][x]) are ok, because they come around and feed back. This is all it takes to calculate the water! You may also want a part where you set the 1 pixel wide frame around the water to 0, just to be safe.

Paint The Water

Water is also pretty straightforward to paint. In fact some of the techniques here, you will see later on in the bump-mapping. Back to the task in hand. We will again need a loop for every pixel on the screen. What we need to do is calculate a kind-of normal at each pixel. This is simply:

offsetx = water[y][x] - water[y+1][x]

offsety = water[y][x] - water[y][x+1]

Which measures changes in X and Y. 'Colour' is then calculated by 128 - offsetx. Clip colour to the 0..255 range. Divide both offsetx and offsety by 8, and add them to x and y. Now for the tricky bit. You'll need a background image (or perhaps you can do without ...). You need to light the background image by the water. This is done by:

offsetx /= 3;

offsety /= 8;

indexu = offsetx + x;

indexv = offsety + y;

MulTable[backdrop[indexv*256+indexu]*256+colour];

MulTable is another handy lookup table. It simply takes the value of (row*col) >> 8.

Pseudo code for this would be :

For y := 1 to height - 1

        For x := 1 to width - 1

                offsetx = water[y][x] - water[y+1][x]

                offsety = water[y][x] - water[y][x+1]

                colour = 128 - offsetx

                trim colour to 0..255

                divide offsetx and offsety by 8

                add offsetx to x giving indexu

                add offsety to y giving indexv

                Plot (backdrop*colour) >> 8, lookup in table

        End

End

What we are effectively doing here is applying fake lighting to the water, then mixing the colours. There are plenty of variations on calculating the normals. Plenty of room for exploration there.

The Water Loop

Note it makes a difference what order you do calculations in. Its pretty simple though. You need to:

Draw to the water
Paint it
Calculate new water
Page flip the water

If you stick to that, you can't go wrong. If you really want to be smart, you'll use the texturemap lighting info on this page to do make logos and so on ripple. I've even seen it combined with bump-mapping. Water is a very rewarding effect, well worth coding.

BSP Trees

Introduction

BSP trees are becoming one of the most popular spatial subdivision algorithms, due to their flexibility, and their ability to draw a scene with perfect order. However, though they are becoming popular, they are complex and difficult to code at first, and the theory can be hard to understand for some people. In this page, rather than going over and over the same things that have been said a thousand times before, I'll instead give a brief overview of the theory, and concentrate on more details regarding the implementation. More information concerning one specific part of BSP trees can be found in the BSP FAQ, though it does have to be said that there are some unfinished parts in it. This is no criticism on the author, just a neutral observation. (And perhaps a hint that people should help finish the FAQ!).

Theory

Note: this section is only intended as a refresher. If you don't know BSP trees, then there may be better definitions for you to learn from. This is to set the scene, and introduce the contents of this file.

A BSP tree carves up space into sets of successive half-spaces. A half space is the portion of space that is either in front of a hyperplane, or behind it. A hyperplane is a plane, with the same dimensions as the space it is in. For example a 2D hyperplane is a line. A 3D hyperplane is plane. However, a 2D hyperplane (line) cannot be used to partition 3D space into two half-spaces. Likewise a 3D plane cannot partition 4D space into two halfspaces. Halfspaces are infinite, and every polygon must lie entirely in only one halfspace.

A BSP tree is built by making a binary tree, where each node has a plane, which may or may not be from the source model, and the branches point to the sub-nodes that either lie entirely behind that plane, or, entirely in front of that node. Similary, the sub-nodes of that node have their own planes, which contain sub-nodes and so on. The planes stored in the nodes gives us a space partitioning scheme, as they can be used to identify polygons or objects that lie in regions of space enclosed by a set of planes. These planes are later used for sorting the scene. A node may or may not have polygons stored with it; this is the key distinction between node-based and leaf-based BSP (more on that later).

At each node, there is some form of cheap, simple bounding volume, such as a bounding sphere or a bounding box. This box/sphere encloses both the current node, and all its children. This volume is used to aid rendering and culling, as this gives a tree a hierarchial nature, so that subtrees can be discarded if their bounding volume fails to fulfill some criteria.

Code and data structures

The data structure for a BSP tree is fairly simple. Below is a pseudo-code description, derived from the BSP tree structure I use in my own BSP engine.

Structure BSPNode

    BSPNode pointer     frontnode, backnode;

    Vector              centre, transformed_centre;

    List                polygon_list;

    Plane               plane;

    Vector              boxmin, boxmax;

    Vector              transformed_boxmin, transformed_boxmax;

End Structure



Structure BSPTree

    BSPNode Pointer     root

    List                vertex_list

End Structure

These two structures enable us to represent the BSP tree in memory. Included in the structure is enough information to make the tree practical. Note that you should aim to try and make this structure as small as possible, as there may be a great many of them loaded into memory.

Code structures for the BSP tree largely reflect their recursive nature. Recursive functions are very common when dealing with BSP trees, whether they explicity recur on the CPU stack, or, instead use a stack of BSP trees to handle the recursion, so avoiding possible stack overflow. Each method has its advantages and distadvantages, and you should experiment with both. However, for the purposes of this file, I'll stick to explicit recursion, as it makes the code easier to follow.

Loading and saving a BSP tree can be done in the natural recursive manner for this data structure. All it takes is to simply dump the node structure, along with a flag, to the file, and traverse the tree in the same order.

In your flags field, you would have something like:

00000000000000xx - Has front sub-tree

              |--- Has back sub-tree

This flag field contains all the information you need. Then, you choose which order to save nodes in, either [front node, back node], or [back node, front node]. Choose an order and stick to it. Then when you load save, its simple. If a node has a front sub-node, set the bit to 1. Same for the back node. Then, when writing/reading, simply traverse the front or back node if it exists, then traverse the other if it exists. Eg:

    If (node has front sub-node)

        Set bit in flags variable



    If (node has back sub-node)

        Set bit in flags variable



    Read/Write flags field

    Read/Write node data



    If node has front sub-node

        Recur with front-node



    If node has back sub-node

        Recur with back-node

Its that simple. Makes stuff a lot more elegant. Although this might seem hard at first, it soon comes to you. This is partly the reason I don't like or use the Jaluit compiler, because its output format is so weird... this way is so much simpler.

Building the tree

Building the tree is the most expensive process, and it can quickly become very time- and memory-consuming, as the size of the polygon database increases. Building the most optimal tree is a desirable goal, but is very expensive for large databases. Also, for different purposes, there may be different optimal trees. However, there are two general "best" cases; Split-optimized, and Balance-optimized.

To construct a Split-optimized tree, you simply have to select the plane that will cause the least amount of polygons to be split. This is done by classifying what side of the plane a polygon lies. If all of its vertices are in front, its in front, no split. If all are behind, its behind, no split. If all are on the plane, then its just added to the list, no split. However, if some are in front, and some are behind, then theres a split. However, if some are on the plane, and some are [in front / behind], then there is no split. That last point once caused me a lot of problems, and not many people spot it when writing a compiler. Consider:


      /-----\

      |     |

    ============== <--- plane

      |     |

      \-----/



That case needs a split. However:



   ===------======

      |    |

      \----/



Does not need to be split. Its behind the plane, though it may not seem

so at first.

The reason this distinction is so important is that often we have such surfaces butted against each other, and a naive compiler will attempt to split, possibly causing both a degeneration in the polygon, and un-needed increase in polygon count.

Balanced trees must have roughly the same number of polygons either side of the plane. This is quite simple to calculate. Count the number of polygons in front of the plane. Count the number behind. Don't count the number on the plane, or those split, as they don't really need to be considered. To find the difference, just do abs(numback - numfront). If this is less than say some value, eg 5, then take the tree. Else carry on searching. You may wish to look for a balance of zero, which would give you a perfectly balanced tree.

You can also combine the two criteria together. I do this by calculating a "score". The score is derived from the number of splits, the balance, and the number of the plane. I use the formula:

Score = numsplit*ksplit  balance*kbalance  onplane*konplane



Where

Numsplit = number of polygons split

Balance = abs(front - back), the balancing value

Onplane = Number of coplanar polygons

ksplit, kbalance, konplane = Weighting of those threee criteria.

A smaller score will yield a better tree.

Classification Routine

You'll need a routine to classify the polygon against the plane. Though this seems easy enough at first, the case I pointed out above can often complicate things. The best routine I have found is something like:

Function ClassifyPolygon(Polygon, Plane)

    Integer front, back, on



    front = back = on = 0



    For i=0 to Polygon.Numvert

        Switch(ClassifyVertex(Polygon.vertex[i], Plane))

            Case InFront:

                front

            Case Behind

                back

            Case OnPlane

                on

                front

                back

        End Switch

    End For



    If on == Polygon.Numvert Then

        Return OnPlane

    Else If front == Polygon.Numvert Then

        Return InFront

    Else If back == Polygon.Numvert Then

        Return Behind

    Else

        Return Split

    End IF

End Function

I've found that this routine works well for me. Now, this routine will have to be coded very well, because its going to be called over and over again.

Bounding Boxes

For BSP trees, I find personally that bounding boxes work as the best bounding volume, because the further you go down the tree, generally, the flatter the box tends to become, because you come closer and closer to a single plane (for node based BSP. So, a box will have the least wasted space. However, a sphere will be easier to code, and slightly cheaper to test. However, you have to trade off the cheaper test against the fact that the wasted space may cause a sphere to be inside the volume, with no polygons inside after all.

Constructing the bounding box for a node and all its children is very simple. Firstly, construct the bounding box for the nodes polygons. Thats done by examining each vertex, and updating the minimum/maximum co-ordinate values accordingly. The two values, minimum and maximum, give you a bounding box for that node alone. Now, traverse either/both/neither of the subtrees, depending on whether they exist or not. Now, find the maximum bounding box out of all (1-3) boxes. Hey presto, thats your box.

Splits

I don't intend to go into the precise details of splitting polygons, as there is pseudo-code in the FAQ, and real code in my compiler. However, there are some things you should be aware of:

Numerical problems. Floating point can have some accuracy problems. So, to compensate, choose some value epsilon to be 1/2 the thickness of the plane. Then, instead of:

If dist < 0.0

    Behind

Else If dist > 0.0

    InFront

Else

    OnPlane

End If

Write:

If dist < -epsilon

    Behind

Else If dist > epsilon

    InFront

Else

    OnPlane

End If

Cracks. This is caused by one polygon being split, and a polygon that shares an edge being split, is not split. This can introduce T-vertices into the model, which in turn will cause shading anomalies. Some ways around this are:
1. Use a subpixel accurate rasteriser. However this will only remove the crack, it won't correct the shade/texture problems.
2. Insert the new vertex into both polygons that share the edge. This is the better solution, as it will correct the shading problems too. However, you'll have more complex geometry to deal with. If you triangulate your tree, you will then add an extra triangle per vertex inserted; not nice.
Personally I've just used option 1. However, I do have the texturing shading problems I mentioned. I've got as far as creating the edge list, but not finished coding number 2 yet. Just a matter or getting other things done first, before I worry about that
Rendering The Tree

This is the most important bit, and, the biggest advantage that using BSP trees over things like octrees. Though octrees can be used for a display algorithm, if I remember rightly, it only works for parallel projection. BSP trees however provide a fast, simple method of drawing, with zero sorting errors, and high speed.

The basic idea is that if the viewer is in one half-space, we traverse the trees and render polygons in one order, and, if the viewer is in the other half-space, then traverse in the opposite order. Pseudo code for a front->back traversal is:
```
Dot = Viewer*Plane



If Dot > 0.0 Then

    If Front Node exists Then

        Traverse Front Node

    End If



    Draw Polygons



    If Back Node exists Then

        Traverse Back Node

    End I

Else

    If Back Node exists Then

        Traverse Back Node

    End I



    Draw Polygons



    If Front Node exists Then

        Traverse Front Node

    End If

End If
```
This gives us a basic frame to work with. Firstly, we can get rid of the polygon drawing part of the Else clause in the If statement. This operation performs back face culling. We can now cull large numbers of polygons in one go, as the average scene will consist of large numbers of planar polygons, such as floors, ceilings, walls etc. This is also very beneficial for triangle renders, as it slashes the triangle count, because a triangle-based modeller (like 3ds) will generally use more triangles for the same surface than a polygon (n-gon) based modeller would.

Now, before we go any further, an important note: You need to transform the plane equations into camera space before finding the dot product. This is one important detail the various BSP sources on the 'Net failed to tell me, resulting in lots of strange sorting errors. Now, there are two ways of going about this:
1. Transform the plane equation itself. Given a point on the plane P, the plane equation N, then we have: N*P = 0. Given a transformation matrix M however, N*(M*P) != 0. So we have to compensate for that problem, so transforming the plane equation by the inverse matrix, M^-1. So, (M^-1 * N)*(P * M) = 0. However this is a bit awkward, as we have to invert the matrix etc...
2. Instead, just transform the plane normal, and a point on the plane. Then, use these two points to find the distance part of the plane equation, 'D'. If the plane equation is Ax By Cz D = 0 then the normal is given by (A, B, C). Rearranging for D gives us D = -Ax - By - Cz, which is the negative dot product of the normal against the point. Given the transformed normal, and the new D, you now have the plane equation you need. This is the method I use.
Now lets get back to optimizing this code. Now, we know that if a node is not in the frustum, theres no point traversing it, because we won't see its polygons, nor its childrens polygons. So, extend the traversal If statement to something like:
```
If (Front Node Exists) And (NodeInFrustum(Front Node) Then

    Traverse Front Node

End If
```
Repeat that for all the If statements, and now you're not traversing parts of the tree you don't need to. Now, another thing: don't do one big block traversal of your vertices. We have a spatial subdivison scheme, and so we can postpone transforming the vertices 'till we know that the vertex is in the frustum. Using the frame counter idea I describe in the Speed-up page, this can easily be done. Simply make a function that transforms all the vertices belonging to all the nodes in the polygon, if they haven't already been transformed. This is called just before the polygons are drawn. So we have:
```
TransformNodeVertices

DrawNodePolygons



Where TransformNodeVertices looks something like:



For every polygon in node

    For every vertex in polygon

        If vertex not already transformed Then

            TransformVertex

            If Vertex Inside Frustum Then

                ProjectVertex

            End If

        End If

    End For

End For
```
This alone should get you a decent frame rate. However, advanced things such as visibility pre-processing are not covered here; thats because I haven't got round to writing myself a good visibility generator yet, because I'm still researching the problem. However, once I do, I might write a page on that ... (If anyone knows anything...)

Leafy BSP

Leafy BSP is different to ordinary, node-based BSP. In nodey BSP, we store the polygons in the nodes of the tree. However in leafy BSP, we store the polygons in the leaves. A leaf is defined as a node with no children. The idea is that the polygons we then store in the leaves are no longer always planar (though they could be), but form a convex hull. The polygons in this hull can then be painted in any order, with backface culling applied. This also gives us the advantage that we can choose any old plane we like, rather than planes from the model -- though we can still use those if we want. Traversal is done as usual, with the planes in the tree simply being used to "guide" ourself to the leaves. It sounds a nice scheme, giving us the following advantages:
1. Smaller tree? Presumably larger numbers of polygons would be bunched together, and possibly in a more spatially coherent manner, as you would choose things like tabletops and boxes, rather than big, flat, planar surfaces such as floors and ceilings.
2. Following on from the latter point in #1, possibly better spatial coherence? However, not having planar polygons means that the backface culling will have to be done seperately. Though you can still cull entire sets of polygons in one go, because you can store a list of planes, with the associated polygons. You'd have something like:
```
Procedure RenderTriangle

    If triangles plane not processed Then

        If plane backfacing Then

            Mark all planar polygons backfacing

            Return

        Else

            Mark all planar polygons front facing

        End If

    Else

        If triangle marked backfacing

            Return

        End If

    End If



    Usual stuff to draw the triangle ...

End Procedure
```
  In fact thinking about it, that code could be used in *any* renderer, whether its BSP based or not. Damn ... why didn't I think of that sooner? Anyway, better spatial coherence will lead to better culling, etc ...
If you're interested in the source code to a BSP tree compiler to play with, then the source to a slightly older version of my compiler than the one I'm using (A man has to keep some tricks to himself) is available here. I may put up the source to my new Solid BSP Tree compiler sometime, but its still under development, so not for a while yet.

Solid BSP Trees

Before I go any further, I would like to thank Sean Barrett for answering a number of questions I had regarding Solid BSP Trees. Without those answers I wouldn't have got them working so quickly.

Solid BSP Trees are a slightly different version of regular BSP trees, with the property that they encode space into Solid/Non-solid partitions. This property has a number of useful applications, most notably in ray tracing / casting problems, such as visibility, or collision detection. They can also be used to form a leaf based BSP representation. Understandably they are desirable in any engine.

Crucial Differences

There are a number of areas where Solid BSP trees differ from conventional BSP trees. Most notably, is the fact that they store convex lumps of polygons in the leaves, rather than the nodes of the tree. This is the property that most people know of. As mentioned before, they can also encode space into Solid / Non-solid blocks. And, the rendering algorithm differs slightly.

Construction

Constructing a Solid BSP tree is similar to that of a normal BSP tree, but with a major difference: You can't use the "coincident" BSP case. So, if a polygon lies along the plane that is being used in the node, you cannot place it in a list. It has to be sent down the front list for that node.

Now, there is a second difference. In traditional BSP, you selected the plane from the list of input polygons, and used it in the node. Here, we do a similar thing. However, note that polygons are not removed from consideration when their plane has been selected, as in regular BSP: they will continue to be pushed down the tree until they hit a leaf. So, your routine may unknowingly continuosly select the same plane over and over. You need to mark each plane as used as you build the tree. In addition, we will use *every* plane of the input polygon list. This is required, because we cannot calculate whether we are crossing between solid and empty space if we do not have a plane to mark the transition. Unfortunatly, this may also bring about an increase in the number of polygons, which is a problem. So effective plane selection becomes even more important.

When we find that all planes in the given polygon input set have been used, we have reached a leaf. When we reach the leaf, we can work out if we have a solid or an empty leaf.

Calculating leaf contents

If a leaf has polygons, we can easily determine whether or not the leaf is solid or empty. We do this by using the normals of the polygons in the leaf. If the normals point away from the centre of the leaf (assuming outward-pointing normals), then the leaf is solid. If they point inwards, then the leaf is empty. This is easily proved using a cube. A cube could be a leaf in the BSP tree. The normals of the cube point outwards from the cube, therefore it is solid.

You can construct your BSP such that any leaf with polygons is an empty leaf, and that any leaf without polygons is a solid BSP. This is the simplest way of doing it. It also seems to be the way Quake does it. However, you can get problems with this, if your input data model contains things like 2 polygons, back to back, with different planes, such as a wall, with a box attached to it. To handle that, you need to cut out the piece of polygon that is between the wall and the box.

Planes of Connection

This is an algorithm I've been thinking about lately. The idea is that we can use the leaves of a BSP to find the connecting planes, between space. Heres a diagram for you to consider:
```
	| D		   | E			| F

   S		 S		      S		   S

 	|		   |			|

		 1		      2

  -   - |---------------------------------------| -   -		A

	|					|

	|		   |			|

	|					|

   S	|	 E	   |	     E		|   S

	|					|

     8	|		   |			| 3

	|					|

  -  -	| -  -	-  -  -	 - |--------------------|  -   -	B

	|                  |	    4

	|                  |			|

	|                  |  5

   S 	|        U         |	   S		|   S

	|                  |

     7	|                  |			|

	|                  |

  -  -	|------------------| -  -  -  -  -  -  -  -  -		C

						

	|	 6	   |			|

   S		 S		   S		    S

	|		   |			|
```
In this diagram, infinite planes are marked with dashed lines (- - - -). Polygons are marked using solid lines (-----). Solid leaves are marked with an 'S'. Empty leaves are marked with an 'E'. Unknown leaves are marked with a 'U'. Planes are labelled with letters, polygons are labelled with numbers.

If we wanted to flood some kind of information through the model, we can just calculate the planes of connection, and use them to propagate the medium throughout the model between leaves. In addition, because of the solidity information, we can work out where to stop flooding, and where to continue.

Given a leaf, marked on the diagram as 'U', we want to find out which planes connect it to other leaves. If we then have that information, we can flood fille data throughout the model.
```
	| D		   | E			| F

   S		 S		      S		   S

 	|		   |			|



  -   - |---------------------------------------| -   -		A

	|					|

	|		   |			|

	|					|

   S	|	 E	   |	     E		|   S

	|					|

     	|	 1	   |			|   5

	|					|

  -  -	| -  -	-  -  -	 - |--------------------|  -   -	B

	|                  |	    

	|                  |			|

	|                  |  

   S 4	|        U         |  2	   S		|   S

	|                  |

     	|                  |			|

	|                  |

  -  -	|------------------| -  -  -  -  -  -  -  -  -		C

		 3				

	|	 	   |			|

   S		 S		   S		    S

	|		   |			|
```
The leaves are now marked with numbers. Leaves 1 - 4 are connected neighbours, leaf 5 is a rogue, put in there to show the theory. Lets build a table, with planes in the columns, and leaves in the rows. A '' will mean that the leaf lies in the positive halfspace, a '-' will mean that the leaf lies in the negative halfspace:
```
Leaf	| Plane

	| A | B | C | D | E | F | Score

--------+--+--+--+--+--+--+-----

   U	|  | - |  |  |  |  | N / A

   1	|  |  |  |  |  | 	| 5

   2	|  | - |  |  | - | 	| 5

   3	|  | - | - |  |  | 	| 5

   4	|  | - |  | - |  | 	| 5

   5	|  |  |  |  | - | -	| 3
```
As you can see from the table, its neighbours will have the highest score, and its non-neighbours will have the lower scores. So, now we can easily find its neighbours. Also, its neighbours should all have the same score. Now, we have a problem. We need to work out what kind of solidity information to flood. We can find the connecting neighbours, but we can't select which neghbour to flood information from. And by examining the other leaves, no clear rule emerges.

The solution is to "borrow" polygons from connecting leaves. We find polygons that lie on the plane of connection, and borrow these, to calculate the leaf type. Note that we need at least 2 polygons to calculate the leaf type.

But how do we calculate the planes of connection? Lets repeat that last table, but instead of usings and -, lets use 1 and 0 respectively:
```
Leaf	| Plane

	| A | B | C | D | E | F | Score

--------+--+--+--+--+--+--+-----

   U	| 1 | 0 | 1 | 1 | 1 | 1 | N / A

   1	| 1 | 1 | 1 | 1 | 1 | 1	| 5

   2	| 1 | 0 | 1 | 1 | 0 | 1	| 5

   3	| 1 | 0 | 0 | 1 | 1 | 1	| 5

   4	| 1 | 0 | 1 | 0 | 1 | 1	| 5

   5	| 1 | 1 | 1 | 1 | 0 | 0	| 3
```
Can't see the connection yet? Well, lets XOR U with each bit code of 1-4:
```
U = 101111

1 = 111111

2 = 101101

3 = 100111

4 = 101011



U XOR 1 = 010000

U XOR 2 = 000010

U XOR 3 = 001000

U XOR 4 = 000100
```
Re-arranging back into columns for each plane ...
```
Code | Plane

     | A | B | C | D | E | F

-----+--+--+--+--+--+--

  1  | 0 | 1 | 0 | 0 | 0 | 0

  2  | 0 | 0 | 0 | 0 | 1 | 0

  3  | 0 | 0 | 1 | 0 | 0 | 0

  4  | 0 | 0 | 0 | 1 | 0 | 0

-----+--+--+--+--+--+--

 OR  | 0 | 1 | 1 | 1 | 1 | 0
```
Any plane that is a plane of connection has a 1. Not bad eh? It must be said that I haven't tested this theory extensively, nor have I implemented it yet (hell I only just worked it out), but it seems reasonable. It is a little complex (understatement!), so it will need some efficient coding behind it. Perhaps the biggest problem is that it will need a N * M table, where N is the number of leaves, and M is the number of planes. However that shouldn't be too much of a problem. Still a clever algorithm I think.

Construction Summary

In summary, the Solid BSP construction process is as follows:
- Build a list of all planes, and polygons in model
- For each BSP node:
  - Select best unused plane from list
    - If no planes left, construct leaf
  - Partition polygons into two lists. If polygon lies on plane, place polygon into front list
  - Recur with two new lists
- Calculate leaf contents
Collision Detection and Ray Stabbing

As I mentioned earlier, the solid BSP tree lends itself well to performing tasks like ray stabbing and collision detection. Both algorithms are very similar.

Collision Detection

Working out if a given line intersects with a solid piece of space is a necessary technique for a lot of applications, for example Computer Games. Given a line, defined by two points, at t (time) = 0, and t = 1, ie the start and end points of our motion, we want to work out whether that line runs into a solid object, and if so, clip it to the intersection.

Using our solidity information, we can easily check this. What we do is as follows. First, we classify both endpoints about the root plane. If they are both in front of the root plane, we go to the front sub-child, and continue. If they are both behind, we go to the back child, and continue. However, if they are on opposite sides, we have a potential intersection with a solid. So, we find the intersection point, and two other points, either side of the plane, to some small offset:
```
		   x



	-----------i------------



		   x
```
Where 'i' is the intersection, and 'x' are our two points. We will use one of these 2 points as the new endpoint. We then must work out what side of the plane the start point lies on. This is easy enough to work out if the start point does not lie on the plane. If it does lie on the plane, you need to use the backpoint. If the start point of the line lies in the front half-space of the plane, then we will adjust our end position to lie in the positive half-space of the plane. If the start point lies behind the plane, we adjust our end position to lie in the negative halfspace.

What we do is recur with the line segments (start, intersection) and (intersection, end). If we find that either segment ends up in a solid leaf, then we set the end point to be the intersection, slightly offset into either the positive or negative halfspace. If both points are coincidant to the plane, we recur down both children. To offset into a half space, we just add some scalar multiplied by the normal of the plane.

As thats not very easy to understand, perhaps some code is needed to demonstrate it. The following should help clarify things:
```
Function ClipNode(VECTOR point start, VECTOR pointer end, BSPNODE pointer node)

	While Node.Type == BSP_NODE

		Dist1 = DotProduct(node.plane, start)

		Dist2 = DotProduct(node.plane, end)



		If both in front of plane

			node = node.front

		Else If both behind plane

			node = node.back

		Else

			Vector delta, mid, split, split2



			delta = end - start



			mid = Intersection(node.plane, start, end)

			split = mid  5.0 * node.plane.normal

			split2 = mid - 5.0 * node.plane.normal



			If start in front of plane

				retcode = FALSE



				If ClipNode(start, mid, node.front)

					end = split

					retcode = TRUE

				End If



				If ClipNode(mid, end, node.back)

					end = split

					retcode = TRUE

				End If



				return retcode

			Else If start behind plane

				retcode = FALSE



				If ClipNode(start, mid, node.back)

					end = split2

					retcode = TRUE

				End If



				If ClipNode(mid, end, node.front)

					end = split2

					retcode = TRUE

				End If



				return retcode

			Else if coincident

				retcode = FALSE



				If ClipNode(start, end, node.front)

					end = split

					retcode = TRUE

				End If



				If ClipNode(start, end, node.back)

					end = split2

					retcode = TRUE

				End If



				return retcode

			End If

		End If

	End While



	If node.type == SOLID_LEAF

		return TRUE

	Else

		return FALSE

End Function
```
What we're doing is pushing a line down the BSP, to see where it ends up. Where we find that a line will end up in 2 different leaves, we break it, and push down the back segment, to see where that ends up. Note that we only push down the back segment, because we're only interested in testing if the end point needs clipping: we don't generally need to clip the start point, as the start point will generally be in a correct position. We also use a point which is slightly offset from the plane as the start of the new back segment, because sometimes the player can become locked inside of an empty leaf, doing this stops that.

Ray Stabbing

Ray stabbing works on a similar, but simplified manner. In ray stabbing, we only want to know if a line is occluded or not. To do so, we use the above algorithm, but modify it, such that we recur with both parts of the split line, testing them both for occlusion:
```
Function RayOccluded(VECTOR point start, VECTOR pointer end, BSPNODE pointer node)

	While Node.Type == BSP_NODE

		Dist1 = DotProduct(node.plane, start)

		Dist2 = DotProduct(node.plane, end)



		If both in front of plane

			node = node.front

		Else If both behind plane

			node = node.back

		Else

			Vector mid



			mid = Intersection(node.plane, start, end)



			If dist1 > 0.0

				If RayOccluded(start, mid, node.front)

					return TRUE



				If RayOccluded(mid, end, node.back)

					return TRUE



				return FALSE

			Else

				If RayOccluded(start, mid, node.back)

					return TRUE



				If RayOccluded(mid, end, node.front)

					return TRUE



				return FALSE

			End If

		End If

	End While



	If node.type == SOLID_LEAF

		return TRUE

	Else

		return FALSE

End Function
```
This ray-stabbing code is very handy for making things like lightmaps. Using this, its very easy (and pretty quick) to stab a ray from a point on a surface, to a light source.

3D Bump mapping

Introduction

#1: Jean-Francois Dube

        for x1 to x2 do

                normal-x = interpolated-normal-x >> 20

                normal-y = interpolated-normal-y >> 20

                index = (v >> 16) << 8) + (u >> 16)

                texel = texture[index]

                bump-normal-x = bumpmap[index+1] - bumpmap[index - 1]

                bump-normal-y = bumpmap[index+256 - bumpmap[index - 256]

                p = 127 - (bump-normal-x - normal-x)

                q = 127 - (bump-normal-y - normal-y)



                if(p < 0) then p = 0

                if(q < 0) then q = 0

        

                colour = colourtable[texel][phongmap[p*256+q]]

                putpixel(x, y, colour)



                (and just interpolate normal + texture here...)

        end

Quite a nice method, but perhaps the inner loop is a little complex.

#2: Paul Adams

nx = map[u-1] - map[u+1]

ny = map[v-1] - map[v+1]

Which sounds a little odd... I think he means...

nx = map[u-1] - map[u+1]

ny = map[v-256] - map[v+256]

Which would make a little more sense... Then just do the old ny*256+nx, to get the index into your phongmap. Not too bad..

#3: Jacco Bikker

Summary

2D Bump-mapping

Introduction

2D bump mapping seems to be all the rage nowadays. And granted, it is a very impressive effect. And its also pretty simple to code. I'll walk you through how to code a simple bump-mapper.

Algorithm

With 2D bumping, we basically have a 'phongmap', a precalculated lookup table for a light source. We want to use the image to distort the lighting, to make the surface appear corrugated. So we need to effectively calculate some kind of normal, and use that to index the texture map - like in fake env-map phong shading.

First thing we need to do is interpolate 2 variables across the image - Lx and Ly. These are 'lightx' and 'lighty'. Ly is initialized to -lightposy before the Y loop, and Lx is initialized to -lightposx before the X loop. At each cycle of the X loop, we increment Lx, and at each cycle of the Y loop, we increment Ly. Kind of like interpolating the texture co-ordinates for the light map if you like.

Now we need to calculate some kind of offset value. The way this is done is again quite simple. We take the difference between 2 entries in the bump-map. The values will be:

XDelta = ((texture[y][x-1] - texture[y][x+1]) >> 1) + 127

YDelta = ((texture[y-1][x] - texture[y+1][x]) >> 1) + 127

X and Y are our standard loop variables. The texture must be the same size as the screen. Now simply add these values to you Lx and Ly, and you've effectively disturbed the lighting normal. Now do a simple check; if the upper byte of either of those values if non-zero, ie (xtemp & 0xFF00) != 0, then paint the pixel black. Else, use those values to index the texture map:

XTemp = Lx + XDelta

YTemp = Ly + YDelta



if((XTemp & 0xFF00) || (YTemp & 0xFF00)) {

        PutPixel(x, y, 0)

} else {

        u = ytemp;      // these are byte variables - we are only

        v = xtemp;      // interested in 0..255

        PutPixel(x, y, phongmap[v*256+u])

}

And you've got yourself a simple bump-mapping routine. Putting it all together, you should get a routine like:

void Bumpmap(char *src, char *dst)

{

        int             xdist, ydist;

        int             xdelta, ydelta;

        int             xtemp, ytemp, temp;

        int             i, j, offset = 320;

        int             lx, ly;

        unsigned char   u, v;



        ly = -lighty;

        for(i=1; i<198; i++) {

                lx = -lightx;

                for(j=0; j<320; j++, offset++) {

                        xdelta = ((src[offset -   1] - src[offset +   1]) >> 1) + 127;

                        ydelta = ((src[offset - 320] - src[offset + 320]) >> 1) + 127;

                        lx++;



                        xtemp = xdelta + lx;

                        ytemp = ydelta + ly;

                        if((xtemp & 0xFF00) || (ytemp & 0xFF00)) {

                                *dst++ = 0;

                        } else {

                                u = ytemp;

                                v = xtemp;

                                *dst++ = phonglightmap[v*256+u];

                        }

                }

                ly++;

        }

}

Here src is the address of the source texture, and dst is the destination, ie the screen.

Colouring it

You can easily extend this to include colour. If you change the above code so that instead of painting a monochrome image, you paint a colour image, using a lookup table to get the right colour, you get a loop something like:

        xdelta = ((bumpmap[offset -   1] - bumpmap[offset +   1]) >> 1) + 127;

        ydelta = ((bumpmap[offset - 320] - bumpmap[offset + 320]) >> 1) + 127;

        lx++;

        

        xtemp = xdelta + lx;

        ytemp = ydelta + ly;

        if((xtemp & 0xFF00) || (ytemp & 0xFF00)) {

                *dst++ = 0;

        } else {

                u = ytemp;

                v = xtemp;

                colour = phongmap[v*256+u];

                *dst++ = colour_lookup[colour*256+texture[y][x]]

        }

Something like that. Where bumpmap is a pointer to your monochrome bump map, colour_lookup is a 256x256 lookup table for shade x of colour y, and texture is your 256-colour texture map. Obviously this will be a little slower, but it should still work well.

Anyway thats enough now of this simple effect, its time for you to go and code it up yourself. You'll be pleased with the results, its a great little effect.

Fractals

Introduction

Fractals is a really massive subject. I can't possibly hope to cover everything in such a small space; instead I'll concentrate on some of the fundamentals, and show you how to make a simple mandelbrot and julia fractal. Fractals are very rewarding, because they're simple to code, but produce some stunning images. I'll also show you how to do complex maths, so you can try your hand at making your own fractals.

Complex Maths

The first thing we need to know about before we even think about coding fractals is complex maths. A lot of fractals are made using this technique, so it must be covered. To begin, a complex number is defined as:

a + bi

Where a is the real part, and b is the imaginary part. 'i' happens to be the square root of -1. But thats impossible you say! You can't sqrt(-1)! Well, I know. Thats why its called imaginary. It was invented some time ago, so people could solve some quadratic equations. Now for the basic operations we can do:

Addition:

        a + bi

     +  c + di



    = (a + c) + i(b + d)



Subtraction:

        a + bi

     -  c + di

    = (a - c) + i(b - d)



Multiplication:

       (a + bi)(c + di)

    = (ac - bd) + i(ad + bc)



Division:

       (a + bi) / (c + di)



       (ac + bd)         (bc - ad)

    = ----------- +  i* -----------

      (c^2 + d^2)       (c^2 + d^2)



Conjugate:

              _

        z* or z is   a - bi



Modulus:

        |z| = sqrt(a^2 + b^2)



Arguement:

           -1  

        tan   (b / a)

And operations with the conjugate are:

        z = complex number, z* = conjugate

Addition:

        z + z* = 2a



Subtraction:

        z - z* = 2ib



Multiplication:

        zz* = a^2 + b^2



Division:

               a^2 - b^2          2ab

        z/z* = ---------  + i* ---------

               a^2 + b^2       a^2 + b^2

If you want to see how these are derived, any decent maths book should help. Code all these up, and stick em in a little library. You might as well use floating point for these, because its quick, easy to code, and precise. I've had a few problems with underflow, when I tried to use fixed point. Do a lot of zooming, and you'll soon find feel it. 80-bit floating point happens to work very nicely for this :)

Mandelbrot

The mandelbrot fractal is simply a mapping of the equation:

           2

z      = z   + c

 n + 1    n

Thats z(n+1) is z(n) squared plus c if you can't read my ASCII math formulae. We set up z to be equal to zero, and c is a point on the 'complex plane'. To plot a mandelbrot, we ask for a 'window', say -2 + -2i to 2 + 2i. Then we interpolate these values across the screen. We then plot a graph, of real versus imaginary, with the colour being the number of iterations. With that in mind, the real part of c is set to current interpolated real value, and the imaginary part of c is set to the current interpolated imaginary part. Perhaps some pseudo code should clarify this:

xdelta = (a2 - a1) / screen-width

ydelta = (b2 - b1) / screen-height



c.imaginary = b1

for y=0 to screen-height - 1

        c.real = a1

        for x=0 to screen-width - 1

                z = 0

                Evaluate colour for pixel

                Plot pixel, using some kind of colouring scheme

                c.real += xdelta

        end

        c.imaginary += ydelta

end

Now you're wondering how to get the colour. Well, we iterate. And thats what makes mandelbrots so slow! We keep iterating, to find out if the value of n ever gets about 2. If it doesn't, we set the colour to some value (which is where the black bit in the middle comes from). If it does, we usually set to the number of iterations it took.

So, our iteration loop may look something like:

        numiter = maxiter;

        imag = real = 0.0;

        imag_sq = real_sq = 0.0;



        while ((imag_sq + real_sq <= 4.0) && (--numiter > 0)) {

                imag = 2.0*real*imag + c.imaginary;

                real = real_sq - imag_sq + c.real;

                real_sq = real*real;

                imag_sq = imag*imag;

        }

We check against 4 here, to avoid a sqrt. The rest should be clear. The maths inside the loop simply computes z squared. So, now you can put all that together, and you should be able to code a mandelbrot. I'm not going to do your work for you though, you'll have to do that yourself.

Julia Sets

The julia set is similar to a mandelbrot, except instead of fixed z to zero, and interpolate c, we fix c to a value, and analyse the behaviour of z. That sounds complex, but its simple enough. We still interpolate through our little window on the complex plane, but instead of calling that value c, we set z(0) to that value. Then its a simple iteration loop:


        imag = y;

        real = x;

        imag_sq = imag*imag;

        real_sq = real*real;



        while ((imag_sq + real_sq < 4.0) && (--numiter > 0)) {

                imag = 2.0*real*imag + c.real;

                real = real_sq - imag_sq + c.imaginary;

                real_sq = real*real;

                imag_sq = imag*imag;

        }

The tricky bit with julia sets is picking a good start value for c. You'll need a little experimentation, but you'll find it in the end.

Making Your Own Fractals

This is simple enough. Remember all those programs you written ages ago, where you'd plot some formulae to the screen, then palette cycle? Well, you just need to start thinking up formulae again! Theres plenty of formulae already available, and doing variations on them shouldn't be hard. Just have a look around, you're bound to find something.

Tom Hammersley, tomh@globalnet.co.uk