Wednesday, January 4, 2017

Building a 4D World

In my last post, I described how to orient a virtual camera in four dimensions. Now, we're going to give that camera something to look at.

Making art assets for regular games is hard enough; building full 4D models - things which allow you to look at them from any arbitrary vantage point in 4 dimensions - is just ridiculous. So procedural generation and simple geometric objects will be our friends. In fact, let's just do nothing but try to look at a randomly-generated 4D maze, constructed entirely out of right-angled, unit-sized hypercubes, as viewed through a 3D hyperplane slicing through the maze at an arbitrary angle and position.

Maze generation algorithms are all basically variations on finding spanning trees for the graph of maze cells, and as such are not strongly tied to any particular dimensionality; any of them can be fairly easily converted to work on arbitrary graphs, or on grids of arbitrary dimension. So, we'll set maze generation aside as a solved problem, and just assume that we can get a good maze map.
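To illustrate just how dimension-agnostic this is, here's a minimal sketch of a randomized depth-first-search maze generator in JavaScript. The map format and helper names are invented for illustration- the game's actual maze code may look quite different:

function generateMaze(dims){ // e.g. dims = [8, 8, 8, 8] for 4D
    const key = c => c.join(',');
    const inBounds = c => c.every((x, i) => x >= 0 && x < dims[i]);
    const visited = new Set();
    const passages = new Set(); // carved connections between cells
    const start = dims.map(() => 0);
    const stack = [start];
    visited.add(key(start));
    while(stack.length > 0){
        const cell = stack[stack.length - 1];
        // Collect unvisited neighbors, one step along each axis.
        const neighbors = [];
        for(let i = 0; i < dims.length; i++){
            for(const step of [-1, 1]){
                const n = cell.slice();
                n[i] += step;
                if(inBounds(n) && !visited.has(key(n))) neighbors.push(n);
            }
        }
        if(neighbors.length === 0){ stack.pop(); continue; }
        // Carve a passage to a random unvisited neighbor.
        const next = neighbors[Math.floor(Math.random() * neighbors.length)];
        passages.add(key(cell) + '|' + key(next));
        passages.add(key(next) + '|' + key(cell));
        visited.add(key(next));
        stack.push(next);
    }
    return passages; // a spanning tree of the grid graph
}

Note that nothing in this code cares whether dims has two entries or four; the dimensionality is just the length of the coordinate arrays.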

One way to render our view of the maze would be to represent the walls of the maze with polygons in 4 dimensions, and calculate the intersection of every edge with the currently-visible hyperplane to generate a 3D model that can be rendered in The Usual Way, using your 3D rendering library of choice.

Calculating all of those intersections, though, seems really complicated, and there's a much simpler option that takes advantage of the fact that our maze is restricted to a hypercubical grid: we can use raycasting!

Raycasting is more typically used to give the illusion of 3D with a 2D map, in games like Wolfenstein 3D. It involves marching a ray from the camera's position through the grid until you find a wall, and then drawing a column of pixels with height scaled based on the length of the ray, so that walls appear to recede into the distance. The code for this is fairly simple, and when you only have to cast one ray per pixel of horizontal resolution, it's fast enough that you can even do it in pure JavaScript in a web browser. In order to render a 3D slice of a 4D map, we'll instead have to cast one ray for every pixel on the whole two-dimensional screen. That's significantly slower, but the code for casting each individual ray is not much more complex- it's merely about twice as long as that for a traditional raycaster, due to handling twice as many cases.

Casting Rays & Finding Walls

To get reasonable framerates, I implemented my 4D raycaster in GLSL to run on a GPU. The first step is calculating the appropriate ray given a pixel position:
uniform float u_depth;
uniform vec2 u_resolution;
uniform vec4 u_origin;
uniform vec4 u_rgt;
uniform vec4 u_up;
uniform vec4 u_fwd;

void main(){
    vec2 coords = gl_FragCoord.xy - (u_resolution / 2.0);
    vec4 ray = u_fwd*u_depth + u_rgt*coords.x + u_up*coords.y;
    gl_FragColor = cast_vec(u_origin, ray, 10.0);
}
(For the uninitiated, "uniform" variables have their values passed in from outside the shader program.)

This code converts the pixel location into a new coordinate system with the origin at the center of the field of view, instead of one corner. Then it calculates a direction in which to cast a ray based on the camera's egocentric coordinate system; the depth of the virtual camera in front of the screen specifies how much of the forward direction to use, which constrains the viewing angle, while the x and y pixel coordinates indicate how far to go along the right and up axes. Note that, so far, aside from the fact that all of the vectors have four components, this looks entirely like 3D code; there is no mention of the fourth, ana, axis. This is because we only care about casting rays contained in the hyperplane defined by the F, R, and U axes, so we only need to construct rays with components from those basis vectors. If we wanted to view a projection of the world into a 3D hyperplane, rather than a slice, then we could define an arbitrary w-resolution, in addition to the x and y resolutions, and cast additional rays with components from the ana basis vector of the camera's coordinate system. But multiplying the number of rays like that would not do good things to your framerate!

(Note that, although we only use a 3D subset of the camera's egocentric coordinate system, the resulting ray can, and usually will, have non-zero components in all four of the x, y, z, and w axes. This will only fail to be the case if the egocentric coordinate system is exactly parallel to the map grid.)

After calculating the ray corresponding to a particular pixel, we pass the origin point, the ray, and a distance limit into the raycasting algorithm, which returns a pixel color. So, let's look at how that works. First, a useful utility function:
// Find the distance to the next cell boundary
// for a particular vector component
float cast_comp(vec4 v, float o, out int sign, out int m){
    float delta, fm;
    if(v.x > 0.0){
        sign = 1;
        fm = floor(o);
        delta = fm + 1.0 - o;
    }else{
        sign = -1;
        fm = ceil(o - 1.0);
        delta = fm - o;
    }
    m = int(fm);
    return length(vec4(delta, delta*v.yzw/v.x));
}

This does three things:
  1. Calculates the distance you have to move in the direction of a ray, starting from a particular origin point, before you hit a cell boundary in a particular grid-axis-aligned direction.
  2. Calculates the sign of the ray's propagation along a particular grid-axis direction.
  3. Calculates the integer position of the grid cell in which the ray originates along a particular grid-axis.
All of this information is necessary to initialize the main raycasting algorithm.
// Starting from the player, we find the nearest gridlines
// in each dimension. We move to whichever is closer and
// check for a wall. Then we repeat until we've traced the
// entire length of the ray.
vec4 cast_vec(vec4 o, vec4 v, float range){
    v = normalize(v);

    // Get the initial distances from the starting
    // point to the next cell boundaries.
    ivec4 s, m;
    vec4 dists = vec4(
        cast_comp(v.xyzw, o.x, s.x, m.x),
        cast_comp(v.yxzw, o.y, s.y, m.y),
        cast_comp(v.zxyw, o.z, s.z, m.z),
        cast_comp(v.wxyz, o.w, s.w, m.w)
    );

    // Inverting the elements of a normalized vector
    // gives the distances you have to move along that
    // vector to hit a cell boundary perpendicular
    // to each dimension.
    vec4 deltas = abs(vec4(1.0/v.x, 1.0/v.y, 1.0/v.z, 1.0/v.w));

The deltas give us the last bit of information we need to initialize the algorithm- how big the steps are in the direction of the ray in order to move one full grid unit in a grid-axis-aligned direction. For example, if the normalized ray direction has v.x = 0.6, then you must travel 1/0.6 ≈ 1.67 units along the ray for the x coordinate to change by exactly 1. This is different from the initial distance needed to get from an origin point somewhere inside a cell to the cell boundary, calculated above.
    // Keep track of total distance.
    float distance = 0.0;

    // Keep track of the dimension perpendicular
    // to the last cell boundary, and the value of the
    // last cell the ray passed through.
    int dim, value;

    // while loops are not allowed, so we have to use
    // a for loop with a fixed large max number of iterations.
    for(int i = 0; i < 1000; i++){
        // Find the next closest cell boundary
        // and increment distances appropriately.
        if(dists.x < dists.y && dists.x < dists.z && dists.x < dists.w){
            dim = 1*s.x;
            m.x += s.x;
            distance = dists.x;
            dists.x += deltas.x;
        }else if(dists.y < dists.z && dists.y < dists.w){
            dim = 2*s.y;
            m.y += s.y;
            distance = dists.y;
            dists.y += deltas.y;
        }else if(dists.z < dists.w){
            dim = 3*s.z;
            m.z += s.z;
            distance = dists.z;
            dists.z += deltas.z;
        }else{
            dim = 4*s.w;
            m.w += s.w;
            distance = dists.w;
            dists.w += deltas.w;
        }

In this section, we keep track of the length of the ray that it would take to get to the next cell boundary in any of the four grid axes, stored in the dists vector. Whichever one is smallest becomes the new length of the ray. Additionally, we record which axis we stepped along (in dim), update the coordinates of the cell through which the ray will pass next (in the m vector), and then increment the length of the theoretical ray which would hit the next cell boundary along the same grid axis.

After that, we just check to see if we've hit a wall yet. If there were any other objects in the game, this is where we'd add more detailed intersection checks for the objects within a particular cell, but with a simple maze we only have to check the value of the cell to see if it is a wall or not:
        // Look up the map value for the cell the ray just
        // entered (get_cell's definition is not shown here).
        value = get_cell(m.x, m.y, m.z, m.w);

        // If the cell is a wall, or we've gone too far, terminate.
        if(value == 255 || distance >= range){ break; }
    }

    // Calculate the actual intersection coordinates
    // and use them to look up a texture.
    vec4 ray = o + distance * v;
    vec3 tex = calc_tex(dim, ray);
    return vec4(tex, 1.0);
}

Once the casting loop terminates, we can use the coordinates of the end of the completed ray to look up the color of the bit of wall we ran into. The conversion of the vec3 color to a vec4 in the last line just adds an alpha channel, which is always 1, and is thus left out of the procedural texture code below for simplicity.
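For reference, the same grid-marching logic translates readily back into plain JavaScript- handy for things like CPU-side collision checks. This is just an illustrative sketch, not the game's actual code; it assumes the maze is enclosed by walls (map value 255) and that the map is indexable as map[x][y][z][w]:

function castRay(map, o, v, range){
    // Normalize the direction vector.
    const len = Math.hypot(v.x, v.y, v.z, v.w);
    const d = { x: v.x/len, y: v.y/len, z: v.z/len, w: v.w/len };

    const axes = ['x', 'y', 'z', 'w'];
    const s = {}, m = {}, dists = {}, deltas = {};
    for(const ax of axes){
        if(d[ax] > 0){
            s[ax] = 1;
            m[ax] = Math.floor(o[ax]);
            dists[ax] = (m[ax] + 1 - o[ax]) / d[ax];
        }else{
            s[ax] = -1;
            m[ax] = Math.ceil(o[ax] - 1);
            // A zero component never crosses a boundary.
            dists[ax] = d[ax] < 0 ? (m[ax] - o[ax]) / d[ax] : Infinity;
        }
        deltas[ax] = d[ax] !== 0 ? Math.abs(1 / d[ax]) : Infinity;
    }

    let distance = 0, dim = 0;
    while(true){
        // Step across whichever cell boundary is closest.
        let ax = 'x';
        for(const a of axes){ if(dists[a] < dists[ax]) ax = a; }
        dim = (axes.indexOf(ax) + 1) * s[ax];
        m[ax] += s[ax];
        distance = dists[ax];
        dists[ax] += deltas[ax];
        if(distance >= range || map[m.x][m.y][m.z][m.w] === 255) break;
    }
    return { distance, dim }; // ray length and wall orientation
}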

3D Textures

Once we've found a wall at the end of a ray, the coordinate for the axis perpendicular to that wall (identified by the value of dim) will have an integer value, while the fractional parts of the remaining three coordinates describe a position inside the cube which forms the boundary of one side of the hypercubical grid cell. For example, if dim is ±2, the ray's y coordinate will be an integer, and the fractional parts of x, z, and w locate the hit point within that boundary cube. This is exactly analogous to the 3D situation, where we would have two non-integer coordinates describing a location inside a square that forms the boundary of a cubical grid cell, or the 2D situation (where raycasting is more typically applied), where we have one non-integer coordinate identifying a position along a line forming the boundary of a square.

In order to figure out the color of a particular spot on the 3D wall of a 4D cell, we'll need three-dimensional textures. Say that we want our walls to be 512 pixels wide. For a 2D wall around a 3D cell, with one byte for each of three color channels, that means a texture takes up 512 × 512 × 3 bytes- 768KB of memory. That's entirely reasonable. We could easily have different textures for walls facing in all six directions around a cube, to help orient you in the maze. But for a 3D wall around a 4D cell, we'd need 512 × 512 × 512 × 3 bytes- 384MB- for each texture. Even if I had the digital artistic talent to create full 3D textures like that, that's rather a lot of memory to devote to textures. Once again, we can turn to procedural generation to create interesting wall textures.

There are lots of ways to do procedural texture generation, but the algorithm I picked looks basically like this:
uniform vec3 u_seed;
uniform sampler2D u_colorscale;

const vec3 grey = vec3(0.2);
const vec3 red = vec3(1.0,0.5,0.5);
const vec3 green = vec3(0.5,1.0,0.5);
const vec3 blue = vec3(0.5,0.5,1.0);
const vec3 yellow = vec3(0.71,0.71,0.5);

vec3 calc_tex(int dim, vec4 ray){
    ray = fract(ray);
    vec3 coords, tint;
    if(dim == 1 || dim == -1){
        coords = ray.yzw;
        tint = red;
    }else if(dim == 2 || dim == -2){
        coords = ray.xzw;
        tint = green;
    }else if(dim == 3 || dim == -3){
        coords = ray.xyw;
        tint = blue;
    }else if(dim == 4 || dim == -4){
        coords = ray.xyz;
        tint = yellow;
    }

    float h = julia(coords, u_seed);
    if(h == 0.0){
        return mix(tint/16.0, grey, layered_noise(coords, 3, 3));
    }

    vec3 base = texture2D(u_colorscale, vec2(h, 0.5)).rgb;
    return mix(tint/8.0, base, layered_noise(coords, 5, 3));
}

It starts with a basic case analysis to pick some parameters based on which direction we hit the wall from, and thus which kind of hypercube boundary cell we're calculating the texture for.

Then, the basic texture is drawn from a 3D generalization of a Julia fractal. In order to break up the psychedelic color bands produced at the edges, we layer on top of this a bunch of 3D simplex noise at multiple frequencies; the layered_noise function (implementation not shown) takes a start and a length for the range of octaves of basic noise that should be added together.
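The layered_noise implementation isn't shown here, but the octave-summing idea is easy to sketch. The following JavaScript version is purely illustrative, and assumes some noise3(x, y, z) simplex-noise function returning values in [-1, 1]:

// Illustrative only; assumes a 3D simplex noise3(x, y, z).
function layered_noise(coords, start, count){
    let sum = 0, amp = 0;
    for(let i = 0; i < count; i++){
        const freq = Math.pow(2, start + i); // octave frequency
        const weight = 1 / freq;             // finer octaves fade out
        sum += weight * noise3(coords.x*freq, coords.y*freq, coords.z*freq);
        amp += weight;
    }
    return (sum / amp + 1) / 2; // rescale to [0, 1] for mixing
}

Each successive octave doubles the frequency and halves the weight, layering finer and finer detail on top of the base pattern.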

In principle, we could produce a different 3D texture for each of the 8 cubes that form the boundary of each 4D cell. In fact, with the full coordinates of the ray endpoint, we could even produce different textures for every cube in the entire maze. In practice, however, that turns out to be more code for little benefit. Since the appearance of any particular wall depends heavily on your precise 4D position and orientation, revealing a very specific 3D slice of the 4D whole, having unique textures for every cube in the maze doesn't really seem to help that much with identifying your position.

So, to help with orientation, instead of calculating a completely different texture for every 3D face of a hypercube, we just add a tint of a particular color to the basic texture to identify the orientation of each cube with respect to the grid.

That ends up producing views something like this:


Note that not all of the walls in this scene seem to be at right angles! This is despite the fact that the world consists of nothing but maze walls aligned with a perfect right-angled hypercubical grid.

The reason for this is fairly straightforward; the viewing volume isn't parallel with the wall grid. Just like an oblique 2D slice through a 3D cube can end up with a hexagonal outline, a 3D slice through a 4D cube can also end up producing an outline with 2D segments meeting at all sorts of weird angles.

Tuesday, January 3, 2017

Implementing 4D rotation

In my last post, I explored various options for allowing a player to control their orientation in 4D space. Now, I want to look at how 4D rotations can be implemented, regardless of the control scheme.

For my 4D maze game, the only thing that needs to be rotated is the virtual camera that provides the player's first-person point of view. Dealing only with that eliminates a lot of accidental complexity that would be involved in applying arbitrary rotations to models in the world, but the procedures described here can be extended for that purpose.

The first thing to decide is how to represent the player's current orientation. This could be done with hyperspherical coordinates, which work pretty much just like regular spherical coordinates but with some extra angles added. However, working with hyperspherical coordinates gets really confusing really fast, and they suffer from the same limitations as spherical coordinates regarding gimbal lock. So, we'll use a different representation: a set of four unit vectors which define a basis for an egocentric coordinate system, in terms of the game map coordinate system.

While the game map coordinates are given in terms of x, y, z, and w axes, the egocentric coordinate system we use for orienting the camera translates these into forward, right, up, and ana (or F, R, U, A) components. It's important to make sure these basis vectors for the egocentric coordinate system always remain mutually perpendicular (i.e., that they form an orthonormal basis), which means there is some redundancy in this system. It would be possible to use a leaner representation, such as 3 basis vectors from which the 4th can be recovered computationally as needed, or a pair of right-isoclinic and left-isoclinic quaternions. But despite the redundancy, explicit basis vectors make the algorithms for arbitrary rotations, and the math employed therein, much easier to follow, and thus less prone to implementation error and easier to debug.
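In code, this representation is nothing more than four plain vector objects hanging off the player. A minimal sketch (which map axis each basis vector starts out aligned with is an arbitrary choice here):

function Player(){
    // Egocentric basis, initially aligned with the map axes.
    this.f = {x: 1, y: 0, z: 0, w: 0}; // forward
    this.r = {x: 0, y: 1, z: 0, w: 0}; // right
    this.u = {x: 0, y: 0, z: 1, w: 0}; // up
    this.a = {x: 0, y: 0, z: 0, w: 1}; // ana
}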

The basic 3D rotations- pitch, roll, and yaw- correspond to rotations in the forward-up (FU), up-right (UR), and forward-right (FR) planes, respectively. In order to execute, for example, a pitch up, we would want to rotate the forward and up vectors in the FU plane so that the new forward vector points a little more in the old up direction, and the new up vector points a little more in the old negative forward direction. Thus, rotation in a plane can be expressed in terms of rotating each of the basis vectors (or any vector that falls in the same plane) towards or away from another vector in that plane. Vectors in the perpendicular plane are unaffected by the rotation, so the remaining basis vectors of our egocentric coordinate system can just be left alone, and there aren't any other vectors to worry about!

Therefore, the basic building-block we'll need to implement arbitrary rotations of our egocentric coordinates is a function that can rotate some vector in the plane defined by itself and a second, perpendicular vector. It turns out that this is much simpler than the general formula for rotating an arbitrary point parallel to an arbitrary plane. The code looks like this:

// Rotate a vector v in the vk plane,
// by angle t, assuming v & k are orthogonal
function vec_rot(v, k, t){
    let cos = Math.cos(t);
    let sin = Math.sin(t);
    return {
        x: v.x*cos + k.x*sin,
        y: v.y*cos + k.y*sin,
        z: v.z*cos + k.z*sin,
        w: v.w*cos + k.w*sin
    };
}

We gain quite a lot of simplifications when we can assume that all of our vectors are normalized and orthogonal!

A complete rotation of the camera requires two calls to this function- one for each of the basis vectors in the plane of rotation. If we did just one, they would no longer be orthogonal, and that would cause all kinds of problems! The code for executing a complete pitch rotation thus looks like this:

Player.prototype.pitch = function(t){
    let fprime = vec_rot(this.f, this.u, t);
    let uprime = vec_rot(this.u, this.f, -t);
    this.f = fprime;
    this.u = uprime;
};

Note that the first basis vector moves towards the second, while the second basis vector moves away from the first; thus, the second half-rotation uses a negative angle. Also note that this implementation re-calculates the same sine and cosine values (up to a sign difference) twice, so a slight efficiency improvement can be gained by inlining vec_rot and eliminating the extra calculations, at the cost of some code duplication and a little loss of clarity. The implementations for roll and yaw are the same, simply replacing f and u with u and r or f and r, respectively.

Having defined our 3D rotations in terms of planes rather than axes, the generalization to 4D rotations is trivial: just add three more rotation functions that act in the ana-axis planes: FA, UA, and RA. Arbitrary rotations in non-basis planes can then be achieved simply by calling these basic functions in series. It's not quite as efficient as building a 4D rotation matrix to do everything at once, but it's good enough most of the time, and very easy to understand and hack on.
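For illustration, all six basic rotations can be expressed through one generic helper (a hypothetical name- the actual game defines one named method per plane), with arbitrary rotations built up by chaining calls:

// Rotate basis vectors p and q (e.g. 'f' and 'a') in their
// shared plane by angle t.
Player.prototype.rotatePlane = function(p, q, t){
    let pprime = vec_rot(this[p], this[q], t);
    let qprime = vec_rot(this[q], this[p], -t);
    this[p] = pprime;
    this[q] = qprime;
};

// Pitch is rotatePlane('f','u',t); the three new 4D rotations
// are rotatePlane('f','a',t), rotatePlane('u','a',t), and
// rotatePlane('r','a',t). A rotation in a non-basis plane is
// just a series of such calls:
let player = new Player(); // from the sketch above
player.rotatePlane('f', 'u', 0.1); // a bit of pitch...
player.rotatePlane('f', 'a', 0.1); // ...then a bit ana-ward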

There is, however, one problem with the system as it stands- due to the accumulation of floating-point errors, after a large number of rotations the basis vectors for our egocentric coordinates can become denormalized and non-orthogonal, which can cause a variety of strange rendering errors. Thus, it is prudent to periodically re-normalize our coordinate system. This consists of two parts: normalizing each basis vector, and ensuring that the basis vectors are all orthogonal to each other.

Normalizing a single vector is fairly easy:

function normalize(v){
    let {x,y,z,w} = v;
    let len = Math.sqrt(x*x+y*y+z*z+w*w);
    return {x:x/len, y:y/len, z:z/len, w:w/len};
}

And it's also not that hard to take a vector and get the next closest vector to it that is perpendicular to some other vector:

// subtract the projection of v onto k from v
function orthogonalize(v,k){
    let {x:vx,y:vy,z:vz,w:vw} = v;
    let {x:kx,y:ky,z:kz,w:kw} = k;
    let kk = kx*kx+ky*ky+kz*kz+kw*kw;
    let vk = vx*kx+vy*ky+vz*kz+vw*kw;
    let scale = vk/kk;
    return {
        x: vx - kx*scale,
        y: vy - ky*scale,
        z: vz - kz*scale,
        w: vw - kw*scale
    };
}

If we know that k is pre-normalized, this can be further simplified:

// subtract the projection of v onto k from v,
// where k is assumed to be normalized
function orthogonalize(v,k){
    let {x:vx,y:vy,z:vz,w:vw} = v;
    let {x:kx,y:ky,z:kz,w:kw} = k;
    let vk = vx*kx+vy*ky+vz*kz+vw*kw;
    return {
        x: vx - kx*vk,
        y: vy - ky*vk,
        z: vz - kz*vk,
        w: vw - kw*vk
    };
}

When combining these, it is important to remember to orthogonalize first, then normalize. Otherwise, orthogonalization will undo normalization. Orthonormalizing any two vectors is thus made very simple:
let fprime = normalize(this.f);
let rprime = normalize(orthogonalize(this.r, fprime));

But what about the remaining plane? In 3D, we could just generate a new one by calculating the cross-product, but a binary cross-product doesn't exist in higher dimensions, for the simple reason that there is no longer a unique (up to a sign) direction perpendicular to any two vectors. In the general case, we would need an algorithm for computing the null space of a matrix to figure out what the perpendicular vectors should be... but once again, we can take advantage of the fact that the vectors we need to be perpendicular to are themselves already orthonormalized to do some major simplifications.

Fixing the last two axes just consists of repeatedly calling orthogonalize for each of the previously-orthonormalized axes, and then normalizing the result.

// Ensure the egocentric basis is orthonormal
Player.prototype.renormalize = function(){
    let {f, r, u, a} = this;
    let fprime = normalize(f);
    let rprime = normalize(orthogonalize(r, fprime));
    let uprime = normalize(
        orthogonalize(orthogonalize(u, fprime), rprime));
    let aprime = normalize(
        orthogonalize(orthogonalize(
            orthogonalize(a, fprime), rprime), uprime));
    this.f = fprime;
    this.r = rprime;
    this.u = uprime;
    this.a = aprime;
};

The fact that this works may not be completely obvious at first glance. After all, how do we know that orthogonalizing the basis vector for the U axis against rprime won't mess up the previous orthogonalization against fprime? Well, we know that rprime is guaranteed to be orthogonal to fprime, because we constructed it to be so. That means that the projection of u onto rprime will also be orthogonal to fprime. Adding or subtracting a component orthogonal to fprime clearly cannot introduce any components to uprime that are parallel to fprime, and so we can conclude that orthogonalizing against rprime, or any vector which is also orthogonal to fprime, cannot cause uprime to become non-orthogonal to fprime. The same reasoning can be extended to prove the validity of the orthogonalization procedure for the A axis, and this procedure trivially generalizes to arbitrary higher dimensions as well.

As with the rotation itself, it is possible to gain some efficiency here at the cost of some code duplication, but since this operation need be performed only very rarely it has yet to become a bottleneck in my code.
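Exactly when to renormalize is a policy question. One hypothetical approach, just to make "periodically" concrete, is to count rotation operations:

// Renormalize after every 256 rotation operations.
// (256 is an arbitrary guess; the right interval depends on
// how fast error actually accumulates in practice.)
let rotationCount = 0;
function noteRotation(player){
    if(++rotationCount >= 256){
        player.renormalize();
        rotationCount = 0;
    }
}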

Next: Building a 4D World

A Series of Options for 4D Game Controls

For several years, I have been working on-and-off on a 4D maze game. At any given point, your view is restricted to a 3D hyperplane sliced through the complete 4D world, such that the next part of the path may be out of sight due to its lying in a direction perpendicular to your current viewing volume. Successfully navigating the maze therefore requires the ability to perform rotations in 4 dimensions, to bring the extra coordinate axis into view.

As you might expect, there are a lot more ways to rotate in 4D than in 3D, which presents a problem in designing sensible & ergonomic video game controls.

How Rotation Works

Before getting to how a player might control their 4D orientation, it will be useful to go over some basics of how rotation actually works. In three dimensions, we are used to thinking of rotations as occurring about an axis. That, however, is a simplification unique to 3D space. In the generic n-dimensional case, rotations are defined by a set of planes.

In one dimension, there are no planes, so rotation is impossible. In two dimensions, there is exactly one plane, and rotations in that plane can be defined simply by their magnitude left or right. In 3 dimensions, however, you suddenly have 3 possibilities for basic planes of rotation, and an infinite number of linear combinations thereof. Specifically, one can rotate in the XY plane, the XZ plane, and the YZ plane. Whichever plane you choose, there is one axis left over, and the unique one-to-one pairing of axes and their perpendicular planes in three dimensions is what lets us describe rotations with axes. In 4D, however, there are six basic planes (XY, XZ, XW, YZ, YW, and ZW), and there is a whole additional plane perpendicular to any chosen plane of rotation- not a single unique axis. The simplest way of describing a rotation in 4 dimensions thus involves directly specifying the plane in which it occurs.

(Incidentally, the fact that any chosen plane is perpendicular to an entire additional plane means that 4D objects can have two independent rates of rotation in perpendicular planes, something not possible in 3D. But, that's a complication I can safely ignore for maze-navigating purposes.)

Straightforward Control Schemes

In 3D, we can often just assign one key to each direction of rotation in each basic plane; e.g., using WASD+QE for pitch, yaw, and roll. With six basic planes, however, that would require 12 keys. Now, that's doable; we could, for example, mirror the WASDQE arrangement on the other side of the keyboard, assigning the XW, YW, and ZW planes to UIOJKL. As far as I can tell, this is the maximally flexible scheme, allowing for linear combinations of any set of basic rotations.

But, we can cut down the number of keys used to just six again by exploiting the nature of 4D rotations: with one key for each of the four coordinate axes, and two keys for positive and negative rotation, the player can directly select which plane they want to rotate in. This (with some modifications to allow traditional controls for purely 3D motion) was the original scheme I implemented for my game.

The downside of the simple plane-selection scheme, however, is that it requires simultaneously pressing three keys to perform any rotation- two axis keys to define the plane, and a sign key for which way to rotate in that plane.


Alternatives for Reducing the Number of Keys


Another option is simply to assign one key to each of the six basic planes, again with two keys to determine the sign of the rotation. That's a total of 8 keys, requiring two key presses for each rotation. I worry, however, that remembering the key mappings for each plane might get annoying.

We can improve things a little more by shifting some of the responsibility for selecting planes onto the sign keys. If we have two sets of sign keys (say, the up/down and left/right arrow keys), then we only need half as many plane keys; a single key selects one plane to be controlled by the up/down arrows (or whatever else you pick for the sign keys), and another (preferably perpendicular) plane to be controlled by the left/right keys. That's a total of 7 keys, again requiring two simultaneous key presses for any rotation, while regaining some of the ability to compose basic rotations. If you additionally choose a default set of planes for the sign keys, then you need only six keys total, as in the straightforward plane-selection scheme- but now only 2/3 of all rotations require holding multiple keys, and you keep the ability to compose some rotations.


Hybrid Approaches


In order to reduce a player's learning curve, it would be nice to ensure that normal, purely 3D rotations still use traditional controls. It turns out there's a very simple way to achieve that, extending the same logic used above. If we have six sign keys and a default selection of planes, then only one plane-selection key is required to switch the default three rotation planes out for the other three rotation planes. It is then trivial to choose the default planes to match 3D pitch, roll, and yaw. All 3D rotations then require only a single keypress, with an average of 1.5 simultaneous keys required for all rotations.
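To make that concrete, here is a hypothetical sketch of such a mapping in JavaScript, using WASDQE as the six sign keys and Shift as the single plane-selection key (player.rotatePlane is the generic plane-rotation helper sketched in the post above; the sign conventions here are arbitrary):

// 7-key hybrid mapping: WASDQE give the six signed rotations
// in the three default 3D planes; holding Shift swaps in the
// three ana planes instead.
const planes3D = {
    KeyW: ['f', 'u', +1], KeyS: ['f', 'u', -1], // pitch
    KeyA: ['f', 'r', -1], KeyD: ['f', 'r', +1], // yaw
    KeyQ: ['u', 'r', -1], KeyE: ['u', 'r', +1]  // roll
};
const planes4D = {
    KeyW: ['f', 'a', +1], KeyS: ['f', 'a', -1],
    KeyA: ['r', 'a', -1], KeyD: ['r', 'a', +1],
    KeyQ: ['u', 'a', -1], KeyE: ['u', 'a', +1]
};

// code is a KeyboardEvent.code value like 'KeyW'; dt is the
// rotation step for this frame.
function handleRotationKey(player, code, shiftHeld, dt){
    const mapping = shiftHeld ? planes4D : planes3D;
    if(code in mapping){
        const [p, q, sign] = mapping[code];
        player.rotatePlane(p, q, sign * dt);
    }
}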

The current scheme I have implemented for my game is a slight expansion on this idea, with 3 plane-selection keys (for a total of 9 keys) that map the sign keys (implemented with WASDQE) to different sets of planes. The idea behind this design is to make it possible to create rotations in combinations of any two planes simultaneously, without requiring the full 12-key arrangement. Whether this is the best approach, however, I am still uncertain. It may be that, at least for this kind of game, there really isn't any great need for the ability to compose rotations in multiple basic planes, in which case the 7-key hybrid scheme may be better. On the other hand, if that is a desirable property, it might be best to simply go all-out with the full 12-key scheme.

Conclusion: More experimentation is required!

Next: Implementing 4D Rotation