
Wednesday, January 4, 2017

Building a 4D World

In my last post, I described how to orient a virtual camera in four dimensions. Now, we're going to give that camera something to look at.

Making art assets for regular games is hard enough; building full 4D models - things which allow you to look at them from any arbitrary vantage point in 4 dimensions - is just ridiculous. So procedural generation and simple geometric objects will be our friends. In fact, let's just do nothing but try to look at a randomly-generated 4D maze, constructed entirely out of right-angled, unit-sized hypercubes, as viewed through a 3D hyperplane slicing through the maze at an arbitrary angle and position.

Maze generation algorithms are all basically variations on finding spanning trees for the graph of maze cells, and as such are not strongly tied to any particular dimensionality; any of them can be fairly easily converted to work on arbitrary graphs, or in grids of arbitrary dimension. So, we'll set maze generation aside as a solved problem, and just assume that we can get a good maze map.
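
For the curious, here's a rough sketch of such a dimension-agnostic generator: a randomized depth-first spanning tree over a 4D grid, written in JavaScript. The function name and map representation are illustrative choices of mine, not the game's actual code.

// Carve a spanning tree over a size^4 grid of cells using
// randomized depth-first search (the recursive backtracker).
// Returns a map from cell key to the list of neighboring
// cells it has open passages to.
function generateMaze4D(size){
    const key = c => c.join(',');

    // The 8 axis-aligned step directions in 4 dimensions.
    const dirs = [];
    for(let axis = 0; axis < 4; axis++){
        for(const step of [-1, 1]){
            const d = [0, 0, 0, 0];
            d[axis] = step;
            dirs.push(d);
        }
    }

    const passages = new Map();
    const visited = new Set();
    const stack = [[0, 0, 0, 0]];
    visited.add(key(stack[0]));
    passages.set(key(stack[0]), []);

    while(stack.length > 0){
        const cell = stack[stack.length - 1];

        // Collect unvisited in-bounds neighbors.
        const options = dirs
            .map(d => cell.map((x, i) => x + d[i]))
            .filter(n => n.every(x => x >= 0 && x < size))
            .filter(n => !visited.has(key(n)));

        if(options.length === 0){
            stack.pop(); // dead end: backtrack
            continue;
        }

        // Carve a passage to a random unvisited neighbor.
        const next = options[Math.floor(Math.random() * options.length)];
        visited.add(key(next));
        passages.set(key(next), []);
        passages.get(key(cell)).push(next);
        passages.get(key(next)).push(cell);
        stack.push(next);
    }

    return passages;
}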

One way to render our view of the maze would be to represent the walls of the maze with polygons in 4 dimensions, and calculate the intersection of every edge with the currently-visible hyperplane to generate a 3D model that can be rendered in The Usual Way, using your 3D rendering library of choice.

Calculating all of those intersections, though, seems really complicated, and there's a much simpler option that takes advantage of the fact that our maze is restricted to a hypercubical grid: we can use raycasting!

Raycasting is more typically used to give the illusion of 3D with a 2D map, in games like Wolfenstein 3D. It involves marching a ray from the camera's position through the grid until you find a wall, and then drawing a column of pixels with height scaled based on the length of the ray, so that walls appear to recede into the distance. The code for this is fairly simple, and when you only have to cast one ray per pixel of horizontal resolution, it's fast enough that you can even do it in pure JavaScript in a web browser. In order to render a 3D slice of a 4D map, we'll have to instead cast one ray for every pixel on the whole two-dimensional screen. That's significantly slower, but the code for casting each individual ray is not much more complex- it's merely about twice as long as that for a traditional raycaster, due to handling twice as many cases.

Casting Rays & Finding Walls

To get reasonable framerates, I implemented my 4D raycaster in GLSL to run on a GPU. The first step is calculating the appropriate ray given a pixel position:
uniform float u_depth;
uniform vec2 u_resolution;
uniform vec4 u_origin;
uniform vec4 u_rgt;
uniform vec4 u_up;
uniform vec4 u_fwd;

void main(){
    vec2 coords = gl_FragCoord.xy - (u_resolution / 2.0);
    vec4 ray = u_fwd*u_depth + u_rgt*coords.x + u_up*coords.y;
    gl_FragColor = cast_vec(u_origin, ray, 10.0);
}
(For the uninitiated, "uniform" variables have their values passed in from outside the shader program.)

This code converts the pixel location into a new coordinate system with the origin at the center of the field of view, instead of one corner. Then it calculates a direction in which to cast a ray based on the camera's egocentric coordinate system; the depth of the virtual camera in front of the screen specifies how much of the forward direction to use, which constrains the viewing angle, while the x and y pixel coordinates indicate how far to go along the right and up axes. Note that, so far, aside from the fact that all of the vectors have four components, this looks entirely like 3D code; there is no mention of the fourth, ana, axis. This is because we only care about casting rays contained in the hyperplane defined by the F, R, and U axes, so we only need to construct rays with components from those basis vectors. If we wanted to view a projection of the world into a 3D hyperplane, rather than a slice, then we could define an arbitrary w-resolution, in addition to the x and y resolutions, and cast additional rays with components from the ana basis vector of the camera's coordinate system. But multiplying the number of rays like that would not do good things to your framerate!

(Note that, although we only use a 3D subset of the camera's egocentric coordinate system, the resulting ray can, and usually will, have non-zero components in all four of the x, y, z, and w axes. This will only fail to be the case if the egocentric coordinate system is exactly parallel to the map grid.)

After calculating the ray corresponding to a particular pixel, we pass the origin point, the ray, and a distance limit into the raycasting algorithm, which returns a pixel color. So, let's look at how that works. First, a useful utility function:
// Find the distance to the next cell boundary
// for a particular vector component
float cast_comp(vec4 v, float o, out int sign, out int m){
    float delta, fm;
    if(v.x > 0.0){
        sign = 1;
        fm = floor(o);
        delta = fm + 1.0 - o;
    }else{
        sign = -1;
        fm = ceil(o - 1.0);
        delta = fm - o;
    }
    m = int(fm);
    return length(vec4(delta, delta*v.yzw/v.x));
}

This does three things:
  1. Calculates the distance you have to move in the direction of a ray, starting from a particular origin point, before you hit a cell boundary in a particular grid-axis-aligned direction.
  2. Calculates the sign of the ray's propagation along a particular grid-axis direction.
  3. Calculates the integer position of the grid cell in which the ray originates along a particular grid-axis.
All of this information is necessary to initialize the main raycasting algorithm.
// Starting from the player, we find the nearest gridlines
// in each dimension. We move to whichever is closer and
// check for a wall. Then we repeat until we've traced the
// entire length of the ray.
vec4 cast_vec(vec4 o, vec4 v, float range){
    v = normalize(v);

    // Get the initial distances from the starting
    // point to the next cell boundaries.
    ivec4 s, m;
    vec4 dists = vec4(
        cast_comp(v.xyzw, o.x, s.x, m.x),
        cast_comp(v.yxzw, o.y, s.y, m.y),
        cast_comp(v.zxyw, o.z, s.z, m.z),
        cast_comp(v.wxyz, o.w, s.w, m.w)
    );

    // Inverting the elements of a normalized vector
    // gives the distances you have to move along that
    // vector to hit a cell boundary perpendicular
    // to each dimension.
    vec4 deltas = abs(vec4(1.0/v.x, 1.0/v.y, 1.0/v.z, 1.0/v.w));

The deltas give us the last bit of information we need to initialize the algorithm- how big the steps are in the direction of the ray in order to move one full grid unit in a grid-axis-aligned direction. This is different from the initial distance needed to get from an origin point somewhere inside a cell to the cell boundary, calculated above.
    // Keep track of total distance.
    float distance = 0.0;

    // Keep track of the dimension perpendicular
    // to the last cell boundary, and the value of the
    // last cell the ray passed through.
    int dim, value;

    // while loops are not allowed, so we have to use
    // a for loop with a fixed large max number of iterations
    for(int i = 0; i < 1000; i++){
        // Find the next closest cell boundary
        // and increment distances appropriately
        if(dists.x < dists.y && dists.x < dists.z && dists.x < dists.w){
            dim = 1*s.x;
            m.x += s.x;
            distance = dists.x;
            dists.x += deltas.x;
        }else if(dists.y < dists.z && dists.y < dists.w){
            dim = 2*s.y;
            m.y += s.y;
            distance = dists.y;
            dists.y += deltas.y;
        }else if(dists.z < dists.w){
            dim = 3*s.z;
            m.z += s.z;
            distance = dists.z;
            dists.z += deltas.z;
        }else{
            dim = 4*s.w;
            m.w += s.w;
            distance = dists.w;
            dists.w += deltas.w;
        }

In this section, we keep track of the length of the ray that it would take to get to the next cell boundary in any of the four grid axes, stored in the dists vector. Whichever one is smallest becomes the new length of the ray. Additionally, we record which axis we stepped along (in dim), update the coordinates of the cell through which the ray will pass next (in the m vector), and then increment the length of the theoretical ray which would hit the next cell boundary along the same grid axis.

After that, we just check to see if we've hit a wall yet. If there were any other objects in the game, this is where we'd add more detailed intersection checks for the objects within a particular cell, but with a simple maze we only have to check the value of the cell to see if it is a wall or not:
        value = get_cell(m.x, m.y, m.z, m.w);

        // If the cell is a wall, or we've gone too far, terminate.
        if(value == 255 || distance >= range){ break; }
    }

    // Calculate the actual intersection coordinates
    // and use them to look up a texture
    vec4 ray = o + distance * v;
    vec3 tex = calc_tex(dim, ray);
    return vec4(tex, 1.0);
}

Once the casting loop terminates, we can use the coordinates of the end of the completed ray to look up the color of the bit of wall we ran into.
The conversion of the vec3 color to a vec4 in the last line is just to add in an alpha channel, which is always 1, and is thus left out of the procedural texture code below for simplicity.

3D Textures

Once we've found a wall at the end of a ray, the coordinate for the axis perpendicular to that wall (identified by the value of dim) will have an integer value, while the fractional parts of the remaining three coordinates describe the position inside a cube which forms the boundary of one side of the hypercubical grid cell. This is exactly analogous to the 3D situation, where we would have two non-integer coordinates describing a location inside a square that forms the boundary of a cubical grid cell, or the 2D situation (where raycasting is more typically applied), where we have one non-integer coordinate identifying a position along a line forming the boundary of a square.

In order to figure out the color of a particular spot on the 3D wall of a 4D cell, we'll need three-dimensional textures. Say that we want our walls to be 256 pixels wide. For a 2D wall around a 3D cell, with one byte for each of three color channels, that means a texture taking up 256 x 256 x 3 bytes, or a little under 200KB in memory. That's entirely reasonable. We could easily have different textures for walls facing in all six directions around a cube, to help orient you in the maze. But for a 3D wall around a 4D cell, we'd need 256 x 256 x 256 x 3 bytes, over 50 megabytes, for each texture. Even if I had the digital artistic talent to create full 3D textures like that, that's rather a lot of memory to devote to textures. Once again, we can turn to procedural generation to create interesting wall textures.

There are lots of ways to do procedural texture generation, but the algorithm I picked looks basically like this:
uniform vec3 u_seed;
uniform sampler2D u_colorscale;

const vec3 grey = vec3(0.2);
const vec3 red = vec3(1.0,0.5,0.5);
const vec3 green = vec3(0.5,1.0,0.5);
const vec3 blue = vec3(0.5,0.5,1.0);
const vec3 yellow = vec3(0.71,0.71,0.5);

vec3 calc_tex(int dim, vec4 ray){
    ray = fract(ray);
    vec3 coords, tint;

    // Select the three coordinates parallel to the wall,
    // and a tint color identifying the wall's grid orientation.
    if(dim == 1 || dim == -1){ coords = ray.yzw; tint = red; }
    else if(dim == 2 || dim == -2){ coords = ray.xzw; tint = green; }
    else if(dim == 3 || dim == -3){ coords = ray.xyw; tint = blue; }
    else if(dim == 4 || dim == -4){ coords = ray.xyz; tint = yellow; }

    float h = julia(coords, u_seed);
    if(h == 0.0){
        return mix(tint/16.0, grey, layered_noise(coords, 3, 3));
    }

    vec3 base = texture2D(u_colorscale, vec2(h, 0.5)).rgb;
    return mix(tint/8.0, base, layered_noise(coords, 5, 3));
}

It starts with a basic case analysis to pick some parameters based on which direction we hit the wall from, and thus which kind of hypercube boundary cell we're calculating the texture for.

Then, the basic texture is drawn from a 3D generalization of a Julia fractal. In order to break up the psychedelic color bands produced at the edges, we layer on top of this a bunch of 3D simplex noise at multiple frequencies; the layered_noise function (implementation not shown) takes a start and a length for the range of octaves of basic noise that should be added together.
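
For illustration, here's a rough JavaScript sketch of what that kind of octave layering might look like. It assumes some 3D simplex noise function snoise(x, y, z) returning values in [0, 1]; the actual shader implementation is in GLSL and isn't shown in the post, so treat this as a sketch of the idea rather than the real code.

// Sum a range of noise octaves: each octave doubles the
// frequency and halves the amplitude of the one before.
// `start` and `len` select the range of octaves, as above.
// Assumes a hypothetical snoise(x, y, z) in [0, 1].
function layered_noise(coords, start, len){
    let total = 0.0;
    let weight = 0.0;
    for(let i = start; i < start + len; i++){
        const freq = Math.pow(2, i);
        const amp = 1.0 / freq;
        total += amp * snoise(coords.x * freq, coords.y * freq, coords.z * freq);
        weight += amp;
    }
    return total / weight; // normalize back into [0, 1]
}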

In principle, we could produce a different 3D texture for each of the 8 cubes that form the boundary of each 4D cell. In fact, with the full coordinates of the ray endpoint, we could even produce different textures for every cube in the entire maze. In practice, however, that turns out to be more code for little benefit. Since the appearance of any particular wall depends heavily on your precise 4D position and orientation, revealing a very specific 3D slice of the 4D whole, having unique textures for every cube in the maze doesn't really seem to help that much with identifying your position.

So, to help with orientation, instead of calculating a completely different texture for every 3D face of a hypercube, we just add a tint of a particular color to the basic texture to identify the orientation of each cube with respect to the grid.

That ends up producing views something like this:


Note that not all of the walls in this scene seem to be at right angles! This despite the fact that the world consists of nothing but maze walls aligned with a perfect right-angled hypercubical grid.

The reason for this is fairly straightforward; the viewing volume isn't parallel with the wall grid. Just like an oblique 2D slice through a 3D cube can end up with a hexagonal outline, a 3D slice through a 4D cube can also end up producing an outline with 2D segments meeting at all sorts of weird angles.

Tuesday, January 3, 2017

Implementing 4D rotation

In my last post, I explored various options for allowing a player to control their orientation in 4D space. Now, I want to look at how 4D rotations can be implemented, regardless of the control scheme.

For my 4D maze game, the only thing that needs to be rotated is the virtual camera that provides the player's first-person point of view. Dealing only with that eliminates a lot of accidental complexity that would be involved in applying arbitrary rotations to models in the world, but the procedures described here can be extended for that purpose.

The first thing to decide is how to represent the player's current orientation. This could be done with hyperspherical coordinates, which work pretty much just like regular spherical coordinates but with some extra angles added. However, working with hyperspherical coordinates gets really confusing really fast, and they suffer from the same limitations as spherical coordinates regarding gimbal lock. So, we'll use a different representation: a set of four unit vectors which define a basis for an egocentric coordinate system, in terms of the game map coordinate system.

While the game map coordinates are given in terms of x, y, z, and w axes, the egocentric coordinate system we use for orienting the camera translates these into forward, right, up, and ana (or F, R, U, A) components. It's important to make sure these basis vectors for the egocentric coordinate system always remain mutually perpendicular (i.e., that they form an orthonormal basis); that means there is some redundancy in this system. It would be possible to use a leaner representation, such as 3 basis vectors where the 4th can be recovered computationally as needed, or a pair of right-isoclinic and left-isoclinic quaternions, but despite the redundancy the use of explicit basis vectors makes the algorithms for arbitrary rotations, and the math employed therein, much easier to follow, and thus less prone to implementation error and easier to debug.
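
As a concrete starting point, the egocentric basis might be initialized aligned with the map axes, something like this (an illustrative default, not necessarily the game's actual starting orientation):

// Egocentric basis vectors, expressed in map coordinates.
function Player(){
    this.f = {x: 1, y: 0, z: 0, w: 0}; // forward
    this.r = {x: 0, y: 1, z: 0, w: 0}; // right
    this.u = {x: 0, y: 0, z: 1, w: 0}; // up
    this.a = {x: 0, y: 0, z: 0, w: 1}; // ana
}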

The basic 3D rotations- pitch, roll, and yaw- correspond to movement in the forward-up (FU), up-right (UR) and forward-right (FR) planes, respectively. In order to execute, for example, a pitch up, we would want to rotate the forward and up vectors in the FU plane so that the new forward vector points a little more in the old up direction, and the new up vector points a little more in the old negative forward direction. Thus, rotation in a plane can be expressed in terms of rotating each of the basis vectors (or any vector that falls in the same plane) towards or away from another vector in that plane. Vectors in the perpendicular plane are unaffected by the rotation, so the remaining bases of our egocentric coordinate system can just be left alone, and there aren't any other vectors to worry about!

Therefore, the basic building-block we'll need to implement arbitrary rotations of our egocentric coordinates is a function that can rotate some vector in the plane defined by itself and a second, perpendicular vector. It turns out that this is much simpler than the general formula for rotating an arbitrary point parallel to an arbitrary plane. The code looks like this:

// Rotate a vector v in the vk plane,
// by angle t, assuming v & k are orthogonal
function vec_rot(v, k, t){
    let cos = Math.cos(t);
    let sin = Math.sin(t);
    return {
        x: v.x*cos + k.x*sin,
        y: v.y*cos + k.y*sin,
        z: v.z*cos + k.z*sin,
        w: v.w*cos + k.w*sin
    };
}

We gain quite a lot of simplifications when we can assume that all of our vectors are normalized and orthogonal!

A complete rotation of the camera requires two calls to this function- one for each of the basis vectors in the plane of rotation. If we did just one, they would no longer be orthogonal, and that would cause all kinds of problems! The code for executing a complete pitch rotation thus looks like this:

Player.prototype.pitch = function(t){
    let fprime = vec_rot(this.f, this.u, t);
    let uprime = vec_rot(this.u, this.f, -t);
    this.f = fprime;
    this.u = uprime;
};

Note that the first basis vector moves towards the second, while the second basis vector moves away from the first; thus, the second half-rotation uses a negative angle. Also note that this implementation re-calculates the same sine and cosine values (up to a sign difference) twice, so a slight efficiency improvement can be gained by inlining vec_rot and eliminating the extra calculations, at the cost of some code duplication and a little loss of clarity. The implementations for roll and yaw are the same, simply replacing f and u with u and r or f and r, respectively.

Having defined our 3D rotations in terms of planes rather than axes, the generalization to 4D rotations is trivial: just add three more rotation functions that act in the ana-axis planes: FA, RA, and UA. Arbitrary rotations in non-basis planes can then be achieved simply by calling these basic functions in series. It's not quite as efficient as building a 4D rotation matrix to do everything at once, but it's good enough most of the time, and very easy to understand and hack on.
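
For example, a rotation in the FA plane follows exactly the same pattern as pitch. A sketch, assuming the ana basis vector is stored as this.a (as in the renormalization code below):

// Rotate in the forward-ana (FA) plane.
Player.prototype.rotateFA = function(t){
    let fprime = vec_rot(this.f, this.a, t);
    let aprime = vec_rot(this.a, this.f, -t);
    this.f = fprime;
    this.a = aprime;
};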

There is, however, one problem with the system as it stands- due to the accumulation of floating-point errors, after a large number of rotations the basis vectors for our egocentric coordinates can become denormalized and non-orthogonal, which can cause a variety of strange rendering errors. Thus, it is prudent to periodically re-normalize our coordinate system. This consists of two parts: normalizing each basis vector, and ensuring that the basis vectors are all orthogonal to each other.

Normalizing a single vector is fairly easy:

function normalize(v){
    let {x, y, z, w} = v;
    let len = Math.sqrt(x*x + y*y + z*z + w*w);
    return {x: x/len, y: y/len, z: z/len, w: w/len};
}

And it's also not that hard to take a vector and get the next closest vector to it that is perpendicular to some other vector:

// subtract the projection of v onto k from v
function orthogonalize(v, k){
    let {x: vx, y: vy, z: vz, w: vw} = v;
    let {x: kx, y: ky, z: kz, w: kw} = k;
    let kk = kx*kx + ky*ky + kz*kz + kw*kw;
    let vk = vx*kx + vy*ky + vz*kz + vw*kw;
    let scale = vk/kk;
    return {
        x: vx - kx*scale,
        y: vy - ky*scale,
        z: vz - kz*scale,
        w: vw - kw*scale
    };
}

If we know that k is pre-normalized, this can be further simplified:

// subtract the projection of v onto k from v,
// where k is assumed to be normalized
function orthogonalize(v, k){
    let {x: vx, y: vy, z: vz, w: vw} = v;
    let {x: kx, y: ky, z: kz, w: kw} = k;
    let vk = vx*kx + vy*ky + vz*kz + vw*kw;
    return {
        x: vx - kx*vk,
        y: vy - ky*vk,
        z: vz - kz*vk,
        w: vw - kw*vk
    };
}

When combining these, it is important to remember to orthogonalize first, then normalize. Otherwise, orthogonalization will undo normalization. Orthonormalizing any two vectors is thus made very simple:
let fprime = normalize(this.f);
let rprime = normalize(orthogonalize(this.r, fprime));

But what about the remaining plane? In 3D, we could just generate a new one by calculating the cross-product, but a binary cross-product doesn't exist in higher dimensions, for the simple reason that there is no longer a unique (up to a sign) direction perpendicular to any two vectors. In the general case, we would need an algorithm for computing the null space of a matrix to figure out what the perpendicular vectors should be... but once again, we can take advantage of the fact that the vectors we need to be perpendicular to are themselves already orthonormalized to do some major simplifications.

Fixing the last two axes just consists of repeatedly calling orthogonalize for each of the previously-orthonormalized axes, and then normalizing the result.

// Ensure the egocentric basis is orthonormal
Player.prototype.renormalize = function(){
    let {f, r, u, a} = this;
    let fprime = normalize(f);
    let rprime = normalize(orthogonalize(r, fprime));
    let uprime = normalize(orthogonalize(
        orthogonalize(u, fprime), rprime));
    let aprime = normalize(orthogonalize(
        orthogonalize(
            orthogonalize(a, fprime), rprime), uprime));
    this.f = fprime;
    this.r = rprime;
    this.u = uprime;
    this.a = aprime;
};

The fact that this works may not be completely obvious at first glance. After all, how do we know that orthogonalizing the basis vector for the U axis against rprime won't mess up the previous orthogonalization against fprime? Well, we know that rprime is guaranteed to be orthogonal to fprime, because we constructed it to be so. That means that the projection of u onto rprime will also be orthogonal to fprime. Adding or subtracting a component orthogonal to fprime clearly cannot introduce any components to uprime that are parallel to fprime, and so we can conclude that orthogonalizing against rprime, or any vector which is also orthogonal to fprime, cannot cause uprime to become non-orthogonal to fprime. The same reasoning can be extended to prove the validity of the orthogonalization procedure for the A axis, and this procedure trivially generalizes to arbitrary higher dimensions as well.

As with the rotation itself, it is possible to gain some efficiency here at the cost of some code duplication, but since this operation need be performed only very rarely it has yet to become a bottleneck in my code.

Next: Building a 4D World

A Series of Options for 4D Game Controls

For several years, I have been working on-and-off on a 4D maze game. At any given point, your view is restricted to a 3D hyperplane sliced through the complete 4D world, such that the next part of the path may be out of sight due to its lying in a direction perpendicular to your current viewing volume. Successfully navigating the maze therefore requires the ability to perform rotations in 4 dimensions, to bring the extra coordinate axis into view.

As you might expect, there are a lot more ways to rotate in 4D than in 3D, which presents a problem in designing sensible & ergonomic video game controls.

How Rotation Works

Before getting to how a player might control their 4D orientation, it will be useful to go over some basics of how rotation actually works. In three dimensions, we are used to thinking of rotations as occurring about an axis. That, however, is a simplification unique to 3D space. In the generic n-dimensional case, rotations are defined by a set of planes.

In one dimension, there are no planes, so rotation is impossible. In two dimensions, there is exactly one plane, and rotations in that plane can be defined simply by their magnitude left or right. In 3 dimensions, however, you suddenly have 3 possibilities for basic planes of rotation, and an infinite number of linear combinations thereof. Specifically, one can rotate in the XY plane, the XZ plane, and the YZ plane. Whichever plane you choose, there is one axis left over, and the unique one-to-one pairing of axes and their perpendicular planes in three dimensions is what lets us describe rotations with axes. In 4D, however, there are six basic planes (XY, XZ, XW, YZ, YW, and ZW), and there is a whole additional plane perpendicular to any chosen plane of rotation- not a single unique axis. The simplest way of describing a rotation in 4 dimensions thus involves directly specifying the plane in which it occurs.

(Incidentally, the fact that any chosen plane is perpendicular to an entire additional plane means that 4D objects can have two independent rates of rotation in perpendicular planes, something not possible in 3D. But, that's a complication I can safely ignore for maze-navigating purposes.)

Straightforward Control Schemes

In 3D, we can often just assign one key to each direction of rotation in each basic plane; e.g., using WASD+QE for pitch, yaw, and roll. With six basic planes, however, that would require 12 keys. Now, that's doable; we could, for example, mirror the WASDQE arrangement on the other side of the keyboard, assigning the XW, YW, and ZW planes to UIOJKL. As far as I can tell, this is the maximally flexible scheme, allowing for linear combinations of any set of basic rotations.

But, we can cut down the number of keys used to just six again by exploiting the nature of 4D rotations: with one key for each of the four coordinate axes, and two keys for positive and negative rotation, the player can directly select which plane they want to rotate in. This (with some modifications to allow traditional controls for purely 3D motion) was the original scheme I implemented for my game.

The downside of the simple plane-selection scheme, however, is that it requires simultaneously pressing three keys to perform any rotation- two axis keys to define the plane, and a sign key for which way to rotate in that plane.


Alternatives for Reducing the Number of Keys


Another option is simply to assign one key to each of the six basic planes, again with two keys to determine the sign of the rotation. That's a total of 8 keys, requiring two key presses for each rotation. I worry, however, that remembering the key mappings for each plane might get annoying.

We can improve things a little more by shifting some of the responsibility for selecting planes onto the sign keys. If we have two sets of sign keys (say, the up/down and left/right arrow keys), then we only need half as many plane keys; a single key selects one plane to be controlled by the up/down arrows (or whatever else you pick for the sign keys), and another (preferably perpendicular) plane to be controlled by the left/right keys. That's a total of 7 keys, again requiring two simultaneous key presses for any rotation, and regaining some of the ability to compose basic rotations. If you choose a default set of planes for the sign keys, however, then you need only six keys total, as in the straightforward plane-selection scheme, but only 2/3 of all rotations require holding multiple keys, and you get the ability to compose some rotations.


Hybrid Approaches


In order to reduce a player's learning curve, it would be nice to ensure that normal, purely 3D rotations still use traditional controls. It turns out there's a very simple way to achieve that, extending the same logic used above. If we have six sign keys and a default selection of planes, then only one plane-selection key is required to switch the default three rotation planes out for the other three rotation planes. It is then trivial to choose the default planes to match 3D pitch, roll, and yaw. All 3D rotations then require only a single keypress, with an average of 1.5 simultaneous keys required for all rotations.
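
A sketch of what that kind of mapping might look like, with WASDQE as the sign keys; the specific key assignments and plane labels here are illustrative, not my game's actual bindings:

// Default planes give traditional 3D pitch, yaw, and roll;
// holding the plane-selection key swaps in the ana planes.
const defaultPlanes = { w: 'FU+', s: 'FU-', a: 'FR-', d: 'FR+', q: 'UR-', e: 'UR+' };
const anaPlanes     = { w: 'FA+', s: 'FA-', a: 'RA-', d: 'RA+', q: 'UA-', e: 'UA+' };

function rotationFor(key, planeShiftHeld){
    return (planeShiftHeld ? anaPlanes : defaultPlanes)[key];
}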

The current scheme I have implemented for my game is a slight expansion on this idea, with 3 plane-selection keys (for a total of 9 keys) that map the sign keys (implemented with WASDQE) to different sets of planes. The idea behind this design is to make it possible to create rotations in combinations of any two planes simultaneously, without requiring the full 12-key arrangement. Whether this is the best approach, however, I am still uncertain. It may be that, at least for this kind of game, there really isn't any great need for the ability to compose rotations in multiple basic planes, in which case the 7-key hybrid scheme may be better. On the other hand, if that is a desirable property, it might be best to simply go all-out with the full 12-key scheme.

Conclusion: More experimentation is required!

Next: Implementing 4D Rotation

Friday, May 13, 2016

Thoughts on Designing an AI for Hnefatafl

Hnefatafl is an ancient Scandinavian board game, part of a family of Tafl games played on varying board sizes and with slightly different rules for win conditions. The unifying thread of Tafl games is that they are played on odd-numbered square boards (between 7x7 and 19x19) with radially symmetric layouts, and feature asymmetric sides: one side has a king, starting in the center, whose goal is to escape, while the other side has about twice as many undifferentiated pieces, arranged around the edge, whose goal is to capture the king.

I like playing Hnefatafl, but not a whole lot of other people do these days; given the difficulty of finding good partners, I'd like to write an AI for it. This is not something I have done yet, so I'm not sure how well it will actually work, but these are my pre-programming musings on how I might want to approach it:

The basic ground-level approach to game AI is minimax search- you calculate every board position that you can get to from where you are, and every board position your opponent can get to from those positions, and so on, as deep as you can, and then you use some heuristic to figure out which of the resulting positions are "best" (max) for you and "worst" (min) for your opponent, and take the move that has the greatest probability of taking you to good positions later.
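
In sketch form, depth-limited minimax looks something like this; moves, apply, evaluate, and opponent are hypothetical helpers standing in for a real Tafl rules engine, with evaluate scoring positions from the maximizing player's point of view:

// Depth-limited minimax: returns the best score reachable
// from `state` and the move that leads toward it.
// e.g., minimax(start, 'attackers', 3, true)
function minimax(state, player, depth, maximizing){
    const legal = moves(state, player); // hypothetical rules engine
    if(depth === 0 || legal.length === 0){
        return { score: evaluate(state), move: null };
    }
    let best = null;
    for(const move of legal){
        const child = apply(state, move);
        const result = minimax(child, opponent(player), depth - 1, !maximizing);
        if(best === null ||
           (maximizing ? result.score > best.score
                       : result.score < best.score)){
            best = { score: result.score, move };
        }
    }
    return best;
}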

All of the pieces in Hnefatafl move like rooks; combined with the large board, this gives the game an extremely large branching factor; i.e., the number of possible positions you can get to in a small number of moves gets very, very big. This makes minimax game trees very difficult to calculate to any useful depth, because it just takes too long and uses too much memory. One simple optimization is to remember states that you've seen before, and what the best move was. Then, whenever you find a state you recognize, you can stop looking deeper, saving time and memory. This works best across multiple games, with the AI learning more states and getting better each time. It only really works well, though, if you can expect to come across repeated board states on a regular basis- something which becomes very unlikely when the total number of possible states is very, very large, as it is for large Tafl boards.

Fortunately, we can exploit symmetries to reduce the number of board states 8-fold: each position can be rotated into 4 orientations, and reflected, and the strategic situation remains the same. If we can define a criterion for choosing a canonical representation for any of these sets of 8 board states, then every time we generate a new possible move, we can put it in canonical form to compare against a database of remembered states and best moves from those states, and we just have to keep track of 3 bits of additional information to remember how to transform the board between internal representation and presentation, so that the board doesn't flip and rotate on-screen in case the states preceding and following a move have different canonical orientations.
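
A sketch of that canonicalization, representing the board as a square array of single-character cells: generate all 8 rotations and reflections, serialize each, and pick the lexicographically smallest as the canonical form. (The encoding details are illustrative.)

// Reduce a board to its canonical representative under
// the 8 symmetries of the square (4 rotations x reflection).
function canonicalize(board){
    const rotate = b => b[0].map((_, c) => b.map(row => row[c]).reverse());
    const reflect = b => b.map(row => [...row].reverse());
    const variants = [];
    let current = board;
    for(let i = 0; i < 4; i++){
        variants.push(current, reflect(current));
        current = rotate(current);
    }
    // The smallest serialization is the canonical form.
    const serialized = variants.map(b => b.map(row => row.join('')).join('/'));
    serialized.sort();
    return serialized[0];
}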

Unfortunately, the space of board states is still huge. For a 13x13 board (my personal favorite), there are something in the ballpark of 6x10^77 distinct board states, after accounting for symmetry. Some board states are enormously more likely than others, but even so, with such a large search space, the probability of coming across repeated states between multiple games after the first few moves is extremely low, making a database of previously-accumulated "good move" knowledge of marginal utility. So, we need to do better.

Like many board games, Hnefatafl gives rise to a large number of recognizable patterns that take up only a small part of the board, but disproportionately control the strategic situation. Trying to recognize entire boards is kind of dumb, because even if you have already figured out the best move from one position, that knowledge will be useless if you later come across a strategically-identical position that happens to have one inconsequential piece misplaced by one square. If the AI can recognize parts of boards as conforming to familiar patterns, however, there is less important stuff to memorize, and the memorized knowledge is much more generalizable.

So, how do we recognize patterns? Some patterns involve specific configurations of pieces in specific relative positions, while others just involve having a certain number of pieces that can attack a particular position, regardless of how exactly they are spaced out along a row or a column. To capture both kinds of information, we'll want to keep track of the pieces arranged in a small segment of the board, as well as a count of how many pieces (and of which type) are on each row and column leading out of that section. In my experience, a 4x4 square should be large enough to capture most common and important patterns. Using a 4x4 square means some duplication of storage for smaller patterns, but we can still make use of symmetry to reduce storage requirements and branching factors. There are generally three types of pieces (attackers, defenders, and the king), and generally three types of squares which may have different movement rules associated (normal squares, the king's throne in the center, and goal squares where the king escapes).
Given restrictions on how throne and goal squares can be positioned relative to each other, and the fact that there can only ever be one king, there are somewhere between 5380841 and 3.6x10^11 canonical 4x4 squares. The upper end of that range is still pretty huge, but without doing the detailed calculation, I suspect the exact number is much closer to the lower end. Given that some positions are still much more likely than others, this looks like a range where we can actually expect to come across repeats fairly often, without having to store an impractically large amount of data. Additionally, with this method it becomes possible to share strategic memory across multiple different Tafl variations!

Adding in row-and-column counts of course increases the number of variations; storing the maximum amount of information on row and column dispositions would increase the number of specific varieties by about 3^9 per row and column, plus 11 positions for a square vertically or horizontally, or a total factor of about 25548534 for 4x4 squares on a 13x13 board; smaller boards, of course, would use smaller subsets of that total set.

I'm not sure what the probability distribution of those extra states is, exactly; I suspect that they will be clustered, in which case the large total space is not a huge deal. But we can also probably get fairly good performance with a simplified model, which simply stores which side, if any, controls a particular row or column, and how many pieces that side has on that row or column before you reach an enemy or the edge. Just storing which side has control might give useful performance, and only multiplies the memory space by a factor of 3^8 = 6561 regardless of board size (one or the other side or nobody for each of 8 perimeter spaces), but my intuition about the game says that storing counts is actually important. It may take experimenting to see if it's really worth fragmenting the pattern memory in this way.
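
Concretely, extracting one such pattern might look something like the following sketch; the board encoding ('A' for attackers, 'D' for defenders, 'K' for the king) and the exact perimeter information retained are illustrative assumptions, using simple per-row piece counts rather than a full control model:

// Extract a 4x4 window anchored at (top, left), plus counts
// of attacker and defender pieces on each row leading out
// of it. (Column counts would be computed analogously, and
// the result would be canonicalized before lookup.)
function extractPattern(board, top, left){
    const window = [];
    for(let r = top; r < top + 4; r++){
        window.push(board[r].slice(left, left + 4));
    }
    const rowCounts = [];
    for(let r = top; r < top + 4; r++){
        const outside = board[r].slice(0, left)
            .concat(board[r].slice(left + 4));
        rowCounts.push({
            attackers: outside.filter(p => p === 'A').length,
            defenders: outside.filter(p => p === 'D' || p === 'K').length
        });
    }
    return { window, rowCounts };
}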

In any case, this general approach won't capture every strategically-important pattern, but it will get most of them. A few large-scale patterns, like building walls across the board, could be hand-coded, but they might end up being automatically recognized as advantageous just due to the positive influence of many overlapping small sections that are individually recognized as advantageous.

Now, we can use the database of board sections both to evaluate the strength of a particular position without having to search deeper to find out how it turns out in the end, and to remember the best move from a particular familiar position without having to calculate all possible moves or recognize entire boards.

For evaluating positions, pattern segments can be assigned weights depending on their prevalence and proximity to wins and losses- when the AI wins, it can examine all patterns in the series of states for that game, and assign positive values to them, and vice-versa when it loses.

To pick good moves, it should be possible to prove, just by looking at a small section of the board, that moving certain pieces will always result in a worse position, or always result in a better position, and thus minimize the number of possible moves that need to be calculated to pick the actual best one.

Sunday, March 9, 2014

Parles: A Language With Less Irritating Superfluous Parenthesization

Well, I haven't blogged in rather a while. But, what with being in a Linguistics MA program, I've been feeling the need to take some me-time and do some Computer Science recently, and I might as well tell the world about it.

Building off my thoughts from Lisp Without Parentheses and Pointful Concatenative Programming, I've started working on a new language called Parles.

Most programming languages seem to me to have a rather inconsistent idea about evaluation order within a line- it usually goes left-to-right, the direction you read in, except when you have to jump backwards to the left to figure out what function to apply to all those argument expressions you just read left-to-right. Most concatenative languages are more straightforward- everything always goes left-to-right; but despite the wonderful consistency, this seems to cause Discomfort and Confusion to people coming from traditional Algol-style languages beyond what comes from just dealing with weird syntax (or the utter lack thereof- even less syntax than LISPs!). I have tried to resolve this issue in the design of Parles.

At its most basic level, Parles is essentially FORTH, read backwards: evaluation proceeds from right to left, with arguments evaluated before function calls (ignoring for a moment that all arguments *are* function calls), and appearing after them in the program text.

There are two ways to alter this basic behavior: semicolons and pipes.
The semicolon is the sequence operator; it is eliminated at parse time and causes everything parsed prior to the semicolon to be post-fixed to whatever comes after, up to the next semicolon or end-of-block. If you end every line with a semicolon, this will cause lines to be evaluated top-to-bottom, just as pretty much everybody ever expects them to be. Thus,

+ 1 2;
* 3 4;

is equivalent to

* 3 4 + 1 2

The trailing semicolon is irrelevant- it causes the entire program to be postfixed to the empty string, which is a no-op.

The pipe (|) operator behaves like a unix shell pipe- take the output from the stuff on the left, and feed it to the stuff on the right. It's essentially identical to the semicolon, except that it binds more tightly (re-ordering occurs up to the next pipe, semicolon, or block boundary) and encodes a type assertion that the output row type of the left side exactly matches the input row type of the right side. E.g., while

1 2 3; +

is totally valid, and results in "3 3" on the stack (with + popping 1 and 2 off the stack and replacing them with 3, and the original three coasting underneath untouched)

1 2 3 | +

is a type error, as the compound expression "1 2 3" has output type "int int int" while + expects only two ints as input. Pipe can be used to simulate infix operators, as long as there is a block boundary or semicolons to properly delimit the expressions whose types are to be matched. E.g.,

1 | + 2

is a valid infix version of

+ 2 1

There are three kinds of blocks, using three kinds of brackets: [], (), and {}.
[] blocks are called quotations and are taken pretty much straight from FORTH: they produce a function value whose type is equal to the type of the body. These are Parles's primary means of abstraction and delayed computation. In addition to all of the things that you would usually use functions for, they are necessary for using control-flow operators. For example, "if" in Parles is a function which takes a boolean and two quotations, and applies one or the other of them depending on the value of the boolean. Its usage looks like this:

if < 2 3
[print "true"]
[print "false"]

This is an excellent example of a circumstance where you do not want to end a line with a semicolon. Inserting semicolons would require re-ordering lines as follows (so that they will be properly un-reordered by the parser):

[print "false"];
[print "true"];
if < 2 3

which is the scary weird order that most FORTHs will make you write it in (modulo each line still being written backwards).

() blocks are argument blocks. They are evaluated immediately, but introduce a type assertion that the contained expression requires no arguments itself- i.e., takes nothing off the stack. Thus, they represent things that you would put parentheses around if you were writing in a traditional LISP. Indeed, you can write Parles to look like Scheme if you feel like putting parentheses everywhere you possibly can and not doing any interesting tricks with floating extra arguments on the stack. I considered extending the type assertion to require that the body terminate having placed exactly one value on the stack to serve as the return value, but plenty of LISPs already allow multi-value return, and it'd be a shame to eliminate that feature from a language to which it belongs so naturally.
Aside from organizing and documenting things for the sake of future readers of the code, the primary purpose of argument blocks is to annotate sections of code that can be executed in parallel due to a lack of input dependencies. A smart compiler would be able to identify such regions automatically, rather than just verify them when pointed out by a human, but that's Hard, so I probably won't implement it for a while, and putting in some clues to make your code easier to read is a good thing anyway.

{} blocks are just regular blocks, and they have no particularly special function except delimiting scope. As mentioned above, block boundaries limit semicolon and pipe re-ordering. In addition, however, all three kinds of blocks allow for an optional argument declaration that binds local variables by popping values off the stack. Thus, binding local variables is likely to be the most common use for regular blocks. That usage looks like this:

{a: str -> print print a a} "waka"

which binds "waka" to the string variable "a" and prints it twice ("wakawaka"). (This is slightly different from the syntax I played with in Pointful Concatenative Programming because I ended up needing the pipe character for something else.) All arguments (at least for now, until the type inference system is more advanced) must have a type declaration, and variables listed in the same order as the expressions which produced the values they bind to. E.g.,

{a: str b: num -> b a} "hi" 1

will bind "hi" to "a" and "1" to "b" (and swap their positions on the stack when it's done).

Quotations are a little bit special. Their argument binding behavior is just like what one would expect from anonymous functions in any other language: arguments are taken from the stack at the point of application, not at the point of creation (technically, this is true for all blocks, but for the others those are the same time). Quotations also form closures over their free variables that are bound in higher scopes.

So far, all functions (except built-in ones like "if") are anonymous. There is an "apply" operator for executing quotations stored on the top of the stack, but it is also possible to give quotations names by binding them to variables (as in pretty much every other language with first-class functions). In order to ensure equality between built-in and programmer-defined functions, using a variable that holds a quotation does not place that quotation value back on the stack, like accessing another value would; rather, it results in the application of the stored quotation. This means you can bind quotations to variable names and then use them just like the built-in words, without needing to use "apply". If you don't want them to be executed, just re-quote.

There is also one additional way to bind variables: using a lambda binding. This consists of the unicode lowercase letter lambda (λ) or a backslash (\) followed immediately by a name and type assertion. This is translated at parse time into a block binding a single variable whose scope extends left until the next explicit block boundary (or EOF). The intent is that you put these things at the beginning of a line terminated by a semicolon, kind of like a regular variable declaration in an Algol-style language, but you can put them anywhere you like. So, one might decide to declare a new function like this:

\sayhi [print "!" print print "hi "];
sayhi "Bob"

or

\sayhi [name: str -> print "!" print name print "hi "];
sayhi "Bob"

which will both print out "hi Bob!"

All variables are (for now) immutable, but can be shadowed as much as you like. That includes all of the built-in functions; so far, the only keywords or reserved symbols are: the various brackets, the argument arrow "->", the colon for introducing type assertions, the backslash and lambda characters, and the semicolon and pipe characters. Everything else is free game for redefinition by shadowing. Use this power wisely.

As of now, you sadly can't actually run any Parles programs; there's an outline of a virtual machine for it, but no code-generator. The parser and type-checker (a critical component of the compiler, as you can't figure out what hooks up with what unless you can figure out the in- and out-arity of every word) are working though, so you can at least parse and type-check Parles programs with the Python scripts in the Github repo.

Thursday, March 1, 2012

You Can't Buy Everything At The Bazaar- Yet

The first rule of good software is that it scratches the programmer's personal itch. Unfortunately, not everyone with an itch is a programmer. Programming exists as a profession precisely because not everyone with a problem is capable of effectively solving it themselves, and thus someone has to get paid for scratching other people's itches. This will always be the case, because no matter how widely computer literacy is spread throughout society, some people will always be better at it than others; the fact that almost everyone can read and write does not make novelists obsolete, and the same is true for good programmers. This seems to imply that there will always be bad software maintained by people who care only as far as they are paid. However, there is one way out: as the population of programmers expands, the probability that one of them shares whatever problem you may have grows. When the bazaar is big enough, one just has to solve the problem of finding the person who wants to make what you need.