Friday, May 13, 2016

Thoughts on Designing an AI for Hnefatafl

Hnefatafl is an ancient Scandinavian board game part of a family of Tafl games played on varying board sizes and with slightly different rules for win conditions. The unifying thread of Tafl games is that they are played on odd-numbered square boards (between 7x7 and 19x19) with radially symmetric layouts, and feature asymmetric sides- one side has a king, whose goal it is to escape, starting in the center, while the other side has undifferentiated pieces arranged around the edge who goal is to capture the king, and about twice as many of them.

I like playing Hnefatafl, but not a whole lot other people do these days; given the difficulty of finding good partners, I'd like to write an AI for it. This is not something I have done yet, so I'm not sure how well it will actually work, but these are my pre-programming musings on how I might want to approach it:

The basic ground-level approach to game AI is minimax search- you calculate every board position that you can get to from where you are, and every board position your opponent can get to from those positions, and so on, as deep as you can, and then you use some heuristic to figure out which of the resulting positions are "best" (max) for you and "worst" (min) for your opponent, and take the move that has the greatest probability of taking you to good positions later.

All of the pieces in Hnefatafl move like rooks; combined with the large board, this gives the game an extremely large branching factor; i.e., the number of possible positions you can get to in a small number of moves gets very, very big. This makes minimax game trees very difficult to calculate to any useful depth, because it just takes too long and uses too much memory. One simple optimization is to remember states that you've seen before, and what the best move was. Then, whenever you find a state you recognize, you can stop looking deeper, saving time and memory. This works best across multiple games, with the AI learning more states and getting better each time. It only really works well, though, if you can expect to come across repeated board states on a regular basis- something which becomes very unlikely when the total number of possible states is very, very large, as it is for large Tafl boards.

Fortunately, we can exploit symmetries to reduce the number of board states 8-fold: each position can be rotated into 4 orientations, and reflected, and the strategic situation remains the same. If we can define a criterion for choosing a canonical representation for any of these sets of 8 board states, then every time we generate a new possible move, we can put it in canonical form to compare against a database of remembered states and best moves from those states, and we just have to keep track of 3 bits of additional information to remember how to transform the board between internal representation and presentation, so that the board doesn't flip and rotate on-screen in case the states preceding and following a move have different canonical orientations.

Unfortunately, the space of board states is still huge. For a 13x13 board (my personal favorite), there are something the ballpark of 6x10^77 distinct board states, after accounting for symmetry. Some board states are enormously more likely than others, but even so, with such a large search space, the probability of coming across repeated states between multiple games after the first few moves is extremely low, making a database of previously-accumulated "good move" knowledge of marginal utility. So, we need to do better.

As do many board games, Hnefatafl gives rise to a large number of recognizable patterns that take up only a small part of the board, but disproportionately control the strategic situation. Trying to recognize entire boards is kind of dumb, because even if you have figured out the best move from one position already, that will be useless if you later come across a strategically-identical position that happens to one inconsequential piece misplaced by one square. If the AI can recognize parts of boards as conforming to familiar patterns, however, there is less important stuff to memorize, and the memorized knowledge is much more generalizable.

So, how do we recognize patterns? Some patterns involve specific configurations of pieces in specific relative positions, while others just involve having a certain number of pieces that can attack a particular position, regardless of how exactly they are spaced out along a row or a column. To capture both kinds of information, we'll want to keep track of the pieces arranged in a small segment of the board, as well as a count of how many pieces (and of which type) are on each row and column leading out of that section. In my experience, a 4x4 square should be large enough to capture most common and important patterns. Using a 4x4 square means some duplication of storage for smaller patterns, but we can still make use of symmetry to reduce storage requirements and branching factors. There are generally three types of pieces (attackers, defenders, and the king), and generally three types of squares which may have different movement rules associated (normal squares, the king's throne in the center, and goal squares where the king escapes).
Given restrictions on how throne and goal squares can be positioned relative to each other, and the fact there can only ever be one king, there are somewhere between 5380841 and can be positioned relative to goal squares, which reduces the total a good bit, there are something like 3.6x10^11 canonical 4x4 squares. The upper end of that range is still pretty huge, but without doing the detailed calculation, I suspect the exact number is much closer to the lower end. Given that some positions are still much more likely than others, this looks like a range where we can actually expect to come across repeats fairly often, without having to store an impractically large amount of data. Additionally, with this method it becomes possible to share strategic memory across multiple different Tafl variations!

Adding in row-and-column counts of course increases the number of variations; storing the maximum amount of information on row and column dispositions would increase the number of specific varieties by about 3^9 per row and column, plus 11 positions for a square vertically or horizontally, or a total factor of about 25548534 for 4x4 squares on a 13x13 board; smaller boards, of course, would use smaller subsets of that total set.

I'm not sure what the probability distribution of those extra states is, exactly; I suspect that they will be clustered, in which case the large total space is not a huge deal, but we can also probably get fairly good performance with a simplified model, which simply stores which side, if any, controls a particular row or column, and how many pieces that side has on that row or column before you reach an enemy or the edge. Just storing which side has control might give useful performance, which only multiplies the memory space by a factor of 24 regardless of board size (one or the other side or nobody for each of 8 perimeter spaces) but my intuition about the game says that storing counts is actually important. It may take experimenting to see if it's really worth fragmenting the pattern memory in this way.

In any case, this general approach won't capture every strategically-important pattern, but it will get most of them. A few large-scale patterns, like building walls across the board, could be hand-coded, but they might end up being automatically recognized as advantageous just due to the positive influence of many overlapping small sections that are individually recognized as advantageous.

Now, we can use the database of board sections both to evaluate the strength of a particular position without having to search deeper to find out how it turns out in the end, and to remember the best move from a particular familiar position without having to calculate all possible moves or recognize entire boards.

For evaluating positions, pattern segments can be assigned weights depending on their prevalence and proximity to wins and losses- when the AI wins, it can examine all patterns in the series of states for that game, and assign positive values to them, and vice-versa when it loses.

To pick good moves, it should be possible to prove just by looking at a small section of the board that moving certain pieces will always result in a worse position, or always result in a better position, and thus minimize the number of possible moves that need to be calculated to pick that actual best.