
Monday, April 29, 2024

Solving the Game of 5000

The Game of 5000 is a game of dice, with an element of choice. You start by rolling six dice, and scoring as follows, taking the largest score possible:

Six-of-a-kind gives you 5000 points.
A straight (1,2,3,4,5,6) is 2000 points.
Two triples or three pairs are 1500 points.
Three ones are 1000 points.
Triple 2s are 200, triple 3s are 300, etc.
Individual 1s are 100 points.
Individual 5s are 50 points.

You then remove the scoring dice and consider the remaining dice. If you have no unused dice, you must roll again. Otherwise, you have a choice: keep your current score, or roll again with the remaining dice. If those dice score, you add them to your total, and repeat; but if you ever roll a zero, you lose everything and end your turn. The game ends when someone has reached a score of 5000 and the round has finished, so that all players have an equal number of turns.

The highest-scoring patterns are only available if you are rolling six dice, so lower-dice-count rolls become riskier. With 3, 4, or 5 dice, you can get single triples and collections of individual ones and fives. With one or two dice, you can only get individual ones and fives, and the probability of rolling a zero and losing everything is high.

So, when should you roll again, and when should you stay? To figure this out, we need to know the expected value of any given roll. That is, if you roll a certain number of dice, what is the average score you would get over a large number of rolls? That turns out to depend on what you have to lose, and to incorporate that we will need to know all the ways that you could score, and how many ways you could roll zero and lose everything.

For 1 die, you have 1/6th probability of rolling 100, 1/6th probability of rolling 50, and 2/3rds probability of rolling zero. The sum of all possible scores is 150. Multiplying scores by probabilities and adding them together, you get an expected value of 25, with a 2/3rds chance of scoring nothing and 1/3rd chance of scoring something. Note that the expected value, being an average, is not actually a possible score--the fewest points you can score at once is 50! The average comes out lower than any actual score precisely because two-thirds of the time you score nothing at all.

With 2 dice, things get a little more complicated. If we labeled the dice, there would be 36 different possible rolls; but in fact, the order or identity of the dice doesn't matter, so rolling 1,2 is the same, as far as scoring goes, as rolling 2,1. To make scoring easier, we can sort the dice results to put every possible roll in a standard form, and then account for the multiple ways to get each of those collections of numbers. There is one way out of 36 to get two 1s (200 points), and one way to get two 5s (100 points). There are 8 ways to get exactly one 1 (100 points), and 8 ways to get exactly one 5 (50 points). There are two ways to get one 1 and one 5 (150 points). And that leaves 16/36 ways to roll a zero. That works out to an expectation value of 50 points, a 4/9ths chance of rolling a zero, and 1800 as the sum of all possible scores.

To figure out the numbers for more dice, we can get a computer to enumerate all of the possible combinations of die values and automatically score them.
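
Here's a minimal sketch of that enumeration in Python; the names are mine, and the scorer implements the rules as I've listed them above (a different reading of the "largest score possible" rule could shift the big totals slightly):

    from collections import Counter
    from itertools import product

    def score(roll):
        """Best score for one roll (a tuple of die values); 0 if nothing scores."""
        counts = Counter(roll)
        best = 0
        if len(roll) == 6:
            if 6 in counts.values():
                best = 5000                     # six-of-a-kind
            elif len(counts) == 6:
                best = 2000                     # straight 1-6
            elif sorted(counts.values()) == [3, 3] or \
                    all(c % 2 == 0 for c in counts.values()):
                best = 1500                     # two triples or three pairs
        pts = 0
        for v, c in counts.items():
            triple = 1000 if v == 1 else 100 * v
            single = 100 if v == 1 else 50 if v == 5 else 0
            pts += (c // 3) * triple + (c % 3) * single
        return max(best, pts)

    def roll_stats(n):
        """(labeled rolls, zeros, sum of scores, expectation) for n dice."""
        scores = [score(r) for r in product(range(1, 7), repeat=n)]
        return len(scores), scores.count(0), sum(scores), sum(scores) / len(scores)

    for n in range(1, 7):
        print(n, roll_stats(n))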

At three dice, we can start getting higher scores. There are 6^3, or 216, labeled rolls, which collapse down to 56 distinct collections of die values. 60 of those rolls produce no score, giving a 5/18ths (or approximately 27.8%) chance of rolling a zero, an expectation value of ~86.81, and 18750 as the sum of all possible scores.

At 4 dice, there are 1296 labeled rolls but only 126 distinct rolls. 204/1296 rolls are zeros (17/108ths, or ~15.7%), the expectation value is ~141.3, and the sum of scores is 183150.

At 5 dice, there are 7776 labeled rolls, 252 distinct rolls, and 600/7776 zeros (25/324ths, or ~7.7%). The expectation value is ~215.5, and the sum of scores is 1675800.

With a full 6 dice, there are 46,656 labeled rolls, 462 distinct rolls, and 1080/46,656 zeros (5/216ths, or ~2.3%). The expectation value is ~402.2, and the sum of scores is 18763500.

To be really accurate, we should account for the fact that scoring on all six dice requires re-rolling, and the results of the next roll should be accounted for in the final value of that choice... and recursively, as you might use up all six dice again, with increasingly small probabilities but potentially forever. However, for practical purposes, that turns out not to matter!

Now, these initial values are only the expectation for a single roll with an initial score of zero. If you have already accumulated some previous score (which must be the case when rolling any number of dice other than six), then every zero-scoring roll actually costs you those accumulated points. Which means the expectation value of the next roll can become negative, depending on the combination of the probability of rolling a zero and the size of your existing score.

We can figure out the expectation value of a roll-in-context by replacing the zero scores with the negative of the current score, and recalculating the average. So, if 'a' is the current accumulated score, 'p' is the total number of labeled rolls, 'z' is the number of zeros, and 's' is the sum of positive scores, then the contextual expectation value can be calculated as

e = (s-az)/p
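
Or, in code, reusing roll_stats from the sketch above:

    # Expectation of one more roll of n dice with `banked` points at stake.
    def contextual_ev(n, banked):
        p, z, s, _ = roll_stats(n)
        return (s - banked * z) / p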

If the expectation value is positive, that means it's a good idea to roll; even if you have a less than 50% chance of scoring, the amount that you could score is high enough that it is worth the risk anyway. For example, with 3 dice and 500 points accumulated, e = (18750 - 500*60)/216 ≈ -52, so you should stay; with only 250 accumulated, e = (18750 - 250*60)/216 ≈ +17, so you should roll. But we can make the calculation simpler by doing a little algebra to see what the accumulated score must be to make the expectation value zero:

0 = (s-az)/p
0 = s/p-az/p
az/p = s/p
az = s
a = s/z 
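
We can get those cutoffs straight from the enumeration sketch above:

    # The break-even accumulated score for each dice count: rolling is
    # profitable below it, and a losing proposition above it.
    for n in range(1, 7):
        p, z, s, _ = roll_stats(n)
        print(n, s / z)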

We can work that out once for each number of dice, and get the following table of cutoff scores:

1 - 37.5
2 - 112.5
3 - 312.5
4 - ~897.79
5 - 2793.0
6 - ~17373.6

And now you can see why the rule about always re-rolling when you have used all six dice doesn't matter--in order to accumulate a single-round score higher than 17,373, you would need to, at minimum, make scoring rolls on all six dice 4 times sequentially, and nearly all of them would have to be six-of-a-kinds (even three six-of-a-kinds followed by a straight only gets you to 17,000). That is an incredibly unlikely event, so in practice you will never accumulate a single-round score higher than 17,373, which means the expectation value of rolling six new dice is always positive, and you should always do it, even if it wasn't required! 

At the other end, it is impossible to get down to 2 dice left without accumulating a score of at least 200: to get into that situation, you must have used up four dice scoring nothing but individual 5s--four 50s one die at a time, two 100s two dice at a time, or some mix of the two--and any other set of rolls will leave you with a higher accumulated single-round score. Thus, in any game scenario where you could roll 1 or 2 dice, the expectation value is always negative, and we can conclude that you should never roll fewer than 3 dice--unless someone else has already reached 5000 and you are on your last turn, in which case you just keep rolling as long as you can until you either get a higher score or go bust.

Since scores only come in multiples of 50, we can simplify the preceding table as follows, rounding the cutoffs down to multiples of 50 and eliminating the 3 trivial cases:

3 dice - 300 (~27.8% chance of loss)
4 dice - 850 (~15.7% chance of loss)
5 dice - 2750 (~7.7% chance of loss)

If you have 1 or 2 dice, always stay. For 2 dice, the chance of loss is less than 50%, but the potential gains are so small as to be not worth it. If you have 6 dice, always roll (you have to anyway). If you have 3, 4, or 5 dice, stay if your accumulated score is above the cutoff in the table; otherwise, roll. So, it turns out, executing the optimal strategy just requires memorizing 3 numbers!
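
Or, if you'd rather let the computer remember them, the whole strategy fits in a few lines (a sketch; the names are mine):

    # Roll when the accumulated turn score is at or below the cutoff for
    # the number of dice remaining; otherwise bank it and stay.
    CUTOFFS = {3: 300, 4: 850, 5: 2750}

    def should_roll(dice_left, banked):
        if dice_left in (1, 2):
            return False     # the potential gains never justify the risk
        if dice_left == 6:
            return True      # required by the rules, and positive-EV anyway
        return banked <= CUTOFFS[dice_left]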

Wednesday, January 10, 2024

A Language of Graphs

Recently I got thinking about syntax trees, and what a purely-written language might be like if it were restricted to the syntactic structures available to linearized spoken languages, but made those structures explicit in a 2D representation. Or in other words: a graphical (double-entendre fully intended) language consisting of trees--that is, graphs in which there is exactly one path between any two nodes/vertices--whose nodes are either functional lexemes roughly corresponding to internal syntactic nodes and function words in natural languages, or semantic lexemes corresponding to content words--but where, since the "internal" structure is made visible, content words are not restricted to leaf nodes!

Without loss of generality, and for the sake of simplicity, we can even restrict the visual grammar to binary trees--which X-bar theory does for natural languages anyway--although calling them "binary" doesn't make much sense if you don't display them in the traditional top-down tree format with a distinguished root node, since internal nodes can have up to three connections--one "parent" and two "daughters", which are a natural distinction in natlang syntax trees but completely arbitrary when you aren't trying to impose a reversible linearization on the leaf nodes! So, in other terms, we can say that sentences of this sort of language would consist of tree-structured graphs with a maximal vertex degree of 3.

I am hardly the first person to have thought up the idea of a 2D written language, but a common issue plaguing such conlang projects (including their most notable example, UNLWS) is figuring out how to lay them out in actual two dimensions; general graphs are only guaranteed to embed without crossings in three dimensions, and squishing them onto a plane often requires crossing lines or making long detours, or both. Even when you can avoid crossings, figuring out the optimal way to lay out a graph on the page is a very hard computational problem. Trees, however, have the very nice property that they are always planar, and trivial to draw on a 2D surface; if we allow cycles, or diamonds (same thing with undirected edges), it becomes much more difficult to specify grammatical rules that will naturally enforce planarity--which is why I've yet to see a 2D language project that even tries. Not only is it easy to planarize trees, there are even multiple ways of doing so automatically, so one could aspire to writing software that would nicely lay out graphical sentences given, say, parenthesized typed input. (Another benefit of trees is that they can be fully specified by matched-parentheses expressions, so we could actually hope to be able to write this on a keyboard!) And then we can imagine imposing additional grammatical rules and pragmatic implications for different standard layout choices--what does it mean if one node is arbitrarily specified as the root, and you do lay it out as a traditional tree? What if you instead highlight a root node by centering it and laying out the rest of the sentence around it? What if you center a degree-two node and split the rest of the sentence into two halves splayed out on either side?
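
As a tiny proof of concept, here is a sketch of reading such a matched-parentheses expression into a tree structure that a layout routine could then walk; the notation and names are invented for illustration, not a commitment:

    def tokenize(s):
        return s.replace("(", " ( ").replace(")", " ) ").split()

    def parse(tokens):
        """One expression: a bare label (a leaf), or (label child child ...)."""
        tok = tokens.pop(0)
        if tok != "(":
            return (tok, [])              # leaf node
        label = tokens.pop(0)
        children = []
        while tokens[0] != ")":
            children.append(parse(tokens))
        tokens.pop(0)                     # consume the closing paren
        return (label, children)

    # A toy sentence: one functional head with two daughters.
    tree = parse(tokenize("(sees (the cat) (a bird))"))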

The downside of trees is that semantic structure is not limited to trees; knowledge graphs are arbitrary non-planar graphs. But linear natural languages already deal with that successfully; expanding our linguistic medium from a line to a tree should still reduce the kinds of ambiguities that natural languages handle all the time. So, this sort of 2D language will require the equivalent of pronouns for cross-references; but they probably won't look much like spoken pronouns, and there's a lot more potential freedom in where you decide to make cuts in the semantic graph to turn it into a tree, and thus where pronouns get introduced to encode those missing edges--and those choices can probably be filled with pragmatic meaning on top of the implications of visual layout decisions.

Now, what should words--the nodes in these trees--look like? It seems to be common in 2D languages for glyphs to be essentially arbitrary logographs, perhaps with standard boundary shapes or connection point shapes for different word classes. The philosophy behind UNLWS, that it should take maximal advantage of the native possibilities of the written visual medium, even encourages using iconic pictorial expressions when feasible. But that's not how natural languages work; even visual languages (i.e., sign languages), despite having more iconicity on average than oral languages, have a phonological system consisting of a finite number of basic combinatorial units that are used to build meaningful words, analogous to the finite number of phonemes that oral languages string together into arbitrary words. Since we've already got a certain limited set of graphical "phonological" items necessary for drawing syntax trees, and constraint breeds creativity, why not just re-use those?


Here we have an idealized representation of the available phonemes / graphemes / glyphemes: a vertex with one adjoining edge, a vertex with 2 adjoining edges, and a vertex with 3 adjoining edges. On the left, the three -emic forms. On the right, the basic allographic variants. In all cases, absolute orientation and chirality don't matter--if you mirror the "y" glyph, it is still the same glyph. Note that "graph" and "grapheme" are standard terms in linguistics for the written equivalents of "phones" and "phonemes", but that's gonna get really confusing when we're also talking about "graphs" in the mathematical sense. "Glyph" also has a technical meaning, but I am going to repurpose it here to talk about the basic units of this 2D language. So, we have glyphs, glyphemes, and alloglyphs, which are composed into graphs to form lexemes and phrases. Having only 3 glyphemes to work with may seem extremely limiting, but the expanded combinatorial possibilities of 2D layouts vs. 1D strings make up for it.

While keeping syntax restricted to tree structures is the core idea of this language experiment, lexical items, which don't need to be invented and laid out on the fly, can be more general; we could allow them to be any planar graph. And just as syntax trees can be laid out in many different ways, we could say that lexical items are solely defined by their abstract graphs, which can also be laid out in many ways. But it turns out that recognizing the topological equivalence of two graphs laid out in different ways is a computationally hard problem (this is graph isomorphism, for which no efficient general algorithm is known)! If this language is to be usable by humans, that simply will not do. Thus, the layout for lexical items should be significant, up to rotation and reflection equivalence, so that their visual representations are easily recognizable. This doesn't require introducing any additional phonemic elements--the arrangement of phonemes and letters in one-dimensional natural language words also affects meaning, but we don't consider it "phonemic". Despite the Monty Python sketch about the guy who speaks in anagrams, spoken words are not just bags of sounds in arbitrary order, and written words are not just bags of letters--that's why, for example, "bat" and "tab" mean different things, and "bta" just isn't an English word at all. The spatial arrangement--which, in the case of natural language, works out to just linear order--matters a lot, and that sketch only works because it's precisely constructed to use close-enough anagrams with a lot of supporting context. So, what sort of glyphotactic rules should we have to determine the valid and recognizable arrangements of glyphs in 2D space?

With 3 edges per vertex, the most natural-seeming arrangement is to spread them out at 120-degree angles, and degree-2 vertices would sit nicely in a pattern with 180-degree angles (although we probably want to minimize those, since vertices are more noticeable if they are highlighted by a corresponding angle, rather than a straight line through them). That suggests a triangular grid, which can accommodate both arrangements. The idealized glyphemes and alloglyphs shown above are drawn assuming placement on such a triangular grid, with 60, 120, and 180-degree angles. (I will continue to refer to the features of glyphs in terms of 60, 120, and 180-degree angles, but these, too, are idealizations; in practice, non-equilateral grids might be used for artistic or typographic purposes--e.g., as an equivalent to italics--in which case these angle measurements should be interpreted as representing 1, 2, or 3 angular steps around a point in the grid.) So, words shouldn't be completely arbitrary planar graphs--they should be planar graphs with a particular layout on a triangular grid.

It does not make sense to extend a single grid across an entire phrase or sentence; the boundaries of trees grow exponentially, so you'd need a hyperbolic grid to do it in the general case, and hyperbolic paper is hard to come by (although laying out a sentence on a single common grid within, say, a Poincaré-disc model of the hyperbolic plane might be a neat artistic exercise). Maintaining a grid within a word is sufficient to maintain graphical recognizability, and breaking the grid is one signal of the boundary between lexicon and morphology on one side and syntax on the other.

Making an analogy to chemistry, I feel, as an aesthetic preference, that word-graphs should have a minimal amount of "strain". That is, glyphotactically valid layouts should use 120-degree angles wherever possible, and squish them to 60 degrees or spread them to 180 degrees only where necessary. So, where is it necessary?

  • 60-degree angles should only occur on 3-vertex triangles, the acute points of 4-vertex diamonds, or as paired 60-degree angles on the interior of a hexagon.
  • 180-degree angles should only occur adjacent to 60-degree angles, or crossing vertices at the centers of hexagons.
Additional restrictions:
  • All edges should be exactly one grid unit long--i.e., there are no words distinguished by having a straight line across multiple edges, vs. two edges with a 180-degree angle at a vertex in the middle.
  • Syntactic connections must occur on the outer boundary. I.e., you can't have a word embedded inside another word.
  • All vertices must have a maximum of three adjacent edges; thus, any word must have at least one exterior vertex with degree 2 or 1, to allow a syntactic edge to attach to it.
  • As they are nodes in a binary syntax tree, words can have at most 3 external syntactic connection points.

With those restrictions in place, here are all of the possible word skeletons of 2, 3, or 4 vertices:


I refer to these as "word skeletons" rather than full words because they abstract away the specification of syntactic binding points--and the choice of binding points may distinguish words (although they should probably be semantically-related words if I'm not being perverse!). Including all of the possible binding point patterns for every skeleton massively increases the number of possibilities, and it quickly gets impractically tedious to enumerate them all and write them down. Here are all of the word skeletons with 5 vertices:

And here are all of the word skeletons with 6 vertices:

And the number of possible 7-vertex words is... big. Counting graphs turns out to also be a hard problem, so I can't tell you exactly how fast the number of possible words grows, but it grows fast.

Now, I just need to start actually assigning meanings to some of these....

Tuesday, November 12, 2019

Asymmetric Cryptography For Those Who Only Know Arithmetic

This is the second in a series of posts adapted from old Facebook Notes. The original was published on February 8th, 2011.

Let's start with basic substitution cryptography, and take baby steps all the way to asymmetric public key cryptography.

The simplest sort of cipher, and one that nearly everyone is probably familiar with, is the alphabetic rotation cipher (like rot-13). The key is how many letters you're going to rotate by; i.e., for rot-13, you rotate by 13 letters, so to encrypt an 'a', you count 13 spaces along the alphabet from 'a' to get 'n'; to encrypt a 'w', you count 3 spaces up to the end of the alphabet and then the remaining 10 spaces from 'a' to get 'j'; and so forth. Mathematically, this operation is known as "modular addition", or "clock addition" (because numbers wrap around like the hours on a clock face). Except in the special case of rot-13 on the 26-letter roman alphabet, where adding 13 twice gets you back to where you started, the decryption function is different from the encryption function--you use modular subtraction instead of addition, counting backwards instead of forwards to reverse the encryption.
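
In code, the whole scheme is a couple of lines each way (a Python sketch; the names are mine):

    # Rotate each letter by `key` positions, wrapping around the alphabet.
    def rot_encrypt(text, key):
        return "".join(
            chr((ord(c) - ord('a') + key) % 26 + ord('a')) if c.isalpha() else c
            for c in text.lower()
        )

    # Decryption is modular subtraction: just rotating the other way.
    def rot_decrypt(text, key):
        return rot_encrypt(text, -key)

    assert rot_encrypt("w", 13) == "j"
    assert rot_decrypt(rot_encrypt("attack at dawn", 5), 5) == "attack at dawn"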

Now, rotation ciphers are so simple that they're only really useful at all if your potential attackers don't know that you're using one, but for most ciphers of this general category you assume that the functions are well known and that the basis for security is keeping the key secret. If anyone finds out your key, you're screwed. And keeping the key private is a big problem with this sort of encryption (which was pretty much the only kind of encryption around for most of history), because you have to share the key with the person you want to talk to, and the key has to be sent "in the clear" (because you haven't shared it yet, so you can't encrypt it).

Note the features of this kind of cryptography: the encryption and decryption procedures are public (everyone knows how to count letters in the alphabet), they are asymmetric (encryption involves adding, while decryption involves subtracting), and the key is private and symmetric (encryption and decryption use the same key).

Now, instead of keeping my key private, I could not worry about the key and just keep the decryption function private. In fact, I might just do away with a key entirely; an example of this is the use of invisible ink (although that's really a better example of steganography than encryption). The message can only be decrypted if you know the secret decryption function (chemical process) for the relevant type of ink. And if I'm content with one-way communication, I don't have to share the secret. I can make the key and the encryption function completely public, and just keep the really-hard-to-find decryption function to myself, and then anybody who wants to can send me a secure message that only I can decrypt. If I want to go two-way, then my partner will have to come up with his own set of public encryption and private decryption procedures. Using a rotation cipher isn't very practical here, because figuring out how to reverse modular addition with modular subtraction is really easy, so your secret function wouldn't stay secret for very long. You can imagine functions that are harder to figure out, but since finding new mathematical functions with the right properties is rather difficult, this system is not very practical. It does not make sense for everyone who wants to send private messages to have to do serious research in pure mathematics!

But what if we made the key asymmetric instead of (or in addition to) the encryption and decryption procedures? With a rotation cipher, one way to do that would be to say that if, e.g., my encryption key is 5, then my decryption key is 21, and we're always going to use modular addition, symmetrically, to encrypt and decrypt. You add 5 to each letter to get the ciphered text, and then you add 21 to each ciphered letter (completing a 26-space trip around the alphabet) to get back to where you started. Now, you can do one of two things:

  1. You can publish your encryption key, and keep the decryption key secret. That means other people can encrypt messages that only you can read.
  2. You can publish your decryption key, and keep the encryption key secret. That means you can write messages (digital signatures) that other people can prove came from you, because decrypting them with your published decryption key works, and no one else is capable of doing that kind of encryption.
Now, that eliminates the key sharing problem! You have to keep something secret, but it really can be secret--you never have to share your private key with anyone. But, by using your public key, your friend can write messages that only you can read, and by using your friend's public key, you can write messages that only they can read. And it's easy to come up with new keys--much easier than inventing whole new mathematical functions!
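
Continuing the rotation-cipher sketch from above, the key pair (5, 21) behaves exactly as advertised:

    # Both parties only ever *add*; 5 and 21 undo each other because
    # 5 + 21 = 26, one full trip around the alphabet.
    assert rot_encrypt(rot_encrypt("attack at dawn", 5), 21) == "attack at dawn"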

Of course, a system based on simple modular arithmetic still isn't all that secure, because finding an additive inverse is *really easy*; in normal arithmetic, you just stick a negative sign in front (i.e., the additive inverse of 5 is -5), and in modular arithmetic you just subtract from the size of the alphabet (so the inverse of 5 in an alphabetic rotation cipher is 26-5 = 21). As soon as you publish the fact that your decryption key is 21, anyone can do some simple subtraction and figure out that your private encryption key is 5, or vice-versa. But, there are lots of mathematical operators and functions which have the nice property that some numbers have inverses in relation to those operators--not just addition. For example, the inverse of 2 over multiplication is 1/2, etc. Now, finding multiplicative inverses isn't all that difficult either, but it's a little harder than just sticking a minus sign out front, so let's run through an example using multiplication as the encryption function:

Say my public key is 2. Multiplying letters doesn't make immediate sense like counting them does, so we need to convert our letters into numbers first; fortunately, computers already represent letters as numbers anyway. So, suppose you want to secretly send me the letter 'c', which we'll assign the number 3. So, you use the encryption function (*) and the key (2) to calculate the ciphertext (3*2=6), which you send to me.
My private key is the inverse of 2, which is 1/2.
I get the ciphertext 6 from you, and use the encryption function (*) with my private key (1/2) to recover the plaintext (6*1/2 = 3), which I can then interpret as the letter 'c'. Ta da! We have just accomplished encrypted communication without ever sharing a secret key.

By now, the basic proof-of-concept, that there is a way to communicate privately without ever having to arrange to share a secret with someone ahead of time, should be clear--as long as all of your adversaries are toddlers who haven't learned arithmetic. The multiplication-based system is still easy to crack, because everybody learns how to do division and multiply fractions in elementary school. But what if we did modular multiplication? Say I want to encrypt the letter 'n', which is the 14th letter of the alphabet; multiplying by my encryption key gets us 28--but there is no 28th letter of the alphabet! Instead, we wrap around and get the (28-26)=2nd letter of the alphabet--so 'n' encrypts as 'b'. What's the inverse of that? In this case, 20 does the job: 20*2=40, 40-26=14, 14='n'. That was a little harder to figure out, so we're getting somewhere... but it wasn't *that* hard to figure out (if it were, we wouldn't be able to figure out our own decryption keys ourselves, which would be kind of useless!), and it turns out that you aren't actually guaranteed to be able to find an inverse all the time--a true inverse only exists when the key shares no factors with the length of your alphabet (2 and 26 share a factor, which is why 20 only works here for even letter-numbers), and every key is guaranteed to work only if the length of your alphabet is a prime number (which 26 is not!). So you have to pick your keys carefully, which makes it easier for an attacker to just try all of the possibilities until they guess the right one.
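
Here's that idea in code, but over a prime-sized alphabet so that every key is guaranteed an inverse (a sketch; the 29-letter alphabet is made up for the example):

    # A toy modular-multiplication cipher.
    M = 29                     # prime, so every nonzero key has an inverse
    pub = 2
    priv = pow(pub, -1, M)     # modular inverse (Python 3.8+); here, 15

    plaintext = 14             # 'n', numbering from a=1
    ciphertext = (plaintext * pub) % M           # 28
    assert (ciphertext * priv) % M == plaintext  # multiply again to decrypt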

Real cryptosystems, therefore, get a little more complicated in a few ways. For one thing, although not all encryption functions depend on the alphabet size being a prime number, most do require it to be special in some way; additionally, we want keys to be big numbers, in a large range (not just 1 to 26), so that they are harder to guess. Thus, we use more complicated ways of turning strings of letters into numbers before doing arithmetic on them. And for another thing, we have to be fairly clever about finding special mathematical functions (more complicated than just modular multiplication) which have a special property that makes it easy to generate a pair of numbers that are inverses of each other, but really hard to figure out the inverse given just one number. These are called "trapdoor functions"--it's easy to fall through the trapdoor from one side, but really hard to climb back out from the other! If you can find such a function (like, say, modular exponentiation over a range defined by multiplying two large prime numbers), so that you can't figure out what my private key is despite having the function and the public key, then bam--you've got RSA public-key encryption & digital signing! If you know the prime numbers used to generate an RSA key, finding the inverse is trivial, so we can make keypairs for ourselves easily--but inverting a key without knowing how it was made requires factoring a large number, which is much, much more difficult than just doing division or multiplying fractions! And that is what keeps your messages over the internet safe.
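
To make that concrete, here is a toy RSA keypair with comically small primes; real keys use primes hundreds of digits long, but the arithmetic is the same:

    # The whole trapdoor: phi is easy to compute from p and q,
    # but hard to recover from n alone.
    p, q = 61, 53
    n = p * q                  # 3233: the public modulus
    phi = (p - 1) * (q - 1)    # 3120: requires knowing p and q
    e = 17                     # public exponent
    d = pow(e, -1, phi)        # 2753: private exponent, the inverse of e

    msg = 65
    cipher = pow(msg, e, n)           # encrypt with the public key: 2790
    assert pow(cipher, d, n) == msg   # decrypt with the private key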

NOTE: In these toy examples, and in actual RSA, the private key is the inverse of the public key, and the public key is also the inverse of the private key. This makes them interchangeable, and lets you do signing & encryption at the same time (that's generally considered a bad idea, but it can be done). However, while this seems intuitive and natural, it is not necessarily the case. There do exist functions for which a = inverse(b) does not imply that b = inverse(a), in which case the keys are not interchangeable.

Also note that if you can factor large numbers easily, RSA becomes really easy to crack, just like our toy examples using addition and multiplication (that's what all the fuss about quantum computers is about). But while RSA was the first public-key encryption system invented, it's far from the only one nowadays, and there are plenty of more complicated encryption systems, relying on functions not quite as accessible to an audience who only knows arithmetic, in which finding the inverses of public keys is much, much harder still.