Sunday, October 20, 2024

On the Tjugem Alphabet & Font

This Bluesky thread with Howard Tayler reminded me that, although I posted progress updates about it on Twitter back in the day, I never did a comprehensive write-up on how the thing works.

    A good place to start is this Reddit comment on Toki Suli. Yeah, it's not Tjugem, but phonetically it works the same way. Quote:

in the WAV files, the 'm' sounds seem to be going up rather than down, such as with "mi", even though the "m" is supposed to be grave. sharp and acute sounds seem to go down rather than up, such as in "tu".

is the linguistic term for "downward" vs "upward" the opposite of what i'd expect from a western music theory perspective? or am i maybe missing something as i'm listening to the files?

    Yes, Reddit user, you were missing something! Because in the phonetics of human whistle registers, "grave" and "acute" are positions, not motions. So, if you move from a vowel to a grave consonant, the formant will go down in pitch--from a middle-pitch vowel locus to a low-pitch consonant locus. But when going from a grave consonant to a vowel, pitch will go up--from a low-pitch consonant locus to a middle-pitch vowel locus. An "m" in between two vowels will be realized by a down-then-up formant motion, while a "t" between two vowels will be realized by an up-then-down motion.

    Now, because whistled speech only has a single formant, it turns out to be not-unreasonable to write whistled speech as an image of the formant path on a spectrogram. You can just write a continuous line with a pen! Or, almost. There are some details--like amplitude variation--that are lost if you try to write with a ballpoint, and still difficult to get right if you write with a wide-tip marker or fountain pen. Thus, a few extra embellishments and decorations are useful, but that is the basic concept: each letter is just the shape that that letter makes on a spectrogram when pronounced. And with just that background, you should be able to start to make sense of this chart of Tjugem letters, as they would be written on lined paper:


    The correspondence between Tjugem glyphs and the standard romanization is as follows:

    Keep in mind, however, that the actual phonemes are whistles--not sounds that are representable with the IPA, despite the fact that the romanization is designed to be pronounceable "normally" if you really want to. And for the sake of space, only the allographs for one vowel environment are shown for each consonant. The G glyph is not so much a "glyph" as a lack of one, which is why it does not show up in the first image; acoustically, the phoneme is just a reduction in the amplitude of a vowel, represented by a break in the line. Thus, any line termination could be interpreted as a G. That necessitated the introduction of the line termination glyphs, which have no phonetic value but just indicate that a word ends with no phonemic consonant. The above-line vs. below-line variants of the Q glyph are chosen to visually balance what comes before or after them. Additionally, the "schwa" vowel (romanized as "E") is not represented by any specific glyph. The existence of a schwa sound in the first place is an unavoidable artifact of the fact that transitioning between certain consonants requires moving through the vowel space, but which vowel loci end up being hit isn't actually important. So, in the Tjugem script, the schwa just turns into whatever stroke happens to make the simplest connection between adjacent consonants.

    You won't always be writing on lined paper, though, which is what the extra marks are for--a mark above or below a vowel segment tells you whether it is a high vowel or a low vowel, for those curves which could be ambiguous. And the circular embellishments help to distinguish manner of articulation for consonants which have the same spectral shape but different amplitude curves, which would otherwise have to be indicated by varying darkness or line weight. But note in particular that every consonant comes in a pair of mirror-symmetric glyphs: one moving from the vowel space to the consonant locus, and one moving from the consonant locus to the vowel space. And there are three different strokes for each half-consonant depending on which vowel is next to it! Making for a total of six different strokes for every consonant, because the actual spectral shapes of consonants change depending on their environment! It's allophony directly mirrored in allography.

    This makes creating a font for Tjugem rather... complicated. Sure, we could assign every allograph to a different codepoint, but that would be very inconvenient to use. It would be nice if we could just type out a sequence of phonemes, one keystroke per phoneme, and have the font take care of the allographic variation for us! Is that sort of thing possible? Yes! Yes, it is!

    The individual letter forms get assigned to a list of display symbols, specifying every possible consonant/vowel pairing:
# i_t i_d i_n i_k i_g i_q i_p i_b i_m
# a_t a_d a_n w_a_k j_a_k a_g w_a_q j_a_q a_p a_b a_m
# u_t u_d u_n u_k u_g u_q u_p u_b u_m
# t_i d_i n_i k_i g_i q_i p_i b_i m_i
# t_a d_a n_a k_a g_a q_a p_a b_a m_a
# t_u d_u n_u k_u g_u q_u p_u b_u m_u
# i_i j_a j_u_a j_u
# u_u w_a w_i_a w_i

and the slots for the romanized letters that we actually type out (a b d e g i j k m n p q t u w) are left blank. Contextual ligatures are then used to replace the sequence of input phonemes with an expanded sequence of intermediate initial, final, and transitional symbols, which are finally substituted by the appropriate display symbols to select the correct alloglyphs. Then, if we update the boring straight-ruled glyph set with a slanted, more flowy-looking version, we can get a calligraphic font slightly reminiscent of Nastaliq, where lines can overlap each other because the ornamentation disambiguates; the Tjugem Tadpole script:


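If the multi-step substitution sounds confusing, here is a rough sketch of the same logic in plain Python--not the font's actual OpenType rules, and with the glide-conditioned variants (w_a_k, j_a_q, etc.) omitted for brevity; the symbol names follow the display-symbol list above:

VOWELS = set("iau")
CONSONANTS = set("tdnkgqpbm")

def to_display_symbols(word):
    # Expand a typed phoneme sequence into display symbols: each
    # consonant contributes a vowel-to-consonant stroke and/or a
    # consonant-to-vowel stroke, selected by the adjacent vowels.
    out = []
    for i, ph in enumerate(word):
        prev = word[i - 1] if i > 0 else None
        nxt = word[i + 1] if i + 1 < len(word) else None
        if ph in CONSONANTS:
            if prev in VOWELS:
                out.append(f"{prev}_{ph}")  # stroke into the consonant locus
            if nxt in VOWELS:
                out.append(f"{ph}_{nxt}")   # stroke back out to the vowel
        elif ph in VOWELS and nxt in VOWELS:
            out.append(f"{ph}_{nxt}")       # vowel-to-vowel transition
    return out

print(to_display_symbols("matu"))  # ['m_a', 'a_t', 't_u']

The font performs that same expansion, but purely through chains of OpenType contextual substitutions, since a font cannot run arbitrary code.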

A Brief Note on John Wick

The actual Russian dialog in the John Wick movies is, uh... not great? But, the fact that John Wick is diegetically fluent in Russian ends up kicking off the plot of the first movie, when Russian gangster Iosef tries to buy John's car. Iosef asks how much, John says it ain't for sale, then, from the script:

                                              IOSEF
                         (in Russian, subtitled)
                     Everything's got a f[*****]g price.
                         
                                              JOHN
                         (in Russian, subtitled)
                     Maybe so... but I don't.

          Taken aback by John's fluency, he watches as John enters the
          vehicle, guns the engine, and drives off.

(Censored for sensitive eyes.)

However, that's not actually how it was filmed! The Russian dialog for that scene in the movie is as follows (or at least, my interpretation of it; the pronunciations are bad):

                                              IOSEF
                     У всего, сука, своя цена.
                         
                                              JOHN
                     А у этой суки нету.
This is closed-captioned as:

                                              IOSEF
                     Everything's got a price, b[***]h.

                                              JOHN
                     Not this b[***]h.

Which is not word-for-word, but essentially accurate. Given that Iosef did not expect John to understand him, we have to assume that his switch into Russian was expressing frustration to himself, even though it contains a vocative, clearly addressing the sentiment to John. Possibly, he was going to switch back into English to attempt another pitch, after reminding himself that everything has a price. And if that's what had happened, then this insertion of Russian dialog would've been just a bit of implicit character exposition, with a bit of an Easter Egg for a Russophone audience. But John responding at all suddenly changes the dynamic. That's also an implicit character exposition moment--we learn that John, despite being American, speaks Russian for some reason, which is further explicated later on. But in the scene, Iosef realizes that John must have understood him, and so must know that Iosef was insulting him! That turns the outcome of the interaction into a face-threatening issue. Now, in addition to still wanting the car which John has denied him, Iosef has to back up the implied threat of his insult to save face.

The change in dialog from the script also adds a layer of double meaning, because John has his (female) dog with him in the car. Thus, Iosef could be interpreted as insulting the dog (which--spoiler alert--he later kills), which John has a strong emotional attachment to. (It turns out the Russian word for "female dog" has exactly the same insulting double-meaning that it does in English!) Out of context, John's reply could even be interpreted as claiming that his dog is not for sale, as opposed to his car--and both interpretations are true! The same cannot be said about Iosef's statement, but the oblique association is a nice addition to the scene as filmed.



Wednesday, October 9, 2024

Newtonian Mechanics in 4+1 Dimensions

In the higher-dimensional universe of the world of Ord, most of Newtonian mechanics generalizes to 4 spatial dimensions (and 5 total dimensions when you include time--hence the 4+1 in the title) just fine. 

F = ma is still true when F and a are 4-component vectors instead of 3-component vectors, and so is p = mv for linear momentum. Squaring vectors still produces scalar quantities, so KE = (1/2)mv^2 is still true, and KE = p^2/2m still works just fine. Rotation occurs in a plane with some fixed center for all numbers of dimensions, so the formula for moment of inertia in a given plane, I = ∫ r^2 dm, is also still valid.

But when it comes to angular momentum and torque, we've got a problem. L = r × p and τ = r × F contain cross products, which only exist in exactly 3 dimensions. Usually, these are explained as creating a unique vector that is perpendicular to both of the inputs; but in less than 3D, there is no such vector, and in 4 dimensions or more, there is a whole plane (or more) of possible vectors. In reality, angular momentum and torque are not vectors--they are bivectors, oriented areas rather than oriented lines, which exist in any space of more than 1 dimension. It just happens that planes and lines are dual in 3D--for every plane, there is a normal vector, and for every vector there is a perpendicular plane, so we can explain the cross product as producing the normal vector to the plane of the bivector.

In 4D, you can't implicitly convert a bivector into its dual vector and back, so we have to deal with the bivectors directly. Bivectors are formed from the outer product or wedge product (denoted ∧) of two vectors, or the sum of two other bivectors. Thus, we can write the angular formulas for a point particle in any number of dimensions as L = r ∧ p and τ = r ∧ F. And those are good for orbital momentum and torque about an external point on an arbitrary body as well. To get spin, we need a sum, or an integral, over all of the components of an extended body. That means we need to be able to sum bivectors! That's easy to do in 2D and 3D; in 2D, bivectors can be represented by a single number (their magnitude and sign), and we know how to add numbers; in 3D, as we saw, bivectors can be uniquely identified with their normal vectors, and we can add normal vectors. In either case, you always get a simple bivector (also called a blade) as a result; i.e., for any bivector in 2D and 3D space, you can find a pair of vectors whose wedge product is that bivector. But in 4 dimensions and above, that is no longer true. This is because, once you identify a plane in 4+ dimensions, there are still 2 or more dimensions left over in which you can specify a second completely perpendicular plane which intersects the first at exactly one point (or zero or one points in 5+ dimensions), and there is no set of two vectors that can span multiple planes. This also means that there can be two simultaneous independent rotations, with unrelated angular velocities, and the formulas for angular momentum and torque must be able to account for arbitrary complex bivector values. You could, of course, just represent sums of bivectors as... sums of bivectors, with plus signs in between them. But that's really inconvenient, and if you can't simplify sums of bivectors, then those formulas aren't very useful for predicting how an object will spin after a torque is applied to it!

Fortunately, even though the contributions of multiple not-necessarily-perpendicular and not-necessarily-parallel simple bivectors will not always simplify down to a single outer product, it turns out that in 4 dimensions, any bivector can be decomposed into the sum of two orthogonal simple bivectors--and most of the time, the result is unique. Unlike vector / bivector addition in 3D, this is not a simple process of just adding together the corresponding components, but there are fixed formulas for computing the two orthogonal components of any sum of two bivectors. They are complicated and gross, but at least they exist! So, we can, in fact, do physics!
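To show that this is actually computable, here's a quick numerical sketch of one equivalent route (my own, not a transcription of those formulas): treat the bivector as an antisymmetric 4x4 matrix of components, and let a real Schur decomposition find the two orthogonal planes.

import numpy as np
from scipy.linalg import schur

def decompose_bivector(B):
    # Split a 4D bivector, given as an antisymmetric 4x4 component
    # matrix, into two orthogonal simple bivectors B1 + B2 = B.
    assert np.allclose(B, -B.T)
    # Antisymmetric matrices are normal, so the real Schur form is
    # block-diagonal: 2x2 blocks [[0, w], [-w, 0]], one per plane.
    T, Z = schur(B, output='real')
    parts, k = [], 0
    while k < 3:
        if abs(T[k + 1, k]) > 1e-12:      # a 2x2 rotation block starts here
            w = T[k, k + 1]               # magnitude of this simple component
            u, v = Z[:, k], Z[:, k + 1]   # orthonormal basis of its plane
            parts.append(w * (np.outer(u, v) - np.outer(v, u)))  # w * (u ^ v)
            k += 2
        else:
            k += 1                        # a zero eigenvalue; skip it
    while len(parts) < 2:
        parts.append(np.zeros((4, 4)))    # degenerate case: fewer than 2 planes
    return parts

# A non-simple bivector: 2 e1^e2 + e1^e3 + 0.5 e3^e4.
B = np.zeros((4, 4))
B[0, 1], B[0, 2], B[2, 3] = 2.0, 1.0, 0.5
B -= B.T
B1, B2 = decompose_bivector(B)
assert np.allclose(B1 + B2, B)
# A 4D bivector is simple iff B ^ B = 0, i.e. iff the Pfaffian of its
# component matrix vanishes; both components pass, the sum does not.
pf = lambda M: M[0, 1] * M[2, 3] - M[0, 2] * M[1, 3] + M[0, 3] * M[1, 2]
print(pf(B1), pf(B2), pf(B))  # ~0, ~0, 1.0

(When the decomposition is not unique, this just returns one of the equally valid choices--more on that case next.)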

The result of bivector addition does not have a unique decomposition exactly when the two perpendicular rotations have exactly the same magnitude. This is known as isoclinic rotation. With isoclinic rotations, you can choose any pair of orthogonal planes you like as a decomposition. Once you pick a coordinate system to use, there are exactly 4 isoclinic rotations, depending on the signs of each of the two component bivectors. In isoclinic rotation, every point on the surface of a hypersphere follows an identical path, and there is no equivalent of an equator or pole. Meanwhile, simple rotation results in a circular equator, but also a circular pole--i.e., a circle of points that remain stationary as the body spins. That circle is also the equator for the second plane of rotation, so the ideas of "equator" and "pole" become effectively interchangeable for any object in non-isoclinic complex rotation. One plane's equator is the other plane's pole, and vice-versa.

Looking ahead a little bit to quantum mechanics, particle spin in 4D is still quantized, still inherent, still divides particles into fermions and bosons--but has two components, just like the angular momentum of a macroscopic 4D object. Whether a particle is a boson or a fermion depends on the sum of the magnitudes of the two components. If the sum is half-integer, the particle is a fermion. If the sum is integer, then it's a boson. Thus, bosons can (but need not necessarily) have isoclinic spins, and the weird feature of quantum mechanics that the spin is always aligned with the axis you measure it in would not be so weird, because that's the case for isoclinic rotation of macroscopic objects, too! Fermions, on the other hand, can never have isoclinic spins, because if one component has a half-integer magnitude, the other must not. In both cases, however, there end up being four possible spin states for all particles with complex spins, allowing fermions to pack more tightly than they do in our universe; 2 spin states (as in our universe) for particles with simple spins; and of course only a single spin state for particles with zero spin.

Monday, August 12, 2024

Mapping out Tetrachromat Color Categories

Tetrachromacy is kind of convenient because you still have only 2 dimensions of hue, so you can actually diagram out what the color regions are, and just tell people "y'all already know how brightness and saturation work, so I don't need to put those on the chart".

    But I didn't actually try to make such a diagram. However, the last two episodes of George Corley's Tongues and Runes stream on Draconic gave me a solid motivation to figure out how to do it.

    Any such diagram will have to use some kind of false-color convention. We could try subdividing the spectrum to treat, e.g., yellow or cyan like a 4th physical primary for producing color combinations, and that might be the most accurate if you're trying to represent the color space of a tetrachromat whose total visual spectrum lies within ours, just divided up more finely--but the resulting diagrams are really hard to interpret. It's even worse if you try to stretch the human visible spectrum into the infrared or ultraviolet, 'cause you end up shifting colors around so that, e.g., what you would actually perceive as magenta ends up represented as green on the chart. The best option I could come up with was to map the "extra" spectral color--the color you can't see if it happens to be ultraviolet or infrared--to black, and use luminance to represent varying contributions of that cone to composite colors. Critically, if you don't want to work out the exact spectral response curves for a theoretical tetrachromatic creature to calculate their neurological opponent channels, you can map out the color space in purely physical terms, like we do with RGB color as opposed to, e.g., YCrCb or HSV color spaces. That doesn't require any ahead-of-time knowledge of which color combinations are psychologically salient.
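    As a concrete illustration of that convention--just a sketch of the idea, not the exact mapping used to generate the diagrams below--you could compute a display color from a four-channel (R, G, B, Q) stimulus like this:

def false_color(r, g, b, q):
    # Render the Q channel as "false black": the larger the share of
    # the total stimulus that comes from Q, the darker the display RGB.
    total = r + g + b + q
    if total == 0:
        return (0.0, 0.0, 0.0)
    scale = 1.0 - q / total
    return (r * scale, g * scale, b * scale)

print(false_color(1, 1, 1, 0))  # anti-Q: displays as white
print(false_color(0, 0, 0, 1))  # pure Q: displays as black
print(false_color(1, 0, 0, 1))  # half R, half Q: a darkened red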

    My first intuition on how to map out the 2D hue space was to arrange the axes along spectral hue--exactly parallel to the human sense of hue--and non-spectral hue, which essentially measures the distance between two simultaneous spectral stimuli. As the non-spectral hue gets larger, the space that you have to wiggle an interval back and forth before one end runs off the edge of the visible spectrum shrinks, so the space ends up looking like a triangle:

    This particular diagram was intended for describing the vision of RGBU tetrachromats, with black representing UV off the blue end of the spectrum; you could put black representing IR at the other end, but ultimately the perceivable spectrum ends up being cyclic so it doesn't really matter. If you want the extra cone to be yellow or cyan-receptive, though... eh, that gets complicated, and any false-color representation will be bad. But, that highlights a general deficiency of this representation: it does a really bad job of showing which colors are adjacent at the boundaries. The top-edge spectrum is properly cyclic, but otherwise the edges don't match up, so you can't just roll this into a cone.

    Another possible representation is based on the triangle diagram of trichromat color space:

    Each physical primary goes at the corner of a simplex, and each point within the simplex is colored based on the relative distance from each corner. This shows you both hue, with the spectrum running along the exterior edges, and saturation, with minimal saturation (equal amount of all primaries) in the center. We can easily extend this idea to tetrachromacy, where the 4-point simplex is a tetrahedron:

    The two-dimensional hue space exists on the exterior surface and edges of the tetrahedron, with either saturation or luma mapped to the interior space. Note that one triangular face of the tetrahedron is the trichromat color triangle, but the center of that face no longer represents white. If we call the extra primary Q (so as not to bias the interpretation towards UV, IR, or anything else), then the center of the RGB face represents not white, but anti-Q, which we perceive as white, but which is distinct from white to a tetrachromat. This is precisely analogous to how the center of the dichromat spectrum is "white", but what a dichromat (whose spectral range is identical to ours) sees as white could be any of white, green, or magenta to us. Similarly, what we see as white could be actual 4-white, or anti-Q.

    Since the surface of a tetrahedron is still 2D, we can unfold the tetrahedron into another flat triangle:

    Here, it is unfolded around the RGB face, but that is arbitrary--it could equally well be unfolded around any other face, with a tertiary anti-color in the center, and that would make no difference to a tetrachromat, just as spinning a color wheel makes no difference to you. Note that, after unfolding, the Q vertex is represented three times, and every edge color is represented twice--mirrored along the unfolded edges. This becomes slightly more obvious if we discretize the diagram:

    Primary colors at the vertices, secondary colors along the edges, tertiary colors (which don't exist in trichromat vision) on the faces. This arrangement, despite the duplications, makes it very easy to put specific labels on distinct regions of the space--although the particular manner in which the color space is divided up is somewhat artificial. And the duplications actually help to show what's going on with the unfolded faces--yes, the Q vertex shows up three times, but note that the total area of the discretized region around the Q vertex is exactly the same size as the area around the R, G, and B vertices.

    If we return to the trichromat triangle, note that you can obtain a color wheel simply by warping it into a circle; the spectrum of fully-saturated hues runs along the outside edge either way. Similarly, we can "inflate" the tetrahedron to get a color ball.

    If we want it flattened out again, any old map projection will do, but we have to keep in mind that the choice of poles is arbitrary; here's the cylindrical projection along the Q-anti-Q axis:

    And here's a polar projection centered on anti-Q:

    This ends up looking quite a lot like a standard color wheel, just extended past full saturation to show darkening as well as lightening; note the fully saturated ring at half the radius. However, the interpretation is quite different; remember, that center color isn't actually white. True tetrachromat white exists at the center of the ball, and doesn't show up on this diagram. And the false-color black around the edge isn't just background, it's the Q pole. If you need extra help to get your brain out of the rut of looking at this as a trichromat wheel, we can look at 7 other equally-valid polar projections that show exactly the same tetrachromatic hue information:

The Q pole.
The B pole.
The anti-B pole.
The R pole.
The anti-R pole.
The G pole.
The anti-G pole.
(I probably should've done some scaling for equal area on these; the opposite poles end up looking like they take up way more of the color gamut than they actually do, and the false-color-black Q pole ends up getting washed out as a result. But I don't really expect anybody to use these alternate projections for labelling regions of hue--they're just to help you understand that the space really is a sphere, not a wheel!)

    And we could produce alternately-oriented cylindrical projections as well, if we wanted to.

    Of course, the full tetrachromat color space still contains two more whole dimensions--saturation and luminosity. But those work exactly the same way as they do for trichromats. Thus, if you want to create separate named color categories for tetrachromatic equivalents of, say, brown (dark orange) or pink (light red), you can still place them on the map by identifying the relevant range of hues and then just adding a note to say, e.g., "this region is called X when saturated, but Y when desaturated".

    Now, go forth and create language for non-human speakers with appropriate lexical structure in color terms!

Friday, August 9, 2024

Some More Thoughts on Toki Pona

What the heck is Toki Pona?

After publishing my last short article, several people expressed interest in a deeper analysis of various aspects of toki pona--among them, Sai forwarding me a request from jan Sonja for one conlanger's opinion about how to categorize toki pona. So, I shall attempt to give that opinion here.

The Gnoli Triangle, devised by Claudio Gnoli in 1997, remains the most common way to classify conlangs into broad categories.


Within each of these three categories are numerous more specific classifications, but broadly speaking we can define each one as follows based on the goals behind a conlang's construction:

Artlang: A language devised for personal pleasure or to fulfill an aesthetic effect.

Engelang: A language devised to meet specific objective design criteria, often in order to test some hypothesis about how language does or can work.

Auxlang: A language devised to facilitate communication between people who otherwise do not share a common natural language. Distinct from a "lingua franca", a language which actually does function to facilitate communication between large groups of people without a native language in common.

Any given language can have aspects of all three of these potential categorizations. But, to figure out where in the triangle toki pona should fit, we need to know the motivations behind its creation.

To that end, I quote from the preface of Toki Pona: The Language of Good:

Toki Pona was my philosophical attempt to understand the meaning of life in 120 words. 

Through a process of soul-searching, comparative linguistics, and playfulness, I designed a simple communication system to simplify my thoughts.

I first published [Toki Pona] on the web in 2001. A small community of Toki Pona fans emerged.

In relation to the third point, in private communication jan Sonja confirmed that she never actively tried to get other people to use it. The community just grew organically. Even though the phonology was intentionally designed to be "easy for everyone", that tells me that the defining motivation behind toki pona was not that of an auxlang. In practice, it does sometimes serve as a lingua franca, but it wasn't designed with the intention of filling that role. It was designed to help simplify thoughts for the individual. Therefore, we can conclude that toki pona does not belong in the auxlang corner, or somewhere in the middle. A proper classification will be somewhere along the engelang-artlang edge--what I am inclined to call an "architected language" or "archlang" (although that particular term has been slow to catch on in anyone's usage but my own!).

So, what are the design criteria behind toki pona? Referring again to The Language of Good, toki pona was intended to be minimalist, using the "simplest and fewest parts to create the maximum effect". Additionally, "training your mind to think in Toki Pona" is supposed to promote mindfulness and lead to deeper insights about life and existence.

Toki Pona is also described as a "philosophical attempt"; can it then be classed as a "philosophical language"? I referred to it as such in my last post, and I think yes; it is, after all, the go-to example of a philosophical language on the Philosophical language Wikipedia page! The term "philosophical language" is sometimes used interchangeably with "taxonomic language", where the vocabulary encodes some classification scheme for the world, as in John Wilkins's Real Character, but more broadly a philosophical language is a type of engineered language designed from a limited set of first principles, typically employing a limited set of elemental morphemes (or "semantic primes"). Toki Pona absolutely fits that mold--which means it can be legitimately classed as an engelang as well.

However, Toki Pona was clearly not constructed entirely mechanistically. It came from a process of soul-searching and playfulness, and encodes something of Sonja's own sense of aesthetics in the phonology. Ergo, it is clearly also an artlang. Exactly where along that edge it belongs--what percentage of engelang vs. artlang it is--is really something that only jan Sonja can know, given these categorial definitions which depend primarily on motivations. But I for one am quite happy to bring it in to the "archlang" family.

To cement the artlang classification, I'll return to the "minor complexities" I mentioned in the last article. To start with, what's up with "li"? It is supposed to be the predicate marker, but you don't use it if the subject is "mi" or "sina"... yet you do for "ona", so it's clearly not a simple matter of "pronoun subjects don't need 'li'". But, if we imagine a fictional history for toki pona, it makes perfect sense. There is, after all, a fairly common historical process by which third person pronouns or demonstratives transform into copulas in languages that previously had a null copula. (This process is currently underway in modern Russian, for example.) So, suppose we had "mi, sina, li" as the "original" pronouns; "li", in addition to its normal referential function, ends up getting used in cleft constructions with 3rd person subjects to clarify the boundary between subject and predicate in null-copula constructions. Eventually, it gets re-analyzed as the copula, except when "mi" and "sina" are used, because they never required cleft-clarification anyway (and couldn't have used it if they did, because of person disagreement), and a new third-person pronoun is innovated to replace it--which, being new, doesn't inherit the historical patterning of "mi" and "sina", so you get a naturalistic-looking irregularity.

Or, take the case of "en". It seems fairly transparently derived from "and", and that is one of its glosses in The Toki Pona Dictionary, based on actual community usage, but according to The Language of Good it does not mean "and"--it just means "this is an additional subject of the same clause". Toki Pona doesn't really need a word for "and"; clauses can just be juxtaposed, and the particle "e" makes it clear where an object phrase starts, so you can just chain as many of those together as you want with no explicit conjunction. So, we just need a way to indicate the boundary between multiple different subject phrases. You could interpret that as just a kind of marked nominative case--except you don't use it when there's only one subject. It's this weird extra thing that solves a niche edge case in the basic grammar. A strictly engineering-focused language might've just gone with an unambiguous marked nominative, or an explicit conjunction, but Toki Pona doesn't. It's more complicated, in terms of how the grammatical rules are specified, than it strictly needs to be.

And then, we've got the issue of numerals. All numerals follow the nouns which they apply to, whatever their function--but that means an extra particle must be introduced into the lexicon to distinguish cardinal numerals (how many?) from ordinal numerals (which one?). That is an unnecessary addition which makes the lexicon not-strictly-minimalist. The existing semantics of noun juxtaposition within a phrase make it possible to borrow the kind of construction we see in, e.g., Hawai'ian, where using a numeral as the head of a noun phrase forces a cardinal interpretation (something like "a unit of banana", "a pair of shoes", "a trio of people", etc.), while postposing a numeral in attributive position forces an ordinal interpretation ("banana first", "shoe second", "person third"). But Toki Pona doesn't do that!

Finally, as discussed previously, the lexicon is not optimized. These are all expressions of unforced character--i.e., artistic choice.

But what if Toki Pona were an auxlang? How would it be different?

Well, first off, we'd fix those previous complexities. At minimum, introduce an unambiguous marked nominative (which also helps with identifying clause boundaries), unify the behavior of pronouns and the copula / predicate marker, and get rid of the unnecessary ordinal particle. Then, we look at re-structuring the vocabulary. I collected a corpus of Toki Pona texts, removed all punctuation, filtered for only the 137 "essential words", and ended up with a set of 585,888 tokens from which to derive frequency data. Based on this data set, 7 of the "essential words" appear zero times... which really makes them seem not that essential, and argues for cutting down the word list to an even 130. (Congratulations to jan Sonja for getting so close to the mark with the earlier choice of 120!) There are 72 two-syllable words that occur "too infrequently"--in the sense that there are three-syllable words that occur more frequently, and so should've been assigned shorter forms first. And similarly, there are 23 one-syllable words which are too infrequent compared to the two-syllable words. Honestly, predicting what these frequency distributions ought to be is really freakin' hard, so jan Sonja can't be blamed for these word-length incongruities even if she had been trying to construct a phonologically-optimized auxlang; but now we have the data from Toki Pona itself, so we could do better! Design a phonology, enumerate all of the possible word forms in order of increasing complexity, and then assign them to meanings according to the empirical frequency list!
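The analysis itself is simple enough to sketch out; the inline mini-corpus and word list here are just placeholders for my actual data:

import re
from collections import Counter

def frequency_report(corpus_text, essential_words):
    # Tokenize, keep only the essential words, and count them.
    tokens = [t for t in re.findall(r"[a-z]+", corpus_text.lower())
              if t in essential_words]
    freq = Counter(tokens)
    unused = set(essential_words) - set(freq)
    return len(tokens), freq, unused

# With the real corpus and the 137-word list, this reports 585,888
# tokens and 7 never-used words; comparing each word's frequency rank
# against its syllable count then exposes the length incongruities.
n, freq, unused = frequency_report("toki pona li pona", {"toki", "pona", "li", "en"})
print(n, freq.most_common(), unused)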

For that, of course, we need to define a new phonology. It needs to produce at least 129 words of three syllables or less (remember, we're dropping the ordinal particle), and not many more than that. Based on picking the most cross-linguistically common segments according to Phoible data, we can go with the following inventory:

i,a,u
n (/m), p, k, w (/v)

With a strict syllable structure of CV, that produces 12 monosyllables and 144 disyllables.
Cutting out w/v gives us 9 monosyllables and 81 disyllables--not enough to squish everything into two syllables or less. But there are 729 trisyllables--way more than we need! So, we could cut it down even more... But, that gets at a hard-to-quantify issue: usability. Aesthetics, it turns out, can be an engineering concern when engineering for maximal cross-cultural auxlang usability! Too few phonemes, and the language gets samey and hard to parse. Toki Pona as it is seems to hit a sweet spot in having some less-common phonemes, but sounding pretty good--good enough to naturally attract a speaker community. If I were doing this for real, I'd probably not just look at individual segments, but instead comb through Phoible for the features that are most cross-linguistically common, and try to design a maximally-large universally-pronounceable inventory of allophone sets based on that to give variety to the minimal set of words. But if we accept the numbers of phonemes, and accept their actual values as provisional, what happens if we enumerate words while also eliminating minimal pairs?

Well, then we get a maximum of 3 monosyllables (re-using any vowel would produce a minimal pair), well under a hundred disyllables, but plenty of trisyllables. It would be nice not to do worse than Toki Pona in average word length, though, which means we probably need 118 monosyllables + disyllables--we can get that pretty easily by relaxing the word-difference constraints such that we can have minimal pairs between, e.g., /n/ and /k/, which are extremely unlikely to be confused. Or, we just go up to 5 consonants instead of four, probably adding in something like j (/l).
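Here's a sketch of that enumeration with the provisional 4-consonant, 3-vowel inventory, using a greedy filter that rejects any word differing from an already-accepted word in exactly one segment (the exact counts depend on enumeration order):

from itertools import product

consonants = "npkw"
vowels = "iau"
syllables = [c + v for c in consonants for v in vowels]  # strict CV

def minimal_pair(w1, w2):
    # Two words of the same length differing in exactly one segment.
    return len(w1) == len(w2) and sum(a != b for a, b in zip(w1, w2)) == 1

lexicon = []
for n in (1, 2, 3):
    for word in map("".join, product(syllables, repeat=n)):
        if not any(minimal_pair(word, w) for w in lexicon):
            lexicon.append(word)

for n in (1, 2, 3):
    print(n, "syllables:", sum(len(w) == 2 * n for w in lexicon))
# With this ordering: 3 monosyllables, then a few dozen disyllables,
# then the rest of the lexicon has to come from trisyllables.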

I'm still not super inclined to add to the mountain of failed auxlangs and tokiponidos in the world... but, that's the process I would use to properly engineer an optimal auxlang-alternate to Toki Pona.


Friday, August 2, 2024

Some Thoughts on Toki Pona

Toki Pona is a minimalist philosophical artistic language, not an auxlang. Nevertheless, it has attracted a fairly large and international community of users--enough so that it was possible for Sonja Lang to publish a descriptive book on natural usage of Toki Pona (The Toki Pona Dictionary)! Thus, while this should in no way be seen as a criticism in the negative sense of Sonja's creation, it seems fair to critique Toki Pona on how optimized its design is as an auxlang.

Toki Pona has 92 valid syllables, composed of 9 consonants and 5 vowels. Accounting for disallowed clusters at syllable boundaries, this results in 7519 possible 2-syllable words--far, far more than any accounting of the size of Toki Pona's non-proper-noun vocabulary, which does not surpass 200 words. In developing the toki suli whistle register, I discovered that some phonemes can be merged without any loss of lexical fidelity--so even if we wanted to add additional restrictions like spreading out words in phonological space to eliminate minimal pairs, or ensuring that the language was uniquely segmentable, the phonetic inventory and phonotactic rules are clearly larger and more permissive than they strictly need to be. And a smaller phonemic inventory and stricter phonotactics would theoretically make it trivially pronounceable by a larger number of people. For example, we could reduce it to a 3-vowel system (/a i u/), eliminate /t/ (merging with either /k/ or /s/), and merge /l/ and /j/. More careful consideration in building a system from scratch, rather than trying to pare away at Toki Pona's existing system, could minimize things even further; but if we start there, and require that all syllables be strictly CV, we get 7x3=21 valid syllables and 441 valid 2-syllable words. We could rebuild a lexicon on top of that with no minimal pairs and unique segmentation just fine, or choose to make the phonemic inventory even smaller--all while still reducing the average Toki Pona word length, since the current vocabulary does include a few trisyllabic words!

The grammar, on the other hand, I really have no complaints about. It is not quite as simple as it could be (e.g., li could be made always obligatory, rather than obligatory-unless-the-subject-is-mi-or-sina), but it's really quite good--and the minor complexities actually help add to its charm as an artlang.

I am not much inclined to actually construct a phonologically-optimized relex of Toki Pona, as what would be the use? But it is fun to imagine an alternate history in which Toki Pona was designed from the outset with usage as an auxlang in mind. Would it actually have become as successful as it is, had Sonja taken that route? Perhaps we need to consider another contributor to Toki Pona's popularity--Sonja's specific phonological aesthetic. As mathematically sub-optimal as it is, Toki Pona sounds nice. Would it still have become popular if its sounds were instead fully min-maxed for optimal intercultural pronounceability, length, and distinctiveness? Maybe I'll build a Toki Pona relex after all, just to see if it can be made to sound pretty....


Monday, April 29, 2024

Solving the Game of 5000

The Game of 5000 is a game of dice, with an element of choice. You start by rolling six dice, and scoring as follows, taking the largest score possible:

Six-of-a-kind gives you 5000 points.
A straight (1,2,3,4,5,6) is 2000 points.
Two triples or three pairs are 1500 points.
Three ones are 1000 points.
Triple 2s are 200, triple 3s are 300, etc.
Individual 1s are 100 points.
Individual 5s are 50 points.

You then remove the scoring dice, and consider the remaining dice. If there are none left--every die scored--you must roll all six again. Otherwise, you have a choice: keep your current score, or roll again with the remaining dice. If those dice score, you add them to your total, and repeat; but if you ever roll a zero, you lose everything, and end your turn. The game ends when someone has reached a score of 5000, and the round has finished so all players have an equal number of turns.

The highest-scoring patterns are only available if you are rolling six dice, so lower-dice-count rolls become riskier. With 3, 4, or 5 dice, you can get single triples and collections of individual ones and fives. With one or two dice, you can only get individual ones and fives, and the probability of rolling a zero and losing everything is high.

So, when should you roll again, and when should you stay? To figure this out, we need to know the expected value of any given roll. That is, if you roll a certain number of dice, what is the average score you would get over a large number of rolls? That turns out to depend on what you have to lose, and to incorporate that we will need to know all the ways that you could score, and how many ways you could roll zero and lose everything.

For 1 die, you have a 1/6th probability of rolling 100, a 1/6th probability of rolling 50, and a 2/3rds probability of rolling zero. The sum of all possible scores is 150. Multiplying scores by probabilities and adding them together, you get an expected value of 25, with a 2/3rds chance of scoring nothing and a 1/3rd chance of scoring something. Note that the expected value, being an average, is not actually a possible score--the fewest points you can score at once is 50! The low average reflects the large probability that you score nothing at all.

With 2 dice, things get a little more complicated. If we labelled the dice, there would be 36 different possible rolls; but in fact, the order or identity of the dice doesn't matter, so rolling 1,2 is the same, as far as scoring goes, as rolling 2,1. To make scoring easier, we can sort the dice results to put every possible roll in a standard form, and then account for the multiple ways to get each of those collections of numbers. There is one way out of 36 to get two 1s (200 points), and one to get two 5s (100 points). There are 8 ways each to get exactly one 1 (100 points) or exactly one 5 (50 points). There are two ways to get one 1 and one 5 (150 points). And that leaves 16/36 ways to roll a zero. That works out to an expectation value of 50 points, a 4/9ths chance of rolling a zero, and 1800 as the sum of all possible scores.

To figure out the numbers for more dice, we can get a computer to enumerate all of the possible combinations of die values and automatically score them.
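For anyone who wants to check my numbers, here's a sketch of that enumeration; the scoring function is my encoding of the rules as stated above, so any ambiguity in them is resolved by my reading:

from itertools import product
from collections import Counter

def score(dice):
    # Largest available score for a single roll, per the rules above.
    counts = Counter(dice)
    if len(dice) == 6:
        if 6 in counts.values():
            return 5000                                 # six-of-a-kind
        if sorted(dice) == [1, 2, 3, 4, 5, 6]:
            return 2000                                 # a straight
        if sorted(counts.values()) in ([3, 3], [2, 2, 2]):
            return 1500                                 # two triples or three pairs
    total = 0
    for face, c in counts.items():
        if c >= 3:
            total += 1000 if face == 1 else 100 * face  # a triple
            c -= 3
        if face == 1:
            total += 100 * c                            # individual 1s
        elif face == 5:
            total += 50 * c                             # individual 5s
    return total

for n in range(1, 7):
    rolls = [score(r) for r in product(range(1, 7), repeat=n)]
    p, z, s = len(rolls), rolls.count(0), sum(rolls)
    print(f"{n} dice: {p} rolls, {z} zeros, score sum {s}, "
          f"EV {s / p:.2f}, cutoff {s / z:.1f}")

(The last column anticipates the cutoff score a = s/z derived below.)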

At three dice, we can start getting higher scores. There are 6^3, or 216, labelled rolls, which collapse down to 56 distinct collections of die values. 60 of those 216 rolls produce no score, giving a 5/18ths (or approximately 27.8%) chance of rolling a zero, an expectation value of ~86.81, and 18750 as the sum of all possible scores.

At 4 dice, there are 1296 labelled rolls but only 126 distinct rolls. 204/1296 rolls are zeros (17/108ths or ~15.7%), the expectation value is ~141.3, and the sum of scores is 183150.

At 5 dice, there are 7776 labelled rolls, 252 distinct rolls, and 600/7776 zeros (25/324ths or ~7.7%). The expectation value is ~215.5, and the sum of scores is 1675800.

With a full 6 dice, there are 46,656 labelled rolls, 462 distinct rolls, and 1080/46,656 (5/216ths or ~2.3%) zeros. The expectation value is ~402.2, and the sum of scores is 18763500.

To be really accurate, we should account for the fact that scoring on six dice requires re-rolling, and the results of the next roll should be accounted for in the final value of that choice... and recursively, as you might score on all sixes again, with increasingly small probabilities but potentially forever. However, for practical purposes, that turns out not to matter!

Now, these initial values are only the expectation for a single roll with an initial score of zero. If you have already accumulated some previous score (which must be the case for rolling any number of dice other than six), then all of the zero-scoring rolls actually cost you your accumulated points--effectively, they score negative. Which means the expectation value of the next roll can become negative, depending on a combination of the probability of rolling a zero and the value of your existing score.

We can figure out the expectation value of a roll-in-context by replacing the zero scores with the negative of the current score, and recalculating the average. So, if 'a' is the current accumulated score, 'p' is the total number of labeled rolls, 'z' is the number of zeros, and 's' is the sum of positive scores, then the contextual expectation value can be calculated as

e = (s-az)/p

If the expectation value is positive, that means it's a good idea to roll. In such a situation, even if you have a less than 50% chance of scoring, the amount that you could score is high enough that it is worth the risk. But, we can make the calculation simpler by doing a little algebra to see what the accumulated score must be to make the expectation value zero:

0 = (s-az)/p
0 = s/p-az/p
az/p = s/p
az = s
a = s/z 

We can work that out once for each number of dice, and get the following table of cutoff scores:

1 - 37.5
2 - 112.5
3 - 312.5
4 - ~897.79
5 - 2793.0
6 - ~17373.6

And now you can see why the rule about always re-rolling when you have used all six dice doesn't matter--since the game ends at 5000, you will never accumulate a score higher than 17,373, so the expectation value of rolling six new dice is always positive, and you should always do it!

At the other end, it is impossible to get down to 2 dice left without accumulating a score of at least 200; to get into that situation, you must have scored 50, losing one die each time, four times in a row, or scored two 50s, losing 2 dice each time, twice. Thus, in any game scenario where you could roll 1 or 2 dice, the expectation value is always negative, and we can conclude that you should never roll fewer than 3 dice unless someone else has already reached 5000 and you are on your last turn, in which case you just keep rolling as long as you can until you either get a higher score or don't.

Since scores only come in multiples of 50, we can thus simplify the preceding table as follows:

3 - 300 (~27.8% chance of loss)
4 - 850 (~15.7% chance of loss)
5 - 2750 (~7.7% chance of loss)

If you have 1 or 2 dice, always stay. For 2 dice, the chance of loss is less than 50%, but the potential gains are so small as to be not worth it. If you have 6 dice, always roll (you have to anyway). If you have 3, 4, or 5 dice, stay if your accumulated score is above the cutoff in the table; otherwise, roll. It turns out, executing the optimal strategy just requires memorizing 3 numbers!

Tuesday, March 19, 2024

Human Actors Shouldn't Be Able to Speak Alien Languages

Isn't it a little weird that humans can speak Na'vi? Or that aliens can learn to speak English? Or, heck, Klingon! The Klingon language is weird, but every single sound in it is used in human languages.

Of course, there's an obvious non-diegetic reason for that. The aliens are played by human actors. Actors wanna act. Directors want actors to act. It's less fun if all of your dialog is synthesized by the sound department. But while it is an understandable and accepted trope, we shouldn't mistake it for representing a plausible reality.

First, aliens might not even use sound to communicate! Sound is a very good medium for communication--most macroscopic animals on Earth make use of it to some extent. But there are other options: electricity, signs, touch, light, color and patterning, chemicals. Obviously, a human actor will not, without assistance, be able to pronounce a language encoded in changing patterns of chromatophores in skin, nor would a creature that spoke that language have much hope of replicating human speech. But since sound is a good and common medium of communication, let's just consider aliens that do encode language in sound.

The argument was recently presented to me that aliens should be able to speak human languages, and vice-versa, due to convergent evolution. An intelligent tool-using species must have certain physical characteristics to gain intelligence and use tools, therefore... I, for one, don't buy the argument that this makes humanoid aliens likely to start with, but supposing we do: does being humanoid in shape imply having a human-like vocal tract, or a vocal tract capable of making human-like noises? I propose that it does not. For one thing, even our closest relatives, the various great apes, cannot reproduce our sounds, and we can only do poor approximations of theirs. Their mouths are different shapes, their throats are different shapes, they have different resonances and constriction points. We have attempted to teach apes sign languages not just because they lack the neurological control to produce the variety of speech sounds that we do, but also because the sounds they can produce aren't the right ones anyway. Other, less-closely-related animals have even more different vocal tracts, and there is no particular reason to think they would converge on a human-like sound producing apparatus if any of them evolved to be more externally human-like. We can safely assume that creatures from an entirely different planet would be even less similar to us in fine anatomic detail. So, Jake Sully should not be able to speak Na'vi in his human body, and should not be able to speak English in his avatar body--yet we see Na'vi speaking English and humans speaking Na'vi all the time in those movies.

And that's just considering creatures that make sounds in essentially the same way that we do: by using the lungs to force air through vibrating and resonant structures connected with the mouth and nose. Not all creatures that produce sound do so with their breath, and not all creatures that produce sound with their breath breathe through structures in their heads! Intriguingly, cetaceans and aliens from 40 Eridani produce sound by moving air through vibrating structures between internal reservoirs, rather than while inhaling or exhaling--they're using air moving through structures in their heads, but not breath!

Hissing cockroaches make noise by expelling air from their spiracles. Arguably, this should be the basis for Na'vi speech as well: nearly all of the other animals on Pandora breathe through holes in their chests, with no obvious connection between the mouth and lungs. They also generally have six limbs and multiple sets of eyes. Wouldn't it have been cooler to see humanoid aliens with those features, and a language to match? But, no; James Cameron inserted a brief shot of a monkey-like creature with partially-fused limbs, no opercula, and a single set of eyes to provide a half-way-there justification for the evolution of Na'vi people who are just like humans, actually.

Many animals produce sound by stridulation. No airflow required. Cicadas use a different mechanism to produce their extremely loud songs: they have structures called tymbals which are crossed by stiff ribs; flexing muscles attached to the tymbals causes the ribs to pop, and the rest of the structure to vibrate. It's essentially the same mechanism that makes sound when you stretch or compress a bendy straw (or, as Wikipedia calls them, straws with "an adjustable-angle bellows segment"). This sound is amplified and adjusted by passage through resonant chambers in the insects' abdomens. Some animals use percussion on the ground to produce sounds for communication. Any of these mechanisms could be recruited by a highly intelligent species as a means of producing language, without demanding any deviation from an essentially-humanoid body plan.

There is, of course, one significant exception: birds have a much more flexible sound-production apparatus than mammals, and some of them are capable of reproducing human-like sounds, even though they do it by a completely different mechanism (but it does still involve expelling air from the lungs through the mouth and nose!) Lyrebirds in particular seem to have the physiological capacity to mimic just about anything... but the extent to which they choose to imitate unnatural or human sounds is limited. Parrots and corvids are known to specifically imitate human speech, but they do so with a distinct accent; their words are recognizable, but they do not sound like humans. And amongst themselves, they do not make use of those sounds. Conversely, intraspecific communication among birds tends to make use of much simpler sound patterns, many of which humans can imitate, about as well as birds can imitate us, by whistling. So, sure, some aliens may be able to replicate human speech--but they should have an accent, and if their sound production systems are sufficiently flexible to produce our sounds by different means, there is no reason they should choose to restrict themselves to human-usable sounds in their own languages. Similarly, humans may be able to reproduce some alien languages, but they will not sound like human languages--and when's the last time you heard a human actor in alien makeup whistling? (Despite the fact that this is a legitimate form of human communication as well!)

The most flexible vocal apparatus of all would be something that mimics the action of an electronic speaker: directly moving a membrane through muscular action to reproduce any arbitrary waveform. As just discussed, birds come pretty close to capturing this ability, but they aren't quite there. There are a few animals that produce noise whose waveform is directly controlled by muscular oscillation of a membrane, but they are very small: consider bees and mosquitoes, whose buzzing is the result of their rapid wing motions (or, in the case of bumblebees, muscular vibrations of the thorax). Hummingbirds are much bigger than those insects, and they can actually beat their wings fast enough to create audible buzzing sounds (hence, I assume, the name "humming"bird), but they are still pretty small animals. And despite these examples of muscle-driven buzzing, it seems rather unlikely that a biological entity--or at least, one which works at all similarly to us--could have the muscular response speed and neurological control capabilities to replicate the complex waveforms of human speech through that kind of mechanism. But if they did (say, like the Tines from Vernor Vinge's A Fire Upon the Deep), just like parrots and crows, why would their native communication systems happen to use any sounds that were natural for humans?

Now, some people might argue with my assertion that "any of these mechanisms could be recruited... as a means of producing language". That doesn't really impinge on my more basic point that an alien language should not reasonably be expected to be compatible with the human vocal apparatus, but let's go ahead and back up the assertion anyway. Suppose a certain creature's sound-production apparatus isn't even flexible enough to reproduce the kinds of distinctions humans use in whistled speech, based on modulating pitch and amplitude (which cicadas certainly can). Suppose, in fact, that it can produce only four distinct sounds. That should be doable by anybody that can produce sound at all--heck, there are more than 4 ways of clapping your hands. With 2 consecutive sounds, you can produce 16 distinct words. If you allow 3, it goes up to 80 words. At a word length of 4 or less, you've got 336 possible words. So far, that doesn't sound like very much. But then, there are 1360 possible words of length 5 or less, and 5456 of length 6 or less. At a length of 7, you get 21,840 possible words--comparable to the average vocabulary of an adult English speaker. The average length of English words is a little less than 5 letters, and we frequently (10 letters) use words that are longer than 7 letters, so needing to go up to 7 to fit your entire adult vocabulary isn't too bad. And that's before we even consider the ability to use homophones to compress the number of distinct words needed! So: we might argue about exactly how many words are needed for a fully-functional language with equivalent expressive power to anything humans use, but through the power of combinatorics, even small numbers of basic phonetic segments can produce huge numbers of possible words--indisputably more than any number we might come up with as a minimum requirement. A language with only four sounds might be difficult for humans to use, as it would seem repetitive and difficult to segment... but we're talking about aliens here. If 4 sounds is all their bodies have to work with, their brains would simply specialize to efficiently process those specific types of speech sounds, just as our brains specialize for our speech sounds.
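Those running totals are just sums of powers of four--words of 2 up to n sounds--which is easy to verify:

# Number of possible words of 2 up to n sounds, given 4 basic sounds.
for n in range(2, 8):
    print(n, sum(4**k for k in range(2, n + 1)))
# -> 16, 80, 336, 1360, 5456, 21840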

Now, to be clear, this is not intended to disparage any conlanger who's making a language for aliens and using human-compatible IPA sounds to do so. It's an established trope! And even if it's not ever used in a film or audio drama, it can be fun. There are plenty of awesome, beautiful examples of conlangs of this type, and there's no inherent problem with making more if that's what you want to do. Y'all do what you want. But we should not mistake adherence to the trope for real-world plausibility! And it would be great to see more Truly Alien Languages out there.

Saturday, March 2, 2024

On Mantis Shrimp, Butterflies, & Frogs

Previously, I discussed how to conceptualize the experience of organisms with different dimensionalities of their color spaces, along with a few other effects, like the varying color sensitivity across different parts of the retina as seen in rabbits. But, as hinted at by the mention of mantis shrimp at the end, multichromatic visual systems can actually get a lot weirder than that.

Even humans, and in fact most vertebrate animals you are likely to be familiar with, actually have a more complex visual system than our 3-dimensional color space implies. After all, we have four different types of light-sensing cells in our retinas, and yet we do not have tetrachromatic vision! What is that fourth type--the rods--doing? Most readers will probably already know that rod cells are what give us low-light vision. Most of the time there is very little, if any, interaction between rod cells and our three types of cones: either it is so bright that the rods are completely bleached and we only get visual information from cones (known as photopic vision), or it's too dark for cones to respond at all and we only get visual information from rods (known as scotopic vision). In those sorts of low-light situations, humans become monochromats--we physiologically cannot see color in dim light! Our brains, however, are very good at lying to us and filling in the colors that we know things should be. Unless, perhaps, you are a small child who does not yet have a whole lot of experience with what colors things should be--a situation which once led to an adorable experience with my oldest child when he was very small. Once, when he had woken up early in the morning, I found him playing in his bedroom with a pile of balls, sorting them into "black ball!" and "grey ball!"; then, when I turned on the light in the bedroom, he gasped and said "Oh! Color!"

Incidentally, it is possible for each of these parallel visual systems to fail independently, mostly due to genetic conditions that inactivate either rods or cones. Humans lacking cone cells are rod monochromats and experience day blindness; humans lacking functional rod cells are nyctalopic and experience night blindness. And while most mammals are at least dichromats, armadillos, anteaters, and tree sloths are all rod monochromats, as are 90% of deep-sea fish species.

In between these extremes, there is a range of mesopic vision, where both rods and cones have significant activity, and color perception gradually shifts as light levels get progressively darker or lighter. At no point do we incorporate rod cell data into the opponent process to get tetrachromatic vision, though; rod input is essentially used to augment luminosity information when cone cells start to struggle, causing a shift in the spectral sensitivity peak and altering apparent saturations.
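
To see that shift concretely, here's a toy sketch of my own (not a standard photometric model--the real CIE mesopic system is more involved) using Gaussians as stand-ins for the photopic and scotopic sensitivity curves:

```python
import math

# Toy model of mesopic sensitivity as a blend of the cone-based curve
# (photopic, peak ~555 nm) and the rod-based curve (scotopic, peak ~507 nm).
# The Gaussians are illustrative stand-ins for the real CIE V / V' functions.
def sensitivity(wl_nm: float, peak_nm: float, width_nm: float = 45.0) -> float:
    return math.exp(-((wl_nm - peak_nm) / width_nm) ** 2)

def mesopic_sensitivity(wl_nm: float, m: float) -> float:
    """m = 1.0 -> fully photopic; m = 0.0 -> fully scotopic."""
    return m * sensitivity(wl_nm, 555.0) + (1.0 - m) * sensitivity(wl_nm, 507.0)

# As m falls, the effective peak slides from 555 nm toward 507 nm: the shift
# in the spectral sensitivity peak mentioned above (the Purkinje shift).
for m in (1.0, 0.5, 0.0):
    peak = max(range(400, 701), key=lambda wl: mesopic_sensitivity(wl, m))
    print(m, peak)  # 555, ~531, 507
```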

Not all vertebrates handle dark vision in the same way, though, or have the same rod or cone sensitivity limits. Birds transition into scotopic vision at much brighter illumination levels than we do, as their color vision is sharpened by oil droplets that cut out noise at the tails of cone cell receptivity--but that also means that they waste more light, and so need more light to see color. Meanwhile, most nocturnal vertebrates rely heavily on rods and tend to reduce their color perception or lose it entirely (which is why so many mammals are dichromats, and as mentioned above some are even rod monochromats--we lost tetrachromacy when our ancestors were nocturnal, and occasionally re-evolved trichromacy in more recent eons). Nocturnal geckos, however, don't have rods at all--they rely entirely on cone cells, which have simply evolved to be more sensitive than ours, and so retain a constant sense of color perception across their entire perceptive range of luminosity. (This is probably because they evolved from diurnal ancestors who had already lost their rods, as most diurnal lizards and some snakes have.) In fancy terms, they have simplex retinas (containing receptors for a single integrated visual system), while we have duplex retinas (containing receptors for two parallel visual systems). Hawkmoths and nocturnal bees have parallel adaptations, with altered ommatidium geometry that improves light concentration onto individual receptors, for simplex trichromatic low-light vision. But that's less weird and complicated than humans--what about more weird?

Toads and frogs, it turns out, have multiple types of rod cells, which are sensitive to even lower levels of light than human rods are--which means they have genuine dichromatic vision in situations that would seem to us pitch black! In theory, there could be creatures that integrate their visual experiences across different light levels, using multiple rod types so that the brain has to lie less about what colors things should be in the daylight--but amphibians don't do that! Neither do they have a single 5-dimensional color space. Rather, they have two completely independent color spaces, one dichromatic and one trichromatic, overlapping the same frequency range, which could under the right conditions be perceived at the same time, but which generally show up in different environments and are used for different purposes. Frogs and toads use their cone-based vision to identify food and mates, but they use their rod-based vision exclusively for navigation, with dichromacy allowing them to better distinguish directions based on different colors of light sources. (Incidentally, they prefer to jump towards high-frequency sources in light conditions, and towards lower-frequency sources in dark conditions.)

Now, Mantis shrimp provide the most famous example of optical complexity, but plenty of arthropods have large numbers of opsin types. Even daphnia (water fleas), which don't have image-forming eyes at all, are tetrachromatic in their ability to respond to the colors of light sources! Does this mean that butterflies with 8 receptor types are octochromats, with a 6-dimensional hue space? Well, no, for the same reason that frogs aren't pentachromats. Like dichromatic rabbits, creatures with large numbers of photoreceptor types tend to have them localized in different parts of the visual field, to serve different purposes, and the signals are not neurologically combined to form a single coherent color space. Papilio butterflies, for example, which do have 8 different photoreceptor types, behave like tetrachromats when identifying flowers as food sources, but behave like dichromats (despite using 3 receptor types to form the relevant dichromatic retinal signals!) when selecting leaves for egg-laying. This kind of behavior-specific segmentation of visual systems means that in some species, different sexes actually have completely different visual systems, because they need them for different reproductive tasks! Which suggests some interesting sci-fi possibilities. And while daphnia are individually tetrachromatic, they have genes for many more than just 4 opsin types. If different sets of opsin genes were expressed in different individuals, the philosophical question "is what I call red really the same as what you call red?" would have an objectively-verifiable answer, as every different morph of the species (whether segmented by sex or caste or random variation) would have different color perceptions.

That brings us to the Mantis shrimp. With 12 different spectral receptor types, they could be doing a multiple-parallel-colorspace thing, like frogs and butterflies do. But... they aren't. As mentioned in that previous post, Mantis shrimp don't actually have particularly high spectral resolution, and they don't have the neural architecture to construct decorrelated opponent channels to produce a single perceptual color space. Instead, their large number of receptor types seems to exist precisely to avoid the need for that kind of complex neural architecture: the Mantis shrimp visual system is built for speed and efficiency. Because of the spatial distribution of different receptor types into bands across their compound eyes, getting a full spectral profile on any given object requires mechanical scanning, which is relatively slow, but metabolically cheap; and wherever a given object falls in the visual field, determining whether or not it matches the spectral sensitivity of that region is instantaneous.
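
As a cartoon of what that buys them, template matching is just a cheap per-channel comparison, with no color space ever constructed. A hedged sketch (the function and template are invented for illustration; only the 12-channel count comes from the biology):

```python
# Cartoon of the Mantis shrimp strategy: no decorrelated opponent channels,
# just direct comparison of each narrowly tuned receptor channel against a
# stored spectral template. All values below are made up for illustration.
def matches_template(activations: list[float],
                     template: list[float],
                     tolerance: float = 0.15) -> bool:
    # Instantaneous per-channel check: "does this spectral profile look
    # like the thing I care about?"
    return all(abs(a - t) <= tolerance for a, t in zip(activations, template))

# e.g. a hypothetical 12-channel template for a prey-like spectral signature
prey_template = [0.1, 0.2, 0.5, 0.9, 0.7, 0.3, 0.2, 0.1, 0.1, 0.4, 0.6, 0.2]
```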

If Mantis shrimp were conscious, we might imagine their experience of color as being more analogous to our own perceptions of sound or taste. Mantis shrimp don't recognize abstract colors--they recognize specific fuzzy spectral patterns. Similarly, we have thousands of auditory hair cells that each respond to a specific frequency, but we don't uniformly group them into a kilodimensional "sonic color" space--we can selectively pick out individual overlaid frequencies, or recognize particular spectral patterns as the timbres of specific known source types. Taste and smell are similar: we have more than 400 types of olfactory receptors and at least 5 taste receptors, but we don't have a 405-dimensional experience of taste and smell. (In fact, we don't know what the neurological dimensionality of human chemoreception is; to date, there is no model that can predict olfactory sensation from receptor activations.) Instead, we can pick out individual receptor channels that are useful for specific purposes (sour helps us identify acids; bitter helps us identify poisons; salty helps us identify, well... salt; sweet helps us identify carbohydrates; and umami helps us identify proteins), and we can recognize specific fuzzy patterns that form the chemical signature of specific source types. For a good long time, western philosophy held that smell was "ineffable", and impossible to describe in language through any means other than "smells like a specific thing"; that turns out to be a symptom of western philosophers just never bothering to try, though, and in fact there are many languages around the world which have generic olfactory terms disconnected from any specific source, just as we have generic color terms. Statistical analysis of those languages' vocabularies suggests that humans actually conceive of smells as arranged in a two-or-maybe-three-dimensional space, where the major axes are "edible vs. non-edible" and "pleasant vs. unpleasant" (or "dangerous vs. safe"). Thus, durian is unpleasant but edible, ammonia is unpleasant and inedible (and dangerous), flowers are (generally) pleasant but inedible, and fruits are pleasant and edible. Languages which have generic olfactory terms generally have 12-16 of them--similar to the maximum number of basic color terms found in human languages.
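
To make the geometry concrete, the four examples above sit in that space something like this (the coordinates are invented; only the quadrant assignments follow from the descriptions above):

```python
# The two-axis smell space described above, with each example in its quadrant.
smell_space = {
    #            pleasant  edible
    "durian":   (-0.6,     +0.8),
    "ammonia":  (-0.9,     -0.9),
    "flowers":  (+0.7,     -0.5),
    "fruit":    (+0.8,     +0.9),
}
```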

So, a conscious alien species with Mantis-shrimp-like vision, or even a large number of parallel multidimensional color systems like butterflies have, might experience their spectral perceptions not in an analogous manner to our experience of color, but collapsed down into a small number of behaviorally-relevant dimensions. Is this the spectral pattern of something I can eat? Is this the spectral pattern of something dangerous? Is this the spectral pattern of something useful to me? Is this the spectral pattern of a potential mate? Etc. And depending on how important vision is to their culture (vision doesn't have to be an alien's primary sense just because it's ours!), they may consider the categorization and naming of generic colors to be completely ineffable, or totally normal--just disconnected from the raw physiological inputs which exist below the level of conscious awareness. 

But could there be creatures with extremely high dimensional color vision? Aside from the lack of evidence that they exist on Earth implying that high-dimensional vision probably wouldn't evolve elsewhere either, there are some practical arguments for why they shouldn't exist. Dichromatic vision permits distinguishing between objects and areas exhibiting predominantly higher-frequency vs. predominantly lower-frequency light, which is useful for picking out objects against a background and for general navigation, as seen in amphibians; however, because dichromatic vision conflates hue and saturation, it is not reliable for picking out specific wavelengths. While trichromatic vision can still be fooled by pairs of inputs that are indistinguishable from monochromatic light, it at least provides the possibility of identifying a unique spectral peak, giving us perception of the spectral colors. A lot of animal behaviors rely on this ability, such as the aforementioned Papilio butterflies, which use a g-(r+b) opponent color signal to identify green leaves for egg laying, excluding objects which are too red or too blue; or apes and humans, whose trichromatic vision allows us to distinguish ripe, unripe, and overripe fruit (among other things!) as the peak reflectance shifts across the spectrum. Separating out the hue and saturation dimensions also gives us more information about the material properties of reflecting objects. So if trichromacy is already so much better, why are there so many tetrachromats in the world? Well... we don't know. There are probably multiple contributing factors: trichromacy is mostly-adequate for distinguishing most ecologically-significant variation in most natural spectra, but tetrachromacy does further reduce the possibility of spectral confusion. It may assist with color constancy--the ability to calculate what the color of a reflecting object "should" be under varying light conditions (although even dichromats can do that to some extent). Having more receptor types may provide better spectral resolution when covering a wider visual range--note that most tetrachromats can see further into the infrared and ultraviolet than we can. So perhaps pentachromacy or hexachromacy would be more useful to creatures that evolved in an environment with a different atmosphere that transmitted a wider band of potentially-visible light!
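
Since the Papilio example gives us an explicit formula, here it is in runnable form (the receptor activation numbers are invented; only the g-(r+b) structure comes from the research described above):

```python
# The g-(r+b) opponent signal: strongly positive means "green-dominated"
# (a plausible leaf); too much red or blue pushes it negative.
def leaf_signal(r: float, g: float, b: float) -> float:
    return g - (r + b)

print(leaf_signal(0.2, 0.9, 0.1))  # +0.6 -> green enough: candidate leaf
print(leaf_signal(0.8, 0.5, 0.1))  # -0.4 -> too red: reject
print(leaf_signal(0.3, 0.4, 0.7))  # -0.6 -> too blue: reject
```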

References:

Thresholds and noise limitations of colour vision in dim light
From spectral information to animal colour vision: experiments and concepts