Saturday, February 4, 2023

Some Thoughts On Glossing

A Note: This article has been edited from its original version to take into account feedback from David himself.

David J. Peterson has on several occasions been outspoken against Interlinear Glossing, particularly in the context of developing and documenting conlangs. I find that very strange, as I find glossing absolutely indispensable in language documentation, so after our last social media exchange on this topic, I decided to do some Deep Thinking.

And this is the part where I feel a lot like Brandon Sanderson expressing his dissatisfaction with Audible. I don't dislike DJP! I still have signed copies of his books on conlanging for kids and adults, and I am quite happy to provide those Amazon Affiliate links so other people can give him money for them! But on this one point at least, I think he is wrong, and I want to understand why, and what glossing is really good for.

To that end, I asked a bunch of people on social media about their opinions on glossing. You can see the raw responses on Facebook (group 1, group 2), Reddit, Quora, and Twitter. It turns out to be very difficult to get the question across effectively given the differing restrictions on message length and type on different platforms, but I still got some pretty useful data.

There are, I think, two major factors at play:

First, David does not believe in morphemes--or at least, does not find the concept of morphemes useful in language design. And that's fine! David is far from the only person to point out that morphemes aren't necessarily a great concept even in formal linguistics, or to propose alternative models of morphology.

Second, David works for an unusual audience. As essentially the world's only full-time professional conlanger for movies and TV, the primary audiences for his documentary output are:
  1. Actors, who have to be able to pronounce translated lines, but not necessarily understand what they mean.
  2. Set design artists, who need access to font files and translated text, and need to know what it looks like and how much space it will take up... but again, not what it actually means.
Now, I thought that these would be situations where, admittedly, full-on Leipzig-style interlinear glossing won't be particularly useful, and may in fact be an inconvenient distraction. I am, in fact, fully ready to admit that there are many situations in which interlinear glossing is not useful. However, while (as I would expect) they are not fully detailed Leipzig-style morphological glosses, David does provide phonetic and word-level interlinear glosses for actor lines, as you can see published on his website, and has said that this is "one of the only places I think it's useful." (Incidentally, William Annis also provides actors with interlinears so they don't stress the wrong words--but again, not with full morphological analyses. I don't know how, e.g., Marc Okrand, Paul Frommer, or Christine Schreyer work, but I would not be at all surprised to find that it's very similar.)

But, I suspect that David's unusual success in this particular context has led him to overgeneralize. Now, to be clear, I don't know that David has any problem with glossing in an academic linguistic context; to quote "When you're glossing to analyze, that's analysis. It has its place, and its place is in analysis, not creation." Thus, some of this might seem like attacking a strawman--but as far as I am concerned, conlang documentation is language documentation, and what's good for natlangs is good for conlangs, and vice-versa. In fact, I don't even particularly disagree with the statement that glossing is for analysis--but I do strongly disagree with the idea that analysis and creation must be considered distinct, and with the implication that analysis should be excluded from presentation, which is where conlang creation and natural language documentation collide.

In this reddit post (a response to an earlier revision of this article), David breaks down the problem as follows:
The problem I see is twofold:
  1. Biased morphological analyses (both betraying the framework being used, and how the language itself is being used).
  2. Taking something readable, like language data, and exploding it, so it's a mess.
These are not bad points. However, they should not be applied overly broadly. Thus, when David goes on to say that "In short, morphological glossing is for analysis, not for presentation--or for comprehension.", I must vehemently disagree--that is throwing the baby out with the bathwater. And while this looks initially like a very different breakdown of the issue than what I came up with, they actually line up pretty well--so let's take a look at my two points and David's two points together.

Regarding morphological analysis: It is true that any particular gloss at a level deeper than word-for-word will entail some kind of analysis, and an associated theoretical position. But does that mean you have to believe in morpheme theory to use glossing? No! Martin Haspelmath doesn't believe in morphemes (see links above), but he is (along with Bernard Comrie and Balthasar Bickel) nevertheless one of the editors for the latest edition of the Leipzig rules! Which rules include several options for notating non-concatenative and non-trivially-segmentable morphology. And while not all such proposals are included in the current edition of the standard rules, there have nevertheless been even more proposals for glossing symbols that explicitly avoid making theoretical claims (e.g., "+" as an alternative to "-" vs. "=", to indicate that some particular subforms are joined without making any claim about whether the joint is an instance of affixation, cliticization, or compounding). And nobody is forcing you to strictly stick to the limits of the Leipzig conventions. In fact, one respondent to my social media surveys was explicit about the idea that glosses "should be geared towards the readership, and the idea that glosses should always follow a standard format is wrong".
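
To make the difference between those boundary symbols concrete, here is a minimal Python sketch that splits a glossed word at "-", "=", and "+" and records which kind of joint each one marks. The function name, the labels, and the sample glosses other than "see-PST" are hypothetical illustrations, not part of any standard tool or of the Leipzig rules themselves.

```python
import re

# "-" and "=" follow the Leipzig conventions (segmentable morpheme vs. clitic
# boundary); "+" is the theory-neutral proposal described above.
# The label strings are my own shorthand, chosen only for this sketch.
BOUNDARIES = {"-": "affix", "=": "clitic", "+": "unspecified"}


def segment(gloss_word):
    """Split one glossed word into (piece, boundary-before-it) pairs."""
    # The capture group keeps each separator in the result list.
    parts = re.split(r"([-=+])", gloss_word)
    pieces = [(parts[0], None)]  # the first piece has no boundary before it
    for sep, piece in zip(parts[1::2], parts[2::2]):
        pieces.append((piece, BOUNDARIES[sep]))
    return pieces


print(segment("see-PST"))    # affix boundary
print(segment("house=DEF"))  # clitic boundary (made-up example)
print(segment("sell+them"))  # joint with no claim about its type (made-up example)
```

The point of the "unspecified" label is that a "+" gloss records that two pieces are joined while deliberately saying nothing about how.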

So, just because you don't accept a particular theoretical position is no reason to reject interlinear glossing altogether. You can design your style of glossing to fit whatever theoretical or non-theoretical considerations you prefer, to best communicate with your target audience.

But suppose you do expose a theoretical bias in your glossing: is that actually a bad thing? Certainly, it can be bad, particularly if you aren't doing it on purpose, and I will address that in more detail below, but it does not need to be. In many cases, the analysis can be the whole point, and betraying the framework being used can be a positive addition to the presentation. Several of my own conlangs, after all (notably, the three "Languages of Spite") exist to demonstrate that a language with a particular theoretical grammatical structure is possible. Omitting the intended analysis from the presentation in such cases would make it nearly worthless. So what is David's solution? "On the other hand, if there's sufficient language data and accompanied by faithful, accurate translations, that's all you really need." Technically, this is true--it's what a working linguist would rely on for documenting a natural language in the first place, after all, and the same can certainly be done with conlangs--but do you really want to ask that of your audience? I don't; no matter how much raw language data is provided, among the people who might read and appreciate a conlang grammar, practically none of them will do the work to produce the analysis themselves. So, again, I think David's success in a particular context has led to overgeneralization--if you are producing a conlang with the intent that it be appreciated through the aesthetics of text produced in that language (where "text" here is meant in the technical sense, encompassing spoken dialog as well as written text), then a gloss may be unnecessary to your purposes. But if you intend the language to be appreciated for itself, as an exercise in constructing a system, then the analysis is the thing, and it must be part of the presentation, just as much as it must be included in any academic paper analyzing the structure of a natural language.

Furthermore, creation and analysis need not be separate stages. If you work that way with your own conlangs, cool, I won't tell you that you're wrong--but it's not the only way. In my own experience, analysis and creation feed back into each other; analyzing what I have already created generates new ideas for other features to create, which may interact in unexpected ways and lead to changes in direction or re-analysis that makes the original obsolete. When working this way, interlinears are a way to communicate with myself.

Now, David has also said that
In fact, most of the time when I'm looking at conlangs, I completely ignore the glosses, because they're often (a) incoherent, and (b) wrong.
And... yeah, I can't argue with that. But does that mean that we shouldn't do them at all? No! It simply means that we shouldn't do them badly! (At least, not in the final presentation; if your glosses-for-yourself produced in the creative process are crappy, but they work for you, then more power to you!) If your glosses are incoherent, then get better at glossing! If your glosses are wrong, I am glad that you included them, so that I could make that determination myself. It opens the door to a potentially productive conversation. It is not at all unusual for me to come across a datum in a natlang grammar or analytical paper and think "I'm pretty sure that analysis is wrong", so it should not be at all surprising that the same would happen with conlang grammars. But the gloss provides a means of understanding the author's analysis as well as formulating my own.

Now, let's move on to considering my and David's second points: what's appropriate for the audience, and when does a gloss enhance or detract from the presentation? Well, as I've already explained, when The Analysis Is The Thing, you should include a gloss! But beyond that, I myself thought that appropriate usages were more restricted than it turns out they actually are before I started this little bit of research; in particular, I thought "well, if you are writing a document intended to teach the language, then interlinear glossing is probably not terribly useful." After all, in my formal school studies of French and Russian, not once did I ever encounter a textbook that contained interlinear glosses--they just teach you the vocabulary and morphology ahead of time, or else expect you to memorize complete constructions and infer the rules later, and the main point of a Leipzig-style interlinear gloss is to make the text accessible to someone who doesn't have the necessary background in the language yet (or ever). And yet, if we look up Interlinear Glossing in Wikipedia, the very first phrase of the article is "In linguistics and pedagogy" (emphasis added). And furthermore, consider this excerpt from Nishnaabemwin Reference Grammar by J. Randolph Valentine:
Linguistic researchers may be disappointed to see that morpheme-level segmentations of examples are rarely provided. At a conference held in Thunder Bay, Ontario, in 1996, a steering committee of Nishnaabemwin speakers explicitly requested that such details not be included, as it was felt that they interfered with the flow of the presentation, and contributed to what is sometimes called the "intellectual mining" of aboriginal languages and cultures. To accommodate these concerns, I [provided] word-level annotations[...]. [I]t seems to me that there are many good reasons for working the annotations into the text. For one, many of my readers will be semi-speakers of Nishnaabemwin, who will benefit from the help annotations provide; secondly, Nishnaabemwin varies dialectically, and the word-level glosses will allow fluent readers to more readily accommodate dialect differences; lastly, of course, the annotations make the language more accessible to those lacking prior exposure.
I have complex feelings about this situation. On the one hand, if this is an academic reference grammar, then yeah, I would be surprised and dismayed at the lack of detailed glosses. But, on the other hand, if it is used largely as a study reference for learners of the language, then I agree with the steering committee that detailed glosses would indeed interfere with the presentation! But on the gripping hand... readers "will benefit from the help annotations provide". And that triggered a realization that I am shocked I did not have earlier, given that I spent 9 years working for a university language department developing software to improve adult language acquisition! What's the number-one most effective technological assistance you can give to a new language learner? Parallel translations, subtitles, and word-by-word glosses to ensure they are exposed to comprehensible input! Anything that will reduce the friction of discovering the meaning of words or phrases the reader is unsure about, keeping them engaged with the text in a flow state. At the end of my time in academia, we were even looking into automatic morphanalysis for augmented reader applications. So while a fully detailed Leipzig-style interlinear gloss can get distractingly complex, and thus unhelpful, some level of glossing--tailored to the audience, not slavishly holding to the formal Leipzig rules in maximal detail--is clearly appropriate to pedagogical settings!

To quote David again: "The glosses I typically see make the work less accessible, and the work would be improved by their removal." I suspect that the Nishnaabemwin steering committee would agree. And yet, "the annotations make the language more accessible". The issue is not, fundamentally, interlinear glossing. The issue is bad interlinear glossing! David is absolutely right that many people (conlangers and academic linguists alike) are not great at writing glosses clearly, and often have a tendency to include far more information than is necessary for the purpose of the given example. This is why you don't give a full morphological breakdown to an actor who just needs to pronounce the line with the right prosody! But let's not throw out the good with the bad; don't just stop glossing, learn to consider your audience, and learn to gloss well.

What about not-teaching-a-language settings? Well, that's where glossing really shines. If you are writing an article, or a descriptive grammar, whether documenting or analyzing a natural language or a conlang, pretty much everyone surveyed agrees that interlinear glosses are "crucial" or "absolutely indispensable"--and as I have argued above, there is at least a certain genre of conlang presentation in which The Analysis Is The Thing. But, that genre aside, on its own, this is just an opinion, and no more forceful than "glossing is useless". So, why are interlinears indispensable? Fundamentally, it's because your audience does not know the language (or at least, cannot be assumed to know the language--if you are writing a paper on English in English, you can probably skip the glosses most of the time; same goes for, e.g., writing a paper on Spanish in Spanish, etc., where obviously your audience will know the language under discussion) and will not be learning the language. As one respondent put it, the gloss is
the part that lets me know what the hell I'm even looking at. A string of Latin script doesn't even tell me what your language sounds like, let alone how it works. If you don't gloss, you could literally just as well post gibberish and no one could tell the difference.

Which, given the mystery surrounding the Voynich manuscript, does indeed seem to be true! And as I already stated above, even if you provide "sufficient" language material for the reader to come up with their own analyses... they just won't. Ain't nobody got time for that! To quote David again: "I expect if you give a grammar and unglossed data of a language you've created to someone else to gloss, you're going to get back glosses that surprise you." No doubt! And I would love to run that experiment! But it's not an argument against making your own analysis more accessible.

The more different the structures of the analyzed and analysis languages are, the more important interlinear glossing is to help elucidate the structure of the original and how it actually corresponds to the final translation. It gives you the details that the author knows and the reader needs.

If you are trying to support a particular theoretical analysis, then glosses are a succinct way to illustrate that analysis. As noted above, you don't have to subscribe to any particular theory to use interlinear glossing, but if you do, an appropriate choice of glossing conventions will let you show it! But whether or not you are committing to an analysis, interlinears are also helpful in allowing the reader to develop their own analyses. To quote another respondent:
An example sentence without a gloss conveys that an example or illustration exists
An example sentence with a gloss actually illustrates how things work and shows the reader information that might actually be part of it working.
(Emphasis added.) In other words, a gloss, which conveys author knowledge about the structure of the analyzed language, helps to prevent the reader from going astray and engaging in "analyzing the translation". It makes the source more accessible without having to analyze huge quantities of text on one's own, such that the reader can usefully formulate and propose their own hypotheses to explain the data, even if the original glosser is theoretically biased. This is particularly important when the point of an example is to illustrate a grammatical feature that may not be fully preserved in the language of analysis. For example, to quote another respondent:
If I were writing in Eng. about resumptive pronouns in X, and I gave a sentence with a translation "I saw the neighbors we sold the car to" without adding "I see-PST neighbors Rel we sell-PST them-DAT car", I'm not really presenting direct info about pronouns in X.
Information is almost always lost in translation, and appropriate glossing allows you to preserve it, and focus on what you want the example to show!
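
As a rough illustration of what the "interlinear" part buys you, here is a minimal Python sketch that lines up each source word over its gloss in shared columns. The gloss line is the respondent's; the source words are placeholder stand-ins ("w1" through "w8"), since language X is hypothetical and no actual forms were given, and the function name is my own.

```python
def interlinear(source_words, gloss_words):
    """Return two lines with each source word padded to align with its gloss."""
    if len(source_words) != len(gloss_words):
        raise ValueError("each source word needs exactly one gloss")
    # Each column is as wide as the longer of the word and its gloss.
    widths = [max(len(s), len(g)) for s, g in zip(source_words, gloss_words)]
    top = "  ".join(s.ljust(w) for s, w in zip(source_words, widths))
    bottom = "  ".join(g.ljust(w) for g, w in zip(gloss_words, widths))
    return top.rstrip() + "\n" + bottom.rstrip()


# Placeholder source forms: the respondent's example never says what X looks like.
source = ["w1", "w2", "w3", "w4", "w5", "w6", "w7", "w8"]
gloss = "I see-PST neighbors Rel we sell-PST them-DAT car".split()

print(interlinear(source, gloss))
print("'I saw the neighbors we sold the car to'")
```

Even with dummy source words, the aligned gloss makes the resumptive "them-DAT" visible in a way the free translation alone does not.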

So, should you always gloss everything? No, there are definitely situations where it is not useful for the target audience or purpose of the text. Should you always gloss in maximal detail and with perfect conformance to the formal Leipzig rules? Also no--you should tailor your glosses to the intended audience and what you want to convey with a particular example. But should you never gloss? Also no! If you intend your documentation to actually be useful to other linguists, or appreciable by other conlangers, you should know how and when to gloss, and do so liberally!


Some Thoughts... Index


Tuesday, January 31, 2023

Rylan & Last Starfighter

Nowadays, we are awash in an ocean of geeky sci-fi and fantasy media content--but, it was not always so. There once was a time when a single person could easily list every significant live-action film with fantastical content in the last decade, own them all on VHS, and watch them all in a weekend movie marathon. When I was a small child, The Last Starfighter was one of those rare and precious sci-fi films that my family had on VHS, and I loved it.

(Beware the Amazon affiliate link! If you click it, and buy the Blu-ray, you might end up giving me some money!)

The Last Starfighter was revolutionary in a number of ways. It took the kid protagonist out of the suburbs and into a trailer park. It emphasized the 3D nature of space for navigation and combat--spaceships don't all get conveniently aligned in a flat plane and all facing the same way up! It had the heaviest use of CGI since TRON, and while we wouldn't call the results "photorealistic" today, it was the first film to attempt photographic realism in computer generated imagery.

But that's not why you read this blog! You wanna know what the linguistic content is like!

The Last Starfighter does feature alien dialog from the Rylans, but not a proper conlang, or even a sketch of one--according to the Director's Commentary, it's all gibberish. But, it's remarkably well-constructed gibberish! If you're going to do gibberish, The Last Starfighter ain't a bad reference to emulate! (Unlike, say, the Jabba's Palace scene in Return of the Jedi.)

The alien dialog is entirely confined to a fairly short sequence in between Alex (our human MC) entering the Star Car for transport to Rylos and being given a translator device so that he (and the audience) hears everything for the rest of the movie in English. Some of that dialog is very difficult to hear clearly, as it is diegetically filtered through PA speaker systems in the background of the scenes, and the subtitles don't transcribe it. However, I managed to transcribe a few bits for reference:

[On approach to the Rylan base in the Star Car]
Centauri: Rita. Ritana. Ritana.
Traffic control: Ritana. Ritana sanswela. Ritana.

[Greeting, when Star Car door opens]
Rityensi. Kita / Kritar (pronunciation seems to flip between rhotic and non-rhotic versions over several repetitions of the word; would be good to get multiple sets of ears on it)

[While gesturing through a door]
Kritar ina

[While receiving uniform]
Incredulous Official: Isanjay!
Centauri: Isanjay? Onimatswela! Prita, Prita!
Hula!

[Calling for Alex's attention]
Iai. Kita / Kritar

Throughout the sequence, the filmmakers make heavy use of environmental and actor-emotional cues to communicate the intended meaning (making the exact meaning of the words Irrelevant, and the broad gist Obvious), while simultaneously allowing the audience to participate in Alex's confusion at being thrown into contact with this alien culture, and showing through the use of an alien language that it is, well... an alien culture!

There is little enough there that the writers are not at risk of falling afoul of self-contradiction--there's enough freedom to induce a meaningful language behind these utterances if you wanted to. But at the same time, there is enough repetition of phonetic elements to make it feel cohesive and consistent, and re-use of common elements in the same situations. E.g., "Kita", combined with a beckoning gesture (cross wrists with palms inward), is pretty clearly something like "come here" / "go this way". There are lots of 'R's, 'T's, and 'I's, and the "swela" element occurs twice in different contexts--it sounds like a suffix, but maybe it's a distinct word.

Finally, Alex is presented with an automatic translator device pinned to his shirt collar, "so that people don't have to listen to that for the rest of the movie" according to the Director's Commentary. I, for one, would've been fine listening to that for the rest of the movie... but if you have that kind of bulk of dialog, it really would've required actually constructing a proper conlang! So, the writer & director knew their limits, and constructed the film accordingly. What I find quite odd, though, is that the Director's Commentary calls out the use of the automatic translator device as "cheesy"! On the one hand, that's a little overly self-deprecating in light of the ridiculously broad usage of Universal Translators in other SF media--notably, Star Trek. On the other hand, it's nice to hear a filmmaker explicitly acknowledging that not using that trope would have been better. And, while various Star Trek series and episodes really deserve their own separate posts, it is worth noting here that more recent Star Trek productions have been better about not completely backgrounding the existence of the translator mechanism--just to point out a couple of notable examples, Star Trek: Prodigy starts out in an environment where prisoners are isolated and unable to communicate with each other because they are not given access to universal translators (implicitly acknowledging that it is a real in-world technology, rather than strictly a narrative convenience); and in season 2 of Star Trek: Discovery Saru's sister expresses confusion about how she is able to understand Michael (who, to her, is an alien)--because, as a pre-technological alien, the existence of translator devices has not yet been explained to her.

So, despite its imperfections in the linguistic department, we can identify The Last Starfighter as also being revolutionary in explicitly acknowledging the need for and existence of translators as in-universe technology in a way that is relevant to the audience experience!

If you liked this post, please consider making a small donation.

The Linguistically Interesting Media Index

Monday, January 30, 2023

Linguistics as the Science of Science Fiction

Well, I've been away for a while, and I have several old drafts waiting around for me to get back to them and turn them into more analyses of Linguistically Interesting Media... but right now, I have been inspired to get writing again not by a particular work of fiction, but by this 2018 blog post by Martin Haspelmath.

Linguistic science fiction is relatively limited in how it deploys the actual science of linguistics. Most science fiction that employs linguistics as a background focuses on some variety of the Sapir-Whorf hypothesis: that language constrains thought, and thus teaching people different languages can be employed as an effectively technological solution to various problems--either mundanely altering how people behave, or exercising somewhat fantastical levels of control, or granting people mental superpowers. This kind of linguistic SF conceit shows up in, for example,

(As usual, all links to media are Amazon Affiliate links.)

And then we have some more peripheral examples:

  • Embassytown by China Miéville, which features aliens who cannot lie. This is frequently claimed to be an instance of Whorfianism in sci-fi, but the weirdness is not an inherent feature of the fictional language, but rather of the aliens' minds.
  • Snow Crash by Neal Stephenson, which features a language that can directly reprogram the human brain. This is certainly akin to Whorfianism, but in my mind not quite the same thing, as Stephenson's brain programming language operates at a physiological, rather than psychological, level. Other readers may disagree, but the mechanism of action here feels a lot more like a highly refined version of the kind of direct neurological manipulation that occurs by accident in, e.g., epileptic people who have seizures triggered by flashing lights, rather than the sort of Whorfian personality manipulation seen in The Languages of Pao, etc.

And there are a few other stories available that avoid Whorfianism altogether; Sheila Finch wrote a few collected in The Guild of Xenolinguists, such as "Reading the Bones" (about which I have written previously), which, somewhat like Embassytown, deals with a fictional culture's psychological and cultural relationship to language-as-cognitive-technology--neither aliens nor humans have their minds fundamentally altered by learning each other's languages in that story! Nevertheless, these sorts of examples are hard to find, and I am far from the first commentator to notice the overwhelming overuse of the Sapir-Whorf hypothesis as the basis for linguistic science fiction.

So, what does this have to do with Martin Haspelmath? Well, his blog post made me realize that there is, in fact, a second principle of theoretical linguistics which has informed a great deal of science fiction and fantasy: The Universal Grammar (UG) hypothesis. This shows up explicitly in The Embedding, where the backstory is all about setting up The Forbidden Experiment to test the limits of UG, but it is often much more subtle, to the point that most readers may not even realize that they are being exposed to linguistic sci-fi when they encounter it!

Every time we encounter aliens or eldritch monsters whose languages are beyond human comprehension--that's making an implicit claim about UG, that humans have an innate grammatical system and the aliens have a different, incompatible one. To quote Martin-quoting-Jessica-Coon:

There are grammatical properties we could imagine that we just don’t ever find in any human language, so we know what’s specific to humans and our endowment for language. There’s no reason to expect aliens would have the same system. In fact, it would be very surprising if they did.

If the UG hypothesis is true, then almost all aliens should be fundamentally incomprehensible!

Meanwhile, every time you encounter a universal translator, the story is making an even stronger claim--that not only does UG exist, but it is actually not accidental to human evolution, but rather is the same for all language-using beings, such that the universal translator can just find the right settings for a finite number of switches in order to convert between specific languages flawlessly.

But, there is a more nuanced view, one adopted by Martin himself:

We wouldn’t expect aliens to have the same representational (=UG) constraints as humans, because presumably they have different brains and minds. But their languages would be expected to be subject to very similar functional-adaptive constraints as human languages, if the languages are used for communication in much the same way as humans use their language.

In other words, differences in the construction of other species' brains may indeed cause them to evolve languages with features that we cannot learn to use fluently--and vice-versa! For example, Jeffrey Henning's conlang Fith depends on a very non-human structure for short-term working memory. But even if we could not use the full extent of some specific alien language intuitively and fluently, we should be able to comprehend how it works and interpret it, and to find a common ground that does allow for meaningful communication--e.g., by constructing a creole like Shallow Fith, accessible to both Fithians and Humans. So far, The Guild of Xenolinguists is the only body of fiction I know of that takes this approach seriously!

Considering the physical, cognitive, and functional bases of language allows us to more deeply examine some other common tropes. How often have we seen a member of a Superior Species explain how easily they were able to learn our "primitive language"? Well, what might that actually mean? What differences in cognitive ability could create an asymmetry that makes it easy for aliens to learn our languages but not easy for us to learn theirs? Is it a higher bitrate that allows them to use something like Heinlein's Speedtalk? Is it vastly expanded working memory that lets them track a larger number of unambiguous anaphoric references? The possibilities are vast, and could present a variety of distinct types of disability to be faced by a human trying to interact with that culture. Or consider the trope of the name unpronounceable by humans: is it really unpronounceable by humans, or just by this Anglophone human? If it really is unpronounceable by humans, why can the bearer of the name pronounce human speech correctly? Maybe they can't, are implicitly speaking with a thick accent, and our names get butchered by them just as badly as theirs do by us. Or maybe they are built like birds, and just have a much better ability to synthesize arbitrary sounds than we do. Either possibility can be exploited for further plot and characterization!

So... yeah. I have no particularly compelling conclusion here, but go forth and write better linguistic SF!


Tuesday, June 21, 2022

The Phonology of Baseline

Dath ilan is an alternate-history Earth envisioned by Eliezer Yudkowsky, whose history diverges at least a couple thousand years ago from our own, and in which civilization has achieved a much higher degree of global economic coordination. Part of this increased coordination is that everyone on dath ilan speaks, at minimum, an in-universe conlang called "Baseline". Out-of-universe, Baseline does not actually exist--but descriptions of what it is like do, so I have determined to attempt to remedy the situation. In terms of explicit descriptions of Baseline's phonology, this is all we have:
For example, all the phonemes are a minimum distance away from each other that guarantees people with slightly less acute hearing can understand it when spoken under slightly adverse conditions. In-between phonemes that are possible to pronounce, but potentially difficult to hear correctly, are then reserved for constructing 'conlangs', constructed languages, many of which use 'Baseline' as a baseline but add new short words using the expanded phoneme set.

That seems... not to be super well supported by the data? Like, it appears to contain all three of s/θ/f, which are easily confusable in low-fidelity audio environments. (It's actually rather difficult to figure out what the objective perceptual distance between different phones is, independent of biases induced by a test subject's pre-existing knowledge of any specific language; the closest I could find to that kind of research is the planning that went into designing the NATO Phonetic Alphabet--but even that is optimized to avoid confusion by speakers of particular popular languages, which is overconstrained for our purposes here. However, when native speakers of some language--like English--do in fact confuse phonemes of their own language sometimes, that seems like strong evidence that the underlying phones are actually pretty close!)

However, fortunately for us, the character who speaks that paragraph is not specifically trained in linguistics, and may not know exactly what he's talking about--and there are other constraints on the design of Baseline which may conflict with that one, such that the optimal design for Baseline phonology is not one which optimizes distinctness-of-phonemes in isolation. In particular, Baseline speakers seem to have a strong sense of syllables as the most salient components of word structure, and count of syllables as the obvious way to measure utterance length; and, they value having short words and short utterances for concepts that are common in their culture. Thus, we can also expect to have a large phonemic inventory to allow for the maximum number of individual syllables, maximum information per syllable, and maximal number of short words, which is in direct conflict with keeping individual phones as far apart from each other in acoustic space as possible.

By skimming all of the "Planecrash" stories (about dath ilani people who are in a plane crash, and get isekaied to various other fantasy worlds to have culture shock in), I have extracted a total of five actual Baseline words-that-are-not-names:

dath
ilan
tsi-imbi
farsheth
kelthorkarnen

And then a bunch of personal names:

Alis      Athpechya  Bahb
Bahdhi    Bohob      Corun
Elshorm   Elzbeth    Helorm
Illeia    Karal      Keltham
Limyar    Merrin     Miyalsvor
Nemamel   Ranthal    Salthin
Thellim   Verrez

Most names have two syllables; a few (4 in this list) have 3, or maybe 4. "Bahb" is the only one-syllable name, but I don't think that is actually representative of any real name used for a dath ilani person, as it appears in a context where it is clearly meant to be a transcription of the English name "Bob", as part of the set "Alis, Bahb, and Karal", standing in for "Alice, Bob, and Carol", the standard placeholder names for participants in a cryptographic protocol. "Bohob" seems to be an alternative adaptation of "Bob" that fits Baseline naming patterns better. In combination with "Bahdhi", though, the orthographic possibility of "Bahb" suggests the existence of <a> and <ah> as separate vowels. If <h> can only occur in onset positions, there would be minimal ambiguity introduced in the Anglicization by adopting that convention. <Illeia> could be a four-syllable name, but we have a negative example in that <Athpechya> is presented as a dath ilani equivalent for a non-Baseline 4-syllable name, which has been cut down to 3 syllables (assuming <y> is to be interpreted as a consonant). Thus, I am inclined to interpret that intervocalic <i> as a transcriptional variant of <y>, much like <c> is a transcriptional variant of <k>, rather than as a whole extra syllable.

As a cultural note, all dath ilani are mononymic, so there is nothing to be said about the structure of family names / patronymics.

From this data, I conclude that Baseline has a 6 vowel system:

      Front      Back
High  /i/ <i>    /u/ <u>
Mid   /ɛ/ <e>    /o/ <o>
Low   /æ/ <a>    /ɑ/ <ah>

with three degrees of height, a binary front-back distinction, and rounding in the back non-low vowels.

I would like the <e> vowel to be a little higher, to maximize contrast with /æ/, but we've got an explicit negative example where the dath ilani Merrin struggles to pronounce the French name "Félix", which confirms that the Baseline <e> vowel is not /e/. ¯\_(ツ)_/¯

Attested consonants, based on the assumption that names are supposed to be pronounced in the most obvious possible way for an Anglophone reader, are as follows:

p - /p/
b - /b/
d - /d/
k/c - /k/

f - /f/
v - /v/
s - /s/
z - /z/
th - /θ/
dh - /ð/
sh - /ʃ/
h - /h/

ts - /t͡s/
ch - /t͡ʃ/

l - /L/ (for maximal distinctiveness from /j/, I'm assuming this to be universally a dark/velarized l, rather than copying English's light/dark allophony; the presence of this and /v/ justify the lack of /w/)
r - /r/ (for maximal distinctiveness from /l/, I'll assume this to be a tap/trill even though that's not the most natural reading for most Anglophones).
y - /j/

m - /m/
n - /n/

The lack of /g/ is not typologically odd, but the lack of isolated /t/ (assuming that <ts> is, in fact, an affricate, which seems reasonable given the existence of <ch> and the lack of other /Cs/ clusters in onset positions) in the presence of /p/ and /d/ is a bizarre gap. On that basis, and because there seems to be a fairly robust voicing distinction in the fricatives, I infer that there should also be /t/ and /g/ phonemes, even though they happen to be missing from this dataset. Additionally, I feel we ought to fill in unattested */ʒ/, */d͡z/, and */d͡ʒ/, on the basis that, having decided that voicing was usefully distinctive for all other obstruents, the in-world engineers of Baseline wouldn't have just left those specific place/manner combinations unused!

Now, I want to consider the case of <tsi-imbi> a little more closely; it's the only word with a hyphen in it, and the only word with consecutive identical vowels if you ignore the hyphen. In fact, no attested words have consecutive vowels at all! I infer that this is to maximize the ease of syllable segmentation, and that the hyphen should in fact represent an additional marginal glottal stop (/ʔ/) phoneme (such as shows up in the English "uh-oh"), which shows up wherever vowels would otherwise be in hiatus. That also allows us to resolve any possible ambiguity in the usage of <ah> to transcribe the low-back vowel. Something like <bahob> (a minimal change from the attested <Bohob>) would have to be read as /bæ.hob/, while /baob/ would be phonetically [ba.ʔ.ob], with extra-metrical /ʔ/, and transcribed as <bah-ob>--and /ba.hob/ would be <bahhob>.

Now, this raises a potential problem with the transcription of other consonants; while we have examples of single intervocalic <l> and <r>, there are also a few instances of doubled <ll> and <rr>--but no other doubled consonants. And if we aren't allowing doubled vowels, having geminate continuant consonants across syllable boundaries seems like a very weird choice, completely counter to the goal of making syllabic segmentation easy and unambiguous. One could imagine heterosyllabic /l.ʔ.l/ and /r.ʔ.r/ sequences, with epenthetic glottal stops separating syllables just like they do between vowels, but in the absence of written hyphens in the attested names, I am going to assume that the doubled letters are there purely for purposes of Anglophone aesthetics, and that cross-syllable geminates do not actually exist in Baseline.

That leads to the following consonants chart:

             Bilabial/     Dental  Alveolar  Postalveolar/  Velar  Glottal
             Labiodental                     Palatal
Plosive      p b                   t d                      k g    (ʔ)
Nasal        m                     n
Trill                              r
Fricative    f v           θ ð     s z       ʃ ʒ                   h
Affricate                          t͡s d͡z   t͡ʃ d͡ʒ
Approximant                                  j              L

The fricatives are a little bit weird; I probably would have dropped θ/ð and h in exchange for x/ɣ to maximize distinctiveness and get slightly better correspondence between fricative and plosive series. But perhaps the in-world justification is that they just Wanted More Options for making more short words, and the possibility of x/h confusion pushed for pulling in the dental fricatives instead, despite the labial/dental/alveolar confusability. And for the plosives, I think it would make sense if all of the voiceless plosives were also secondarily aspirated--we've only got two plosive series, so we might as well make them as phonetically distinctive as possible!

We can also state the following apparent phonotactic rules:
  • Syllables have the form (C1)V((r)C2)(s|z), where:
  • C1 is any consonant.
  • C2 is any consonant except /h/.
  • The optional /r/ cannot occur before another /r/ in the C2 slot.
  • The optional final sibilant cannot occur after another sibilant in the C2 slot.
  • /s/ cannot occur after voiced stops/fricatives.
  • /z/ cannot occur after voiceless stops/fricatives.
Within a word:
  • A syllable cannot end with the same consonant with which the next syllable starts (nor should t/d precede t͡s/d͡z or t͡ʃ/d͡ʒ, respectively).
  • Vowels cannot occur in hiatus, and l and r cannot occur in hiatus with themselves, with extra-syllabic glottal stops being inserted for repair.
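
As a rough illustration, the syllable-level rules above can be turned into a small checker. This is only a sketch under some assumptions of mine: the syllable template is read as (C1)V((r)C2)(s|z), the consonant inventory includes the unattested phonemes inferred earlier, affricates are counted as sibilants, "l" stands in for the dark /L/, and the function names are made up for the example.

```python
# Phoneme classes from the consonant chart, including the inferred but
# unattested phonemes (/t/, /g/, /ʒ/, /d͡z/, /d͡ʒ/). "l" stands in for /L/.
VOWELS = {"i", "u", "ɛ", "o", "æ", "ɑ"}
VOICELESS = {"p", "t", "k", "f", "θ", "s", "ʃ", "h"}  # voiceless stops/fricatives
VOICED = {"b", "d", "g", "v", "ð", "z", "ʒ"}          # voiced stops/fricatives
AFFRICATES = {"t͡s", "d͡z", "t͡ʃ", "d͡ʒ"}
SONORANTS = {"m", "n", "r", "l", "j"}
CONSONANTS = VOICELESS | VOICED | AFFRICATES | SONORANTS
# Assumption: affricates count as sibilants for the "no sibilant + s/z" rule.
SIBILANTS = {"s", "z", "ʃ", "ʒ"} | AFFRICATES


def valid_coda(coda):
    """True if the phoneme sequence fits ((r)C2)(s|z) under the listed rules."""
    # A coda like ["r", "s"] can be parsed more than one way, so try every
    # combination of "has the optional r" and "has the final sibilant".
    for has_r in (False, True):
        for has_final in (False, True):
            rest = list(coda)
            if has_r:
                if not rest or rest[0] != "r":
                    continue
                rest = rest[1:]
            final = None
            if has_final:
                if not rest or rest[-1] not in ("s", "z"):
                    continue
                final = rest.pop()
            if len(rest) > 1:
                continue
            c2 = rest[0] if rest else None
            if c2 is not None and (c2 not in CONSONANTS or c2 == "h"):
                continue
            if has_r and c2 in (None, "r"):
                continue  # the optional r needs a following, different C2
            if final and c2 in SIBILANTS:
                continue
            if final == "s" and c2 in VOICED:
                continue
            if final == "z" and c2 in VOICELESS:
                continue
            return True
    return False


def valid_syllable(onset, nucleus, coda=()):
    """onset: one consonant or None; nucleus: one vowel; coda: phoneme sequence."""
    if onset is not None and onset not in CONSONANTS:
        return False
    return nucleus in VOWELS and valid_coda(coda)


print(valid_syllable("j", "æ", ["l", "s"]))  # <yals> in Miyalsvor -> True
print(valid_syllable("l", "o", ["r", "m"]))  # <lorm> in Helorm -> True
print(valid_syllable("d", "æ", ["b", "s"]))  # /s/ after a voiced stop -> False
```

The word-level rules (no repeated consonant across a syllable boundary, glottal-stop repair of hiatus) are left out to keep the sketch short.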

Making codas more complex than onsets is just weird, and I cannot justify that in-world at all, but that seems to be where the available data is pointing. Maybe it allows sub-syllable-level suffixing/infixing morphology?

We have no data on tone or stress, so I assume by default that Baseline has some sort of non-lexical, predictable stress system--e.g., strict initial stress. However, based on characters commenting on how many syllables are required to say something in various languages, and treating syllable count as a reliable measure of how long an utterance is / how much effort it takes to express something, I infer that the language is syllable-timed, rather than stress- or mora-timed.

Making another default assumption that the maximum onset principle for syllabification applies, the attested syllables are as follows:

a ath
i il im
el elz
bah bahb
beth bi bo
dath dhi
far
he hob
ka kar kel ko
lan le lim lis lorm
ma mel mer mi
ne nen
pech
ral ran rez rin run
sal
sheth shorm
thal tham thel thin thor
tsi
ver vor
ya yals yar
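
For anyone who wants to reproduce this list, here is a minimal sketch of the segmentation in Python. It works on the Anglicized spellings, and the conventions are my own reading of the discussion above rather than anything canonical: <ah> is a vowel unless the <h> is followed by a vowel, <th dh sh ch ts> are single consonants, <c> is read as <k>, intervocalic <i> is read as <y>, a hyphen forces a syllable break, and each vowel takes at most one consonant as its onset. Doubled letters are kept as written, so Merrin comes out as mer.rin. The function names are hypothetical.

```python
PLAIN_VOWELS = set("aeiou")
VOWELS = PLAIN_VOWELS | {"ah"}
DIGRAPHS = {"th", "dh", "sh", "ch", "ts"}


def graphemes(word):
    """Split an Anglicized spelling into vowel and consonant graphemes."""
    w = word.lower()
    out = []
    i = 0
    while i < len(w):
        ch = w[i]
        nxt = w[i + 1:i + 2]
        after = w[i + 2:i + 3]
        if ch == "a" and nxt == "h" and after not in PLAIN_VOWELS:
            out.append("ah")  # <ah> is a vowel unless the h starts a syllable
            i += 2
        elif ch + nxt in DIGRAPHS:
            out.append(ch + nxt)
            i += 2
        else:
            if ch == "c":
                ch = "k"  # <c> is a spelling variant of <k>
            elif ch == "i" and out and out[-1] in VOWELS and nxt in PLAIN_VOWELS:
                ch = "y"  # intervocalic <i> read as <y>, as in Illeia
            out.append(ch)
            i += 1
    return out


def syllabify(word):
    """Split at hyphens, then give each vowel at most one onset consonant."""
    syllables = []
    chunk = []
    for g in graphemes(word) + ["-"]:  # trailing "-" flushes the last chunk
        if g != "-":
            chunk.append(g)
            continue
        nuclei = [k for k, x in enumerate(chunk) if x in VOWELS]
        start = 0
        for n, k in enumerate(nuclei):
            if n + 1 < len(nuclei):
                nxt = nuclei[n + 1]
                # The consonant right before the next vowel becomes its onset.
                end = nxt - 1 if nxt - 1 > k else nxt
            else:
                end = len(chunk)
            syllables.append("".join(chunk[start:end]))
            start = end
        chunk = []
    return syllables


WORDS = ["dath", "ilan", "tsi-imbi", "farsheth", "kelthorkarnen"]
NAMES = ["Alis", "Athpechya", "Bahb", "Bahdhi", "Bohob", "Corun", "Elshorm",
         "Elzbeth", "Helorm", "Illeia", "Karal", "Keltham", "Limyar", "Merrin",
         "Miyalsvor", "Nemamel", "Ranthal", "Salthin", "Thellim", "Verrez"]

attested = sorted({s for w in WORDS + NAMES for s in syllabify(w)})
print(len(attested), "distinct syllables")
print(syllabify("kelthorkarnen"), syllabify("Miyalsvor"), syllabify("Illeia"))
```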

The possible syllables are a much larger set!
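
To put a rough number on "much larger", the phonotactic rules can be enumerated directly. The sketch below is a back-of-the-envelope count under my own assumptions: the template is read as (C1)V((r)C2)(s|z), the inventory is the 24 consonants of the chart (glottal stop excluded, since it is extra-syllabic) and 6 vowels, affricates are treated as sibilants, and "l" stands in for /L/. It prints the counts rather than hard-coding them.

```python
from itertools import product

VOWELS = ["i", "u", "ɛ", "o", "æ", "ɑ"]
VOICELESS = ["p", "t", "k", "f", "θ", "s", "ʃ", "h"]  # voiceless stops/fricatives
VOICED = ["b", "d", "g", "v", "ð", "z", "ʒ"]          # voiced stops/fricatives
AFFRICATES = ["t͡s", "d͡z", "t͡ʃ", "d͡ʒ"]
SONORANTS = ["m", "n", "r", "l", "j"]
CONSONANTS = VOICELESS + VOICED + AFFRICATES + SONORANTS
# Assumption: affricates count as sibilants.
SIBILANTS = {"s", "z", "ʃ", "ʒ", *AFFRICATES}


def possible_codas():
    """Every distinct phoneme sequence that fits ((r)C2)(s|z)."""
    codas = set()
    c2_options = [""] + [c for c in CONSONANTS if c != "h"]
    for r, c2, final in product(("", "r"), c2_options, ("", "s", "z")):
        if r and c2 in ("", "r"):
            continue  # optional r needs a following, different C2
        if final and c2 in SIBILANTS:
            continue
        if final == "s" and c2 in VOICED:
            continue
        if final == "z" and c2 in VOICELESS:
            continue
        # Store the sequence itself so that two parses of the same coda
        # (e.g. C2 = s versus a bare final s) are only counted once.
        codas.add(tuple(p for p in (r, c2, final) if p))
    return codas


codas = possible_codas()
onsets = [""] + CONSONANTS  # the onset is optional
total = len(onsets) * len(VOWELS) * len(codas)
print(len(codas), "codas;", total, "possible syllables vs. 52 attested")
```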

Friday, May 6, 2022

Ord: Spherindricites


The spherindricites are a derivative of the tetrabrachs, brought about by a mutation that caused repeated cell divisions along the vertical axis prior to limb differentiation, resulting in an elongated (spherindrical) segmented body plan with varying numbers of tetrahedral segments, analogous to the segmented worms which gave rise to arthropods on Earth. The development of segmentation was quickly followed by evolution of invaginations in the body surface to increase surface volume; due to the much higher surface-to-bulk ratio of 4D organisms compared to the surface-to-volume ratios of similar 3D organisms, and the small maximum distance from any point on the interior of a tetrabrach to the surface, small tetrabrachs and early spherindricites had no need for any specialized breathing structures, as liquids and gasses could passively diffuse through the creature from the environment. However, surface pockets which would be alternately compressed and expanded by the creature's movement, thus getting the surface closer to some internal volumes and actively pumping fluid past them, allowed spherindricites to grow to much larger sizes.

The least derived spherindricites, which retain minimal differentiation between their segments, primarily occupy benthic and burrowing niches and are an exceptionally diverse group, just like their close Earthling analogs, the annelids, coming in a wide range of sizes and with a variety of reduced or specialized limb structures. However, one free-swimming group of spherindricites developed cephalization--the fusing and specialization of segments at the mouth end of the creature, which had transitioned from the bottom to the forward orientation, creating creatures with distinct heads at their fronts. The forwardmost set of limbs specialized as mouthparts for grabbing and manipulating food; two of the second-segment limbs specialized as olfactory sense organs, while the remaining two developed more advanced eyes from the terminal ocelli, with ocelli disappearing from the remaining limbs.

One group of cephalic spherindricites, the malakichthys ("soft fish"), directly developed a new up-down axial symmetry breaking, with one limb from each body segment specialized as a dorsal stabilizing fin and the remaining three becoming propulsive limbs radially arranged in the sideways plane.

The remaining cephalic spherindricites developed internal mineral storage structures, which would serve as the basis for structural bones. This group further diverged based on three different approaches to developing their own secondary vertical orientation:

  1. Polysphenoids dropped two limbs from each body segment, resulting in alternating left/right and ana/kata-aligned limbs, such that the tips of each limb from any two adjacent segments form the vertices of a disphenoid.
  2. Trilaterians dropped a single limb per segment to allow planar compression, resulting in adjacent body segments forming alternating triangular antiprisms, with each set of limbs arranged in an equilateral triangle in the sideways plane.
  3. Quadrilaterians simply rearranged their four limbs per segment into a square arrangement in the sideways plane rather than a tetrahedron.
All three of these groups would later give rise to different land-dwelling clades which would specialize in different ecological niches suited to their divergent limb arrangements.

Wednesday, May 4, 2022

Ord: Polybrachs


As we saw in the introduction, Ord is a gigantic place. There is enough room on Ord for life to have arisen completely independently several times, and for hundreds of completely unrelated alien civilizations to develop--even though, if they knew which way to walk, they could find each other within a few thousand kilometers.

We will be looking at the development of only one branch of animal-like life. At the highest level, this branch of independently-evolved animal life in Ord's oceans and seas can be split into three groups: sponges, flatworms, and polybrachs. Ordian sponges are much like Earthling sponges--simple sessile colonies of cells which filter food particles from water flowing through them. Ordian sponges, however, are "more spongy"--more porous--than Earthling sponges can be. This is because the four-dimensional space they live in permits qualitatively larger holes, of a fundamentally different kind than exists on Earth. Ordian matter can have linear holes punched through it, just like ours can, but it can also have planar holes--and Ordian sponges do, because it allows more water to flow through them from more directions.

Flatworms are spheroidal organisms; they would not look flat to us, but they are flat on Ord, as their entire lower 3D surface can contact the ocean floor simultaneously, and they have very little extent in the upwards direction. These organisms show minimal layered tissue differentiation. Simpler species are completely spherically symmetric, and simply absorb nutrients from stuff they crawl over as they inch their way across the ocean floor. Some more derived species, however, have established a front-back axis specialized for motion; such creatures have more elliptical bodies, and can often be found freely swimming in the ocean bulk.

The flatworms may eventually produce more interesting descendants, but for now the most complex creatures are the polybrachs. These are also spherically-symmetric creatures with an up-down axis, but they have specialized arm structures improving their ability to navigate and manipulate their world. Their symmetrically-arranged body segments and attached arms make them somewhat analogous to Earthling starfish, but with one major difference: while different species of starfish may have any number of equally-spaced arms, due to the fact that there are infinitely many regular polygons in two dimensions, Ordian polybrachs are restricted to certain fixed numbers of arms corresponding to the faces (or vertices) of different platonic solids, of which there are only a finite number. The polybrachs have further specialized into three major clades based on their early embryonic development: tetrabrachs, cephalobrachs, and dodecabrachs.

In this figure, we can see the 3-or-fewer-dimensional stages of embryonic development from a single egg cell up to 4 or 8 cell structures, which allow the identification of different clades. Tetrabrachs (whose embryonic shape is labelled with a T in the preceding diagram) undergo only two cycles of cell division before adopting a maximally-dense tetrahedral arrangement of cells. The third cell division extends the embryo into the fourth vertical axis, with each tetrahedral segment going on to develop into a portion of the central disk and associated arm. Tetrabrachs tend to specialize in benthic habitats, like symmetrical flatworms, but are capable of much more active lifestyles.

Cephalobrachs (whose embryonic shape is labelled with a C) maintain a more open cellular structure through three divisions, producing a cubical arrangement of cells from which can develop eight distinct equally-spaced arms, corresponding to the faces of an octahedron. Their fourth cycle of division does not produce additional cells associated with an octahedral segment, though; rather, the top cube develops in an entirely different direction from the bottom of the creature, producing a glomular (4-dimensionally spheroidal) head / body cavity, similar to an Earthling cephalopod. Also like cephalopods, many species of cephalobrachs are capable of walking or dragging themselves along the ocean floor, but they are more often found in free-swimming niches.

Dodecabrachs (whose embryonic shape is labelled with a D) maintain an open square arrangement for two cycles of cell division, but then fall into a more close-packed square antiprism arrangement for their third. This third split already corresponds to the division between upper and lower body segments; a further cycle of division could establish cubical/octahedral symmetry, but that is not, in fact, what happens. Instead, several more cycles of cell division produce two joined spherical disks of cells, which begin differentiating into distinct organs much later, eventually producing an arm section with either twelve segments in dodecahedral symmetry (hence the name of the clade) or, more rarely, twenty segments in icosahedral symmetry. The 12 vs. 20 choice seems to be easy to flip between as new species of dodecabrachs evolve, but there is a more fundamental division between sessile and medusoid dodecabrachs. In the sessile branch of the family, the body segment extends into a long spherinder (a sphere extruded into the fourth dimension, analogous to a 3D cylinder) which acts as a stalk to attach the animal to a solid surface, with the arms acting to filter nutrients from the water. In the medusoid branch, the body segment instead expands into a wide spherical disk. In some species, the disk remains relatively small such that the arms are free, and swimming is accomplished in a manner similar to an Earthling feather starfish; in most medusoids, however, the upper disk grows large enough to curve around and enclose the central arm disk, rather like the bell of a 3D jellyfish, allowing jet propulsion by contracting the bell to expel water.

All polybrachs have ocelli (eyespots) at the ends of each of their arms, a feature which is believed to have been inherited from early flatworms before the two clades diverged; spherical flatworms also frequently have eyespots on their upper surfaces, in a variety of regular, semi-regular (corresponding to Archimedean solids) and random arrangements. Within the polybrachs, dodecabrachs appear to be the least-derived clade, with cephalobrachs and tetrabrachs each having split off from a dodecabrach ancestor after settling onto a power-of-two number of arms, which then permitted differentiation decisions to drift earlier in the stages of embryonic development.

Tuesday, May 3, 2022

The Natural History of Ord: Introduction to the Universe

Introduction

The Polybrachs
The Spherindricites

Ord is an inhabited world in an alien universe with 4 spatial dimensions rather than our usual three. It's a different bubble of stabilized space in our eternally-inflating multiverse. This has wide-ranging effects on geometry and physics, and thence on biology. Planets like Ord don't orbit stars in closed ellipses, and they don't have well-defined axes of rotation. From atoms up to galaxies, the entire universe is organized differently from our own. What we are mainly concerned with is the middle scale: how living things develop in four-dimensional seas and on three-dimensional continents. But it will be useful to investigate some high-level features of the universe those creatures are developing in, and the world they are developing on.

First, we will establish a scale. Comparing sizes between universes with different physics, let alone different dimensionalities, is a tricky thing; 1 meter here doesn't inherently mean anything on Ord, and units can seem to match up in different ways depending on what specific things we are comparing. Let's suppose we wanted to somehow "import" a human explorer from Earth to Ord; their normal 3D body would completely fall apart in a 4D space. We would have to somehow re-arrange their bits and pieces into a 4D form. But however we alter the body, we will want to keep the mind--and thus, the neural connections--intact. So, every neuron will need to be accurately mapped and reconstructed--and the number of neurons in an Earth human and an Ord human can be assumed to be the same. Since that will give us some idea of the level of biological complexity necessary for civilized life to arise on Ord as it has on Earth, let's adopt that as the basis for our standard of comparison: we'll declare neural cells to have the same linear size on Ord as they do on Earth. Human neuron bodies are around 100 microns across on average. If we deconstruct a human into individual cells, adapt each cell for Ord's universe, and then re-assemble in a stable 4D arrangement, the resulting explorer would be between 14 and 16 centimeters high--but composed of tens of thousands of times more atoms per cell!

Simply equating atoms between Earth and Ord does not accurately reflect the needs of biological systems. Four-dimensional Ord cells have a much larger proportion of their mass bound up in 3D surface membranes than we do in 2D surfaces, and thus a lower proportion available for interior structures and functions. Thus, on average, they do require thousands of times more atoms to achieve the same functions--we couldn't build a body capable of supporting our explorer's intelligence just by using the same number of atoms on Ord as we do on Earth. However, when it comes to linear measurements, atomic radii are much more precise than average biological cell sizes. Thus, in order to compare the sizes of organisms with the planet they live on, we can declare that Ord's four-dimensional atoms have the same range of radii as our three-dimensional atoms (although their internal compositions can be quite different)--exactly 1 angstrom.

To retain heat and maintain geological activity over geological time scales, Ord would need to have about 4/3rds as many atoms between its surface and its core as Earth does, to maintain the same surface-to-volume (or area-to-bulk) ratio, and thus the same heat loss rate. Earth is about 6.378x10^16 angstroms (average atomic radii) in radius, or 3.189x10^16 atomic diameters. Ord, it turns out, is about 8.5x10^16 angstroms in radius--which means it has about 2.37x10^17 times more atoms in its 4 dimensional bulk than Earth does in its 3 dimensional volume! In terms of atomic mass units, Ord is about 1/4 to 1/3 as massive as our entire galaxy! Fortunately, between a totally incomparable gravitational constant (it has different units in Ord's universe than in ours), gravity following an inverse-cubic law, and flexibility in how we measure units of time, all that extra material still only results in surface gravity comparable to Earth's!
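
For readers who want to check the arithmetic, here is a small sketch of how the radius and atom-count figures can be reproduced. It rests on my own reading of the numbers: a 3D ball has surface-to-volume ratio 3/r and a 4D ball has surface-to-bulk ratio 4/R, so the two match when R = 4r/3, and the atom-count ratio is taken as the 4D bulk divided by the 3D volume with both radii measured in angstroms (that is, one atom per unit cell an angstrom across). The variable and function names are just for illustration.

```python
import math

ANGSTROMS_PER_METER = 1e10
earth_radius = 6.378e6 * ANGSTROMS_PER_METER  # 6.378e16 angstroms

# Surface-to-bulk ratio: 3/r for a 3D ball, 4/R for a 4D ball.
# Setting 3/r = 4/R gives R = 4r/3.
ord_radius_matched = 4 / 3 * earth_radius
ord_radius = 8.5e16  # the rounded figure used in the text


def ball_volume_3d(r):
    """Volume of an ordinary ball: (4/3) * pi * r^3."""
    return 4 / 3 * math.pi * r**3


def ball_bulk_4d(r):
    """Bulk (4D hypervolume) of a 4D ball: (pi^2 / 2) * r^4."""
    return 0.5 * math.pi**2 * r**4


# With one atom per unit cell, the ratio of atom counts is just the ratio of
# the two contents measured in angstrom units.
ratio = ball_bulk_4d(ord_radius) / ball_volume_3d(earth_radius)

print(f"matched radius: {ord_radius_matched:.3e} angstroms")
print(f"atom-count ratio: {ratio:.2e}")
```

Under those assumptions the matched radius comes out at roughly 8.5x10^16 angstroms and the ratio at roughly 2.37x10^17, in line with the figures quoted above.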

Now, about time... cesium atoms and quartz crystals don't exist on Ord (atoms with the same nuclear charges have radically different chemical properties), and pendulums depend on gravity and on our somewhat arbitrary choice of how to measure lengths, so it would seem that there is no really good method of establishing a correspondence. Furthermore, 4D brains are more tightly packed, so nerve signals travel faster, and thought occurs faster than it would in the same neural network "squashed" into a mere three dimensions. Nevertheless, we'll acknowledge the 4D brain architecture as natural for Ord, and declare that what our transposed human explorer perceives as 1 second passing (e.g., when mentally counting out "one Mississippi, two Mississippi," etc.) is one second, and everything else can follow from that. We note that objects seem to fall at a normal-feeling rate, and objects on the scale of our 15-cm-tall explorer's body seem to take normal amounts of effort to push, pull, and lift, and the gravitational constant and inertial mass units can be calculated from those observations.

Now, how much surface does Ord have? Using our angstrom equivalence, it comes out to about 2x10^28 cubic kilometers. Compare with Earth's approximate 5.1x10^8 square kilometers. Or, 2x10^37 cubic meters, compared to Earth's 5.1x10^14 square meters. Directly comparing a 3D surface volume to a 2D surface area is a bit tricky, but that's about the same volume as a sphere of space 23 AUs wide--larger than Saturn's orbit in our solar system! When intelligent creatures like our universally-transposed explorer can be a mere 15 centimeters in height, that's a lot of space for life to fill!

From that, you may guess that Ord's universe is much more densely packed with matter than our own universe is--and you would be right! It has to be, or, with that whole extra dimension to move around in, nothing would ever run into anything else, and nothing interesting would happen! It's almost a blessing, in fact, that two-body orbits are unstable--that forces matter to collapse into interesting structures despite the extra room to expand in. And Ord does not orbit a single star; but, it does have a somewhat chaotic orbit through a globular (or glomular) cluster of stars along with many other such planets, with days and nights distinguished by which side of the world is closer to the brighter, denser center of the cluster. The space-filling distribution of matter in the cluster produces an effective potential with a lower exponent--not quite a harmonic potential as it's not completely uniform, not exactly inverse-square, not even exactly an integer or even completely constant--which, in combination with close encounters with individual other bodies, produces the chaotic nature of Ord's motion. Some day, Ord may fall into the core and be burned up, or be ejected as the cluster evaporates, but for the functional equivalent of billions of years it is mostly-stably bound, wandering through a space of roughly-constant illumination.

Many of the stars in Ord's cluster are not a whole lot more massive than Ord itself, and may someday cool down to become additional planets. How can this be? Well, that requires looking way down at the other end of the size scale, at how atoms are built. The difficulty of fusion in Ord's universe follows a much steeper curve than in ours. In fact, monoprotium can fuse at near absolute zero, if the density is high enough to make collisions probable! This is because, while the atoms of Ord's universe are made out of close analogs to our own protons, neutrons, and electrons, they are put together quite differently. When there is only one electron, it exists almost entirely overlapping the proton, controlled by the interior harmonic potential. With 4 spatial degrees of freedom and 3 quantum spin states for electrons, elements up to duodecium, with twelve protons and electrons and no neutrons in the lightest isotope, are all chemically inert and nuclearly sticky! Only at atomic number 13 do we encounter an atom with an external electron orbital and a nucleus with a distinct positive charge which can repel other nuclei. Ord's chemical equivalent of hydrogen is thus as heavy (in terms of atomic mass units) as our carbon-13 isotope, and much smaller than that in terms of nuclear to atomic radius ratios. With many more orbitals available for electrons to fill (e.g., there are 4 rather than 3 p-orbitals, each of which can hold 4 electrons in different spin states) Ord's periodic table is significantly stretched horizontally, with many types of atoms and bonds that have no analog in our world--and with nuclear-internal electrons and supplies of easily-fusible duodecium isotopes around, Ord has many more elements with higher atomic numbers than we do for chemistry, and biology, to play with.