Saturday, February 11, 2023

Some Thoughts on Gripping & Backtile

Gripping and Backtile are languages (or language sketches) by Sai, with collaborators, in the tactile modality--i.e., communicated through the sense of touch.

Gripping, developed in partnership with Alex Fink, is by far the more fully developed of the two. Sai documented the development of the phonology in two YouTube videos, Phonology of a gripping language, part 1 and Phonology of a gripping language, part 2, and Sai & Alex gave an introductory talk on the language at LCC3 (slides available here to follow along). There is also a reference grammar.

Backtile, developed in partnership with Nai Damato, is a language sketch, for which the linked Google Doc is the entirety of available documentation.

My initial approach to relating to either of these languages is to compare them with how natural tactile languages operate. There are not many of those, but between nat-tact-langs, Gripping, and Backtile, we can define a fairly large field of potential design space for tactile languages, which is pretty exciting from the point of view of providing inspiration for other conlangers. Unfortunately... I don't actually know a whole lot about natural tactile languages! In fairness, not many people do--the literature is sparse--but Sai and their collaborator for Backtile, Nai, at least have considerably more personal experience in that area (where "considerably more" means "any at all", and Nai actually has a relevant university degree). Fortunately, the Backtile document does explain its naturalistic inspirations, and Sai, Nai, and Alex have all been available to provide additional context.

I want to be annoyed at the limited documentation for Backtile, but it is admittedly just a sketch, and I can't fault the creators for not feeling like developing it further; if that's all there is to document, then that's all there is to document. As it is, Backtile intersects with some of the coding strategies used in actual Tactile ASL / Protactile. There is, however, also a smidgen of artificiality in the use of braille, with finger and palm pressure used to communicate the dot pattern of a braille cell for spelling. In fairness, this is something that is actually done in Deaf-Blind communities, but importing braille feels to me like importing oral spelling into English--it's a distinctly technological process, tied to the idea of writing, and as such goes against my own intuitions about what is "natural to the medium". Alex has expressed similar feelings, though neither of us can precisely identify which features make it feel artificial, aside from its history as a designed system. But hey, if real life Deaf-Blind people actually do that already, then clearly it works, so you might as well use it! The other aspect in which Backtile feels somewhat artificial is in the extremely restricted articulatory space--it's articulated exclusively on the upper back/shoulder blade (hence the name). However, this feeds into some desirable properties of the system--being less tiring than hand-following, less socially awkward than more high-contact tactile signing, easily accessible, and allowing full duplex communication between two people with hands resting on each other's shoulders.

Gripping is in some ways much more artificial, although it has some of the aesthetic of hand-following tactile signing. It is meant to be a covert communication system for couples in situations where it would not be weird for them to be holding hands in public. Though neither Sai nor Alex were aware of this when constructing Gripping, it has some similarities to Somali Tactile Livestock Negotiation Code, in that it involves one hand from each person, communication by pressure on particular points of the receiver's hand, and an intent to be covert--although in the case of Somali Tactile Code, only the message is covert, not the fact that communication is occurring in the first place, and Gripping is considerably more elaborated and flexible! (Note that the linked article refers to the Somali system as a "language", but it's really not--it is a highly context-restricted code with limited expressivity.)

I mentioned above the concept of "natural[ness] to the medium", a useful concept which I was introduced to by Sai. I'll just quote, because they describe it at least as well as I could:
I have a conception of "natural to the medium", which has only a coincidental relationship with "is similar to natlangs". All natlangs are natural to their media (barring some bad conlanging by authorities), but not all things that could be natural to the media are found in an extant natlang, because the latter is by no means a covering set of the possibility space. There are things they just don't happen to have evolved to, but which could occur — and among those, there are both ones that could occur by derivation from extant/historical natlangs, and ones that could not occur from them because the founder effects plus diachronic changes wouldn't reach some part of possibility space.

This is a useful concept to have around for critiquing all sorts of conlangs, but especially for critiquing conlangs in media for which there just aren't a lot of (or any) natural points of comparison. And while Gripping does seem very artificial with things like Tactile ASL and Tactile Auslan as one's basis for comparison, I don't think it can be said that it isn't natural to the medium--and given that it has radically different design goals, I don't think we should expect it to look "naturalistic" with regard to natural tactile sign languages. Indeed, Sai's phonology videos go to great lengths to explain how it is natural to the medium, by detailing the extensive experimental work that Sai and Alex did to find sets of contrastive features that were easy to articulate and easy to perceive, which is really a rather unique opportunity available in this particular medium, because it still uses the human body, just in a new way. That kind of interactive experimentation with a new communication medium is considerably harder to pull off for, e.g., alien languages using senses and articulators that humans don't have, like my own Fysh A language.

One particularly interesting feature of Gripping is that the phonology is asymmetrical--the person whose thumb is on top in the grip produces signs differently at the articulatory-phonetic level than the person whose thumb is on bottom. This is to get around the fact that switching positions is awkward and slow. However, if one were to eliminate the secrecy requirement, a different solution inspired by four-handed Tactile ASL signing presents itself--you could use a double grip for full-duplex communication, where each participant has a dedicated receiving hand and a dedicated sending hand! There's an idea if any other conlanger wants to pick it up. (Also note that there is a lot more phonetic space available to explore if you drop the requirements for secrecy, so if you want to experiment with your own one-pair-of-hands tactile language, don't assume Gripping has already covered the entirety of available phonological options!) Sai and Alex did not pursue this route for several reasons, in addition to the obvious loss of secrecy:
  1. You can't move around easily while clasping both hands--this would make it a converse-in-place system. However, that's not too different from what actually happens with natural Deaf-Blind tactile communication, so it's not a deal-breaker if someone else wants to run with the idea.
  2. Clasping both pairs of hands is considerably less comfortable in a relaxed position, and more effortful in an un-relaxed position.
  3. Full duplex communication is absurdly mentally challenging, so while having the option is neat, lacking it really isn't a problem. Full duplex really just allows you the option to interrupt and talk over your interlocutor. (But perhaps this could be used as the basis for some linguistic sci-fi--imagine an alien species which does have the mental capacity for full duplex communication, and a medium that supports it!)
As-is, with the single-grip setup, Gripping is somewhere between half- and full-duplex on the phonological level, similar to audio languages in which it is possible to talk over people. Unlike audio languages, however, certain combinations of subordinate and dominant articulations are actually physically impossible to pull off! The half-duplex system currently documented is already an improvement over what is possible with, e.g., Tactile ASL, which does require repositioning to exchange the roles of sending and following hands. On the other hand, Protactile communication does allow for limited back-channeling (e.g., "ACK", "uh-huh", "yeah", etc.), and Sai has speculated that a similar level of discourse-management simultaneous communication could be added to Gripping using the subset of phonology that can be simultaneously articulated and perceived, though they have not yet tried it.

Additionally, just as oral languages can be partially lip-read and ASL signs can be felt, Gripping articulations are useful outside of their primary modality. Sai has given the example of using Gripping as a very restricted sign language with no hand motion, for covert hand signals--which is very similar to the "small motion" Atreides sign language that David Peterson developed for the latest film adaptation of Dune.

Some Thoughts... Index

Saturday, February 4, 2023

Some Thoughts On Glossing

A Note: This article has been edited from its original version to take into account feedback from David himself.

David J. Peterson has on several occasions been outspoken against Interlinear Glossing, particularly in the context of developing and documenting conlangs. I find that very strange, as I find glossing absolutely indispensable in language documentation, so after our last social media exchange on this topic, I decided to do some Deep Thinking.

And this is the part where I feel a lot like Brandon Sanderson expressing his dissatisfaction with Audible. I don't dislike DJP! I still have signed copies of his books on conlanging for kids and adults, and I am quite happy to provide those Amazon Affiliate links so other people can give him money for them! But on this one point at least, I think he is wrong, and I want to understand why, and what glossing is really good for.

To that end, I asked a bunch of people on social media about their opinions on glossing. You can see the raw responses on Facebook (group 1, group 2), Reddit, Quora, and Twitter. It turns out to be very difficult to get the question across effectively given the differing restrictions on message length and type on different platforms, but I still got some pretty useful data.

There are, I think, two major factors at play:

First, David does not believe in morphemes--or at least, does not find the concept of morphemes useful for language design. And that's fine! David is far from the only person to point out that morphemes aren't necessarily a great concept even in formal linguistics, or to propose alternative models of morphology.

Second, David works for an unusual audience. As essentially the world's only full-time professional conlanger for movies and TV, the primary audiences for his documentary output are:
  1. Actors, who have to be able to pronounce translated lines, but not necessarily understand what they mean.
  2. Set design artists, who need access to font files and translated text, and need to know what it looks like and how much space it will take up... but again, not what it actually means.
Now, I thought that these would be situations where, admittedly, full-on Leipzig-style interlinear glossing won't be particularly useful, and may in fact be an inconvenient distraction. I am, in fact, fully ready to admit that there are many situations in which interlinear glossing is not useful. However, while (as I would expect) they are not fully detailed Leipzig-style morphological glosses, David does provide phonetic and word-level interlinear glosses for actor lines, as you can see published on his website, and has said that this is "one of the only places I think it's useful." (Incidentally, William Annis also provides actors with interlinears so they don't stress the wrong words--but again, not with full morphological analyses. I don't know how, e.g., Marc Okrand, Paul Frommer, or Christine Shreyer work, but I would not be at all surprised to find that it's very similar.)

But, I suspect that David's unusual success in this particular context has led him to overgeneralize. Now, to be clear, I don't know that David has any problem with glossing in an academic linguistic context; to quote: "When you're glossing to analyze, that's analysis. It has its place, and its place is in analysis, not creation." Thus, some of this might seem like attacking a strawman--but as far as I am concerned, conlang documentation is language documentation, and what's good for natlangs is good for conlangs, and vice-versa. In fact, I don't even particularly disagree with the statement that glossing is for analysis--but I do strongly disagree with the idea that analysis and creation must be considered distinct, and with the implication that analysis should be excluded from presentation, which is where conlang creation and natural language documentation collide.

In this reddit post (a response to an earlier revision of this article), David breaks down the problem as follows:
The problem I see is twofold:
  1. Biased morphological analyses (both betraying the framework being used, and how the language itself is being used).
  2. Taking something readable, like language data, and exploding it, so it's a mess.
These are not bad points. However, they should not be applied overly broadly. Thus, when David goes on to say that "In short, morphological glossing is for analysis, not for presentation—or for comprehension.", I must vehemently disagree--that is throwing the baby out with the bathwater. And while this looks initially like a very different breakdown of the issue than what I came up with, they actually line up pretty well--so let's take a look at my two points and David's two points together.

Regarding morphological analysis: It is true that any particular gloss at a level deeper than word-for-word will entail some kind of analysis, and an associated theoretical position. But does that mean you have to believe in morpheme theory to use glossing? No! Martin Haspelmath doesn't believe in morphemes (see links above), but he is (along with Bernard Comrie and Balthasar Bickel) nevertheless one of the editors for the latest edition of the Leipzig rules! Those rules include several options for notating non-concatenative and non-trivially-segmentable morphology. And while not all such proposals are included in the current edition of the standard rules, there have nevertheless been even more proposals for glossing symbols that explicitly avoid making theoretical claims (e.g., "+" as an alternative to "-" vs. "=", to indicate that some particular subforms are joined without making any claim about whether the joint is an instance of affixation, cliticization, or compounding). And nobody is forcing you to strictly stick to the limits of the Leipzig conventions. In fact, one respondent to my social media surveys was explicit about the idea that glosses "should be geared towards the readership, and the idea that glosses should always follow a standard format is wrong".

So, just because you don't accept a particular theoretical position is no reason to reject interlinear glossing altogether. You can design your style of glossing to fit whatever theoretical or non-theoretical considerations you prefer, to best communicate with your target audience.

But suppose you do expose a theoretical bias in your glossing: is that actually a bad thing? Certainly, it can be bad, particularly if you aren't doing it on purpose, and I will address that in more detail below, but it does not need to be. In many cases, the analysis can be the whole point, and betraying the framework being used can be a positive addition to the presentation. Several of my own conlangs, after all (notably, the three "Languages of Spite") exist to demonstrate that a language with a particular theoretical grammatical structure is possible. Omitting the intended analysis from the presentation in such cases would make it nearly worthless. So what is David's solution? "On the other hand, if there's sufficient language data and accompanied by faithful, accurate translations, that's all you really need." Technically, this is true--it's what a working linguist would rely on for documenting a natural language in the first place, after all, and the same can certainly be done with conlangs--but do you really want to ask that of your audience? I don't; no matter how much raw language data is provided, among the people who might read and appreciate a conlang grammar, practically none of them will do the work to produce the analysis themselves. So, again, I think David's success in a particular context has led to overgeneralization--if you are producing a conlang with the intent that it be appreciated through the aesthetics of text produced in that language (where "text" here is meant in the technical sense, encompassing spoken dialog as well as written text), then a gloss may be unnecessary to your purposes. But if you intend the language to be appreciated for itself, as an exercise in constructing a system, then the analysis is the thing, and it must be part of the presentation, just as much as it must be included in any academic paper analyzing the structure of a natural language.

Furthermore, creation and analysis need not be separate stages. If you work that way with your own conlangs, cool, I won't tell you that you're wrong--but it's not the only way. In my own experience, analysis and creation feed back into each other; analyzing what I have already created generates new ideas for other features to create, which may interact in unexpected ways and lead to changes in direction or re-analysis that makes the original obsolete. When working this way, interlinears are a way to communicate with myself.

Now, David has also said that
In fact, most of the time when I'm looking at conlangs, I completely ignore the glosses, because they're often (a) incoherent, and (b) wrong.
And... yeah, I can't argue with that. But does that mean that we shouldn't do them at all? No! It simply means that we shouldn't do them badly! (At least, not in the final presentation; if your glosses-for-yourself produced in the creative process are crappy, but they work for you, then more power to you!) If your glosses are incoherent, then get better at glossing! If your glosses are wrong, I am glad that you included them, so that I could make that determination myself. It opens the door to a potentially productive conversation. It is not at all unusual for me to come across a datum in a natlang grammar or analytical paper and think "I'm pretty sure that analysis is wrong", so it should not be at all surprising that the same would happen with conlang grammars. But the gloss provides a means of understanding the author's analysis as well as formulating my own.

Now, let's move on to considering my and David's second points: what's appropriate for the audience, and when does a gloss enhance or detract from the presentation? Well, as I've already explained, when The Analysis Is The Thing, you should include a gloss! But beyond that, I myself thought that appropriate usages were more restricted than it turns out they actually are before I started this little bit of research; in particular, I thought "well, if you are writing a document intended to teach the language, then interlinear glossing is probably not terribly useful." After all, in my formal school studies of French and Russian, not once did I ever encounter a textbook that contained interlinear glosses--they just teach you the vocabulary and morphology ahead of time, or else expect you to memorize complete constructions and infer the rules later, and the main point of a Leipzig-style interlinear gloss is to make the text accessible to someone who doesn't have the necessary background in the language yet (or ever). And yet, if we look up Interlinear Glossing in Wikipedia, the very first phrase of the article is "In linguistics and pedagogy" (emphasis added). And furthermore, consider this excerpt from Nishnaabemwin Reference Grammar by J. Randolph Valentine:
Linguistic researchers may be disappointed to see that morpheme-level segmentations of examples are rarely provided. At a conference held in Thunder Bay, Ontario, in 1996, a steering committee of Nishnaabemwin speakers explicitly requested that such details not be included, as it was felt that they interfered with the flow of the presentation, and contributed to what is sometimes called the "intellectual mining" of aboriginal languages and cultures. To accommodate these concerns, I [provided] word-level annotations[...]. [I]t seems to me that there are many good reasons for working the annotations into the text. For one, many of my readers will be semi-speakers of Nishnaabemwin, who will benefit from the help annotations provide; secondly, Nishnaabemwin varies dialectically, and the word-level glosses will allow fluent readers to more readily accommodate dialect differences; lastly, of course, the annotations make the language more accessible to those lacking prior exposure.
I have complex feelings about this situation. On the one hand, if this is an academic reference grammar, then yeah, I would be surprised and dismayed at the lack of detailed glosses. But, on the other hand, if it is used largely as a study reference for learners of the language, then I agree with the steering committee that detailed glosses would indeed interfere with the presentation! But on the gripping hand... readers "will benefit from the help annotations provide". And that triggered a realization that I am shocked I did not have earlier, given that I spent 9 years working for a university language department developing software to improve adult language acquisition! What's the number-one most effective technological assistance you can give to a new language learner? Parallel translations, subtitles, and word-by-word glosses to ensure they are exposed to comprehensible input! Anything that will reduce the friction of discovering the meaning of words or phrases the reader is unsure about, keeping them engaged with the text in a flow state. At the end of my time in academia, we were even looking into automatic morphanalysis for augmented reader applications. So while a fully detailed Leipzig-style interlinear gloss can get distractingly complex, and thus unhelpful, some level of glossing--tailored to the audience, not slavishly holding to the formal Leipzig rules in maximal detail--is clearly appropriate to pedagogical settings!

To quote David again: "The glosses I typically see make the work less accessible, and the work would be improved by their removal." I suspect that Nishnaabemwin steering committee would agree. And yet, "the annotations make the language more accessible". The issue is not, fundamentally, interlinear glossing. The issue is bad interlinear glossing! David is absolutely right that many people (conlangers and academic linguists alike) are not great at writing glosses clearly, and often have a tendency to include far more information than is necessary for the purpose of the given example. This is why you don't give a full morphological breakdown to an actor who just needs to pronounce the line with the right prosody! But let's not throw out the good with the bad; don't just stop glossing, learn to consider your audience, and learn to gloss well.

What about not-teaching-a-language settings? Well, that's where glossing really shines. If you are writing an article, or a descriptive grammar, whether documenting or analyzing a natural language or a conlang, pretty much everyone surveyed agrees that interlinear glosses are "crucial" or "absolutely indispensable"--and as I have argued above, there is at least a certain genre of conlang presentation in which The Analysis Is The Thing. But, that genre aside, on its own, this is just an opinion, and no more forceful than "glossing is useless". So, why are interlinears indispensable? Fundamentally, it's because your audience does not know the language (or at least, cannot be assumed to know the language--if you are writing a paper on English in English, you can probably skip the glosses most of the time; same goes for, e.g., writing a paper on Spanish in Spanish, etc., where obviously your audience will know the language under discussion) and will not be learning the language. As one respondent put it, the gloss is
the part that lets me know what the hell I'm even looking at. A string of Latin script doesn't even tell me what your language sounds like, let alone how it works. If you don't gloss, you could literally just as well post gibberish and no one could tell the difference.

Which, given the mystery surrounding the Voynich manuscript, does indeed seem to be true! And as I already stated above, even if you provide "sufficient" language material for the reader to come up with their own analyses... they just won't. Ain't nobody got time for that! To quote David again: "I expect if you give a grammar and unglossed data of a language you've created to someone else to gloss, you're going to get back glosses that surprise you." No doubt! And I would love to run that experiment! But it's not an argument against making your own analysis more accessible.

The more different the structures of the analyzed and analysis languages are, the more important interlinear glossing is to help elucidate the structure of the original and how it actually corresponds to the final translation. It gives you the details that the author knows and the reader needs.

If you are trying to support a particular theoretical analysis, then glosses are a succinct way to illustrate that analysis. As noted above, you don't have to subscribe to any particular theory to use interlinear glossing, but if you do, an appropriate choice of glossing conventions will let you show it! But whether or not you are committing to an analysis, interlinears are also helpful in allowing the reader to develop their own analyses. To quote another respondent:
An example sentence without a gloss conveys that an example or illustration exists
An example sentence with a gloss actually illustrates how things work and shows the reader information that might actually be part of it working.
(Emphasis added.) In other words, a gloss, which conveys author knowledge about the structure of the analyzed language, helps to prevent the reader from going astray and engaging in "analyzing the translation". It makes the source more accessible without having to analyze huge quantities of text on one's own, such that the reader can usefully formulate and propose their own hypotheses to explain the data, even if the original glosser is theoretically biased. This is particularly important when the point of an example is to illustrate a grammatical feature that may not be fully preserved in the language of analysis. For example, to quote another respondent:
If I were writing in Eng. about resumptive pronouns in X, and I gave a sentence with a translation "I saw the neighbors we sold the car to" without adding "I see-PST neighbors Rel we sell-PST them-DAT car", I'm not really presenting direct info about pronouns in X.
Information is almost always lost in translation, and appropriate glossing allows you to preserve it, and focus on what you want the example to show!
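As a side note, one nice property of word-aligned interlinears like the one in that quote is that they are mechanical enough to generate with a few lines of code--which is exactly what augmented-reader tools do. Here's a minimal Python sketch of the column-alignment idea; note that the source-language words below are invented placeholders (the respondent only gave the gloss line for their hypothetical language X):

```python
def interlinear(source_words, glosses, translation):
    """Format a word-aligned interlinear gloss plus a free translation.

    Each column is padded to the width of the wider of the source word
    and its gloss, so corresponding items line up vertically.
    """
    if len(source_words) != len(glosses):
        raise ValueError("each source word needs exactly one gloss")
    widths = [max(len(w), len(g)) for w, g in zip(source_words, glosses)]
    source_line = "  ".join(w.ljust(n) for w, n in zip(source_words, widths)).rstrip()
    gloss_line = "  ".join(g.ljust(n) for g, n in zip(glosses, widths)).rstrip()
    return f"{source_line}\n{gloss_line}\n'{translation}'"

# The gloss line echoes the quoted example; the source words are
# invented placeholders standing in for the hypothetical language X.
print(interlinear(
    ["na", "wi-ta", "lomo", "rel", "mi", "sa-ta", "len-ka", "auto"],
    ["I", "see-PST", "neighbors", "REL", "we", "sell-PST", "them-DAT", "car"],
    "I saw the neighbors we sold the car to",
))
```

Real interlinearization tools do considerably more (morpheme segmentation lines, small-caps category labels, and so on), but the core presentation trick really is just per-column padding.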

So, should you always gloss everything? No, there are definitely situations where it is not useful for the target audience or purpose of the text. Should you always gloss in maximal detail and with perfect conformance to the formal Leipzig rules? Also no--you should tailor your glosses to the intended audience and what you want to convey with a particular example. But should you never gloss? Also no! If you intend your documentation to actually be useful to other linguists, or appreciable by other conlangers, you should know how and when to gloss, and do so liberally!


Some Thoughts... Index


Tuesday, January 31, 2023

Rylan & Last Starfighter

Nowadays, we are awash in an ocean of geeky sci-fi and fantasy media content--but, it was not always so. There once was a time when a single person could easily list every significant live-action film with fantastical content in the last decade, own them all on VHS, and watch them all in a weekend movie marathon. When I was a small child, The Last Starfighter was one of those rare and precious sci-fi films that my family had on VHS, and I loved it.

(Beware the Amazon affiliate link! If you click it, and buy the Blu-ray, you might end up giving me some money!)

The Last Starfighter was revolutionary in a number of ways. It took the kid protagonist out of the suburbs and into a trailer park. It emphasized the 3D nature of space for navigation and combat--spaceships don't all get conveniently aligned in a flat plane and all facing the same way up! It had the heaviest use of CGI since TRON, and while we wouldn't call the results "photorealistic" today, it was the first film to attempt photographic realism in computer generated imagery.

But that's not why you read this blog! You wanna know what the linguistic content is like!

The Last Starfighter does feature alien dialog from the Rylans, but not a proper conlang, or even a sketch of one--according to the Director Commentary, it's all gibberish. But, it's remarkably well-constructed gibberish! If you're going to do gibberish, The Last Starfighter ain't a bad reference to emulate! (Unlike, say, the Jabba's Palace scene in The Return of the Jedi.)

The alien dialog is entirely confined to a fairly short sequence in between Alex (our human MC) entering the Star Car for transport to Rylos and being given a translator device so that he (and the audience) hears everything for the rest of the movie in English. Some of that dialog is very difficult to hear clearly, as it is diegetically filtered through PA speaker systems in the background of the scenes, and the subtitles don't transcribe it. However, I managed to transcribe a few bits for reference:

[On approach to the Rylan base in the Star Car]
Centauri: Rita. Ritana. Ritana.
Traffic control: Ritana. Ritana sanswela. Ritana.

[Greeting, when Star Car door opens]
Rityensi. Kita / Kritar (pronunciation seems to flip between rhotic and non-rhotic versions over several repetitions of the word; would be good to get multiple sets of ears on it)

[While gesturing through a door]
Kritar ina

[While receiving uniform]
Incredulous Official: Isanjay!
Centauri: Isanjay? Onimatswela! Prita, Prita!
Hula!

[Calling for Alex's attention]
Iai. Kita / Kritar

Throughout the sequence, the filmmakers make heavy use of environmental and actor-emotional cues to communicate the intended meaning (making the exact meaning of the words Irrelevant, and the broad gist Obvious), while simultaneously allowing the audience to participate in Alex's confusion at being thrown into contact with this alien culture, and showing through the use of an alien language that it is, well... an alien culture!

There is little enough there that the writers are not at risk of falling afoul of self-contradiction--there's enough freedom to induce a meaningful language behind these utterances if you wanted to. But at the same time, there is enough repetition of phonetic elements to make it feel cohesive and consistent, and re-use of common elements in the same situations. E.g., "Kita", combined with a beckoning gesture (cross wrists with palms inward), is pretty clearly something like "come here" / "go this way". There are lots of 'R's, 'T's, and 'I's, and the "swela" element occurs twice in different contexts--it sounds like a suffix, but maybe it's a distinct word.

Finally, Alex is presented with an automatic translator device pinned to his shirt collar, "so that people don't have to listen to that for the rest of the movie" according to the Director's Commentary. I, for one, would've been fine listening to that for the rest of the movie... but if you have that kind of bulk of dialog, it really would've required actually constructing a proper conlang! So, the writer & director knew their limits, and constructed the film accordingly. What I find quite odd, though, is that the Director's Commentary calls out the use of the automatic translator device as "cheesy"! On the one hand, that's a little overly self-deprecating in light of the ridiculously broad usage of Universal Translators in other SF media--notably, Star Trek. On the other hand, it's nice to hear a filmmaker explicitly acknowledging that not using that trope would have been better. And, while various Star Trek series and episodes really deserve their own separate posts, it is worth noting here that more recent Star Trek productions have been better about not completely backgrounding the existence of the translator mechanism--just to point out a couple of notable examples, Star Trek: Prodigy starts out in an environment where prisoners are isolated and unable to communicate with each other because they are not given access to universal translators (implicitly acknowledging that it is a real in-world technology, rather than strictly a narrative convenience); and in season 2 of Star Trek: Discovery, Saru's sister expresses confusion about how she is able to understand Michael (who, to her, is an alien)--because, as a pre-technological alien, the existence of translator devices has not yet been explained to her.
So, despite its imperfections in the linguistic department, we can identify The Last Starfighter as also being revolutionary in explicitly acknowledging the need for and existence of translators as in-universe technology in a way that is relevant to the audience experience!

If you liked this post, please consider making a small donation.

The Linguistically Interesting Media Index

Monday, January 30, 2023

Linguistics as the Science of Science Fiction

Well, I've been away for a while, and I have several old drafts waiting around for me to get back to them and turn them into more analyses of Linguistically Interesting Media... but right now, I have been inspired to get writing again not by a particular work of fiction, but by this 2018 blog post by Martin Haspelmath.

Linguistic science fiction is relatively limited in how it deploys the actual science of linguistics. Most science fiction that employs linguistics as a background focuses on some variety of the Sapir-Whorf hypothesis: that language constrains thought, and thus teaching people different languages can be employed as an effectively technological solution to various problems--either mundanely altering how people behave, or exercising somewhat fantastical levels of control, or granting people mental superpowers. This kind of linguistic SF conceit shows up in, for example,

(As usual, all links to media are Amazon Affiliate links.)

And then we have some more peripheral examples:

  • Embassytown by China Miéville, which features aliens who cannot lie. This is frequently claimed to be an instance of Whorfianism in sci-fi, but the weirdness is not an inherent feature of the fictional language, but rather of the aliens' minds.
  • Snow Crash by Neal Stephenson, which features a language that can directly reprogram the human brain. This is certainly akin to Whorfianism, but in my mind not quite the same thing, as Stephenson's brain-programming language operates at a physiological, rather than psychological, level. Other readers may disagree, but the mechanism of action here feels a lot more like a highly refined version of the kind of direct neurological manipulation that occurs by accident in, e.g., epileptic people who have seizures triggered by flashing lights, rather than the sort of Whorfian personality manipulation seen in The Languages of Pao, etc.

And there are a few other stories that avoid Whorfianism altogether; Sheila Finch wrote several, collected in The Guild of Xenolinguists, such as "Reading the Bones" (about which I have written previously), which, somewhat like Embassytown, deals with a fictional culture's psychological and cultural relationship to language-as-cognitive-technology--neither aliens nor humans have their minds fundamentally altered by learning each other's languages in that story! Nevertheless, these sorts of examples are hard to find, and I am far from the first commentator to notice the overwhelming overuse of the Sapir-Whorf hypothesis as the basis for linguistic science fiction.

So, what does this have to do with Martin Haspelmath? Well, his blog post made me realize that there is, in fact, a second principle of theoretical linguistics which has informed a great deal of science fiction and fantasy: The Universal Grammar (UG) hypothesis. This shows up explicitly in The Embedding, where the backstory is all about setting up The Forbidden Experiment to test the limits of UG, but it is often much more subtle, to the point that most readers may not even realize that they are being exposed to linguistic sci-fi when they encounter it!

Every time we encounter aliens or eldritch monsters whose languages are beyond human comprehension--that's making an implicit claim about UG, that humans have an innate grammatical system and the aliens have a different, incompatible one. To quote Martin-quoting-Jessica-Coon:

There are grammatical properties we could imagine that we just don’t ever find in any human language, so we know what’s specific to humans and our endowment for language. There’s no reason to expect aliens would have the same system. In fact, it would be very surprising if they did.

If the UG hypothesis is true, then almost all aliens should be fundamentally incomprehensible!

Meanwhile, every time you encounter a universal translator, the story is making an even stronger claim--that not only does UG exist, but it is actually not accidental to human evolution, but rather is the same for all language-using beings, such that the universal translator can just find the right settings for a finite number of switches in order to convert between specific languages flawlessly.

But, there is a more nuanced view, one adopted by Martin himself:

We wouldn’t expect aliens to have the same representational (=UG) constraints as humans, because presumably they have different brains and minds. But their languages would be expected to be subject to very similar functional-adaptive constraints as human languages, if the languages are used for communication in much the same way as humans use their language.

In other words, differences in the construction of other species' brains may indeed cause them to evolve languages with features that we cannot learn to use fluently--and vice-versa! For example, Jeffrey Henning's conlang Fith depends on a very non-human structure for short-term working memory. But even if we could not use the full extent of some specific alien language intuitively and fluently, we should be able to comprehend how it works and interpret it, and to find a common ground that does allow for meaningful communication--e.g., by constructing a creole like Shallow Fith, accessible to both Fithians and Humans. So far, The Guild of Xenolinguists is the only body of fiction I know of that takes this approach seriously!

Considering the physical, cognitive, and functional bases of language allows us to more deeply examine some other common tropes. How often have we seen a member of a Superior Species explain how easily they were able to learn our "primitive language"? Well, what might that actually mean? What differences in cognitive ability could create an asymmetry that makes it easy for aliens to learn our languages but not easy for us to learn theirs? Is it a higher bitrate that allows them to use something like Heinlein's Speedtalk? Is it vastly expanded working memory that lets them track a larger number of unambiguous anaphoric references? The possibilities are vast, and could present a variety of distinct types of disability to be faced by a human trying to interact with that culture. Or consider the trope of the name unpronounceable by humans: is it really unpronounceable by humans, or just by this Anglophone human? If it really is unpronounceable by humans, why can the bearer of the name pronounce human speech correctly? Maybe they can't, are implicitly speaking with a thick accent, and our names get butchered by them just as badly as theirs do by us. Or maybe they are built like birds, and just have a much better ability to synthesize arbitrary sounds than we do. Either possibility can be exploited for further plot and characterization!

So... yeah. I have no particularly compelling conclusion here, but go forth and write better linguistic SF!

If you liked this post, please consider making a small donation.

The Linguistically Interesting Media Index

Tuesday, June 21, 2022

The Phonology of Baseline

Dath ilan is an alternate-history Earth envisioned by Eliezer Yudkowsky, whose history diverges at least a couple thousand years ago from our own, and in which civilization has achieved a much higher degree of global economic coordination. Part of this increased coordination is that everyone on dath ilan speaks, at minimum, an in-universe conlang called "Baseline". Out-of-universe, Baseline does not actually exist--but descriptions of what it is like do, so I have determined to attempt to remedy the situation. In terms of explicit descriptions of Baseline's phonology, this is all we have:
For example, all the phonemes are a minimum distance away from each other that guarantees people with slightly less acute hearing can understand it when spoken under slightly adverse conditions. In-between phonemes that are possible to pronounce, but potentially difficult to hear correctly, are then reserved for constructing 'conlangs', constructed languages, many of which use 'Baseline' as a baseline but add new short words using the expanded phoneme set.

That seems... not to be super well supported by the data? Like, it appears to contain all three of s/θ/f, which are easily confusable in low-fidelity audio environments. (It's actually rather difficult to figure out what the objective perceptual distance between different phones is, independent of biases induced by a test subject's pre-existing knowledge of any specific language; the closest I could find to that kind of research is the planning that went into designing the NATO Phonetic Alphabet--but even that is optimized to avoid confusion by speakers of particular popular languages, which is overconstrained for our purposes here. However, when native speakers of some language--like English--do in fact confuse phonemes of their own language sometimes, that seems like strong evidence that the underlying phones are actually pretty close!)

However, fortunately for us, the character who speaks that paragraph is not specifically trained in linguistics, and may not know exactly what he's talking about--and there are other constraints on the design of Baseline which may conflict with that one, such that the optimal design for Baseline phonology is not one which optimizes distinctness-of-phonemes in isolation. In particular, Baseline speakers seem to have a strong sense of syllables as the most salient components of word structure, and count of syllables as the obvious way to measure utterance length; and, they value having short words and short utterances for concepts that are common in their culture. Thus, we can also expect to have a large phonemic inventory to allow for the maximum number of individual syllables, maximum information per syllable, and maximal number of short words, which is in direct conflict with keeping individual phones as far apart from each other in acoustic space as possible.

By skimming all of the "Planecrash" stories (about dath ilani people who are in a plane crash, and get isekaied to various other fantasy worlds to have culture shock in), I have extracted a total of five actual Baseline words-that-are-not-names:

dath
ilan
tsi-imbi
farsheth
kelthorkarnen

And then a bunch of personal names:

Alis
Athpechya
Bahb
Bahdhi
Bohob
Corun
Elshorm
Elzbeth
Helorm
Illeia
Karal
Keltham
Limyar
Merrin
Miyalsvor
Nemamel
Ranthal
Salthin
Thellim
Verrez

Most names have two syllables; a few (4 in this list) have 3, or maybe 4. "Bahb" is the only one-syllable name, but I don't think it is actually representative of any real name used for a dath ilani person, as it appears in a context where it is clearly meant to be a transcription of the English name "Bob", as part of the set "Alis, Bahb, and Karal", standing in for "Alice, Bob, and Carol", the standard placeholder names for participants in a cryptographic protocol. "Bohob" seems to be an alternative adaptation of "Bob" that fits Baseline naming patterns better. In combination with "Bahdhi", though, the orthographic possibility of "Bahb" suggests the existence of <a> and <ah> as separate vowels. If <h> can only occur in onset positions, there would be minimal ambiguity introduced in the Anglicization by adopting that convention. <Illeia> could be a four-syllable name, but we have a negative example in that <Athpechya> is presented as a dath ilani equivalent for a non-Baseline 4-syllable name, which has been cut down to 3 syllables (assuming <y> is to be interpreted as a consonant). Thus, I am inclined to interpret that intervocalic <i> as a transcriptional variant of <y>, much like <c> is a transcriptional variant of <k>, rather than as a whole extra syllable.

As a cultural note, all dath ilani are mononymic, so there is nothing to be said about the structure of family names / patronymics.

From this data, I conclude that Baseline has a 6-vowel system:

        Front      Back
Hi      /i/ <i>    /u/ <u>
Mid     /ɛ/ <e>    /o/ <o>
Lo      /æ/ <a>    /ɑ/ <ah>

with three degrees of height, a binary front-back distinction, and rounding in the back non-low vowels.

I would like the <e> vowel to be a little higher, to maximize contrast with /æ/, but we've got an explicit negative example where the dath ilani Merrin struggles to pronounce the French name "Félix", which confirms that the Baseline <e> vowel is not /e/. ¯\_(ツ)_/¯

Attested consonants, based on the assumption that names are supposed to be pronounced in the most obvious possible way for an Anglophone reader, are as follows:

p - /p/
b - /b/
d - /d/
k/c - /k/

f - /f/
v - /v/
s - /s/
z - /z/
th - /θ/
dh - /ð/
sh - /ʃ/
h - /h/

ts - /t͡s/
ch - /t͡ʃ/

l - /ɫ/ (for maximal distinctiveness from /j/, I'm assuming this to be universally a dark/velarized l, rather than copying English's light/dark allophony; the presence of this and /v/ justifies the lack of /w/)
r - /r/ (for maximal distinctiveness from /l/, I'll assume this to be a tap/trill even though that's not the most natural reading for most Anglophones).
y - /j/

m - /m/
n - /n/

The lack of /g/ is not typologically odd, but the lack of isolated /t/ (assuming that <ts> is, in fact, an affricate, which seems reasonable given the existence of <ch> and the lack of other /Cs/ clusters in onset positions) in the presence of /p/ and /d/ is a bizarre gap. On that basis, and because there seems to be a fairly robust voicing distinction in the affricates, I infer that there should also be /t/ and /g/ phonemes, even though they happen to be missing from this dataset. Additionally, I feel we ought to fill in unattested */ʒ/, */d͡z/, and */d͡ʒ/, on the basis that, having decided that voicing was usefully distinctive for all other obstruents, the in-world engineers of Baseline wouldn't have just left those specific place/manner combinations unused!

Now, I want to consider the case of <tsi-imbi> a little more closely; it's the only word with a hyphen in it, and the only word with consecutive identical vowels if you ignore the hyphen. In fact, no attested words have consecutive vowels at all! I infer that this is to maximize the ease of syllable segmentation, and that the hyphen should in fact represent an additional marginal glottal stop (/ʔ/) phoneme (such as shows up in the English "uh-oh"), which appears wherever vowels would otherwise be in hiatus. That also allows us to resolve any possible ambiguity in the usage of <ah> to transcribe the low-back vowel. Something like <bahob> (a minimal change from the attested <Bohob>) would have to be read as /bæ.hob/, while /baob/ would be phonetically [ba.ʔ.ob], with extra-metrical /ʔ/, and transcribed as <bah-ob>--and /ba.hob/ would be <bahhob>.
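
This transcription convention is mechanical enough to sketch in code. The following minimal illustration (the phonemic syllable spellings and the use of "A" as a stand-in for the /ɑ/ <ah> vowel are my own ad-hoc choices, not anything canonical) reproduces the three contrasting examples:

```python
VOWELS = set("iueoaA")  # "A" = /ɑ/ (written <ah>), "a" = /æ/ (written <a>)

def anglicize(syllables):
    """Render a phonemic syllable sequence in the <ah>/hyphen convention.

    A hyphen marks the extra-metrical glottal stop wherever two vowels
    would otherwise meet in hiatus across a syllable boundary.
    """
    out = []
    for i, syl in enumerate(syllables):
        if i > 0 and syllables[i - 1][-1] in VOWELS and syl[0] in VOWELS:
            out.append("-")                 # hiatus site: epenthetic /ʔ/
        out.append(syl.replace("A", "ah"))  # /ɑ/ is spelled <ah>
    return "".join(out)

print(anglicize(["ba", "hob"]))  # bahob  = /bæ.hob/
print(anglicize(["bA", "ob"]))   # bah-ob = [ba.ʔ.ob]
print(anglicize(["bA", "hob"]))  # bahhob = /bɑ.hob/
```

Since <h> only ever occurs in onsets, the three spellings come out distinct; the attested <Bahb> also falls out naturally as the rendering of /bɑb/.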

Now, this raises a potential problem with the transcription of other consonants; while we have examples of single intervocalic <l> and <r>, there are also a few instances of doubled <ll> and <rr>--but no other doubled consonants. And if we aren't allowing doubled vowels, having geminate continuant consonants across syllable boundaries seems like a very weird choice, completely counter to the goal of making syllable segmentation easy and unambiguous. One could imagine heterosyllabic /l.ʔ.l/ and /r.ʔ.r/ sequences, with epenthetic glottal stops separating syllables just like they do between vowels, but in the absence of written hyphens in the attested names, I am going to assume that the doubled letters are there purely for purposes of Anglophone aesthetics, and that cross-syllable geminates do not actually exist in Baseline.

That leads to the following consonant chart:

              Bilabial/     Dental   Alveolar   Postalveolar/   Velar   Glottal
              Labiodental                       Palatal
Plosive       p b                    t d                        k g     (ʔ)
Nasal         m                      n
Trill                                r
Fricative     f v           θ ð      s z       ʃ ʒ                      h
Affricate                            t͡s d͡z     t͡ʃ d͡ʒ
Approximant                                    j                ɫ

The fricatives are a little bit weird; I probably would have dropped θ/ð and h in exchange for x/ɣ to maximize distinctiveness and get slightly better correspondence between fricative and plosive series. But perhaps the in-world justification is that they just Wanted More Options for making more short words, and the possibility of x/h confusion pushed for pulling in the dental fricatives instead, despite the labial/dental/alveolar confusability. And for the plosives, I think it would make sense if all of the voiceless plosives were also secondarily aspirated--we've only got two plosive series, so we might as well make them as phonetically distinctive as possible!

We can also state the following apparent phonotactic rules:
  • Syllables have the form (C1)V((r)C2)(s|z), where:
  • C1 is any consonant.
  • C2 is any consonant except /h/.
  • The optional /r/ cannot occur before another /r/ in the C2 slot.
  • The optional final sibilant cannot occur after another sibilant in the C2 slot.
  • /s/ cannot occur after voiced stops/fricatives.
  • /z/ cannot occur after voiceless stops/fricatives.
Within a word:
  • A syllable cannot end with the same consonant with which the next syllable starts (nor should t/d precede t͡s/d͡z or t͡ʃ/d͡ʒ, respectively).
  • Vowels cannot occur in hiatus, and /l/ and /r/ cannot occur in hiatus with themselves, with extra-syllabic glottal stops being inserted as repair.

Making codas more complex than onsets is just weird, and I cannot justify that in-world at all, but that seems to be where the available data is pointing. Maybe it allows sub-syllable-level suffixing/infixing morphology?
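
As a sanity check, these rules are concrete enough to encode directly. The sketch below uses my own ad-hoc ASCII stand-ins for the phonemes (e.g. "T" for /θ/, "S" for /ʃ/, "c" for /t͡s/, "A" for /ɑ/--not any official romanization) and assumes the gap-filling phonemes proposed above; it validates a single syllable against the template and co-occurrence constraints:

```python
import re

# ASCII stand-ins (my own, hypothetical): T=/θ/ D=/ð/ S=/ʃ/ Z=/ʒ/
# c=/t͡s/ j=/d͡z/ C=/t͡ʃ/ J=/d͡ʒ/ A=/ɑ/; everything else as in IPA.
CONSONANTS = "pbtdkgfvszTDSZhcjCJlrymn"
VOWELS = "iueoaA"
SIBILANTS = set("szSZcjCJ")
VOICED_OBSTRUENTS = set("bdgvzDZjJ")
VOICELESS_OBSTRUENTS = set("ptkfsTScC")

# The template (C1)V((r)C2)(s|z) as a regex with named capture groups.
SYLLABLE = re.compile(
    f"(?P<c1>[{CONSONANTS}])?"
    f"(?P<v>[{VOWELS}])"
    f"(?:(?P<r>r)?(?P<c2>[{CONSONANTS.replace('h', '')}]))?"  # no /h/ in C2
    f"(?P<sib>[sz])?"
)

def is_valid_syllable(syl: str) -> bool:
    m = SYLLABLE.fullmatch(syl)
    if not m:
        return False
    r, c2, sib = m.group("r"), m.group("c2"), m.group("sib")
    if r and c2 == "r":                            # no /r/ before /r/ in C2
        return False
    if sib and c2 in SIBILANTS:                    # no sibilant after a sibilant
        return False
    if sib == "s" and c2 in VOICED_OBSTRUENTS:     # voicing agreement
        return False
    if sib == "z" and c2 in VOICELESS_OBSTRUENTS:
        return False
    return True

# Spot-check against some attested syllables, in the ad-hoc spelling:
assert all(map(is_valid_syllable, ["daT", "lan", "ci", "elz", "Sorm", "yals"]))
assert not is_valid_syllable("dah")   # /h/ is barred from the coda
```

The word-internal constraints (no fake geminates, no hiatus) would sit in a separate pass over adjacent syllables, which I leave out here.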

We have no data on tone or stress, so I assume by default that Baseline has some sort of non-lexical, predictable stress system--e.g., strict initial stress. However, based on characters' comments on how many syllables are required to say something in various languages, and their treating syllable count as a reliable measure of how long an utterance is / how much effort it takes to express something, I infer that the language is syllable-timed, rather than stress- or mora-timed.

Making another default assumption that the maximum onset principle for syllabification applies, the attested syllables are as follows:

a ath
i il im
el elz
bah bahb
beth bi bo
dath dhi
far
he hob
ka kar kel ko
lan le lim lis lorm
ma mel mer mi
ne nen
pech
ral ran rez rin run
sal
sheth shorm
thal tham thel thin thor
tsi
ya
yals yar ver vor

The possible syllables are a much larger set!
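
How much larger can be brute-forced from the template and coda constraints above. This quick sketch (again using my own ad-hoc ASCII stand-ins--"T" for /θ/, "c" for /t͡s/, "A" for /ɑ/, etc.--and assuming the inferred inventory, gap-fillers included) enumerates every licit syllable:

```python
from itertools import product

CONS = "pbtdkgfvszTDSZhcjCJlrymn"   # 24 consonants (ad-hoc ASCII stand-ins)
VOWELS = "iueoaA"                   # 6 vowels; A = /ɑ/ <ah>
SIBILANTS = set("szSZcjCJ")
VOICED_OBSTR = set("bdgvzDZjJ")
VOICELESS_OBSTR = set("ptkfsTScC")

# Expand the template (C1)V((r)C2)(s|z) piece by piece.
onsets = [""] + list(CONS)                               # optional C1
codas = [("", "")]                                       # empty coda
codas += [("", c2) for c2 in CONS if c2 != "h"]          # C2 alone (no /h/)
codas += [("r", c2) for c2 in CONS if c2 not in "hr"]    # rC2 (no /h/, no /rr/)

count = 0
for c1, (r, c2), sib in product(onsets, codas, ["", "s", "z"]):
    if sib and c2 in SIBILANTS:              # no sibilant after a sibilant
        continue
    if sib == "s" and c2 in VOICED_OBSTR:    # voicing agreement with C2
        continue
    if sib == "z" and c2 in VOICELESS_OBSTR:
        continue
    count += len(VOWELS)                     # any of the 6 vowels fits here
print(count)  # 12900
```

That's 12,900 licit syllables against the ~50 attested ones--enormous room for coining the short, information-dense words the dath ilani prize.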

Friday, May 6, 2022

Ord: Spherindricites

< Polybrachs | Introduction

The spherindricites are a derivative of the tetrabrachs, brought about by a mutation that caused repeated cell divisions along the vertical axis prior to limb differentiation, resulting in an elongated (spherindrical) segmented body plan with varying numbers of tetrahedral segments, analogous to the segmented worms which gave rise to arthropods on Earth. The development of segmentation was quickly followed by evolution of invaginations in the body surface to increase surface volume; due to the much higher surface-to-bulk ratio of 4D organisms compared to the surface-to-volume ratios of similar 3D organisms, and the small maximum distance from any point on the interior of a tetrabrach to the surface, small tetrabrachs and early spherindricites had no need for any specialized breathing structures, as liquids and gasses could passively diffuse through the creature from the environment. However, surface pockets which would be alternately compressed and expanded by the creature's movement, thus getting the surface closer to some internal volumes and actively pumping fluid past them, allowed spherindricites to grow to much larger sizes.

The least derived spherindricites, which retain minimal differentiation between their segments, primarily occupy benthic and burrowing niches and are an exceptionally diverse group, just like their close Earthling analogs, the annelids, coming in a wide range of sizes and with a variety of reduced or specialized limb structures. However, one free-swimming group of spherindricites developed encephalization--the fusing and specialization of segments at the mouth end of the creature, which had transitioned from the bottom to the forward orientation, creating creatures with distinct heads at their fronts. The forwardmost set of limbs specialized as mouthparts for grabbing and manipulating food; two of the second-segment limbs specialized as olfactory sense organs, while the remaining two developed more advanced eyes from the terminal ocelli, with ocelli disappearing from the remaining limbs.

One group of cephalic spherindricites, the malakichthys ("soft fish"), directly developed a new up-down axial symmetry breaking, with one limb from each body segment specialized as a dorsal stabilizing fin and the remaining three becoming propulsive limbs radially arranged in the sideways plane.

The remaining cephalic spherindricites developed internal mineral storage structures, which would serve as the basis for structural bones. This group further diverged based on three different approaches to developing their own secondary vertical orientation:

  1. Polysphenoids dropped two limbs from each body segment, resulting in alternating left/right and ana/kata-aligned limbs, such that the tips of each limb from any two adjacent segments form the vertices of a disphenoid.
  2. Trilaterians dropped a single limb per segment to allow planar compression, resulting in adjacent body segments forming alternating triangular antiprisms, with each set of limbs arranged in an equilateral triangle in the sideways plane.
  3. Quadrilaterians simply rearranged their four limbs per segment into a square arrangement in the sideways plane rather than a tetrahedron.
All three of these groups would later give rise to different land-dwelling clades which would specialize in different ecological niches suited to their divergent limb arrangements.