Saturday, March 25, 2023

The Sci-Fi Linguistics of The Embedding

The Embedding, by Ian Watson, is... not good. It tied for the Campbell Award in 1974 and won the Nebula Award for Best Novel in 1975, and has been called a "modern classic", but, much like the Hugonauts' review of Dune*, while I recognize that it has some great ideas, I just don't think they're actually executed all that well. Now, young people don't always like old SF, but I've read and liked a lot of old SF, and this one just really doesn't hold up on the strength of writing or the story. My feelings pretty much mirror those expressed in this review from 2006--its understanding of linguistic theory is slightly confused, and the multiple plot threads are poorly integrated and redundant, which is a great disappointment given the novel's stellar reputation. I can only assume it has that reputation because it blew everyone's minds by actually engaging with theoretical linguistics at all back in 1974, and nobody's seriously re-evaluated it since then.  But it is the most linguisticky linguistic fiction ever, and explores ideas that have not been done better since, despite the low bar! So, let's talk about those ideas.

*Go subscribe to the Hugonauts! They deserve more listeners.

The introduction to the Gollancz SF Masterworks edition says that "There are two ideas in linguistics that have had a particular influence on twentieth-century science fiction.": the Sapir-Whorf hypothesis, and Universal Grammar. The Sapir-Whorf hypothesis--the idea that the language you speak can influence or control how you think--is low-hanging fruit, and there's tons of SF, some of which I have previously reviewed, that plays with that.

The core idea of Universal Grammar is that humans come pre-wired with an understanding of how language should work; that our brains are built with a standard template for grammar with a multitude of switches that different languages merely set in different ways. This idea originates with Noam Chomsky, who famously claimed that Martians studying Earth would conclude that we all spoke mere dialects of one humanese--although quite a lot of advancement has been made in linguistics since that time, and Chomsky himself no longer holds that extreme view. If you do run with that extreme view, however, it lends itself to the trope of Incomprehensible Aliens--if we are hardwired to Do Language in a particular way, and they are hardwired to Do Language in a different way, then presumably we could never learn to understand each other's languages and communication would be forever impossible.

The idea of built-in, innate grammar arises from the "Poverty of the Stimulus" argument, which basically goes that human children aren't exposed to a large enough sample of language to learn how it works from first principles in the time it actually takes for children to acquire their first language. Any language whose rules didn't conform to whatever that built-in template is then could not be learned by children. This, of course, depends on the assumption that the stimulus is actually impoverished--that there really isn't enough information in the ambient linguistic data to which children are exposed during their lives to calculate the correct rules of a human language--and that assumption is not without controversy.

It is clear that humans must have some innate capacity for language--after all, something makes the difference between a human baby who learns to understand and speak English in only a few years, versus, say, a kitten who grows up in the same house and maybe learns to recognize a few individual words. Whatever that is is called "the biological endowment", and the idea that that could vary between linguistically-capable species is unexplored here, or in any other published story that I am aware of. But exactly how extensive our innate knowledge specific to language is, is still an active area of research, and the idea of an extensive Universal human Grammar can be attacked from two directions:

  1. Showing that, for some particular linguistic feature, the stimulus is not actually impoverished--that children are exposed to enough of the right kinds of examples to just "figure it out".
  2. Showing that we have some innate cognitive biases relevant to linguistic learning, but that they are not specific to linguistic learning--thus, other linguistically-capable species may well exhibit exactly the same linguistic biases, because of developing the same general reasoning capabilities.

Somewhat confusingly, some people use "Universal Grammar" to refer to any innate knowledge relevant to language, not just that which is specific to human language, and only evidence of type 1 is relevant to disproving that kind of Universal Grammar. But given the particular feature that Ian Watson chose to focus on (the eponymous "embedding"), and how interactions with aliens are portrayed in the book (they are fascinated by exactly the same constraint), I have to assume that that was the understanding that Watson had of the term "Universal Grammar"--that it was not merely universal to humans, but cosmologically "universal", based on principles that would be reliably replicated in any intelligent mind.

In some ways, Watson's choice of feature to focus on is a clever one; center embedding is an easy concept to explain to readers who are otherwise lacking in theoretical linguistic education, and he does just that in conversations between characters. For those who do not wish to go read the novel looking for the definition, center embedding is just taking a particular grammatical structure--like a relative clause--and sticking it in the middle of another structure of the same type, rather than at one end or the other. For example, take the sentence "This is the malt that the rat ate."--it's got a relative clause in it. We can self-embed another relative clause at the edge like this: "This is the malt that was eaten by the rat that was worried by the cat." Or, we can center-embed that relative clause--stick it in the middle of the first relative clause, breaking that up--like this: "This is the malt that the rat that the cat worried ate." That's harder to understand, but it's the sort of thing that might be said from time-to-time. But what if we add a third clause? "This is the malt that the rat that the cat that the dog chased worried ate." That's... really hard to interpret, and people just don't speak that way! And if you add one more level... no, that'll never happen!
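
To make the difference between the two kinds of embedding concrete, here is a minimal Python sketch that mechanically builds the example sentences above. The function names and the (noun, past tense, participle) triples are my own illustrative assumptions, not anything from the novel; the point is only that adding another level is a purely mechanical operation.

```python
# Each clause is (noun, past-tense verb, passive participle); the noun is the
# one that did the verb to the previous noun. These triples are illustrative.
CLAUSES = [("rat", "ate", "eaten"),
           ("cat", "worried", "worried"),
           ("dog", "chased", "chased")]

def edge_embedded(head, clauses):
    """Attach each new relative clause at the end of the previous one."""
    tail = "".join(f" that was {part} by the {noun}" for noun, _, part in clauses)
    return f"This is the {head}{tail}."

def center_embedded(head, clauses):
    """Put each new relative clause in the middle of the previous one."""
    # All the subject nouns pile up first...
    nouns = "".join(f" that the {noun}" for noun, _, _ in clauses)
    # ...and their verbs arrive afterwards, in reverse order.
    verbs = " ".join(verb for _, verb, _ in reversed(clauses))
    return f"This is the {head}{nouns} {verbs}."

for depth in range(1, len(CLAUSES) + 1):
    print(center_embedded("malt", CLAUSES[:depth]))
```

Nothing in the code gets harder as the depth goes up; only the reader does.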

However, despite being a clever choice of linguistic phenomenon, it's not actually a test of Universal Grammar, as Chomsky intended the term! In fact, this gets at a completely different bit of Chomskyan linguistics, which Watson completely ignores: the distinction between competence and performance. "Competence" is what you know about the rules of language, and your ability to judge things as grammatical or ungrammatical. It is competence that allows us to say, yeah, mechanically, we could add a fourth embedded clause to that horrible incomprehensible sentence, and it wouldn't violate any grammatical rules. We are capable of learning the rules that would let us do that. Competence is what lets us look at the famous sentence "Colorless green ideas sleep furiously," and say "yeah, it's grammatical, but...." Meanwhile, performance is the fact that we sometimes make mistakes that we know are mistakes, and that we can rate things as acceptable or unacceptable, because they do or don't make sense or because they are easy or hard to interpret, independent of whether or not they are grammatical. The limitations on center embedding in English aren't grammatical, and this tells us nothing about the rules of Universal Grammar--they are just a consequence of the fact that humans have limited short-term working memory, so we lose track of the first halves of multiply-embedded structures before we get to the end! And in fact, you can prove that there is no hard grammatical limit on embedding depth by observing that equally-embedded structures can be more or less acceptable depending on which precise nouns, pronouns, and adjectives you happen to use; compare, for example: "The rat which the cat which the dog chased bit fell." vs. "The elegant woman whom the man that I love met moved to Barcelona."
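
The working-memory point can be illustrated with a toy counter. This is only a sketch under a big simplifying assumption of mine: that the only thing a listener has to hold onto is the number of subject nouns still waiting for their verb. It is not a real psycholinguistic model, and (as the Barcelona example shows) raw depth is not the whole story.

```python
def max_pending(events):
    """Return the largest number of subjects still waiting for their verb.

    `events` is a sequence of "N" (a subject noun is read) and "V" (a verb
    arrives and resolves the most recent waiting subject).
    """
    pending = 0
    worst = 0
    for event in events:
        if event == "N":
            pending += 1
            worst = max(worst, pending)
        elif event == "V":
            pending -= 1
    return worst

def center_events(depth):
    # Center embedding: every subject comes first, then every verb.
    return ["N"] * depth + ["V"] * depth

def edge_events(depth):
    # Edge embedding: each subject gets its verb before the next one starts.
    return ["N", "V"] * depth

for depth in (1, 2, 3, 4):
    print(depth, max_pending(center_events(depth)), max_pending(edge_events(depth)))
```

In this toy model the load for center embedding grows with every level, while for edge embedding it stays flat, and no grammatical rule is involved in either case.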

Each of the three major plot threads in the novel has its own mini-linguistic-ideas as well. The aliens engage with the Sapir-Whorf hypothesis in thinking that learning more languages--and specifically, learning a heavily-center-embedding language--will allow them to achieve new metaphysical abilities. The Amazonian natives use drugs to expand their linguistic competence, which could be interpreted as a precursor to the drugs used by Sheila Finch's Guild of Xenolinguists. And the opening thread of the book deals with conducting The Forbidden Experiment--isolating children from natural adult language to see what happens, or what you can make happen, in order to explore the boundaries of the biological endowment and of any Universal Grammar that we might have. As you can see from that Wikipedia link, The Embedding was not the first to explore this idea--it shows up in books, comics, and even The Twilight Zone. And if you believe in the strong Sapir-Whorf hypothesis, or linguistic determinism, it can be quite a compelling idea--raising children without exposure to any existing language would release them from the limitations of those languages, would it not? But... no. That's not actually what happens at all. It is very easy when thinking about problems in language acquisition and Universal Grammar to start thinking, "ugh, if only we could do controlled experiments on the acquisition process, we could answer so many questions so much more easily!" In fact, while working on this article, completely by accident, I came across this Tweet:

Which is a serious exaggeration, and if you click through to read the ensuing thread and quote-tweets, it's one which many people disagree with and have very strong feelings about, for good reason! But it is an exaggeration of something real. Let's be clear: when we say "intrusive thoughts", we mean "intrusive thoughts"; a lot of linguists really want to know what's going on in kids' heads when they acquire language, and Lingthusiasm even sells baby onesies with "Daddy's Little Longitudinal Language Acquisition Project" on them (which I have proudly clad all three of my children in!) but nobody is going around thinking "man, I would totally raise some kids in linguistic isolation for 10 years if it weren't for that pesky Ethics Review Board!" (Or at least, we all hope nobody is thinking that!) It is the sort of thing that you put in a depressing dystopian SF novel (which The Embedding most definitely is! No one makes good decisions, and the ending is typically Cold-War-Era depressing)--or, if you think of it "for real", you immediately feel bad about it and move on to trying to find practical methods of getting the data you want, or work on something else. Unfortunately, we actually do have some data on linguistic deprivation, from studies of rescued feral children and deaf children of hearing adults who do not speak a sign language, and the effects of these "natural experiments" are dire, and a source of ongoing trauma to the Deaf community. So, no, language deprivation does not give you special insight or psychic powers--it just gives you brain damage.

The fact that the main character of The Embedding actually performed a deprivation experiment thus clearly marks him as a villain, and that's only the first of the many unethical things that are done for the sake of "science" and "progress" in this book. And what's more, the particular experiments Watson describes aren't actually testing Universal Grammar (the embedding experiment, realistically, is just training short-term working memory), which removes even the scientific justification! Every character is just straight-up unlikeable. So please, if you are an author--go write something that engages with theoretical linguistics as deeply as The Embedding does, but is more fun to read!

An additional note: The novel claims that "Stone Age children" took "hundreds of generations" to develop language; that's seriously misleading, based on what we know from some natural experiments. We have no idea exactly how long it took our biological endowment for language to evolve, but it seems that biologically-modern human children, in an appropriate social context, will spontaneously generate languages within one generation. This is evidenced by the development of pidgins into creole languages, and the spontaneous generation of new sign languages when new deaf communities are established--see, for example, the case of Nicaraguan Sign Language.


If you liked this post, please consider making a small donation!


Wednesday, March 15, 2023

The Sci-Fi Linguistics of Babel-17

Despite being far from the only, or even the best, novel, novella, or short story about the Sapir-Whorf hypothesis, Samuel R. Delany's Babel-17 (Amazon Affiliate link as usual) is famous as "that novel about the strong Sapir-Whorf hypothesis". But, there is a bit more to it than that.

The opening of the story is highly reminiscent of the much later Story of Your Life by Ted Chiang: a language expert who has previously done work for the military is recruited by a general to decipher some alien communication and tells him that it's impossible without more data:

"Unknown languages have been deciphered without translations, Linear B and Hittite for example. But if I'm going to get further with Babel-17, I'll have to know a great deal more. [...] General, I have to know everything you know about Babel-17; where you got it, when, under what circumstances, anything that might give me a clue to the subject matter. [...] You gave me ten pages of double-spaced typewritten garble with the code name Babel-17 and asked me what it meant. With just that, I can't tell you. With more, I might. It's that simple."

In fact, Rydra Wong is in a much worse position with Babel-17 than historical linguists were with Hittite and Linear B--in each of those cases, although we lacked an equivalent to the Egyptian Rosetta Stone, at least we had the context of history and knowledge of other possible related languages to provide some direction in decoding their texts. Or at least, she would be... if she weren't psychic. Delany neatly sidesteps the entire problem of actually deciphering and learning the language (excusable, because that's not actually the point of the story) by giving Rydra Wong supernatural powers to extract meaning that just isn't actually there. Some biotechnobabble explanation is given for how her ability to read minds works, but it fails to extend to the fact that she is said to have a history of being able to look at an unbroken code and suddenly intuit what it was meant to say--an ability which she also employs to start cracking Babel-17, and which kind of undercuts the otherwise entirely reasonable claim that she needs more data to actually decipher it! She might as well be a D&D character casting Comprehend Languages.

Once Rydra learns Babel-17, we get only minimal descriptions of how it actually works as a language. It appears to be a sort of oligosynthetic speedtalk and taxonomic language, in which the form of every word encodes its definition, a feature which supposedly promotes clearer thinking and deep understanding of everything in the world that it can name. Additionally, it has no word for "I", which is supposed to imply that thinking in Babel-17 prevents someone from acting with self-awareness, with the explanation that

"Butcher, there are certain ideas which have words for them. If you don't know the words, you can't know the ideas."

Which is, well... crap. After all, we coin new words after conceiving of the ideas that we need new words for, so clearly having the words for ideas is not necessary to having the ideas; rarely do we coin new words and then go looking for novel ideas to attach to them! When Rydra talks about language throughout the rest of the book, it's a mixture of reasonable stuff and linguistic technobabble. For example, as a weaker form of the previous statement, Rydra also explains that

"If you have the right words, it saves a lot of time and makes things easier."

which is absolutely true! That's why technical jargons exist. But this gets taken to a ridiculous science-fiction extreme in the description of another alien language: Supposedly, Çiribians can describe the complete schematics of an industrial facility with novel features that they want to duplicate in nine short words, which is... implausible, to say the least. Why would anyone have pre-existing short words to describe previously-unknown technological innovations developed by other aliens?

Then, we have this:

"Mocky, when you learn another tongue, you learn the way another people see the world, the universe."

Also very true! This is one of the many arguments for why documenting and trying to save dying languages is such important work--every time a language dies, the worldview communicated through that language, and the cultural knowledge encoded in that language, dies with it. But then...

"Well, most textbooks say language is a mechanism for expressing thought, Mocky. But language is thought. Thought is information given form. The form is language."

I was a little surprised that Delany-via-Rydra would even provide the hedge of "most textbooks say..." there, because for a long time real-world textbooks would've agreed with Rydra, and this is a commonly-assumed position among linguistically-naïve people. The fact is, many people do experience their own thoughts in the form of language, and are shocked and disbelieving when they discover that not everyone else shares this experience! Yet, such people do exist, despite the existence of a good bit of 20th century academic literature claiming that they can't possibly--literature which I spent a mid-term paper in my grad school Intro to Semantics class tearing to shreds. So, there is a certain type of person who would've read that line in the book, and just like me, immediately thought "Bull! Crap! Rydra!"--but if you are not that sort of person, just take it from me that language is not identical with thought.

But let's get to the actual point: that learning Babel-17 turns a person into an agent of the enemy. There is actually a teeny-tiny kernel of truth underlying this conceit: multilingual people do often tend to develop different personalities when using different languages. This is a multilayered effect--partially, it can probably be attributed to the fact that different languages require that you pay attention to different things. Thus, Russian speakers are, on average, better at distinguishing shades of blue than English speakers, and Guugu Yimithirr speakers are better at absolute orientation than English speakers, because the vocabulary choices and grammatical categories required by their languages require them to pay more attention to those things, and thus develop the skills; and it's not too hard to imagine that shifting aspects of your attention when shifting between languages could have some impact on personality. But a much, much larger component of the effect is simply an extension of the fact that we all have varying presentations of ourselves in different social groups, and languages are strongly associated with the social groups among whom we learned them and with whom we used them, and with the purposes we have in communicating with those groups. Rydra did not learn Babel-17 from "native speakers", in the presence of the enemy, so in reality, there is no particular reason to believe that just learning the language from decoding intercepted communications would have had anywhere near such a drastic effect on thought processes or personality.

So: neat idea, definitely science fiction. However, we can draw a parallel with a slightly more plausible idea from Neal Stephenson's novel Snow Crash. In both works, language is used as an attack vector to allow an enemy to take control of other people's actions. In Snow Crash, there is a language which acts like a programming language to insert instructions into people's brains; in Babel-17, the language itself is the program. (How is Snow Crash's take on this concept more realistic than Babel-17's? Well, you'll just have to wait for me to get around to reviewing Snow Crash to find that out.)

There is, however, another sci-fi linguistic idea in Babel-17 which is completely overlooked in most discussions of the novel: communication from "discorporate" people can't be remembered. Babel-17 is a wild ride through a psychedelic future with all kinds of ridiculous world-building details thrown in that have no direct bearing on the core premise of intergalactic war and Whorfian linguistic weapons, and one of those is the existence of ghosts, and in fact the requirement that some positions on a starship crew be filled by literal ghosts--or, as they are called in the novel, "discorporate people". The integration of discorporate people into the crew is complicated by the fact that living humans cannot remember anything said by a ghost for more than a few seconds, so special machinery is necessary to allow communication between the living and dead crew members--which is really kind of a neat concept all by itself, and I'd love to see that explored as the basis of a story on its own. (Not necessarily communication with ghosts, but just the idea that there is some class of people whose words cannot be remembered. Cf. the Silence from Doctor Who, but in that case nothing about the person can be remembered once you stop perceiving them, not merely their words.) Rydra ends up using her multilingualism to derive an advantage in this regard--while she can't remember the actual words spoken by a ghost, she gets around this by translating ghosts' speech into another language in her head as they are talking. And while she forgets the original words, she can remember the process of translation, and what she translated them into, and thus recall the content of the conversation without the need for assistive machinery.



Tuesday, March 14, 2023

Linguistics & Andy Weir

If you have read one book by Andy Weir, it's probably The Martian (also available in a classroom edition, with less swearing!); or perhaps you have seen the movie, starring Matt Damon!

Unfortunately, there isn't much interesting going on with linguistics or language representation in The Martian. However, Andy Weir has published two other space-adventure hard-SF books: Artemis, set on the Moon, and Project Hail Mary, which goes interstellar. And they actually do some neat stuff which isn't covered by previous books I've reviewed!

The protagonist of Artemis is bilingual in English and Arabic. For the most part, this is just an interesting bit of character-and-world-building background, which ties in to her national origin (not white or American), which in turn ties in to the economy of the titular city of Artemis. The vast majority of the time, she speaks English, and there are a couple of brief bits of dialog that are italicized to indicate, aha, this is not English, she's speaking Arabic now. But, there is one absolutely brilliant line of transliterated, but not translated, Arabic dialog, which occurs when Our Heroine is being bothered by a tourist:

"Ma'alesh, ana ma'aref Englizy," I said with a shrug. [...] Nothing like a language barrier to make people leave you alone.

I do not speak Arabic, but I would bet that means something like "Sorry, I don't speak English."

[Goes to check Google Translate.]

Ah, apparently it means "I suck at English." Close enough! As far as I could tell, that is the only place in which bilingualism actually impacts the plot, and it could easily have been left out, but that is totally a thing that a bilingual person might do, in a very relatable situation! It's like a context clue, but relying on your understanding of the social context being described, rather than the literal context.

Project Hail Mary has a much higher count of interesting linguistic bits, but I can't tell you about them without some spoilers. So, if that's a thing you care about, click that Amazon Affiliate link, buy the book, read it, and then come back here and give me another page view!

Are we good now? Good!

The main character, Ryland Grace, starts out monolingual in English. However, he has to interact with speakers of Russian and Chinese, and an alien, along with text in three of their languages. The only Chinese which is directly represented is the name of the Hail Mary mission commander, Yáo Li-Jie, whose family name "Yáo" is represented as a written character and transliterated in Ryland's dialog and narration. The representation of Russian, on the other hand, is not completely consistent, but spans several representational levels in different circumstances:

Level 0: People are said to be speaking in Russian, but Grace doesn't understand it, so we get no explicit representation.

Level 1: Grace can hear people speaking in Russian, and recognize the sounds, so we get a transliteration of the Russian speech into Latin characters. E.g.:

"Eto Stratt. Chto sluchylos?" she demanded.
"Vzryv v issledovatel'skom tsentre," came the reply.
"The research center blew up," she said.

(Also note the partial diegetic translation, with context that allows the non-Russophone reader to infer what the initial question probably was.)

Level 2: When Grace sees Russian text, that text is represented as-is, in the original orthography, regardless of whether or not Ryland can understand it. E.g.:

The name patch reads ИЛЮХИНА, another name from the crest. This was Ilyukhina's uniform.

In this case, Grace does understand it, because he recognizes his crewmate's name, even if he doesn't speak Russian, and we get a diegetic transliteration. The same thing is done with the character for Yáo's name. And we know that he never actually learned Russian because of another instance of direct orthographic representation:

Five 1-liter bags of clear liquid labelled водка. It's Russian for "vodka". How do I know that? Because I spent months on an aircraft carrier with a bunch of crazy Russian scientists. I saw that word a lot.

Not because he learned to actually read Russian--because he saw that word a lot.
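
The gap between transliterating and understanding is easy to see in code. Here is a minimal sketch of letter-by-letter transliteration; the table is a partial, hypothetical one of my own that covers only the letters in the two examples above, not any official romanization standard.

```python
# Partial, illustrative Cyrillic-to-Latin table: just enough letters for the
# two examples quoted above. Real romanization schemes cover far more.
CYRILLIC_TO_LATIN = {
    "а": "a", "в": "v", "д": "d", "и": "i", "к": "k",
    "л": "l", "н": "n", "о": "o", "х": "kh", "ю": "yu",
}

def transliterate(text):
    """Swap each known Cyrillic letter for Latin letters, keeping case."""
    out = []
    for ch in text:
        # Letters missing from the table pass through unchanged.
        latin = CYRILLIC_TO_LATIN.get(ch.lower(), ch)
        out.append(latin.upper() if ch.isupper() else latin)
    return "".join(out)

print(transliterate("ИЛЮХИНА"))
print(transliterate("водка"))
```

The function maps letters to letters without knowing what any word means, which is roughly the position Grace is in: he can match a shape to a name or a word he has seen before without being able to read Russian.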

There is one example of orthographic representation of Russian in a Russian person's dialog--just a single word--which is where the inconsistency comes in. Ryland wouldn't have understood it (well, maybe he would, just because it sounds really similar to the English word in this case) or known how to write it, so it should've been transliterated for consistency. Unless Andy Weir was just trying to do some fancy thing beyond my understanding with that.

Anyway, the really cool stuff happens once Grace meets an alien, whom he names "Rocky". Rocky is from 40 Eridani A, lives under 28 atmospheres of pressure at over 200 degrees, and "sees" with passive sonar--very reminiscent of the Hot Abyormenites from Hal Clement's Cycle of Fire! (Although the precise mechanism of sound perception and processing between those species is quite different; in that respect, the Eridians remind me more of the Tenebrans from another Hal Clement novel, Close to Critical.) And...

"Fortunately, Rocky speaks with musical chords."

Like the Machi do, or the aliens from The Jupiter Theft by Donald Moffitt. (Huh. Maybe I should review that book some time....) And yeah, that is pretty dang fortunate, because it makes the alien language ridiculously easy to analyze, and to synthesize. I kinda have to assume that that's exactly why Andy Weir decided to design the Eridians that way--Ryland Grace is not a linguist, and while Weir does a remarkably good job of not sweeping first-contact language barriers under the rug, he's made several decisions about how Eridians and their language work that allow skipping a lot of the potential complexity. Donald Moffitt had a slightly different motivation in his work--giving the aliens a musical language allowed him to make it important to the plot that his main character had perfect pitch, which not all humans do, which made that main character specially suited to learn the alien language and, well... be the main character! But, in another parallel between these two works, by the end of the book Weir has Grace using a keyboard to "speak" to Eridians in their native language.

Grace is not stated to have perfect pitch, but he does rely on Rocky speaking in a consistent scale, particularly to have his computer (which does have perfect pitch!) automatically recognize Eridian words. That's not completely unreasonable, but I am quite glad that Weir did not explicitly state that the Eridian language was actually tied to an absolute pitch scale, because, as briefly mentioned in my review of the Machi languages, there are good reasons to think that any naturally-evolved audio communication system for biological beings could not be based on an absolute scale. Additionally, unless I missed something, the simplest syllables that are actually described in the text from Rocky's speech consist of chords of at least two notes, so identifying phonemes by frequency ratios with no fixed scale is a possibility. Unfortunately, we are told two unlikely-seeming things about the nature of Rocky's speech:

  1. Some Eridian words use chords consisting of notes that can be described in terms of named notes on the Western musical scale. That particular pattern of frequencies (or rather, family of patterns of frequencies, depending on which tuning system you use) for making up a scale is not even universal among human cultures, and certainly has no relation to the use of pitch in any human language with phonemic tone or any whistling language, so it kind of defies belief that an alien species would develop a tone-chord phonology that lined up with the modern Western musical scale. I choose to retcon this by saying that Ryland Grace just picked notes that were close enough to the frequency values spit out by his waveform analysis to make things easier to write down.
  2. Rocky is described as transposing his speech by an octave to indicate certain emotional states. It's important that the transposition is exactly one octave, because that makes it easy for Grace to figure out what's going on and fix it when his computer stops recognizing all of Rocky's words. Now, the octave is a very mathematically natural interval... but the idea of octave equivalence isn't actually natural even for humans; it has to be learned, and its importance as a musical concept is also not universal in human cultures. So... why would an alien species develop octave equivalence as a key feature of their natural language?
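
Both points can be sketched in a few lines of Python. Everything here is an assumption of mine for illustration: the chord frequencies are made up, the 440 Hz reference and twelve-tone equal temperament are human conventions, and none of this is how the book says Grace's software works. The sketch shows (a) that identifying a chord by frequency ratios survives transposition, which is what you would want if there is no fixed scale, and (b) how "close enough" Western note names could be assigned after the fact, as in my retcon.

```python
import math

A4 = 440.0  # assumed reference pitch in Hz (a human tuning convention)
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def chord_ratios(freqs):
    """Describe a chord by each note's ratio to its lowest note."""
    base = min(freqs)
    return tuple(round(f / base, 3) for f in sorted(freqs))

def transpose(freqs, octaves):
    """Shift every note by a whole number of octaves (a factor of 2 each)."""
    return [f * 2 ** octaves for f in freqs]

def nearest_note(freq):
    """Return the nearest equal-tempered note name and the error in cents."""
    semitones = 12 * math.log2(freq / A4)
    nearest = round(semitones)
    cents = 100 * (semitones - nearest)
    midi = 69 + nearest  # MIDI note number, where 69 is A4
    return NOTE_NAMES[midi % 12] + str(midi // 12 - 1), cents

chord = [300.0, 450.0]  # hypothetical two-note "syllable"
print(chord_ratios(chord) == chord_ratios(transpose(chord, 1)))
print(nearest_note(450.0))
```

A recognizer keyed on absolute frequencies would stop matching after the transposition, while one keyed on ratios would not notice it at all.
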
A lot of the complication of learning an alien language is avoided by making Rocky (a non-viewpoint character) take on most of the load, rather than Grace. Rocky (if not Eridians in general) apparently has an eidetic memory for sounds, including human speech sounds, and can pick up Grace's English words for things on a single exposure. I have to wonder what implications this might have for the childhood Eridian language acquisition process, and how language works for them in general. The immediate implication, however, is that they quickly get to a point where Grace can just speak English and have Rocky understand him, while Rocky adopts a sort of Eridian-English pidgin in which he speaks Eridian words (not being able to articulate the human speech sounds of English) slotted into an English-like grammar. This has the convenient side-effect of meaning that Weir didn't have to actually construct any Eridian grammar! Although, it does appear that Rocky's native language lacks a distinction between nominative and possessive personal pronouns, based on the fact that his italicized dialog never features possessive pronouns.

This kind of "receptive multilingualism", in which each person speaks their own language while understanding the other, is not a new thing, although I believe this is the first piece of media I have reviewed that uses it. It's notably quite common in Star Wars, where it is used for exactly the same purpose: to portray communication between species who can't pronounce each other's languages, most famously when Han Solo is conversing with Chewbacca, or anyone at all is talking with a beeping R2-series droid. However, receptive multilingualism is also a thing in real life, where it does not occur because of differences in physical articulatory abilities (which are the same for nearly all humans), but either as a side-effect of the simple fact that learning to understand a new language is far easier than learning to speak it, or due to cultural restrictions on who is permitted to use various languages.

While the diegetic purposes are the same, however, the presentation to the audience of receptive multilingualism in Star Wars vs. Project Hail Mary is quite different. In Star Wars, multilingual conversations without a translator are always structured such that the half of the conversation which the audience has access to is enough to infer all of the necessary information from the scene. Weir, however, uses a two-layered approach similar to what he does with Russian and Chinese: any Eridian speech that Grace does not understand is presented as a string of Unicode musical note symbols (e.g., ♪ and ♫)--a conceit which I have seen only once before, in Lorinda J. Taylor's The Termite Queen. There are no appropriate Unicode symbols for chords or staffs, so we have to assume that the actually chosen symbols do not represent anything salient about the actual phonetic content of Rocky's speech, except maybe the total number of chords/syllables, or the relative utterance length. Meanwhile, when Grace understands something that Rocky has said, it is presented as an English translation in italics.

As briefly implied above, during their initial interactions Grace uses a computer to record Rocky's utterances and recognize known utterances later to help him understand what Rocky is saying before he learns to recognize Eridian words himself. Additionally, he uses audio waveform analysis software to extract the component frequencies of each utterance. Computer assistance would almost certainly be essential in documenting and decoding any alien language we might come across, but it's too bad that Grace was not trained as a linguist, or he might have known about all of the software tools that exist for analyzing and documenting human language already, and pulled out Praat for doing spectral analysis of Rocky's speech--it would not be the first time Praat had been used to analyze non-human utterances! (A note on worldbuilding: the starship Hail Mary is supposed to have been loaded with every piece of software available to humanity at launch, just in case, so Praat would definitely have been in there.)
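For a sense of what "extracting the component frequencies" means in practice, here is a toy Python sketch. It is my own illustration and not how Praat or Grace's fictional software works: it synthesizes a three-tone chord and recovers the tones with a naive discrete Fourier transform. The function names, the sample rate, and the test chord are all made up for the example, and the frequencies are deliberately chosen to fall exactly on analysis bins; real recordings would need windowing and proper peak-picking.

```python
import cmath
import math


def synth_chord(freqs_hz, sample_rate=4000, n_samples=400):
    """Synthesize a chord as a sum of equal-amplitude sine waves."""
    return [
        sum(math.sin(2 * math.pi * f * t / sample_rate) for f in freqs_hz)
        for t in range(n_samples)
    ]


def dominant_frequencies(samples, sample_rate, count):
    """Return the `count` strongest frequencies (Hz), lowest first.

    Uses a naive O(n^2) DFT and simply takes the largest-magnitude bins.
    That is only safe here because the test tones sit exactly on bin
    frequencies; with real audio, energy leaks into neighbouring bins.
    """
    n = len(samples)
    bins = []
    for k in range(1, n // 2):
        coeff = sum(
            s * cmath.exp(-2j * math.pi * k * t / n)
            for t, s in enumerate(samples)
        )
        bins.append((abs(coeff), k * sample_rate / n))
    strongest = sorted(bins, reverse=True)[:count]
    return sorted(freq for _, freq in strongest)


if __name__ == "__main__":
    chord = synth_chord([440, 660, 880])
    print(dominant_frequencies(chord, 4000, 3))
```

With 400 samples at 4000 Hz the bins are 10 Hz apart, so this toy version cannot distinguish tones closer together than that. Real tools trade off time and frequency resolution far more carefully.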

There is one instance in which Weir-via-Grace makes an explicit claim about linguistics:
The oldest words in a language are usually the shortest.

Which is... sketchy. Depending on how exactly you interpret it, it might not be false, but it's not particularly useful. For example, old words tend to be common words, and common words tend to be short... but not all common words are old, and not all old words are common. And this topic comes up when Grace is learning Rocky's words for numbers, which brings up the further question of why Grace assumes that numbers would necessarily be old words. However, this statement has absolutely no relevance to the story. Charitably, perhaps it is meant to show that Grace has no linguistic training, and only a folk understanding of linguistic science? But what really comes across is that the author didn't really know what he was talking about, and the book would've been better with that one sentence just cut out.

This does give us a nice segue to talking about Eridian numbers, though. For the most part, the problem of translating between numeric and unit systems, just like the problem of learning a new language, is offloaded to the non-viewpoint character, who is not merely a linguistic savant but also a mathematical savant, able to do unit-of-measure and numeric base conversions instantaneously in his head (er... cephalothorax?). Grace does, however, learn Eridian numbers to decode Eridian clocks, and works out pretty quickly that they have a base-six numeral system. The choice of how to represent Eridian numerals in the text is kind of interesting--much like using musical note symbols to represent Eridian speech (or at least, that Eridian speech is happening), Weir makes use of existing Unicode symbols that are not typically used in English text and which approximate the diegetic forms of the Eridian symbols to show Eridian numerals. That's the closest we come to any representation of Eridian writing, and cleverly avoids needing to include any pictures in the text (aside from the diagram of the ship provided in the front of the book). Now, Rocky has 5 limbs and 15 fingers, so why would the Eridians have a base-6 system? Well, while all of Rocky's limbs are functionally interchangeable, balancing on two legs for a natural pentapod would be unnecessarily tricky--but an Eridian could stand, and possibly walk, on any three limbs at a time, leaving two free to use as arms, with a total of 6 fingers between them. Thus, developing a base-6 numeral system based on counting the six fingers of two Eridian hands would be directly analogous to humans developing base-10 numeral systems based on counting the 10 fingers of two of our hands. Note that the actual logic behind Eridian numerals is not addressed in the story, but this seems like a reasonable reverse-engineering of the author's probable intent.
If Project Hail Mary had instead been written by a human who natively spoke a minority language of Papua New Guinea with a base-27 body-counting system, perhaps the Eridian numeral system would be slightly more opaque.
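
For anyone who can't do base conversions instantaneously in their cephalothorax, here is a small Python sketch of the arithmetic involved. It is just an illustration: the function names are my own, and digits are shown as plain integers because I am not going to guess at which Unicode glyphs the book assigns to which values.

```python
def to_base(n, base=6):
    """Return the digits of a non-negative integer in `base`, most significant first."""
    if n < 0:
        raise ValueError("only non-negative integers are handled")
    if n == 0:
        return [0]
    digits = []
    while n > 0:
        n, remainder = divmod(n, base)
        digits.append(remainder)
    return digits[::-1]


def from_base(digits, base=6):
    """Inverse of to_base: turn a most-significant-first digit list into an integer."""
    value = 0
    for d in digits:
        if not 0 <= d < base:
            raise ValueError(f"digit {d} is not valid in base {base}")
        value = value * base + d
    return value


if __name__ == "__main__":
    print(to_base(15))            # Rocky's 15 fingers, in base six
    print(from_base([1, 0, 0]))   # "100" in base six, back in decimal
    print(to_base(27, base=27))   # one full cycle of a base-27 body-count
```

In base six, Rocky's fifteen fingers come out as the digits 2 and 3 (two sixes plus three), and "100" is thirty-six.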

If you liked this post, please consider making a small donation!


Monday, March 6, 2023

OK, fine, I'll do Arrival

But first, we have to talk about Story of Your Life, Ted Chiang's novella on which the movie Arrival was based.

(As usual, both of those are Amazon Affiliate links.)

For those who don't know yet, Arrival and Story of Your Life are about the arrival of seven-legged aliens (known simply as Heptapods... because they have seven legs) on Earth, and Louise Banks's work to decipher their language so humans can talk with them. Or... well, that's what happens. What they are about is a little more complicated.

(Incidentally, I previously reviewed some of Ted Chiang's other stories.)

The novella doesn't show us any of the alien language, but it says a lot about the structure of the language and about linguistics. For example, we know about Heptapod A (the aliens' audio language) that

"The recording sounded vaguely like that of a wet dog shaking the water out of its fur."

In other words, it doesn't use human speech sounds at all. Which is exactly what we should expect from an alien language, really, even though such depictions are conspicuously missing from most movie and TV depictions of aliens--a fact I complained about already in my review of the Halo TV series. This also highlights an issue with xenolinguistics that rarely if ever comes up when doing fieldwork among humans: We might not be able to distinguish alien phonemes! We might not even physically be capable of hearing the frequency bands that contain distinguishing information for alien phonemes! Even if aliens use sound to communicate, deciphering alien languages is going to require a lot more technological assistance than deciphering unknown languages of our species does.

After hearing this recording, Louise, the protagonist and linguist of the story, tells the General trying to recruit her that

"the only way to learn an unknown language is to interact with a native speaker, and by that I mean asking questions, holding a conversation, that sort of thing."

That is... not strictly true. Lost ancient languages (e.g., ancient Egyptian, via the Rosetta Stone) have been deciphered with no interaction with existing speakers. But, it's close to true--in every case where we have deciphered a lost language, there was some other source of information available that allowed us to connect form with meaning; parallel translations, or identifying relations to other known languages, etc. So, in Louise's situation, dealing with an extraterrestrial language for which no such auxiliary sources could possibly exist, I might well respond to the General in the same way.

(If you want to see what this kind of deciphering-a-language-by-conversation stuff looks like in real life, Dan Everett--the Pirahã guy--has a demonstration on YouTube.)

But, the audio language of the Heptapods, known as Heptapod A, is not the most interesting bit of the story. The narratively-important language is Heptapod B, a non-linear two-dimensional written language with no regular correspondence to their spoken language. Heptapod B is not a developed conlang, although it has inspired conlangs like Alex Fink & Sai's UNLWS, and we do get a lot of aesthetic descriptions of it which one could use to try to create a realization of it:

Logograms are stuck together in a giant conglomeration -- sounds kinda like the 3D language of the Demons in Rosemary Kirstein's The Lost Steersman.

Argument roles are indicated by relative orientation compared to the verb -- this feature shows up in the 2D conlang Pinuyo.

Adverbs (or at least the adverb "clearly") can be expressed by regularly morphing the curve of strokes in a verb glyph, and various other semantic features can be indicated by varying a stroke's curvature, thickness, or manner of undulation; or the relative size, distance, or orientation of radicals --  this makes Heptapod B sound like a "fusional" 2D language, as described by Sai.

The overall impression of large Heptapod B utterances is of "fanciful praying mantids drawn in a cursive style" -- which kinda reminds me of Ouwi.

But lest the complexity and integration described for Heptapod B begin to seem impossible to realize for anything usable by a human... "I had seen a similarly high degree of integration before in calligraphic designs,"

So, theoretically, something which fits the design description of Heptapod B should actually be instantiable, even though nobody has actually managed it yet. (Or at least, not made it public that they have done so.) Just... actually using it would be a major undertaking, just like designing a highly-integrated bit of Arabic calligraphy.

Of course, the science fiction bit is not actually realizable--that being that learning Heptapod B, a language that does not confine the expression of information to a linear format isomorphic to the flow of time, allows one to break out of the perception of time itself as linear, and see one's entire timeline as a whole.

In the novella, this does not grant anyone any special powers. It's just a vehicle for philosophical ponderings on the nature of free will, and the multiple possible formulations of physics from different points of view--linear cause-and-effect, or wholistic principle of least action. Louise's theoretical knowledge of the future does not allow her to make any different choices; i.e., "Those who read the Book of Ages never admit to it." Once you know the future, you must act it out exactly as it was always going to be.

Now, first let it be known that I actually like the film. It's not perfect, but it's pretty good. And I might be slightly biased by the fact that my oldest child took his first steps in the theatre where we were watching Arrival just after it was released... That was the end of going to movies with a baby!

The film, however, is quite different from the book. Their portrayal of the language is... silly. It's quite understandable that they did not in fact fully instantiate a Heptapod B conlang, but "non-linear" writing is realized just as "writing bent into circles-per-sentence", which displays absolutely none of the whole-message graphical integration which is central to the idea of the language in the novella.



Do these look like "fanciful praying mantids" to you? 

The Heptapods themselves are also not entirely text-accurate. In the novella, Heptapod legs are described as tentacles, perhaps with supportive vertebrae inside. In the film they are distinctly jointed, with tentacle-like fingers at the ends. In the film, the Heptapods produce ink directly from their own bodies to write, whereas in the novella Heptapods have screens for displaying writing--or at least, they use a machine in which a tentacle is inserted for control. However:

"I started playing the tape, and watched the web of semagrams being spun out of inky spider's silk."

So, that at least was portrayed pretty well. I have to admit, the swirling ink is a pretty cool visual.

But the more significant changes are to the core theme of the story, and the psychological effect of the Heptapod B language. 

In the novella, humans never enter the alien ship--all communication is by remote viewscreen. There is no attempt to damage the ship, and no humans are ever in danger. There is no politics involved, past Louise convincing the General to give her access in the first place. And the language has no externally observable effects. It lets Louise remember the future, but not tell anyone else about it. It alters psychology in a way that only the experiencer can know. And thus, the duality of points of view remains intact: everything observable, everything that actually happens, can be explained either as a linear sequence of cause and effect, or as a teleological process of optimizing for a known end goal, and neither system will ever disagree with the other. Yes, novella-Heptapod-B exploits a sci-fi Sapir-Whorf effect, but in a subtle way, that doesn't unleash magical powers on the world.

The film throws out all of the interesting philosophy, and just goes with "this language straight-up lets you see the future". And to make that relevant, they have to introduce personal danger, and worldwide political and military turmoil, and explicitly position the language as a tool. In the novella, the Heptapods never explain why they came. They just did. And then they left. Whereas in the film, they have to explain to Louise why they came, so that Louise will realize the power of their tool--they came because they saw the future, and knew that they would need humanity as allies. Thus, they gave humanity their language, and Louise realized that she, too, could use her resultant knowledge of the future to make decisions about the present, just like the movie-Heptapods did. And so Louise's magical abilities brought about by learning an alien language allow her to use information from the future to stop a war. Woohoo.

It's still a good movie, it's one of the few movies that uses linguistics as the science for its science fiction, and it actually does a halfway decent job of portraying a realistic linguist doing actual fieldwork--admittedly in a very weird environment. But not only does it reach for the absolute bottom of the barrel in terms of what you can do with sci-fi linguistics, exploiting the strong Sapir-Whorf hypothesis to give someone magic powers, it does so despite being derived from an original story which is probably the best usage of the Sapir-Whorf hypothesis that's ever been written, and just throwing that away.

So. Go watch Arrival, it's a good movie. But then, go read Story of Your Life. It's so much better.
