Saturday, March 25, 2023

The Sci-Fi Linguistics of The Embedding

The Embedding, by Ian Watson, is... not good. It tied for the Campbell Award in 1974 and won the Nebula Award for Best Novel in 1975, and has been called a "modern classic", but, much like the Hugonauts' review of Dune*, while I recognize that it has some great ideas, I just don't think they're actually executed all that well. Now, young people don't always like old SF, but I've read and liked a lot of old SF, and this one just really doesn't hold up on the strength of writing or the story. My feelings pretty much mirror those expressed in this review from 2006--its understanding of linguistic theory is slightly confused, and the multiple plot threads are poorly integrated and redundant, which is a great disappointment given the novel's stellar reputation. I can only assume it has that reputation because it blew everyone's minds by actually engaging with theoretical linguistics at all back in 1974, and nobody's seriously re-evaluated it since then.  But it is the most linguisticky linguistic fiction ever, and explores ideas that have not been done better since, despite the low bar! So, let's talk about those ideas.

*Go subscribe to the Hugonauts! They deserve more listeners.

The introduction to the Gollancz SF Masterworks edition says that "There are two ideas in linguistics that have had a particular influence on twentieth-century science fiction": the Sapir-Whorf hypothesis, and Universal Grammar. The Sapir-Whorf hypothesis--the idea that the language you speak can influence or control how you think--is low-hanging fruit, and there's tons of SF, some of which I have previously reviewed, that plays with that.

The core idea of Universal Grammar is that humans come pre-wired with an understanding of how language should work; that our brains are built with a standard template for grammar with a multitude of switches that different languages merely set in different ways. This idea originates with Noam Chomsky, who famously claimed that Martians studying Earth would conclude that we all spoke mere dialects of one humanese--although quite a lot of advancement has been made in linguistics since that time, and Chomsky himself no longer holds that extreme view. If you do run with that extreme view, however, it lends itself to the trope of Incomprehensible Aliens--if we are hardwired to Do Language in a particular way, and they are hardwired to Do Language in a different way, then presumably we could never learn to understand each other's languages and communication would be forever impossible.

The idea of built-in, innate grammar arises from the "Poverty of the Stimulus" argument, which basically goes that human children aren't exposed to a large enough sample of language to learn how it works from first principles in the time it actually takes for children to acquire their first language. Any language whose rules didn't conform to that built-in template, whatever it is, could then not be learned by children. This, of course, depends on the assumption that the stimulus is actually impoverished--that there really isn't enough information in the ambient linguistic data to which children are exposed during their lives to calculate the correct rules of a human language--and that assumption is not without controversy.

It is clear that humans must have some innate capacity for language--after all, something makes the difference between a human baby who learns to understand and speak English in only a few years, versus, say, a kitten who grows up in the same house and maybe learns to recognize a few individual words. Whatever that is is called "the biological endowment", and the idea that that could vary between linguistically-capable species is unexplored here, or in any other published story that I am aware of. But exactly how extensive our innate knowledge specific to language is, is still an active area of research, and the idea of an extensive Universal human Grammar can be attacked from two directions:

  1. Showing that, for some particular linguistic feature, the stimulus is not actually impoverished--that children are exposed to enough of the right kinds of examples to just "figure it out".
  2. Showing that we have some innate cognitive biases relevant to linguistic learning, but that they are not specific to linguistic learning--thus, other linguistically-capable species may well exhibit exactly the same linguistic biases, because of developing the same general reasoning capabilities.

Somewhat confusingly, some people use "Universal Grammar" to refer to any innate knowledge relevant to language, not just that which is specific to human language, and only evidence of type 1 is relevant to disproving that kind of Universal Grammar. But given the particular feature that Ian Watson chose to focus on (the eponymous "embedding"), and how interactions with aliens are portrayed in the book (they are fascinated by exactly the same constraint), I have to assume that that was the understanding that Watson had of the term "Universal Grammar"--that it was not merely universal to humans, but cosmologically "universal", based on principles that would be reliably replicated in any intelligent mind.

In some ways, Watson's choice of feature to focus on is a clever one; center embedding is an easy concept to explain to readers who are otherwise lacking in theoretical linguistic education, and he does just that in conversations between characters. For those who do not wish to go read the novel looking for the definition, center embedding is just taking a particular grammatical structure--like a relative clause--and sticking it in the middle of another structure of the same type, rather than at one end or the other. For example, take the sentence "This is the malt that the rat ate."--it's got a relative clause in it. We can self-embed another relative clause at the edge like this: "This is the malt that was eaten by the rat that was worried by the cat." Or, we can center-embed that relative clause--stick it in the middle of the first relative clause, breaking that up--like this: "This is the malt that the rat that the cat worried ate." That's harder to understand, but it's the sort of thing that might be said from time to time. But what if we add a third clause? "This is the malt that the rat that the cat that the dog chased worried ate." That's... really hard to interpret, and people just don't speak that way! And if you add one more level... no, that'll never happen!
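For the programmers in the audience, the difference between the two kinds of embedding can be sketched in a few lines of Python. This is only a toy string-builder for the malt/rat/cat sentences above, not a grammar of English; the function names `center_embed` and `edge_embed` and the `(noun, verb)` pair format are my own inventions for illustration.

```python
def center_embed(pairs, head="This is the malt"):
    """Build a center-embedded sentence.

    `pairs` is a list of (noun, verb) tuples ordered from the outermost
    relative clause to the innermost. All the subjects come first, and the
    verbs then arrive in the reverse order, innermost clause first.
    """
    subjects = " ".join(f"that the {noun}" for noun, _ in pairs)
    verbs = " ".join(verb for _, verb in reversed(pairs))
    return f"{head} {subjects} {verbs}."


def edge_embed(pairs, head="This is the malt"):
    """Build an edge-embedded sentence from (noun, participle) tuples.

    Each passive relative clause is finished before the next one starts.
    """
    clauses = " ".join(
        f"that was {participle} by the {noun}" for noun, participle in pairs
    )
    return f"{head} {clauses}."


if __name__ == "__main__":
    clauses = [("rat", "ate"), ("cat", "worried"), ("dog", "chased")]
    for depth in range(1, len(clauses) + 1):
        print(center_embed(clauses[:depth]))
    print(edge_embed([("rat", "eaten"), ("cat", "worried")]))
```

Run as a script, this prints the one-, two-, and three-clause center-embedded sentences quoted above, followed by the edge-embedded one. Notice that nothing in `center_embed` stops you from handing it a fourth or fifth pair; the rule itself has no depth limit.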

However, despite being a clever choice of linguistic phenomenon, it's not actually a test of Universal Grammar, as Chomsky intended the term! In fact, this gets at a completely different bit of Chomskyan linguistics, which Watson completely ignores: the distinction between competence and performance. "Competence" is what you know about the rules of language, and your ability to judge things as grammatical or ungrammatical. It is competence that allows us to say, yeah, mechanically, we could add a fourth embedded clause to that horrible incomprehensible sentence, and it wouldn't violate any grammatical rules. We are capable of learning the rules that would let us do that. Competence is what lets us look at the famous sentence "Colorless green ideas sleep furiously," and say "yeah, it's grammatical, but...." Meanwhile, performance is the fact that we sometimes make mistakes that we know are mistakes, and that we can rate things as acceptable or unacceptable, because they do or don't make sense or because they are easy or hard to interpret, independent of whether or not they are grammatical. The limitations on center embedding in English aren't grammatical, and this tells us nothing about the rules of Universal Grammar--they are just a consequence of the fact that humans have limited short-term working memory, so we lose track of the first halves of multiply-embedded structures before we get to the end! And in fact, you can prove that there is no hard grammatical limit on embedding depth by observing that equally-embedded structures can be more or less acceptable depending on which precise nouns, pronouns, and adjectives you happen to use; compare, for example: "The rat which the cat which the dog chased bit fell." vs. "The elegant woman whom the man that I love met moved to Barcelona."
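The working-memory point can be illustrated with a deliberately crude Python sketch. It assumes a sentence has already been reduced to a sequence of tags, "N" for a subject noun that is still waiting for its verb and "V" for the verb that resolves the most recent waiting subject. That tagging scheme and the function name `max_pending_subjects` are hypothetical simplifications of mine, not a real parsing model or anything from the novel.

```python
def max_pending_subjects(tags):
    """Return the largest number of subjects held open at any one time.

    "N" opens a clause (a subject waiting for its verb); "V" closes the
    most recently opened clause. The peak count is a rough stand-in for
    how much a listener has to keep in short-term memory.
    """
    pending = 0
    peak = 0
    for tag in tags:
        if tag == "N":
            pending += 1
            peak = max(peak, pending)
        elif tag == "V":
            if pending == 0:
                raise ValueError("verb with no pending subject")
            pending -= 1
        else:
            raise ValueError(f"unknown tag: {tag!r}")
    return peak


# "...the rat that the cat that the dog chased worried ate"
center_embedded = ["N", "N", "N", "V", "V", "V"]
# Three clauses, each finished before the next begins
edge_embedded = ["N", "V", "N", "V", "N", "V"]

print(max_pending_subjects(center_embedded))  # 3
print(max_pending_subjects(edge_embedded))    # 1
```

Both sequences contain three clauses and both are accepted by the same rule, but the center-embedded one forces three subjects to be held open at once while the edge-embedded one never needs more than one. A counter like this obviously cannot explain why the Barcelona sentence is easier than the rat sentence at the same depth; that depends on the particular words, which is exactly why the limit looks like performance rather than grammar.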

Each of the three major plot threads in the novel has its own mini-linguistic-ideas as well. The aliens engage with the Sapir-Whorf hypothesis in thinking that learning more languages--and specifically, learning a heavily-center-embedding language--will allow them to achieve new metaphysical abilities. The Amazonian natives use drugs to expand their linguistic competence, which could be interpreted as a precursor to the drugs used by Sheila Finch's Guild of Xenolinguists. And the opening thread of the book deals with conducting The Forbidden Experiment--isolating children from natural adult language to see what happens, or what you can make happen, in order to explore the boundaries of the biological endowment and of any Universal Grammar that we might have. As you can see from that Wikipedia link, The Embedding was not the first to explore this idea--it shows up in books, comics, and even The Twilight Zone. And if you believe in the strong Sapir-Whorf hypothesis, or linguistic determinism, it can be quite a compelling idea--raising children without exposure to any existing language would release them from the limitations of those languages, would it not? But... no. That's not actually what happens at all. It is very easy when thinking about problems in language acquisition and Universal Grammar to start thinking, "ugh, if only we could run controlled experiments on the acquisition process, we could answer so many questions so much more easily!" In fact, while working on this article, completely by accident, I came across this Tweet:

Which is a serious exaggeration, and if you click through to read the ensuing thread and quote-tweets, it's one which many people disagree with and have very strong feelings about, for good reason! But it is an exaggeration of something real. Let's be clear: when we say "intrusive thoughts", we mean "intrusive thoughts"; a lot of linguists really want to know what's going on in kids' heads when they acquire language, and Lingthusiasm even sells baby onesies with "Daddy's Little Longitudinal Language Acquisition Project" on them (which I have proudly clad all three of my children in!) but nobody is going around thinking "man, I would totally raise some kids in linguistic isolation for 10 years if it weren't for that pesky Ethics Review Board!" (Or at least, we all hope nobody is thinking that!) It is the sort of thing that you put in a depressing dystopian SF novel (which The Embedding most definitely is! No one makes good decisions, and the ending is typically Cold-War-Era depressing)--or, if you think of it "for real", you immediately feel bad about it and move on to trying to find practical methods of getting the data you want, or work on something else. Unfortunately, we actually do have some data on linguistic deprivation, from studies of rescued feral children and deaf children of hearing adults who do not speak a sign language, and the effects of these "natural experiments" are dire, and a source of ongoing trauma to the Deaf community. So, no, language deprivation does not give you special insight or psychic powers--it just gives you brain damage.

The fact that the main character of The Embedding actually performed a deprivation experiment thus clearly marks him as a villain, and that's only the first of the many unethical things that are done for the sake of "science" and "progress" in this book. And what's more, the particular experiments Watson describes aren't actually testing Universal Grammar (the embedding experiment, realistically, is just training short-term working memory), which removes even the scientific justification! Every character is just straight-up unlikeable. So please, if you are an author--go write something that engages with theoretical linguistics as deeply as The Embedding does, but is more fun to read!

An additional note: The novel claims that "Stone Age children" took "hundreds of generations" to develop language; that's seriously misleading, based on what we know from some natural experiments. We have no idea exactly how long it took our biological endowment for language to evolve, but it seems that biologically-modern human children, in an appropriate social context, will spontaneously generate languages within one generation. This is evidenced by the development of pidgins into creole languages, and the spontaneous generation of new sign languages when new deaf communities are established--see, for example, the case of Nicaraguan Sign Language.


If you liked this post, please consider making a small donation!

