Gliese 1337: linguistics

Monday, May 26, 2025

Tlön, Uqbar, Orbis Tertius

Tlön, Uqbar, Orbis Tertius is a 1940 short story by Jorge Luis Borges, translated into English in 1961 and appearing in the collection Labyrinths.

In modern terms, it describes the discovery of a secret society of sci-fi engaged in a multi-generational worldbuilding project--creating an encyclopedia of the world of Tlön. This is essentially the "explain a film plot badly" summary, but a proper summary is very philosophical and literary, and if you want that you can go read the Wikipedia page or something. I'm just here for the linguistic references!

There are no nouns in Tlön's conjectural Ursprache, from which the "present" languages and the dialects are derived: there are impersonal verbs, modified by monosyllabic suffixes (or prefixes) with an adverbial value. For example: there is no word corresponding to the word "moon," but there is a verb which in English would be "to moon" or "to moonate." "The moon rose above the river" is hlör u fang axaxaxas mlö, or literally: "upward behind the onstreaming it mooned."

This isn't actually particularly odd. Many indigenous North American languages--especially the Salish family--are famous for having a heavy preference for verbs, and deriving nouns from relative clauses or participle-like constructions. Native American languages are not particularly famous for monosyllabic roots, but there's no particular reason those features should not be combined. Such a language could easily turn up in nature, and I would not be the slightest bit surprised if someone discovered a language exactly like that somewhere in Papua New Guinea!

Borges clearly did not generate a complete Tlönian conlang for this short work, but there is some translated Tlönian there which we might as well try to analyze, just for fun. I don't know what the original text looked like, but at least in English translation, there is not a one-to-one correspondence between Tlönian words and English words, which is a plus. Many times, translations between arbitrary languages will end up with the same number of words just by chance, but by not matching up in number, we know that the words cannot match up one-to-one in sense, which gives us an excuse to think up different ways that information could be organized in the Tlönian sentence. Obviously, with so little data, it's impossible to settle on one correct answer, but I like to think that Tlönian uses something like a relational noun construction (where the "noun" is of course actually a verb), and has reduplication for extended actions, leading to hlör = upward; u fang = at what-is-behind; axaxaxas = multiply-reduplicated form of ax, "to flow", maybe with an adverbial suffix as for "onward", "towards a goal"; and mlö "(it) is-the-moon". Incidentally, "Axaxaxas mlö" is also the title of one of the books mentioned in Borges's more famous story The Library of Babel.

We get one other word of Tlönian, though, which seems very much like a noun: hrönir (singular hrön), referring to duplicate instances of things which are lost once and then found multiple times. Maybe it's actually a verb meaning "to be found multiple times" and -ir isn't so much a plural as a pluractional or something. About other parts of Tlön, we are told that

In [languages] of the northern hemisphere [..] the prime unit is not the verb, but the monosyllabic adjective. The noun is formed by an accumulation of adjectives. They do not say "moon," but rather "round airy-light on dark" or "pale-orange-of-the-sky" or any other such combination. In the example selected the mass of adjectives refers to a real object, but this is purely fortuitous.

This, too, is actually not so strange after all. Per Topics in Warlpiri Grammar by David Nash, Warlpiri does not formally distinguish nouns from adjectives, and can string them together in any order to pinpoint a more precise concept which is the intersection of all the provided descriptors. What would be strange is if, rather than merely focusing on "adjectives", northern Tlönian in fact only had words for semantic attributes and not entities; but Borges himself seems to have had a hard time conceiving of that, given that he includes "sky" in the provided glosses. The philosophy of not having fixed words for specific objects, but just using contextually-relevant descriptions as needed, regardless of whether we think of any of those descriptors as "adjectives" or "nouns", is, however, strongly reminiscent to me of the communicative philosophy of Toki Pona. So even if this is a less naturalistic vision of language than the first version of Tlönian represents, it has at least been shown to be eminently workable by a conlang community arising some 61 years after Borges posed this idea.

If you liked this post, please consider making a small donation!

The Linguistically Interesting Media Index

Tuesday, March 25, 2025

Some Thoughts on Iljena

Iljena is an alien conlang by Pete Bleackley, also the author of Khangaþyagon, which I reviewed previously.

The key conceit of Iljena is that all words encode both a nominal root and a verbal root--and based on both the grammar notes and the dictionary, there are no other parts of speech. All verbs are monovalent, and you construct large propositions by chaining together noun-verbs that describe what each participant is doing. It's sort of like the disambiguation strategy sometimes employed in natlangs where a transitive clause that lacks distinctive subject and object marking (like two neuter nouns in a positive-polarity Russian sentence, or nouns with equal animacy in a direct-inverse language) can be split in two with an antipassive clause and a passive clause--i.e., instead of "Bob saw Bill", "Bob saw, Bill was seen". Except that Iljena doesn't have a passive construction, it just has enough different verb roots to cover all the necessary meanings, whether one is an agent or patient or instrument or whatever in any particular scene.

With the lack of any other parts of speech, however, it is unclear how boundaries between clausal constituents are determined, how attachment ambiguities might be resolved, or how references to events-as-things are made, and the only ordering constraint is that

Word order is used to convey the flow of the action between the participants, and to bring together closely related participants.

However, David Gil has shown us that you don't really need to formalize all that grammatical machinery all of the time, and the corpus of Conlang Relay texts in Iljena, which have been translated reasonably faithfully by following relay participants, demonstrates that it does work well enough. Pete's own documentation notes that Iljena could be considered a "verbless" language, based on the idea that verb roots could instead be interpreted as noun cases (which is one of the possible solutions to verblessness I discussed in my own article How To Not Verb), but he (and the fictional Leyen people who speak Iljena) prefer to think of the relevant open lexical class as verb roots, rather than case morphology--and I tend to agree. The complete lack of function words makes Iljena a decidedly non-human language, but that's fine--it's not supposed to be!

As noted, Iljena does seem to work just fine as it is, so I won't presume to suggest improvements--but I think it would be neat to see a language that takes the one verb--one noun approach and embeds it in a larger system of grammatical function words for eliminating structural ambiguities. And it would also be neat to see some more detailed analyses of the existing corpus texts, beyond simple interlinear glosses, that might be able to extract more empirical rules about Iljena grammatical structure.

Some Thoughts... Index

Sunday, October 20, 2024

A Brief Note on John Wick

The actual Russian dialog in the John Wick movies is, uh... not great? But, the fact that John Wick is diegetically fluent in Russian ends up kicking off the plot of the first movie, when Russian gangster Iosef tries to buy John's car. Iosef asks how much, John says it ain't for sale, then, from the script:

                                              IOSEF
                         (in Russian, subtitled)
                     Everything's got a f[*****]g price.
                         
                                              JOHN
                         (in Russian, subtitled)
                     Maybe so... but I don't.

          Taken aback by John's fluency, he watches as John enters the
          vehicle, guns the engine, and drives off.

(Censored for sensitive eyes.)

However, that's not actually how it was filmed! The Russian dialog for that scene in the movie is as follows (or at least, my interpretation of it; the pronunciations are bad):

                                              IOSEF
                     У всего, сука, своя цена.
                         
                                              JOHN
                     А у этой суки нету.

This is closed-captioned as

                                              IOSEF
                     Everything's got a price, b[***]h.
                         
                                              JOHN
                     Not this b[***]h.

Which is not word-for-word, but essentially accurate. Given that Iosef did not expect John to understand him, we have to assume that his switch into Russian was expressing frustration to himself, even though it contains a vocative, clearly addressing the sentiment to John. Possibly, he was going to switch back into English to attempt another pitch, after reminding himself that everything has a price. And if that's what had happened, then this insertion of Russian dialog would've been just a bit of implicit character exposition, with a bit of an Easter Egg for a Russophone audience. But John responding at all suddenly changes the dynamic. That's also an implicit character exposition moment--we learn that John, despite being American, speaks Russian for some reason, which is further explicated later on. But in the scene, Iosef realizes that John must have understood him, and knows that Iosef was insulting him! That turns the outcome of the interaction into a face-threatening issue. Now, in addition to still wanting the car which John has denied him, Iosef has to back up the implied threat of his insult to save face.

The change in dialog from the script also adds a layer of double meaning, because John has his (female) dog with him in the car. Thus, Iosef could be interpreted as insulting the dog (which--spoiler alert--he later kills), which John has a strong emotional attachment to. (It turns out the Russian word for "female dog" has exactly the same insulting double-meaning that it does in English!) Out of context, John's reply could even be interpreted as claiming that his dog is not for sale, as opposed to his car--and both interpretations are true! The same cannot be said about Iosef's statement, but the oblique association is a nice addition to the scene as filmed.

Tuesday, March 19, 2024

Human Actors Shouldn't Be Able to Speak Alien Languages

Isn't it a little weird that humans can speak Na'vi? Or that aliens can learn to speak English? Or, heck, Klingon! The Klingon language is weird, but every single sound is used in human languages.

Of course, there's an obvious non-diegetic reason for that. The aliens are played by human actors. Actors wanna act. Directors want actors to act. It's less fun if all of your dialog is synthesized by the sound department. But while it is an understandable and accepted trope, we shouldn't mistake it for representing a plausible reality.

First, aliens might not even use sound to communicate! Sound is a very good medium for communication--most macroscopic animals on Earth make use of it to some extent. But there are other options: electricity, signs, touch, light, color and patterning, chemicals. Obviously, a human actor will not, without assistance, be able to pronounce a language encoded in changing patterns of chromatophores in skin, nor would a creature that spoke that language have much hope of replicating human speech. But since sound is a good and common medium of communication, let's just consider aliens that do encode language in sound.

The argument was recently presented to me that aliens should be able to speak human languages, and vice-versa, due to convergent evolution. An intelligent tool-using species must have certain physical characteristics to gain intelligence and use tools, therefore... I, for one, don't buy the argument that this means humanoid aliens are likely to start with, but supposing we do: does being humanoid in shape imply having a human-like vocal tract, or a vocal tract capable of making human-like noises? I propose that it does not. For one thing, even our closest relatives, the various great apes, cannot reproduce our sounds, and we can only do poor approximations of theirs. Their mouths are different shapes, their throats are different shapes, they have different resonances and constriction points. We have attempted to teach apes sign languages not just because they lack the neurological control to produce the variety of speech sounds that we do, but also because the sounds they can produce aren't the right ones anyway. Other, less-closely-related animals have even more different vocal tracts, and there is no particular reason to think they would converge on a human-like sound-producing apparatus if any of them evolved to be more externally human-like. We can safely assume that creatures from an entirely different planet would be even less similar to us in fine anatomic detail. So, Jake Sully should not be able to speak Na'vi in his human body, and should not be able to speak English in his avatar body--yet we see Na'vi speaking English and humans speaking Na'vi all the time in those movies.

And that's just considering creatures that make sounds in essentially the same way that we do: by using the lungs to force air through vibrating and resonant structures connected with the mouth and nose. Not all creatures that produce sound do so with their breath, and not all creatures that produce sound with their breath breathe through structures in their heads! Intriguingly, cetaceans and aliens from 40 Eridani produce sound by moving air through vibrating structures between internal reservoirs, rather than while inhaling or exhaling--they're using air moving through structures in their heads, but not breath!

Hissing cockroaches make noise by expelling air from their spiracles. Arguably, this should be the basis for Na'vi speech as well: nearly all of the other animals on Pandora breathe through holes in their chests, with no obvious connection between the mouth and lungs. They also generally have six limbs and multiple sets of eyes. Wouldn't it have been cooler to see humanoid aliens with those features, and a language to match? But, no; James Cameron inserted a brief shot of a monkey-like creature with partially-fused limbs, no opercula, and a single set of eyes to provide a half-way-there justification for the evolution of Na'vi people who are just like humans, actually.

Many animals produce sound by stridulation. No airflow required. Cicadas use a different mechanism to produce their extremely loud songs: they have structures called tymbals which are crossed by stiff ribs; flexing muscles attached to the tymbals causes the ribs to pop, and the rest of the structure to vibrate. It's essentially the same mechanism that makes sound when you stretch or compress a bendy straw (or, as Wikipedia calls them, straws with "an adjustable-angle bellows segment"). This sound is amplified and adjusted by passage through resonant chambers in the insects' abdomens. Some animals use percussion on the ground to produce sounds for communication. Any of these mechanisms could be recruited by a highly intelligent species as a means of producing language, without demanding any deviation from an essentially-humanoid body plan.

There is, of course, one significant exception: birds have a much more flexible sound-production apparatus than mammals, and some of them are capable of reproducing human-like sounds, even though they do it by a completely different mechanism (but it does still involve expelling air from the lungs through the mouth and nose!) Lyrebirds in particular seem to have the physiological capacity to mimic just about anything... but the extent to which they choose to imitate unnatural or human sounds is limited. Parrots and corvids are known to specifically imitate human speech, but they do so with a distinct accent; their words are recognizable, but they do not sound like humans. And amongst themselves, they do not make use of those sounds. Conversely, intraspecific communication among birds tends to make use of much simpler sound patterns, many of which humans can imitate, about as well as birds can imitate us, by whistling. So, sure, some aliens may be able to replicate human speech--but they should have an accent, and if their sound production systems are sufficiently flexible to produce our sounds by different means, there is no reason they should choose to restrict themselves to human-usable sounds in their own languages. Similarly, humans may be able to reproduce some alien languages, but they will not sound like human languages--and when's the last time you heard a human actor in alien makeup whistling? (Despite the fact that this is a legitimate form of human communication as well!)

The most flexible vocal apparatus of all would be something that mimics the action of an electronic speaker: directly moving a membrane through muscular action to reproduce any arbitrary waveform. As just discussed, birds come pretty close to capturing this ability, but they aren't quite there. There are a few animals that produce noise whose waveform is directly controlled by muscular oscillation which controls a membrane, but they are very small: consider bees and mosquitoes, whose buzzing is the result of their rapid wing motions (or, in the case of bumblebees, muscular vibrations of the thorax). Hummingbirds are much bigger than those insects, and they can actually beat their wings fast enough to create audible buzzing sounds (hence, I assume, the name "humming"bird), but they are still pretty small animals. And despite these examples of muscle-driven buzzing, it seems rather unlikely that a biological entity--or at least, one which works at all similarly to us--could have the muscular response speed and neurological control capabilities to replicate the complex waveforms of human speech through that kind of mechanism. But if they did (say, like the Tines from Vernor Vinge's A Fire Upon the Deep), just like parrots and crows, why would their native communication systems happen to use any sounds that were natural for humans?

Now, some people might argue with my assertion that "any of these mechanisms could be recruited... as a means of producing language". That doesn't really impinge on my more basic point that an alien language should not reasonably be expected to be compatible with the human vocal apparatus, but let's go ahead and back up the assertion anyway. Suppose a certain creature's sound-production apparatus isn't even flexible enough to reproduce the kinds of distinctions humans use in whistled speech, based on modulating pitch and amplitude (which cicadas certainly can). Suppose, in fact, that it can produce only four distinct sounds. That should be doable by anybody that can produce sound at all--heck, there are more than 4 ways of clapping your hands. With 2 consecutive sounds, you can produce 16 distinct words. If you allow 3 as well, it goes up to 80 words. With words of 2 to 4 sounds, you've got 336 possible words. So far, that doesn't sound like very much. But then, there are 1360 possible words of 2 to 5 sounds, and 5456 of 2 to 6 sounds. Allowing up to 7 sounds, you get 21,840 possible words--comparable to the average vocabulary of an adult English speaker. (Counting single-sound words too would add just 4 to each of those totals.) The average length of English words is a little less than 5 letters, and we frequently (10 letters) use words that are longer than 7 letters, so needing to go up to 7 to fit your entire adult vocabulary isn't too bad. And that's before we even consider the ability to use homophones to compress the number of distinct words needed! So: we might argue about exactly how many words are needed for a fully-functional language with equivalent expressive power to anything humans use, but through the power of combinatorics, even small numbers of basic phonetic segments can produce huge numbers of possible words--indisputably more than any number we might come up with as a minimum requirement. A language with only four sounds might be difficult for humans to use, as it would seem repetitive and difficult to segment... but we're talking about aliens here. If 4 sounds is all their bodies have to work with, their brains would simply specialize to efficiently process those specific types of speech sounds, just as our brains specialize for our speech sounds.
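
If you want to check the arithmetic yourself, here is a minimal sketch in Python. The function name count_words and its min_len parameter are my own illustrative choices, not anything standard. The totals quoted above count words of two or more sounds; setting min_len to 1 includes the four single-sound words as well.

```python
def count_words(num_sounds: int, max_len: int, min_len: int = 1) -> int:
    """Count the distinct sound sequences whose length is between
    min_len and max_len (inclusive), given num_sounds basic sounds.

    There are num_sounds ** n sequences of exactly n sounds, so the
    total is just the sum of those powers over the allowed lengths.
    """
    return sum(num_sounds ** n for n in range(min_len, max_len + 1))


if __name__ == "__main__":
    # Tabulate the four-sound case for maximum word lengths 2 through 7.
    print("max length | words of 2+ sounds | words of 1+ sounds")
    for max_len in range(2, 8):
        two_plus = count_words(4, max_len, min_len=2)
        one_plus = count_words(4, max_len)
        print(f"{max_len:>10} | {two_plus:>18,} | {one_plus:>18,}")
```

Swapping in a different value for num_sounds shows how fast this grows: even one extra basic sound makes a large difference at a maximum length of 7.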

Now, to be clear, this is not intended to disparage any conlanger who's making a language for aliens and using human-compatible IPA sounds to do so. It's an established trope! And even if it's not ever used in a film or audio drama, it can be fun. There are plenty of awesome, beautiful examples of conlangs of this type, and there's no inherent problem with making more if that's what you want to do. Y'all do what you want. But we should not mistake adherence to the trope for real-world plausibility! And it would be great to see more Truly Alien Languages out there.

Sunday, February 25, 2024

Review: "Reading Fictional Languages"

I'm going meta! I'm reviewing people who are reviewing people who use conlangs in fiction!

Reading Fictional Languages (that's an Amazon Affiliate link, but you can also get it directly from Edinburgh University Press) is a collection of articles that follows up on the presentations given at the eponymous Reading Fictional Languages conference, which brings together both creators and scholars of constructed languages used in fictional works. I was provided with a free review copy as a PDF, but not until after I had bought my own hardcover anyway.

The first thing to note is that the title is kind of poorly chosen. It is telling that articles by conlangers refer to their subject as "constructed languages" or "conlangs", while articles by literary scholars refer to their subject as "fictional languages". Based on personal communication with some of the contributors, it seems that the organizers of the conference on which this volume was based (for which I did submit an abstract myself, but it was not accepted) were unaware of the modern conlanging community and were taken somewhat by surprise when actual language creators showed up to talk about their work! And they had thus developed their own analytical terminology ahead of time in isolation from conlanging practitioners.

Chapter 1, the introduction, contrasts "real" languages with languages which are "imagined for an equally fictional community of users, where the environment is being imagined at the same time as the language is being constructed". However, that misses out on a very important distinction in the types of non-natural languages that are actually used in fictional works: those that do not exist as usable languages in the real world, and those that do. I.e., those which actually are fictional, and those which are real, despite being artificially constructed.

Skipping to page 77, Chapter 6, "Design intentions and actual perception of fictional languages: Quenya, Sindarin, and Na’vi", by Bettina Beinhoff, specifies that "fictional languages" are a subset of "constructed languages", being languages constructed for use in fictional works. That's sensible, but when talking about Quenya, Sindarin, and Na'vi in particular--all languages which have been heavily developed and actively used by communities outside of their fictional contexts--it really highlights the inadequacy of this academic terminology.

We also get an explanation of the "Reading" part of the title--in short, it's about the reader's interaction with a text, and how the use of invented languages influences the creative process and the reading experience. Apart from defining terminology, however, Chapter 1 does provide a decent overview of the history of invented languages in fiction and of the contents of the rest of the book.

Chapter 2, by David Peterson and Jessie Sams (who has since become Jessie Peterson), explores the nature of working with television and film makers as a language creator. I couldn't possibly do this justice in summary; David and Jessie probably have more experience with film and TV language construction than everyone else in the industry combined, and they certainly know what they're talking about! One complication of working in Hollywood, however, is not unique to working in Hollywood:

A script writer often won’t have heard of language creation and will have no sympathy for someone whose role they don’t understand commenting that the line of dialogue they want to be cut mid-word won’t work in translation because the verb in the conlang comes at the end of the sentence and won’t have been uttered yet if cut off after three words

That's basically the lament of every translator ever! Especially the ones that have to translate dialog for foreign-language editions of novels, movies, and TV shows.

Just from having been active in the conlanging community for a good long time, there was a lot in this chapter that I already knew, even though I could not have articulated it as well as David and Jessie do. But the biggest insight I gained came in an explanation of how the form of a constructed language is constrained by the needs of a film production--and not just in the sense that actors need to be able to use it. Additionally, the language creator needs to be able to translate rapidly, which means they need to construct a language that is easy for them to use without too much practice. I have long thought that Davidsonian languages all seem to have a common sort of character about them, which is partially attributable to David's construction process--but now I can see there's a darn good reason for it, and I can't actually blame him! That's just more reason to work towards getting a greater diversity of language creators into the film industry, so that we can start to see a greater diversity of languages reflecting differences in what is easy for individual creators to use in service of the needs of a film production.

I found Chapter 3 "On the inner workings of language creation: using conlangs to drive reader engagement in fictional worlds", by BenJamin Johnson, Anthony Gutierrez, and Nicolás Matías Campi, to be the most immediately useful to me, and probably to most of the people who read my blog (or at least, the intended audience for the Linguistically Interesting Media Index, which is authors who want to figure out how to do this better!) It's pretty comprehensive, covering why you might want to do this, how to handle collaboration between an author and a conlanger if you don't happen to fill both roles yourself, and some very basic stuff about the mechanics of actually using a conlang in fiction. This is where BenJamin introduces his 5-level categorization of the types of textual representation for conlangs, which I immediately latched onto and began expanding on after seeing the conference presentation that preceded this chapter, as a complement to my own categorization of comprehension-support strategies.

Chapter 4 is a case study in creating dialectal variation in a constructed language. Useful for a language creator, but you're left on your own as far as making use of that variation in your fiction writing. Personally, I think it might be hard to justify, given the difficulty of representing natural language dialects in a non-annoying way in most modern writing. Of course, if you get one of those coveted film jobs, it becomes more practical; see, for example, Paul Frommer's being called back to create a new dialect of Na'vi for The Way of Water.

Chapter 5, by Victor Fernandes Andrade and Sebastião Alves Teixeira Lopes, is an exploration of the visual influence of Asian scripts on alien typography in science fiction media. I'm not completely convinced, but the argument is worth reading. They've got interesting data to look over, at least.

I already briefly mentioned Chapter 6; essentially, it determines that the languages studied were perceived as intended on some subjective axes, such as "pleasantness", by a surveyed population, but failed in aesthetic design aims on other axes, and that cultural context is important to aesthetic evaluations. Chapter 7 "The phonaesthetics of constructed languages: results from an online rating experiment" by Christine Mooshammer, Dominique Bobeck, Henrik Hornecker, Kierán Meinhardt, Olga Olina, Marie Christin Walch, and Qiang Xia is essentially the same thing, just better, as it covers a broader selection of conlangs, and gathers responses from both English and German speakers, rather than just English speakers from the UK, and controls for gender, age, and linguistic background. They additionally tested listeners' abilities to discriminate between conlangs, as well as their subjective evaluations. This is potentially useful information for conlangers who are trying to target a particular aesthetic effect on a particular audience--however, it also suggests that doing specific research on this isn't really necessary for a creator, as the languages studied were pretty good at achieving their creators' stated goals already!

Chapter 8 "Tolkien’s use of invented languages in The Lord of the Rings" by James K. Tauber is basically exactly what I do on this blog--an analysis of how secondary languages are used in a fictional work to augment the narrative! I've avoided doing this sort of analysis on The Lord of the Rings myself because it is a Very Large Work, so I'll definitely be coming back to this chapter to see what I can integrate into my own analytical system later.

Chapter 9 "Changing tastes: reading the cannibalese of Charles Dickens’ Holiday Romance and nineteenth-century popular culture" by Katie Wales analyses the representation of a truly fictional language--one which does not exist as a developed and usable language in the real world--in terms of the sociological environment in which it was published, and how the tastes of modern audiences and thus the appropriate means of cultural representation have changed over time. It is a reminder that appreciating old literature often requires being intentional about not ascribing modern points of view and modern judgments to people of the past, and trying to understand the literature as it would've been read by its original intended audience.

Chapter 10 "Dialectal extrapolation as a literary experiment in Aldiss’ ‘A spot of Konfrontation’" by Israel A. C. Noletto reads like a pretty standard sample of Dr. Noletto's work; he's the only academic author represented in this volume with whom I have a prior acquaintance, such that I can compare his other work! Noletto argues that "the presence of an unfamiliar fictional language interlaced with English as the narrative medium does not necessarily constitute a barrier to understanding as might otherwise be expected", and that the use of the extrapolated dialect in fact serves as an important means of conveying the theme of the story through narrative style. There's a little bit of my sort of detailed analysis of the text to show it is constructed to support comprehension.

Chapter 11 "Women, fire, and dystopian things" by Jessica Norledge examines the successes, failures, and impact of Suzette Haden Elgin's Láadan language as a language for a dystopia--and particularly as a language meant to expand the user's capacity for thought, in contrast to other dystopian languages, like 1984's Newspeak, which are intended to restrict thought in a Whorfian fashion. The title is of course a reference to George Lakoff's Women, Fire, and Dangerous Things.

Chapter 12 "Building the conomasticon: names and naming in fictional worlds" by Rebecca Gregory is a broad survey of how names are constructed and reflect language and culture--or fail to do so--in a variety of fictional works. She ends "with a bid for names to be seen as just as fundamental a part of language creation and conceptualisation as any other of language’s building blocks", which I can only read as a plea to academics doing literary analysis, not language creators or authors, given the broad recognition that already exists in the conlanging community of "naming languages" as a thing that is useful in worldbuilding for fiction across many types of media.

Chapter 13 "The language of Lapine in Watership Down" by Kimberley Pager-McClymont analyses the idioms, conceptual patterns, and attested formal structure of the Lapine language, how it is connected to the embodied experience of rabbits, and thus contributes to generating empathy in the reader for non-human protagonists. An excellent case study to reference for conlangers who want inspiration on developing the connection between language and culture, and especially for those working on non-human languages.

The final chapter, 14, "Unspeakable languages" by Peter Stockwell, presents another case where my intuitions clash with the chosen terminology. Stockwell examines languages which are difficult or impossible to represent directly in the narrative--i.e., a subset of truly fictional languages which necessarily remain fictional for practical reasons related to their asserted nature, not merely because the author didn't bother to flesh them out. Stockwell introduces the term "nonlang" for what I would simply call a fictional language. Terminological disputes aside, though, this chapter presents an intriguing overview of how science fiction works have dealt with the concept of the "linguistically ineffable"--languages which we can never hope to decipher or understand. The only quibble I have with the actual content is that Stockwell claims that "it is evident that the pragmatics of a question and an exclamation are still carried even in Speedtalk by intonation (marked here by ‘?’ and ‘!’)."--but that is an unwarranted conclusion based on the evidence presented, as intonation is definitely not evident on the page, and we should not assume that the use of '?' and '!' in the text actually corresponds to intonation contours in the fictional spoken form--or, if it does, that the intonation contours so indicated actually correspond to questions and exclamations, given that the Speedtalk text is untranslated and explicitly not understood by the character transcribing it.

Overall: I have some complaints, and not all chapters are of equal quality or usefulness from my point of view--but there is plenty of good stuff in here that makes it worth a read, and I for one am strongly in favor of further, perhaps more intentional, collaborations between academics and conlangers in analyzing the use of constructed languages in fiction.

If you liked this post, please consider making a small donation!

The Linguistically Interesting Media Index

Saturday, February 24, 2024

How Would We Know If We're Talking to Aliens?

A follow-up to my review of Xenolinguistics.

Suppose we encounter aliens and begin linguistic fieldwork in earnest. Or, suppose that we have reason to believe we may have finally successfully decoded a language of cetaceans or cephalopods (who for all practical purposes in this context may as well be aliens, despite living with us on Earth). How would we be able to tell that we actually got it right--that we understand what they mean, and that they understand what we mean? In particular, how would we overcome the Clever Hans Effect?

Language is ultimately a noisy and lossy channel; a great deal of human communication involves the receiver inferring what the sender probably meant, not directly extracting information that is unambiguously encoded in the linguistic signal itself. And even among humans, this can frequently go wrong, resulting in misinterpretations. But at least living humans can object when they are misinterpreted, and try to correct the miscommunication. That is much, much harder for nonhumans with whom we do not already share a language--and for dead or otherwise unavailable humans who have left behind undeciphered documents.

In these situations, it is all too easy to impute meaning from our own minds onto signals that arise from a totally different intent, or have no meaning at all. And if we're only reading out the information that we unintentionally inserted ourselves, we're not really communicating, are we?

So, there need to be ways to validate our decipherments--ways to obtain information from a non-human entity that we know we could not have provided ourselves. One option, which has been used with human texts, is to hold out validation data; if you can decipher the hieroglyphics on the Rosetta Stone without reference to anything but the Rosetta Stone, and then the system you derived turns out to produce sensible-looking results for other collections of hieroglyphics, then you've probably got it right. If you claim to have deciphered the entire Voynich Manuscript, big deal, it's only the 10th claim this year; but if you claim to have deciphered a few pages in isolation, and other people can use your system to make sense of the rest of it, that would be a much stronger claim.

Theoretically, this could be done with aliens as well. We have, as a species, collected quite a lot of recordings of whale song that could serve as validation data, for example. But it does require special circumstances to be able to collect that data. For example, if we find some technologically-primitive tribe in an alien rainforest (or even an Earthly rainforest for that matter), who do not have written records to reference, would we be terribly surprised if they objected to us setting up equipment to record everything they say just so we can analyze it later? It would be much better to have access to interactive methods, even though interaction itself increases the risk of Clever Hans events.

Another option is to attempt to make predictions about the real world based on alien-sourced data--but this also requires special circumstances, insofar as you must find a subject area which humans do not already know about, but can verify. For example, we haven't explored much of the ocean, but we have the ability to dive to specific places in the ocean if it's worth it. So, if someone claims that a whale or a squid gave them the location of a shipwreck, and then we go and find that shipwreck, that's good evidence that they can really communicate. Another option would be checking on solutions to mathematical problems--but, of course, that only works if the aliens have mathematics, and are more advanced than us in at least one area. "We don't know how to answer that." is sadly both a perfectly reasonable true response, and extremely easy to fake. Additionally, even when they exist, those kinds of natural situations can get expensive to investigate.

The obvious alternative is to manufacture such situations. Place the alien in a test environment hidden from the human communicator. Allow the human communicator access to the alien, such that the alien is their only source of information about the test environment. See if they can describe it accurately afterwards. If a human can extract information that is verifiably available through no means other than communication with an alien, then we can be confident in the decipherment scheme used for such communication.
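To put a rough number on "confident", one can ask how likely the communicator's score would be by luck alone. Here is a minimal sketch, assuming (my simplification, not part of the protocol above) that each trial's hidden environment is drawn uniformly from a fixed set of equally likely configurations and that trials are independent; the function name and parameters are purely illustrative:

```python
from math import comb

def chance_probability(trials: int, correct: int, options: int) -> float:
    """Probability of describing at least `correct` of `trials` hidden
    environments correctly by pure guessing, when each environment is one
    of `options` equally likely configurations (an assumed setup)."""
    p = 1.0 / options  # chance of a lucky guess on a single trial
    # Upper tail of the binomial distribution: sum over every outcome
    # with `correct` or more hits.
    return sum(
        comb(trials, k) * p**k * (1 - p) ** (trials - k)
        for k in range(correct, trials + 1)
    )
```

Under those assumptions, 8 correct descriptions out of 10 with 4 possible configurations would happen by guessing only about 0.04% of the time. Real test environments are unlikely to be this tidy, so treat this as a way of thinking about the problem rather than a finished statistical design.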

Of course, this does require a certain degree of cooperation from the alien! Ultimately, establishing verifiably accurate communication with an alien species depends largely on the motivation that they have for communicating with us, and their ability to understand our desires prior to establishing linguistic communication. Also note that verifying that we have deciphered a language is entirely different from verifying that an alien species has language. There are observational experiments that can rule out any option other than individuals communicating arbitrary information with each other in an open-ended system, such as observing dolphins executing coordinated swim routines together that they have never done before, and so could not have learned from observation. One instance of that type could be attributed to a limited-usage para-linguistic system, but many observations of individuals acting on information they could only have obtained by communication with another individual allow us to eventually build up a strong case for the existence of language in an alien species, even if we have no idea how it works.

One significant point brought up in the Xenolinguistics book is that we do not currently have the fieldwork techniques that would be necessary for reliably deciphering and documenting alien languages. Creating protocols for identifying the existence of languages to a high degree of certainty is one of those gaps--when we do fieldwork with humans, we can assume with a high degree of certainty that they do use language, and we merely need to figure out their particular language. But if you encountered a bunch of electroceptive alien fish, would it even cross your mind that they might have language and might be worth talking to? But another significant gap is precisely in creating those protocols to validate that our understanding is correct. When working with other humans, there is a huge amount of shared context and instinctual knowledge that we can use to guide our investigation--you don't have to speak another person's language to understand the significance of deictic pointing, or to realize when they are upset or happy. But when it comes to non-human creatures (particularly those, unlike dogs, whom we have not already bred to share understandable signals with us; and unlike cats, whom we have spent enough time with to have some understanding of their desires and body language), all of that goes out the window, and we have to start from a place of no assumptions, and rigorous scientific validation of every conclusion if we are to avoid misunderstanding and misleading ourselves. If you're looking for more ways to incorporate linguistics into science fiction, here you go: propose the missing protocols!

Not being a review of anything in particular, this is not part of The Linguistically Interesting Media Index. But, if you liked this post, please consider making a small donation!

Wednesday, January 10, 2024

A Language of Graphs

Recently I got thinking about syntax trees, and what a purely-written language might be like that was restricted to the syntactic structures available to linearized spoken languages and made those structures explicit in a 2D representation. Or in other words, a graphical (double-entendre fully intended) language consisting of trees--that is, graphs in which there is exactly one path between any two nodes/vertices--whose nodes are either functional lexemes roughly corresponding to internal syntactic nodes and function words in natural languages, or semantic lexemes corresponding to content words--but where, since the "internal" structure is made visible, content words are not restricted to leaf nodes!

Without loss of generality, and for the sake of simplicity, we can even restrict the visual grammar to binary trees--which X-bar theory does for natural languages anyway--although calling them "binary" doesn't make much sense if you don't display them in the traditional top-down tree format with a distinguished root node, since internal nodes can have up to three connections--one "parent" and two "daughters", which are a natural distinction in natlang syntax trees but completely arbitrary when you aren't trying to impose a reversible linearization on the leaf nodes! So, in other terms, we can say that sentences of this sort of language would consist of tree-structured graphs with a maximal vertex degree of 3.
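That structural constraint is easy to check mechanically. As a minimal sketch (the edge-list representation and the function name are my own illustrative choices, not part of any spec for this language), a sentence is well-formed at this level if its edges form a connected, cycle-free graph in which no vertex touches more than three edges:

```python
from collections import defaultdict

def is_sentence_tree(edges):
    """True if the undirected `edges` (pairs of hashable node names) form
    a tree whose vertices all have degree 3 or less. A one-node sentence
    has no edges and cannot be expressed in this representation."""
    adjacency = defaultdict(set)
    for a, b in edges:
        if a == b:  # a self-loop is a cycle
            return False
        adjacency[a].add(b)
        adjacency[b].add(a)
    if not adjacency:
        return False
    # A tree on V vertices has exactly V - 1 edges (this also rejects
    # repeated edges, which would count twice here).
    if len(edges) != len(adjacency) - 1:
        return False
    if any(len(neighbours) > 3 for neighbours in adjacency.values()):
        return False
    # With V - 1 edges, being connected is equivalent to being acyclic,
    # so a simple depth-first search finishes the check.
    start = next(iter(adjacency))
    seen = {start}
    stack = [start]
    while stack:
        for neighbour in adjacency[stack.pop()]:
            if neighbour not in seen:
                seen.add(neighbour)
                stack.append(neighbour)
    return len(seen) == len(adjacency)
```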

I am hardly the first person to have thought up the idea of 2D written language, but a common issue plaguing such conlang projects (including their most notable example, UNLWS) is figuring out how to lay them out in actual two dimensions; general graphs are three-dimensional, and squishing them onto a plane often requires crossing lines or making long detours, or both. Even when you can avoid crossings, figuring out the optimal way to lay out a graph on the page is a very hard computational problem. Trees, however, have the very nice property that they are always planar, and trivial to draw on a 2D surface; if we allow cycles, or diamonds (same thing with undirected edges), it becomes much more difficult to specify grammatical rules that will naturally enforce planarity--which is why I've yet to see a 2D language project that even tries. Not only is it easy to planarize trees, there are even multiple ways of doing so automatically, so one could aspire to writing software that would nicely lay out graphical sentences given, say, parenthesized typed input. (Another benefit of trees is that they can be fully specified by matched-parentheses expressions, so we could actually hope to be able to write this on a keyboard!) And then we can imagine imposing additional grammatical rules and pragmatic implications for different standard layout choices--what does it mean if one node is arbitrarily specified as the root, and you do lay it out as a traditional tree? What if you instead highlight a root node by centering it and laying out the rest of the sentence around it? What if you center a degree-two node and split the rest of the sentence into two halves splayed out on either side?
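To make the keyboard-input idea concrete, here is a rough sketch of reading a matched-parentheses expression into a tree. The notation is an assumption on my part: each node is written as `(label child child ...)`, with the first node in the text serving as an arbitrary root; nothing about the language itself dictates this format, and the function names are illustrative.

```python
import re

def parse(text):
    """Parse '(label (child ...) ...)' into nested (label, children) tuples."""
    tokens = re.findall(r"[()]|[^\s()]+", text)
    pos = 0

    def node():
        nonlocal pos
        if pos >= len(tokens) or tokens[pos] != "(":
            raise ValueError("expected '('")
        pos += 1
        if pos >= len(tokens) or tokens[pos] in ("(", ")"):
            raise ValueError("expected a node label")
        label = tokens[pos]
        pos += 1
        children = []
        while pos < len(tokens) and tokens[pos] == "(":
            children.append(node())
        if pos >= len(tokens) or tokens[pos] != ")":
            raise ValueError("expected ')'")
        pos += 1
        return (label, children)

    tree = node()
    if pos != len(tokens):
        raise ValueError("unexpected input after the closing parenthesis")
    return tree

def to_edges(tree):
    """Number the nodes in pre-order and list the (parent, child) edges."""
    edges = []
    counter = 0

    def walk(current, parent):
        nonlocal counter
        me = counter
        counter += 1
        if parent is not None:
            edges.append((parent, me))
        for child in current[1]:
            walk(child, me)

    walk(tree, None)
    return edges
```

For example, `parse("(see (dog) (cat (black)))")` gives a root labelled `see` with two daughters, and `to_edges` turns that into the edge list `[(0, 1), (0, 2), (2, 3)]`, which a layout routine could then draw however it likes.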

The downside of trees is that semantic structure is not limited to trees; knowledge graphs are arbitrary non-planar graphs. But, linear natural languages already deal with that successfully; expanding our linguistic medium from a line to a tree should still reduce the kinds of ambiguities that natural languages handle all the time. So, this sort of 2D language will require the equivalent of pronouns for cross-references; but they probably won't look much like spoken pronouns, and there's a lot more potential freedom in where you decide to make cuts in the semantic graph to turn it into a tree, and thus where pronouns get introduced to encode those missing edges, and those choices can probably be filled with pragmatic meaning on top of the implications of visual layout decisions.

Now, what should words--the nodes in these trees--look like? It seems to be common in 2D languages for glyphs to be essentially arbitrary logographs, perhaps with standard boundary shapes or connection point shapes for different word classes. The philosophy behind UNLWS, that it should take maximal advantage of the native possibilities of the written visual medium, even encourages using iconic pictorial expressions when feasible. But that's not how natural languages work; even visual languages (i.e., sign languages), despite having more iconicity on average than oral languages, have a phonological system consisting of a finite number of basic combinatorial units that are used to build meaningful words, analogous to the finite number of phonemes that oral languages have to string together into arbitrary words. Since we've already got a certain limited set of graphical "phonological" items necessary for drawing syntax trees, and constraint breeds creativity, why not just re-use those?

Here we have an idealized representation of the available phonemes / graphemes / glyphemes: a vertex with one adjoining edge, a vertex with 2 adjoining edges, and a vertex with 3 adjoining edges. On the left, the three -emic forms. On the right, the basic allographic variants. In all cases, absolute orientation and chirality don't matter--if you mirror the "y" glyph, it is still the same glyph. Note that "graph" and "grapheme" are standard terms in linguistics for the written equivalents of "phones" and "phonemes", but that's gonna get really confusing when we're also talking about "graphs" in the mathematical sense. "Glyph" also has a technical meaning, but I am going to repurpose it here to talk about the basic units of this 2D language. So, we have glyphs, glyphemes, and alloglyphs, which are composed into graphs to form lexemes and phrases. Having only 3 glyphemes to work with may seem extremely limiting, but the expanded combinatorial possibilities in 2D vs. 1D make up for it.

While keeping syntax restricted to tree structures is the core idea of this language experiment, lexical items, which don't need to be invented and laid out on the fly, can be more general; we could allow them to be any planar graph. And just as syntax trees can be laid out in many different ways, we could say that lexical items are solely defined by their abstract graphs, which can also be laid out in many ways. But, it turns out that recognizing the topological equivalence of two graphs laid out in different ways is a computationally hard problem! If this language is to be usable by humans, that simply will not do. Thus, the layout for lexical items should be significant, up to rotation and reflection equivalence, so that their visual representations are easily recognizable. This doesn't require introducing any additional phonemic elements--the arrangement of phonemes and letters in one-dimensional natural language words also affects meaning, but we don't consider it "phonemic". Despite the Monty Python sketch about the guy who speaks in anagrams, spoken words are not just bags of sounds in arbitrary order, and written words are not just bags of letters--that's why, for example, "bat" and "tab" mean different things, and "bta" just isn't an English word at all. The spatial arrangement--which, in the case of natural language, works out to just linear order--matters a lot, and that sketch only works because it's precisely constructed to use close-enough anagrams with a lot of supporting context. So, what sort of glyphotactic rules should we have to determine the valid and recognizable arrangements of glyphs in 2D space?

With 3 edges per vertex, the most natural-seeming arrangement is to spread them out at 120 degree angles, and degree-2 vertices would sit nicely in a pattern with 180-degree angles (although we probably want to minimize those, since vertices are more noticeable if they are highlighted by a corresponding angle, rather than a straight line through them). That suggests a triangular grid, which can accommodate both arrangements. The idealized glyphemes and alloglyphs shown above are drawn assuming placement on such a triangular grid, with 60, 120, and 180-degree angles. (I will continue to refer to the features of glyphs in terms of 60, 120, and 180-degree angles, but these, too, are idealizations; in practice, non-equilateral grids might be used for artistic or typographic purposes--e.g., as an equivalent to italics--in which case these angle measurements should be interpreted as representing 1, 2, or 3 angular steps around a point in the grid.) So, words shouldn't be completely arbitrary planar graphs--they should be planar graphs with a particular layout on a triangular grid.
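One nice consequence of fixing the layout on a grid is that "same word up to rotation and reflection" becomes cheap to test by machine. The sketch below is one possible way to do it, under my own assumptions: vertices are stored as axial coordinates `(q, r)` on the triangular grid (a common convention for hexagonal and triangular lattices, not something specified above), a word is a list of edges between grid points, and all function names are illustrative. It tries all 12 symmetries of the grid and keeps the smallest result as a canonical form.

```python
def rotate60(point):
    """Rotate a grid point by 60 degrees about the origin (axial coords)."""
    q, r = point
    return (-r, q + r)

def mirror(point):
    """Reflect a grid point across one of the grid's mirror lines."""
    q, r = point
    return (r, q)

def normalise(edges):
    """Shift the shape so its smallest q and r are both 0, then sort."""
    points = [p for edge in edges for p in edge]
    q0 = min(q for q, _ in points)
    r0 = min(r for _, r in points)
    return tuple(sorted(
        tuple(sorted((q - q0, r - r0) for q, r in edge)) for edge in edges
    ))

def canonical(edges):
    """Smallest normalised form over 6 rotations, with and without mirroring."""
    variants = []
    for flipped in (False, True):
        shape = [tuple(mirror(p) if flipped else p for p in e) for e in edges]
        for _ in range(6):
            variants.append(normalise(shape))
            shape = [tuple(rotate60(p) for p in e) for e in shape]
    return min(variants)
```

Two word layouts are then the same word exactly when their canonical forms are equal, so a 120-degree bend such as `[((0, 0), (1, 0)), ((1, 0), (1, 1))]` matches any rotated, mirrored, or shifted copy of itself but not a straight two-edge line.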

It does not make sense to extend a single grid across an entire phrase or sentence; the boundaries of trees grow exponentially, so you'd need a hyperbolic grid to do it in the general case, and hyperbolic paper is hard to come by (although laying out a sentence on a single common grid within, say, a Poincare-disc model of the hyperbolic plane might be a neat artistic exercise). Maintaining a grid within a word is sufficient to maintain graphical recognizability, and breaking the grid is one signal of the boundary between lexicon and morphology on one side and syntax on the other.

Making an analogy to chemistry, I feel, as an aesthetic preference, that word-graphs should have a minimal amount of "strain". That is, glyphotactically valid layouts should use 120-degree angles wherever possible, and squish them to 60 degrees or spread them to 180 degrees only where necessary. So, where is it necessary?

60-degree angles should only occur on 3-vertex triangles, the acute points of 4-vertex diamonds, or as paired 60-degree angles on the interior of a hexagon.
180-degree angles should only occur adjacent to 60-degree angles, or crossing vertices at the centers of hexagons.

Additional restrictions:

All edges should be exactly one grid unit long--i.e., there are no words distinguished by having a straight line across multiple edges, vs. two edges with a 180-degree angle at a vertex in the middle.
Syntactic connections must occur on the outer boundary. I.e., you can't have a word embedded inside another word.
All vertices must have a maximum of three adjacent edges; thus, any word must have at least one exterior vertex with degree 2 or 1, to allow a syntactic edge to attach to it.
As they are nodes in a binary syntax tree, words can have at most 3 external syntactic connection points.
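Some of these restrictions are simple enough to check automatically. The following is only a partial sketch, again assuming my own representation of a word skeleton as a list of edges between axial `(q, r)` grid coordinates: it checks that every edge is one grid unit long, that no vertex has more than three edges, and that at least one vertex has a free slot. It does not check the angle rules, or whether the free vertex actually lies on the outer boundary.

```python
from collections import Counter

# The six unit steps between neighbouring points of a triangular grid,
# in axial coordinates (an assumed convention, not part of the rules above).
NEIGHBOUR_STEPS = {(1, 0), (0, 1), (-1, 1), (-1, 0), (0, -1), (1, -1)}

def check_skeleton(edges):
    """Return a list of human-readable problems; an empty list means the
    skeleton passes the (partial) checks implemented here."""
    problems = []
    degree = Counter()
    for (q1, r1), (q2, r2) in edges:
        if (q2 - q1, r2 - r1) not in NEIGHBOUR_STEPS:
            problems.append(
                f"edge {(q1, r1)}-{(q2, r2)} is not exactly one grid unit long"
            )
        degree[(q1, r1)] += 1
        degree[(q2, r2)] += 1
    for vertex, count in degree.items():
        if count > 3:
            problems.append(f"vertex {vertex} has {count} edges (maximum is 3)")
    # A syntactic edge needs a vertex with a free slot to attach to.
    if degree and all(count >= 3 for count in degree.values()):
        problems.append("no vertex of degree 1 or 2 for a syntactic edge")
    return problems
```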

With those restrictions in place, here are all of the possible word skeletons of 2, 3, or 4 vertices:

I refer to these as "word skeletons" rather than full words because they abstract away the specification of syntactic binding points--and the choice of binding points may distinguish words (although they should probably be semantically-related words if I'm not being perverse!) Including all of the possible binding point patterns for every skeleton massively increases the number of possibilities, and it quickly gets impractically tedious to enumerate them all and write them down. Here are all of the word skeletons with 5 vertices:

And here are all of the word skeletons with 6 vertices:

And the number of possible 7-vertex words is.... big. Counting graphs turns out to also be a hard problem, so I can't tell you exactly how fast the number of possible words grows, but it grows fast.

Now, I just need to start actually assigning meanings to some of these....

Wednesday, December 27, 2023

What If Marvel Audiences Had to Read Subtitles for Mohawk Dialog?

Episode 6 of season 2 of Marvel's What If... ("What if... Kahhori Reshaped the World?") features Mohawk people and Spanish conquistadors each speaking their own languages on screen, and, excepting a few seconds at a time of English narration, Marvel & Disney+ have trusted audiences to actually read subtitles for nearly all of a 30-minute episode. Good for you, Marvel!

There's a neat trick going on with the subtitling to distinguish the two languages, providing some extra context for people who might not have the ear to easily recognize that the Native Americans and Spaniards are indeed speaking different not-English languages: Mohawk is subtitled in white text, while Spanish is subtitled in yellow text. Not much to analyze there--it's just neat.

However... now I get to rant about subtitles a little bit.

The white and yellow subtitles provided in the "default" presentation of the episode for Anglophone audiences are implemented as "open captions"--text that is "burned in" to the video image, and cannot be dynamically changed. If you switch the language to, say, Spanish, the English subtitles for Spanish dialog don't go away; if you switch to French, the short sections of English dialog are translated to French, but that's the only difference. You have to turn on French closed-caption subtitles separately, and they will display over the burned-in English.

I can only assume that this was done because Disney's streaming platform doesn't support any sort of formatting in closed captions. And sadly, I can't get too mad at Disney in particular for this, because nobody else does any better--Amazon Prime Video has terrible captions, Netflix has terrible captions, Paramount+ has terrible captions, YouTube has terrible captions. And there is no good excuse for any of this. The DVD captioning standard allowed for everything this episode does and far more back in 1996! And yet, nobody really made full use of the possibilities aside from Night Watch, with Lord of the Rings coming in second place. As Pete Bleakley has reminded me (Thanks, Pete!), digital broadcast television, via the CEA-708 closed captioning standard, has had multicolor, positionable closed-captions since the late 1990s, with wide accessibility starting in 2009. Web video, of course, lagged significantly behind, but for well over a decade now even web browsers have had the built-in capacity to do, as closed-captions, everything that this What if... episode does, and far more.

Come on, streaming companies. If you're going to do captioning at all, please, do captioning right. It's not that hard!

Monday, November 13, 2023

The Year of Sanderson

Brandon Sanderson has never put a conlang in a book. But he is aware of them, and has done stuff with fictional languages and naming practices. Brandon Sanderson also speaks Korean; not only is he bilingual, but his second language is not just another European language. It's something very different from English which I might expect to have provided him with a greater degree of metalinguistic awareness than the average author, and raises my expectations for linguistic sophistication in his books.

In my review of Larry Niven's Grammar Lesson, I wrote

There are all sorts of other ways that this kind of grammatical quirk could be integrated into a sci-fi story that have nothing to do with exemplifying or manipulating the speakers' psychology. Brandon Sanderson actually gives a good example of this in the Mistborn trilogy... which is something I shall have to discuss after I get my hands on Secret Project Four and can do a Big Unified Sanderson Linguistics Post.

This is that post. Now, I have not read everything that Brandon has ever written, and I have forgotten some of what I have read, so this will not be completely comprehensive, but we can start with that example from the Mistborn trilogy. (<- Amazon Affiliate link.) A large portion of the plot in the later stages of the story revolves around the interpretation of an ancient prophecy, which is complicated both by magical interference that alters the records, and by actual linguistic drift. Whatever language they speak on the planet Scadrial (which realistically has just one standard language amongst its human inhabitants, given the global level of control exercised by the immortal Lord Ruler) in the Mistborn era, it evidently has an English-like system of strictly gendered animate pronouns, whereas the ancient language of the prophecy has an epicene (gender neutral) third person--a feature which Brandon may have been aware of from Korean! This complicates the process of translation, as any given translator must make a choice about how to render this pronoun in the modern language, which biases the interpretations of the modern characters in plot-significant ways. Good job, Brandon! The names are just.. eh, they're fine. But on the bright side, there is so little in the way of native names and non-English cultural terms that the field is wide open for any conlanger who might be hired to create a proper language--there's very little restriction imposed by the existing linguistic canon!

The bulk of this review, as you can tell from the title, will focus on the four books from the Year of Sanderson: Tress of the Emerald Sea, The Frugal Wizard's Handbook, Yumi & the Nightmare Painter, and The Sunlit Man. (<- All Amazon Affiliate links.)

It turns out that The Sunlit Man has the most linguistic content to comment upon, so I'll be going through the books in reverse publication order. There is still little enough that I can do a nearly-complete listing of the interesting bits.

Starting on page 2 of the Dragonsteel Premium Hardcover Edition, we get this:

The man shook him, barking at Nomad in a language he didn’t understand.
“Trans . . . translation?” Nomad croaked.
Sorry, a deep, monotone voice said in his head. We don’t have enough Investiture for that.

which is packed with information: there is translation magic, it's not working right now so we'll have to actually deal with the consequences of a language barrier, but we should expect it to start working eventually because explicitly mentioning it here makes it a massive Chekhov's gun, so we won't be getting a language-learning montage.

Page 21 has two bits of secondary language representation, with usages of diegetic translation and contextual irrelevance:

Another of the officers nodded, staring at Nomad. “Sess Nassith Tor,” he whispered.
Curious, the knight says. I almost understood that. It’s very similar to another language I’m still faintly Connected to.
“Any idea which one?” Nomad growled.
No. But . . . I think . . . Sess Nassith Tor . . . It means something like . . . One Who Escaped the Sun.
...

Glowing Eyes gestured to Nomad. “Kor Sess Nassith Tor,” he said with a sneer, then kicked Nomad again for good measure.
A few officers scrambled forward and grabbed him under the arms to drag him off.

For all I know, this connects with stuff in the Stormlight Archive, which I haven't read yet because I'm waiting for the series to be complete, but since I know from his own public statements that Brandon has not created any full conlangs, I kinda suspect this is ad-hoc--but it works because there is little enough there that the possibilities for how to analyze it and justify the translation are practically unrestricted, and it's impossible to prove any inconsistency. But, we also know that whatever this language is, it is definitely not just a relex of English, because Brandon had enough awareness to not allow for a word-for-word matchup! (I'd guess that "nassith" is some kind of participle, but like I said, interpretations are pretty much unrestricted with this little data.) In the second instance, we could try to make some guesses about what "kor" means based on the surrounding contextual actions, but ultimately it just doesn't actually matter, except that the glowy-eyed guy is emphasizing something, which we get from the italics.

Page 28 gives us a Failure To Communicate and a reminder of why translation magic isn't working, and that Nomad needs to be working on fixing that--i.e., reiterating that we ain't gonna see Nomad doing any monolingual fieldwork. After that, we get all the way to page 64 before we get some more metalinguistic description:

He said this in Alethi on purpose, which wasn’t his native tongue. Previous experiences had taught him not to speak in his own language, lest it slip out in the local dialect. That was how Connection worked; what Auxiliary was doing would make his soul think he’d been raised on this planet, so its language came as naturally to him as his own once had.

So, we get a name for a language that Nomad actually knows, we know that it isn't his native language (so maybe that'll come up later?), and we get some more details of how the translation magic actually works, which turns out to be probably the most sensible way to do it!

Pages 71 and 79 tell us about the linguistic environment on this particular planet:

“Is this the stranger? What is his name?”
“I was not graced with such information,” Rebeke said. “He doesn’t seem able to understand the words I speak. As if . . . he doesn’t know language.”
Zeal made a few motions with his hands, gesturing at his ears, then tapping his palms together. He thought maybe Nomad was deaf? A reasonable guess, Nomad supposed. No one else on this planet had tried that approach.

So, apparently there is only one acoustic language on this planet (which turns out to be quite reasonable under the circumstances, as it was in Mistborn), and people are not generally aware that there can be other languages. However, there is also at least one sign language--so, yay for sign representation, and, wow, that implies quite a lot about this very tiny society that's struggling to survive. How the heck do they maintain a sign-using language community when there probably aren't that many deaf people around? But, moving on to page 79:

“I offer this thought: do you suppose he’s from a far northern corridor? They speak in ways that, on occasion, make a woman need to concentrate to understand.”
“If it pleases you to be disagreed with, Compassion,” Contemplation said, “I don’t think this is a mere accent. No, not at all. Regardless, there are more pressing matters.[...]”

it turns out that at least some people do have an awareness of dialect continua! Which, in contrast to the situation on Scadrial, absolutely should exist in this setting.

On page 133, after getting his translation magic to work, Nomad manages to explain the concept of other languages to a local:

“Why do you do that?” Rebeke asked. “Talk gibberish sometimes?”
“It’s my own language,” he said. “In other places, Rebeke, people speak all kinds of words you wouldn’t recognize.”

And then on page 175, we get an in-character acknowledgment of the underlying language barrier:

“Wait, how tall are these mountains?” Nomad asked.
“Tall,” Zeal said. “At least a thousand feet.”
A thousand feet? Like a single thousand?
At first, he assumed that the Connection had stopped working, and he hadn’t interpreted those words correctly.

Not much to say about that aside from, hey, any representation of someone actually having realistic struggles with a non-native language is a rare thing and it's nice to see it acknowledged.

On page 238, we get a little background on the Alethi language that Nomad knows but is not his mother tongue, and also a word in his actual mother tongue with diegetic translation:

They called themselves the Alethi, but we knew them as the Tagarut. The breakers, it means.

On page 290, we get a fun cultural note:

“You blessed fool,” Hardy said. “We’re all a group of blessed fools.”
Wait, the knight says. Is that fellow using the word “blessed” as . . . as a curse?
“It’s a conservative religious society,” Nomad said in Alethi. “You use the tools you’re given.”

This is a good acknowledgment that the common sources of curse words vary from culture to culture. The way that Quebecois French speakers swear is etymologically quite different from how the overlapping English-speaking Canadian community swears! It's also worth noting here that, for the most part, Brandon uses a non-diegetic translation convention, with dialog tags clarifying the diegetic language whenever it is other-than-standard, to indicate the variety of fictional languages present in this setting.

Page 342 is a comparative gold mine, where we get some information about Nomad's mother tongue and about the local culture:

“It is the name I deserve. And it sounds a little like my birth name, in my own language.”
“Which is?”
“Sigzil,” he whispered. [...]
“Nomad,” Compassion said. “A wanderer with no place. That name no longer fits you, Sigzel, because you have a place. Here, with us.” She said the name a little oddly, according to their own accents.
...
“We name you Zellion,” Contemplation said. [...]
“It means One Who Finds,” Compassion said. “Though I know not the original language.”
“It’s from Yolen,” he whispered. “Where my master was born.”

So, now we know that, whatever the word for "nomad" is in Nomad / Zellion's mother tongue, it is phonetically close to "Sigzil"; and we know that the local language has at least slightly different phonological rules, such that they can only approximate it as "Sigzel"; and we've got a probable participle or relativized verb from a third language, from a named planet, so we can potentially correlate that with information from other books in the Cosmere. I really want to emphasize here that, although Brandon isn't being particularly innovative with interpretive techniques (we've just got straight diegetic translation going on), and there are no actual conlangs backing this up, Brandon is still managing to include references to realistic linguistic features that highlight differences that should exist between different fictional languages, which does a lot to add linguistic depth to the setting even without a fully constructed conlang or even a worked-out naming language.

On page 374, we get a couple more names of languages, including, finally, an identification of (a clear Anglicization of) Sigzil/Nomad/Zellion's mother tongue:

“Rosharan,” the man said in his own tongue. “Can we speak in a civilized language, please? Do you speak Malwish?”
Zellion shook his head, pretending not to understand and hoping they didn’t speak any of his native languages. At least he could honestly claim ignorance of Azish, having been forced to overwrite the ability to speak that with the local language.

And that is confirmed on page 413:

It was more of an Alethi thing actually, not an Azish one.

And there we have it: The complete overview of linguistic representation in The Sunlit Man.

Yumi & the Nightmare Painter has a very different approach to linguistic representation. Our two lead characters, Yumi and Painter (aka Nikaro) speak related languages (spoilers: one being a descendant of the other), and this is referenced to explain why they can understand each other, but there is no practical indication in the interactions between Yumi and Nikaro that there are any noticeable differences in the languages (thanks again to some magical translation shenanigans). There is a mention near the end of the book that people from Nikaro's city cannot understand those from Yumi's when the general populations finally meet, so they are in fact different languages, but for all that it impacts foreground character interactions, they might as well be speaking exactly the same language. Accordingly, there is much less material to catalog and analyze.

On page 3 of the Dragonsteel Premium Hardcover Edition, we get an introduction to the term "hion":

After losing his staring match, the nightmare painter strolled along the street, which was silent save for the hum of the hion lines.

which is thoroughly described by the following several paragraphs. But then on page 10, we get introduced to the term "yoki-hijo", with far more ambiguous translation:

The Chosen. The yoki-hijo. The girl of commanding primal spirits.

Are these all different titles? Or does "yoki-hijo" mean "The Chosen"? Or does it mean "the girl of commanding primal spirits"? This gets resolved by implication on page 13, where we have an example of appositional translation:

Yumi was one of the Chosen, picked at birth, granted the ability to influence the hijo, the spirits.

OK, so "hijo" means spirits, so "yoki-hijo" probably means "the girl of commanding primal spirits". That's a lot to pack into the word "yoki" and the semantics of whatever construction is implied by the juxtaposition. Quite a potential challenge for any conlanger who might try to engineer a proper conlang compatible with the textual evidence. (Spoiler: I'd bet the "hi-" in "hion" and "hijo" are meant to be related.)

On the next page (14), we get explicit translation by the narrator (who happens to be Hoid):

Liyun, her kihomaban—a word that meant something between a guardian and a sponsor. We’ll use the term “warden” for simplicity.

Back on page 12, we get introduced to the word "tobok", with a definition implied by context in the process of getting dressed:

Then the tobok, in two layers of thick colorful cloth, with a wide bell skirt.

And explicit translation for the term "getuk":

Torish clogs—they call them getuk—feel like bricks tied to my feet.

"Kihomaban" and "getuk" appear nowhere else after they are introduced and defined, so they seem to serve the sole purpose of providing scene setting--they tell you something about what the language they come from sounds like, and Hoid providing definitions reminds you that these people Are Not Speaking English. "Tobok" gets reused throughout the novel as a borrowed-into-English cultural term for this specific type of clothing, but never in dialog or thoughts by the actual characters. This word is apparently inspired by "bok", the Korean word for "clothing", which backs up the general Korean-inspired aesthetic of the whole book.

Also on page 14, we get an explicit discussion of historical linguistics and grammar:

Yumi quickly rose. “Is it time, Warden-nimi?” she said, with enormous respect.
Yumi’s and Painter’s languages shared a common root, and in both there was a certain affectation I find hard to express in your tongue. They could conjugate sentences, or add modifiers to words, to indicate praise or derision. Interestingly, no curses or swears existed among them. They would simply change a word to its lowest form instead.

This is obviously, as Brandon has publicly admitted, directly ripped off from Korean and Japanese. But much like "kihomaban" and "getuk", we don't really see this surfaced in the text; instead, dialog is annotated with parenthesized "(lowly)" and "(highly)" where relevant. That's not really something I would've predicted would work, and the fact that Brandon is massively famous and popular already means that I can't really use this book as evidence that it's a good idea. Maybe it's a failed experiment. But, I haven't actually seen any complaints about it in any reviews so far, so maybe that's a positive signal. I probably need to do a survey about this--comment if you have thoughts!

A good bit later, across pages 44 and 45, we get the common noun "kon":

“Six? A bowl normally costs two hundred kon.”
...
He laid a ten-kon coin on the counter,

Which in context is pretty obviously a unit of currency. After that, all the language evidence is in proper names of people and places. For Yumi's time period (and thus Yumi's language), we have:

Personal Names: Chaeyung Dwookim Gyundok Honam Hwanji Liyun Samjae Sunjun Yumi
Places: Torio Gongsha Ihosen
Common Nouns: getuk kihomaban tobok

For Nikaro's time period, we have:

Personal Names: Akane Gaino Guri Hikiri Ikonora Ito Izumakamo Lee Masaka Nikaro Shinja Shishi Sukishi Takanda Tatomi Tesuaka Tojin Usasha Yuinshi
Places: Fuhima Futinoro Jito Kilahito Nagadan Shinzua
Common Nouns: kon hion

That's a decent corpus of words for a conlanger to start working with. The Nikaro-era names are pretty clearly Japanese-inspired, while the Yumi-era names are more Korean-esque, which implies quite a significant level of cultural change in naming practices, and of phonological shift, between Yumi's ancient language and Nikaro's modern one.
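For anyone who wants to poke at that corpus, here is a minimal sketch of one way to start: tallying which letters the words in each list end with. The word lists are copied from above; the function name and the vowel set are my own illustrative choices, and Anglicized spelling is only a rough proxy for actual sounds, so treat the numbers as a hint rather than a phonological analysis.

```python
from collections import Counter

# Word lists copied from the post (names, places, and common nouns per era).
YUMI_ERA = (
    "Chaeyung Dwookim Gyundok Honam Hwanji Liyun Samjae Sunjun Yumi "
    "Torio Gongsha Ihosen getuk kihomaban tobok"
).split()
NIKARO_ERA = (
    "Akane Gaino Guri Hikiri Ikonora Ito Izumakamo Lee Masaka Nikaro "
    "Shinja Shishi Sukishi Takanda Tatomi Tesuaka Tojin Usasha Yuinshi "
    "Fuhima Futinoro Jito Kilahito Nagadan Shinzua kon hion"
).split()

# Assumption: these five letters stand in for vowels in the romanization.
VOWELS = set("aeiou")


def final_letter_profile(words):
    """Return (number of words ending in a vowel letter, Counter of final letters)."""
    finals = Counter(word[-1].lower() for word in words)
    vowel_final = sum(count for letter, count in finals.items() if letter in VOWELS)
    return vowel_final, finals


if __name__ == "__main__":
    for label, words in (("Yumi era", YUMI_ERA), ("Nikaro era", NIKARO_ERA)):
        vowel_final, finals = final_letter_profile(words)
        print(f"{label}: {vowel_final} of {len(words)} words end in a vowel letter")
        print("  final letters:", dict(finals.most_common()))
```

Word-final consonants are much more common in the Yumi-era list than in the Nikaro-era one, which fits the Korean-esque versus Japanese-inspired impression, though with this little data it is only suggestive.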

The Frugal Wizard's Handbook for Surviving Medieval England has some explicit paratextual discussion of linguistic issues, but otherwise not much of note. There are culturally-appropriate names for the simulated time period, which is neat and reflects a commendable research effort, but actually feels a little off given that the native-to-the-world characters speak essentially modern English, not the language in which those names would have been generated. There are a few other period-appropriate terms but for the most part they just get diegetically translated. There are two excerpts from the eponymous Handbook which directly address linguistic issues; on page 67 of the Dragonsteel Premium Hardcover Edition, we get this explanation:

GUARANTEE TWO
The people on Great Britain will speak a language that is intelligible to modern English speakers. We chose our dimensional band specifically for this reason!

In other words, there's a darn good diegetic reason why there is no language barrier in this interuniversal travel situation!

And then much later on page 146:

UNINTELLIGIBLE DIMENSIONS
The population of the British Isles in these dimensions doesn’t speak a language intelligible to any known Earth language speakers. Perfect for linguists or those who want an extra challenge! Visit the speedrun section of our website for current records for full dictionary creation in the various language groups.

Which I like to point out just because acknowledgment of linguists makes me happy. On page 132, we have a situation where a proper language might become relevant, as our protagonist runs into some foreigners who do not speak I-Can't-Believe-It's-Not-English; but then... it turns out that their leader does speak English after all. Oh well.

The most interesting thing about this book is the parallel between the exposition provided by excerpts from the Handbook and the more non-diegetic linguistic and cultural notes in Sara Nović's True Biz. With two examples of intercalated paratext, I've gotta think this is a solid expositional technique for linguistic information that deserves further attention. (And I've really gotta just write something up on paratext in general one of these days--especially the more traditional forms, like glossaries and pronunciation guides.)

Tress of the Emerald Sea has even fewer references to language, but there are a few. Starting on page 10 of the Dragonsteel Premium Hardcover Edition, we get a bit of description that acknowledges the existence of multiple languages and writing systems on Tress's world:

As they ate, she considered showing the two men her new cup. It was made completely of tin, stamped with letters in a language that ran top to bottom instead of left to right.

And much later on page 254, we get the sole mention of the (Anglicized) name of Tress's language, and a reference to the translation magic that we also see used in The Sunlit Man:

“Are you even speaking Klisian?” Tress asked.
“Technically yes, though I’m using Connection to translate my thoughts, which are in a language you’ve never heard of.[...]”

And while I don't want to ascribe a character's statements to the author (I have no idea how much Brandon knows about psycholinguistics or translation theory, so I'll give him the benefit of the doubt), I should point out for the sake of readers with less linguistic training that:

Not everyone thinks in language--which will be a big "well, duh" to some of you, and absolutely mindblowing to some others. This particular character apparently does, though.
Thinking in one language and then translating those thoughts into another language to speak is not a good way to think. It's very inefficient, and it's not how high-level speakers of adult-acquired languages work. Whether or not you perceive yourself as thinking "in" a particular language, for communicative purposes you should be aiming to encode your thoughts directly into the target language in a single step, not doing translation in your head. I have to assume that translation magic is being used sub-optimally in this case compared to its presentation in The Sunlit Man, and there's just sufficient power behind it to make the results seem competent and fluent anyway.

On page 94, we get introduced to a deaf character (Fort) using an assistive device (acquired from off-world--Tress's planet has a far lower technological level) which transcribes speech for him and allows him to write his responses. Brandon makes use of bold face to indicate writing on Fort's communication board to distinguish it from acoustic speech in dialog. But the fact that such a device is both needed and useful brings up all sorts of questions about the broader society on Tress's world, which are much more interesting than the mere fact of the typographical convention used to represent it in the story.

We are told that, before acquiring his assistive device, Fort relied on lipreading, despite its limitations (and we are warned about the actual limitations of strict lipreading, so good job dispelling popular misconceptions there, Brandon!), and that this was in his childhood--so he didn't acquire language and literacy, and then lose his hearing as an adult. The Coppermind page for Fort claims that he previously communicated with a mix of sign language and lip reading, but that's not actually supported by the text--the only explicit mention of sign language is on page 448:

And Fort . . . well, he understood. Not because he knew another sign language, but because of that same bond.

And that is narration, not attributed to Fort himself, and doesn't actually indicate that he does know any sign languages. There's an earlier oblique reference on page 293:

Fort didn’t fill the time with idle chitchat, and while you might ascribe this to his deafness, I’ve known more than a few Deaf people who were quite the blabberhands.

But again, that is the narrator talking, and Hoid does not actually say that Fort is capable of using sign language--only that he has met other Deaf people who do.

So, we have a deaf guy on a pre-industrial world who knows how to read and write. His parents cared about him enough to ensure that he was not subject to language deprivation and could learn to lipread for as much as that is worth, and then to become literate. This indicates surprisingly progressive views about deaf people, and we can also infer from other dialog that deaf people aren't particularly rare on this world (because someone once met a deaf dancer as well, who might have actually been a made-up stand-in for a deaf princess--but hey, deaf princess!). It's possible that Fort did grow up with sign language, but simply has to deal with a world full of other people who don't understand it themselves, so the board is useful--but given that no character other than the narrator, Hoid, ever mentions sign, and Hoid does not mention sign when we are told how Fort actually communicates, it seems that there is not enough of a population of deaf people with the ability to find and interact with each other on this world to sustain a viable sign language community. That's a weird contrast with having the social support to learn lipreading, reading, and writing, and that being common enough that one character was able to meet two socially high-functioning deaf people in not-that-many years of traveling the world. Not at all inconsistent, just kinda weird, and an interesting contrast to the situation in The Sunlit Man, where there is an awareness of sign language despite the extremely small world and corresponding extremely small population. Maybe everyone on Tress's world is actually a horrible audist and abused Fort into learning to interface with a language he could not perceive in its intended medium, but I kinda like the idea that everyone on Tress's world is just super supportive of deaf people while being completely ignorant of the concept of sign language.

