Gliese 1337

Saturday, January 20, 2024

Describing Non-human Vision

Thanks to LangTime Studio creating languages for a lot of mammals with dichromatic vision, a few years ago I did a good bit of research into how visual perception varies between different species. The issue of non-human vision came up again yesterday in George Corley's (of Conlangery fame) latest Draconic language stream, so I dug up some old notes on how to describe colors that you can't see. And in fact, this isn't just useful for conlangers trying to come up with vocabulary for a non-human language; this is good information for fantasy and sci-fi writers, too!

Since I started out with researching rabbits... let's talk about rabbits. It turns out that rabbit vision differs from human vision in just about every way that tetrapod vision can, so it makes an excellent case study. Rabbits have 2 types of color-receptive cone cells, corresponding to peak sensitivities in the green and blue ranges, and one rod cell type. I.e., they are dichromats, like most mammals. Rods don't contribute to color differentiation, so we can ignore those. At first glance, this seems similar to human red-green color blindness, except the peak sensitivities of the rabbit green cone and the red/green cones of a deuteranopic human are not in the same place! This is the first area in which human and non-human visual perception can differ--even other trichromats (e.g., penguins, honeybees) may not have the same spectral sensitivities as humans, and so see completely different color distinctions than we do. The rabbit green cone sensitivity is shifted toward shorter wavelengths, to a 509nm peak, compared to the human green cones with peak at 530nm, and red cones which peak at 560nm. Thus, not only can rabbits not distinguish red from green, but everything on the red end of the spectrum appears much dimmer than it would to a human, due to weaker response of the Long-Wavelength Cones to those spectral colors. Note, however, that not having separate cones for red and green does not mean that rabbits (or dogs, for that matter) would always see things-we-perceive-as-red and things-we-perceive-as-green as indistinguishable--it depends on the actual spectral signature of each object. For example, where we perceive two objects as having equal perceptual brightness but different hue, rabbits might perceive identical hue but lower perceptual brightness for the red object compared to the green.
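To put rough numbers on the "red looks dimmer" point, here is a minimal sketch. Only the three peak wavelengths come from the paragraph above; the Gaussian curve shape and the 40nm width are assumptions made purely for illustration (real cone sensitivity curves are asymmetric), and the function names are made up.

```python
import math

# Peak sensitivities from the text; the Gaussian shape and the 40 nm
# width are illustrative assumptions, not measured values.
RABBIT_LONG_PEAK = 509.0
HUMAN_GREEN_PEAK = 530.0
HUMAN_RED_PEAK = 560.0
WIDTH_NM = 40.0

def cone_response(wavelength_nm, peak_nm, width_nm=WIDTH_NM):
    """Relative response (0..1) of a hypothetical cone to monochromatic light."""
    return math.exp(-((wavelength_nm - peak_nm) ** 2) / (2 * width_nm ** 2))

def human_long_side(wavelength_nm):
    """Strongest response among the human green and red cones."""
    return max(cone_response(wavelength_nm, HUMAN_GREEN_PEAK),
               cone_response(wavelength_nm, HUMAN_RED_PEAK))

# Compare a green, a yellow-green, and an orange-red wavelength.
for wl in (510, 560, 620):
    rabbit = cone_response(wl, RABBIT_LONG_PEAK)
    human = human_long_side(wl)
    print(f"{wl} nm: rabbit {rabbit:.3f}, human {human:.3f}")
```

Under these assumed curves, the rabbit response falls off much faster than the human one as the wavelength moves toward red, which is the qualitative effect described above.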

Much like humans have an anomalous blue response in our red cones, which causes us to conflate purple (red+blue) and violet (a spectral color, extreme blue), rabbit and rat green cones also have a sensitivity peak in the ultraviolet. Initially, I assumed that, unlike the human anomalous blue response, UV light would be blocked by the structures of the eye, as it is for humans; however, while talking with a sci-fi writer friend of mine about non-human vision last night (as ya do, y'know), when I mentioned that rabbit and rat green-cone pigments have a weird bi-stable response to UV light, but UV is absorbed by mammalian eye tissue, so it's probably just a random non-conserved evolutionary quirk... he noted that UV is absorbed by primate eye tissue, but had I actually explicitly checked on rabbits? And I had not. So I did. And it turns out that lapine corneal, lens, and vitreous humor tissues are considerably more transparent to near-UV light than human eye tissues are. Now, nobody (that I have been able to find) is actually saying outright that rabbits (or rats) can see UV... but rabbits might actually be able to see UV. If they can, it would be indistinguishable to them from green (not blue!). If it was not already clear from the shifted sensitivity peaks, I think that should highlight the impossibility of just taking, e.g., a JPEG image captured with equipment built for humans and transforming it into an accurate representation of what some other animal would see--if nothing else, the UV information would be completely missing!

Incidentally, if rabbits are UV-sensitive, the bistable nature of the UV response in their green cones means that they would actually be more strongly sensitive to UV in the dark than they are during daytime illumination. I have no idea what to make of that, as there isn't really a whole lot of environmental UV going around at night or in tunnels... but that's a quirk you can keep in mind as a possibility for fictional creatures. In general, just note that spectral response can vary in different environmental conditions; in humans, we lose the ability to distinguish color entirely in low-light conditions (and your brain lies to you to fill in the colors that you believe things should be), but things can be more complicated than that.

Another interesting feature of rabbit eyesight is that they have a much less dense foveal region than humans (so less effective resolution), and their color-sensitive cells are not evenly distributed--there is a thin band with a mixture of both green and blue cones, with blue cones concentrated at the bottom of the retina (corresponding to the top of the visual field) and green cones concentrated at the top (corresponding to the bottom of the visual field). I.e., their vision along the horizon is in color, but the top and bottom extents of their visual fields are black and white, and specialized for better spectral response to the most common wavelengths of light coming from those directions--blue from the sky, green from the ground. This isn't too different from human peripheral vision (where color information is inferred by the brain, not actually present in the raw retinal output), except that in rabbits different parts of the peripheral fields actually have a different peak spectral response! In wild rabbits, this is probably just an adaptation to getting the maximum information out of a predominantly-blue-background sky and a predominantly-green(/red)-background ground, but intelligent rabbits could theoretically learn to extract additional color information (e.g., distinguishing monochromatic white from dichromatic white) from an object by wiggling their eyes up and down or tilting their heads to put it in different parts of the visual field. Or not, if their brains just fill in missing color information automatically like ours do.... But if you want to write about a creature that can do that, by authorial fiat, they could have a whole auxiliary class of color words, analogous to pattern words like "speckled" or "sparkly", to describe objects that have different appearances in different parts of the visual field.

But, if we abstract away from physiological perceptual abilities, what would their experience of color space be like? Tetrapod retinas pre-process raw cone cell signals into antagonistic opponent channels before color information gets sent to the brain; i.e., what your visual cortex has access to is not the original cone cell activations, but sums and differences of the activations of multiple types of cone cells. In human eyes, that means our brains see color coming down the optic nerves as a combination of red vs. green and blue vs. yellow signals--even though yellow isn't actually a physiological primary color! In dichromats like rabbits, the two raw spectral signals (green and blue) are still processed by an antagonistic opponent system in the retinal ganglia; thus, just like we can't perceive the impossible colors "reddish green" or "yellowish blue", they cannot have any perception of a distinct blue-green mixture--dim dichromatic light at both spectral peaks will look exactly the same as bright monochromatic light exactly in between, which will be indistinguishable from white. In effect, the loss of one cone type compared to humans reduces the color space from 3 dimensions to 2, and the perceptual dimension that is lost after ganglial processing is that of saturation.
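The "dim light at both peaks equals bright light in between" claim can be checked with a toy model. This is only a sketch under loud assumptions: the 509nm peak is from earlier in the post, but the 425nm short-wavelength peak, the Gaussian curve shape, and the plain sum/difference opponent channels are all stand-ins I picked for illustration.

```python
import math

# Assumed values for illustration: the 509 nm peak is from the text, the
# 425 nm short-wavelength peak and the Gaussian curve shape are guesses.
GREEN_PEAK, BLUE_PEAK, WIDTH = 509.0, 425.0, 40.0

def sensitivity(wavelength, peak):
    return math.exp(-((wavelength - peak) ** 2) / (2 * WIDTH ** 2))

def cone_activations(spectrum):
    """spectrum: {wavelength_nm: intensity}. Returns (green, blue) activations."""
    green = sum(i * sensitivity(w, GREEN_PEAK) for w, i in spectrum.items())
    blue = sum(i * sensitivity(w, BLUE_PEAK) for w, i in spectrum.items())
    return green, blue

def opponent_channels(spectrum):
    """Return (luminance, chroma): the sum and difference of the two cones."""
    green, blue = cone_activations(spectrum)
    return green + blue, green - blue

midpoint = (GREEN_PEAK + BLUE_PEAK) / 2
two_peaks = {GREEN_PEAK: 1.0, BLUE_PEAK: 1.0}   # dim light at both peaks
lum, chroma = opponent_channels(two_peaks)

# Scale a monochromatic midpoint light until its luminance matches.
scale = lum / opponent_channels({midpoint: 1.0})[0]
between = {midpoint: scale}

print("two peaks:", opponent_channels(two_peaks))
print("midpoint: ", opponent_channels(between), "at intensity", round(scale, 2))
```

Both stimuli come out with zero chroma and the same luminance, and the midpoint light has to be brighter than either of the two peak lights to match, because neither cone responds to it at full strength.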

The lapine color space is thus defined by a 2D, triangular range with black at one vertex, white (or whatever you want to call it) at the center of the opposite edge, and pure green and pure blue at the remaining vertices. The hue and saturation axes are the same, with green fading into white and then white fading into blue.

If the most basic colors are defined by the extrema of the opponent-process space, as they are for humans, there should be 3 basic colors, corresponding to black, blue, and green. White would be the natural next step, followed perhaps by light and dark shades of blue and green. Or, as I have done in the image above, you could call the green extremum "yellow" instead, as the Long Wavelength Cone still has sensitivity into the yellow and red ranges of the spectrum, even though its peak is in green. Fundamentally, the 3D human color space and 2D dichromat color spaces are mathematically incommensurate, so all human-perceptible representations involve some arbitrary choices anyway. Treating the long-wavelength end as "yellow" rather than "red" is convenient if you want to do something like copying the Old Norse poetic convention of treating blood and gold as being the same color. :)

We can squish and stretch that gamut to get a representation of the dichromat color wheel, with a radial saturation axis and polar hue and brightness:

And the sort of Cartesian representation that an intelligent dichromat graphic designer would use to pick out colors in a computer graphics program:

Keep in mind that the actual colors used in these illustrations are completely arbitrary, aside from being "towards the long-wavelength end" vs. "towards the short-wavelength end". What matters is just the set of possible distinctions. Figuring out exactly what lapine colors any particular object would correspond to would require recording the actual emission spectrum of that object, and then mapping it into the rabbit color space--and being dichromatic does not merely mean that they see a subset of the colors that we can see; the available distinctions are different. E.g., two objects which look identically purple to a human may be monochromatic in the violet spectral range, or they may be dichromatic with light in the blue and red ranges, but those two objects will look distinct to a rabbit--the first one being obviously pure blue, the second being light blue or white.

So, that's dichromatism... what about tetrachromatism, or higher? My best reference on this subject is this absolutely lovely article: Ways of Coloring: Comparative Color Vision as a Case Study for Cognitive Science, which contains descriptions of comparative color spaces for humans, bees (also trichromats, but with different frequency response), goldfish, turtles (both of which are tetrachromats), and pigeons (suspected pentachromats). And it has an excellent statement of what the problem actually is:

It is important to realize that such an increase in chromatic dimensionality does not mean that pigeons exhibit greater sensitivity to the monochromatic hues that we see. For example, we should not suppose that since the hue discrimination of the pigeon is best around 600nm, and since we see a 600nm stimulus as orange, pigeons are better at discriminating spectral hues of orange than we are. Indeed, we have reason to believe that such a mapping of our hue terms onto the pigeon would be an error: [...]

Among other things, this result strongly emphasizes how misleading it may be to use human hue designations to describe color vision in non-human species. This point can be made even more forcefully, however, when it is a difference in the dimensionality of color vision that we are considering. An increase in the dimensionality of color vision indicates a fundamentally different kind of color space. We are familiar with trichromatic color spaces such as our own, which require three independent axes for their specification, given either as receptor activation or as color channels. A tetrachromatic color space obviously requires four dimensions for its specification. It is thus an example of what can be called a color hyperspace. The difference between a tetrachromatic and a trichromatic color space is therefore not like the difference between two trichromatic color spaces: The former two color spaces are incommensurable in a precise mathematical sense, for there is no way to map the kinds of distinctions available in four dimensions into the kinds of distinctions available in three dimensions without remainder. One might object that such incommensurability does not prevent one from “projecting” the higher-dimensional space onto the lower; hence the difference in dimensionality simply means that the higher space contains more perceptual content than the lower. Such an interpretation, however, begs the fundamental question of how one is to choose to “project” the higher space onto the lower. Because the spaces are not isomorphic, there is no unique projection relation.

It is also the case that lower-dimensional color spaces, such as those of dogs or rabbits (both dichromats, but in slightly different ways) are incommensurate with our 3D color space, in exactly the same way that our 3D color space is incommensurate with the higher-dimensional perceptions of a pigeon, turtle, or goldfish, and have no unique projections. Thus, visualizations of how your dog or cat sees things are always only approximations--we can try to recreate the kinds of distinctions relevant to a dichromatic animal in our own color space, but we will always experience it differently.

A common feature of all of the systems described is the production of a combined luminance channel from the raw n-dimensional cone cell inputs, and n-1 oppositional chroma channels--in humans, these are the red-green and blue-yellow oppositions, which produce a two-dimensional neurological color space orthogonal to the luminosity axis. The YCbCr color space (used in digital video, and closely related to the YUV and YIQ encodings used for analog color TV transmission) arises from representing the two chromatic dimensions directly in Cartesian coordinates. Saturation arises as the radial dimension--distance from the white-black axis--in a polar transformation of this oppositional color space to produce the trichromat color wheel, with hue arising as the angular coordinate. Trichromat color spaces for different species can vary both in their precise spectral sensitivities, and in how the oppositional chroma channels are generated in the retina; i.e., instead of an RG-B opposition, where R and G physical channels combine to produce Y, there can also be an R-GB opposition: red-cyan vs green-blue. For us, there's no such thing as reddish-green (nor blueish-yellow), because yellow comes in between, but we do have blueish-green. For that other sort of trichromat, reddish-green would make perfect sense, but blueish-green and reddish-cyan would be impossible to perceive instead.
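As a rough sketch of that polar transformation: take two opponent chroma values as Cartesian coordinates, and read off saturation as the radius and hue as the angle. The equal-weight opponent transform below is an illustrative assumption of mine, not the actual YCbCr coefficients and not a model of any real retina.

```python
import math

def to_luma_chroma(r, g, b):
    """Toy opponent transform: luminance plus two opponent chroma axes.

    The equal weights are an illustrative assumption, not the coefficients
    of YCbCr or of any real retina.
    """
    luminance = (r + g + b) / 3
    red_green = r - g
    blue_yellow = b - (r + g) / 2
    return luminance, red_green, blue_yellow

def to_hue_saturation(red_green, blue_yellow):
    """Polar form of the chroma plane: angle is hue, radius is saturation."""
    hue_degrees = math.degrees(math.atan2(blue_yellow, red_green)) % 360
    saturation = math.hypot(red_green, blue_yellow)
    return hue_degrees, saturation

# A neutral grey has zero chroma, so its saturation is zero.
_, rg, by = to_luma_chroma(0.5, 0.5, 0.5)
print(to_hue_saturation(rg, by))
```

Opposite ends of one opponent axis land 180 degrees apart on the wheel, which is why a hue can't be both at once.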

Monochromatic vision is pretty easy to understand--it's just black-and-white / greyscale--luminosity is the only dimension, and leaves zero additional channels for chroma information. As illustrated above, in dichromat vision, the equivalent of the trichromatic color "wheel" is just a line--the radial dimension is not meaningfully distinct from the single linear chromatic dimension, and while we require an additional axis to represent brightness, the dichromat color wheel really does represent every color they can possibly see. As a result, "saturation" and "hue" (or, alternatively, brightness and hue) are indistinguishable to dichromats, and grey (or white, depending on whether you represent the space as a triangular gamut or a Cartesian diamond) is a spectral color. There are only two primary colors (or 4, if you count white and black), and no secondary colors.

In higher-dimensional color spaces, as determined by discrimination experiments on tetrachromatic and pentachromatic organisms, we still see the generation of oppositional color channels from retinal processing. How to generate these oppositional channels, however, is not obvious a-priori; for example, in humans one opposition is between red and green, both of which are primary colors, but the other is between blue, a primary color, and yellow, a composite--and, as mentioned above, that could be reversed in a different species with different specific spectral sensitivities. But why that particular combination for us?

It turns out, across different species, opponent channels are constructed to maximize decorrelation--in other words, to remove redundant information caused by the overlapping response curves of different receptor types. Thus, the precise method of calculating color channels will be slightly different for each species, dependent on physical characteristics of the retinal cells, but they are all qualitatively the same kind of signal, and end up producing a higher-dimensional chroma-space orthogonal to the white-black luminosity axis. However, there's pretty good reason to believe that this would be a convergently-evolved process to maximize visual acuity (except in some specific circumstances like Mantis shrimp), so this analysis of color perception plausibly applies universally, to most kinds of weird aliens you might come up with, so long as they have eyes at all. Effectively, the retinal ganglia are performing Principal Component Analysis to turn "list of specific frequency activations" information into "total luminosity vs. list of chroma components" information.
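Here is a small sketch of what that decorrelation looks like for the simplest case of two overlapping receptors. Everything numeric is made up: the 0.6 overlap factor, the two-band stimuli, and the function names are hypothetical. The principal axes are computed with the closed-form solution for a symmetric 2x2 covariance matrix rather than a full PCA library.

```python
import math
from itertools import product

OVERLAP = 0.6  # assumed cross-sensitivity between the two cone types

def cone_pair(short_band, long_band):
    """Two overlapping receptors, each also responding to the other band."""
    return long_band + OVERLAP * short_band, short_band + OVERLAP * long_band

def covariance(xs, ys):
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

def principal_axes(xs, ys):
    """Eigenvectors of the 2x2 covariance matrix (closed form, symmetric case)."""
    a, d, c = covariance(xs, xs), covariance(ys, ys), covariance(xs, ys)
    theta = 0.5 * math.atan2(2 * c, a - d)
    return (math.cos(theta), math.sin(theta)), (-math.sin(theta), math.cos(theta))

# Hypothetical stimuli: every combination of five intensity levels per band.
stimuli = list(product(range(5), repeat=2))
long_cone, short_cone = zip(*(cone_pair(s, l) for s, l in stimuli))

axis1, axis2 = principal_axes(long_cone, short_cone)
channel1 = [axis1[0] * g + axis1[1] * b for g, b in zip(long_cone, short_cone)]
channel2 = [axis2[0] * g + axis2[1] * b for g, b in zip(long_cone, short_cone)]

print("raw cone covariance:", round(covariance(long_cone, short_cone), 3))
print("axis 1 (sum-like):", axis1)
print("axis 2 (difference-like):", axis2)
print("channel covariance:", round(covariance(channel1, channel2), 3))
```

With symmetric receptors the two axes come out as (roughly) the sum and the difference of the cone signals: a luminance-like channel and a single opponent chroma channel, with no leftover correlation between them.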

Meanwhile, in any such neurological color space, there is only ever a single radial coordinate. Trichromatic vision is kind of special in that it is the first dimensionality at which chroma can be split into saturation and hue components. At higher dimensionalities, the hue space gets more complex, but we can say with some confidence that the extra dimensions introduced in higher-dimensional perceptual color spaces are not some extra sort of radial-coordinate saturation or any kind of weird third thing, but are in fact additional dimensions of hue--and along with extra dimensions of hue, qualitatively different kinds of composite colors!

Monochromats don't have any color. Dichromats don't have any secondary colors--just the spectral colors which, strangely to us, include white/grey. Our three-dimensional human color space allows us to perceive two opponent channels, corresponding to 4 pure hues--red, yellow, green, and blue--and weighted binary combinations thereof that give rise to the secondary colors--r+y (orange), y+g (chartreuse?), g+b (cyan), and b+r (magenta), with one non-spectral hue (magenta). Non-spectral colors derive from simultaneous activation of cones with non-adjacent response peaks, and with three cones, there's only one such possibility. Meanwhile, a tetrachromatic system would have 3 opponent axes with 6 basic hues (r-g, y-b, and the new p-q), binary combinations of those hues with their non-opponents producing 12 secondary colors (r+y, r+b, r+p, r+q, g+y, g+b, g+p, g+q, y+p, y+q, b+p and b+q), and ternary combinations producing 8 extremal instances of an entirely new kind of hue--tertiary colors--not found in the perceptual structure of trichromatic color space (r+y+p, r+y+q, r+b+p, r+b+q, g+y+p, g+y+q, g+b+p, g+b+q), just as our secondary colors are not found in the dichromatic space. Additionally, there is not merely one non-spectral secondary color (magenta) in the fully-saturated hue space, but 3--and in general, that number will correspond to however many pairs of non-spectrally-adjacent sensor types there are (which actually works out to the sequence of triangular numbers!). If we assume that r, g, b, and q are the physiological primaries (note that the spectral locations of y and p depend on the decorrelation output for a specific set of 4 receptors with species-specific sensitivities), then the non-spectral secondaries are r+b, r+q, and g+q. All of the tertiary colors are non-spectral.
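The counting in that paragraph is easy to mechanize. The sketch below enumerates focal hues by picking one pole from each of k different opponent axes (so opposites never mix), and lists the non-spectral secondaries as pairs of receptors that aren't neighbours on the spectrum. The function names and the single-letter labels are just my own shorthand from the paragraph above.

```python
from itertools import combinations, product

def focal_colors(opponent_axes):
    """Enumerate focal hues built from one pole of each of k distinct axes.

    opponent_axes: list of (pole, opposite_pole) pairs, e.g. [("r","g"), ("y","b")].
    Returns {k: [combination strings]} for k = 1 .. number of axes.
    """
    result = {}
    for k in range(1, len(opponent_axes) + 1):
        combos = []
        for axes in combinations(opponent_axes, k):
            # Choose one pole from each axis; opposite poles never mix.
            combos.extend("+".join(poles) for poles in product(*axes))
        result[k] = combos
    return result

def non_spectral_secondaries(receptors_in_spectral_order):
    """Pairs of receptors whose response peaks are not neighbours on the spectrum."""
    indexed = list(enumerate(receptors_in_spectral_order))
    return [a + "+" + b for (i, a), (j, b) in combinations(indexed, 2) if j - i > 1]

human = focal_colors([("r", "g"), ("y", "b")])
tetra = focal_colors([("r", "g"), ("y", "b"), ("p", "q")])
print(len(tetra[1]), len(tetra[2]), len(tetra[3]))  # 6 12 8
print(non_spectral_secondaries(["r", "g", "b", "q"]))  # ['r+b', 'r+q', 'g+q']
```

Adding a fourth axis to the list gives you the pentachromat inventory, if you want to see how quickly the vocabulary problem blows up.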

Ultimate writer takeaway: you may not be able to intuitively understand what non-human color experiences are like, but you can make some arbitrary implicit decisions about retinal physiology (i.e., just decide where you want the opponent colors to appear along the spectrum), do some basic combinatory math, and then you have a list of descriptions of basic focal colors that you can assign words to--or, if you want to be a little more realistic, assign words to ranges of those focal colors, which you can precisely mathematically describe. This gets more complicated at higher dimensionalities (like pigeons' pentachromatic color space), but tetrachromacy is kind of convenient because you still have only 2 dimensions of hue, so you can actually diagram out what the color regions are, and just tell people "y'all already know how brightness and saturation work, so I don't need to put those on the chart".

Someday, I aspire to have a program where you can input the physiological frequency response curves for an arbitrary organism, and a spectrum, and it'll give you the mathematical description of the perceptual color that that would produce. But till then, you'll just have to do your best at guessing what the aliens and monsters and anthropomorphic animals see whenever a human thinks something is a particular color--but guess informedly, knowing what the structure of their color spaces is like!

P.S. What was that about Mantis shrimp? Well, Mantis shrimp have 16 different light receptor types, with 12 different color receptors, which kinda suggests that they should have a 12-dimensional color space with 10 dimensions of hue. But... empirically, that's not what happens. Experimentally, they don't actually have all of those different color categories, or a particularly fine capacity for spectral distinction. Rather, they have a large number of different receptor types so that they can identify spectral colors at high speed, without doing any retinal pre-processing--chartreuse cone fires? Cool, that's a chartreuse thing! No need to bother with opponent processing! These kinds of extreme high dimensional visual systems might end up working more like our senses of smell or taste than like our perception of color. However, there's also another aspect of Mantis shrimp vision that's outside of color perception (and not entirely unique to Mantis shrimp, either): they can see polarization (hence the 4 visual receptor types that aren't for color, rather than just 1). This ability is comparatively easy to imagine and describe--it's an overlay of geometric information, that tells you "not only does this light have a particular color, it is also oriented in a particular way". Mantis shrimp are, however, unique in being able to distinguish circularly polarized light; other creatures with polarization sensitivity would be unable to tell circularly polarized from unpolarized light.

Wednesday, January 10, 2024

A Language of Graphs

Recently I got thinking about syntax trees, and what a purely-written language might be like that was restricted to the syntactic structures available to linearized spoken languages and made those structures explicit in a 2D representation. Or in other words, a graphical (double-entendre fully intended) language consisting of trees--that is, graphs in which there is exactly one path between any two nodes/vertices--whose nodes are either functional lexemes roughly corresponding to internal syntactic nodes and function words in natural languages, or semantic lexemes corresponding to content words--but where, since the "internal" structure is made visible, content words are not restricted to leaf nodes!

Without loss of generality, and for the sake of simplicity, we can even restrict the visual grammar to binary trees--which X-bar theory does for natural languages anyway--although calling them "binary" doesn't make much sense if you don't display them in the traditional top-down tree format with a distinguished root node, since internal nodes can have up to three connections--one "parent" and two "daughters", which are a natural distinction in natlang syntax trees but completely arbitrary when you aren't trying to impose a reversible linearization on the leaf nodes! So, in other terms, we can say that sentences of this sort of language would consist of tree-structured graphs with a maximal vertex degree of 3.
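That structural constraint is simple enough to check mechanically. A minimal sketch, assuming a sentence is handed over as a list of undirected edges between node labels (that representation, and the function name, are my own choices for illustration):

```python
from collections import defaultdict

def is_valid_sentence_graph(edges):
    """True if the undirected edges form a tree with maximum vertex degree 3."""
    neighbours = defaultdict(set)
    for a, b in edges:
        if a == b:
            return False
        neighbours[a].add(b)
        neighbours[b].add(a)
    if not neighbours:
        return False
    if any(len(n) > 3 for n in neighbours.values()):
        return False
    # A connected graph on V vertices is a tree exactly when it has V - 1 edges.
    unique_edges = {frozenset(e) for e in edges}
    if len(unique_edges) != len(neighbours) - 1:
        return False
    # Depth-first search to confirm everything is reachable from one node.
    start = next(iter(neighbours))
    seen, stack = {start}, [start]
    while stack:
        for nxt in neighbours[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return len(seen) == len(neighbours)

print(is_valid_sentence_graph([("a", "b"), ("b", "c"), ("b", "d")]))
```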

I am hardly the first person to have thought up the idea of 2D written language, but a common issue plaguing such conlang projects (including their most notable example, UNLWS) is figuring out how to lay them out in actual two dimensions; general graphs can need three dimensions to draw without crossings, and squishing them onto a plane often requires crossing lines or making long detours, or both. Even when you can avoid crossings, figuring out the optimal way to lay out a graph on the page is a very hard computational problem. Trees, however, have the very nice property that they are always planar, and trivial to draw on a 2D surface; if we allow cycles, or diamonds (same thing with undirected edges), it becomes much more difficult to specify grammatical rules that will naturally enforce planarity--which is why I've yet to see a 2D language project that even tries. Not only is it easy to planarize trees, there are even multiple ways of doing so automatically, so one could aspire to writing software that would nicely lay out graphical sentences given, say, parenthesized typed input. (Another benefit of trees is that they can be fully specified by matched-parentheses expressions, so we could actually hope to be able to write this on a keyboard!) And then we can imagine imposing additional grammatical rules and pragmatic implications for different standard layout choices--what does it mean if one node is arbitrarily specified as the root, and you do lay it out as a traditional tree? What if you instead highlight a root node by centering it and laying out the rest of the sentence around it? What if you center a degree-two node and split the rest of the sentence into two halves splayed out on either side?
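To show what "parenthesized typed input" could look like, here is a small parser sketch. The input notation is entirely hypothetical (I haven't settled on one): `(label daughter daughter)` for an internal node, a bare word for a leaf, and at most two daughters per node to match the binary-tree restriction.

```python
def parse_tree(text):
    """Parse e.g. "(see (dog big) cat)" into nested (label, [children]) tuples."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def node():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of input")
        token = tokens[pos]
        pos += 1
        if token == ")":
            raise ValueError("unexpected ')'")
        if token != "(":
            return (token, [])          # a bare word is a leaf
        if pos >= len(tokens) or tokens[pos] in ("(", ")"):
            raise ValueError("expected a label after '('")
        label = tokens[pos]
        pos += 1
        children = []
        while pos < len(tokens) and tokens[pos] != ")":
            children.append(node())
        if pos >= len(tokens):
            raise ValueError("missing ')'")
        pos += 1                        # consume ")"
        if len(children) > 2:
            raise ValueError(f"{label!r} has more than two daughters")
        return (label, children)

    tree = node()
    if pos != len(tokens):
        raise ValueError("trailing input")
    return tree

print(parse_tree("(see (dog big) cat)"))
```

Note that content words can sit on internal nodes here ("see" and "dog" both have daughters), which is the point of making the internal structure visible.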

The downside of trees is that semantic structure is not limited to trees; knowledge graphs are arbitrary non-planar graphs. But, linear natural languages already deal with that successfully; expanding our linguistic medium from a line to a tree should still reduce the kinds of ambiguities that natural languages handle all the time. So, this sort of 2D language will require the equivalent of pronouns for cross-references; but they probably won't look much like spoken pronouns, and there's a lot more potential freedom in where you decide to make cuts in the semantic graph to turn it into a tree, and thus where pronouns get introduced to encode those missing edges, and those choices can probably be filled with pragmatic meaning on top of the implications of visual layout decisions.
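One way to picture "making cuts" is as picking a spanning tree of the semantic graph: whatever edges are left over are the ones a pronoun-like cross-reference has to carry. The sketch below uses a plain breadth-first search from a chosen root, which is just one arbitrary way of choosing the cuts; it does not try to respect the degree-3 limit, and the toy graph is made up.

```python
from collections import deque

def cut_into_tree(graph, root):
    """Split a connected semantic graph into tree edges and leftover edges.

    graph: {node: set of neighbouring nodes}, undirected.
    The leftover edges are the ones a pronoun-like cross-reference must encode.
    """
    parent = {root: None}
    queue = deque([root])
    tree_edges = set()
    while queue:
        node = queue.popleft()
        for neighbour in sorted(graph[node]):
            if neighbour not in parent:
                parent[neighbour] = node
                tree_edges.add(frozenset((node, neighbour)))
                queue.append(neighbour)
    all_edges = {frozenset((a, b)) for a in graph for b in graph[a]}
    return tree_edges, all_edges - tree_edges

# "The dog chased its tail": the dog is both the chaser and the tail's owner.
graph = {
    "chase": {"dog", "tail"},
    "dog": {"chase", "tail"},
    "tail": {"chase", "dog"},
}
for root in ("chase", "dog"):
    tree, leftover = cut_into_tree(graph, root)
    print(root, "->", sorted(tuple(sorted(e)) for e in leftover))
```

Rooting at "chase" leaves the dog-tail link to be expressed by a cross-reference (like "its"), while rooting at "dog" cuts the chase-tail link instead, so the choice of where to cut really is free.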

Now, what should words--the nodes in these trees--look like? It seems to be common in 2D languages for glyphs to be essentially arbitrary logographs, perhaps with standard boundary shapes or connection point shapes for different word classes. The philosophy behind UNLWS, that it should take maximal advantage of the native possibilities of the written visual medium, even encourages using iconic pictorial expressions when feasible. But that's not how natural languages work; even visual languages (i.e., sign languages), despite having more iconicity on average than oral languages, have a phonological system consisting of a finite number of basic combinatorial units that are used to build meaningful words, analogous to the finite number of phonemes that oral languages have to string together into arbitrary words. Since we've already got a certain limited set of graphical "phonological" items necessary for drawing syntax trees, and constraint breeds creativity, why not just re-use those?

Here we have an idealized representation of the available phonemes / graphemes / glyphemes: a vertex with one adjoining edge, a vertex with 2 adjoining edges, and a vertex with 3 adjoining edges. On the left, the three -emic forms. On the right, the basic allographic variants. In all cases, absolute orientation and chirality don't matter--if you mirror the "y" glyph, it is still the same glyph. Note that "graph" and "grapheme" are standard terms in linguistics for the written equivalents of "phones" and "phonemes", but that's gonna get really confusing when we're also talking about "graphs" in the mathematical sense. "Glyph" also has a technical meaning, but I am going to repurpose it here to talk about the basic units of this 2D language. So, we have glyphs, glyphemes, and alloglyphs, which are composed into graphs to form lexemes and phrases. Having only 3 glyphemes to work with may seem extremely limiting, but the expanded combinatorial possibilities in 2D vs. 1D make up for it.

While keeping syntax restricted to tree structures is the core idea of this language experiment, lexical items, which don't need to be invented and laid out on the fly, can be more general; we could allow them to be any planar graph. And just as syntax trees can be laid out in many different ways, we could say that lexical items are solely defined by their abstract graphs, which can also be laid out in many ways. But, it turns out that recognizing the topological equivalence of two graphs laid out in different ways is a computationally hard problem! If this language is to be usable by humans, that simply will not do. Thus, the layout for lexical items should be significant, up to rotation and reflection equivalence, so that their visual representations are easily recognizable. This doesn't require introducing any additional phonemic elements--the arrangement of phonemes and letters in one-dimensional natural language words also affects meaning, but we don't consider it "phonemic". Despite the Monty Python sketch about the guy who speaks in anagrams, spoken words are not just bags of sounds in arbitrary order, and written words are not just bags of letters--that's why, for example, "bat" and "tab" mean different things, and "bta" just isn't an English word at all. The spatial arrangement--which, in the case of natural language, works out to just linear order--matters a lot, and that sketch only works because it's precisely constructed to use close-enough anagrams with a lot of supporting context. So, what sort of glyphotactic rules should we have to determine the valid and recognizable arrangements of glyphs in 2D space?

With 3 edges per vertex, the most natural-seeming arrangement is to spread them out at 120 degree angles, and degree-2 vertices would sit nicely in a pattern with 180-degree angles (although we probably want to minimize those, since vertices are more noticeable if they are highlighted by a corresponding angle, rather than a straight line through them). That suggests a triangular grid, which can accommodate both arrangements. The idealized glyphemes and alloglyphs shown above are drawn assuming placement on such a triangular grid, with 60, 120, and 180-degree angles. (I will continue to refer to the features of glyphs in terms of 60, 120, and 180-degree angles, but these, too, are idealizations; in practice, non-equilateral grids might be used for artistic or typographic purposes--e.g., as an equivalent to italics--in which case these angle measurements should be interpreted as representing 1, 2, or 3 angular steps around a point in the grid.) So, words shouldn't be completely arbitrary planar graphs--they should be planar graphs with a particular layout on a triangular grid.
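The "angular steps" idea is easy to make concrete if grid points are stored in axial (q, r) coordinates, which is an assumption on my part (a convention borrowed from hex-grid programming), as are the names below. Each point has six neighbours, one step of 60 idealized degrees apart, so angles are just differences of direction indices.

```python
# Six unit directions on a triangular grid in axial (q, r) coordinates,
# listed in order of increasing angle, 60 degrees (one angular step) apart.
DIRECTIONS = [(1, 0), (0, 1), (-1, 1), (-1, 0), (0, -1), (1, -1)]

def angular_steps(vertex, neighbours):
    """Gaps, in 60-degree steps, between consecutive edges around a vertex.

    Raises ValueError if a neighbour is not exactly one grid unit away.
    """
    q, r = vertex
    indices = sorted(DIRECTIONS.index((nq - q, nr - r)) for nq, nr in neighbours)
    return [(indices[(i + 1) % len(indices)] - indices[i]) % 6 or 6
            for i in range(len(indices))]

print(angular_steps((0, 0), [(1, 0), (-1, 1), (0, -1)]))  # [2, 2, 2]
print(angular_steps((0, 0), [(1, 0), (0, 1)]))            # [1, 5]
print(angular_steps((2, 3), [(1, 3), (3, 3)]))            # [3, 3]
```

A relaxed degree-3 vertex shows up as three gaps of 2 steps, a squished 60-degree angle as a gap of 1, and a straight-through degree-2 vertex as two gaps of 3.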

It does not make sense to extend a single grid across an entire phrase or sentence; the boundaries of trees grow exponentially, so you'd need a hyperbolic grid to do it in the general case, and hyperbolic paper is hard to come by (although laying out a sentence on a single common grid within, say, a Poincare-disc model of the hyperbolic plane might be a neat artistic exercise). Maintaining a grid within a word is sufficient to maintain graphical recognizability, and breaking the grid is one signal of the boundary between lexicon and morphology on one side and syntax on the other.
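Once a word lives on a grid, "the same word up to rotation and reflection" becomes cheap to check, unlike general graph equivalence. The sketch below assumes edges are pairs of axial (q, r) grid points (my convention, not anything fixed by the language), applies the 12 symmetries of the triangular grid, and keeps the smallest translated result as a canonical form.

```python
def rotate60(point):
    """Rotate a grid point by one 60-degree step about the origin."""
    q, r = point
    return (-r, q + r)

def reflect(point):
    """Mirror a grid point across a fixed axis of the grid."""
    q, r = point
    return (r, q)

def normalise(edges):
    """Translate so the smallest vertex sits at the origin; sort everything."""
    vertices = [v for edge in edges for v in edge]
    q0, r0 = min(vertices)
    shifted = {tuple(sorted(((aq - q0, ar - r0), (bq - q0, br - r0))))
               for (aq, ar), (bq, br) in edges}
    return tuple(sorted(shifted))

def canonical_form(edges):
    """Smallest normalised edge list over the 12 grid symmetries."""
    candidates = []
    current = list(edges)
    for _ in range(6):
        current = [(rotate60(a), rotate60(b)) for a, b in current]
        candidates.append(normalise(current))
        mirrored = [(reflect(a), reflect(b)) for a, b in current]
        candidates.append(normalise(mirrored))
    return min(candidates)

# A zigzag and its mirror image count as the same word skeleton.
zig = [((0, 0), (1, 0)), ((1, 0), (1, 1)), ((1, 1), (2, 1))]
zag = [((0, 0), (1, 0)), ((1, 0), (2, -1)), ((2, -1), (3, -1))]
print(canonical_form(zig) == canonical_form(zag))
```

Two layouts are then the same skeleton exactly when their canonical forms are equal, which is a simple comparison rather than a search.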

Making an analogy to chemistry, I feel, as an aesthetic preference, that word-graphs should have a minimal amount of "strain". That is, glyphotactically valid layouts should use 120-degree angles wherever possible, and squish them to 60 degrees or spread them to 180 degrees only where necessary. So, where is it necessary?

60-degree angles should only occur on 3-vertex triangles, the acute points of 4-vertex diamonds, or as paired 60-degree angles on the interior of a hexagon.
180-degree angles should only occur adjacent to 60-degree angles, or crossing vertices at the centers of hexagons.

Additional restrictions:

All edges should be exactly one grid unit long--i.e., there are no words distinguished by having a straight line across multiple edges, vs. two edges with a 180-degree angle at a vertex in the middle.
Syntactic connections must occur on the outer boundary. I.e., you can't have a word embedded inside another word.
All vertices must have a maximum of three adjacent edges; thus, any word must have at least one exterior vertex with degree 2 or 1, to allow a syntactic edge to attach to it.
As they are nodes in a binary syntax tree, words can have at most 3 external syntactic connection points.
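Some of these restrictions are mechanical enough to check by program. The following is a minimal Python sketch, again assuming my own illustrative representation (axial grid coordinates, a word skeleton as a list of edges) and hypothetical function names. It only covers the unit-length and degree rules and reports angles in 60-degree steps. It does not work out which vertices are on the outer boundary, and it does not enforce the triangle, diamond, and hexagon conditions on strained angles.

```python
from collections import defaultdict

# The six unit steps on the triangular grid, in counterclockwise order, 60 degrees apart.
DIRECTIONS = [(1, 0), (0, 1), (-1, 1), (-1, 0), (0, -1), (1, -1)]

def check_skeleton(edges):
    """Return a list of rule violations found in a word skeleton (empty if none)."""
    problems = []
    neighbors = defaultdict(set)
    for a, b in edges:
        step = (b[0] - a[0], b[1] - a[1])
        if step not in DIRECTIONS:
            problems.append(f"edge {a}-{b} is not one grid unit long")
            continue
        neighbors[a].add(b)
        neighbors[b].add(a)
    for vertex, adjacent in neighbors.items():
        if len(adjacent) > 3:
            problems.append(f"vertex {vertex} has degree {len(adjacent)}, more than 3")
    if neighbors and all(len(adjacent) == 3 for adjacent in neighbors.values()):
        problems.append("no vertex of degree 1 or 2 is free for a syntactic edge")
    return problems

def angle_steps(edges, vertex):
    """Gaps between neighboring edges around `vertex`, counted in 60-degree steps.

    Assumes every edge touching `vertex` is one grid unit long.
    """
    headings = []
    for a, b in edges:
        if vertex in (a, b):
            other = b if a == vertex else a
            headings.append(DIRECTIONS.index((other[0] - vertex[0], other[1] - vertex[1])))
    headings.sort()
    return [
        (headings[(i + 1) % len(headings)] - h - 1) % 6 + 1
        for i, h in enumerate(headings)
    ]

# A 3-vertex triangle: every corner is a 60-degree angle (1 step).
triangle = [((0, 0), (1, 0)), ((1, 0), (0, 1)), ((0, 1), (0, 0))]
```

For the triangle, `check_skeleton(triangle)` finds nothing to complain about, and `angle_steps(triangle, (0, 0))` gives one step on the inside and five on the outside.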

With those restrictions in place, here are all of the possible word skeletons of 2, 3, or 4 vertices:

I refer to these as "word skeletons" rather than full words because they abstract away the specification of syntactic binding points--and the choice of binding points may distinguish words (although they should probably be semantically-related words if I'm not being perverse!) Including all of the possible binding point patterns for every skeleton massively increases the number of possibilities, and it quickly gets impractically tedious to enumerate them all and write them down. Here are all of the word skeletons with 5 vertices:

And here are all of the word skeletons with 6 vertices:

And the number of possible 7-vertex words is.... big. Counting graphs turns out to also be a hard problem, so I can't tell you exactly how fast the number of possible words grows, but it grows fast.

Now, I just need to start actually assigning meanings to some of these....

Wednesday, December 27, 2023

What If Marvel Audiences Had to Read Subtitles for Mohawk Dialog?

Episode 6 of season 2 of Marvel's What If... ("What if... Kahhori Reshaped the World?") features Mohawk people and Spanish conquistadors each speaking their own languages on screen, and, excepting a few seconds at a time of English narration, Marvel & Disney+ have trusted audiences to actually read subtitles for nearly all of a 30-minute episode. Good for you, Marvel!

There's a neat trick going on with the subtitling to distinguish the two languages, providing some extra context for people who might not have the ear to easily recognize that the Native Americans and Spaniards are indeed speaking different not-English languages: Mohawk is subtitled in white text, while Spanish is subtitled in yellow text. Not much to analyze there--it's just neat.

However... now I get to rant about subtitles a little bit.

The white and yellow subtitles provided in the "default" presentation of the episode for Anglophone audiences are implemented as "open captions"--text that is "burned in" to the video image, and cannot be dynamically changed. If you switch the language to, say, Spanish, the English subtitles for Spanish dialog don't go away; if you switch to French, the short sections of English dialog are translated to French, but that's the only difference. You have to turn on French closed-caption subtitles separately, and they will display over the burned-in English.

I can only assume that this was done because Disney's streaming platform doesn't support any sort of formatting in closed captions. And sadly, I can't get too mad at Disney in particular for this, because nobody else does any better--Amazon Prime Video has terrible captions, Netflix has terrible captions, Paramount+ has terrible captions, YouTube has terrible captions. And there is no good excuse for any of this. The DVD captioning standard allowed for everything this episode does and far more back in 1996! And yet, nobody really made full use of the possibilities aside from Night Watch, with Lord of the Rings coming in second place. As Pete Bleakley has reminded me (Thanks, Pete!), digital broadcast television, via the CEA-708 closed captioning standard, has had multicolor, positionable closed-captions since the late 1990's, with wide accessibility starting in 2009. Web video, of course, lagged significantly behind, but for well over a decade now even web browsers have had the built-in capacity to do, as closed-captions, everything that this What if... episode does, and far more.
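For the curious, here is roughly what per-language caption coloring could look like on the web side, sketched as a small Python script that writes a WebVTT caption track. WebVTT lets a file carry a STYLE block and tag each cue with a voice, so each language can get its own color. The function names, the voice labels, and the cue text below are placeholders I made up, not anything from the actual episode, and how well a given player honors STYLE blocks is something I'm assuming rather than promising.

```python
def format_timestamp(seconds):
    """Format a time in seconds as a WebVTT timestamp (hh:mm:ss.ttt)."""
    hours, rem = divmod(round(seconds * 1000), 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, millis = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{millis:03d}"

def build_webvtt(cues, colors):
    """Build WebVTT text with one caption color per voice label.

    cues:   list of (start_seconds, end_seconds, voice, text) tuples
    colors: dict mapping a voice label to a CSS color name
    """
    lines = ["WEBVTT", "", "STYLE"]
    for voice, color in colors.items():
        # Style every cue span tagged with this voice label.
        lines.append(f'::cue(v[voice="{voice}"]) {{ color: {color}; }}')
    lines.append("")
    for start, end, voice, text in cues:
        lines.append(f"{format_timestamp(start)} --> {format_timestamp(end)}")
        lines.append(f"<v {voice}>{text}</v>")
        lines.append("")
    return "\n".join(lines)

# Placeholder cues: white for one language, yellow for the other.
example = build_webvtt(
    [
        (1.0, 3.5, "Mohawk", "(translated Mohawk line goes here)"),
        (4.0, 6.0, "Spanish", "(translated Spanish line goes here)"),
    ],
    {"Mohawk": "white", "Spanish": "yellow"},
)
print(example)
```

Because the colors live in the caption track rather than in the picture, switching the caption language would swap the text and keep the color coding, with nothing burned in.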

Come on, streaming companies. If you're going to do captioning at all, please, do captioning right. It's not that hard!

If you liked this post, please consider making a small donation!

The Linguistically Interesting Media Index

Monday, November 13, 2023

The Year of Sanderson

Brandon Sanderson has never put a conlang in a book. But he is aware of them, and has done stuff with fictional languages and naming practices. Brandon Sanderson also speaks Korean; not only is he bilingual, but his second language is not just another European language. It's something very different from English which I might expect to have provided him with a greater degree of metalinguistic awareness than the average author, and raises my expectations for linguistic sophistication in his books.

In my review of Larry Niven's Grammar Lesson, I wrote

There are all sorts of other ways that this kind of grammatical quirk could be integrated into a sci-fi story that have nothing to do with exemplifying or manipulating the speakers' psychology. Brandon Sanderson actually gives a good example of this in the Mistborn trilogy... which is something I shall have to discuss after I get my hands on Secret Project Four and can do a Big Unified Sanderson Linguistics Post.

This is that post. Now, I have not read everything that Brandon has ever written, and I have forgotten some of what I have read, so this will not be completely comprehensive, but we can start with that example from the Mistborn trilogy. (<- Amazon Affiliate link.) A large portion of the plot in the later stages of the story revolves around the interpretation of an ancient prophecy, which is complicated both by magical interference that alters the records, and by actual linguistic drift. Whatever language they speak on the planet Scadrial (which realistically has just one standard language amongst its human inhabitants, given the global level of control exercised by the immortal Lord Ruler) in the Mistborn era, it evidently has an English-like system of strictly gendered animate pronouns, whereas the ancient language of the prophecy has an epicene (gender neutral) third person--a feature which Brandon may have been aware of from Korean! This complicates the process of translation, as any given translator must make a choice about how to render this pronoun in the modern language, which biases the interpretations of the modern characters in plot-significant ways. Good job, Brandon! The names are just.. eh, they're fine. But on the bright side, there is so little in the way of native names and non-English cultural terms that the field is wide open for any conlanger who might be hired to create a proper language--there's very little restriction imposed by the existing linguistic canon!

The bulk of this review, as you can tell from the title, will focus on the four books from the Year of Sanderson: Tress of the Emerald Sea, The Frugal Wizard's Handbook, Yumi & the Nightmare Painter, and The Sunlit Man. (<- All Amazon Affiliate links.)

It turns out that The Sunlit Man has the most linguistic content to comment upon, so I'll be going through the books in reverse publication order. There is still little enough that I can do a nearly-complete listing of the interesting bits.

Starting on page 2 of the Dragonsteel Premium Hardcover Edition, we get this:

The man shook him, barking at Nomad in a language he didn’t understand.
“Trans . . . translation?” Nomad croaked.
Sorry, a deep, monotone voice said in his head. We don’t have enough Investiture for that.

which is packed with information: there is translation magic, it's not working right now so we'll have to actually deal with the consequences of a language barrier, but we should expect it to start working eventually because explicitly mentioning it here makes it a massive Chekhov's gun, so we won't be getting a language-learning montage.

Page 21 has two bits of secondary language representation, with usages of diegetic translation and contextual irrelevance:

Another of the officers nodded, staring at Nomad. “Sess Nassith Tor,” he whispered.
Curious, the knight says. I almost understood that. It’s very similar to another language I’m still faintly Connected to.
“Any idea which one?” Nomad growled.
No. But . . . I think . . . Sess Nassith Tor . . . It means something like . . . One Who Escaped the Sun.
...

Glowing Eyes gestured to Nomad. “Kor Sess Nassith Tor,” he said with a sneer, then kicked Nomad again for good measure.
A few officers scrambled forward and grabbed him under the arms to drag him off.

For all I know, this connects with stuff in the Stormlight Archive, which I haven't read yet because I'm waiting for the series to be complete, but since I know from his own public statements that Brandon has not created any full conlangs, I kinda suspect this is ad-hoc--but it works because there is little enough there that the possibilities for how to analyze it and justify the translation are practically unrestricted, and it's impossible to prove any inconsistency. But, we also know that whatever this language is, it is definitely not just a relex of English, because Brandon had enough awareness to not allow for a word-for-word matchup! (I'd guess that "nassith" is some kind of participle, but like I said, interpretations are pretty much unrestricted with this little data.) In the second instance, we could try to make some guesses about what "kor" means based on the surrounding contextual actions, but ultimately it just doesn't actually matter, except that the glowy-eyed guy is emphasizing something, which we get from the italics.

Page 28 gives us a Failure To Communicate and a reminder of why translation magic isn't working, and that Nomad needs to be working on fixing that--i.e., reiterating that we ain't gonna see Nomad doing any monolingual fieldwork. After that, we get all the way to page 64 before we get some more metalinguistic description:

He said this in Alethi on purpose, which wasn’t his native tongue. Previous experiences had taught him not to speak in his own language, lest it slip out in the local dialect. That was how Connection worked; what Auxiliary was doing would make his soul think he’d been raised on this planet, so its language came as naturally to him as his own once had.

So, we get a name for a language that Nomad actually knows, we know that it isn't his native language (so maybe that'll come up later?), and we get some more details of how the translation magic actually works, which turns out to be probably the most sensible way to do it!

Pages 71 and 79 tell us about the linguistic environment on this particular planet:

“Is this the stranger? What is his name?”
“I was not graced with such information,” Rebeke said. “He doesn’t seem able to understand the words I speak. As if . . . he doesn’t know language.”
Zeal made a few motions with his hands, gesturing at his ears, then tapping his palms together. He thought maybe Nomad was deaf? A reasonable guess, Nomad supposed. No one else on this planet had tried that approach.

So, apparently there is only one acoustic language on this planet (which turns out to be quite reasonable under the circumstances, as it was in Mistborn), and people are not generally aware that there can be other languages. However, there is also at least one sign language--so, yay for sign representation, and, wow, that implies quite a lot about this very tiny society that's struggling to survive. How the heck do they maintain a sign-using language community when there probably aren't that many deaf people around? But, moving on to page 79:

“I offer this thought: do you suppose he’s from a far northern corridor? They speak in ways that, on occasion, make a woman need to concentrate to understand.”
“If it pleases you to be disagreed with, Compassion,” Contemplation said, “I don’t think this is a mere accent. No, not at all. Regardless, there are more pressing matters.[...]”

it turns out that at least some people do have an awareness of dialect continua! Which, in contrast to the situation on Scadrial, absolutely should exist in this setting.

On page 133, after getting his translation magic to work, Nomad manages to explain the concept of other languages to a local:

“Why do you do that?” Rebeke asked. “Talk gibberish sometimes?”
“It’s my own language,” he said. “In other places, Rebeke, people speak all kinds of words you wouldn’t recognize.”

And then on page 175, we get an in-character acknowledgment of the underlying language barrier:

“Wait, how tall are these mountains?” Nomad asked.
“Tall,” Zeal said. “At least a thousand feet.”
A thousand feet? Like a single thousand?
At first, he assumed that the Connection had stopped working, and he hadn’t interpreted those words correctly.

Not much to say about that aside from, hey, any representation of someone actually having realistic struggles with a non-native language is a rare thing and it's nice to see it acknowledged.

On page 238, we get a little background on the Alethi language that Nomad knows but is not his mother tongue, and also a word in his actual mother tongue with diegetic translation:

They called themselves the Alethi, but we knew them as the Tagarut. The breakers, it means.

On page 290, we get a fun cultural note:

“You blessed fool,” Hardy said. “We’re all a group of blessed fools.”
Wait, the knight says. Is that fellow using the word “blessed” as . . . as a curse?
“It’s a conservative religious society,” Nomad said in Alethi. “You use the tools you’re given.”

This is a good acknowledgment that the common sources of curse words vary from culture to culture. The way that Quebecois French speakers swear is etymologically quite different from how the overlapping English-speaking Canadian community swears! It's also worth noting here that for the most part, Brandon uses a non-diegetic translation convention with dialog tags clarifying the diegetic language when it is other-than-standard to indicate the variety of fictional languages present in this setting.

Page 342 is a comparative gold mine, where we get some information about Nomad's mother tongue and about the local culture:

“It is the name I deserve. And it sounds a little like my birth name, in my own language.”
“Which is?”
“Sigzil,” he whispered. [...]
“Nomad,” Compassion said. “A wanderer with no place. That name no longer fits you, Sigzel, because you have a place. Here, with us.” She said the name a little oddly, according to their own accents.
...
“We name you Zellion,” Contemplation said. [...]
“It means One Who Finds,” Compassion said. “Though I know not the original language.”
“It’s from Yolen,” he whispered. “Where my master was born.”

So, now we know that, whatever the word for "nomad" is in Nomad / Zellion's mother tongue, it is phonetically close to "Sigzil"; and we know that the local language has at least slightly different phonological rules, such that they can only approximate it as "Sigzel"; and we've got a probable participle or relativized verb from a third language, from a named planet so we can potentially correlate that with information from other books in the Cosmere. I really want to emphasize here that, although Brandon isn't being particularly innovative with interpretive techniques (we've just got straight diegetic translation going on), and there are no actual conlangs backing this up, Brandon is still managing to include references to realistic linguistic features that highlight differences that should exist between different fictional languages, which does a lot to add linguistic depth to the setting even without a fully constructed conlang or even a worked-out naming language.

On page 374, we get a couple more names of languages, including, finally, an identification of (a clear Anglicization of) Sigzil/Nomad/Zellion's mother tongue:

“Rosharan,” the man said in his own tongue. “Can we speak in a civilized language, please? Do you speak Malwish?”
Zellion shook his head, pretending not to understand and hoping they didn’t speak any of his native languages. At least he could honestly claim ignorance of Azish, having been forced to overwrite the ability to speak that with the local language.

And that is confirmed on page 413:

It was more of an Alethi thing actually, not an Azish one.

And there we have it: The complete overview of linguistic representation in The Sunlit Man.

Yumi & the Nightmare Painter has a very different approach to linguistic representation. Our two lead characters, Yumi and Painter (aka Nikaro) speak related languages (spoilers: one being a descendant of the other), and this is referenced to explain why they can understand each other, but there is no practical indication in the interactions between Yumi and Nikaro that there are any noticeable differences in the languages (thanks again to some magical translation shenanigans). There is a mention near the end of the book that people from Nikaro's city cannot understand those from Yumi's when the general populations finally meet, so they are in fact different languages, but for all that it impacts foreground character interactions, they might as well be speaking exactly the same language. Accordingly, there is much less material to catalog and analyze.

On page 3 of the Dragonsteel Premium Hardcover Edition, we get an introduction to the term "hion":

After losing his staring match, the nightmare painter strolled along the street, which was silent save for the hum of the hion lines.

which is thoroughly described by the following several paragraphs. But then on page 10, we get introduced to the term "yoki-hijo", with far more ambiguous translation:

The Chosen. The yoki-hijo. The girl of commanding primal spirits.

Are these all different titles? Or does "yoki-hijo" mean "The Chosen"? Or does it mean "the girl of commanding primal spirits"? This gets resolved by implication on page 13, where we have an example of appositional translation:

Yumi was one of the Chosen, picked at birth, granted the ability to influence the hijo, the spirits.

OK, so "hijo" means spirits, so "yoki-hijo" probably means "the girl of commanding primal spirits". That's a lot to pack into the word "yoki" and the semantics of whatever construction is implied by the juxtaposition. Quite a potential challenge for any conlanger who might try to engineer a proper conlang compatible with the textual evidence. (Spoiler: I'd bet the "hi-" in "hion" and "hijo" are meant to be related.)

On the next page (14), we get explicit translation by the narrator (who happens to be Hoid):

Liyun, her kihomaban—a word that meant something between a guardian and a sponsor. We’ll use the term “warden” for simplicity.

Back on page 12, we get introduced to the word "tobok", with a definition implied by context in the process of getting dressed:

Then the tobok, in two layers of thick colorful cloth, with a wide bell skirt.

And explicit translation for the term "getuk":

Torish clogs—they call them getuk—feel like bricks tied to my feet.

"Kihomaban" and "getuk" appear nowhere else after they are introduced and defined, so they seem to serve the sole purpose of providing scene setting--they tell you something about what the language they come from sounds like, and Hoid providing definitions reminds you that these people Are Not Speaking English. "Tobok" gets reused throughout the novel as a borrowed-into-English cultural term for this specific type of clothing, but never in dialog or thoughts by the actual characters. This word is apparently inspired by "bok", the Korean word for "clothing", which backs up the general Korean-inspired aesthetic of the whole book.

Also on page 14, we get an explicit discussion of historical linguistics and grammar:

Yumi quickly rose. “Is it time, Warden-nimi?” she said, with enormous respect.
Yumi’s and Painter’s languages shared a common root, and in both there was a certain affectation I find hard to express in your tongue. They could conjugate sentences, or add modifiers to words, to indicate praise or derision. Interestingly, no curses or swears existed among them. They would simply change a word to its lowest form instead.

This is obviously, as Brandon has publicly admitted, directly ripped off from Korean and Japanese. But much like "kihomaban" and "getuk", we don't really see this surfaced in the text; instead, dialog is annotated with parenthesized "(lowly)" and "(highly)" where relevant. That's not really something I would've predicted would work, and the fact that Brandon is massively famous and popular already means that I can't really use this book as evidence that it's a good idea. Maybe it's a failed experiment. But, I haven't actually seen any complaints about it in any reviews so far, so maybe that's a positive signal. I probably need to do a survey about this--comment if you have thoughts!

A good bit later, across pages 44 and 45, we get the common noun "kon":

“Six? A bowl normally costs two hundred kon.”
...
He laid a ten-kon coin on the counter,

Which in context is pretty obviously a unit of currency. After that, all the language evidence is in proper names of people and places. For Yumi's time period (and thus Yumi's language), we have:

Personal Names: Chaeyung Dwookim Gyundok Honam Hwanji Liyun Samjae Sunjun Yumi
Places: Torio Gongsha Ihosen
Common Nouns: getuk kihomaban tobok

For Nikaro's time period, we have:

Personal Names: Akane Gaino Guri Hikiri Ikonora Ito Izumakamo Lee Masaka Nikaro Shinja Shishi Sukishi Takanda Tatomi Tesuaka Tojin Usasha Yuinshi
Places: Fuhima Futinoro Jito Kilahito Nagadan Shinzua
Common Nouns: kon hion

That's a decent corpus of words for a conlanger to start working with. The Nikaro-era names are pretty clearly Japanese-inspired, while the Yumi-era names are more Korean-esque, which implies a quite significant level of cultural changes in naming practices and phonological shifts between Yumi's ancient language and Nikaro's modern one.

The Frugal Wizard's Handbook for Surviving Medieval England has some explicit paratextual discussion of linguistic issues, but otherwise not much of note. There are culturally-appropriate names for the simulated time period, which is neat and reflects a commendable research effort, but actually feels a little off given that the native-to-the-world characters speak essentially modern English, not the language in which those names would have been generated. There are a few other period-appropriate terms but for the most part they just get diegetically translated. There are two excerpts from the eponymous Handbook which directly address linguistic issues; on page 67 of the Dragonsteel Premium Hardcover Edition, we get this explanation:

GUARANTEE TWO
The people on Great Britain will speak a language that is intelligible to modern English speakers. We chose our dimensional band specifically for this reason!

In other words, there's a darn good diegetic reason why there is no language barrier in this interuniversal travel situation!

And then much later on page 146:

UNINTELLIGIBLE DIMENSIONS
The population of the British Isles in these dimensions doesn’t speak a language intelligible to any known Earth language speakers. Perfect for linguists or those who want an extra challenge! Visit the speedrun section of our website for current records for full dictionary creation in the various language groups.

Which I like to point out just because acknowledgment of linguists makes me happy. On page 132, we have a situation where a proper language might become relevant, as our protagonist runs into some foreigners who do not speak I-Can't-Believe-It's-Not-English; but then... it turns out that their leader does speak English after all. Oh well.

The most interesting thing about this book is the parallel between the exposition provided by excerpts from the Handbook and the more non-diegetic linguistic and cultural notes in Sara Nović's True Biz. With two examples of intercalated paratext, I've gotta think this is a solid expositional technique for linguistic information that deserves further attention. (And I've really gotta just write something up on paratext in general one of these days--especially the more traditional forms, like glossaries and pronunciation guides.)

Tress of the Emerald Sea has even fewer references to language, but there are a few. Starting on page 10 of the Dragonsteel Premium Hardcover Edition, we get a bit of description that acknowledges the existence of multiple languages and writing systems on Tress's world:

As they ate, she considered showing the two men her new cup. It was made completely of tin, stamped with letters in a language that ran top to bottom instead of left to right.

And much later on page 254, we get the sole mention of the (Anglicized) name of Tress's language, and a reference to the translation magic that we also see used in The Sunlit Man:

“Are you even speaking Klisian?” Tress asked.
“Technically yes, though I’m using Connection to translate my thoughts, which are in a language you’ve never heard of.[...]"

And while I don't want to ascribe a character's statements to the author (I have no idea how much Brandon knows about psycholinguistics or translation theory, so I'll give him the benefit of the doubt), I should point out for the sake of readers with less linguistic training that

Not everyone thinks in language--which will be a big "well, duh" to some of you, and absolutely mindblowing to some others. This particular character apparently does, though.
Thinking in one language and then translating those thoughts into another language to speak is not a good way to think. It's very inefficient, and it's not how high-level speakers of adult-acquired languages work. Whether or not you perceive yourself as thinking "in" a particular language, for communicative purposes you should be aiming to encode your thoughts directly into the target language in a single step, not doing translation in your head. I have to assume that translation magic is being used sub-optimally in this case compared to its presentation in The Sunlit Man, and there's just sufficient power behind it to make the results seem competent and fluent anyway.

On page 94, we get introduced to a deaf character (Fort) using an assistive device (acquired from off-world--Tress's planet has a far lower technological level) which transcribes speech for him and allows him to write his responses. Brandon makes use of bold face to indicate writing on Fort's communication board to distinguish it from acoustic speech in dialog. But the fact that such a device is both needed and useful brings up all sorts of questions about the broader society on Tress's world, which are much more interesting than the mere fact of the typographical convention used to represent it in the story.

We are told that, before acquiring his assistive device, Fort relied on lipreading, despite its limitations (and we are warned about the actual limitations of strict lipreading, so good job dispelling popular misconceptions there, Brandon!), and that this was in his childhood--so he didn't acquire language and literacy, and then lose his hearing as an adult. The Coppermind page for Fort claims that he previously communicated with a mix of sign language and lip reading, but that's not actually supported by the text--the only explicit mention of sign language is on page 448:

And Fort . . . well, he understood. Not because he knew another sign language, but because of that same bond.

And that is narration, not attributed to Fort himself, and doesn't actually indicate that he does know any sign languages. There's an earlier oblique reference on page 293:

Fort didn’t fill the time with idle chitchat, and while you might ascribe this to his deafness, I’ve known more than a few Deaf people who were quite the blabberhands.

But again, that is the narrator talking, and Hoid does not actually say that Fort is capable of using sign language--only that he has met other Deaf people who do.

So, we have a deaf guy on a pre-industrial world who knows how to read and write. His parents cared about him enough to ensure that he was not subject to language deprivation and could learn to lipread for as much as that is worth, and then to become literate. This indicates surprisingly progressive views about deaf people, and we can also infer from other dialog that deaf people aren't particularly rare on this world (because someone once met a deaf dancer as well, who might have actually been a made-up stand-in for a deaf princess--but hey, deaf princess!). It's possible that Fort did grow up with sign language, but simply has to deal with a world full of other people who don't understand it themselves, so the board is useful--but given that no character other than the narrator, Hoid, ever mentions sign, and Hoid does not mention sign when we are told how Fort actually communicates, it seems that there is not enough of a population of deaf people with the ability to find and interact with each other on this world to sustain a viable sign language community. That's a weird contrast with having the social support to learn lipreading, reading, and writing, and that being common enough that one character was able to meet two socially high-functioning deaf people in not-that-many years of traveling the world. Not at all inconsistent, just kinda weird, and an interesting contrast to the situation in The Sunlit Man, where there is an awareness of sign language despite the extremely small world and corresponding extremely small population. Maybe everyone on Tress's world is actually a horrible audist and abused Fort into learning to interface with a language he could not perceive in its intended medium, but I kinda like the idea that everyone on Tress's world is just super supportive of deaf people while being completely ignorant of the concept of sign language.


Saturday, October 21, 2023

How is Castlevania like Luca?

Remember when I reviewed Disney's Luca? It's an excellent example of how to screw up linguistic worldbuilding.

Well, I recently watched Castlevania: Nocturne on Netflix, and while it follows up on the first Castlevania series in not really trying to do anything particularly notable with language, there is one brief scene in episode 5 that completely breaks the setting for me.

Nocturne is set primarily in France, in the midst of the French revolution, with plenty of native French characters. We can thus assume that everybody is supposed to be speaking French, as there are no language barriers presented, and it would be ridiculous for all of the French people to be speaking anything else. This includes the main character Richter Belmont (a descendant of the Belmont vampire hunters we were introduced to in the earlier Castlevania series), who grew up in America and thus can be presumed to be a native speaker of English, but who, like everyone else, displays no difficulty in communicating with all the French people he now lives with.

Now, with all that background explained: at 10:45 in Episode 5, Richter Belmont says to a group of girls arranging costumes for the personifications of the revolutionary principles of "Liberty, Equality, and Fraternity", that, quote:

You have to be a man to be Fraternity. It means "brotherhood".

To which the reply is

"Sisternity", then.

Now, in English, that should be "sorority", if we want to parallel the derivation of "fraternity". But, I checked with some actual French people just to be sure, and my suspicions were confirmed: this is a conversation that just does not make sense in French, especially not in the historical sociological context in which it occurs. And it seems fairly obvious that the dialog was not originally written with French in mind; there are French audio and subtitle tracks available for the series now, but when I first watched it, the only options were English and Japanese. You can't even say "Fraternity means 'brotherhood'" in French, partially because a straightforward word-for-word translation comes out as the tautological "La fraternité signifie la fraternité." ("Brotherhood means brotherhood."), but also because... that's not actually what "fraternité" means. In fact, if you ask Google Translate how to say "sisterhood" in French, guess what it tells you? Fraternité! Not, incidentally, "sororité", which, while it is a valid French word, is a pretty darn rare one. "Fraternité" would be understood by everyone involved as a gender-inclusive term--besides which, "fraternité" is a grammatically feminine word in French anyway and the strictly female-coded Marianne is the artistic personification of the French Republic and all three virtues of Liberty, Equality, and Fraternity!

I asked several French speakers and actual French people how they would express this conversation in French if they absolutely had to, and none of them would do it, because it just seemed stupid. Since we do now have access to a French dub and subtitles, however, let's see how they officially translated that scene:

- Seul un homme peut incarner la fraternité. Ça vient du mot "frère".
- La Petite-sœur-nité, alors.

"Only a man can incarnate fraternity. It comes from the word 'brother'."
"The little Sister-nity, then."

(French people, is that actually any better than the original English? Update: Non.)

As far as I can tell with my limited French competence, that's about the best you can do to render the intent of the original English in a way that doesn't come across as nonsensical in French, but it still turns out to be factually wrong. Women have portrayed Fraternity, and "Fraternité" does not come from the word "frère"--it comes from the Latin "frāter", which is also the origin of the word "frère", and does mean "brother"... but it also means "sibling", and the two words "fraternité" and "frère" developed independently from their common root (never forget, Etymology Isn't Destiny!)

There is really only one way to save this scene--assume that Richter is, in fact, not merely a non-native French speaker, but also an idiot, who is actually just wrong, and the women he is talking to are just humoring him because they don't feel like getting into an argument with an idiot. In that case, the use of the neologism "La Petite-sœur-nité" kinda makes sense, as it subtly highlights through non-parallelism with the Latinate "frater-" that "fraternity" does not in fact derive from "frère" just as the rarer-but-valid "sororité" does not derive from "sœur" (sister).

But if we are generous and assume that that was in fact the intended interpretation... the writers did not do the work to properly set that up or make it obvious to the audience. Don't be like them. More precisely: don't try to do clever things that exploit etymologies and translation-equivalents of English words to imply things about some other language and culture. Even if you are using a pure-English translation convention, you've got to look into the culture that you are portraying-in-translation enough to know how their language and values will influence how they would talk about things. Sometimes you can get away with, say, using English puns and just assuming that the implied translator did a really good job of replacing an equivalent pun in the diegetic language, but when the entire content of the conversation is just dumb in the cultural context in which you have set your story... that doesn't fly.

Update: Someone asked how this is handled in Spanish, and it turns out... it's even worse:

- Debes ser hombre para ser Fraternidad. Significa hermandad.
- ¿Y eso qué tiene que ver?

See, "hermandad" is explicitly gender-neutral in Spanish, where the words for "brother" and "sister" share the same stem "herman-". Which makes Richter's statement so completely bizarre that the "sisternity" comment would itself be nonsensical--and so it is replaced with "What does that have to do with anything?" Which is honestly kind of an improvement, as it makes it seem like the characters are acknowledging that Richter is being an idiot, rather than us audience members having to infer it.

Thursday, September 28, 2023

Babel: Or, the Necessity of Violence

Babel, by R. F. Kuang, is a 2022 Alternate-History low-fantasy novel about translators who perform enchantments for the glory of the British Empire. The magic is fictional, but the translation theory is real: the Oxford translation class lectures are a legit callback to grad school. Why are translators performing magic? Because true translation is fundamentally impossible, and magic arises from the sometimes-subtle, sometimes-vast differences in meanings between attempted translations from one language to another.

Naturally, there is quite a lot of non-English representation in such a novel. Our main character, Robin, is a native speaker of Cantonese, so the first example we get is a string of orthographic Chinese characters, which I cannot type easily to reproduce for you here--but, we immediately get diegetic transcription and translation:

'Húlún tūn zǎo,' he read slowly, taking care to enunciate every syllable. He switched to English. 'To accept without thinking.'

Note the conventional use of italics for non-English text. Here we get three parallel representations of the same bit of language, allowing the reader to understand what it actually looks like written, approximately how it sounds via romanization, and approximately what it means through Robin's translation of what he just read.

Robin is quickly introduced to the non-magical responsibilities of translation and interpretation:

This all hinged on him, Robin realized. The choice was his. Only he could determine the truth, because only he could communicate it to all parties.

The book is chock full of this kind of stuff--not just directly representing other languages, but explicitly teaching the reader about real concepts in linguistics and translation theory through the mode of having the characters learn and discuss them. Skipping ahead a bit, here is a taste of one of the theory lectures:

'The first lesson any good translator internalizes is that there is no one-to-one correlation between words or even concepts from one language to another. [...] If [there was], then translation would not be a highly skilled profession - we would simply sit a class full of dewy-eyed freshers down with dictionaries and have the completed works of the Buddha on our shelves in no time. Instead, we have to learn to dance between that age-old dichotomy, helpfully elucidated by Cicero and Hieronymus: verbum e verbo and sensum e sensu. Can anyone--'
'Word for word,' Letty said promptly. 'And sense for sense.'

And a bit of philosophizing later on reminded me rather strongly of the aliens from The Embedding:

We will never speak the divine language. But by amassing all the world's languages under this roof, by collecting the full range of human expressions, or as near to it as we can get, we can try.

And in fact, this is not a bad description of the project of natural language documentation and typology.

The next instance of non-English representation makes use of footnotes to provide a non-diegetic translation for what the character already understands:

Auferre trucidare rapere falsis nominibus imperium atque ubi solitudinem faciunt pacem appellant.
Robin parsed the sentence, consulted his dictionary to check that auferre meant what he thought it did, then wrote out his translation.*
*'Robbery, butchery, and theft - they call these things empire, and where they create a desert, they call it peace.'

Although in this case, the translation does exist in the story, and so could've been included in-line, that is not so for all of the footnotes, some of which exist entirely outside of the story. For example:

for a full year Robin thought The Rape of the Lock was about fornication with an iron bolt instead of the theft of hair.*
* A reasonable error. By rape, Pope meant 'to snatch, to take by force', which is an older meaning derived from the Latin rapere.

I could continue with a detailed analysis of every sample of non-English language, as I did exhaustively for some other books earlier on in this series--but I would end up quoting from about a third of all pages in the book, and we'd be here all day! The range of integrative and interpretive techniques in use is actually pretty well covered by those few examples I have quoted so far. But what's really unique about the book is the extent to which it confronts the reader with concepts that you might not otherwise have to face outside of a graduate-level course in linguistics or translation, and in ways that are actually relevant to the plot. Consider:

What was a word? What was the smallest possible unit of meaning, and why was that different from a word? Was a word different from a character? In what ways was Chinese speech different from Chinese writing?

That matters for understanding the magic system and for understanding the nature of the relationships between characters. This is a masterclass in science fiction with linguistics as the underlying science... except that it's technically fantasy instead of science fiction. There's refreshingly not a single whiff of Whorfianism or UG anywhere--as there shouldn't be as those concepts would not have existed in the historical period in which this story is set!

The book also briefly addresses The Forbidden Experiment--and contributes to foreshadowing the true villainy of one of our antagonists by having him seriously entertain it as a possibility (which is unsurprising, given how he has up till then manipulated the lives of Robin and his friends).

I shall leave off with one more quote on semantic theory:

Does meaning refer to something that supersedes the words we use to describe our world? I think, intuitively, yes. Otherwise we would have no basis for critiquing a translation as accurate or inaccurate, not without some unspeakable sense of what it lacked.

True Biz & A Literature of Sign

So, remember that whole thing about A Literature of Sign, and how the heck are you supposed to put ASL into a book for an English audience when ASL has no standard orthography?

Well, Sara Nović does some stuff. True Biz is a 2022 novel about the administration, students, and families of students of the fictional River Valley School for the Deaf boarding high school. It's straight-up realistic fiction, practically literary, exploring civil rights and what it's like to grow up deaf in a hearing world--really not my usual genre, but dangit, I liked it anyway, and it's certainly linguistically interesting. There is so much linguistically-interesting stuff, in fact, that I gave up and stopped putting in bookmarks after page 87 out of 381 in the hardcover edition--so, I will not be quoting every example of non-English representation in this review, just a representative sample that's indicative of the range of techniques used.

The first notable thing Nović does in this novel is not use quotation marks to set off dialog, even when characters are speaking orally. It's a little jarring at first, but I got used to it fairly quickly. I am not sure what the authorial intent behind this decision was, but for me it had the effect of turning off (or rather, failing to turn on) my internal voice when encountering dialog, thus distancing my experience of the text from the mental audio loop. Which I could totally believe is part of the intent, since it's a book about Deaf people!

One of our viewpoint characters is Charlie, a severely hard-of-hearing girl whose parents opted for a cochlear implant that doesn't really work right, resulting in language deprivation. She begins learning ASL when transferred to River Valley, and her experience is contrasted with that of Austin, a native signer from a multi-generational Deaf family. Charlie doesn't always understand everything that is being said to or around her, in ASL or in English, and Nović represents this with underscores inserted into dialog in place of words that Charlie missed. Where relevant, these misunderstandings are resolved diegetically--so you, the reader, understand exactly as much as Charlie does, and in the same way. For example:

[The headmistress] looked back at Charlie. _____ here at school will be key, she said. As with any language.
The what? said Charlie.
The headmistress removed a notepad from beneath a pile of paperwork.
IMMERSION she wrote.

Immediately before this, we get a nuanced introduction to simcomm (simultaneous communication), although it is not explicitly referenced that way.

To sign and talk at the same time was an imperfect operation, the headmistress warned, and one Charlie wouldn't see much of at River Valley after today. Charlie longed to find meaning in the arc of the woman's hands, but that meant looking away from her lips, something she couldn't afford to do.

ASL conversations are all translated into English in italics, but Nović captures some of the spatial nature of ASL by arranging the dialog in columns according to the speaker, so each speaker's ASL dialog is spatially separated on the page just as their signing spaces would be separated in reality. Even when quoting a single ASL speaker, not in a conversation, their words and dialog tags will be confined to a distinct column separated from the flow of the main text, emphasizing the spatially-confined nature of the ASL utterance. The first example of such a conversation is as follows:

You hungry?
Hi, sweetie. How's school? All set up?
Getting there.
How was the meeting?
Fine, she said.
The girl struggled in mainstream.
No surprise there.
I'm sure you'll fix her right up.
We will. Come eat.

Right at the beginning of the book, I was uncertain whether this was intended to be a book for a Deaf audience, or a book to explain Deafness to a hearing audience. One particular feature shifted me solidly to the "this is for us hearies" side, though--the periodic inclusion between chapters of non-fiction explanatory notes on aspects of ASL and of Deaf culture and history that may be relevant to understanding what's going on in the adjacent chapters. This feels like a form of paratext, but where linguistic paratext usually takes the form of, e.g., name pronunciation guides in the front matter or back matter, or glossaries in an appendix--all presentations which can be easily skipped over if the reader doesn't care about them--this is interleaved with the main text, so it must be engaged with. This seems like an excellent way to present additional information about a minority culture in the real world, but I am uncertain how well it would translate to, for example, explaining a conlang in a fictional world. I was slightly reminded of this by the fictionally-non-fictional excerpts from the eponymous guide in Brandon Sanderson's The Frugal Wizard's Guide to Surviving Medieval England (review forthcoming), so it might be workable.

Finally, Nović occasionally includes schematic illustrations of signs inline in the text. Most pervasively, each chapter is headed with an illustration of the ASL fingerspelling handshape for that chapter's viewpoint character's first initial. In a couple of places, however, where Charlie is learning new signs, dictionary-style schematic illustrations of complex signs are included in parallel with the italicized-English translations. This is not at all space efficient, so it can't be used everywhere, but limited deployment works to help teach the reader a small number of signs and provide an initial mental image to help inform how you interpret subsequent conversations as signalled by the ASL-specific page formatting.
