Tuesday, June 21, 2022

The Phonology of Baseline

Dath ilan is an alternate-history Earth envisioned by Eliezer Yudkowsky, whose history diverges at least a couple thousand years ago from our own, and in which civilization has achieved a much higher degree of global economic coordination. Part of this increased coordination is that everyone on dath ilan speaks, at minimum, an in-universe conlang called "Baseline". Out-of-universe, Baseline does not actually exist--but descriptions of what it is like do, so I have determined to attempt to remedy the situation. In terms of explicit descriptions of Baseline's phonology, this is all we have:
For example, all the phonemes are a minimum distance away from each other that guarantees people with slightly less acute hearing can understand it when spoken under slightly adverse conditions. In-between phonemes that are possible to pronounce, but potentially difficult to hear correctly, are then reserved for constructing 'conlangs', constructed languages, many of which use 'Baseline' as a baseline but add new short words using the expanded phoneme set.

That seems... not to be super well supported by the data? Like, it appears to contain all three of s/θ/f, which are easily confusable in low-fidelity audio environments. (It's actually rather difficult to figure out what the objective perceptual distance between different phones is, independent of biases induced by a test subject's pre-existing knowledge of any specific language; the closest I could find to that kind of research is the planning that went into designing the NATO Phonetic Alphabet--but even that is optimized to avoid confusion by speakers of particular popular languages, which is overconstrained for our purposes here. However, when native speakers of some language--like English--do in fact confuse phonemes of their own language sometimes, that seems like strong evidence that the underlying phones are actually pretty close!)

However, fortunately for us, the character who speaks that paragraph is not specifically trained in linguistics, and may not know exactly what he's talking about--and there are other constraints on the design of Baseline which may conflict with that one, such that the optimal design for Baseline phonology is not one which optimizes distinctness-of-phonemes in isolation. In particular, Baseline speakers seem to have a strong sense of syllables as the most salient components of word structure, and count of syllables as the obvious way to measure utterance length; and, they value having short words and short utterances for concepts that are common in their culture. Thus, we can also expect to have a large phonemic inventory to allow for the maximum number of individual syllables, maximum information per syllable, and maximal number of short words, which is in direct conflict with keeping individual phones as far apart from each other in acoustic space as possible.

By skimming all of the "Planecrash" stories (about dath ilani people who are in a plane crash, and get isekaied to various other fantasy worlds to have culture shock in), I have extracted a total of five actual Baseline words-that-are-not-names:

dath
ilan
tsi-imbi
farsheth
kelthorkarnen

And then a bunch of personal names:

Alis      Athpechya   Bahb
Bahdhi    Bohob       Corun
Elshorm   Elzbeth     Helorm
Illeia    Karal       Keltham
Limyar    Merrin      Miyalsvor
Nemamel   Ranthal     Salthin
Thellim   Verrez

Most names have two syllables; a few (4 in this list) have 3, or maybe 4. "Bahb" is the only one-syllable name, but I don't think that is actually representative of any real name used for a dath ilani person, as it appears in a context where it is clearly meant to be a transcription of the English name "Bob", as part of the set "Alis, Bahb, and Karal", standing in for "Alice, Bob, and Carol", the standard placeholder names for participants in a cryptographic protocol. "Bohob" seems to be an alternative adaptation of "Bob" that fits Baseline naming patterns better. In combination with "Bahdhi", though, the orthographic possibility of "Bahb" suggests the existence of <a> and <ah> as separate vowels. If <h> can only occur in onset positions, there would be minimal ambiguity introduced in the Anglicization by adopting that convention. <Illeia> could be a four-syllable name, but we have a negative example in that <Athpechya> is presented as a dath ilani equivalent for a non-Baseline 4-syllable name, which has been cut down to 3 syllables (assuming <y> is to be interpreted as a consonant). Thus, I am inclined to interpret that intervocalic <i> as a transcriptional variant of <y>, much like <c> is a transcriptional variant of <k>, rather than as a whole extra syllable.

As a cultural note, all dath ilani are mononymic, so there is nothing to be said about the structure of family names / patronymics.

From this data, I conclude that Baseline has a 6 vowel system:

        Front       Back
High    /i/ <i>     /u/ <u>
Mid     /ɛ/ <e>     /o/ <o>
Low     /æ/ <a>     /ɑ/ <ah>

with three degrees of height, a binary front-back distinction, and rounding in the back non-low vowels.

I would like the <e> vowel to be a little higher, to maximize contrast with /æ/, but we've got an explicit negative example where the dath ilani Merrin struggles to pronounce the French name "Félix", which confirms that the Baseline <e> vowel is not /e/. ¯\_(ツ)_/¯

Attested consonants, based on the assumption that names are supposed to be pronounced in the most obvious possible way for an Anglophone reader, are as follows:

p - /p/
b - /b/
d - /d/
k/c - /k/

f - /f/
v - /v/
s - /s/
z - /z/
th - /θ/
dh - /ð/
sh - /ʃ/
h - /h/

ts - /t͡s/
ch - /t͡ʃ/

l - /L/ (for maximal distinctiveness from /j/, I'm assuming this to be universally a dark/velarized l, rather than copying English's light/dark allophony; the presence of this and /v/ justify the lack of /w/)
r - /r/ (for maximal distinctiveness from /l/, I'll assume this to be a tap/trill even though that's not the most natural reading for most Anglophones).
y - /j/

m - /m/
n - /n/

The lack of /g/ is not typologically odd, but the lack of isolated /t/ (assuming that <ts> is, in fact, an affricate, which seems reasonable given the existence of <ch> and the lack of other /Cs/ clusters in onset positions) in the presence of /p/ and /d/ is a bizarre gap. On that basis, and because there seems to be a fairly robust voicing distinction in the fricatives, I infer that there should also be /t/ and /g/ phonemes, even though they happen to be missing from this dataset. Additionally, I feel we ought to fill in unattested */ʒ/, */d͡z/, and */d͡ʒ/, on the basis that, having decided that voicing was usefully distinctive for all other obstruents, the in-world engineers of Baseline wouldn't have just left those specific place/manner combinations unused!

Now, I want to consider the case of <tsi-imbi> a little more closely; it's the only word with a hyphen in it, and the only word with consecutive identical vowels if you ignore the hyphen. In fact, no attested words have consecutive vowels at all! I infer that this is to maximize the ease of syllable segmentation, and that the hyphen should in fact represent an additional marginal glottal stop (/ʔ/) phoneme (such as shows up in the English "uh-oh"), which shows up wherever vowels would otherwise be in hiatus. That also allows us to resolve any possible ambiguity in the usage of <ah> to transcribe the low-back vowel. Something like <bahob> (a minimal change from the attested <Bohob>) would have to be read as /bæ.hob/, while /bɑob/ would be phonetically [bɑ.ʔ.ob], with extra-metrical /ʔ/, and transcribed as <bah-ob>--and /bɑ.hob/ would be <bahhob>.

Now, this raises a potential problem with the transcription of other consonants; while we have examples of single intervocalic <l> and <r>, there are also a few instances of doubled <ll> and <rr>--but no other doubled consonants. And if we aren't allowing doubled vowels, having geminate continuant consonants across syllable boundaries seems like a very weird choice, completely counter to the goal of making syllabic segmentation easy and unambiguous. One could imagine heterosyllabic /l.ʔ.l/ and /r.ʔ.r/ sequences, with epenthetic glottal stops separating syllables just like they do between vowels, but in the absence of written hyphens in the attested names, I am going to assume that the doubled letters are there purely for purposes of Anglophone aesthetics, and that cross-syllable geminates do not actually exist in Baseline.

That leads to the following consonants chart:

             Bilabial/     Dental   Alveolar   Postalveolar/   Velar   Glottal
             Labiodental                       Palatal
Plosive      p b                    t d                        k g     (ʔ)
Nasal        m                      n
Trill                               r
Fricative    f v           θ ð      s z        ʃ ʒ                     h
Affricate                           t͡s d͡z      t͡ʃ d͡ʒ
Approximant                                    j               L

The fricatives are a little bit weird; I probably would have dropped θ/ð and h in exchange for x/ɣ to maximize distinctiveness and get slightly better correspondence between fricative and plosive series. But perhaps the in-world justification is that they just Wanted More Options for making more short words, and the possibility of x/h confusion pushed for pulling in the dental fricatives instead, despite the labial/dental/alveolar confusability. And for the plosives, I think it would make sense if all of the voiceless plosives were also secondarily aspirated--we've only got two plosive series, so we might as well make them as phonetically distinctive as possible!

We can also state the following apparent phonotactic rules:
  • Syllables have the form (C1)V((r)C2)(s|z), where:
      • C1 is any consonant.
      • C2 is any consonant except /h/.
      • The optional /r/ cannot occur before another /r/ in the C2 slot.
      • The optional final sibilant cannot occur after another sibilant in the C2 slot.
      • /s/ cannot occur after voiced stops/fricatives.
      • /z/ cannot occur after voiceless stops/fricatives.
Within a word:
  • A syllable cannot end with the same consonant with which the next syllable starts (nor should t/d precede t͡s/d͡z or t͡ʃ/d͡ʒ, respectively).
  • Vowels cannot occur in hiatus, and l and r cannot occur in hiatus with themselves, with extra-syllabic glottal stops being inserted for repair.
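The syllable-level rules above can be sketched as a small checker. This is only an illustration under stated assumptions: it reads the template as (C1)V((r)C2)(s|z), it counts the affricates and /ʃ ʒ/ as sibilants alongside /s z/, it leaves the marginal /ʔ/ out of the inventory, and the function and argument names (`is_valid_syllable`, `pre_r`, `c2`, `final`) are invented here rather than taken from any source.

```python
# Inventory taken from the consonant chart; plain "ts", "dz", "L" etc. are
# labels chosen for this sketch (assumption), not an official transcription.
VOWELS = {"i", "u", "ɛ", "o", "æ", "ɑ"}
VOICELESS = {"p", "t", "k", "f", "θ"}   # non-sibilant voiceless stops/fricatives
VOICED = {"b", "d", "g", "v", "ð"}      # non-sibilant voiced stops/fricatives
SIBILANTS = {"s", "z", "ʃ", "ʒ", "ts", "dz", "tʃ", "dʒ"}  # assumed sibilant set
SONORANTS = {"m", "n", "r", "j", "L"}
CONSONANTS = VOICELESS | VOICED | SIBILANTS | SONORANTS | {"h"}


def is_valid_syllable(onset, vowel, pre_r=False, c2=None, final=None):
    """Check one syllable against the template (C1)V((r)C2)(s|z)."""
    if onset is not None and onset not in CONSONANTS:
        return False
    if vowel not in VOWELS:
        return False
    # C2 is any consonant except /h/.
    if c2 is not None and (c2 not in CONSONANTS or c2 == "h"):
        return False
    # The optional /r/ needs a following C2, and that C2 cannot be /r/.
    if pre_r and (c2 is None or c2 == "r"):
        return False
    if final is None:
        return True
    if final not in ("s", "z"):
        return False
    # No final sibilant after a sibilant C2.
    if c2 in SIBILANTS:
        return False
    # Voicing agreement with a preceding stop or fricative.
    if final == "s" and c2 in VOICED:
        return False
    if final == "z" and c2 in VOICELESS:
        return False
    return True


# Attested syllables, hand-transcribed: <lorm>, <yals>, <elz>.
print(is_valid_syllable("L", "o", pre_r=True, c2="m"))
print(is_valid_syllable("j", "æ", c2="L", final="s"))
print(is_valid_syllable(None, "ɛ", c2="L", final="z"))
# A voiceless stop followed by /z/ breaks the voicing rule.
print(is_valid_syllable("p", "æ", c2="t", final="z"))
```

The word-level constraints (no identical consonants across a syllable boundary, glottal-stop repair of hiatus) are deliberately left out; they would need a second pass over sequences of syllables.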

Making codas more complex than onsets is just weird, and I cannot justify that in-world at all, but that seems to be where the available data is pointing. Maybe it allows sub-syllable-level suffixing/infixing morphology?

We have no data on tone or stress, so I assume by default that Baseline has some sort of non-lexical, predictable stress system--e.g., strict initial stress. However, based on characters commenting on how many syllables are required to say something in various languages, and treating syllable count as a reliable measure of how long an utterance is / how much effort it takes to express something, I infer that the language is syllable-timed, rather than stress- or mora-timed.

Making another default assumption that the maximum onset principle for syllabification applies, the attested syllables are as follows:

a ath
i il im
el elz
bah bahb
beth bi bo
dath dhi
far
he hob
ka kar kel ko
lan le lim lis lorm
ma mel mer mi
ne nen
pech
ral ran rez rin run
sal
sheth shorm
thal tham thel thin thor
tsi
ver vor
ya yals yar

The possible syllables are a much larger set!
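To put a rough number on "much larger", here is a sketch that enumerates the codas permitted by the phonotactic rules and multiplies through by the onsets and vowels. It rests on several assumptions of mine: the template is read as (C1)V((r)C2)(s|z), the 24-consonant inventory from the chart is used without the marginal /ʔ/, the affricates and /ʃ ʒ/ count as sibilants, and a bare final /s/ or /z/ is treated as identical to C2 = /s/ or /z/. All names in the code are hypothetical.

```python
VOWELS = ["i", "u", "ɛ", "o", "æ", "ɑ"]
VOICELESS = ["p", "t", "k", "f", "θ"]   # non-sibilant voiceless stops/fricatives
VOICED = ["b", "d", "g", "v", "ð"]      # non-sibilant voiced stops/fricatives
SIBILANTS = ["s", "z", "ʃ", "ʒ", "ts", "dz", "tʃ", "dʒ"]  # assumed sibilant set
SONORANTS = ["m", "n", "r", "j", "L"]
CODA_C2 = VOICELESS + VOICED + SIBILANTS + SONORANTS  # every consonant but /h/
ONSETS = [None] + CODA_C2 + ["h"]                    # None = no onset


def codas():
    """Yield every legal coda as a tuple of phonemes (may repeat a surface form)."""
    yield ()  # open syllable
    for c2 in CODA_C2:
        for pre_r in (False, True):
            if pre_r and c2 == "r":
                continue  # no /r/ before an /r/ in the C2 slot
            base = (("r",) if pre_r else ()) + (c2,)
            yield base
            if c2 in SIBILANTS:
                continue  # no final sibilant after a sibilant
            if c2 not in VOICED:
                yield base + ("s",)  # /s/ barred after voiced stops/fricatives
            if c2 not in VOICELESS:
                yield base + ("z",)  # /z/ barred after voiceless stops/fricatives


def count_syllables():
    """Return (distinct coda count, total syllable count)."""
    distinct = set(codas())  # a set collapses e.g. r+s reached two ways
    return len(distinct), len(ONSETS) * len(VOWELS) * len(distinct)


n_codas, n_syllables = count_syllables()
print(n_codas, n_syllables)
```

Under these assumptions the sketch yields 82 distinct codas and 12,300 candidate syllables, so the attested list above covers well under one percent of the space. Changing the sibilant set or the reading of the template will move that number.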

Friday, May 6, 2022

Ord: Spherindricites


The spherindricites are a derivative of the tetrabrachs, brought about by a mutation that caused repeated cell divisions along the vertical axis prior to limb differentiation, resulting in an elongated (spherindrical) segmented body plan with varying numbers of tetrahedral segments, analogous to the segmented worms which gave rise to arthropods on Earth. The development of segmentation was quickly followed by evolution of invaginations in the body surface to increase surface volume; due to the much higher surface-to-bulk ratio of 4D organisms compared to the surface-to-volume ratios of similar 3D organisms, and the small maximum distance from any point on the interior of a tetrabrach to the surface, small tetrabrachs and early spherindricites had no need for any specialized breathing structures, as liquids and gasses could passively diffuse through the creature from the environment. However, surface pockets which would be alternately compressed and expanded by the creature's movement, thus getting the surface closer to some internal volumes and actively pumping fluid past them, allowed spherindricites to grow to much larger sizes.

The least derived spherindricites, which retain minimal differentiation between their segments, primarily occupy benthic and burrowing niches and are an exceptionally diverse group, just like their close Earthling analogs, the annelids, coming in a wide range of sizes and with a variety of reduced or specialized limb structures. However, one free-swimming group of spherindricites developed cephalization--the fusing and specialization of segments at the mouth end of the creature, which had transitioned from the bottom to the forward orientation, creating creatures with distinct heads at their fronts. The forwardmost set of limbs specialized as mouthparts for grabbing and manipulating food; two of the second-segment limbs specialized as olfactory sense organs, while the remaining two developed more advanced eyes from the terminal ocelli, with ocelli disappearing from the remaining limbs.

One group of cephalic spherindricites, the malakichthys ("soft fish"), directly developed a new up-down axial symmetry breaking, with one limb from each body segment specialized as a dorsal stabilizing fin and the remaining three becoming propulsive limbs radially arranged in the sideways plane.

The remaining cephalic spherindricites developed internal mineral storage structures, which would serve as the basis for structural bones. This group further diverged based on three different approaches to developing their own secondary vertical orientation:

  1. Polysphenoids dropped two limbs from each body segment, resulting in alternating left/right and ana/kata-aligned limbs, such that the tips of each limb from any two adjacent segments form the vertices of a disphenoid.
  2. Trilaterians dropped a single limb per segment to allow planar compression, resulting in adjacent body segments forming alternating triangular antiprisms, with each set of limbs arranged in an equilateral triangle in the sideways plane.
  3. Quadrilaterians simply rearranged their four limbs per segment into a square arrangement in the sideways plane rather than a tetrahedron.
All three of these groups would later give rise to different land-dwelling clades which would specialize in different ecological niches suited to their divergent limb arrangements.

Wednesday, May 4, 2022

Ord: Polybrachs


As we saw in the introduction, Ord is a gigantic place. There is enough room on Ord for life to have arisen completely independently several times, and for hundreds of completely unrelated alien civilizations to develop--even though, if they knew which way to walk, they could find each other within a few thousand kilometers.

We will be looking at the development of only one branch of animal-like life. At the highest level, this branch of independently-evolved animal life in Ord's oceans and seas can be split into three groups: sponges, flatworms, and polybrachs. Ordian sponges are much like Earthling sponges--simple sessile colonies of cells which filter food particles from water flowing through them. Ordian sponges, however, are "more spongy"--more porous--than Earthling sponges can be. This is because the four-dimensional space they live in permits qualitatively larger holes, of a fundamentally different kind than exists on Earth. Ordian matter can have linear holes punched through it, just as ours can, but it can also have planar holes--and Ordian sponges do, because it allows more water to flow through them from more directions.

Flatworms are spheroidal organisms; they would not look flat to us, but they are flat on Ord, as their entire lower 3D surface can contact the ocean floor simultaneously, and they have very little extent in the upwards direction. These organisms show minimal layered tissue differentiation. Simpler species are completely spherically symmetric, and simply absorb nutrients from stuff they crawl over as they inch their way across the ocean floor. Some more derived species, however, have established a front-back axis specialized for motion; such creatures have more elliptical bodies, and can often be found freely swimming in the ocean bulk.

The flatworms may eventually produce more interesting descendants, but for now the most complex creatures are the polybrachs. These are also spherically-symmetric creatures with an up-down axis, but they have specialized arm structures improving their ability to navigate and manipulate their world. Their symmetrically-arranged body segments and attached arms make them somewhat analogous to Earthling starfish, but with one major difference: while different species of starfish may have any number of equally-spaced arms, due to the fact that there are infinitely many regular polygons in two dimensions, Ordian polybrachs are restricted to certain fixed numbers of arms corresponding to the faces (or vertices) of different Platonic solids, of which there are only a finite number. The polybrachs have further specialized into three major clades based on their early embryonic development: tetrabrachs, cephalobrachs, and dodecabrachs.

In this figure, we can see the 3-or-fewer-dimensional stages of embryonic development from a single egg cell up to 4 or 8 cell structures, which allow the identification of different clades. Tetrabrachs (whose embryonic shape is labelled with a T in the preceding diagram) undergo only two cycles of cell division before adopting a maximally-dense tetrahedral arrangement of cells. The third cell division extends the embryo into the fourth vertical axis, with each tetrahedral segment going on to develop into a portion of the central disk and associated arm. Tetrabrachs tend to specialize in benthic habitats, like symmetrical flatworms, but are capable of much more active lifestyles.

Cephalobrachs (whose embryonic shape is labelled with a C) maintain a more open cellular structure through three divisions, producing a cubical arrangement of cells from which can develop eight distinct equally-spaced arms, corresponding to the faces of an octahedron. Their fourth cycle of division does not produce additional cells associated with an octahedral segment, though; rather, the top cube develops in an entirely different direction from the bottom of the creature, producing a glomular (4-dimensionally spheroidal) head / body cavity, similar to an Earthling cephalopod. Also like cephalopods, many species of cephalobrachs are capable of walking or dragging themselves along the ocean floor, but they are more often found in free-swimming niches.

Dodecabrachs (whose embryonic shape is labelled with a D) maintain an open square arrangement for two cycles of cell division, but then fall into a more close-packed square antiprism arrangement for their third. This third split already corresponds to the division between upper and lower body segments; a further cycle of division could establish cubical/octahedral symmetry, but that is not, in fact, what happens. Instead, several more cycles of cell division produce two joined spherical disks of cells, which begin differentiating into distinct organs much later, eventually producing an arm section with either twelve segments in dodecahedral symmetry (hence the name of the clade) or, more rarely, twenty segments in icosahedral symmetry. The 12 vs. 20 choice seems to be easy to flip between as new species of dodecabrachs evolve, but there is a more fundamental division between sessile and medusoid dodecabrachs. In the sessile branch of the family, the body segment extends into a long spherinder (a sphere extruded into the fourth dimension, analogous to a 3D cylinder) which acts as a stalk to attach the animal to a solid surface, with the arms acting to filter nutrients from the water. In the medusoid branch, the body segment instead expands into a wide spherical disk. In some species, the disk remains relatively small such that the arms are free, and swimming is accomplished in a manner similar to an Earthling feather starfish; in most medusoids, however, the upper disk grows large enough to curve around and enclose the central arm disk, rather like the bell of a 3D jellyfish, allowing jet propulsion by contracting the bell to expel water.

All polybrachs have ocelli (eyespots) at the ends of each of their arms, a feature which is believed to have been inherited from early flatworms before the two clades diverged; spherical flatworms also frequently have eyespots on their upper surfaces, in a variety of regular, semi-regular (corresponding to Archimedean solids) and random arrangements. Within the polybrachs, dodecabrachs appear to be the least-derived clade, with cephalobrachs and tetrabrachs each having split off from a dodecabrach ancestor after settling onto a power-of-two number of arms, which then permitted differentiation decisions to drift earlier in the stages of embryonic development.

Tuesday, May 3, 2022

The Natural History of Ord: Introduction to the Universe

Introduction

The Polybrachs
The Spherindricites

Ord is an inhabited world in an alien universe with 4 spatial dimensions rather than our usual three. It's a different bubble of stabilized space in our eternally-inflating multiverse. This has wide-ranging effects on geometry and physics, and thence on biology. Planets like Ord don't orbit stars in closed ellipses, and they don't have well-defined axes of rotation. From atoms up to galaxies, the entire universe is organized differently from our own. What we are mainly concerned with is the middle scale: how living things develop in four-dimensional seas and on three-dimensional continents. But it will be useful to investigate some high-level features of the universe those creatures are developing in, and the world they are developing on.

First, we will establish a scale. Comparing sizes between universes with different physics, let alone different dimensionalities, is a tricky thing; 1 meter here doesn't inherently mean anything on Ord, and units can seem to match up in different ways depending on what specific things we are comparing. Let's suppose we wanted to somehow "import" a human explorer from Earth to Ord; their normal 3D body would completely fall apart in a 4D space. We would have to somehow re-arrange their bits and pieces into a 4D form. But however we alter the body, we will want to keep the mind--and thus, the neural connections--intact. So, every neuron will need to be accurately mapped and reconstructed--and the number of neurons in an Earth human and an Ord human can be assumed to be the same. Since that will give us some idea of the level of biological complexity necessary for civilized life to arise on Ord as it has on Earth, let's adopt that as the basis for our standard of comparison: we'll declare neural cells to have the same linear size on Ord as they do on Earth. Human neuron bodies are around 100 microns across on average. If we deconstruct a human into individual cells, adapt each cell for Ord's universe, and then re-assemble in a stable 4D arrangement, the resulting explorer would be between 14 and 16 centimeters high--but composed of tens of thousands of times more atoms per cell!

Simply equating atoms between Earth and Ord does not accurately reflect the needs of biological systems. Four-dimensional Ord cells have a much larger proportion of their mass bound up in 3D surface membranes than we do in 2D surfaces, and thus a lower proportion available for interior structures and functions. Thus, on average, they do require thousands of times more atoms to achieve the same functions--we couldn't build a body capable of supporting our explorer's intelligence just by using the same number of atoms on Ord as we do on Earth. However, when it comes to linear measurements, atomic radii are much more precise than average biological cell sizes. Thus, in order to compare the sizes of organisms with the planet they live on, we can declare that Ord's four-dimensional atoms have the same range of radii as our three-dimensional atoms (although their internal compositions can be quite different)--exactly 1 angstrom.

To retain heat and maintain geological activity over geological time scales, Ord would need to have about 4/3rds as many atoms between its surface and its core as Earth does, to maintain the same surface-to-volume (or area-to-bulk) ratio, and thus the same heat loss rate. Earth is about 6.378x10^16 angstroms (average atomic radii) in radius, or 3.189x10^16 atomic diameters. Ord, it turns out, is about 8.5x10^16 angstroms in radius--which means it has about 2.37x10^17 times more atoms in its 4 dimensional bulk than Earth does in its 3 dimensional volume! In terms of atomic mass units, Ord is about 1/4 to 1/3 as massive as our entire galaxy! Fortunately, between a totally incomparable gravitational constant (it has different units in Ord's universe than in ours), gravity following an inverse-cubic law, and flexibility in how we measure units of time, all that extra material still only results in surface gravity comparable to Earth's!
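The 4/3 factor and the 2.37x10^17 figure can be reproduced with the standard formulas for a ball in three and four dimensions. The sketch below assumes that the atom-count ratio is simply the ratio of Ord's 4D bulk (in angstroms^4) to Earth's 3D volume (in angstroms^3); that reading is my guess at how the number was obtained, though it does land on the quoted value. The function names are invented for this example.

```python
import math

ANGSTROMS_PER_METER = 1e10
EARTH_RADIUS = 6.378e6 * ANGSTROMS_PER_METER  # 6.378e16 angstroms


def surface_to_volume_3d(r):
    """Surface area over volume for a 3D ball: (4*pi*r^2) / (4/3*pi*r^3) = 3/r."""
    return (4 * math.pi * r**2) / ((4 / 3) * math.pi * r**3)


def surface_to_bulk_4d(r):
    """Surface volume over bulk for a 4D ball: (2*pi^2*r^3) / (pi^2/2*r^4) = 4/r."""
    return (2 * math.pi**2 * r**3) / (0.5 * math.pi**2 * r**4)


# Matching 4/R_ord to 3/R_earth gives R_ord = (4/3) * R_earth.
ord_radius = EARTH_RADIUS * 4 / 3

# Assumed reading: atom-count ratio = 4D bulk / 3D volume, both in angstrom units.
ord_bulk = 0.5 * math.pi**2 * ord_radius**4
earth_volume = (4 / 3) * math.pi * EARTH_RADIUS**3
bulk_ratio = ord_bulk / earth_volume

print(f"Ord radius: {ord_radius:.3e} angstroms")
print(f"bulk ratio: {bulk_ratio:.3e}")
```

With these inputs the radius comes out near 8.5x10^16 angstroms and the ratio near 2.37x10^17, in line with the figures in the text.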

Now, about time... cesium atoms and quartz crystals don't exist on Ord (atoms with the same nuclear charges have radically different chemical properties), and pendulums depend on gravity and on our somewhat arbitrary choice of how to measure lengths, so it would seem that there is no really good method of establishing a correspondence. Furthermore, 4D brains are more tightly packed, so nerve signals travel faster, and thought occurs faster than it would in the same neural network "squashed" into a mere three dimensions. Nevertheless, we'll acknowledge the 4D brain architecture as natural for Ord, and declare that what our transposed human explorer perceives as 1 second passing (e.g., when mentally counting out "one Mississippi, two Mississippi," etc.) is one second, and everything else can follow from that. We note that objects seem to fall at a normal-feeling rate, and objects on the scale of our 15-cm-tall explorer's body seem to take normal amounts of effort to push, pull, and lift, and the gravitational constant and inertial mass units can be calculated from those observations.

Now, how much surface does Ord have? Using our angstrom equivalence, it comes out to about 2x10^28 cubic kilometers. Compare with Earth's approximate 5.1x10^8 square kilometers. Or, 2x10^37 cubic meters, compared to Earth's 5.1x10^14 square meters. Directly comparing a 3D surface volume to a 2D surface area is a bit tricky, but that's about the same volume as a sphere of space 23 AUs wide--larger than Saturn's orbit in our solar system! When intelligent creatures like our universally-transposed explorer can be a mere 15 centimeters in height, that's a lot of space for life to fill!

From that, you may guess that Ord's universe is much more densely packed with matter than our own universe is--and you would be right! It has to be, or, with that whole extra dimension to move around in, nothing would ever run into anything else, and nothing interesting would happen! It's almost a blessing, in fact, that two-body orbits are unstable--that forces matter to collapse into interesting structures despite the extra room to expand in. And Ord does not orbit a single star; but, it does have a somewhat chaotic orbit through a globular (or glomular) cluster of stars along with many other such planets, with days and nights distinguished by which side of the world is closer to the brighter, denser center of the cluster. The space-filling distribution of matter in the cluster produces an effective potential with a lower exponent--not quite a harmonic potential as it's not completely uniform, not exactly inverse-square, not even exactly an integer or even completely constant--which, in combination with close encounters with individual other bodies, produces the chaotic nature of Ord's motion. Some day, Ord may fall into the core and be burned up, or be ejected as the cluster evaporates, but for the functional equivalent of billions of years it is mostly-stably bound, wandering through a space of roughly-constant illumination.

Many of the stars in Ord's cluster are not a whole lot more massive than Ord itself, and may someday cool down to become additional planets. How can this be? Well, that requires looking way down at the other end of the size scale, at how atoms are built. The difficulty of fusion in Ord's universe follows a much steeper curve than in ours. In fact, monoprotium can fuse at near absolute zero, if the density is high enough to make collisions probable! This is because, while the atoms of Ord's universe are made out of close analogs to our own protons, neutrons, and electrons, they are put together quite differently. When there is only one electron, it exists almost entirely overlapping the proton, controlled by the interior harmonic potential. With 4 spatial degrees of freedom and 3 quantum spin states for electrons, elements up to duodecium, with twelve protons and electrons and no neutrons in the lightest isotope, are all chemically inert and nuclearly sticky! Only at atomic number 13 do we encounter an atom with an external electron orbital and a nucleus with a distinct positive charge which can repel other nuclei. Ord's chemical equivalent of hydrogen is thus as heavy (in terms of atomic mass units) as our carbon-13 isotope, and much smaller than that in terms of nuclear to atomic radius ratios. With many more orbitals available for electrons to fill (e.g., there are 4 rather than 3 p-orbitals, each of which can hold 4 electrons in different spin states) Ord's periodic table is significantly stretched horizontally, with many types of atoms and bonds that have no analog in our world--and with nuclear-internal electrons and supplies of easily-fusible duodecium isotopes around, Ord has many more elements with higher atomic numbers than we do for chemistry, and biology, to play with.

Thursday, April 28, 2022

Geography on a 4D World

As noted in my last post, planets in a 4-dimensional universe would have 3-dimensional surfaces. What does that mean for geography?

First off, random landscapes in higher-dimensional spaces are less likely to have local minima and maxima. That's why gradient descent optimization works--if your problem space has enough dimensions, you can just start anywhere you like, head downhill from there, and be pretty sure you'll converge on the optimal solution--the global minimum of the landscape--without getting stuck in any local valleys first. 3D space isn't super high dimensional, but it is higher than the 2D surface of our world, which means fewer local minima and maxima. Fewer lakes, and fewer mountain peaks. And at a large scale, more likelihood of a single fully-connected global ocean (which Earth already has anyway) and a single fully-connected supercontinent (which Earth has had periodically). A 4D world with an Earthlike distribution of land and water is thus less likely to have any Australias or South Americas--large places where life can evolve in divergent ways from the rest of the world.
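One way to see the trend is a toy Monte Carlo experiment: at a critical point of a landscape, the point is a local minimum only if the matrix of second derivatives (the Hessian) is positive definite, and a random symmetric matrix is less likely to be positive definite the more dimensions it has. The sketch below is only an illustration under an assumption of mine, namely that the Hessian entries are independent standard Gaussians; real terrain is not generated this way, and the function names are made up for the example.

```python
import random


def is_positive_definite(m):
    """Sylvester's criterion for a symmetric 1x1, 2x2 or 3x3 matrix."""
    n = len(m)
    if m[0][0] <= 0:
        return False
    if n == 1:
        return True
    if m[0][0] * m[1][1] - m[0][1] * m[1][0] <= 0:
        return False
    if n == 2:
        return True
    det3 = (
        m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
        - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
        + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0])
    )
    return det3 > 0


def random_symmetric(n, rng):
    """Symmetric n x n matrix with independent N(0, 1) entries (toy assumption)."""
    m = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            m[i][j] = m[j][i] = rng.gauss(0.0, 1.0)
    return m


def minimum_fraction(n, trials=20000, seed=0):
    """Estimate the share of random critical points that are local minima."""
    rng = random.Random(seed)
    hits = sum(is_positive_definite(random_symmetric(n, rng)) for _ in range(trials))
    return hits / trials


for dims in (1, 2, 3):
    print(dims, minimum_fraction(dims))
```

The estimate drops at each step from one to two to three surface dimensions, and by symmetry the same holds for maxima. That matches the direction of the claim about fewer lakes and peaks, without saying anything about its size on a real planet.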

Rivers are still one-dimensional. No matter how high the dimensionality of space, "downhill" is still a vector! But how large and complex will river systems be? In a 2D space, random lines are guaranteed to intersect, and mergers of rivers to form larger rivers with tributary systems are therefore common. Random lines in 3D space, however, will not intersect--and with more space to move around in, rivers on a 4D world will not merge quite as easily as they do on Earth. That doesn't mean they won't merge at all, though! For one thing, river courses aren't random, and rivers that begin near each other are likely to have downhill vectors that also point towards the same place. Additionally, 3 surface dimensions are not enough to avoid knots! In fact, 3 is the only number of dimensions in which one-dimensional curves can form knots and braids. (Braided rivers on 4D worlds could actually be literally braided!) And as plain-crossing rivers migrate over time, they become highly likely to intersect, for the same reasons that cords always get tangled in your pocket. However, being one-dimensional, rivers do not form natural borders on 4D worlds the way they do on Earth. Terrestrial creatures can always just walk around them, as easily as you can walk around a lamppost.

Mountains, however, are a different matter! Hot-spot volcanic mountain chains will still be one-dimensional, but they don't really form borders on Earth, either (although they will form rare local maxima in the terrain). Mountain chains produced by plate collision, however, can form borders! On Earth, plate boundaries are one-dimensional, and so mountain ranges seem analogous to rivers in forming natural one-dimensional borders--but while rivers are one-dimensional in any universe, plate boundaries are not! Tectonic plates on a 4D world are 3D structures, with 2D boundaries, and mountain ranges created by plate collisions will thus also be spread over a 2D area which can bound a 3D region. So, mountain ranges form natural barriers on 4D worlds just like they do on Earth.

A 4D world would also not necessarily have distinct climate zones by latitude--not unless it had only a single component of rotation. That is possible, but in general any object in four dimensions can rotate in two independent planes simultaneously. Each rotation induces a circular pole, which is coincident with the equator of the complementary rotation. While these two great circles are objectively deducible, though, they are not perceptually salient, and have little or no climatological significance. Essentially, there are no fixed points on the surface of a 4D world--everything moves under rotation somehow. This makes celestial navigation... not straightforward.

Four-Dimensional Urban Planning

At the beginning of this month, I came across this Twitter thread describing a city plan by Leonardo da Vinci. The key concept is to make use of altitude to separate essential functions into different planes--essentially, vertical zoning. Residential areas are on top, over pedestrian pathways, then the commercial and transportation district, and bulk shipping canals on the lowest levels. Separation of zones by planes allows keeping the elements of each zone close together with other zones out of sight, but still easily accessible by moving a short distance through another dimension.

While modern cities do make some use of transportation tunnels (subways, car tunnels, underpasses and overpasses) and stacking residential apartments over commercial spaces in multi-story buildings, a combination of gravity and coordination issues (how do you build new stuff on top of, or underneath, another building?) makes the full realization of da Vinci's 3D city rather difficult. However, there are fictional environments in which it makes perfect sense!

Within the confines of our own universe, 3D zoning makes perfect sense for a large space colony in zero-g. But da Vinci's city plan is also ideal for creatures living in a 4-dimensional universe!

Planets in 4 dimensions are hyperspheres with 3-dimensional surfaces. It is thus possible (and indeed, entirely natural) to build a 3-dimensional city in which every building sits directly on the ground, and there is no need to worry about gravity overcoming the structural strength of other buildings "below" you. Just as unplanned human settlements tend to grow in a roughly circular pattern, the "organic" city growth patterns of a 4-dimensional people would most naturally tend towards blobby spheres--and they can be much more compact. High-rise apartment population density is the natural state for early 4D cities, not a result of advanced construction & logistical technologies, with supplies able to be brought in to a city and wastes removed over a whole 2D surface rather than a 1D border.

Zoning is not obviously a more natural concept in 4 dimensions than in our 3, but once someone comes up with it, it becomes far easier to actually implement. Confining each district to a plane makes internal navigation only as difficult as it already is in our two-dimensionally-arranged cities, and density can be recovered if the 4D people simply learn to build upwards, exploiting their 4th dimension as we exploit our third. Thus, planar zones such as da Vinci envisioned can be constructed next to each other, without needing to be stacked on top of each other. And thus, 4D urban planners could achieve a very high degree of logistical efficiency and provision of utility services for a higher standard of living at a very low level of material technology.

Tuesday, April 12, 2022

A Literature of Sign

Last month, I came across the article Toward a Literature of Sign Language, by Ross Showalter, and I thought "This is exactly what I write about! I have to find some way to use this!"

Sign languages have a body of literature; there are Deaf poets who compose in ASL, Deaf storytellers who perform in ASL, and I am certain the same is true for other sign languages; their literature is merely encoded in video, rather than text. And that's totally valid on its own... but if you want to include Deaf, or otherwise signing, characters in a book for general audiences, relying on video isn't going to cut it! So how do you incorporate sign into English text, when no sign language currently has a widely-accepted standard orthography?

I have written about sign language representation in fiction 5 times before (1, 2, 3, 4, 5)--kind of a shockingly large proportion given that this is only my 30th entry in the Linguistically Interesting Fiction series--but 4 out of those 5 examples are of sign language in movies or TV; only one, in Rosemary Kirstein's The Steerswoman, involves depiction of signing in text. Two... and a half strategies are used there--mostly, a combination of simple translation into English, narrow translation that attempts to preserve the syntax of the underlying sign, and descriptions of the performance of signs. All three are strategies which Ross acknowledges, although narrow translation comes very close to glossing, a strategy which author and ASL interpreter Kathy MacMillan explicitly rejects. Ross has a slightly more poetic take on the issue:

Therein lies the contradiction of this method: to render ASL in written English with its syntax intact is to create a strange tension. There is the grammar of ASL, preserved and captured only in syntax—but syntax is only part of a language. To try to render ASL in writing is to suspend yourself halfway between ASL and English.

To do justice to ASL, we need to treat it on its own terms.

And yet, simply translating into fluent English isn't a whole lot better! Why? Well, for all the same reasons that you might want to include any examples of secondary language in Anglophone fiction! Because language is identity. To quote Ross again:

If you use sign language, you sublimate yourself within the Deaf community. You step away from English and the mainstream for a space and language outside standard expectations.

To see sign language and English as interchangeable ignores the cultural legacy that comes with sign language. It ignores the storytelling already shared through signing.

If you're going to include French, then include French, like Graham Bradley did in Kill the Beast--if you just let it all be English, you lose the cultural immersion of the language. And if you are going to include ASL (or any other sign language), then include ASL, for goodness' sake! If I may be permitted a smidge of hyperbole: if you just turn it all into English, then what even was the point?

Ross does not offer a complete solution to writing sign into literature, but he does propose a perspective: signs are made with the body, and portrayal of sign must center what the body does. I suspect, therefore, that out of all the portrayals of signs in The Steerswoman, Ross would be most pleased with the brief instances in which the shapes and gestures are directly described. (Slightly more exploration of the physical-description approach to signs is undertaken in The Lost Steersman, a later book in the Steerswoman series, in which this approach is forced by the fact that the viewpoint characters don't actually understand what is being signed, and so it cannot be translated; but, that's about signs made by sometimes-murderous aliens which might only be paralinguistic anyway, so not really the best example of human sign language representation, although perhaps useful for technical reference.)

For my own part, I have written one story (for submission to an anthology; sadly, not accepted, so who knows when it will find another potential home) which involves signing, when two people who speak unrelated sign languages meet underwater, where they cannot speak orally. Having read Ross's point of view, I feel pretty good about how I handled things there; each character's individual point of view is written with their thoughts rendered in English, because something must be made comprehensible to the reader, but what they each sign is described from the other character's point of view in physical terms, as handshapes, poses, and motions.

Now, is that the best way to do it? I have no freakin' idea. I'm not Deaf; I don't even speak ASL. I think sign languages are neat, and I've studied some of them as a linguist, just like I've studied Coptic, Warlpiri, and Ingush, but that doesn't mean I can actually speak any of those! I am not a member of the Deaf community, and I can't give advice on how they would like to be represented in written literature.

But, like Ross, I'd sure as heck like to see more people give it a try.

The Linguistically Interesting Media Index

Monday, April 11, 2022

Some Thoughts on Zimvisz

Zimvisz is a constructed language by Sheldon Ebbeler. It was presented at the 7th Language Creation Conference, but the video of that presentation is... not great. Fortunately, I was able to get in touch with Sheldon and acquire a copy of the presentation slides with speaker's notes, which contain a decent amount of information about the language.

The central conceit of Zimvisz is that all utterances are encoded in integers--with the grammatical constituents of an utterance being encoded as factors of the complete utterance!

The idea of encoding words as numbers is not entirely new; Gottfried Leibniz (one of the inventors of calculus and Isaac Newton's rival) even considered attempting to construct a philosophical language that would allow statements and concepts to be manipulated algebraically. And, of course, every word on this screen is encoded as a binary number in computer memory! And Jörg Rhiemeier has coined the term "arithmographic language" to refer to a theoretical language in which semantic primes are encoded as prime numbers, and semantic composition is represented by multiplication, such that complex concepts get composite numbers. Naively implemented, this would seem to be an inefficient use of the integers, since only square-free numbers would be assigned unique meanings (because why would you ever need to repeat a semantic prime in a compound?). Zimvisz extends this idea to complete sentences, and in so doing uses one problem to solve another!
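
To put a rough number on that inefficiency, here is a quick sketch (plain Python, with function names of my own choosing) that counts what fraction of the integers up to some limit are square-free. The long-run density is known to be 6/π², or about 61%, so a naive arithmographic scheme leaves roughly 39% of the integers without any meaning of their own.

```python
import math


def is_square_free(n):
    """True if no perfect square greater than 1 divides n."""
    d = 2
    while d * d <= n:
        if n % (d * d) == 0:
            return False
        d += 1
    return True


def square_free_count(limit):
    """How many of the integers 1..limit are square-free."""
    return sum(1 for n in range(1, limit + 1) if is_square_free(n))


if __name__ == "__main__":
    for limit in (100, 10_000):
        print(limit, square_free_count(limit) / limit)
    print("asymptotic density:", 6 / math.pi ** 2)
```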

By multiplying nouns and verbs (or rather, arguments and predicates, as Zimvisz does not distinguish nouns and verbs lexically) to produce single numbers representing entire clauses, Zimvisz runs into the problem of how to encode differing semantic relations; there is no syntax--multiplication doesn't preserve ordering, after all--so "put the subject first and the object last", for example, doesn't mean anything. And it's worse than that--there's no morphology either, as any number that might be assigned to an affix or a function word also gets mixed in with all the rest with no way to associate it with a particular other factor of the final clausal number. Sheldon solved this problem by giving a function to the non-square-free numbers--the exponents of a given factor serve to identify its syntactic function! It is as if a "normal" linear language used various degrees of repetition, and only repetition, to mark syntactic relations--and with no contiguity required for the repeated elements of any constituent!
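
To make the exponent trick concrete, here is a toy sketch in Python. The lexicon and the role-to-exponent assignments below are entirely made up for illustration--they are not Sheldon's actual Zimvisz assignments, which I am not reproducing here--and the toy only allows each word to fill one role per clause.

```python
# Hypothetical lexicon: each lexeme is assigned its own prime.
LEXICON = {"dog": 2, "bite": 3, "mail-carrier": 5}

# Hypothetical role marking: the exponent on a factor identifies its function.
ROLES = {"predicate": 1, "agent": 2, "patient": 3}


def encode(clause):
    """Encode a {role: word} clause as a single integer."""
    number = 1
    for role, word in clause.items():
        number *= LEXICON[word] ** ROLES[role]
    return number


def decode(number):
    """Recover the {role: word} clause from its integer encoding.

    This only works quickly because the toy lexicon is tiny and known in
    advance; it simply tries dividing by every prime in the lexicon.
    """
    role_for_exponent = {exp: role for role, exp in ROLES.items()}
    clause = {}
    for word, prime in LEXICON.items():
        exponent = 0
        while number % prime == 0:
            number //= prime
            exponent += 1
        if exponent:
            clause[role_for_exponent[exponent]] = word
    if number != 1:
        raise ValueError("leftover factor not in the lexicon")
    return clause


if __name__ == "__main__":
    dog_bites = {"predicate": "bite", "agent": "dog", "patient": "mail-carrier"}
    carrier_bites = {"predicate": "bite", "agent": "mail-carrier", "patient": "dog"}
    print(encode(dog_bites), encode(carrier_bites))
    print(decode(encode(dog_bites)))
```

Because multiplication is commutative, the order in which the constituents are supplied makes no difference to the result; only the exponents distinguish who is biting whom.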

While this is an ingenious mechanism, however, I think an avenue for optimization has been missed; while Zimvisz does not lexically distinguish nouns, verbs, adjectives, and adverbs, it does retain the four distinct syntactic positions of noun phrase head, verb phrase head, noun phrase modifier, and verb phrase modifier. If we look at a so-called non-configurational language like Warlpiri, for example, we can see that syntactic headedness, and the head-modifier distinction, is not actually semantically necessary. A Zimvisz-like language could thus cut the number of distinct exponents needed for encoding syntactic relations nearly in half, reducing the total repetition of various constituent factors and considerably reducing the integer magnitude of many clauses.

Now, while this is an ingeniously executed idea, I think it is worth asking the question "how practical is it, really?" Obviously Zimvisz could not be fluently used by humans! And indeed, it is supposed to be used by 4-dimensional aliens called Zimfidz, who can be assumed to have different mental abilities than humans. A key point, however, is that extracting the semantic content of a Zimvisz utterance requires factoring numbers that can have a very large number of digits! (A fact which is exacerbated by the logically-superfluous proliferation of syntactic categories as noted above.) That is a famously hard problem--so much so that it forms the basis of the RSA cryptosystem. Quantum computers running Shor's algorithm can theoretically factor large numbers "efficiently"--but "efficiently" in this case just means "in time roughly quadratic in the number of digits, rather than super-polynomial". Thus, a sentence with twice as many digits--corresponding very roughly to twice as much semantic content--will take a little over four times longer to comprehend, even if the Zimfidz have quantum-logic brains. Incidentally, parsing linear speech is, in the general case, a problem with cubic time complexity--but human languages tend to use not-the-most-complex-possible grammars, and we focus on only the most probable potential structures, throwing out unlikely hypotheses very aggressively as we hear more and more of a sentence, such that the vast majority of sentences produced by humans can be comprehended in linear time--i.e., it only takes longer to understand when it also takes longer to say, despite the theoretical cubic bound. (The rare exceptions to this tendency are garden-path sentences.) So, is there some way that Zimfidz could structure their utterances to make factoring especially easy along high probability paths? Eh, maybe? But, I kinda doubt it.

Not every sentence is going to have a conveniently small prime factor which can be rapidly extracted and whose semantics can be used to predict other probable factors, the way that the first word of any human sentence is immediately comprehensible and can be used to predict possibilities for what comes next. And without that kind of predictive shortcutting, Zimvisz seems more like a particularly clever code than a real functioning language, suitable for conversation. Nevertheless, if it showed up in a sci-fi story, I'd give it the benefit of the doubt!

As a side note, one might reasonably wonder if the difficulty of factorization is a problem for any arithmographic language--but no, it is not necessarily so. Factorization is only necessary in this case because Zimvisz uses multiplication for productive syntactic purposes. If multiplication of primes representing lexemes is only used for compounding or morphological derivation, to produce new lexemes, the meanings of compound words can simply be memorized like any other word, and real-time factorization is unnecessary.

Next, let us consider the writing system, which consists of linked knots. There are 29 basic knot "letters", corresponding to the first 29 primes, which can be linked together with "operator" knots to form any arbitrary prime, and then further linked to form the composite numbers of a Zimvisz clause. This is a fully non-linear writing system, corresponding to the non-linearity of the "spoken" language--but it has a major advantage over the "spoken" language in that the factoring is already done for you, as composite-number sentences are represented not as opaque quantities, but as actual agglomerations of their individual factors, which can be individually viewed and counted. This is where the 4D nature of the Zimfidz becomes really relevant--while Zimvisz writing looks a mess to our eyes, the whole agglomeration is immediately visible with no occlusions to 4D eyes with 3D retinas. Furthermore, they are able to write by forming rings into knots without ever having to cut or join the strands, thanks to the existence of an extra spatial dimension with which to move strands around each other. The Zimvisz writing system sadly does not use the Conway enumeration; I can't call that a problem, but having seen one knot-and-number-based written language, I do think it would be neat to see one that did make use of Conway notation in some way. The only hesitancy I have with the Zimvisz writing system is that it does not impose any particular standard representations of the basic knots, or a standard viewing orientation--all topologically-equivalent links are semantically equivalent. That makes a certain amount of sense, but it requires that readers be potentially capable of solving the knot recognition problem, whose lower complexity bound is currently unknown. But perhaps that is less of an issue for creatures with 3D retinas; again, if it showed up in a sci-fi novel, I would give it the benefit of the doubt.

Sunday, April 3, 2022

Some Thoughts on Khangaþyagon

Pete Bleackley's Khangaþyagon is an artlang developed as the ur-language and magical language of the fictional world of Huna. It is also meant for use in a fantasy novel, so how accessible it is to potential readers is a relevant consideration. As an inherent feature of this fantasy world, it is not subject to historical evolution, and is presented as having come into existence fully formed, with no need for any naturalistic explanations for its features. Nevertheless, it doesn't go in for exoticism in any significant way, and seems to me to be a very ergonomic language that could very well have arisen naturally.

The phonology is not terribly weird from the perspective of an English speaker, with the only "exotic" bits being a distinction between flapped and trilled "r" (familiar from Spanish), the presence of a velar fricative (familiar from Russian and some dialects of German), and the (rare) possibility of using "ng" as a syllable onset or "h" as a coda. With a mostly-familiar-to-English phonology, the romanization is also very straightforward. Most letters have exactly the values you would expect; there are digraphs for sh, zh, and kh, and a few diphthongs, but Pete opts for the archaic English letters þ and ð for the dental fricatives. This seems to be a deliberate attempt to evoke the mythic past, in combination with a runic-style native alphabet, "partly because runic scripts appear to have been used in magical practices". The romanization uses apostrophes, but sparingly and in a reasonable functional way, to divide letters which would otherwise form a digraph. It thus avoids the "fantasy apostrophe syndrome".

The phonotactics are intuitively derived, with coinage of new words being based entirely on what Pete thinks feels right rather than strictly following engineered rules, but Pete has reverse engineered the emergent phonotactics for description in the grammar. The stress system is interesting, because stress placement is fully predictable from morphology--but not from surface segmental sequences or word boundaries. Stress placement can thus occasionally be contrastive, distinguishing compound words from words with affix sequences that happen to look like potential roots. It's half-way in between fixed and lexical stress, and similar in function to--though much simpler than--the Warlpiri stress system.

The morphology is extremely regular and LEGO-block-like. There don't appear to be any morphonological processes that alter roots or affixes, with the exception of a couple of fully predictable epenthetic vowel insertions. The only significant bit of morphological complexity is a lexically-determined variation in the suffix for the active participle of verbs. This fits in well with the conceit of Khangaþyagon as an unevolved ur-language (although I suppose there's no particular reason why a divinely-appointed ur-language shouldn't be horrendously complex, and full of fusion, suppletion, and irregularity, but I guess Pete's intuition and mine agree on this point), and seems like a good design choice for a language meant to support a novel, as it keeps things transparent and as easy as possible to work out for the potential reader. Lest this seem unnatural, Turkish is also famous for extremely regular concatenative morphology (although it does also have vowel harmony going on, which Khangaþyagon lacks), but an even better comparison in this case might be Warlpiri (mentioned above), or other related Australian languages, which share the feature that head-modifier agreement consists of copying the exact same sequence of inflectional markers on every agreeing stem. Unlike Warlpiri, though, Khangaþyagon still maintains a strict distinction between adjectives and nouns, and between adverbs and verbs, and does not take advantage of this agreement system to allow variable word order or discontinuous constituents. That makes the repetition seem a little bit excessive at times, but again this seems tailor-made to make the language as easily accessible as possible to any potential novel readers.

Khangaþyagon does not have distinct determiners, instead affixing demonstrative, interrogative, and basic quantificational morphemes directly to nouns. However, there is a split between nouns and pronouns in terms of which types of modifiers can occur attached to them (fewer for pronouns than for nouns), which can be used to argue for the relevant existence of separate D and N levels in Khangaþyagon syntax, which (as a strong proponent of the DP hypothesis myself) I find quite lovely. Khangaþyagon also has a well-developed nominalized clause construction following an ergative case-marking pattern, which is both useful and also conforms to my personal preferred theories about noun phrase structure (and CP/DP parallelism).

Khangaþyagon is mostly head-initial, with basic VSO order, but there are several notable exceptions. There are, for example, no prepositions, and adposition-like functions are handled by a variety of inflectional suffixes--which, if Khangaþyagon had any history, I would assume were derived from postpositions. Additionally, nominal compounds are head-final, and conditional clauses appear before their main clause, rather than after (which would seem to have a straightforward information-structure justification, as it's nice to know as soon as possible when a statement is not actually an unconditioned assertion). Additionally, Khangaþyagon has a topic-fronting construction, using a specific topic-marking affix, but this is only used for subordinate clauses of indirect reported speech, which seems to me like a very strange restriction. Topic marking is a useful thing--if you've got it in one part of the language, why not also use it elsewhere?

Overall, the grammar is nicely organized, compact, and pleasant to read. However, there are a few things I would've liked to see better explained:

Modal verbs

The grammar lists 4 modal verbs, with simple English glosses. English modals, however, are highly ambiguous, and the precise meanings of modals vary quite a bit even between closely related languages; it would thus be nice to have a more detailed description of the semantics and usage of these verbs.

The Negative

The suffix "-she" is said to form "antonyms"; but, there are a lot of different kinds of antonyms! Again, it would be nice to have more explanation.

Predicate Adjective Constructions

Predicate adjectives form compounds with verbs, but there is a lack of actual examples, leaving it unclear what the compound element ordering is supposed to be.

Numeral placement

Numbers are treated as adjectives, but syntax examples for adjectives don't clarify where numbers should be placed--close to the noun, far away, or just wherever?

Subordinate clauses

Apart from conditional clauses and reported speech clauses, the only subordinate clauses explicitly discussed as such seem to be relative clauses with resumptive pronouns. This leaves me wondering how complement clauses work (do they have to be universally nominalized?), along with various types of adverbial clauses (e.g., purpose clauses, result clauses, temporal clauses).

Finally, quite a few examples, especially in the earlier sections of the grammar, are missing interlinear glosses.

Now, lest this seem overly critical, let me repeat that on the whole, I found the grammar very well organized and pleasant to read. It's one of the nicer bits of conlang documentation I have read, in fact. But, that doesn't mean it can't get better! And the language itself, apart from the form of its documentation, seems to be very well constructed to meet its stated purposes and intended usage, and has a nice aesthetic effect for me.


Some Thoughts... Index

Tuesday, March 29, 2022

Some Thoughts on AllNoun

AllNoun is a constructed grammar by Tom Breton. It does not actually claim to be a constructed language, because it has no vocabulary--one simply borrows vocabulary as convenient from whatever source language you like to slot into the AllNoun grammar (typically English vocabulary). AllNoun was originally inspired by Glosa, which has a single syntactically flexible lexical class with syntactic functions disambiguated by heterogenous function words. The aim of AllNoun was to take that one step further--eliminate the function words, and produce an entirely monocategorial grammar. In the process, AllNoun became one of the earliest documented language projects to invent the argument-tagging approach to verblessness, as is also seen in Machi, which I previously reviewed.

The central conceit of AllNoun is that any word, with any semantics, can serve either referentially, or as a role marker. This is explained quite vividly in the following excerpt from the AllNoun FAQ:

Question:
Aren't there really two classes of noun, the "parts" and the "roles"?

Answer:
No, they really are interchangeable. Words may tend to be more useful as
roles or parts, but any word really can fit in either category.

As a limiting example, consider that in his column (and later book)
Metamagical Themas, Douglas Hofstadter once asked, in complete
seriousness, "Who is the Dennis Thatcher of America?". By this he meant,
"Who or what in America plays the same role that prime minister Margaret
Thatcher's husband plays in England?"

It seems to me that if the proper noun "Dennis Thatcher" can be a role,
then anything can be.

This is a feature that I have not seen totally replicated in any other conlang yet, which is kind of a shame. However, despite this gem of a core concept, AllNoun is on the whole a failure. And I don't feel particularly bad about saying that, since Tom himself has been open about problems that he saw in AllNoun a few years after its initial publication. However, I think Tom is not entirely correct about what the real problems actually are. I'll just go through them one-by-one:

Problems? My treatment of adjectives is the big one. It treats subsective and intersective adjectives well enough, but its intersective mechanism is not sufficient for nonsective adjectives. For instance, a "former friend" is not any sort of friend, and not easily seen as a member of "the set of all former things". "Alleged thief" is not the intersection of all thieves and all "alleged things"

This is a legitimate issue, but not as big a one as Tom seems to think. If you regularize the syntactic semantics of English adjectives, you get the same problem--but English merely allows lexical specification of modifier semantics (as do most natural languages). Granted, doing that in AllNoun would disrupt the engineered simplicity and elegance of the syntactic system, but there is another solution, which I employed in WSL: just treat non-intersectives as relations between the target of modification and the final referent. Anything can be a relation (or role) in AllNoun, so why not non-intersectives?

Another difficulty, which Paul Doudna pointed out to me, is that it is unclear how propositions are to be expressed. At the time, I believed that a top-level nominal should be interpreted as existential. Eg, to say "I ate the apple" you'd express "an eating of the apple by me in the past", and it would be understood as existential "(There is) an eating of the apple by me in the past". However, it isn't as expressive as I would like. Paul also suggested it was weak at expressing fictional contexts, which aren't really existential, but I'm not sure it's any worse than natural language.

This basically comes down to an aesthetic concern, rather than a functional one. Which is perfectly legitimate--if Tom doesn't like how the predication structure turned out, that's entirely his prerogative. But it does not constitute a functional failure of AllNoun as a potential grammar for a language.

Non-declarative moods (questions, imperatives) worked clumsily, by using the declarative mood.

And yet, there are natural languages which get by just fine with little or no formal marking of different moods. So why shouldn't AllNoun?

Expressing determinacy ("the", "an", "some") was always clumsy. Determiners more than any other natural words want to modify the nominal adjacent to them, which is totally contrary to AllNoun. So I ended up with a lot of examples that simply did not express determinacy.

And again, there are plenty of natural languages which get by just fine with no articles and no grammaticalized definiteness system. So why shouldn't AllNoun?

Relative clauses, while they worked, tended to not be linear. Eg, you could express "apple *that I ate*", but AllNoun did not neccessarily make the relation of the relative clause to the matrix clause immediately clear. Eg, one might say: (apple (eat agent:me time:past patient:^)). ...where it isn't clear until the end of the relative clause that it is about the apple. Of course it's not neccessarily so: (apple (eat patient:^ agent:me time:past))

And yet again, there are plenty of natural languages that keep you in suspense with important structural information saved till the end, and plenty of natural languages which don't even have embedded relative clauses at all, instead relying on parataxis. So if AllNoun doesn't handle relatives very well... who cares? That's no reason it can't still be perfectly functional for communication!

Finally, Tom does make a very good point about general semantics:
>  The : also seems to inherit some
>  of the many different uses of the possesive and "of". 

Yes.  In a philosophical sense, I believe that relatedness is basic in
communication.  Any communicator in any language has to, at some basic
underlying level, recognize a relation just because it is named,
without additional underlying mechanism.

David Gil, with his theory of associative semantics in Isolating-Monocategorial-Associative (IMA) language would certainly agree!

So, having sung the praises of AllNoun in contradiction to its author, why do I think it's a failure? Because, while AllNoun is not a bad basis for some sort of language, it completely fails to meet its own primary design objectives. From the Introduction to AllNoun:

AllNoun has only one part of speech, which is largely but not entirely
analogous to nouns in other languages. Thus the name AllNoun.

Words are never inflected in AllNoun. It is a 100% isolating language.

And yet, it fundamentally relies on semantically-significant punctuation. Either the punctuation symbols are inflections, in which case AllNoun is not 100% isolating, or they are function words, in which case AllNoun does not, in fact, have only a single part of speech. Tom was not unaware of this objection, and addresses it in the FAQ:

Question: Aren't the punctuation markers non-noun parts of speech? So it's not really _all_ nouns, is it? Answer: I suppose in a very abstract philosophical way, one could consider punctuation a part of speech. If that were the case, then we would say that (say) English had not only nouns, verbs, etc. but also commas, periods, and so forth, for maybe 13 or 14 parts of speech. But generally we don't, in any language. Perhaps because there is no useful sense of a vocabulary of punctuation. In any case, I'm satisfied that AllNoun is nearly as homogeneous as possible. IMO if punctuation markers are an anomaly they are a neccessary one.
But this is a false equivalence; natural languages can be, and were for many, many centuries, written entirely without punctuation, and still retain their meaning. Punctuation serves to make reading easier, through a variety of means such as marking sentence types, more generally marking clause or other constituent boundaries, and giving hints to prosody--but it does not, by itself, have semantic content. AllNoun punctuation, on the other hand, makes up a much larger proportion of the system than it does in any natural language--comparable to a typical distribution of function words--and is utterly indispensable to encoding meaning. Indeed, the entirety of AllNoun as a "constructed grammar" consists in the rules for how to use the punctuation! Furthermore, Tom acknowledged that, if AllNoun were to be spoken, the punctuation must be pronounced, and described his proposed pronunciations as "words":
Q: So how are you going to pronounce the punctuation? ( ) : ^

A: As I see it, the best way is to treat groups of one or more punctuators _infixed between part and role_ as pronounceable words, and also include single parentheses. That way multiple infix-pronounciation can efficiently join into a single word, except for free parentheses which could stack. 11 short verbal symbols are required:

:    ):   :(   ):(  ^:   )^:  ^:(  )^:( ^    )^   )    (
0    1    2    3    4    5    6    7    8    9    a    b

The symmetries above should be reflected in the sounds. Here is an unofficial proposal for how it might be sounded:

symbol  spelling  sound
:       awf       /Of/
):      af        /&f/
:(      oof       /Uf/
):(     if        /If/
^:      aws       /Os/
)^:     as        /&s/
^:(     oos       /Us/
)^:(    is        /Is/
^       awsh      /OS/
)^      ash       /&S/
)       ath       /&T/
(       ooth      /UT/

So a sentence like...

beat boy^(chase dog^(catch cat^(eat mouse^(leave maid cheese^ table:on.

...might sound like...

Beat boy awshooth chase dog awshooth catch cat awshooth eat mouse awshooth leave maid cheese awsh table off on.
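To make concrete just how word-like these pronounced punctuators are, here is a minimal Python sketch that reads an AllNoun sentence aloud using the spellings from Tom's unofficial proposal quoted above. The function name `speak` and the greedy longest-match tokenizing are assumptions of this sketch; the FAQ does not say how a run of punctuation should be split into groups.

```python
import re

# Spellings taken from the FAQ's "unofficial proposal" quoted above.
SPOKEN = {
    ":": "awf", "):": "af", ":(": "oof", "):(": "if",
    "^:": "aws", ")^:": "as", "^:(": "oos", ")^:(": "is",
    "^": "awsh", ")^": "ash", ")": "ath", "(": "ooth",
}
# Assumption: try the longest punctuation group first.
GROUPS = sorted(SPOKEN, key=len, reverse=True)

def speak(text):
    """Replace AllNoun punctuation with its proposed spoken form."""
    words = []
    # A chunk is either a run of punctuators or a run of ordinary letters;
    # whitespace and the sentence-final period are dropped.
    for chunk in re.findall(r"[():^]+|[^\s():^.]+", text):
        if chunk[0] in "():^":
            spoken, i = "", 0
            while i < len(chunk):
                group = next(g for g in GROUPS if chunk.startswith(g, i))
                spoken += SPOKEN[group]  # adjacent groups fuse into one word
                i += len(group)
            words.append(spoken)
        else:
            words.append(chunk)
    return " ".join(words)

print(speak("beat boy^(chase dog^(catch cat^(eat mouse^(leave maid cheese^ table:on."))
```

Under these assumptions the output matches the FAQ's spoken example, apart from capitalization and the bare colon coming out as "awf" (the table's spelling) where the FAQ's example writes "off". Every punctuator ends up as a perfectly ordinary lexical entry in a lookup table, which is rather the point.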
So, I rest my case. AllNoun is not, in fact, made entirely of nouns, or any single part of speech. Furthermore, AllNoun does not, in fact, represent the simplest that a grammar could possibly be. It is not "as homogeneous as possible", nor are punctuation markers a "necessary" anomaly--and this is easy to prove by construction, if we simply demonstrate the existence of actual monocategorial grammars. These can come in two types, as far as I know to date:
  1. Concatenative / combinator grammars.
  2. IMA grammars
Concatenative grammars are based on combinatory logic--or, more specifically, the parenthesis-free SKA calculus, all of whose morphemes belong to a uniform class of combinators with differing arities. Once again, we must mention Fith--Fith does use more than one formal part of speech, which is a legacy of its origin as an artistic language for fictional aliens, but it is a concatenative language, and as such didn't strictly need more than one part of speech from an engineering point of view. One could argue that differing arities make for different parts of speech--but then, for consistency, we would have to consider intransitive, transitive, and ditransitive English verbs to all be different parts of speech as well, and no one does.

Nevertheless, we still have another option: David Gil's IMA grammar, introduced in the paper "How Much Grammar Does It Take To Sail A Boat?" In such a language, the lexicon is strictly monocategorial, with all words belonging to the syntactic class S (or Sentence), and the grammar is strictly isolating and associational. In other words, the only syntactic rule is that two words or phrases that are next to each other can form a larger constituent, and the two parts of a constituent are interpreted as being associated with each other in some way. Unlike concatenative grammars, which are completely structurally unambiguous, an IMA grammar is almost maximally ambiguous--exactly how to group words into a parse tree, and exactly what kind of association each grouping has, is all left up to context and pragmatics. It may seem that no language could possibly actually function that way, and indeed all human languages do have some function words at the very least; however, contrary to Tom's intuition, they often aren't strictly necessary. They exist because, in the words of William Annis, "people be extra", and as David Gil demonstrates, it is actually possible in some languages to find surprisingly extensive examples of people conversing in pure IMA form, without resorting to any function words. It works because language always exists in a larger context, and people are really good at pragmatic inference.
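To get a feel for what "almost maximally ambiguous" means structurally, here is a small Python sketch that enumerates every way the single IMA rule (adjacent words or phrases may pair up into a constituent) can group an utterance. The function name `bracketings` and the example words are hypothetical, chosen only for illustration; the sketch counts groupings only and says nothing about which association each grouping carries.

```python
def bracketings(words):
    """Yield every binary grouping of a word sequence as nested tuples."""
    if len(words) == 1:
        yield words[0]
        return
    # Split the sequence at every possible point and combine the
    # groupings of the left part with the groupings of the right part.
    for split in range(1, len(words)):
        for left in bracketings(words[:split]):
            for right in bracketings(words[split:]):
                yield (left, right)

utterance = ["dog", "chase", "cat", "yesterday"]  # hypothetical example words
for parse in bracketings(utterance):
    print(parse)
```

Four words already admit five groupings, and the count grows as the Catalan numbers (14 for five words, 42 for six), before even considering what kind of association each pairing expresses. That is the load that context and pragmatics are carrying.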

So, there you go. That's how simple a grammar can really be.

So, what of AllNoun? I have rather complicated feelings towards it. Not until sitting down to write this review did I realize that, actually, there's quite a bit that it does well. If we ignore what its creator wanted from it, and treat it as just another loglang, it's pretty neat. It could serve as a great basis on which to construct other languages with other goals, and it does at least one cool thing that I really think is worth playing with more extensively. But I really want to dislike it. From the first time I ever encountered it, it always struck me as a wrong thing--not the way to do what it was supposed to do, and in the unfortunate position of being one of the standard examples of "verbless language" for a generation of conlangers, and thus promulgating a very flawed view of what a minimalist language can really be. It was not until many years later that I formalized my idea of "spitelanging"--creating conlangs just to prove other people wrong about what a language could or could not be like--but early exposure to AllNoun may very well have been the initial spark that primed me to take up that cause!

Monday, March 28, 2022

Some Thoughts on Machi Languages

Machi and Bogomol are a pair of alien languages created by Terrance Donnelly for the mantis-like inhabitants of Amaterasu, a fictional Earthlike planet of Epsilon Indi (who are also called machi).

The machi's vocal apparatus acts much like a flute, producing 15 distinct notes per individual. The machi have a limited ability to actively adjust their vocal tract length to vary their fundamental frequency (useful when singing to bring different individuals in tune). Nevertheless, the range of fundamental frequencies available to a given individual varies as they grow (and thus as their vocal tract grows), and between individuals--thus, Terrance accurately realized that, just like human speech which must be producible by a wide range of differing human vocal tracts, and unlike engineered languages like Solresol, machi languages could not be based on specific absolute pitches. Humans recognize phonemes despite varying absolute frequencies by comparing the ratios of multiple resonant formants. Similarly, machi languages use multi-note "syllables", which can be reliably identified by the ratios between successive notes; the eponymous Machi language uses two-note syllables, with the syllable count increased by allowing for staccato vs. legato articulation and the use of rapid triplets in place of individual notes. In contrast, Bogomol words are structured as sets of three-note syllables, with no distinctions of articulation type.
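The ratio idea is easy to sketch in Python. Everything specific here is an assumption made for illustration: the documentation quoted in this post does not say how the 15 notes are spaced, so the sketch simply measures intervals in 12 equal steps per octave, and the names `interval_steps` and `syllable` are hypothetical.

```python
import math

def interval_steps(first_hz, second_hz, steps_per_octave=12):
    """Interval between two notes in scale steps (assumed equal-tempered)."""
    return round(steps_per_octave * math.log2(second_hz / first_hz))

def syllable(notes_hz):
    """Identify a syllable by the intervals between its successive notes."""
    return tuple(interval_steps(a, b) for a, b in zip(notes_hz, notes_hz[1:]))

# Two speakers with different fundamentals producing the same two-note syllable:
print(syllable([220.0, 330.0]) == syllable([300.0, 450.0]))

# The same comparison for a three-note, Bogomol-style syllable:
print(syllable([200.0, 250.0, 300.0]) == syllable([320.0, 400.0, 480.0]))
```

Because only ratios enter the calculation, the speaker's absolute pitch drops out entirely, and a listener never needs to be told where anyone's scale starts.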

Unfortunately, Terrance seems not to have been able to fully commit himself to the idea that individual syllables could be disambiguated on their own. In the description of Machi, it states:
Since a machi can theoretically have any fundamental pitch, it is customary for strangers to precede at least the first few utterances with a "reference" syllable consisting of the speaker's lowest pitch, first in long, and then in short, duration.

And for Bogomol:

Instead of a formal, established syllable, Bogomol machi simply precede their utterances when necessary with a long note at their fundamental tone as reference.

This seems very forced and unnatural to me. Even in the absence of simultaneous frequency ratio information, as when interpreting lexical tone in a register-tone language or when listening to single-formant whistled speech, humans are quite capable of inferring the appropriate baseline for a given speaker from a short series of successive notes, and never have to consciously communicate about that baseline. I would expect that aliens who are physiologically restricted to something roughly like whistled speech would only get even better at that. If we take the in-world premise of the documentation seriously, however, the surface level evidence of this cultural practice of the machi can, I think, be saved, by interpreting the human documentarian as an unreliable narrator who misunderstands the purpose of these simple utterances--they could be simple paralinguistic utterances, e.g. for floor-claiming or introduction, which entirely by chance happen to be useful to English and Russian-speaking investigators for identifying individuals and their individual scales.

The grammars of Machi and Bogomol are quite different--although the surviving documentation on Bogomol grammar is quite sparse, such that I might call it a "fictional language" rather than a "constructed language", especially since the in-world documentation (which states that research on Bogomol is extremely difficult due to the remote environment in which its speakers live) supports the hypothesis that its grammar may have never been worked out in detail in the first place. What is documented for Bogomol, however, is a very neat conceit; all Bogomol content words are highly polysemous, but, rather than expecting the correct meaning to be interpreted from discourse context, Bogomol sentences are always concluded with a "rank" specifier, which retroactively specifies the correct set of meanings to activate. This is similar to an idea I've had bouncing around for a while to use pre-posed "domain specifiers" to reduce lexical ambiguity, but by choosing to shift that function to the end, Bogomol universally gains an implicature effect similar to what can be done with Fith by choosing to float arguments on the stack. I don't find it particularly realistic, as I doubt any intelligent species would put up with the necessity for that level of poetic processing in every single utterance, but as a sci-fi conceit it is neat.

Machi is a verbless language using the relational-tagging strategy, using special label-words to identify all of the roles played by all of the participants in an event without needing a distinct verb phrase to specify the event itself. Objects and labels are the main parts of speech (so it is clearly not a monocategorial language), and it also has a heterogeneous class of syntactic function words, which it calls "parsers", for coordination, listing, and subordination. Overall, by not going all the way to monocategoriality, and maintaining the lexical-functional distinction, but instead inventing new lexical parts of speech slightly different from what is common in human languages, I feel like Machi does an excellent job of looking like a plausible alien language in grammar as well as phonology.


Some Thoughts... Index

Sunday, March 27, 2022

I Wrote & Illustrated a Book

If you've followed me for a while, you might know that I have been developing a 4-dimensional video game. I last wrote about it back in January of 2017, in this series of posts.

Well, 5 years later it's still not ready for public release, because I have a day job and am not a full-time game developer. However, last year, I realized there was a way to bring the joy of higher-dimensional maze navigation to the masses much more quickly!

The result is this book: Maz3s: Puzzles in Three Dimensions

Just like the video game presents you with a 4D maze which can be viewed only in 3D slices, this book contains 3D mazes presented in the form of 2D slices (which nicely fit on a 2D sheet of paper). Unlike the video game, however, you can in fact see all of the slices at the same time, side-by-side. Originally, I had planned on putting successive slices on successive pages, so that the 3D structure of each maze would actually physically exist in the structure of the book--but I was informed that that would be unnecessarily difficult; perhaps for the sequel!

Solutions to each maze are provided in the form of "driving directions" if you want them--but the book also includes generic instructions and suggestions for higher-dimensional maze solving techniques.

Maz3s is currently available in English for Kindle and in paperback format. Translated editions with instructions and solutions are in progress for Spanish, French, Chinese, and Russian. If you are willing to write a review, or want to help with further translated editions, email me or DM me on Twitter for a free PDF copy.

Praise for Maz3s:

"I was astonished at how fast I started giggling. Delightfully fun concept, skillfully executed. The world needs more Maz3s!"

                --Best-selling author Michaelbrent Collings

"Just let me finish this next one...."

                --My Dad

"My dad stayed up past his bedtime because he was having too much fun with it."

                --Me

Saturday, March 12, 2022

The Trilingual Fiction of Eric James Stone

If you haven't heard of Eric Stone, yet... well, you probably don't read a lot of SF short stories.

Eric is a master of short stories, in a wide range of spec. fic. genres–sword & sorcery, space opera, alternate history, and more. He's been published in uncountable magazines and anthologies (well, technically, they are countable–finite, even!–but there's enough of them that I didn't feel like counting), and is a winner of the Writers of the Future and Nebula awards. He is currently serializing the epic fantasy novel Heir of the Line on Kindle Vella, and has two single-author short story collections: Rejiggering the Thingamajig and Other Stories and The Humans in the Walls and Other Stories.

We'll be looking at some of those stories, but my real reason for writing this review is Eric's near-future sci-fi thriller Unforgettable.

(As usual, all Amazon links in this article are Affiliate links–so if you feel like giving Eric some money, I'll get a small cut.)

Eric is trilingual in English, Spanish, and Italian, and has a great love for Russian techno/pop music (a fact that I discovered when we carpooled to WorldCon in 2011). With this background, I expected to see more secondary language representation in his stories when I started looking for it, but it's actually surprisingly sparse. His knowledge of Italian shows up on only one page of one story–"Tabloid Reporter to the Stars", which appears in Rejiggering the Thingamajig and Other Stories (to the best of my knowledge, anyway, though perhaps there is a story I missed; if it wasn't in one of those collections, it wasn't easily available on my bookshelf for perusal). And that is only three words!

"If they are gray humanoids with bulging heads, they greet you as an old friend, eh, paesano?"
There was Italian ancestry on my mother's side, so he'd taken to calling me paesano, countryman.

"Paesano" is initially introduced here in a syntactic position which makes it Obviously a term of address. However, in this case, the specific meaning is not Irrelevant, as it is used to establish something about the main character's background. Thus, we get an immediate follow-up with semi-diegetic appositive translation. (The diegetic status of comments by a first-person character narrator made to the audience is somewhat unclear.)

He thought a moment, then laughed. "Buffo. But what you think? [...]"

Here we have an interjection, which is the classic form of Making it Irrelevant. If you happen to know Italian, or just look it up, it turns out to be a semantically appropriate interjection, but if you don't, it just doesn't matter.

He nodded. "Interessante."

This usage I would class as a very specialized form of Making it Obvious; specifically, Eric is relying on the semantic and graphological similarity of the Italian "interessante" and the English "interesting" to allow the expected Anglophone reader to infer appropriate meaning. This is a dangerous thing to do, but note that he's got a metaphorical parachute here–the scene makes sense with no dialog on this line at all. He nodded, and we all know what that means. So, we could read this as an instance of Making it Irrelevant if we wanted to. This, we will see, becomes a pattern!

I particularly expected to encounter some Spanish in Eric's alternate history Argentinian Empire stories–"By the Hands of Juan Perón" and "A Member of the Peronista Party", both collected in The Humans in the Walls and Other Stories. But, well... I didn't! Most of the dialog in both of these stories would be in Spanish, necessitating a standard narrative translation convention to make it accessible to the Anglophone audience. As we saw in Graham Bradley's Kill the Beast, it is entirely possible to break the translation convention for short periods in order to Show the underlying diegetic language, which in that case is used to help better establish the setting and the cultural background of the characters. Is this perhaps a matter of what works better for a short story vs. a novel or novella? Probably not, but then, I haven't reviewed a lot of short stories yet, and unlike Eric, I am not a master of the form!  I did ask Eric himself what was going on here, and he does not remember whether he thought about this issue or not when writing those stories–an unfortunately extremely common authorial response, and a large part of the reason for me writing these reviews!

Now, onto the juicy bits. Eric James Stone's only trilingual work, the novel Unforgettable, somewhat surprisingly makes use of the language that he is not actually fluent in, featuring a little bit of transliterated Russian in addition to Spanish (and a smidgen of Portuguese, which I guess actually makes it quadrilingual). The plot does meander through Rome for a bit, so there was opportunity for showing off some Italian, but that opportunity was not taken. A number of other languages are relevant (notably, Farsi), but not explicitly shown in the text.

Conveniently, Eric uses the typical convention of italicizing non-English text, so it was fairly easy to skim through for every example. (Whether or not this is a good convention in other respects is a whole other question–one which I may have to find a way to address at some point.) Our first exposure to not-English comes in this scene:

The warm scent of melted cheese escaped from the top box. "Sesenta y dos euros," I said.
The guard said something in rapid-fire Spanish.
With a shrug, I said, "No hablo bien. Americano."
"Who order pizzas?" asked the guard.

The first bit of Spanish is essentially Irrelevant (or perhaps, an Easter Egg); it's something that a pizza delivery guy would say, but if you don't get it, it won't impact the scene. The second bit is Made Obvious by the context of the surrounding two lines–the guard says something in Spanish, then switches to English in response. We are helped out by the similar-to-English word "Americano", but you can probably figure out that this says something like "I don't speak Spanish good" (or the moral equivalent thereof) even if you've never seen or heard a single bit of Spanish in your life.

Next, we get straight diegetic translation:

Then, pretending to remember something, I added, "Seventh floor. Piso siete."

Semi-diegetic explanation:

"¿Dónde está el baño?" I asked.
That was the most useful phrase in the world, for me at least. I could ask where the bathroom was in fifteen different languages.

And more semi-diegetic appositive translation:

The sign on the door read Criptografía Cuántica–Quantum Cryptography–so it looked like the CIA's source was right on the money.

Next, there's a bit of scene-based Making It Obvious:

"¡Alto!" said the guard, swinging the gun toward me.
I raised my hands.

What does a guard say while pointing a gun at you that makes you raise your hands? (It's not a literal word-for-word translation, but nothing ever is.) 

And then we get something which looks a lot like relying on graphological familiarity again:

"I warned you not to trust her," I said.
"Silencio," said Carlos.

Now, the last bit of Spanish we encounter turns out to be fairly critical to the plot. If you actually speak Spanish, it could even constitute a spoiler–a pretty extreme case of Easter Egging:

We raced past a sign that read Laboratorio de Entrelazar. I stopped running, forcing Yelena to stop as well. Did that mean laboratory of something-lasers?
[...]
At the far end, a pencil-thick shaft of bright violet light hit a prism and split into two weaker beams that extended into holes in the wall.
"That must be the entrelazar," I said.

This is especially fascinating in light of the cases we have previously seen where Eric seems to rely on graphological similarity to English to get the reader to infer the correct meaning. In this case, the character, who also does not actually speak Spanish (as he explained to the guard when dropping off pizzas), is the one making the graphological/phonological inference, and presenting that "translation" to the reader. But in fact (spoiler alert! highlight to read):

"What did it say on the lab door? The exact words?"
"Laboratorio de Entrelazar."
"Entrelazar? You're sure about that?" His voice was excited.
"Yes. What does it mean?"
"Literally, it means to interlace. [...]"

And if you want the rest of that sentence, you'll have to go buy the book yourself!

I find myself absolutely fascinated by this repeated feature of Eric's secondary language use of relying at least partially on graphological similarity to prompt understanding–and its subversion! This is the first genuinely new technique I have encountered since This Darkness Light, reviewed in episode 3 of this series. It is somewhat dangerous, as it makes more assumptions about the expectations and background knowledge of the reader than other techniques do, and thus I would never recommend trying it in isolation, but it's a neat thing to do when you can back it up with a safety net of other techniques.

There is far less Russian representation. First, we have these repeated uses of diegetic translation:

I said, "Nye dvigat'sya," and added, "Don't move," in case he was bilingual.
[...]
I aimed the gun at him and said "Nye dvigat'sya. Don't move."

And then a bit of Making It Obvious:

"Then I will go and catch a plane. Do svidaniya." She walked to the door and opened it.

You're talking about leaving, opening a door, what's the most likely thing to say? (If you guessed "Goodbye"--or, more literally, "until meeting"--give yourself a pat on the back. But not too many pats, because the whole point was to make it not that hard!)

And our last smidgen of secondary language is that bit of Portuguese:

I looked at Luiz. "Qual problema?" I asked in my limited Portuguese. He rose cautiously to look over the bar, then stood up all the way. "Sorry, senhor. I think I see gun. Maybe is camera?"

Again, we have some reliance on graphological similarity, but with the safety net of really being kind of Irrelevant--if the protagonist said nothing to Luiz, and if Luiz left out the vocative, no critical meaning would be lost.

If you liked this post, please consider making a small donation.

The Linguistically Interesting Media Index

P. S. The title of this post is kind of a lie, but only because of my own technical terminology; a proper trilingual work would be one which does not need to put any work into integrating multiple secondary languages, because it is in fact written for a trilingual audience and has no "secondary" languages. This is, in fact, closely connected with the issue of whether or not it is a good idea to identify secondary-language content with italics.

It's been a while since my last post in this series, and I blame Ken Liu entirely for that. I had intended to do reviews of some of his short stories, but some of them are just so dang depressing that it kinda put me off the whole project for a while. However, I have thoughts on A Desolation Called Peace (the sequel to A Memory Called Empire), The Termite Queen, and a bunch of Ken Liu stories in the pipeline, so stay tuned!