For example, all the phonemes are a minimum distance away from each other that guarantees people with slightly less acute hearing can understand it when spoken under slightly adverse conditions. In-between phonemes that are possible to pronounce, but potentially difficult to hear correctly, are then reserved for constructing 'conlangs', constructed languages, many of which use 'Baseline' as a baseline but add new short words using the expanded phoneme set.
That seems... not to be super well supported by the data? Like, it appears to contain all three of s/θ/f, which are easily confusable in low-fidelity audio environments. (It's actually rather difficult to figure out what the objective perceptual distance between different phones is, independent of biases induced by a test subject's pre-existing knowledge of any specific language; the closest I could find to that kind of research is the planning that went into designing the NATO Phonetic Alphabet--but even that is optimized to avoid confusion by speakers of particular popular languages, which is overconstrained for our purposes here. However, when native speakers of some language--like English--do in fact sometimes confuse phonemes of their own language, that seems like strong evidence that the underlying phones are actually pretty close!)
However, fortunately for us, the character who speaks that paragraph is not specifically trained in linguistics, and may not know exactly what he's talking about--and there are other constraints on the design of Baseline which may conflict with that one, such that the optimal design for Baseline phonology is not one which optimizes distinctness-of-phonemes in isolation. In particular, Baseline speakers seem to have a strong sense of syllables as the most salient components of word structure, and count of syllables as the obvious way to measure utterance length; and, they value having short words and short utterances for concepts that are common in their culture. Thus, we can also expect to have a large phonemic inventory to allow for the maximum number of individual syllables, maximum information per syllable, and maximal number of short words, which is in direct conflict with keeping individual phones as far apart from each other in acoustic space as possible.
By skimming all of the "Planecrash" stories (about dath ilani people who are in a plane crash, and get isekaied to various other fantasy worlds to have culture shock in), I have extracted a total of five actual Baseline words-that-are-not-names:
And then a bunch of personal names:
Most names have two syllables; a few (4 in this list) have 3, or maybe 4. "Bahb" is the only one-syllable name, but I don't think that is actually representative of any real name used for a dath ilani person, as it appears in a context where it is clearly meant to be a transcription of the English name "Bob", as part of the set "Alis, Bahb, and Karal", standing in for "Alice, Bob, and Carol", the standard placeholder names for participants in a cryptographic protocol. "Bohob" seems to be an alternative adaptation of "Bob" that fits Baseline naming patterns better. In combination with "Bahdhi", though, the orthographic possibility of "Bahb" suggests the existence of <a> and <ah> as separate vowels. If <h> can only occur in onset positions, there would be minimal ambiguity introduced in the Anglicization by adopting that convention. <Illeia> could be a four-syllable name, but we have a negative example in that <Athpechya> is presented as a dath ilani equivalent for a non-Baseline 4-syllable name, which has been cut down to 3 syllables (assuming <y> is to be interpreted as a consonant). Thus, I am inclined to interpret that intervocalic <i> as a transcriptional variant of <y>, much like <c> is a transcriptional variant of <k>, rather than as a whole extra syllable.
From this data, I conclude that Baseline has a 6 vowel system:
|Hi||/i/ <i>||/u/ <u>|
|Mid||/ɛ/ <e>||/o/ <o>|
|Low||/æ/ <a>||/ɑ/ <ah>|
with three degrees of height, a binary front-back distinction, and rounding in the back non-low vowels.
I would like the <e> vowel to be a little higher, to maximize contrast with /æ/, but we've got an explicit negative example where the dath ilani Merrin struggles to pronounce the French name "Félix", which
p - /p/
d - /d/
k/c - /k/
f - /f/
s - /s/
th - /θ/
sh - /ʃ/
h - /h/
ts - /t͡s/
l - /ɫ/ (for maximal distinctiveness from /j/, I'm assuming this to be universally a dark/velarized l, rather than copying English's light/dark allophony; the presence of this and /v/ justifies the lack of /w/)
r - /r/ (for maximal distinctiveness from /l/, I'll assume this to be a tap/trill even though that's not the most natural reading for most Anglophones).
y - /j/
m - /m/
n - /n/
The lack of /g/ is not typologically odd, but the lack of isolated /t/ (assuming that <ts> is, in fact, an affricate, which seems reasonable given the existence of <ch> and the lack of other /Cs/ clusters in onset positions) in the presence of /p/ and /d/ is a bizarre gap. On that basis, and because there seems to be a fairly robust voicing distinction in the affricates, I infer that there should also be /t/ and /g/ phonemes, even though they happen to be missing from this dataset. Additionally, I feel we ought to fill in unattested */ʒ/, */d͡z/, and */d͡ʒ/, on the basis that, having decided that voicing was usefully distinctive for all other obstruents, the in-world engineers of Baseline wouldn't have just left those specific place/manner combinations unused!
|Plosive||p b||t d||k g||(ʔ)|
|Fricative||f v||θ ð||s z||ʃ ʒ||h|
|Affricate||t͡s d͡z||t͡ʃ d͡ʒ|
The fricatives are a little bit weird; I probably would have dropped θ/ð and h in exchange for x/ɣ to maximize distinctiveness and get slightly better correspondence between fricative and plosive series. But perhaps the in-world justification is that they just Wanted More Options for making more short words, and the possibility of x/h confusion pushed for pulling in the dental fricatives instead, despite the labial/dental/alveolar confusability. And for the plosives, I think it would make sense if all of the voiceless plosives were also secondarily aspirated--we've only got two plosive series, so we might as well make them as phonetically distinctive as possible!
- Syllables have the form (C1)V((r)C2)(s|z), where:
- C1 is any consonant.
- C2 is any consonant except /h/
- The optional /r/ cannot occur before another /r/ in the C2 slot.
- The optional final sibilant cannot occur after another sibilant in the C2 slot.
- /s/ cannot occur after voiced stops/fricatives
- /z/ cannot occur after voiceless stops/fricatives
- A syllable cannot end with the same consonant with which the next syllable starts (nor should t/d precede t͡s/d͡z or t͡ʃ/d͡ʒ, respectively).
- Vowels cannot occur in hiatus, and l and r cannot occur in hiatus with themselves, with extra-syllabic glottal stops being inserted for repair.
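To make those constraints concrete, here is a minimal Python sketch of a checker for the syllable template and the same-consonant boundary rule. Several things in it are my assumptions rather than anything established above: phonemes are passed as lists of strings, affricates are written without tie bars ("ts", "tʃ", and so on), "sibilant" is taken to include the affricates, and "t/d ... respectively" is read as t before ts/tʃ and d before dz/dʒ. The function names are made up for illustration, and the hiatus repair with glottal stops is not modeled.

```python
from itertools import product

# Inventory as proposed in the text; affricates written without tie bars.
VOWELS = {"i", "u", "ɛ", "o", "æ", "ɑ"}
VOICELESS = {"p", "t", "k", "f", "θ", "s", "ʃ", "h"}  # voiceless stops/fricatives
VOICED = {"b", "d", "g", "v", "ð", "z", "ʒ"}          # voiced stops/fricatives
AFFRICATES = {"ts", "dz", "tʃ", "dʒ"}
SONORANTS = {"l", "r", "j", "m", "n"}
CONSONANTS = VOICELESS | VOICED | AFFRICATES | SONORANTS
# Assumption: the affricates count as sibilants for the final (s|z) rule.
SIBILANTS = {"s", "z", "ʃ", "ʒ"} | AFFRICATES


def valid_coda(coda):
    """Try every way of filling the ((r)C2)(s|z) slots with the given phonemes."""
    for has_r, has_c2, has_sib in product((False, True), repeat=3):
        if has_r and not has_c2:
            continue  # the optional r only exists in front of a C2
        if has_r + has_c2 + has_sib != len(coda):
            continue
        i = 0
        if has_r:
            if coda[i] != "r":
                continue
            i += 1
        c2 = None
        if has_c2:
            c2 = coda[i]
            i += 1
            if c2 not in CONSONANTS or c2 == "h":
                continue  # C2 is any consonant except /h/
            if has_r and c2 == "r":
                continue  # no r before another r
        if has_sib:
            sib = coda[i]
            if sib not in ("s", "z"):
                continue
            if c2 in SIBILANTS:
                continue  # no final sibilant after a sibilant C2
            if sib == "s" and c2 in VOICED:
                continue  # no /s/ after voiced stops/fricatives
            if sib == "z" and c2 in VOICELESS:
                continue  # no /z/ after voiceless stops/fricatives
        return True
    return False


def is_valid_syllable(phones):
    """Check a list of phonemes against (C1)V((r)C2)(s|z)."""
    rest = list(phones)
    if rest and rest[0] in CONSONANTS:
        rest = rest[1:]  # optional single-consonant onset
    if not rest or rest[0] not in VOWELS:
        return False
    return valid_coda(rest[1:])


def valid_boundary(left, right):
    """Check the cross-syllable consonant rule for two adjacent syllables."""
    last, first = left[-1], right[0]
    if last in CONSONANTS and last == first:
        return False
    # Assumed reading of "t/d ... respectively".
    return (last, first) not in {("t", "ts"), ("t", "tʃ"), ("d", "dz"), ("d", "dʒ")}
```

Under these assumptions, `is_valid_syllable(["l", "o", "r", "m"])` accepts the attested syllable "lorm", while something like `["k", "æ", "d", "s"]` is rejected because /s/ follows a voiced stop. Trying every slot assignment is overkill for a template this small, but it avoids having to decide up front whether a lone coda /s/ is a C2 or the final sibilant.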
Making codas more complex than onsets is just weird, and I cannot justify that in-world at all, but that seems to be where the available data is pointing. Maybe it allows sub-syllable-level suffixing/infixing morphology?
We have no data on tone or stress, so I assume by default that Baseline has some sort of non-lexical, predictable stress system--e.g., strict initial stress. However, based on characters commenting on how many syllables are required to say something in various languages, and treating syllable count as a reliable measure of how long an utterance is / how much effort it takes to express something, I infer that the language is syllable-timed, rather than stress- or mora-timed.
Making another default assumption that the maximum onset principle for syllabification applies, the attested syllables are as follows:
i il im
beth bi bo
ka kar kel ko
lan le lim lis lorm
ma mel mer mi
ral ran rez rin run
thal tham thel thin thor