Gliese 1337: conscript


Sunday, October 20, 2024

On the Tjugem Alphabet & Font

This Bluesky thread with Howard Tayler reminded me that, although I posted progress updates about it on Twitter back in the day, I never did a comprehensive write-up on how the thing works.

A good place to start is this Reddit comment on Toki Suli. Yeah, it's not Tjugem, but phonetically it works the same way. Quote:

in the WAV files, the 'm' sounds seem to be going up rather than down, such as with "mi", even though the "m" is supposed to be grave. sharp and acute sounds seem to go down rather than up, such as in "tu".
is the linguistic term for "downward" vs "upward" the opposite of what i'd expect from a western music theory perspective? or am i maybe missing something as i'm listening to the files?

Yes, Reddit user, you were missing something! Because in the phonetics of human whistle registers, "grave" and "acute" are positions, not motions. So, if you move from a vowel to a grave consonant, the formant will go down in pitch--from a middle-pitch vowel locus to a low-pitch consonant locus. But when going from a grave consonant to a vowel, pitch will go up--from a low-pitch consonant locus to a middle-pitch vowel locus. An "m" in between two vowels will be realized by a down-then-up formant motion, while a "t" between two vowels will be realized by an up-then-down motion.
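To make the "positions, not motions" point concrete, here is a minimal sketch in Python. The numeric levels and the `contour` helper are made up for illustration; only the relative order (grave below the vowel locus, acute above it) comes from the explanation above, and anything that isn't "m" or "t" is simply treated as a vowel.

```python
# Hypothetical pitch levels: only their relative order matters here.
LOCUS = {"grave": 0, "vowel": 1, "acute": 2}

# The two consonants used as examples in the text; everything else is
# treated as a vowel, which is a simplification for this sketch.
CONSONANT_CLASS = {"m": "grave", "t": "acute"}


def contour(segments):
    """Return the pitch motion between each pair of adjacent segments."""
    levels = [LOCUS[CONSONANT_CLASS.get(s, "vowel")] for s in segments]
    moves = []
    for prev, cur in zip(levels, levels[1:]):
        if cur > prev:
            moves.append("up")
        elif cur < prev:
            moves.append("down")
        else:
            moves.append("level")
    return moves


print(contour(["a", "m", "a"]))  # ['down', 'up']
print(contour(["a", "t", "a"]))  # ['up', 'down']
print(contour(["m", "i"]))       # ['up']
print(contour(["t", "u"]))       # ['down']
```

The last two lines are exactly what the Reddit commenter heard: the motion out of a grave "m" goes up, and the motion out of an acute "t" goes down, because the consonant is where the pitch is, not which way it is heading.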

Now, because whistled speech only has a single formant, it turns out to be not-unreasonable to write whistled speech as an image of the formant path on a spectrogram. You can just write a continuous line with a pen! Or, almost. There are some details--like amplitude variation--that are lost if you try to write with a ballpoint, and still difficult to get right if you write with a wide-tip marker or fountain pen. Thus, a few extra embellishments and decorations are useful, but that is the basic concept: each letter is just the shape that that letter makes on a spectrogram when pronounced. And with just that background, you should be able to start to make sense of this chart of Tjugem letters, as they would be written on lined paper:

The correspondence between Tjugem glyphs and the standard romanization is as follows:

Keep in mind, however, that the actual phonemes are whistles--not sounds that are representable with the IPA, despite the fact that the romanization is designed to be pronounceable "normally" if you really want to. And for the sake of space, only the allographs for one vowel environment are shown for each consonant. The G glyph is not so much a "glyph" as a lack of one, which is why it does not show up in the first image; acoustically, the phoneme is just a reduction in the amplitude of a vowel, represented by a break in the line. Thus, any line termination could be interpreted as a G. That necessitated the introduction of the line termination glyphs, which have no phonetic value but just indicate that a word ends with no phonemic consonant. The above-line vs. below-line variants of the Q glyph are chosen to visually balance what comes before or after them. Additionally, the "schwa" vowel (romanized as "E") is not represented by any specific glyph. The existence of a schwa sound in the first place is an unavoidable artifact of the fact that transitioning between certain consonants requires moving through the vowel space, but which vowel loci end up being hit isn't actually important. So, in the Tjugem script, the schwa just turns into whatever stroke happens to make the simplest connection between adjacent consonants.

You shouldn't be expected to always be writing on lined paper, which explains the extra lines--a mark above or below a vowel segment tells you whether it is a high vowel or a low vowel, for those curves which could be ambiguous. And the circular embellishments help to distinguish manner of articulation for different consonants, which have the same spectral shape but different amplitude curves, which would otherwise have to be indicated by varying darkness or line weight. But note in particular that every consonant comes in a pair of mirror-symmetric glyphs: one moving from the vowel space to the consonant locus, and one moving from the consonant locus to the vowel space. And there are three different strokes for each half-consonant depending on which vowel is next to it! Making for a total of six different strokes for every consonant, because the actual spectral shapes of consonants change depending on their environment! It's allophony directly mirrored in allography.

This makes creating a font for Tjugem rather... complicated. Sure, we could assign every allograph to a different codepoint, but that would be very inconvenient to use. It would be nice if we could just type out a sequence of phonemes, one keystroke per phoneme, and have the font take care of the allographic variation for us! Is that sort of thing possible? Yes! Yes, it is!

The individual letter forms get assigned to a list of display symbols, specifying every possible consonant/vowel pairing:

# i_t i_d i_n i_k i_g i_q i_p i_b i_m
# a_t a_d a_n w_a_k j_a_k a_g w_a_q j_a_q a_p a_b a_m
# u_t u_d u_n u_k u_g u_q u_p u_b u_m
# t_i d_i n_i k_i g_i q_i p_i b_i m_i
# t_a d_a n_a k_a g_a q_a p_a b_a m_a
# t_u d_u n_u k_u g_u q_u p_u b_u m_u
# i_i j_a j_u_a j_u
# u_u w_a w_i_a w_i

and the slots for the romanized letters that we actually type out (a b d e g i j k m n p q t u w) are left blank. Contextual ligatures are then used to replace the sequence of input phonemes with an expanded sequence of intermediate initial, final, and transitional symbols, which are then finally substituted by the appropriate display symbols, which are then used to look up the correct alloglyphs. Then, if we update the boring straight-ruled glyph set with a slanted, more flowy-looking version, we can get a calligraphic font slightly reminiscent of Nastaliq, where lines can overlap each other because the ornamentation disambiguates; the Tjugem Tadpole script:
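Going back to the substitution pipeline for a moment: the core idea of turning typed phonemes into half-consonant display symbols can be sketched in a few lines of Python. This is not the font's actual ligature rules, just an illustration under loud assumptions: it skips the intermediate initial/final/transitional stage entirely, ignores the glides (j, w), the schwa (e), and the line-termination glyphs, and refuses the a+k and a+q pairings because the list above only has the glide-conditioned variants (w_a_k, j_a_k, w_a_q, j_a_q) for those. The function name and the input "tinu" are made up for the example.

```python
VOWELS = "iau"
CONSONANTS = "tdnkgqpbm"

# The "simple" display symbols from the list above: vowel-to-consonant and
# consonant-to-vowel strokes. a_k and a_q are excluded because the list only
# provides glide-conditioned variants for them.
SIMPLE_SYMBOLS = (
    {f"{v}_{c}" for v in VOWELS for c in CONSONANTS} - {"a_k", "a_q"}
) | {f"{c}_{v}" for c in CONSONANTS for v in VOWELS}


def half_consonant_symbols(word):
    """Map each adjacent phoneme pair to a display-symbol name like 'i_t'."""
    symbols = []
    for left, right in zip(word, word[1:]):
        name = f"{left}_{right}"
        if name not in SIMPLE_SYMBOLS:
            raise ValueError(f"no simple display symbol for {left!r}+{right!r}")
        symbols.append(name)
    return symbols


print(half_consonant_symbols("tinu"))  # ['t_i', 'i_n', 'n_u']
```

Each consonant in the middle of a word shows up twice in the output (once as the end of a stroke into it, once as the start of a stroke out of it), which is the mirror-symmetric pairing described earlier.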

Tuesday, December 7, 2021

The Transgalactic Guide to Solar System M-17

I have been meaning to review the next three books in the Steerswoman series for months now, and continually failing to actually do so. And when I started reading The Transgalactic Guide, I had no idea that it would present me with any reason to review it here. But then it went and had alien language content (and what was I supposed to do? Not blog about that!?), so here we go...

The Transgalactic Guide to Solar System M-17 (beware the Amazon Affiliate link!) by Jeff Rovin is a satirical science-fiction travel guide, supposedly published by the Transgalactic touring company to describe the tourist attractions and accommodations available in the eponymous solar system M-17 (but actually published by the Perigee imprint of the Putnam Publishing Group in 1981). As science fiction, it is decidedly archaic; the author doesn't really give a crap about physical, chemical, or ecological plausibility, such that aside from the genre trappings of spaceships and alien planets, it's really more a work of fantasy--think C.S. Lewis's Space trilogy with less plot, less allegory, and more description of the bad science. Rovin makes repeated use of the "make them alien by not giving them eyes" trope (which Wayne Barlowe has applied to much greater effect), which is sometimes justified and sometimes... not so much. Nevertheless, for the modern author it may provide a decent source of inspiration for weird and interesting environments and creatures, if you are willing to do the work to clean them up a bit for modern audiences.

But the reason I am bothering to review it here is that each of the 5 worlds of M-17 has at least one, and sometimes several, native alien languages which are represented in the text, along with brief tourist glossaries. As conlangs go, they are also... not great, although there are some neat ideas. There is some decent effort put into the Alladis logography (which is supposedly tactile in nature), and into the basic idea of an Oleran scent-based language (a concept which is developed in more detail in the Semiosis duology).

Excerpts of alien languages are frequently integrated into the text to refer to alien concepts or proper nouns. For the most part, a very straightforward translation strategy, using appositives or parenthesized translations, is employed--and that's really all you would expect from something presenting itself in the style of a travel guide! However, I found it notable that chapter 3, on the planet Morana, actually attempts to Teach The Reader, employing italicized alien words untranslated to refer to objects and locations after they are first introduced; the attempt is not particularly skillful, but it is there!


The Linguistically Interesting Media Index

Monday, September 5, 2016

General Thoughts on Writing Signs

In my last post, I began to develop an essentially featural writing system for an as-yet undeveloped sign language. Featural writing systems are extremely rare among natural oral languages, but every system for writing sign languages that I know of is featural in some way. So, why is this?

Let's examine some of the possible alternatives. The simplest way to write sign languages, for a certain value of "simple", would be to use logograms. Just as the logograms used to write, e.g., Mandarin, do not necessarily have any connection whatsoever to the way the words of the language are pronounced, logograms for a signed language need not have any systematic relation to how words are signed. Thus, the fact that the language's primary modality is signing becomes irrelevant, and a signed language can be just as "easy" to write as Chinese is.

However, while logograms would be perfectly good as a native writing system for communication between people who already know a sign language and the logographic writing system that goes with it, they are next to useless for documenting a language that nobody speaks yet, or for teaching a language to a non-native learner. For that, you need some additional system to describe how words are actually produced, whether they are spoken orally or signed manually.

Next, we might consider something like an alphabet or a syllabary. (si5s calls itself a "digibet".) In that case, we need to decide what level of abstraction in the sign language we want to assign to a symbol in the writing system. If we want linearity in the writing system to exactly match linearity in the primary language, as it does with an ideal alphabet, then we need one symbol for every combination of handshape, place, and motion, since those all occur simultaneously. Unfortunately, that would result in thousands of symbols, with most words being one or two symbols long, which is really no different from the logography option. So, we need to go smaller. Perhaps we can divide different aspects of a sign into categories like "consonants" and "vowels", or "onsets", "nuclei", and "codas". If we assign one symbol to each handshape, place, and motion... well, we have a lot of symbols, more than a typical alphabet and probably more than a typical syllabary, but far fewer than a logography. In exchange for that, we either have to pick an arbitrary order for the symbols in one "sign-syllable", or else pack them into syllable blocks like Hangul or relegate some of them to diacritic status, and get something like an abugida. Stokoe notation is in that last category. Syllable blocks seem like a pretty good choice for a native writing system, but that won't work for an easily-typable romanization. For that, we're stuck with the artificially linearized options, which is also the approach taken by systems like ASL-phabet.

For a sign language with an intentionally minimalized cheremic inventory, that level of descriptiveness would be quite sufficient. But, there aren't a whole lot of characters you can type easily on a standard English keyboard (and even fewer if you don't want the result to look like crap and be very confusing- parentheses should not be used for -emic value!). Thus, we need to go down to an even lower level of abstraction, and that means going at least partly featural.

Native sign writing systems have a different pressure on them for featuralism: signing and writing are both visual media, which makes possible a level of iconography unavailable to writing systems for oral languages. In the worst case, this leads to awkward, almost pictographic systems like long-hand SignWriting, which is only one step away from just drawing pictures of people signing. But even a more evolved, schematic, abstract system might as well hang on to featural elements for historical and pedagogical reasons.

A System for Coding Handshapes

Sign languages are cool, and conlangs are cool, but there is a serious dearth of constructed sign languages. Or at least, there is a dearth of accessible documentation on constructed sign languages, and for all practical purposes that's the same thing. The only one I know of off-hand is KNSL. Thus, I want to create one.

Part of the problem is that it's just so hard to write sign languages. I, for one, cannot work on a language without having a way to type it first. Not all conlangers work the same way, but even if you can create an unwritten language, the complexity of documenting it (via illustration or video recording) would make it much more difficult to tell other conlangers that you have done so. The advantages of being able to type the language on a standard English keyboard are such that, if I am going to work on a constructed sign language, developing a good romanization system is absolutely critical. If necessary, it is even worth bending the language itself in order to make it easier to write.

There are quite a few existing systems for writing sign, like SLIPA, but just as you don't write English in IPA, it seems important in developing a new language to come up with a writing system that is well adapted to the phonology/cherology of that specific language.

It occurred to me that binary finger counting makes use of a whole lot of interesting handshapes, and conveniently maps them onto numbers.* Diacritics or multigraphs can then be added to indicate things like whether fingers are relaxed or extended, or whether an unextended thumb is inside or outside any "down" fingers, which don't make any difference to the counting system.

So, I can write down basic handshapes just by using numbers from 0-31, or 0-15, depending on whether or not the thumb is included. There are reasons for and against that decision; including the thumb means the numbers would correspond directly to traditional finger-counting values, which is nice; but, it also results in a lot of potential diacritics / multigraphs not making sense with certain numbers, which has some aesthetic disappeal. On the other hand, lots of potential diacritics wouldn't make sense with certain numbers anyway, so maybe that doesn't matter. On the gripping hand, only using 0-15 and relegating all thumb information to diacritics / multigraphs means I can get away with using single-digit hexadecimal numerals (0-F), which is really convenient.
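As a quick sketch of the 0-F option, here is the thumbless finger-counting scheme in Python. The bit assignment (index finger as the lowest bit, pinky as the highest) is my assumption about how binary finger counting shifts once the thumb is dropped, though it is consistent with the ASL codes listed further down; the names `FINGER_BITS` and `handshape_code` are just for illustration.

```python
# Assumed bit values: binary finger counting with the thumb left out,
# so the index finger becomes the lowest bit.
FINGER_BITS = {"index": 1, "middle": 2, "ring": 4, "pinky": 8}


def handshape_code(up_fingers):
    """Return the single hexadecimal digit for a set of extended fingers."""
    value = sum(FINGER_BITS[finger] for finger in set(up_fingers))
    return format(value, "X")


print(handshape_code([]))                           # 0
print(handshape_code(["index"]))                    # 1
print(handshape_code(["index", "middle", "ring"]))  # 7
print(handshape_code(["middle", "ring", "pinky"]))  # E
print(handshape_code(FINGER_BITS))                  # F
```

Four fingers give exactly 16 combinations, which is why a single hex digit is enough once the thumb is handled separately.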

This page describing an orthography for ASL provides a convenient list of ASL handshapes with pictures and names that we can use for examples. Using hexadecimal numbering for the finger positions, and ignoring the thumb, the basic ASL handshapes end up getting coded as follows:

1: 1
3: 3
4: F
5: F
8: D
A: 0
B: F
C: F
D: 1
E: 0
F: E
G: 1
I: 8

K: 3
L: 1
M: 0
N: 0
O: 0
R: 3
S: 0
T: 0
U: 3
V: 3
W: 7
X: 1
Y: 8

You'll notice that a lot of ASL signs end up coded the same way; e.g., A, M, N, S, and T all come out as 0 in finger-counting notation. Some of that is going to be eliminated when we add a way to indicate thumb positions; if we counted 0-V (32 symbols) instead of 0-F (16), including the thumb as a binary digit, the initial ambiguity would be much smaller. Some of that is expected, and will remain- it just means that ASL makes some cheremic distinctions that don't matter in this new system. That's fine, because this isn't for ASL; we're just using pictures of ASL as examples because they are convenient. However, si5s, another writing system for ASL, got me thinking of using diacritics to indicate additional handshape distinctions beyond just what the finger-counting notation can handle. Typing diacritics on numbers is difficult, but I can easily add multigraphs to provide more information about finger arrangement in addition to thumb positioning.
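To see just how much collapses together before any thumb information is added, the list above can be dropped into a dictionary and grouped by code. This is only a tally of the table as given; the variable names are made up.

```python
from collections import defaultdict

# ASL handshape name -> thumbless finger-counting code, copied from the list.
ASL_TO_CODE = {
    "1": "1", "3": "3", "4": "F", "5": "F", "8": "D", "A": "0", "B": "F",
    "C": "F", "D": "1", "E": "0", "F": "E", "G": "1", "I": "8", "K": "3",
    "L": "1", "M": "0", "N": "0", "O": "0", "R": "3", "S": "0", "T": "0",
    "U": "3", "V": "3", "W": "7", "X": "1", "Y": "8",
}

# Group the ASL names that share a code.
groups = defaultdict(list)
for name, code in ASL_TO_CODE.items():
    groups[code].append(name)

for code in sorted(groups):
    print(code, groups[code])
# 0 ['A', 'E', 'M', 'N', 'O', 'S', 'T']
# 1 ['1', 'D', 'G', 'L', 'X']
# 3 ['3', 'K', 'R', 'U', 'V']
# 7 ['W']
# 8 ['I', 'Y']
# D ['8']
# E ['F']
# F ['4', '5', 'B', 'C']
```

So 26 ASL handshapes land on only 8 of the 16 possible codes, with the closed fist (0) absorbing seven of them.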

First off, there are thumb position diacritics. Since one of the thumb positions is "extended", indicating an odd number, these are only applicable to even numbers, where the thumb position is something else (this would change if I went to 0-F notation instead, excluding the thumb). For these, we've got:

p- thumb touching the tips (or 'p'oints) of the "up" fingers
d- thumb touching the tips of the "down" fingers (as in ASL 8, D, F, and O)
s- thumb held along the side of the hand (as in ASL A)
u- thumb under any "down" fingers, or along the palm (as in ASL 4)
b- thumb between any "down" fingers (as in ASL N, M, and T)
e- thumb extended to the side (as in ASL 3, 5, C, G, L, and Y)

The default is thumb on top of any "down" fingers, as in ASL 1, I, R, S, U, V, W, and X, or across the palm.
The hand position of ASL E is ambiguous between thumb under and thumb over- diacritic 'u' or the default, unmarked state.

Note that 'u' and 'b' are indistinguishable from the default for position F, since there aren't any 'down' fingers. Position 'b' can be interpreted as "next to the down finger" in cases where there is only one finger down (positions 7, B, D, and E).

Next, the "up" fingers can be curled or not, and spread or not, indicated respectively by a 'c' and a 'v'. Position 'v' of course does not make sense for positions without two adjacent fingers up (0, 1, 2, 4, 5, 8, 9, and A- half of the total!), and 'c' doesn't make sense for 0.
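These applicability rules are easy to check mechanically. Here is a small sketch, again assuming the thumbless 0-F codes with the index finger as the lowest bit; the function names are hypothetical.

```python
def can_spread(code):
    """'v' only makes sense if at least two adjacent fingers are up."""
    bits = int(code, 16)
    # Shifting by one lines each finger up with its neighbour.
    return (bits & (bits >> 1)) != 0


def can_curl(code):
    """'c' only makes sense if at least one finger is up."""
    return int(code, 16) != 0


def one_finger_down(code):
    """True when exactly one of the four fingers is down."""
    return bin(0xF - int(code, 16)).count("1") == 1


codes = [format(n, "X") for n in range(16)]
print([c for c in codes if not can_spread(c)])
# ['0', '1', '2', '4', '5', '8', '9', 'A']
print([c for c in codes if not can_curl(c)])
# ['0']
print([c for c in codes if one_finger_down(c)])
# ['7', 'B', 'D', 'E']
```

The first list matches the eight positions where 'v' is meaningless, and the last matches the positions where 'b' can be read as "next to the down finger".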

This still does not capture all of the variation present in ASL signs, but it does capture a lot, and, as previously noted, the bits that are missed don't really matter since this is not supposed to be a system for coding ASL!

The ASL mapping list with multigraphs added looks like this:

1: 1
3: 3ve
4: Fv
5: Fve
8: Dd
A: 0s
B: Fu
C: Fce
D: 1d
E: -
F: Evd
G: 1e
I: 8

K: -
L: 1e
M: 0b
N: 0b
O: 0d or Fp
R: 3
S: 0
T: 0b
U: 3
V: 3v
W: 7v
X: 1c
Y: 8e

And we can code some additional handshapes from the "blended" list:

3C: 3vce

4C: Fvc

5C: Fvce

78: 9

AG: 1p

AL: 0e

etc.
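Codes like these are simple enough to pull apart with a regular expression. The sketch below is one possible reading of the notation, not a finished spec: it assumes an uppercase hex digit followed by lowercase modifiers, at most one thumb letter per code, and it treats a missing thumb letter as the default position. The function name and the shape of the returned dictionary are made up.

```python
import re

THUMB_LETTERS = set("pdsube")   # thumb position multigraphs
FINGER_LETTERS = set("cv")      # curled / spread
CODE_RE = re.compile(r"([0-9A-F])([a-z]*)")


def parse_handshape(code):
    """Split a code such as '3vce' into its finger value and modifiers."""
    match = CODE_RE.fullmatch(code)
    if match is None:
        raise ValueError(f"not a handshape code: {code!r}")
    base, mods = match.groups()
    unknown = set(mods) - THUMB_LETTERS - FINGER_LETTERS
    if unknown:
        raise ValueError(f"unknown modifiers in {code!r}: {sorted(unknown)}")
    thumb = [m for m in mods if m in THUMB_LETTERS]
    if len(thumb) > 1:
        raise ValueError(f"more than one thumb position in {code!r}")
    return {
        "fingers": int(base, 16),
        "spread": "v" in mods,
        "curled": "c" in mods,
        "thumb": thumb[0] if thumb else None,  # None = default position
    }


print(parse_handshape("3vce"))
# {'fingers': 3, 'spread': True, 'curled': True, 'thumb': 'e'}
print(parse_handshape("Evd"))
# {'fingers': 14, 'spread': True, 'curled': False, 'thumb': 'd'}
print(parse_handshape("8"))
# {'fingers': 8, 'spread': False, 'curled': False, 'thumb': None}
```

One thing this makes obvious: the notation leans on case. The hex digits B, D, and E are only distinguishable from the thumb letters b, d, and e because the former are uppercase and the latter lowercase.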

The crossed fingers of the ASL R are not representable in this compositional system, but I like that handshape, so we can add an extra basic symbol X to the finger-counting 0-F, to which all of the thumb position multigraphs or diacritics can be added.

To complete a notation system for a full sign language, I'd need to add a way of encoding place, orientation, and two kinds of motion- gross motion, and fine motion, where fine motion is stuff from the wrist down. I'll address those in later posts, but this feels like a pretty darn good start which already provides hundreds of basic "syllable nuclei" to start building sign words from.

* Of course, other finger-counting systems (like chisanbop, perhaps) could also be used to come up with cheremic inventories and coding systems for them as well.

Saturday, March 10, 2012

Amateur Linguist Teaches Elvish to Ethiopian Tribe

OK, yeah, not really...

But you just know that would be the headline somewhere if the mainstream press decided to cover this.

The story starts with Paul Bennet planning to go to Ethiopia; specifically, to a region where the native language is Hamer. Hamer, it turns out, has no writing system.

Being an amateur linguist, Paul decided to do what any of us would do in the same situation: design an orthography for them! Hey, if it worked for St. Cyril....

The neighboring and related language Ge'ez does have a writing system, so the initial plan was to borrow it, with the idea that it would do double-duty in helping Hamer speakers record their own language and give them a leg up on literacy anywhere else in Ethiopia. The work in progress can be seen here.

In discussion on the CONLANG mailing list, however, someone just had to notice that the phoneme inventory of Hamer just happens to fit very neatly in the Tengwar grid. There are existing Tengwar modes for writing English and Latin in the Elvish characters, so why the heck not?

The probability of an Ethiopian tribe actually adopting Tolkien's Elvish characters as the basis of their writing system is rather small, especially since the Ge'ez characters are already officially included in Unicode while Tengwar just have a proposal with codepoints subject to change.

But it would be pretty awesome, wouldn't it?