Gliese 1337

Saturday, October 15, 2016

A Phonology Without Phonemic Consonants

There are languages that have been analyzed as lacking phonemic vowels, with all vowels being completely predictable from the consonant string. That doesn't mean that they aren't pronounced with vowels, merely that vowels serve no contrastive function.

So, how about a phonology that does the exact opposite: packs all of the contrast into underlying phonological vowels, with phonetic consonants being completely predictable from the vowel string?

Now, there are lots of ways to do this in an ad-hoc manner. Say, an /i/ and an /a/ always have a [t] inserted between them, while a /u/ and an /a/ get a [ʒ], just because. But I'm gonna look at something that is potentially naturalistic, where the choice of consonant phones and where they get inserted is a reasonable consequence of vocalic features. There are probably lots of ways to do that, too, but here's just one of them:

For simplicity, we'll start with just /i/, /a/, /u/ as the basic, plain vowels. You could use more, but these are sufficient to demonstrate the rules I have in mind, which I will describe in terms of generic vowel features such that one could add more basic vowels and already know exactly how they would behave. Each of these can come in plain, breathy, rhotic, and nasal varieties, or any combination thereof; i.e., one could have a breathy nasal rhotic vowel, with all three extra qualities at once. I'll assume that any vowel can have any combination of these qualities, and there are no phonotactic restrictions on the underlying vowel string (although certain combinations might require sticking in a syllable boundary to break them up). Changing either of those assumptions could introduce further structural interestingness in other similar phonologies.

All of these vowels can also be long or short, with syllables being maximally 3 morae; thus, one can have one short vowel, two short vowels, three short vowels, one long vowel, or one short vowel and one long vowel per syllable, where all of the vowels in a single syllable must share all of their voicing, rhotic, and nasal features. For consonant-induction purposes a "long syllable" is any syllable containing a long vowel, or a triphthong (long, long+short, short+long, and short+short+short, but not short+short). Ignoring length, this results in 8 possible versions of every basic vowel, which can be transcribed as V, hV, Vn, Vr, hVn, hVr, Vrn, and hVrn. That results in a total of 24 phonemes:

i /i/ a /a/ u /u/
hi /i̤/ ha /a̤/ hu /ṳ/
in /ĩ/ an /ã/ un /ũ/
ir /i˞/ ar /a˞/ ur /u˞/
hin /ĩ̤/ han /ã̤/ hun /ṳ̃/
hir /i̤˞/ har /a̤˞/ hur /ṳ˞/
irn /ĩ˞/ arn /ã˞/ urn /ũ˞/
hirn /ĩ̤˞/ harn /ã̤˞/ hurn /ṳ̃˞/

Or 48, if we count the long versions as separate phonemes.
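The counts above can be checked with a minimal Python sketch that enumerates every combination of the three features for each basic vowel. The function names are hypothetical, and writing a long vowel as a doubled letter is an assumption based on romanizations like <uuha> and <iin> used later in the post.

```python
from itertools import product

BASE_VOWELS = ["i", "a", "u"]

def romanize(vowel, breathy=False, rhotic=False, nasal=False):
    """Spell one vowel with the h- / -r / -n letters described in the post."""
    return ("h" if breathy else "") + vowel + ("r" if rhotic else "") + ("n" if nasal else "")

def inventory(with_length=False):
    """List every romanized vowel; long vowels are doubled (an assumption)."""
    out = []
    for breathy, rhotic, nasal in product([False, True], repeat=3):
        for v in BASE_VOWELS:
            out.append(romanize(v, breathy, rhotic, nasal))
            if with_length:
                out.append(romanize(v + v, breathy, rhotic, nasal))
    return out

print(len(inventory()))                  # 24
print(len(inventory(with_length=True)))  # 48
```

Adding a fourth basic vowel to BASE_VOWELS would extend the inventory without touching the feature logic, which is the point of describing the rules in terms of generic features.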

Tautosyllabic vowels can turn into glides. An /i/ becomes [j], while short /u/ turns into [w]. In long syllables, medial vowels are glided first, such that, e.g., /uia/ becomes [uja], not [wia]. Sequences of /iii/ become [ji:] and /uuu/ become [wu:]; sequences of /aaa/ must be broken into two syllables, either [a:.a] or [a.a:]. Since all vowels in a syllable must have matching features, we can romanize these by grouping the vowels together within one set of breathy/rhotic/nasal letters. E.g., huin /ṳ̃͡ĩ̤/ [w̤̃ĩ̤], or iar /i˞͡a˞/ [ja˞].

That provides us with two phonetic consonants so far: [j] and [w].

Other consonants are induced when transitioning from a vowel that has a certain quality to one that doesn't, or at syllable or morpheme boundaries.

Breathy-voiced vowels basically induce an onset [h] (hence the romanization convention) morpheme-initially or after a non-breathy vowel, but in certain situations this can be mutated into aspiration or lenition of a previous phonetic consonant instead (see below). (A reasonable phonotactic restriction on the vowel string might be that plain vowels can't follow breathy vowels in the same morpheme, just because I find it difficult to perceive the transition from breathy to plain voice. But I'll ignore that for now.)

Plain vowels induce plain coda stops. High front vowels (/i/) induce [t], high back vowels (/u/) induce [k], and low vowels (/a/) induce [ʔ]. These all become aspirated when followed by unstressed breathy voice, absorbing the [h] induced by the following vowel unless it crosses a morpheme boundary; if the breathy syllable is stressed, then [t] becomes [t͡s] and [k] becomes [x], again replacing the [h] induced by the following vowel unless it crosses a morpheme boundary, while [ʔ] is unaffected.

Now, we have 8 more phonetic consonants: [t], [tʰ], [t͡s], [k], [kʰ], [x], [ʔ] and [h].
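As a rough illustration, the plain-stop rule can be written as a pair of lookup tables. This is only a sketch under simplifying assumptions: it ignores morpheme boundaries and gemination, the function name is hypothetical, and it leaves [ʔ] unchanged before an unstressed breathy vowel because no aspirated glottal stop appears in the consonant list above.

```python
# Coda stop induced by each plain vowel.
PLAIN_STOP = {"i": "t", "u": "k", "a": "ʔ"}

# Mutations before a breathy vowel. Leaving "ʔ" unchanged in the
# unstressed case is an assumption, not something the post states.
BEFORE_UNSTRESSED_BREATHY = {"t": "tʰ", "k": "kʰ", "ʔ": "ʔ"}
BEFORE_STRESSED_BREATHY = {"t": "t͡s", "k": "x", "ʔ": "ʔ"}

def induced_stop(vowel, next_breathy=False, next_stressed=False):
    """Return the coda consonant induced by a plain vowel.

    Morpheme boundaries and gemination in long syllables are ignored here.
    """
    stop = PLAIN_STOP[vowel]
    if not next_breathy:
        return stop
    table = BEFORE_STRESSED_BREATHY if next_stressed else BEFORE_UNSTRESSED_BREATHY
    return table[stop]

print(induced_stop("i"))              # plain [t]
print(induced_stop("u", True, False)) # aspirated before unstressed breathy voice
print(induced_stop("u", True, True))  # fricative before stressed breathy voice
```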

Since plain vowels already lack other properties, and so have none to lose, these consonant sounds will not occur every time there is a transition to a different class of vowel. Instead, they will only occur at non-utterance-final morpheme boundaries if the syllable is short, and at any non-word-final syllable boundary if the syllable is long; additionally, the consonants will be geminated in long syllables, stealing duration from the vowel. Thus, something like <uuha> /u:a̤/ or <iiha> would be rendered phonetically as [uk:ʰa̤]/[ukxa̤] or [it:ʰa̤]/[itt͡sa̤] respectively, depending on stress placement, and assuming it's monomorphemic.

Nasal vowels induce nasal stops, with non-back vowels (/i/ and /a/) inducing a coda [m] at the end of a word and [n] in other positions, and back vowels (/u/) inducing [ŋ]. Nasalization also interacts with syllable length; like the induced plain stops, induced [n] and [ŋ] will steal length from a long nucleus and become geminated.

Successive non-breathy nasal vowels in different syllables induce an epenthetic [ʁ]. Why? Because that's what I discovered I naturally do when trying to pronounce them! I don't know what exactly the articulatory phonetic justification is, but there must be one! Thus, something like <unan> /ũã/ comes out as [ũʁãm], while monosyllabic long /ĩĩ/ (romanized <iin>) is distinguished from disyllabic /ĩ.ĩ/ (romanized <inin>) by the phonetic realizations [ĩ:n] and [ĩʁĩn], respectively. The results of a genuinely rhotic initial vowel (<irnin> /ĩ˞.ĩ/) look different still, as described below.

So far, that adds another 4 phonetic consonants ([m], [n], [ŋ], and [ʁ]), for a total of 14.

Rhotic vowels get a little complicated, due to interaction with other qualities. With combined rhotic+nasal vowels, the coda consonants are ordered R-N. High front non-nasal vowels (/i/) induce [ɾ] word-medially, and [ɹ] word-finally or with nasalization. Low non-back non-nasal vowels (/a/) induce [ɹ], which is ambisyllabic word-medially unless it is produced by a non-breathy vowel followed by a breathy vowel (with an induced [h] onset pushing the ambisyllabic [ɹ] out of the way). Nasal mid vowels induce [ʐ] or [ʒ] in free variation. Back vowels (/u/) induce [ʀ] or [ʁ] in free variation, which like [ɹ] is ambisyllabic unless followed by an induced onset [h] or paired with nasalization.

That's another 3-ish phonetic consonants, leaving us with a total inventory looking something like this:

Stops: [t], [tʰ], [k], [kʰ], [ʔ]
Fricatives/affricates: [h], [x], [ts], [ʐ]/[ʒ]
Nasals: [m], [n], [ŋ]
Rhotics: [ʀ]/[ʁ], [ɹ], [ɾ]
Glides: [w], [j]

which really doesn't look that bad! It's sorta-kinda Iroquoian-looking, if you squint, with extra rhotics. Several natural languages get along with fewer consonant phones than that. But, it can still be written mostly-unambiguously (save for specifying morpheme/syllable boundaries) purely as a string of vowels from a 24-character all-vowel alphabet; or perhaps a featural script with three basic vowels and diacritics for the various combinations of nasal, rhotic, and breathy features, and maybe length.

Of course, there are other possible re-analyses of words generated this way. The romanization scheme already embodies one: a three-vowel, three-consonant analysis, where the consonants and vowels have some fairly complex interactions generating a lot of allophones of each, and some particularly strange distributional restrictions (like, /h/ is the only consonant that can start a word!). A native speaker of such a language might, however, go for a four-consonant analysis, adding /t/ → [t], [tʰ], [k], [kʰ], [ʔ], [ts], [x]; or even breaking things down further, with no realization of the significance of the extremely limited distribution of these sounds. Speakers might also group things like /t/ → [t], [tʰ], [k], [kʰ], [ʔ], [ts]; /h/ → [h], [x]; /z/ → [ʐ], [ʒ]; /r/ → [ʀ], [ʁ], [ɹ], [ɾ]; based on perceptual similarity, thus confusing the disparate origins of [h] vs. [x] and masking the commonality of [ʐ] and [ʒ] with the rhotics.

If one were to start with something like this and then evolve it historically, one could easily get a more "normal"-looking inventory (e.g., maybe that tap [ɾ] ends up turning into an [l], and maybe [t͡s] simplifies to plain [s]) with a steadily more opaque relationship to the underlying vocalic features, despite still being regularly predictable from them.

If one were to do an intrafictional description of the language, such as might be written by native linguists, I would be somewhat inclined to go with one of these alternative analyses as the standard native conception, and then dive into the argument for why it should be re-analyzed as consisting purely of underlying vowels instead. It would, however, be a shame to miss out on the opportunity for a native writing system consisting of a 24-vowel alphabet.

Friday, September 9, 2016

Thoughts on Sign Language Design

Previously: General Thoughts on Writing Signs and A System for Coding Handshapes

One of the problems with designing a constructed sign language is that so little is actually known about sign languages compared to oral languages. For many conlangers and projects (e.g., sign engelangs or loglangs, etc.), this isn't really a big deal, but it is a serious problem for the aspiring naturalistic con-signer who subscribes to the diachronic or statistical naturalism schools.

I have, however, come across one account of the historical development of modern ASL from Old French Sign Language. While it is hard to say if the same trends evidenced here would generalize to all sign languages, they do seem pretty reasonable, and provide a good place for con-signers to start. Additionally, it turns out that many of these diachronic tendencies mesh rather well with the goal of designing a language with ease of writing in mind.

Unsurprisingly, despite the relative ease of visual iconicity in a visual language, actual iconicity seems to disappear pretty darn easily. But I, at least, find it difficult to come up with totally arbitrary signs for things - much more difficult than it is to make up spoken words - and the Diachronic Method is generally considered a good thing anyway, so knowing exactly how iconicity is eroded should allow a con-signer to start with making up iconic proto-signs, and then artificially evolving them into non-iconic "modern" signs.

The general trends in this account of ASL evolution can be summed up as follows:

Signs that require interaction with the environment (like touching a table top) either disappear entirely, replaced by something else, or simplify to avoid the need for props. That seems pretty obvious.
Signs that require the use of body parts other than the hands for lexical (as opposed to grammatical) content tend to simplify to eliminate non-manual components. E.g., facial expressions may indicate grammatical information like mood, but won't change the basic meaning of a word.
Signs tend to move into more restricted spaces; specifically, around the face, and within the space around the body that is easily reached while still keeping the elbows close in. This is essentially a matter of improving ease of articulation.
Signs that occur around the head and face tend to move to one side, while signs occurring in front of the torso tend to centralize. This makes sense for keeping the face in view, especially if facial expressions are grammatically significant.
Two-handed signs around the head and face tend to become one-handed signs performed on just one side. In contrast, one-handed signs performed in front of the torso tend to become symmetrical two-handed signs.
Asymmetrical two-handed signs tend to undergo assimilation in hand shape and motion, so that there is only one hand shape or motion specified for the whole sign, though not necessarily place or contact. This is a matter of increasing ease of articulation (reducing how much different stuff you have to do with each hand), as well as increased signalling redundancy.
Signs that involve multiple sequential motions or points of contact "smooth out".
There is an analog to "sound symbolism", where, if a large group of signs in a similar semantic domain happen to share a particular articulatory feature (similar shape, similar motion, etc.), that feature will be analogically spread to other signs in the same semantic domain.

And, of course, multiple of these can apply to a single proto-sign, such that it, for example, eliminates head motion in favor of hand motion, loses a hand, and smooths the resulting hand motion.

Most of the time, all of those trends reduce iconicity and increase arbitrariness of signs, but iconicity increases in cases where it does not contradict those other principles. Thus, a lot of antonyms end up being dropped and replaced by reverse-signs- e.g., you get morphological lexical negation by signing a word backwards, and temporal signs move to end up grouped along a common time-line in the signing space.

Symmetrization makes writing easier because you don't have to encode as much simultaneous stuff. Even though two hands might be used, you don't have to write down the actions of two simultaneous hands if they are doing the same thing. Reduction of the signing space also means you need fewer symbols to express a smaller range of variation in the place and motion parameters, and smoothing simplifies writing essentially by making words shorter, describable with a single type of motion.

Many two-handed ASL signs are still not entirely symmetric. Some, like the verb "to sign", are anti-symmetric, with circling hands offset by 180 degrees. One-handed signing is, however, a thing, and communication can still proceed successfully if only the dominant hand performs its half of the sign, while the other hand is occupied. (I imagine there is some degradation, like eating while talking orally, but I don't know enough about ASL to tell exactly how significant that effect is or how it qualitatively compares to oral impediments.) Thus, it appears that it would not be terribly difficult to make the second hand either completely redundant, or limited in its variations (such as symmetric vs. antisymmetric movement, and nothing else) to make two-handed signs extremely easy to write, and minimize information loss in one-handed signing.

Given the restriction of two-handed signs to particular places (i.e., not around the face), it might even make sense to encode the action of the second hand as part of the place. One could imagine, for example, a non-symmetric case of touching the second hand as a place specification (which would typically remain possible even if that hand is occupied), as well as symmetric second hand and anti-symmetric second hand.

I have no idea if native signers of ASL or any other sign language actually think of the second hand as constituting a Place, just like "at the chin," or "at the shoulder," rather than a separate articulation unto itself, but treating the secondary hand as a place does seem like a very useful way to think for a con-sign-lang. Not only does it significantly reduce the complexity required in a writing system, it also ends up smoothing out the apparent surface differences between near-face signs and neutral space signs; in each case, there is underlyingly only one lexical hand, with the second hand filling in when the face is absent to better specify Place information.

Monday, September 5, 2016

General Thoughts on Writing Signs

In my last post, I began to develop an essentially featural writing system for an as-yet undeveloped sign language. Featural writing systems are extremely rare among natural oral languages, but every system for writing sign languages that I know of is featural in some way. So, why is this?

Let's examine some of the possible alternatives. The simplest way to write sign languages, for a certain value of "simple", would be to use logograms. Just as the logograms used to write, e.g., Mandarin, do not necessarily have any connection whatsoever to the way the words of the language are pronounced, logograms for a signed language need not have any systematic relation to how words are signed. Thus, the fact that the language's primary modality is signing becomes irrelevant, and a signed language can be just as "easy" to write as Chinese is.

However, while logograms would be perfectly good as a native writing system for communication between people who already know a sign language and the logographic writing system that goes with it, they are next to useless for documenting a language that nobody speaks yet, or for teaching a language to a non-native learner. For that, you need some additional system to describe how words are actually produced, whether they are spoken orally or signed manually.

Next, we might consider something like an alphabet or a syllabary. (si5s calls itself a "digibet".) In that case, we need to decide what level of abstraction in the sign language we want to assign to a symbol in the writing system. If we want linearity in the writing system to exactly match linearity in the primary language, as it does with an ideal alphabet, then we need one symbol for every combination of handshape, place, and motion, since those all occur simultaneously. Unfortunately, that would result in thousands of symbols, with most words being one or two symbols long, which is really no different from the logography option. So, we need to go smaller. Perhaps we can divide different aspects of a sign into categories like "consonants" and "vowels", or "onsets", "nuclei", and "codas". If we assign one symbol to each handshape, place, and motion... well, we have a lot of symbols, more than a typical alphabet and probably more than a typical syllabary, but far fewer than a logography. In exchange for that, we either have to pick an arbitrary order for the symbols in one "sign-syllable", or else pack them into syllable blocks like Hangul or relegate some of them to diacritic status, and get something like an abugida. Stokoe notation is in that last category. Syllable blocks seem like a pretty good choice for a native writing system, but that won't work for an easily-typable romanization. For that, we're stuck with the artificially linearized options, which is also the approach taken by systems like ASL-phabet.

For a sign language with an intentionally minimalized cheremic inventory, that level of descriptiveness would be quite sufficient. But, there aren't a whole lot of characters you can type easily on a standard English keyboard (and even fewer if you don't want the result to look like crap and be very confusing- parentheses should not be used for -emic value!) Thus, we need to go down to an even lower level of abstraction, and that means going at least partly featural.

Native sign writing systems have a different pressure on them for featuralism: signing and writing are both visual media, which makes possible a level of iconography unavailable to writing systems for oral languages. In the worst case, this leads to awkward, almost pictographic systems like long-hand SignWriting, which is only one step away from just drawing pictures of people signing. But even a more evolved, schematic, abstract system might as well hang on to featural elements for historical and pedagogical reasons.

A System for Coding Handshapes

Sign languages are cool, and conlangs are cool, but there is a serious dearth of constructed sign languages. Or at least, there is a dearth of accessible documentation on constructed sign languages, and for all practical purposes that's the same thing. The only one I know of off-hand is KNSL. Thus, I want to create one.

Part of the problem is that it's just so hard to write sign languages. I, for one, cannot work on a language without having a way to type it first. Not all conlangers work the same way, but even if you can create an unwritten language, the complexity of documenting it (via illustration or video recording) would make it much more difficult to tell other conlangers that you have done so. The advantages of being able to type the language on a standard English keyboard are such that, if I am going to work on a constructed sign language, developing a good romanization system is absolutely critical. If necessary, it is even worth bending the language itself in order to make it easier to write.

There are quite a few existing systems for writing sign, like SLIPA, but just as you don't write English in IPA, it seems important in developing a new language to come up with a writing system that is well adapted to the phonology/cherology of that specific language.

It occurred to me that binary finger counting makes use of a whole lot of interesting handshapes, and conveniently maps them onto numbers.* Diacritics or multigraphs can then be added to indicate things like whether fingers are relaxed or extended, or whether an unextended thumb is inside or outside any "down" fingers, which don't make any difference to the counting system.

So, I can write down basic handshapes just by using numbers from 0-31, or 0-15, depending on whether or not the thumb is included. There are reasons for and against that decision; including the thumb means the numbers would correspond directly to traditional finger-counting values, which is nice; but, it also results in a lot of potential diacritics / multigraphs not making sense with certain numbers, which has some aesthetic disappeal. On the other hand, lots of potential diacritics wouldn't make sense with certain numbers anyway, so maybe that doesn't matter. On the gripping hand, only using 0-15 and relegating all thumb information to diacritics / multigraphs means I can get away with using single-digit hexadecimal numerals (0-F), which is really convenient.
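A minimal sketch of the thumb-less 0-F option, assuming the usual binary finger-counting order with the index finger as the lowest bit once the thumb is dropped. The finger names and the function name are illustrative only.

```python
# Bit value of each finger when the thumb is excluded (an assumed ordering:
# index is the lowest bit, pinky the highest).
FINGER_VALUE = {"index": 1, "middle": 2, "ring": 4, "pinky": 8}

def handshape_digit(up_fingers):
    """Return the single hexadecimal digit for a set of raised fingers."""
    total = sum(FINGER_VALUE[f] for f in set(up_fingers))
    return format(total, "X")

print(handshape_digit([]))                                    # 0
print(handshape_digit(["index"]))                             # 1
print(handshape_digit(["index", "middle"]))                   # 3
print(handshape_digit(["pinky"]))                             # 8
print(handshape_digit(["index", "middle", "ring", "pinky"]))  # F
```

Including the thumb would only mean adding a fifth entry and shifting the other values up by one bit, at the cost of needing two hex digits (or a 32-symbol digit set) per handshape.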

This page describing an orthography for ASL provides a convenient list of ASL handshapes with pictures and names that we can use for examples. Using hexadecimal numbering for the finger positions, and ignoring the thumb, the basic ASL handshapes end up getting coded as follows:

1: 1
3: 3
4: F
5: F
8: D
A: 0
B: F
C: F
D: 1
E: 0
F: E
G: 1
I: 8

K: 3
L: 1
M: 0
N: 0
O: 0
R: 3
S: 0
T: 0
U: 3
V: 3
W: 7
X: 1
Y: 8

You'll notice that a lot of ASL signs end up coded the same way; e.g., A, M, N, S, and T all come out as 0 in finger-counting notation. Some of that is going to be eliminated when we add a way to indicate thumb positions; if we counted 0-V (32 symbols) instead of 0-F (16), including the thumb as a binary digit, the initial ambiguity would be much smaller. Some of that is expected, and will remain- it just means that ASL makes some cheremic distinctions that don't matter in this new system. That's fine, because this isn't for ASL; we're just using pictures of ASL as examples because they are convenient. However, si5s, another writing system for ASL, got me thinking of using diacritics to indicate additional handshape distinctions beyond just what the finger-counting notation can handle. Typing diacritics on numbers is difficult, but I can easily add multigraphs to provide more information about finger arrangement in addition to thumb positioning.

First off, there are thumb position diacritics. Since one of the thumb positions is "extended", indicating an odd number, these are only applicable to even numbers, where the thumb position is something else (this would change if I went to 0-F notation instead, excluding the thumb). For these, we've got:

p- thumb touching the tips (or 'p'oints) of the "up" fingers
d- thumb touching the tips of the "down" fingers (as in ASL 8, D, F, and O)
s- thumb held along the side of the hand (as in ASL A)
u- thumb under any "down" fingers, or along the palm (as in ASL 4)
b- thumb between any "down" fingers (as in ASL N, M, and T)
e- thumb extended to the side (as in ASL 3, 5, C, G, L, and Y)

The default is thumb on top of any "down" fingers, as in ASL 1, I, R, S, U, V, W, and X, or across the palm.
The hand position of ASL E is ambiguous between thumb under and thumb over- diacritic 'u' or the default, unmarked state.

Note that 'u' and 'b' are indistinguishable from the default for position F, since there aren't any 'down' fingers. Position 'b' can be interpreted as "next to the down finger" in cases where there is only one finger down (positions 7, B, D, and E).

Next, the "up" fingers can be curled or not, and spread or not, indicated respectively by a 'c' and a 'v'. Position 'v' of course does not make sense for positions without two adjacent fingers up (0, 1, 2, 4, 5, 8, 9, and A- half of the total!), and 'c' doesn't make sense for 0.
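These applicability claims are easy to check mechanically. The sketch below assumes the same bit order as the 0-F notation (index lowest, pinky highest) and uses hypothetical helper names; it lists the positions where 'v' cannot apply and the positions with exactly one finger down.

```python
HEX_DIGITS = "0123456789ABCDEF"

def up_fingers(digit):
    """Return four booleans (index, middle, ring, pinky): True means 'up'."""
    value = int(digit, 16)
    return [bool((value >> bit) & 1) for bit in range(4)]

def can_spread(digit):
    """'v' needs at least two adjacent fingers up."""
    up = up_fingers(digit)
    return any(a and b for a, b in zip(up, up[1:]))

def can_curl(digit):
    """'c' needs at least one finger up."""
    return int(digit, 16) != 0

no_spread = [d for d in HEX_DIGITS if not can_spread(d)]
one_down = [d for d in HEX_DIGITS if up_fingers(d).count(False) == 1]

print(no_spread)  # ['0', '1', '2', '4', '5', '8', '9', 'A']
print(one_down)   # ['7', 'B', 'D', 'E']
```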

This still does not capture all of the variation present in ASL signs, but it does capture a lot, and, as previously noted, the bits that are missed don't really matter since this is not supposed to be a system for coding ASL!

The ASL mapping list with multigraphs added looks like this:

1: 1
3: 3ve
4: Fv
5: Fve
8: Dd
A: 0s
B: Fu
C: Fce
D: 1d
E: -
F: Evd
G: 1e
I: 8

K: -
L: 1e
M: 0b
N: 0b
O: 0d or Fp
R: 3
S: 0
T: 0b
U: 3
V: 3v
W: 7v
X: 1c
Y: 8e

And we can code some additional handshapes from the "blended" list:

3C: 3vce

4C: Fvc

5C: Fvce

78: 9

AG: 1p

AL: 0e

etc.

The crossed fingers of the ASL R are not representable in this compositional system, but I like that handshape, so we can add an extra basic symbol X to the finger-counting 0-F, to which all of the thumb position multigraphs or diacritics can be added.
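To show how such codes could be read back, here is a small hedged parser sketch. It assumes an uppercase base symbol (0-F or X) followed by lowercase modifier letters, and that at most one thumb letter appears per code; the function name and the returned dictionary layout are made up for illustration.

```python
BASES = set("0123456789ABCDEF") | {"X"}
FINGER_MODS = {"c": "curled", "v": "spread"}
THUMB_MODS = {
    "p": "tips of up fingers",
    "d": "tips of down fingers",
    "s": "side of hand",
    "u": "under down fingers",
    "b": "between down fingers",
    "e": "extended",
}

def parse_handshape(code):
    """Split a code like '3vce' into base, finger modifiers, and thumb position."""
    if not code or code[0] not in BASES:
        raise ValueError(f"unknown base symbol in {code!r}")
    result = {"base": code[0], "fingers": [], "thumb": "default"}
    for ch in code[1:]:
        if ch in FINGER_MODS:
            result["fingers"].append(FINGER_MODS[ch])
        elif ch in THUMB_MODS:
            if result["thumb"] != "default":
                raise ValueError(f"more than one thumb letter in {code!r}")
            result["thumb"] = THUMB_MODS[ch]
        else:
            raise ValueError(f"unknown modifier {ch!r} in {code!r}")
    return result

print(parse_handshape("3vce"))
print(parse_handshape("Evd"))
print(parse_handshape("Xe"))
```

Because the base digits are uppercase and the modifiers lowercase, the hex digit E and the thumb letter 'e' never collide.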

To complete a notation system for a full sign language, I'd need to add a way of encoding place, orientation, and two kinds of motion- gross motion, and fine motion, where fine motion is stuff from the wrist down. I'll address those in later posts, but this feels like a pretty darn good start which already provides hundreds of basic "syllable nuclei" to start building sign words from.

* Of course, other finger-counting systems (like chisanbop, perhaps) could also be used to come up with cheremic inventories and coding systems for them as well.

Friday, June 3, 2016

Possession & State in Valaklwuuxa

When there are no nouns, how do you manage genitive constructions?

Somewhat surprisingly, the answer turns out to be "the same way you form resultatives".

Resultatives

Resultatives are derived predicates that indicate a final state resulting from an action. In English, we often indicate these with passive participles; thus, potatoes which have undergone boiling are "boiled potatoes"- "boiled" is the state that results from boiling.

In Valaklwuuxa, resultatives are derived by the prefix <ves->. Thus, we can have sentence pairs like "nbetsa tu txe Dxan-la." ~ "John sat down." vs. "vesnbetsa txe Dxan-la" ~ "John is sitting.", or "le-val" ~ "It's cooking" vs. "le-vesval" ~ "It is / has been cooked."

These kinds of derived predicates tend to be intransitive, but there are some transitive roots which produce transitive resultatives as well- things like "to touch" -> "to be in contact with something", or "to see" -> "to have been seen by someone".

But what happens if you try to apply that particular derivation to something which is not a process? Well...

Possession

Consider a root like <kusa> "child". If we conjugate that as "le-kusa", it means "He/she is a child"; we can also add an explicit subject, and say "kusa txe Dxan-la" ~ "John is a child." But if we add the prefix <ves->, we get "veskusa txe Dxan-la" ~ "John has a child"; and, in fact, this is a transitive predicate- the unstated object is John's child.

(In fact, possessive predicates actually tend to be ambi-transitive; if additional description of the object is needed or implied, one uses the transitive conjugations; but if not, the intransitive conjugations are also acceptable. This is fairly weird for Valaklwuuxa verbs, where transitivity tends to be quite explicit, but omitting explicit transitivizers or detransitivizers eliminates extra syllables in a situation where different conjugation paradigms usually eliminate ambiguity anyway.)

Now, if I want to say something a little more complicated, like "I see John's child", I can relativize that object, and get "xe-lwokx txe veskusasa txe Dxan-la" (where <lwokx> is the root for "to see something")- note that the inverse voice suffix <-sa> must be used to relativize the child, rather than the child's possessor (John).

The Semantic Connection

In theory, these two usages of <ves-> could be related in two general ways:

1. Accidental homophony- they are two separate prefixes that happen to sound the same, due to historical sound changes or something.
2. Two uses of the same morpheme- somehow, one semantic operation actually covers both cases.

Strange as it may seem, the correct answer is actually (2). This is one and the same prefix in both cases, and is in fact modelled on a similar prefix <es-> in Lillooet Salish. This paper explains the morphosyntactic evidence for considering <es-> to be one morpheme in Salish, but for Valaklwuuxa it is sufficient to simply assert that, yes, this is one thing because that's how the conlang was defined, as long as we can provide a reasonably coherent definition for it. That's gonna take a little bit of formal semantics.

One tenuous semantic connection is to consider that possessing something is itself a state, so it makes sense to have a stative marker on possessed things. Similarly, we can conceive of things "having" states. Many languages in fact do this- in Spanish, for example, one is not hungry; rather, one has hunger, and the use of one verb, "to have", to express both possession and the perfect aspect in English is similarly suggestive that there may be a natural connection between these two concepts. Then, stative-on-a-thing = possession, and stative-on-an-action = resulting state. But, we can go deeper than this.

First, let's consider things that have a necessary relation to something else- e.g., a father is always the father of someone, a child is always someone's child, a husband always has a wife, etc. If we look at a root like <kwutanbets> "husband", it is intransitive and therefore takes one external argument- the person who is a husband. However, there is another, hidden, internal argument- the wife of whom he is the husband. What <ves-> does, then, is to pull out the internal argument and make it external. Thus, we can have sets of sentences like "kwutanbets txe Dxan-la" ~ "John is (someone's) husband" / "veskwutanbets txe nBale-la" ~ "Mary has a husband" / "Dxan txe veskwutanbetsa txe nBale-la" ~ "John is Mary's husband".

(And, of course, we can do the same thing with the inverse relation- "sendand txe nBale-la" ~ "Mary is (someone's) wife" / "nBale txe vesendandsa txe Dxan-la" ~ "Mary is John's wife")

This can be generalized so that we assume all "things" have an internal possessor argument, even if it's not an obvious, inherent one, like husband/wife or father/child.

Now, if we consider processes, the (or at least one) external argument is still an entity, a thing; as explained in a previous post, there is after all no difference in Valaklwuuxa between "I act" and "I am an actor". Processes, however, have a different internal argument. One could have a process-root which has the subject's possessor as an internal argument, and then <ves-> would obviously have the same function in every case. If, however, we assume that process-roots have an internal argument for the end-state of the process, then <ves-> still has the same semantic effect- promote an implicit internal argument to an explicit external argument- but produces resultatives for some roots and possessives for others.
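One way to picture this single operation is as a function over roots that each carry a hidden internal argument. The sketch below is only a toy model under that assumption: the Root class, the two argument labels, and the English gloss templates are all hypothetical, and the prefixing is naive string concatenation.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Root:
    form: str      # Valaklwuuxa form, e.g. "kusa"
    gloss: str     # rough English gloss
    internal: str  # kind of hidden argument: "possessor" or "end state"

def ves(root):
    """Promote the root's hidden internal argument to an explicit one."""
    if root.internal == "possessor":
        gloss = f"have a {root.gloss}"         # thing-root: possessive reading
    elif root.internal == "end state":
        gloss = f"be in the state resulting from: {root.gloss}"  # process-root
    else:
        raise ValueError(f"no internal argument to promote in {root.form!r}")
    return Root("ves" + root.form, gloss, internal="promoted")

child = Root("kusa", "child", internal="possessor")
cook = Root("val", "cook", internal="end state")

print(ves(child).form, "~", ves(child).gloss)  # veskusa ~ have a child
print(ves(cook).form, "~", ves(cook).gloss)
```

The same function body handles both kinds of root; only the label on the internal argument decides whether the result reads as a possessive or a resultative.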

Pronominal Possessives

Now, in Salish languages, this is not the only mechanism of indicating possession. In particular, there are pronominal possessive clitics which can be added to a root. In Valaklwuuxa, however, this is not strictly necessary; normal verb inflections already serve that purpose quite adequately. For example, if you wish to say "my rock" or "my house", you can simply conjugate the possessed form (in inverse voice, of course, lest you say "I have a rock" instead!) for first person: "veswonglqasaka" or "vesk'elansaka", respectively. Note that in English, possessive pronouns are tied up with determiners and definiteness; i.e., you can say "the rock", "a rock", or "my rock", but not *"a my rock"; to express that meaning, you have to resort to a circumlocution like "one of my rocks" or "a rock of mine". In Valaklwuuxa, however, you can mix and match however you like: "my rock" ~ "txe veswonglqasaka", "a rock of mine" ~ "ta veswonglqasaka".

Now, without specifying the thing, how would you say "It is mine!"? Basically, it comes out as "It's my thing!":

"xe-vestuka!" (I have the thing!) / "le-vestuksaka!" (It is my thing!)

where <tuk> is the root for "a thing".

(cf. "vestukend" ~ "I have a thing" / "I have something", using the non-transitive conjugation.)

The existing machinery is also sufficient for asking questions about possession, although there is some ambiguity for pronominal possessors. As described in my last post, one can simply replace an explicit possessor phrase with an interrogative to ask who owns something, although the lack of independent possessive pronouns means the structure of the answer is not exactly parallel to that of the question in this case:

"veswonglqa ta k'aku-la?" ~ "Whose rock is it?"
"le-veswonglqasaka" ~ "It is my rock." (cf. "veswonglqaka" ~ "I have a rock.")

If you want to ask something like "Is that John's rock?", you merely have to add the polar question particle after "John":

"Dxan k'a se veswonglqasa?"

If, however, you want to ask "Is that your rock?", we get some ambiguity:

"dwu-veswonglqask k'a se?" ~ "Is that your rock (or someone else's)?" / "Is that your rock (or another thing of yours)?"

This is, however, no worse than the ambiguity that exists in English polar questions, and it would be very rare for that to cause an actual practical problem in a real discourse context.

The Verb <benlqwo>

In addition to forming both predicative ("I have") and attributive ("my") possessives with <ves->, Valaklwuuxa also has a root <benlqwo>, meaning "to have" or "to carry". In situations where <ves-> is unsuitable (e.g., because the derived form would invalidate a serial construction), <benlqwo> can be used for predicative possession. In cases where a form in <ves-> would be feasible, however, <benlqwo> carries a specific connotation of "on one's person". Thus, one might say:

"xe-veshatqakend" ~ "I have/own a rock." (Why yes, there are two different roots that both mean "rock"- "wonglqa" and "hatqak".)

vs.

"xe-veshatqaka se" ~ "I have this rock." / "This is my rock."

vs.

"xe-benlqwond ta hatqak-la" ~ "I have a rock on me (in my hand or in a pocket)".

Thursday, June 2, 2016

Questions & Deixis in Valaklwuuxa

I have been translating the Universal Speed Curriculum into Valaklwuuxa. This is a very simple conversational script; it's not intended to teach you a lot of vocabulary, or particularly deep grammar principles- just to get you comfortable with speaking fluently in a target language and capable of asking simple questions and understanding simple answers, so that you can learn more of the target language in the target language.

As such, it starts out with sentences like "What is that?" / "That is a rock." / "Is that a rock?" Basically, you need to be able to ask content questions and polar questions, and name things by pointing (deixis), which we do in English with demonstrative pronouns. These should be easy things to handle in any language, and in fact Valaklwuuxa handles them just fine... but given how subjectively weird Valaklwuuxa is, just how it manages may be non-obvious to the typical Anglophone.

If you know a little bit about Valaklwuuxa already (because you've read my previous blog posts or something), you might reasonably think "well, there aren't any normal nouns, and you don't need pronouns except the subject clitics because the verb conjugation takes care of everything else, so maybe there are extra deictic and interrogative conjugations?" And indeed, one could imagine a language that worked that way- the conjugation table would be large and unwieldy, but that never stopped a natlang! But there's a problem: if "what" and "that" are just translated by verb inflections... what gets inflected? There is, after all, no word for "is"!

Interrogatives

To resolve this, the interrogative pronouns "what" and "who" are actually translated in Valaklwuuxa by interrogative verbs, meaning roughly "to be what?" and "to be whom?" These are <k'asa> and <k'aku>, respectively. A third interrogative word, <k'axe>, is what we might be tempted to call a "pro-verb"; it most closely translates into English as "to do what?" In general, there is no morphosyntactic distinction in Valaklwuuxa between sentences like "I act" and "I am an actor"- these would both translate the same way. But, Valaklwuuxa distinguishes unergative verbs (with an agent-like subject) and unaccusative verbs (with a patient-like subject) in other areas of the grammar, and that is the internal distinction between <k'asa> and <k'axe>. Animate things, however, are always "things one can be" but never "things one can do", so there is only the one (unaccusative) root for "to be whom?"

Using any of these verbs as the predicate of a sentence allows asking questions like "What is it?" If you need to ask a question about an argument of some other verb (like, say "What did you eat?"), you just treat the interrogatives like any other Valaklwuuxa root, and stick them into a relativized argument phrase.

All of these interrogative roots also have corresponding answer words: <dasa> ("to be that"), <daxe> ("to do that"), and <daku> ("to be them"). These, however, are not the deictic (pointing) words that you would use in a question like "What is that?" They are more like regular pronouns (or pro-verbs)- they refer to some thing or action that has already been mentioned earlier in the discourse, which you do not wish to repeat. (And if you think that the schematicism in how answers and questions are regularly related to each other is suspiciously unnatural... well, Russian actually does exactly the same thing!)
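The regularity of that question/answer pairing can be sketched in a few lines of Python. This is just an illustration of the pattern visible in the three roots above (swap the initial "k'a" for "da"); the function name is made up, and I'm not claiming the rule extends to every interrogative in the language.

```python
# Question root -> answer root, as listed in the text.
QUESTION_TO_ANSWER = {
    "k'asa": "dasa",  # "to be what?"  -> "to be that"
    "k'axe": "daxe",  # "to do what?"  -> "to do that"
    "k'aku": "daku",  # "to be whom?"  -> "to be them"
}


def answer_root(question_root: str) -> str:
    """Derive an answer root by replacing the interrogative onset k'a- with da-."""
    if not question_root.startswith("k'a"):
        raise ValueError(f"{question_root!r} does not start with k'a-")
    return "da" + question_root[len("k'a"):]


# The derived forms agree with the listed pairs.
assert all(answer_root(q) == a for q, a in QUESTION_TO_ANSWER.items())
```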

Demonstratives

Surprisingly, the actual demonstratives turned out to work pretty much like they do in English- the exact set of them is different, and they divide up space differently, but they pretty much just look like free pronouns. Lest you think that this is not weird enough for a language with such alien-to-Anglophones morphosyntax as Valaklwuuxa... well, that's actually how natural Salish languages handle them, too.

Internally, demonstratives are considered to be pretty much the same as articles- they are things that can head argument phrases, but they can't be predicates. They just happen to be intransitive versions of articles (determiners), which don't require a relative clause to follow.

The three generic, non-deictic articles, which always require a following phrase, are as follows:

<txe> "I know which one"
<ta> "I don't know/care which one"
<kwe> "the one who/which..."

The demonstratives, which can be used with or without an explicit argument, come in pairs distinguished by animacy:

Animate/Inanimate
<tqe>/<se> "this (near me)"
<tqel>/<sel> "that (near you)"
<lel>/<lel> "yon (near it)"
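If it helps to see the table as a data structure, here is a small Python lookup keyed by distance and animacy. The key names ("near_speaker", etc.) and the function are my own labels for illustration; only the forms themselves come from the table.

```python
# (animate form, inanimate form) for each distance, copied from the table above.
DEMONSTRATIVES = {
    "near_speaker": ("tqe", "se"),     # "this (near me)"
    "near_addressee": ("tqel", "sel"),  # "that (near you)"
    "distal": ("lel", "lel"),           # "yon (near it)"; same form for both
}


def demonstrative(distance: str, animate: bool) -> str:
    """Pick the demonstrative for a given distance and animacy."""
    animate_form, inanimate_form = DEMONSTRATIVES[distance]
    return animate_form if animate else inanimate_form


print(demonstrative("near_addressee", animate=False))
print(demonstrative("near_speaker", animate=True))
```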

Note that there is no number distinction (e.g., "this" vs. "these"). Plural marking can be done by attaching the clitic <=ndek> to a determiner, but is not obligatory- it is unlikely to be used, except for emphasis, if number is indicated in some other way, such as by the verb conjugation or if a specific number is mentioned.

Demonstratives are also distinguished from articles in that they can also be prefixed with <we->, which is a "pointing" marker; it's not obligatory when you point at something, but can only be used if you are actually pointing at something, and can be approximated as "this/that one right here/there!"

There is also a single set (without any animacy distinction) of question/answer determiners: <k'adza>, for asking "which one?", and the answer <dadza>, used for (approximately) "the same one"/"the same thing".

Asking What Things Are

Now, we have enough to translate:

"What is that (near you)?" ~ "k'asa sel?"
"This (near me) is a rock." ~ "wonglqa se."
(Where <wonglqa> is the word for "to be a rock".)

Now you might think, why did we choose to have interrogative roots and deictic pronouns? Couldn't you just as easily do it the other way around? That would make content questions simpler, because you wouldn't have to construct a relative clause around every interrogative root. And the answer is "yes", some other language could indeed work just the same as Valaklwuuxa in every other respect, except for flipping that one decision the other way around. But choosing to do things in this way has one really nice consequence: the structure of content questions exactly parallels the structure of their answers. If the rock is "yonder", so that both questioner and answerer use the same demonstrative, you get:

"k'asa lel?"
"wonglqa lel."

Replace the question word with its answer, and everything else stays the same. Treating interrogatives as verbs does bring up another issue, though: when using them in argument positions, which determiner do you use? Typically, you'll use <ta>, the "I don't know which one" article (because if you did know which one, why did you ask?), but any determiner is valid, and they can be used to make much more specific kinds of questions, like:

"dwu-valsk sel k'asa?" ~ "You cooked that what?" / "What's that thing that you cooked?"

A Brief Note on Polar Questions

So, that's pretty much everything you need to know about content questions- but what about polar questions, with a yes/no answer?
The simplest way to form them is simply by intonation; syntactic structure is identical to statements, but a rising-falling tone over a whole clause will turn it into a question. If you want to be more specific, though, there is an interrogative particle <k'a>, which is placed immediately after whatever is in doubt. Thus, we can ask:

"dwu-valsk k'a ta wonglqa-la?" ~ "Did you cook a rock?" (as opposed to doing something else with it)
vs.
"dwu valsk ta wonglqa k'a-la? ~ "Did you cook a rock?" (as opposed to some other item)

There is of course a corresponding answer word, <da>, used to confirm the thing in doubt:

"xe-valka ta wonglqa da-la." ~ "Yes, I really did cook a rock."

And an (irregular) negative answer particle:

"xe-valka ta wonglqa pe-la." ~ "No, I did not cook a rock." (but I may have cooked something else)

Errata

In addition to the interrogatives discussed above, there are two more pairs of question/answer roots:

<skwol> / <sdwol> "how many / so many"
<k'akwo> / <dakwo> "which (ordinal) one / that (ordinal) one"

That last one is a thing for which English has no single simple question word, but many languages (like Hindi) do. If you want to elicit a response like "I am the fifth child in my family.", you can imagine a corresponding question like "Which-th child are you?" In English, that's terribly awkward, and there is just no standard way of forming that kind of question, but in Valaklwuuxa, <k'akwo> is the standard translation for "which-th", or "what number".

Now, there's one more bit of interestingness. All of the basic question roots are intransitive, but there is a generic transitivizing suffix <-(e)t>. This usually has a causative meaning ("to make something happen", "to make someone do something"), which means Valaklwuuxa doesn't need to use special verbs for "to make" or "to force" nearly as often as a language like English does, but the precise meaning of a transitivized verb is lexically specified. In the case of <k'axet>/<daxet>, the transitive versions actually mean "to do what to something?"/"to do that to something". So if you want to ask "What did he make you do?", the translation actually does use a separate word for "make" after all.

Tuesday, May 17, 2016

Path & Manner in Valaklwuuxa

How languages describe motion is a particularly interesting subfield of verbal lexicology. In some languages, verbs of motion are even a distinct morphological class. Most of my blog posts on conlanging & linguistics have focused so far on WSL, which has no verbs, and thus no verbs of motion, but since I just started blogging about Valaklwuuxa, which is practically nothing but verbs, this topic suddenly seems relevant!

Conlangery #14 discusses verbs of motion in some depth, but the short version is that there are two major semantic components to motion: the manner in which one moves, and the path or direction along which one does it. Different languages differ in which of these components they encode in verbs, and which they relegate to adverbs, adpositional phrases, or other mechanisms. Germanic languages, for example, tend to encode manner, while Romance languages tend to encode path instead. Since English is a Germanic language with a ton of Romance borrowings, we've got a bit of both- manner verbs like "walk" (go slowly by foot), "run" (go fast by foot), "swim" (go by water), "drive" (go by directing a machine), etc., and path verbs like "ascend" (go up), "traverse" (go across), and "enter" (go into). Russian, in comparison, has verb roots that describe manner, which combine with prefixes for path.

So, how does Valaklwuuxa do it?

Valaklwuuxa has several roots that act a lot like manner verbs. For example:

ketqenda - go slow
petqentqe - go fast
nbatqe - move under one's own power (i.e., for humans, walk; but also applies to flying birds, swimming fish, vehicles, etc.)
lande - drive or ride

But, it also has roots like <tak> "to go along/beside", and <wole> "to move around a circuit", which are clearly path verbs.

And, on a moment's reflection, it seems like there must be both kinds of verbs; when verbs are your only open lexical category, and there's only one preposition, and not many basic adverbs... there's really no choice but to encode both path and manner information in verbs.

There are, however, still ways to determine whether the language is primarily path- or manner-oriented, aside from just looking at a list of lexical entries.

In the case of <ketqenda> and <petqentqe>, while these verbs can be, and are, used to describe motion, they can also be used with a more generic attributive sense. The attributes "fast" and "slow" generally imply motion, but literal displacement over a distance is not always entailed. Thus, one can say, e.g., <xe-petqentqenk!> to mean "I am going quickly!", "I'm hurrying", or just "I am fast!" Similarly, the imperative <(dwu-)petqentqex!> can mean "hurry up!", or "go faster!", but it can also be used in a metaphorical sense similar to "think fast!"

Most of the time when <ketqenda> or <petqentqe> are used with reference to literal motion, they appear in a serial-verb construction with some other verb, as in

dwu-petqentqe takex!
dwu=petqentqe tak=ex
2sg=go.fast go.along=IMP.sg
"Go (along the path) quickly!"

Words like <nbatqe> and <lande>, despite being more prototypical motion-verbs, behave similarly. They are very rarely used as finite verbs by themselves. In fact, they will be used on their own, as the predicate of a clause, almost exclusively in contexts where they are implicitly serialized with another verb, which may not even be a motion verb- and which can radically change the meaning! For example, if someone asked

dwu-valesk?
dwu=val-esk
2sg=cook-2.3sg
"Did you cook it?"

you might answer with

xe-nbatqenk!
xe=nbatqe-nk
1sg=walk-1p
"Yes, I did it myself!"

which is an elliptical form of <xe-nbatqe valesk!> "I cooked it by myself!"

If you want to actually say that someone is walking somewhere, it would be very odd to just say, e.g., <nBale txe nbatqe-la> or <nbatqe txe nBale-la> for "Mary is walking", unless the fact that Mary is travelling has already been established in the discourse and you are just highlighting the fact that she is doing it by walking, as opposed to some other means.

From all this you might get the impression that perhaps "to walk" and "to drive" are just bad glosses, and <nbatqe> and <lande> aren't really motion verbs at all- but that's only half the story! For one thing, something like <xe-nbatqe takend>, assuming that "I" am an adult human, really does mean "I am walking along a path", not just "I am traversing a path by myself", perhaps by awkwardly rolling, or some other means. (If the subject were not an adult human, of course, the translation would change to reflect the "prototypical" mode of movement for whatever the subject is.) The phrases <landekwe wole> and <wolekwe lande> are also used somewhat idiomatically to mean "to patrol a perimeter (in/on a vehicle)" or "to drive around a race track" (the first takes an object for the thing you are going around, while the second takes an object for the thing you are riding or driving); again, not "to circumnavigate with help".

Furthermore, while the bare roots are not used much by themselves, there are derived forms which are no longer "verbs of motion" themselves, but depend on the motive meaning of the basic manner verbs. Thus, we have words like <landegwel> (literally "to start travelling by vehicle"), which is used for things like "to set out", "to start the car", "to uncircle the wagons", etc.; and words like <vwenbatqe> (literally "to walk as much as possible"), which is used to mean "to have explored" (and the more derived form <vwenbatqweev> "to be out exploring")- not "to do everything yourself"!

Thus, even though Valaklwuuxa necessarily has motive roots for both path and manner, we can determine from usage patterns that it is in fact primarily a path-oriented language.