Friday, May 13, 2016

Language Without the Clause

Having played for a while with WSL (which has no verbs) and Valaklusha (which has no nouns, and which I have not yet blogged about), I realized that there are some linguistic concepts even more fundamental than the noun/verb distinction which are nevertheless still not essential to communication.

In particular, every language I have ever heard of has something that can be reasonably called a clause (some syntactic structure which describes a particular event, state, or relation) which is usually (though not always) recursively nestable with other clauses to make more complex sentences.

Predicate logic, however, does not have to be analyzed in terms of nicely-bounded clauses in the linguistic sense. (There are things in logic called "clauses", but they're not the same thing.) Predicates describing the completely independent referents of completely independent linguistic clauses can be mixed together in any order you like with no loss of meaning, due to the availability of an effectively infinite number of possible logical variables that you can use to keep things straight. In fact, we can not only get rid of clauses- we can get rid of recursive phrase structure entirely.
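To make the order-independence concrete, here is a minimal Python sketch. The toy world, the predicate names, and the helper `holds` are all invented for this illustration; the only point is that a flat conjunction of predicates over shared variables evaluates the same way no matter how the predicates from the two "clauses" are shuffled together.

```python
from itertools import permutations

# A toy "world": the set of atomic facts that are true.
# All names here are illustrative assumptions, not from any real loglang.
FACTS = {
    ("dog", ("rex",)),
    ("cat", ("tom",)),
    ("chase", ("rex", "tom")),
    ("man", ("bob",)),
    ("bread", ("loaf",)),
    ("eat", ("bob", "loaf")),
}

# Two independent "clauses" flattened into one conjunction of predicates.
# The variables x, y, z, w are what keep the participants straight.
PREDICATES = [
    ("dog", ("x",)),
    ("cat", ("y",)),
    ("chase", ("x", "y")),
    ("man", ("z",)),
    ("bread", ("w",)),
    ("eat", ("z", "w")),
]

def holds(predicates, assignment):
    """True if every predicate is a fact once variables are replaced."""
    return all(
        (name, tuple(assignment[v] for v in args)) in FACTS
        for name, args in predicates
    )

good = {"x": "rex", "y": "tom", "z": "bob", "w": "loaf"}
bad = {"x": "tom", "y": "rex", "z": "bob", "w": "loaf"}

# Check all 720 orderings: each one is satisfied by `good` and not by `bad`.
order_is_irrelevant = all(
    holds(order, good) and not holds(order, bad)
    for order in permutations(PREDICATES)
)
print(order_is_irrelevant)
```

Nothing in `holds` looks at adjacency or grouping, which is exactly why the predicates belonging to "the dog chased the cat" and "the man ate the bread" can be mixed freely.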

There are, of course, practical problems with trying to use an infinite set of possible pronouns in a speakable language. If there weren't, creating good loglangs wouldn't be hard! But even with a relatively small finite set of pronouns representing logical variables, it's possible to create unambiguous logical structures that overlap elements from what would normally be multiple clauses. Thus, we could come up with a language which, rather than using recursive nesting, uses linear overlapping, with no clear boundaries that can be used to delimit specific clauses.

After thinking of that, I realized that Gary Shannon's languages Soaloa and Pop are examples of exactly that kind of language, although (as described on that page) they do have an analysis in terms of nested clauses. Eliminating the possibility of a clausal analysis requires something a little more flexible.

A better example is Jeffrey Henning's Fith. It works completely differently from Pop- which should not be surprising! It would be far more surprising to discover that there are only two ways of structuring information, using clauses and not using clauses, but that's not how it is. This is not a dichotomy, which means there is a huge unexplored vista of untapped conlanging potential in the organizational territory outside of recursive clause structure land.

Fith is inspired by the programming language FORTH, and is supposed to be spoken by aliens who have a mental memory stack. Some words (nouns) add concepts to the stack, and other words (verbs and such) manipulate the items already on the stack and replace them with new, more complex concepts. If that were all, Fith would look like a simple head-final language, and the stack would be irrelevant- but that is not all! There are also words, called "stack conjunctions" or "stack operators", which duplicate or rearrange (or both) the items already on the stack. Because it can duplicate items on the mental stack, Fith has no need for pronouns, and every separate mention of a common noun can be assumed to refer to a different instance of it- if you meant the same one, you'd just duplicate the existing reference! But more importantly, the existence of stack operators means that components of completely independent semantic structures can be nearly-arbitrarily interleaved, as long as you are willing to put in the effort to use the right sequence of stack operators to put the right arguments in place for each verb when it comes. One can write a phrase-structure grammar that describes all syntactically valid Fith utterances... but it's meaningless. Surface syntax bears no significant relation to semantics at all, beyond some simple linear ordering constraints.
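The mechanics described above can be sketched as a tiny stack machine in Python. This is only an illustration under loud assumptions: the vocabulary is English, and the operators `dup`, `swap`, and `rot` are borrowed from FORTH rather than taken from Fith's actual inventory of stack conjunctions. Nouns push a fresh referent, verbs pop their arguments and push a composite concept, and the operators only rearrange the stack.

```python
import itertools

# Hypothetical vocabulary; real Fith words and operators differ.
NOUNS = {"dog", "cat", "man", "bread"}
VERBS = {"sleep": 1, "chase": 2, "eat": 2}  # verb -> arity

def run(utterance):
    """Evaluate a space-separated utterance and return the final stack."""
    stack = []
    instance = itertools.count(1)
    for word in utterance.split():
        if word in NOUNS:
            # Each mention of a noun is a brand-new referent.
            stack.append(f"{word}#{next(instance)}")
        elif word in VERBS:
            arity = VERBS[word]
            if len(stack) < arity:
                raise ValueError(f"stack underflow at {word!r}")
            args = stack[-arity:]
            del stack[-arity:]
            # Replace the arguments with one more complex concept.
            stack.append((word, *args))
        elif word == "dup":    # ( a -- a a )
            stack.append(stack[-1])
        elif word == "swap":   # ( a b -- b a )
            stack[-1], stack[-2] = stack[-2], stack[-1]
        elif word == "rot":    # ( a b c -- b c a )
            stack.append(stack.pop(-3))
        else:
            raise ValueError(f"unknown word {word!r}")
    return stack

# Two mentions of "dog" are two different dogs...
two_dogs = run("dog sleep dog cat chase")
# ...while "dup" reuses the same dog without any pronoun.
one_dog = run("dog dup sleep swap cat chase")
# Two unrelated predications with their words interleaved.
interleaved = run("dog man cat swap bread eat rot rot chase")

print(two_dogs)
print(one_dog)
print(interleaved)
```

In the last utterance the words belonging to "dog chase cat" and "man eat bread" are scattered through each other, and only the stack operators put the right arguments under each verb. No bracketing of the word string into two clauses corresponds to the two resulting concepts.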

In fact, Fith is a perfect loglang- it can describe arbitrarily complex predicate-argument structures with no ambiguity, and it doesn't even require an arbitrary number of logical variables to do it! Unfortunately, it's also unusable by humans; Fith doesn't eliminate memory constraints, it just trades remembering arbitrarily-bound pronouns for keeping track of indexes in a mental stack, which is arguably harder. Incidentally, this works for exactly the same reason that combinator calculus can eliminate the variables from equivalent lambda calculus expressions.
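For readers who have not met combinator calculus, here is a small Python sketch of the variable-elimination idea. It assumes the standard S and K combinators written as curried functions; the helper names `sub`, `mul`, `square`, and `rsub` are made up for the example. The combinators W and C play a role loosely comparable to "duplicate" and "swap" stack operators, in that they copy or reorder arguments without ever naming them.

```python
# Curried S and K combinators: the only places variables are spelled out.
S = lambda f: lambda g: lambda x: f(x)(g(x))
K = lambda x: lambda y: x

# Everything below is built by application alone, with no new variables.
I = S(K)(K)              # I x     = x
B = S(K(S))(K)           # B f g x = f (g x)
C = S(B(B)(S))(K(K))     # C f x y = f y x   (argument swap)
W = S(S)(K(I))           # W f x   = f x x   (argument duplication)

# Ordinary curried arithmetic to apply the combinators to.
sub = lambda a: lambda b: a - b
mul = lambda a: lambda b: a * b

square = W(mul)          # behaves like: lambda x: x * x
rsub = C(sub)            # behaves like: lambda x: lambda y: y - x

print(square(7))
print(rsub(1)(10))
```

The variable-using definitions in the comments and the combinator-only definitions compute the same functions; the bookkeeping that variables did has simply moved into the choice and order of combinators.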

(Side note: the stack concept is not actually necessary for evaluating Fith or combinator calculus- it's just the most straightforward implementation. Some of Lojban's argument-structure manipulating particles actually have semantics indistinguishable from some stack operators, but Lojban grammar never references a stack!)

Having identified two varieties of languages that eschew traditional clauses, here's a sketch of a framework for a third kind of clauseless language:

The basic parts of speech are Pronouns, Common Verbs, Proper Verbs, and Quantifiers; you can throw in things like discourse particles as well, but they're irrelevant to the basic structure. This could be elaborated on in many ways.

Pronouns act like logical variables, but with a twist: just like English has different pronouns for masculine, feminine, and non-human things ("he", "she", and "it"), these pronouns are not arbitrarily assigned, but rather restricted to refer to things in a particular semantic domain, thus making them easier to keep track of when you have a whole lot of them in play in a single discourse. Unlike English pronouns, however, you'd have a lot of them, like Bantu languages have a large number of grammatical genders, or like many languages have a large number of classifiers for counting different kinds of things. In this way, they overlap a bit with the function of English common nouns as well.
In order to allow talking about more than one of a certain kind of thing at the same time, one could introduce things like proximal/distal distinctions.

Verbs correspond to logical predicates, and take pronouns as arguments with variable arity. There might be a tiny bit of phrase structure here, but at least it would be flat, not recursively nestable- and one could treat pronouns as inflections to eliminate phrase structure entirely.
These aren't quite like normal verbs, though- they do cover all of the normal functions of verbs, but also (since they correspond to logical predicates) the functions of nouns and adjectives, and some adverbs. Essentially, they further constrain the identity of the referents of the pronouns that they take as arguments, beyond the lexical semantic restrictions on the pronouns themselves. Common verbs are sort of like common nouns- they restrict the referents of their arguments to members of a certain generic class. Proper verbs, on the other hand, restrict at least one of their arguments to refer to exactly one well-known thing.

Quantifiers re-bind pronouns to new logical variables. They cover the functional range of articles, quantifiers, and pronouns in English. The simplest way for quantifiers to work would be a one-to-one mapping of predicate logic syntax, where you just have a quantifier word combined with a pronoun (or list of pronouns) that it binds. A bit more interesting, however, might be requiring quantifiers to attach to verbs, implicitly re-binding the arguments of the modified verb to the referents which are the participants in the event described by the verb. If that is not done, it might be useful to have "restrictive" vs. "non-restrictive" inflections of verbs, where restrictive verbs limit the domain of the binding quantifiers for their arguments, and non-restrictive verbs merely provide extra information without more precisely identifying (like English restrictive vs. non-restrictive relative clauses).

For very simple sentences, this wouldn't look too weird, except for the preamble where you quantify whatever pronouns you intend to use first. But the first thing to notice is that, just like in predicate logic, there is no need for all of the verbs describing the same referent to be adjacent to each other. A sentence in English which has two adjectives describing two nouns could be translated with the translation-equivalent of the nouns all at the front and the translation-equivalent of the adjectives all at the end, with the translation-equivalent of the verb in between. But hey, if you have a lot of agreement morphology, normal languages can sometimes get away with similar things already; although it's not common, separating adjectives from nouns can occur in, e.g., Russian.

Where it gets really weird is when you try to translate a complex sentence or discourse with multiple clauses.
When multiple English clauses share at least one referent, this relation is not indicated by nesting sentences within each other, or conjoining them in series, but by stringing on more verbs to the end, re-using the same pronouns as many times as you like. Occasionally, you stick in another quantifier when you need to introduce a new referent to the discussion, possibly discarding one that is no longer relevant. But, since this can be done re-binding one pronoun at a time while leaving others intact, the discourse can thus blend continuously from one topic into another, each one linked together by the overlap of participants that they have in common, with no clear boundaries between clauses or sentences. If you were very careful about your selection of pronouns, and made sure you had two non-conflicting sets, you could even arbitrarily interleave the components of two totally unrelated sentences without ambiguity!
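Here is a rough Python sketch of how such a discourse could be interpreted. Everything in it is an assumption made for illustration: the token format, the pronoun names (`an_near` and `an_far` for a proximal and distal "animal" pronoun, `hum` for humans, `food` for edibles), and the English verb glosses. A quantifier token re-binds one pronoun to a fresh logical variable, and a verb token asserts a predicate over whatever its pronouns currently refer to.

```python
import itertools

def interpret(tokens):
    """Turn a flat token stream into a list of predicate facts.

    ("Q", pronoun)            re-binds the pronoun to a fresh variable
    ("V", verb, pronoun, ...) asserts verb(...) over the current bindings
    """
    fresh = itertools.count(1)
    binding = {}
    facts = []
    for tok in tokens:
        if tok[0] == "Q":
            binding[tok[1]] = f"v{next(fresh)}"
        elif tok[0] == "V":
            _, verb, *pronouns = tok
            # An unquantified pronoun raises KeyError here.
            facts.append((verb, *(binding[p] for p in pronouns)))
        else:
            raise ValueError(f"unknown token {tok!r}")
    return facts

# Hypothetical pronouns and glosses, quantified in a preamble.
preamble = [("Q", "an_near"), ("Q", "an_far"), ("Q", "hum"), ("Q", "food")]
a = [("V", "dog", "an_near"), ("V", "cat", "an_far"),
     ("V", "chase", "an_near", "an_far")]
b = [("V", "man", "hum"), ("V", "bread", "food"),
     ("V", "eat", "hum", "food")]

sequential = interpret(preamble + a + b)
interleaved = interpret(preamble + [a[0], b[0], b[1], a[1], b[2], a[2]])
same_meaning = set(sequential) == set(interleaved)

# Topic drift: re-bind one pronoun while the other stays in play.
drift = interpret([
    ("Q", "an_near"), ("V", "dog", "an_near"),
    ("Q", "hum"), ("V", "man", "hum"), ("V", "own", "hum", "an_near"),
    ("Q", "an_near"), ("V", "cat", "an_near"),
    ("V", "feed", "hum", "an_near"),
])
print(same_meaning)
print(drift)
```

The interleaved version works only because the two "sentences" use non-conflicting pronoun sets. In the `drift` example, the man persists across the re-binding of `an_near`, so the discourse slides from the dog to the cat with no point at which one clause visibly ends and another begins.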

Note that this does not describe a perfect loglang- the semantic structures it can unambiguously encode are different from those accessible to a language with tree-structured, recursively nested syntax, but they are still limited, due to the finite number of pronouns available at any one time. This restricts the predicate logic that can be expressed in the same way that limiting the maximum stack depth does in Fith.

When discussing this idea on the CONLANG-L mailing list, some commenters thought that, like Fith, this style of language sounded incredibly difficult to process. But I am not so certain of that. It definitely has pathological corner cases, like interleaving sentences, but then, so does English- witness garden path sentences, and the horrors of center embedding (technically "grammatical" to arbitrary depth, but severely limited in practice). Actual, cognitively-limited users would not be obligated to make use of every structure that is theoretically grammatical! And even in the case of interleaved sentences, the ability of humans to do things like distinguishing multiple simultaneous voices, or separating note trains of different frequencies, makes me think it might just be possible to handle, with sufficient practice. Siva Kalyan compared the extremely free word order to hard-to-process Latin poetry, but Ray Brown (although disagreeing that my system sounded much at all like Latin poetry) had this to say on processability:
"If it really is like Latin poetry then it certainly is not going to be beyond humans to process in real-time. In the world of the Romans, literature was essentially declaimed and heard. If a poet could not be understood in real-time as the poem was being declaimed, then that poet's work would be no good. It was a world with no printing; every copy of a work had to be done by hand. Whether silent reading ever occurred is debatable. If you had the dosh to afford expensive manuscripts, you would have an educated slave reading them to you. What was heard had to be processed in real-time. The fact that modern anglophones, speaking a language that is poor in morphology and has to rely to a large extent on fairly strict word-order, find Latin verse difficult to process in real time is beside the point. Those brought up with it would certainly do so."
And being myself a second-language speaker of Russian, I have to say I am fairly optimistic about the ability of the human mind to deal with extremely free word-order. If I, a native Anglophone, can handle Russian without serious difficulty, I see no reason why the human mind should not be able to handle something even less constrained, especially if it is learned natively. Furthermore, if I think of pronouns like verb agreement inflections in a somewhat-more-complicated-than-usual switch-reference system, where the quantifiers act like switch-reference markers, then it starts to feel totally doable.

There are several devices speakers could use to reduce the load on listeners, like repeating restrictive verbs every once in a while to remind listeners of the current referents of the relevant pronouns. This wouldn't affect the literal meaning of the discourse at all, but would reduce memory load. This is the kind of thing humans do anyway, when disambiguating English pronouns starts to get too complicated.

This system also allows "floating" in the style of Fith, where one binds a pronoun and then just never actually uses it; unlike Fith, though, if the class of pronouns is small enough, it would quickly become obvious that you were intentionally avoiding using one of them, which should make the argument-floating effect more obvious to psychologically-human-like listeners.

Now, to be clear, what I have described thus far is not really a sketch for one single language, but rather a sketch for a general structure that could be instantiated in numerous different ways, in numerous different languages. As mentioned above, pronouns could exist as separate words, or as verb inflections. The components that I've packaged into Pronouns, Quantifiers, and Verbs could be broken up and re-packaged in slightly different ways, cutting up the syntactic classes into different shapes while still maintaining the overall clauseless structure. One could introduce an additional class of "common nouns" which mostly behave like pronouns, and represent logical variables, but have more precise semantics like verbs. This is potentially very fertile ground for developing a whole family of xenolangs with as much variation in them as we find between clause-full human natlangs! And I am feeling fairly confident that a lot of them would end up as something which, sort of like WSL, is still comprehensible by humans even if it could never arise as a human language naturally.