Monday, September 7, 2015

A Sister Language for WSL

My last post was triggered by recent discussions on CONLANG-L which inspired me to start formalizing the semantics of WSL. But after I started formalizing WSL, those discussions kept going. And as it happens, I got inspired to start on the design of a new language, similar in basic structure to WSL but also very different. So, before we get to Part II of WSL's semantic model, we're going to take a brief detour through the basic design of a new sister language.

Background: This idea came out of a discussion on how to describe the semantics of a monocategorial language whose complete syntax could supposedly be described by the following simple grammar:

 S → w S | 0

Or, "A sentence consists of a list of words." That's it. Any words in the language, in any order: all of them are grammatical sentences. Which, really, is equivalent to "no syntax at all". The obvious choice for semantic rules when presented with that syntax (or lack thereof) is David Gil's polyadic association operator, but the creator was adamant that that was not a correct analysis of his language. My conclusion was that "S → w S | 0" was simply not the correct grammar as claimed, but rather that it was a two-level structure that grouped words into distinct phrases, but where phrase boundaries are maximally ambiguous. This still permits every possible linear arrangement of words as a valid grammatical sentence, since the order of words within a phrase and the order of phrases within a sentence are still completely free.

With only one class of words and completely free word order, there is no way to tell where phrase boundaries are or how words should be grouped together, so such a language would initially seem to be fairly useless. Even discounting lexical ambiguity, the number of different possible interpretations grows exponentially with the number of words in a sentence (n words in a row can be cut into phrases in 2^(n-1) ways), an ambiguity load that dwarfs what exists in any natural language and would quickly swamp what you can reasonably handle with pragmatics.

In a spoken language, though, more function words or morphological words would not necessarily be required to eliminate that ambiguity; phrase and sentence boundaries could be quite adequately delimited by intonation. And intonation can in turn be encoded in text via appropriate punctuation, while still reasonably claiming that this is a monocategorial language at the lexical level (although it will have multiple types of internal syntactic nodes). I've never really played with the intonation rules for a conlang before, and especially not the effect of intonation on semantics; and I haven't seen much of that documented in other people's conlangs, either. So, this is a pretty enticing opportunity to really isolate the semantics of suprasegmental intonation.

Now, the point of WSL was to create something that very obviously does not have anything that could reasonably be called a category of "verbs" at any level, but not necessarily to be simple or minimalistic. And WSL does in fact have quite an array of different parts of speech. But for this one, the aim will be to see how far it can go before it becomes necessary to add any additional lexical classes.

The syntax of this new language ends up looking like this:

 S → C
 C → P C | P .
 P → w P | w ,

This reads as "A sentence consists of a clause, a clause consists of a phrase followed by another clause or a phrase followed by a period, and a phrase consists of a word followed by another phrase, or a word followed by a comma." We also specify the phonological / orthographical rule that a sequence of ", ." coalesces into a single "."
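As a rough illustration of how little machinery that grammar needs, here is a minimal Python sketch of a parser for it. It assumes words are whitespace-separated strings containing no "," or ".", and it undoes the coalescing rule by treating the final "." as ", ."; the function name `parse_sentence` and the English glosses used as words are hypothetical, not part of the language.

```python
import re

def parse_sentence(text):
    """Split a sentence like 'w w, w, w.' into a list of phrases.

    Assumed grammar: S -> C ; C -> P C | P . ; P -> w P | w ,
    plus the orthographic rule that ', .' is written as a single '.'.
    Each phrase is returned as a list of its words.
    """
    # Words are runs of characters other than whitespace, ',' and '.'.
    tokens = re.findall(r"[^\s,.]+|[,.]", text)
    if not tokens or tokens[-1] != ".":
        raise ValueError("a sentence must end with '.'")
    body = tokens[:-1]
    # Undo the coalescing rule: the last phrase's ',' was absorbed by '.'.
    if not body or body[-1] != ",":
        body.append(",")
    phrases, current = [], []
    for tok in body:
        if tok == ",":
            if not current:
                raise ValueError("a phrase needs at least one word")
            phrases.append(current)
            current = []
        elif tok == ".":
            raise ValueError("'.' may only end the sentence")
        else:
            current.append(tok)
    return phrases

print(parse_sentence("red ball, agent, run."))
```

Every non-empty string of words with commas between phrases and a final period parses; the only things rejected are empty phrases and a missing or misplaced period.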

At the phonological level, the "," and "." are realized as particular intonation patterns on the preceding phrase. I'm not wedded to anything yet, but I'm thinking rising tone over the last word of a phrase for ",", and contrasting falling tone for ".". That would lead to an intonation pattern over a whole sentence that consists of a series of level tones followed by rises, and then terminated by a fall.

The extra level of rules that turns an S into a C may seem superfluous (and if we just want to describe syntactic structure by itself, they are), but the extra level makes the semantic interpretation rules much simpler.

Those basic interpretation rules look like this:

[|S|] = ∃x.[|C|](x)
"There exists some x such that the denotation of C is true for x."

[|C: P C|] = λx. ∃y. [|P|](x)(y) & [|C|](x)
[|C: P .|] = λx. ∃y. [|P|](x)(y)
"For some x, there exists some y such that the denotation of P applied to x and y is true, and the denotation of C is true for x."

[|P: w P|] = λx.λy.[|w|](x)(y) & [|P|](x)(y)
"For some x and y, the denotation of w and the denotation of P applied to x and y are true."

[|P: w ,|] = λx.λy. [|w|](x)(y) & ∃r. r(x,y)
"For some x and y, the denotation of w applied to x and y is true and some relation r exists between x and y."

Basically, this is just a fancy mathematically formalized way of saying that a sentence describes an event which gets passed into each sub-clause, and then each phrase describes its own separate entity, and the meaning of the whole sentence is just the conjunction of the meanings of each word, applied to the whole-sentence event and the entity for that word's containing phrase, along with the assertion that the entity for a phrase has some kind of relationship to the sentence's event.
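To make that concrete, here is a small Python sketch that evaluates these rules over a finite toy model. Everything in the model (the event "e1", the entities, and the glosses "dog", "red", "ag", "run") is invented for illustration. It also assumes that the "∃r. r(x,y)" conjunct is always satisfiable, so it contributes nothing to the truth value and is left out of the check.

```python
def phrase_holds(words, lexicon, x, y):
    # [|P|](x)(y): the conjunction of each word's denotation at (x, y).
    # Assumption: the extra "some relation r holds between x and y"
    # conjunct is trivially true, so it is not checked here.
    return all(lexicon[w](x)(y) for w in words)

def clause_holds(phrases, lexicon, x, entities):
    # [|C|](x): every phrase gets its own existentially bound entity y.
    return all(
        any(phrase_holds(p, lexicon, x, y) for y in entities)
        for p in phrases
    )

def sentence_holds(phrases, lexicon, events, entities):
    # [|S|]: there is some event x for which the clause holds.
    return any(clause_holds(phrases, lexicon, x, entities) for x in events)

# Hypothetical toy model: one event, two individuals.
EVENTS = ["e1"]
ENTITIES = ["rex", "ball", "e1"]  # events can also serve as entities
LEXICON = {
    "dog": lambda x: lambda y: y in {"rex"},
    "red": lambda x: lambda y: y in {"ball"},
    "ag":  lambda x: lambda y: (x, y) in {("e1", "rex")},
    "run": lambda x: lambda y: x in {"e1"},
}

# "dog ag, run." : some running event has a dog as its agent.
print(sentence_holds([["dog", "ag"], ["run"]], LEXICON, EVENTS, ENTITIES))
# "red ag." : some event has a red thing as its agent.
print(sentence_holds([["red", "ag"]], LEXICON, EVENTS, ENTITIES))
```

In this model the first sentence comes out true and the second false, since the only red thing is not the agent of anything.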

Every word in the language has the "semantic interface" of a two-place predicate, or a two-argument curried lambda expression, taking in an event variable and an entity variable and specifying some restriction on either or both referents and/or a relationship between them.

Some words will be simple predicates that restrict the referent of the phrase, or tell you about its properties. They will have meanings that look something like this:

a) λx.λy. red(y)

which completely discards the event and just applies some predicate (in this case, "red") to the entity variable.

Some other words will be two-place relations that tell you about the thematic role of the entity in relation to the event. They will have meanings like

b) λx.λy. ag(x, y)

which tells you that the referent of this phrase (represented by the entity variable y) is the agent of the event.

And a third class of words will tell you about the event itself. These words could come in two sub-varieties; things that look like

c) λx.λy. run(x)

which discard the entity variable and just apply a predicate to the event; and things that look like

d) λx.λy. x = y

which tell you that the entity for this phrase is, in fact, an event, and that the event is a subset or superset of (or, in this particular case, simply is) the entity described by the enclosing phrase.
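The four kinds of meaning in (a) through (d) can be written down as curried Python functions to show that they really do share one interface. This is only a sketch: the sets RED, AGENT_OF, and RUNNING are made-up extensions for a toy model, and the word names are English glosses rather than actual vocabulary.

```python
# Hypothetical toy extensions; the names are illustrative glosses only.
RED = {"ball"}
AGENT_OF = {("e1", "rex")}  # (event, entity) pairs
RUNNING = {"e1"}

# a) restricts the entity and ignores the event
red = lambda x: lambda y: y in RED
# b) relates the entity to the event
ag = lambda x: lambda y: (x, y) in AGENT_OF
# c) restricts the event and ignores the entity
run = lambda x: lambda y: x in RUNNING
# d) identifies the phrase's entity with the event
same = lambda x: lambda y: x == y

# Every word is called the same way: word(event)(entity).
for name, word in [("red", red), ("ag", ag), ("run", run), ("same", same)]:
    print(name, word("e1")("rex"), word("e1")("ball"), word("e1")("e1"))
```

The classes differ only in which of the two arguments the body looks at, which is a fact about lexical semantics rather than about syntax.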

Now, semantic class c has the interesting property that, since the meanings of words in that class do not depend on the entity of the phrase, they can appear in any phrase in a sentence without altering the literal meaning. That's a fairly unusual behavior, and could be used to argue for recognizing them as a separate part of speech from the rest, but they don't have to be analyzed that way: their syntactic behavior is indistinguishable from that of every other word. Even so, I'm not sure if I will want to include some in the language for "fun", or if they should be disallowed so as to avoid the argument.
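That mobility of class-c words can be checked mechanically on a small model. The sketch below is self-contained and rests on assumptions: the model, the glosses, and the helper names `holds`, `lexicon_with`, and `subsets` are all hypothetical, and the "∃r. r(x,y)" conjunct is taken to be always true. It tries every possible extension of "run" over two events and confirms that moving the word from one phrase to another never changes the truth value.

```python
from itertools import chain, combinations

EVENTS = ["e1", "e2"]
ENTITIES = ["rex", "ball"] + EVENTS
DOG, RED, AG = {"rex"}, {"ball"}, {("e1", "rex")}

def holds(phrases, lexicon):
    # Some event x exists such that every phrase has some entity y
    # satisfying all of that phrase's words at (x, y).
    return any(
        all(any(all(lexicon[w](x)(y) for w in p) for y in ENTITIES)
            for p in phrases)
        for x in EVENTS
    )

def lexicon_with(run_events):
    # Build a lexicon in which "run" is true of exactly run_events.
    return {
        "dog": lambda x: lambda y: y in DOG,
        "red": lambda x: lambda y: y in RED,
        "ag":  lambda x: lambda y: (x, y) in AG,
        "run": lambda x: lambda y: x in run_events,
    }

def subsets(items):
    # All subsets of items, as tuples.
    return chain.from_iterable(
        combinations(items, k) for k in range(len(items) + 1)
    )

# "dog ag run, red." versus "dog ag, red run."
agree = all(
    holds([["dog", "ag", "run"], ["red"]], lexicon_with(set(s)))
    == holds([["dog", "ag"], ["red", "run"]], lexicon_with(set(s)))
    for s in subsets(EVENTS)
)
print(agree)
```

The two sentences agree in every case because "run" only ever constrains the shared event x, so it factors out of whichever phrase contains it.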

Also, the boundary between classes b and d is very fuzzy, since subset, superset, and identity could just as well be modeled as binary relations between a phrasal entity and a sentential event as things like "agent" and "patient" are.

Finally, the a category, which would typically seem to correspond with nouns and adjectives, also does not have any distinguished behavior compared to classes b, c, and d. Relation words and event words can be left out, and you can have a complete sentence that consists only of class-a semantic noun-jectives, which are asserted to exist and to have some unspecified relation to some unspecified event[1]. In WSL, role markers are obligatory, but here we have the extra "& ∃r. r(x,y)" in the interpretation of phrases just to account for the case where words with the semantics of a role marker are missing.
On the other hand, you can also leave out all class-a words, and have a complete sentence that consists only of class-b relations; and the same applies to the last two classes of event words as well. Finally, there are no selection rules that cause a word of any of the four classes to disallow the use of any other particular class in the same phrase or sentence; some combinations of words may be contradictory or nonsensical, but every string of words is grammatical, and can be interpreted.

It should also be possible to represent quantifiers in this framework, as totally undistinguished words at the syntactic level which merely happen to have another different internal structure in their lexical semantics. This would allow getting rid of some of the built-in existential quantifiers, but will first require removing a few layers of abstraction from my current semantic notation in order to uncover the set-theoretic mechanics of generalized quantifiers. My efforts to that effect are detailed in this follow-up post.

Next, I'd like to figure out some useful application for stress-marked focus, which could be indicated orthographically with Capital Letters or something. That will take some thinking, since English examples often rely on the semantics of some focus-sensitive lexical item, and using it that way would provide a good argument for recognizing focus-sensitive items as a second part-of-speech. But some really simple rising/falling intonation gets us pretty dang far doing nothing but marking linear phrase boundaries!

[1] Which means that elliptical answers to questions aren't actually elliptical at all; they're still complete grammatical sentences!