Tuesday, September 1, 2015

A Progressive Model of WSL Syntax & Interpretation: Part 1

Late last year, as the result of a challenge to create a language with no distinction between verbs and nouns (and, more strictly, with no verb phrases at all), I began working on a new conlang called WSL. Originally this stood for "Weird Syntax Language", but it has since been backformed into an acronym for the autonym "Wjerih Sarak Lezu", whose documentation is slowly being fleshed out in the linked-to Google Doc.

In response to a follow-up challenge, I have now started producing a formal description of the syntax and semantics of WSL. Although WSL is relatively simple compared to most languages, building a full syntactic model of it in one go is still rather intimidating, so I'm going to do it a little piece at a time, with commentary.

First, we start with the most basic requirements for predicate-argument structure:

S → a e A
A → a p A | 0
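As a quick sanity check on what these two rules generate (a clause-initial a, the marker e, then zero or more a p pairs), here is a small recursive recognizer. It is only a sketch: it assumes the input has already been reduced to a list of category symbols, and the function names are hypothetical.

```python
def recognize_A(tokens: list[str]) -> bool:
    """A -> a p A | 0: zero or more 'a p' pairs, then end of input."""
    if not tokens:                 # the null (0) alternative
        return True
    if tokens[:2] == ["a", "p"]:   # consume one 'a p' pair and recurse
        return recognize_A(tokens[2:])
    return False


def recognize_S(tokens: list[str]) -> bool:
    """S -> a e A: an argument, the clause marker e, then an A chain."""
    return tokens[:2] == ["a", "e"] and recognize_A(tokens[2:])


print(recognize_S(["a", "e", "a", "p", "a", "p"]))  # True
print(recognize_S(["a", "e", "p", "a"]))            # False: p must follow a
```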

This is (almost) equivalent in expressive power to the syntax for a fully binarized neo-Davidsonian predicate logic notation that has recently been under discussion on the CONLANG-L mailing list, whose syntax is given by the single production rule

S → paaS | 0

Where p represents a binary predicate and a represents an argument variable.

Compared to that, the WSL grammar has been altered in two significant ways:

  1. p follows a instead of preceding it
  2. Binary predicates are replaced with unary predicates on the surface, and the first (potentially Davidsonian) logical argument of each predicate is required to be identical and specified once separately (being distinguished by the e token rather than a following p).

Extracting the common first argument reduces the expressive freedom of this grammar compared to the paaS language, in that any situation that requires referring to multiple entities that each occupy a first-argument position requires multiple sequential clauses. In exchange, we get the benefit of much less repetition.

In a more standard predicate logic notation, like paaS, the a's would be variables or constants with unique referents, and we would require the predicates both to indicate the argument place occupied by each variable and to restrict the referents of the variables in so doing. In WSL, however, we define the p's to represent two-place predicates which specify named argument positions, and the a's to represent one-place predicates, each of which takes a unique argument that is not present in the surface syntax. The e can then be a single fixed symbol (in fact realized by the phonologically variable clitic <=u>), since all the information about the identity of the shared first semantic argument will be provided by the a that precedes it.
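To make this concrete, the sketch below maps a clause of the shape a e a p a p ... onto a flat conjunction in the style of the neo-Davidsonian notation. It assumes that each a contributes a one-place predicate on its own fresh variable and that the a before e describes the shared first argument (called x0 here); it leaves quantification implicit, since the grammar at this stage says nothing about it. The function name and the English predicate names in the example are hypothetical placeholders.

```python
def to_logical_form(head: str, pairs: list[tuple[str, str]]) -> str:
    """Translate 'head e a1 p1 a2 p2 ...' into a conjunction of atoms.

    head:  the one-place predicate before e; it describes x0
    pairs: (a, p) pairs; a describes a fresh variable xi, and
           p relates the shared first argument x0 to xi
    """
    atoms = [f"{head}(x0)"]
    for i, (a, p) in enumerate(pairs, start=1):
        var = f"x{i}"
        atoms.append(f"{p}(x0, {var})")   # every role shares x0
        atoms.append(f"{a}({var})")
    return " & ".join(atoms)


print(to_logical_form("eat", [("cat", "agent"), ("fish", "patient")]))
# eat(x0) & agent(x0, x1) & cat(x1) & patient(x0, x2) & fish(x2)
```

Note how x0 is written once on the surface but appears as the first argument of every two-place predicate in the output.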

The next level of complexity looks like this:

S → N e A
A → N r A | 0
N → n N | 0

Here, I have replaced the a for argument places with n (for 'noun'), due to the insertion of an N(oun) phrase layer between the individual one-place-predicate words and the A(rgument) phrase level, which contains a two-place predicate. The representation of two-place predicates has also changed, replacing p with r (for 'role'). This now allows us to use multiple logical predicates (represented by multiple surface nouns) to describe the same argument (i.e., to take the same implicit semantic variable as their argument). This lets us translate English phrases which, for example, use adjectives to describe nouns, or adverbs to describe verbs, except that WSL syntax does not distinguish the adjectives from the noun or the adverbs from the verb. Each shared variable can, however, still occupy only a single role. To relax this restriction, we make the following additional modification:

S → N e A
A → N R A | 0
N → n N | 0
R → r R | 0

Now, we allow multiple two-place predicates to take the same semantic arguments (where each level of A-phrase embedding introduces a new semantic variable), which permits the expression of reflexives (among other things), while still allowing multiple one-place predicates to take the same argument. Given an appropriate range of lexical semantic choices for the predicates, this allows for the expression of arbitrarily complex semantic graphs (within the space allowed by the restriction that all two-place predicates share one common first argument) among arbitrarily precisely described entities.
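Extending the same kind of sketch to the N and R layers shows both kinds of sharing at once: every noun in a phrase describes the same variable, and every role in a phrase links the same pair of variables, so putting one argument in both an agent and a patient role yields a reflexive. As before, the function name and predicate names are hypothetical, each argument phrase is given as a (nouns, roles) pair, and quantification is left implicit.

```python
def clause_to_formula(head_nouns: list[str],
                      args: list[tuple[list[str], list[str]]]) -> str:
    """head_nouns describe the shared first argument x0; each (nouns, roles)
    entry introduces a fresh variable xi described by all of its nouns and
    linked to x0 by all of its roles."""
    atoms = [f"{n}(x0)" for n in head_nouns]
    for i, (nouns, roles) in enumerate(args, start=1):
        var = f"x{i}"
        atoms += [f"{r}(x0, {var})" for r in roles]   # R -> r R | 0
        atoms += [f"{n}({var})" for n in nouns]       # N -> n N | 0
    return " & ".join(atoms)


# "the black cat washes itself": one argument fills two roles
print(clause_to_formula(["wash"], [(["black", "cat"], ["agent", "patient"])]))
# wash(x0) & agent(x0, x1) & patient(x0, x1) & black(x1) & cat(x1)
```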

The next significant addition to the syntax is to allow the use of explicit quantifiers ("all", "most", "some", etc.). That is done as follows:

S  → QP e A
A  → QP R A | 0
QP → Q N
N  → n N | 0
R  → r R | 0

It may at first seem as though we could have avoided adding an extra rule by simply changing the production rule for S to S → Q N e A, and for A to A → Q N R A | 0; it will become important later, however, that Q and N are bound together in a Quantifier Phrase, and that Quantifier Phrases are in fact internal to Arguments.
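A recursive-descent parser is a natural way to check this grammar, since every rule can be chosen by looking at the category of the next word. The sketch below is hypothetical tooling, not part of WSL's documentation: it assumes words arrive already tagged with their category as (category, form) pairs, writes phonologically null words with the form "0", and drops null phrases from the output, as in the bracketed parses later in this post. Because it follows N → n N literally, a phrase with two nouns comes out right-nested.

```python
class ParseError(Exception):
    """Raised when the tagged words do not fit the grammar."""


LEXICAL = {"Q", "n", "r", "e"}  # categories that label single words


def node(label, *children):
    """Build a phrase node, dropping null (None) daughters."""
    return (label, *[c for c in children if c is not None])


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens  # list of (category, form) pairs
        self.pos = 0

    def peek(self):
        """Category of the next word, or None at the end of input."""
        return self.tokens[self.pos][0] if self.pos < len(self.tokens) else None

    def take(self, category):
        if self.peek() != category:
            raise ParseError(f"expected {category!r} at word {self.pos}")
        self.pos += 1
        return self.tokens[self.pos - 1]

    def parse_S(self):    # S -> QP e A
        tree = node("S", self.parse_QP(), self.take("e"), self.parse_A())
        if self.pos != len(self.tokens):
            raise ParseError(f"unexpected word at {self.pos}")
        return tree

    def parse_A(self):    # A -> QP R A | 0
        if self.peek() != "Q":
            return None
        return node("A", self.parse_QP(), self.parse_R(), self.parse_A())

    def parse_QP(self):   # QP -> Q N
        return node("QP", self.take("Q"), self.parse_N())

    def parse_N(self):    # N -> n N | 0
        if self.peek() != "n":
            return None
        return node("N", self.take("n"), self.parse_N())

    def parse_R(self):    # R -> r R | 0
        if self.peek() != "r":
            return None
        return node("R", self.take("r"), self.parse_R())


def render(tree):
    """Print a tree in the bracketed notation used in this post."""
    label, *rest = tree
    if label in LEXICAL:  # a word: (category, form)
        return f"({label} 0)" if rest[0] == "0" else f'({label} "{rest[0]}")'
    return f"({label} " + " ".join(render(child) for child in rest) + ")"


tagged = [("Q", "ves"), ("n", "0"), ("e", "=u"),
          ("Q", "0"), ("n", "jes"), ("r", "i"),
          ("Q", "0"), ("n", "ajs"), ("n", "tey"), ("r", "mot")]
print(render(Parser(tagged).parse_S()))
```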

We now have enough of WSL syntax built up to describe some basic, but interesting, declarative sentences. With that foundation laid, we will introduce the interpretation rules that give meaning to the syntax.
(In actuality, all grammatical WSL sentences require an additional part of speech known as a Projector, which distinguishes, for example, declarative sentences from questions. The semantics of projectors are, however, rather complex, so we will ignore them for now and work strictly with declarative sentences with no explicit projector.)

In the notation for interpretation, [|x|] is used to indicate the denotation of the syntax x; in cases where some particular type of syntactic node may have multiple options for the kind and arrangement of daughter nodes that it contains, [|x : y...|] is used to indicate the denotation of some syntactic node x consisting of daughters y....
G[s] is used to indicate looking up the meaning of the symbol s in the lexicon, and G[x:s] is used to disambiguate homonymous symbols belonging to different syntactic categories given by x, in the case where their denotations are different.

The denotation of any null syntactic node will be assumed to be empty; there is, however, still the possibility of phonologically null lexical items, which have contentful denotations and occupy non-null syntactic nodes. This is the primary use case for the G[x:s] notation: to distinguish the different kinds of null lexemes.
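One simple way to picture the G[x:s] notation is a lexicon keyed by (category, form) pairs, so that the null noun and the null quantifier can share the form "0" without colliding. The sketch below is illustrative only: the entries are the denotations used in the worked example later in this post, written as strings, and the G function and its behavior for plain G[s] lookups are hypothetical.

```python
# Hypothetical lexicon keyed by (category, form); "0" is a null form.
# Denotations are written as strings purely for illustration.
LEXICON = {
    ("n", "0"): "λy. U(y)",                  # null noun: universal predicate
    ("Q", "0"): "λz.λw. ∃y. z(y) → w(y)",    # null quantifier: existential
    ("Q", "ves"): "λz.λw. ∀y. z(y) → w(y)",
    ("n", "jes"): "λy. 1sg(y)",
}


def G(form, category=None):
    """G[x:s] when a category is given; plain G[s] only if s is unambiguous."""
    if category is not None:
        return LEXICON[(category, form)]
    matches = [den for (cat, f), den in LEXICON.items() if f == form]
    if len(matches) != 1:
        raise KeyError(f"{form!r} is unknown or homonymous; use G[x:s]")
    return matches[0]


print(G("jes"))       # λy. 1sg(y)
print(G("0", "Q"))    # λz.λw. ∃y. z(y) → w(y)
```

A bare G("0") raises an error here, which is exactly the ambiguity the category annotation exists to resolve.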

The interpretation rules for this subset of WSL are as follows:

[|S|]  = [|QP|]([|A|])
[|A|]  = λx.[|QP|](λy. [|R|](x)(y) & [|A|](x))
[|QP|] = λz.G[Q]([|N|])(z)
[|N|]  = λy.G[n](y) & [|N|](y)
[|R|]  = λx.(λy. G[r](x,y) & [|R|](x)(y))
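The following sketch (hypothetical code, not part of any WSL tooling) applies the five rules above mechanically, building the resulting formula as a string. It simplifies trees to nested tuples (a quantifier phrase is a (quantifier, nouns) pair), generates fresh variable names v1, v2, ..., takes its small lexicon from the glosses in the worked example later in this post, and anticipates the quantifier template G[Q:_] given below.

```python
from itertools import count

# Glosses from the worked example in this post; the table format and
# everything else here is a hypothetical sketch.
QUANT = {"ves": "∀", "0": "∃"}
NOUN = {"0": "U", "jes": "1sg", "ajs": "this", "tey": "place"}
ROLE = {"i": "ag", "mot": "near"}


def conj(*parts):
    """Conjoin non-empty formulas; a null node contributes nothing."""
    return " & ".join(p for p in parts if p)


def interpret(sentence):
    """sentence = (qp, a); qp = (q, nouns); a = (qp, roles, a) or None."""
    fresh = (f"v{i}" for i in count(1))   # unused variable names

    def G_Q(q):        # G[Q:_] = λz.λw. _y. z(y) → w(y)
        def take_restrictor(restrictor):
            def take_scope(scope):
                y = next(fresh)
                return f"{QUANT[q]}{y}. ({restrictor(y)} → {scope(y)})"
            return take_scope
        return take_restrictor

    def den_N(nouns):  # [|N|] = λy. G[n](y) & [|N|](y)
        return lambda y: conj(*(f"{NOUN[n]}({y})" for n in nouns))

    def den_R(roles):  # [|R|] = λx.λy. G[r](x,y) & [|R|](x)(y)
        return lambda x: lambda y: conj(*(f"{ROLE[r]}({x}, {y})" for r in roles))

    def den_QP(qp):    # [|QP|] = λz. G[Q]([|N|])(z)
        q, nouns = qp
        return lambda z: G_Q(q)(den_N(nouns))(z)

    def den_A(a):      # [|A|] = λx. [|QP|](λy. [|R|](x)(y) & [|A|](x))
        if a is None:
            return lambda x: ""
        qp, roles, rest = a
        return lambda x: den_QP(qp)(
            lambda y: conj(den_R(roles)(x)(y), den_A(rest)(x)))

    qp, a = sentence   # [|S|] = [|QP|]([|A|])
    return den_QP(qp)(den_A(a))


# "vesu jes i ajs tey mot" (projector omitted, as in the text)
example = (("ves", ["0"]),
           (("0", ["jes"]), ["i"],
            (("0", ["ajs", "tey"]), ["mot"], None)))
print(interpret(example))
# ∀v1. (U(v1) → ∃v2. (1sg(v2) → ag(v1, v2) & ∃v3. (this(v3) & place(v3) → near(v1, v3))))
```

Up to variable names (v1, v2, v3 for z, b, d), the order of the conjuncts for "ajs tey", and the U(v1) antecedent that the text later drops, this matches the formula reached at the end of the worked example.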

And the forms of the denotations for lexical items (or phrases) in the classes of Q, n, and r are:

G[Q:_] = λz.λw. _y. z(y) → w(y); i.e., some quantifier (represented by the placeholder _) binds a variable y and provides that variable to both of its arguments, where the first argument (z) is the denotation of a noun phrase whose truth implies the truth of the second argument (w).

G[n:_] is always some monovalent predicate.
G[r:_] is always some bivalent predicate.
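Over a finite model, these templates can be read directly as higher-order functions. In the sketch below the domain and the sample predicates are invented for illustration; the quantifier template is followed literally, with Python's built-in all and any standing in for the placeholder _.

```python
from typing import Callable, Iterable

Entity = str
Predicate = Callable[[Entity], bool]

DOMAIN: list[Entity] = ["a", "b", "c"]   # hypothetical toy domain


def quantifier(test: Callable[[Iterable[bool]], bool]):
    """G[Q:_] = λz.λw. _y. z(y) → w(y), with `test` playing the role of _."""
    def take_restrictor(restrictor: Predicate):
        def take_scope(scope: Predicate) -> bool:
            # p → q is evaluated as (not p) or q
            return test((not restrictor(y)) or scope(y) for y in DOMAIN)
        return take_scope
    return take_restrictor


every = quantifier(all)   # a universal quantifier such as "ves"
some = quantifier(any)    # the null (existential) quantifier


def cat(y: Entity) -> bool:       # hypothetical one-place predicates
    return y in {"a", "b"}


def asleep(y: Entity) -> bool:
    return y == "a"


print(every(cat)(asleep))   # False: b is a cat but not asleep
print(some(cat)(asleep))    # True
```

Because the body is an implication, some(cat)(lambda y: False) is also true in this model, satisfied by the non-cat c; this is the implicational reading mentioned in footnote [1], as opposed to the conjunctive gloss.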

Note that these definitions contain no free variables. All variables are bound by either a quantifier or a lambda expression. This allows us to freely rename variables for clarity and to ensure that we can perform valid beta reductions of lambda expressions in any order.

Temporarily ignoring the syntax and semantics of projectors, we can now fully interpret many simple sentences like

"Ka vesu jes i ajs tey mot."

The parse of this sentence (again dispensing with the projector) is

(S (QP (Q "ves") (N (n 0))) (e "=u")
   (A (QP (Q 0) (N (n "jes")))
      (R (r "i"))
      (A (QP (Q 0) (N (N (n "ajs")) (n "tey")))
         (R (r "mot")))))
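Hand-written bracketings like this are easy to get wrong, so a tiny reader can help. The sketch below (hypothetical helper code, not part of the formalism) rejects unbalanced brackets and recovers the words a tree spells out, null forms included, which should match the sentence being parsed.

```python
import re

# Parentheses, quoted forms, or bare symbols such as S, QP, and the null form 0
TOKEN = re.compile(r'\(|\)|"[^"]*"|[^\s()"]+')


def read_tree(text: str):
    """Parse bracketed notation into nested lists, rejecting unbalanced input."""
    tokens = TOKEN.findall(text)
    if not tokens:
        raise ValueError("empty input")
    pos = 0

    def read():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok == ")":
            raise ValueError("unexpected ')'")
        if tok != "(":
            return tok.strip('"')      # a label or a word form
        items = []
        while pos < len(tokens) and tokens[pos] != ")":
            items.append(read())
        if pos == len(tokens):
            raise ValueError("unclosed '('")
        pos += 1                       # consume the matching ')'
        return items

    tree = read()
    if pos != len(tokens):
        raise ValueError("text continues after the first complete tree")
    return tree


def words(tree):
    """Left-to-right word forms, null forms (0) included."""
    label, *kids = tree
    if len(kids) == 1 and isinstance(kids[0], str):
        return kids                    # a word node such as (n "jes")
    return [w for kid in kids for w in words(kid)]


TREE = '''
(S (QP (Q "ves") (N (n 0))) (e "=u")
   (A (QP (Q 0) (N (n "jes")))
      (R (r "i"))
      (A (QP (Q 0) (N (N (n "ajs")) (n "tey")))
         (R (r "mot")))))
'''
print(" ".join(words(read_tree(TREE))))   # ves 0 =u 0 jes i 0 ajs tey mot
```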

And the first few steps of the semantic derivation are:

[|(S (QP (Q "ves") (N (n 0))) (e "=u")
     (A (QP (Q 0) (N (n "jes")))
        (R (r "i"))
        (A (QP (Q 0) (N (N (n "ajs")) (n "tey")))
           (R (r "mot")))))|]
= [|(QP (Q "ves") (N (n 0)))|]([|
    (A (QP (Q 0) (N (n "jes")))
       (R (r "i"))
       (A (QP (Q 0) (N (N (n "ajs")) (n "tey")))
          (R (r "mot"))))|])
= G["ves"]([|n:0|])([|
    (A (QP (Q 0) (N (n "jes")))
       (R (r "i"))
       (A (QP (Q 0) (N (N (n "ajs")) (n "tey")))
          (R (r "mot"))))|])
= ∀z. [|n:0|](z) →
      [|(A (QP (Q 0) (N (n "jes")))
           (R (r "i"))
           (A (QP (Q 0) (N (N (n "ajs")) (n "tey")))
              (R (r "mot"))))|](z)
= ∀z. U(z) →
      [|(A (QP (Q 0) (N (n "jes")))
           (R (r "i"))
           (A (QP (Q 0) (N (N (n "ajs")) (n "tey")))
              (R (r "mot"))))|](z)

Note here that the denotation of the null noun is the universal predicate U, which is true of every argument. We can therefore simplify the formula by removing "U(z) →" with no change in meaning.
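Spelled out, the step relies only on the fact that an implication with an always-true antecedent is equivalent to its consequent; here φ(z) abbreviates the rest of the formula:

```latex
\forall z.\; U(z) \rightarrow \varphi(z)
\;\equiv\; \forall z.\; \neg U(z) \lor \varphi(z)
\;\equiv\; \forall z.\; \bot \lor \varphi(z)
\;\equiv\; \forall z.\; \varphi(z)
```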
Skipping a few steps for brevity, we get to

= ∀z. G[Q:0]([|(N (n "jes"))|])(λy. [|(R (r "i"))|](z)(y)
  & [|(A (QP (Q 0) (N (N (n "ajs")) (n "tey"))) (R (r "mot")))|](z))

= ∀z. ∃b. [|(N (n "jes"))|](b) →
  (λy. [|(R (r "i"))|](z)(y)
     & [|(A (QP (Q 0) (N (N (n "ajs")) (n "tey")))
         (R (r "mot")))|](z))(b)

Note here that the null quantifier has the semantics of an existential. After a long string of additional reductions, we end up with

= ∀z. ∃b. G[n:"jes"](b) →
    ag(z, b) & ∃d. (λy. G[n:"tey"](y) & G[n:"ajs"](y))(d) → G[r:"mot"](z, d)

= ∀z. ∃b. 1sg(b) →
    ag(z, b) & ∃d. place(d) & this(d) → near(z, d)

"For all entities z, there exists some entity b such that b is me and[1] that b is the agent of z and that there exists some entity d such that d is a place and d is 'this-ish' (i.e., nearby and capable of being pointed at), which implies that z occurs near d."

Or, in normal English: "I do everything around here."

Note, however, that this is an extremely hyperbolic version of "I do everything around here." Literally, it means that nothing exists which is not both of my doing and near some place that is itself near me.
Expressing the more typical meaning of the English sentence, something like "for all z such that z is near here, I do z", requires additional syntactic machinery to insert the necessary qualifications into the body of the quantifier phrase, which I may or may not ever get around to formalizing.
Also note that this model contains no rules for quantifier raising; thus, other possible readings of the English version, like "there is some specific place ('here') near which everything is of my doing", also cannot be expressed. It turns out that WSL does not have quantifier raising at all (it is not merely excluded from the subset described so far): the scope of quantifiers in the semantics is given exactly by the order of quantifier phrases in the surface syntax. Expressing different quantifier scopes thus requires some mechanism that allows the clause specifier (the first QP, which is not part of an argument phrase and is marked by the e symbol, realized on the surface as the clitic <=u>) to move to non-initial positions.
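To see that the two scopings really differ, it helps to check them against a small model. The sketch below is illustrative only: the model (two events, each near a different place) is invented, and the place and 'this' restrictors are dropped so that only the quantifier order varies.

```python
# Hypothetical toy model: two events, each near a different place.
EVENTS = ["z1", "z2"]
PLACES = ["p1", "p2"]
NEAR = {("z1", "p1"), ("z2", "p2")}


def near(z, d):
    return (z, d) in NEAR


# Universal outscopes existential (the order WSL's surface syntax gives):
# every event is near some place or other.
forall_exists = all(any(near(z, d) for d in PLACES) for z in EVENTS)

# Existential outscopes universal (the "specific place" reading):
# a single place is near every event.
exists_forall = any(all(near(z, d) for z in EVENTS) for d in PLACES)

print(forall_exists, exists_forall)  # True False
```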

Formalizing the semantics for a larger subset of WSL that allows arbitrary specifier placement to control quantifier scope is rather complicated (as it requires splitting the interpretation of argument phrases in half), so I shall leave that for a later installment.

[1] Alternatively: "there exists some entity b such that b being me implies...."