Monday, April 11, 2022

Some Thoughts on Zimvisz

Zimvisz is a constructed language by Sheldon Ebbeler. It was presented at the 7th Language Creation Conference, but the video of that presentation is... not great. Fortunately, I was able to get in touch with Sheldon and acquire a copy of the presentation slides with speaker's notes, which contain a decent amount of information about the language.

The central conceit of Zimvisz is that all utterances are encoded in integers--with the grammatical constituents of an utterance being encoded as factors of the complete utterance!

The idea of encoding words as numbers is not entirely new; Gottfried Leibniz (co-inventor of calculus and Isaac Newton's rival) even considered constructing a philosophical language that would allow statements and concepts to be manipulated algebraically. And, of course, every word on this screen is encoded as a binary number in computer memory! Jörg Rhiemeier has also coined the term "arithmographic language" for a theoretical language in which semantic primes are encoded as prime numbers and semantic composition is represented by multiplication, so that complex concepts get composite numbers. Naively implemented, this would be an inefficient use of the integers, since only square-free numbers (those not divisible by any prime more than once) would be assigned unique meanings; after all, why would you ever need to repeat a semantic prime in a compound? Zimvisz extends this idea to complete sentences, and in so doing uses one problem to solve another!
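To make the arithmographic idea concrete, here is a minimal Python sketch. It assumes a toy lexicon in which four invented semantic primes are mapped to the primes 2, 3, 5 and 7; these assignments are hypothetical and do not come from Rhiemeier or from Zimvisz. Composition is plain multiplication, decoding is factorization, and the square-free check shows why a number like 12 = 2 × 2 × 3 would be left without a meaning in the naive scheme.

```python
# Toy arithmographic lexicon. The prime-to-meaning assignments are invented
# for illustration and do not come from any real proposal.
SEMANTIC_PRIMES = {2: "SOMEONE", 3: "BIG", 5: "WATER", 7: "MOVE"}


def factorize(n: int) -> dict[int, int]:
    """Return {prime: exponent} for n >= 1, using plain trial division."""
    factors: dict[int, int] = {}
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors[d] = factors.get(d, 0) + 1
            n //= d
        d += 1
    if n > 1:  # whatever is left over is itself prime
        factors[n] = factors.get(n, 0) + 1
    return factors


def compose(*primes: int) -> int:
    """Semantic composition is just multiplication of the component primes."""
    result = 1
    for p in primes:
        result *= p
    return result


def is_square_free(n: int) -> bool:
    """True if no prime divides n more than once."""
    return all(e == 1 for e in factorize(n).values())


def decompose(n: int) -> list[str]:
    """Recover the semantic primes of a concept number, smallest prime first.

    Raises KeyError if n contains a prime that is not in the toy lexicon.
    """
    return [SEMANTIC_PRIMES[p] for p in sorted(factorize(n))]


big_water = compose(3, 5)       # 15
print(decompose(big_water))     # ['BIG', 'WATER']
print(is_square_free(12))       # False: 12 = 2 * 2 * 3
```

Trial division is used only because it is short and obviously correct; it is not meant as a claim about how such a language would actually be processed.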

By multiplying nouns and verbs (or rather, arguments and predicates, as Zimvisz does not distinguish nouns and verbs lexically) to produce single numbers representing entire clauses, Zimvisz runs into the problem of how to encode differing semantic relations. There is no syntax--multiplication is commutative, so it doesn't preserve ordering--and "put the subject first and the object last", for example, doesn't mean anything. And it's worse than that: there's no morphology either, as any number that might be assigned to an affix or a function word also gets mixed in with all the rest, with no way to associate it with a particular other factor of the final clausal number. Sheldon solved this problem by giving a function to the non-square-free numbers: the exponent of each prime factor identifies its syntactic function! It is as if a "normal" linear language used various degrees of repetition, and only repetition, to mark syntactic relations--and with no contiguity required for the repeated elements of any constituent!
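Here is a minimal Python sketch of the exponent-as-role idea. The role-to-exponent table (predicate 1, agent 2, patient 3) and the three-word lexicon are hypothetical stand-ins, not Zimvisz's actual assignments. The point is only that the clause number is the same no matter what order the constituents are multiplied in, yet the roles can still be read back off the exponents once the number is factored.

```python
# Hypothetical role-marking exponents; Zimvisz's real assignments are not reproduced here.
ROLE_EXPONENT = {"predicate": 1, "agent": 2, "patient": 3}
EXPONENT_ROLE = {e: role for role, e in ROLE_EXPONENT.items()}

# Hypothetical lexicon: every lexeme is a prime.
LEXICON = {2: "see", 3: "dog", 5: "cat"}


def factorize(n: int) -> dict[int, int]:
    """Return {prime: exponent} for n >= 1, using plain trial division."""
    factors: dict[int, int] = {}
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors[d] = factors.get(d, 0) + 1
            n //= d
        d += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors


def encode_clause(roles: dict[str, int]) -> int:
    """Raise each lexeme's prime to the exponent of its role and multiply."""
    primes = list(roles.values())
    if len(set(primes)) != len(primes):
        # In this toy scheme the exponents of a repeated lexeme would simply
        # add together, and its roles could no longer be told apart.
        raise ValueError("toy scheme cannot use one lexeme in two roles")
    n = 1
    for role, prime in roles.items():
        n *= prime ** ROLE_EXPONENT[role]
    return n


def decode_clause(n: int) -> dict[str, str]:
    """Factor the clause number and read each factor's role off its exponent."""
    return {EXPONENT_ROLE[e]: LEXICON[p] for p, e in factorize(n).items()}


dog_sees_cat = encode_clause({"predicate": 2, "agent": 3, "patient": 5})
cat_sees_dog = encode_clause({"patient": 3, "agent": 5, "predicate": 2})
print(dog_sees_cat, decode_clause(dog_sees_cat))
print(cat_sees_dog, decode_clause(cat_sees_dog))
```

Swapping which argument carries which exponent changes the number (2 × 3² × 5³ = 2250 versus 2 × 3³ × 5² = 1350), while listing the same role assignments in a different order does not.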

This is an ingenious mechanism, but I think an avenue for optimization has been missed. While Zimvisz does not lexically distinguish nouns, verbs, adjectives, and adverbs, it does retain four distinct syntactic positions: nominal head, verb phrase head, nominal modifier, and verb phrase modifier. Yet if we look at a so-called non-configurational language like Warlpiri, we can see that syntactic headedness, and the head-modifier distinction, are not actually semantically necessary. A Zimvisz-like language could thus cut the number of distinct exponents needed for encoding syntactic relations nearly in half, reducing the total repetition of the various constituent factors and considerably reducing the magnitude of many clause integers.
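A quick way to see the size of the savings is to build the same four-word clause under two hypothetical schemes. The lexeme primes (11, 13, 17, 19) and both exponent tables are invented for this sketch. Scheme A gives verb phrase head, nominal head, verb phrase modifier and nominal modifier four distinct exponents. Scheme B lets each modifier share the exponent of the constituent it belongs to, loosely in the way a Warlpiri nominal modifier carries the same case suffix as the noun it goes with.

```python
from math import prod

# Hypothetical lexeme primes, invented for illustration.
RUN, QUICKLY, DOG, BIG = 11, 13, 17, 19


def clause_number(assignments: list[tuple[int, int]]) -> int:
    """Multiply lexeme ** exponent over every (lexeme, exponent) pair."""
    return prod(p ** e for p, e in assignments)


# Scheme A (hypothetical): four distinct exponents for verbal head, nominal
# head, verbal modifier and nominal modifier.
scheme_a = clause_number([(RUN, 1), (DOG, 2), (QUICKLY, 3), (BIG, 4)])

# Scheme B (hypothetical): a modifier reuses the exponent of the constituent
# it belongs to, so only two exponents are needed.
scheme_b = clause_number([(RUN, 1), (QUICKLY, 1), (DOG, 2), (BIG, 2)])

print(scheme_a, len(str(scheme_a)))
print(scheme_b, len(str(scheme_b)))
```

Under these made-up assignments scheme A yields a 12-digit number and scheme B an 8-digit one; the exact gap depends entirely on which primes and exponents are chosen.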

Now, while this is a brilliantly executed idea, I think it is worth asking the question "how practical is it, really?" Obviously Zimvisz could not be fluently used by humans! And indeed, it is supposed to be used by 4-dimensional aliens called Zimfidz, who can be assumed to have different mental abilities than humans.

A key point, however, is that extracting the semantic content of a Zimvisz utterance requires factoring numbers that can have a very large number of digits (a problem exacerbated by the logically superfluous proliferation of syntactic categories noted above). That is a famously hard problem, so much so that it forms the basis of the RSA cryptosystem. Quantum computers running Shor's algorithm can theoretically factor large numbers "efficiently"--but "efficiently" in this case just means in polynomial time (a little over quadratic in the number of digits, given fast multiplication) rather than the super-polynomial time of the best known classical algorithms. Thus, a sentence with twice as many digits (corresponding very roughly to twice as much semantic content) will take a little over four times longer to comprehend, even if the Zimfidz have quantum-logic brains.

Incidentally, parsing linear speech with a general context-free grammar takes cubic time in the worst case. But human languages tend to use not-the-most-complex-possible grammars, and we focus on only the most probable potential structures, throwing out unlikely hypotheses very aggressively as we hear more and more of a sentence. As a result, the vast majority of sentences produced by humans can be comprehended in linear time; i.e., a sentence only takes longer to understand when it also takes longer to say, despite the theoretical cubic bound. (The rare exceptions to this tendency are garden-path sentences.)

So, is there some way that Zimfidz could structure their utterances to make factoring especially easy along high-probability paths? Eh, maybe? But I kinda doubt it. Not every sentence is going to have a conveniently small prime factor which can be rapidly extracted and whose semantics can be used to predict other probable factors, the way that the first word of any human sentence is immediately comprehensible and can be used to predict possibilities for what comes next. And without that kind of predictive shortcutting, Zimvisz seems more like a particularly clever code than a real functioning language suitable for conversation. Nevertheless, if it showed up in a sci-fi story, I'd give it the benefit of the doubt!
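The "conveniently small prime factor" case is easy to illustrate. Below is a minimal Python sketch that uses plain trial division (not Shor's algorithm, and not anything the Zimfidz are claimed to do) to strip every prime factor up to some bound and hand back whatever cofactor is left. The example number is invented: a few tiny factors times the Mersenne prime 2^61 - 1. The tiny factors fall out almost instantly, but the leftover cofactor is exactly the hard part, and trial division says nothing about its structure.

```python
def peel_small_factors(n: int, bound: int) -> tuple[dict[int, int], int]:
    """Strip every prime factor <= bound from n by trial division.

    Returns ({small prime: exponent}, cofactor). The cofactor has no prime
    factors <= bound; it may be 1, a single large prime, or a product of
    large primes, and this function does not try to find out which.
    """
    small: dict[int, int] = {}
    for d in range(2, bound + 1):
        # A composite d never divides here: its prime factors were removed first.
        while n % d == 0:
            small[d] = small.get(d, 0) + 1
            n //= d
    return small, n


LARGE_PRIME = 2**61 - 1  # a known Mersenne prime, standing in for a "big word"
sentence = 2 * 3**2 * 5 * LARGE_PRIME

small, rest = peel_small_factors(sentence, bound=1000)
print(small)                # {2: 1, 3: 2, 5: 1}
print(rest == LARGE_PRIME)  # True
```

The cost of the peeling step grows with the bound, not with the size of the sentence number, which is why a small factor is cheap to pull out while a large cofactor stays expensive.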

As a side note, one might reasonably wonder if the difficulty of factorization is a problem for any arithmographic language--but no, it is not necessarily so. Factorization is only necessary in this case because Zimvisz uses multiplication for productive syntactic purposes. If multiplication of primes representing lexemes is only used for compounding or morphological derivation, to produce new lexemes, the meanings of compound words can simply be memorized like any other word, and real-time factorization is unnecessary.
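The contrast can be sketched in a few lines of Python. In this hypothetical lexicon (all entries invented), a compound's number is still the product of its roots, but the compound's meaning is stored as a unit, so understanding it is a dictionary lookup rather than a factorization. Multiplication happens once, when a word is coined; comprehension never has to undo it.

```python
# Hypothetical root lexicon: each root is a prime.
ROOTS = {"water": 2, "fire": 3, "sky": 5}

# Memorized vocabulary, keyed by word number. Compounds are stored as whole
# units, so their meanings need not be predictable from their roots.
VOCABULARY: dict[int, str] = {p: root for root, p in ROOTS.items()}


def coin_compound(meaning: str, *roots: str) -> int:
    """Create a new lexeme by multiplying root primes, then memorize it."""
    number = 1
    for root in roots:
        number *= ROOTS[root]
    if number in VOCABULARY:
        raise ValueError(f"{number} already means {VOCABULARY[number]!r}")
    VOCABULARY[number] = meaning
    return number


def understand(word_number: int) -> str:
    """Comprehension is a lookup; no factorization happens here."""
    return VOCABULARY[word_number]


steam = coin_compound("steam", "water", "fire")  # 2 * 3 = 6
print(understand(steam))                          # steam
```

One side effect of multiplication being commutative: "water-fire" and "fire-water" would be the same number, so this sketch refuses to coin the second one.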

Next, let us consider the writing system, which consists of linked knots. There are 29 basic knot "letters", corresponding to the first 29 primes, which can be linked together with "operator" knots to form any arbitrary prime, and then further linked to form the composite numbers of a Zimvisz clause. This is a fully non-linear writing system, corresponding to the non-linearity of the "spoken" language--but it has a major advantage over the "spoken" language in that the factoring is already done for you: composite-number sentences are represented not as opaque quantities, but as actual agglomerations of their individual factors, which can be individually viewed and counted.

This is where the 4D nature of the Zimfidz becomes really relevant. While Zimvisz writing looks like a mess to our eyes, the whole agglomeration is immediately visible, with no occlusions, to 4D eyes with 3D retinas. Furthermore, they are able to write by forming rings into knots without ever having to cut or join the strands, thanks to the extra spatial dimension in which to move strands around each other.

Sadly, the Zimvisz writing system does not use Conway notation; I can't call that a problem, but having seen one knot-and-number-based written language, I do think it would be neat to see one that made use of Conway notation in some way. My only reservation about the Zimvisz writing system is that it does not impose any standard representations of the basic knots, or a standard viewing orientation: all topologically equivalent links are semantically equivalent. That makes a certain amount of sense, but it requires that readers be capable of solving the knot recognition problem, whose exact computational complexity is still unknown. But perhaps that is less of an issue for creatures with 3D retinas; again, if it showed up in a sci-fi novel, I would give it the benefit of the doubt.
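Why the written form sidesteps factoring can be shown with a deliberately crude Python model that ignores knot topology altogether. It assumes, hypothetically, that a written clause can be read off as a flat list of the prime factors it visibly contains, with repeated copies of the same factor present as separate knots. The operator knots that build primes larger than the 29 basic letters are not modeled. Under that assumption, recovering the exponents is just counting, and recovering the spoken number is just multiplying.

```python
from collections import Counter
from math import prod


def first_primes(count: int) -> list[int]:
    """Return the first `count` primes, testing each candidate against earlier primes."""
    primes: list[int] = []
    candidate = 2
    while len(primes) < count:
        if all(candidate % p for p in primes if p * p <= candidate):
            primes.append(candidate)
        candidate += 1
    return primes


# One basic knot "letter" per prime, following the description of the script.
BASIC_LETTERS = first_primes(29)


def read_written_clause(knots: list[int]) -> tuple[int, dict[int, int]]:
    """Read a written clause, modeled (hypothetically) as the list of prime
    factors it visibly contains, one entry per knot.

    Counting identical knots gives each factor's exponent directly, and
    multiplying them gives the clause's number; nothing is ever factored.
    """
    exponents = dict(Counter(knots))
    return prod(knots), exponents


number, exponents = read_written_clause([2, 3, 3, 5, 5, 5])
print(BASIC_LETTERS[-1])   # 109, the 29th prime
print(number, exponents)   # 2250 {2: 1, 3: 2, 5: 3}
```

All of the real difficulty in this model has been moved into "recognize which knot this is", which is exactly where the knot recognition concern applies.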
