Saturday, February 4, 2023

Some Thoughts On Glossing

A Note: This article has been edited from its original version to take into account feedback from David himself.

David J. Peterson has on several occasions been outspoken against Interlinear Glossing, particularly in the context of developing and documenting conlangs. I find that very strange, as I find glossing absolutely indispensable in language documentation, so after our last social media exchange on this topic, I decided to do some Deep Thinking.

And this is the part where I feel a lot like Brandon Sanderson expressing his dissatisfaction with Audible. I don't dislike DJP! I still have signed copies of his books on conlanging for kids and adults, and I am quite happy to provide those Amazon Affiliate links so other people can give him money for them! But on this one point at least, I think he is wrong, and I want to understand why, and what glossing is really good for.

To that end, I asked a bunch of people on social media about their opinions on glossing. You can see the raw responses on Facebook (group 1, group 2), Reddit, Quora, and Twitter. It turns out to be very difficult to get the question across effectively given the differing restrictions on message length and type on different platforms, but I still got some pretty useful data.

There are, I think, two major factors at play:

First, David does not believe in morphemes--or at least, does not find the concept of morphemes useful in language design. And that's fine! David is far from the only person to point out that morphemes aren't necessarily a great concept even in formal linguistics, or to propose alternative models of morphology.

Second, David works for an unusual audience. As essentially the world's only full-time professional conlanger for movies and TV, the primary audiences for his documentary output are:
  1. Actors, who have to be able to pronounce translated lines, but not necessarily understand what they mean.
  2. Set design artists, who need access to font files and translated text, and need to know what it looks like and how much space it will take up... but again, not what it actually means.
Now, I thought that these would be situations where, admittedly, full-on Leipzig-style interlinear glossing won't be particularly useful, and may in fact be an inconvenient distraction. I am, in fact, fully ready to admit that there are many situations in which interlinear glossing is not useful. However, while (as I would expect) they are not fully detailed Leipzig-style morphological glosses, David does provide phonetic and word-level interlinear glosses for actor lines, as you can see published on his website, and has said that this is "one of the only places I think it's useful." (Incidentally, William Annis also provides actors with interlinears so they don't stress the wrong words--but again, not with full morphological analyses. I don't know how, e.g., Marc Okrand, Paul Frommer, or Christine Schreyer work, but I would not be at all surprised to find that it's very similar.)

But I suspect that David's unusual success in this particular context has led him to overgeneralize. Now, to be clear, I don't know that David has any problem with glossing in an academic linguistic context; to quote him: "When you're glossing to analyze, that's analysis. It has its place, and its place is in analysis, not creation." Thus, some of this might seem like attacking a strawman--but as far as I am concerned, conlang documentation is language documentation, and what's good for natlangs is good for conlangs, and vice-versa. In fact, I don't even particularly disagree with the statement that glossing is for analysis--but I do strongly disagree with the idea that analysis and creation must be considered distinct, and with the implication that analysis should be excluded from presentation, which is where conlang creation and natural language documentation collide.

In this reddit post (a response to an earlier revision of this article), David breaks down the problem as follows:
The problem I see is twofold:
  1. Biased morphological analyses (both betraying the framework being used, and how the language itself is being used).
  2. Taking something readable, like language data, and exploding it, so it's a mess.
These are not bad points. However, they should not be applied overly broadly. Thus, when David goes on to say that "In short, morphological glossing is for analysis, not for presentation--or for comprehension.", I must vehemently disagree--that is throwing the baby out with the bathwater. And while this looks initially like a very different breakdown of the issue than what I came up with, they actually line up pretty well--so let's take a look at my two points and David's two points together.

Regarding morphological analysis: It is true that any particular gloss at a level deeper than word-for-word will entail some kind of analysis, and an associated theoretical position. But does that mean you have to believe in morpheme theory to use glossing? No! Martin Haspelmath doesn't believe in morphemes (see links above), but he is (along with Bernard Comrie and Balthasar Bickel) nevertheless one of the editors for the latest edition of the Leipzig rules! Those rules include several options for notating non-concatenative and non-trivially-segmentable morphology. And while not all such proposals are included in the current edition of the standard rules, there have nevertheless been even more proposals for glossing symbols that explicitly avoid making theoretical claims (e.g., "+" as an alternative to "-" vs. "=", to indicate that some particular subforms are joined without making any claim about whether the joint is an instance of affixation, cliticization, or compounding). And nobody is forcing you to strictly stick to the limits of the Leipzig conventions. In fact, one respondent to my social media surveys was explicit about the idea that glosses "should be geared towards the readership, and the idea that glosses should always follow a standard format is wrong".

So, just because you don't accept a particular theoretical position is no reason to reject interlinear glossing altogether. You can design your style of glossing to fit whatever theoretical or non-theoretical considerations you prefer, to best communicate with your target audience.
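To make the boundary symbols concrete, here is a minimal Python sketch that splits one word's gloss at "-", "=", and "+" and records which kind of join precedes each piece. The function name `split_gloss` and the descriptive labels are made up for illustration, and treating "+" as a third symbol follows the proposal mentioned above rather than the standard Leipzig rules.

```python
import re

# Boundary symbols discussed above. "-" and "=" are standard Leipzig symbols;
# "+" is the theory-neutral proposal and is NOT part of the standard rules.
# The descriptive labels are illustrative assumptions.
BOUNDARIES = {
    "-": "affix boundary",
    "=": "clitic boundary",
    "+": "unspecified join",
}

def split_gloss(word_gloss):
    """Split one word's gloss into (label, boundary_before) pairs."""
    # The capturing group keeps the separators in the result list,
    # so labels sit at even indexes and separators at odd indexes.
    parts = re.split(r"([-=+])", word_gloss)
    segments = [(parts[0], None)]
    for i in range(1, len(parts), 2):
        segments.append((parts[i + 1], BOUNDARIES[parts[i]]))
    return segments

print(split_gloss("see-PST"))
# [('see', None), ('PST', 'affix boundary')]
print(split_gloss("them+DAT"))
# [('them', None), ('DAT', 'unspecified join')]
```

Nothing here depends on believing in morphemes: the same data could be glossed with "+" throughout if you would rather not commit to what kind of join each one is.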

But suppose you do expose a theoretical bias in your glossing: is that actually a bad thing? Certainly, it can be bad, particularly if you aren't doing it on purpose, and I will address that in more detail below, but it does not need to be. In many cases, the analysis can be the whole point, and betraying the framework being used can be a positive addition to the presentation. Several of my own conlangs, after all (notably, the three "Languages of Spite") exist to demonstrate that a language with a particular theoretical grammatical structure is possible. Omitting the intended analysis from the presentation in such cases would make it nearly worthless. So what is David's solution? "On the other hand, if there's sufficient language data and accompanied by faithful, accurate translations, that's all you really need." Technically, this is true--it's what a working linguist would rely on for documenting a natural language in the first place, after all, and the same can certainly be done with conlangs--but do you really want to ask that of your audience? I don't; no matter how much raw language data is provided, among the people who might read and appreciate a conlang grammar, practically none of them will do the work to produce the analysis themselves. So, again, I think David's success in a particular context has led to overgeneralization--if you are producing a conlang with the intent that it be appreciated through the aesthetics of text produced in that language (where "text" here is meant in the technical sense, encompassing spoken dialog as well as written text), then a gloss may be unnecessary to your purposes. But if you intend the language to be appreciated for itself, as an exercise in constructing a system, then the analysis is the thing, and it must be part of the presentation, just as much as it must be included in any academic paper analyzing the structure of a natural language.

Furthermore, creation and analysis need not be separate stages. If you work that way with your own conlangs, cool, I won't tell you that you're wrong--but it's not the only way. In my own experience, analysis and creation feed back into each other; analyzing what I have already created generates new ideas for other features to create, which may interact in unexpected ways and lead to changes in direction or re-analysis that makes the original obsolete. When working this way, interlinears are a way to communicate with myself.

Now, David has also said that
In fact, most of the time when I'm looking at conlangs, I completely ignore the glosses, because they're often (a) incoherent, and (b) wrong.
And... yeah, I can't argue with that. But does that mean that we shouldn't do them at all? No! It simply means that we shouldn't do them badly! (At least, not in the final presentation; if your glosses-for-yourself produced in the creative process are crappy, but they work for you, then more power to you!) If your glosses are incoherent, then get better at glossing! If your glosses are wrong, I am glad that you included them, so that I could make that determination myself. It opens the door to a potentially productive conversation. It is not at all unusual for me to come across a datum in a natlang grammar or analytical paper and think "I'm pretty sure that analysis is wrong", so it should not be at all surprising that the same would happen with conlang grammars. But the gloss provides a means of understanding the author's analysis as well as formulating my own.

Now, let's move on to considering my and David's second points: what's appropriate for the audience, and when does a gloss enhance or detract from the presentation? Well, as I've already explained, when The Analysis Is The Thing, you should include a gloss! But beyond that, I myself thought that appropriate usages were more restricted than it turns out they actually are before I started this little bit of research; in particular, I thought "well, if you are writing a document intended to teach the language, then interlinear glossing is probably not terribly useful." After all, in my formal school studies of French and Russian, not once did I ever encounter a textbook that contained interlinear glosses--they just teach you the vocabulary and morphology ahead of time, or else expect you to memorize complete constructions and infer the rules later, and the main point of a Leipzig-style interlinear gloss is to make the text accessible to someone who doesn't have the necessary background in the language yet (or ever). And yet, if we look up Interlinear Glossing in Wikipedia, the very first phrase of the article is "In linguistics and pedagogy" (emphasis added). And furthermore, consider this excerpt from Nishnaabemwin Reference Grammar by J. Randolph Valentine:
Linguistic researchers may be disappointed to see that morpheme-level segmentations of examples are rarely provided. At a conference held in Thunder Bay, Ontario, in 1996, a steering committee of Nishnaabemwin speakers explicitly requested that such details not be included, as it was felt that they interfered with the flow of the presentation, and contributed to what is sometimes called the "intellectual mining" of aboriginal languages and cultures. To accommodate these concerns, I [provided] word-level annotations[...]. [I]t seems to me that there are many good reasons for working the annotations into the text. For one, many of my readers will be semi-speakers of Nishnaabemwin, who will benefit from the help annotations provide; secondly, Nishnaabemwin varies dialectically, and the word-level glosses will allow fluent readers to more readily accommodate dialect differences; lastly, of course, the annotations make the language more accessible to those lacking prior exposure.
I have complex feelings about this situation. On the one hand, if this is an academic reference grammar, then yeah, I would be surprised and dismayed at the lack of detailed glosses. But, on the other hand, if it is used largely as a study reference for learners of the language, then I agree with the steering committee that detailed glosses would indeed interfere with the presentation! But on the gripping hand... readers "will benefit from the help annotations provide". And that triggered a realization that I am shocked I did not have earlier, given that I spent 9 years working for a university language department developing software to improve adult language acquisition! What's the number-one most effective technological assistance you can give to a new language learner? Parallel translations, subtitles, and word-by-word glosses to ensure they are exposed to comprehensible input! Anything that will reduce the friction of discovering the meaning of words or phrases the reader is unsure about, keeping them engaged with the text in a flow state. At the end of my time in academia, we were even looking into automatic morphanalysis for augmented reader applications. So while a fully detailed Leipzig-style interlinear gloss can get distractingly complex, and thus unhelpful, some level of glossing--tailored to the audience, not slavishly holding to the formal Leipzig rules in maximal detail--is clearly appropriate to pedagogical settings!

To quote David again: "The glosses I typically see make the work less accessible, and the work would be improved by their removal." I suspect that the Nishnaabemwin steering committee would agree. And yet, "the annotations make the language more accessible". The issue is not, fundamentally, interlinear glossing. The issue is bad interlinear glossing! David is absolutely right that many people (conlangers and academic linguists alike) are not great at writing glosses clearly, and often have a tendency to include far more information than is necessary for the purpose of the given example. This is why you don't give a full morphological breakdown to an actor who just needs to pronounce the line with the right prosody! But let's not throw out the good with the bad; don't just stop glossing, learn to consider your audience, and learn to gloss well.
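As a rough illustration of tailoring one analysis to two audiences, the Python sketch below reduces a detailed gloss line to a word-level one by dropping grammatical category labels. The function name `to_word_level` is hypothetical, and the rule for spotting category labels (two or more characters, all capitals) is a simplifying assumption, not part of any glossing standard. The sample line is borrowed from a respondent's example quoted further down.

```python
import re

def to_word_level(gloss_line):
    """Reduce a detailed gloss line to a rough word-level gloss."""
    simplified = []
    for word in gloss_line.split():
        segments = re.split(r"[-=+]", word)
        # Heuristic (an assumption): a grammatical label is written in
        # capitals and has at least two characters, e.g. PST or DAT.
        # Anything else, including the pronoun "I", counts as lexical.
        lexical = [s for s in segments
                   if s and not (len(s) > 1 and s.isupper())]
        # If nothing lexical is left (e.g. a bare "3SG"), keep the word as-is.
        simplified.append("-".join(lexical) if lexical else word)
    return " ".join(simplified)

detailed = "I see-PST neighbors Rel we sell-PST them-DAT car"
print(to_word_level(detailed))
# I see neighbors Rel we sell them car
```

The detailed line is what a reader of a grammar needs; the reduced line is closer to what an actor or a beginning learner needs. A real tool would want an explicit list of abbreviations rather than a capitalization test.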

What about not-teaching-a-language settings? Well, that's where glossing really shines. If you are writing an article, or a descriptive grammar, whether documenting or analyzing a natural language or a conlang, pretty much everyone surveyed agrees that interlinear glosses are "crucial" or "absolutely indispensable"--and as I have argued above, there is at least a certain genre of conlang presentation in which The Analysis Is The Thing. But, that genre aside, on its own, this is just an opinion, and no more forceful than "glossing is useless". So, why are interlinears indispensable? Fundamentally, it's because your audience does not know the language (or at least, cannot be assumed to know the language--if you are writing a paper on English in English, you can probably skip the glosses most of the time; same goes for, e.g., writing a paper on Spanish in Spanish, etc., where obviously your audience will know the language under discussion) and will not be learning the language. As one respondent put it, the gloss is
the part that lets me know what the hell I'm even looking at. A string of Latin script doesn't even tell me what your language sounds like, let alone how it works. If you don't gloss, you could literally just as well post gibberish and no one could tell the difference.

Which, given the mystery surrounding the Voynich manuscript, does indeed seem to be true! And as I already stated above, even if you provide "sufficient" language material for the reader to come up with their own analyses... they just won't. Ain't nobody got time for that! To quote David again: "I expect if you give a grammar and unglossed data of a language you've created to someone else to gloss, you're going to get back glosses that surprise you." No doubt! And I would love to run that experiment! But it's not an argument against making your own analysis more accessible.

The more different the structures of the analyzed and analysis languages are, the more important interlinear glossing is to help elucidate the structure of the original and how it actually corresponds to the final translation. It gives you the details that the author knows and the reader needs.

If you are trying to support a particular theoretical analysis, then glosses are a succinct way to illustrate that analysis. As noted above, you don't have to subscribe to any particular theory to use interlinear glossing, but if you do, an appropriate choice of glossing conventions will let you show it! But whether or not you are committing to an analysis, interlinears are also helpful in allowing the reader to develop their own analyses. To quote another respondent:
An example sentence without a gloss conveys that an example or illustration exists
An example sentence with a gloss actually illustrates how things work and shows the reader information that might actually be part of it working.
(Emphasis added.) In other words, a gloss, which conveys author knowledge about the structure of the analyzed language, helps to prevent the reader from going astray and engaging in "analyzing the translation". It makes the source more accessible without having to analyze huge quantities of text on one's own, such that the reader can usefully formulate and propose their own hypotheses to explain the data, even if the original glosser is theoretically biased. This is particularly important when the point of an example is to illustrate a grammatical feature that may not be fully preserved in the language of analysis. For example, to quote another respondent:
If I were writing in Eng. about resumptive pronouns in X, and I gave a sentence with a translation "I saw the neighbors we sold the car to" without adding "I see-PST neighbors Rel we sell-PST them-DAT car", I'm not really presenting direct info about pronouns in X.
Information is almost always lost in translation, and appropriate glossing allows you to preserve it, and focus on what you want the example to show!

So, should you always gloss everything? No, there are definitely situations where it is not useful for the target audience or purpose of the text. Should you always gloss in maximal detail and with perfect conformance to the formal Leipzig rules? Also no--you should tailor your glosses to the intended audience and what you want to convey with a particular example. But should you never gloss? Also no! If you intend your documentation to actually be useful to other linguists, or appreciable by other conlangers, you should know how and when to gloss, and do so liberally!




1 comment:

  1. This is a really good write-up. I never knew there to be such controversy about glossing!
