Tag: Docuscope

Love’s Labour’s Lost: The History

This passage from the Open Source Shakespeare’s Love’s Labour’s Lost shows language patterns that push the play into the area where the Histories cluster, something visible in the scatterplot discussed below. Returning to the taxonomy of Docuscope, this passage has a lot of Description strings combined with a relative lack of Interaction and First Person strings, both of which can be seen in the Docuscope screen shot below. We are looking at something slightly more complicated in this visualization of the text, however, because I have “turned on” the First Person and Interaction strings in the Docuscope Single Text Viewer. I did this because I want to show what cannot be shown: a relative lack of blue (Interaction) and red (First Person) strings combined with a relative abundance of yellow (Description) ones. To really “see” this in the wild, you would have to consult a completely color tagged text of the complete Folio Works and — while reading — keep track of the relative differences in quantities of blue, red and yellow in the different containers (the plays themselves). Only an Argus-eyed text tagger and a statistical analysis can do this. The results are heuristic in that they lead us toward certain areas of the text for continued interpretation. In this case, I have used the color coding facility in Docuscope to scan the entire play (once I knew the categories I was interested in) in order to find a passage like the one above: one that has lots of yellow and very little blue or red.
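For readers who want to try this kind of scan themselves, here is a minimal sketch of the bookkeeping described above. It is not Docuscope's code, and the input format is an assumption: each passage is taken to be a plain list of category labels, one per tagged string, and the label spellings and function names are hypothetical. The score is simply the share of Description strings minus the combined share of Interaction and First Person strings, so passages with "lots of yellow and very little blue or red" sort to the top.

```python
from collections import Counter

# Category names follow the post; the tagged input format (one label per
# tagged string) is an assumption, not Docuscope's actual export format.
DESCRIPTION = "Description"
INTERACTION = "Interaction"
FIRST_PERSON = "FirstPerson"


def category_shares(tags):
    """Return each category's share of all tagged strings in one passage."""
    counts = Counter(tags)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {category: n / total for category, n in counts.items()}


def history_like_score(tags):
    """Description share minus the combined Interaction and First Person share."""
    shares = category_shares(tags)
    return (
        shares.get(DESCRIPTION, 0.0)
        - shares.get(INTERACTION, 0.0)
        - shares.get(FIRST_PERSON, 0.0)
    )


def rank_passages(passages):
    """Sort passage ids from most to least 'History-like' by the score above."""
    return sorted(
        passages,
        key=lambda passage_id: history_like_score(passages[passage_id]),
        reverse=True,
    )
```

Calling `rank_passages` on a dictionary that maps passage ids to label lists returns the ids in the order one might read them when hunting for candidate "History" passages. The score is a reading aid only; it stands in for neither the tagger nor the statistical analysis.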

Inspection of such candidate “History” passages reveals a number of pedantic exchanges like this one between Sir Nathaniel, Holofernes and Dull. This scene is a hilarious sendup of rhetorical display and vacuous learning, and it burlesques the famous Renaissance idea of verbal variety or copia that was recommended by Erasmus. It makes sense that this kind of passage would withdraw the play from the type of comic verbal interaction analyzed in the previous post. Because these characters are not speaking with one another, but rather are addressing an invisible audience of discerning rhetorical literates, there is not much interaction in the form of second person pronouns or corresponding first person singular pronouns: the very strings one would tend to find in Comic exchanges about acts or actions taken by characters themselves.

We expect this kind of thing from the pedants, but the analysis reveals a continuity of this History-like pattern among the French nobles who have vowed to live a life of Platonic study, characters like Biron who can never resist plumping their own rhetorical plumage. Don Armado, another parodic figure with an almost Quixotic appreciation for his own courtly expression, is also linguistically self-indulgent, and his passages would look similar to the one I have excerpted here and shown color coded below (although with Don Armado, there are marginally more interactions with his Page). The point of this analysis is to show that there are reasons why Docuscope would place Love’s Labour’s Lost with the Histories, and these reasons make sense to us once we begin to think about how the play is put together. This is a world of narcissists, something the Princess and her ladies point out when they defer the proposed courtship that is offered at the end of the play. That narcissism shows itself as a tendency to monologue, which cuts out the interaction that is characteristic of comedy and highlights instead a description-rich kind of oration that pushes these plays into the realm of History.

Would it be fair then to call Love’s Labour’s Lost a History? It depends. I would be comfortable saying that on the level of plot it has the elements of a Comedy, but on the level of its language, it is a History.

What, then, do we make of the historical decision made by Heminges and Condell to call this play a Comedy? Unsupervised statistical analysis has shown us (1) a pattern of groupings among the plays that roughly approximates at least two of H&C’s generic groupings of 1623 but also (2) exceptions to those classifications that make a certain amount of critical sense when we look at the construction of those plays. I would argue that we need an ontology here to sort out what elements of the analysis are fundamental as opposed to derivative. We could have an ontology of levels, for example, which says that “on the linguistic level, the play belongs with one group,” but “on the level of plot, the play belongs with another group.”

But eventually we would have to decide how the levels go together. That is also part of the point of this kind of work, since the overlap-with-divergence of linguistic and historical groupings of the plays introduces the possibility that there are levels of coherence here whose interaction needs to be explained. The language of levels needs a complement in a theory of objects: what are the things that are being compared here? Aren’t the tagged texts themselves a kind of hypothetical or abstracted version of the text itself? And what is the relationship between this hypothetical object and those that are arrayed into a generic group by, say, the historical editors of the First Folio? I will try, in future posts, to show why these are not trivial metaphysical questions.

By way of preview, however, I think the most fundamental “level” here is the one on which individuals or groups make decisions and act. So I would say that Heminges and Condell’s decision about how to order the plays in the First Folio is the most real thing in the analysis, while the statistical objects (tagged texts, Principal Components, regions of a scatterplot) are derivative. How else could we be “surprised” to find LLL clustering with the Histories, unless we were already enticed by the idea (as I was) that the initial clusterings themselves coincided with the classes stipulated by Shakespeare’s editors? More interesting: what is the abstract recipe of family resemblances or species traits that human beings like Heminges and Condell are carrying around in their heads? Their decision to sort the plays a certain way is real. It is a historical fact. But the “sensibility” or “weightings” that led them to take this empirical action must itself be hypothesized or modeled. We might be able to reconstruct this model, but even H&C may not have had direct access to it. This detour might change the way we think about the status of our statistical model, since that model may be only an approximation of something far more comprehensive (the capacity for literary judgment in historical actors) whose dynamic, differential powers of comparison are suggestively approximated by things like “principal components.”

The latitude in linguistic practice that makes Love’s Labour’s Lost look like a History is evidently something that Heminges and Condell did not notice, and I’m not sure why they should have. But once we have noticed it, this latitude in terms of linguistic practice may make sense to us. Why couldn’t there be a filiation of Love’s Labour’s Lost with Histories on the level of stance and language that does not “show up” on the level of plot? Surely this filiation is real too. The question is, where and on what level?

July 20, 2009
King or no [King]

I wanted to say a little about a problem we encountered early on when we began counting things in the plays, a problem that gets us into the question of what might be a trivial versus a non-trivial indicator of genre on the microlinguistic level. Several years ago Hope and I began a series of experiments with the plays contained in Shakespeare’s First Folio, feeding them into Docuscope, a text-tagger created at Carnegie Mellon, to see if we could find any ordered groupings in them. The results of that early work were published in the journal Early Modern Literary Studies in an article called “The Very Large Textual Object: A Prosthetic Reading of Shakespeare.” I will say more about Docuscope in subsequent posts, but suffice it to say here that it differs from other text-taggers in that it embodies a phenomenological approach to texts. (For the creator’s explanation of how it works, see an early online precis here.) Docuscope, that is, codes words and “strings” of words based on the ways in which they render a world experientially for a reader or listener. The theory behind how texts do this, and thus the rationale for Docuscope’s coding strategy, is derived from Michael Halliday’s systemic functional grammar. But what is particularly interesting about Docuscope is the human element involved in its creation. The main architect of the system, a rhetorician named David Kaufer, spent 8 years hand-tagging several million pieces of English according to their rhetorical function, and then expanded this initial tagging with wild-card operators so that Docuscope now classes over 200 million strings of English (1 to 10 words in length) into over 100 distinct categories of use or function.

Obviously there is a lot to say about the program itself, which represents a “built rhetoric” of sorts, one that has emerged through the interplay of one architect, his reading, and the texts he was interested in classifying. In any event, when Hope and I fed the plays into Docuscope, we had to make some initial decisions, and the first was whether to strip anything out of the plays we had obtained from the Moby online version. (We were already thinking about the shortcomings of this conflated, edited corpus as opposed to the text of the plays as it exists in various states in the First Folio, but we had to make do since we were not yet ready to modernize the spelling of F and decide among its internal variants.) So with the Moby text, we had things like Titles, Act and Scene Numbers, and Speech Prefixes (Othello, King Henry, Miranda, etc.). The speech prefixes created the greatest difficulty, because in the history plays the word “King” is, as you can imagine, used an awful lot — it appears in the speech prefixes of characters over and over. And because Docuscope tagged “King” as one of its visible tokens (assigning it to the “bucket” named “Common Authority”), this particular category was off the charts in terms of frequency when it came time to do unsupervised factor analysis on the frequency counts obtained from the plays. (I’ll post more on factor analysis in the future as well.)

Here’s the issue. In the end, we decided that it was “cheating” to let Docuscope count “King” in the speech prefixes, since this was a dead giveaway for History plays, and we wanted something more structural, something more buried in the coordination of word choices and exclusions, to serve as the basis of our linguistic “recipes” for Shakespeare’s genres. As the article shows, we were able to find such a recipe without relying on “King” in the speech prefixes. Indeed, subsequent research has shown that plural first person pronouns combined with a profusion of concrete, sense objects are really the giveaway for Shakespeare’s histories. (They are also “missing” certain things that other genres have: this combination makes histories the most “visible” genre, statistically speaking, that he wrote.) But is it really fair to decide that certain types of tokens (King in the speech prefix, for example) are superficial marks of history as a genre, and so not worth using in an analysis? Isn’t there a certain interpretive bias here, one that I have and in a sense want to argue for, against the apparatus of the play in favor of something like a deeper set of patterns or stances? To argue for such an exclusion, I would begin by pointing out that speech prefixes are an artifact of print and are not “said” (even if they are used) in performance, but there is still something to think about here.
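To make the exclusion concrete, here is a minimal sketch of the kind of preprocessing involved. It is not the procedure Hope and I actually used, and it rests on a loud assumption about layout: a speech prefix is taken to be an all-capitals name at the start of a line (for example "KING HENRY V"), either followed by a tab or standing on a line by itself. Real electronic texts vary, so the pattern and the function names are hypothetical and would need adjusting for any given edition.

```python
import re

# Assumption: a speech prefix is an all-capitals name of two or more characters
# at the start of a line, followed by a tab or by the end of the line.
# This is an illustrative convention, not a description of any real edition.
SPEECH_PREFIX = re.compile(r"^[A-Z][A-Z' ]*[A-Z](?:\t|$)", re.MULTILINE)


def strip_speech_prefixes(text):
    """Remove assumed speech prefixes, leaving the spoken lines in place."""
    return SPEECH_PREFIX.sub("", text)


def count_word(text, word):
    """Count whole-word, case-insensitive occurrences of a word."""
    pattern = r"\b" + re.escape(word) + r"\b"
    return len(re.findall(pattern, text, flags=re.IGNORECASE))


# Illustrative lines only, laid out in the assumed prefix-tab-speech format.
sample = (
    "KING HENRY V\tNow, lords, for France.\n"
    "EXETER\tThe king is set from London.\n"
)

with_prefixes = count_word(sample, "king")
without_prefixes = count_word(strip_speech_prefixes(sample), "king")
```

Comparing the two counts for each play shows how much of the "King" frequency comes from the apparatus rather than from what the characters say, which is the distinction at stake in deciding whether the token is a fair indicator of genre.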

A Google search algorithm looks for the “shortest vector” or easiest “tell” that identifies a text as this kind or that — even if it is one of a kind. But those of us who are interested in genre must by definition not be interested in the shortest vector or the easiest tell. We are looking for the longer path. The book historian in me, however, says that apparatus is important, and that “accidental” features never really are. So this is something I want to think more about.

July 2, 2009
