
Early and Late Plato II: The Apology and The Timaeus

In the previous post we were examining three dimensional clusterings of the Platonic dialogues as rated on scaled Principal Components 1, 2 and 5, a technique that allowed us to see the early Platonic dialogues (as defined by Vlastos) standing apart from the middle and later ones. Vlastos’ claim, we remember, was that these early dialogues represent the historical Socrates, whose technique of argumentation was elenctic. Socrates used this technique to draw out the implications of an opponent’s views until those views collapsed under their own contradictions.

The translator of these dialogues, Jowett, would have had to preserve at least some of the linguistic “footings” required for such a dialogical structure in the early dialogues, and it was my contention in the previous post that Docuscope would detect these footings because they are exactly what a translator must preserve. Perhaps a more provocative claim, which I would like to advance now, is that the irony which attends this elenctic method, while not itself visible to Docuscope, might also require certain reliable linguistic pivots. In keeping with our analogy of the body of a dancer, certain upper body moves, like the ironic twist in which Socrates seems to be asking a question for the sake of clarification but is actually pushing his interlocutor into deeper confusion, require a lower body stance that can support the weight of the move. If we could define this lower body stance, we would not be defining Socratic irony itself, but rather its linguistic correlates. (At some point, the analogy will break down, since language is not a “weight bearing system”: but it does support gestures and turns, so let’s see how far we can go with it.)

What is it exactly that is happening in these early dialogues that Docuscope and Principal Component Analysis are able to see from afar? Here is a scree plot which rates the power of the principal components as they are derived sequentially, from most powerful to least:

Scree Plot for Principal Components Derived from Cluster Docuscope Data on Jowett Translations of the Platonic Corpus
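For readers who want to see where the numbers in a scree plot come from, here is a minimal sketch in Python. It assumes the eigenvalues of the components are already in hand. The eigenvalues below are invented for illustration and are not the values behind the plot above, and the function name `explained_variance` is hypothetical.

```python
def explained_variance(eigenvalues):
    """Return (share, cumulative share) for each component, largest first."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    shares = [ev / total for ev in ordered]
    cumulative = []
    running = 0.0
    for share in shares:
        running += share
        cumulative.append(running)
    return shares, cumulative


# Hypothetical eigenvalues, NOT taken from the Jowett corpus analysis.
toy_eigenvalues = [4.0, 2.0, 1.0, 0.5, 0.5]
shares, cumulative = explained_variance(toy_eigenvalues)

for rank, (share, cum) in enumerate(zip(shares, cumulative), start=1):
    print(f"PC{rank}: {share:.1%} of variation, {cum:.1%} cumulative")
```

A statement such as "the first two components account for X% of the variation" corresponds to the second entry in the cumulative list.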

The first two principal components are shown here to be quite powerful: together they account for almost 54% of the variation in the entire corpus. When we rate all of the dialogues on just these first two components, we get the following bubble plot:

Graph of Platonic Dialogues Scored on Principal Components 1 and 2

I have highlighted the upper left quadrant, where almost all of the dialogues that Vlastos identified as “early” are clustering. Their presence in this quadrant means that they score low on PC1 and high on PC2. PC1 might be described as an anti-early component, because it powerfully discriminates against early dialogues. PC2, on the other hand, might be described as a pro-early component, since its highly loaded variables are more frequently used in early dialogues. We can literally see the sorting power of these two components here, but it can also be quantified by the Tukey test, which was applied to both principal components, the results being available here and here. Note that the Apology is one of the most strongly “early” dialogues by these measures, whereas the Timaeus is one of the least early. We will pay closer attention to these two items as a way of exemplifying the differences that Docuscope sees between the two types of items.

Before making the comparison, let’s look at the variables that are most powerfully loaded on these components and so are most responsible for discriminating the early/non-early difference. We do this either by consulting the loadings of our variables on the two principal components or by looking at a biplot which arrays those variables in two dimensions, exactly the two that were used to produce the bubble plot above. First the loadings scores (reported as eigenvectors) and then the loadings biplot:

Loadings of Cluster Scores for PC1 and PC2

Loadings Biplot for PC1 and PC2

The loadings biplot (lower diagram) is a two dimensional image of the loadings scores (upper diagram), showing how these variables behave with respect to one another in the entire corpus. Clusters of words that oppose each other by 180 degrees — for example, [Public_Values] and [Special_Referencing] — tend not to co-occur with one another in the same text. Here we are interested in what makes a particular text cluster in the upper left-hand quadrant, so we are looking for vectors (red arrows) that extend furthest to the left and to the top of the diagram. Vectors extending to the left are: Reasoning, Interactivity, Directing Action, Interior Mind and First Person. (These are the clusters that have significant negative loadings on the first column in the top diagram: if an item scores high on words contained in these clusters, it will be “punished” for that abundance and pushed to the left of the plot, as the red dots are above.) Note that we can also use our 180 rule to say something about items that are far left in the bubble plot as well: they must lack items contained in the clusters that are positively loaded on PC1, which are Narrating, Description, and Time Orientation.

Similarly, with PC2, we are looking for the tall vectors heading upward: Emotion, Public Values and Topical Flow. Having tokens that were counted under these clusters will push an item up in the diagram, as will lacking items from the negatively loaded clusters: Directing Readers, Elaborating, Special Referencing. Note that Topical Flow (which is often populated by third person pronoun use) is loaded positively for the second principal component, but also positively for the first, which makes it fork upward and to the right. This means that an item scoring high on Topical Flow tokens will probably lack some of the items to the far left and contain items to the far right, which may discourage that item’s appearance in our “early” quadrant unless there are differences in these other variables.
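The "180 rule" used in reading the biplot can be made concrete with a small sketch: treat each cluster's pair of loadings as a vector and measure the angle between two vectors. The loadings below are invented for illustration (they are not the values in the loadings table), and `angle_between` is a hypothetical helper. How closely the angle tracks actual co-occurrence also depends on how the biplot was scaled, so this is a reading aid rather than a test.

```python
import math


def angle_between(u, v):
    """Angle in degrees between two 2-D loading vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    cosine = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.degrees(math.acos(cosine))


# Hypothetical (PC1, PC2) loadings, chosen only to show the geometry.
toy_loadings = {
    "Public_Values": (0.1, 0.5),
    "Special_Referencing": (-0.1, -0.5),
    "Narrating": (0.5, 0.0),
}

opposed = angle_between(toy_loadings["Public_Values"],
                        toy_loadings["Special_Referencing"])
oblique = angle_between(toy_loadings["Public_Values"],
                        toy_loadings["Narrating"])
print(f"Public_Values vs Special_Referencing: {opposed:.1f} degrees")
print(f"Public_Values vs Narrating: {oblique:.1f} degrees")
```

Vectors near 180 degrees apart pull items in opposite directions, vectors near 0 degrees pull together, and vectors near 90 degrees say little about one another.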

I have discussed some of these clusters in earlier posts about Shakespeare, so my main focus here will not be on elaborating the contents of the clusters. Rather, I want to use these loadings to zero in on specific words in exemplary passages from the early and later dialogues to see what is captured and then leave it to readers to say what these particular tokens are doing. Looking at our bubble plot above, the two dialogues that exemplify these opposing linguistic trends — in translation — are the Apology and the Timaeus.

Here are two passages from the Apology that exemplify “earliness” in the Platonic corpus, if we agree that the clustering above seems compelling. Note that these are screenshots from Docuscope in which the clusters that are doing the work of pushing the texts up and to the left are turned on or color coded. I have not turned on the clusters that are absent, since these will be exemplified in the Timaeus:

I think these passages are certainly illustrative of the elenctic method described by Vlastos, although it ought to be said that the high amount of dialogical interaction here — one that was a hallmark of comedy in Shakespearean drama — is sometimes implied by Socrates rather than really enacted by both speakers. That is, Socrates sometimes simulates a dialogue that is not really happening (“to him I may fairly answer”), and this procedure actually multiplies the Interaction strings (sky blue) beyond what might be the case in actual interaction. Note too that Docuscope is seeing lots of Public Values words, words that gesture toward communally sanctioned values, in this earlier style: demigods, heroes, fairly, mistaken, good for, doing right, disgrace. These values must be cited in elenctic exchange because they are the topic of conversation (people have opinions about them), but such implied communality may also coerce assent from an interlocutor for reasons that extend beyond mere shame at self-contradiction. We see, too, more emotionally charged words (in orange); the occasional Topical Flow token (their); and some Reason tokens (if he, thus, may, do not).

Now look at a passage from the Timaeus, which does the things that items in the early quadrant (on the whole) cannot do:

This is cosmogony, not dialogue, which is why we have a number of Narrative strings (the year when, then, the night, overtaken the, as they) and Description strings (orbit, the moon, stars, sun, wanderings, motion, swiftness). Special Referencing here is picking up a lot of abstract references (dark purple) such as animals, measure, relative, the whole, nature, variety and degrees. The slightly lighter purple, Reporting strings, are complementing the Narrative tokens: having, completion, After this, came into being, received, to the end that, created. This should not be surprising since the two vectors for these clusters were almost overlapping in the loadings biplot above.

Whereas the Apology is staging a dialogue (real or implied), the Timaeus is creating a world and pacing that act of creation (through narrative) with a set of abstract terms that can be referenced in conversation. Indeed, one of the burdens of this kind of world-making, I think, is that the abstractions must be folded in with the concrete descriptions in equal measure so that the passage is something more than a Georgic description of a natural scene or a praise poem to nature. Note too that there is absolutely no irony in this passage from the Timaeus. That is not because Docuscope has a category that allows it to discern irony in its local environs and so rule out such an effect in the Timaeus: only a human being can make such a discrimination, by virtue of being able to look beyond the simple mentioning of words to assess their use. (For Docuscope, all counted words are mentionings of words whose single use has been classed a priori in the categories assigned to them.)

And yet, even in translation, Docuscope may be identifying the linguistic footings of irony: a necessary but not sufficient condition for its use.

March 14, 2010
The Funniest Thing Shakespeare Wrote? 767 Pieces of the Plays

Press to Play: 767 Pieces of Shakespeare in Scaled PCA Space

Now for something a little different. I mentioned before that we can conduct similar analyses on pieces of the plays rather than the plays as a whole. In this experiment, I have been working with 1000 word chunks of Shakespeare plays, which allows me to use many more variables in the analysis. (This was the technique that Hope and I used in our 2007 article on Tragicomedy.) Obviously the plays weren’t written to be read, much less analyzed, in identically sized pieces: the procedure is artificial through and through. It does allow us, however, to see things that Shakespeare does consistently throughout different genres, things that happen repeatedly throughout an entire play rather than just the beginning or end. Another caveat: we partitioned the plays starting at the beginning of each text, making the first 1000 words the first “piece.” This results in a loss of some of the playtext at the end, since any remainder that is less than 1000 words is dropped. In future analyses, we will take evenly spaced 1000 word samples from beginning to end, partitioning losses in between. There are no perfect answers here when it comes to dividing the plays into working units. So this is a first installment.
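The two partitioning schemes described above can be sketched as follows. This is an illustration under stated assumptions, not the procedure actually used: it assumes words are obtained by splitting on whitespace, and the function names `fixed_chunks` and `evenly_spaced_chunks` are hypothetical. The second function is one possible reading of "partitioning losses in between", in which the leftover words are spread across the gaps between pieces.

```python
def fixed_chunks(words, size=1000):
    """Cut from the start into full pieces; the short remainder is dropped."""
    n_full = len(words) // size
    return [words[i * size:(i + 1) * size] for i in range(n_full)]


def evenly_spaced_chunks(words, size=1000):
    """Same number of full pieces, with skipped words spread between them."""
    n_full = len(words) // size
    if n_full == 0:
        return []
    leftover = len(words) - n_full * size
    gaps = n_full - 1
    chunks = []
    for i in range(n_full):
        # Shift each piece right by its share of the leftover words, so the
        # first piece starts at the beginning and the last ends at the end.
        offset = (leftover * i) // gaps if gaps else 0
        start = i * size + offset
        chunks.append(words[start:start + size])
    return chunks


# Toy "text" of 25 numbered words, cut into pieces of 10 for readability.
toy_words = [f"w{i}" for i in range(25)]
from_start = fixed_chunks(toy_words, size=10)
spread_out = evenly_spaced_chunks(toy_words, size=10)
print([chunk[0] + ".." + chunk[-1] for chunk in from_start])
print([chunk[0] + ".." + chunk[-1] for chunk in spread_out])
```

With a real play, `words` would come from something like `text.split()`, and `size` would stay at its default of 1000. Either way some text is left out; the second scheme only changes where the omission falls.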

The video above (press to play) is a three dimensional JMP plot of 767 pieces of Shakespeare in a dataspace of three scaled Principal Components (1, 4, and 9) which I have chosen based on their power to sort the plays according to the Tukey Test. (See Tukey results for PCs 1 and 4.) When you run the video capture, you’ll see a series of dots that are color coded based on generic differences: red is comedy, green history, blue is late plays and orange tragedies. Early in the capture, I move an offscreen slider that creates a series of chromatic “halos” or ellipsoid bubbles around neighboring dots: these halos envelop dot groupings as they meet certain contiguity thresholds. You see the two major clusters I am interested in here, histories and comedies, forming in the lower left and upper right respectively. (Green on lower left, red on upper right.) Interestingly enough, the see-saw effect we saw in our analysis of entire plays is repeated here: comedies and histories are the most easily separated, because whenever Shakespeare is using strings associated with comedy, he can’t or won’t simultaneously use strings associated with history (and vice versa). Linguistic weight cannot be placed on both sides of this particular generic fulcrum at once.

Now the resulting encrusted object, which I have rotated in three dimensions, is a lot less elegant than the object we would be contemplating were we to do discriminant analysis of these groups. I am saving Discriminant Analysis for a later post. For all its imperfections, Principal Component Analysis is still going to give us some results or linguistic patterns we can make sense of, which is the ultimate measure of success here. I think it’s worth appreciating the spatial partitioning here in all of its messiness: the multicolored object presents both a pattern that we are familiar with (comedies and histories really do flock to opposite ends of the containing dataspace) and some jagged edges that show the imperfections of the analysis. Imperfections are good: we want to find exceptions to generic rules, not just confirmations of a pattern.

Looking at the upper right hand quadrant, we see the items that are high on both PC1 and PC4. In this analysis we are using Language Action Types or LATs, the finest grained categories that Docuscope uses (it has 101 of them). We will want to ask which specific LATs are pushing items into the different areas here, and to do so, I have produced the following loading biplot:

A loadings biplot gives information about components in spatial form, showing our different analytic categories (LATs such as “Common Authorities,” “DenyDisclaim,” “SelfDisclosure,” etc.) as red arrows or vectors. To read this diagram, consider the two components individually. What makes an item high on PC1? Since PC1 is rated on the horizontal axis, we scan left to right for the vectors or arrows that are at the extremes. To my eye, SelfDisclosure, FirstPer[son] and DirectAddress are the most strongly “loaded” on this component, which means that any piece that has a relatively high score on these variables will be favored by this component and thus pushed to the right hand side of a scatterplot (see below). Conversely, any item that is relatively high in the words that fall under categories such as Motions, SenseProperty, SenseObject, and Inclusive will be pushed to the left. Notice that the two variables SelfDisclosure and SenseObject are almost directly opposed: the loadings biplot is telling us here that, statistically at least, the use of this one type of word (or string of words) seems to preclude the use of its opposite. This would be true of all the longer vector arrows in the diagram that extend from opposite sides of the origin.

We can then do the same thing with the vertical axis, which represents PC4. Here we see that LangRef [Language Reference], DenyDisclaim and Uncertainty strings are used in opposition to those classed under the LAT Common Authority. If an item scores high on PC4 (which most comedies do), it will be high in LangRef, Uncertainty and DenyDisclaim strings while simultaneously lacking Common Authority strings. So what about the vectors that bisect the axes, for example, DenyDisclaim, which appears to load positively on both PC1 and PC4? This LAT is shared by the two components: it does something for both. We can learn a lot by looking at this diagram, since, once we’ve decided that these components track a viable historical or critical distinction among texts, it shows us certain types of language “schooling together” in the process of making this distinction. DirectAddress and FirstPer [or, First Person], Autobio and Acknowledge thus tend to go together here (lower right), as do Motions, SenseProperties, and Sense Objects (upper left).

In fact, the designer of Docuscope saw these LATs as being related, which is why elsewhere he aggregated them together into larger “buckets” such as Dimensions or Clusters, the latter being the aggregate we used in our analysis of full plays. What we’re seeing here is a kind of “schooling of like LATs in the wild,” where words that are grouped together on theoretical grounds are associating with one another statistically in a group of texts. If the intellectual architecture of Docuscope’s categories is good, this schooling should happen with almost any biplot of components, no matter what types of texts they discriminate. The power of this combination of Principal Components, then, is that it aligns the filiations and exclusions of the underlying language architecture with genres that we recognize, and will hopefully suggest theatrical or narrative strategies that support these recognizable divisions.

The loadings biplot shows us how the variables in our analysis are pushing items in the corpus into different regions of a dataspace. We can now populate that dataspace with the 767 pieces of Shakespeare’s plays, rating each of them on the two components. Here is how the pieces appear in a plot of scaled Component 1 against Component 4, again, color coded with the scheme used above:

Notice the pattern we’ve seen before: comedies (here represented in red) are opposite histories (green) in diagonal quadrants. In general, they don’t mingle. The upper right hand quadrant, which is where the comedies tend to locate, contains the first item that I’d like to discuss: the red dot labelled Merry Wives (circa 2.1). This dot represents a piece of the first scene, second act of The Merry Wives of Windsor. As the item that rates highest on both PC1 and PC4 — components which the Tukey Test shows us to be best at discriminating comedy — this piece of The Merry Wives of Windsor is the most comic 1000 word passage that Shakespeare wrote. Here is an excerpt:

“I’ll entertain myself like one that I am not acquainted withal; for, sure, unless he know some strain in me, that I know not myself, he would never have boarded me in this fury.” In this color coded sentence we can see diagrammed the comic dance step. While I think there are funnier lines (“I had rather be a giantess, and lie under Mount Pelion”), the former is significant for what it does linguistically: it shows a speaker entertaining and then rejecting a perspective on her own situation (that of Falstaff) while comparing it with another (her own). The uncertainty strings (orange) such as “know not,” “doubt” and the indefinite “some” contribute to this mock searching rhetoric. Self-disclosure strings such as “myself” and “makes me” anchor the reality testing exercise to the speaker, who must make explicit her own place in the sentence as the object of doubt, while the oppositional reasoning strings such as “never” and “not” mark the mobility of this speaker’s perspective: I will try this toying perspective on my honesty, seeing myself as Jack Falstaff does, but will reject it soon enough. The reason that this passage is so highly rated on these two factors has something to do with the multiplication of perspectives that are being juggled onstage: there are two individuals here, Mistress Page and Mistress Ford, who are, as it were, rising above an imbedded perspective contained in Falstaff’s letter, commenting upon that perspective, and then rejecting it. Each time a partition in reality (a level) is broached in the stage action and dialogue, comic language appears.

We can oppose this most comic piece of writing — again, according to PCA — to its opposite in linguistic terms, a piece that contains what the comic one lacks and lacks what the comic one has. Here, then, is a portion of the “most historical” piece of Shakespeare, from Richard II 1.3:

Here we see the formal settings of royal display, a herald offering Mowbray’s formal challenge; it is no surprise that this exemplifies history, a genre in which the nation and its kings are front and center. Yet where the passage really begins to rack up points is in its use of descriptive words, which are underlined in yellow. Chairs, helmets, blood, earth, gentle sleep, drums, quiet confines…we don’t think of history as the genre of objects and adjectives, but linguistically it is. Inclusive strings, in the olive colored green, are perhaps less surprising given our previous analyses. We expect kings to speak about “our council” and what “we have done.” But notice that such language is quite difficult to use in comedy: even in a passage of collusion, where we would expect Mistress Page and Mistress Ford to be using first person plural pronouns, the language tends to pivot off of first person singular perspectives. The language of “we” really isn’t a part of comedy.

I am less surprised to find, at this finer grained level of analysis, words from official life (what Docuscope tags as Commonplace Authority, in bright green) associated with history, since these are context specific. More interesting is the presence of the purple words, which Docuscope tags as person properties. These are high in history, but show up in comedy as well, as you can see on the loading biplot above. This marked up passage is also useful because it shows us something we’d want to disagree with: you don’t have to be Saul Kripke to see that a proper name like Henry is an imperfect designator of persons, particularly because other proper names such as Richard do not get counted under this category by Docuscope. We live with the imperfections, unless it appears that there are so many mentions of the name Henry in the plays that this entire LAT category must be discounted.
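Stepping back to the selection of the "most comic" and "most historical" pieces discussed in this post, here is a rough sketch of how such extremes might be pulled from a table of component scores. It assumes that "highest on both PC1 and PC4" can be approximated by the largest sum of the two scaled scores, which is one reading among several. The piece labels and scores are invented, and `extreme_pieces` is a hypothetical helper.

```python
def extreme_pieces(scores):
    """Return (label furthest to the upper right, label furthest to the lower left).

    `scores` maps a piece label to its (PC1, PC4) scaled scores. Pieces are
    ranked by the sum of the two scores, an assumption made for illustration.
    """
    ranked = sorted(scores, key=lambda label: sum(scores[label]))
    return ranked[-1], ranked[0]


# Hypothetical scaled scores for four made-up pieces.
toy_scores = {
    "piece_a": (2.1, 1.8),
    "piece_b": (-1.9, -2.2),
    "piece_c": (0.3, -0.4),
    "piece_d": (1.0, 2.0),
}

upper_right, lower_left = extreme_pieces(toy_scores)
print("furthest toward the comedy corner:", upper_right)
print("furthest toward the history corner:", lower_left)
```

A stricter rule would require a piece to hold the single highest score on each component separately, which may leave no winner at all; the sum is a pragmatic compromise.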

January 15, 2010
Comic Twelfth Night, Tragic Othello (Part 2)

Here is a second comic exchange from Twelfth Night. Maria’s plan has worked wonderfully. Malvolio has arrived cross-gartered and is quoting to Olivia little bits of the love letter he believes she has written to him. The red and blue strings, First Person and Interaction, are again appearing fast and thick as the incomprehension builds. As in the previous passage, which dealt with Cesario’s resistance to Olivia, we have a resistant “you” here who keeps the game going. (Had she succumbed, dismissing Maria to go practice her penmanship, the dialogue would look very different: first and second person singular pronouns would most likely disappear.)

A few things are worth noting about the coding in this passage. Docuscope is ignoring the single quotation marks from the Moby Shakespeare. It does not matter that these words are being “mentioned” rather than “used” in the Austinian sense: all “sightings” by Docuscope occur in a kind of weird citational indicative: there is no way for the machine to catch the fact that the speaker, Malvolio, is not really telling Olivia “Go to, thou art made.” This is a flat earth in the rhetorical sense: no ironic depth can be perceived when every item is tagged because it occurs, not because its use in a certain context means a certain thing. One should not be misled about Docuscope’s powers of interpretation here.

Switching analogies, we might say that – like a Spinozan deity – Docuscope contemplates words from the perspective of eternity: it does not itself follow events from the standpoint of a moving present against which it measures temporally marked events as they arrive and withdraw through time. (Docuscope does not engage in phenomenological protention or retention in the Husserlian sense.) Nor does it situate events in space in any perspectivally located way. The history of what happens in the world of the play, if we were to think of it that way, is a history of “mentioned happenings.” No one does anything; rather, words are mentioned, and Docuscope keeps track of which kinds of words are used (but never how).

Another interesting feature of the passage. Malvolio really doesn’t say anything directly to Olivia in this passage: he is talking past Maria, and is reciting to Olivia what he believes she actually wants to say to him. This sort of indirection, when it is not a group effort, also seems to be contributing to the proliferation of Interaction and First Person strings: the “how,” “what,” “what” paired with the “you” “thou” “thou.” We would expect to find a lot of passages like this in other plays that have disguise and supposition, most of all in Comedy of Errors. I suspect that in the future I will be able to put my finger on a number of passages which parallel this one in terms of their performance on the comedy factor that Docuscope found for the full plays.

A final observation. Here and elsewhere in the play, Malvolio is often the one who supplies the Description strings, which as I have mentioned below, this play lacks in comparison with other plays (just as it has more, on average, Interaction and First Person). Is there anything about this passage that shows us why one cannot put one’s weight on both sides of this equation – Description on the one hand, First Person/Interaction on the other – in a single play or passage? Is there something about the comic posture, linguistically, that prevents such combinations? Malvolio and Feste are the two characters in the play who use the most Description strings, and during the fabulous speech in which Malvolio fantasizes about being married to Olivia while Toby and Maria look on, the linguistic texture of the scene is that of a History play. But as principal component analysis tells us, such moments of “historical” writing – oversimplified as the definition is – may occur occasionally in Comedy, but they will not occur repeatedly. Malvolio can only give so many such monologues, and Feste can only produce his rich, descriptive banter for so long.

But isn’t it important that there is a “dash” of Description in the play, indeed, in this passage? One issue that we need to explore as we think about what it means to find “a lot” of something in a particular type of play is what it also means to find “a little” of something. Is there a sense in which things that occur in small amounts are important as well, and if so, how should we think about those “dashes” of a certain type of word?

August 2, 2009
Comic Twelfth Night, Tragic Othello? (Part I)

Twelfth Night is one of the classic Shakespearean comedies and so it is unsurprising that it appears in the Comedy quadrant that we obtained in our initial analysis. What is it about the language in this play that pushes it toward this quadrant, and would we recognize this comic “itness” if we saw it in the form of an exemplary passage? That is the first question I’ll be looking at in the next series of posts, entitled “Comic Twelfth Night, Tragic Othello?” But there is another, more interesting question to ask, given the results we have obtained: why does Othello look to Docuscope like a comedy? Literary critics such as Susan Snyder and Stephen Orgel have noted genealogical links to comedy in this “high tragedy,” so it is particularly intriguing to find unsupervised statistical analysis of the language coming to a similar conclusion. I will try to provide more than one exemplary passage in this series of posts, since these tend to be where the analysis gets interesting (or not).

So, Twelfth Night. In terms of plot, it has three interesting devices — a set of identical twins, a shipwreck, and a disguise, all of which introduce a high degree of unintentional confusion into the action, driving it forward. In a plot that is driven on by accident and what you might call “congruent misunderstanding” (when two people don’t realize that they are speaking at cross-purposes), you expect to find a lot of back and forth between characters as they synch-up their erroneous suppositions (which is funny in and of itself), then more back and forth as they backtrack in order to rehearse why they didn’t understand what was going on when they were so deeply engaged with one another. I haven’t yet looked at the color coded play as I write this, but I expect to find the comic strings at the end, where the confusion is being unravelled, and in scenes of comic abuse (which I know from experience involves a lot of “I”/”thou” exchange characteristic of comedy). The exemplars are below, one from Open Source Shakespeare, the other a screen shot of the same passage as tagged by Docuscope:

The first thing I notice about this exchange is that it involves an extended miscommunication, culminating in the wonderful line “I am not what I am.” The doubled first person is emblematic of the doubling of Viola’s person in Cesario (or in Olivia’s apprehension of Viola as Cesario). The underlined red passages refer to the Docuscope category First Person, which as we remember from the component loadings is high in all of the items on the upper half of the scatterplot. The other type of strings that push plays upward are those underlined in blue, which are coded in Docuscope under the category of Interactions. First person is fairly self-explanatory here (look at the red items), but Interaction is worth pausing at. Notice first that question marks are being tagged here: a piece of punctuation and so not definitively Shakespearean. Maybe it matters that something that could have been added by a compositor is at work in this category, maybe it doesn’t. I don’t think question marks are as open to interpretation, grammatically, as say a comma or semicolon, but this is something for my colleague Jonathan to weigh in on. We see lots of “thee” and “thou” under Interaction, and these words seem to be the mainstay of comedy as a whole from what I’ve seen. “Thee,” “thou,” “thine,” “you,” and “your” are some of the most common words in the Shakespearean corpus that Docuscope tags, so we can be fairly sure that when we find Interaction coming up as a relevant loading in a component, it is words such as these that are driving the underlying pattern.

Red and blue strings are pushing mostly comic plays up toward the top of the scatterplot. Yellow strings will push plays to the right, which means that the comedies clustering in the upper left exhibit a lack of yellow or Descriptive strings. The entire component that characterizes Comedy, then, is one in which First Person and Interaction strings are mutually elevated from the mean score of all plays, while Descriptive strings are (simultaneously) below the mean. Perhaps there is a reason that a linguist could provide that would explain this pattern as a general feature of the language. That is, someone might be able to show that our language is something that can only “bend” in certain ways, making it quite difficult to use a lot of concrete descriptive nouns and words describing motion or changes in states of objects while simultaneously juggling lots of I/you, my/your strings. But this would not be enough of an explanation for me. We need to say why this type of language pattern –whether or not it is constrained by limits in our grammar, cognition, or underlying semantic maps — coincides with genre classifications made by discriminating humans (Heminges and Condell, Shakespeare’s editors).
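The idea of a component in which some categories sit above the corpus mean while another sits below it can be sketched numerically. The sketch below standardizes each category across plays and combines the results with signed weights. The play labels, the rates, and the weights (+1, +1, -1) are all invented and stand in only for the signs of the loadings described above; they are not the loadings from the analysis.

```python
from statistics import mean, pstdev


def zscores(values):
    """Standardize values: distance from the mean in standard deviations."""
    center = mean(values)
    spread = pstdev(values)
    return [(v - center) / spread for v in values]


# Hypothetical rates per play for three categories (made-up numbers).
plays = ["play_a", "play_b", "play_c"]
first_person = [3.0, 2.0, 1.0]
interaction = [6.0, 4.0, 2.0]
description = [1.0, 2.0, 3.0]

# Hypothetical signed weights: only the signs mirror the pattern in the text.
weights = {"first_person": 1.0, "interaction": 1.0, "description": -1.0}

z_fp = zscores(first_person)
z_int = zscores(interaction)
z_desc = zscores(description)

component_score = {
    play: (weights["first_person"] * z_fp[i]
           + weights["interaction"] * z_int[i]
           + weights["description"] * z_desc[i])
    for i, play in enumerate(plays)
}

for play, score in component_score.items():
    print(f"{play}: {score:+.2f}")
```

A play scores high here only when First Person and Interaction are above the mean and Description is below it at the same time, which is the see-saw pattern the paragraph describes.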

Returning to the passage above, I would point out two things. First, the quick trading of I/you, my/your strings in comic dialogue suggests a world in which predicates are being attached to subjects from two and only two points of view. This is not a universe of one, nor is it a crowd. It is not surprising that comic plotting, built as it is on sexual pairings, would favor this type of bivalent, perspectival tagging of action by speakers. But there is something else going on here. Olivia is trying to make something happen here. She says, “do not extort thy reasons from this clause,” and earlier, “I would you were as I would have you be.” The “thy” and “you” here are important because the speaker is trying to create or assert a particular interpretation of how these two individuals relate to one another (and the words traded between them). The essential drama in this situation is the asymmetry of desire that obtains between the two characters, an asymmetry that keeps Viola from assenting to Olivia’s advances. That resistance is actually what forces Olivia to make these statements that are rich with I/you, me/my, since she is using these words as anchors for a broader interpretation that does not yet obtain. She really wants to say we. And Cesario doesn’t, so they remain in I/you dialogue.

So we could offer a preliminary hypothesis here. Shakespeare writes comedies in which characters, sometimes quite perversely, find the wrong way to the ones they love. Often it is chance or an onstage helper who sorts this out. Shakespeare is actually quite reserved when it comes to showing love as naturally progressing through its obstacles unassisted. But given that, in the initial stages of courtship, Shakespearean lovers almost never meet and join in a perfectly symmetrical way (they don’t start out as stones set in an arch, leaning perfectly on a keystone), we should expect this asymmetry to show itself in the language. Where does it show up? It shows up when a resistant individual, a “you,” prevents another “I” from arriving at an interpretation of their relationship that can be referred to as a “we” before others. Let’s call this the “resistant you” hypothesis. We can perhaps test it in the next passage, and in the passages we encounter from Othello.

July 31, 2009
Love’s Labour’s Lost: The History

This passage from the Open Source Shakespeare’s Love’s Labour’s Lost shows language patterns that push the play into the area where the Histories cluster, something visible in the scatterplot discussed below. Returning to the taxonomy of Docuscope, this passage has a lot of Description strings combined with a relative lack of Interaction and First Person strings, both of which can be seen in the Docuscope screen shot below. We are looking at something slightly more complicated in this visualization of the text, however, because I have “turned on” the First Person and Interaction strings in the Docuscope Single Text Viewer. I did this because I want to show what cannot be shown: a relative lack of blue (Interaction) and red (First Person) strings combined with a relative abundance of yellow (Description) ones. To really “see” this in the wild, you would have to consult a completely color tagged text of the complete Folio Works and — while reading — keep track of the relative differences in quantities of blue, red and yellow in the different containers (the plays themselves). Only an Argus-eyed text tagger and a statistical analysis can do this. The results are heuristic in that they lead us toward certain areas of the text for continued interpretation. In this case, I have used the color coding facility in Docuscope to scan the entire play (once I knew the categories I was interested in) in order to find a passage like the one above: one that has lots of yellow and very little blue or red.
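The scan described above (looking through the whole play for a stretch with lots of yellow and very little blue or red) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: I assume the tagged play is available as a simple list with one category label per token (or `None` for untagged tokens), which is not how Docuscope itself stores or displays its tags, and the function name `best_window` is hypothetical.

```python
def best_window(tags, size):
    """Find the window of `size` tokens richest in Description strings
    and poorest in Interaction / First Person strings.

    `tags` is an assumed representation: one category label per token,
    or None when the token is untagged. Returns (start_index, score),
    where score = Description count minus (Interaction + FirstPerson) count.
    """
    if size <= 0 or size > len(tags):
        raise ValueError("window size must be between 1 and len(tags)")

    def weight(tag):
        if tag == "Description":
            return 1
        if tag in ("Interaction", "FirstPerson"):
            return -1
        return 0

    weights = [weight(t) for t in tags]

    # Score the first window, then slide one token at a time,
    # adding the token that enters and dropping the one that leaves.
    score = sum(weights[:size])
    best_start, best_score = 0, score
    for start in range(1, len(weights) - size + 1):
        score += weights[start + size - 1] - weights[start - 1]
        if score > best_score:
            best_start, best_score = start, score
    return best_start, best_score


# Toy sequence, invented for illustration only.
toy = ["FirstPerson", "Interaction", "Description", "Description",
       None, "Description", "Interaction"]
print(best_window(toy, 3))
```

The window it returns is a candidate for reading, not a finding in itself; as with the color coding, the result is heuristic and points toward a passage that still has to be interpreted.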

Inspection of such candidate “History” passages reveals a number of pedantic exchanges like this one between Sir Nathaniel, Holofernes and Dull. This scene is a hilarious sendup of rhetorical display and vacuous learning, and it burlesques the famous Renaissance idea of verbal variety or copia that was recommended by Erasmus. It makes sense that this kind of passage would withdraw the play from the type of comic verbal interaction analyzed in the previous post. Because these characters are not speaking with one another, but rather are addressing an invisible audience of discerning rhetorical literates, there is not much interaction in the form of second person pronouns or corresponding first person singular pronouns, the very strings one would tend to find in Comic exchanges about acts or actions taken by characters themselves.

We expect this kind of thing from the pedants, but the analysis reveals a continuity of this History-like pattern among the French nobles who have vowed to live a life of Platonic study, characters like Biron who can never resist plumping their own rhetorical plumage. Don Armado, another parodic figure with an almost Quixotic appreciation for his own courtly expression, is also linguistically self-indulgent, and his passages would look similar to the one I have excerpted here and shown color coded below (although with Don Armado, there are marginally more interactions with his Page). The point of this analysis is to show that there are reasons why Docuscope would place Love’s Labour’s Lost with the Histories, and these reasons make sense to us once we begin to think about how the play is put together. This is a world of narcissists, something the Princess and her ladies point out when they defer the proposed courtship that is offered at the end of the play. That narcissism shows itself as a tendency to monologue, which cuts out the interaction that is characteristic of comedy and highlights instead a description-rich kind of oration that pushes these plays into the realm of History.

Would it be fair then to call Love’s Labour’s Lost a History? It depends. I would be comfortable saying that on the level of plot it has the elements of a Comedy, but on the level of its language, it is a History.

What, then, do we make of the historical decision made by Heminges and Condell to call this play a Comedy? Unsupervised statistical analysis has shown us (1) a pattern of groupings among the plays that roughly approximates at least two of H&C’s generic groupings of 1623 but also (2) exceptions to those classifications that make a certain amount of critical sense when we look at the construction of those plays. I would argue that we need an ontology here to sort out what elements of the analysis are fundamental as opposed to derivative. We could have an ontology of levels, for example, which says that “on the linguistic level, the play belongs with one group,” but “on the level of plot, the play belongs with another group.”

But eventually we would have to decide how the levels go together. That is also part of the point of this kind of work, since the overlap-with-divergence of linguistic and historical groupings of the plays introduces the possibility that there are levels of coherence here whose interaction needs to be explained. The language of levels needs a complement in a theory of objects: what are the things that are being compared here? Aren’t the tagged texts themselves a kind of hypothetical or abstracted version of the text itself? And what is the relationship between this hypothetical object and those that are arrayed into a generic group by, say, the historical editors of the First Folio? I will try, in future posts, to show why these are not trivial metaphysical questions.

By way of preview, however, I think the most fundamental “level” here is the one on which individuals or groups make decisions and act. So I would say that Heminges and Condell’s decision about how to order the plays in the First Folio is the most real thing in the analysis, while the statistical objects (tagged texts, Principal Components, regions of a scatterplot) are derivative. How else could we be “surprised” to find LLL clustering with the Histories, unless we were already enticed by the idea (as I was) that the initial clusterings themselves coincided with the classes stipulated by Shakespeare’s editors? More interesting: what is the abstract recipe of family resemblances or species traits that human beings like Heminges and Condell are carrying around in their heads? Their decision to sort the plays a certain way is real. It is a historical fact. But the “sensibility” or “weightings” that led them to take this empirical action must itself be hypothesized or modeled. We might be able to reconstruct this model, but even H&C may not have had direct access to it. This detour might change the way we think about the status of our statistical model, since that model may be only an approximation of something far more comprehensive (a capacity for literary judgment in historical actors) whose dynamic, differential powers of comparison are suggestively approximated by things like “principal components.”

The latitude in linguistic practice that makes Love’s Labour’s Lost look like a History is evidently something that Heminges and Condell did not notice, and I’m not sure why they should have. But once we have noticed it, this latitude in terms of linguistic practice may make sense to us. Why couldn’t there be a filiation of Love’s Labour’s Lost with Histories on the level of stance and language that does not “show up” on the level of plot? Surely this filiation is real too. The question is, where and on what level?

July 20, 2009