The passage from Richard II 3.3 above, taken from the Open Source Shakespeare, is, statistically speaking, particularly illustrative of some of the things that Shakespeare does when he writes History plays. While it is tempting to go straight to Docuscope and the statistics, it is better to post the passage first without any markup, since we don’t want to prejudge what is going on in the language here. So, we read the passage.
One of the things I have been saying about working with statistics and texts is that statistics can tell you that there is a pattern, but only a human being can tell you what that pattern is. The what/that distinction is crucial if we want to be precise about the division of labor that occurs in this kind of work. It is easy to think that statistics have “found” something from out of nowhere in the text, when what is really going on is that they have found “something,” and this something is a reflection of prior decisions we have made about what is worth counting. Perhaps it will become clearer as postings on this blog continue why this distinction is important. For now I’ll try to be as accurate as I can about why this is “History writing”: this is a piece of History because a certain class of words that Docuscope counts is more abundantly present here than in plays of other genres. Now let’s go to the same passage as marked up by Docuscope. (Click on the image to get a good look.)
The underlined words here and in other passages of Richard II are the ones that are responsible for “driving” this and other history plays to the right-hand side of the scatterplot from the previous post. These words all belong to a cluster called “Description” in Docuscope’s taxonomy, one that contains various subcategories that are not visible to us at this level of the analysis. (Short story why: with only 36 plays to look at, we need for statistical reasons to be looking for fewer classes of things-to-count than items-in-which-to-count-them. In statistics-speak: we can’t have the number of variables exceed the number of observations.) What kind of words does Docuscope count as “Description”? The best answer is: “the words that are underlined in yellow here.” Jonathan Hope and I have tried to remain agnostic about the explanatory power of the names that have been given to Docuscope’s categories and just look at what the words, as a collection, are doing on the page. But you can begin to guess what the rationale for the cluster is here: these are words that describe the properties of objects (“little,” “small”), objects themselves (“dish,” “wood”), spatial relations (“buried in the”), and verbs showing changes of physical state (“sighs,” “lodge”). Notice that some strings here are contiguous word segments, such as “buried in the.” Docuscope can count strings from 1 to 10 words in length, and counts 200 million of them, classifying them into up to 101 categories at its finest level of resolution.
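To make the counting step concrete, here is a minimal sketch of how a dictionary of one-to-ten-word strings might be matched against a text. It is an illustration only: the tiny dictionary, the function names, the longest-match-first rule, and the way rates are normalized are all assumptions made for the example, not a description of how Docuscope itself works. The sample sentence is invented for the demonstration and is not a line from the play.

```python
# Hypothetical mini-dictionary. The real Docuscope dictionary is vastly larger,
# and its entries and category assignments are not reproduced here.
TOY_DICTIONARY = {
    ("little",): "Description",
    ("dish",): "Description",
    ("wood",): "Description",
    ("buried", "in", "the"): "Description",
    ("to", "me"): "FirstPerson",
    ("mayst", "thou"): "Interaction",
}
MAX_LEN = 10  # the post says strings run from 1 to 10 words


def tag_longest_match(tokens, dictionary=TOY_DICTIONARY, max_len=MAX_LEN):
    """Scan left to right; at each position prefer the longest matching string.

    Longest-match-first is an assumption of this sketch.
    """
    tags = []
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest candidate first, then shrink one word at a time.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = tuple(tokens[i:i + n])
            if candidate in dictionary:
                tags.append((" ".join(candidate), dictionary[candidate]))
                i += n
                matched = True
                break
        if not matched:
            i += 1  # leave this token untagged and move on
    return tags


def category_rates(tokens, tags):
    """Share of all tokens covered by each category (assumed normalization)."""
    counts = {}
    for string, category in tags:
        counts[category] = counts.get(category, 0) + len(string.split())
    return {category: n / len(tokens) for category, n in counts.items()}


if __name__ == "__main__":
    # Invented word sequence, used only to exercise the functions above.
    sample = "a little dish of wood buried in the ground".split()
    sample_tags = tag_longest_match(sample)
    print(sample_tags)
    print(category_rates(sample, sample_tags))
```

Per-category rates of this kind, one row per play, are the sort of table that a later statistical step would take as input.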
In this case I have consulted the results of the Principal Component Analysis (see last post), in particular the component loadings shown below, which tell me which cluster does the most work in pulling plays that score highly on PC1 to the right in the scatterplot we were looking at. On the left-hand side of the loadings chart below, we see the clusters that Docuscope counts, listed by cluster name. In the columns to the right, we see the “loadings” of each of these clusters on the different components (PC1-PC5) that carve up the variation within Shakespeare’s writings into underlying patterns. As you can see from the bold item under PC1, Description is overwhelmingly powerful in the first component, with a loading of 0.913 (the magnitude of a loading runs from 0 to 1). The yellow underlined items above are those that were tagged as Description by Docuscope, which I know from having pulled up the play in Docuscope’s single text viewer and “turned on” only the items in this cluster (the yellow one on the left) so that I could find a passage that had a lot of yellow in it. (I picked this passage by eyeballing the play for yellowness. We are developing an algorithm to identify exemplary passages of different lengths using a hands-off statistical method; but for now we can use this.) So these words or tokens tell us why the pink dots in the biplot are moved to the right of the origin. But why do they move down? That is the work of PC2, and to understand that, we must look once again at the component loadings:
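For readers who want to see where a loadings table comes from, the sketch below computes first-component loadings from a small table of per-play category rates. Everything in it is an assumption made for illustration: the numbers are invented, the function names are hypothetical, and the method (power iteration on the correlation matrix, using only the Python standard library) is simply one textbook way to get the first component, not necessarily what the statistics package behind these posts does. A loading here is the eigenvector entry scaled by the square root of the eigenvalue, which makes it the correlation between a category and the component.

```python
import math

# Invented rates for six imaginary plays; columns follow TOY_NAMES.
TOY_NAMES = ["Description", "FirstPerson", "Interaction"]
TOY_ROWS = [
    [9.0, 2.0, 3.0],
    [8.5, 2.5, 2.5],
    [8.0, 3.0, 3.5],
    [5.0, 5.0, 6.0],
    [4.5, 6.0, 5.5],
    [4.0, 5.5, 6.5],
]


def standardize(rows):
    """Center each column and divide by its sample standard deviation."""
    n = len(rows)
    out_cols = []
    for col in zip(*rows):
        mean = sum(col) / n
        sd = math.sqrt(sum((x - mean) ** 2 for x in col) / (n - 1))
        out_cols.append([(x - mean) / sd for x in col])
    return [list(r) for r in zip(*out_cols)]


def correlation_matrix(z):
    """Correlation matrix of already standardized rows."""
    n, p = len(z), len(z[0])
    return [[sum(z[k][i] * z[k][j] for k in range(n)) / (n - 1)
             for j in range(p)] for i in range(p)]


def first_component(corr, iterations=500):
    """Power iteration: returns (largest eigenvalue, unit eigenvector)."""
    p = len(corr)
    # Deliberately uneven start, so it is not orthogonal to the answer
    # in the common case where variables load with opposite signs.
    v = [1.0 / (i + 1) for i in range(p)]
    for _ in range(iterations):
        w = [sum(corr[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    cv = [sum(corr[i][j] * v[j] for j in range(p)) for i in range(p)]
    eigenvalue = sum(v[i] * cv[i] for i in range(p))
    return eigenvalue, v


def pc1_loadings(rows, names):
    """Loadings of each variable on the first principal component."""
    if len(names) >= len(rows):
        # The rule of thumb from the post: fewer variables than observations.
        raise ValueError("need more observations (plays) than variables")
    eigenvalue, v = first_component(correlation_matrix(standardize(rows)))
    return {name: x * math.sqrt(eigenvalue) for name, x in zip(names, v)}


if __name__ == "__main__":
    for name, value in pc1_loadings(TOY_ROWS, TOY_NAMES).items():
        print(f"{name:12s} {value:+.3f}")
```

The overall sign of a component is arbitrary, so what matters when reading the output is which categories share a sign and which oppose each other, not whether a given loading is positive or negative.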
Now components can be combinations of correlated high and low items, a bit like a trend in a fixed deck of cards which has lots of face cards but very few low numbered cards. The loadings on a principal component can work in a similar way: a component, that is, can pull out a pattern in which Docuscope finds that plays containing lots of Emotion strings (as in PC3) also tend to have a lack of “Special Referencing” strings. This tells us that when Shakespeare does one thing, he is constrained (by genre, expectation, the limits of his actors, taste, style) not to do others. Explaining why this must be the case is for me perhaps the most interesting aspect of working quantitatively with the plays. Now, for PC2: it shows a correlation of high amounts of “First Person” strings with another group of strings called “Interaction.” Because the history plays cluster at the bottom right of the scatterplot, they score low on the second component, and so lack the items that are highly loaded on it. (Here we are focusing on the boldfaced loadings, those above +0.4 or below -0.4, which is the cutoff we are using to decide which loadings matter.) So PC2 is really describing something that History plays lack: something that you probably wouldn’t look for when reading these plays, but which nevertheless is important to their construction and your experience of them. It is now time to find an un-Historical (untimely?) piece of Richard II that has these First Person and Interaction strings: this item may show us, by negative example, what History plays and this play do not generally do in comparison to other plays.
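The way of reading a loadings column described above can be written down as a small sketch. The loading values below are invented placeholders chosen only to mirror the shape of the PC3 example (one category strongly positive, one strongly negative, one negligible); they are not the figures from the actual analysis, and the function names are hypothetical.

```python
THRESHOLD = 0.4  # the cutoff mentioned in the post


def split_by_loading(loadings, threshold=THRESHOLD):
    """Return (strongly positive, strongly negative) category names."""
    high = sorted(name for name, value in loadings.items() if value > threshold)
    low = sorted(name for name, value in loadings.items() if value < -threshold)
    return high, low


def describe(component, loadings, threshold=THRESHOLD):
    """One-line reading of a component: what rises and what falls with it."""
    high, low = split_by_loading(loadings, threshold)
    rises = ", ".join(high) or "nothing"
    falls = ", ".join(low) or "nothing"
    return (f"{component}: plays scoring high have more of [{rises}] "
            f"and less of [{falls}]; plays scoring low show the reverse.")


if __name__ == "__main__":
    # Placeholder values, NOT the loadings reported in the analysis.
    toy_pc3 = {"Emotion": 0.6, "SpecialReferencing": -0.5, "Description": 0.1}
    print(describe("PC3 (toy values)", toy_pc3))
```

The last clause of the generated sentence is the point relevant to the Histories: a play that scores low on a component is characterized by the absence of whatever loads positively on it.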
Having read through the passage above, we can now look at how it was marked up by Docuscope. Remember that speech prefixes and stage directions have been stripped from the Moby Shakespeare text (the same text used by the Open Source Shakespeare, displayed above).
I have highlighted the Interaction and First Person strings from this passage in Richard II 4.1 that are atypical for history plays, and I think this is an interesting result. First Person includes the first person singular pronouns and first person possessive pronouns, but also references that relate actions or events to a speaker who is marking his or her relationship to those actions or events (“Make me,” “to me”). (Another post will deal with the question of the perspective from which utterances appear as marked; I suspect that Docuscope treats all terms as if they are being “mentioned” according to J.L. Austin’s criteria: use would be a far more complicated thing to tag.) Interaction includes several items, but the ones shown here are second person pronouns and possessive pronouns, along with verbs attached to such pronouns that indicate something like recognition of a social relation or mediation (“mayst thou,” “Your care”). Notice that Docuscope is picking up a few archaic forms (“thee” and “mayst”). So what is it that Histories in general lack, but that this passage in particular has to an atypically high degree? The most accurate answer is: the underlined words. Principal Component Analysis tells us that there is a lower proportion of these strings in the Histories than in plays of other genres and says, in a mathematically defensible way, that this “lower” proportion is probably not accidental. But it is our job to say what is going on, and perhaps why, not simply that something is the case. And so my provisional description (which is always a shorthand form of analysis) of this trend would be the following: History plays lack the verbal back and forth over personal matters and fortunes that is more common in other Shakespearean dramatic texts, and the absence of that back and forth seems to correlate with a high degree of concrete language about things and events. That’s what Docuscope “sees” when it sees History plays.
I see stories about groups of people rather than individuals, stories whose action revolves around physical rather than emotional conflicts and so requires the description of concrete objects and events. The interpersonal or you-me back and forth style, on the other hand, is reserved for another of Shakespeare’s genres, one that lacks extensive descriptions of objects and things: his Comedies. Shakespeare’s Comedies will be the subject of another post. The next one, however, will treat an entire play that is high in what Histories have and low in what Histories lack, but is not itself a History: Shakespeare’s early comedy, Love’s Labour’s Lost.