Docuscope Goes Live on Shakespeare Quarterly Open Peer Review

Jonathan Hope and I have written a new piece that we submitted to the special issue of Shakespeare Quarterly on “Shakespeare and New Media.” The essay cleared the first stage of editorial review, and is now posted at MediaCommons for general comment and critique prior to final editorial evaluation. Please visit the essay here and make your views known. The abstract and title are as follows:

“The Hundredth Psalm to the Tune of ‘Green Sleeves’”: Digital Approaches to Shakespeare’s Language of Genre
In this essay, we explore the underlying linguistic matrix of Shakespeare’s dramatic genres using multivariate statistics and a text tagging device known as Docuscope, a hand-curated corpus of several million English words (and strings of words) that have been sorted into grammatical, semantic and rhetorical categories. Taking Heminges and Condell’s designations of the Folio plays as comedies, histories and tragedies as our starting point, we offer a portrait of Shakespearean genre at the level of the sentence, showing how an identification of frequently iterated combinations of words (either in their presence or absence) can allow us to appreciate the integrity and fluidity of Shakespeare’s genres in new ways. Calling this approach “iterative criticism,” we situate our critical practice in the context of both Shakespearean criticism and more general protocols of reading in the humanities, concluding with a genre map of Shakespeare’s plays in the context of 282 other early modern plays.

As the last line suggests, we have now managed–with the help of Martin Mueller at Northwestern–to produce an analysis of 282 plays from the TCP database, alongside the Moby Shakespeare, all written between 1519 and 1659. I think this is the first visualization of its kind purporting to treat 150 years of Renaissance drama, which itself feels like something of a hurdle overcome. Here it is:

Dendrogram produced using Ward’s clustering method on scaled data, using 99 LATs to profile 318 plays written between 1519 and 1659, color coded by genre and separating out the works of Shakespeare as a category of their own: Red=Comedy, Blue=Interlude, Green=History, Cyan=Tragedy, Purple=Tragicomedy, Orange=Masque, Gold=Shakespeare. The item names follow the protocol: (genre)-(date)-(author)-(title).

Two points to make here, although there could be many more. First, this diagram was constructed using scaled data, which means that the “mile away” linguistic markers of similarity and dissimilarity are being balanced against markers whose variation is less visible from a distance; that is, variables with large standard deviations do not dominate those with smaller ones. Note, then, that most of Shakespeare’s works cluster together here, with comedies, tragedies and late plays all on the same twig. When I tried this analysis using unscaled data, these genres split up and Shakespeare’s comedies clustered together with Jonson’s, suggesting that Ward’s clustering procedure on unscaled data is better for picking up genre differences, while the same procedure conducted on scaled data (as is the case here) is more sensitive to authorship. (For an earlier analysis of Shakespeare’s plays only, using scaled data with Ward’s clustering technique, see this.) This finding should be tested in other contexts and with other data sets, but it is interesting, since it suggests that authorship becomes legible when fluctuations in variables that contain lots of tokens (say, Description) are coordinated with those that have many fewer tokens. It may be this “adding a dash of something” that pulls the author as such to the fore in an analysis.
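For readers who want to see why scaling matters for a distance-based method like Ward’s, here is a minimal sketch in Python. It is not our analysis: the four “plays” and their two columns of numbers are invented for illustration, the function names are hypothetical, and I am assuming that “scaled” means z-scoring each variable (subtracting its mean and dividing by its standard deviation). The sketch only compares Euclidean distances, the quantity that Ward’s method builds on, before and after scaling.

```python
import math
import statistics

# Toy, made-up rates for four hypothetical plays (NOT the data behind the
# dendrogram). Column 0 stands in for a high-count category with a large
# spread; column 1 stands in for a sparse category with a small spread.
plays = {
    "A": [100.0, 1.0],
    "B": [104.0, 3.0],
    "C": [112.0, 1.1],
    "D": [140.0, 2.9],
}


def zscore_columns(rows):
    """Rescale each column to mean 0 and population standard deviation 1."""
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    sds = [statistics.pstdev(col) for col in columns]
    return [
        [(value - m) / s for value, m, s in zip(row, means, sds)]
        for row in rows
    ]


def nearest_neighbor(name, table):
    """Return the other item with the smallest Euclidean distance to `name`."""
    others = (other for other in table if other != name)
    return min(others, key=lambda other: math.dist(table[name], table[other]))


names = list(plays)
scaled = dict(zip(names, zscore_columns([plays[n] for n in names])))

unscaled_nn = nearest_neighbor("A", plays)   # decided by the high-spread column
scaled_nn = nearest_neighbor("A", scaled)    # the sparse column now has a say
print("unscaled:", unscaled_nn, "scaled:", scaled_nn)
```

In this toy case, play A’s nearest neighbor on the raw numbers is B, because only the high-spread column matters at that scale; after z-scoring, its nearest neighbor becomes C, which resembles it on the sparse column. A full hierarchical clustering of such rows could be run with a library routine such as SciPy’s `scipy.cluster.hierarchy.linkage` with `method="ward"`, but that is a suggestion for experimentation rather than a description of the software we used.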

I’d like also to offer another observation here about the fact that so many Shakespeare plays are hanging together (as are Shirley’s and Middleton’s), remaining agnostic for the time being about whether it is authorship or genre that is producing these clusterings. The majority of Shakespeare’s plays are clustering on a twig that contains mostly comedies. So when compared with 282 other items written between 1519-1659, Shakespeare’s plays look for the most part like plays that Harbage (in the Annals of English Drama) classed as comedies as opposed to some other genre. (Martin tells me that he followed Harbage for the most part, but made some guesses himself about genre designations based on title page information and common sense.) The thing to remember here is that an individual genre may cluster in different ways depending upon the larger population in which it is situated. That is, a fuller collection of texts from the period–not just the ones that Martin was able to modernize so that we could run a test on them–might show new subdivisions that end up splitting the Shakespeare block into a number of smaller splinters. (Or it may not: this may be a stabilized portrait, more or less.) The best way to understand more about the groupings themselves is to begin looking at them with the help of PCA and other techniques we’ve been using already. That’s where we’re headed next.

