Category: Shakespeare

Shakespeare Out of Place?

When Jonathan Hope and I did our initial Docuscope study of over 300 Renaissance plays, we found Shakespeare’s plays clustering together for the most part. One explanation for this clustering was that it was caused by something distinctive in Shakespeare’s writing, and that this authorial signature becomes visible in the same way genre does: at the level of the sentence. Indeed, in our first approach to this larger dataset (one we’d assembled from the Globe Shakespeare and Martin Mueller’s semi-algorithmically modernized TCP plays), we thought that authorship was overriding genre as a source of patterned variance.

But everything which goes into the dataset also comes out. And in this case, it was editorial difference that was helping to isolate Shakespeare’s plays. When we did a further study of the clusters containing works by Shakespeare, we noticed that their elevated levels of two LATs (Language Action Types) that dealt with punctuation – TimeDate and LanguageReference – were an artifact of hand modernization.

Several contracted items from the Globe/Moby Shakespeare edition, tagged as Language Reference Strings by Docuscope

The variability in early modern orthography is well known, and we also know that there were many ways of punctuating early modern texts. (In the case of Shakespeare’s plays, we assume that most of the punctuation originated with the compositors who set up the text in the printhouse rather than Shakespeare himself.) But when the Globe editors modernized their sources in the nineteenth century, they consistently applied certain rules of punctuation that skewed Docuscope’s counts when these texts (as a group) were compared with the more varied punctuation to be found in the TCP texts. Sequences that were dealt with consistently in the Globe texts – for example, contractions such as [’tis] or [’twas] or [o’clock] – were being handled much more variously in the original-spelling texts that Martin Mueller was modernizing. (He was only modernizing words in his procedure.)

So the punctuation was a tip-off, increasing the chances that Shakespeare’s plays would cluster together.

We now have the ability to skip or blacklist certain word strings, thanks to a newly updated version of Docuscope created by Suguru Ishizaki. At some point, we will open this can of worms–actually modifying Docuscope’s original tagging protocols–but not yet. There is still more to be learned from the results from an unmodified Docuscope: when we don’t touch the contents of its internal dictionaries, we have the ability to compare results across periods or corpora.
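To make the idea of skipping or blacklisting strings concrete, here is a minimal sketch in Python. It is not Docuscope and does not reflect how the updated version works internally: the tiny dictionary, the function names, and the category assignments are all invented for illustration. It only shows that a blacklisted string stops contributing to its category's count while the rest of the category survives.

```python
import re
from collections import Counter

# Toy stand-in for a tagging dictionary: string -> category.
# These entries are invented for illustration; they are not Docuscope's.
TOY_DICTIONARY = {
    "'tis": "LanguageReference",
    "'twas": "LanguageReference",
    "o'clock": "TimeDate",
    "yesterday": "TimeDate",
}


def normalise(text):
    """Lower-case the text and map typographic apostrophes to plain ones."""
    return text.lower().replace("\u2019", "'")


def tag_counts(text, blacklist=()):
    """Count category hits in text, ignoring any blacklisted strings."""
    skipped = {normalise(item) for item in blacklist}
    tokens = re.findall(r"[a-z']+", normalise(text))
    counts = Counter()
    for token in tokens:
        if token in skipped:
            continue  # blacklisted strings never reach the dictionary lookup
        category = TOY_DICTIONARY.get(token)
        if category is not None:
            counts[category] += 1
    return counts


line = "’Tis nine o’clock; ’twas but yesterday we met."
print(tag_counts(line))
print(tag_counts(line, blacklist=["’tis", "’twas"]))
```

With the two contractions blacklisted, the invented LanguageReference count for this line drops to zero while the TimeDate count is unchanged.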

In this case, we learn that Docuscope is sensitive to human editorial intervention in texts. So sensitive, in fact, that it produced an almost complete clustering of Shakespeare’s plays in the larger group of 320 that we profiled in the online draft of our “Hundredth Psalm” article.

The large cluster of Shakespeare plays that resulted from our initial comparison of Globe texts with Mueller's semi-algorithmically modernized TCP plays

Once we realized that this grouping was at least partly artifactual–a product of different editorial procedures applied to our combined corpus–we eliminated the LATs that were registering this difference (TimeDate and LangReference). Of course, by eliminating these, we lost their sorting power on the rest of the corpus, so there was a tradeoff. But we felt that it was not fair to give Docuscope this kind of advantage in sorting text when it was the result of modern editorial intervention. In the future, we might blacklist a word like [’tis] so that we can retain the rest of the category, but I don’t think this is necessary. What really needs to happen is that, in our editorial preparation of texts and corpora, we must ensure that no set of texts is isolated from the others through special editorial preparation. The fact that “anything goes” in the current TCP collection – it is full of various compositorial and printhouse styles and conventions – is probably a good thing. And in any event, we still see authors’ works and genres clustering together even where printers are multiple. Here, now, is one of the new Shakespeare clusters once the editorial “tell” of certain types of punctuation was removed:

New clustering of Shakespeare's plays with TimeDate and LangRef eliminated from analysis

Now we see that plays by Munday, Heywood, Marlowe, Shirley, Rowley, Webster, Middleton, and Massinger are showing greater similarities with Shakespeare: the variability of their punctuation is not being used against them. Within the Shakespeare plays that do cluster together, we see some of the same similarities–Coriolanus with Cymbeline, for example. But the terms on which Shakespeare’s plays are related to each other are now more limited–we have eliminated two categories of LATs that may have been sorting Shakespeare’s plays with respect to each other. This relative loss of sorting power within Shakespeare’s works seems tolerable to us, however, because it allows for a more meaningful portrait of Shakespeare’s relationship to other dramatists of the period. What excited us about this large diagram was that it says something about 150 years of early modern drama as a whole, inasmuch as that whole could be represented by over 300 works.
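The tradeoff of eliminating whole LATs can be sketched in a few lines of Python. Everything here is hypothetical: the play labels and the frequencies are invented, and "FirstPerson" is only a placeholder variable name. The point is simply that once the two editorially sensitive variables are dropped, they can no longer push a Globe text away from a TCP text, and they can no longer sort anything else either.

```python
import math

# Hypothetical LAT profiles: play -> {LAT name: relative frequency}.
# All numbers are invented; "FirstPerson" is a placeholder variable name.
profiles = {
    "globe_play": {"FirstPerson": 2.1, "Description": 8.0,
                   "TimeDate": 0.9, "LangReference": 1.4},
    "tcp_play": {"FirstPerson": 2.0, "Description": 8.3,
                 "TimeDate": 0.2, "LangReference": 0.3},
}

# The two LATs the post identifies as registering editorial difference.
EDITORIALLY_SENSITIVE = {"TimeDate", "LangReference"}


def drop_lats(table, excluded):
    """Return a copy of the table without the excluded LAT columns."""
    return {
        play: {lat: value for lat, value in row.items() if lat not in excluded}
        for play, row in table.items()
    }


def distance(table, first, second):
    """Euclidean distance between two plays over the remaining LATs."""
    return math.sqrt(
        sum((table[first][lat] - table[second][lat]) ** 2 for lat in table[first])
    )


filtered = drop_lats(profiles, EDITORIALLY_SENSITIVE)
print(distance(profiles, "globe_play", "tcp_play"))
print(distance(filtered, "globe_play", "tcp_play"))
```

In this invented example the two plays sit much closer together once the two columns are removed, which is the effect described above.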

Here is the entire diagram, then, constructed without the LATs that capture the nineteenth-century modernization of the Shakespearean texts. (Many thanks to Kate Fedewa for helping us create this large image.)

Revised dendrogram comparing early modern plays from the TCP collection and the Globe Shakespeare

September 3, 2010
Crowdsourced Peer Review in NY Times

The Times this morning did a piece on the Shakespeare Quarterly New Media issue that Jonathan Hope and I participated in. We received some terrific feedback, mostly from Shakespeareans, on the article that was posted to Media Commons–feedback that helped us rewrite the essay for the print edition which will be appearing this fall. There was also a piece on the process by Jennifer Howard in the Chronicle of Higher Education, itself the topic of an opinion piece in the Chronicle’s Brainstorm section.

The idea of open peer review in the humanities raises basic questions about the “specialized” nature of our knowledge in the humanities. Could simply anyone weigh in on a debate about a particular text and its interpretation? Wouldn’t that, in principle, be a good thing? I assume that knowledge in the humanities is in principle available to all. But it is also clearly specialized. The word “allegory,” for example, has a deep history and set of contextual meanings that you just couldn’t pick up from a good dictionary. Our research does expand what is known about certain literatures, cultures and writers, and in this sense, we look like a science that aims to extend the range of objects that are understood. We also refine our terms of art and build communities around these terms (e.g., différance, queering, hegemony, subaltern, hybridity, racialization). One could learn to throw these terms around, as Alan Sokal did in his famous hoax and as graduate students do every day in their seminars, but a good critic or editor should be able to say whether or not the writer really understands the terms. (This is where the editors of Social Text failed.) Perhaps if the paper Sokal submitted to Social Text had been vetted through crowdsourced open peer review, the article would have been rejected. In any event, the hoax itself provides an interesting limit case with which to evaluate the promise of open peer review: a writer acting in bad faith, either as author of the article or peer reviewer.

One last thought: the trajectory of learning in the humanities is intensive rather than cumulative. This is what differentiates us from, say, molecular biology, where you must learn certain things first (organic chemistry, cell physiology) in order to understand other things later (gene transcription). Within the humanities, acquiring expertise might mean re-orienting our approach to existing works rather than expanding the range of objects that can be known, although the latter is always possible. But the underlying assumption – that in the humanities one can make qualitative advances in knowledge that do not necessarily fit into a progressive sequence – makes any comparison between the humanities and sciences difficult.

August 24, 2010
Penalty Kicks and Distributed Movement

Gabriel Dias, a graduate student at RPI, has recently modeled the way in which penalty kickers move their bodies as they prepare for a shot. His findings suggest that there are several “tells” – for example, the angle of the hips, or the position of the planted foot – which predict the ultimate direction of the shot. In the PBS interview that I’ve linked to above, he alludes to the existence of “distributed” movements which show the physical commitment of the kicker to one outcome or another. I hear the word distributed and I immediately think, “integrated physical system,” like a body that is constrained to do certain things because of the way its different parts interact. We see this integration in the competitive world of athletics and the expressive realm of dance. (Perhaps the adjectives here should be reversed?)

In our analysis of texts, we have also found distributed movements of a sort. We find, that is, that certain types of words tend to move with each other in some genres, and others move away from one another. Does this mean that genre is a physical system like penalty kicking, and that our explanations of these distributed movements – of words rather than points on a body – are themselves grounded in a physical reality? I have myself offered analogies to describe this “commitment of weight” in the process of using words to do certain things: if you want to write a Shakespearean comedy, there are certain things you are likely to do. You will tend to use more first and second person singular pronouns and less description than you would in, say, a history play. If Docuscope is the goalie/keeper, it may need only 30 or 40 lines to decide that the ball is going to go toward comedy rather than history. Other things will be ignored as incidental. If I say that tagging a play and watching how its “points” move in a mathematical space is like biotagging a kicker and studying his or her movements, I am proposing an analogy. Like kicking, writing is a behavior. In certain situations (penalty kicking, writing for the stage), some aspects of this behavior are signal or cardinal (position of hips, use of pronouns) while others are inessential, like the curve of the kicker’s index finger. (Actually, given the dynamism of the human body, I would be surprised to find out that there is not, on some level, a connection between finger position and kick.)

So, does this mean I am advocating an essentially structuralist account of genre? Am I saying that, because language use is a behavior, then writing in a particular genre is also a behavior with certain “tells” that are, in a sense, built into the physical system of writing? I think people who are doing iterative criticism need to have an intelligent answer to this question, complete with an analysis of its underlying analogy. My answer would be that writing fiction in a historically bound literary field does, like penalty kicking, count as a behavior and that such behaviors will exhibit coordination. There is as much connective tissue in language, grammar, plot and audience expectation as there is in the fabric of the human body. But this is not the same thing as saying that there is an essential structure to particular types of writing – that the existence of a tell implies an underlying recipe, essence or structure that is genetically dictating the behavior of the writer.

Why doesn’t structuralism follow from linguistic integration? First, writing is not like penalty kicking. Dias chose penalty kicking because it is a binary physical outcome. With respect to the standing keeper, the ball goes left or right. Language, on the other hand, is like a flock of birds: it can break any way, 360 degrees, and is doing so dynamically at all times. “Yet the flock shows direction,” you say. “Individual birds may be wobbling left and right, up or down, but there is a recognizable trajectory within the group.” Perhaps there are deterministic ways of saying where this group is going to go next, but I doubt it. The total behavior is distributed, immanent: it has massive integrity as an aggregate, but the existence of that integrity does not imply some non-negotiable locus of control. Another way of saying this, and now I am channeling Whitehead, is to say that the direction of the flock is a continuously unfolding event or “society” of actual occasions. Thus, the penalty kicking example is good for showing entailment and distributed connection in the elements of literary linguistic analysis, but bad as a model for the errant and multiple trajectories of writing.

The existence of the tell essentially pushes back the timeline of intelligibility of the direction of the ball. A good keeper or student of physiology – like a good literary critic – will know earlier than most what kind of behavior is being exhibited. But unlike a keeper in a football game, the critic is not looking for a binary outcome. Rather, the critic or spectator is comparing the unfolding action onstage to any number of possible theatrical “types” of entertainment and generic conventions. Shakespeare takes five penalty shots at a time, all the time. If you are interested in this aspect of the play – its participation in comic conventions – yes, there will be “signal” or orienting linguistic events at the level of the line which you could consult to predict what he is about to do. But you don’t have to consult the tells and this is not a penalty kick: you already know what is going on and, indeed, are a better judge of the texture and generic tonalities of the play as it unfolds than a keeper who has to wait for the ball to be kicked. (Docuscope really is a keeper; it knows nothing until the event happens.) As we have seen in our research, human beings are massively sensitive to variations and distributed cues in linguistic behavior. We make an astonishing number of connections between the kinds of variation we see among the plays and texts we have encountered. Finding out that there is a linguistic “tell” for comedy doesn’t then mean that comedy essentially or structurally “is” the series of tells we reliably find for it. The “tells” here are a parallel description — and this, after the fact — of a perceptual reality that we render qualitatively and immediately, in our feel for certain types of writing or stories.

I have used the words “signal,” “cardinal” and “orienting” to describe the types of tokens that serve as good landmarks for genre in this alternative descriptive universe. I do not use “essential.” As we work further through this analogy between physical and linguistic behaviors, I think we should adopt Spinoza’s metaphysical position from the Ethics, that there is a parallelism between the twin domains of thought and extended physical beings. Neither has priority. When understood as a species of behavior, theatrical writing or literary production must obviously exhibit certain empirical regularities: it takes place on the fleshy platform of human consciousness and is constrained by the physical limits of our bodies, environment and history. As critic, I would want to insist that no material factor – the practices and limitations of stagecraft, the documented or remembered history of past performances, the politically charged distribution of resources and cultural actors – can be a priori excluded as unexpressable in the behavior that is writing. All constraints are summed and expressed, but in different amounts. But I would also want to insist that– whatever the behavior is that we are tracking – there has to be in place a certain set of agreements to make sense of the “movements” in this system as such. I have to want to count “these types” of words and not those. I have to search for significant coordination of these counted things with respect to “this type of outcome” and not another. Someone has to have the desire to study penalty kicks, for example, or authorship, or genre: behaviors don’t simply want to study themselves.

The tell is a “sign” that speaks for the kicker, and speaks early. It is a signal event worth attending to if you are a keeper. It is simultaneously an element in a causal sequence, constrained by events prior to it, and a negotiable sign or expression of an intention to do something. It is a physical way of saying, “I mean to kick the ball this way.” The point of the parallelism is that you never get to dump one half of the phenomenon. Leaning to the left, we acknowledge: all physical tells may be redescribed as expressions of an intention, and so tokens of meaning. But inclining to the right, we say: all tokens of meaning are, on some level, also indexes of empirical constraints. The keeper has to dive both ways.

July 29, 2010
Genre Dependence on Character Ideolects? (by Mike Stumpf, UW Undergrad)

And yet, we know that when human beings are involved, all findings are provisional. Odd.

Dendrogram displaying various segments from Romeo and Juliet

To expand on Michael Witmore’s comments in his previous post, it is indeed odd how provisional our results are. Case in point: I have been examining what John Burrows and Hugh Craig have called the “ideolects” of characters in connection with the plays in which they appear. I stumbled upon this idea while looking at Shakespeare’s Romeo and Juliet and asking how the language of the title characters may be steering this play towards tragedy or comedy. (This was done for a panel on which I presented with Witmore and William Blake for a digital salon at UW-Madison.) Witmore and Blake are themselves working on an analysis of Hamlet without the prince, and the 1 Henry plays/Merry Wives of Windsor without Falstaff: we’re all interested in this kind of “subtraction experiment.” To see my initial findings using this technique, you can visit my blog, All Is True.
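For readers who want a picture of what a “subtraction experiment” involves at the level of text preparation, here is a minimal Python sketch. It is an assumption about the general shape of the procedure, not a description of how any of us actually prepared our texts: the speech list, its (speaker, speech) format and the function name are all hypothetical. The remaining text would then be tagged and profiled like any other item.

```python
# A miniature "play" as (speaker, speech) pairs. The format is an assumption
# made for this sketch; real texts would need speaker attribution first.
speeches = [
    ("ROMEO", "But, soft! what light through yonder window breaks?"),
    ("JULIET", "O Romeo, Romeo! wherefore art thou Romeo?"),
    ("MERCUTIO", "A plague o' both your houses!"),
]


def subtract_speakers(play, excluded):
    """Return the play's text with the named speakers' speeches removed."""
    excluded_names = {name.upper() for name in excluded}
    kept = [speech for speaker, speech in play
            if speaker.upper() not in excluded_names]
    return "\n".join(kept)


# Romeo and Juliet without Romeo and Juliet.
remainder = subtract_speakers(speeches, ["Romeo", "Juliet"])
print(remainder)
```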

July 29, 2010
Docuscope Goes Live on Shakespeare Quarterly Open Peer Review

Jonathan Hope and I have written a new piece that we submitted to the special issue of Shakespeare Quarterly on “Shakespeare and New Media.” The essay cleared the first stage of editorial review, and is now posted at MediaCommons for general comment and critique prior to final editorial evaluation. Please visit the essay here and make your views known. The abstract and title are as follows:

“The Hundredth Psalm to the Tune of ‘Green Sleeves’”: Digital Approaches to Shakespeare’s Language of Genre
In this essay, we explore the underlying linguistic matrix of Shakespeare’s dramatic genres using multivariate statistics and a text tagging device known as Docuscope, a hand-curated corpus of several million English words (and strings of words) that have been sorted into grammatical, semantic and rhetorical categories. Taking Heminges and Condell’s designations of the Folio plays as comedies, histories and tragedies as our starting point, we offer a portrait of Shakespearean genre at the level of the sentence, showing how an identification of frequently iterated combinations of words (either in their presence or absence) can allow us to appreciate the integrity and fluidity of Shakespeare’s genres in new ways. Calling this approach “iterative criticism,” we situate our critical practice in the context of both Shakespearean criticism and more general protocols of reading in the humanities, concluding with a genre map of Shakespeare’s plays in the context of 282 other early modern plays.

As the last line suggests, we have now managed–with the help of Martin Mueller at Northwestern–to produce an analysis of 282 plays from the TCP database, written between 1519 and 1659, alongside the Moby Shakespeare. I think this is the first visualization of its kind purporting to treat 150 years of Renaissance drama, which itself feels like something of a hurdle overcome. Here it is:

Dendrogram Produced using Ward’s clustering method on scaled data using 99 LATs to profile 318 plays written between 1519-1659, color coded by genre and separating out the works of Shakespeare as a category of their own: Red=Comedy, Blue=Interlude, Green=History, Cyan=Tragedy, Purple=Tragicomedy, Orange=Masque, Gold=Shakespeare. The item names follow the protocol: (genre)-(date)-(author)-(title).

Two points to make here, although there could be many more. First, this diagram was constructed using scaled data, which means that the “mile away” linguistic markers of similarity and dissimilarity are being balanced with markers whose variation is less visible from a distance. Variables with large standard deviations are not dominating with respect to those with smaller ones. Note then that most of Shakespeare’s works cluster together here, comedies, tragedies and late plays all on the same twig. When I tried this analysis using non-scaled data, these genres split up and Shakespeare’s comedies clustered together with Jonson’s, suggesting that Ward’s clustering procedure on unscaled data is better for picking up genre differences, while the same procedure conducted on scaled data (as is the case here) is more sensitive to authorship. (For an earlier analysis of Shakespeare’s plays only using scaled data with Ward’s clustering technique, see this.) This finding should be tested in other contexts and with other data sets, but it is interesting, since it suggests that authorship becomes legible when fluctuations in variables that contain lots of tokens (say, Description) are coordinated with those that have many fewer tokens. It may be this “adding a dash of something” that pulls the author as such to the fore in an analysis.
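A toy calculation may help show why scaling changes what counts as “similar.” The Python sketch below assumes that scaling means z-scoring each variable (mean 0, standard deviation 1), which is one common choice and not necessarily the one used for the diagram. The four plays and their numbers are invented, and “RareLAT” is a made-up low-count variable. It does not run Ward’s procedure itself; it only compares the Euclidean distances that such a procedure builds on, before and after scaling.

```python
import math
import statistics

# Invented frequencies for four hypothetical plays. "Description" stands in
# for a high-count variable; "RareLAT" is a made-up low-count variable.
plays = {
    "play_a": {"Description": 10.0, "RareLAT": 0.10},
    "play_b": {"Description": 11.0, "RareLAT": 0.90},
    "play_c": {"Description": 12.0, "RareLAT": 0.12},
    "play_d": {"Description": 16.0, "RareLAT": 0.15},
}


def zscore_columns(table):
    """Rescale every variable to mean 0 and sample standard deviation 1."""
    names = list(table)
    variables = list(table[names[0]])
    scaled = {name: {} for name in names}
    for var in variables:
        column = [table[name][var] for name in names]
        mean = statistics.mean(column)
        spread = statistics.stdev(column)
        for name in names:
            scaled[name][var] = (table[name][var] - mean) / spread
    return scaled


def distance(table, first, second):
    """Euclidean distance between two plays across all variables."""
    return math.sqrt(
        sum((table[first][v] - table[second][v]) ** 2 for v in table[first])
    )


def nearest_neighbour(table, target):
    """Name of the play closest to target in the given table."""
    others = [name for name in table if name != target]
    return min(others, key=lambda name: distance(table, target, name))


print(nearest_neighbour(plays, "play_a"))
print(nearest_neighbour(zscore_columns(plays), "play_a"))
```

On the raw numbers, play_a sits closest to play_b because the large Description values dominate the distance. After z-scoring, the small variable gets an equal say and play_a’s nearest neighbour becomes play_c. A full analysis would hand the scaled table to a hierarchical clustering routine rather than inspect nearest neighbours by hand.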

I’d like also to offer another observation here about the fact that so many Shakespeare plays are hanging together (as are Shirley’s and Middleton’s), remaining agnostic for the time being about whether it is authorship or genre that is producing these clusterings. The majority of Shakespeare’s plays are clustering on a twig that contains mostly comedies. So when compared with 282 other items written between 1519-1659, Shakespeare’s plays look for the most part like plays that Harbage (in the Annals of English Drama) classed as comedies as opposed to some other genre. (Martin tells me that he followed Harbage for the most part, but made some guesses himself about genre designations based on title page information and common sense.) The thing to remember here is that an individual genre may cluster in different ways depending upon the larger population in which it is situated. That is, a fuller collection of texts from the period–not just the ones that Martin was able to modernize so that we could run a test on them–might show new subdivisions that end up splitting the Shakespeare block into a number of smaller splinters. (Or it may not: this may be a stabilized portrait, more or less.) The best way to understand more about the groupings themselves is to begin looking at them with the help of PCA and other techniques we’ve been using already. That’s where we’re headed next.

March 14, 2010