Category: Early Modern Drama

The very strange language of A Midsummer Night’s Dream
I just got back from a fun and very educative trip to Shakespeare’s Globe in London, hosted by Dr Farah Karim-Cooper, who is director of research there.

The Globe stages an annual production aimed at schools (45,000 free tickets have been distributed over the past five years), and this year’s play is A Midsummer Night’s Dream. I was invited down to discuss the language of the play with the cast and crew as they begin rehearsals.

This was a fascinating opportunity for me to test our visualisation tools and analysis on a non-academic audience – and the discussions I had with the actors opened my eyes to applications of the tools we haven’t considered before. They also came up with a series of sharp observations about the language of the play in response to the linguistic analysis.

I began with Wordhoard, a tool developed by Martin Mueller’s team at Northwestern University, as a way of getting a quick overview of the lexical patterns in the play and of introducing people to thinking statistically about language.

Here’s the wordcloud Wordhoard generates for a loglikelihood analysis of MSND compared with the whole Shakespeare corpus:

Loglikelihood takes the frequencies of words in one text (in this case MSND) and compares them with the frequencies of words in a comparison, or reference, sample (in this case, the whole Shakespeare corpus). It identifies the words that are used significantly more or less frequently in the analysis text than would be expected given the frequencies found in the comparison sample. In the wordcloud, the size of a word indicates how strongly its frequency departs from the expected. Words in black appear more frequently than we would expect, and words in grey appear less frequently.
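For readers who want to see the arithmetic, here is a minimal sketch of a loglikelihood (G2) comparison in Python. It uses the two-term form common in corpus linguistics, which may not match Wordhoard's own implementation in every detail. The function names and all of the counts are invented for illustration, and the reference sample is treated as a separate body of text.

```python
import math

def log_likelihood(count_a, total_a, count_b, total_b):
    """G2 for one word: analysis text (a) against reference sample (b)."""
    # Expected counts if the word were used at the same pooled rate in both samples.
    pooled_rate = (count_a + count_b) / (total_a + total_b)
    g2 = 0.0
    for observed, size in ((count_a, total_a), (count_b, total_b)):
        expected = size * pooled_rate
        if observed > 0:  # a zero count contributes nothing to the sum
            g2 += observed * math.log(observed / expected)
    return 2.0 * g2

def keyness(counts_a, total_a, counts_b, total_b):
    """Return (word, '+' or '-', G2) rows, strongest departures first."""
    rows = []
    for word in sorted(set(counts_a) | set(counts_b)):
        a, b = counts_a.get(word, 0), counts_b.get(word, 0)
        # '+' means more frequent in the analysis text than in the reference.
        sign = "+" if a / total_a > b / total_b else "-"
        rows.append((word, sign, log_likelihood(a, total_a, b, total_b)))
    return sorted(rows, key=lambda row: row[2], reverse=True)

# Hypothetical counts, invented purely for illustration.
play_counts = {"fairy": 30, "through": 25, "he": 100}
reference_counts = {"fairy": 40, "through": 300, "he": 9000}
for word, sign, g2 in keyness(play_counts, 16000, reference_counts, 800000):
    print(f"{word:8} {sign} {g2:8.1f}")
```

In wordcloud terms, the G2 value would set the size of the word and the sign would decide between black and grey.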

As is generally the case with loglikelihood tests, the words showing the most powerful effects here are nouns associated with significant plot elements: ‘fairy’, ‘wall’, ‘moon’, ‘lion’ etc. If you’ve read the play, it is not hard to explain why these words are used in MSND more than in the rest of Shakespeare – and you really don’t need a computer, or complex statistics, to tell you that. To paraphrase Basil Fawlty, so far, so bleeding obvious.

Where loglikelihood results normally get more interesting – or puzzling – is in results for function words (pronouns, auxiliary verbs, prepositions, conjunctions) and in those words that are significantly less frequent than you’d expect.

Here we can see some surprising results: why does Shakespeare use ‘through’ far more frequently in this play than elsewhere? Why are the masculine pronouns ‘he’ and ‘his’ used less frequently? (And is this linked to the low use of ‘lord’?) Why is ‘it’ rare in the play? And ‘they’ and ‘who’ and ‘of’?

At this stage we started to look at our results from Docuscope for the play, visualised using Anupam Basu’s LATtice.

The heatmap shows all of the folio plays compared to each other: the darker a square is, the more similar the plays are linguistically. The diagonal of black squares running from bottom left to top right marks the points in the map where plays are ‘compared’ to themselves: the black indicates identity. Plays are arranged up the left hand side of the square in ascending chronological order from Comedy of Errors at the bottom to Henry VIII at the top – the sequence then repeats across the top from left to right – so the black square at the bottom left is Comedy of Errors compared to itself, while the black square at the top right is Henry VIII.
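The logic behind a heatmap like this can be sketched in a few lines of Python. This is only an illustration under stated assumptions: I use plain Euclidean distance between vectors of LAT rates, which may not be the measure LATtice applies, and the play names and numbers are hypothetical.

```python
import math

def euclidean(u, v):
    """Straight-line distance between two vectors of LAT rates."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def distance_matrix(profiles):
    """profiles: dict of play -> list of LAT rates, in chronological order."""
    names = list(profiles)
    matrix = [[euclidean(profiles[r], profiles[c]) for c in names] for r in names]
    return names, matrix

def darkness(distance, largest):
    """1.0 is black (identical texts); 0.0 is white (the most distant pair)."""
    return 1.0 if largest == 0 else 1.0 - distance / largest

# Hypothetical LAT rates, invented for illustration only.
profiles = {
    "Play A": [2.0, 1.0, 3.0],
    "Play B": [2.2, 1.1, 2.7],
    "Play C": [4.5, 0.2, 1.0],
}
names, matrix = distance_matrix(profiles)
largest = max(max(row) for row in matrix)
# Print the last play first so the earliest play sits at the bottom, as in the heatmap.
for name, row in reversed(list(zip(names, matrix))):
    print(name, " ".join(f"{darkness(d, largest):.2f}" for d in row))
```

Each play compared with itself has distance zero, which is why the diagonal comes out black.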

One of the first things we noticed when Anupam produced this heatmap was the two plays which stand out as being unlike almost all of the others, producing four distinct light lines which divide the square of the map almost into nine equal smaller squares:

These two anomalous plays are Merry Wives of Windsor (here outlined in blue) and A Midsummer Night’s Dream (yellow). It is not so surprising to find Wives standing out, given the frequent critical observation that this play is generically and linguistically unusual for Shakespeare: but A Midsummer Night’s Dream is a result we certainly would not have predicted.

This visualisation of difference certainly caught the actors’ attention, and they immediately focussed in on the very white square about 2/3 of the way along the MSND line (here picked out in yellow):

So which play is MSND least like of all? A tragedy? A history? Again, the answer is not one we’d have guessed: Measure for Measure.

This is a good example of how a visualisation can alert you to a surprising finding. We would never have intuited that MSND was anomalous linguistically without this heatmap. It is also a good example of how visualisations should send you back to the data: we now need to investigate the language of MSND to explain what it is that Shakespeare does, or does not do, in this play that makes it stand out so clearly. The visualisation is striking – and it allowed the cast members to identify an interesting problem very quickly – but the visualisation doesn’t give us an explanation for the result. For that we need to dig a bit deeper.

One of the most useful features of LATtice is the bottom right window, which identifies the LATs that account for the most distance between two texts:

This is a very quick way of finding out what is going on – and here the results point us to two LATs which are much more frequent in MSND than in Measure for Measure: SenseObject and SenseProperty. SenseObject picks up concrete nouns, while SenseProperty codes for adjectives describing their properties. A quick trip to the LATtice box plot screen (on the left of these windows):

confirms that MSND (red dots) is right at the top end of the Shakespeare canon for these LATs (another surprise, since we’ve got used to thinking of these LATs as characteristic of History), while Measure for Measure (blue dots) has the lowest rates in Shakespeare for these LATs.
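One plausible way to identify the LATs that account for the most distance between two texts is to split the squared distance into a separate share for each LAT. The Python sketch below does that, assuming Euclidean distance; I have not checked how LATtice itself does the ranking, and the rates shown are hypothetical rather than the real figures for either play.

```python
def lat_contributions(profile_a, profile_b):
    """Rank LATs by their share of the squared Euclidean distance between two texts.

    Returns (LAT, difference a - b, share of total) tuples, largest share first.
    """
    squared = {lat: (profile_a[lat] - profile_b[lat]) ** 2 for lat in profile_a}
    total = sum(squared.values())
    ranked = sorted(squared.items(), key=lambda item: item[1], reverse=True)
    return [
        (lat, profile_a[lat] - profile_b[lat], sq / total if total else 0.0)
        for lat, sq in ranked
    ]

# Hypothetical rates per 100 words, invented for illustration only.
msnd = {"SenseObject": 3.0, "SenseProperty": 1.5, "DirectAddress": 2.0, "Question": 1.0}
measure = {"SenseObject": 1.0, "SenseProperty": 0.5, "DirectAddress": 2.5, "Question": 1.4}

for lat, difference, share in lat_contributions(msnd, measure):
    direction = "higher in first text" if difference > 0 else "lower in first text"
    print(f"{lat:14} {share:6.1%}  {direction}")
```

A positive difference means the LAT is more frequent in the first text; the share shows how much of the overall distance that single LAT is responsible for.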

So Docuscope findings suggest that MSND is a play concerned with concrete objects and their descriptions – another counter-intuitive finding given the associations most of us have with the supposed ethereal, fairy, dream-like atmosphere of the play. Cast members were fascinated by this and its possible implications for how they should use props – and someone also pointed out that many of the names in the play are concrete nouns (Quince, Bottom, Flute, Snout, Peaseblossom, Cobweb, Mote and so on) – what is the effect on the audience of this constant linguistic wash of ‘things’?

Here is a screenshot from Docuscope with SenseObject and SenseProperty tokens underlined in yellow. Reading these tokens in context, you realise that many of these concrete objects and qualities, in this section at least, are fictional in the world of the play. A wall is evoked – but it is one in a play, represented by a man. Despite the frequency of SenseObject in this play, we should be wary of assuming that this implies the straightforward evocation of a concrete reality:

Also raised in MSND are LATs to do with locating and describing space: Motions and SpaceRelations (as suggested by our loglikelihood finding for ‘through’?). So accompanying a focus on things is a focus on describing location and movement – perhaps, someone suggested, because the characters are often so unsure of their location? (In the following screenshot, Motions and SpaceRelations tokens are underlined in yellow.)

Moving on, we also looked at those LATs that are relatively absent from MSND – and here the findings were very interesting indeed. We have seen that MSND does not pattern like a comedy – and the main reason for this is that it lacks the highly interactive language we expect in Shakespearean comedy: DirectAddress and Question are lowered. So too are PersonPronoun (which picks up third person pronouns, and matches our loglikelihood finding for ‘he’ and ‘his’), and FirstPerson – indeed, all types of pronoun are less frequent in the play than is normal for Shakespeare. At this point one of the actors suggested that the lack of pronouns might be because full names are used constantly – she’d noticed in rehearsal how often she was using characters’ names – and we wondered if this was because the play’s characters are so frequently uncertain of their own, and others’, identities.

Also lowered in the play is PersonProperty, the LAT which picks up familial roles (‘father’, ‘mother’, ‘sister’ etc) and social ones (job titles) – if you add this to the lowered rate of pronouns, then a rather strange social world starts to emerge, one lacking the normal points of orientation (and the play is also low on CommonAuthority, which picks up appeals to external structures of social authority – the law, God, and so on).

The visualisation, and Docuscope screens, provoked a discussion I found fascinating: we agreed that the action of the play seems to exist in an eternal present. There seems to be little sense of future or past (appropriately for a dream) – and this ties in with the relative absence of LATs coding for past tense and looking back. As the LATtice heatmap first indicated, MSND is unlike any of the recognised Shakespearean genres – but digging into the data shows that it is unlike them in different ways:
- It is unlike comedy in its lack of features associated with verbal interaction
- It is unlike tragedy in its lack of first person forms (though it is perhaps more like tragedy than any other genre)
- It is unlike history in its lack of CommonAuthority
Waiting for my train back to Glasgow (at the excellent Euston Tap bar near Euston Station), I tried to summarize our findings in four tweets (read them from the bottom, up!):

I’ll try to keep in touch with the actors as they rehearse the play – this was a lesson for me in using the tools to spark an investigation into Shakespeare’s language, and I can now see that we could adapt these tools to various educational settings (including schools and rehearsal rooms!).

Jonathan Hope February 2012
February 6, 2012
Tokens of Impersonation in Dekker’s City Comedies

In sixteenth- and seventeenth-century England, the relationship between clothing and identity was complex. As Ann Rosalind Jones and Peter Stallybrass have shown, the fact that clothing circulated as currency among different owners implicitly called into question its supposed correspondence with the wearer’s social and financial status. Stephen Orgel has explored how issues surrounding clothing and identity played out on the Elizabethan and Jacobean stage, a place where clothing was understood at once as the defining token of identity and as disguise, where audiences entered into the fiction that a dress could temporarily transform a lower-class boy into a noble woman. The possibility that appearance might not match reality was problematic for early modern audiences, however, because the English credit culture that emerged in this period depended on people’s ability to assess one another’s presentations of honesty and trustworthiness. By challenging the assumed correspondence between social performance and identity, cross-dressing figures like Moll Cutpurse in Dekker and Middleton’s The Roaring Girl (1611) suggest the fallibility of a system in which a person’s economic status is inferred from his or her appearance.

I wondered whether The Roaring Girl’s concern with the instability of credit might be visible at the linguistic level. In Witmore and Hope’s “very large dendrogram” (see Figure 9 here), three plays group tightly with The Roaring Girl: Westward Ho (Dekker and Webster, 1604), Northward Ho (Dekker and Webster, 1605), and The Honest Whore, Part 2 (Dekker, performed 1605 and published 1630). Based on where they cluster in the dendrogram, it is clear that these texts are not merely linked by authorship, genre, or time period. I hypothesized that these four plays might all share The Roaring Girl’s concern with disguise and credit, and that this concern would be one of the factors linking them together stylistically. Still, much of early modern drama, especially city comedy, is concerned with the economics of identity. Assuming that these plays’ treatment of credit and disguise contributes to their linkage, what is uniquely similar about them that pushes the plays together?

To answer this question, I performed Principal Component Analysis (PCA) on 130 plays performed between 1601 and 1621 and found a component that united the plays on The Roaring Girl twig. As it turns out, the cocktail of linguistic factors that joins these four plays includes the categories Docuscope labels “Person Properties” and “Sense Objects.” The component also discriminates against Positive and Negative Standards, Abstract Concepts, and Negativity.
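To show what "a component that selects for some categories and against others" means in practice, here is a small PCA sketch in plain Python. It is not the analysis described here: the six plays and their rates are hypothetical, the category labels are shorthand, and the eigenvectors are found by simple power iteration rather than by a statistics package.

```python
import math

def center(rows):
    """Subtract each column's mean so every category is measured from its average."""
    means = [sum(col) / len(rows) for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]

def covariance(centered):
    n, p = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(p)]
            for i in range(p)]

def leading_eigenpair(matrix, iterations=500):
    """Largest eigenvalue and unit eigenvector of a symmetric matrix (power iteration)."""
    p = len(matrix)
    vec = [float(i + 1) for i in range(p)]  # arbitrary non-uniform starting vector
    for _ in range(iterations):
        nxt = [sum(matrix[i][j] * vec[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0:
            break
        vec = [x / norm for x in nxt]
    value = sum(vec[i] * matrix[i][j] * vec[j] for i in range(p) for j in range(p))
    return value, vec

def pca(rows, n_components=2):
    """Return [(variance, loadings), ...] and each row's score on each component."""
    centered = center(rows)
    cov = covariance(centered)
    components = []
    for _ in range(n_components):
        value, vec = leading_eigenpair(cov)
        components.append((value, vec))
        # Deflation: remove the component just found before looking for the next one.
        cov = [[cov[i][j] - value * vec[i] * vec[j] for j in range(len(vec))]
               for i in range(len(vec))]
    scores = [[sum(x * w for x, w in zip(row, vec)) for _, vec in components]
              for row in centered]
    return components, scores

# Hypothetical rates per 100 words; the first three rows stand in for one "twig".
categories = ["PersonProperty", "SenseObject", "AbstractConcepts", "Negativity"]
lat_rates = [
    [2.9, 3.1, 1.0, 0.8],
    [3.0, 2.8, 1.2, 0.9],
    [2.7, 3.0, 0.9, 1.1],
    [1.5, 1.6, 2.4, 2.0],
    [1.4, 1.9, 2.6, 1.8],
    [1.7, 1.5, 2.2, 2.1],
]
components, scores = pca(lat_rates)
for name, loading in zip(categories, components[0][1]):
    print(f"{name:17} {loading:+.2f}")
```

Categories whose loadings share a sign rise and fall together on the component; categories with the opposite sign are the ones it discriminates against. The overall sign of a component is arbitrary, so only the pattern of agreement and opposition matters.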

The passage from the four plays that is most exemplary of this component comes from Westward Ho. Words underlined in purple are Person Properties, while bright yellow indicates Sense Objects:

In this scene, the bawd Birdlime tries to protect the identity of one of her clients, Tenterhook, from another who has entered her house. Tenterhook hides in a closet with the prostitute, Luce, and covers her eyes. She tries to identify him by the feel of his hands and what he wears on them. In guessing, she reveals the names of all her clients, thereby contradicting the bawd’s claim that whores practice a kind of doctor-patient confidentiality. The most frequent elements in this scene are Person Properties, Sense Objects, Questions and Direct Address. In other words, in this scene characters address one another based on their perceived identities (mistress, captain) and their interactions with the physical world.

The second most exemplary passage, this time from The Roaring Girl, is even more explicitly concerned with clothing. Here again, purple indicates words tagged as Person Properties, and yellow highlights Sense Objects:

In this scene, Moll’s man Trapdoor reports to Sir Alexander about his mistress, and they hatch a plan to catch her in flagrante delicto with Alexander’s son Sebastian. Again, the passage is dominated by Person Properties (linked mainly to gender and social position) and Sense Objects. Moll’s male apparel is thoroughly catalogued, and the interplay of the repeated terms “girl,” “mistress,” and “man”/ “male” highlights the instability of her identity when she wears these typically masculine items of clothing. The rapid-fire comedic exchange amplifies the effect of the patterns—for example, the repeated pun on “shirt of mail” / “male shirt” creates a glut of Person Properties and Sense Objects in those lines.

It would seem, then, that the component under consideration selects for descriptions of people, both their social roles (Person Properties) and the way they dress (Sense Objects), as well as descriptions of the material world. What does this component (PC2) select against? The least exemplary passage comes from a scene in The Roaring Girl in which Sebastian attempts to persuade his father that Moll is a chaste woman, despite her propensity for brawling and wearing men’s clothing. In this passage, green indicates Positive Standards and Negative Standards; light purple flags Abstract Concepts and various narrative cues such as Reporting Events; and orange highlights Negativity as well as other indicators of interiority such as Subjective Perception:

Sebastian explicitly critiques his father for judging Moll by her appearances; yet the language of this passage is very different from previous ones in which the obsession with appearances and roles was implicit in the preponderance of Person Properties and Sense Objects. Here, the most common elements are Positive and Negative Standards, Abstract Concepts, and Negativity. Given that this passage is the opposite of the component that grouped these four plays together, it would seem that this particular combination of standards, judgment, and interior life is uncommon in the world of these plays.

While the component that sets these four texts apart selects for plays about sex and clothing, it is not merely a “disguise plot” component. Given its opposition to standards and interiority, it might be more broadly defined as language that explores the material world’s inability to accurately reflect abstract truths. I believe this component can show us something about Dekker’s engagement, not only with identity, but with credit culture. In selecting for moments where people are described based on their clothing, appearance, and/or social role, and selecting against value judgments of those people’s performances, this component might highlight plays that represent the impossibility of assessing people based on their public personae. Not only might a woman dress as a man, but a prostitute might present herself as a rich woman, provided she has wealthy enough customers. Similarly, an insolvent gallant might dress well to trick shopkeepers into extending him credit (or their wives into sleeping with him). The fact that Dekker’s treatment of disguise excludes judgments, standards, or appeals to authority suggests that his critique is not of the amorality of the city. Rather, it is of the way that credit relations punish perceived immorality, while often rewarding well-hidden immorality. This explanation might help explain why these particular plays cluster together, rather than blending in with all the rest of Jacobean city comedy.

Richard Waswo argues that all Jacobean drama, through its concern with disguise, counterfeit, and crime, invites audiences to question the credibility of their neighbors. Certainly, Dekker’s stage comedies reflect a sustained interest in the unstable relationship between dress and character, but as this component reveals, they do so in a unique way. I hope my findings might help us begin to understand how different writers’ attitudes toward these issues register at the linguistic level, even when they use the same stock of plot points and characters. While a morally conservative writer like Jonson might condemn the coney-catchers and cross-dressers of the London underworld for wreaking havoc on the institutions of credit that undergird social commerce, Dekker seems more critical of the credit system itself. In its very structure, in its reliance on appearances, the system invites exploitation by those who are willing to play the game. We are able to see this critique coming through in these plays because, like an expert coney-catcher, Docuscope counts the tokens of texts’ identities, registering the affinities that are alternately hidden and revealed by the linguistic “clothing” they wear.

November 19, 2011
The comic ‘I’ and the tragic ‘we’?

In our Shakespeare Quarterly paper, we used Docuscope to come up with a description of Shakespeare’s comic language which centres on the rapid exchange of singular pronouns: I/you and my/your. We claimed there that Shakespearean comedies typically involve people arguing about things, striving to arrive at a ‘we’ of agreement, but not being able to until the final scene. Here’s what we said in more detail (we’re discussing Twelfth Night):

The quick trading of I/you and my/your strings in Comic dialogue suggests a world in which predicates are attached to subjects from two, and only two, points of view. This is not a universe of one; nor is it a crowd. It is not surprising that Comic plotting, built as it is on sexual pairings, would favor this type of bivalent, perspectival tagging of action by speakers. But there is something else going on here. Olivia is trying to make something happen in this exchange. She says, “do not extort thy reasons from this clause,” and earlier, “I would you were as I would have you be!” (3.1/1392, 1381). The “thy” and “you” are important because the speaker is trying to create or assert a particular interpretation of how these two individuals relate to one another (and the words exchanged between them). The essential drama in this situation is the asymmetry of desire that obtains between the two characters, an asymmetry that keeps Viola from assenting to Olivia’s advances. That resistance is actually what forces Olivia to make these statements that are rich with I/you and me/my, since she uses these words as anchors for a broader interpretation that does not yet obtain. She really wants to say we. And Cesario doesn’t, so they remain in I/you dialogue…

Shakespeare writes Comedies in which characters, sometimes quite perversely, find the wrong way to the ones they love. Often it is chance or an onstage helper who sorts this out. Shakespeare is actually quite reserved when it comes to showing love as naturally progressing through its obstacles unassisted. But given that in the initial stages of courtship Shakespearean lovers almost never meet and join in a perfectly symmetrical way—they don’t start out as stones set in an arch, leaning perfectly on a keystone—we should expect this asymmetry to show itself in the language. Where does it show up? It appears when a resistant individual, a “you,” prevents another “I” from arriving at an interpretation of a relationship that might be referred to as a “we” before others. Let’s call this the “resistant-you” hypothesis. Linguistically, the effect manifests itself in the assertion of the self (“FirstPerson”) and the rejection of suggested mental and emotional realities (“DenyDisclaim”).

We’ve been finding that high frequencies of first person pronouns, and other features associated with rapid dialogue, are characteristic of most types of Early Modern comedy. But what of the implied correlative to this? If comedies are the genre of ‘I’; are tragedies the genre of ‘we’?

A quick way to test this is to use Martin Mueller et al.’s excellent Wordhoard tool to run a log likelihood vocabulary test on Shakespeare’s comedies and tragedies. This type of test takes an analysis corpus (in this case Shakespeare’s comedies), and compares it to a reference corpus (Shakespeare’s tragedies). The output flags those words that are either more or less frequent in the analysis corpus than we would expect, given the frequencies found in the reference corpus.

The results in this case are as follows:

What we are interested in here is the list of lemmas in column 1: ‘she’, ‘I’, ‘master’, ‘a’, ‘sir’ etc; and the symbol in column 3 ‘Relative use’ – which tells us if the frequency is greater (+) or less (-) than expected. (Column 4 gives the log likelihood value, and a number of asterisks indicating degree of statistical significance, but all the results we are looking at here are highly significant, so we can ignore this.)
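The columns described above can be reproduced in miniature. The Python sketch below builds one row per lemma with a relative-use sign, a log likelihood value, and a run of asterisks. The counts are hypothetical, and the asterisk cutoffs are the conventional chi-squared critical values for one degree of freedom, which I am assuming here rather than taking from Wordhoard.

```python
import math

# Conventional chi-squared cutoffs, one degree of freedom
# (p < 0.0001, 0.001, 0.01, 0.05). Assumed for this sketch only.
STAR_CUTOFFS = [(15.13, "****"), (10.83, "***"), (6.63, "**"), (3.84, "*")]

def log_likelihood(count_a, total_a, count_b, total_b):
    """G2 for one lemma: analysis corpus (a) against reference corpus (b)."""
    pooled_rate = (count_a + count_b) / (total_a + total_b)
    g2 = 0.0
    for observed, size in ((count_a, total_a), (count_b, total_b)):
        if observed > 0:
            g2 += observed * math.log(observed / (size * pooled_rate))
    return 2.0 * g2

def stars(g2):
    """Asterisks for the first cutoff that the value reaches, or none."""
    for cutoff, label in STAR_CUTOFFS:
        if g2 >= cutoff:
            return label
    return ""

def table_row(lemma, count_a, total_a, count_b, total_b):
    """Return (lemma, relative use, G2, asterisks) for one lemma."""
    rate_a, rate_b = count_a / total_a, count_b / total_b
    sign = "+" if rate_a > rate_b else "-" if rate_a < rate_b else ""
    g2 = log_likelihood(count_a, total_a, count_b, total_b)
    return lemma, sign, round(g2, 1), stars(g2)

# Hypothetical token counts: (in comedies, in tragedies).
COMEDY_TOKENS, TRAGEDY_TOKENS = 350000, 250000
counts = {"I": (9500, 5800), "we": (1100, 1300)}
for lemma, (in_comedy, in_tragedy) in counts.items():
    print(table_row(lemma, in_comedy, COMEDY_TOKENS, in_tragedy, TRAGEDY_TOKENS))
```

Swapping the analysis and reference corpora flips every sign but leaves the log likelihood values unchanged, which is why "lowered in the comedies" and "raised in the tragedies" are the same finding.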

Behold: pronouns used more in the comedies than the tragedies are the singular ‘she’, ‘I’, ‘you’ (let’s assume these are mainly singular uses) – these are all marked + in column 3. Now look at the results for the plural pronouns ‘our’, ‘we’, ‘they’: all marked -, and so lowered in the comedies/raised in the tragedies.

This is a very strong finding (especially considering how frequent pronouns are), and it invites further exploration of the dialogic nature of comedy in comparison with the communal nature of tragedy.
jh/29.7.2011

July 29, 2011
Presentation at London Forum for Authorship Studies/Digital Text and Scholarship Seminar

Jonathan Hope and I presented here in London on a trip arranged by Brian Vickers and Willard McCarty. It was a lovely occasion held in Senate House, attended by some we knew and others we got to know. We began by rolling out paper copies (six-foot-long scrolls!) of the very large diagram that you saw in the last post. One of the things we have begun to discuss is the ways in which different forces seem to be expressed on various twigs of this dendrogram illustrating relationships among 318 early modern plays. On some twigs, everything that is being grouped together has a common author. On others, the situation is not so clear. Why, for example, aren’t there large groupings of texts written at the same time? (There are some smaller clusters of these.) The principle at work here, when texts are matched in terms of their distance scores on all of Docuscope’s available features (LATs), is that every type of difference present in the population being studied will be expressed in the result. The difficulty is disentangling which type of difference (generational, authorial, generic, company, etc.) is at work in a given grouping.
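For anyone curious how distance scores turn into the twigs of a dendrogram, here is a minimal sketch of agglomerative clustering in Python. It assumes Euclidean distance and average linkage, which may not be the settings behind our diagram, and the four texts and their two feature values are hypothetical.

```python
import math

def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def agglomerate(profiles):
    """Average-linkage clustering. Returns merges as (members_a, members_b, height)."""
    clusters = [[name] for name in profiles]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Height of a join: mean distance over all cross-cluster pairs.
                pairs = [(a, b) for a in clusters[i] for b in clusters[j]]
                height = sum(euclidean(profiles[a], profiles[b]) for a, b in pairs) / len(pairs)
                if best is None or height < best[0]:
                    best = (height, i, j)
        height, i, j = best
        merges.append((clusters[i], clusters[j], height))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges

# Hypothetical two-feature profiles, invented for illustration only.
profiles = {
    "comedy_a": [2.0, 1.0],
    "comedy_b": [2.1, 1.1],
    "masque_a": [6.0, 5.0],
    "masque_b": [7.0, 6.5],
}
for left, right, height in agglomerate(profiles):
    print(f"{height:5.2f}  {left} + {right}")
```

The height at which two groups join corresponds to how far along the diagram a branch runs before it meets its neighbours: the later the join, the more dissimilar the groups.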

One thing we spent some time discussing yesterday was three clusters in which Jonson’s plays appear. Here they are below:

All of Jonson’s masques are clustered at the bottom of the diagram (except Cynthia’s Revels, which is clustered in the middle). These are possibly the most distinct items in the entire corpus we are currently working with. Notice how far right the cluster extends before joining with the rest of the diagram: this indicates its dissimilarity with other clusters. But notice too that, within this cluster (as Jonathan pointed out yesterday), there is also a lot of variation. Not only are Jonson’s masques very different from the rest of Renaissance drama (including several interludes), but they are quite different from one another. It’s like a galaxy that is far away from all of the others, but whose stars are themselves quite spread out.

So, what about the other two clusters? We decided to profile all three and came up with some interesting findings. First, the masques. After performing PCA and then rating the clusters on the different components, we found several that were quite good at isolating the items on particular twigs. (This is not a scientific procedure, but it is our first attempt.) With the masques, we found that the language is high in StandardsPositive, StandardsNegative, and ReportingStates. Here’s an exemplary passage, with both StandardsPositive and StandardsNegative in green, and Reporting States in purple:

Masques describe what you are seeing or have just seen in a comparatively static fashion, hence the reporting states. As Brian Vickers pointed out in the question period, the genre of encomium deals with praise and blame, which are the words that are being picked up in the positive and negative standards.

Compare this, now, to the profile of some of Jonson’s other comedies: Poetaster, Volpone, and the other items in the top group. These items are characterized by OralElement (yellow), Question (blue), Intensity (orange), and Person Property (purple):

Here we see a pattern we also saw in Shakespearean comedy: a lot of items associated with one to one interaction. The OralElement here marks the bustle of persons whose social function is marked (PersonProperty) and who are mixing in a state where contact must be established or maintained. Some of the satirical force of the scene is bundled into the intensity strings, which show the emphatic nature of certain social performances that are mannered and so open to mockery. We noticed these intensity strings in Middleton as well, which makes us suspect that a combination of PersonProperty strings and intensity might be a feature of City Comedy. Something to check out in the future.

What makes this top cluster different from the second? Different author? No. Different genre? Not really, at least, not according to the ones we recognize critically. And note too that there are multiple authors on this middle cluster: Chapman, Jonson and Fletcher. Perhaps we should be thinking in terms of modes instead of genres: is there a different mode of storytelling, dramaturgy, or conducting comic business here? When we use PCA to characterize this cluster and compare the results with those that characterize the one at top, we find similarity and difference. What’s similar is the OralElement (yellow), Question (blue), and PersonProperty strings (purple):

But we now see strings associated with TimeShift (scarlet), which indicate that a person is marking the difference between two temporal frames (then/now, now/future), and here seems to be associated with figuring out what someone might do or bring about in the present or near future. Here they are anticipatory, looking at what is to come from the standpoint of the present. (In Shakespeare’s late plays, by contrast, we found that action from the past is frequently narrated from the standpoint of the present.) The other thing that is different in this cluster is something that we would never see, because it is not there. The plays in this cluster lack something:

The missing items are these purple strings, which are classed as ReportingStates. They are tokens that occur frequently in this text (look at how many of them are in this play, which is from the second cluster), but as a whole the plays in this group lack these strings with respect to the larger population of early modern drama (whereas the top group did not). This kind of relative difference between generally quite frequent items is one that you could probably only grasp with the aid of statistics. We hypothesize that these strings are allowing the actors to report action that has taken place offstage in the past, keeping attention focused on the present, which is hurtling forward in time. Should this be its own subgenre of Jonson that includes Fletcher and Chapman? Would it be worth naming a grouping like this? Another question for further study.

We received some terrific comments and questions. To our comment that the first Principal Component for this population does seem to track a broad and evolving temporal shift (plays score lower on the component as time goes on), Richard Proudfoot asked if there was more variation in the very early plays in our collection. This is indeed the case, and he followed with the point that we have an uncertain grip on this earlier population because little of it survives. Other explanations for wider variation in the pre-1590 items: English as a language is more fluid prior to 1600, as Jonathan pointed out. It may also be the case that the genre system itself has not stabilized because the professional theater is still gaining its footing in London.

Erica Fudge asked another interesting question: some of the comic strings associated with interaction and comedy (we showed our Shakespeare comedy results) reminded her of the writing in Montaigne. What, she asked, is the relationship between skepticism and comedy, and would we be interested in tracing the presence of something like a skeptical inclination across prose writing and drama? This is a very good question. I would hope that we could study, with these techniques, something like the “sentence level intellectual culture” of the period, one that extends across genres like drama and the essay. As with most of our presentations, we left with more questions and ideas about future experiments. This work seems to us to be provisional in a way that other humanities research is not. You get an idea, talk about it with others, try it, and then decide to try something else. Academic papers at humanities conferences, on the other hand, usually present findings with an air of categorical certainty. And yet, we know that when human beings are involved, all findings are provisional. Odd.

May 27, 2010