The very strange language of A Midsummer Night’s Dream

I just got back from a fun and very educative trip to Shakespeare’s Globe in London, hosted by Dr Farah Karim-Cooper, who is director of research there.

The Globe stages an annual production aimed at schools (45,000 free tickets have been distributed over the past five years), and this year’s play is A Midsummer Night’s Dream. I was invited down to discuss the language of the play with the cast and crew as they begin rehearsals.

This was a fascinating opportunity for me to test our visualisation tools and analysis on a non-academic audience – and the discussions I had with the actors opened my eyes to applications of the tools we hadn’t considered before. They also came up with a series of sharp observations about the language of the play in response to the linguistic analysis.

I began with WordHoard, a tool developed by Martin Mueller’s team at Northwestern University, as a way of getting a quick overview of the lexical patterns in the play and of introducing people to thinking statistically about language.

Here’s the wordcloud WordHoard generates for a loglikelihood analysis of MSND compared with the whole Shakespeare corpus:


Loglikelihood takes the frequencies of words in one text (in this case MSND) and compares them with the frequencies of words in a comparison, or reference, sample (in this case, the whole Shakespeare corpus). It identifies the words that are used significantly more or less frequently in the analysis text than would be expected given their frequencies in the comparison sample. In the wordcloud, the size of a word indicates how strongly its frequency departs from the expected frequency. Words in black appear more frequently than we would expect, and words in grey appear less frequently.
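To make the calculation concrete, here is a minimal Python sketch of a signed log-likelihood score for a single word. The post doesn’t say exactly how WordHoard computes its scores, so this assumes the common two-sample formulation (Dunning’s G²), in which the analysis text and the reference sample are treated as separate bodies of words. The function name `log_likelihood` is ours, not WordHoard’s, and the counts in the usage line are invented.

```python
import math


def log_likelihood(freq_text: int, freq_ref: int, size_text: int, size_ref: int) -> float:
    """Signed log-likelihood (G2) for one word in two samples.

    Assumes the reference counts do NOT include the analysis text.
    Positive = used more often than expected in the analysis text,
    negative = used less often (the 'grey' words in the wordcloud).
    """
    total_freq = freq_text + freq_ref
    if total_freq == 0:
        return 0.0
    total_size = size_text + size_ref
    # Expected counts if the word were spread evenly across both samples
    expected_text = size_text * total_freq / total_size
    expected_ref = size_ref * total_freq / total_size

    g2 = 0.0
    # An observed count of zero contributes nothing (0 * log 0 is taken as 0)
    if freq_text > 0:
        g2 += freq_text * math.log(freq_text / expected_text)
    if freq_ref > 0:
        g2 += freq_ref * math.log(freq_ref / expected_ref)
    g2 *= 2

    # Sign the score by the direction of the difference in relative frequency
    over_used = freq_text / size_text > freq_ref / size_ref
    return g2 if over_used else -g2


# Invented counts: 40 hits in a 16,000-word text vs 200 in an 800,000-word reference
score = log_likelihood(40, 200, 16_000, 800_000)
print(round(score, 1))  # 106.2
```

By the usual convention for one degree of freedom, a G² above 3.84 is significant at p < 0.05 and above 6.63 at p < 0.01. A negative score marks a word used less often than expected.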

As is generally the case with loglikelihood tests, the words showing the most powerful effects here are nouns associated with significant plot elements: ‘fairy’, ‘wall’, ‘moon’, ‘lion’ etc. If you’ve read the play, it is not hard to explain why these words are used in MSND more than in the rest of Shakespeare – and you really don’t need a computer, or complex statistics, to tell you that. To paraphrase Basil Fawlty, so far, so bleeding obvious.

Where loglikelihood results normally get more interesting – or puzzling – is in results for function words (pronouns, auxiliary verbs, prepositions, conjunctions) and in those words that are significantly less frequent than you’d expect.

Here we can see some surprising results: why does Shakespeare use ‘through’ far more frequently in this play than elsewhere? Why are the masculine pronouns ‘he’ and ‘his’ used less frequently? (And is this linked to the low use of ‘lord’?) Why is ‘it’ rare in the play? And ‘they’ and ‘who’ and ‘of’?

At this stage we started to look at our results from Docuscope for the play, visualised using Anupam Basu’s LATtice.



The heatmap shows all of the folio plays compared to each other: the darker a square is, the more similar the plays are linguistically. The diagonal of black squares running from bottom left to top right marks the points in the map where plays are ‘compared’ to themselves: the black indicates identity. Plays are arranged up the left hand side of the square in ascending chronological order from Comedy of Errors at the bottom to Henry VIII at the top – the sequence then repeats across the top from left to right – so the black square at the bottom left is Comedy of Errors compared to itself, while the black square at the top right is Henry VIII.
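The post doesn’t specify how LATtice measures similarity, so the following is only a sketch under our own assumptions: each play is represented as a list of LAT rates, each LAT is standardised across the corpus, and plays are compared by Euclidean distance. A distance of 0 on the diagonal corresponds to the black identity squares; shading small distances dark and large ones pale would give a map read the same way as the one described above. The profiles are invented, and `most_distant` (a hypothetical helper) picks out the palest square in a play’s row.

```python
import math
from statistics import mean, pstdev


def standardise(profiles):
    """Turn each LAT column into z-scores across all plays, so that
    frequent and rare LATs carry comparable weight."""
    names = list(profiles)
    columns = list(zip(*(profiles[name] for name in names)))
    stats = [(mean(col), pstdev(col)) for col in columns]
    return {
        name: [(v - mu) / sd if sd > 0 else 0.0
               for v, (mu, sd) in zip(profiles[name], stats)]
        for name in names
    }


def distance_matrix(profiles):
    """Pairwise Euclidean distances between standardised profiles.
    The diagonal is 0: each play compared with itself."""
    z = standardise(profiles)
    names = list(z)
    return names, [[math.dist(z[a], z[b]) for b in names] for a in names]


def most_distant(names, matrix, play):
    """Return the play whose square in this play's row is palest."""
    row = matrix[names.index(play)]
    return max((d, other) for d, other in zip(row, names) if other != play)[1]


# Invented LAT rates for three hypothetical plays
profiles = {
    "play_a": [1.0, 2.0, 3.0],
    "play_b": [1.1, 2.1, 2.9],
    "play_c": [3.0, 0.5, 1.0],
}
names, matrix = distance_matrix(profiles)
print(most_distant(names, matrix, "play_a"))  # play_c
```

Standardising first is a design choice: without it, the LATs with the highest raw rates would dominate the distances.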

One of the first things we noticed when Anupam produced this heatmap was a pair of plays that stand out as unlike almost all of the others, producing four distinct light lines that divide the map into nine roughly equal smaller squares:


These two anomalous plays are Merry Wives of Windsor (here outlined in blue) and A Midsummer Night’s Dream (yellow). It is not so surprising to find Wives standing out, given the frequent critical observation that this play is generically and linguistically unusual for Shakespeare: but A Midsummer Night’s Dream is a result we certainly would not have predicted.

This visualisation of difference certainly caught the actors’ attention, and they immediately focussed in on the very white square about 2/3 of the way along the MSND line (here picked out in yellow):


So which play is MSND even less like than all of the others? A tragedy? A history? Again, the answer is not one we’d have guessed: Measure for Measure.

This is a good example of how a visualisation can alert you to a surprising finding. We would never have intuited that MSND was anomalous linguistically without this heatmap. It is also a good example of how visualisations should send you back to the data: we now need to investigate the language of MSND to explain what it is that Shakespeare does, or does not do, in this play that makes it stand out so clearly. The visualisation is striking – and it allowed the cast members to identify an interesting problem very quickly – but the visualisation doesn’t give us an explanation for the result. For that we need to dig a bit deeper.

One of the most useful features of LATtice is the bottom right window, which identifies the LATs that account for the most distance between two texts:


This is a very quick way of finding out what is going on – and here the results point us to two LATs which are much more frequent in MSND than in Measure for Measure: SenseObject and SenseProperty. SenseObject picks up concrete nouns, while SenseProperty codes for adjectives describing their properties. A quick trip to the LATtice box plot screen (on the left of these windows):


confirms that MSND (red dots) is right at the top end of the Shakespeare canon for these LATs (another surprise, since we’ve got used to thinking of these LATs as characteristic of the histories), while Measure for Measure (blue dots) has the lowest rates in Shakespeare for these LATs.
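Both steps can be sketched in a few lines of Python, again under assumptions of our own. If plays are compared by Euclidean distance on standardised LAT rates, the squared distance splits cleanly into one term per LAT, so ranking those terms shows which LATs drive the difference; whether LATtice’s window works this way is not stated in the post. The hypothetical `position_in_canon` then reports the quartiles a box plot draws and a play’s rank for one LAT. All figures are invented for illustration and are not Docuscope output.

```python
from statistics import quantiles


def top_contributors(z_a, z_b, lat_names, k=2):
    """Rank LATs by their share of the squared Euclidean distance
    between two standardised profiles (an assumed metric)."""
    diffs = [a - b for a, b in zip(z_a, z_b)]
    squared = [d * d for d in diffs]
    total = sum(squared)
    if total == 0:
        return []
    ranked = sorted(zip(lat_names, squared, diffs), key=lambda t: t[1], reverse=True)
    # (LAT name, share of squared distance, signed difference a - b)
    return [(name, sq / total, diff) for name, sq, diff in ranked[:k]]


def position_in_canon(rates, play):
    """Quartiles of one LAT across all plays, plus one play's rank
    (rank 1 = highest rate in the canon)."""
    q1, median, q3 = quantiles(list(rates.values()), n=4)
    ranking = sorted(rates, key=rates.get, reverse=True)
    return {"q1": q1, "median": median, "q3": q3,
            "rank": ranking.index(play) + 1, "of": len(rates)}


# Invented z-scores for two hypothetical texts
lats = ["SenseObject", "SenseProperty", "Question", "FirstPerson"]
print(top_contributors([1.8, 1.5, -0.9, -0.4], [-1.6, -1.2, 0.3, 0.1], lats))

# Invented rates for one LAT across five hypothetical plays
rates = {"play_a": 5.0, "play_b": 3.0, "play_c": 2.0, "play_d": 1.0, "play_e": 4.0}
print(position_in_canon(rates, "play_a"))
```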

So Docuscope findings suggest that MSND is a play concerned with concrete objects and their descriptions – another counter-intuitive finding given the associations most of us have with the supposedly ethereal, dream-like fairy atmosphere of the play. Cast members were fascinated by this and its possible implications for how they should use props – and someone also pointed out that many of the names in the play are concrete nouns (Quince, Bottom, Flute, Snout, Peaseblossom, Cobweb, Mote and so on). What is the effect on the audience of this constant linguistic wash of ‘things’?

Here is a screenshot from Docuscope with SenseObject and SenseProperty tokens underlined in yellow. Reading these tokens in context, you realise that many of these concrete objects and qualities, in this section at least, are fictional within the world of the play. A wall is evoked – but it is a wall in a play, represented by a man. Despite the frequency of SenseObject in this play, we should be wary of assuming that it implies the straightforward evocation of a concrete reality:


Also raised in MSND are LATs to do with locating and describing space: Motions and SpaceRelation (perhaps reflected in our loglikelihood finding for ‘through’). So alongside the focus on things there is a focus on describing location and movement – perhaps, someone suggested, because the characters are so often unsure where they are. (In the following screenshot, Motions and SpaceRelation tokens are underlined in yellow.)



Moving on, we also looked at those LATs that are relatively absent from MSND – and here the findings were very interesting indeed. We have seen that MSND does not pattern like a comedy – and the main reason for this is that it lacks the highly interactive language we expect in Shakespearean comedy: DirectAddress and Question are lowered. So too are PersonPronoun (which picks up third-person pronouns, and matches our loglikelihood finding for ‘he’ and ‘his’) and FirstPerson – indeed, all types of pronoun are less frequent in the play than is normal for Shakespeare. At this point one of the actors suggested that the lack of pronouns might be because full names are used constantly – she’d noticed in rehearsal how often she was using characters’ names – and we wondered if this was because the play’s characters are so frequently uncertain of their own and others’ identities.
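One way to test the actor’s suggestion would be to count pronouns and character names side by side, normalised per 1,000 words so that plays of different lengths can be compared. The sketch below does this with tiny hypothetical word lists. Docuscope’s own dictionary is far larger and matches multi-word strings as well as single words, and `CharacterName` is our stand-in, not a Docuscope LAT.

```python
import re
from collections import Counter

# Hypothetical word lists for illustration only, not Docuscope's dictionary.
CATEGORIES = {
    "FirstPerson": {"i", "me", "my", "mine", "myself"},
    "PersonPronoun": {"he", "him", "his", "she", "her", "they", "them"},
    # Not a Docuscope LAT: a stand-in list of character names
    "CharacterName": {"hermia", "helena", "lysander", "demetrius"},
}


def rates_per_thousand(text, categories=CATEGORIES):
    """Count category hits and normalise to occurrences per 1,000 tokens."""
    # Simple tokeniser: lower-case runs of letters and straight apostrophes
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return {}
    counts = Counter()
    for token in tokens:
        for name, words in categories.items():
            if token in words:
                counts[name] += 1
    return {name: 1000 * counts[name] / len(tokens) for name in categories}


# An invented plot summary, not a quotation from the play
sample = "Helena loves Demetrius, Demetrius loves Hermia, and Hermia loves Lysander"
print(rates_per_thousand(sample))
```

Run over the full text, a high name rate alongside low pronoun rates would be consistent with the actor’s impression, though the counts alone would not explain why.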

Also lowered in the play is PersonProperty, the LAT which picks up familial roles (‘father’, ‘mother’, ‘sister’ etc) and social ones (job titles) – if you add this to the lowered rate of pronouns, then a rather strange social world starts to emerge, one lacking the normal points of orientation (and the play is also low on CommonAuthority, which picks up appeals to external structures of social authority – the law, God, and so on).

The visualisation and the Docuscope screens provoked a discussion I found fascinating: we agreed that the action of the play seems to exist in an eternal present. There is little sense of future or past (appropriately for a dream) – and this ties in with the relative absence of LATs coding for past tense and looking back. As the LATtice heatmap first indicated, MSND is unlike any of the recognised Shakespearean genres – but digging into the data shows that it is unlike each of them in a different way:

  • It is unlike comedy in its lack of features associated with verbal interaction
  • It is unlike tragedy in its lack of first person forms (though it is perhaps more like tragedy than any other genre)
  • It is unlike history in its lack of CommonAuthority
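Because MSND departs from each genre in a different way, it helps to measure the play against each genre separately. A minimal sketch, assuming per-play LAT rates grouped by genre: z-score the play against the plays of one genre and sort the LATs from most under-used to most over-used. The function name and the genre figures in the example are invented for illustration.

```python
from statistics import mean, stdev


def deviations_from_genre(play_profile, genre_profiles, lat_names):
    """z-score of one play's LAT rates against the plays of one genre.

    genre_profiles: list of LAT-rate lists, one per play in the genre
    (at least two plays, since the sample standard deviation is used).
    Returns (LAT, z) pairs sorted from most negative (most under-used).
    """
    result = []
    for j, lat in enumerate(lat_names):
        column = [profile[j] for profile in genre_profiles]
        mu, sd = mean(column), stdev(column)
        z = (play_profile[j] - mu) / sd if sd > 0 else 0.0
        result.append((lat, z))
    return sorted(result, key=lambda item: item[1])


# Invented rates for three hypothetical comedies and one test play
lats = ["DirectAddress", "Question", "FirstPerson"]
comedies = [[10.0, 6.0, 30.0], [12.0, 8.0, 32.0], [11.0, 7.0, 28.0]]
print(deviations_from_genre([7.0, 6.0, 30.0], comedies, lats))
```

Repeating the call with the tragedies and histories as `genre_profiles` would show, LAT by LAT, where the play parts company with each genre.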

Waiting for my train back to Glasgow (at the excellent Euston Tap bar near Euston Station), I tried to summarise our findings in four tweets (read them from the bottom, up!):



I’ll try to keep in touch with the actors as they rehearse the play – this was a lesson for me in using the tools to spark an investigation into Shakespeare’s language, and I can now see that we could adapt these tools to various educational settings (including schools and rehearsal rooms!).

Jonathan Hope February 2012

This entry was posted in Early Modern Drama, Shakespeare, Uncategorized.


  1. Posted February 7, 2012 at 3:33 pm

    Hi: Latest posting on MSND is fascinating but none of the embedded graphics except the Lattice plots are showing up – I’ve tried three different browsers and both a Mac and PC …
    can you fix, please?

  2. Colin
    Posted February 8, 2012 at 10:55 pm

    Delightful article and project.

    I think some of Midsummer’s anomalies can be accounted for by the play within the play, which appears twice in the text–once in rehearsal and again in performance–and therefore has a disproportionate influence on the word count. The dialogue of “Pyramus and Thisby,” happens to lean heavily on the words “wall,” “moon,” “lion” and “through,” some of them repeated parodically (“O Wall, O sweet and lovely Wall”). Their large presence may be less a psychological clue than a byproduct of one of the play’s running jokes.

    If that material were not repeated, would the rest of the play remain so unusual in the canon, I wonder?

  3. Mike Gleicher, Jonathan Hope & Michael Witmore
    Posted February 14, 2012 at 5:02 am

    sorry david (and anyone else who had problems) – this was due to me using tiff files apparently – should be sorted now

  4. Posted April 17, 2012 at 8:54 pm

    Hi Jonathan,

    Is “educative” really a word? 🙂

    Seriously, TOTALLY fascinating post and blog. Love how something seemingly so abstract can have practical consequences for the actors.

    Great stuff, thanks for writing!

6 Trackbacks

  • By On blogging in the Digital Humanities | Michael Ullyot on February 25, 2012 at 1:58 am

    […] This post by Jonathan Hope uses WordHoard and DocuScope, two text-analysis programs, and a really promising visualization program called LATtice that I want to try. But it’s how Hope uses them that’s really interesting: he combines a number of tools to investigate some queries that actors at Shakespeare’s Globe Theatre posed when he visited them this month, and to consider how he could adapt them for different educational purposes. It’s a model of DH blogging: immediate, multi-pronged, sparked by an observation/query, driven by curiosity, transparent about tools and results, and open-ended. […]

  • […] approaches adapt to new media. In Jonathan Hope’s algorithmic criticism, Shakespeare’s folio plays no longer fit the traditional definition of ‘text’, but rather resemble a pointillist painting, […]

  • […] like ‘drown’ ‘island’ ‘isle’ ‘fish’ and ‘sea’ are more likely to appear – but you really don’t need a computer, or complex statistics to tell you that, as Jonathan Hope points out. Digital tools that count things are much better suited to projects […]

  • […] as visual and audio files, so must our critical approaches adapt to new media. For instance, in Jonathan Hope‘s algorithmic criticism, Shakespeare’s plays no longer fit the traditional definition […]

  • […] them to grapple with image, sound, and video alongside (or in place of) the written word. In Jonathan Hope‘s algorithmic criticism, Shakespeare’s plays no longer fit the old definition of text, […]

