What happens in Hamlet?

We perform digital analysis on literary texts not to answer questions, but to generate questions. The questions digital analysis can answer are generally not ‘interesting’ in a humanist sense, but the questions digital analysis provokes often are. And these questions have to be answered by ‘traditional’ literary methods. Here’s an example.

Dr Farah Karim-Cooper, head of research at Shakespeare’s Globe, just asked on Twitter if I had any suggestions for a lecture on Hamlet she was due to give. Ten minutes later I had some ‘interesting’ questions for her.

I began with WordHoard’s log-likelihood function, comparing Hamlet to the rest of Shakespeare’s plays. You can view the results of this as a tag cloud:


Figure: tag cloud for Hamlet vs the rest of Shakespeare. Black words are raised in frequency; grey words are lowered; size indicates strength of effect. (A tag cloud looks good and is immediate, but doesn't tell you much.)

The tag cloud is nice, but for real text analytics you need to read the spreadsheet of figures. Word-frequency analysis is limited in many ways, but it can surprise you if you look in the right places and at the right things.

Figure: the spreadsheet of log-likelihood figures. (Not nice to look at, but much more information.)


When I run log-likelihood, I always look first for the items that are lower than expected, rather than those that are raised (which tend to be content words associated with the topic of the text, and thus fairly obvious). I also tend to look at function words (pronouns, articles, auxiliary verbs) rather than nouns or adjectives.

If you look for absences of high-frequency items, you are using digital text analysis to do the things it does best compared to human reading: picking up absence, and analysing high-frequency items. Humans are good at spotting the presence of low-frequency items, items that disrupt a pattern (outliers, in statistical terms), but we are not good at noticing things that are not there (dogs that don’t bark in the night) and we are not good at seeing woods (we see trees, especially unusual trees).
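As a rough sketch, that reading order (lowered items first, function words first) amounts to a simple filter over the results table. The column names, the short function-word list and every row except the two rates for ‘I’ are illustrative assumptions, not WordHoard output, and the gap in rate is used for ordering here only as a stand-in for the log-likelihood score.

```python
# Sketch only: this word list and the row layout are assumptions for
# illustration, not anything WordHoard itself defines.
FUNCTION_WORDS = {"i", "you", "he", "she", "it", "we", "they",
                  "the", "a", "an", "and", "of", "to", "not"}


def lowered_function_words(rows):
    """Return function words that are rarer in the analysis text than in
    the reference corpus, largest shortfall first."""
    lowered = [
        row for row in rows
        if row["word"].lower() in FUNCTION_WORDS
        and row["analysis_per_10k"] < row["reference_per_10k"]
    ]
    return sorted(
        lowered,
        key=lambda row: row["reference_per_10k"] - row["analysis_per_10k"],
        reverse=True,
    )


rows = [
    # Rates for "I" are the ones quoted in this post.
    {"word": "I", "analysis_per_10k": 184, "reference_per_10k": 228},
    # The next two rows are made-up placeholders, NOT real figures.
    {"word": "the", "analysis_per_10k": 300, "reference_per_10k": 290},
    {"word": "ghost", "analysis_per_10k": 5, "reference_per_10k": 1},
]

for row in lowered_function_words(rows):
    print(row["word"], row["analysis_per_10k"], row["reference_per_10k"])
```

The raised content word and the raised function word both drop out, which leaves only the lowered function word to think about.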

The Hamlet results were pretty outstanding in this respect: very high up the list, with three stars indicating very strong statistical significance, is a minus result for the pronoun ‘I’. A check across the figures shows that ‘I’ occurs in Hamlet about 184 times every 10,000 words (see the column headed ‘Analysis parts per 10,000’; Hamlet is the ‘analysis text’ here), whereas in the rest of Shakespeare it occurs about 228 times every 10,000 words (see the column headed ‘Reference parts per 10,000’; the reference corpus is the rest of Shakespeare). So every 10,000 words in Hamlet have about 44 fewer ‘I’ pronouns than we’d expect.
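The arithmetic behind that comparison is small enough to check by hand, but here is a minimal sketch of it. The two rates are the ones quoted above; the function and variable names are my own.

```python
# Rates quoted in the post: occurrences of "I" per 10,000 words.
HAMLET_RATE = 184
REFERENCE_RATE = 228


def rate_gap(analysis_rate, reference_rate):
    """Return the shortfall per 10,000 words, and the same shortfall as a
    fraction of the reference rate."""
    shortfall = reference_rate - analysis_rate
    return shortfall, shortfall / reference_rate


shortfall, relative = rate_gap(HAMLET_RATE, REFERENCE_RATE)
print(f"{shortfall} fewer per 10,000 words "
      f"({relative:.0%} below the reference rate)")
```

Expressing the gap as a fraction of the reference rate makes the size of the drop easier to judge than the raw difference alone.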


Or, to put it another way, Shakespeare normally uses ‘I’ 228 times every 10,000 words. Hamlet is about 30,000 words long, so we’d expect, all other things being equal, that Shakespeare would use ‘I’ 684 times in it. In fact, he uses it just 546 times, 138 fewer than expected, and WordHoard checks the figures to see if we could expect this drop due to chance or normal variation. The three stars next to the log-likelihood score for ‘I’ tell us that this figure is very unlikely to be due to chance: something is causing the drop.
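For readers who want to see roughly what such a check involves, here is a sketch using the common two-term form of the log-likelihood (G2) statistic for comparing one word across two corpora. Two things are assumptions on my part: the size of the reference corpus (800,000 words is a placeholder, not a figure from WordHoard), and the idea that WordHoard computes its score in exactly this way. The post does not say how scores map to stars, so the sketch only compares the result with the standard chi-squared critical value for p < 0.001 at one degree of freedom.

```python
import math


def expected_count(rate_per_10k, text_length):
    """Occurrences we would expect in a text of this length at this rate."""
    return rate_per_10k * text_length / 10_000


def log_likelihood(count_a, size_a, count_b, size_b):
    """Two-term G2 score for one word in corpus A versus corpus B."""
    total = count_a + count_b
    expected_a = size_a * total / (size_a + size_b)
    expected_b = size_b * total / (size_a + size_b)
    score = 0.0
    for observed, expected in ((count_a, expected_a), (count_b, expected_b)):
        if observed > 0:  # a zero count contributes nothing to the sum
            score += observed * math.log(observed / expected)
    return 2 * score


HAMLET_WORDS = 30_000      # approximate length given in the post
HAMLET_I = 546             # observed count given in the post
REFERENCE_WORDS = 800_000  # ASSUMPTION: placeholder reference corpus size
REFERENCE_I = round(228 * REFERENCE_WORDS / 10_000)  # from the quoted rate

CRITICAL_P001 = 10.83  # chi-squared critical value, 1 d.f., p < 0.001

expected = expected_count(228, HAMLET_WORDS)
g2 = log_likelihood(HAMLET_I, HAMLET_WORDS, REFERENCE_I, REFERENCE_WORDS)
print(f"expected {expected:.0f}, observed {HAMLET_I}, G2 = {g2:.1f}")
print("beyond the p < 0.001 threshold:", g2 > CRITICAL_P001)
```

The exact score moves with the reference corpus size you plug in, so treat the number printed here as indicative only.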

Digital analysis can’t explain the cause of the drop: the only question it is answering here is, ‘How frequently does Shakespeare use “I” in Hamlet compared to his other plays?’. On its own, this is not a very interesting question. But the analysis provokes the much more interesting question, ‘Why does Shakespeare use “I” far less frequently in Hamlet than normal?’.

Given literary-critical claims that Hamlet marks the birth of the modern consciousness, it is surprising to find a drop in the frequency of first-person forms. But for an explanation of why this might happen, you’ll have to attend Dr Karim-Cooper’s lecture, ask her on Twitter (@DrFarahKC), or go back to the play yourself.





One Comment

  1. Posted August 23, 2012 at 2:25 pm

    This is really intriguing! One of my favorite teaching “tricks” (or “guided revelations”; that sounds better) is getting my students to notice that Hamlet never actually uses “I” in “To be or not to be,” so it’s fascinating that this is borne out on a larger scale.

