Category: Early Modern Drama

  • 2w90p[[8i[;;////////////////////////////////////////////[;

    591 Early Modern Dramas plotted in PCA space, with ‘core’ group circled and variation boundary marked (line)

    American/Australian tour

    In March-April 2014, I’ll be in the USA giving a series of talks and conference presentations based around Visualising English Print, and our other work. In June I’ll be in Newcastle, Australia for the very exciting Beyond Authorship symposium.

    I’ll address a series of different themes in the talks, but I’ll use this page as a single resource for references, since they are all (in my head at least) related.

    Some of the talks will be theoretical/state of the field; some will be specific demonstrations of tools. The common thread is something like, ‘what do we think we are doing?’.

    Here’s a general introduction (there’s a list of venues afterwards).

     

    1 Counting things

    Quantification is certainly not new in literary criticism, but it is becoming more noticeable, and, perhaps, more central as critics analyse increasingly large corpora. The statistical tools we use to explore complex data sets (such as Shakespeare’s plays or 20,000 texts from EEBO-TCP) may appear like magical black boxes: feed in the numbers, print out the diagrams, wow your audience. But what is happening to our texts in those black boxes? Scary mathematical things we can’t hope to understand or critique?

    I want to consider the nature of the transformations we perform on texts when we subject them to statistical analysis. To some extent this is analogous to ‘traditional’ literary criticism: we have a text, and we identify other texts that are similar or different to it:

    How does Hamlet relate to other Early Modern tragedies?

    This is a question equally suited to quantitative digital analysis and to traditional literary-critical approaches. The ways we define and approach our terms will differ between the two modes, as will the evidence employed, but essentially both answers to this question would involve comparison and an assessment of degrees of similarity and difference.

    But there is also something very different to traditional literary criticism going on when we count things in texts and analyse the resulting spreadsheets – something literary scholars may feel unable to understand or critique. What exactly are we doing when we ‘project’ texts into hyper-dimensional spaces and use statistical tools to reduce those spaces down to something we can ‘read’ as humans?

    Perhaps surprisingly, studying library architecture, book history, information science, and cataloguing systems may help us to think about this. Libraries organised by subject ‘project’ their books into three-dimensional space, so that books with similar content are found next to each other. Many statistical procedures function similarly, projecting books into hyper-dimensional spaces, and then using distance metrics to identify proximity and distance within the complex mathematical spaces our analysis creates.
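    To make this concrete, here is a minimal sketch (in Python, using scikit-learn; it illustrates the idea rather than the actual pipeline behind the talks, and the filenames are hypothetical) of what such a ‘projection’ involves: each text becomes a point whose coordinates are word frequencies, distances are measured in that hyper-dimensional space, and PCA flattens the space to two dimensions we can plot.

        # A minimal sketch: texts as points in word-frequency space,
        # distances in the full space, then a 2-D PCA projection for plotting.
        from sklearn.feature_extraction.text import CountVectorizer
        from sklearn.decomposition import PCA
        from sklearn.metrics.pairwise import euclidean_distances

        texts = {
            "Hamlet": open("hamlet.txt").read(),     # hypothetical plain-text files
            "Macbeth": open("macbeth.txt").read(),
            "Othello": open("othello.txt").read(),
        }

        # One dimension per word-type: a texts x vocabulary matrix of counts,
        # normalised for text length.
        counts = CountVectorizer().fit_transform(texts.values()).toarray().astype(float)
        freqs = counts / counts.sum(axis=1, keepdims=True)

        # Distances between texts in the full hyper-dimensional space ...
        print(euclidean_distances(freqs))

        # ... and the same texts 'projected' down to two dimensions.
        coords = PCA(n_components=2).fit_transform(freqs)
        for name, (x, y) in zip(texts, coords):
            print(name, round(x, 3), round(y, 3))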

    Once we understand the geometry of statistical comparison, we can grasp the potential literary significance of the associations identified by counting – and we can begin to understand the difference between statistical significance and literary significance, and see that it is the job of the literary scholar, not the statistician, to decide on the latter. A result can be statistically significant, but of no interest in literary terms – and findings that do not qualify for statistical significance may be crucial for a literary argument.
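    As a toy illustration of that distinction (the numbers are invented, not drawn from any corpus discussed here): in a large enough corpus, a trivially small difference in the rate of a common word will pass a conventional significance test, while telling a literary critic nothing in itself.

        # Two corpora of a million tokens each, differing only slightly in the
        # rate of 'the' (1.00% vs 1.06%): the difference is highly 'significant'
        # statistically, but of no literary interest on its own.
        from scipy.stats import chi2_contingency

        #            'the'    all other tokens
        corpus_a = [10_000, 990_000]
        corpus_b = [10_600, 989_400]

        chi2, p, dof, expected = chi2_contingency([corpus_a, corpus_b])
        print(f"chi2 = {chi2:.1f}, p = {p:.2e}")   # p is far below 0.05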

     

    2 Evidence

    Ted Underwood has been posing lots of challenging, and productive, questions for literary scholars doing, or thinking about, digital work. Perhaps most significant is his recent suggestion that the digital causes problems for literary scholars, who are used to basing their arguments, and narratives, on ‘turning points’ and exceptions. Digital evidence, however, collected at scale, tells stories about continuity and gradual change. A possible implication of this is that the shift to digital analysis and evidence will fundamentally change the nature of literary studies, as we break away from a model that has arguably been with us only since the Romantics, and return (?) to one which traces long continuities in genre and form.

    One way of posing this question: does the availability of large digital corpora and tools put us at the dawn of a new world, or are we just in for more (a lot more) of the same?

     

    3 Dates and Venues

    28 March 2014: Renaissance Society of America Plenary Session: Current Trends in the Digital Renaissance (7.00-8.30pm); New York Hilton Midtown, Sutton Rooms: ‘Paradigm Shifts in British Renaissance Literature: The Digital Future’ #rsa14

    2 April 2014: CUNY Graduate Center, 365 Fifth Avenue, New York (2.00-4.00pm); Room 6495: ‘Flatlands: book history, literary criticism, and hyper-dimensional geometry’

    7 April 2014: University of Pennsylvania, Digital Humanities Forum (12-1.30pm); Room 625-6 Penn Library (Registration required): ‘Visualising English Print’

    Graduate Class: Shakespeare and the History of the Book: ‘The Language of Macbeth’; preparatory reading: ‘Macbeth language HW2014’ (not a public event)

    10 April 2014: Shakespeare Association of America, St Louis: 10-12 Seminar: ‘Shakespeare’s language: close and distant reading’; 12-1.30 and 3-6: Digital Projects Room: Visualising English Print; Translation Arrays (project demonstrations)

     

    4 References and resources (these are grouped by topic)

    (a) Statistics and hyper-dimensionality

    Mick Alt, 1990, Exploring Hyperspace: A Non-Mathematical Explanation of Multivariate Analysis (London: McGraw-Hill) – the best book on hyper-dimensionality in statistical analysis: short, clear, and conceptually focussed

    Most standard statistics textbooks give accounts of Principal Component Analysis (and Factor Analysis, to which it is closely related). We have found Andy Field, Discovering Statistics Using IBM SPSS Statistics: And Sex and Drugs and Rock ’n’ Roll (London: Sage, 2013, 4th ed.) useful.

    Curiously, Early Modern drama, in the shape of Shakespeare, has a significant history in attempts to imagine hyper-dimensional worlds. E.A. Abbott, the author of A Shakespearian Grammar (London, 1870), also wrote Flatland: A Romance of Many Dimensions (London, 1884), an early science fiction work full of Shakespeare references and set in a two-dimensional universe.

    Cover of Flatland

    The significance of Flatland to many who work in higher-dimensional geometry is shown by a recent scholarly edition sponsored by the Mathematical Association of America (Cambridge, 2010 – editors William F. Lindgren and Thomas F. Banchoff), and its use in physicist Lisa Randall’s account of theories of multiple dimensionality, Warped Passages (New York, 2005), pages 11-28 (musical interlude: Dopplereffekt performing Calabi-Yau Space – which refers to a theory of hyper-dimensionality).

    Flatland itself is the subject of a conceptual, dimensional transformation at the hands of poet/artist Derek Beaulieu:

    Derek Beaulieu, 2007, Flatland: a romance of many dimensions (York: information as material)

     

    (b) Libraries and information science

    In thinking about the physical development of libraries, I have enjoyed

    James W.P. Campbell and Will Pryce, 2013, The Library: A World History (London: Thames and Hudson) [- a beautiful book, and images are available on Will Pryce’s blog]

    and

    Richard Gameson, 2006, ‘The medieval library (to c. 1450)’, Clare Sargent, 2006, ‘The early modern library (to c. 1640)’, and David McKitterick, 2006, ‘Libraries and the organisation of knowledge’, in Elizabeth Leedham-Green and Teresa Webber (eds), The Cambridge History of Libraries in Britain and Ireland vol. 1, ‘To 1640’, pp. 13-50, 51-65, and 592-615

    and also, on real and imagined libraries:

    Craig Dworkin, 2010, The Perverse Library (York: information as material)

    Alec Finlay, 2001, The Libraries of Thought and Imagination (Edinburgh: Polygon Pocketbooks)

    Alberto Manguel, 2006, The Library at Night (New Haven: Yale)

    Roberto Bolaño, 2008 [1996], Nazi Literature in the Americas (New York: New Directions)

    Jane Rickard, 2013, ‘Imagining the early modern library: Ben Jonson and his contemporaries’ (unpublished paper presented at Strathclyde University Languages and Literatures Seminar)

     

    On data, information management and catalogues:

    Ann M. Blair, 2010, Too Much To Know: Managing Scholarly Information before the Modern Age (New Haven: Yale)

    Markus Krajewski, 2011, Paper Machines: About Cards & Catalogs, 1548-1929 (Cambridge, MA: MIT Press) – on Conrad Gessner

    Daniel Rosenberg, 2013, ‘Data before the Fact’, in Lisa Gitelman (ed), Raw Data is an Oxymoron (Cambridge, MA: MIT Press), pp. 15-40 – combines digital analysis with a historicisation of the field, and the notion of ‘data’

     

    (c) Ted Underwood and the digital future

    Ted Underwood, 2013, Why Literary Periods Mattered: Historical Contrast and the Prestige of English Studies (Stanford) – especially chapter 6, ‘Digital Humanities and the Future of Literary History’, pp. 157-75, on the strange commitment to discontinuity in literary studies, and the tendency of digital, at-scale work to dissolve this into a picture of gradualism. Underwood cites his own work as an example of the resistance to gradualism that scholars using quantification find within themselves, and notes the temptation to seek fracture/outlier/turning-point narratives.

    (Also see Underwood’s discussion with Andrew Piper on Piper’s blog: http://bookwasthere.org/?p=1571 – on balancing numbers and literary analysis – and Andrew Piper, 2012, Book Was There: Reading in Electronic Times (Chicago) – see chapter 7, ‘By The Numbers’, on computation and DH.)

    Ted Underwood and Jordan Sellers, 2012, ‘The Emergence of Literary Diction’, Journal of Digital Humanities, 1.2 (Underwood 2013: 166-70 discusses this paper as an example of the pull to ‘event’ narrative in literary history, despite the gradualism in quantitative work).

     

    Also related:  Underwood’s blog: ‘The Stone and the Shell’ http://tedunderwood.com

    Scott Weingart, ‘Bridging Token and Type’, the scottbot irregular: http://www.scottbot.net/HIAL/?p=40088

     

    ‘longue durée’ History – Underwood has suggested that historians are more comfortable than literary scholars with the ‘long view’ that tends to come with digital evidence, and David Armitage and Jo Guldi have been arguing that the digital is shifting history back to this mode:

    David Armitage and Jo Guldi, 2014, ‘The Return of the Longue Durée: An Anglo-American Perspective’, forthcoming (in French) in Annales. Histoire, Sciences Sociales, 69 [English version: http://scholar.harvard.edu/files/armitage/files/rld_annales_revised_0.pdf]

    David Armitage, 2012, ‘What’s the big idea? Intellectual history and the longue durée’, History of European Ideas, 38.4, pp. 493-507

     

    (d) Overview/examples of Digital work:

    Early Modern Digital Agendas was an NEH-funded Institute held at the Folger Shakespeare Library in 2013. The EMDA website has an extensive list of resources for Digital work focussed on the Early Modern period.

    Text

    An excellent account of starting text-analytic work by a newcomer to the field:

    http://earlymodernconversions.com/computer-based-textual-analysis-and-early-modern-literature-notes-on-some-recent-research/

     

    Network

    http://sixdegreesoffrancisbacon.com

     

    Geo-spatial

    An example of an info-heavy, ‘reference’ site that makes excellent use of maps – The Museum of the Scottish Shale Oil Industry (!):  http://www.scottishshale.co.uk

    http://mapoflondon.uvic.ca

     

    Image

    British Printed Images to 1700

    A large number of heavyweight funders and participants are involved.

    Bpi1700 makes a database of ‘thousands’ of prints and book illustrations available ‘in fully-searchable form’. However, searching is text-based (see http://www.bpi1700.org.uk/jsp/)

    Development halted? ‘Although the main development work has been completed, improvements will continue to be made from time to time. If you have problems or suggestions please contact the project (see the ‘contact’ page).’ http://www.bpi1700.org.uk/index.html

    ‘Print of the month’ ended May/June 2009 http://www.bpi1700.org.uk/research/printOfTheMonth/print.html

     

    Japanese woodblock prints: http://ukiyo-e.org

    The Ukiyo-e Search site is an amazing resource that represents something genuinely new (rather than just an extension of previously existing word-based catalogue searching), in that it allows searching via an uploaded image. For example, a researcher can upload a phone-image of a print she discovers in a library, and see if the same/similar prints have been previously described, and how many other libraries have copies or versions of the print. The search is ‘fuzzy’ and will often detect different states of altered woodblocks. [Thanks to @GilesBergel for the news that a similar functionality is coming to the Bodleian Ballads project.]

    ‘About’ page with demonstration video: http://ukiyo-e.org/about

    The Ukiyo-e site was created by one person, John Resig, an enthusiast for Ukiyo-e, who saw the need for the site as a research tool. Development and expansion are ongoing.

    ‘The database currently contains over 213,000 prints from 24 institutions and, as of September 2013, has received 3.4 million page views from 150,000 people.’ http://ukiyo-e.org/about

     

    And finally, pictures of my kittens Arthur and Gracie, who will feature in the talks:

    Arthur can work a computer (he wrote the title of this post).


    Arthur and Gracie
  • Macbeth: The State of Play

    We have a new chapter on the language of Macbeth which appears in this book from Arden. The chapter surveys previous work on the language of the play, and then offers some new analysis we’ve done, chiefly using WordHoard. Along the way, we consider the role of word frequency in literary analysis, and especially the word ‘the’ in Macbeth (we also think about word frequency in this post). Of course you are going to buy the book, which is currently (February 2014) available at a reduced price at the link above, but here is a pre-print of our chapter.

    UPDATE: ‘the’ is attracting a lot of attention. Here is Bill Benzon discussing Matt Jockers’ discussion of it in Macroanalysis, and here is Mark Liberman responding, with references to other work on different rates of ‘the’ in language.

    9781472503206

    INTRODUCTION Ann Thompson

    THE TEXT AND ITS STATUS

    Notes and Queries Concerning the Text of Macbeth Anthony B. Dawson

    Dwelling ‘in doubtful joy’: Macbeth and the Aesthetics of Disappointment Brett Gamboa

     

    HISTORY AND TOPICALITY

    Politic Bodies in Macbeth Dermot Cavanagh

    ‘To crown my thoughts with acts’: Prophecy and Prescription in Macbeth Debapriya Sarkar

    Lady Macbeth, First Ladies and the Arab Spring: The Performance of Power on the Twenty-First Century Stage Kevin A. Quarmby

     

    CRITICAL APPROACHES AND CLOSE READING

    ‘A walking shadow’: Place, Perception and Disorientation in Macbeth Darlene Farabee

    Cookery and Witchcraft in Macbeth Geraldo U. de Sousa

    The Language of Macbeth Jonathan Hope and Michael Witmore

     

    ADAPTATION AND AFTERLIFE

    The Shapes of Macbeth: The Staged Text Sandra Clark

    Raising the Violence while Lowering the Stakes: Geoffrey Wright’s Screen Adaptation of Macbeth Philippa Sheppard

    The Butcher and the Text: Adaptation, Theatricality and the ‘Shakespea(Re)-Told’ Macbeth Ramona Wray

  • Fuzzy Structuralism

    Several years ago I did some experiments with Franco Moretti, Matt Jockers, Sarah Allison and Ryan Heuser on a set of Victorian novels, experiments that developed into the first pamphlet issued by the Stanford Literary Lab. Having never tried Docuscope on anything but Shakespeare, I was curious to see how the program would perform on other texts. Looking back on that work, which began with a comparison of tagging techniques using Shakespeare’s plays, I think the group’s most important finding was that different tagging schemes can produce convergent results. By counting different things in the texts – strings that Docuscope tags and, alternatively, words that occur with high frequency (most frequent words) – we were able to arrive at similar groupings of texts using different methods. The fact that literary genres could be rendered according to multiple tagging schemes sparked the idea that genre was not a random projection of whatever we had decided to count. What we began to think as we compared methods, and it is as exciting a thought now as it was then, was that genre was something real.

    Real as an iceberg, perhaps, genre may have underwater contours that are invisible but mappable with complementary techniques. Without delving too deeply into the specifics of the pamphlet, I’d like to sketch its findings and then discuss them in some of the terms I outlined in the previous post on critical gestures. First the preliminaries. In the initial experiment, we established a corpus (the Globe Shakespeare) and then used two tagging schemes to assign the tokens in those documents to a smaller number of types. (This is the crucial step of reducing the dimensionality of the documents, or “caricaturing” them.) The first tagging scheme, Docuscope, rendered the plays as percentage scores on the types it counts; the second, implemented by Jockers, identified the most frequent words (MFWs) in the corpus and likewise used these as the types or variables for analysis.

    What we found was that the circles drawn by critics around these texts – circles here bounding different genres – could be reproduced by multiple means. Docuscope’s hand-curated tagging scheme did a fairly good job of reproducing the genre groupings via an unsupervised clustering algorithm, but so did the MFWs. We were excited by these results, but also cautious. Perhaps the words counted by Docuscope might include the very MFWs that were producing such good results in the parallel trial, which would mean we were working with one tokenization scheme rather than two. Subsequent experiments on Victorian novels curated by the Stanford team – for example, a comparison of the Gothic novel versus the Jacobin novel (see pp. 20-23) – showed that Docuscope was adding something over and above what was offered by counting MFWs. MFWs such as “was,” “had,” “who,” and “she,” for example, were quite good at pulling these two groups apart when used as variables in an unsupervised analysis. But these high-frequency words, even when they composed some of the Docuscope types that were helpful in sorting the genres, were correlated with other text strings that were more narrative in character, phrases such as “heard the,” “reached the,” and “commanded the.” So while we had some overlap in the two tagging schemes, what they shared did not explain the complementary sorting power each seemed to bring to the analysis. The rhetorical and semantic layers picked out in Docuscope were, so to speak, doing something alongside the more syntactically important function words that occur in texts with such high frequency.
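    For readers who want the mechanics as well as the metaphor, here is a schematic sketch of the two tagging schemes in Python. It is illustrative only, not the Stanford Lab’s actual code, and the tiny ‘Docuscope-like’ dictionary is invented for the example; the test of convergence is whether both feature sets group the same texts together under unsupervised clustering.

        # Scheme 1: most frequent words (MFWs). Scheme 2: a hand-curated
        # dictionary of string types, standing in for Docuscope. Convergence
        # means both feature sets yield similar unsupervised groupings.
        from collections import Counter
        import numpy as np
        from scipy.cluster.hierarchy import linkage, fcluster

        def tokenize(text):
            return text.lower().split()

        def mfw_features(docs, n_mfw=100):
            """Relative frequencies of the corpus's n most frequent words."""
            all_counts = Counter(tok for d in docs for tok in tokenize(d))
            mfws = [w for w, _ in all_counts.most_common(n_mfw)]
            rows = []
            for d in docs:
                c = Counter(tokenize(d))
                total = sum(c.values())
                rows.append([c[w] / total for w in mfws])
            return np.array(rows)

        # An invented, toy dictionary: each 'type' is a bag of strings.
        TOY_TYPES = {
            "SpatialPrep": {"from", "on", "in", "to"},
            "NarrativeVerb": {"heard", "reached", "commanded"},
            "FirstPerson": {"i", "me", "my"},
        }

        def dictionary_features(docs, types=TOY_TYPES):
            """Proportion of each document's tokens falling under each type."""
            rows = []
            for d in docs:
                toks = tokenize(d)
                rows.append([sum(t in wordset for t in toks) / len(toks)
                             for wordset in types.values()])
            return np.array(rows)

        def cluster(features, k=2):
            """Unsupervised (hierarchical) clustering into k groups."""
            return fcluster(linkage(features, method="ward"), k, criterion="maxclust")

        # docs = [...]  # e.g. Gothic and Jacobin novels as plain-text strings
        # print(cluster(mfw_features(docs)), cluster(dictionary_features(docs)))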

    The nature of that parallelism or convergence continues to be an interesting subject for thought as we discover more tagging schemes and contemplate making our own. Discussions in the NEH sponsored Early Modern Digital Agendas workshop at the Folger, some of which I have been lucky enough to attend, have pushed Hope and me to return to the issue of convergence and think about it again, especially as we think about how our research project, Visualizing English Print, 1470-1800, might implement new tagging schemes. If MFWs produce viable syntactical criteria for sorting texts, why would this “layer” of syntax be reliably coordinated with another, Docuscope-visible layer that is more obviously semantic or rhetorical? If different tagging schemes can produce convergent results, is it because they are invoking two perspectives on a single entity?

    Because one doesn’t get completely different groupings of texts each time one counts new things, we must posit the existence of something underneath all the variation, something that can be differently “sounded” by counting different things. The main attribute of this entity is its capacity to encourage or limit certain sorts of linguistic entailments. As I think back on how the argument developed in the Stanford paper with Moretti et al., the crucial moment came when we found that we could describe the Gothic novel as having both more spatial prepositions (“from,” “on,” “in,” “to”) and more narrative verb phrases (“heard the,” “reached the”) than the Jacobin novel. Our next move was to begin asking whether either of the tagging schemes was picking out a more foundational or structural layer of the text – whether, for example, the decision to use a certain type of narrative convention and, so, narrative phrase, entailed the use of corresponding spatial prepositions. As soon as the word “structural” appeared, I think everyone’s heart began to beat a little faster. But why? What is so special about the word “structural,” and what does it mean?

    In the context of this experiment, I think “structural” means “is the source of the entailment;” its use, moreover, suggests that the entailment has direction. We (the authors of the Stanford paper) were claiming that, in deciding to honor the plot conventions of a particular generic type, the writer of a Gothic novel had already committed him or herself to using certain types of very frequent words that critics tend to ignore. The structure or plot was obligating, perhaps in an unconscious way.

    I think now that I would pause before using the word “structure,” a word used liberally in that paper, not because I don’t think there is such a thing, but because I don’t know if it is one or many things. Jonathan Hope and I have been looking for a term to describe the entailments that are the focus of our digital work. We have chosen to adopt, in this context, a deliberately “fuzzy structuralism” when talking about entailments among features in texts. We would prefer to say, that is, that the presence of one type of token (spatial preposition) seems to entail the presence of another type (narrative verb phrases), and remain agnostic about the direction of the entailment. Statistical analysis provides evidence of that relationship, and it is the first order of iterative criticism to describe such entailments, both exhaustively (by laying bare the corpus, counts, and classifying techniques) and descriptively (by identifying, through statistical means, passages that exemplify the variables that classify the texts most powerfully). Just as important, we feel one ought where possible to assign a shorthand name – “Gothicness,” “Shakespearean” – to the features that help sort certain kinds of texts. In doing so, we begin to build a bridge connecting our linguistic description to certain already known genre conventions that critics recognize or “circle” in their own thinking. But the application of the term “Gothic,” and the further claim that this names the cause of the entailments we discern by multiple means, deserves careful scrutiny.

    A series of questions about this entailment entity, then, which sits just under the waterline of our immediate reading:

    • How does entailment work? This is a very important question, since it gets at the problem of layers and depth. At one point in the work with the Stanford team, Ryan Heuser offered the powerful analogy alluded to above: genre is like an iceberg, with features visible above the water but depths unseen below. Plot, we all agreed, is an above the waterline phenomenon, whereas MFW word use and certain semantic choices are submerged below the threshold of conscious attention. In the article we say that the below-the-waterline phenomena sounded by our tagging schemes are entailed by the “higher order” choices made when the writer decided to write a “Gothic novel” or “history play.” I still like this idea, but worry it might suggest that all features of genre are the result of some governing, genre-conscious choice. What if some writers, in learning to mimic other writers, take sentence level cues and work “upward” from there? Couldn’t there be some kind of semi-conscious or sentence-based absorption of literary conventions that is specifically not a mimicry of plot?

    • Are the entailments pyramidal, with a governing apex at the top, or are they multi-nodal and so radiating from different points within the entity? I can see how syntax, which is mediated by function or high-frequency words, is closely tied to certain higher order choices. If I want to write stories about lovers who don’t get along, this will entail using a lot of singular pronouns in the first and second person alongside words that support mutual misunderstanding. There is a relationship of entailment between these two things, and the source of that entailment is often called “plot” or “genre.” Here again we are at an interpretive turning point, since the names applied to types of texts are as fluid, at least potentially, as those assigned to types of words. Such names can be misleading. Suppose, for example, that I have identified the distinct signature of something like a “Shakespearean sentence,” and that this signature is apparent in all of Shakespeare’s plays. (An author-specific linguistic feature set was created for J. K. Rowling just last week.) Suppose further that, as Shakespeare is almost singlehandedly launching the history play as a theatrical genre in the 1590s, this authorial feature propagates alongside the plot-level features he establishes for the genre. Now someone shows that this Shakespearean sentence signature is reliably present in most plays that critics now call histories. Is that entailment upheld by the force of genre or authorship? The question would be just as hard to answer if we noticed that the generic signal of history plays spans the rest of Shakespeare’s writing and is a useful feature for differentiating his works from those of other authors.

    • If entailments can be resolved at varying depths of field, like the two cats below, which are simultaneously resolved by the Lytro Camera at multiple focal lengths, how can we be sure that they are individual pieces of a single entity or scene? Different tagging schemes support the same groupings of texts, so there must be something specific “there” to be tagged which has definite contours. I remain astonished that the groupings derived from tagging schemes like Docuscope and MFWs correspond to names we use in literary criticism, names that designate authors and genres of fiction. But entailments are plural: some seem to correspond to what we call authorship, others to genre, and perhaps still others to the medium itself (the small twelvemo, for example, often contains different kinds of words than those found in the larger folio format). There are biological constraints on how long we can attend to a single sentence. The nature and source of these entailments have thus got to be the subject of ongoing study, one that bridges a range of fields as wide as there are forces that constrain language use.

    Entailment is real; it suggests an entity. But how should we describe that entity, and with what terms or analogies can its depths be resolved? Sometimes there may be multiple cats, sitting apart in the same room. Sometimes what seems like two icebergs may in fact be one.

    Image from the Lytro Camera resolving objects at multiple depths

     

     

  • What happens in Hamlet?

    We perform digital analysis on literary texts not to answer questions, but to generate questions. The questions digital analysis can answer are generally not ‘interesting’ in a humanist sense: but the questions digital analysis provokes often are. And these questions have to be answered by ‘traditional’ literary methods. Here’s an example.

    Dr Farah Karim-Cooper, head of research at Shakespeare’s Globe, just asked on Twitter if I had any suggestions for a lecture on Hamlet she was due to give. Ten minutes later I had some ‘interesting’ questions for her.

    I began with WordHoard’s log-likelihood function, comparing Hamlet to the rest of Shakespeare’s plays. You can view the results of this as a tag cloud:

     

    a tag cloud: looks good, immediate, doesn't tell you much
    Tag cloud for Hamlet vs the rest of Shakespeare: black words are raised in frequency; grey words lowered; size indicates strength of effect


    which is nice, but for real text analytics you need to read the spreadsheet of figures. Word-frequency analysis is limited in many ways, but it can surprise you if you look in the right places and at the right things.

    not nice to look at, but much more information

     

    When I run log-likelihood, I always look first for the items that are lower than expected, rather than those that are raised (which tend to be content words associated with the topic of the text, and thus fairly obvious). I also tend to look at function words (pronouns, articles, auxiliary verbs) rather than nouns or adjectives.

    If you look for absences of high-frequency items, you are using digital text analysis to do the things it does best compared to human reading: picking up absence, and analysing high-frequency items. Humans are good at spotting the presence of low frequency items, items that disrupt a pattern (outliers, in statistical terms) – but we are not good at noticing things that are not there (dogs that don’t bark in the night) and we are not good at seeing woods (we see trees, especially unusual trees).

    The Hamlet results were pretty outstanding in this respect: very high up the list, with 3 stars, indicating very strong statistical significance, is a minus result for the pronoun ‘I’. A check across the figures shows that ‘I’ occurs in Hamlet about 184 times every 10,000 words (see the column headed ‘Analysis parts per 10,000’ – Hamlet is the ‘analysis text’ here), whereas in the rest of Shakespeare it occurs about 228 times every 10,000 words (see the column headed ‘Reference parts per 10,000’ – the reference corpus is the rest of Shakespeare) – so every 10,000 words in Hamlet have about 40 fewer ‘I’ pronouns than we’d expect.

     

    Or, to put it another way, Shakespeare normally uses ‘I’ 228 times every 10,000 words. Hamlet is about 30,000 words long, so we’d expect, all other things being equal, that Shakespeare would use ‘I’ 684 times. In fact, he uses it just 546 times – and WordHoard checks the figures to see if we could expect this drop due to chance or normal variation. The three stars next to the log likelihood score for ‘I’ tell us that this figure is very unlikely to be due to chance – something is causing the drop.
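    For anyone who wants to check the arithmetic, the statistic WordHoard reports here is Dunning’s log-likelihood (G2), which can be reproduced by hand. The sketch below uses the approximate counts given above; the size of the reference corpus (the rest of Shakespeare) is a round-number assumption for illustration, not a figure taken from WordHoard.

        # Dunning's log-likelihood (G2) for 'I' in Hamlet vs the rest of Shakespeare.
        import math

        def log_likelihood(a, c, b, d):
            """G2 for a word occurring a times in c tokens (analysis text)
            and b times in d tokens (reference corpus)."""
            e1 = c * (a + b) / (c + d)   # expected count in the analysis text
            e2 = d * (a + b) / (c + d)   # expected count in the reference corpus
            return 2 * (a * math.log(a / e1) + b * math.log(b / e2))

        hamlet_i, hamlet_tokens = 546, 30_000      # figures from the post
        ref_tokens = 800_000                       # assumed size, for illustration only
        ref_i = round(228 / 10_000 * ref_tokens)   # 228 occurrences per 10,000 words

        g2 = log_likelihood(hamlet_i, hamlet_tokens, ref_i, ref_tokens)
        print(f"G2 = {g2:.1f}")   # well above 10.83, the p < 0.001 ('***') threshold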

    Digital analysis can’t explain the cause of the drop: the only question it is answering here is, ‘How frequently does Shakespeare use “I” in Hamlet compared to his other plays?’. On its own, this is not a very interesting question. But the analysis provokes the much more interesting question, ‘Why does Shakespeare use “I” far less frequently in Hamlet than normal?’.

    Given literary-critical claims that Hamlet marks the birth of the modern consciousness, it is surprising to find a drop in the frequency of first-person forms. But for an explanation of why this might happen, you’ll have to attend Dr Karim-Cooper’s lecture, ask on Twitter: @DrFarahKC – or go back to the play yourself.

     

     

     

  • Shakespeare’s mythic vocabulary – and his invisible grammar

    Universities in the UK are under pressure to demonstrate the ‘impact’ of their research. In many ways, this is fair enough: public taxes account for the vast majority of UK University income, so it is reasonable for the public to expect academics to attempt to communicate with them about their work.

    University press offices have become more pro-active in seeking out stories to present to the media as a way of raising the profile of institutions. Recently, the Strathclyde press office contacted me after they read one of my papers on Strathclyde’s internal research database: they wanted to do a press release to see if any outlets would follow up on the story.

    The paper they’d read was a survey article I’d written for an Open University course reader. My article reported recent papers by Hugh Craig and Ward Elliott & Robert Valenza, which demolish some common myths about Shakespeare’s vocabulary (its size and originality – and see Holger Syme on this too) – and went on to argue that Shakespeare’s originality might lie in his grammar, rather than in the words he does not make up.

    Indeed they did want to pick up on the story, though I’d have preferred the article to have been a bit clearer, and not to have had a headline that was linguistic nonsense. The Huffington Post did a bit better.

    One particularly galling aspect of the stories: the articles failed to attribute the work on Shakespeare’s vocabulary to Craig or Elliott and Valenza, so it might have looked as though I was taking credit for other people’s work.

    Looking back, I don’t think I explained my ideas very well either to Strathclyde’s press office, or to the Daily Telegraph when they rang – hence the rather confused reports. But I was extremely careful to attribute the work to those who had done it – even to the point of sending my original text to the journalist I talked to, and pointing him to the relevant footnote. I did not expect a news story to contain full academic references of course – but a clearly written story could easily have mentioned the originators of the work.

    A minor episode, but it also made me think that there is a fundamental problem with trying to explain complex linguistic issues in the daily press – even if you use Newcastle United’s greatest goalscorers to illustrate the statistics. They want a clear story: you want to get the nuances across. Luckily, this blog allows me to make the full text of my article available (click through twice for a pdf):

    Shakespeare and the English Language

     

    Jonathan Hope, Strathclyde University, Glasgow, February 2012