Quantification and the language of later Shakespeare



The written version of a paper we gave in Paris last year (2013) has just been published by the Société française Shakespeare. Here is the paper (which is in English), and here are the citation details:

To cite this article

Print reference

Jonathan Hope and Michael Witmore, « Quantification and the language of later Shakespeare », Actes des congrès de la Société française Shakespeare, 31 | 2014, 123-149.

Electronic reference

Jonathan Hope and Michael Witmore, « Quantification and the language of later Shakespeare », Actes des congrès de la Société française Shakespeare [Online], 31 | 2014, published online 29 April 2014, accessed 7 May 2014. URL: http://shakespeare.revues.org/2830


Hamlet in five words

g2g Hamlet


Farah Karim-Cooper asked us to write something for the Globe to Globe Hamlet site. Here it is.


Scotland’s Collections and the Digital Humanities

On 2nd May 2014 I’m presenting at the second event in this series, entitled ‘Working with Data’. This post is intended mainly as a record of the links I’ll mention, for those who come to the session, and as a resource for those starting out in text analysis. It may also be useful to others as a collection of material.

UPDATE: This is Mia Ridge’s page of resources for ‘Data Visualisation for Analysis in Scholarly Research’, a course she teaches at the British Library (and updates regularly). The list is excellent. Twitter: @mia_out

From my presentation:

http://blogs.ucl.ac.uk/transcribe-bentham/ - crowdsourced TEI transcriptions

http://www.textcreationpartnership.org/tcp-eebo/ - university-funded, double-keyed TEI transcriptions

http://www.tei-c.org/index.xml - Text Encoding Initiative: standards, training, resources, other projects

http://earlyprint.wustl.edu - examples of visualisations

http://voyant-tools.org - some basic text analysis tools

http://www.wordle.net - self-confessed ‘toy’ for word clouds

http://mallet.cs.umass.edu - a more serious set of text analysis tools

http://ucrel.lancs.ac.uk - Lancaster University’s UCREL site: a wide range of corpora and tools, including:

http://ucrel.lancs.ac.uk/vard/about/ - VARD, for automatic modernisation of spelling



Resources for text analysis

0          Introductions/anthologies available on the web

0.1       Literary Studies in the Digital Age: An Evolving Anthology, eds Kenneth M. Price (University of Nebraska, Lincoln) and Ray Siemens (University of Victoria)




Alan Liu, “From Reading to Social Computing”

David L. Hoover, “Textual Analysis”

Susan Schreibman, “Digital Scholarly Editing”

Charles Cooney, Glenn Roe, and Mark Olsen, “The Notion of the Textbase: Design and Use of Textbases in the Humanities”

Stéfan Sinclair, Stan Ruecker, and Milena Radzikowska, “Information Visualization for Humanities Scholars”

William A. Kretzschmar, Jr., “GIS for Language and Literary Study”

Tanya Clement, “Text Analysis, Data Mining, and Visualizations in Literary Scholarship”

Julia Flanders, “The Literary, the Humanistic, the Digital: Toward a Research Agenda for Digital Literary Studies”

Daniel Powell, with Constance Crompton and Ray Siemens, “Glossary of Terms, Tools, and Methods”


0.2       A Companion to Digital Literary Studies (Blackwell)



0.3       A Companion to Digital Humanities (Blackwell)



0.4       Misc: Matt Jockers http://www.matthewjockers.net/2013/01/03/advice-for-dh-newbies/


Ryan Cordell, ‘doing digital humanities’  http://ryan.cordells.us/s13dh/





1          Text Analysis


1.1       TextAnalytics 101 by John Laudun (@johnlaudun)




A very clear guide to basic text analytics, and why numbers might get you and your students into the language of texts.


The piece will set you up to do basic analysis with Python scripts if you want, but you don’t need to do this to follow the argument, which deals thoughtfully with the ‘why bother?’ of text analytics. Highly recommended.


1.2       Where to start with text mining




A slightly different tack from Laudun, as it stresses the need to compare large numbers of texts. Very clear about basic principles.


1.3       Lisa Spiro: introduction to text analysis, PowerPoint presentation – a good overview with further links, which should be understandable even without the talk




1.4       Michael Ullyot: data curation and an overview of text analysis – another excellent overview, specifically on Early Modern material, but relevant generally




1.5       Text Analytics: higher-level debates on The Waves



from this graduate DH class http://630dh.cforster.com/syllabus/



2          Network analysis


2.1       Map your Facebook network with Gephi: a tutorial





2.2       Scottbot on the Hartlib correspondence: heatmap and network visualisations







3          Literary History


3.1       DH is changing what literary history is – and suggesting that we don’t actually know what it is. Here is Ted Underwood (@Ted_underwood) on the rise and fall of first person in the novel:




and see other posts at http://tedunderwood.com


3.2       For a discussion of ‘influence’, and links to Matt Jockers’ work, see




3.3       Proportions of male/female pronouns:






5          Critique


5.1 Wendy Hui Kyong Chun








6          Lists/surveys of tools:


6.1       Jeffrey McClurken (University of Mary Washington), guide to digital history: http://mcclurken.org/





6.2       http://dirt.projectbamboo.org/

“Bamboo DiRT is a tool, service, and collection registry of digital research tools for scholarly use. Developed by Project Bamboo, Bamboo DiRT makes it easy for digital humanists and others conducting digital research to find and compare resources ranging from content management systems to music, OCR, statistical analysis packages to mindmapping software.”


6.3       http://journalofdigitalhumanities.org/about/

“The Journal of Digital Humanities (ISSN 2165-6673) is a comprehensive, peer-reviewed, open access journal that features the best scholarship, tools, and conversations produced by the digital humanities community in the previous quarter.

The Journal of Digital Humanities offers expanded coverage of the digital humanities in three ways. First, by publishing scholarly work beyond the traditional research article. Second, by selecting content from open and public discussions in the field. Third, by encouraging continued discussion through peer-to-peer review.”


6.4       http://digitalhumanitiesnow.org/

“Digital Humanities Now showcases the scholarship and news of interest to the digital humanities community through a process of aggregation, discovery, curation, and review. Digital Humanities Now also is an experiment in ways to identify, evaluate, and distribute scholarship on the open web through a weekly publication and the quarterly Journal of Digital Humanities.


Digital Humanities Now highlights work from the open web that has gotten the attention of the digital humanities community or is worthy of such attention, based on critical editorial review. Scholarship—in whatever form—that drives the field of digital humanities field forward is highlighted in the Editors’ Choice column. Additional news items of interest to the field—jobs, calls for papers, conference and funding announcements, reports, and recently-released resources—also are redistributed.”


6.5       http://selection.datavisualization.ch/

“Datavisualization.ch Selected Tools is a collection of tools that we, the people behind Datavisualization.ch, work with on a daily basis and recommend warmly. This is not a list of everything out there, but instead a thoughtfully curated selection of our favourite tools that will make your life easier creating meaningful and beautiful data visualizations.”

6.6       40 Essential Tools and Resources to Visualize Data







Jonathan Hope/May 2014


The Future of the Humanities Will Be Demand-Led


Gregory IX Approving the Decretals (detail), 1511. Fresco, Stanza della Segnatura, Palazzi Pontifici, Vatican

The following is an unpolished contribution to some recent debates about the wisdom of defending, or ceasing to defend, the humanities. In what follows, I do not discuss what is deep, rich, and wonderful about the humanities. People who already care already know. I believe the public discussion ought to start somewhere else.

When I think about the future of the humanities, I wonder why something that is so imaginative and absorbing – so obviously disconnected from “making stuff” and “getting ahead” – would ever be tolerated in our society. I think that’s where discussion of the fate of the humanities ought to start, since humanistic thinking of one form or another has been around for a long time, ever since universities were given, by papal gift, the power to confer their own degrees in the 13th century. Why on earth would the Pope give anyone that kind of freedom, which would eventually include freedom of university masters from prosecution for heresy? (Think of Aquinas in scholastic debate on the thesis: God does not exist.) Why let students read all kinds of potentially subversive things in the arts curriculum, even if it was in Latin? The brutal, pragmatic answer: the papal bureaucracy required literate scribes, and universities trained them. It was a deal the Pope had to make.

I wouldn’t shy away from making the same argument today. We need people who actually know how to read and write – who can communicate remotely in large, far-flung organizations. If you know how to write well, your ability to advance in a networked bureaucracy multiplies. Indeed, communication through such networks *is* work in the 21st century, so there are a lot of opportunities for humanists to ply their skills. Look at someone who is commanding the world from a BlackBerry. Many such people started out as good writers, even if they eventually arrived at the point where they could do their persuading telegraphically – with their thumbs!

The second thing that needs to be said about the humanities is this: the humanities exist to give fundamentalism a run for its money. (I assume fundamentalisms come in many forms: ethnocentric, theological, economic, scientistic, etc.) Get rid of the humanities, and you’ll be spending a lot more time with fundamentalists. In a democratic republic, the humanities are an infrastructure investment, providing the cultural equivalent of a flood barrier. This case is harder to make in a post-culture-wars world, but it is the strongest one I know. Yes, the strongest.

A third argument: global development demands humanistic learning as well as technological savvy. You cannot make intelligent investments, or avoid damaging military entanglements abroad, if you don’t have specific knowledge of other cultures. General Karl Eikenberry has talked eloquently about this, and Paul Smith of the British Council is organizing some events around the world on this topic. Smith, who is stationed here in Washington, talks often of an “activist humanities.” Perhaps we need “humanities rapid response teams” that can be dispatched at a moment’s notice to deal with situations where deep, cultural knowledge is urgently needed.

Finally, there is the question of humanities vs. academic humanities. The latter is shrinking, and so we may well be entering a post-academic age of the humanities. That might be OK. I think growth in the humanities (yes, growth) is going to be demand-led in the coming decades: as the number of professional academic humanists shrinks (and it will), the driver of humanistic thinking will be people – all kinds of people – who are puzzled by the mysteries of being human and want to talk about them. I see no reason to be anything but hopeful about that kind of future, since it is this population that will want (once again) to spend time studying the incredible texts and objects we humanists find so interesting and important. Humanities professors are a vital part of this broader, demand-led model for the humanities, and may at times influence the demand. In the long term, I suspect that we will want those professors back, and building demand – in schools, public libraries, around dinner tables – is what we ought to do next.




591 Early Modern Dramas plotted in PCA space, with ‘core’ group circled and variation boundary marked (line)

American/Australian tour

In March-April 2014, I’ll be in the USA giving a series of talks and conference presentations based around Visualising English Print, and our other work. In June I’ll be in Newcastle, Australia for the very exciting Beyond Authorship symposium.

I’ll address a series of different themes in the talks, but I’ll use this page as a single resource for references, since they are all (in my head at least) related.

Some of the talks will be theoretical/state of the field; some will be specific demonstrations of tools. The common thread is something like, ‘what do we think we are doing?’.

Here’s a general introduction (there’s a list of venues afterwards).


1 Counting things

Quantification is certainly not new in literary criticism, but it is becoming more noticeable, and, perhaps, more central as critics analyse increasingly large corpora. The statistical tools we use to explore complex data sets (such as Shakespeare’s plays or 20,000 texts from EEBO-TCP) may appear like magical black boxes: feed in the numbers, print out the diagrams, wow your audience. But what is happening to our texts in those black boxes? Scary mathematical things we can’t hope to understand or critique?

I want to consider the nature of the transformations we perform on texts when we subject them to statistical analysis. To some extent this is analogous to ‘traditional’ literary criticism: we have a text, and we identify other texts that are similar to or different from it:

How does Hamlet relate to other Early Modern tragedies?

This is a question equally suited to quantitative digital analysis and to traditional literary-critical approaches. The ways we define and approach our terms will differ between the two modes, as will the evidence employed, but essentially both answers to this question would involve comparison and assessment of degrees of similarity and difference.

But there is also something very different to traditional literary criticism going on when we count things in texts and analyse the resulting spreadsheets – something literary scholars may feel unable to understand or critique. What exactly are we doing when we ‘project’ texts into hyper-dimensional spaces and use statistical tools to reduce those spaces down to something we can ‘read’ as humans?
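One way to take some of the mystery out of the ‘reduce’ step is to do it by hand on a tiny example. The sketch below is illustrative only: the four ‘plays’ and their three feature rates are invented, the function names are hypothetical, and it finds just the first principal component (the single direction along which the texts differ most) by power iteration, rather than reproducing any particular statistical package.

```python
import math

# Invented rates per 1,000 words for three hypothetical features
# (these are not real counts from any play).
rates = {
    "play_a": [30.0, 12.0, 5.0],
    "play_b": [28.0, 11.0, 6.0],
    "play_c": [15.0, 20.0, 14.0],
    "play_d": [14.0, 22.0, 15.0],
}

def center(rows):
    """Subtract each column's mean so the cloud of texts sits on the origin."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]

def covariance(rows):
    """Sample covariance matrix of rows that have already been centred."""
    n, d = len(rows), len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) / (n - 1) for j in range(d)]
            for i in range(d)]

def first_component(cov, iterations=200):
    """Power iteration: multiply a vector by the covariance matrix and
    renormalise, repeatedly, until it settles on the direction of
    greatest variance."""
    d = len(cov)
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

def project(rows, axis):
    """Reduce each text to one number: its position along the axis."""
    return [sum(x * a for x, a in zip(row, axis)) for row in rows]

names = list(rates)
centred = center([rates[n] for n in names])
axis = first_component(covariance(centred))
scores = dict(zip(names, project(centred, axis)))

for name, score in scores.items():
    print(f"{name}: {score:+.2f}")
```

Each invented play is reduced from three numbers to one, and the two pairs end up at opposite ends of that single axis. What the axis means in literary terms is not something the arithmetic can tell us.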

Perhaps surprisingly, studying library architecture, book history, information science, and cataloguing systems may help us to think about this. Libraries organised by subject ‘project’ their books into three-dimensional space, so that books with similar content are found next to each other. Many statistical procedures function similarly, projecting books into hyper-dimensional spaces, and then using distance metrics to identify proximity and distance within the complex mathematical spaces our analysis creates.
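The shelf analogy can be sketched in a few lines. In this hedged example the three ‘texts’ are invented strings, the four-word vocabulary is an arbitrary assumption, and the function names are hypothetical. Each text becomes a point whose coordinates are relative word frequencies, and Euclidean distance stands in for whichever metric an actual study would choose.

```python
import math
from collections import Counter

# Invented toy 'texts'; a real study would use whole plays or books.
texts = {
    "text_a": "i love you and you love me",
    "text_b": "you and i and you and i",
    "text_c": "the king and the queen and the court",
}

# An arbitrary choice of features: one dimension per word counted.
vocabulary = ["i", "you", "and", "the"]

def relative_frequencies(text, vocabulary):
    """Turn a text into a point: one coordinate per vocabulary word."""
    words = text.lower().split()
    counts = Counter(words)
    return [counts[w] / len(words) for w in vocabulary]

def euclidean(a, b):
    """Straight-line distance between two points in the feature space."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

points = {name: relative_frequencies(t, vocabulary)
          for name, t in texts.items()}

def nearest(name):
    """The text 'shelved' closest to the named one."""
    others = [n for n in points if n != name]
    return min(others, key=lambda n: euclidean(points[name], points[n]))

print(nearest("text_a"))
```

With four words counted, each text sits in a four-dimensional space; counting forty or four hundred features changes the number of coordinates but not the calculation.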

Once we understand the geometry of statistical comparison, we can grasp the potential literary significance of the associations identified by counting – and we can begin to understand the difference between statistical significance and literary significance, and see that it is the job of the literary scholar, not the statistician, to decide on the latter. A result can be statistically significant, but of no interest in literary terms – and findings that do not qualify for statistical significance may be crucial for a literary argument.


2 Evidence

Ted Underwood has been posing lots of challenging, and productive, questions for literary scholars doing, or thinking about, digital work. Perhaps most significant is his recent suggestion that the digital causes problems for literary scholars, who are used to basing their arguments, and narratives, on ‘turning points’ and exceptions. Digital evidence, however, collected at scale, tells stories about continuity and gradual change. A possible implication of this is that the shift to digital analysis and evidence will fundamentally change the nature of literary studies, as we break away from a model that has arguably been with us only since the Romantics, and return (?) to one which traces long continuities in genre and form.

One way of posing this question: does the availability of large digital corpora and tools put us at the dawn of a new world, or are we just in for more (a lot more) of the same?


3 Dates and Venues

28 March 2014: Renaissance Society of America Plenary Session: Current Trends in the Digital Renaissance (7.00-8.30pm); New York Hilton Midtown, Sutton Rooms: ‘Paradigm Shifts in British Renaissance Literature: The Digital Future’ #rsa14

2 April 2014: CUNY Graduate Centre, 365 Fifth Avenue, New York (2.00-4.00pm); Room 6495: ‘Flatlands: book history, literary criticism, and hyper-dimensional geometry’

7 April 2014: University of Pennsylvania, Digital Humanities Forum (12-1.30pm); Room 625-6 Penn Library (Registration required): ‘Visualising English Print’

Graduate Class: Shakespeare and the History of the Book: ‘The Language of Macbeth’; preparatory reading: ‘Macbeth language HW2014’ (not a public event)

10 April 2014: Shakespeare Association of America, St Louis: 10-12 Seminar: ‘Shakespeare’s language: close and distant reading’; 12-1.30 and 3-6: Digital Projects Room: Visualising English Print; Translation Arrays (project demonstrations)


4 References and resources (these are grouped by topic)

(a) Statistics and hyper-dimensionality

Mick Alt, 1990, Exploring Hyperspace: A Non-Mathematical Explanation of Multivariate Analysis (London: McGraw-Hill) – the best book on hyper-dimensionality in statistical analysis: short, clear, and conceptually focussed

Most standard statistics textbooks give accounts of Principal Component Analysis (and Factor Analysis, to which it is closely related). We have found Andy Field, Discovering Statistics Using IBM SPSS Statistics: And Sex and Drugs and Rock and Roll (London: 2013, 4th ed.) useful.

Curiously, Early Modern drama, in the shape of Shakespeare, has a significant history in attempts to imagine hyper-dimensional worlds. E.A. Abbott, the author of A Shakespearian Grammar (London, 1870), also wrote Flatland: A Romance of Many Dimensions (London, 1884), an early science fiction work full of Shakespeare references and set in a two-dimensional universe.


The significance of Flatland to many who work in higher-dimensional geometry is shown by a recent scholarly edition sponsored by the Mathematical Association of America (Cambridge, 2010 – editors William F. Lindgren and Thomas F. Banchoff), and its use in physicist Lisa Randall’s account of theories of multiple dimensionality, Warped Passages (New York, 2005), pages 11-28 (musical interlude: Dopplereffekt performing Calabi-Yau Space – which refers to a theory of hyper-dimensionality).

Flatland itself is the subject of a conceptual, dimensional transformation at the hands of poet/artist Derek Beaulieu:

Derek Beaulieu, 2007, Flatland: a romance of many dimensions (York: information as material)


(b) Libraries and information science

In thinking about the physical development of libraries, I have enjoyed

James W.P. Campbell and Will Pryce, 2013, The Library: A World History (London: Thames and Hudson) [- a beautiful book, and images are available on Will Pryce's blog]


Richard Gameson, 2006, ‘The medieval library (to c. 1450)’, Clare Sargent, 2006, ‘The early modern library (to c. 1640)’, and David McKitterick, 2006, ‘Libraries and the organisation of knowledge’, in Elizabeth Leedham-Green and Teressa Webber (eds), The Cambridge History of Libraries in Britain and Ireland vol. 1, ‘To 1640’, pp. 13-50, 51-65, and 592-615

and also, on real and imagined libraries:

Craig Dworkin, 2010, The Perverse Library (York: information as material)

Alec Finlay, 2001, The Libraries of Thought and Imagination (Edinburgh: Polygon Pocketbooks)

Alberto Manguel, 2006, The Library at Night (New Haven: Yale)

Roberto Bolaño, 2008 [1996], Nazi Literature in the Americas (New York: New Directions)

Jane Rickard, 2013, ‘Imagining the early modern library: Ben Jonson and his contemporaries’ (unpublished paper presented at Strathclyde University Languages and Literatures Seminar)


On data, information management and catalogues:

Ann M. Blair, 2010, Too Much To Know: Managing Scholarly Information before the Modern Age (New Haven: Yale)

Markus Krajewski, 2011, Paper Machines: About Cards & Catalogs 1548-1929 (Cambridge, Mass.: MIT) – on Conrad Gessner

Daniel Rosenberg, 2013, ‘Data before the Fact’, in Lisa Gitelman (ed), ‘Raw Data’ Is an Oxymoron (Cambridge, Mass.: MIT), pp. 15-40 – combines digital analysis with a historicisation of the field, and the notion of ‘data’


(c) Ted Underwood and the digital future

Ted Underwood, 2013, Why Literary Periods Mattered: Historical Contrast and the Prestige of English Studies (Stanford) – especially chapter 6, ‘Digital Humanities and the Future of Literary History’, pp. 157-75, on the strange commitment to discontinuity in literary studies, and the tendency of digital, at-scale work to dissolve this into a picture of gradualism. Underwood cites his own work as an example of the resistance to gradualism that scholars using quantification find within themselves, and notes the temptation to seek a narrative of fractures, outliers, and turning points.

(See also Underwood’s discussion with Andrew Piper on Piper’s blog: http://bookwasthere.org/?p=1571 – on balancing numbers and literary analysis – and Andrew Piper, 2012, Book Was There: Reading in Electronic Times (Chicago) – see chapter 7, ‘By The Numbers’, on computation and DH.)

Ted Underwood and Jordan Sellers, 2012, ‘The Emergence of Literary Diction’, Journal of Digital Humanities, 1.2 (Underwood 2013: 166-70 discusses this paper as an example of the pull to ‘event’ narrative in literary history, despite the gradualism in quantitative work).


Also related:  Underwood’s blog: ‘The Stone and the Shell’ http://tedunderwood.com

Scottbot ‘Bridging Token and Type’ http://www.scottbot.net/HIAL/?p=40088


‘Longue durée’ history – Underwood has suggested that historians are more comfortable than literary scholars with the ‘long view’ that tends to come with digital evidence, and David Armitage and Jo Guldi have been arguing that the digital is shifting history back to this mode:

David Armitage and Jo Guldi, 2014, ‘The Return of the Longue Durée: An Anglo-American Perspective’, forthcoming (in French) in Annales. Histoire Sciences sociales, 69 [English version: http://scholar.harvard.edu/files/armitage/files/rld_annales_revised_0.pdf]

David Armitage, 2012, ‘What’s the big idea? Intellectual history and the longue durée’, History of European Ideas, 38.4, pp. 493-507


(d) Overview/examples of Digital work:

Early Modern Digital Agendas was an NEH-funded Institute held at the Folger Shakespeare Library in 2013. The EMDA website has an extensive list of resources for Digital work focussed on the Early Modern period.


An excellent account of starting text-analytic work by a newcomer to the field:







An example of an info-heavy, ‘reference’ site that makes excellent use of maps – The Museum of the Scottish Shale Oil Industry (!): http://www.scottishshale.co.uk




British Printed Images to 1700

Large number of heavy-weight funders/participants

Bpi1700 makes a database of ‘thousands’ of prints and book illustrations available ‘in fully-searchable form’. However, searching is text-based (see http://www.bpi1700.org.uk/jsp/)

Development halted? ‘Although the main development work has been completed, improvements will continue to be made from time to time. If you have problems or suggestions please contact the project (see the ‘contact’ page).’ http://www.bpi1700.org.uk/index.html

‘Print of the month’ ended May/June 2009 http://www.bpi1700.org.uk/research/printOfTheMonth/print.html


Japanese woodblock prints: http://ukiyo-e.org

The Ukiyo-e Search site is an amazing resource that represents something genuinely new (rather than just an extension of previously existing word-based catalogue searching), in that it allows searching via an uploaded image. For example, a researcher can upload a phone-image of a print she discovers in a library, and see if the same/similar prints have been previously described, and how many other libraries have copies or versions of the print. The search is ‘fuzzy’ and will often detect different states of altered woodblocks. [Thanks to @GilesBergel for the news that a similar functionality is coming to the Bodleian Ballads project.]

‘About’ page with demonstration video: http://ukiyo-e.org/about

The Ukiyo-e site was created by one person, John Resig, an enthusiast for Ukiyo-e, who saw the need for the site as a research tool. Development and expansion on-going.

‘The database currently contains over 213,000 prints from 24 institutions and, as of September 2013, has received 3.4 million page views from 150,000 people.’ http://ukiyo-e.org/about


And finally, pictures of my kittens Arthur and Gracie, who will feature in the talks:


Arthur can work a computer (he wrote the title of this post).




Arthur and Gracie
