Finding the Sherlock in Shakespeare: some ideas about prose genre and linguistic uniqueness

An unexpected point of linguistic similarity between detective fiction and Shakespearean comedy recently led me to consider some of the theoretical implications of tools like DocuScope, which frequently identify textual similarities that remain invisible in the normal process of reading.

A Linguistic Approach to Suspense Plot

Playing around with a corpus of prose, we discovered that the linguistic specs associated with narrative plot are surprisingly distinctive. Principal Component Analysis performed on the linguistic features counted by DocuScope suggested the following relationship between the items in the corpus:

I interpreted the two strongest axes of differentiation seen in the graph (PC 1 and PC 2) as (1) narrative, and (2) plot. The two poles of the narrative axis are Wuthering Heights (most narrative) and The Communist Manifesto (least narrative). The plot axis is slightly more complicated. But on the narrative side of the spectrum, plot-driven mysteries like “The Speckled Band” and The Canterville Ghost score high on plot, while the least plotted narrative is Samuel Richardson’s Clarissa (9 vols.). For now, I won’t speculate about why Newton’s Optics scores so astronomically high on plot. It is enough that when dealing with narrative, PC 2 predicts plot.
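For readers who want to see the mechanics behind a graph like this, here is a minimal sketch of Principal Component Analysis in plain Python. It is an illustration under stated assumptions, not the procedure we actually ran: the six texts, the four feature names, and all of the numbers are invented, and DocuScope's real categories are far more numerous. The sketch centers the feature counts, builds their covariance matrix, and finds the two strongest axes by power iteration.

```python
import math

# Invented feature rates (per 1,000 words) for six hypothetical texts.
# Neither the feature names nor the numbers come from DocuScope or the corpus.
FEATURES = ["past_tense_narration", "first_person",
            "abstract_argument", "reported_discovery"]
TEXTS = {
    "novel_a":    [62.0, 30.0,  8.0,  5.0],
    "novel_b":    [58.0, 25.0, 10.0,  6.0],
    "mystery_a":  [55.0, 20.0,  9.0, 21.0],
    "mystery_b":  [52.0, 18.0, 12.0, 24.0],
    "treatise_a": [12.0,  6.0, 48.0,  7.0],
    "treatise_b": [10.0,  4.0, 52.0, 18.0],
}


def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))


def center(rows):
    """Subtract each feature's mean, so the axes describe variation around the average text."""
    n = len(rows)
    means = [sum(column) / n for column in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]


def covariance(centered):
    """Sample covariance matrix of rows that have already been centered."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def principal_components(rows, k=2, iterations=1000):
    """Return (components, scores) for the k strongest axes of variation.

    Each component is found by power iteration on the covariance matrix,
    kept orthogonal to the earlier components by Gram-Schmidt projection.
    The sign of each component is arbitrary.
    """
    centered = center(rows)
    cov = covariance(centered)
    d = len(cov)
    components = []
    for _ in range(k):
        v = [float(i + 1) for i in range(d)]  # arbitrary non-zero starting vector
        for _ in range(iterations):
            v = [dot(cov_row, v) for cov_row in cov]
            for c in components:
                proj = dot(v, c)
                v = [x - proj * y for x, y in zip(v, c)]
            norm = math.sqrt(dot(v, v))
            v = [x / norm for x in v]
        components.append(v)
    # A text's score on an axis is its centered feature vector projected onto that axis.
    scores = [[dot(row, c) for c in components] for row in centered]
    return components, scores


components, scores = principal_components(list(TEXTS.values()))
for label, (pc1, pc2) in zip(TEXTS, scores):
    print(f"{label:12s} PC1={pc1:8.2f} PC2={pc2:8.2f}")
```

The labels "narrative" and "plot" are not produced by the analysis. PCA only returns axes and scores; calling an axis "narrative" is an interpretation made after seeing which texts land at its two poles.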

The fact that something as qualitative and amorphous as plot has a quantitative analogue leads to several questions about the meaning of the data tools like DocuScope turn up.

Linguistic Plot without Actual Plot

Because linguistic plot is quantifiable, it allows us to look for passages where plot is present to a relative degree. Given a large enough sample, it is more than likely that some relatively plotted passages will occur in texts that are not plotted in any normal sense. This would at minimum raise questions about how to handle genre boundaries in digital literary research.

Our relative-emplotment test (done in TextViewer) yielded intuitive results when performed on the dozen or so stories in The Adventures of Sherlock Holmes: the passages exhibiting the strongest examples of linguistic plot generally narrated moments of discovery, and moved the actual plot forward in significant ways. Often, these passages showed Holmes and Watson bursting into locked rooms and finding bodies.
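One way to picture a relative-emplotment test is as a ranking: each passage gets a score along the plot axis, and the highest-scoring passages are read first. The sketch below assumes that a passage's score is its feature counts, centered on the corpus means, projected onto the loadings of one component. That is my own simplification and not a description of how TextViewer works internally; the function name `rank_passages`, the loadings, the means, and the passage counts are all hypothetical.

```python
def rank_passages(passages, loadings, corpus_means):
    """Return (label, score) pairs sorted from most to least 'plotted'.

    passages     -- dict mapping a passage label to its list of feature rates
    loadings     -- weights of one principal component (hypothetical here)
    corpus_means -- mean rate of each feature across the whole corpus
    """
    scored = []
    for label, features in passages.items():
        score = sum(w * (x - m)
                    for w, x, m in zip(loadings, features, corpus_means))
        scored.append((label, score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Invented two-feature example; none of these numbers are real measurements.
example_loadings = [0.6, -0.8]
example_means = [10.0, 5.0]
example_passages = {
    "passage_1": [15.0, 5.0],
    "passage_2": [10.0, 10.0],
    "passage_3": [20.0, 0.0],
}

ranking = rank_passages(example_passages, example_loadings, example_means)
for label, score in ranking:
    print(f"{label}: {score:.2f}")
```

Because the scores are relative to the corpus means, the same passage could rank differently against a different corpus, which is one reason to be cautious when comparing 'plotted' passages across genres.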

When we performed the same test on the Shakespeare corpus, something intriguing happened. The passages identified by TextViewer as exhibiting linguistic plot looked very different from the corresponding passages in Sherlock Holmes. There were no dead bodies, no broken-down doors, and no exciting discoveries. Nonetheless, the ‘plotted’ Shakespeare scenes were remarkably consistent with each other. Perhaps most significant in the context of their genre, these scenes had a strong tendency to show characters putting on performances for other characters. Additionally, and this is fascinating even though it is probably a red herring, the ‘plotted’ Shakespeare scenes had an equally strong tendency to involve fairies.

The consistent nature of the ‘plotted’ Shakespeare scenes suggests that the linguistic specs associated with plot when they occur in Sherlock Holmes may have different, but equally specific, effects in other genres. The next step would be to find a meaningful correspondence between the two seemingly disparate literary devices that accompany linguistic plot – detectives bursting into rooms to solve murders, and plays within plays involving fairies. I have some hunches about this. But in many ways the more important question is what is at stake in using DocuScope to identify such unexpected points of overlap.

Enough measurable links between seemingly unlike texts could suggest an invisible web of cognates, which share an underlying structure despite their different appearances and literary classifications. Accordingly, we might hypothesize that reading involves selective ignorance of semantic similarities that could otherwise lead to the socially deviant perception that A Midsummer Night’s Dream resembles a Sherlock Holmes mystery.

The question, then, is this: if the act of reading consists in part of ignoring unfruitful similarities, then what happens when these similarities nonetheless become apparent to us? Looking back at the corpus graph, we begin to see all sorts of possibilities, many of which would be enough to make us literary outcasts if voiced in the wrong company. Could Newton’s Optics contain the most exciting suspense plot no one has ever noticed? Could Martin Luther be secretly more sentimental than Clarissa?

Estranging Capacities of Digital Cognates

I have been using the term ‘cognate’ to describe the relationship between linguistically similar but otherwise dissimilar texts. These correspondences will only be meaningful if we can connect them in a plausible way to our readerly understanding of the texts or genres in question. In the case of detective fiction and Shakespearean comedy, this remains to be seen. But our current lack of an explanation does not mean we should feel shy about pursuing the cognates computers direct us to. My analogy is the pop-culture ritual of watching The Wizard of Oz while playing the Pink Floyd album Dark Side of the Moon, started on the third roar of the MGM lion. The movie and the record sync up in a plausible pattern, prompting the audience to grasp a connection between the cognate genres of children’s movies and psychedelic rock.

If digital methods routinely direct our attention to patterns we would never notice in the normal process of reading, then we can expect them to turn up a large number of such cognates. If we want to understand the results these tools are turning up, we should develop a terminology and start thinking about implications – not just for the few correspondences we can explain, but also for the vast number we cannot explain, at least right now.

This entry was posted in Counting Other Things, Quant Theory, Shakespeare.


  1. Posted November 9, 2011 at 5:02 pm | Permalink

    Thanks for this marvelously suggestive post. I have a thought about the fairies. One of the things that Hope and I have noticed in our studies of Shakespeare’s plays is that _A Midsummer Night’s Dream_ is quite an anomaly: it has so much concrete description, it looks more like a history play than a comedy. But looking at your fairy passages, it strikes me that because we can’t actually *see* fairies doing the fantastical things they are believed to do, the only way to communicate those fantastic actions is through narration. Thus MSND is full of concrete nouns in a way comedy is not, because all of the cowslips have to be named to be seen in the mind’s eye. The scene from Pyramus and Thisbe above must accomplish something similar: the naive performers want to make a certain action “visible,” not knowing that spectators don’t expect them to narrate every concrete event on stage. These events are understood to be taking place by virtue of the story itself, which the audience already knows. So, you get what you might call “overdescription.” This alignment of the rude mechanicals with the fairies is just the kind of unexpected pairing that I like to find in this type of work: it allows us to think about what, dramaturgically, is being accomplished in two scenes that genre analysis might not be sensitive to. So, Bottom does as Oberon does. And when it’s criminals working their magic instead of fairies, Holmes does the work of “bringing hidden things to light” as well.

  2. Victor Lenthe
    Posted November 12, 2011 at 1:00 pm | Permalink

    Thanks, Mike. I suppose one way to test your hypothesis would be to isolate instances of narrated action in the Shakespeare corpus and see how they compare to everything else on DocuScope’s linguistic definition of plot. But your explanation of linguistic plot in MSND should also make us consider _Hamlet_, which contains prominent examples of the dramatic devices you identify in MSND. In _Hamlet_, a relatively large portion of the action actually occurs off-stage and is only narrated to the audience (I’m thinking of the long episode of Hamlet with the pirates, all the things that happen or are imagined happening in Paris, and the ambassadors negotiating with Norway’s aging king). In addition, as Marshelle Woodward was quick to seize on when I presented my findings to her last year, _Hamlet_ contains a long and important play within the play. Your comment now about overdescription in MSND reminds me that the mousetrap scene too involves a certain degree of overdescription — the ‘real’ audience simply needs to be caught up on the plot of _The Murder of Gonzago_ as efficiently as possible. Despite these elements, however, no scene from Hamlet stood out to DocuScope as especially representative of linguistic plot.

    This raises two questions for me. First, do narrated events and overdescription necessarily lead to linguistic plot, or are they only preconditions of linguistic plot? If these rhetorical features are simply DocuScope’s definition of linguistic plot, then the necessary narration and overdescription involved in rendering a play within a play may be the extent of the overlap we see between MSND and Sherlock Holmes. In this case, we should expect to see a similar overlap between _Hamlet_ and Sherlock Holmes. Second, if _Hamlet_ contains prominent examples of the two dramatic devices we suspect are associated with linguistic plot, then why did no scenes from _Hamlet_ stand out in TextViewer? This is more difficult to answer, but one possibility is that there is another factor besides narrated action involved in linguistic plot. For example, maybe the presence of supernatural agents in MSND leads to certain linguistic features more specific than just narration and overdescription.
