The Novel and Moral Philosophy 2: Telling and Feeling, Aunts and Letters

Before I begin commenting on what I see in Serendip’s findings, I think it is worth providing some general information about the work from which the screen shot below is taken. The author, Charlotte Lennox (1730-1804), is best known for her novel The Female Quixote (1752), a picaresque about a romance addict who perpetually mistakes the plots of the novels she reads for reality itself. Euphemia (1790), the last novel Lennox published before she died, unfolds its narrative through the 12-year correspondence between two friends, Maria Harley and Euphemia Neville. The young women are separated by Euphemia’s move to colonial America with her husband, a British lieutenant. As a domestic novel, Euphemia devotes part of its narrative to depicting the unhappiness of this marriage. The novel is also remarkable for its depiction of American colony life in the province of New York during the middle of the eighteenth century from a British female perspective. In this novel and in her earlier Harriot Stuart (1750), Lennox drew on her own experience of growing up in colonial Albany. True to epistolary format, individual letters from the correspondents organize the novel, rather than chapters. The screenshot I am commenting on comes from Letter II.


When looking at the screen shot of Euphemia labeled TEXT: K062108.001 above, I saw what Witmore saw: words identifying social titles such as “Sir” and “Lady,” as well as family relations such as “aunt,” score highest as novel words. Yet there are slight distinctions between the familial words themselves. While “aunt” is shaded most deeply as a novel word, “uncle” is a shade lighter, and “daughter” is a shade lighter still.

The obvious conclusion to draw from these slight distinctions of shading is that the word “aunt” appears more frequently in novels than the word “uncle,” and the word “uncle” features more in novels than the word “daughter.” This is not to say, though, that eighteenth-century novels are more about aunts and uncles than about daughters. In fact a number of them are about or feature female characters that are at the stage in their lives where they are transitioning from being daughters to wives, and the novels themselves have the didactic purpose of educating female readers. In this regard, I think it’s important to recognize that topic word frequency might tell a different story from the frequency of a topic itself. The distribution of words in a topic matters.

Other high scoring novel words are “told” and “dear.” Both words in themselves are interesting to me as they are highly suggestive of the eighteenth-century novel’s history. “Dear,” for instance, is doubly significant in that history. It is a word that can be used to convey affectionate regard for someone when referring to or addressing them, or to address someone at the beginning of a letter. Both senses of the word are used on this page. Why does Serendip mark the word “dear” so intensely red in both cases?



The word “dear,” in its guise of addressing or referring to someone with affection, registers the age of sensibility in which the novel genre developed. (It does so without being an invention of the epistolary novel.)  Sensibility celebrated the ready expression of sympathy and feeling for other humans as a mark of high moral standing as well as social prestige. It promoted a language pattern that displays one’s emotional disposition towards another, such as attaching the word “dear” as a term of endearment to someone’s name. In its function of representing social relations between characters in day-to-day contexts, the eighteenth-century novel would inevitably capture such language patterns. As a popular medium of entertainment, the novel promulgated the patterns further. One might argue that the rise of the novel was in itself a major factor in sensibility’s growth and development as a pervasive cultural movement.

On the other hand, the word “dear” as a form of address used to begin a letter is also a high scorer as a novel word. Like the other usage of the word, it is invariably attached to a proper name, or the role of an identified person, such as “friend.” The first canonical novel to spur the cultural movement of sensibility was an epistolary novel, Samuel Richardson’s Pamela. So influential was this novel on the development of the novel genre that literary historians of the early 20th century identified it as the “first novel” written in English. This passage from Richardson’s Pamela (volume III, letter II) displays the same pattern observed in Euphemia:


A high number of eighteenth-century novels were written as epistolary narratives. Fiction that presented a series of letters written from the point of view of a character created a sense of intimacy and immediate involvement with narrative events in recognizably day-to-day contexts. Such experiences were not available in earlier forms of literature. This is one of the reasons why epistolary narrative was such a novel (new-seeming) and popular genre for eighteenth-century readers, and why it was conducive to the flourishing of sensibility in eighteenth-century culture.

A key moment in the novel takes place when one of the main characters, Mr. B., undergoes a conversion from villainous sexual aggressor to loving suitor of the heroine because he was so “moved” by the letters to her parents in which she details her ordeals: “O my dear girl! you have touched me sensibly with your mournful tale, and your reflections upon it.” Likewise, eighteenth-century readers were “sensibly touched” by Pamela’s letters (the very letters that make up the novel) to the extent that they could not get enough of the style of fiction in which they appeared. Eighteenth-century fiction writers imitated Pamela’s epistolary format as well as its theme of “virtue in distress” for several more decades. It is no surprise, then, that in the epistolary novel Euphemia (1790) by Charlotte Lennox (a novelist whom Richardson admired and supported), Serendip is picking up on “dear,” used as a form of address in beginning a letter, and as a more general novel topic word that appears over and over again.

It should not be surprising that the word “told” is picked up as a high scoring novel topic word as well. Telling is an activity of narration, and the novel itself is a narrative genre:


Narratives in eighteenth-century novels often involve the revelation of stories about unhappy or unfortunate events that happened in the past, events that affect the characters of the novel. This is certainly the case with Gothic novels, which derive their narrative tensions and conflicts from the inadvertent uncovering of long suppressed criminal events and actions. For instance, the epigraph for Ann Radcliffe’s non-epistolary Gothic novel, A Sicilian Romance, is a line from Hamlet spoken by Hamlet’s father: “I could a tale unfold.” In an epistolary narrative, where the fictional letter writer is reporting to the addressee what has already happened, the act of telling would be in the past tense, “told.”

“Telling,” as Stuart Sherman reminds us in Telling Time, is not only what narratives do (they tell what happens in time), but also what clocks do with time. The fact that Serendip shades “hour” a deep red conveys the time-specific quality of narrative during this period, its concern with the quotidian and the everyday above all. A sentence from the screenshot of a page from Lennox’s Euphemia certainly captures this sense of “hour”: “She complains of a pain in her breast; of shortness of breath; and declares, that when she has read to you an hour or two, she feels as if she was ready to expire with a strange oppression and faintness.” In this sentence, the quotidian context of the word “hour” is strikingly apparent in its connection to the experience of a character’s body at a specific moment in time.

The very premise of epistolary correspondence is to overcome not just spatial distance, but also temporal disconnection. The letter-writer wants to replace one’s absence from another’s life with a sense of living through the same experiences one has had by retelling those moments through the medium of the letter. By being specific about time—how long things take by the hour, for instance (“when she has read to you an hour or two”), this sense of intimacy with someone else’s everyday experiences becomes possible.

In shading darkly those words that denote familial relations and social standing (aspects of subjectivity that render oneself legible in day-to-day social settings), words related to conventions of epistolary and emotional address such as “dear,” and words signifying temporality such as “hour,” Serendip picks up on the novel’s reality effects. It picks up, in other words, the features of eighteenth-century novels that defined their groundbreaking method of realism.

Even as it confirms and reinforces critical commonplaces about the novel’s generic markers—especially those concerning its status as a unique mediator for realism, verisimilitude and individual personhood—Serendip also reveals generic tendencies that have not been so well-covered by literary historians. The words shaded blue—or, the words strongly related to moral philosophy—indicate this. Scholars such as Ian Watt have argued that the novel’s generic identity lies in the way it represents experience through seventeenth-century epistemologies, such as the subjectivism of René Descartes and the empiricism of John Locke. These philosophical tendencies are already apparent in the novel words—red-shaded—I have commented on above; the words all relate to the assumption that events and experiences derive from subjective standpoints, and are realizable through their placement on the time-space continuum.

However, the words shaded blue by Serendip reveal another level of philosophical realism in that they come out of a vocabulary of moral philosophy that Serendip helps us to recognize. What I notice about these words is that they are abstract nouns and impersonal words that are detached or detachable from human agents. They are also adjectives or adverbs that relate to philosophical measurements such as “natural,” “perfect,” “perfectly” and impersonal seeming actions, such as “enumerate” and “produced.”


I also notice that some “moderate” or “light” novel words (words shaded medium or light red as opposed to dark red) do not seem as if they would be out of place in the list of philosophy words from moral philosophy texts. These include “mind,” “consequence,” “opinion,” “life,” and such abstract nominalizations as “viewing” and “disposition.” (Indeed, given the way that the topic model works, some of these words would at times belong to that topic, but that is for Eric and Mike to explain in a future post.) This notable tendency toward abstraction in the novel might express some historically distinctive formality of social language in eighteenth-century England, or even a higher state of fusion, during this time, between works of fiction and non-fiction, or both. We should investigate these possibilities in a more focused way, perhaps with some of the techniques we are getting a glimpse of here.



The Novel and Moral Philosophy 1: What Does Charlotte Lennox Have to Do with Adam Smith?

The Visualizing English Print group is using new visualization tools to study genre dynamics in our corpus of texts spanning the years 1530-1799. While far from comprehensive, the corpus spans an interesting period in the history of English print. Most literary historians, for example, would agree that this is the period when the novel emerges as a distinct generic form. One of the tools we are using – a re-orderable matrix and topic modeling tool called Serendip – has generated topics that illuminate this development in our corpus. We began that work by first labeling all 1080 items by genre, something we had to do if we were going to see any patterns in the larger collection. (A downloadable spreadsheet of both the items and the genre labels applied to them appears here.) This post deals with two algorithmically generated topics that we found useful in identifying items we had previously labeled “prose fiction” and “philosophy.” The topics were generated through a process known as Latent Dirichlet Allocation (LDA), a technique commonly used to sort through web pages or documents in large collections of texts.

In exploring the VEP corpus with Serendip, we saw that our prose fiction texts – particularly the eighteenth century novels – were related to our philosophy texts in some interesting ways. We began to understand that relationship when we noticed that prose fiction and philosophy texts shared the topics that are present in large measure in each of them individually. (A topic is a collection of words that tend to co-occur with one another in individual documents; one might think of them as “ingredients” that are mixed together to create the full variety of documents in the corpus.) The first of these topics was characteristically present in texts classed as prose fiction, which was reasonably interesting. More interesting still: we found that the type of texts next most likely to contain words from this “prose fiction” topic were those we classed as “philosophy.” And the topic that was most prevalent in philosophy texts – in this case, works of moral philosophy by thinkers such as Smith and Hume – was also present in our prose fiction texts.

Why this overlap or sharing of ingredients? Where does the novel stop and moral philosophy begin? Before attempting an answer, it is important to understand what kinds of works qualified, in our naming game, for membership in these two groups. A complete list of works in the corpus, with their genre classes, can be found at the link above. Below we list only the works in these two classes. Our naming convention begins with a date of publication, short title of the work, author, and assigned genre class. Our dates here refer to the date of the edition transcribed by TCP in a corpus assembled at random: per our earlier post on the corpus, it is composed of 40 randomly selected texts per decade. The corpus was thus specifically not created for the purpose of exhaustively surveying any one literary form. Our purpose, rather, was to see how much we could learn from a relatively small sample of what TCP had transcribed.
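For readers who want to picture the sampling scheme just described, here is a minimal sketch of drawing a fixed number of texts at random from each decade. It is not the procedure we actually ran: the catalogue structure, the function name `sample_per_decade`, and the toy item names are all assumptions made for illustration.

```python
import random
from collections import defaultdict

def sample_per_decade(catalogue, per_decade=40, seed=0):
    """Pick up to `per_decade` items at random from each decade.

    `catalogue` is a list of (year, title) pairs. This structure is
    hypothetical and stands in for whatever metadata a real catalogue has.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    by_decade = defaultdict(list)
    for year, title in catalogue:
        by_decade[year // 10 * 10].append((year, title))
    sample = []
    for decade in sorted(by_decade):
        items = by_decade[decade]
        k = min(per_decade, len(items))  # a thin decade yields fewer items
        sample.extend(rng.sample(items, k))
    return sample

# Toy catalogue: ten invented items for every year from 1530 to 1799.
catalogue = [(year, f"item-{year}-{i}")
             for year in range(1530, 1800) for i in range(10)]
corpus = sample_per_decade(catalogue)
print(len(corpus))
```

With 27 decades between 1530 and 1799 and 40 texts per decade, a sample of this kind contains 1080 items, the size of the corpus mentioned at the start of this post.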

Fictional Prose:

1588 PandostoTriumphOfTime Greene, Robert, 1558-1592
1639 MoresUtopia More, Thomas
1634 StrangeMetamorphosisOfMan Brathwait, Richard, 1588?-1673
1667 LovingEnemyATrueHistory Camus, Jean-Pierre, 1584-1652.|Wright, John
1659 GovernmentOfWorldInMoon Cyrano de Bergerac, 1619-1655.|St. Serfe, Thomas, fl. 1668
1668 LifeOfMeritonLatroon Head, Richard, 1637?-1686?
1680 TheEnglishRoguePart2 Head, Richard, 1637?-1686?
1700 HistoryOfChildrenInTheWood
1572 SchoolOfWiseConceits Blague, Thomas, d. 1611
1759 PoliticalRomanceToYork Sterne, Laurence, 1713-1768
1799 TisAllForTheBest More, Hannah, 1745-1833
1724 HistoryOfJohnOfBourbon Aulnoy, Madame d’ (Marie-Catherine), 1650 or 51-1705
1753 SirCharlesGrandisonV1 Richardson, Samuel, 1689-1761
1753 SirCharlesGrandisonV5 Richardson, Samuel, 1689-1761
1749 TomJonesV1 Fielding, Henry, 1707-1754
1749 TomJonesV3 Fielding, Henry, 1707-1754
1789 Arundel Cumberland, Richard, 1732-1811
1712 AppendixToJohnBull Arbuthnot, John, 1667-1735
1797 FantomNewFashionedPhilosopher More, Hannah, 1745-1833
1748 RoderickRandomV2 Smollett, Tobias George, 1721-1771
1748 ClarissaV1 Richardson, Samuel, 1689-1761
1751 ClarrissaV8 Richardson, Samuel, 1689-1761
1777 CharlesCharlotteV1 Pratt, Mr. (Samuel Jackson), 1749-1814
1777 CharlesCharlotteV2 Pratt, Mr. (Samuel Jackson), 1749-1814
1764 CastleOfOtranto Walpole, Horace, 1717-1797
1763 HistoryLadyJuliaMadeville Brooke, Frances, 1724?-1789
1794 AdventuresOfHughTrevor Holcroft, Thomas, 1745-1809
1790 JuliaNovelAndPoems Williams, Helen Maria, 1762-1827
1752 FemaleQuixote Lennox, Charlotte, ca. 1729-1804
1758 HenriettaTwoVolumes Lennox, Charlotte, ca. 1729-1804
1790 EuphemiaFourVolumes Lennox, Charlotte, ca. 1729-1804
1782 CeciliaV3 Burney, Fanny, 1752-1840
1782 CeciliaV5 Burney, Fanny, 1752-1840
1741 PamelaV3 Richardson, Samuel, 1689-1761
1741 PamelaV4 Richardson, Samuel, 1689-1761
1785 RecessTaleOfOtherTimes Lee, Sophia, 1750-1824
1795 HenryFourVolumes Cumberland, Richard, 1732-1811
1776 PupilOfPleasure Pratt, Mr. (Samuel Jackson), 1749-1814
1753 ShakespeareIllustrated Lennox, Charlotte, ca. 1729-1804
1792 AnnaStIvesNovel Holcroft, Thomas, 1745-1809
1766 VicarOfWakefieldTale Goldsmith, Oliver, 1730?-1774
1788 MusicalTourMrDibdin Dibdin, Charles, 1745-1814
1775 LiberalOpinionsAnecdotes Pratt, Mr. (Samuel Jackson), 1749-1814


Philosophy:

1534 ErasmusAgainstWar Erasmus, Desiderius, d. 1536
1532 DespisingTheWorld Erasmus, Desiderius, d. 1536.|Paynell, Thomas
1531 TreatiseSufferFriendsDeath Erasmus, Desiderius, d. 1536
1590 RoyalExchangeAphorisms Rinaldi, Orazio.|Greene, Robert, 1558?-1592
1614 LabyrinthOfMansLife Norden, John, 1548-1625?
1576 AnatomyOfTheMind Rogers, Thomas, d. 1616
1580 PatternOfAPassionateMind Rogers, Thomas, d. 1616.|Rogers, Thomas, d. 1616.|H. W
1561 CicerosFiveQuestions Cicero, Marcus Tullius.|Dolman, John
1675 FreedomOfWill Sterry, Peter, 1613-1672
1741 EveryManHisOwnWayEpistle Duck, Stephen, 1705-1756
1752 TheRambler Johnson, Samuel, 1709-1784
1740 TreatiseHumanNatureAbstract Hume, David, 1711-1776
1741 EssaysMoralAndPolitical Hume, David, 1711-1776
1759 EpistlesPhilosphicalAndMoral Kenrick, W. (William), 1725?-1779
1734 EssaysOnSeveralSubjects Forbes of Pitsligo, Alexander Forbes, Lord, 1678-1762
1751 EssaysOnTheCharacteristics Brown, John, 1715-1766
1759 TheoryMoralSentimentsSmith Smith, Adam, 1723-1790
1734 EssayonMan Pope, Alexander, 1688-1744

A look at these lists confirms that our corpus contains significant examples of both the eighteenth-century novel (Richardson, Burney, Lennox) and important texts in the history of moral philosophy, for example, Adam Smith’s Theory of Moral Sentiments. Noting these landmarks, we want now to explore this overlap in vocabularies and share some preliminary thoughts about why novels share the vocabulary of moral philosophy and how those vocabularies function in each genre.

The next three posts are structured as a dialogue, beginning with some remarks by Michael Witmore (a Serendip user) and Eric Alexander (Serendip’s designer). These remarks focus on how Serendip helped them to pinpoint this kinship between the two genres. In the next post, we have a “reaction” from a scholar of the Eighteenth Century Novel, Julie Park, who was recently a fellow at the Folger Shakespeare Library where Serendip was tested. Her post, entitled “Telling and Feeling, Aunts and Letters,” introduces some historical context for the development of the eighteenth century novel, moving on to show how the topic words associated with the prose fiction texts contribute to the latter’s project of rendering everyday life and moral sensibility for readers. Park offers specific readings of some of the topic words that Serendip flagged as highly present in our clusters of topic words, offering the perspective of a new user/interpreter on the results produced by a software tool still in development. In a final post entitled “What Does Lennox Do with Moral Philosophy Words?” Witmore expands on Park’s analysis, offering an interpretation of the differences between the two topical fields we are associating with the novel and moral philosophy.


We begin with a few words about what Serendip is and how it works. At its highest level, Serendip allows users to visualize how topics are distributed across a document set. “Topics,” in this instance, are significant collections of words (extracted by an algorithm known as Latent Dirichlet Allocation, or LDA) that tend to occur in the same documents across a corpus. Serendip displays the occurrence of these topics in a re-orderable matrix that plots documents, in the vertical axis, against topics, in the horizontal axis, indicating individual proportions with circular glyphs of varying size. Documents can be displayed individually or in aggregate groups. After some tuning by Alexander, who is the original designer of Serendip, a user (in this case, Witmore) takes the tool and begins to explore these topics, looking at what words they contain and what texts score highest on each topic. The power of the tool is the ability it gives its user to re-order the matrix according to individual topics, texts, or text groups.
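To make the idea of a re-orderable matrix concrete, here is a small sketch of the underlying operation: documents as rows, topics as columns, and a sort on one column. This is not Serendip’s code. The function name `reorder_by_topic`, the “Legal” topic label, the “A Statute” document, and every number are invented for illustration.

```python
# Rows are documents, columns are topics; each cell is the share of the
# document's words assigned to that topic. All numbers are invented.
topics = ["Novel", "Moral Philosophy", "Legal"]
matrix = {
    "Henrietta": [0.40, 0.15, 0.01],
    "Theory of Moral Sentiments": [0.05, 0.50, 0.02],
    "A Statute": [0.01, 0.03, 0.60],  # hypothetical document
}

def reorder_by_topic(matrix, topics, topic):
    """Return document names sorted by descending share of `topic`."""
    col = topics.index(topic)
    return sorted(matrix, key=lambda doc: matrix[doc][col], reverse=True)

# Re-ordering on a different topic changes which documents rise to the top.
by_novel = reorder_by_topic(matrix, topics, "Novel")
by_moral = reorder_by_topic(matrix, topics, "Moral Philosophy")
print(by_novel)
print(by_moral)
```

The glyph sizes in the real interface correspond to the cell values; the sketch keeps only the ordering step.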

We are not going to discuss how topic modeling works in this post. (A good explanation can be found on Ted Underwood’s blog.) We do want to show something that happened when we began exploring this corpus using the topics that had been generated for us. You’ll see several screen shots below. For the time being, focus on the center pane with the yellow circles that look like planets. Across the top are the topics, which were named according to Witmore’s best guess at what they captured in texts. (Naming topics is a task that seems to have been designed for human beings: the judgments are highly contextual and built upon the study of examples.) Witmore’s topic names were based, first, on his examination of the word distribution in that topic (the window at right labeled “Novel”), but also on his knowledge of the works displayed in the lower right hand pane. (The lower right hand pane displays individual texts within a given subgroup of texts – here the ones that our bibliographer had labeled “prose fiction”). A lot of this is subjective, which is as it should be.


On this screen, Witmore had selected the topic which he had named “Novel” at the top left portion of the page and then re-ordered the matrix to show all of the genre types which contain those topic words. (The genres are listed vertically in descending order down the red column at left.) The size of these circles represents the frequency with which this topic occurs in a given group of texts; additional information about outliers is furnished by the Saturn-like rings. We can also disaggregate this group and see how individual texts score on this topic, again in descending order:


Witmore’s initial name for this topic was “Novel,” which seems to accord well with the actual texts that are highly rated on this topic: Charlotte Lennox’s Henrietta, followed by two parts of a Richardson novel, a few dramas, and then more novels by Lennox and Richardson. Knowing that he needed to consult an expert, he decided to talk to Julie Park, a scholar of eighteenth century literature, who he hoped could help him understand this topic. The initial identification of this topic, however, seemed right given that the matrix in the previous screenshot identifies texts classed as “Fictional Prose,” “Autobiography,” “Drama,” “Travelogue,” and “Biography” as high scorers on this topic. (“Legal Prose,” not so much, which is all for the good.)
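The group-level view described above, in which genre groups are ranked by how strongly one topic is present in them, might be approximated as follows. We assume here that a group’s score is the plain average of its documents’ scores; Serendip’s actual aggregation may differ, and the numbers and the function name `rank_genres` are invented.

```python
from statistics import mean

# (genre label, share of the "Novel" topic) for a few invented documents.
docs = [
    ("Fictional Prose", 0.42),
    ("Fictional Prose", 0.30),
    ("Drama", 0.20),
    ("Drama", 0.10),
    ("Legal Prose", 0.01),
]

def rank_genres(docs):
    """Average the topic share within each genre, then sort descending."""
    groups = {}
    for genre, share in docs:
        groups.setdefault(genre, []).append(share)
    means = {genre: mean(shares) for genre, shares in groups.items()}
    return sorted(means.items(), key=lambda pair: pair[1], reverse=True)

ranking = rank_genres(docs)
for genre, score in ranking:
    print(f"{genre}: {score:.2f}")
```

An average hides the spread inside each group, which is presumably why the interface adds rings for outliers.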

Neither Witmore nor Park was surprised to see that the words making up the “Novel” topic (mr, mrs, lady, madam, sir, miss, dear) occur frequently in epistolary novels, which make up a large proportion of this group. For structural reasons, the narrative voice of epistolary novels must register and mark an awareness of addressee (Mr., Sir, etc.); letters also recount dialogue (and so, once again, use terms of address and quotational words like “cried,” “told,” “replied,”). The drama of these novels is a social one; we are not surprised to find words that tag an individual’s social standing. (Technical terms from geometry or botany are not featured high on this list, for example.) The initial finding suggested to us that we were operating in the same universe as the tool; it was doing things we understood.

But you can always know what you know in new ways and you can also try to describe that knowledge in different terms. This is what we were interested in doing with the tool that Eric had built. Re-ordering was the next step in the process.

Look now at a second re-ordering of the matrix, this time on the basis of a topic named “Moral Philosophy” which is the third column to the right in light blue. The topic words here are obviously abstract – the highest scorers are words like “object,” “mankind,” “idea,” “system” – but further down the list, they seem to focus on the dynamics of moral deliberation. “Sentiment,” “moral,” “characters,” “propriety” and “sentiments” are all words that seem useful in this context. (One never knows for sure how words are going together or behaving, of course, until one sees these words working in a text.) Here again, the ratings of genre groups in descending order seemed plausible, beginning with “Philosophy”  and then moving through “Argumentation” and other forms of “Nonfiction Prose.”


We get an even better sense when we rate items on this topic at a more granular level, going work by work in descending order. The “Moral Philosophy” topic – the blue, leftmost column – is now rating individual works:


An abstract of David Hume’s Treatise of Human Nature is the top scorer here, and a little further down one sees Adam Smith’s Theory of Moral Sentiments. Calling this topic “Moral Philosophy” rather than “Natural Philosophy” or “Metaphysics” seemed like the right move.

Now look at what happens when we re-organize the matrix according to the human generated genre designations on the left hand side – essentially asking which computer generated topics a human designated genre group is made up of. Returning to a view that shows us the groups down the left hand side, we re-ordered the matrix according to the topic scores of texts that a human being has classified as “Fictional Prose:”


“Fictional Prose” texts are, as a group, rated horizontally on their prevalent topics, again in descending order, now from left to right. What we are seeing now are the topics of which “Fictional Prose” texts are generally composed. The first one listed is “Novel,” to which we say, “so far, so good.” But look just to the right. Going next in sequence, we see that “Moral Philosophy” has moved across the screen to become the second most highly ranked topic for this type of text, followed closely by another topic named “Tales of Chance and Virtue.”
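This second kind of re-ordering turns the question around: instead of asking which texts score highest on a topic, it asks which topics score highest within a group of texts. A minimal sketch follows, with invented numbers chosen only to reproduce the ordering reported above; the “Legal” topic label and the function name `rank_topics` are likewise assumptions.

```python
# Average topic shares for one genre group. The values are invented and
# only chosen so that the ordering matches the one described in the post.
fictional_prose = {
    "Tales of Chance and Virtue": 0.11,
    "Legal": 0.01,  # hypothetical topic label
    "Novel": 0.30,
    "Moral Philosophy": 0.12,
}

def rank_topics(group_shares):
    """Return topic names sorted by descending share within the group."""
    return sorted(group_shares, key=group_shares.get, reverse=True)

ordered = rank_topics(fictional_prose)
print(ordered)
```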

Now things become interesting. Why would prose fictional texts, largely epistolary and high scorers on the “Novel” topic, also be associated with the “Moral Philosophy” topic? What does Charlotte Lennox do that Adam Smith does as well?

To answer this question, we needed to begin looking at the topic words in context, which we did through Serendip’s ability to drill down into the documents, allowing us to view passages. We generated several views of the texts that showed texts by Charlotte Lennox and Adam Smith with topic words highlighted in different colors (red for the novel, blue for moral philosophy). To get a sense of what the “novel” words in red are actually doing in context, we asked Julie Park to produce the reflection that follows in the next post, which begins with an analysis of novel words in Charlotte Lennox’s Euphemia. We also furnished her with several screenshots of Adam Smith’s Theory of Moral Sentiments, since this text contained a significant number of topic words that we are associating with moral philosophy. We post here a few screenshots of each work as a preface to the next installment.
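The coloured views described above can be imitated, very roughly, by tagging each word of a passage with the topic it belongs to. The sketch below makes two simplifying assumptions that do not hold for a real topic model: each word belongs to at most one topic, and membership is all-or-nothing rather than weighted. The word lists are short samples of topic words quoted in these posts, the function name `tag_words` is ours, and the second test sentence is invented.

```python
import re

# Small samples of the topic words quoted in these posts. The real topics
# contain many more words, each with its own weight.
NOVEL = {"mr", "mrs", "lady", "madam", "sir", "miss", "dear", "told", "aunt"}
MORAL = {"object", "mankind", "idea", "system", "sentiment", "moral", "propriety"}

def tag_words(text):
    """Return (word, colour) pairs; colour is 'red', 'blue', or None."""
    tagged = []
    for word in re.findall(r"[A-Za-z']+", text):
        key = word.lower()
        if key in NOVEL:
            colour = "red"    # novel topic
        elif key in MORAL:
            colour = "blue"   # moral philosophy topic
        else:
            colour = None     # not highlighted
        tagged.append((word, colour))
    return tagged

pamela = "O my dear girl! you have touched me sensibly with your mournful tale"
invented = "My dear aunt told Sir Charles that propriety is a moral sentiment."

pamela_hits = [(w, c) for w, c in tag_words(pamela) if c]
invented_hits = [(w, c) for w, c in tag_words(invented) if c]
print(pamela_hits)
print(invented_hits)
```

Shading by intensity, as in the screenshots, would require the per-word weights that this sketch leaves out.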



Posted in Visualizing English Print (VEP)

Digital approaches to the language of Shakespearean Tragedy

This post supplies data and further diagrams for Digital approaches to Shakespearean tragedy to be published in the forthcoming Oxford Handbook of Shakespearean Tragedy, edited by Michael Neill and David Schalkwyk.

You can download our main spreadsheet as an Excel file, with details of all plays included in the study, and frequencies for Docuscope LATs:

Tragedy Data

And here are distribution box plots for some of the features we discuss in the paper. Each box plot gives the distribution of one LAT in the entire corpus of printed drama (554 plays). Frequency values are along the horizontal axis, with number of plays corresponding to each frequency score plotted on the vertical axis. The shaded areas indicate where tragedies are placed within the overall corpus. The whisker plot above the bar chart shows outliers (black dots for tragedies, grey for other plays). Note how the distributions for these LATs in tragedies are shifted to the right, indicating an increase in frequency (and note the tendency for outliers to be tragedies).
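For readers unfamiliar with box plots, the summary they draw can be computed directly. The sketch below uses Python’s standard `statistics.quantiles` to get the quartiles of one feature for two groups of plays. The frequencies are invented and the function name `five_number_summary` is ours; the real values are in the spreadsheet linked above.

```python
from statistics import quantiles

def five_number_summary(values):
    """Minimum, lower quartile, median, upper quartile, maximum."""
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)

# Invented per-play frequencies for a single LAT (for example, Anger).
tragedies = [0.8, 1.1, 1.3, 1.6, 2.4]
others = [0.2, 0.4, 0.5, 0.7, 0.9]

tragedy_summary = five_number_summary(tragedies)
other_summary = five_number_summary(others)
print(tragedy_summary)
print(other_summary)
```

A distribution that is “shifted to the right” shows up here as higher quartiles for the tragedies than for the other plays.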

[Box plots of LAT distributions, tragedies shaded: Anger, Fear, Negativity, Stds Neg, Sad]

[Tragedies vs. rest: Autobio, First Person, Self Disclosure]


Adjacencies, Virtuous and Vicious, and the Forking Paths of Library Research

Folger Secondary Stacks, western view


Browsable stacks – shelves of books that you can actually look at, pull off the shelf, read a while, and put back. They’re wonderful. Folger readers regularly comment on the fact that they can walk freely through the stacks of the secondary collection, which in our case means books published after 1830. That collection is arranged by Library of Congress call number, and many know the system intuitively after years of library work. (I frequently find myself in the PRs and PNs.)

Recently I was looking through section PN6420.T5 for books on early modern proverbs, a topic I have been writing about for years. I was looking for Morris Palmer Tilley’s collection, A Dictionary of Proverbs in England in the Sixteenth and Seventeenth Centuries (Michigan: University of Michigan Press, 1950). There it was, right where it was supposed to be: a landmark piece of scholarship that is the first source for anyone interested in the topic. Yet this was only the first stop. On the shelves above and below this important source were about 30 other books on the subject, some of which I began to explore. Some very useful books turned up next to the one I had initially intended to find. Some of them have even turned up in my footnotes, the ultimate test, perhaps, of a book’s usefulness to a scholar.

Stack browsers are on the lookout for this kind of happy accident. You go into the stacks looking for this book, but another, more interesting, happens to be nearby. Now you can have a look, nibble around the edges of the promising title, which is an excellent form of procrastination if you are stuck or unready to begin writing. Having done my share of meandering in open stacks, I am intrigued when readers describe these moments of discovery – which after all are part of the natural progression of research – as happy accidents or the products of chance. Aren’t accidents things that you cannot, by definition, bring about or encourage?

The fact remains that libraries are set up to make such accidents happen. They arrange books on the shelves in a certain way – not at random, but on a plan designed to increase the likelihood that, nearby the book you think you want, there will be others you also want to read. When someone says, “and then I happened upon this great book,” they may be describing the advantages of the library’s structured arrangement of books by (say) subject matter. “Lucky finds” are partly an effect of a classification system and partly one of the physical arrangement of the space; libraries are designed to promote them.

Such “encouragable accidents” are really the consequence of a simple principle that governs the entire space of the library: that of structured adjacency. As I will try to show in a moment, this principle can be seen at work in both the physical spaces of the stacks and the digital discovery spaces designed to give us access to the collection. The root of the word adjacency is the Latin verb jacere, which means to throw. When books appear side by side on a library shelf, their adjacency is not a product of chance: they have been placed (hopefully not thrown) together so that one is next to another of similar kind. How might one structure such adjacencies? One technique would be to shelve books by size. In some medieval monasteries, books of a similar size were placed on the same shelf. In addition to saving shelf space (think about it), this arrangement located collection access in the mind of the librarian or keeper who knew where different titles were. These collections weren’t designed to be browsed, so the principle made sense.

Now think of a modern, browsable stack of books arranged along the Library of Congress call number model. Here the principle of access exists in two places: the launching point of the card catalogue (which tells you where in the stacks to start looking) and then on the shelves themselves, where books on similar subjects are grouped together. The idea here is to use the intellectual scaffolding of subject cataloguing to structure the physical space of the collection. With respect to subject, physical adjacencies on the shelf become virtuous instead of vicious.

What is a virtuous adjacency? It is a collocation of two items likely to appeal to any-user-whatever whose item search is itself structured along principles which the cataloguing supports: usually author, date, title, subject, although there are many other forms of search. It doesn’t matter who you are or how deep your knowledge of the subject is: if you know enough to find one book on proverbs, you can find many in the Library of Congress system, because you are helped along by the arrangement in the physical space of the library. That arrangement is principled and intentional. It is virtuous.

But every virtuous adjacency can quickly become vicious, and this is because virtue (as I’m calling it) resides in the principles that inform any given reader’s search for a book. Suppose I know about Tilley’s book on proverbs, and I know it by title. Once I am pointed to that book by the catalogue, I go and look at it, and I see some terrific proverbs about apes, for example, “To make her husband her ape.”  I start to think about this. Maybe what I’m really interested in is how the behavior of apes helps people think about the nature of mimicry and mimesis in the early modern period. (Early modern references to apes are often veiled references to the mimetic power of artists, who “ape” nature.)

Proverb from Tilley's A Dictionary of Proverbs in England

Now the principle that governs the space flips. What I need to do is go to H. W. Janson’s magnificent Apes and Ape Lore in the Middle Ages and Renaissance, which has the call number GR730.A6 J3. What made the first adjacency surrounding “books about proverbs” virtuous was the collocation of books in space by subject. That was where the manufactured serendipity happened. But now that very principle of adjacency has become an impediment – it has become vicious – because Tilley is not surrounded by books about apes. I could search again under the latter subject, but that would not be adjacency, it would be search. We advert to catalogues in order to re-orient ourselves within the physical universe of books-on-shelves, or the virtual space of digital collections. But we cannot simply wander into that next thing that meets our new interest. To do this, I really would have to be lucky: “Oh look, there’s Janson’s book on apes, just lying across the aisle….”

The moral of this story – or is it the proverb? – is that “every virtuous adjacency is also vicious.” When it comes to the arrangement of books, virtue is relative: it depends upon what the researcher thinks he or she is looking for, a thinking that often changes in the course of research. Once you’ve flipped from proverbs to apes, the physical arrangement of books on shelves is not going to help you. The virtuous arrangement that allowed you to lay your hands on that first book (“hey, my favorite book on proverbs!”) is now working against you (“shouldn’t I be looking at books about apes?”).

As we gain greater access to the contents of books, and as digitized books and their machine-actionable contents become more and more arrangeable with the assistance of mathematical principles like the topic model, the physical space of search is being transformed into something more plastic, even Borgesian. While the physical space of the library cannot be re-plotted whenever the research forks out onto another garden path, researchers have more options in the virtual space of text searching to find cut-throughs. There is a problem here, of course, which is that in such a virtual world of association, there are infinite pathways for association. It becomes more challenging to figure out where to go next when you could go anywhere.

But there may be other ways to multiply virtuous pairings given the tools that librarians of the future will create. Instead of starting with Tilley and then hoping that I’ll be lucky enough to bump into Janson, I might rely on my mobile device to reach into the contents of the book I’m interested in now and, based on a principle of adjacency I supply, rearrange all the books in the library around that first book in concentric layers of immediacy of different types – layers that might allow readers to move from one virtuous adjacency to the next. There is no way around the virtuous/vicious symmetry, since it is precisely that symmetry which makes research necessary: in exploring the connection between these five books on proverbs, you are giving up the opportunity to think about that other, really, really good book about apes. (You can tell I wish I’d found Janson earlier.) What makes an adjacency for one research task virtuous makes that adjacency vicious for the next.

That’s why answers to research questions do not turn up instantly. You have to decide when to shift directions, and the physical layout of library stacks according to a single principle of adjacency (e.g., subject cataloguing) is going to sustain some inquiries while simultaneously shutting down others. No amount of dynamic text search is going to put an end to the virtuous/vicious circle: their pairing represents a real constraint on knowledge – the fact that thinking is progressive, and moves on discrete pathways – rather than a technological or physical limitation to be overcome.

That is not to say that there aren’t new ways of mapping adjacencies among digitized texts. Abstract models of the contents of books, such as topic models, do offer us other pathways in the research process; they are an additional principle of adjacency that we can invoke if we don’t want to “jump the hedge” by consulting a book’s footnotes (say) and then searching for new items based on the titles referenced there. (On topic models, see Ted Underwood’s very helpful blog post.) We have been using topic models in the Wisconsin VEP project to look at our collections of texts, and they do seem to open up adjacencies that we would never have thought about. (An upcoming blog post will deal with the relationship between the novel and English moral philosophy.) A topic model can suggest, for any given book or passage, another book or passage that might be relevant for reasons only a user could recognize (but might not be able to supply spontaneously). As with other techniques of dimension reduction (e.g., PCA, factor analysis), there may be more topics than we can name or recognize: a topic does not become a principle of association until a human being recognizes and affirms that principle in action.
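For readers who want to see what a topic-based adjacency might look like in practice, here is a minimal sketch in Python. It is not the VEP project's method: the book labels, the three unnamed topics, and every weight are invented for illustration, and cosine similarity is only one of several measures one could assume. The sketch takes one book as a starting point and reorders the rest of a tiny "library" around it, nearest first.

```python
import math


def cosine(p, q):
    """Cosine similarity between two equal-length topic-weight vectors."""
    dot = sum(a * b for a, b in zip(p, q))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_q = math.sqrt(sum(b * b for b in q))
    if norm_p == 0 or norm_q == 0:
        return 0.0
    return dot / (norm_p * norm_q)


def neighbours(start, topic_weights):
    """Rank every other book by similarity to `start`, closest first."""
    anchor = topic_weights[start]
    scored = [
        (title, cosine(anchor, weights))
        for title, weights in topic_weights.items()
        if title != start
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical data: invented weights over three unnamed topics.
# A real topic model would supply these proportions for each book.
TOPIC_WEIGHTS = {
    "proverb dictionary": [0.70, 0.20, 0.10],
    "proverb anthology": [0.60, 0.10, 0.30],
    "ape lore": [0.30, 0.60, 0.10],
    "shelving manual": [0.05, 0.05, 0.90],
}

ranking = neighbours("proverb dictionary", TOPIC_WEIGHTS)
for title, score in ranking:
    print(f"{title}: {score:.2f}")
```

Changing the starting book reshuffles the whole ordering, which is the virtuous/vicious flip in miniature. The scores only propose neighbours; whether the second topic is "about" mimicry, or about anything a researcher would care to name, is still for a reader to decide.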

If libraries are gardens with many forking paths, the hedges that separate those paths are absolutely real. Even a fully virtual, instantly re-arrangeable rendering of our shelf spaces will not put an end to vicious adjacencies, since they too will become virtuous if research takes a new turn. Our challenge is not a physical one; it’s not even computational. In a future library where any two books could be placed alongside one another in an instant, we might never find anything we want to read.

The task of library research is not simply that of poking around clusters of items on a shelf, or more grandly, finding ways of reclustering books continuously in hopes of finding the ultimate, virtuous arrangement. There is no Leibnizian, maximally virtuous arrangement of books, and never will be. (Leibniz must have hit upon this melancholy thought when he was librarian at Wolfenbüttel.)

But there are more or less definite lines of thought, each on its way to becoming other, equally definite, lines of thought. There is no point in celebrating the fact that such lines can fork off in an infinite number of directions. We know already that a researcher can only follow one of them at a time.

Quantification and the language of later Shakespeare



The written version of a paper we gave in Paris last year (2013) has just been published by the Société française Shakespeare. Here is the paper (which is in English), and here are the citation details:

To cite this article

Print reference

Jonathan Hope and Michael Witmore, “Quantification and the language of later Shakespeare,” Actes des congrès de la Société française Shakespeare, 31 | 2014, 123-149.

Electronic reference

Jonathan Hope and Michael Witmore, “Quantification and the language of later Shakespeare,” Actes des congrès de la Société française Shakespeare [Online], 31 | 2014, published online 29 April 2014, accessed 7 May 2014. URL:
