These two visualizations spark two interesting questions: What do people read during a revolution? What is the connection between what people read and political events? Both images spike dramatically around moments of upheaval in the Western World: The English, American, and French Revolutions, the mid-19th-century Europe-wide overthrow of governments, and World War I, to name just a few. These images are all the more striking because they did not arise from a historical study of warfare or publishing, but from a more workaday task—that of categorizing all books from 1600 to 2010 according to Library of Congress subject headings. (The source of the data was Google’s catalog of books as of 2010.) The visualizations were shown in passing during a 2010 meeting between researchers at Google, where the data had been produced, and a group of humanities scholars and advocates, who were meeting with the Google team to exchange ideas. When Google’s Jon Orwant flashed this image on the screen, the professors in the assembly gasped. Genuinely gasped. We could see in this visualization of data things that had been debated for centuries, but that had never been seen: a connection between the world of print and the world of political action, a link between revolution and reading a certain kind of book.
We are experienced readers of books, book history, and—we like to think—of book diagrams. Humans invented stream charts well before the age of computing; this style of conveying information is at least 250 years old and draws on sources that are even older. (See Rosenberg and Grafton, Cartographies of Time, 2010.) However, the union of technologies—modern cataloging systems, the increasingly systematized concatenation of library catalogs worldwide, and the capacity to render data chronologically in the style of a geological diagram—produces a compact vision of Western print culture hitherto unseen. Simple in execution, the visualization prompts new thinking.
Like any metaphorical or mathematical rendering, the diagram below should be read with care: the strata are normalized so that the spikes do not necessarily indicate a greater number of books published, but rather a shift in the proportion of books composed of a given subject. A spike in one layer of the diagram can give the illusion that all strata of the diagram have increased in size, a trick of the eye that the mind needs to combat. The second visualization helps with this by zooming in and thereby singling out the area of greatest mathematical change, but it, too, needs to be viewed critically.
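The normalization caveat can be made concrete with a small sketch. The counts below are invented for illustration and are not drawn from the Google or Library of Congress data; the function name `proportions` is likewise hypothetical. The point is only that a subject's share of a year's output can double while the number of books on that subject stays flat.

```python
def proportions(counts_by_year):
    """Convert raw per-subject counts into shares of each year's total."""
    shares = {}
    for year, counts in counts_by_year.items():
        total = sum(counts.values())
        shares[year] = {subject: n / total for subject, n in counts.items()}
    return shares


# Invented counts, for illustration only (not real publication figures).
counts = {
    1630: {"history": 20, "theology": 50, "poetry": 30},
    1640: {"history": 20, "theology": 20, "poetry": 10},
}

shares = proportions(counts)

# The history count is identical in both years, but its share of the
# total rises because the other subjects shrank.
history_1630 = shares[1630]["history"]
history_1640 = shares[1640]["history"]
```

In this toy case the "spike" in history comes entirely from a drop in theology and poetry, which is one of the readings a normalized chart cannot rule out on its own.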
Now that the caveats have been put to one side, we return to the original questions and then offer a reformulation.
What are people reading during a revolution? Poetry? Books on military technology? Theology? No. If we take the first spike, in the years leading up to the English Revolution (war broke out in 1642; the regicide followed in 1649), the answer seems to be “Old World History.” The second chronological peak, in the decades around the American (1776) and French (1789) Revolutions, shows the same pattern. In periods that historians would link to major political upheaval, the world of print shows similar disruptions: publishers are offering more history for readers who, perhaps, think of themselves as living through important historical changes.
We should be precise: these data don’t indicate that more people are reading history, but that a higher proportion of books published by presses can be classed by cataloguers as history. There are many follow-up questions one might ask here. Does publication tie strongly to actual reading, or are these only loosely connected? Are publishers reducing the number of books in other subject areas because of scarcity of resources or some other factor, which would again lead to the proportional spikes seen above? Are the cataloguing definitions of what counts as Old World History or history in general themselves modeled on the books published during the spike years?
One has to ask questions about the size and representativity of the dataset, the uniformity of the classifications, and the nature of the spatial plot in order to understand what is going on. And, crucially in this case, one has to have the initial insight—born of a reading knowledge of history itself—that the timing of the spikes is important. But if you’ve got that kind of knowledge in the room, you might see something you haven’t seen before.
9 Comments
Thanks for this post, I hope it opens up a good discussion. My thoughts on this issue were lengthy enough that I decided to write a full blog post at Publick Occurrences. The upshot is that I worry that graphing books contained in the Google Books database obscures a whole range of data about publishing and reading from the period of each of the upheavals you cite that would give us much better information about what people were reading (and publishing, for that matter).
Thanks for your comment. We all know that the Google data set is patchy, particularly before 1600, and so these must be seen as provocative impressions rather than definitive proof. As we note at the end of the post, the power of the diagram is limited by the size and representativity of the dataset. That would be the next thing to understand.
I just put up a post too. In short, although I agree with the general argument about what we can do with data like this (though more open than Google’s actual data), the location of the spikes in English and French history makes me suspect this is mostly an artifact of library shelving patterns for historical documents, not actual book publication.
This is an extremely interesting response. I agree about the need for open metadata: without it, we’re lost. I am also interested in the cataloging issue, which is why I added the caveat about catalogue definitions growing out of moments that are perceived as being exemplary of the thing to be catalogued (i.e., the spikes). But I had understood the circularity differently: I thought that the initial cataloging sweep might begin with the thought, “Now I want to define the subject heading for French History, so I’ll start by considering documents produced around the French Revolution.”
Ben’s explanation is much more elegant, and opens up some new ways of thinking about the vagaries of subject classification that I hadn’t considered. I have written to Jon Orwant to see if there’s more he can share about the contents of the data set when the visualization was created. I’ll post anything I learn.
A thought: if this reflects major political upheavals, why is there no spike at all in 1848? There’s a small spike on the main chart around 1850, but this apparently stems from American history; European history shows no change at all…
As promised, Michael emailed me, but I didn’t respond promptly.
The visualization I’d shown was derived exclusively from the metadata feed of the Library of Congress. The LoC catalog doesn’t restrict itself to American or even English-language books, but likely does have some sample bias (as every union catalog does).
Mike alerted me to this conversation in hopes that as his head of cataloging at the Folger, I may be able to shed some light on use of the “–History” subdivision in the Library of Congress subject heading system and how that relates to the question at hand. As it happens, I’ve long been vocally dissatisfied with the provisions regarding this particular subdivision. The contradictory use of “History” in LC subdivision practice makes it impossible to rely on the kind of analysis implied by the visualization (insofar as I understand it). Ben Schmidt has come close to putting his finger on the problem when he writes that “This means works will get filed as “History” which are historically important, but which are not works of history by historians.”
The formulation and application of Library of Congress subject headings are guided by substantive and detailed documentation. Although subject headings are easily the most subjective [heh] and unpredictable part of the bibliographic record, there are nevertheless formal, prescriptive rules to be followed for correct application. (How faithfully these rules are followed by LC staff, let alone by other libraries applying Library of Congress subject headings, is another topic entirely.)
Subject heading strings are composed of the primary heading and, where warranted, one or more subdivisions that narrow the scope or application of the heading. Subdivisions come in two types. They can be part of fully established strings, e.g., Hundred Years’ War, 1339-1453 — Campaigns — France (http://lccn.loc.gov/sh00009369), or they can be free-floating subdivisions appended to a heading, e.g., Chocolate –Therapeutic use –Early works to 1800 (http://shakespeare.folger.edu/cgi-bin/Pwebrecon.cgi?BBID=76698), which consists of a topical heading followed by two free-floating subdivisions.
And now to the source of our woes. The “History” subdivision carries two entirely different meanings, depending on whether it’s incorporated as part of an established string, or whether it’s appended as a free-floating subdivision. As part of an established string, it signifies works about an earlier time period, regardless of date of publication or whether they can properly be considered “history”; as a free-floating subdivision, it indicates an historical treatment of the topic, again, regardless of the date of publication.
Historical periods are typically established as complete strings: e.g. Great Britain –History –Civil War, 1642-1649 (http://lccn.loc.gov/sh85056792); Europe –History –1848-1849 (http://lccn.loc.gov/sh85045711). Works contemporary with historical events are given this type of subject heading, even though the works themselves are not historical in nature. Likewise, works about these historical periods are given the same subject headings. The way LC subdivisions are constructed, therefore, there is no easy way to make a distinction based on subject headings alone between contemporary works about a period and later historical treatments.
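To illustrate the ambiguity, here is a rough sketch of how one might split such heading strings and compare the period they name with a record's publication year. Everything in it is an assumption made for illustration: the function names (`parse_heading`, `period_range`, `classify`) are hypothetical, the rule "published within the named period means contemporary" is a crude heuristic and not LC practice, and nothing here reflects how Google processed the metadata.

```python
import re

# Subdivisions are separated by a double hyphen in catalog records; en and
# em dashes are also accepted because web pages often display them instead.
SEPARATOR = re.compile(r"\s*(?:--|\u2013|\u2014)\s*")
PERIOD = re.compile(r"(\d{4})-(\d{4})")


def parse_heading(heading):
    """Split a subject heading string into (main heading, subdivisions)."""
    parts = [p.strip() for p in SEPARATOR.split(heading) if p.strip()]
    return parts[0], parts[1:]


def period_range(subdivisions):
    """Return the first (start, end) year range found, or None."""
    for sub in subdivisions:
        match = PERIOD.search(sub)
        if match:
            return int(match.group(1)), int(match.group(2))
    return None


def classify(heading, pub_year):
    """Heuristic label: the heading alone cannot supply this distinction."""
    _, subdivisions = parse_heading(heading)
    if "History" not in subdivisions:
        return "no History subdivision"
    period = period_range(subdivisions)
    if period is None:
        return "undated"
    start, end = period
    if start <= pub_year <= end:
        return "contemporary"
    if pub_year > end:
        return "retrospective"
    return "predates period"


heading = "Great Britain--History--Civil War, 1642-1649"
pamphlet = classify(heading, 1644)   # published during the war
monograph = classify(heading, 1888)  # a later historical treatment
```

Both records carry the identical heading; only the publication year, which lives elsewhere in the bibliographic record, separates the wartime pamphlet from the later monograph. A count of "History" headings by publication year will therefore sweep in both kinds of work.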
Given all this, and especially in the absence of basic information about the search strategies and algorithms used by Google Books to obtain their results, not to mention how they define “General and Old World History”, I do not believe the Google interpretation can be supported.
Obviously we’ve found the right person to look at this question. There are two issues here. First, are the LOC subject headings too unstable to provide a meaningful sense of what “–History” refers to? Second, is there any way to understand how Google Books applied LOC subject headings to the books they scanned, assuming that their corpus is itself incomplete? As I understand the visualization, a book falling into “–History” that is written at a later date is not captured in the spikes we observe, since its publication date does not fall within the years covered by that spike. So, assuming that Google has simply *applied* LOC subject headings to scanned items for which a match could be made, the spikes would be accounted for by cataloguers who had already applied the LOC subject headings to those items in LOC cataloguing efforts. Given what Deborah has said about the fluidity of that category, there is more than enough latitude for the kind of “inclusion” that Ben refers to.
I have another question. If we wanted to *test* this, what would be the best way — and where would the data be — to see what kinds of items are actually qualifying under this subject heading in Google Books? Is this something we could ask Google for? Surely they have some stake in seeing this question answered.
Bravo to Ben and to Deborah for these thoughts. Perhaps we should have called our original post, “What do Cataloguers Catalogue (Long) After a Revolution?”
Perhaps one place to start, at least in the Folger collection’s contemporary material classed as Great Britain–History–Civil War, 1642-1649, would be here:
http://shakespeare.folger.edu/cgi-bin/Pwebrecon.cgi?ti=1,0&Search%5FArg=Great%20Britain%2D%2DHistory%2D%2DCivil%20War%2C%201642%2D1649&Search%5FCode=SUBJ%5F%2B&CNT=50&REC=0&RD=0&RC=0&PID=nWgPTcr_v0PWqakYekrxz3U&SEQ=20130226114759&SID=1
2 Trackbacks
[…] Witmore and Robin Valenza have a fascinating post up this morning at Wine Dark Sea asking, “What Do People Read During a Revolution?” They ran a visualization based on Google Books’ massive database (as of 2010) categorized […]
[…] had a request for a clearer version of the image we received from Jon Orwant at Google discussed in our post last year, which shows changes in the catalogued subject of Library of Congress books over the course of […]