Google n-grams and Philosophy: Use Versus Mention

Well, the Google n-gram corpus is out, and the world has been introduced to a fabulous new intellectual parlor game. Here are a few searches I ran today which deal with philosophers and philosophical terms:

A lot of people are going to be playing with this tool, and I think there are some genuine discoveries to be made. But here is a question: is what’s being counted in these n-gram searches “uses” of certain words or “mentions” of those words? The use/mention distinction is a favorite one in analytic philosophy, and has roots in the theory of “suppositio” explored by the medieval Terminists. It is useful here as well. The Google n-gram corpus is simply a bag of words and sequences of words divided by year. So what does it mean that an n-gram occurs more frequently in one bag than in another? Does philosophy become more interested in “the subject” as opposed to “the object” around 1800? (Never mind that these terms have precisely the opposite meaning for medieval thinkers.) Does Heidegger eclipse Bergson in importance in the mid-1960s? Does “ethics” displace “morality” as a way of thinking about what is right or wrong in human action?

These are different cases; in each, however, we ought to read the results returned from the n-gram corpus search as “mentionings” of these terms. Understanding how these words are used, and in what kinds of texts, is much more difficult than saying that they are mentioned in such and such a quantity. The important question, then, concerns what you can learn from the occurrence or mention of a word in a field as wide as this. I think the mention of a proper name like “Heidegger” is probably more revealing than the mention of a particular philosophical term like “subject” or “object.” While it’s not an earth-shaking discovery that Heidegger gets more mentions than Bergson in the latter half of the twentieth century, this fact is nevertheless interesting and useful. In the case of terms such as “subject” and “object,” however, we are dealing with terms that are regularly used outside of philosophical analysis: they may not have a “philosophical use” in the cases being counted. Another factor to consider: the name Heidegger likely refers to the German philosopher, but it could also point to other individuals sharing this name. The philosopher Donald Davidson, for example, who spent a lot of time thinking about the use/mention distinction, would not necessarily be picked out of a crowd by a search on his surname. Even with a rare proper name we can’t be certain that mention accomplishes something like Kripke’s “rigid designation.”

We could get closer to a word’s use by trying a longer string, something along the lines of Daniel Shore’s study of uses of the subjunctive with reference to Jesus, as in “what would Jesus do?” When it is embedded in the strings Shore identifies, the proper name Jesus seems to designate its referent more precisely. So too, the word “do” refers to the context of ethical deliberation, although even now there are ironic uses of the phrase that are really “mentionings” of earnest uses of these words by evangelicals. The special use-case of irony would, I suspect, be the hardest to track in large numbers. But there may be phrases that are invented by philosophers precisely in order to specify their own use, which is what makes them reliably citable or iterable in philosophical discourse. Terms of art, such as “a priori synthetic judgment,” are actually highly compressed attempts to specify a writer’s use of terms. As use-specific strings, terms of art are likely to produce use-specific results when they are used as search terms. Indeed, it seems likely that most philosophers are actually doing a roundabout form of mentioning when they coin such phrases. Such moments are imperative contracts, meaning something like: “whenever you see the phrase ‘a priori synthetic,’ interpret it as meaning ‘a judgment that pertains to experience but is not itself derived experientially.'”
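The point about longer strings can be made concrete with a toy sketch. It assumes a tiny invented corpus keyed by year; the sentences, the years, and the helper names (`TOY_CORPUS`, `ngrams`, `count_by_year`, `occurrences`) are all hypothetical and have nothing to do with the actual Google data or its interface. It simply builds a bag of n-grams per year and looks up how often a search string occurs in each bag.

```python
from collections import Counter

# Hypothetical toy corpus: year -> sentences. Invented purely for illustration.
TOY_CORPUS = {
    1790: [
        "the subject of the king petitioned the court",
        "the object fell to the floor",
    ],
    1810: [
        "the subject posits the object in thought",
        "an a priori synthetic judgment is possible",
    ],
}


def ngrams(tokens, n):
    """Return every run of n consecutive tokens, joined by single spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def count_by_year(corpus, n):
    """Build one bag (Counter) of n-grams for each year in the corpus."""
    counts = {}
    for year, sentences in corpus.items():
        bag = Counter()
        for sentence in sentences:
            bag.update(ngrams(sentence.lower().split(), n))
        counts[year] = bag
    return counts


def occurrences(corpus, phrase):
    """Count how often a phrase occurs in each year's bag."""
    n = len(phrase.split())
    by_year = count_by_year(corpus, n)
    return {year: bag[phrase.lower()] for year, bag in by_year.items()}


print(occurrences(TOY_CORPUS, "subject"))
print(occurrences(TOY_CORPUS, "a priori synthetic"))
```

In this toy data the single word "subject" is counted once in each year, although the 1790 sentence uses it in a political sense and the 1810 sentence in a philosophical one; the count records the occurrence and nothing about the use. The three-word term of art matches only the philosophical sentence, which is the sense in which a longer, use-specific string narrows what a search picks out.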

It would be nice if we could see occurrences displayed by the subject heading of the book in which they appear. That would allow the user to be more precise in linking occurrence claims to use claims, a link that must inevitably be made in quantitative studies of culture. I suspect it is much harder to link occurrence to use than most people think; this tool may have the unintended use of bearing out that fact.

