Tag: Jockers

  • The Musical Mood of the Country

    This morning the New York Times published a story about a group of mathematicians who are counting types of words in popular songs in order to get a handle on something like the mood of the country. In trying to data-mine mood, they do what all people who count things do: move from something that you can quantify empirically to something that you can’t. We do this as well when we move from “types of words” or Docuscope strings in Shakespeare plays to “genre.” The strings are empirically countable — they are either there in an established corpus or they aren’t — but one must argue for any connection between what is counted and what such counts represent (genre, mood, etc.). The point I have tried to make on this blog is that the connection is interpretive, and so relies on the hermeneutic skills of the one proposing the link.

    In the abstract for the paper, recently published in the Journal of Happiness Studies, they write: “Among a number of observations, we find that the happiness of song lyrics trends downward from the 1960s to the mid 1990s while remaining stable within genres, and that the happiness of blogs has steadily increased from 2005 to 2009, exhibiting a striking rise and fall with blogger age and distance from the Earth’s equator.” This is an interesting finding, particularly the part about blogger age and distance from the equator. One of the selling points of their analysis is that the data they have obtained is voluntarily supplied, and so perhaps less subject to the social pressures that accompany surveying. I would want to know, on this score, whether a song title (for example) is subject to other types of pressures. The songwriter is not just “reporting” an inner state by naming a song in a particular way — take the Ramones song “I Wanna Be Sedated,” for example — but offering this title to an audience. Song titles are rhetorical, and so subject to a different set of pressures than “reporting.” There is another kind of self-interference here that doesn’t seem to be taken into account.

    One of the lead researchers on the paper, Peter Sheridan Dodds, argues that data supplied voluntarily on the web can serve as a kind of “remote sensor of well-being.” (I remember hearing similar arguments made about baby names a while back: you don’t have to pay for them and they’re important, therefore they are a good measure of national feeling and trends.) For example, teenagers appear to be the least happy because they more frequently use words such as “sick,” “hate” and “stupid.” Wouldn’t it be more interesting to track how the use of these words (or the absence of them) compares with groups that teenagers themselves describe as “unhappy”? My inclination here would be to use data-mining techniques to assay and re-describe classifications made by a given social group in terms that the group may not necessarily be aware of. Then the factual claim would be: when teenagers describe someone as happy, that person is x% less likely to use words like “sick,” “hate” and “stupid.”
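
    To make that kind of factual claim concrete, here is a minimal sketch in Python of the comparison I have in mind. The texts and the three-word lexicon are invented stand-ins; the study’s actual word lists are far larger.

        from collections import Counter
        import re

        # Invented mood lexicon standing in for the study's word lists.
        MOOD_WORDS = {"sick", "hate", "stupid"}

        def mood_word_rate(texts):
            """Occurrences of the target words per 1,000 tokens."""
            tokens = [w for t in texts for w in re.findall(r"[a-z']+", t.lower())]
            counts = Counter(tokens)
            hits = sum(counts[w] for w in MOOD_WORDS)
            return 1000 * hits / len(tokens) if tokens else 0.0

        # Invented examples, pre-sorted by the classification teenagers
        # themselves supplied ("happy" vs. "unhappy" people).
        happy = ["what a great show last night", "i love this song so much"]
        unhappy = ["i hate mondays", "this class is stupid and i feel sick"]

        h, u = mood_word_rate(happy), mood_word_rate(unhappy)
        print(f"'happy' group:   {h:.1f} per 1,000 tokens")
        print(f"'unhappy' group: {u:.1f} per 1,000 tokens")
        if u > 0:
            print(f"the 'happy' group uses them {100 * (1 - h / u):.0f}% less often")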

    I can imagine the authors of the Music-Mood study making the following set of claims:

    Claim 1) Research on web-logs, lyrics and other sources of expression shows that words like “sick,” “hate” and “stupid” occur more frequently in a representative group of works by teenagers. This would be the empirical claim.

    Claim 2) People who are experiencing a mood such as “well-being” are less likely to mention words like “sick,” “hate,” and “stupid” in unprompted work such as songwriting or blogging. This is an interpretive claim that must be argued for.

    Claim 3) Teenagers are less likely than others to be experiencing a mood of well-being. This is logically true if you accept 1 and 2.

    Now, what’s interesting about 2 — the interpretive claim — is that it could be made without numbers. In a sense, you either believe this or you don’t. Which raises the question: what exactly are the numerical claims doing in this argument? What if claim 2 is “kind of true,” or “true only among certain people”? Would this mean that “kind of a lot” of teenagers are unhappy?

    I would be more comfortable saying that teenagers use more of the following words (“hate,” “stupid”), and that a close look at the contexts in which they use them (which can never be comprehensive) suggests that their use is connected to mood in the following way (e.g., their use allows teenagers to gain social attention by citing negative emotions, their use indicates depression, their use indexes the presence of Goth subculture, etc.). But I would want to know how the words are used rather than simply making inferences from the fact that they occur. The counter-argument here is that the law of large numbers guarantees that even if there is wide variation in uses of the words (granting, in effect, that not all occurrences are “reports” of mood), there is nevertheless a broad enough pattern to make a generalization. Fair enough, but what numbers are you going to use to make the generalization?
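
    Looking at contexts is easy enough to mechanize as a first pass. Here is a sketch in Python of a keyword-in-context (KWIC) listing (a standard concordance technique, not anything drawn from the study) that shows each occurrence of a word with a few words of surrounding text:

        import re

        def kwic(text, keyword, width=4):
            """List every occurrence of `keyword` with `width` words of
            context on either side (a simple concordance view)."""
            tokens = re.findall(r"[A-Za-z']+", text)
            rows = []
            for i, tok in enumerate(tokens):
                if tok.lower() == keyword.lower():
                    left = " ".join(tokens[max(0, i - width):i])
                    right = " ".join(tokens[i + 1:i + 1 + width])
                    rows.append(f"{left:>30} [{tok}] {right}")
            return rows

        sample = ("I hate waiting for the bus but I would hate even more "
                  "to miss the show")
        for row in kwic(sample, "hate"):
            print(row)

    Reading down such a column is what would let you argue that a given occurrence is a report of mood, a bid for attention, or something else entirely.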

    I’m all for the empirical investigation of abstract concepts like happiness, genre, and authorial intent. These higher order concepts don’t come from outer space: we create them to capture some suite of characteristics we find in reality or in ourselves. But the Music-Mood analysis lacks a crucial ingredient: an explicit human judgment about the classes that are being measured by the tokens that are being counted. Unless you make that judgment explicit — saying something like “x% of people who experience what persons y and z would describe as ‘well-being’ also produce unprompted work containing these words” — you are really just saying that “a lot” of people who we think are happy do this.

    Naming something with a word is a way of creating a class of things (as long as that word is not a proper name), and it is classes of things that are correlated quantitatively using statistics: quantities of classes of words in classes of works, for example. In any such analysis, the classes themselves cannot be derived empirically. They have to be specified in advance by appealing to experience, common sense, expertise, or the like. What troubles me about the Musical Mood analysis here is that the rationale for membership in the class of words indicating “well-being” is not spelled out, and perhaps never could be. I would rather ask someone — an expert? a teenager? — to name people who experience well-being and then do one of Matt Jockers’ most-frequent-word analyses on their lyrics or blogs in order to get at the underlying pattern. It’s fine to begin with a set of words whose occurrence indicates (to you) a feeling of well-being, but without knowing quantitatively how indicative they are, the numbers are just another kind of adjective. You might as well read a bunch of web pages and decide for yourself.
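
    Such a most-frequent-word count is simple to run. A minimal version in Python (my own sketch, not Jockers’ code) might look like this, applied to whatever texts your chosen judges have nominated:

        from collections import Counter
        import re

        def most_frequent_words(text, n=10, stopwords=frozenset()):
            """Rank words by relative frequency, optionally dropping stopwords."""
            tokens = [w for w in re.findall(r"[a-z']+", text.lower())
                      if w not in stopwords]
            total = len(tokens)
            return [(w, count / total) for w, count in Counter(tokens).most_common(n)]

        # Invented stand-in for lyrics or blog posts by people a judge (an
        # expert? a teenager?) has named as experiencing well-being.
        corpus = "we sang all night and we laughed and the night was ours"
        for word, freq in most_frequent_words(corpus, n=5):
            print(f"{word:>8} {freq:.3f}")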

    Either way, my guess is that you would conclude rather quickly that teenagers write like teenagers.

  • King or no [King]

    I wanted to say a little about a problem we encountered early on when we began counting things in the plays, a problem that gets us into the question of what might be a trivial versus a non-trivial indicator of genre on the microlinguistic level. Several years ago Hope and I began a series of experiments with the plays contained in Shakespeare’s First Folio, feeding them into Docuscope — a text-tagger created at Carnegie Mellon — to see if we could find any ordered groupings in them. The results of that early work were published in Early Modern Literary Studies in an article called “The Very Large Textual Object: A Prosthetic Reading of Shakespeare.” I will say more about Docuscope in subsequent posts, but suffice it to say here that it differs from other text-taggers in that it embodies a phenomenological approach to texts. (For the creator’s explanation of how it works, see an early online précis here.) Docuscope, that is, codes words and “strings” of words based on the ways in which they render a world experientially for a reader or listener. The theory behind how texts do this, and thus the rationale for Docuscope’s coding strategy, is derived from Michael Halliday’s systemic functional grammar. But what is particularly interesting about Docuscope is the human element involved in its creation. The main architect of the system, a rhetorician named David Kaufer, spent eight years hand-tagging several million pieces of English according to their rhetorical function, and then expanded this initial tagging with wild-card operators, so that Docuscope now classes over 200 million strings of English (1 to 10 words in length) into over 100 distinct categories of use or function.
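
    To give a sense of the mechanics (and only the mechanics: the dictionary below is invented, not Docuscope’s), a string-classing tagger of this kind can be sketched in a few lines of Python as a greedy, longest-match-first walk through the token stream:

        # Toy dictionary mapping word strings (1-2 words here; Docuscope's
        # run up to 10) to categories of use. The entries are invented for
        # illustration, not Docuscope's actual assignments.
        PATTERNS = {
            ("long", "ago"): "PastReference",
            ("king",): "CommonAuthority",
            ("i", "wanna"): "FirstPersonDesire",
        }
        MAX_LEN = max(len(p) for p in PATTERNS)

        def tag(tokens):
            """Greedy longest-match-first classing of a token stream."""
            i, tagged = 0, []
            while i < len(tokens):
                for n in range(min(MAX_LEN, len(tokens) - i), 0, -1):
                    span = tuple(t.lower() for t in tokens[i:i + n])
                    if span in PATTERNS:
                        tagged.append((" ".join(tokens[i:i + n]), PATTERNS[span]))
                        i += n
                        break
                else:
                    i += 1  # token falls in no category; move on
            return tagged

        print(tag("Long ago the King said I wanna rest".split()))
        # [('Long ago', 'PastReference'), ('King', 'CommonAuthority'),
        #  ('I wanna', 'FirstPersonDesire')]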

    Obviously there is a lot to say about the program itself, which represents a “built rhetoric” of sorts, one that has emerged through the interplay of one architect, his reading, and the texts he was interested in classifying. In any event, when Hope and I fed the plays into Docuscope, we had to make some initial decisions, and the first was whether to strip anything out of the plays we had obtained from the Moby online version. (We were already thinking about the shortcomings of this conflated, edited corpus as opposed to the text of the plays as it exists in various states in the First Folio, but we had to make do since we were not yet ready to modernize the spelling of F and decide among its internal variants.) The Moby text gave us things like titles, act and scene numbers, and speech prefixes (Othello, King Henry, Miranda, etc.). The speech prefixes created the greatest difficulty, because in the history plays the word “King” is, as you can imagine, used an awful lot — it appears in the speech prefixes of characters over and over. And because Docuscope tagged “King” as one of its visible tokens (assigning it to the “bucket” named “Common Authority”), this particular category was off the charts in terms of frequency when it came time to do unsupervised factor analysis on the frequency counts obtained from the plays. (I’ll post more on factor analysis in the future as well.)
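
    One way to strip such prefixes (assuming a plain-text layout in which each prefix sits on its own line in capitals and ends with a period, which real texts only roughly obey) is a preprocessing pass like this sketch in Python:

        import re

        # Assumes each speech prefix occupies its own line in capitals and
        # ends with a period (e.g. "KING HENRY."); real texts vary, so the
        # pattern is illustrative only.
        SPEECH_PREFIX = re.compile(r"^[A-Z][A-Z ]*\.\n", re.MULTILINE)

        def strip_prefixes(play_text):
            """Remove speech-prefix lines so that a word like 'King' is
            counted only when it occurs in the dialogue itself."""
            return SPEECH_PREFIX.sub("", play_text)

        sample = ("KING HENRY.\n"
                  "So shaken as we are, so wan with care,\n"
                  "WESTMORELAND.\n"
                  "My liege, this haste was hot in question.\n")
        print(strip_prefixes(sample))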

    Here’s the issue. In the end, we decided that it was “cheating” to let Docuscope count “King” in the speech prefixes, since this was a dead giveaway for History plays, and we wanted something more structural — something more buried in the coordination of word choices and exclusions — to serve as the basis of our linguistic “recipes” for Shakespeare’s genres. As the article shows, we were able to find such a recipe without relying on “King” in the speech prefixes. Indeed, subsequent research has shown that plural first person pronouns combined with a profusion of concrete sense objects are really the giveaway for Shakespeare’s histories. (They are also “missing” certain things that other genres have: this combination makes histories the most “visible” genre, statistically speaking, that he wrote.) But is it really fair to decide that certain types of tokens — King in the speech prefix, for example — are superficial marks of history as a genre, and so not worth using in an analysis? Isn’t there a certain interpretive bias here, one that I have and in a sense want to argue for, against the apparatus of the play in favor of something like a deeper set of patterns or stances? To argue for such an exclusion, I would begin by pointing out that speech prefixes are an artifact of print and are not “said” (even if they are used) in performance, but there is still something to think about here.
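
    That recipe is itself checkable with a much blunter instrument than Docuscope. Here is a sketch in Python that counts just one of its ingredients, plural first-person pronouns (the list of forms is my own, for illustration); run over whole plays grouped by genre, it would show whether the histories stand apart on this one feature:

        import re

        # Plural first-person forms, chosen here for illustration.
        WE_FORMS = {"we", "us", "our", "ours", "ourselves"}

        def we_rate(text):
            """Plural first-person pronouns per 1,000 tokens."""
            tokens = re.findall(r"[a-z']+", text.lower())
            hits = sum(1 for t in tokens if t in WE_FORMS)
            return 1000 * hits / len(tokens) if tokens else 0.0

        print(round(we_rate("We few, we happy few, we band of brothers"), 1))
        # 333.3: three "we" forms in nine tokens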

    A Google search algorithm looks for the “shortest vector” or easiest “tell” that identifies a text as this kind or that — even if it is one of a kind.  But those of us who are interested in genre must by definition not be interested in the shortest vector or the easiest tell.  We are looking for the longer path.  The book historian in me, however, says that apparatus is important, and that “accidental” features never really are.  So this is something I want to think more about.