What is influence?

Offline, we’ve been discussing The New York Times’ article on Matt Jockers’ work, and the notion that iterative/digital analysis might be able to track literary influence.

My first reaction to these articles was that it would be hard to track something as high-level as influence with word counts.

But then I remembered that I would have said the same thing about genre several years ago.

I suspect the problem is not really the ability to track influence – let’s assume, for the sake of argument, that it leaves textual evidence just like genre does.

The problem is that I don’t think we have a stable definition of what influence is – so it is hard to make an educated guess about which countable features might track influence. And that is a problem for literary scholars to solve, not computer scientists. (Though I might be proved wrong on this when Matt Jockers’ book comes out.)

With genre, we have lots of theories of what it is, and, more importantly, many lists of plays put into generic categories for us to test. So we had real things to test when we started to look at the linguistic fingerprints of generic groups, even though we did not then know what features might encode genre at the level of the sentence.

The problems with defining ‘influence’ became clear to me when I read this piece on the ‘influence’ of Jane Austen on George Eliot, which even cites empirical evidence of Eliot’s interest in, and re-reading of, Austen. I think it goes very well with the NYT piece on Jockers, and gives a sense of the slippery nature of ‘influence’.

The piece argues for Austen’s influence on Eliot (no Austen, no Eliot), but this influence isn’t really described – an unkind parody of it might say that the argument boils down to ‘Austen influenced Eliot to write long prose narratives which are called novels’.

Indeed, the article spends more time talking about the differences between the two than about the similarities.

So what exactly is ‘influence’ in literary terms? Jockers’ work identifies similarity with influence (or at least the newspaper reports do) – and this is a sensible first step. But is literary influence the use of similar frequencies of function words? Or does similarity at a higher level (plot, character-choreography, the use of certain types of point of view) produce similarities in linguistic frequencies?
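To make the first of those questions concrete, here is a minimal sketch of what "similar frequencies of function words" could mean in practice. It is only an illustration: the ten-word list, the function names, and the choice of cosine similarity are all assumptions made for this example, not a description of Jockers' method.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny function-word list (an assumption for this sketch).
FUNCTION_WORDS = ["the", "and", "of", "to", "a", "in", "that", "it", "but", "not"]


def function_word_profile(text):
    """Return the relative frequency of each word in FUNCTION_WORDS within text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid dividing by zero on empty input
    return [counts[word] / total for word in FUNCTION_WORDS]


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length frequency vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # a text with no function words has no profile to compare
    return dot / (norm_u * norm_v)
```

A high score here would say only that two texts distribute these words in similar proportions. Whether that counts as influence is exactly the question the post is asking.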

Des Higham, professor of statistics at Strathclyde, is doing some interesting work producing algorithms to track the ‘influence’ of certain Twitter users over others, following responses to TV programmes. But with Twitter, you can measure ‘influence’ in terms of retweets. In what sense was George Eliot re-tweeting Jane Austen?


UPDATE: There is a *very* interesting discussion of influence and Matt Jockers’ work on Bill Benzon’s blog, New Savanna.

This entry was posted in Counting Other Things.


  1. Posted January 29, 2013 at 12:06 pm

    Like you, I’m waiting for the book to come to any firm conclusion on this.

    But, also like you, I tend to think “influence” is pretty hard to demonstrate even in cases where we have clear biographical testimony that so-and-so read so-and-so. It’s hard to imagine that it will become any easier (to demonstrate, or define) with digital evidence.

    On the other hand, I’m not sure that it’s impossible to demonstrate. And if I wanted to make a guess about influential figures in the history of 19c fiction, I might well say “Austen and Scott.”

    To make a long story short, I really want to read Macroanalysis, and also hear what reviewers have to say about this aspect of its argument.

  2. Posted January 29, 2013 at 2:14 pm

    Jonathan and Ted,

    Thanks for keeping the ball rolling (and plugging the book!). Let me take a stab at unpacking what I mean by “influence.”

    I begin chapter 9, titled “Influence,” with a quote from Osip Brik: “In every period there is a certain number of artistic methods and devices available for creative use. Changing these methods and devices is not a matter of the individual author’s volition, but is the result of the evolution of artistic creativity.”

    It’s this business of “the evolution of artistic creativity” that I’m really interested in. After some set up chit chat I get around to saying why I don’t like the word “evolution”: I write, “. . . books are not organisms; they do not breed. The metaphor . . . breaks down quickly, and so I do better to insert myself into the safer, though perhaps more complex, tradition of literary ‘influence’. . . ”

    I then suggest that real influence can’t in fact be assessed or measured; we can hypothesize that x was influenced by y, but we can never know (i.e. it is impossible to demonstrate). Absent a method to demonstrate real/true/etc. influence, I argue that “similarity” is a good proxy. I then lay out why I think thematic/stylistic similarity is a good proxy and how I measure it.

    That bit about what I choose to measure, then, is all debatable, but the really cool thing was discovering that the system, “the literary ecosystem” (if you will), seems to contain forces that impact the extent of authorial creativity–just like Brik predicted in his comments about “the evolution of artistic creativity.” In this context I write a bit about the theory of information cascades and then spend some time trying to backwards engineer the system in order to get at the forces that are influencing change in the system.

    And this is how Austen and Scott end up being, perhaps, key figures in the system: a lot of the “signals” in the system after Scott and Austen seem to have elements that originated with Scott and Austen. And, there don’t seem to be many signals similar to those of Austen and Scott in the system prior to Austen and Scott.

    Now if there is a weakness in my assessment it is just that: I don’t have a huge number of texts in my corpus from the 18th century. Some, but not a lot, and so it may be found that Austen and Scott are really fourth-generation children of the real literary Adam and Eve. . . So, this is why I’ve been working to build a much larger corpus that goes back to 1700 and, more importantly, forward to 2012!

    Look for a big announcement about that from UNL in the next week. . . (and I’ll tweet it, too).

  3. Posted January 29, 2013 at 2:54 pm

    In a related vein, some work I’ve done on computational assessment of allegory has suggested (to me, anyway) that the main problem is that we don’t really know what allegory is. It’s hard to craft an algorithm that can discriminate between allegorical and nonallegorical texts when you can’t even get scholars to agree that Pilgrim’s Progress and Midnight’s Children are allegories.

    On the plus side — and ignoring for a moment the computational tractability of influence — there might be a lot of value in getting literary folks to think again about what they mean by that term. That said, I suspect that some version of literary influence is indeed suitable for computationally assisted analysis.

  4. Posted January 30, 2013 at 10:31 am

    Good point about genre, but as I’ve learned from working with you over the past few years, the important thing:
    it’s a problem for literary scholars to solve, not computer scientists

    The word counts didn’t tell you genre.

    They were a combination of:
    1. a correlate with genre
    2. a way of finding clues for the details to look at

    With influence, this is what you should expect (at best).

    With influence, there are kinds of influence at the “micro” level that
    word analysis (even simpler than topic models) could pick up:
    – i see a word/phrase that I never saw before. i like it, and i use it myself.

    We could look for this micro-level influence. I could use the N-Grammer, see that some word or phrase was in a small set of books one year, then in a larger set the next. I could see that there’s a set of n-grams really uncommon in general, but really common in two books (where one comes after the other).

    Can i claim that the later author was “influenced” by the first?
    probably not: but it could be a clue for one of you to say “hmmm, maybe
    author B would have read A”
    Maybe if there were enough of these, and they were so uncommon (here are
    10,20,30,… 5-word phrases that occur in exactly 2 books), you’d buy a
    pure stats argument.

    But, i’m guessing these micro-level things aren’t what’s really interesting.
    (well, nowadays people are studying meme propagation in social networks,
    but i can’t imagine that the micro-level things are that interesting at
    the scale of books over decades – we hear/read too many things where we catch a word or two)

    The macro level correlates may have similar properties. But they are harder for me to point to in the books, so it’s harder for this to serve as a clue to you to do the deeper thinking.
    But, we’re working on that…
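
The micro-level check described in this comment (phrases that occur in exactly two books) can be sketched roughly as follows. This is a hedged illustration only: it does not use the N-Grammer, the function names are hypothetical, and it ignores publication order and corpus-wide rarity beyond the "exactly two books" rule.

```python
import re
from collections import defaultdict


def word_ngrams(text, n=5):
    """Return the set of distinct n-word phrases (as tuples) found in text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def phrases_shared_by_exactly_two(books, n=5):
    """Map each pair of titles to the n-word phrases found in those two books only.

    books is a dict of {title: full text}; titles here are placeholders.
    """
    holders = defaultdict(set)  # phrase -> titles of the books containing it
    for title, text in books.items():
        for gram in word_ngrams(text, n):
            holders[gram].add(title)

    shared = defaultdict(list)
    for gram, titles in holders.items():
        if len(titles) == 2:  # keep phrases seen in exactly two books
            shared[tuple(sorted(titles))].append(" ".join(gram))
    return {pair: sorted(phrases) for pair, phrases in shared.items()}
```

As the comment says, a list like this is at best a clue for a reader to follow up, not a demonstration that one author read the other.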

  5. Posted January 30, 2013 at 9:26 pm

    If I said, “Patterns A, B, and C exist after writer X and not before (for a given corpus),” I might be saying that those patterns originate with X, and their repetition is a kind of influence. If these patterns only occur midway through X’s career and persist to the end of that career, would I be saying that X, in repeating the pattern, is by the same logic being influenced by him or herself? On what grounds would we dismiss this intra-authorial extension of Matt’s inter-authorial influence standard?

    Bravo, Matt. I want to read your book.

  6. Posted January 31, 2013 at 11:58 am

    Very interesting stuff. I have to say I’m especially looking forward to “big announcement from UNL about enlarged corpus — forward to 2012.” That sounds coolish!

    Also just to underline what Matt Wilkens was saying about genre. I’m finding the same thing when I work on genre classification. The limit you come up against isn’t so much that algorithms are imperfect or reductive. It’s that we don’t understand “genre” very well in the first place.

    Is a periodized genre like “the silver-fork novel” the same *kind* of category as “satire” or “blank-verse tragedy”? Where does “the gothic” fall on that spectrum? Text-mining might eventually clarify some of these issues, but even before it does that, I think it’s going to reveal how imperfectly we understand some basic literary categories.

  7. Jonathan Hope
    Posted May 13, 2013 at 9:34 am

    A link to Ted’s post discussing a paper that looks at ways to get at ‘character’ computationally:


    highly recommended

  8. Jonathan Hope
    Posted May 16, 2014 at 6:30 am

    here is Lydia Davis being very interesting on influence, with a cat:

