My first reaction to these articles was that it would be hard to track something as high-level as influence with word counts.
But then I remembered that I would have said the same thing about genre several years ago.
I suspect the problem is not really the ability to track influence – let’s assume, for the sake of argument, that it leaves textual evidence just like genre does.
The problem is that I don’t think we have a stable definition of what influence is – so it is hard to make an educated guess about which countable features might track influence. And that is a problem for literary scholars to solve, not computer scientists. (Though I might be proved wrong on this when Matt Jockers’ book comes out.)
With genre, we have lots of theories of what it is, and, more importantly, many lists of plays put into generic categories for us to test. So we had real things to test when we started to look at the linguistic fingerprints of generic groups, even though we did not then know what features might encode genre at the level of the sentence.
The problems with defining ‘influence’ became clear to me when I read a piece on the ‘influence’ of Jane Austen on George Eliot, which even cites empirical evidence of Eliot’s interest in, and re-reading of, Austen. I think it goes very well with the NYT piece on Jockers, and gives a sense of the slippery nature of ‘influence’.
The piece argues for Austen’s influence on Eliot (no Austen, no Eliot), but this influence isn’t really described – an unkind parody of it might say that the argument boils down to ‘Austen influenced Eliot to write long prose narratives which are called novels’.
Indeed, the article spends more time on the differences between the two than on the similarities.
So what exactly is ‘influence’ in literary terms? Jockers’ work identifies similarity with influence (or at least the newspaper reports do) – and this is a sensible first step. But is literary influence the use of similar frequencies of function words? Or does similarity at a higher level (plot, character-choreography, the use of certain types of point of view) produce similarities in linguistic frequencies?
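To make the first of those questions concrete, here is a minimal sketch of what ‘similar frequencies of function words’ could mean in practice. It is not Jockers’ method: the short word list, the function names and the sample strings are all illustrative assumptions, and a real study would use a much longer list and whole texts. The sketch turns each text into a profile of relative function-word frequencies and compares two profiles by cosine similarity.

```python
import math
import re
from collections import Counter

# Assumption: a deliberately tiny, hypothetical function-word list.
FUNCTION_WORDS = ["the", "of", "and", "to", "a", "in", "that", "it", "but", "not"]


def function_word_profile(text):
    """Return the relative frequency of each function word in the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid dividing by zero on empty input
    return [counts[word] / total for word in FUNCTION_WORDS]


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length frequency vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # no function words found in at least one text
    return dot / (norm_u * norm_v)


# Placeholder strings standing in for two full novels (not real quotations).
text_a = "It was not the house that she disliked, but the manner of it."
text_b = "The town was in a state of quiet, and it was not to be disturbed."

score = cosine_similarity(function_word_profile(text_a),
                          function_word_profile(text_b))
print(f"function-word similarity: {score:.3f}")
```

A high score here says only that two texts distribute their function words alike. Whether that counts as ‘influence’ is exactly the definitional question the number cannot answer.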
Des Higham, professor of statistics at Strathclyde, is doing some interesting work producing algorithms to track the ‘influence’ of certain Twitter users over others, following responses to TV programmes. But with Twitter, you can measure ‘influence’ in terms of retweets. In what sense was George Eliot re-tweeting Jane Austen?
UPDATE: There is a *very* interesting discussion of influence and Matt Jockers’ work on Bill Benzon’s blog, New Savanna.