I have been thinking for a while now that Docuscope preserves, in its tagging structure, what a translator preserves — that this is a good definition of what it is looking to classify. One way to test this hypothesis would be to try Docuscope on a set of translations, which is what I’ve tried to do here.
The visualization above (press to rotate) shows the Platonic Corpus as translated by the nineteenth-century classicist Benjamin Jowett, scored on principal components (computed on correlations) and color coded according to the division of the dialogues proposed by the great Plato scholar Gregory Vlastos (1991) into early (red), middle (blue), and late (green). (The semitransparent ellipsoids are drawn to capture 50 percent of the items in each group.) Vlastos argued, on the basis of the types of arguments used in these texts, that the early dialogues represent a distinct group from those produced in the middle or later periods. The mode of argument in these earlier dialogues, he observes, is elenctic or adversative, which means that in these dialogues Socrates does not “defend a thesis of his own” but rather examines one held by an interlocutor (113). Socrates thus avoids making knowledge claims in these dialogues, instead forcing his interlocutors to enunciate them as the weakness of their own positions becomes apparent. Believing that two different “Socrates” are presented in the corpus, Vlastos argues that the early Socrates (who likely represents the philosophical position of the historical Socrates rather than Plato) must rely on the “‘say what you believe’ rule” (113), this rule supplying the rough materials of his proofs. Not yet an epistemologist in these dialogues, Socrates does not advance certain knowledge claims: the elenctic method will not support them.
The middle and later Socrates, by contrast, is fully willing to advance certain knowledge claims, which he seeks to present demonstratively (48). Rather than being simply a moral philosopher, he is now a “moral philosopher and metaphysician and epistemologist and philosopher of science and philosopher of language and philosopher of religion and philosopher of education and philosopher of art.” In these dialogues, Socrates advances a theory of knowledge as the recollection of separately existing Forms, a significant epistemological leap. This Socrates is now a spokesman for Plato, which makes the division between the early dialogues and all the rest the most important one in the corpus.
Taking this division as a starting point, let’s look at how Docuscope divides the dialogues, which it does here simply on the basis of mean scores on all 101 of the Language Action Types. These scores are plotted in a hyperspace and then the least dissimilar items are paired using Ward’s method on unscaled data. The technique is the same as the one that produced the most effective genre clustering of Shakespeare’s plays. I am thus using what I know of a particular mathematical technique as it applies to historically accepted clusterings of Shakespeare’s plays and applying it to a body of works that is less familiar to me – not quite what Franco Moretti calls “the great unread,” but definitely a case of trying to understand the lesser known through the better known.
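For readers who want to see the mechanics, here is a minimal sketch of Ward-style agglomerative clustering in plain Python. It is not the software used for the dendrogram discussed here, and the four placeholder texts and their three-number score vectors are invented for illustration (real Docuscope profiles would have 101 values per text). The function names `centroid`, `ward_cost`, and `ward_cluster` are my own. At each step the sketch merges the two clusters whose union adds the least within-cluster variance, working on the raw, unscaled scores.

```python
from itertools import combinations


def centroid(rows):
    """Column-wise mean of a list of equal-length score vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def ward_cost(a, b):
    """Increase in within-cluster sum of squares if clusters a and b merge."""
    ca, cb = centroid(a), centroid(b)
    sq_dist = sum((x - y) ** 2 for x, y in zip(ca, cb))
    return len(a) * len(b) / (len(a) + len(b)) * sq_dist


def ward_cluster(scores):
    """Agglomerate texts bottom-up and return the merges in the order made.

    `scores` maps a text name to its vector of mean scores (left unscaled).
    Each merge is recorded as (members_of_a, members_of_b, cost).
    """
    # Start with every text in its own cluster, keyed by a tuple of names.
    clusters = {(name,): [vec] for name, vec in scores.items()}
    merges = []
    while len(clusters) > 1:
        # Choose the pair whose merger adds the least within-cluster variance.
        a, b = min(
            combinations(clusters, 2),
            key=lambda pair: ward_cost(clusters[pair[0]], clusters[pair[1]]),
        )
        cost = ward_cost(clusters[a], clusters[b])
        clusters[a + b] = clusters.pop(a) + clusters.pop(b)
        merges.append((a, b, cost))
    return merges


# Hypothetical, made-up score vectors: NOT real Docuscope output.
toy = {
    "text_a": [1.0, 0.0, 0.0],
    "text_b": [1.2, 0.1, 0.0],
    "text_c": [5.0, 4.0, 3.0],
    "text_d": [5.3, 4.2, 3.0],
}

merges = ward_cluster(toy)
for left, right, cost in merges:
    print(left, "+", right, "cost:", round(cost, 3))
```

Reading the merges from first to last gives the dendrogram from its leaves up to its root. With real data one would more likely reach for a library routine such as SciPy's `scipy.cluster.hierarchy.linkage` with `method="ward"`. The brute-force pair search above is only meant to make the criterion visible.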
As you can see from the clustering of red or early period dialogues above, we can arrive at an arrangement of the dialogues using Docuscope data that is remarkably similar to the basic division in the dialogues that Vlastos argued for in 1991. But what is perhaps most interesting is that roughly the same division was arrived at stylometrically in the late nineteenth century, and that there has been at least some convergence within Plato studies of what we might call “intensive” techniques for sorting the dialogues (based on reactions of readers to the doctrines or manner of presentation) and “extensive” ones (built on groups that themselves represent the capture of stylometrically significant counted items). As Brandwood shows in The Chronology of Plato’s Dialogues (1990), it was already apparent to computationally unassisted readers of Plato such as L. Campbell that the later dialogues exhibited more technical and rare words, as well as a “peculiar, stately rhythm.” These claims were advanced with quantitative evidence (Campbell, 1867) but were grounded in an impression gathered through close and repeated reading. This line of inquiry was also taken up by the German classicist W. Dittenberger, who in 1881 argued that early and later dialogues could be discriminated by looking at the particles καὶ μήν and ἀλλὰ μήν, which co-occur in the early dialogues, and τί μήν, ἀλλὰ…μήν, and γε μήν, which co-occur in the later ones. This essentially multivariate pattern yielded the early grouping: Crito, Euthyphro, Protagoras, Charmides, Laches, Euthydemus, Meno, Gorgias, Cratylus, Phaedo. As you can see from the above, Vlastos’ groupings and those of Dittenberger overlap significantly. To this we might add the groupings derived from the Docuscope codings.
This convergence is interesting for a number of reasons. First, it shows us extensive and intensive techniques working in tandem, which raises the basic question of how these two things are related. Second, it shows us how a certain conversational style or dialogical setting connects with a philosophical position, and how both may become available for analysis through the counting of seemingly inconsequential particles such as μήν. The Platonic corpus is an excellent one to work with because it has been well studied, and we have the advantage of pre-computational techniques to examine alongside actual readers’ responses. In my next post, I will examine those features in the translated dialogues that, once tagged by Docuscope, seem to be doing a good job of reproducing the scholarly divisions described above.