Category: Shakespeare

  • Keeping the Game in Your Head: David Ortiz

    I’m not a huge baseball fan, but I did grow up in the suburbs of Boston and so like the Red Sox. Over the weekend I saw a story in the Times about David Ortiz, who went from being a fabulous home run hitter to someone who couldn’t really connect with the ball and so lost his place at the top of the Red Sox batting order. Baseball is now loaded with information, as anyone who has followed the career of Nate Silver knows. (Silver established his reputation as a baseball statistician but then went on to predict congressional and presidential elections at fivethirtyeight.com.) Apparently Ortiz was drawn into the game of studying his own performance “by the numbers,” and eventually it got to his game. Only when he decided to play for the “fun” of it did his hitting power return. As a story about a player’s encounter with statistics, this one has four parts: talented hitter does well; talented hitter attempts to improve performance with statistics (as reported in the Times); talented hitter suffers from overthinking his game; talented hitter learns to play the game again by forgetting about the numbers.

    Perhaps this story is useful for thinking about the nature of statistically assisted reading. I’m not saying that using statistics to explore textual patterns drains the joy out of reading: it doesn’t, because the statistical re-description of texts is not reading in the sense that you or I would practice it. But I have had interesting experiences reading texts after I have learned something about the underlying linguistic patterns that they express. For example, when I learned that Shakespeare’s late plays contain a linguistic structure in the form of “, which” [comma, which] that distinguished them from all other Shakespeare plays, I really started to pay attention to these in my reading. I wouldn’t say that this detracted from my ability to read the text; rather it drew my attention to something else that was going on. But I also noticed that it was nearly impossible to pay attention to the linguistic patterns and to experience the meaning of that pattern at the same time. That is, I could either notice linguistic features of a play (presence of pronouns, concrete nouns, verbs in past tense, etc.) and ask why they were being used in a particular scene, or I could float along with the spoken line, feeling different ideas or emotions eddy and build as the speaker developed an image or theme. But I couldn’t do both.

    Why should there be this “Ortiz effect” in reading? Is there some kind of fundamental scarcity of attention that prevents one from reading as a (statistically assisted) linguist and as “any reader whatever” at the same time? I’m interested in this division, but skeptical of the idea, advanced in the article about Ortiz’s return to greatness, that you can forget what you know and “just do it.” The Times article says that Ortiz became a better hitter when he learned simply to “play…as if he were a boy.” But reading is never this simple: you can’t completely forget what you know, even if you learned it through the apparently foreign procedures of statistical analysis. Perhaps you can read “as if” you didn’t know it, and then re-engage that knowledge to examine how the linguistic patterns produce the effects you’ve just experienced? My point here is that readers who are assisted by statistics must simultaneously be both versions of Ortiz described in the different articles: both the hitter and the thinker. It would be a mistake to think that “natural” reading is accomplished in a state of child-like absorption in the game, since even children are brimming with strategies and inferences. I am glad to know certain things about Shakespeare that I couldn’t have known without the assistance of statistics, like the fact that the Histories are full of concrete description and largely lack first and second person pronouns. This doesn’t interfere with my game (I hope), but shows me that the game can be played on another, as yet unknown, verbal plane.

  • More Shakespeare Outliers

    PCA Scatterplot in R of the First Folio Plays

    I’ve expanded the labels here on our PCA scatterplot in order to see a few more items. Several things worth thinking about here:

    • Late Plays are clustering in neither the Comedy nor the History quadrants explored in the other posts. The three that we see here (Winter’s Tale, Cymbeline, and Henry VIII) thus lack the dialogic interactivity we saw in comedy and the profusion of concrete nouns and description in history. This is an interesting way of thinking of the Late Plays: as lacking something that is a defining presence in the two most linguistically “obvious” genres of Shakespeare’s writing (comedy and history). We might think of genres that show up as diagonally opposed in PCA as “linguistic primes,” in that they seem to be composed of nothing simpler than themselves. Those that are caught in the remaining corners (themselves lacking any opposite partner) would then be called “secondary,” since they cohere only indirectly, on a set of differences that more comprehensively orders a different part of the field. Note too that Romeo and Juliet is virtually identical with The Tempest, our last Late Play, in this plot. Both plays break the most obvious “rule” that Shakespeare seems to honor in his writing of plays (that of choosing either First Person + Interaction strings or Description strings, but not both), and they break this “rule” in exactly the same way. Instead of choosing one of these two linguistic “forks in the road,” Romeo and Juliet and The Tempest take both at the same time, combining much of the dialogic element we saw in Twelfth Night with the profusion of concrete description (nouns, adjectives) that characterized Richard II.

    • In almost every visualization I have used of these data (Factor Analysis with various rotations, PCA), I find that A Midsummer Night’s Dream is unusual among the comedies. Sometimes it is grouped with the histories because it contains so much description in the passages dealing with the fairy landscape. Linguistically, this feature sets A Midsummer Night’s Dream apart from other comedies. For an illustration of what is unusual about MSND, which scores unusually high on the history component (Description) but also reasonably high on the comedy one (First Person/Interaction), click here. I also find that Henry VIII is often placed away from the pack, which in this case is due to its relative lack of all three types of strings tracked in this exercise: Description somewhat, but very obviously First Person and Interaction. (For a sample passage where few of these are present, click here.) There are many reasons why this play might be distinctive (it was co-written with Fletcher, and it was written at the very end of his career), but the only way to really know is to look at individual passages like the one I’ve posted and see what’s going on. Seeing what the absence of something makes possible is, of course, often more difficult than seeing what its presence makes possible.

    • Two very unusual Comedies are showing up in the lower left-hand quadrant, where three of the four Late Plays are located. This makes a certain kind of sense, as Measure for Measure and All’s Well That Ends Well are regularly described by critics as “problem comedies.” From a critical standpoint, this means that they lack the buoyant tone of plays like Much Ado or As You Like It, or that they veer into emotions or problems that cannot really be solved by a few marriages at the end of the play (e.g., Angelo’s redemption or Bertram’s romantic rehabilitation). From a statistical-linguistic standpoint, of course, the description of what makes these plays “unusual” would be different: they lack the First Person and Interaction strings of the high comedies while simultaneously lacking the Description strings that characterize histories. This description could be more nuanced (there are subtler ways of characterizing these patterns if we break the plays down into smaller parts and so can use more refined categories), but we will do this later.

    • Tragedies are evenly spread out over the plot. This is in and of itself a significant finding: it does not mean that tragedies lack distinguishing traits, but that those traits are not captured by the most obvious forms of coordinated variation in this corpus, the first two principal components of the Docuscope counts. I suspect that Matt Jockers’ most-frequent-word analysis would produce a similar result, as he and I have been finding very similar patterns in primary and secondary genre divisions using our different means. In fact, a combination of two other components (PC3 and PC5) does corner the tragedies in their own quadrant, and this will be the subject of a future post.

    So what are the rest of these dots? Below is an R biplot which shows the items plotted in the PCA scatterplot above, but instead of distinguishing them by color, it lists them by item number. (The numbers correspond to play titles, which I have also posted on the left-hand side of the image.) The biplot is helpful because, in addition to plotting the plays in PCA space, it shows the component loadings, which means that it illustrates how the counted variables relate to one another as they vary across this corpus. Each variable (First Person, Interaction, etc.) is represented by a line drawn from the origin, a vector, whose length reflects the magnitude of its variation in the plotted components; its relationship to the other variables is registered geometrically by the angles between the vectors, which are labelled with the variable names (X.[Variable Name]). I have numbered the plays in order of composition, using the dating scheme provided by the Oxford editors. It makes for an interesting connect-the-dots that traces Shakespeare’s stylistic progress throughout his career. (Note: he leaps.)
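    The biplot overlays two outputs of a single principal component analysis: the scores that place each play in the plane, and the loadings that become the variable arrows. The sketch below shows, in plain Python, where both come from. It assumes PCA on standardized variables (the correlation matrix), which is what R’s prcomp does with scale. = TRUE; whether the original analysis scaled the Docuscope counts is not stated here, so treat that as an assumption. The four category names echo the post, but the play names and rates are invented, and the small Jacobi eigen-solver stands in for the singular value decomposition that prcomp actually uses.

```python
import math

# Invented rates (% of tokens) for six hypothetical plays; NOT Docuscope output.
VARIABLES = ["FirstPerson", "Interaction", "Description", "TopicalFlow"]
PLAYS = {
    "Play A": [4.1, 3.9, 2.0, 3.1],
    "Play B": [3.8, 3.6, 2.4, 2.9],
    "Play C": [2.2, 2.0, 4.3, 2.0],
    "Play D": [2.5, 2.4, 4.0, 2.2],
    "Play E": [3.0, 2.9, 3.2, 2.6],
    "Play F": [2.7, 3.3, 2.8, 3.4],
}


def standardize(rows):
    """Z-score each column so every variable has mean 0 and sample sd 1."""
    n, p = len(rows), len(rows[0])
    z = [[0.0] * p for _ in range(n)]
    for j in range(p):
        col = [r[j] for r in rows]
        mean = sum(col) / n
        sd = math.sqrt(sum((x - mean) ** 2 for x in col) / (n - 1))
        for i in range(n):
            z[i][j] = (col[i] - mean) / sd
    return z


def correlation_matrix(z):
    """For z-scored data the covariance matrix is the correlation matrix."""
    n, p = len(z), len(z[0])
    return [[sum(z[k][i] * z[k][j] for k in range(n)) / (n - 1) for j in range(p)]
            for i in range(p)]


def jacobi_eigen(a, sweeps=50):
    """Eigenvalues (descending) and eigenvectors of a symmetric matrix via cyclic Jacobi rotations."""
    n = len(a)
    a = [row[:] for row in a]
    v = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(sweeps):
        if sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < 1e-22:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle chosen so that the (p, q) entry becomes zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A J (columns p and q)
                    a[k][p], a[k][q] = c * a[k][p] - s * a[k][q], s * a[k][p] + c * a[k][q]
                for k in range(n):  # A <- J^T A (rows p and q)
                    a[p][k], a[q][k] = c * a[p][k] - s * a[q][k], s * a[p][k] + c * a[q][k]
                for k in range(n):  # V <- V J accumulates the eigenvectors as columns
                    v[k][p], v[k][q] = c * v[k][p] - s * v[k][q], s * v[k][p] + c * v[k][q]
    order = sorted(range(n), key=lambda i: a[i][i], reverse=True)
    return [a[i][i] for i in order], [[v[k][i] for k in range(n)] for i in order]


def pca_biplot(rows, k=2):
    """Return eigenvalues, play scores, and variable loadings for the first k components."""
    z = standardize(rows)
    eigvals, eigvecs = jacobi_eigen(correlation_matrix(z))
    p = len(rows[0])
    # Scores: where each play sits in PC space (the numbered dots).
    scores = [[sum(zi[j] * eigvecs[c][j] for j in range(p)) for c in range(k)] for zi in z]
    # Loadings: correlation of each variable with each component (the arrows).
    loadings = [[eigvecs[c][j] * math.sqrt(eigvals[c]) for c in range(k)] for j in range(p)]
    return eigvals, scores, loadings


eigvals, scores, loadings = pca_biplot(list(PLAYS.values()))
for name, (pc1, pc2) in zip(PLAYS, scores):
    print(f"{name}: PC1={pc1:+.2f}  PC2={pc2:+.2f}")
for var, (l1, l2) in zip(VARIABLES, loadings):
    print(f"{var} arrow tip: ({l1:+.2f}, {l2:+.2f})")
```

    Plotting the scores as dots and the loadings as arrows from the origin gives the two layers of a biplot. R’s biplot() rescales both layers in its own way, so these raw numbers would not line up exactly with its axes.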

    Variables that extend opposite one another, at an angle of 180 degrees, are inversely correlated, while those that lie on top of one another vary together. Vectors that sit at right angles to one another have an interesting feature: because they are orthogonal, their variation is (approximately) uncorrelated. So from the biplot below, we can see quite quickly that First Person and Interaction strings tend to be found together in individual items (plays), whereas Description strings (which vary inversely with the amount of Topical Flow strings) tend to be present or absent in ways that have nothing to do with the presence or absence of First Person and Interaction. Another way of expressing this orthogonal relationship: behaviors among First Person and Interaction strings are (for whatever reason) indifferent to those of Description and Topical Flow strings, and vice versa. This doesn’t mean they aren’t connected on some other component (we are only looking at the first two here), but when we are thinking about the most statistically powerful description of variance in the corpus (which is captured in the early principal components), this is how all of the quantities of counted things relate.
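    The reading of angles can be made concrete. If each variable is treated as a vector with one coordinate per play, centred on its mean, the cosine of the angle between two such vectors is exactly their Pearson correlation; the arrows in a biplot are a two-dimensional projection of these vectors, so their angles only approximate it. A minimal sketch with four invented plays, where the data are built to produce the three cases described (0, 180, and 90 degrees):

```python
import math
import statistics


def angle_degrees(xs, ys):
    """Angle between two variables treated as mean-centred vectors (one coordinate per play)."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    a = [x - mx for x in xs]
    b = [y - my for y in ys]
    cos = sum(x * y for x, y in zip(a, b)) / (math.hypot(*a) * math.hypot(*b))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos))))  # clamp rounding error


def pearson_r(xs, ys):
    """Textbook Pearson correlation: sample covariance over the product of sample sds."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)
    return cov / (statistics.stdev(xs) * statistics.stdev(ys))


# Invented rates for four hypothetical plays, chosen to give clean angles.
first_person = [4.0, 3.0, 2.0, 1.0]
interaction = [8.0, 6.0, 4.0, 2.0]   # rises with first_person
description = [1.0, 2.0, 3.0, 4.0]   # falls as first_person rises
topical_flow = [3.0, 1.0, 1.0, 3.0]  # unrelated to first_person

for name, other in [("Interaction", interaction), ("Description", description),
                    ("TopicalFlow", topical_flow)]:
    print(f"FirstPerson vs {name}: {angle_degrees(first_person, other):.1f} degrees, "
          f"r = {pearson_r(first_person, other):+.2f}")
```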


    A parting thought: what two plays are the most opposite in terms of style, based on what Docuscope sees and what PCA can find in terms of variation patterns? Two obvious candidates would be Henry V and The Comedy of Errors (numbers 19 at the bottom and 8 at the top), and A Midsummer Night’s Dream and Measure for Measure (numbers 12 and 25, on the left and right). If you’ve been following the discussion and this diagram makes sense to you, or if you’ve just read both pairs of plays, you know why they are so different.
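    One way to make the question precise, assuming we stay within the two plotted components, is to compare straight-line distances between plays in the PC1/PC2 plane, or to look for the plays at the two ends of each axis, which appears to be roughly how the pairs above were chosen. The coordinates below are invented for illustration, and the function names are hypothetical:

```python
import math
from itertools import combinations

# Invented (PC1, PC2) coordinates for five hypothetical plays.
COORDS = {
    "A": (0.2, 2.5),
    "B": (0.1, -2.4),
    "C": (-2.0, 0.3),
    "D": (2.2, -0.1),
    "E": (0.3, 0.2),
}


def most_opposite(coords):
    """Pair of plays farthest apart in the plotted plane (Euclidean distance)."""
    return max(combinations(coords, 2),
               key=lambda pair: math.dist(coords[pair[0]], coords[pair[1]]))


def axis_extremes(coords, axis):
    """Plays at the low and high ends of one component (axis 0 = PC1, axis 1 = PC2)."""
    low = min(coords, key=lambda name: coords[name][axis])
    high = max(coords, key=lambda name: coords[name][axis])
    return low, high


print(most_opposite(COORDS))
print(axis_extremes(COORDS, 0), axis_extremes(COORDS, 1))
```

    Distance in two components is only part of the story; adding further components could reorder the pairs.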

  • Comic Twelfth Night, Tragic Othello (Part III)

    One of the aims of this kind of work is to find new things to think about or appreciate in texts that have been analyzed with traditional methods of literary criticism. But one does not always need an outside prompt like statistics to begin exploring counterintuitive ideas about how literary or dramatic texts work. Among traditional literary critics, some very distinguished readers (or auditors) of Shakespeare’s plays have argued that he sometimes builds one type of play on the foundations of another. Susan Snyder, for example, argued in the late 1970s that there is a comic “matrix” underlying Shakespeare’s tragedies. Shakespeare, that is, built some of his tragedies, Othello in particular, on structures that would ordinarily be employed in comedy, and in doing so heightened the emotional effect of the downturn when things deteriorate. There is thus a certain, almost structural irony to Othello. Some of what you see happening on stage seems to evoke the expectations of comedy (and its happy conclusions), but what eventually transpires is the opposite. While this may sound emotionally perverse, I think it is exactly what Shakespeare was up to in Othello, and I’m not surprised that a reader as careful and informed as Snyder was able to figure this out. One of the most interesting consequences of this reading is that we begin to think of genre as something dynamic: a transaction between a spectator and a company that is full of false starts, head fakes, and allusive gestures. Perhaps, rather than a recipe or essence, theatrical genre is really an oscillation between certain generic possibilities at a given moment in time.
    However we choose to think about genre, I think it is safe to assume that we never encounter specimens that are “pure to type.” As with illustrators of botanical species, the artist may have one or many individual specimens at hand, but the question is always whether to “idealize” or “mix” the specimens in order to depict the ideal type. Such types do not really occur in nature. Or, if one settles on a particular example as the ideal, then it will be, strictly speaking, a class of one, since all other specimens will deviate slightly from the illustrated example.
    When we turn to the population that is mapped by Docuscope, we see immediately that Othello is not “true to type.” Othello is placed, as perhaps Snyder would have predicted, in the same sector where many comedies gather, a sector that we have labelled comic in keeping with the classifications of Shakespeare’s editors. I repeat the diagram from the earlier post here:
    Shakespeare Plays in Scatter Plot rated in Principal Components in R

    So, is Docuscope “right” in calling Othello a comedy? Was Snyder “right” in saying that the play was built on a comic “matrix”? Is there anything to be learned from the fact that Docuscope and a particularly distinguished critic agree on where Othello belongs? We should begin thinking about these questions by looking at specific passages. Below is an exchange between Othello and Iago, a dialogue between two individuals that looks a lot like the comic exchanges we examined from Twelfth Night, particularly the exchange between Cesario and Olivia. This is the beginning of what some critics have called the seduction of Othello by Iago, a seduction that culminates in Othello’s kneeling before his former servant in a new misogynistic alliance:

    Open Source Shakespeare, Othello 3.1
    Docuscope Tagged Othello 3.1

    The first thing to notice here is that this is yet another passage in which I/you interaction (blue and red strings) is occurring quickly, at the expense of concrete description. This is what, statistically speaking, is pushing the passage up and to the left in the scatter plot above. If there is a comic matrix here — and not just in the happy set-up of the early acts — it is, from a linguistic point of view, the continued stance that allows a “withholding speaker” (Iago) and an eager listener (Othello) to push back and forth on one another. Othello here is playing the role of Olivia in Twelfth Night, trying to delve further into the thoughts of his interlocutor (which is keeping the I/you, I/thee pronouns coming) while Iago is playing a sort of Cesario, refusing to give the speaker something he wants (and in doing so, goading the speaker on). The parallel is perverse, but it shows that a very different emotional trajectory can take shape on a similar linguistic footing, much as a dancer can perform different body movements on a similar footing or stance.

    The next passage deepens the analogy in disturbing ways. In this scene from the fourth act, we have close exchanges between Othello and Desdemona that are structurally similar to those of the recognition scene in Twelfth Night. Notice how Othello’s complaints echo the type of complaints one hears from a Petrarchan lover, although they emerge from a type of alienation and tragic emotional development that Docuscope can’t count in its perpetual “now.”

    Open Source Shakespeare Othello 4.2
    Docuscope Tagged Othello 4.2

    “What art thou,” Othello asks. And Desdemona answers, “Your wife, my lord; your true / And loyal wife.” Like Viola declaring who she is to Sebastian in Twelfth Night, Desdemona here is reasserting who (not what) she is in the face of something like a disguise that has been forced upon her by the accusations of Iago. She is trying to puncture the veil of Othello’s illusion. Yet, instead of the gladness of recognition, we get a strange catalogue of personal suffering, a lover’s complaint over a loss he has never really suffered. This could, in other words, be a catalogue of suffering that has ended, but instead Shakespeare writes it as a kind of torment that has just begun. Linguistically, it contains all of the strings that Docuscope sees as key in clustering this play together with others we would call comedies. But comic it is not.

    What fascinates me about passages that are anti-generic in type is that they show the deep flexibility of anything we might call a structure or matrix on the linguistic, statistical level. There is no “essential structure” of comedy here, since tragedies can exploit the same postures or stances that comedies use to comic effect. This is something a counting machine can “see,” but it is also something that a sensitive critic can see. But a critic might not describe that matrix in the way that I have here (as a collection of present and absent linguistic tokens classed by type), and this is where Docuscope begins to throw up new questions about the play, about genre, and about reading. When Snyder said that Othello has deep affinities with comedies, was she reacting to the linguistic cues described above? Are these features “co-occurrent” with the more intensive features that she as a critic did read for? What is the nature of this co-occurrence or shared footing of particular linguistic patterns and generic types? And how much anti-typical language can there be in a play of a given type? For example, how much “comic” language can a tragedy like Othello tolerate? Finally, what does this type of linguistic borrowing say about the ways in which genre is staged, cued, and self-consciously manipulated by authors? Would it be self-defeating to say that Othello is a good tragedy because it uses comic linguistic features? This latter claim would, of course, be a matter of interpretation. But it is possible, by splitting up the plays into smaller bits or “chunks,” to see how often they stray into other generic territories, and to quantify just how convergent they are with a given anti-type. Here, Othello shares quite a bit with the other comedies in its vicinity, and this high degree of linguistic similarity could be demonstrated quantitatively with hierarchical clustering, whose results are drawn as a tree diagram called a dendrogram.
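    A dendrogram is the usual picture of hierarchical clustering: texts are merged step by step, closest first, and each merge is drawn as a joint at the height of the distance between the groups. The sketch below is a plain-Python version of average-linkage clustering on invented two-dimensional positions, arranged to mimic a tragedy sitting among comedies. The labels, coordinates, linkage rule, and Euclidean distance are all assumptions; in practice one might use R’s hclust or SciPy’s scipy.cluster.hierarchy.linkage on the full set of category scores.

```python
import math
from itertools import combinations

# Invented (PC1, PC2) positions for five hypothetical texts; not real results.
POINTS = {
    "Tragedy T": (-1.0, 1.2),
    "Comedy C1": (-1.1, 1.4),
    "Comedy C2": (-1.5, 1.0),
    "History H1": (1.5, -0.8),
    "History H2": (1.9, -1.0),
}


def average_distance(c1, c2, points):
    """Average-linkage distance: mean Euclidean distance over all cross-cluster pairs."""
    pairs = [(a, b) for a in c1 for b in c2]
    return sum(math.dist(points[a], points[b]) for a, b in pairs) / len(pairs)


def agglomerate(points):
    """Repeatedly merge the two closest clusters; the merge list is what a dendrogram draws."""
    clusters = [(label,) for label in points]
    merges = []
    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2),
                     key=lambda pair: average_distance(pair[0], pair[1], points))
        merges.append((c1, c2, average_distance(c1, c2, points)))
        clusters.remove(c1)
        clusters.remove(c2)
        clusters.append(c1 + c2)
    return merges


for left, right, height in agglomerate(POINTS):
    print(f"join {left} + {right} at height {height:.2f}")
```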

    In future posts, we will look more at “outliers,” since this is perhaps an area where we can test what Docuscope sees against what critics would accept or have already asserted. As far as I know, no literary critic has suggested the similarity between Love’s Labour’s Lost and the histories (see below), so this might count as a “discovery” for Docuscope. In the meantime, I will begin posting on the status of these imaginary objects: the texts as coded by Docuscope and arrayed in the two-dimensional space of a diagram or map.

  • Comic Twelfth Night, Tragic Othello (Part 2)

    Here is a second comic exchange from Twelfth Night. Maria’s plan has worked wonderfully. Malvolio has arrived cross-gartered and is quoting to Olivia little bits of the love letter he believes she has written to him. The red and blue strings, First Person and Interaction, are again appearing fast and thick as the incomprehension builds. As in the previous passage, which dealt with Cesario’s resistance of Olivia, we have a resistant “you” here who keeps the game going. (Had she succumbed, dismissing Maria to go practice her penmanship, the dialogue would look very different: first and second person singular pronouns would most likely disappear.)

    Open Source Shakespeare, Twelfth Night (comic exchange 2)

    Docuscope Tagged Twelfth Night (comic exchange 2)

    A few things worth noting about the coding in this passage. Docuscope is ignoring the single quotation marks from the Moby Shakespeare. It does not matter that these words are being “mentioned” rather than “used” in the Austinian sense: all “sightings” by Docuscope occur in a kind of weird citational indicative. There is no way for the machine to catch the fact that the speaker, Malvolio, is not really telling Olivia “Go to, thou art made.” This is a flat earth in the rhetorical sense: no ironic depth can be perceived when every item is tagged because it occurs, not because its use in a certain context means a certain thing. One should not be misled about Docuscope’s powers of interpretation here.
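    The point about a context-free tagger can be illustrated with a deliberately tiny one. The category word lists below are hypothetical and far smaller than Docuscope’s dictionaries (which also match multi-word strings); the only claim is that a tagger which matches tokens without regard to context counts a quoted, “mentioned” line exactly as it counts a “used” one. Treating the question mark as an Interaction token follows the description of Docuscope’s tagging in the Twelfth Night post.

```python
import re
from collections import Counter

# A tiny, hypothetical dictionary; Docuscope's real categories are far larger.
CATEGORIES = {
    "FirstPerson": {"i", "me", "my", "mine"},
    "Interaction": {"thou", "thee", "thy", "thine", "you", "your", "?"},
}

# Words (letters only) and question marks become tokens; quotation marks never do.
TOKEN = re.compile(r"[a-z]+|\?")


def tag_counts(text):
    """Count how many tokens fall into each category, ignoring context entirely."""
    counts = Counter()
    for token in TOKEN.findall(text.lower()):
        for category, words in CATEGORIES.items():
            if token in words:
                counts[category] += 1
    return counts


used = "Go to, thou art made."
mentioned = "'Go to, thou art made.'"  # Malvolio quoting the letter
print(tag_counts(used) == tag_counts(mentioned))  # True: quotation changes nothing
```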

    Switching analogies, we might say that – like a Spinozan deity – Docuscope contemplates words from the perspective of eternity: it does not itself follow events from the standpoint of a moving present against which it measures temporally marked events as they arrive and withdraw through time. (Docuscope does not engage in phenomenological protention or retention in the Husserlian sense.) Nor does it situate events in space in any perspectivally located way. The history of what happens in the world of the play, if we were to think of it that way, is a history of “mentioned happenings.” No one does anything; rather, words are mentioned, and Docuscope keeps track of which kinds of words are used (but never how).

    Another interesting feature of the passage: Malvolio really doesn’t say anything directly to Olivia in it. He is talking past Maria, reciting to Olivia what he believes she actually wants to say to him. This sort of indirection, when it is not a group effort, also seems to be contributing to the proliferation of Interaction and First Person strings: the “how,” “what,” “what” paired with the “you,” “thou,” “thou.” We would expect to find a lot of passages like this in other plays that have disguise and supposition, most of all in The Comedy of Errors. I suspect that in the future I will be able to put my finger on a number of passages which parallel this one in terms of their performance on the comedy factor that Docuscope found for the full plays.

    A final observation. Here and elsewhere in the play, Malvolio is often the one who supplies the Description strings, which, as I have mentioned below, this play lacks in comparison with other plays (just as it has more Interaction and First Person, on average). Is there anything about this passage that shows us why one cannot put one’s weight on both sides of this equation (Description on the one hand, First Person/Interaction on the other) in a single play or passage? Is there something about the comic posture, linguistically, that prevents such combinations? Malvolio and Feste are the two characters in the play who use the most Description strings, and during the fabulous speech in which Malvolio fantasizes about being married to Olivia while Toby and Maria look on, the linguistic texture of the scene is that of a History play. But as principal component analysis tells us, such moments of “historical” writing (oversimplified as the definition is) may occur occasionally in Comedy, but they will not occur repeatedly. Malvolio can only give so many such monologues, and Feste can only produce his rich, descriptive banter for so long.

    But isn’t it important that there is a “dash” of Description in the play, indeed, in this passage? One issue that we need to explore as we think about what it means to find “a lot” of something in a particular type of play is what it also means to find “a little” of something. Is there a sense in which things that occur in small amounts are important as well, and if so, how should we think about those “dashes” of a certain type of word?
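    One way to separate “a little” from “a lot” is to split a play into consecutive chunks and ask how many chunks are dense in a category, not just what the whole-play rate is. In the sketch below, two invented 1,000-token plays have the same overall Description rate, but one concentrates it in a single set piece (something like Malvolio’s fantasy speech) while the other sprinkles a dash throughout. The chunk size of 100 tokens and the 20% threshold are arbitrary assumptions for illustration.

```python
def chunk_rates(flags, size):
    """Share of tagged tokens in each consecutive chunk of `size` tokens.

    flags holds one boolean per token (True = tagged, e.g. as Description).
    The last chunk may be shorter than `size`.
    """
    rates = []
    for start in range(0, len(flags), size):
        chunk = flags[start:start + size]
        rates.append(sum(chunk) / len(chunk))
    return rates


def dense_chunks(rates, threshold):
    """How many chunks reach the threshold rate."""
    return sum(1 for rate in rates if rate >= threshold)


# Two invented 1,000-token "plays" with the same overall 10% Description rate.
concentrated = [False] * 400 + [True] * 100 + [False] * 500  # one descriptive set piece
spread = ([True] + [False] * 9) * 100                        # a steady dash throughout

for label, flags in (("concentrated", concentrated), ("spread", spread)):
    rates = chunk_rates(flags, 100)
    print(label, sum(flags) / len(flags), dense_chunks(rates, 0.2))
```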

  • Comic Twelfth Night, Tragic Othello? (Part I)

    Twelfth Night is one of the classic Shakespearean comedies and so it is unsurprising that it appears in the Comedy quadrant that we obtained in our initial analysis. What is it about the language in this play that pushes it toward this quadrant, and would we recognize this comic “itness” if we saw it in the form of an exemplary passage? That is the first question I’ll be looking at in the next series of posts, entitled “Comic Twelfth Night, Tragic Othello?” But there is another, more interesting question to ask, given the results we have obtained: why does Othello look to Docuscope like a comedy? Literary critics such as Susan Snyder and Stephen Orgel have noted genealogical links to comedy in this “high tragedy,” so it is particularly intriguing to find unsupervised statistical analysis of the language coming to a similar conclusion. I will try to provide more than one exemplary passage in this series of posts, since these tend to be where the analysis gets interesting (or not).

    So, Twelfth Night. In terms of plot, it has three interesting devices: a set of identical twins, a shipwreck, and a disguise, all of which introduce a high degree of unintentional confusion into the action, driving it forward. In a plot that is driven on by accident and what you might call “congruent misunderstanding” (when two people don’t realize that they are speaking at cross-purposes), you expect to find a lot of back and forth between characters as they sync up their erroneous suppositions (which is funny in and of itself), then more back and forth as they backtrack in order to rehearse why they didn’t understand what was going on when they were so deeply engaged with one another. I haven’t yet looked at the color-coded play as I write this, but I expect to find the comic strings at the end, where the confusion is being unravelled, and in scenes of comic abuse (which I know from experience involve a lot of the “I”/“thou” exchange characteristic of comedy). The exemplars are below, one from Open Source Shakespeare, the other a screen shot of the same passage as tagged by Docuscope:

    Open Source Shakespeare, Twelfth Night 3.1

    Docuscope Tagged Twelfth Night 3.1


    The first thing I notice about this exchange is that it involves an extended miscommunication, culminating in the wonderful line “I am not what I am.” The doubled first person is emblematic of the doubling of Viola’s person in Cesario (or in Olivia’s apprehension of Viola as Cesario). The underlined red passages refer to the Docuscope category First Person, which, as we remember from the component loadings, is high in all of the items on the upper half of the scatterplot. The other type of strings that push plays upward are those underlined in blue, which are coded in Docuscope under the category of Interaction. First Person is fairly self-explanatory here (look at the red items), but Interaction is worth pausing at. Notice first that question marks are being tagged here: a piece of punctuation, and so not definitively Shakespearean. Maybe it matters that something that could have been added by a compositor is at work in this category, maybe it doesn’t. I don’t think question marks are as open to interpretation, grammatically, as, say, a comma or semicolon, but this is something for my colleague Jonathan to weigh in on. We see lots of “thee” and “thou” under Interaction, and these words seem to be the mainstay of comedy as a whole from what I’ve seen. “Thee,” “thou,” “thine,” “you,” and “your” are some of the most common words in the Shakespearean corpus that Docuscope tags, so we can be fairly sure that when we find Interaction coming up as a relevant loading in a component, it is words such as these that are driving the underlying pattern.

    Red and blue strings are pushing mostly comic plays up toward the top of the scatterplot. Yellow strings will push plays to the right, which means that the comedies clustering in the upper left exhibit a lack of yellow or Descriptive strings. The entire component that characterizes Comedy, then, is one in which First Person and Interaction strings are mutually elevated above the mean score of all plays, while Descriptive strings are (simultaneously) below the mean. Perhaps a linguist could provide a reason that would explain this pattern as a general feature of the language. That is, someone might be able to show that our language can only “bend” in certain ways, making it quite difficult to use a lot of concrete descriptive nouns and words describing motion or changes in the states of objects while simultaneously juggling lots of I/you, my/your strings. But this would not be enough of an explanation for me. We need to say why this type of language pattern (whether or not it is constrained by limits in our grammar, cognition, or underlying semantic maps) coincides with genre classifications made by discriminating humans (Heminges and Condell, Shakespeare’s editors).
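    The sign pattern described here (First Person and Interaction above the corpus mean, Description below it) can be checked mechanically by converting each play’s category rates to z-scores. A minimal sketch with invented rates for four hypothetical plays; note that a real principal component weights the variables unequally, so this checks only the direction of each deviation, not a play’s component score.

```python
import statistics

# Invented category rates (% of tokens) for four hypothetical plays.
CORPUS = {
    "Play A": {"FirstPerson": 4.0, "Interaction": 3.8, "Description": 2.1},
    "Play B": {"FirstPerson": 2.4, "Interaction": 2.2, "Description": 4.2},
    "Play C": {"FirstPerson": 3.1, "Interaction": 3.0, "Description": 3.0},
    "Play D": {"FirstPerson": 2.9, "Interaction": 3.1, "Description": 3.3},
}


def z_profile(name, corpus):
    """Express each of a play's category rates as standard deviations from the corpus mean."""
    profile = {}
    for category, value in corpus[name].items():
        values = [play[category] for play in corpus.values()]
        profile[category] = (value - statistics.mean(values)) / statistics.stdev(values)
    return profile


def fits_comic_pattern(profile):
    """First Person and Interaction above the mean, Description below it."""
    return profile["FirstPerson"] > 0 and profile["Interaction"] > 0 and profile["Description"] < 0


for name in CORPUS:
    z = z_profile(name, CORPUS)
    print(name, {k: round(v, 2) for k, v in z.items()}, fits_comic_pattern(z))
```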

    Returning to the passage above, I would point out two things. First, the quick trading of I/you, my/your strings in comic dialogue suggests a world in which predicates are being attached to subjects from two and only two points of view. This is not a universe of one, nor is it a crowd. It is not surprising that comic plotting, built as it is on sexual pairings, would favor this type of bivalent, perspectival tagging of action by speakers. But there is something else going on here. Olivia is trying to make something happen. She says, “do not extort thy reasons from this clause,” and earlier, “I would you were as I would have you be.” The “thy” and “you” here are important because the speaker is trying to create or assert a particular interpretation of how these two individuals relate to one another (and of the words traded between them). The essential drama in this situation is the asymmetry of desire that obtains between the two characters, an asymmetry that keeps Viola from assenting to Olivia’s advances. That resistance is actually what forces Olivia to make these statements that are rich with I/you, me/my, since she is using these words as anchors for a broader interpretation that does not yet obtain. She really wants to say we. And Cesario doesn’t, so they remain in I/you dialogue.

    So we could offer a preliminary hypothesis here. Shakespeare writes comedies in which characters, sometimes quite perversely, find the wrong way to the ones they love. Often it is chance or an onstage helper who sorts this out. Shakespeare is actually quite reserved when it comes to showing love as naturally progressing through its obstacles unassisted. But given that, in the initial stages of courtship, Shakespearean lovers almost never meet and join in a perfectly symmetrical way (they don’t start out as stones set in an arch, leaning perfectly on a keystone), we should expect this asymmetry to show itself in the language. Where does it show up? When a resistant individual, a “you,” prevents another “I” from arriving at an interpretation of their relationship that can be referred to as a “we” before others. Let’s call this the “resistant you” hypothesis. We can perhaps test it in the next passage, and in the passages we encounter from Othello.