Category: Quant Theory

Texts as Probability Clouds

Electron probability density cloud of hydrogen atom

We have thought a lot about what a “text” is in literary studies over the last few decades, spurred on by editorial theory, deconstruction, new media studies and book history. A nominalist by inclination, I tend to think of a text (real or digitized) as a provisional state of something, this other something being a hypothetical ideal or a fiction of analysis. So when I encounter a print version of a Shakespeare play, I am encountering an entity (for example, 1 Henry VI) in a state that is more or less suitable to the medium of print. But the printed play is not the performance. Nor is it whatever idea Shakespeare had when he began working with his company on the play.

An additional complication: versions of any given Shakespeare play in print (for example, those found in the 1623 First Folio) may contain variation at the level of the individual word or character, variation that (in the case of the Folio) was corrected during the print run. Whatever is “behind” the First Folio, then, is a reconstruction of something that can only be said to exist in an ideal sense. We can think of that meta-object as having a probabilistic character: different letters in particular positions have a likelihood of being x or y, for example. But in the end, the actual identity of even an individual character must be understood as a likelihood.

None of these ideas except the last is particularly novel in Shakespeare studies. Peter Stallybrass and Margreta de Grazia, among many others, have already made the point that the sources behind Shakespeare plays are an editorial ideal, approximated in practice but unreachable in principle. Less has been said, however, about the probabilistic nature of the text itself: its existence as a set of likelihoods that are realized provisionally in different cases. A text as a cloud of probabilities. That’s interesting.
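To make the idea of a probability cloud concrete, here is a minimal Python sketch. It assumes (a big assumption) that several copies of the same passage have already been collated and aligned character by character, so that each position can be read as a small distribution over the letters found there. The function names are illustrative, and the sample readings are invented, not taken from any actual copy of the Folio.

```python
from collections import Counter


def position_distributions(witnesses):
    """For each character position, return the share of witnesses
    showing each reading. Assumes the witnesses are already aligned
    to the same length (real collation is much harder than this)."""
    if len({len(w) for w in witnesses}) != 1:
        raise ValueError("witnesses must be aligned to the same length")
    n = len(witnesses)
    clouds = []
    for column in zip(*witnesses):  # one column per character position
        counts = Counter(column)
        clouds.append({ch: c / n for ch, c in counts.items()})
    return clouds


def most_likely_text(clouds):
    """Collapse the cloud into one text by taking the likeliest reading
    at every position (ties go to the reading seen first)."""
    return "".join(max(d, key=d.get) for d in clouds)


# Invented readings of one phrase in four copies from a single print run,
# differing at the last letter (as a stop-press correction might).
copies = ["the kings", "the kings", "the kinge", "the kings"]
cloud = position_distributions(copies)
print(cloud[-1])                # {'s': 0.75, 'e': 0.25}
print(most_likely_text(cloud))  # the kings
```

Collapsing to the likeliest reading is only one way of realizing the cloud; an editor's choice at a given position may just as well fall on the minority reading.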

December 26, 2010
Penalty Kicks and Distributed Movement

Gabriel Dias, a graduate student at RPI, has recently modeled the way in which penalty kickers move their bodies as they prepare to shoot. His findings suggest that there are several “tells” (for example, the angle of the hips or the position of the planted foot) that predict the ultimate direction of the shot. In a PBS interview, he alludes to the existence of “distributed” movements which show the physical commitment of the kicker to one outcome or another. When I hear the word distributed, I immediately think “integrated physical system”: a body that is constrained to do certain things because of the way its different parts interact. We see this integration in the competitive world of athletics and the expressive realm of dance. (Perhaps the adjectives here should be reversed?)

In our analysis of texts, we have also found distributed movements of a sort. We find, that is, that certain types of words tend to move with each other in some genres, and others move away from one another. Does this mean that genre is a physical system like penalty kicking, and that our explanations of these distributed movements (of words rather than points on a body) are themselves grounded in a physical reality? I have myself offered analogies to describe this “commitment of weight” in the process of using words to do certain things: if you want to write a Shakespearean comedy, there are certain things you are likely to do. You will tend to use more first and second person singular pronouns and less description than you would in, say, a history play. If Docuscope is the goalie or keeper, it may need only 30 or 40 lines to decide that the ball is going to go toward comedy rather than history. Other things will be ignored as incidental. If I say that tagging a play and watching how its “points” move in a mathematical space is like biotagging a kicker and studying his or her movements, I am proposing an analogy. Like kicking, writing is a behavior. In certain situations (penalty kicking, writing for the stage), some aspects of this behavior are signal or cardinal (the position of the hips, the use of pronouns), while others are inessential, like the curve of the kicker’s index finger. (Actually, given the dynamism of the human body, I would be surprised to find out that there is not, on some level, a connection between finger position and kick.)
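The keeper analogy can be sketched as a running tally that withholds judgment until enough lines have gone by. This is a toy in Python, not how Docuscope works: the word list, the two thresholds, the 30-line minimum and the name `keeper_call` are all hypothetical, chosen only to illustrate a “tell” that becomes legible early.

```python
import re

# Hypothetical "tell": a small set of first- and second-person pronouns.
TELL_WORDS = {"i", "me", "my", "mine", "thou", "thee", "thy", "thine", "you", "your"}


def keeper_call(lines, comedy_above=0.10, history_below=0.04, min_lines=30):
    """Read a play line by line and 'dive' once the running pronoun rate
    is clearly on one side. Returns (guess, line_number). All thresholds
    are illustrative, not calibrated on real plays."""
    tell_count = 0
    word_count = 0
    for number, line in enumerate(lines, start=1):
        words = re.findall(r"[a-z']+", line.lower())
        word_count += len(words)
        tell_count += sum(word in TELL_WORDS for word in words)
        if number >= min_lines:  # the keeper waits for some evidence first
            rate = tell_count / max(word_count, 1)
            if rate >= comedy_above:
                return "comedy", number
            if rate <= history_below:
                return "history", number
    return "undecided", len(lines)


comic = ["I pray thee tell me what you mean"] * 40
martial = ["The army marches on the town at dawn"] * 40
print(keeper_call(comic))    # ('comedy', 30)
print(keeper_call(martial))  # ('history', 30)
```

A rate that falls between the two thresholds keeps the keeper waiting, which is the point of the analogy: the call comes only when the tell is unambiguous.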

So, does this mean I am advocating an essentially structuralist account of genre? Am I saying that, because language use is a behavior, then writing in a particular genre is also a behavior with certain “tells” that are, in a sense, built into the physical system of writing? I think people who are doing iterative criticism need to have an intelligent answer to this question, complete with an analysis of its underlying analogy. My answer would be that writing fiction in a historically bound literary field does, like penalty kicking, count as a behavior and that such behaviors will exhibit coordination. There is as much connective tissue in language, grammar, plot and audience expectation as there is in the fabric of the human body. But this is not the same thing as saying that there is an essential structure to particular types of writing – that the existence of a tell implies an underlying recipe, essence or structure that is genetically dictating the behavior of the writer.

Why doesn’t structuralism follow from linguistic integration? First, writing is not like penalty kicking. Dias chose penalty kicking because it has a binary physical outcome: with respect to the standing keeper, the ball goes left or right. Language, on the other hand, is like a flock of birds: it can break any way, 360 degrees, and is doing so dynamically at all times. “Yet the flock shows direction,” you say. “Individual birds may be wobbling left and right, up or down, but there is a recognizable trajectory within the group.” Perhaps there are deterministic ways of saying where this group is going to go next, but I doubt it. The total behavior is distributed, immanent: it has massive integrity as an aggregate, but the existence of that integrity does not imply some non-negotiable locus of control. Another way of saying this, and now I am channeling Whitehead, is to say that the direction of the flock is a continuously unfolding event or “society” of actual occasions. Thus, the penalty kicking example is good for showing entailment and distributed connection in the elements of literary linguistic analysis, but bad as a model for the errant and multiple trajectories of writing.

The existence of the tell essentially pushes back the timeline of intelligibility of the direction of the ball. A good keeper or student of physiology, like a good literary critic, will know earlier than most what kind of behavior is being exhibited. But unlike a keeper in a football game, the critic is not looking for a binary outcome. Rather, the critic or spectator is comparing the unfolding action onstage to any number of possible theatrical “types” of entertainment and generic conventions. Shakespeare takes five penalty shots at a time, all the time. If you are interested in this aspect of the play (its participation in comic conventions), yes, there will be “signal” or orienting linguistic events at the level of the line which you could consult to predict what he is about to do. But you don’t have to consult the tells, and this is not a penalty kick: you already know what is going on and, indeed, are a better judge of the texture and generic tonalities of the play as it unfolds than a keeper who has to wait for the ball to be kicked. (Docuscope really is a keeper; it knows nothing until the event happens.) As we have seen in our research, human beings are massively sensitive to variations and distributed cues in linguistic behavior. We make an astonishing number of connections between the kinds of variation we see among the plays and texts we have encountered. Finding out that there is a linguistic “tell” for comedy doesn’t then mean that comedy essentially or structurally “is” the series of tells we reliably find for it. The “tells” here are a parallel description, arrived at after the fact, of a perceptual reality that we render qualitatively and immediately in our feel for certain types of writing or stories.

I have used the words “signal,” “cardinal” and “orienting” to describe the types of tokens that serve as good landmarks for genre in this alternative descriptive universe. I do not use “essential.” As we work further through this analogy between physical and linguistic behaviors, I think we should adopt Spinoza’s metaphysical position from the Ethics, that there is a parallelism between the twin domains of thought and extended physical beings. Neither has priority. When understood as a species of behavior, theatrical writing or literary production must obviously exhibit certain empirical regularities: it takes place on the fleshy platform of human consciousness and is constrained by the physical limits of our bodies, environment and history. As a critic, I would want to insist that no material factor (the practices and limitations of stagecraft, the documented or remembered history of past performances, the politically charged distribution of resources and cultural actors) can be a priori excluded as inexpressible in the behavior that is writing. All constraints are summed and expressed, but in different amounts. But I would also want to insist that, whatever the behavior is that we are tracking, there has to be in place a certain set of agreements to make sense of the “movements” in this system as such. I have to want to count “these types” of words and not those. I have to search for significant coordination of these counted things with respect to “this type of outcome” and not another. Someone has to have the desire to study penalty kicks, for example, or authorship, or genre: behaviors don’t simply want to study themselves.

The tell is a “sign” that speaks for the kicker, and speaks early. It is a signal event worth attending to if you are a keeper. It is simultaneously an element in a causal sequence, constrained by events prior to it, and a negotiable sign or expression of an intention to do something. It is a physical way of saying, “I mean to kick the ball this way.” The point of the parallelism is that you never get to dump one half of the phenomenon. Leaning to the left, we acknowledge: all physical tells may be redescribed as expressions of an intention, and so tokens of meaning. But inclining to the right, we say: all tokens of meaning are, on some level, also indexes of empirical constraints. The keeper has to dive both ways.

July 29, 2010
Texts as Objects II: Object Oriented Philosophy. And Criticism?

In the previous post I laid out several questions about the nature of texts, objects and interpretation that arise when we subject texts — for example, the Folio plays of Shakespeare — to statistical analysis. Above is a sketch of two texts, T1 and T2 (forgive the hand-drawn visuals), that exist as documents we might read. This is our point of contact as scholars, and we know where to take it from here. But for machine analysis, these texts are transformed into objects — relational, formalized mathematical entities — which means that they are containers of containers of things. So let’s think this way about texts for a moment.

T1 and T2 are both texts of 1000 words in length. We can think of these texts as a set of tokens drawn from a larger set of tokens that represents the totality of English words at a given moment. (Such a totality is an abstraction, just as Saussure’s langue was an abstraction; let’s leave that aside for now.) Now a mathematically minded critic might say the following: Table 1 is a topologically flat representation of all possible words in English, arrayed in a two-dimensional matrix. The text T1 is a vector through that table, a needle that carries the “thread” through various squares on the surface, like someone embroidering a quilt. One possible way of describing the text, then, would be to chart its movement through this space, like a series of stitches.
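A rough sketch of the “stitch history” in Python, under stated assumptions: a tiny vocabulary stands in for the hypothetical table of all English words, the table is filled alphabetically (as in Table 1), and the helper names are illustrative. Each token becomes a (row, column) square, and the text becomes the sequence of squares it passes through.

```python
import math
import re


def tokenize(text):
    """Lowercase word tokens; a deliberately crude tokenizer."""
    return re.findall(r"[a-z']+", text.lower())


def build_table(vocabulary):
    """Lay the vocabulary out alphabetically in a roughly square grid,
    returning a word -> (row, col) lookup."""
    words = sorted(set(vocabulary))
    width = math.ceil(math.sqrt(len(words)))
    return {word: divmod(i, width) for i, word in enumerate(words)}


def stitch_path(text, table):
    """Chart the text as the sequence of squares its tokens occupy."""
    return [table[t] for t in tokenize(text) if t in table]


line = "To be, or not to be, that is the question"
table = build_table(tokenize(line))
print(stitch_path("To be, or not to be", table))
# [(2, 1), (0, 0), (1, 0), (0, 2), (2, 1), (0, 0)]
```

As the post says, an alphabetical layout makes the resulting path nearly meaningless: nothing about “to” and “be” explains why the thread jumps between those two squares.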

Generalizations about the syntax and meaning of that continuously threading line would be generalizations about two things: the sequence of stitches and the significance of different regions in the underlying quilt matrix. I have arranged the words alphabetically in this table, which means that a “stitch history” of movements around the table would not be very revealing. But the table could be rendered in many other ways (it could be rendered three- or multi-dimensionally, for example). What if I put all of the verbs in the lower left-hand corner (southwest) of the table and all of the pronouns in the upper right (northeast)? Based on this act of spatial classification, you could then come up with statements like “I see many threads passing between the northeast and southwest,” a meaningless descriptive statement unless you add: “this is because verbs are here and pronouns are there, and they tend to follow one another in written and spoken English.” So this spatializing approach to textual analysis would require three things: (1) arrangement of the matrix in a meaningful way; (2) description of the movement through the matrix; and (3) analysis of patterns in that movement. Based on (1) you might have something interesting to say about (3). As the note says, a text is a “vector through a hypothetical Table,” and “a theory of rhetoric, grammar, semantics is an attempt to rationalize this vector — as sequence — by regrouping the words in the table by region.” In effect, any mathematical or container-based analysis of a text must ultimately be some kind of mapping of a vector space (semantic, ideological, grammatical, generic, etc.).
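The three steps above can be sketched in Python. The region assignments below are a hypothetical, deliberately tiny lexicon (pronouns placed “northeast,” verbs “southwest,” everything else “other”); with that arrangement in place, step (2) turns the text into a path through regions and step (3) counts how often the thread crosses from one region to another.

```python
import re
from collections import Counter

# Step (1), hypothetically: a tiny lexicon arranged into regions of the table.
REGIONS = {
    "i": "NE", "me": "NE", "you": "NE", "he": "NE", "she": "NE", "we": "NE", "they": "NE",
    "love": "SW", "go": "SW", "see": "SW", "speak": "SW", "am": "SW", "are": "SW",
}


def region_transitions(text, regions=REGIONS, default="other"):
    """Step (2): map each token to its region, giving the thread's path.
    Step (3): count transitions between consecutive regions."""
    tokens = re.findall(r"[a-z']+", text.lower())
    path = [regions.get(token, default) for token in tokens]
    return Counter(zip(path, path[1:]))


moves = region_transitions("I love you and you love me")
print(moves.most_common(2))  # [(('NE', 'SW'), 2), (('SW', 'NE'), 2)]
```

Counting adjacent pairs is the simplest possible “analysis of patterns in that movement”; the interesting work, as the post argues, lies in how step (1) is designed.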

Now, Docuscope is itself a built form of this type of container-based analysis, one that eliminates the temporal dimension of “stitching” described above by transforming the hypothetical table into buckets or classes of words and then decanting the text into those buckets. Instead of regional movement, we get inclusion or exclusion of words (strings) from classes of words. The architecture of the classes matters, of course, since only if that architecture is good will we find patterns that we recognize and understand, understanding being the ultimate goal here. (It is also possible simply to look for correlated patterns among documents, which might allow someone to find an entire class of objects based on a few tokens they already know, a very small “class,” as Google does; but finding is not criticism.) So what is a text in the eyes of Docuscope or, for that matter, any device that tags documents? One answer is that the text “is” the items circled above as M1 and M2: words or sequences of words that have been classed into buckets. At the level of M1 and M2, the text becomes a set of local subsets, each of which contains a number of tokens. Statistical analysis of this partitioned object yields quantitative relations (R1, R2 and R3) which differentiate one text from another.
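A minimal sketch of “decanting” a text into buckets, in Python. The three classes are hypothetical stand-ins; Docuscope’s actual dictionary is much larger and also matches multi-word strings. The per-bucket shares play the role of M1 and M2, and the per-bucket differences are one simple, assumed choice of quantitative relation R between two texts.

```python
import re
from collections import Counter

# Hypothetical buckets; Docuscope's real categories are far more numerous.
CLASSES = {
    "first_person": {"i", "me", "my", "mine"},
    "second_person": {"you", "your", "thou", "thee", "thy"},
    "description": {"green", "stone", "river", "tall", "cold"},
}


def decant(text, classes=CLASSES):
    """Pour a text's tokens into buckets and return each bucket's share
    of all tokens (the M1/M2 level)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for token in tokens:
        for name, members in classes.items():
            if token in members:
                counts[name] += 1
    total = len(tokens) or 1  # avoid dividing by zero on an empty text
    return {name: counts[name] / total for name in classes}


def relations(m1, m2):
    """One simple kind of quantitative relation (the R level):
    the difference between two texts' shares in each bucket."""
    return {name: m1[name] - m2[name] for name in m1}


m1 = decant("I love you and you love me")
m2 = decant("The cold river ran past the tall green stone")
print({name: round(value, 3) for name, value in relations(m1, m2).items()})
# {'first_person': 0.286, 'second_person': 0.286, 'description': -0.556}
```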

Now for the philosophical question, the one where object oriented philosophy might be useful: when asked to describe the nature of the statistical entity undergoing analysis here (the data object rendered by Docuscope and then explored within R), do we say that it is simply the local contents (M1, M2) of the containers (T1 and T2)? If I begin by saying that the being of this object is, rather, the structure of these elements in their containers — a better answer, I think — then I probably mean that T1 and T2 are really the sum of all relations that can be posited (R1, R2, R3) among rendered elements (M1, M2). This rather Leibnizian sounding answer suggests that a text’s existence is ultimately differential: it is the sum of that object’s relations with all other objects. The statistical analysis of texts would be the quantitative description of this totality of relations given a set of classes — classes that we, as humanists, want to debate because they may be the source of any meaning in the result (because a certain kind of meaning or “purpose in pattern” is distributed into the classes).
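The relational answer can be put in code: represent each text only by its distances to every other text in the corpus. This Python sketch assumes Euclidean distance over hypothetical two-bucket profiles (other metrics would serve as well), and the function names are illustrative.

```python
import math


def distance(p, q):
    """Euclidean distance between two bucket profiles with the same keys."""
    return math.sqrt(sum((p[k] - q[k]) ** 2 for k in p))


def relational_identity(profiles):
    """Describe each text solely by its distances to every other text."""
    return {
        name: {other: distance(p, q) for other, q in profiles.items() if other != name}
        for name, p in profiles.items()
    }


# Hypothetical profiles: shares of two buckets in three texts.
corpus = {
    "A": {"pronouns": 0.3, "description": 0.1},
    "B": {"pronouns": 0.0, "description": 0.5},
    "C": {"pronouns": 0.3, "description": 0.5},
}
before = relational_identity(corpus)
corpus["D"] = {"pronouns": 0.1, "description": 0.1}
after = relational_identity(corpus)
print(sorted(before["A"]), sorted(after["A"]))  # ['B', 'C'] ['B', 'C', 'D']
```

Note what this model implies: adding text D changes A’s relational description even though A’s own profile is untouched. That is the purely relational criterion of identity, the one Harman argues cannot exhaust an object.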

But here is where I think Harman adds something crucial. If the argument he has been developing in Tool-Being, Prince of Networks and elsewhere is correct, then an object of this or any other kind would not be the sum of its relations with other objects, as it is in Latour’s analysis. To this relational model, Harman opposes the metaphysical integrity of the object over and beyond its relations, an integrity which holds the object together in its “domestic” being over and above its relational “alliances.” In Prince of Networks, he writes:

I hold that there is an absolute distinction between the domestic relations a thing needs to some extent in order to exist [see above, M1, M2] and the external alliances that it does not need [above, R1, R2, R3]. But the actor itself [i.e., object of analysis] cannot be identified with either. An object cannot be exhausted by a set of alliances. But neither is it exhausted by a summary of its pieces, since any genuine object will be an emergent reality over and above its components, oversimplifying those components and able to withstand a certain degree of turbulent change in them. (135)

What I find fascinating and important about Harman’s idea here is that he is providing a rationale for (1) accommodating the kind of container analysis I have outlined above while (2) arguing that this type of analysis is not the end of the story. Now, Harman and the Speculative Realists have been reluctant to discuss what constitutes a text and how language might itself be an object, a reluctance that stems, understandably I think, from fatigue with the post-Heideggerian “language is everything” trend in Continental philosophy and cultural studies. But language is definitely something, and it is as real as anything else I can think of. So too are our encounters (in the theater, the library, the cinema) with things like genre, style, ideology and pleasure.

Object oriented philosophy should have something to say about texts, since they provide a particularly good example of why the purely relational criterion for an object’s identity (whether it is a text, a word, a thought, a feeling, or a piece of wood) is insufficient. As literary critics and theorists, we may have something to add to Harman’s account of the inexhaustibility of an object’s relations and its emergent reality over and above its components. In fact, that insufficiency is exactly what many of us have been pointing to when we object to reductive claims made about texts on the grounds that they yield statistical regularities.

What does it mean for the reality of an object to “simplify” its “components”? Perhaps the process that Harman refers to as simplification is what we as literary critics refer to as interpretation: the contingent coming into being of a portion of an object’s reality (here, a text) through that object’s interrelation with other objects and the subtractive unveiling of its inexhaustible contents. (Whitehead describes this as the process of “objectification.”) Harman would argue that such emergent realities don’t just take hold between texts and readers, but between sunlight and plant leaves or fire and cotton. All objects can be oversimplified; all of them can survive (and resist) some degree of turbulent change.

If objects are really this universal, then the process of “pattern recognition” that I describe as object oriented criticism is really something more involved than the collating of sets and relations among sets. Clearly, if a text is understood as a container of relations, then statistics can model the complexity of that object and its relations — even the immense complexity of a textual object. But that model, like the map of relations above, will always be just an approximation. As Harman insists, the inner reality of the object — itself alluring with the promise of something more — is never fully available, whether that object is a piece of wood or a piece of writing. As literary critics, I think we can find plenty to work with when objects are defined in this way.

September 17, 2009
Texts as Objects I: Object Oriented Philosophy. And Criticism?

In the work I have been doing on Shakespeare with my colleague Jonathan Hope (see previous posts under the Shakespeare category), we have approached the plays as two kinds of objects simultaneously: as historical documents of theater history and as objects of statistical analysis. We have emphasized their theatrical foundations because we believe this is the reality of what is being studied: real people on stage saying these words (or something like them) in a real situation. The forces at work in this situation shaped the final result, and the meaning of what we find there, when we find it, is most significant as a reflection of that time and place. This makes us historicists, and in my case there is also a certain sympathy for materialist rather than idealist approaches to literature (although these terms are not very nuanced).

But what does it mean to say that a text is an object of statistical analysis, and how might this “object status” be related to our broader account of what texts are in general? Is there anything to be learned from thinking in this way about texts and interpretation that might alter the basic conceptual distinctions we use to think about texts, culture, experience, and language? This post represents a first attempt at answering some of these questions.

We need to start with a frame of analysis, and for this, I’ll use recent debates in philosophy and sociology about networks, actors and objects. Some of you may be familiar with the Actor Network Theory of Bruno Latour, which provides what you might call a flat ontology of actors in the world, one that makes no distinction in kind between natural, human made (technological), animate and inanimate “actors” in any given domain of analysis. Graham Harman, who is one of the leaders of a group of philosophers now known as the Speculative Realist school, has provided a fascinating summary and critique of Latour’s work, one that I was present for in a recent symposium on Latour held last year at the London School of Economics. During this event, I asked Harman and Latour if this kind of flat ontology limited the kinds of things one can claim in any causal explanation of a given scene of change or transformation (a revolution in a government, a reconfiguration of a bureaucracy, a change of state in a gas, a change in emotions). The problem — which Harman expertly delineates in his recent book, Bruno Latour: Prince of Networks — is that if no metaphysical priority is given to any particular type of actor; and if, further, all actors exhaust all of their potential at every moment because they possess no metaphysically privileged “special stuff” that will carry their powers through to the exclusion of other powers; then it becomes impossible to account for change. If you accept these consequences, then what we call “explanation” in any kind of critical work becomes interchangeable with description, and the activity of analysis becomes — as I argued at the LSE symposium — the “serial redescription” of each new state of the world. Harman agreed that this was unsatisfactory. Latour, to my surprise, said that this was exactly what he is trying to do in his sociological work. (A book about the symposium will be published next year.)

Now, in literary criticism, we do not think of our work as being that of “description.” And yet, we are not really analyzing causal patterns either, at least not in the way that an epidemiologist would be when she links the presence of a given microbe to the development of a particular illness in a population. Somewhere in the middle of this continuum, between description on the one hand and causal explanation on the other, lies meaning, which is what my colleagues and I in the humanities are probably most interested in. There are lots of ways to think about meaning, but perhaps one way we can do so is to think of it as “purpose in pattern,” something more akin to Aristotle’s final cause than to the efficient cause that brings things about. (I realize that there are problems with Aristotle, but I believe the distinction is useful for the present discussion.) One of the hallmarks of European modernity, arguably, is the tendency to believe that discussions of final causes, purposes (and later, meaning) ought to be kept separate from discussions of how things work (efficient causation). For the most part, I think that has been a good idea, although it has aided and abetted the creation of the “two cultures” of science and the humanities. Stephen Jay Gould’s notion of two non-overlapping magisteria with different protocols of explanation seems like a fine truce to me. But where do humanists (i.e., members of the humanities disciplines) fit in? In literary studies, we are very much interested in patterns, and the history of literary criticism is, among other things, the history of pattern recognition among readers and users of language.

Literary genre is a pattern that human readers since Aristotle have discerned in drama, poetry and prose. This pattern is also picked out by unsupervised statistical analysis, both on the basis of the frequency of individual words (see Jockers et al.) and on the basis of groupings of words that have been tagged by a device like Docuscope. So where does that pattern exist? In the text or performance itself? In the mind that recognizes it? What is it made of? A set of relationships? A series of comparisons undertaken by the creators of texts and their interpreters? Do we learn anything new about genre when we say that it can be given multiple descriptions — either a plot formula (an amusing story ending in marriage) or a multivariate, statistical recipe (a story containing lots of I, me, my, you but very little concretely descriptive language)? Let’s take seriously the idea that genre is a formal or mathematical object, and see where it leads us.
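What “unsupervised” means here can be shown with a small k-means sketch in Python (3.8 or later, for `math.dist`). The six plays and their two feature values (share of first- and second-person pronouns, share of descriptive words) are invented for illustration, and k-means is just one clustering method among many, not necessarily the one used in the studies mentioned above. The point is that the algorithm is never told which plays are comedies, yet it separates the two groups.

```python
import math


def kmeans(points, k=2, iterations=20):
    """Plain k-means. Initial centers: the first point, then repeatedly the
    point farthest from the centers chosen so far (deterministic seeding)."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    labels = []
    for _ in range(iterations):
        # Assign each point to its nearest center.
        labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        # Move each center to the mean of its members.
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                centers[j] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels


# Invented (pronoun share, description share) values for six unnamed plays.
plays = {
    "Play A": (0.09, 0.02), "Play B": (0.08, 0.03), "Play C": (0.10, 0.02),
    "Play D": (0.03, 0.07), "Play E": (0.02, 0.08), "Play F": (0.04, 0.06),
}
labels = kmeans(list(plays.values()))
print(dict(zip(plays, labels)))
# {'Play A': 0, 'Play B': 0, 'Play C': 0, 'Play D': 1, 'Play E': 1, 'Play F': 1}
```

The cluster numbers carry no names; calling cluster 0 “comedy” is an act of interpretation that happens after the statistics are done.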

September 11, 2009
Spectralism, Maya Lin Show at Corcoran

Two items worth mentioning: today I had a chance to hear the new record from the Steve Lehman Octet called Travail, Transformation and Flow, which shows off some of what is new in spectralism, an aesthetic that involves analyzing a tone from a single instrument with a computer and developing improvisations out of its overtone series. The album (check out “Echoes,” which can be downloaded free) reminds me of recordings I once heard of glass harmonicas, which are really sets of rotating glass bowls that “sing” when touched with a moistened finger, like a champagne glass rubbed on its rim. The music shimmers like a school of fish: you hear a set of tones developing in one of his arpeggiated, darting solos and it fans out in a number of semi-dissonant directions. It reminds me also of the sound of those prayer bowls that are struck during meditation. Forerunners of spectralism include Debussy, Bartok, Messiaen and Stockhausen, but the music seems to go beyond any of these influences, especially when it takes the form, as it does in Lehman’s group, of an octet with tuba, bass, a few horns and an incredible vibraphone player who is constantly charting the harmonic offramps. There is something vaguely medieval about this music in the way it interlaces and suspends dissonant moments in a progression. I would not want to listen to Lehman’s music in a cathedral, however.
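Spectral analysis of a single tone can be sketched in a few lines of Python. The tone below is idealized (a 100 Hz fundamental plus two weaker harmonics, synthesized rather than recorded), the `floor` threshold is an arbitrary illustrative choice, and the transform is a naive discrete Fourier transform: an FFT computes the same values, only faster. The peaks it recovers are the overtone series from which, on the account above, the improvisations are built.

```python
import cmath
import math


def dft_magnitudes(samples):
    """Naive discrete Fourier transform, O(n^2). An FFT returns the same
    values much faster; this version just keeps the sketch dependency-free.
    Returns magnitudes for frequency bins 0 .. n // 2."""
    n = len(samples)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(samples)))
        for k in range(n // 2 + 1)
    ]


def overtone_peaks(samples, rate, floor=0.1):
    """Frequencies (Hz) of bins stronger than `floor` times the loudest bin."""
    magnitudes = dft_magnitudes(samples)
    loudest = max(magnitudes)
    n = len(samples)
    return [k * rate / n for k, m in enumerate(magnitudes) if m > floor * loudest]


# Idealized tone: 100 Hz fundamental plus 2nd and 3rd harmonics,
# sampled at 1000 Hz for 0.1 s (100 samples, 10 Hz per bin).
rate = 1000
tone = [
    1.0 * math.sin(2 * math.pi * 100 * t / rate)
    + 0.5 * math.sin(2 * math.pi * 200 * t / rate)
    + 0.25 * math.sin(2 * math.pi * 300 * t / rate)
    for t in range(100)
]
print(overtone_peaks(tone, rate))  # [100.0, 200.0, 300.0]
```

A real instrument’s tone is messier: its partials drift from exact multiples of the fundamental and leak across bins, which is part of what makes the analysis musically interesting.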

The Maya Lin Systematic Landscapes show at the Corcoran was also interesting, particularly the large massing of cut 2x4s into a kind of wooden berm in the middle of one of the galleries. The piece, 2 x 4 Landscape, looks an awful lot like a digital scan of a small geographical feature, one that has been recreated in physical form with all of its discrete jumps and bumpy texture: the model has become the object. I liked seeing the Lin exhibit after hearing Lehman’s piece because it reminded me of the ways in which some pieces of music or art acquire the status of diagrams or maps of their own construction. Apparently a mathematical algorithm called a Fast Fourier Transform is applied to the initial tones in a spectralist analysis, since this pulls out aspects of the overtone series that you or I wouldn’t “hear” immediately. The composition then calls attention to these facts, which you somehow recognize. I thought the Lin piece did something similar: it shows you the way in which a hill, landscape or model of the same is composed of many possible paths through the terrain (across, diagonal, up and down), and that each of these vectors or “traverses” will provide you with a different sequence of ups and downs. A landscape or composition, that is, becomes a vector through a table of values. We could think of a text in a similar manner as well. (I’ll be posting more on this in the future.)
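The idea of a landscape as “a vector through a table of values” can be sketched in Python: treat the terrain as a grid of heights and read off the sequence of ups and downs along different traverses. The grid is a made-up 3 x 3 example, not a scan of Lin’s piece, and the function names are illustrative.

```python
def traverse(grid, path):
    """Heights encountered along a path of (row, col) cells."""
    return [grid[row][col] for row, col in path]


def ups_and_downs(heights):
    """Describe a height sequence as a series of steps."""
    return ["up" if b > a else "down" if b < a else "level" for a, b in zip(heights, heights[1:])]


# A made-up 3 x 3 terrain.
terrain = [
    [1, 4, 2],
    [3, 5, 1],
    [6, 2, 6],
]
paths = {
    "across (top row)": [(0, 0), (0, 1), (0, 2)],
    "down (left column)": [(0, 0), (1, 0), (2, 0)],
    "diagonal": [(0, 0), (1, 1), (2, 2)],
    "across (bottom row)": [(2, 0), (2, 1), (2, 2)],
}
for name, path in paths.items():
    print(name, ups_and_downs(traverse(terrain, path)))
```

Each traverse starts from the same table of values yet yields a different sequence, which is the sense in which one surface contains many possible “readings.”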

Several months ago I had the idea of taking a concordance of a text and then creating a shape using the weightings of its words as heights, radiating from the most common in the center to the least common at the perimeter. Such a shape or sculpture could look like Lin’s 2 x 4 Landscape (it is a physical model of one set of “magnitudes” that defines the text) but would also be something other than a text. Take ten texts by two writers. Place the most frequent word of the first work in the center at height n1 (representing the number of times that word occurs in the work), then begin a clockwise coil starting one position “north,” placing the second most frequent word there at height n2, its number of occurrences. Move next one position east (diagonal from the original origin) and place n3, then south for n4, south again for n5, then west for n6, etc. Now, if you created shapes for all ten works and then gave the surfaces to a topologist, what are the odds that she or he could do an author attribution? The point here is that Lin is using landscape, at least in part, as a very large dataset, which is something you can do with texts as well. (They contain ravines, valleys, hidden depths…) Or a single note with its overtone series: this too can be a starting point for meditation, analysis and improvisation, since there is “more to it” than just the note. Lin and Lehman are both artists who are interested in elemental phenomena that are really bundled sets of relationships. The bundles can be teased apart, made explicit, and expressed in a more vivid form: a systematic landscape or an improvisational spectrum.
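Here is one way to sketch the coil in Python, assuming a crude tokenizer and breaking frequency ties by first appearance; the function names are hypothetical. Positions are (x, y) offsets from the center, with north as +y and east as +x. Runs of 1, 1, 2, 2, 3, 3, ... steps in the order north, east, south, west reproduce the placements described in the paragraph (n2 north, n3 east, n4 and n5 south, n6 west).

```python
import re
from collections import Counter


def spiral_positions(count):
    """First `count` cells of a clockwise coil out from the center (0, 0).
    x grows to the east, y to the north; runs of 1, 1, 2, 2, 3, 3, ...
    steps in the order north, east, south, west."""
    positions = [(0, 0)]
    x = y = 0
    directions = [(0, 1), (1, 0), (0, -1), (-1, 0)]  # N, E, S, W
    step, turn = 1, 0
    while len(positions) < count:
        for _ in range(2):  # each run length is used for two directions
            dx, dy = directions[turn % 4]
            for _ in range(step):
                if len(positions) >= count:
                    return positions
                x, y = x + dx, y + dy
                positions.append((x, y))
            turn += 1
        step += 1
    return positions[:count]


def frequency_surface(text):
    """Map each word to (x, y, height): the most frequent word at the
    center, the rest coiling outward in descending order of frequency."""
    tokens = re.findall(r"[a-z']+", text.lower())
    ranked = Counter(tokens).most_common()  # ties keep first-seen order
    cells = spiral_positions(len(ranked))
    return {word: (x, y, n) for (word, n), (x, y) in zip(ranked, cells)}


print(spiral_positions(6))
# [(0, 0), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
print(frequency_surface("a rose is a rose is a rose"))
# {'a': (0, 0, 3), 'rose': (0, 1, 3), 'is': (1, 1, 2)}
```

Whether a topologist could tell two authors’ surfaces apart is the open question the post poses; the sketch only builds the surfaces.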

July 3, 2009