{"id":20,"date":"2009-07-02T08:32:30","date_gmt":"2009-07-02T13:32:30","guid":{"rendered":"http:\/\/winedarksea.com\/?p=20"},"modified":"2025-02-10T18:00:23","modified_gmt":"2025-02-10T23:00:23","slug":"king-or-no-king","status":"publish","type":"post","link":"https:\/\/winedarksea.org\/?p=20","title":{"rendered":"King or no [King]"},"content":{"rendered":"<p>I wanted to say a little about a problem we encountered early on when we began counting things in the plays, a problem that gets us into the question of what might be a trivial versus a non-trivial indicator of genre on the microlinguistic level.  Several years ago Hope and I began a series of experiments with the plays contained in Shakespeare&#8217;s First Folio, feeding them into Docuscope &#8212; a text-tagger created at Carnegie Mellon &#8212; to see if we could find any ordered groupings in them.  The results of that early work were published in the Journal for Early Modern Literary Studies in an article called <a title=\"Very Large Textual Object\" href=\"http:\/\/extra.shu.ac.uk\/emls\/09-3\/hopewhit.htm\">&#8220;The Very Large Textual Object: A Prosthetic Reading of Shakespeare.&#8221;<\/a>\u00a0\u00a0I will say more about Docuscope in subsequent posts, but suffice it to say here that it differs from other text-taggers in that it embodies a phenomenological approach to texts. \u00a0(For the creator&#8217;s explanation of how it works, see an early online precis <a title=\"Docuscope Tag Rationale\" href=\"http:\/\/betterwriting.net\/projects\/fed01\/dsc_fed01.html\">here<\/a>.) \u00a0Docuscope, that is, codes words and &#8220;strings&#8221; of words based on the ways in which they render a world experientially for a reader or listener. \u00a0The theory behind how texts do this, and thus the rationale for Docuscope&#8217;s coding strategy, is derived from Michael Halliday&#8217;s systemic functional grammar. 
\u00a0But what is particularly interesting about Docuscope is the human element involved in its creation. \u00a0The main architect of the system, a rhetorician named David Kaufer, spent 8 years hand-tagging several <em>million<\/em> pieces of English according to their rhetorical function, and then extended this initial set of tags with wild-card operators so that Docuscope now classes over 200 million strings of English (1 to 10 words in length) into over 100 distinct categories of use or function.<\/p>\n<p>Obviously there is a lot to say about the program itself, which represents a &#8220;built rhetoric&#8221; of sorts, one that has emerged through the interplay of one architect, his reading, and the texts he was interested in classifying. \u00a0In any event, when Hope and I fed the plays into Docuscope, we had to make some initial decisions, and the first was whether to strip anything out of the plays we had obtained from the Moby online version. \u00a0(We were already thinking about the shortcomings of this conflated, edited corpus as opposed to the text of the plays as it exists in various states in the First Folio, but we had to make do since we were not yet ready to modernize the spelling of F and decide among its internal variants.) \u00a0So with the Moby text, we had things like Titles, Act and Scene Numbers, and Speech Prefixes (Othello, King Henry, Miranda, etc.). \u00a0The speech prefixes created the greatest difficulty, because in the history plays the word &#8220;King&#8221; is, as you can imagine, used an awful lot &#8212; it appears in the speech prefixes of characters over and over. \u00a0And because Docuscope tagged &#8220;King&#8221; as one of its visible tokens (assigning it to the &#8220;bucket&#8221; named &#8220;Common Authority&#8221;), this particular category was off the charts in terms of frequency when it came time to do unsupervised factor analysis on the frequency counts obtained from the plays. 
\u00a0(I&#8217;ll post more on factor analysis in the future as well.)<\/p>\n<p>Here&#8217;s the issue. \u00a0In the end, we decided that it was &#8220;cheating&#8221; to let Docuscope count &#8220;King&#8221; in the speech prefixes, since this was a dead giveaway for History plays, and we wanted something more structural &#8212; something more buried in the coordination of word choices and exclusions &#8212; to serve as the basis of our linguistic &#8220;recipes&#8221; for Shakespeare&#8217;s genres. \u00a0As the article shows, we were able to find such a recipe without relying on &#8220;King&#8221; in the speech prefixes. \u00a0Indeed, subsequent research has shown that first-person plural pronouns combined with a profusion of concrete sense objects are really the giveaway for Shakespeare&#8217;s histories. \u00a0(They are also &#8220;missing&#8221; certain things that other genres have: this combination makes histories the most &#8220;visible&#8221; genre, statistically speaking, that he wrote.) \u00a0But is it really fair to decide that certain types of tokens &#8212; King in the speech prefix, for example &#8212; are superficial marks of history as a genre, and so not worth using in an analysis? \u00a0Isn&#8217;t there a certain interpretive bias here, one that I have and in a sense want to argue for, against the apparatus of the play in favor of something like a deeper set of patterns or stances? \u00a0To argue for such an exclusion, I would begin by pointing out that speech prefixes are an artifact of print and are not &#8220;said&#8221; (even if they are used) in performance, but there is still something to think about here. \u00a0<\/p>\n<p>A Google search algorithm looks for the &#8220;shortest vector&#8221; or easiest &#8220;tell&#8221; that identifies a text as this kind or that &#8212; even if it is one of a kind. \u00a0But those of us who are interested in genre must by definition <em>not<\/em> be interested in the shortest vector or the easiest tell. 
\u00a0We are looking for the longer path. \u00a0The book historian in me, however, says that apparatus is important, and that &#8220;accidental&#8221; features never really are. \u00a0So this is something I want to think more about.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>A discussion of what a &#8220;superficial&#8221; indicator of literary genre might be &#8212; for example, the word &#8220;King&#8221; in the speech prefixes of Shakespeare&#8217;s histories &#8212; and why one might or might not want to exclude such indicators in the statistical study of genres.<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[8],"tags":[10,13,5,9,12,11,201,6,7],"class_list":["post-20","post","type-post","status-publish","format-standard","hentry","category-shakespeare","tag-docuscope","tag-genre","tag-google","tag-histories","tag-jockers","tag-kaufer","tag-shakespeare","tag-speech-prefixes","tag-tells"],"_links":{"self":[{"href":"https:\/\/winedarksea.org\/index.php?rest_route=\/wp\/v2\/posts\/20","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/winedarksea.org\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/winedarksea.org\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/winedarksea.org\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/winedarksea.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=20"}],"version-history":[{"count":2,"href":"https:\/\/winedarksea.org\/index.php?rest_route=\/wp\/v2\/posts\/20\/revisions"}],"predecessor-version":[{"id":221,"href":"https:\/\/winedarksea.org\/index.php?rest_route=\/wp\/v2\/posts\/20\/revisions\/221"}],"wp:attachment":[{"href":"https:\/\/winedarksea.org\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=20"}],"wp:term":[{"taxonomy":"category","embeddable":true,"
href":"https:\/\/winedarksea.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=20"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/winedarksea.org\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=20"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}