{"id":567,"date":"2009-12-15T11:48:18","date_gmt":"2009-12-15T16:48:18","guid":{"rendered":"http:\/\/winedarksea.com\/?p=567"},"modified":"2025-02-10T17:53:56","modified_gmt":"2025-02-10T22:53:56","slug":"clustering-the-plays-without-principal-components","status":"publish","type":"post","link":"https:\/\/winedarksea.org\/?p=567","title":{"rendered":"Clustering the Plays Without Principal Components"},"content":{"rendered":"<figure id=\"attachment_568\" aria-describedby=\"caption-attachment-568\" style=\"width: 521px\" class=\"wp-caption alignnone\"><a rel=\"attachment wp-att-568\" href=\"http:\/\/winedarksea.com\/?attachment_id=568\"><img loading=\"lazy\" decoding=\"async\" class=\"size-full wp-image-568\" title=\"wardnonstandardLats\" src=\"http:\/\/winedarksea.com\/wp-content\/uploads\/2009\/12\/wardnonstandardLats.jpg\" alt=\"Folio plays clustered using all Language Action Types, Non-Standardized Data\" width=\"521\" height=\"681\" srcset=\"https:\/\/winedarksea.org\/wp-content\/uploads\/2009\/12\/wardnonstandardLats.jpg 521w, https:\/\/winedarksea.org\/wp-content\/uploads\/2009\/12\/wardnonstandardLats-229x300.jpg 229w\" sizes=\"auto, (max-width: 521px) 100vw, 521px\" \/><\/a><figcaption id=\"caption-attachment-568\" class=\"wp-caption-text\">Folio plays clustered using all Language Action Types, Non-Standardized Data<\/figcaption><\/figure>\n<p>In contrast to the previous post, where we used the plays&#8217; scores on Principal Components to create clusters, here we simply use the plays&#8217; percentage scores on all of the Language Action Types, the lowest level of aggregation in Docuscope&#8217;s taxonomy of words and strings of language. There are 101 Language Action Types, or LATs: buckets of words or strings of words that David Kaufer has classified as doing a particular kind of linguistic or rhetorical work in a text. 
I have made a table of examples of these types, taken from the George Eliot novel <em>Middlemarch<\/em>, which can be downloaded <a title=\"downloadable table of Docuscope strings by category\" href=\"http:\/\/winedarksea.com\/?attachment_id=581\">here<\/a>.<\/p>\n<p>I find this diagram more than a little unnerving. It is quite accurate in terms of received genre judgments: almost all of the Folio history plays (in green) are correctly grouped together, and there are clean clusters of both tragedies (tan) and comedies (red). <em>Henry VIII<\/em>, which is identified here as a late play (blue), is placed in a cluster with other late plays (including <em>Coriolanus<\/em>, which could just as easily have been coded blue). And plays with a similar tone (<em>Titus<\/em>, <em>Lear<\/em>, and <em>Timon<\/em>) are grouped together as tragedies, separate from the other tragedies, which are grouped together higher up in the diagram. The strange pairing that recurs here from the Principal Component clusterings is <em>Tempest<\/em> with <em>Romeo and Juliet<\/em>, something that merits further inquiry.<\/p>\n<p>Why should a mechanical algorithm looking at distances between counts of things produce a diagram this accurate? I&#8217;m not really sure. The procedure places each of the 36 plays in a multidimensional space according to its percentage scores on each of the features being rated here, the LAT categories. 
So, if &#8220;Motion&#8221; strings are one category, you can imagine an X axis plotting every play&#8217;s score on &#8220;Motion&#8221; and a Y axis plotting every play&#8217;s score on &#8220;Direct Address,&#8221; as below:<\/p>\n<figure id=\"attachment_570\" aria-describedby=\"caption-attachment-570\" style=\"width: 518px\" class=\"wp-caption alignnone\"><a rel=\"attachment wp-att-570\" href=\"http:\/\/winedarksea.com\/?attachment_id=570\"><img loading=\"lazy\" decoding=\"async\" class=\"size-full wp-image-570\" title=\"DirectversusMotion\" src=\"http:\/\/winedarksea.com\/wp-content\/uploads\/2009\/12\/DirectversusMotion.jpg\" alt=\"Direct Address and Motion Scores in two Dimensions\" width=\"518\" height=\"531\" srcset=\"https:\/\/winedarksea.org\/wp-content\/uploads\/2009\/12\/DirectversusMotion.jpg 518w, https:\/\/winedarksea.org\/wp-content\/uploads\/2009\/12\/DirectversusMotion-292x300.jpg 292w\" sizes=\"auto, (max-width: 518px) 100vw, 518px\" \/><\/a><figcaption id=\"caption-attachment-570\" class=\"wp-caption-text\">Direct Address and Motion Scores in two Dimensions<\/figcaption><\/figure>\n<p>Now think about adding another score, First Person, as a third dimension, which gives us a spatial distribution of the plays according to their scores on all three of these LATs:<\/p>\n<figure id=\"attachment_571\" aria-describedby=\"caption-attachment-571\" style=\"width: 515px\" class=\"wp-caption alignnone\"><a rel=\"attachment wp-att-571\" href=\"http:\/\/winedarksea.com\/?attachment_id=571\"><img loading=\"lazy\" decoding=\"async\" class=\"size-full wp-image-571\" title=\"DirectAddressMotionFirstPerson\" src=\"http:\/\/winedarksea.com\/wp-content\/uploads\/2009\/12\/DirectAddressMotionFirstPerson.jpg\" alt=\"Direct Address, First Person and Motion Scores of Folio Plays\" width=\"515\" height=\"527\" srcset=\"https:\/\/winedarksea.org\/wp-content\/uploads\/2009\/12\/DirectAddressMotionFirstPerson.jpg 515w, 
https:\/\/winedarksea.org\/wp-content\/uploads\/2009\/12\/DirectAddressMotionFirstPerson-293x300.jpg 293w\" sizes=\"auto, (max-width: 515px) 100vw, 515px\" \/><\/a><figcaption id=\"caption-attachment-571\" class=\"wp-caption-text\">Direct Address, First Person and Motion Scores of Folio Plays<\/figcaption><\/figure>\n<p>Now, there are distances between all of the points here, and there are various methods (single linkage, complete linkage, Ward&#8217;s) for deciding how items arranged in such a space can be grouped into a hierarchy of filiation or likeness. Single linkage merges the two clusters whose closest members are nearest each other, complete linkage merges the two whose farthest members are nearest, and Ward&#8217;s method merges the pair whose union adds the least to the total within-cluster sum of squares. If you extend this to everything being scored in this analysis, that is, all 101 Language Action Types, you end up with a 101-dimensional space that cannot be visualized. But the plays still have distances from one another in this space, and those distances can be fed into the same algorithms to produce the hierarchy of likeness. That is how the visualization at the beginning of this post was produced, using Ward&#8217;s procedure on non-standardized data.<\/p>\n<p>As I&#8217;ve said before, a picture is nice, but just because you can reproduce a human classification with an algorithm doesn&#8217;t mean you&#8217;ve made any progress. You have to be able to show what&#8217;s going on in a text (which words are doing which things, some or most of the time) before you can call your work an analysis. Perhaps that&#8217;s another reason why I find a diagram like this unnerving: I cannot work back from it to a passage in a text.<\/p>\n<p>Standardizing the data produces the following rearrangement. Standardization typically rescales each LAT to a mean of zero and a standard deviation of one across the plays, so that LATs with large raw percentages no longer dominate the distances. 
I am unsure how to assess the benefits of standardization in this case, but I think this is a less compelling diagram:<\/p>\n<figure id=\"attachment_569\" aria-describedby=\"caption-attachment-569\" style=\"width: 521px\" class=\"wp-caption alignnone\"><a rel=\"attachment wp-att-569\" href=\"http:\/\/winedarksea.com\/?attachment_id=569\"><img loading=\"lazy\" decoding=\"async\" class=\"size-full wp-image-569\" title=\"wardstandardLats\" src=\"http:\/\/winedarksea.com\/wp-content\/uploads\/2009\/12\/wardstandardLats.jpg\" alt=\"Clustering of Folio Plays using Standardized Data\" width=\"521\" height=\"680\" srcset=\"https:\/\/winedarksea.org\/wp-content\/uploads\/2009\/12\/wardstandardLats.jpg 521w, https:\/\/winedarksea.org\/wp-content\/uploads\/2009\/12\/wardstandardLats-229x300.jpg 229w\" sizes=\"auto, (max-width: 521px) 100vw, 521px\" \/><\/a><figcaption id=\"caption-attachment-569\" class=\"wp-caption-text\">Clustering of Folio Plays using Standardized Data<\/figcaption><\/figure>\n","protected":false},"excerpt":{"rendered":"<p>In comparison to the previous post, where we were using the plays&#8217; scores on Principal Components to create clusters, here we are just using the percentage counts of the plays on all of the Language Action Types, the lowest level of aggregation in Docuscope&#8217;s taxonomy of words or strings of language. 
There are 101 Language [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[8],"tags":[37,38],"class_list":["post-567","post","type-post","status-publish","format-standard","hentry","category-shakespeare","tag-hierarchical-clustering","tag-standardization"],"_links":{"self":[{"href":"https:\/\/winedarksea.org\/index.php?rest_route=\/wp\/v2\/posts\/567","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/winedarksea.org\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/winedarksea.org\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/winedarksea.org\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/winedarksea.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=567"}],"version-history":[{"count":20,"href":"https:\/\/winedarksea.org\/index.php?rest_route=\/wp\/v2\/posts\/567\/revisions"}],"predecessor-version":[{"id":593,"href":"https:\/\/winedarksea.org\/index.php?rest_route=\/wp\/v2\/posts\/567\/revisions\/593"}],"wp:attachment":[{"href":"https:\/\/winedarksea.org\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=567"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/winedarksea.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=567"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/winedarksea.org\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=567"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}