{"id":79,"date":"2005-05-21T22:57:17","date_gmt":"2005-05-22T04:57:17","guid":{"rendered":"\/?p=79"},"modified":"2005-05-21T22:59:11","modified_gmt":"2005-05-22T04:59:11","slug":"learning-reductions-discourage-modularity-in-structured-prediction","status":"publish","type":"post","link":"https:\/\/hunch.net\/?p=79","title":{"rendered":"What is the right form of modularity in structured prediction?"},"content":{"rendered":"<p>Suppose you are given a sequence of observations <em>x<sub>1<\/sub>,&#8230;,x<sub>T<\/sub><\/em> from some space and wish to predict a sequence of labels <em>y<sub>1<\/sub>,&#8230;,y<sub>T<\/sub><\/em> so as to minimize the Hamming loss: <em>sum<sub>i=1 to T<\/sub> I(y<sub>i<\/sub> != c(x<sub>1<\/sub>,&#8230;,x<sub>T<\/sub>)<sub>i<\/sub>)<\/em> where <em>c(x<sub>1<\/sub>,&#8230;,x<sub>T<\/sub>)<sub>i<\/sub><\/em> is the <em>i<\/em>th predicted component.  For simplicity, suppose each label <em>y<sub>i<\/sub><\/em> is in <em>{0,1}<\/em>.<\/p>\n<p>We can optimize the Hamming loss by simply optimizing the error rate in predicting each individual component <em>y<sub>i<\/sub><\/em> independently since the loss is a linear combination of losses on each individual component <em>i<\/em>.  From a learning reductions viewpoint, we can learn a different classifier for each individual component.  An average error rate of <em>e<\/em> over these classifiers implies an expected Hamming loss of <em>Te<\/em>.  This breakup into <em>T<\/em> different prediction problems is <em>not<\/em> the standard form of modularity in structured prediction.<\/p>\n<p>A more typical form of modularity is to predict <em>y<sub>i<\/sub><\/em> given <em>x<sub>i<\/sub>, y<sub>i-1<\/sub>, y<sub>i+1<\/sub><\/em> where the circularity (predicting given other predictions) is handled in various ways.  
This is often represented with a graphical model like so:<br \/>\n<img decoding=\"async\" src=\"https:\/\/hunch.net\/images\/hmm.jpg\"><br \/>\nThis form of modularity seems to be preferred for several reasons:<\/p>\n<ol>\n<li>Graphical models of this sort are a natural language for expressing what we know (or believe we know) about a problem in advance.<\/li>\n<li>There may be computational advantages to learning to predict from fewer features. (But note that handling the circularity is sometimes computationally difficult.)<\/li>\n<li>There may be sample complexity advantages to learning to predict from fewer features.  This is particularly true for many common learning algorithms.<\/li>\n<\/ol>\n<p>The difficulty with this approach is that &#8220;errors accumulate&#8221;.  In particular, an average error rate of <em>e<\/em> for each of the predictors can easily imply a Hamming loss of <em>O(eT<sup>2<\/sup>)<\/em>. <a href=\"http:\/\/www.cs.helsinki.fi\/u\/mtkaaria\/\">Matti Kaariainen<\/a> convinced me this is not improvable for predictors of this form.<\/p>\n<p>So, we have two forms of modularity.  One is driven by the loss function, while the other is driven by simplicity of prediction descriptions.  Each has advantages and disadvantages from a practical viewpoint.  Can these different approaches be reconciled?  Is there a compelling algorithm for solving structured prediction that incorporates both intuitions?<\/p>\n
We can optimize the Hamming &hellip; <\/p>\n<p class=\"link-more\"><a href=\"https:\/\/hunch.net\/?p=79\" class=\"more-link\">Continue reading<span class=\"screen-reader-text\"> &#8220;What is the right form of modularity in structured prediction?&#8221;<\/span><\/a><\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[16,12,26],"tags":[],"class_list":["post-79","post","type-post","status-publish","format-standard","hentry","category-problems","category-reductions","category-structured"],"_links":{"self":[{"href":"https:\/\/hunch.net\/index.php?rest_route=\/wp\/v2\/posts\/79","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/hunch.net\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/hunch.net\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/hunch.net\/index.php?rest_route=\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/hunch.net\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=79"}],"version-history":[{"count":0,"href":"https:\/\/hunch.net\/index.php?rest_route=\/wp\/v2\/posts\/79\/revisions"}],"wp:attachment":[{"href":"https:\/\/hunch.net\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=79"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/hunch.net\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=79"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/hunch.net\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=79"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}