{"id":153,"date":"2005-12-27T18:28:00","date_gmt":"2005-12-28T00:28:00","guid":{"rendered":"http:\/\/hunch.net\/?p=153"},"modified":"2005-12-27T18:30:49","modified_gmt":"2005-12-28T00:30:49","slug":"automated-labeling","status":"publish","type":"post","link":"https:\/\/hunch.net\/?p=153","title":{"rendered":"Automated Labeling"},"content":{"rendered":"<p>One of the common trends in machine learning has been an emphasis on the use of unlabeled data.  The argument goes something like &#8220;there aren&#8217;t many labeled web pages out there, but there are a <em>huge<\/em> number of web pages, so we must find a way to take advantage of them.&#8221;  There are several standard approaches for doing this:<\/p>\n<ol>\n<li><a href=\"https:\/\/hunch.net\/index.php?cat=10\">Unsupervised Learning<\/a>.  You use only unlabeled data.  In a typical application, you cluster the data and hope that the clusters somehow correspond to what you care about.<\/li>\n<li>Semisupervised Learning.  You use both unlabeled and labeled data to build a predictor.  The unlabeled data influences the learned predictor in some way.<\/li>\n<li><a href=\"https:\/\/hunch.net\/index.php?cat=22\">Active Learning<\/a>. You have unlabeled data and access to a labeling oracle.  You interactively choose which examples to label so as to optimize prediction accuracy.<\/li>\n<\/ol>\n<p>It seems there is a fourth approach worth serious investigation&#8212;automated labeling.  The approach goes as follows:<\/p>\n<ol>\n<li>Identify some subset of observed values to predict from the others.<\/li>\n<li>Build a predictor.<\/li>\n<li>Use the output of the predictor to define a new prediction problem.<\/li>\n<li>Repeat&#8230;<\/li>\n<\/ol>\n<p>Examples of this sort seem to come up in robotics very naturally.  
An extreme version of this is:<\/p>\n<ol>\n<li>Predict nearby things given touch sensor output.<\/li>\n<li>Predict medium-distance things given the nearby predictor.<\/li>\n<li>Predict far-distance things given the medium-distance predictor.<\/li>\n<\/ol>\n<p>Some of the participants in the <a href=\"https:\/\/hunch.net\/?p=68\">LAGR project<\/a> are using this approach.<\/p>\n<p>A less extreme version was the <a href=\"http:\/\/en.wikipedia.org\/wiki\/2005_DARPA_Grand_Challenge\">DARPA Grand Challenge winner<\/a>, where the output of a laser range finder was used to form a road-or-not predictor for a camera image.<\/p>\n<p>These automated labeling techniques transform an unsupervised learning problem into a supervised learning problem, which has huge implications: we understand supervised learning much better and can bring to bear a host of techniques.<\/p>\n<p>The body of work on automated labeling is sketchy; right now it is mostly just an observed-as-useful technique for which we have no general understanding. Some relevant bits of algorithm and theory are:<\/p>\n<ol>\n<li>Reinforcement-learning-to-classification reductions, which convert rewards into labels.<\/li>\n<li><a href=\"http:\/\/www.cs.cmu.edu\/~avrim\/Papers\/cotrain.ps\">Cotraining<\/a>, which considers a setting containing multiple data sources.  When predictors using different data sources agree on unlabeled data, an inferred label is automatically created.<\/li>\n<\/ol>\n<p>It&#8217;s easy to imagine that undiscovered algorithms and theory exist to guide and use this empirically useful technique.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>One of the common trends in machine learning has been an emphasis on the use of unlabeled data. 
The argument goes something like &#8220;there aren&#8217;t many labeled web pages out there, but there are a huge number of web pages, so we must find a way to take advantage of them.&#8221; There are several standard &hellip; <\/p>\n<p class=\"link-more\"><a href=\"https:\/\/hunch.net\/?p=153\" class=\"more-link\">Continue reading<span class=\"screen-reader-text\"> &#8220;Automated Labeling&#8221;<\/span><\/a><\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[22,27,29,12,9,25,10],"tags":[],"class_list":["post-153","post","type-post","status-publish","format-standard","hentry","category-active","category-empirical","category-machine-learning","category-reductions","category-semisupervised","category-supervised","category-unsupervised"],"_links":{"self":[{"href":"https:\/\/hunch.net\/index.php?rest_route=\/wp\/v2\/posts\/153","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/hunch.net\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/hunch.net\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/hunch.net\/index.php?rest_route=\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/hunch.net\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=153"}],"version-history":[{"count":0,"href":"https:\/\/hunch.net\/index.php?rest_route=\/wp\/v2\/posts\/153\/revisions"}],"wp:attachment":[{"href":"https:\/\/hunch.net\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=153"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/hunch.net\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=153"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/hunch.net\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=153"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}