{"id":160,"date":"2006-01-18T13:19:03","date_gmt":"2006-01-18T19:19:03","guid":{"rendered":"http:\/\/hunch.net\/?p=160"},"modified":"2006-01-18T13:21:36","modified_gmt":"2006-01-18T19:21:36","slug":"is-multitask-learning-black-boxable","status":"publish","type":"post","link":"https:\/\/hunch.net\/?p=160","title":{"rendered":"Is Multitask Learning Black-Boxable?"},"content":{"rendered":"<p>Multitask learning is learning to predict multiple outputs given the same input.  Mathematically, we might think of this as trying to learn a function <em>f:X -> {0,1}<sup>n<\/sup><\/em>.  Structured learning is similar at this level of abstraction.  Many people have worked on solving multitask learning (for example <a href=\"http:\/\/www.cs.cornell.edu\/~caruana\/mlj97.ps\">Rich Caruana<\/a>) using methods which share an internal representation.  In other words, the computation and learning of the <em>i<\/em>th prediction is shared with the computation and learning of the <em>j<\/em>th prediction.  Another way to ask the title question is: can we avoid sharing the internal representation?<\/p>\n<p>For example, it <em>might<\/em> be feasible to solve multitask learning by some process feeding the <em>i<\/em>th prediction <em>f(x)<sub>i<\/sub><\/em> into the <em>j<\/em>th predictor <em>f(x,f(x)<sub>i<\/sub>)<sub>j<\/sub><\/em>.<\/p>\n<p>If the answer is &#8220;no&#8221;, then it implies we cannot take binary classification as a basic primitive in the process of solving prediction problems.  If the answer is &#8220;yes&#8221;, then we can reuse binary classification algorithms to solve multitask learning problems.<\/p>\n<p>Finding a satisfying answer to this question at a theoretical level appears tricky.  If you consider the infinite data limit with IID samples for any finite <em>X<\/em>, the answer is &#8220;yes&#8221; because any function can be learned.  
However, this does not take into account two important effects:<\/p>\n<ol>\n<li>Using a shared representation alters the bias of the learning process.  This implies that fewer examples may be required to learn all of the predictors.  Of course, changing the input features also alters the bias of the learning process.  Comparing these altered biases well enough to distinguish their power seems very tricky.  For reference, Jonathan Baxter has done <a href=\"http:\/\/www.cs.cmu.edu\/afs\/cs\/project\/jair\/pub\/volume12\/baxter00a.pdf\">some related analysis<\/a> (which still doesn&#8217;t answer the question).<\/li>\n<li>Using a shared representation may be computationally cheaper.<\/li>\n<\/ol>\n<p>One thing which <em>can<\/em> be said about multitask learning (in either black-box or shared representation form) is that it can make learning radically easier.  For example, predicting the first bit output by a cryptographic circuit is (by design) extraordinarily hard.  However, predicting the bits of every gate in the circuit (including the first bit output) is easily done based upon a few examples.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Multitask learning is learning to predict multiple outputs given the same input. Mathematically, we might think of this as trying to learn a function f:X -> {0,1}n. Structured learning is similar at this level of abstraction. 
Many people have worked on solving multitask learning (for example Rich Caruana) using methods which share an internal &hellip; <\/p>\n<p class=\"link-more\"><a href=\"https:\/\/hunch.net\/?p=160\" class=\"more-link\">Continue reading<span class=\"screen-reader-text\"> &#8220;Is Multitask Learning Black-Boxable?&#8221;<\/span><\/a><\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[29,16,25],"tags":[],"class_list":["post-160","post","type-post","status-publish","format-standard","hentry","category-machine-learning","category-problems","category-supervised"],"_links":{"self":[{"href":"https:\/\/hunch.net\/index.php?rest_route=\/wp\/v2\/posts\/160","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/hunch.net\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/hunch.net\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/hunch.net\/index.php?rest_route=\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/hunch.net\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=160"}],"version-history":[{"count":0,"href":"https:\/\/hunch.net\/index.php?rest_route=\/wp\/v2\/posts\/160\/revisions"}],"wp:attachment":[{"href":"https:\/\/hunch.net\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=160"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/hunch.net\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=160"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/hunch.net\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=160"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}