{"id":30,"date":"2005-02-23T12:49:01","date_gmt":"2005-02-23T18:49:01","guid":{"rendered":"\/?p=30"},"modified":"2005-02-23T12:49:03","modified_gmt":"2005-02-23T18:49:03","slug":"problem-reinforcement-learning-with-classification","status":"publish","type":"post","link":"https:\/\/hunch.net\/?p=30","title":{"rendered":"Problem: Reinforcement Learning with Classification"},"content":{"rendered":"<p>At an intuitive level, the question here is &#8220;Can reinforcement learning be solved with classification?&#8221;  <\/p>\n<p><strong>Problem<\/strong> Construct a reinforcement learning algorithm with near-optimal expected sum of rewards in the <a href=\"https:\/\/hunch.net\/~jl\/projects\/comparison\/direct-experience.html\">direct experience model<\/a> given access to a classifier learning algorithm which has a small error rate or regret on all posed classification problems.  The definition of &#8220;posed&#8221; here is slightly murky.  I consider a problem &#8220;posed&#8221; if there is an algorithm for constructing labeled classification examples.<\/p>\n<p><strong>Past Work<\/strong><\/p>\n<ol>\n<li>There exists a <a href=\"https:\/\/hunch.net\/~jl\/projects\/reductions\/RL_to_class\/colt_submission.ps\">reduction of reinforcement learning to classification given a generative model.<\/a>  A generative model is an inherently stronger assumption than the direct experience model.<\/li>\n<li>Other <a href=\"https:\/\/hunch.net\/index.php?p=17\">work on learning reductions<\/a> may be important.<\/li>\n<li>Several algorithms for solving reinforcement learning in the direct experience model exist.  
Most, such as <a href=\"http:\/\/www.cis.upenn.edu\/~mkearns\/papers\/reinforcement.ps\">E<sup>3<\/sup><\/a>, <a href=\"http:\/\/www.cis.upenn.edu\/~mkearns\/papers\/dbne3.ps\">Factored-E<sup>3<\/sup><\/a>, <a href=\"https:\/\/hunch.net\/~jl\/projects\/metric_e3\/icml_final.ps\">metric-E<sup>3<\/sup><\/a>, and <a href=\"http:\/\/jmlr.csail.mit.edu\/papers\/volume3\/brafman02a\/brafman02a.ps\">Rmax<\/a> require that the observation be the state.  Recent work <a href=\"http:\/\/www.cis.upenn.edu\/~skakade\/papers\/rl\/learn_pomdp.ps\">extends this approach to POMDPs<\/a>.<\/li>\n<li>This problem is related to <a href=\"http:\/\/www.eecs.umich.edu\/~baveja\/Papers\/psr.ps.gz\">predictive state representations<\/a>, because we are trying to solve reinforcement learning with prediction ability.<\/li>\n<\/ol>\n<p><strong>Difficulty<\/strong> It is not clear whether this problem is solvable.  A proof that it is not solvable would be extremely interesting, and even partial success one way or another could be important.<\/p>\n<p><strong>Impact<\/strong> At the theoretical level, it would be very nice to know whether the ability to generalize implies the ability to solve reinforcement learning, because (in a general sense) all problems can be cast as reinforcement learning.   Even a solution exponential in the horizon time could still motivate relaxations of the core algorithm, such as <a href=\"https:\/\/hunch.net\/~jl\/projects\/reductions\/RL_to_class\/colt_submission.ps\">RLgen<\/a>.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>At an intuitive level, the question here is &#8220;Can reinforcement learning be solved with classification?&#8221; Problem Construct a reinforcement learning algorithm with near-optimal expected sum of rewards in the direct experience model given access to a classifier learning algorithm which has a small error rate or regret on all posed classification problems. 
The definition of &hellip; <\/p>\n<p class=\"link-more\"><a href=\"https:\/\/hunch.net\/?p=30\" class=\"more-link\">Continue reading<span class=\"screen-reader-text\"> &#8220;Problem: Reinforcement Learning with Classification&#8221;<\/span><\/a><\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[16,12,11],"tags":[],"class_list":["post-30","post","type-post","status-publish","format-standard","hentry","category-problems","category-reductions","category-reinforcement"],"_links":{"self":[{"href":"https:\/\/hunch.net\/index.php?rest_route=\/wp\/v2\/posts\/30","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/hunch.net\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/hunch.net\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/hunch.net\/index.php?rest_route=\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/hunch.net\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=30"}],"version-history":[{"count":0,"href":"https:\/\/hunch.net\/index.php?rest_route=\/wp\/v2\/posts\/30\/revisions"}],"wp:attachment":[{"href":"https:\/\/hunch.net\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=30"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/hunch.net\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=30"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/hunch.net\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=30"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}