{"id":1358,"date":"2010-05-10T14:04:22","date_gmt":"2010-05-10T20:04:22","guid":{"rendered":"http:\/\/hunch.net\/?p=1358"},"modified":"2010-05-10T14:04:22","modified_gmt":"2010-05-10T20:04:22","slug":"aggregation-of-estimators-sparsity-in-high-dimension-and-computational-feasibility","status":"publish","type":"post","link":"https:\/\/hunch.net\/?p=1358","title":{"rendered":"Aggregation of estimators, sparsity in high dimension and computational feasibility"},"content":{"rendered":"<p>(I&#8217;m channeling for <a href=\"http:\/\/certis.enpc.fr\/~audibert\/\">Jean-Yves Audibert<\/a> here, with some minor tweaking for clarity.)<\/p>\n<p>Since <a href=\"http:\/\/www2.isye.gatech.edu\/~nemirovs\/\">Nemirovski&#8217;s<\/a> <a href=\"http:\/\/www2.isye.gatech.edu\/~nemirovs\/Lect_SaintFlour.pdf\">Saint Flour lecture notes<\/a>, numerous researchers have studied the following problem in least squares regression: predict as well as<br \/>\n  (MS) the best of d given functions (like in prediction with expert advice; model = finite set of d functions)<br \/>\n  (C) the best convex combination of these functions (i.e., model = convex hull of the d functions)<br \/>\n  (L) the best linear combination of these functions (i.e., model = linear span of the d functions)<br \/>\nIt is now well known (see, e.g., Sacha Tsybakov&#8217;s COLT&#8217;03 paper) that these tasks can be achieved since there exist estimators having an excess risk of order (log d)\/n for (MS), min( sqrt((log d)\/n), d\/n ) for (C) and d\/n for (L), where n is the training set size.  Here, &#8220;risk&#8221; is amount of extra loss per example which may be suffered due to the choice of random sample.<\/p>\n<p>The practical use of these results seems rather limited to trivial statements like: do not use the OLS estimator when the dimension d of the input vector is larger than n (here the d functions are the projections on each of the d components). Nevertheless, it provides a rather easy way to prove that there exists a learning algorithm having an excess risk of order s (log d)\/n, with respect to the best linear combination of s of the d functions (s-sparse linear model). Indeed, it suffices to consider the algorithm which<\/p>\n<ol>\n<li>cuts the training set into two parts, say of equal size for simplicity,<\/li>\n<li> uses the first part to train linear estimators corresponding to every possible subset of s features.  Here you can use your favorite linear estimator (the empirical risk minimizer on a compact set or robust but <a href=\"http:\/\/arxiv.org\/abs\/0902.1733\">more involved ones<\/a> are possible rather than the OLS), as long as it solves (L) with minimal excess risk.\n<\/li>\n<li>uses the second part to predict as well as the &#8220;d choose s&#8221; linear estimators built on the first part. Here you choose your favorite aggregate solving (MS).  The one I prefer is described in p.5 of my <a href=\"http:\/\/imagine.enpc.fr\/~audibert\/Mes%20articles\/NIPS07a.pdf\">NIPS&#8217;07 paper<\/a>, but you might prefer the progressive mixture rule or <a href=\"http:\/\/www.cmi.univ-mrs.fr\/~lecue\/LM1.pdf\">the algorithm<\/a> of Guillaume Lecu\u00e9 and <a href=\"http:\/\/wwwmaths.anu.edu.au\/~mendelso\/\">Shahar Mendelson<\/a>.  
   Note that empirical risk minimization and cross-validation completely fail for this task, with excess risk of order sqrt((log d)/n) instead of (log d)/n.

It is an easy exercise to combine the different excess risk bounds and obtain that the above procedure achieves an excess risk of order s (log d)/n: each s-dimensional estimator of step 2 costs s/n by the (L) bound, and aggregating the "d choose s" of them in step 3 costs log(d choose s)/n by the (MS) bound, which is at most s (log d)/n since (d choose s) <= d^s. The nice thing compared to works on the [Lasso](http://en.wikipedia.org/wiki/Lasso_%28statistics%29#LASSO_method), Dantzig selectors, and their variants is that you do not need all the assumptions saying that your features should be "not too much" correlated. Naturally, the important limitation of the above procedure, often encountered when using the classical model selection approach, is its computational intractability. So this leaves open the following fundamental problem: is it possible to design a computationally efficient algorithm with the s (log d)/n guarantee without assuming low correlation between the explanatory variables?
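To make the two-stage procedure concrete, here is a minimal sketch in Python. It makes assumptions not in the post: squared loss, plain OLS as the base (L) estimator of step 2, and a simple exponential-weights mixture on the held-out half standing in for the (MS) aggregate of step 3 (the aggregates cited above are more refined). The function name `sparse_aggregate` and the `temperature` parameter are purely illustrative.

```python
# Sketch of the split / fit-all-subsets / aggregate procedure described above.
# Assumptions (not from the post): squared loss, OLS as the base estimator,
# exponential weights as a stand-in for the (MS) aggregate.
from itertools import combinations

import numpy as np


def sparse_aggregate(X, y, s, temperature=1.0):
    n, d = X.shape
    half = n // 2
    X1, y1 = X[:half], y[:half]    # part 1: fit one estimator per subset
    X2, y2 = X[half:], y[half:]    # part 2: aggregate the fitted estimators

    fits, log_weights = [], []
    for S in combinations(range(d), s):      # all "d choose s" subsets
        cols = list(S)
        beta, *_ = np.linalg.lstsq(X1[:, cols], y1, rcond=None)  # OLS on part 1
        risk = np.mean((X2[:, cols] @ beta - y2) ** 2)  # held-out empirical risk
        fits.append((cols, beta))
        log_weights.append(-(n - half) * risk / temperature)

    # Exponential weights: w_k proportional to exp(-n2 * risk_k / temperature),
    # shifted by the max for numerical stability.
    log_weights = np.array(log_weights)
    w = np.exp(log_weights - log_weights.max())
    w /= w.sum()

    def predict(X_new):
        # Predict with the weighted mixture of the "d choose s" subset estimators.
        return sum(wk * (X_new[:, cols] @ beta)
                   for wk, (cols, beta) in zip(w, fits))

    return predict
```

The loop over all "d choose s" subsets is exactly the exponential cost that the open problem above asks to remove.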