{"id":36,"date":"2005-02-28T14:26:00","date_gmt":"2005-02-28T20:26:00","guid":{"rendered":"\/?p=36"},"modified":"2005-02-28T14:27:13","modified_gmt":"2005-02-28T20:27:13","slug":"regularization","status":"publish","type":"post","link":"https:\/\/hunch.net\/?p=36","title":{"rendered":"Regularization"},"content":{"rendered":"<p><a href=\"http:\/\/web.engr.oregonstate.edu\/~bulatov\/cg-research\/work-page.html\">Yaroslav Bulatov<\/a> says that we should think about regularization a bit.  It&#8217;s a complex topic which I only partially understand, so I&#8217;ll try to explain from a couple viewpoints.<\/p>\n<ol>\n<li><strong>Functionally<\/strong>. Regularization is optimizing some representation to fit the data <em>and<\/em> minimize some notion of predictor complexity.  This notion of complexity is often the l<sub>1<\/sub> or l<sub>2<\/sub> norm on a set of parameters, but the term can be used much more generally.  Empirically, this often works much better than simply fitting the data.<\/li>\n<li><strong>Statistical Learning Viewpoint<\/strong> Regularization is about the failiure of statistical learning to adequately predict generalization error.  Let <em>e(c,D)<\/em> be the expected error rate with respect to <em>D<\/em> of classifier <em>c<\/em> and <em>e(c,S)<\/em> the observed error rate on a sample <em>S<\/em>.  There are numerous bounds of the form: assuming i.i.d. samples, with high probability over the drawn samples <em>S<\/em>,  <em>e(c,D) less than e(c,S) + f(complexity)<\/em> where <em>complexity<\/em> is some measure of the size of a set of functions.  Unfortunately, we have never convincingly nailed the exact value of <em>f()<\/em>.   We can note that <em>f()<\/em> is always monotonically increasing with the complexity measure and so there exists a unique constant <em>C<\/em> such that <em>f(complexity)=C*complexity<\/em> at the value of complexity which minimizes the bound.  Empirical parameter tuning such as for the <em>C<\/em> constant in a support vector machine can be regarded as searching for this &#8220;right&#8221; tradeoff.<\/li>\n<li><strong>Computationally<\/strong>  Regularization can be thought of as a computational shortcut to computing the <em>f()<\/em> above.  Hence, smoothness, convexity, and other computational constraints are important issues.<\/li>\n<\/ol>\n<p>One thing which should be clear is that there is no one best method of regularization for all problems.  &#8220;What is a good regularizer for my problem?&#8221; is another &#8220;<a href=\"https:\/\/hunch.net\/index.php?p=9\">learning complete<\/a>&#8221; question since solving it <em>perfectly<\/em> implies solving the learning problem  (For example consider the &#8220;regularizer&#8221; which assigns complexity 0 to the best prediction function and infinity to all others).  Similarly, &#8220;What is an empirically useful regularizer?&#8221; is like &#8220;What is a good learning algorithm?&#8221;  The choice of regularizer used when solving empirical problems is a degree of freedom with which prior information and biases can be incorporated in order to improve performance.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Yaroslav Bulatov says that we should think about regularization a bit. It&#8217;s a complex topic which I only partially understand, so I&#8217;ll try to explain from a couple viewpoints. Functionally. Regularization is optimizing some representation to fit the data and minimize some notion of predictor complexity. 
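The third sketch illustrates the computational point: the choice of penalty determines which optimization tools apply. The l2 penalty above is smooth; the l1 penalty is not, but it is convex and admits a cheap proximal step (soft-thresholding), which is what makes proximal gradient methods such as ISTA practical. All names and the step size here are illustrative assumptions:

```python
import numpy as np

def soft_threshold(w, t):
    """Proximal operator of t * ||.||_1: shrink each coordinate toward zero."""
    return np.sign(w) * np.maximum(np.abs(w) - t, 0.0)

def lasso_ista(X, y, lam, step=1e-3, iters=1000):
    """Minimize ||X w - y||^2 + lam * ||w||_1 by proximal gradient descent.

    step must be small relative to the largest eigenvalue of X^T X
    for the iteration to converge.
    """
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        grad = 2 * X.T @ (X @ w - y)                     # gradient of the smooth part
        w = soft_threshold(w - step * grad, step * lam)  # proximal step for the l1 part
    return w
```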