{"id":269,"date":"2007-05-12T09:00:17","date_gmt":"2007-05-12T15:00:17","guid":{"rendered":"http:\/\/hunch.net\/?p=269"},"modified":"2007-05-12T09:00:17","modified_gmt":"2007-05-12T15:00:17","slug":"loss-function-semantics","status":"publish","type":"post","link":"https:\/\/hunch.net\/?p=269","title":{"rendered":"Loss Function Semantics"},"content":{"rendered":"<p>Some loss functions have a meaning that can be understood independently of the loss function itself.  <\/p>\n<ol>\n<li>Optimizing squared loss <em>l<sub>sq<\/sub>(y,y&#8217;)=(y-y&#8217;)<sup>2<\/sup><\/em> means predicting the (conditional) mean of <em>y<\/em>.<\/li>\n<li>Optimizing absolute value loss <em>l<sub>av<\/sub>(y,y&#8217;)=|y-y&#8217;|<\/em> means predicting the (conditional) median of <em>y<\/em>.  Variants can <a href=\"http:\/\/www.econ.uiuc.edu\/~roger\/research\/rq\/rq.html\">handle other quantiles<\/a>.  0\/1 loss for binary classification is a special case of absolute value loss.<\/li>\n<li>Optimizing log loss <em>l<sub>log<\/sub>(y,y&#8217;)=log (1\/Pr<sub>z~y&#8217;<\/sub>(z=y))<\/em> means minimizing the description length of <em>y<\/em>.<\/li>\n<\/ol>\n<p>The semantics (= meaning) of the loss are made explicit by a theorem in each case.  For squared loss, we can prove a theorem of the form:<br \/>\nFor all distributions <em>D<\/em> over <em>Y<\/em>, if <center> <em>y&#8217; = arg min<sub>y&#8217;<\/sub> E<sub>y ~ D<\/sub> l<sub>sq<\/sub> (y,y&#8217;)<\/em><\/center> then <center><em>y&#8217; = E<sub>y~D<\/sub> y<\/em><\/center><\/p>\n<p>Similar theorems hold for the other examples above, and they can all be extended to predictors of <em>y&#8217;<\/em> for distributions <em>D<\/em> over a context <em>X<\/em> and a value <em>Y<\/em>.<\/p>\n<p>There are 3 points to this post.<\/p>\n<ol>\n<li>Everyone doing general machine learning should be aware of the laundry list above.  
They form a handy toolkit which can match many of the problems naturally encountered.<\/li>\n<li>People also try to optimize a variety of other loss functions.  Some of these are (effectively) a special case of the above.  For example, &#8220;hinge loss&#8221; is absolute value loss when the hinge point is at the upper end of the range.  Some of the other losses do not have any known semantics.  In such cases, discovering a semantics could be quite valuable.<\/li>\n<li>The natural direction when thinking about how to solve a problem is to start with the semantics you want and then derive a loss.  I don&#8217;t know of any general way to do this other than simply applying the laundry list above.  As one example, what is a loss function for estimating the mean of a random variable <em>y<\/em> over the 5th to 95th quantile?  (How do we do squared error regression that is insensitive to outliers?)  Which semantics are satisfiable with a loss?<\/li>\n<\/ol>\n","protected":false},"excerpt":{"rendered":"<p>Some loss functions have a meaning that can be understood independently of the loss function itself. Optimizing squared loss lsq(y,y&#8217;)=(y-y&#8217;)2 means predicting the (conditional) mean of y. Optimizing absolute value loss lav(y,y&#8217;)=|y-y&#8217;| means predicting the (conditional) median of y. Variants can handle other quantiles. 0\/1 loss for binary classification is a special case of absolute value loss. 
&hellip; <\/p>\n<p class=\"link-more\"><a href=\"https:\/\/hunch.net\/?p=269\" class=\"more-link\">Continue reading<span class=\"screen-reader-text\"> &#8220;Loss Function Semantics&#8221;<\/span><\/a><\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[29,16],"tags":[],"class_list":["post-269","post","type-post","status-publish","format-standard","hentry","category-machine-learning","category-problems"],"_links":{"self":[{"href":"https:\/\/hunch.net\/index.php?rest_route=\/wp\/v2\/posts\/269","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/hunch.net\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/hunch.net\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/hunch.net\/index.php?rest_route=\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/hunch.net\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=269"}],"version-history":[{"count":0,"href":"https:\/\/hunch.net\/index.php?rest_route=\/wp\/v2\/posts\/269\/revisions"}],"wp:attachment":[{"href":"https:\/\/hunch.net\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=269"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/hunch.net\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=269"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/hunch.net\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=269"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}