{"id":1309,"date":"2010-04-14T20:37:17","date_gmt":"2010-04-15T02:37:17","guid":{"rendered":"http:\/\/hunch.net\/?p=1309"},"modified":"2010-04-14T20:37:17","modified_gmt":"2010-04-15T02:37:17","slug":"mlcomp-a-website-for-objectively-comparing-ml-algorithms","status":"publish","type":"post","link":"https:\/\/hunch.net\/?p=1309","title":{"rendered":"MLcomp: a website for objectively comparing ML algorithms"},"content":{"rendered":"<div>Much of the success and popularity of machine learning has been driven by its practical impact. Of course, the evaluation of empirical work is an integral part of the field. But are the existing mechanisms for evaluating algorithms and comparing results good enough? We (<a href=\"http:\/\/www.cs.berkeley.edu\/%7Epliang\" id=\"p60t\" title=\"Percy\">Percy<\/a> and <a href=\"http:\/\/www.cs.berkeley.edu\/%7Ejake\" id=\"nr35\" title=\"Jake\">Jake<\/a>) believe there are currently a number of shortcomings:<\/div>\n<p><\/p>\n<ol>\n<li><b>Incomplete Disclosure:<\/b> You read a paper that proposes Algorithm A which is shown to outperform SVMs on two datasets.&nbsp; Great.&nbsp; But what about on other datasets?&nbsp; How sensitive is this result?&nbsp;&nbsp; What about compute time &#8211; does the algorithm take two seconds on a laptop or two weeks on a 100-node cluster?<\/li>\n<li><b>Lack of Standardization:<\/b> Algorithm A beats Algorithm B on one version of a dataset.&nbsp; Algorithm B beats Algorithm A on another version yet uses slightly different preprocessing.&nbsp; Though doing a head-on comparison would be ideal, it would be tedious since the programs probably use different dataset formats and have a large array of options.&nbsp; And what if we wanted to compare on more than just one dataset and two algorithms?\n<\/li>\n<li><b>Incomplete View of State-of-the-Art: <\/b>Basic question: What&#8217;s the best algorithm for your favorite dataset?&nbsp; To find out, you could simply plow through fifty papers, get code from any author 
willing to reply, and reimplement the rest. Easy, right? Well, maybe not&#8230;\n<\/li>\n<\/ol>\n<p><\/p>\n<div>We&#8217;ve thought a lot about how to solve these problems. Today, we&#8217;re launching a new website, <a href=\"http:\/\/MLcomp.org\" id=\"re.g\" title=\"MLcomp.org\">MLcomp.org<\/a>, which we think is a good first step.<\/div>\n<p><\/p>\n<div>What is&nbsp;<a href=\"http:\/\/MLcomp.org\" id=\"re.g\" title=\"MLcomp.org\">MLcomp<\/a>? In short, it&#8217;s a collaborative website for objectively comparing machine learning programs across various datasets.&nbsp; On the website, a user can do any combination of the following:<\/div>\n<p><\/p>\n<ol>\n<li>Upload a program to our online repository.<\/li>\n<li>Upload a dataset.\n<\/li>\n<li>Run any user&#8217;s program on any user&#8217;s dataset.&nbsp; (MLcomp provides the computation for free using Amazon&#8217;s EC2.)<\/li>\n<li>For any executed run, view the results (various error metrics and time\/memory usage statistics).<\/li>\n<li>Download any dataset, program, or run for further use.<\/li>\n<\/ol>\n<p>\nAn important aspect of the site is that it&#8217;s <b>collaborative<\/b>: by uploading just one program or dataset, a user taps into the entire network of existing programs and datasets for comparison.&nbsp; While data and code repositories do exist (e.g., UCI, mloss.org), MLcomp is unique in that data and code interact to produce analyzable results.<\/p>\n<p>MLcomp is under active development.&nbsp; Currently, seven machine learning task types (classification, regression, collaborative filtering, sequence tagging, etc.) are supported, with hundreds of standard programs and datasets already online.&nbsp; We encourage you to browse the site and hopefully contribute more!&nbsp; Please send comments and feedback to mlcomp.support (AT) gmail.com.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Much of the success and popularity of machine learning has been driven by its practical impact. 
Of course, the evaluation of empirical work is an integral part of the field. But are the existing mechanisms for evaluating algorithms and comparing results good enough? We (Percy and Jake) believe there are currently a number of shortcomings: &hellip; <\/p>\n<p class=\"link-more\"><a href=\"https:\/\/hunch.net\/?p=1309\" class=\"more-link\">Continue reading<span class=\"screen-reader-text\"> &#8220;MLcomp: a website for objectively comparing ML algorithms&#8221;<\/span><\/a><\/p>\n","protected":false},"author":100,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[29],"tags":[],"class_list":["post-1309","post","type-post","status-publish","format-standard","hentry","category-machine-learning"],"_links":{"self":[{"href":"https:\/\/hunch.net\/index.php?rest_route=\/wp\/v2\/posts\/1309","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/hunch.net\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/hunch.net\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/hunch.net\/index.php?rest_route=\/wp\/v2\/users\/100"}],"replies":[{"embeddable":true,"href":"https:\/\/hunch.net\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=1309"}],"version-history":[{"count":0,"href":"https:\/\/hunch.net\/index.php?rest_route=\/wp\/v2\/posts\/1309\/revisions"}],"wp:attachment":[{"href":"https:\/\/hunch.net\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=1309"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/hunch.net\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=1309"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/hunch.net\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=1309"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}