MLcomp: a website for objectively comparing ML algorithms

Much of the success and popularity of machine learning has been driven by its practical impact. Of course, the evaluation of empirical work is an integral part of the field. But are the existing mechanisms for evaluating algorithms and comparing results good enough? We (Percy and Jake) believe there are currently a number of shortcomings:

  1. Incomplete Disclosure: You read a paper that proposes Algorithm A which is shown to outperform SVMs on two datasets.  Great.  But what about on other datasets?  How sensitive is this result?   What about compute time – does the algorithm take two seconds on a laptop or two weeks on a 100-node cluster?
  2. Lack of Standardization: Algorithm A beats Algorithm B on one version of a dataset.  Algorithm B beats Algorithm A on another version that uses slightly different preprocessing.  Though a head-to-head comparison would be ideal, it would be tedious since the programs probably use different dataset formats and have a large array of options.  And what if we wanted to compare on more than just one dataset and two algorithms?
  3. Incomplete View of State-of-the-Art: Basic question: What’s the best algorithm for your favorite dataset?  To find out, you could simply plow through fifty papers, get code from any author willing to reply, and reimplement the rest. Easy, right? Well, maybe not…

We’ve thought a lot about how to solve these problems. Today, we’re launching a new website, MLcomp.org, which we think is a good first step.

What is MLcomp? In short, it’s a collaborative website for objectively comparing machine learning programs across various datasets.  On the website, a user can do any combination of the following:

  1. Upload a program to our online repository.
  2. Upload a dataset.
  3. Run any user’s program on any user’s dataset.  (MLcomp provides the computation for free using Amazon’s EC2.)
  4. For any executed run, view the results (various error metrics and time/memory usage statistics).
  5. Download any dataset, program, or run for further use.

An important aspect of the site is that it’s collaborative: by uploading just one program or dataset, a user taps into the entire network of existing programs and datasets for comparison.  While data and code repositories do exist (e.g., UCI, mloss.org), MLcomp is unique in that data and code interact to produce analyzable results.

MLcomp is under active development.  Currently, seven machine learning task types (classification, regression, collaborative filtering, sequence tagging, etc.) are supported, with hundreds of standard programs and datasets already online.  We encourage you to browse the site and hopefully contribute more!  Please send comments and feedback to mlcomp.support (AT) gmail.com.

What is missing for online collaborative research?

The internet has recently made the research process much smoother: papers are easy to obtain, citations are easy to follow, and unpublished “tutorials” are often available. Yet, new research fields can look very complicated to outsiders or newcomers. Every paper is like a small piece of an unfinished jigsaw puzzle: to understand just one publication, a researcher without experience in the field will typically have to follow several layers of citations, and many of the papers they encounter have a great deal of repeated information. Furthermore, from one publication to the next, notation and terminology may not be consistent, which can further confuse the reader.

But the internet is now proving to be an extremely useful medium for collaboration and knowledge aggregation. Online forums allow users to ask and answer questions and to share ideas. The recent phenomenon of Wikipedia provides a proof-of-concept for the “anyone can edit” system. Can such models be used to facilitate research and collaboration? This could potentially be extremely useful for newcomers and experts alike. On the other hand, entities of this sort already exist to some extent: Wikipedia::Machine Learning, MLpedia, the discussion boards on kernel-machines.org, Rexa, and the gradual online-ification of paper proceedings to name a few.

None of these has yet achieved takeoff velocity. You’ll know takeoff velocity has been achieved when these become a necessary part of daily research life rather than a frill.

Each of these efforts seems to be missing critical pieces, such as:

  1. A framework for organizing and summarizing information. Wikipedia and MLpedia are good examples, yet this is not as well solved as you might hope, since mathematics on the web is still more awkward than it should be.
  2. A framework for discussion. Kernel-machines.org handles this, but is too area-specific. Wikipedia and MLpedia do have discussion frameworks, but their presentation format marginalizes discussion: it is placed on a separate page and generally not viewed by most readers. The discussion, in fact, should be an integral part of the presentation.
  3. Incentives for researchers to contribute. Wikipedia intentionally anonymizes contributors in the presentation, because recognizing them might invite the wrong sort of contributor. Incentives done well, however, are one of the things that create (6). One of the existing constraints within academia is that the basic unit of credit is coauthorship on a peer-reviewed paper. Given this constraint, it would be very handy if a system could automatically translate a subset of an online site into a paper, with authorship automatically summarized. The site itself might also track and display who has contributed how much and who has contributed recently.
  4. Explicit mechanisms for handling disagreements. If you get three good researchers on a topic in a room, you might have about five distinct opinions. Much of research has to do with thinking carefully about what is important and why, exactly the sorts of topics likely to provoke disagreement. Given that disagreement is part of the process of research, a healthy online research mechanism needs a way to facilitate, and even spotlight, disagreements. One crude system for handling disagreements is illustrated by the Linux kernel: “anyone can download and start their own kernel tree”. A more fine-grained version of this may be effective: “anyone can clone a webpage and start their own version of it”. Perhaps this can be coupled with a voting system over versions, although that is tricky. A fundamental point: a majority vote does not determine the correctness of a theorem. Integrating a peer review system may work well. None of the existing systems handle this problem effectively.
  5. Low entry costs. Many systems handle this well, but it must be emphasized because small changes in the barrier to entry can have a large effect on (6).
  6. Community buy-in. Wikipedia is the big success story here, while Wikipedia::MachineLearning has had more limited success. There are many techniques that might aid community buy-in, but they may not be enough.

Can a site be created that simultaneously handles all of the necessary pieces for online research?