Machine Learning (Theory)

3/22/2013

I’m a bandit

Sebastien Bubeck has a new ML blog focused on optimization and partial feedback which may interest people.

1/31/2013

Remote large scale learning class participation

Tags: Machine Learning,Online,Teaching jl@ 9:40 pm

Yann and I have arranged so that people who are interested in our large scale machine learning class and not able to attend in person can follow along via two methods.

  1. Videos will be posted with about a 1 day delay on techtalks. This is a side-by-side capture of video+slides from Weyond.
  2. We are experimenting with Piazza as a discussion forum. Anyone is welcome to subscribe to Piazza and ask questions there, where I will be monitoring things. update2: Sign up here.

The first lecture is up now, including the revised version of the slides which fixes a few typos and rounds out references.

1/7/2013

NYU Large Scale Machine Learning Class

Yann LeCun and I are coteaching a class on Large Scale Machine Learning starting late January at NYU. This class will cover many tricks for getting machine learning to work well on datasets with many features, examples, and classes, along with several elements of deep learning and the systems support that enables all of the above.

This is not a beginning class—you really need to have taken a basic machine learning class previously to follow along. Students will be able to run and experiment with large scale learning algorithms since Yahoo! has donated servers which are being configured into a small scale Hadoop cluster. We are planning to cover the frontier of research in scalable learning algorithms, so good class projects could easily lead to papers.

For me, this is a chance to teach on many topics of past research. In general, it seems like researchers should engage in at least occasional teaching of research, both as a proof of teachability and to see their own research through that lens. More generally, I expect there is quite a bit of interest: figuring out how to use data to make predictions well is a topic of growing interest to many fields. In 2007, this was true, and demand is much stronger now. Yann and I also come from quite different viewpoints, so I’m looking forward to learning from him as well.

We plan to videotape lectures and put them (as well as slides) online, but this is not a MOOC in the sense of online grading and class certificates. I’d prefer that it was, but there are two obstacles: NYU is still figuring out what to do as a University here, and this is not a class that has ever been taught before. Turning previous tutorials and class fragments into coherent subject matter for the 50 students we can support at NYU will be pretty challenging as is. My preference, however, is to enable external participation where it’s easily possible.

Suggestions or thoughts on the class are welcome :-)

1/1/2013

Deep Learning 2012

2012 was a tumultuous year for me, but it was undeniably a great year for deep learning efforts. Signs of this include:

  1. Winning a Kaggle competition.
  2. Wide adoption of deep learning for speech recognition.
  3. Significant industry support.
  4. Gains in image recognition.

This is a rare event in research: a significant capability breakout. Congratulations are definitely in order for those who managed to achieve it. At this point, deep learning algorithms seem like a choice undeniably worth investigating for real applications with significant data.

12/29/2012

Simons Institute Big Data Program

Tags: Announcements,Funding,Workshop jl@ 8:17 am

Michael Jordan sends the below:

The new Simons Institute for the Theory of Computing
will begin organizing semester-long programs starting in 2013.

One of our first programs, set for Fall 2013, will be on the “Theoretical Foundations
of Big Data Analysis”. The organizers of this program are Michael Jordan (chair),
Stephen Boyd, Peter Buehlmann, Ravi Kannan, Michael Mahoney, and Muthu Muthukrishnan.

See http://simons.berkeley.edu/program_bigdata2013.html for more information on
the program.

The Simons Institute has created a number of “Research Fellowships” for young
researchers (within at most six years of the award of their PhD) who wish to
participate in Institute programs, including the Big Data program. Individuals
who already hold postdoctoral positions or who are junior faculty are welcome
to apply, as are finishing PhDs.

Please note that the application deadline is January 15, 2013. Further details
are available at http://simons.berkeley.edu/fellows.html .

Mike Jordan

10/26/2012

ML Symposium and Strata/Hadoop World

Tags: Conferences,Workshop jl@ 11:40 am

The New York ML symposium was last Friday. There were 303 registrations, up a bit from last year. I particularly enjoyed talks by Bill Freeman on vision and ML, Jon Lenchner on strategy in Jeopardy, and Tara N. Sainath and Brian Kingsbury on deep learning for speech recognition. If anyone has suggestions or thoughts for next year, please speak up.

I also attended Strata + Hadoop World for the first time. This is primarily a trade conference rather than an academic conference, but I found it pretty interesting as a first-time attendee. This is ground zero for the big data buzzword, and I now see why. It’s about data, and the word “big” is so ambiguous that everyone can lay claim to it. There were essentially zero academic talks. Instead, the focus was on war stories, product announcements, and education. The general level of education is much lower: explaining Machine Learning to the SQL-educated is the primary operating point. Nevertheless that’s happening, and the fact that machine learning is considered a necessary technology for industry is a giant step for the field. Over time, I expect the industrial side of Machine Learning to grow, and perhaps surpass the academic side, in the same sense as has already occurred for chip design. Amongst the talks I could catch, I particularly liked the Github, Zillow, and Pandas talks. Ted Dunning also gave a particularly masterful talk, although I have doubts about the core Bayesian Bandit approach (*). The streaming k-means algorithm they implemented does look quite handy.

(*) The doubt is the following: prior elicitation is generally hard, and Bayesian techniques are not robust to misspecification. This matters in standard supervised settings, but it may matter more in exploration settings where misspecification can imply data starvation.
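
To make the worry concrete, here is a small, hypothetical simulation (not from the talk): Thompson sampling over two Bernoulli arms where the prior on the second arm is badly wrong and overconfident, so the algorithm almost never collects the data that would correct it.

```python
# Hypothetical illustration of data starvation under prior misspecification.
# Arm 1 is actually much better, but its Beta(1, 100) prior says it is bad.
import random

true_means = [0.5, 0.9]          # arm 1 is the better arm
priors = [[1, 1], [1, 100]]      # [alpha, beta] for each arm; arm 1's prior is misspecified
pulls = [0, 0]

random.seed(0)
for _ in range(10000):
    samples = [random.betavariate(a, b) for a, b in priors]
    arm = samples.index(max(samples))          # Thompson sampling: play the argmax sample
    reward = 1 if random.random() < true_means[arm] else 0
    priors[arm][0] += reward
    priors[arm][1] += 1 - reward
    pulls[arm] += 1

print(pulls)   # arm 1 is pulled only a handful of times despite being much better
```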

10/18/2012

7th Annual Machine Learning Symposium

A reminder that the New York Academy of Sciences will be hosting the 7th Annual Machine Learning Symposium tomorrow from 9:30am.

The main program will feature invited talks from Peter Bartlett, William Freeman, and Vladimir Vapnik, along with numerous spotlight talks and a poster session. Following the main program, hackNY and Microsoft Research are sponsoring a networking hour with talks from machine learning practitioners at NYC startups (specifically bit.ly, Buzzfeed, Chartbeat, Sense Networks, and Visual Revenue). This should be of great interest to everyone considering working in machine learning.

9/29/2012

Vowpal Wabbit, version 7.0

A new version of VW is out. The primary changes are:

  1. Learning Reductions: I’ve wanted to get learning reductions working and we’ve finally done it. Not everything is implemented yet, but VW now directly supports:
    1. Multiclass Classification via --oaa or --ect.
    2. Cost Sensitive Multiclass Classification via --csoaa or --wap.
    3. Contextual Bandit Classification via --cb.
    4. Sequential Structured Prediction via --searn or --dagger.

    In addition, it is now easy to build your own custom learning reductions for various plausible uses: feature diddling, custom structured prediction problems, or alternate learning reductions. This effort is far from done, but it is now in a generally useful state. Note that all learning reductions inherit the ability to do cluster parallel learning. A short invocation sketch of the new flags follows this list.

  2. Library interface: VW now has a basic library interface. The library provides most of the functionality of VW, with the limitation that it is monolithic and nonreentrant. These will be improved over time.
  3. Windows port: The priority of a Windows port jumped way up once we moved to Microsoft. The only feature which we know doesn’t work at present is automatic backgrounding when in daemon mode.
  4. New update rule: Stephane visited us this summer, and we fixed the default online update rule so that it is unit invariant.
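
To make the reduction flags above concrete, here is a minimal invocation sketch driven from Python. The data files, class counts, and model names are hypothetical; only the flags themselves come from the list above.

```python
# Hypothetical invocation sketch for the VW 7.0 reduction flags.
# File names, label counts, and model names are made up for illustration.
import subprocess

# One-against-all multiclass classification with 10 classes.
subprocess.run(["vw", "--oaa", "10", "-d", "multiclass_train.vw", "-f", "oaa.model"], check=True)

# Cost-sensitive multiclass classification (one-against-all variant).
subprocess.run(["vw", "--csoaa", "10", "-d", "costs_train.vw", "-f", "csoaa.model"], check=True)

# Contextual bandit learning over 10 actions.
subprocess.run(["vw", "--cb", "10", "-d", "cb_train.vw", "-f", "cb.model"], check=True)
```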

There are also many other small updates including some contributed utilities that aid the process of applying and using VW.

Plans for the near future involve improving the quality of various items above, and of course better documentation: several of the reductions are not yet well documented.

8/27/2012

NYAS ML 2012 and ICML 2013

The New York Machine Learning Symposium is October 19, with a 2-page abstract due September 13 via email with subject “Machine Learning Poster Submission” sent to physicalscience@nyas.org. Everyone is welcome to submit. Last year’s attendance was 246 and I expect more this year.

The primary experiment for ICML 2013 is multiple paper submission deadlines with rolling review cycles. The key dates are October 1, December 15, and February 15. This is an attempt to shift ICML further towards a journal style review process and reduce peak load. The “not for proceedings” experiment from this year’s ICML is not continuing.

Edit: Fixed second ICML deadline.

8/24/2012

Patterns for research in machine learning

There are a handful of basic code patterns that I wish I was more aware of when I started research in machine learning. Each on its own may seem pointless, but collectively they go a long way towards making the typical research workflow more efficient. Here they are:

  1. Separate code from data.
  2. Separate input data, working data and output data.
  3. Save everything to disk frequently.
  4. Separate options from parameters.
  5. Do not use global variables.
  6. Record the options used to generate each run of the algorithm.
  7. Make it easy to sweep options.
  8. Make it easy to execute only portions of the code.
  9. Use checkpointing.
  10. Write demos and tests.

Click here for discussion and examples for each item. Also see Charles Sutton’s and HackerNews’ thoughts on the same topic.
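
For illustration, here is a minimal Python sketch of patterns 4 and 6 (separate options from parameters; record the options used for each run), with room left for checkpointing. The option names and directory layout are hypothetical.

```python
# Minimal sketch: options are set on the command line and recorded per run,
# while learned parameters live separately in the run's working directory.
import argparse, json, os, time

def parse_options():
    p = argparse.ArgumentParser()
    p.add_argument("--learning_rate", type=float, default=0.1)
    p.add_argument("--num_epochs", type=int, default=10)
    p.add_argument("--working_dir", default="working")
    return p.parse_args()

def record_options(opts):
    # One directory per run, stamped with the start time, holding options.json.
    run_dir = os.path.join(opts.working_dir, time.strftime("%Y%m%d-%H%M%S"))
    os.makedirs(run_dir, exist_ok=True)
    with open(os.path.join(run_dir, "options.json"), "w") as f:
        json.dump(vars(opts), f, indent=2)
    return run_dir

if __name__ == "__main__":
    opts = parse_options()
    run_dir = record_options(opts)
    # ... train the model here, periodically checkpointing parameters into run_dir ...
```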

My guess is that these patterns will not only be useful for machine learning, but also any other computational work that involves either a) processing large amounts of data, or b) algorithms that take a significant amount of time to execute. Share this list with your students and colleagues. Trust me, they’ll appreciate it.

7/17/2012

MUCMD and BayLearn

The workshop on the Meaningful Use of Complex Medical Data is happening again, August 9-12 in LA, near UAI on Catalina Island August 15-17. I enjoyed my visit last year, and expect this year to be interesting also.

The first Bay Area Machine Learning Symposium is August 30 at Google. Abstracts are due July 30.

7/9/2012

Videolectures

Yaser points out some nicely videotaped machine learning lectures at Caltech. Yaser taught me machine learning, and I always found the lectures clear and interesting, so I expect many people can benefit from watching. Relative to Andrew Ng’s ML class there are somewhat different areas of emphasis but the topic is the same, so picking and choosing from the union of the two may be helpful.

6/29/2012

ICML survey and comments

Just about nothing could keep me from attending ICML, except for Dora who arrived on Monday. Consequently, I have only secondhand reports that the conference is going well.

For those who are remote (like me) or after the conference (like everyone), Mark Reid has set up the ICML discussion site where you can comment on any paper or subscribe to papers. Authors are automatically subscribed to their own papers, so it should be possible to have a discussion significantly after the fact, as people desire.

We also conducted a survey before the conference and have the survey results now. This can be compared with the ICML 2010 survey results. Looking at the comparable questions, we can sometimes order the answers to have scores ranging from 0 to 3 or 0 to 4 with 3 or 4 being best and 0 worst, then compute the average difference between 2012 and 2010.
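
As a toy illustration of that comparison (the answer values below are invented, not actual survey data), the parenthetical deltas in the list that follows are averages of this form:

```python
# Toy example: answers mapped to 0..3 (3 best), then the 2010 mean is
# subtracted from the 2012 mean for the comparable question.
scores_2012 = [3, 2, 3, 3, 2, 1]
scores_2010 = [3, 3, 2, 3, 2, 2]
delta = sum(scores_2012) / len(scores_2012) - sum(scores_2010) / len(scores_2010)
print(round(delta, 3))  # positive means the 2012 answers were better on average
```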

Glancing through them, I see:

  1. Most people found the papers they reviewed a good fit for their expertise (-.037 w.r.t 2010). Achieving this was one of our subgoals in the pursuit of high quality decisions.
  2. Most people had sufficient time for doing reviews. This was something that we worried about significantly in shifting the paper deadline and otherwise massaging the schedule. Most people also thought the review period was sufficiently long and most reviews were high quality (+.023 w.r.t. 2010).
  3. About 1/4 of reviewers say that author response changed their mind on a paper and 2/3 of reviewers say discussion changed their mind on a paper. The expectation of decision impact from author response is reduced from 2010 (-.135). The existence of author response is overwhelmingly preferred.
  4. People generally found ICML reviewing the same or better than previous ICMLs (+.35 w.r.t. 2010) and other similar conferences (+.198 w.r.t. 2010) at the cost of being somewhat more work. A substantial bump in reviewing quality was a primary goal.
  5. The ACs spent substantially more time (43 hours on average) than PC members (28 hours on average). This agrees with our expectation—the set of ACs didn’t change even after we had a 50% increase in submissions. The AC load we had this year was probably too high and will need to be reduced somewhat for next year.
  6. 2/3 of authors prefer the option to revise a paper during author response.
  7. The choice of how to deal with increased submissions is deeply undecided, with a slight preference for short talk+poster as we did.
  8. Most people like having two workshop days or don’t care.
  9. There is a strong preference for COLT and UAI colocation with the next tier of preference for IJCAI, KDD, AAAI, and CVPR.

6/15/2012

Normal Deviate and the UCSC Machine Learning Summer School

Larry Wasserman has started the Normal Deviate blog which I added to the blogroll on the right.

Manfred Warmuth points out the UCSC machine learning summer school running July 9-20 which may be of particular interest to those in silicon valley.

6/5/2012

ICML acceptance statistics

Tags: Conferences,Reviewing jl@ 8:24 pm

People are naturally interested in slicing the ICML acceptance statistics in various ways. Here’s a rundown for the top categories.

18/66 = 0.27 in (0.18,0.36) Reinforcement Learning
10/52 = 0.19 in (0.17,0.37) Supervised Learning
9/51 = 0.18 not in (0.18, 0.37) Clustering
12/46 = 0.26 in (0.17, 0.37) Kernel Methods
11/40 = 0.28 in (0.15, 0.4) Optimization Algorithms
8/33 = 0.24 in (0.15, 0.39) Learning Theory
14/33 = 0.42 not in (0.15, 0.39) Graphical Models
10/32 = 0.31 in (0.15, 0.41) Applications (+5 invited)
8/29 = 0.28 in (0.14, 0.41) Probabilistic Models
13/29 = 0.45 not in (0.14, 0.41) NN & Deep Learning
8/26 = 0.31 in (0.12, 0.42) Transfer and Multi-Task Learning
13/25 = 0.52 not in (0.12, 0.44) Online Learning
5/25 = 0.20 in (0.12, 0.44) Active Learning
6/22 = 0.27 in (0.14, 0.41) Semi-Supervised Learning
7/20 = 0.35 in (0.1, 0.45) Statistical Methods
4/20 = 0.20 in (0.1, 0.45) Sparsity and Compressed Sensing
1/19 = 0.05 not in (0.11, 0.42) Ensemble Methods
5/18 = 0.28 in (0.11, 0.44) Structured Output Prediction
4/18 = 0.22 in (0.11, 0.44) Recommendation and Matrix Factorization
7/18 = 0.39 in (0.11, 0.44) Latent-Variable Models and Topic Models
1/17 = 0.06 not in (0.12, 0.47) Graph-Based Learning Methods
5/16 = 0.31 in (0.13, 0.44) Nonparametric Bayesian Inference
3/15 = 0.20 in (0.07, 0.47) Unsupervised Learning and Outlier Detection
7/12 = 0.58 not in (0.08, 0.50) Gaussian Processes
5/11 = 0.45 not in (0.09, 0.45) Ranking and Preference Learning
2/11 = 0.18 in (0.09, 0.45) Large-Scale Learning
0/9 = 0.00 in [0, 0.56) Vision
3/9 = 0.33 in [0, 0.56) Social Network Analysis
0/9 = 0.00 in [0, 0.56) Multi-agent & Cooperative Learning
2/9 = 0.22 in [0, 0.56) Manifold Learning
4/8 = 0.50 not in [0, 0.5) Time-Series Analysis
2/8 = 0.25 in [0, 0.5] Large-Margin Methods
2/8 = 0.25 in [0, 0.5] Cost Sensitive Learning
2/7 = 0.29 in [0, 0.57) Recommender Systems
3/7 = 0.43 in [0, 0.57) Privacy, Anonymity, and Security
0/7 = 0.00 in [0, 0.57) Neural Networks
0/7 = 0.00 in [0, 0.57) Empirical Insights
0/7 = 0.00 in [0, 0.57) Bioinformatics
1/6 = 0.17 in [0, 0.5) Information Retrieval
2/6 = 0.33 in [0, 0.5) Evaluation Methodology

Update: See Brendan’s graph for a visualization.

I usually find these numbers hard to interpret. At the grossest level, all areas have significant selection. At a finer level, one way to add further interpretation is to pretend that the acceptance rate of all papers is 0.27, then compute a 5% lower tail and a 5% upper tail. With 40 categories, we expect to have about 4 violations of tail inequalities. Instead, we have 9, so there is some evidence that individual areas are particularly hot or cold. In particular, the hot topics are Graphical models, Neural Networks and Deep Learning, Online Learning, Gaussian Processes, Ranking and Preference Learning, and Time Series Analysis. The cold topics are Clustering, Ensemble Methods, and Graph-Based Learning Methods.
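
Here is a small sketch of how such tail intervals can be computed. This is my reconstruction of the calculation, modeling each category as Binomial(n, 0.27) and using scipy, so the exact numbers may differ slightly from the table above.

```python
# Sketch: 5%/95% binomial quantiles for a category with n submissions,
# under the pretense that every paper is accepted with probability 0.27.
from scipy.stats import binom

def tail_interval(n, rate=0.27, tail=0.05):
    lo = binom.ppf(tail, n, rate) / n
    hi = binom.ppf(1 - tail, n, rate) / n
    return lo, hi

print(tail_interval(66))   # roughly (0.18, 0.36), as listed for Reinforcement Learning
print(tail_interval(33))   # roughly (0.15, 0.39); Graphical Models at 14/33 = 0.42 falls outside
```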

We also experimented with AIStats resubmits (3/4 accepted) and NFP papers (4/7 accepted) but the numbers were too small to read anything significant into them.

One thing that surprised me was how uniform decisions were as a function of average score in reviews. All reviews included a decision from {Strong Reject, Weak Reject, Weak Accept, Strong Accept}. These were mapped to numbers in the range {1,2,3,4}. In essence, average review score < 2.2 meant 0% chance of acceptance, and average review score > 3.1 meant acceptance. Due to discretization in the number of reviewers and review scores there were only 3 typical uncertain outcomes:

  1. 2.33. This was either 2 Weak Rejects+Weak Accept or Strong Reject+2 Weak Accepts or (rarely) Strong Reject+Weak Reject+Strong Accept. About 8% of these papers were accepted.
  2. 2.67. This was either Weak Reject+2 Weak Accepts or Strong Accept+2 Weak Rejects or (rarely) Strong Reject+Weak Accept+Strong Accept. About 48% of these papers were accepted.
  3. 3.0. This was commonly 3 Weak Accepts or Strong Accept+Weak Accept+Weak Reject or (rarely) 2 Strong Accepts + Strong Reject. About 90% of these papers were accepted.

One question I’ve always wondered about is: How much variance is there in the accept/reject decision? In general, correlated assignment of reviewers can greatly increase the amount of variance, so one of our goals this year was doing as independent an assignment as possible. If you accept that as approximately independent, we essentially get 3 samples for each paper, where the average standard deviation of reviewer scores before author feedback and discussion is 0.64. After author feedback and discussion the standard deviation drops to 0.51. If we pretend that papers have an intrinsic value between 1 and 4 and think of reviews as discretized gaussian measurements fed through the above decision criteria, we can simulate how noisy the final decision is.
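
As a rough, hypothetical sketch of that simulation (the modeling assumptions are mine and stated in the comments), one might write:

```python
# Assumptions: intrinsic values uniform on [1,4]; each review is the value plus
# Gaussian noise with the post-discussion standard deviation, rounded to {1,2,3,4};
# three reviews per paper; the empirical accept rates above for uncertain averages.
import random

def review(value, sd=0.51):
    return min(4, max(1, round(random.gauss(value, sd))))

def decide(avg):
    if avg < 2.2:
        return False
    if avg > 3.1:
        return True
    # Empirical accept probabilities for the three uncertain averages listed above.
    return random.random() < {2.33: 0.08, 2.67: 0.48, 3.0: 0.90}[round(avg, 2)]

random.seed(1)
trials, flips = 20000, 0
for _ in range(trials):
    value = random.uniform(1, 4)
    d1 = decide(sum(review(value) for _ in range(3)) / 3)
    d2 = decide(sum(review(value) for _ in range(3)) / 3)
    flips += d1 != d2
print(flips / trials)  # fraction of papers whose decision flips under an independent re-review
```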

There are great caveats to this picture. For example, treating the AC’s decision as random conditioned on the reviewer average is a worst-case analysis. The reality, judging from the few events that I monitored carefully, is that ACs are removing noise, although it is difficult to quantify this. Similarly, treating the reviews observed after discussion as independent is clearly flawed. A reasonable way to look at it is: author feedback and discussion get us about 1/3 or 1/4 of the way to the final decision from the initial reviews.

Conditioned on the papers, discussion, author feedback, and reviews, ACs are pretty uniform in their decisions, with ~30 papers where ACs disagreed on the accept/reject decision. For half of those, the ACs discussed further and agreed, leaving Joelle and me a feasible number of cases to look at (plus several other exceptions).

At the outset, we promised a zero-SPOF (no single point of failure) reviewing process. We actually aimed higher: at least 3 people needed to make a wrong decision for the ICML 2012 reviewing process to kick out a wrong decision. I expect this happened a few times given the overall level of quality disagreement and the quantities involved, but hopefully we managed to reduce the noise appreciably.

5/12/2012

ICML accepted papers and early registration

The accepted papers are up in full detail. We are still struggling with the precise program itself, but that’s coming along. Also note the May 13 deadline for early registration and room booking.

5/3/2012

Microsoft Research, New York City

Tags: Machine Learning,Research jl@ 5:34 am

Yahoo! laid off people. Unlike every previous time there have been layoffs, this is serious for Yahoo! Research.

We had advance warning from Prabhakar through the simple act of leaving. Yahoo! Research was a world class organization that Prabhakar recruited much of personally, so it is deeply implausible that he would spontaneously decide to leave. My first thought when I saw the news was “Uh-oh, Rob said that he knew it was serious when the head of AT&T Research left.” In this case it was even more significant, because Prabhakar recruited me on the premise that Y!R was an experiment in how research should be done: via a combination of high quality people and high engagement with the company. Prabhakar’s departure is a clear end to that experiment.

The result is ambiguous from a business perspective. Y!R clearly was not capable of saving the company from its illnesses. I’m not privy to the internal accounting of impact and this is the kind of subject where there can easily be great disagreement. Even so, there were several strong direct impacts coming from the machine learning, economics, and algorithms groups.

Y!R clearly was excellent from an academic research perspective. On a per person basis in relevant subjects, it was outstanding. One way to measure this is by noticing that both ICML and KDD had (co)program chairs from Y!R. It turns out that talking to the rest of the organization doing consulting, architecting, and prototyping on a minority basis helps research by sharpening the questions you ask more than it hinders by taking up time. The decision to participate in this experiment was a good one for me personally.

It has been clear in silicon valley, academia, and pretty much everywhere else that people at Yahoo! including Yahoo! Research have been looking around for new positions. Maintaining the excellence of Y!R in a company that has been under prolonged stress was challenging leadership-wise. Consequently, the abrupt departure of Prabhakar and an apparent lack of appreciation by the new CEO created a crisis of confidence. Many people who were sitting on strong offers quickly left, and everyone else started looking around.

In this situation, my first concern was for colleagues, both in Machine Learning across the company and the Yahoo! Research New York office.

Machine Learning turns out to be a very hot technology. Every company and government in the world is drowning in data, and Machine Learning is the prime tool for actually using it to do interesting things. More generally, the demand for high quality seasoned machine learning researchers across startups, mature companies, government labs, and academia has been astonishing, and I expect the outcome to reflect that. This is remarkably different from the cuts that hit AT&T Research in late 2001 and early 2002, where the famous machine learning group there took many months to disperse to new positions.

In the New York office, we investigated many possibilities hard enough that it became a news story. While that article is wrong in specifics (for example, we were not fired, although it is difficult to discern cause and effect), we certainly shook the job tree very hard to see what would fall out. To my surprise, amongst all the companies we investigated, Microsoft had a uniquely sufficient agility, breadth of interest, and technical culture, enabling them to make offers that I and a significant fraction of the Y!R-NY lab could not resist. My belief is that the new Microsoft Research New York City lab will become an even greater tech house than Y!R-NY. At a personal level, it is deeply flattering that they have chosen to create a lab for us on short notice. I will certainly do my part chasing the greatest learning algorithms not yet invented.

In light of this, I would encourage people in academia to consider Yahoo! in as fair a light as possible in the current circumstances. There are and will be some serious hard feelings about the outcome as various top researchers elsewhere in the organization feel compelled to look for jobs and leave. However, Yahoo! took a real gamble supporting a research organization about 7 years ago, and many positive things have come of this gamble from all perspectives. I expect almost all of the people leaving to eventually do quite well, and often even better.

What about ICML? My second thought on hearing about Prabhakar’s departure was “I really need to finish up initial paper/reviewer assignments today before dealing with this”. During the reviewing period where the program chair load is relatively light, Joelle handled nearly everything. My great distraction ended neatly in time to help with decisions at ICML. I considered all possibilities in accepting the job and was prepared to simply put aside a job search for some time if necessary, but the timing was surreally perfect. All signs so far point towards this ICML being an exceptional ICML, and I plan to do everything that I can to make that happen. The early registration deadline is May 13.

What about KDD? Deepak was sitting on an offer at Linkedin and simply took it, so the disruption there was even more minimal. Linkedin is a significant surprise winner in this affair.

What about Vowpal Wabbit? Amongst other things, VW is the ultrascale learning algorithm, not the kind of thing that you would want to put aside lightly. I negotiated to continue the project and succeeded. This surprised me greatly—Microsoft has made serious commitments to supporting open source in various ways and that commitment is what sealed the deal for me. In return, I would like to see Microsoft always at or beyond the cutting edge in machine learning technology.

added: crosspost on CACM.
added: Lance, Jennifer, NYTimes, Vader

5/2/2012

ICML: Behind the Scenes

This is a rather long post, detailing the ICML 2012 review process. The goal is to make the process more transparent, help authors understand how we came to a decision, and discuss the strengths and weaknesses of this process for future conference organizers.

Microsoft’s Conference Management Toolkit (CMT)
We chose to use CMT over other conference management software mainly because of its rich toolkit. The interface is sub-optimal (to say the least!) but it has extensive capabilities (to handle bids, author response, resubmissions, etc.), good import/export mechanisms (to process the data elsewhere), excellent technical support (to answer late night emails, add new functionalities). Overall, it was the right choice, although we hope a designer will look at that interface sometime soon!

Toronto Matching System (TMS)
TMS is now being used by many major conferences in our field (including NIPS and UAI). It is an automated system (developed by Laurent Charlin and Rich Zemel at U. Toronto) to match reviewers to papers, based on an analysis of each reviewer’s publications. TMS collects publications from reviewers, parses them into features and applies unsupervised or supervised learning techniques to predict the relevance of any target paper for any reviewer. We convinced TMS to integrate with CMT and funded Laurent’s work for that. Reviewers were asked to put in a publication list for TMS to parse. For those who failed to do so (after many reminders!), we manually added that information from public sources.

The Program Committee
Recruiting a program committee that is both large and highly qualified is difficult these days. We sent out 69 area chair invitations; 50 (highly qualified!) people accepted. Each of these area chairs was asked to nominate a list of potential reviewers. We sent out approximately 700 invitations for program committee members; 389 accepted. A number of additional PC members were recruited during the review process (most of them for 1-2 papers), for a total of 470 active PC members. In terms of seniority, the final PC contains about 15% students, 80% researchers, and 5% other.

The Surge (ICML + 50%)
The first big challenge came on the submission deadline. In the past few years, ICML had consistently received ~550-600 submissions. This year, we had a 50% increase, to 890 submissions. We had recruited a PC that could comfortably handle 700 papers. Dealing with an extra 200 papers was not an easy task.

About 10 submissions were rejected without review for various reasons (severe formatting issues, extra pages, non-anonymization).

Bidding
An unsupervised version of TMS was used to generate a list of candidate papers for each reviewer and area chair. This was done working closely with Laurent Charlin of TMS, using validation on previous NIPS data. CMT did not have the functionality to show a good list of candidate papers to reviewers, so we crafted an interface to show this list and let reviewers use it in conjunction with CMT. Ideally, this will be better incorporated into CMT in the future.

When you ask a group of scientists to run a conference, you must expect a few experiments will take place… And so we decided to assess the usefulness of TMS scoring for generating lists of papers to bid on. To do this, we (randomly) assigned PC members to 1 of 3 groups. One group saw a list purely based on TMS scores. Another group received a list based on the matching between their subject area and that of the paper (referred to as the “relevance” score in CMT). The third group received a list based on a mix of both TMS and relevance. Reviewers were allowed to bid on any paper (excluding those with which they had a conflict); the lists were provided to help them efficiently sort through the large number of papers. We then compared each reviewer’s bids with the list of suggestions and measured the correspondence.

The following is the Discounted Cumulative Gain (DCG) of each list with respect to the bidding scores, averaged separately for each group. Note that each group was only presented with their corresponding list and not the others.

                      | Group: CMT              | Group: TMS              | Group: CMT+TMS
Sorting by CMT scores | 6.11 out of 12.64 (48%) | 4.98 out of 13.63 (36%) | 4.87 out of 13.55 (35%)
Sorting by TMS score  | 4.06 out of 12.64 (32%) | 6.43 out of 13.63 (47%) | 5.72 out of 13.55 (42%)
Sorting by TMS+CMT    | 4.77 out of 12.64 (37%) | 6.11 out of 13.63 (44%) | 6.71 out of 13.55 (49%)
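
For reference, here is a minimal sketch of the DCG computation. My reading of the table is that each cell scores the presented ordering by the bid values it surfaces, discounted by rank, with the “out of” denominator being the best achievable DCG for those bids; the data below is made up.

```python
# Sketch of rank-discounted gain over a reviewer's bid values.
import math

def dcg(bid_values_in_list_order):
    # Standard DCG: value at rank r is discounted by log2(r + 1).
    return sum(v / math.log2(i + 2) for i, v in enumerate(bid_values_in_list_order))

# Hypothetical bids (3 = eager, 0 = not willing) in the order one list showed papers.
suggested = [3, 0, 2, 1, 0]
ideal = sorted(suggested, reverse=True)
print(dcg(suggested), "out of", dcg(ideal))   # analogous to "6.11 out of 12.64"
```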

A micro-survey was also run to collect further information on how users liked their short list. 85% of the participants indicated that they had used the list interface provided to them. The following is the preference indicated by each group (~75 reviewers in each group, ~2% error):

                        | CMT | TMS | CMT+TMS
Preferred CMT over list | 15% | 12% |  8%
Preferred list+CMT      | 81% | 83% | 83%
Preferred list over CMT |  4% |  5% |  9%

It is obvious from the above that most participants found the list useful in conjunction with CMT (suggesting that the list should be integrated inside CMT). We can also see that those who were presented with a list based on TMS scores were more likely to find the list useful.

Note that all of the above was done in a long hectic but fun weekend.

Imputing Missing Bids
CMT assumes that the reviewers are not willing to review a paper unless stated otherwise. It does not differentiate between an unseen (but potentially relevant) paper and a paper that has been seen and ignored. This is a real shortcoming when it comes to matching papers to reviewers, especially for those reviewers that did not bid often. To mitigate this problem, we used the click information on the shortlist presented to the reviewers to find out which papers had been observed and ignored. We then imputed these cases as real non-willing bids.

Around 30 reviewers did not provide any bids (and many had only a few). This is problematic because the tools used to do the actual reviewer-paper matching tend to assign the papers without any bids to the reviewers who did not bid, regardless of the match in expertise.

Once the bidding information was in and imputation was done, we now had to fill in the rest of the paper-reviewer bidding matrix to mitigate the problem with sparse bidders. This was done, once again, through TMS, but this time using a supervised learning approach.

Using supervised learning was more delicate than expected. To deal with the wildly varying number of bids per person, we imputed zero bids, first from papers that were plausibly skipped over, and if necessary at random from papers not bid on such that each person had the same expected bid in the dataset. From this dataset, we held out a random bid per person, and then trained to predict well the heldout bid. Most optimization approaches performed poorly due to the number of features greatly exceeding the number of labels. The best approach we found used the online algorithms in Vowpal Wabbit with a mass personalized training method similar to the one discussed here. This trained predictor was used to predict bid values for the full paper-reviewer bid matrix.
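
A rough, simplified sketch of the holdout construction described above follows. The data structures are hypothetical, and padding every reviewer to the same number of entries is a simplification of the "same expected bid" rule; it is not the exact procedure used.

```python
# Simplified sketch: impute zero bids (skipped papers first, then random unseen
# papers), then hold out one (paper, bid) pair per reviewer for validation.
import random

def build_holdout(bids, skipped, all_papers, entries_per_person):
    """bids: {reviewer: {paper: bid}}; skipped: {reviewer: set of papers shown but ignored}."""
    train, heldout = [], {}
    for reviewer, positive in bids.items():
        rows = dict(positive)
        seen = [p for p in skipped.get(reviewer, ()) if p not in rows]
        unseen = [p for p in all_papers if p not in rows and p not in seen]
        random.shuffle(unseen)
        needed = max(0, entries_per_person - len(rows))
        for paper in (seen + unseen)[:needed]:
            rows[paper] = 0
        held = random.choice(list(rows))
        heldout[reviewer] = (held, rows.pop(held))
        train += [(reviewer, paper, bid) for paper, bid in rows.items()]
    return train, heldout
```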

Automated Area Chair and First Reviewer Assignment
Once we had the imputed paper-reviewer bidding matrix, CMT was used to generate the actual match between papers and area chairs, and (separately) between papers and reviewers. Each paper had two area chairs (sometimes called “meta-reviewers” in CMT) assigned to it, one primary, one secondary, by running two rounds of assignments (so that the primary was usually the “better” match). One reviewer per paper was also assigned automatically by CMT in a similar fashion. CMT provides proper load balancing, so that all area chairs and reviewers had similar loads.

Manual Checks of the Automated Assignments
Before finalizing the automated assignment, we manually looked through the list of papers to fix any potential problems that were not handled by the automated process. The two major cases were papers that did not go through the TMS system (authors did not agree to do so), and cases of poor primary-secondary meta-reviewer pairs (when the two area chairs are judged to be too close to offer independent assessment, e.g. working at the same institution, previous supervisor-student relationship).

Second and Third Reviewer Assignment
Once the initial assignments were announced, we asked the two area chairs for a given paper to each manually assign another reviewer from the PC. To help area chairs with this, we generated a shortlist of 10 recommended reviewers for each paper (using the estimated bid matrix and TMS score, with the CMT matching algorithm for load balancing of reviewer suggestions.) Area chairs were free to either use this list, or select from the complete program committee, or alternately, they could seek an outside reviewer which was then added to the PC, an option used 80 times. The load for each reviewer was restricted to at most 7 papers with exceptions when they agreed explicitly to more.

The second and third uses of TMS, including the new supervised learning system, led to another long, hectic weekend with Laurent, Mahdi, Joelle, and John all deeply involved.

Reviews
Most papers received at least 3 full reviews in the first round. Reviewers could not see each others’ reviews until they submitted their own. ML-Journaled submissions (see double submission guide) were reviewed only by two area chairs. In a small number of regular submissions (less than 10), we received 2 very negative reviews and notified the third reviewer (who was usually late by this point!) that we would not need their review.

Authors’ Response
Authors were given a chance to respond to the reviews during a short feedback period. This is becoming a standard practice in machine learning conferences. Authors were also allowed to upload a new version of the paper. The motivation here is that in some cases, it is easier to show the changes directly in the paper, rather than discuss them separately.

Our analysis shows that authors’ responses and subsequent discussions by reviewers made significant changes to the scoring of papers. A total of ~35% of the papers had some change in their scores after the author feedback. Among those papers, the average score went down for ~50%, stayed the same for ~10%, and went up for the other ~40%. The variance of the scores decreased by ~20%, indicating some convergence in the decisions.

Final Decisions
To help us better decide on the quality of the papers, we asked the primary area chairs to provide a meta-review for each of their papers. For papers without unanimous review decisions (i.e. some reviews wanted to accept and some wanted to reject), we asked the secondary area chair to (independently) fill in a meta-review, recommending whether to accept or reject the paper. A total of 1214 meta-reviews were provided. There were also 20 papers for which a 4th review was added in this period.

In all cases where the primary and secondary area chairs disagreed on the decision, the program chairs were directly involved, reviewing all the evidence (reviews, rebuttal, discussion, often the paper itself), and entering in a discussion (usually via email) with the area chairs, until a unanimous decision was achieved.
A total of 243 papers (27% of submissions) were accepted. Author notifications were sent out on April 30.

4/20/2012

Both new: STOC workshops and NEML

May 16 in Cambridge is the New England Machine Learning Day, a first regional workshop/symposium on machine learning. To present a poster, submit an abstract by May 5.

May 19 in New York, STOC is coming to town and rather surprisingly having workshops which should be quite a bit of fun. I’ll be speaking at Algorithms for Distributed and Streaming Data.

4/9/2012

ICML author feedback is open

Tags: Conferences,Machine Learning jl@ 8:24 pm

as of last night, late.

When the reviewing deadline passed Wednesday night, 15% of reviews were still missing, much higher than I expected. Between late reviews coming in, ACs working overtime through the weekend, and people willing to help in a pinch, another ~390 reviews came in, reducing the missing mass to 0.2%. Nailing that last bit, plus a similar quantity of papers with uniformly low-confidence reviews, is what remains to be done in terms of basic reviews. We are trying to make all of those happen this week so authors have some chance to respond.

I was surprised by the quantity of late reviews, and I think that’s an area where ICML needs to improve in future years. Good reviews are not done in a rush—they are done by setting aside time (like an afternoon), and carefully reading the paper while thinking about implications. Many reviewers do this well but a significant minority aren’t good at scheduling their personal time. In this situation there are several ways to fail:

  1. Give early warning and bail.
  2. Give no warning and finish not-too-late.
  3. Give no warning and don’t finish.

The worst failure mode by far is the last one for Program Chairs and Area Chairs, because they must catch and fix all the failures at the last minute. I expect the second failure mode also impacts the quality of reviews because high speed reviewing of a deep paper often doesn’t work. This issue is one of community norms which can only be adjusted slowly. To do this, we’re going to pass a flake list for failure mode 3 to future program chairs who will hopefully further encourage people to schedule time well and review carefully.

If my experience is any guide, plenty of authors will feel disappointed by the reviews. Part of this is simply because it’s the first time the authors have had contact with people not biased towards agreeing with them, as almost all friends are. Part of this is the significant hurdle of communicating technical new things well. Part may be too-hasty reviews, as discussed above. And part of it may be that the authors simply are far more expert in their subject than reviewers.

In author responses, my personal tendency is to be blunter than most people when reviewers make errors. Perhaps “kind but clear” is a good viewpoint. You should be sympathetic to reviewers who have voluntarily put significant time into reviewing your paper, but you should also use the channel to communicate real information. Remotivating your paper almost never works, so concentrate on getting across errors in understanding by reviewers or answer their direct questions.

We did not include reviewer scores in author feedback, although we do plan to include them when the decision is made. Scores should not be regarded as final by any party, since author feedback and discussion can significantly alter a reviewer’s understanding of the paper. Encouraging reviewers to incorporate this additional information well before settling on a final score is one of my goals.

We did allow resubmission of the paper with the author response, similar to what Geoff Gordon did as program chair for AIStats. This solves two problems: it helps authors create a more polished draft, and it avoids forcing an overly constrained channel in the communication. If an equation has a bug, you can write it out bug-free in mathematical notation rather than trying to describe by reference how to alter the equation in author response.

Please comment if you have further thoughts.

