# Machine Learning (Theory)

## 5/2/2012

### ICML: Behind the Scenes

This is a rather long post, detailing the ICML 2012 review process. The goal is to make the process more transparent, help authors understand how we came to a decision, and discuss the strengths and weaknesses of this process for future conference organizers.

Microsoft’s Conference Management Toolkit (CMT)
We chose to use CMT over other conference management software mainly because of its rich toolkit. The interface is sub-optimal (to say the least!) but it has extensive capabilities (to handle bids, author response, resubmissions, etc.), good import/export mechanisms (to process the data elsewhere), and excellent technical support (to answer late night emails and add new functionality). Overall, it was the right choice, although we hope a designer will look at that interface sometime soon!

Toronto Matching System (TMS)
TMS is now being used by many major conferences in our field (including NIPS and UAI). It is an automated system (developed by Laurent Charlin and Rich Zemel at U. Toronto) to match reviewers to papers, based on an analysis of each reviewer’s publications. TMS collects publications from reviewers, parses them into features and applies unsupervised or supervised learning techniques to predict the relevance of any target paper for any reviewer. We convinced TMS to integrate with CMT and funded Laurent’s work for that. Reviewers were asked to put in a publication list for TMS to parse. For those who failed to do so (after many reminders!), we manually added that information from public sources.

The Program Committee
Recruiting a program committee that is both large and highly qualified is difficult these days. We sent out 69 area chair invitations; 50 (highly qualified!) people accepted. Each of these area chairs was asked to nominate a list of potential reviewers. We sent out approximately 700 invitations for program committee members; 389 accepted. A number of additional PC members were recruited during the review process (most of them for 1-2 papers), for a total of 470 active PC members. In terms of seniority, the final PC contains about 15% students, 80% researchers, and 5% other.

The Surge (ICML + 50%)
The first big challenge came on the submission deadline. In the past few years, ICML had consistently received ~550-600 submissions. This year, we had a 50% increase, to 890 submissions. We had recruited a PC that could comfortably handle 700 papers. Dealing with an extra 200 papers was not an easy task.

About 10 submissions were rejected without review for various reasons (severe formatting issues, extra pages, non-anonymization).

Bidding
An unsupervised version of TMS was used to generate a list of candidate papers for each reviewer and area chair. This was done working closely with Laurent Charlin of TMS, using validation on previous NIPS data. CMT did not have the functionality to show a good list of candidate papers to reviewers, so we crafted an interface to show this list and let reviewers use that in conjunction with CMT. Ideally, this will be better incorporated in CMT in the future.

When you ask a group of scientists to run a conference, you must expect a few experiments will take place…. And so we decided to assess the usefulness of TMS scoring for generating lists of papers to bid on. To do this, we (randomly) assigned PC members to 1 of 3 groups. One group saw a list purely based on TMS scores. Another group received a list based on the matching between their subject area and that of the paper (referred to as the “relevance” score in CMT). The third group received a list based on a mix of both TMS and relevance. Reviewers were allowed to bid on any paper (excluding those with which they had a conflict); the lists were provided to help them efficiently sort through the large number of papers. We then compared the set of bids for a reviewer, with the list of suggestions, and measured the correspondence.

The following is the Discounted Cumulative Gain (DCG) of each list with respect to the bidding scores, averaged separately for each group. Note that each group was only presented with their corresponding list and not the others.

| | Group: CMT | Group: TMS | Group: CMT+TMS |
|---|---|---|---|
| Sorting by CMT scores | 6.11 out of 12.64 (48%) | 4.98 out of 13.63 (36%) | 4.87 out of 13.55 (35%) |
| Sorting by TMS score | 4.06 out of 12.64 (32%) | 6.43 out of 13.63 (47%) | 5.72 out of 13.55 (42%) |
| Sorting by TMS+CMT | 4.77 out of 12.64 (37%) | 6.11 out of 13.63 (44%) | 6.71 out of 13.55 (49%) |
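
For readers unfamiliar with the metric, here is a minimal sketch of how a "DCG out of best possible DCG" figure like those above could be computed for one reviewer. The post does not say which DCG variant was used, so the log-base-2 discount, the treatment of unbid papers as gain 0, and the function names are all assumptions made for illustration.

```python
import math


def dcg(gains):
    """Discounted cumulative gain of a ranked list of gains (rank 1 first).

    Assumed variant: gain at rank r is divided by log2(r + 1).
    """
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains, start=1))


def dcg_vs_ideal(ranked_papers, bids):
    """Return (DCG of the suggestion list, best achievable DCG) for one reviewer.

    ranked_papers: paper ids in the order the suggestion list showed them.
    bids: dict mapping paper id -> numeric bid score (hypothetical scale);
          papers the reviewer did not bid on are treated as gain 0.
    """
    gains = [bids.get(p, 0) for p in ranked_papers]
    # The ideal list puts the reviewer's highest bids at the top.
    ideal = sorted(bids.values(), reverse=True)[:len(ranked_papers)]
    return dcg(gains), dcg(ideal)


# Usage: one reviewer, a three-paper suggestion list, three bids.
actual, best = dcg_vs_ideal(["p1", "p2", "p3"], {"p2": 3, "p3": 1, "p9": 2})
print(f"{actual:.2f} out of {best:.2f} ({100 * actual / best:.0f}%)")
```

Averaging the two numbers over all reviewers in a group would give one cell of the table, under the assumptions stated above.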

A micro-survey was also run to collect further information on how users liked their short list. 85% of the participants indicated that they had used the list interface provided to them. The following is the preference indicated by each group (~75 reviewers in each group, ~2% error):

| | CMT | TMS | CMT+TMS |
|---|---|---|---|
| Preferred CMT over list | 15% | 12% | 8% |
| Preferred list+CMT | 81% | 83% | 83% |
| Preferred list over CMT | 4% | 5% | 9% |

It is obvious from the above that most participants found the list useful in conjunction with CMT (suggesting that the list should be integrated inside CMT). We can also see that those who were presented with a list based on TMS scores were more likely to find the list useful.

Note that all of the above was done in a long hectic but fun weekend.

Imputing Missing Bids
CMT assumes that the reviewers are not willing to review a paper unless stated otherwise. It does not differentiate between an unseen (but potentially relevant) paper and a paper that has been seen and ignored. This is a real shortcoming when it comes to matching papers to reviewers, especially for those reviewers who did not bid often. To mitigate this problem, we used the click information on the shortlist presented to the reviewers to find out which papers had been observed and ignored. We then imputed these cases as real non-willing bids.
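
The idea can be illustrated with a small sketch. The data layout (dictionaries keyed by reviewer and paper), the function name, and the use of 0 as the "not willing" value are hypothetical choices for this example, not a description of the scripts actually used.

```python
def impute_seen_but_ignored(bids, clicked, not_willing=0):
    """Add explicit 'not willing' bids for papers a reviewer viewed but did not bid on.

    bids: dict reviewer -> dict paper -> bid value (explicit bids only).
    clicked: dict reviewer -> set of papers the reviewer opened on the shortlist.
    Returns a new dict. Papers that were never viewed stay missing (unknown),
    which is the distinction CMT itself does not make.
    """
    imputed = {}
    for reviewer in set(bids) | set(clicked):
        row = dict(bids.get(reviewer, {}))
        for paper in clicked.get(reviewer, set()):
            # Keep an explicit bid if there is one; otherwise mark as seen and ignored.
            row.setdefault(paper, not_willing)
        imputed[reviewer] = row
    return imputed
```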

Around 30 reviewers did not provide any bids (and many had only a few). This is problematic because the tools used to do the actual reviewer-paper matching tend to assign the papers without any bids to the reviewers who did not bid, regardless of the match in expertise.

Once the bidding information was in and imputation was done, we had to fill in the rest of the paper-reviewer bidding matrix to mitigate the problem with sparse bidders. This was done, once again, through TMS, but this time using a supervised learning approach.

Using supervised learning was more delicate than expected. To deal with the wildly varying number of bids per person, we imputed zero bids, first from papers that were plausibly skipped over and, if necessary, at random from papers not bid on, such that each person had the same expected bid in the dataset. From this dataset, we held out a random bid per person and then trained to predict the held-out bid well. Most optimization approaches performed poorly due to the number of features greatly exceeding the number of labels. The best approach we found used the online algorithms in Vowpal Wabbit with a mass personalized training method similar to the one discussed here. This trained predictor was used to predict bid values for the full paper-reviewer bid matrix.
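
The data preparation step described above (not the Vowpal Wabbit training itself) might look roughly like the following sketch. Everything here is an assumption made for illustration: bids are taken to be positive numbers, `target_mean` is a positive value chosen by the caller, and the function names and data layout are hypothetical.

```python
import math
import random


def pad_with_zero_bids(bids, skipped, all_papers, target_mean, rng):
    """Add zero bids to one reviewer's row until its mean bid is at most target_mean.

    bids: dict paper -> positive bid value for this reviewer.
    skipped: papers plausibly seen and skipped; these are used first.
    all_papers: list of every paper id; the remainder is sampled at random.
    """
    row = dict(bids)
    # Smallest row size whose mean bid does not exceed target_mean.
    wanted = math.ceil(sum(row.values()) / target_mean)
    need = max(0, wanted - len(row))
    preferred = [p for p in skipped if p not in row]
    preferred_set = set(preferred)
    others = [p for p in all_papers if p not in row and p not in preferred_set]
    rng.shuffle(others)
    for paper in (preferred + others)[:need]:
        row[paper] = 0
    return row


def hold_out_one(row, rng):
    """Split a reviewer's row into training bids and one random held-out (paper, bid)."""
    paper = rng.choice(sorted(row))
    train = {p: b for p, b in row.items() if p != paper}
    return train, (paper, row[paper])


# Usage: a reviewer with two bids of 3, padded to a mean bid of 1.0.
rng = random.Random(0)
row = pad_with_zero_bids({"a": 3, "b": 3}, ["c", "d"], list("abcdefgh"), 1.0, rng)
train, held_out = hold_out_one(row, rng)
```

With every reviewer padded to the same mean, a predictor cannot score well on the held-out bids simply by learning who bids a lot.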

Automated Area Chair and First Reviewer Assignment
Once we had the imputed paper-reviewer bidding matrix, CMT was used to generate the actual match between papers and area chairs, and (separately) between papers and reviewers. Each paper had two area chairs (sometimes called “meta-reviewers” in CMT) assigned to it, one primary, one secondary, by running two rounds of assignments (so that the primary was usually the “better” match). One reviewer per paper was also assigned automatically by CMT in a similar fashion. CMT provides proper load balancing, so that all area chairs and reviewers had similar loads.

Manual Checks of the Automated Assignments
Before finalizing the automated assignment, we manually looked through the list of papers to fix any potential problems that were not handled by the automated process. The two major cases were papers that did not go through the TMS system (authors did not agree to do so), and cases of poor primary-secondary meta-reviewer pairs (when the two area chairs are judged to be too close to offer independent assessment, e.g. working at the same institution, previous supervisor-student relationship).

Second and Third Reviewer Assignment
Once the initial assignments were announced, we asked the two area chairs for a given paper to each manually assign another reviewer from the PC. To help area chairs with this, we generated a shortlist of 10 recommended reviewers for each paper (using the estimated bid matrix and TMS score, with the CMT matching algorithm for load balancing of reviewer suggestions). Area chairs were free to use this list, select from the complete program committee, or seek an outside reviewer, who was then added to the PC; this last option was used 80 times. The load for each reviewer was restricted to at most 7 papers, with exceptions when they explicitly agreed to more.
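
To make the shortlist constraints concrete, here is a simple greedy stand-in. CMT's actual matching algorithm is not described in this post, so this is not that algorithm; the function name, the `scores`/`loads`/`conflicts` inputs, and the tie-breaking rule are all hypothetical. Only the shortlist size of 10 and the load cap of 7 come from the text.

```python
def recommend_reviewers(scores, loads, conflicts, k=10, max_load=7):
    """Return up to k reviewers for one paper, best predicted match first.

    scores: dict reviewer -> predicted affinity for this paper (higher is better).
    loads: dict reviewer -> number of papers already assigned.
    conflicts: set of reviewers who have a conflict with this paper.
    Reviewers at or above max_load are left out of the suggestions.
    """
    eligible = [
        r for r in scores
        if r not in conflicts and loads.get(r, 0) < max_load
    ]
    # Sort by score, then prefer lightly loaded reviewers, then by name for stability.
    eligible.sort(key=lambda r: (-scores[r], loads.get(r, 0), r))
    return eligible[:k]
```

A greedy per-paper ranking like this ignores the interaction between papers, which is presumably why a global matching procedure was used for the real suggestions.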

The second and third uses of TMS, including the new supervised learning system, led to another long hectic weekend with Laurent, Mahdi, Joelle, and John all deeply involved.

Reviews
Most papers received at least 3 full reviews in the first round. Reviewers could not see each other’s reviews until they submitted their own. ML-Journaled submissions (see double submission guide) were reviewed only by two area chairs. In a small number of regular submissions (less than 10), we received 2 very negative reviews and notified the third reviewer (who was usually late by this point!) that we would not need their review.

Authors’ Response
Authors were given a chance to respond to the reviews during a short feedback period. This is becoming a standard practice in machine learning conferences. Authors were also allowed to upload a new version of the paper. The motivation here is that in some cases, it is easier to show the changes directly in the paper, rather than discuss them separately.

Our analysis shows that authors’ responses and subsequent discussions by reviewers made significant changes to the scoring of papers. A total of ~35% of the papers had some change in their scores after the author feedback. The average score for ~50% of the papers went down, stayed the same for ~10%, and went up for the other ~40%. The variance on the scores decreased by ~20%, indicating some convergence in the decisions.

Final Decisions
To help us better decide on the quality of the papers, we asked the primary area chairs to provide a meta-review for each of their papers. For papers without unanimous review decisions (i.e. some reviews wanted to accept and some wanted to reject), we asked the secondary area chair to (independently) fill in a meta-review, recommending whether to accept or reject the paper. A total of 1214 meta-reviews were provided. There were also 20 papers for which a 4th review was added in this period.

In all cases where the primary and secondary area chairs disagreed on the decision, the program chairs were directly involved, reviewing all the evidence (reviews, rebuttal, discussion, often the paper itself), and entering into a discussion (usually via email) with the area chairs, until a unanimous decision was achieved.

A total of 243 papers (27% of submissions) were accepted. Author notifications were sent out on April 30.

###### 10 Comments to “ICML: Behind the Scenes”
1. Anonymous says:

Why is it that, as a reviewer, I can’t see the final decisions on the papers I reviewed in CMT?

• Mahdi says:

You can now see them using the “View Paper Statuses and Reviewing Data for Papers Assigned to Me” link in the reviewer console. Thanks for pointing it out.

2. Anonymous says:

Must say, this is an excellent job!

3. Fei Sha says:

An excellent job! I especially like the updated manuscript approach in the author feedback period. I benefited from it both as a reviewer and as an author. I am looking forward to a fantastic ICML.

One thought: it is certainly great to learn that “The variance on the scores decreased by ~20%, indicating some convergence in the decisions.” Additionally, would it also be meaningful and more direct to compute the ratio of decisions that were reversed due to responses? This might seem to require ACs to make decisions prior to responses. However, it might be just as easy to use an automatic procedure: set a target acceptance ratio and thus derive an acceptance threshold (of the combined score) on scores pre- and post- responses… (the thresholds might be different).

• Mahdi says:

Thanks for the feedback. Regarding the comment on the relative change in decisions, I don’t think a meaningful comparison can be made without setting the threshold a priori and spending a considerable amount of time before the feedback period. This is not easy to do, given that the threshold is often a function of the acceptance rate and the current scoring. Specifically for this ICML, we were not even sure about the acceptance rate up until the very end. My general understanding is that the discussions helped increase the number of clear cut decisions. We had fewer papers with conflicting reviews and had less uncertainty in our decisions. An educated guess would suggest only a slight negative change in the noiseless assessment.

• I was extremely impressed by the ICML review process this year. John and Joelle did an *AMAZING* job.

That said, the section on “Authors’ Response” here seems a little misleading to me. Yes, there was a lot of discussion after the author feedback period, and yes, scores did tend to converge based on this discussion, but not all of this discussion was a result of the author feedback. In almost all cases I saw, the reviewers held off on starting any meaningful discussion with each other until after the feedback was collected, and not all of the discussion was related to the feedback. I saw very few cases where reviewers actually looked at the updated PDFs. So I think it is hard to tease apart how much decisions changed due to author feedback vs. how much decisions changed due to reviewers discussing the papers with each other (which would have happened with or without author feedback).

4. Anonymous says:

Great job!.. One thing I would like to know is, when will the list of accepted papers be out? I am working on a NIPS submission on a rather hot topic, and would like to know if more papers came out on a similar topic at ICML.

• jl says:

We’re working on it. The set of all titles is here:

Robust PCA in High-dimension: A Deterministic Approach
Complexity Analysis of the Lasso Regularization Path
Projection-free Online Learning
Machine Learning that Matters
Scene parsing with Multiscale Feature Learning
Linear Regression with Limited Observation
An Online Boosting Algorithm with Theoretical Justifications
Modeling Temporal Dependencies in High-Dimensional Sequences: Application to Polyphonic Music Generation and Transcription
Approximate Modified Policy Iteration
Nonparametric Link Prediction in Dynamic Networks
Agnostic System Identification for Model-Based Reinforcement Learning
Stochastic Smoothing for Nonsmooth Minimizations: Accelerating SGD by Exploiting Structure
A Convex Feature Learning Formulation for Latent Task Structure Discovery
Efficient Decomposed Learning for Structured Prediction
Path Integral Policy Improvement with Covariance Matrix Adaptation
Optimizing F-measure: A Tale of Two Approaches
Efficient Euclidean Projections onto the Intersection of Norm Balls
To Average or Not to Average? Making Stochastic Gradient Descent Optimal for Strongly Convex Problems
Clustering using Max-norm Constrained Optimization
Submodular Inference of Diffusion Networks from Multiple Trees
Approximate Dynamic Programming By Minimizing Distributionally Robust Bounds
Modelling transition dynamics in MDPs with RKHS embeddings
Approximate Principal Direction Trees
Unachievable Region in Precision-Recall Space and Its Effect on Empirical Evaluation
Randomized Smoothing for (Parallel) Stochastic Optimization
Marginalized Denoising Autoencoders for Domain Adaptation
Copula-based Kernel Dependency Measures
Policy Gradients with Variance Related Risk Criteria
Efficient Structured Prediction with Latent Variables for General Graphical Models
On the Partition Function and Random Maximum A-Posteriori Perturbations
Infinite Tucker Decomposition: Nonparametric Bayesian Models for Multiway Data Analysis
Inferring Latent Structure From Mixed Real and Categorical Relational Data
Bayesian Cointegration
Online Alternating Direction Method
On the Sample Complexity of Reinforcement Learning with a Generative Model
Which Statistical Estimators Have Differentially Private Approximations?
High Dimensional Semiparametric Gaussian Copula Graphical Models
Discovering Support and Affiliated Features from Very High Dimensions
Online Bandit Learning against an Adaptive Adversary: from Regret to Policy Regret
Statistical linear estimation with penalized estimators: an application to reinforcement learning
Large Scale Variational Bayesian Inference for Structured Scale Mixture Models
Bayesian Posterior Sampling via Stochastic Gradient Fisher Scoring
An adaptive algorithm for finite stochastic partial monitoring
Bootstrapping Big Data
Predicting Consumer Behavior in Commerce Search
Conditional mean embeddings as regressors
A Generative Process for Contractive Auto-Encoders
Local Loss Optimization in Operator Models: A New Insight into Spectral Learning
Conditional Likelihood Maximization: A Unifying Framework for Information Theoretic Feature Selection
Communications Inspired Linear Discriminant Analysis
Sparse stochastic inference for latent Dirichlet allocation
Quasi-Newton Methods: A New Direction
No-Regret Learning in Extensive-Form Games with Imperfect Recall
Semi-supervised Metric Learning Paradigm with Hyper-Sparsity
Fast approximation of matrix coherence and statistical leverage
Predicting Manhole Events in New York City
Artist Agent: A Reinforcement Learning Approach to Automatic Stroke Generation in Oriental Ink Painting
On multi-view feature learning
Fast Bounded Online Gradient Descent Algorithms for Scalable Kernel-Based Online Learning
A Hybrid Algorithm for Convex Semidefinite Optimization
A Complete Analysis of the l_1
Convergence Rates of Biased Stochastic Optimization for Learning Sparse Ising Models
Decoupling Exploration and Exploitation in Multi-Armed Bandits
Learning to Identify Regular Expressions that Describe Email Campaigns
Deep Mixtures of Factor Analysers
Revisiting k-means: New Algorithms via Bayesian Nonparametrics
Gaussian Process Regression Networks
Analysis of Kernel Mean Matching under Covariate Shift
Tighter Variational Representations of f-Divergences via Restriction to Probability Measures
Training Restricted Boltzmann Machines on Word Observations
Bayesian Watermark Attacks
Max-Margin Nonparametric Latent Feature Models for Link Prediction
Discriminative Probabilistic Prototype Learning
Sparse-GEV: Sparse Latent Space Model for Multivariate Extreme Value Time Serie Modeling
Factorized Asymptotic Bayesian Hidden Markov Models
PAC-Bayesian Generalization Bound on Confusion Matrix for Multi-Class Classification
Semi-Supervised Learning of Class Balance under Class-Prior Change by Distribution Matching
A Proximal-Gradient Homotopy Method for the L1-Regularized Least-Squares Problem
Comparison-Based Learning with Rank Nets
Semi-Supervised Collective Classification via Hybrid Label Regularization
Shortest path distance in k-nearest neighbor graphs
A Split-Merge Framework for Comparing Clusterings
Compositional Planning Using Optimal Option Models
Information-Theoretical Learning of Discriminative Clusters for Unsupervised Domain Adaptation
A General Framework for Inferring Latent Task Structures
An Efficient Approach to Sparse Linear Discriminant Analysis
The Greedy Miser: Learning under Test-time Budgets
Canonical Trends: Detecting Trend Setters in Web Data
A convex relaxation for weakly supervised classifiers
Demand-Driven Clustering in Relational Domains for Predicting Adverse Drug Events
A Binary Classification Framework for Two-Stage Multiple Kernel Learning
Learning Local Transformation Invariance with Restricted Boltzmann Machines
Consistent Multilabel Ranking through Univariate Losses
On the Equivalence between Herding and Conditional Gradient Algorithms
Variational Bayesian Inference with Stochastic Search
Small-sample brain mapping: sparse recovery on spatially correlated designs with randomization and clustering
Joint Optimization and Variable Selection of High-dimensional Gaussian Processes
Large-Scale Feature Learning With Spike-and-Slab Sparse Coding
Anytime Marginal MAP Inference
Distributed Parameter Estimation via Pseudo-likelihood
The Nonparametric Metadata Dependent Relational Model
Deep Lambertian Networks
Convergence of the EM Algorithm for Gaussian Mixtures with Unbalanced Mixing Coefficients
A Joint Model of Language and Perception for Grounded Attribute Learning
Learning Parameterized Skills
Safe Exploration in Markov Decision Processes
Improved Estimation in Time Varying Models
Poisoning Attacks against Support Vector Machines
Regularizers versus Losses for Nonlinear Dimensionality Reduction: A Factored View with New Convex Relaxations
Utilizing Static Analysis and Code Generation to Accelerate Neural Networks
Similarity Learning for Provably Accurate Sparse Linear Classification
Variational Inference in Non-negative Factorial Hidden Markov Models for Efficient Audio Source Separation
Fast Training of Nonlinear Embedding Algorithms
Fast Prediction of New Feature Utility
Robust Classification with Adiabatic Quantum Optimization
Agglomerative Bregman Clustering
Isoelastic Agents and Wealth Updates in Machine Learning Markets
Compact Hyperplane Hashing with Bilinear Functions
Continuous Inverse Optimal Control with Locally Optimal Examples
A Hierarchical Dirichlet Process Model with Multiple Levels of Clustering for Human EEG Seizure Modeling
Levy Measure Decompositions for the Beta and Gamma Processes
Building high-level features using large scale unsupervised learning
Near-Optimal BRL using Optimistic Local Transitions
A Unified Robust Classification Model
Manifold Relevance Determination
Residual Components Analysis
Clustering to Maximize the Ratio of Split to Diameter
A Graphical Model Formulation of Collaborative Filtering Neighbourhood Methods with Fast Maximum Entropy Training
On-Line Portfolio Selection with Moving Average Reversion
Improved Information Gain Estimates for Decision Tree Induction
Influence Maximization in Continuous Time Diffusion Networks
On the Size of the Online Kernel Sparsification Dictionary
Multi-level Lasso for Sparse Multi-task Regression
Fast Computation of Subpath Kernel for Trees
Total Variation and Euler's Elastica for Supervised Learning
Learning the Dependence Graph of Time Series with Latent Factors
A Generalized Loop Correction Method for Approximate Inference in Graphical Models
Infinite-Word Topic Models
Consistent Covariance Selection From Data With Missing Values
Is margin preserved after random projection?
A Bayesian Approach to Approximate Joint Diagonalization of Square Matrices
Predicting accurate probabilities with a ranking loss
Learning with Augmented Features for Heterogeneous Domain Adaptation
Dirichlet Process with Mixed Random Measures: A Nonparametric Topic Model for Labeled Data
Evaluating Bayesian and L1 Approaches for Sparse Unsupervised Learning
Collaborative Topic Regression with Social Matrix Factorization for Recommendation Systems
LPQP for MAP: Putting LP Solvers to Better Use
Clustering by Low-Rank Doubly Stochastic Matrix Decomposition
Dependent Hierarchical Normalized Random Measures for Dynamic Topic Modeling
State-Space Inference for Non-Linear Latent Force Models with Application to Satellite Orbit Prediction
Sparse Additive Functional and Kernel CCA
The Kernelized Stochastic Batch Perceptron
Fast classification using sparse decision DAGs
A Combinatorial Algebraic Approach for the Identifiability of Low-Rank Matrix Completion
Rethinking Collapsed Variational Bayes Inference for LDA
Exact Maximum Margin Structure Learning of Bayesian Networks
AOSO-LogitBoost: Adaptive One-Vs-One LogitBoost for Multi-Class Problem
Hypothesis testing using pairwise distances and associated kernels
Monte Carlo Bayesian Reinforcement Learning
A Topic Model for Melodic Sequences
Output Space Search for Structured Prediction
How To Grade a Test Without Knowing the Answers --- A Bayesian Graphical Model for Adaptive Crowdsourcing and Aptitude Testing
Parallelizing Exploration-Exploitation Tradeoffs with Gaussian Process Bandit Optimization
A Simple Algorithm for Semi-supervised Learning with Improved Generalization Error Bound
Bayesian Optimal Active Search and Surveying
Incorporating Domain Knowledge in Matching Problems via Harmonic Analysis
Lognormal and Gamma Mixed Negative Binomial Regression
Greedy Algorithms for Sparse Reinforcement Learning
Variance Function Estimation in High-dimensions
Conditional Sparse Coding and Grouped Multivariate Regression
Apprenticeship Learning for Model Parameters of Partially Observable Environments
Modeling Images using Transformed Indian Buffet Processes
Learning Object Arrangements in 3D Scenes using Human Context
Cross Language Text Classification via Subspace Co-regularized Multi-view Learning
Plug-in martingales for testing exchangeability on-line
An Iterative Locally Linear Embedding Algorithm
Gaussian Process Quantile Regression using Expectation Propagation
Latent Multi-group Membership Graph Model
Exponential Regret Bounds for Gaussian Process Bandits with Deterministic Observations
Estimating the Hessian by Back-propagating Curvature
Feature Selection via Probabilistic Outputs
Hierarchical Exploration for Accelerating Contextual Bandits
Latent Collaborative Retrieval
Lightning Does Not Strike Twice: Robust MDPs with Coupled Uncertainty
On causal and anticausal learning
Bayesian Efficient Multiple Kernel Learning
Bayesian Nonexhaustive Learning for Online Discovery and Modeling of Emerging Classes
Exact Soft Confidence-Weighted Learning
Distributed Tree Kernels
Multiple Kernel Learning from Noisy Labels by Stochastic Programming
Improved Nystrom Low-rank Decomposition with Priors
Active Learning for Matching Problems
Ensemble Methods for Convex Regression with Applications to Geometric Programming Based Circuit Design
Groupwise Constrained Reconstruction for Subspace Clustering
Stability of matrix factorization for collaborative filtering
Linear Off-Policy Actor-Critic
Modeling Latent Variable Uncertainty for Loss-based Learning
Dimensionality Reduction by Local Discriminative Gaussians
Learning to Label Aerial Images from Noisy Data
The Most Persistent Soft-Clique in a Set of Sampled Graphs
Learning Efficient Structured Sparse Models
PAC Subset Selection in Stochastic Multi-armed Bandits
Nonparametric variational inference
The Convexity and Design of Composite Multiclass Losses
Finding Botnets Using Minimal Graph Clusterings
Learning the Experts for Online Sequence Prediction
Efficient Active Algorithms for Hierarchical Clustering
Copula Mixture Model for Dependency-seeking Clustering
The Landmark Selection Method for Multiple Output Prediction
Subgraph Matching Kernels for Attributed Graphs
Adaptive Canonical Correlation Analysis Based On Matrix Manifolds
Batch Active Learning via Coordinated Matching
Hybrid Batch Bayesian Optimization
Efficient and Practical Stochastic Subgradient Descent for Nuclear Norm Regularization
Gap Filling in the Plant Kingdom---Trait Prediction Using Hierarchical Probabilistic Matrix Factorization
Sparse Support Vector Infinite Push
A Dantzig Selector Approach to Temporal Difference Learning
Scaling Up Coordinate Descent Algorithms for Large $\ell_1$ Regularization Problems
Cross-Domain Multitask Learning with Latent Probit Models
Structured Learning from Partial Annotations
Maximum Margin Output Coding
Sequential Nonparametric Regression
An Infinite Latent Attribute Model for Network Data
On Local Regret
Smoothness and Structure Learning by Proxy
A fast and simple algorithm for training neural probabilistic language models
Incorporating Causal Prior Knowledge as Path-Constraints in Bayesian Networks and Maximal Ancestral Graphs
High-Dimensional Covariance Decomposition into Sparse Markov and Independence Domains
Minimizing The Misclassification Error Rate Using a Surrogate Convex Loss
Bounded Planning in Passive POMDPs
Capturing topical content with frequency and exclusivity
TrueLabel + Confusions: A Spectrum of Probabilistic Models in Analyzing Multiple Ratings
Robust Multiple Manifold Structure Learning
Two Manifold Problems with Applications to Nonlinear System Identification
On the Difficulty of Nearest Neighbor Search
Learning Force Control Policies for Compliant Robotic Manipulation
Estimation of Simultaneously Sparse and Low Rank Matrices
Online Structured Prediction via Coactive Learning
Using CCA to improve CCA: A new spectral method for estimating vector models of words
Integer Optimization Methods for Supervised Ranking
Conversational Speech Transcription Using Context-Dependent Deep Neural Networks
Data-driven Web Design
Learning the Central Events and Participants in Unlabeled Text
Exemplar-SVMs for Visual Object Detection

5. Gavin Brown says:

I’d like to reiterate the thanks for the incredible openness of this year’s review process. Thank you.
I wonder if it’s possible to release more detailed acceptance stats – specifically I wonder about the not-for-proceedings papers, those that were previously ML-journaled. I assume these were assessed in a separate pool to the for-proceedings papers, if so what was the acceptance rate in this pool?

• Joelle Pineau says:

For the not-for-proceedings papers, we received 7 submissions and 4 were accepted. Only 2 of the submissions were already accepted at ML journals; they received brief reviews mainly to assess interest. The other 5 underwent the usual ICML reviewing process (either because they had not been accepted yet, or they had been accepted at journals other than MLJ and JMLR).

For the AIStats resubmissions, we received 4 submissions and 3 were accepted. All underwent the usual ICML reviewing process, with the slight difference that 1-2 reviewers had previously reviewed the paper for AIStats.

John should release stats about the acceptance rate per subject area shortly.
