ICML: Behind the Scenes

This is a rather long post, detailing the ICML 2012 review process. The goal is to make the process more transparent, help authors understand how we came to a decision, and discuss the strengths and weaknesses of this process for future conference organizers.

Microsoft’s Conference Management Toolkit (CMT)
We chose to use CMT over other conference management software mainly because of its rich toolkit. The interface is sub-optimal (to say the least!), but it has extensive capabilities (to handle bids, author response, resubmissions, etc.), good import/export mechanisms (to process the data elsewhere), and excellent technical support (to answer late-night emails and add new functionality). Overall, it was the right choice, although we hope a designer will look at that interface sometime soon!

Toronto Matching System (TMS)
TMS is now being used by many major conferences in our field (including NIPS and UAI). It is an automated system (developed by Laurent Charlin and Rich Zemel at U. Toronto) to match reviewers to papers, based on an analysis of each reviewer’s publications. TMS collects publications from reviewers, parses them into features, and applies unsupervised or supervised learning techniques to predict the relevance of any target paper for any reviewer. We convinced the TMS team to integrate with CMT and funded Laurent’s work on that. Reviewers were asked to put in a publication list for TMS to parse. For those who failed to do so (after many reminders!), we manually added that information from public sources.

The Program Committee
Recruiting a program committee that is both large and highly qualified is difficult these days. We sent out 69 area chair invitations; 50 (highly qualified!) people accepted. Each of these area chairs was asked to nominate a list of potential reviewers. We sent out approximately 700 invitations for program committee members; 389 accepted. A number of additional PC members were recruited during the review process (most of them for 1-2 papers), for a total of 470 active PC members. In terms of seniority, the final PC contained about 15% students, 80% researchers, and 5% other.

The Surge (ICML + 50%)
The first big challenge came on the submission deadline. In the past few years, ICML had consistently received ~550-600 submissions. This year, we had a 50% increase, to 890 submissions. We had recruited a PC that could comfortably handle 700 papers. Dealing with an extra 200 papers was not an easy task.

About 10 submissions were rejected without review for various reasons (severe formatting issues, extra pages, non-anonymization).

An unsupervised version of TMS was used to generate a list of candidate papers for each reviewer and area chair. This was done working closely with Laurent Charlin of TMS, using validation on previous NIPS data. CMT did not have the functionality to show a good list of candidate papers to reviewers, so we crafted an interface to show this list and let reviewers use it in conjunction with CMT. Ideally, this will be better incorporated in CMT in the future.

When you ask a group of scientists to run a conference, you must expect a few experiments will take place… And so we decided to assess the usefulness of TMS scoring for generating lists of papers to bid on. To do this, we (randomly) assigned PC members to 1 of 3 groups. One group saw a list purely based on TMS scores. Another group received a list based on the matching between their subject area and that of the paper (referred to as the “relevance” score in CMT). The third group received a list based on a mix of both TMS and relevance. Reviewers were allowed to bid on any paper (excluding those with which they had a conflict); the lists were provided to help them efficiently sort through the large number of papers. We then compared each reviewer’s set of bids with the list of suggestions and measured the correspondence.

The following is the Discounted Cumulative Gain (DCG) of each list with respect to the bidding scores, averaged separately for each group. Note that each group was only presented with their corresponding list and not the others.

                         Group: CMT               Group: TMS               Group: CMT+TMS
Sorting by CMT scores    6.11 out of 12.64 (48%)  4.98 out of 13.63 (36%)  4.87 out of 13.55 (35%)
Sorting by TMS score     4.06 out of 12.64 (32%)  6.43 out of 13.63 (47%)  5.72 out of 13.55 (42%)
Sorting by TMS+CMT       4.77 out of 12.64 (37%)  6.11 out of 13.63 (44%)  6.71 out of 13.55 (49%)
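
For readers unfamiliar with DCG, here is a minimal sketch of how such a number could be computed for one reviewer. It assumes the common log2 position discount and reads the "out of" figure as the DCG of an ideally ordered list; the post does not state either detail, and the bid values and paper ids below are made up for illustration.

```python
import math

def dcg(gains):
    """Discounted cumulative gain of gains given in ranked order (log2 discount)."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains, start=1))

def dcg_and_ideal(ranked_papers, bids):
    """Return (DCG of the suggested order, DCG of the best possible order).

    ranked_papers: paper ids in the order the list showed them (hypothetical).
    bids: paper id -> numeric bid; papers without a bid count as 0.
    """
    gains = [bids.get(paper, 0) for paper in ranked_papers]
    return dcg(gains), dcg(sorted(gains, reverse=True))

# Toy data: the list put an unwanted paper first.
bids = {"p1": 3, "p3": 2, "p4": 1}
ranked = ["p2", "p1", "p3", "p4"]
achieved, ideal = dcg_and_ideal(ranked, bids)
print(f"{achieved:.2f} out of {ideal:.2f} ({achieved / ideal:.0%})")
```

Averaging the achieved and ideal values over the reviewers in a group would give one cell of a table like the one above.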

A micro-survey was also run to collect further information on how users liked their short list. 85% of the participants indicated that they had used the list interface provided to them. The following is the preference indicated by each group (~75 reviewers in each group, ~2% error):

Preferred CMT over list    15%    12%     8%
Preferred list+CMT         81%    83%    83%
Preferred list over CMT     4%     5%     9%

It is obvious from the above that most participants found the list useful in conjunction with CMT (suggesting that the list should be integrated inside CMT). We can also see that those who were presented with a list based on TMS scores were more likely to find the list useful.

Note that all of the above was done over a long, hectic, but fun weekend.

Imputing Missing Bids
CMT assumes that reviewers are not willing to review a paper unless stated otherwise. It does not differentiate between an unseen (but potentially relevant) paper and a paper that has been seen and ignored. This is a real shortcoming when it comes to matching papers to reviewers, especially for those reviewers who did not bid often. To mitigate this problem, we used the click information on the shortlist presented to the reviewers to find out which papers had been observed and ignored. We then imputed these cases as real non-willing bids.
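
As an illustration only, the idea of turning click data into explicit negative bids might look like the sketch below. The rule used here (a paper counts as "seen" if it sits at or above the lowest list position the reviewer clicked or bid on) is an assumption, as are the function name and the `NOT_WILLING` encoding; the post does not spell out the actual rule.

```python
NOT_WILLING = 0  # hypothetical encoding of an explicit "not willing" bid

def impute_seen_but_ignored(shortlist, clicked, bids):
    """Return a copy of bids with NOT_WILLING added for papers that were
    plausibly seen (ranked at or above the deepest interaction) but not bid on.

    shortlist: paper ids in the order shown to the reviewer.
    clicked:   set of paper ids the reviewer clicked on.
    bids:      paper id -> bid value actually entered.
    """
    touched = [i for i, p in enumerate(shortlist) if p in clicked or p in bids]
    if not touched:
        return dict(bids)  # no evidence the list was read, so impute nothing
    imputed = dict(bids)
    for paper in shortlist[: max(touched) + 1]:
        imputed.setdefault(paper, NOT_WILLING)  # keep real bids untouched
    return imputed

shortlist = ["p1", "p2", "p3", "p4", "p5"]
result = impute_seen_but_ignored(shortlist, clicked={"p3"}, bids={"p1": 3})
print(result)
```

Papers below the deepest interaction (`p4` and `p5` here) stay unlabeled, which keeps "never seen" distinct from "seen and ignored".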

Around 30 reviewers did not provide any bids (and many had only a few). This is problematic because the tools used to do the actual reviewer-paper matching tend to assign the papers without any bids to the reviewers who did not bid, regardless of the match in expertise.

Once the bidding information was in and imputation was done, we had to fill in the rest of the paper-reviewer bidding matrix to mitigate the problem with sparse bidders. This was done, once again, through TMS, but this time using a supervised learning approach.

Using supervised learning was more delicate than expected. To deal with the wildly varying number of bids per person, we imputed zero bids, first from papers that were plausibly skipped over, and, if necessary, at random from papers not bid on, such that each person had the same expected bid in the dataset. From this dataset, we held out a random bid per person, and then trained to predict the held-out bid well. Most optimization approaches performed poorly because the number of features greatly exceeded the number of labels. The best approach we found used the online algorithms in Vowpal Wabbit with a mass personalized training method similar to the one discussed here. This trained predictor was used to predict bid values for the full paper-reviewer bid matrix.
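
The data preparation described above (pad with zero bids, then hold out one bid per person) can be sketched roughly as follows. This is not the code that was used and it does not involve Vowpal Wabbit; the function names, the stopping rule based on a target mean bid, and the toy data are all assumptions made for illustration.

```python
import random

def pad_with_zero_bids(bids, skipped, others, target_mean, rng):
    """Add zero bids until the reviewer's mean bid drops to target_mean.

    Papers in `skipped` (plausibly seen and passed over) are used first,
    then papers from `others` in random order.
    """
    padded = dict(bids)
    pool = [p for p in skipped if p not in padded]
    rest = [p for p in others if p not in padded and p not in pool]
    rng.shuffle(rest)
    total = sum(padded.values())  # zero bids never change the total
    for paper in pool + rest:
        if not padded or total / len(padded) <= target_mean:
            break
        padded[paper] = 0
    return padded

def hold_out_one_bid(bids_by_reviewer, rng):
    """Split each reviewer's bids into training bids and one held-out bid."""
    train, heldout = {}, {}
    for reviewer, bids in bids_by_reviewer.items():
        if not bids:
            train[reviewer] = {}
            continue
        pick = rng.choice(sorted(bids))
        heldout[reviewer] = (pick, bids[pick])
        train[reviewer] = {p: b for p, b in bids.items() if p != pick}
    return train, heldout

rng = random.Random(0)
padded = pad_with_zero_bids({"a": 3, "b": 2}, skipped=["c"],
                            others=["d", "e", "f"], target_mean=1.0, rng=rng)
train, heldout = hold_out_one_bid({"r1": padded}, rng)
```

A predictor would then be trained on `train` and scored on how well it recovers the bids in `heldout`.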

Automated Area Chair and First Reviewer Assignment
Once we had the imputed paper-reviewer bidding matrix, CMT was used to generate the actual match between papers and area chairs, and (separately) between papers and reviewers. Each paper had two area chairs (sometimes called “meta-reviewers” in CMT) assigned to it, one primary, one secondary, by running two rounds of assignments (so that the primary was usually the “better” match). One reviewer per paper was also assigned automatically by CMT in a similar fashion. CMT provides proper load balancing, so that all area chairs and reviewers had similar loads.
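
CMT's matching algorithm is not described here, so the following is only a toy stand-in: a greedy heuristic that runs two rounds (primary, then secondary) over a predicted bid matrix while capping each person's load. The names `assign_round`, `scores`, and `max_load`, and the greedy rule itself, are assumptions for illustration.

```python
def assign_round(scores, max_load, load, taken):
    """Give each paper its best-scoring person who still has capacity.

    scores:   paper -> {person: predicted bid}
    max_load: maximum number of papers per person across all rounds
    load:     person -> papers assigned so far (updated in place)
    taken:    paper -> set of people already on that paper (updated in place)
    """
    assignment = {}
    for paper in sorted(scores):
        candidates = [
            (score, person)
            for person, score in scores[paper].items()
            if load.get(person, 0) < max_load
            and person not in taken.get(paper, set())
        ]
        if not candidates:
            continue  # left for manual assignment
        _, best = max(candidates)
        assignment[paper] = best
        load[best] = load.get(best, 0) + 1
        taken.setdefault(paper, set()).add(best)
    return assignment

scores = {
    "p1": {"ac1": 0.9, "ac2": 0.4},
    "p2": {"ac1": 0.8, "ac2": 0.7},
}
load, taken = {}, {}
primary = assign_round(scores, max_load=2, load=load, taken=taken)
secondary = assign_round(scores, max_load=2, load=load, taken=taken)
print(primary, secondary, load)
```

Because the first round claims the strongest matches, the primary is usually the better match, which is the effect the two-round procedure above relies on. A real matcher would typically optimize all papers jointly rather than greedily.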

Manual Checks of the Automated Assignments
Before finalizing the automated assignment, we manually looked through the list of papers to fix any potential problems that were not handled by the automated process. The two major cases were papers that did not go through the TMS system (authors did not agree to do so), and cases of poor primary-secondary meta-reviewer pairs (when the two area chairs are judged to be too close to offer independent assessment, e.g. working at the same institution, previous supervisor-student relationship).

Second and Third Reviewer Assignment
Once the initial assignments were announced, we asked the two area chairs for a given paper to each manually assign another reviewer from the PC. To help area chairs with this, we generated a shortlist of 10 recommended reviewers for each paper (using the estimated bid matrix and TMS score, with the CMT matching algorithm for load balancing of reviewer suggestions). Area chairs were free to use this list, select from the complete program committee, or seek an outside reviewer, who was then added to the PC; this last option was used 80 times. The load for each reviewer was restricted to at most 7 papers, with exceptions when they explicitly agreed to more.

The second and third uses of TMS, including the new supervised learning system, led to another long hectic weekend with Laurent, Mahdi, Joelle, and John all deeply involved.

Most papers received at least 3 full reviews in the first round. Reviewers could not see each other’s reviews until they submitted their own. ML-Journaled submissions (see double submission guide) were reviewed only by two area chairs. For a small number of regular submissions (fewer than 10), we received 2 very negative reviews and notified the third reviewer (who was usually late by this point!) that we would not need their review.

Authors’ Response
Authors were given a chance to respond to the reviews during a short feedback period. This is becoming a standard practice in machine learning conferences. Authors were also allowed to upload a new version of the paper. The motivation here is that in some cases, it is easier to show the changes directly in the paper, rather than discuss them separately.

Our analysis shows that authors’ responses and subsequent discussions by reviewers made significant changes to the scoring of papers. A total of ~35% of the papers had some change in their scores after the author feedback. The average score for ~50% of the papers went down, stayed the same for ~10%, and went up for the other ~40%. The variance on the scores decreased by ~20%, indicating some convergence in the decisions.

Final Decisions
To help us better decide on the quality of the papers, we asked the primary area chairs to provide a meta-review for each of their papers. For papers without unanimous review decisions (i.e. some reviews wanted to accept and some wanted to reject), we asked the secondary area chair to (independently) fill in a meta-review, recommending whether to accept or reject the paper. A total of 1214 meta-reviews were provided. There were also 20 papers for which a 4th review was added in this period.

In all cases where the primary and secondary area chairs disagreed on the decision, the program chairs were directly involved, reviewing all the evidence (reviews, rebuttal, discussion, often the paper itself), and entering into a discussion (usually via email) with the area chairs, until a unanimous decision was achieved.

A total of 243 papers (27% of submissions) were accepted. Author notifications were sent out on April 30.

10 Replies to “ICML: Behind the Scenes”

  1. Why is it that, as a reviewer, I can’t see the final decisions on the papers I reviewed in CMT?

    1. You can now see them using the “View Paper Statuses and Reviewing Data for Papers Assigned to Me” link in the reviewer console. Thanks for pointing it out.

  2. An excellent job! I especially like the updated manuscript approach in the author feedback period. I benefited from it both as a reviewer and as an author. I am looking forward to a fantastic ICML.

    One thought: it is certainly great to learn that “The variance on the scores decreased by ~20%, indicating some convergence in the decisions.” Would it also be meaningful, and more direct, to compute the fraction of decisions that were reversed due to responses? This might seem to require ACs to make decisions prior to responses. However, it might be just as easy to use an automatic procedure: set a target acceptance ratio and thus derive an acceptance threshold (on the combined score) from the scores pre- and post-response… (the thresholds might be different).

    1. Thanks for the feedback. Regarding the comment on the relative change in decisions, I don’t think a meaningful comparison can be made without setting the threshold a priori and spending a considerable amount of time before the feedback period. This is not easy to do, given that the threshold is often a function of the acceptance rate and the current scoring. Specifically for this ICML, we were not even sure about the acceptance rate up until the very end. My general understanding is that the discussions helped increase the number of clear-cut decisions. We had fewer papers with conflicting reviews and had less uncertainty in our decisions. An educated guess would suggest only a slight negative change in the noiseless assessment.

      1. I was extremely impressed by the ICML review process this year. John and Joelle did an *AMAZING* job.

        That said, the section on “Authors’ Response” here seems a little misleading to me. Yes, there was a lot of discussion after the author feedback period, and yes, scores did tend to converge based on this discussion, but not all of this discussion was a result of the author feedback. In almost all cases I saw, the reviewers held off on starting any meaningful discussion with each other until after the feedback was collected, and not all of the discussion was related to the feedback. I saw very few cases where reviewers actually looked at the updated PDFs. So I think it is hard to tease apart how much decisions changed due to author feedback vs. how much decisions changed due to reviewers discussing the papers with each other (which would have happened with or without author feedback).

  3. Great job! One thing I would like to know is, when will the list of accepted papers be out? I am working on a NIPS submission on a rather hot topic, and would like to know if more papers came out on a similar topic at ICML.

    1. We’re working on it. The set of all titles is here:

      Robust PCA in High-dimension: A Deterministic Approach
      Complexity Analysis of the Lasso Regularization Path
      Projection-free Online Learning
      Machine Learning that Matters
      Scene parsing with Multiscale Feature Learning
      Linear Regression with Limited Observation
      An Online Boosting Algorithm with Theoretical Justifications
      Modeling Temporal Dependencies in High-Dimensional Sequences: Application to Polyphonic Music Generation and Transcription
      Approximate Modified Policy Iteration
      Nonparametric Link Prediction in Dynamic Networks
      Agnostic System Identification for Model-Based Reinforcement Learning
      Stochastic Smoothing for Nonsmooth Minimizations: Accelerating SGD by Exploiting Structure
      A Convex Feature Learning Formulation for Latent Task Structure Discovery
      Efficient Decomposed Learning for Structured Prediction
      Path Integral Policy Improvement with Covariance Matrix Adaptation
      Optimizing F-measure: A Tale of Two Approaches
      Efficient Euclidean Projections onto the Intersection of Norm Balls
      To Average or Not to Average? Making Stochastic Gradient Descent Optimal for Strongly Convex Problems
      Clustering using Max-norm Constrained Optimization
      Submodular Inference of Diffusion Networks from Multiple Trees
      Approximate Dynamic Programming By Minimizing Distributionally Robust Bounds
      Modelling transition dynamics in MDPs with RKHS embeddings
      Approximate Principal Direction Trees
      Unachievable Region in Precision-Recall Space and Its Effect on Empirical Evaluation
      Randomized Smoothing for (Parallel) Stochastic Optimization
      Marginalized Denoising Autoencoders for Domain Adaptation
      Copula-based Kernel Dependency Measures
      Group Sparse Additive Models
      Policy Gradients with Variance Related Risk Criteria
      Efficient Structured Prediction with Latent Variables for General Graphical Models
      On the Partition Function and Random Maximum A-Posteriori Perturbations
      Infinite Tucker Decomposition: Nonparametric Bayesian Models for Multiway Data Analysis
      Inferring Latent Structure From Mixed Real and Categorical Relational Data
      Bayesian Cointegration
      Online Alternating Direction Method
      On the Sample Complexity of Reinforcement Learning with a Generative Model 
      Which Statistical Estimators Have Differentially Private Approximations?
      Learning Task Grouping and Overlap in Multi-task Learning
      High Dimensional Semiparametric Gaussian Copula Graphical Models
      Discovering Support and Affiliated Features from Very High Dimensions
      Online Bandit Learning against an Adaptive Adversary: from Regret to Policy Regret
      Statistical linear estimation with penalized estimators: an application to reinforcement learning
      Large Scale Variational Bayesian Inference for Structured Scale Mixture Models
      Bayesian Posterior Sampling via Stochastic Gradient Fisher Scoring
      An adaptive algorithm for finite stochastic partial monitoring
      Bootstrapping Big Data
      Predicting Consumer Behavior in Commerce Search
      Conditional mean embeddings as regressors
      A Generative Process for Contractive Auto-Encoders
      Local Loss Optimization in Operator Models: A New Insight into Spectral Learning
      Conditional Likelihood Maximization: A Unifying Framework for Information Theoretic Feature Selection
      Communications Inspired Linear Discriminant Analysis
      Sparse stochastic inference for latent Dirichlet allocation
      Quasi-Newton Methods: A New Direction
      No-Regret Learning in Extensive-Form Games with Imperfect Recall
      Semi-supervised Metric Learning Paradigm with Hyper-Sparsity
      Fast approximation of matrix coherence and statistical leverage
      Predicting Manhole Events in New York City
      Artist Agent: A Reinforcement Learning Approach to Automatic Stroke Generation in Oriental Ink Painting
      On multi-view feature learning
      Fast Bounded Online Gradient Descent Algorithms for Scalable Kernel-Based Online Learning
      A Hybrid Algorithm for Convex Semidefinite Optimization
      A Complete Analysis of the l_1
      Convergence Rates of Biased Stochastic Optimization for Learning Sparse Ising Models
      Decoupling Exploration and Exploitation in Multi-Armed Bandits
      Learning to Identify Regular Expressions that Describe Email Campaigns
      Deep Mixtures of Factor Analysers
      Revisiting k-means: New Algorithms via Bayesian Nonparametrics
      Gaussian Process Regression Networks
      Analysis of Kernel Mean Matching under Covariate Shift
      Tighter Variational Representations of f-Divergences via Restriction to Probability Measures
      Training Restricted Boltzmann Machines on Word Observations
      Bayesian Watermark Attacks
      Max-Margin Nonparametric Latent Feature Models for Link Prediction
      Discriminative Probabilistic Prototype Learning
      Sparse-GEV: Sparse Latent Space Model for Multivariate Extreme Value Time Serie Modeling
      Factorized Asymptotic Bayesian Hidden Markov Models
      PAC-Bayesian Generalization Bound on Confusion Matrix for Multi-Class Classification
      Semi-Supervised Learning of Class Balance under Class-Prior Change by Distribution Matching
      A Proximal-Gradient Homotopy Method for the L1-Regularized Least-Squares Problem
      Comparison-Based Learning with Rank Nets
      Semi-Supervised Collective Classification via Hybrid Label Regularization
      Shortest path distance in k-nearest neighbor graphs
      A Split-Merge Framework for Comparing Clusterings
      Compositional Planning Using Optimal Option Models
      Information-Theoretical Learning of Discriminative Clusters for Unsupervised Domain Adaptation
      A General Framework for Inferring Latent Task Structures
      An Efficient Approach to Sparse Linear Discriminant Analysis
      The Greedy Miser: Learning under Test-time Budgets
      Canonical Trends: Detecting Trend Setters in Web Data
      A convex relaxation for weakly supervised classifiers
      Demand-Driven Clustering in Relational Domains for Predicting Adverse Drug Events
      A Binary Classification Framework for Two-Stage Multiple Kernel Learning
      Learning Local Transformation Invariance with Restricted Boltzmann Machines
      Consistent Multilabel Ranking through Univariate Losses
      On the Equivalence between Herding and Conditional Gradient Algorithms
      Variational Bayesian Inference with Stochastic Search
      Small-sample brain mapping: sparse recovery on spatially correlated designs with randomization and clustering
      Joint Optimization and Variable Selection of High-dimensional Gaussian Processes
      Large-Scale Feature Learning With Spike-and-Slab Sparse Coding
      Anytime Marginal MAP Inference
      Distributed Parameter Estimation via Pseudo-likelihood 
      The Nonparametric Metadata Dependent Relational Model
      Deep Lambertian Networks
      Convergence of the EM Algorithm for Gaussian Mixtures with Unbalanced Mixing Coefficients
      A Joint Model of Language and Perception for Grounded Attribute Learning
      Learning Parameterized Skills
      Safe Exploration in Markov Decision Processes 
      Improved Estimation in Time Varying Models
      Poisoning Attacks against Support Vector Machines
      Regularizers versus Losses for Nonlinear Dimensionality Reduction: A Factored View with New Convex Relaxations
      Utilizing Static Analysis and Code Generation to Accelerate Neural Networks
      Similarity Learning for Provably Accurate Sparse Linear Classification
      Variational Inference in Non-negative Factorial Hidden Markov Models for Efficient Audio Source Separation
      Fast Training of Nonlinear Embedding Algorithms
      Fast Prediction of New Feature Utility
      Robust Classification with Adiabatic Quantum Optimization
      Agglomerative Bregman Clustering
      Isoelastic Agents and Wealth Updates in Machine Learning Markets
      Compact Hyperplane Hashing with Bilinear Functions
      Continuous Inverse Optimal Control with Locally Optimal Examples
      Convex Multitask Learning with Flexible Task Clusters
      A Hierarchical Dirichlet Process Model with Multiple Levels of Clustering for Human EEG Seizure Modeling
      Levy Measure Decompositions for the Beta and Gamma Processes
      Building high-level features using large scale unsupervised learning
      Near-Optimal BRL using Optimistic Local Transitions
      A Unified Robust Classification Model
      Manifold Relevance Determination
      Residual Components Analysis
      Clustering to Maximize the Ratio of Split to Diameter
      A Graphical Model Formulation of Collaborative Filtering Neighbourhood Methods with Fast Maximum Entropy Training
      On-Line Portfolio Selection with Moving Average Reversion
      Improved Information Gain Estimates for Decision Tree Induction
      Influence Maximization in Continuous Time Diffusion Networks
      On the Size of the Online Kernel Sparsification Dictionary
      Multi-level Lasso for Sparse Multi-task Regression
      Fast Computation of Subpath Kernel for Trees
      Total Variation and Euler's Elastica for Supervised Learning
      Learning the Dependence Graph of Time Series with Latent Factors
      A Generalized Loop Correction Method for Approximate Inference in Graphical Models
      Infinite-Word Topic Models
      Consistent Covariance Selection From Data With Missing Values
      Is margin preserved after random projection?
      A Bayesian Approach to Approximate Joint Diagonalization of Square Matrices
      Predicting accurate probabilities with a ranking loss
      Learning with Augmented Features for Heterogeneous Domain Adaptation
      Dirichlet Process with Mixed Random Measures: A Nonparametric Topic Model for Labeled Data
      Evaluating Bayesian and L1 Approaches for Sparse Unsupervised Learning
      Collaborative Topic Regression with Social Matrix Factorization for Recommendation Systems
      LPQP for MAP: Putting LP Solvers to Better Use
      Clustering by Low-Rank Doubly Stochastic Matrix Decomposition
      Dependent Hierarchical Normalized Random Measures for Dynamic Topic Modeling
      State-Space Inference for Non-Linear Latent Force Models with Application to Satellite Orbit Prediction
      Sparse Additive Functional and Kernel CCA
      The Kernelized Stochastic Batch Perceptron
      Fast classification using sparse decision DAGs
      A Combinatorial Algebraic Approach for the Identifiability of Low-Rank Matrix Completion
      Rethinking Collapsed Variational Bayes Inference for LDA
      Exact Maximum Margin Structure Learning of Bayesian Networks
      AOSO-LogitBoost: Adaptive One-Vs-One LogitBoost for Multi-Class Problem
      Hypothesis testing using pairwise distances and associated kernels
      Monte Carlo Bayesian Reinforcement Learning
      A Topic Model for Melodic Sequences
      Output Space Search for Structured Prediction
      How To Grade a Test Without Knowing the Answers --- A Bayesian Graphical Model for Adaptive Crowdsourcing and Aptitude Testing
      Parallelizing Exploration-Exploitation Tradeoffs with Gaussian Process Bandit Optimization
      A Simple Algorithm for Semi-supervised Learning with Improved Generalization Error Bound
      Bayesian Optimal Active Search and Surveying
      Incorporating Domain Knowledge in Matching Problems via Harmonic Analysis
      Lognormal and Gamma Mixed Negative Binomial Regression
      Greedy Algorithms for Sparse Reinforcement Learning
      Variance Function Estimation in High-dimensions
      Conditional Sparse Coding and Grouped Multivariate Regression
      Apprenticeship Learning for Model Parameters of Partially Observable Environments
      Modeling Images using Transformed Indian Buffet Processes
      Learning Object Arrangements in 3D Scenes using Human Context
      Cross Language Text Classification via Subspace Co-regularized Multi-view Learning
      Plug-in martingales for testing exchangeability on-line
      An Iterative Locally Linear Embedding Algorithm
      Gaussian Process Quantile Regression using Expectation Propagation
      Latent Multi-group Membership Graph Model
      Exponential Regret Bounds for Gaussian Process Bandits with Deterministic Observations
      Estimating the Hessian by Back-propagating Curvature
      Feature Selection via Probabilistic Outputs
      Hierarchical Exploration for Accelerating Contextual Bandits
      Latent Collaborative Retrieval
      Lightning Does Not Strike Twice: Robust MDPs with Coupled Uncertainty
      On causal and anticausal learning
      Bayesian Efficient Multiple Kernel Learning
      Bayesian Nonexhaustive Learning for Online Discovery and Modeling of Emerging Classes
      Exact Soft Confidence-Weighted Learning
      Distributed Tree Kernels
      Multiple Kernel Learning from Noisy Labels by Stochastic Programming
      Improved Nystrom Low-rank Decomposition with Priors
      Active Learning for Matching Problems
      Ensemble Methods for Convex Regression with Applications to Geometric Programming Based Circuit Design
      Groupwise Constrained Reconstruction for Subspace Clustering
      Stability of matrix factorization for collaborative filtering
      Adaptive Regularization for Similarity Measures
      Linear Off-Policy Actor-Critic
      Modeling Latent Variable Uncertainty for Loss-based Learning
      Dimensionality Reduction by Local Discriminative Gaussians
      Learning to Label Aerial Images from Noisy Data
      The Most Persistent Soft-Clique in a Set of Sampled Graphs
      Learning Efficient Structured Sparse Models
      PAC Subset Selection in Stochastic Multi-armed Bandits
      Nonparametric variational inference
      The Convexity and Design of Composite Multiclass Losses
      Finding Botnets Using Minimal Graph Clusterings
      Learning the Experts for Online Sequence Prediction
      Efficient Active Algorithms for Hierarchical Clustering
      Copula Mixture Model for Dependency-seeking Clustering
      The Landmark Selection Method for Multiple Output Prediction
      Subgraph Matching Kernels for Attributed Graphs
      Adaptive Canonical Correlation Analysis Based On Matrix Manifolds
      Batch Active Learning via Coordinated Matching
      Hybrid Batch Bayesian Optimization
      Efficient and Practical Stochastic Subgradient Descent for Nuclear Norm Regularization
      Gap Filling in the Plant Kingdom---Trait Prediction Using Hierarchical Probabilistic Matrix Factorization
      Sparse Support Vector Infinite Push
      A Dantzig Selector Approach to Temporal Difference Learning
      Scaling Up Coordinate Descent Algorithms for Large $\ell_1$ Regularization Problems
      Cross-Domain Multitask Learning with Latent Probit Models
      Structured Learning from Partial Annotations
      Maximum Margin Output Coding
      Sequential Nonparametric Regression
      An Infinite Latent Attribute Model for Network Data
      On Local Regret
      Smoothness and Structure Learning by Proxy 
      A fast and simple algorithm for training neural probabilistic language models
      Incorporating Causal Prior Knowledge as Path-Constraints in Bayesian Networks and Maximal Ancestral Graphs
      High-Dimensional Covariance Decomposition into Sparse Markov and Independence Domains
      Minimizing The Misclassification Error Rate Using a Surrogate Convex Loss
      Bounded Planning in Passive POMDPs
      Capturing topical content with frequency and exclusivity
      TrueLabel + Confusions: A Spectrum of Probabilistic Models in Analyzing Multiple Ratings
      Robust Multiple Manifold Structure Learning
      Two Manifold Problems with Applications to Nonlinear System Identification
      On the Difficulty of Nearest Neighbor Search
      Learning Force Control Policies for Compliant Robotic Manipulation
      Estimation of Simultaneously Sparse and Low Rank Matrices
      Online Structured Prediction via Coactive Learning
      Using CCA to improve CCA: A new spectral method for estimating vector models of words
      Integer Optimization Methods for Supervised Ranking
      Conversational Speech Transcription Using Context-Dependent Deep Neural Networks
      Data-driven Web Design
      Learning the Central Events and Participants in Unlabeled Text
      Exemplar-SVMs for Visual Object Detection

  4. I’d like to reiterate the thanks for the incredible openness of this year’s review process. Thank you.

    I wonder if it’s possible to release more detailed acceptance stats. Specifically, I wonder about the not-for-proceedings papers, those that were previously ML-journaled. I assume these were assessed in a separate pool from the for-proceedings papers; if so, what was the acceptance rate in this pool?

    1. For the not-for-proceedings papers, we received 7 submissions and 4 were accepted. Only 2 of the submissions were already accepted at ML journals; they received brief reviews mainly to assess interest. The other 5 underwent the usual ICML reviewing process (either because they had not been accepted yet, or they had been accepted at journals other than MLJ and JMLR).

      For the AIStats resubmissions, we received 4 submissions and 3 were accepted. All underwent the usual ICML reviewing process, with the slight difference that 1-2 reviewers had previously reviewed the paper for AIStats.

      John should release stats about the acceptance rate per subject area shortly.
