I had a fantastic time at ICML 2016 and learned a great deal. There was far more good stuff than I could see, and it was exciting to catch up on recent advances.

David Silver gave one of the best tutorials I’ve seen, on his group’s recent work in “deep” reinforcement learning. I learned about a few new techniques, including the benefits of asynchronous updates in distributed Q-learning https://arxiv.org/abs/1602.01783, which was presented in more detail at the main conference. The new domains being explored were exciting, as were the improvements on the computational side. I would have loved to see more pointers to related work in the tutorial, particularly given the exciting mix of new techniques and old staples (e.g. experience replay http://www.dtic.mil/dtic/tr/fulltext/u2/a261434.pdf), but the talk was so information-packed that this would have been difficult.

Pieter Abbeel gave an outstanding talk in the Abstraction in RL workshop http://rlabstraction2016.wix.com/icml#!schedule/bx34m, and (I heard) another excellent one during the deep learning workshop.

It was rumored that Aviv Tamar gave an exciting talk (I believe on this: http://arxiv.org/abs/1602.02867), but I was forced to miss it to see Rong Ge’s https://users.cs.duke.edu/~rongge/ outstanding talk on a new-ish geometric tool for understanding non-convex optimization, the *strict saddle*. I first read about the approach here http://arxiv.org/abs/1503.02101, but at ICML he and other authors demonstrated a remarkable number of problems with this property, which enables efficient optimization via stochastic gradient descent (and other) procedures. This was a theme of ICML: an incredible amount of good material, so much that I barely saw the posters at all because there was nearly always a talk I wanted to see!
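To make the strict saddle idea concrete, here is a toy sketch on a function I picked by hand (it is not from the paper): f(x, y) = 0.5x² + 0.25y⁴ − 0.5y². The origin is a saddle whose Hessian has a strictly negative eigenvalue, and the local minima sit at (0, ±1). Exact gradient descent started at the saddle never moves, while a little gradient noise, as in SGD, is enough to escape. The function names and parameters (`noisy_gd`, `eta`, `noise`) are my own illustrative choices.

```python
import numpy as np

# Hand-picked toy function (an assumption, not from the paper):
#   f(x, y) = 0.5*x**2 + 0.25*y**4 - 0.5*y**2
# Strict saddle at (0, 0); local minima at (0, +1) and (0, -1).

def grad(p):
    x, y = p
    return np.array([x, y**3 - y])

def min_hessian_eig(p):
    # The Hessian of f is diagonal: diag(1, 3*y**2 - 1)
    _, y = p
    return float(np.linalg.eigvalsh(np.diag([1.0, 3.0 * y**2 - 1.0])).min())

def noisy_gd(p0, eta=0.05, steps=2000, noise=0.0, seed=0):
    """Gradient descent with additive Gaussian noise on every gradient."""
    rng = np.random.default_rng(seed)
    p = np.array(p0, dtype=float)
    for _ in range(steps):
        p = p - eta * (grad(p) + noise * rng.standard_normal(2))
    return p

if __name__ == "__main__":
    print(min_hessian_eig((0.0, 0.0)))       # negative: a strict saddle
    print(noisy_gd((0.0, 0.0), noise=0.0))   # exact GD stays at the saddle
    print(noisy_gd((0.0, 0.0), noise=0.1))   # noisy GD ends near y = +1 or -1
```

The negative eigenvalue is what matters: along that direction any small perturbation grows geometrically, which is why noise gets you out quickly.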

Rocky Duan surveyed some benchmark RL continuous control problems http://jmlr.org/proceedings/papers/v48/duan16.pdf. An interesting theme of the conference, which also came up in conversation with John Schulman and Yann LeCun, was really old methods working well. In fact, this group demonstrated that variants of the natural/covariant policy gradient originally proposed by Sham Kakade (with a derivation here: http://repository.cmu.edu/cgi/viewcontent.cgi?article=1080&context=robotics) are largely at the state of the art on many benchmark problems. Some clever tricks are necessary for large policy classes like neural networks, and they dramatically improve performance (https://arxiv.org/abs/1502.05477). One example is using a partial-least-squares-style truncated conjugate gradient to solve for the change in policy in the usual linear system F δ = ∇J of the natural gradient procedure, where F is the Fisher information matrix and ∇J the ordinary policy gradient. I had begun to view these methods as doing little better (or worse) than black-box search, so it’s exciting to see them make a comeback.
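Here is a minimal sketch of the truncated conjugate gradient trick, under my own assumptions: the Fisher matrix is the empirical one built from per-sample score vectors (gradients of log π(a|s)), and we only ever touch it through matrix-vector products, never forming F. The final rescaling so that the quadratic KL estimate ½ δᵀFδ equals a target `max_kl` follows the TRPO paper’s idea; the names `natural_gradient_step`, `max_kl`, and `damping` are hypothetical.

```python
import numpy as np

def conjugate_gradient(fvp, g, iters=10, tol=1e-10):
    """Approximately solve F x = g using only the product v -> F v."""
    x = np.zeros_like(g)
    r = g.copy()              # residual g - F x (x starts at zero)
    p = r.copy()              # current search direction
    rs = r @ r
    for _ in range(iters):    # truncation: stop after a few iterations
        Fp = fvp(p)
        alpha = rs / (p @ Fp)
        x += alpha * p
        r -= alpha * Fp
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

def natural_gradient_step(scores, g, max_kl=0.01, damping=1e-3, iters=10):
    """scores: (n, d) per-sample grad log pi(a|s); g: (d,) policy gradient."""
    n = scores.shape[0]

    def fvp(v):
        # Matrix-free product with the damped empirical Fisher (1/n) S^T S + damping*I
        return scores.T @ (scores @ v) / n + damping * v

    delta = conjugate_gradient(fvp, g, iters=iters)
    # Rescale so the quadratic KL estimate 0.5 * delta^T F delta equals max_kl
    scale = np.sqrt(2.0 * max_kl / (delta @ fvp(delta)))
    return scale * delta
```

Because CG needs only F·v products, the cost per iteration is a couple of passes over the score vectors, which is what makes this workable for neural network policies with many parameters.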

Chelsea Finn http://people.eecs.berkeley.edu/~cbfinn/ gave an outstanding talk on this work https://arxiv.org/abs/1603.00448. She and her co-authors (Sergey Levine and Pieter) came up with a technique that lets one apply Maximum Entropy Inverse Optimal Control using policy gradient techniques, without the usual double-loop procedure of fully solving the forward control problem for every update of the cost function. Jonathan Ho described a related algorithm http://jmlr.org/proceedings/papers/v48/ho16.pdf that also appeared to mix policy gradient with an optimization over cost functions. Both are definitely on my reading list, and I want to understand the trade-offs between the techniques.
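For reference, here is a small sketch of the classic Maximum Entropy IOC loop that these methods avoid nesting, assuming a linear cost c(τ) = θ·f(τ) and a toy problem small enough to enumerate every trajectory. The gradient of the demonstrations’ negative log-likelihood is the demo feature average minus the model’s feature expectation, so at convergence the features match. In a real problem, computing that model expectation is the expensive inner forward-control step; as I understand it, the guided cost learning approach instead estimates it from samples of a policy that is optimized alongside the cost. The function and argument names are hypothetical.

```python
import numpy as np

def maxent_ioc(features, demo_mean, lr=1.0, iters=2000):
    """Fit a linear cost c(tau) = theta . f(tau) by maximum entropy IOC.

    features:  (n_traj, d) feature vectors of every possible trajectory
    demo_mean: (d,) average features of the demonstrations
    """
    theta = np.zeros(features.shape[1])
    for _ in range(iters):
        # "Inner loop": the max-ent distribution p(tau) ∝ exp(-c(tau)).
        # Enumerable here; in real problems this is the costly step.
        logits = -features @ theta
        p = np.exp(logits - logits.max())
        p /= p.sum()
        model_mean = p @ features
        # "Outer loop": the NLL gradient is demo_mean - model_mean
        theta -= lr * (demo_mean - model_mean)
    return theta, model_mean

# Four toy trajectories with binary features; demos mostly show feature 0.
feats = np.array([[0, 0], [1, 0], [0, 1], [1, 1]], dtype=float)
theta, matched = maxent_ioc(feats, np.array([0.7, 0.2]))
```

In this toy case the learned cost is negative on the feature the demonstrations favor and positive on the one they avoid, and the model’s feature expectations match the demonstration average.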

Both presentations were informative, and both made an interesting connection to Generative Adversarial Nets (GANs) http://arxiv.org/abs/1406.2661. GANs were a theme of the conference more broadly, both in talks and in discussions. It’s a very cool idea that is gaining traction and being embraced by the neural net pioneers.

David Belanger https://people.cs.umass.edu/~belanger/belanger_spen_icml.pdf gave an interesting talk on using backprop to optimize a structured output with respect to a learned cost function. I left thinking the technique was closely related to inverse optimal control methods and to GANs, and wanting to understand why implicit differentiation wasn’t being used to optimize the energy function parameters.
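As a rough sketch of the prediction side, assume the structured output is relaxed to y ∈ [0, 1]ⁿ and predicted by projected gradient descent on an energy, which is one simple choice among several. In an actual SPEN the energy is a neural network of both the input and y; here a fixed quadratic E(y) = ½ yᵀAy − bᵀy stands in for it, with A and b playing the role of whatever the network computes from the input. All names are hypothetical.

```python
import numpy as np

def energy_grad(y, A, b):
    # Gradient of the stand-in energy E(y) = 0.5 * y^T A y - b^T y
    return A @ y - b

def spen_predict(A, b, eta=0.2, steps=500):
    """Predict a relaxed structured output y in [0, 1]^n by projected GD on E."""
    y = np.full(len(b), 0.5)                 # start at the center of the box
    for _ in range(steps):
        # Gradient step on the energy, then project back onto the box
        y = np.clip(y - eta * energy_grad(y, A, b), 0.0, 1.0)
    return y

A = np.array([[2.0, 0.5], [0.5, 1.0]])      # symmetric positive definite
y_hat = spen_predict(A, np.array([1.0, 0.5]))
```

Learning then means adjusting the energy’s parameters so that this inner minimization lands on good outputs, which is where the question of how to differentiate through the prediction comes in.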

Speaking of neural net pioneers, there were lots of good talks during both the main conference and the workshops on what’s new (and what’s old: https://sites.google.com/site/nnb2tf/) in neural network architectures and algorithms.

I was intrigued by http://jmlr.org/proceedings/papers/v48/balduzzi16.pdf and particularly by the well-written blog post it mentions http://colah.github.io/posts/2015-09-NN-Types-FP/ by Christopher Olah. The notion that we need language tools to structure the design of learning programs (e.g. http://www.umiacs.umd.edu/~hal/docs/daume14lts.pdf), along with tools to reason about them, seems to be gaining currency. After reading these, I began to view some of the recent work of Wen, Arun, Byron, and myself (including at ICML: http://jmlr.org/proceedings/papers/v48/sun16.pdf) in this light: generative RNNs “should” have a well-defined hidden state whose “type” is effectively (moments of) future observations. I wonder now if there is a larger lesson here for the design of learning programs.

Nando de Freitas and colleagues’ approach of separating value and advantage function predictions in one network http://jmlr.org/proceedings/papers/v48/wangf16.pdf was quite interesting and generated a lot of buzz.
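A minimal sketch of how the two streams can be recombined into Q-values, following the mean-subtracted aggregation described in the dueling network paper: Q(s, a) = V(s) + A(s, a) − mean over a′ of A(s, a′). The array shapes and the function name are my own assumptions; in the real architecture both streams are heads of a shared network.

```python
import numpy as np

def dueling_q(value, advantages):
    """Combine value and advantage streams into Q-values.

    value:      (batch, 1) state-value estimates V(s)
    advantages: (batch, n_actions) advantage estimates A(s, a)
    """
    # Subtracting the mean advantage makes V and A identifiable: otherwise,
    # adding a constant to V and subtracting it from A yields the same Q.
    return value + advantages - advantages.mean(axis=1, keepdims=True)

V = np.array([[1.0], [2.0]])
A = np.array([[1.0, 2.0, 3.0], [0.0, 0.0, 3.0]])
Q = dueling_q(V, A)
```

A side effect of the mean subtraction is that averaging Q over actions recovers V exactly.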

Ian Osband gave an amazing talk on another topic that previously made me despair: exploration in RL http://jmlr.org/proceedings/papers/v48/osband16.pdf. This is one of the few approaches that combines function approximation with rigorous exploration guarantees (sample complexity in the tabular case), and amazingly with *better* sample complexity than previous papers that work only in the tabular case. Super cool, and also very high on my reading list.
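The core ingredient of randomized value functions, as I understand it, is to fit value estimates with a Bayesian-style regression, draw one random sample of the weights, and act greedily with respect to that sample, much like Thompson sampling. The sketch below shows only a single regression-and-sample step under assumed linear features and a Gaussian prior and noise model; the paper’s full algorithm applies this backward over the steps of an episode, which I omit. `posterior`, `sample_and_act`, `sigma2`, and `lam` are hypothetical names.

```python
import numpy as np

def posterior(Phi, targets, sigma2=1.0, lam=1.0):
    """Gaussian posterior over theta in Q(s, a) ≈ phi(s, a) . theta,
    assuming a N(0, I/lam) prior and Gaussian noise with variance sigma2."""
    d = Phi.shape[1]
    precision = Phi.T @ Phi / sigma2 + lam * np.eye(d)
    cov = np.linalg.inv(precision)
    mean = cov @ Phi.T @ targets / sigma2
    return mean, cov

def sample_and_act(Phi, targets, action_features, rng, sigma2=1.0, lam=1.0):
    """Draw one plausible value function and act greedily with respect to it."""
    mean, cov = posterior(Phi, targets, sigma2, lam)
    theta = rng.multivariate_normal(mean, cov)
    return theta, int(np.argmax(action_features @ theta))
```

With little data the sampled weights vary a lot and the agent tries different actions; as data accumulates the posterior narrows and behavior becomes greedy, which is where the exploration comes from.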

Boaz Barak http://www.boazbarak.org/ gave a truly inspired talk that mixed a kind of coherent, computationally bounded Bayesianism (slogan: “Compute like a frequentist, think like a Bayesian.”) with a lower bound for sum-of-squares (SoS) procedures. It was well outside my expertise, but delivered in a way that made you feel like you understood all of it.

Honglak Lee gave an exciting talk on the benefits of semi-supervision in CNNs http://web.eecs.umich.edu/~honglak/icml2016-CNNdec.pdf. The authors demonstrated that a remarkable amount of information needed to reproduce an input image was preserved quite deep in CNNs, and further that encouraging the ability to reconstruct could significantly enhance discriminative performance on real benchmarks.

The problem with this ICML is that I think it would take literally weeks of reading/watching talks to really absorb the high quality work that was presented. I’m *very* grateful to the organizing committee http://icml.cc/2016/?page_id=39 for making it so valuable.

Thanks for the shout-out to my SPEN work. Since the ICML deadline, I switched to a new learning method. Namely, I use the ‘Back-Optimization’ method described in Justin Domke’s “Generic Methods for Optimization-Based Modeling.” Like implicit differentiation, this can be performed using repeated calls to a black box for a Hessian-vector product, which in turn can be approximated using finite differences. Unlike implicit differentiation, this method does not assume that the optimization problem used to form predictions was solved to near optimality. Instead, it directly back-propagates through the process of doing gradient-based prediction for a fixed number of iterations. An implementation of this technique for learning SPENs is provided here: github.com/davidBelanger/SPEN.
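Here is a small sketch of the back-optimization idea described in the comment above: run a fixed number of gradient steps on the energy to make a prediction, then back-propagate through those steps, approximating the needed Hessian-vector products with central finite differences. The energy E(y; w) = ½‖y − w‖² + ¼Σy⁴, the squared-error loss, and all function names are hypothetical stand-ins chosen so the result can be checked against a brute-force numerical gradient; the actual SPEN implementation is the one at github.com/davidBelanger/SPEN.

```python
import numpy as np

# Hypothetical energy for illustration: E(y; w) = 0.5*||y - w||^2 + 0.25*sum(y**4)
def grad_y(y, w):
    return (y - w) + y**3              # dE/dy

def grad_w(y, w):
    return w - y                       # dE/dw

def unroll(w, y0, eta, T):
    """Gradient-based prediction: T steps of y <- y - eta * dE/dy."""
    ys = [y0]
    for _ in range(T):
        ys.append(ys[-1] - eta * grad_y(ys[-1], w))
    return ys

def loss(y, target):
    return 0.5 * np.sum((y - target) ** 2)

def back_optimization_grad(w, y0, target, eta=0.1, T=20, r=1e-5):
    """Loss and dLoss/dw, back-propagating through the unrolled optimization."""
    ys = unroll(w, y0, eta, T)
    y_bar = ys[-1] - target            # dLoss/dy_T
    w_bar = np.zeros_like(w)
    for t in range(T - 1, -1, -1):
        y = ys[t]
        # Central finite differences give Hessian-vector products:
        #   Hv ≈ (d2E/dy2) y_bar   and   Jv ≈ (d2E/dw dy) y_bar
        Hv = (grad_y(y + r * y_bar, w) - grad_y(y - r * y_bar, w)) / (2 * r)
        Jv = (grad_w(y + r * y_bar, w) - grad_w(y - r * y_bar, w)) / (2 * r)
        w_bar -= eta * Jv              # w's direct effect on step t
        y_bar = y_bar - eta * Hv       # pass the adjoint back to y_t
    return loss(ys[-1], target), w_bar
```

Note that nothing here assumes the T prediction steps reached a minimum, which matches the advantage over implicit differentiation mentioned in the comment.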

Hi, new to the blog! I was at ICML, my first time, and the whole experience was amazing. Ian Osband’s talk was enlightening.