The Decision Service is a first-in-the-world project making tractable reinforcement learning easily usable by developers everywhere. We are hiring developers, data scientists, and a product manager. Please consider joining us to do something interesting in this life 🙂
I went to the European Workshop on Reinforcement Learning and NIPS last month and saw several interesting things.
At EWRL, I particularly liked the talks from:
- Remi Munos on off-policy evaluation
- Mohammad Ghavamzadeh on learning safe policies
- Emma Brunskill on optimizing biased-but-safe estimators (sense a theme?)
- Sergey Levine on low sample complexity applications of RL in robotics.
My talk is here. Overall, this was a well-organized workshop with diverse and interesting subjects, with the only caveat being that they had to limit registration 🙂
At NIPS itself, I found the poster sessions fairly interesting.
- Allen-Zhu and Hazan had a new notion of a reduction (video).
- Zhao, Poupart, and Gordon had a new way to learn Sum-Product Networks
- Ho, Littman, MacGlashan, Cushman, and Austerweil had a paper on how “Showing” is different from “Doing”.
- Toulis and Parkes had a paper on estimation of long term causal effects.
- Rae, Hunt, Danihelka, Harley, Senior, Wayne, Graves, and Lillicrap had a paper on large memories with neural networks.
- Hardt, Price, and Srebro had a paper on Equal Opportunity in ML.
Format-wise, I thought 2 sessions were better than 1, but I really would have preferred more. The recorded spotlights are also pretty cool.
The NIPS workshops were great, although I was somewhat reminded of kindergarten soccer in terms of lopsided attendance. This may be inevitable given how hot the field is, but I think it’s important for individual researchers to remember that:
- There are many important directions of research.
- You personally have a much higher chance of doing something interesting if everyone else is not doing it also.
During the workshops, I learned about ADAM (a momentum form of Adagrad), testing ML systems, and that even TensorFlow is finally looking into synchronous updates for parallel learning (allreduce is the way).
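For readers who have not seen ADAM written out, here is a minimal sketch of the standard update: an exponential moving average of the gradient (the momentum part) divided by the root of a moving average of squared gradients (the Adagrad-like scaling), each with bias correction. The one-dimensional quadratic objective, the function name `adam_minimize`, and the step count are illustrative assumptions, not anything presented at the workshops.

```python
import math

def adam_minimize(grad, x0, steps=2000, lr=0.05,
                  beta1=0.9, beta2=0.999, eps=1e-8):
    """Minimize a 1-D function given its gradient, using the Adam update."""
    x, m, v = x0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad(x)
        m = beta1 * m + (1 - beta1) * g        # moving average of gradients
        v = beta2 * v + (1 - beta2) * g * g    # moving average of squared gradients
        m_hat = m / (1 - beta1 ** t)           # bias correction for the zero init
        v_hat = v / (1 - beta2 ** t)
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x

# Toy objective (an assumption for illustration): f(x) = (x - 3)^2.
x_min = adam_minimize(lambda x: 2.0 * (x - 3.0), x0=0.0)
print(x_min)
```

On this toy problem the iterate should settle close to the minimizer at 3; real uses apply the same update per coordinate.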
(edit: added one)
I just released Vowpal Wabbit 8.3 and we are planning a tutorial at NIPS Saturday over the lunch break in the ML systems workshop. Please join us if interested.
8.3 should be backwards compatible with the entire 8.x series. There have been big changes since the last version related to:
- Contextual bandits, particularly w.r.t. the decision service.
- Learning to search, for which we have a paper at NIPS.
- Logarithmic time multiclass classification.
The ICML 2016 videos are out.
I also wanted to share some statistics from registration that might be of general interest.
The total number of people attending: 3103.
Industry: 47% University: 46%
Male: 83% Female: 14%
Local (NY, NJ, or CT): 27%
North America: 70% Europe: 18% Asia: 9% Middle East: 2% Remainder: <1% including 2 from Antarctica 🙂
I had a fantastic time at ICML 2016— I learned a great deal. There was far more good stuff than I could see, and it was exciting to catch up on recent advances.
David Silver gave one of the best tutorials I’ve seen on his group’s recent work in “deep” reinforcement learning. I learned about a few new techniques, including the benefits of asynchronous updates in distributed Q-learning (https://arxiv.org/abs/1602.01783), which was presented in more detail at the main conference. The new domains being explored were exciting, as were the improvements made on the computational side. I would love to have seen more pointers to some of the related work from the tutorial, particularly given there was such an exciting mix of new techniques and old staples (e.g. experience replay, http://www.dtic.mil/dtic/tr/fulltext/u2/a261434.pdf), but the talk was so information-packed that it would have been difficult.
It was rumored that Aviv Tamar gave an exciting talk (I believe on this: http://arxiv.org/abs/1602.02867), but I was forced to miss it to see Rong Ge’s (https://users.cs.duke.edu/~rongge/) outstanding talk on a new-ish geometric tool for understanding non-convex optimization, the strict saddle.
I first read about the approach here (http://arxiv.org/abs/1503.02101), but at ICML he and other authors demonstrated a remarkable number of problems that have this property, which enables efficient optimization via stochastic gradient descent (and other) procedures.
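To make the strict saddle idea concrete, here is a toy sketch (my own illustration, not an example from the talk or the paper). The function f(x, y) = x^2 + y^4/4 - y^2/2 is an assumed stand-in: it has a saddle at the origin where the Hessian has a strictly negative eigenvalue, and minima at (0, ±1). Plain gradient descent started exactly at the saddle never moves, while adding a little gradient noise, as stochastic gradient descent does implicitly, lets the iterate slide off along the negative-curvature direction.

```python
import random

def grad(x, y):
    # Gradient of f(x, y) = x^2 + y^4/4 - y^2/2.
    # The origin is a saddle: the Hessian there is diag(2, -1).
    return 2.0 * x, y ** 3 - y

def descend(noise, steps=500, lr=0.1, seed=0):
    """Gradient descent from the saddle point, with optional Gaussian gradient noise."""
    rng = random.Random(seed)
    x, y = 0.0, 0.0
    for _ in range(steps):
        gx, gy = grad(x, y)
        gx += noise * rng.gauss(0.0, 1.0)
        gy += noise * rng.gauss(0.0, 1.0)
        x -= lr * gx
        y -= lr * gy
    return x, y

stuck = descend(noise=0.0)     # exact gradients: never leaves the saddle
escaped = descend(noise=0.01)  # noisy gradients: ends near one of the minima
print(stuck, escaped)
```

The noise level and step size here are arbitrary choices that happen to work for this function; the papers give the conditions under which escape is guaranteed.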
This was a theme of ICML— an incredible amount of good material, so much that I barely saw the posters at all because there was nearly always a talk I wanted to see!
Rocky Duan surveyed some benchmark RL continuous control problems (http://jmlr.org/proceedings/papers/v48/duan16.pdf).
An interesting theme of the conference, one that came up in conversation with John Schulman and Yann LeCun, was really old methods working well. In fact, this group demonstrated that variants of the natural/covariant policy gradient proposed originally by Sham Kakade (with a derivation here: http://repository.cmu.edu/cgi/viewcontent.cgi?article=1080&context=robotics) are largely at the state of the art on many benchmark problems. There are some clever tricks necessary for large policy classes like neural networks (like using a partial-least-squares-style truncated conjugate gradient to solve for the change in policy in the usual F \delta = \nabla one solves in the natural gradient procedure) that dramatically improve performance (https://arxiv.org/abs/1502.05477). I had begun to view these methods as doing little better (or worse) than black-box search, so it’s exciting to see them make a comeback.
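As a rough illustration of the conjugate gradient trick, here is a minimal sketch of solving F \delta = \nabla using only matrix-vector products with F, stopped after a fixed number of iterations. The tiny explicit matrix standing in for the Fisher information, and the names `conjugate_gradient` and `fvp`, are assumptions for illustration; in a real policy gradient method the product F v would be computed from samples without ever forming F.

```python
def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def conjugate_gradient(fvp, g, iters=10, tol=1e-10):
    """Approximately solve F x = g, given only a function fvp(v) returning F v.

    F is assumed symmetric positive definite. Capping `iters` truncates the solve.
    """
    x = [0.0] * len(g)
    r = list(g)          # residual g - F x, with x = 0 initially
    p = list(r)          # current search direction
    rs = dot(r, r)
    for _ in range(iters):
        if rs < tol:
            break
        Fp = fvp(p)
        alpha = rs / dot(p, Fp)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * fi for ri, fi in zip(r, Fp)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x

# Hypothetical 2x2 stand-in for the Fisher matrix and a gradient vector.
F = [[4.0, 1.0], [1.0, 3.0]]
gradient = [1.0, 2.0]
fvp = lambda v: [dot(row, v) for row in F]

delta = conjugate_gradient(fvp, gradient)
print(delta)
```

The appeal for large policy classes is that each iteration costs one Fisher-vector product, so a handful of iterations gives a usable direction without storing or inverting F.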
Chelsea Finn (http://people.eecs.berkeley.edu/~cbfinn/) gave an outstanding talk on this work (https://arxiv.org/abs/1603.00448). She and co-authors (Sergey Levine and Pieter Abbeel) effectively came up with a technique that lets one apply Maximum Entropy Inverse Optimal Control without the double-loop procedure and using policy gradient techniques. Jonathan Ho described a related algorithm (http://jmlr.org/proceedings/papers/v48/ho16.pdf) that also appeared to mix policy gradient and an optimization over cost functions. Both are definitely on my reading list, and I want to understand the trade-offs of the techniques.
Both presentations were informative, and both made the interesting connection to Generative Adversarial Nets (GANs) (http://arxiv.org/abs/1406.2661). These were also a theme of the conference, both in talks and during discussions. A very cool idea that is getting more traction and being embraced by the neural net pioneers.
David Belanger (https://people.cs.umass.edu/~belanger/belanger_spen_icml.pdf) gave an interesting talk on using backprop to optimize a structured output relative to a learned cost function. I left thinking the technique was closely related to inverse optimal control methods and to GANs, and wanting to understand why implicit differentiation wasn’t being used to optimize the energy function parameters.
Speaking of neural net pioneers, there were lots of good talks during both the main conference and workshops on what’s new, and what’s old (https://sites.google.com/site/nnb2tf/), in neural network architectures and algorithms.
Ian Osband gave an amazing talk on another topic that previously made me despair: exploration in RL (http://jmlr.org/proceedings/papers/v48/osband16.pdf). This is one of few approaches that combines the ability to use function approximation with rigorous exploration guarantees/sample complexity in the tabular case (and amazingly *better* sample complexity than previous papers that work only in the tabular case). Super cool and also very high on my reading list.
Boaz Barak (http://www.boazbarak.org/) gave a truly inspired talk that mixed a kind of coherent computationally-bounded Bayesian-ism (Slogan: “Compute like a frequentist, think like a Bayesian.”) with demonstrating a lower bound for SoS procedures. Well outside of my expertise, but delivered in a way that made you feel like you understood all of it.
Honglak Lee gave an exciting talk on the benefits of semi-supervision in CNNs (http://web.eecs.umich.edu/~honglak/icml2016-CNNdec.pdf). The authors demonstrated that a remarkable amount of information needed to reproduce an input image was preserved quite deep in CNNs, and further that encouraging the ability to reconstruct could significantly enhance discriminative performance on real benchmarks.
The problem with this ICML is that I think it would take literally weeks of reading/watching talks to really absorb the high quality work that was presented. I’m *very* grateful to the organizing committee (http://icml.cc/2016/?page_id=39) for making it so valuable.