ICML 2021 Invited Speakers — ML for Science

By: Stefanie Jegelka and Ameet Talwalkar (ICML21 Communication Chairs)

With ICML 2021 underway, we wanted to briefly highlight the upcoming invited talks. A general theme of the invited talks this year is “machine learning for science.” The Program Chairs (Marina Meila and Tong Zhang) have invited world-renowned scientists from various disciplines to discuss their problems and the corresponding machine learning challenges. By exposing the machine learning community to these fascinating problems, we hope that we can help to further expand the applicability of machine learning to a wide range of scientific domains. 

  • Daphne Koller (Tuesday, July 20th at 8am PDT): Dr. Koller is a pioneer in the field of machine learning, and is currently the Founder and CEO of Insitro, which leverages machine learning for drug discovery. She was the Rajeev Motwani Professor of Computer Science at Stanford University, where she served on the faculty for 18 years. She was the co-founder, co-CEO and President of Coursera, and the Chief Computing Officer of Calico, an Alphabet company in the healthcare space. She received the MacArthur Foundation Fellowship in 2004, was awarded the ACM Prize in Computing in 2008, and was recognized as one of TIME Magazine’s 100 most influential people in 2012.
  • Xiao Cunde and Dahe Qin (Tuesday, July 20th at 8pm PDT): Dr. Cunde is a glaciologist and Deputy Director of the Institute of the Climate System, Chinese Academy of Meteorological Sciences. He has worked in the fields of polar glaciology and meteorology since 1997. His major research focus has been ice core studies relating to paleo-climate and paleo-environment, and present day cold region meteorological and glaciological processes that impact environmental and climatic changes. Dr. Qin is the Former Director of the China Meteorological Administration. He is a glaciologist and the first Chinese ever to cross the South Pole. He was a member of the 1989 International Cross South Pole Expedition and has published numerous ground-breaking articles, using evidence gathered from his Antarctic expeditions.
  • Esther Duflo (Wednesday, July 21st at 8am PDT): Dr. Duflo is the Abdul Latif Jameel Professor of Poverty Alleviation and Development Economics in the Department of Economics at MIT and a co-founder and co-director of the Abdul Latif Jameel Poverty Action Lab (J-PAL). In her research, she seeks to understand the economic lives of the poor, with the aim of helping to design and evaluate social policies. She has worked on health, education, financial inclusion, environment and governance. In 2019, she shared the Nobel Prize in Economic Sciences with Abhijit Banerjee and Michael Kremer “for their experimental approach to alleviating global poverty”. In particular, she and her co-authors have introduced a new approach to obtaining reliable answers about the best ways to fight global poverty.
  • Edward Chang (Wednesday, July 21st at 8pm PDT): Dr. Chang is a Professor in the Department of Neurological Surgery at the UCSF Weill Institute for Neurosciences. He is a neurosurgeon and uses machine learning to understand brain functions. His research focuses on the brain mechanisms for speech, movement and human emotion. He co-directs the Center for Neural Engineering and Prostheses, a collaborative enterprise of UCSF and UC Berkeley. The center brings together experts in engineering, neurology and neurosurgery to develop state-of-the-art biomedical technology to restore function for patients with neurological disabilities such as paralysis and speech disorders.
  • Cecilia Clementi (Thursday, July 22nd at 8am PDT):  Dr. Clementi is a Professor of Chemistry, and Chemical and Biomolecular Engineering, and Senior Scientist in the Center for Theoretical Biological Physics at Rice University, and an Einstein Fellow at FU Berlin. She researches strategies to study complex biophysical processes on long timescales, and she is an expert in the simulation of biomolecules using large-scale ML. Her group designs multiscale models, adaptive sampling approaches, and data analysis tools, and uses both data-driven methods and theoretical formulations.

To register for the conference and check out these talks, please visit: https://icml.cc/.

ALT Highlights – An Interview with Joelle Pineau

Welcome to ALT Highlights, a series of blog posts spotlighting various happenings at the recent conference ALT 2021, including plenary talks, tutorials, trends in learning theory, and more! To reach a broad audience, the series will be disseminated as guest posts on different blogs in machine learning and theoretical computer science. John has been kind enough to host the first post in the series. This initiative is organized by the Learning Theory Alliance, and overseen by Gautam Kamath. All posts in ALT Highlights are indexed on the official Learning Theory Alliance blog.

The first post is an interview with Joelle Pineau, by Michal Moshkovitz and Keziah Naggita.

We would like you to meet Dr. Joelle Pineau, an astounding leader in AI, based in Montreal, Canada.

Name: Joelle Pineau

Institutions: Joelle Pineau is a faculty member at Mila and an Associate Professor and William Dawson Scholar at the School of Computer Science at McGill University, where she co-directs the Reasoning and Learning Lab. She is a senior fellow of the Canadian Institute for Advanced Research (CIFAR), a co-managing director of Facebook AI Research, and the Montreal, Canada lab director. Learn more about Joelle here and her talk here.

Reinforcement Learning (RL)

How and why did you choose to work in reinforcement learning?   What are the things that inspired you to choose health as a domain of application for your RL work?

I started working in reinforcement learning at the beginning of my PhD in robotics at CMU. Quite honestly, I was delighted by the elegance of the mathematical formulation. It also had some link to topics I studied previously (in supervised learning & in operations research). It was also useful for decision-making, which was complementary to state tracking & prediction, which was the topic studied by many other members of my lab at the time.

I started working on applications to health-care early in my career as a faculty member at McGill. I was curious to explore practical applications, and found some colleagues in health-care who had some interesting decision-making problems with the right characteristics.

How would you recommend a newcomer enter the RL field?  For RL researchers interested in safety, is there some literature you can recommend as a starting point?

Get familiar with the basic mathematical formalism & algorithms, and try your hand at easy simulation cases. For RL and safety, the literature is very small and quite recent, so it’s easy enough to get started. Work on Constrained MDPs (Altman, 1999) is a good starting point. See also the work on Seldonian RL, by Phil Thomas and colleagues.

In your talk you mentioned applications of RL to different domains. What do you think is the main achievement of RL? 

The AlphaGo result was very impressive!  Recently, the work on using RL to control the flight of the Loon balloons is also quite impressive.

What are the big open problems in RL? 

Efficient exploration continues to be a major challenge. Stability of learning, even when the data is non-stationary (e.g. due to policy change), is also very important to address. In my talk I also highlighted the importance of developing methods for RL with responsible properties (safety, security, transparency, etc.) as a major open problem.


Based on your work in neurostimulation, it appears that people from different fields of expertise were involved. 

Yes, this was a close collaboration between researchers in CS (my own lab) and researchers in neuroscience, with expertise in electrophysiology.

What advice would you give researchers in finding interdisciplinary collaborators?

This collaboration was literally started by me picking up the phone and calling a colleague in neuroscience to propose the project.  I then wrote a grant proposal and obtained funding to start the project.  More generally, these days it’s actually very easy for researchers in machine learning to find interdisciplinary collaborators.  Giving talks, offering office hours, speaking to colleagues you meet in random events – I’ve had literally dozens of projects proposed to me in the last few years, from all sorts of disciplines.

What are some of the best ways to foster successful collaborations tackling work cutting across multiple disciplines?

Spend time understanding the problems from the point of view of your collaborator, and commit to solving *that* problem.  Don’t walk in with  your own hammer (or pre-selected set of techniques), and expect to find a problem to show-off your techniques. Genuine curiosity about the other field is very valuable!  Don’t hesitate to read the literature – don’t expect your collaborator to share all the needed knowledge.  Co-supervising a student together is also often an effective way of working closely together.

Academia, industry and everything in between 

During the talk, you mentioned variance in freedom of research for theoreticians in industry versus academia. Could you elaborate more about this? Are there certain personality traits or characteristics more likely to make someone more successful in academia versus industry?

For certain more theoretical work, it can be a long time until the impact and value of the work is realized. This is perhaps harder to support in industry, which is better suited to appreciating shorter-term impact. Another big difference is that in academia, professors work closely with students and junior researchers, and should expect to dedicate a good amount of time and energy to training & developing them (even if it means the work might move along a bit slower). In industry, a researcher will most often work with more senior researchers, and the project is likely to move along faster (also because no one is taking or teaching courses).

How do you balance leadership, for example, at FAIR, with student advising like at McGill, research [CIFAR, FAIR, McGill, Mila], and personal life?

It’s useful to have clarity about your priorities.  Don’t let other people dictate what these are – you should decide for yourself.  And then spend your time according to this.  I enjoy my work at FAIR a lot, I also really enjoy spending time with my grad students at McGill/Mila, and of course I really enjoy time with my family & friends.  So I try to keep a good balance between all of this. I also try to be clear & transparent with other people about my availability & priorities, so they can plan accordingly.

What do you think distinguishes the mindset of an extraordinary researcher?

To be a strong researcher, it helps to be very curious, genuinely want to understand and find out new knowledge. The ability to find new connections between ideas, concepts, is also useful.  For scientific research, you also need discipline and good methodology, and a commitment to deep understanding (rather than “proving” whatever hypothesis you hold).   Frankly, I also don’t think we need to further cultivate the myth of the “extraordinary researcher”.  Research is primarily a collective institution, where many people contribute, in ways small and big, and it is through this collective work that we achieve big discoveries and breakthroughs!

HOMER: Provable Exploration in Reinforcement Learning

Last week at ICML 2020, Mikael Henaff, Akshay Krishnamurthy, John Langford and I had a paper on a new reinforcement learning (RL) algorithm that solves three key problems in RL: (i) global exploration, (ii) decoding latent dynamics, and (iii) optimizing a given reward function. Our ICML poster is here.

The paper is a bit mathematically heavy in nature so this post is an attempt to distill the key findings. We will also be following up soon with a new codebase release (more on it later).

Rich-observation RL landscape

Consider the combination lock problem shown below. The agent starts in the state s_{1,a} or s_{1,b} with equal probability. After taking h-1 actions, the agent will be in either state s_{h,a}, s_{h,b}, or s_{h,c}. The agent can take 10 different actions. The agent observes a high-dimensional observation (focus circle) instead of the underlying state, which is latent. There is a big treasure chest that one can get after taking 100 actions. We view the states with subscript “a” or “b” as “good states” and those with subscript “c” as “bad states”. You can reach the treasure chest at the end only if you remain in good states. If you reach any bad state, then you can never make it to the treasure chest.

The environment makes it difficult to reach the big treasure chest in three ways. First, the environmental dynamics are such that if you are in good states, then only 1 out of 10 possible actions will let you reach the two good states at the next time step with equal probability (the good action changes from state to state). Every other action in good states and all actions in bad states put you into bad states at the next time step, from which it is impossible to recover. Second, it misleads myopic agents by giving a small bonus for transitioning from a good state to a bad state (small treasure chest). This means that a locally optimal policy is to transition to one of the bad states as quickly as possible. Third, the agent never directly observes which state it is in. Instead, it receives a high-dimensional, noisy observation from which it must decode the true underlying state.

It is easy to see that if we take actions uniformly at random, then the probability of reaching the big treasure chest at the end is 1/10^100. The number 10^100 is called a googol and is larger than the current estimate of the number of elementary particles in the universe. Furthermore, since transitions are stochastic, one can show that no fixed sequence of actions performs well either.
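As a rough illustration of this arithmetic, here is a minimal Python sketch. It models only the latent good/bad dynamics described above and ignores observations and rewards. The function names and the toy simulator are assumptions made for illustration and are not taken from the paper's code.

```python
import random
from fractions import Fraction


def random_policy_success_probability(horizon=100, num_actions=10):
    """Exact chance that uniformly random actions pick the one good action at every step."""
    return Fraction(1, num_actions) ** horizon


def simulate_random_policy(horizon, num_actions=10, episodes=20000, seed=0):
    """Monte Carlo estimate of the same probability on a toy latent-state lock."""
    rng = random.Random(seed)
    # good_action[h][s]: the one action that keeps the agent in a good state
    # at step h when it is in good state s (0 for "a", 1 for "b").
    good_action = [[rng.randrange(num_actions) for _ in range(2)] for _ in range(horizon)]
    successes = 0
    for _ in range(episodes):
        state = rng.randrange(2)  # start in good state "a" or "b" with equal probability
        reached_end = True
        for h in range(horizon):
            action = rng.randrange(num_actions)  # uniformly random action
            if action != good_action[h][state]:
                reached_end = False  # any other action leads to the absorbing bad state
                break
            state = rng.randrange(2)  # next good state is "a" or "b" with equal probability
        successes += reached_end
    return successes / episodes
```

With the full horizon of 100 the simulation would essentially never record a success, so the Monte Carlo estimate is only informative for very short horizons (for example, `simulate_random_policy(horizon=2)` should land near 0.01), while `random_policy_success_probability()` gives the exact value for the full problem.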

A key aspect of the rich-observation setting is that the agent receives observations instead of the latent state. The observations are stochastically sampled from an infinitely large space conditioned on the state. However, observations are rich enough to enable decoding the latent state that generates them.

What does provable RL mean?

A provable RL algorithm means that for any given numbers ε, δ in (0, 1), we can learn an ε-optimal policy with probability at least 1-δ using a number of episodes that is polynomial in the relevant quantities (state size, horizon, action space, 1/ε, 1/δ, etc.). By ε-optimal policy we mean a policy whose value (expected total return) is at most ε less than the optimal return.

Thus, a provable RL algorithm is capable of learning a close-to-optimal policy with high probability (where “high” and “close” can be made arbitrarily precise), provided the assumptions it makes are satisfied.
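One way to write this guarantee compactly is sketched below. The symbols are notation assumed here for illustration rather than taken from the paper: ε is the accuracy parameter, δ the failure probability, V(π) the expected total return of policy π, π* an optimal policy, π̂ the learned policy, n the number of episodes, and |S|, |A|, H the number of latent states, the number of actions, and the horizon.

```latex
\Pr\left[\, V(\pi^\star) - V(\hat{\pi}) \le \epsilon \,\right] \ge 1 - \delta,
\qquad
n \le \mathrm{poly}\left( |S|,\ |A|,\ H,\ \tfrac{1}{\epsilon},\ \tfrac{1}{\delta} \right)
```

The first statement is the ε-optimality requirement holding with probability at least 1-δ; the second says the number of episodes needed grows only polynomially in the relevant quantities.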

Why should I care if my algorithm is provable?

There are two main advantages of being able to show your algorithm is provable:

  1. We can only test an algorithm on a finite number of environments (in practice somewhere between 1 and 20). Without guarantees, we don’t know how it will behave in a new environment. This matters especially if failure in a new environment can result in high real-world costs (e.g., in health or financial domains).
  2. If a provable algorithm fails to consistently give the desired result, this can be attributed to failure of at least one of its assumptions. A developer can then look at the assumptions and try to determine which ones are violated, and either intervene to fix them or determine that the algorithm is not appropriate for the problem.


Our algorithm addresses what is known as the Block MDP setting. In this setting, a small number of discrete states generates a potentially infinite number of high dimensional observations.

For each time step, HOMER learns a state decoder function, and a set of exploration policies. The state decoder maps high-dimensional observations to a small set of possible latent states, while the exploration policies map observations to actions which will lead the agent to each of the latent states. We describe HOMER below.

  • For a given time step, we first learn a decoder for mapping observations to a small set of values using contrastive learning. This procedure works as follows: collect a transition by following a randomly sampled exploration policy from the previous time step until that time step, and then taking a single random action. We use this procedure to sample two transitions shown below.
  • We then flip a coin; if we get heads then we store the transition (x1, a1, x’1), and otherwise we store the imposter transition (x1, a1, x’2). We train a supervised classifier to predict if a given transition (x, a, x’) is real or not.
    This classifier has a special structure which allows us to recover a decoder for time step h.
  • Once we have learned the state decoder, we will learn an exploration policy for every possible value of the decoder (which we call abstract states, as they are related to the latent state space). This step is standard and can be done using many different approaches such as model-based planning, model-free methods, etc. In the paper we use an existing model-free algorithm called policy search by dynamic programming (PSDP) by Bagnell et al. 2004.
  • We have now recovered a decoder and a set of exploration policies for this time step. We then repeat this for every time step and learn a decoder and exploration policies for the whole latent state space. Finally, we can easily optimize any given reward function using any provable planner like PSDP or a model-based algorithm. (The algorithm actually recovers the latent state space up to an inherent ambiguity by combining two different decoders, but I’ll leave that out to avoid overloading this post.)
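The coin-flip step used to build the contrastive training set can be sketched as follows. This is a minimal illustration, not the released implementation: `sample_transition` is a hypothetical callable standing in for "follow a randomly sampled exploration policy from the previous time step, then take one random action", and the classifier training itself is omitted.

```python
import random


def make_contrastive_example(sample_transition, rng):
    """Return one (x, a, x_next, label) example; label 1 = real, 0 = imposter."""
    # sample_transition is a hypothetical stand-in for the data collection step.
    x1, a1, x1_next = sample_transition()
    _x2, _a2, x2_next = sample_transition()
    if rng.random() < 0.5:
        # Heads: keep the real transition (x1, a1, x'1).
        return (x1, a1, x1_next, 1)
    # Tails: pair (x1, a1) with the next observation of the other transition.
    return (x1, a1, x2_next, 0)


def build_contrastive_dataset(sample_transition, num_examples, seed=0):
    """Collect a balanced-in-expectation dataset for the real-vs-imposter classifier."""
    rng = random.Random(seed)
    return [make_contrastive_example(sample_transition, rng) for _ in range(num_examples)]
```

Any supervised classifier can then be trained on the resulting labeled tuples. The special structure that lets HOMER read a decoder off the classifier is described in the paper and is not reproduced in this sketch.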

Key findings

HOMER achieves the following three properties:

  1. The contrastive learning procedure gives us the right state decoding (we recover up to some inherent ambiguity but I won’t cover it here).
  2. HOMER can learn a set of exploration policies to reach every latent state.
  3. HOMER can learn a nearly-optimal policy for any given reward function with high probability. Further, this can be done after the exploration part has been performed.

Failure cases of prior RL algorithms

There are many RL algorithms in the literature and many new ones are proposed every month. It is difficult to do justice to this vast literature in a blog post. It is equally difficult to situate HOMER in this vast literature. However, we show that several very commonly used RL algorithms fail to solve the above problem while HOMER succeeds. One of these is the PPO algorithm, a widely used policy gradient algorithm. In spite of its popular use, PPO is not designed for challenging exploration problems and easily fails. Researchers have made efforts to alleviate this with ad-hoc proposals such as using prediction errors, counts based on auto-encoders, etc. The best alternative approach we found is called Random Network Distillation (RND), which measures the novelty of a state based on prediction errors for a fixed randomly initialized network.
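To make the RND idea concrete, here is a toy sketch in plain Python. It is heavily simplified and is not the published RND architecture: both networks are single layers, the predictor is linear, and the update is a normalized gradient step chosen here for numerical stability. The class name and parameters are assumptions for illustration.

```python
import math
import random


class RNDBonus:
    """Toy Random Network Distillation bonus with one-layer networks."""

    def __init__(self, obs_dim, feat_dim=4, lr=0.5, seed=0):
        rng = random.Random(seed)
        # Fixed, randomly initialized target network (never trained).
        self.target = [[rng.gauss(0, 1) for _ in range(obs_dim)] for _ in range(feat_dim)]
        # Predictor network, trained to imitate the target on visited observations.
        self.predictor = [[0.0] * obs_dim for _ in range(feat_dim)]
        self.lr = lr

    def _errors(self, obs):
        """Per-feature difference between predictor output and target output."""
        errs = []
        for t_row, p_row in zip(self.target, self.predictor):
            t = math.tanh(sum(w * x for w, x in zip(t_row, obs)))
            p = sum(w * x for w, x in zip(p_row, obs))
            errs.append(p - t)
        return errs

    def bonus(self, obs):
        """Novelty bonus: squared prediction error on this observation."""
        return sum(e * e for e in self._errors(obs))

    def update(self, obs):
        """One normalized gradient step pulling the predictor toward the target."""
        norm_sq = sum(x * x for x in obs) + 1e-8
        for p_row, e in zip(self.predictor, self._errors(obs)):
            for j, x in enumerate(obs):
                p_row[j] -= self.lr * e * x / norm_sq
```

Observations that have been visited (and trained on) many times end up with a small bonus, while unfamiliar observations keep a large one. In PPO+RND-style methods this bonus is added to the environment reward to encourage visiting novel states.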

Below we show how PPO+RND fails to solve the above problem while HOMER succeeds. We simplify the problem by using a grid pattern where rows represent the state (the top two rows represent “good” states and the bottom row represents “bad” states), and columns represent time steps.

We present counter-examples for other algorithms in the paper (see Section 6 here). These counterexamples allow us to find limits of prior work without expensive empirical computation on many domains.

How can I use HOMER?

We will be providing the code soon as part of a new package release called cereb-rl. You can find it here: https://github.com/cereb-rl and join the discussion here: https://gitter.im/cereb-rl

Coronavirus and Machine Learning Conferences

I’ve been following the renamed COVID-19 epidemic closely since potential exponentials deserve that kind of attention.

The last few days have convinced me it’s a good idea to start making contingency plans for machine learning conferences like ICML. The plausible options happen to be structurally aligned with calls to enable reduced travel to machine learning conferences, but of course the need is much more immediate.

I’ll discuss relevant observations about COVID-19 and then the impact on machine learning conferences.

COVID-19 observations

  1. COVID-19 is capable of exponentiating with a base estimated at 2.13-3.11 and a doubling time around a week when unchecked.
  2. COVID-19 is far more deadly than the seasonal flu with estimates of a 2-3% fatality rate but also much milder than SARS or MERS. Indeed, part of what makes COVID-19 so significant is the fact that it is mild for many people leading to a lack of diagnosis, more spread, and ultimately more illness and death.
  3. COVID-19 can be controlled at a large scale via draconian travel restrictions. The number of new observed cases per day peaked about 2 weeks after China’s lockdown and has been declining for the last week.
  4. COVID-19 can be controlled at a small scale by careful contact tracing and isolation. There have been hundreds of cases spread across the world over the last month which have not created new uncontrolled outbreaks.
  5. New significant uncontrolled outbreaks in Italy, Iran, and South Korea have been revealed over the last few days. Some details:
    1. The 8 COVID-19 deaths in Iran suggest that the few reported cases (as of 2/23) are only the tip of the iceberg.
    2. The fact that South Korea and Italy can suddenly discover a large outbreak despite heavy news coverage suggests that it can really happen anywhere.
    3. These new outbreaks suggest that in a few days COVID-19 is likely to become a world-problem with a declining China aspect rather than a China-problem with ramifications for the rest of the world.
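To see why a doubling time of about a week (observation 1) deserves attention, here is a back-of-the-envelope sketch. It assumes clean exponential growth with a fixed doubling time, which real outbreaks only approximate; the function name and default are illustrative.

```python
def growth_factor(days, doubling_time_days=7.0):
    """Multiplicative growth in case counts after `days` of unchecked spread."""
    return 2.0 ** (days / doubling_time_days)
```

Under this assumption, eight weeks of unchecked spread corresponds to eight doublings, that is, `growth_factor(56)` is a 256-fold increase.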

There remains quite a bit of uncertainty about COVID-19, of course. The plausible bet is that the known control measures remain effective when and where they can be exercised with new ones (like a vaccine) eventually reducing it to a non-problem.

The plausible scenario leaves conferences still in a delicate position because they require many things to go right to function. We can easily envision 3 quite different futures here consistent with the plausible case.

  1. Good case: New COVID-19 outbreaks are systematically controlled via proven measures with the overall number of daily cases declining steadily as they are right now. The impact on conferences is marginal with lingering travel restrictions affecting some (<10%) potential attendees.
  2. Poor case: Multiple COVID-19 outbreaks turn into a pandemic (=multi-continent epidemic) in regions unable to effectively exercise either control measure. Outbreaks in other regions occur, but they are effectively controlled. The impact on conferences is significant with many (50%?) avoiding travel due to either restrictions or uncertainty about restrictions.
  3. Bad case: The same as (2), except that an outbreak occurs in the area of the conference. This makes the conference nonviable due to travel restrictions alone. It’s notable here that Italy’s new outbreak involves travel lockdowns a few hundred miles/kilometers from Vienna where ICML 2020 is planned.

Even the first outcome could benefit from some planning while gracefully handling the last outcome requires it.

The obvious response to these plausible scenarios is to reduce the dependence of a successful conference on travel. To do this we need to think about what a conference is in terms of the roles that it fulfills. The quick breakdown I see is:

  1. Distilling knowledge. Luckily, our review process is already distributed.
  2. Passing on knowledge.
  3. Meeting people, both old friends and discovering new ones.
  4. Finding a job / employee.

How (and which) of these can be effectively supported remotely?

I’m planning to have discussions over the next few weeks about this to distill out some plans. If you have good ideas, let’s discuss. Unlike most contingency planning, it seems likely that efforts are not wasted no matter what the outcome 🙂

Code submission should be encouraged but not compulsory

ICML, ICLR, and NeurIPS are all considering or experimenting with code and data submission as a part of the review or publication process with the hypothesis that it aids reproducibility of results. Reproducibility has been a rising concern, with discussions in papers, workshops, and invited talks.

The fundamental driver is of course lack of reproducibility. Lack of reproducibility is an inherently serious and valid concern for any kind of publishing process where people rely on prior work to compare with and do new things. Lack of reproducibility (due to random initialization for example) was one of the things leading to a period of unpopularity for neural networks when I was a graduate student. That has proved nonviable (Surprise! Learning circuits is important!), but the reproducibility issue remains. Furthermore, there is always an opportunity and latent suspicion that authors ‘cheat’ in reporting results which could be allayed using a reproducible approach.

With the above said, I think the reproducibility proponents should understand that reproducibility is a value but not an absolute value. As an example here, I believe it’s quite worthwhile for the community to see AlphaGoZero published even if the results are not necessarily easily reproduced. There is real value for the community in showing what is possible irrespective of whether or not another game with the same master of Go is possible, and there is real value in having an algorithm like this be public even if the code is not. Treating reproducibility as an absolute value could exclude results like this.

An essential understanding here is that machine learning is (at least) 3 different kinds of research.

  • Algorithms: The goal is coming up with a better algorithm for solving some category of learning problems. This is the most typical viewpoint at these conferences.
  • Theory: The goal is generally understanding what is possible or not possible for learning algorithms. Although these papers may have algorithms, they are often not the point and demanding an implementation of them is a waste of time for author, reviewer, and reader.
  • Applications: The goal is solving some particular task. AlphaGoZero is a reasonable example of this: it was about beating the world champion in Go with algorithmic development in service of that. For this kind of research perfect programmatic reproducibility may be infeasible because the computation is too extreme, the data is proprietary, etc.

Using a one-size-fits-all approach where you demand that every paper “is” a programmatically reproducible implementation is a mistake that would create a division that reduces our community. Keeping this three-fold focus fundamentally enriches the community both literally and ontologically.

Another view here is provided by considering the argument at a wider scope. Would you prefer that health regulations/treatments be based on all scientific studies including those where data is not fully released to the public (i.e., almost all of them for privacy reasons)? Or would you prefer that health regulations/treatments be based only on data fully released to the public? Preferring the latter is equivalent to ignoring most scientific studies in making decisions.

The alternative to a compulsory approach is to take an additive view. The additive approach has a good track record amongst reviewing process changes.

  • When I was a graduate student, papers were not double blind. The community switched to double blind because it adds an opportunity for reviewers to review fairly and it gives authors a chance to have their work reviewed fairly whether they are junior or senior. As a community we also do not restrict posting on arxiv or talks about a paper before publication, because that would subtract from what authors can do. Double blind reviewing could be divisive, but it is not when used in this fashion.
  • When I was a graduate student, there was also a hard limit on the number of pages in submissions. For theory papers this meant that proofs were not included. We changed the review process to allow (but not require) submission of an appendix which could optionally be used by reviewers. This again adds to the options available to authors/reviewers and is generally viewed as positive by everyone involved.

What can we add to the community in terms of reproducibility?

  1. Can reviewers do a better job of reviewing if they have access to the underlying code or data?
  2. Can authors benefit from releasing code?
  3. Can readers of a paper benefit from an accompanying code release?

The answer to each of these questions is a clear ‘yes’ if done right.

For reviewers, it’s important to not overburden them. They may lack the computational resources, platform, or personal time to do a full reproduction of results even if that is possible. Hence, we should view code (and data) submission in the same way as an appendix which reviewers may delve into and use if they so desire.

For authors, code release has two benefits—it provides an additional avenue for convincing reviewers who default to skeptical and it makes followup work significantly more likely. My most cited paper was Isomap which did indeed come with a code release. Of course, this is not possible or beneficial for authors in many cases. Maybe it’s a theory paper where the algorithm isn’t the point? Maybe either data or code can’t be fully released since it’s proprietary? There are a variety of reasons. From this viewpoint we see that releasing code should be supported and encouraged but optional.

For readers, having code (and data) available obviously adds to the depth of value that a paper has. Not every reader will take advantage of that but some will and it enormously reduces the barrier to using a paper in many cases.

Let’s assume we do all of these additive and enabling things, which is about where Kamalika and Russ aimed the ICML policy this year.

Is there a need to go further towards compulsory code submission? I don’t yet see evidence that default skeptical reviewers aren’t capable of weighing the value of reproducibility against other values in considering whether a paper should be published.

Should we do less than the additive and enabling things? I don’t see why—the additive approach provides pure improvements to the author/review/publish process. Not everyone is able to take advantage of this, but that seems like a poor reason to restrict others from taking advantage when they can.

One last thing to note is that this year’s code submission process is an experiment. We should all want program chairs to be able to experiment, because that is how improvements happen. We should do our best to work with such experiments, try to make a real assessment of success/failure, and expect adjustments for next year.