John Langford – Page 35 – Machine Learning (Theory)

There are at least 3 summer schools related to machine learning this summer.

The first is at University of Chicago June 1-11 organized by Misha Belkin, Partha Niyogi, and Steve Smale. Registration is closed for this one, meaning they met their capacity limit. The format is essentially an extended Tutorial/Workshop. I was particularly interested to see Valiant amongst the speakers. I’m also presenting Saturday June 6, on logarithmic time prediction.
Praveen Srinivasan points out the second at Peking University in Beijing, China, July 20-27. This one differs substantially, as it is about vision, machine learning, and their intersection. The deadline for applications is June 10 or 15. This is also another example of the growth of research in China, with active support from NSF.
The third one is at Cambridge, England, August 29-September 10. It’s in the MLSS series. Compared to the Chicago one, this one is more about the Bayesian side of ML, although effort has been made to create a good cross section of topics. It’s also more focused on tutorials over workshop-style talks.

I recently had fun discussions with both Vikash Mansinghka and Thomas Breuel about approaching AI with machine learning. The general interest in taking a crack at AI with machine learning seems to be rising on many fronts including DARPA.

As a matter of history, there was a great deal of interest in AI which died down before I began research. There remain many projects and conferences spawned in this earlier AI wave, as well as a good bit of experience about what did not work, or at least did not work yet. Here are a few examples of failure modes that people seem to run into:

Supply/Product confusion. Sometimes we think “Intelligences use X, so I’ll create X and have an Intelligence.” An example of this is the Cyc Project which inspires some people as “intelligences use ontologies, so I’ll create an ontology and a system using it to have an Intelligence.” The flaw here is that Intelligences create ontologies, which they use, and without the ability to create ontologies you don’t have an Intelligence. If we are lucky, the substantial effort invested in Cyc won’t be wasted, as it has a large quantity of information stored in a plausibly useful format. If we are unlucky, it fails to even be partially useful, because the format is unnatural for the internal representations of an Intelligence.
Uncertainty second. Many of the older AI programs had no role for uncertainty. If you asked the people working on them, they might agree that uncertainty was an important but secondary concern to be solved after the main problem. Unfortunately, it seems that uncertainty is a primary concern in practice. One example of this is blocks world where a system for planning how to rearrange blocks on a table might easily fail in practice because the robot fails to grab a block properly. Many people think of uncertainty as a second order concern, because they don’t experience uncertainty in their daily lives. I believe this is incorrect—a mental illusion due to the effect that focusing attention on a specific subject implies reducing uncertainty on that subject. More generally, because any Intelligence is a small part of the world, the ability of any intelligence to perceive, understand, and manipulate the world is inherently limited, requiring the ability to deal with uncertainty. For statistics & ML people, it’s important to not breath a sigh of relief too easily, as the problem is pernicious. For example many ML techniques based around conditional independence routinely suffer from excess certainty.
Computation second. Some people try to create an intelligence without reference to efficient computation. AIXI is an extreme example of this sort. The algorithm is very difficult to deploy in practice because there were no computational constraints other than computability designed into it’s creation. It’s important to understand that computational constraints and uncertainty go together: because there are computational constraints, an intelligence is forced to deal with uncertainty since not everything which might follow at a mathematical level can be inferred in the available computational budget.
AI-Hard problems. There was a time when some people thought, “If we could just get a program that mastered chess so well it could beat the best humans, we will learn enough about AI to create an AI.” Deep Blue put that theory to rest. Current efforts on Poker and Go seem more promising, but no one believes they are “AI-Hard” for good reason. It’s not even clear that the Turing Test is a reliable indicator, because (for example) we might imagine that there is Intelligence which can not imitate a human, or that there are programs that can imitate humans well enough to fool humans without being able to achieve everything that an Intelligence could. Perhaps the best evidence is something singularity-style: AI exists when it can substantially improve it’s own abilities.
Asymptopia. In machine learning there are many theorems of the form “learning algorithm A can solve any learning problem in the limit of infinite data”. Here A might be nearest neighbors, decision trees, two-layer neural networks, support vector machines, nonparametric statistics, nonparametric Bayes, or something else. These theorem are ok, but insufficient. Often the algorithms are not computationally acceptable, and even if so, they are not sufficiently efficient with respect to the amount of experience required to learn.

Solving AI is undeniably hard, as evidenced by the amount of time spent on it, and the set of approaches which haven’t succeeded. There are a couple reasons for hope this time. The first is that there is, or soon will be sufficient computation available, unlike the last time. The second is that the machine learning approach fails well, because there are industrial uses for machine learning. Consequently, we can expect a lack of success to still see substantial use in practice. This might sound like “a good downside”, but it’s actually an upside, because it implies that incremental progress has the potential for ultimate success.

Restated at an abstract level: a hard problem can generally be decomposed in many ways into subproblems. Amongst all such decompositions, a good decomposition is one with the property that solutions to the subproblems are immediately useful. The machine learning approach to AI has this goodness property, unlike many other approaches, which partially explains why the ML approach is successful despite “failing” so far to achieve AI.

One reason why AI is hard, is that it turns out tackling general problems in the world undeniably requires a substantial number of different strategies, including learning, searching, and chunking (= constructing macros), all while respecting constraints of computation and robustness to uncertainty. Given this, a fair strategy seems to be first mastering one strategy, and then incorporating others, always checking that that incorporation properly addresses real world problems. In doing this, considering the constraint ignoring approaches as limiting cases of the real system may be helpful.

Author: John Langford

Many ways to Learn this summer

2009 ICML discussion site

CI Fellows

Server Update

Machine Learning to AI