What Learning Theory might do

I want to expand on this post and some of the previous problems and research directions, discussing where learning theory might make large strides.

  1. Why theory? The essential reason for theory is “intuition extension”. A very good applied learning person can master some particular application domain, yielding the best computer algorithms for solving that problem. A very good theorist can take the intuitions discovered by this and other applied learning people and extend them to new domains in a relatively automatic fashion. To do this, we take these basic intuitions and try to find a mathematical model that:
    1. Explains the basic intuitions.
    2. Makes new testable predictions about how to learn.
    3. Succeeds in learning when those predictions are followed.

    This is “intuition extension”: taking what we have learned somewhere else and applying it in new domains. It is fundamentally useful to everyone because it increases the level of automation in solving problems.

  2. Where next for learning theory? I like the analogy with physics. Back before we-the-humans knew much, people would experiment occasionally and learn to design new things by slow evolution. At some point the physics model arose: you try to build mathematical models of what is happening and then make predictions based on the models. This was wildly successful for physics. For machine learning, it has only been moderately successful. We have some formalisms which are of some use in addressing novel learning problems, but the overall process of doing machine learning is not very close to “automatic”. The good news is that over the last 20 years a much richer set of positive examples of successful applied machine learning has developed. Thus, there are many good intuitions from which we can hope to generalize. In the physics analogy, the year is (perhaps) 1900. Here are a few specific issues:
    1. What is the “right” mathematical model of learning? (In analogy: what is the “right” mathematical model of physical phenomena?) The models we currently use have their compelling points but typically fail to capture all of the relevant details. This is a very hard question to address, but it should be actively considered, and any progress may be very helpful. Examples of this include:
      1. What is the “right” model of active learning? We know almost nothing except there is great potential.
      2. What is the “right” model of reinforcement learning? Again, we know very little in comparison to what we want to know: a fully automatic general RL solver.

      The notion of “right” here is partially theoretical (can we derive efficient algorithms?) and partially empirical (do they actually work?).

    2. How do we refine the empirical observations and intuitions of applied learning?
      1. How should we think about “prior”? The Bayesian answer seems unconvincing. At a minimum, information used to create a Bayesian prior often does not come in the form of a Bayesian prior, and so some translation system must be developed.
      2. How can we develop big learning systems that solve big problems? Some form of structure seems necessary, but the right form is still unclear. What theory governs the design of such systems?
    3. How do we take existing theoretical insights and translate them into practical algorithms?
      1. Linear projection into lower-dimensional spaces has been studied theoretically. Is it useful empirically? (A sketch follows this list.)
      2. The online learning setting seems theoretically compelling and, at least sometimes, empirically validated. What concerns remain to be addressed to make this a useful technology? (A sketch of the setting also follows this list.)
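
To make the projection question concrete, here is a minimal numpy sketch of one theoretically well-studied variant: random Gaussian projection in the Johnson–Lindenstrauss style. The function name, dimensions, and scaling here are illustrative assumptions, not anything specified above.

```python
import numpy as np

def random_project(X, k, seed=0):
    """Map the rows of X from d dimensions down to k dimensions with a
    random Gaussian matrix. The Johnson-Lindenstrauss lemma says pairwise
    distances are approximately preserved with high probability once k
    grows like the logarithm of the number of points."""
    d = X.shape[1]
    rng = np.random.default_rng(seed)
    # Scale by 1/sqrt(k) so squared norms are preserved in expectation.
    R = rng.normal(0.0, 1.0 / np.sqrt(k), size=(d, k))
    return X @ R

# 100 points in 10,000 dimensions compressed to 50 dimensions.
X = np.random.default_rng(1).normal(size=(100, 10_000))
Z = random_project(X, k=50)
print(Z.shape)  # (100, 50)
```

For the online setting, an equally minimal sketch: one pass of online gradient descent on logistic loss, predicting before each update. Again, the function name and the fixed learning rate are assumptions for illustration. Part of the theoretical appeal is that regret guarantees for updates of this kind hold per-sequence, without distributional assumptions.

```python
import numpy as np

def online_logistic(stream, d, eta=0.1):
    """Online gradient descent on logistic loss. `stream` yields (x, y)
    pairs with x a length-d array and y in {0, 1}; we predict P(y = 1)
    for each example, then update the weights with its gradient step."""
    w = np.zeros(d)
    for x, y in stream:
        p = 1.0 / (1.0 + np.exp(-w @ x))  # predict before seeing y
        yield p
        w -= eta * (p - y) * x            # gradient of log loss in w
```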

We should keep in mind that there is a real chance the limits of machine learning are lower bounded by human learning: whatever humans can learn, machines plausibly can too. Getting from here to there will, of course, require a bit of work, some of which might be greatly aided by mathematical consideration.

“Sister Conference” presentations

Some of the “sister conference” presentations at AAAI have been great. Roughly speaking, the conference organizers asked other conference organizers to come give summaries of their conferences. Many different AI-related conferences accepted. The presenters typically discuss some of the background and goals of the conference, then mention results from a few papers they liked. This is great because it provides a mechanism to get a digested overview of the work of several thousand researchers, something which is simply available nowhere else.

Based on these presentations, it looks like there is a significant component of (and opportunity for) applied machine learning in AIIDE, IUI, and ACL.

There was also some discussion of having a super-colocation event similar to FCRC, but centered on AI & Learning. This seems like a fine idea. The field is fractured across so many different conferences that the mixing induced by a super-colocation seems likely to help research.

Thinking the Unthought

One thing common to much research is that the researcher must be the first person ever to have some thought. How do you think of something that has never been thought of? There seems to be no methodical manner of doing this, but there are some tricks.

  1. The easiest method is to just have some connection come to you. There is a trick here, however: you should write it down and fill out the idea immediately, because it can just as easily go away.
  2. A harder method is to set aside a block of time and simply think about an idea. Distraction elimination is essential here because thinking about the unthought is hard work which your mind will avoid.
  3. Another common method is conversation. Sometimes the process of verbalizing makes new ideas come up, and sometimes whoever you are talking to replies in just the right way. This method is dangerous, though: you must speak to someone who helps you think rather than someone who occupies your thoughts.
  4. Try to rephrase the problem so the answer is simple. This is one aspect of giving up. Failing fast is better than failing slow.

There are also general ‘context development’ techniques which are not specifically helpful for your problem, but which are generally helpful for related problems.

  1. Understand the multiple motivations for working on some topic, when they exist.
  2. Question the “rightness” of every related thing. This is fundamental to developing good judgement about what you work on.
  3. Let a little bit of chaos into your life. Once in a while, attend a random conference, talk to people who you would not otherwise talk to, etc…

The Limits of Learning Theory

Suppose we had an infinitely powerful mathematician sitting in a room and proving theorems about learning. Could he solve machine learning?

The answer is “no”. This answer is both obvious and sometimes underappreciated.

There are several ways to conclude that some bias is necessary in order to successfully learn. For example, suppose we are trying to solve classification. At prediction time, we observe some features X and want to make a prediction of either 0 or 1. Bias is what makes us prefer one answer over the other based on past experience. In order to learn we must:

  1. Have a bias. Always predicting that 0 is as likely as 1 is useless.
  2. Have the “right” bias. Predicting 1 when the answer is 0 is also not helpful.

The implication of “have a bias” is that we cannot design effective learning algorithms with “a uniform prior over all possibilities”. The implication of “have the ‘right’ bias” is that our mathematician fails, since “right” is defined with respect to the solutions to problems encountered in the real world. The same effect occurs in various sciences such as physics: a mathematician cannot solve physics because the “right” answer is defined by the world.
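
To spell out the first point, the standard counting (“no free lunch”) argument shows why a uniform prior fails. This is a sketch under the simplifying assumption of a finite feature space X:

```latex
% Draw the true labeling f uniformly from all 2^{|X|} functions X -> {0,1}.
% For any learner producing a predictor \hat{y} from examples
% (x_1, f(x_1)), ..., (x_n, f(x_n)), and any point x outside the sample:
\mathbb{E}_{f \sim \mathrm{Unif}(\{0,1\}^X)}
  \left[ \mathbf{1}\{\hat{y}(x) = f(x)\} \right] = \tfrac{1}{2}
% Under the uniform distribution, f(x) is independent of the training
% labels, so every predictor is reduced to a coin flip off-sample.
```

In other words, with a uniform prior over all labelings, no algorithm can predict better than chance on unseen points; any advantage must come from a bias toward some labelings over others.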

A similar question is “Can an entirely empirical approach solve machine learning?” The answer to this is “yes”, as long as we count the evolution of humans as an empirical process and take a “solution” to machine learning to mean human-level learning ability.

A related question is then “Is mathematics useful in solving machine learning?” I believe the answer is “yes”. Although mathematics cannot tell us what the “right” bias is, it can:

  1. Give us computational shortcuts relevant to machine learning.
  2. Abstract empirical observations of what a good bias is, allowing transfer to new domains.

There is a reasonable hope that solving the mathematics related to learning implies we can reach a good machine learning system in less time than the evolution of a human required.

All of these observations imply that the process of solving machine learning must be partially empirical. (What works on real problems?) Anyone hoping to solve it must either engage in real-world experiments or listen carefully to people who do. A reasonable model here is physics, which has benefited from a combined mathematical and empirical study.