Machine Learning (Theory)

9/20/2005

Workshop Proposal: Atomic Learning

Tags: General jl@ 5:18 pm

This is a proposal for a workshop. It may or may not happen depending on the level of interest. If you are interested, feel free to indicate so (by email or comments).

Description:
Assume(*) that any system for solving large difficult learning problems must decompose into repeated use of basic elements (i.e. atoms). There are many basic questions which remain:

  1. What are the viable basic elements?
  2. What makes a basic element viable?
  3. What are the viable principles for the composition of these basic elements?
  4. What are the viable principles for learning in such systems?
  5. What problems can this approach handle?

Hal Daume adds:

  1. Can the composition of atoms be (semi-)automatically constructed?
  2. When atoms are constructed through reductions, is there some notion of the “naturalness” of the created learning problems?
  3. Other than Markov fields/graphical models/Bayes nets, is there a good language for representing atoms and their compositions?

The answers to these and related questions remain unclear to me. A workshop gives us a chance to pool what we have learned from some very different approaches to tackling this same basic goal.

(*) As a general principle, it’s very difficult to conceive of any system for solving any large problem which does not decompose.

Plan Sketch:

  1. A two day workshop with unhurried presentations and discussion seems appropriate, especially given the diversity of approaches.
  2. TTI-Chicago may be able to help with costs.

The above two points suggest having a workshop on a {Friday, Saturday} or {Saturday, Sunday} at TTI-Chicago.

9/19/2005

NIPS Workshops

Tags: General jl@ 3:46 pm

Attendance at the NIPS workshops is highly recommended for both research and learning. Unfortunately, there does not yet appear to be a public list of workshops. However, I found the following workshop webpages of interest:

  1. Machine Learning in Finance
  2. Learning to Rank
  3. Foundations of Active Learning
  4. Machine Learning Based Robotics in Unstructured Environments

There are many more workshops. In fact, there are so many that it is not plausible anyone can attend every workshop they are interested in. Maybe in future years the organizers can spread them out over more days to reduce overlap.

Many of these workshops are accepting presentation proposals (due mid-October).

9/14/2005

The Predictionist Viewpoint

Tags: General jl@ 12:54 pm

Virtually every discipline of significant human endeavor has a way of explaining itself as fundamental and important. In all the cases I know of, they are both right (they are vital) and wrong (they are not solely vital).

  1. Politics. This is the one that everyone is familiar with at the moment. “What could be more important than the process of making decisions?”
  2. Science and Technology. This is the one that we-the-academics are familiar with. “The loss of modern science and technology would be catastrophic.”
  3. Military. “Without the military, a nation will be invaded and destroyed.”
  4. (insert your favorite here)

Within science and technology, the same thing happens again.

  1. Mathematics. “What could be more important than a precise language for establishing truths?”
  2. Physics. “Nothing is more fundamental than the laws which govern the universe. Understanding them is the key to understanding everything else.”
  3. Biology. “Without life, we wouldn’t be here, so clearly the study of life is fundamental.”
  4. Computer Science. “Everything is a computer. Controlling computation is fundamental to controlling the world.”

This post is a “me too” for machine learning. The basic claim is that all problems can be rephrased as prediction problems. In particular, for any agent (human or machine), there are things which are sensed and the goal is to make good predictions about which actions to take. Here are some examples:

  1. Soccer. Playing soccer with Peter Stone is interesting because he sometimes reacts to a pass before it is made. The ability to predict what will happen in the future is a huge edge in games.
  2. Defensive Driving is misnamed. It’s really predictive driving. You, as a driver, attempt to predict how the other cars around you can mess up, and take that into account in your own driving style.
  3. Predicting well can make you very wealthy by playing the stock market. Some companies have been formed around the idea of automated stock picking, with partial success. More generally, the idea of prediction as the essential ingredient is very common when gambling with stocks.
  4. Information markets generalize the notion of stock picking to make predictions about arbitrary facts.

Prediction problems are prevalent throughout our lives so studying the problems and their solution, which is a core goal of machine learning, is essential. From the predictionist viewpoint, it is not about what you know, what you can prove or infer, who your friends are, or how much wealth you have. Instead, it’s about how well you can predict (and act on predictions of) the future.

9/10/2005

“Failure” is an option

Tags: General jl@ 1:14 pm

This is about the hard choices that graduate students must make.

The cultural definition of success in academic research is to:

  1. Produce good research which many other people appreciate.
  2. Produce many students who go on to do the same.

There are fundamental reasons why this is success in the local culture. Good research appreciated by others means access to jobs. Having many students successful in the same way implies that there are a number of people who think in a similar way and appreciate your work.

In order to graduate, a phd student must live in an academic culture for a period of several years. It is common to adopt the culture’s definition of success during this time. It’s also common for many phd students to discover they are not suited to an academic research lifestyle. This collision of values and abilities naturally results in depression.

The most fundamental advice when this happens is: change something. Pick a new advisor. Pick a new research topic. Or leave the program (and do something else with your life).

The first two are relatively easy, but “Do something else with your life” is a hard choice for a phd student to make because they are immersed in and adopt a value system that does not value that choice. Remember here that the academic value system is not a universal value system. For example, many people want to do something that is immediately constructive and find this at odds with academic research (which is almost defined by “not immediate”). The world is big enough and diverse enough to support multiple value systems. Realizing this may be the key to making very good decisions in your life. A number of my friends made this decision and went to Google or into investment banking, places where they are deliriously happier (and more productive) than in their former lives.

9/6/2005

A link

Tags: General jl@ 2:48 pm

I read through some of the essays of Michael Nielsen today, and recommend them. Principles of Effective Research and Extreme Thinking are both relevant to several discussions here.

9/5/2005

Site Update

Tags: General jl@ 9:48 pm

I tweaked the site in a number of ways today, including:

  1. Updating to WordPress 1.5.
  2. Installing and heavily tweaking the Geekniche theme. Update: I switched back to a tweaked version of the old theme.
  3. Adding the Customizable Post Listings plugin.
  4. Installing the StatTraq plugin.
  5. Updating some of the links. I particularly recommend looking at the computer research policy blog.
  6. Adding threaded comments. This doesn’t thread old comments obviously, but the extra structure may be helpful for new ones.

Overall, I think this is an improvement, and it addresses a few of my earlier problems. If you have any difficulties or anything seems “not quite right”, please speak up. A few other tweaks to the site may happen in the near future.

9/4/2005

Science in the Government

Tags: General jl@ 9:18 am

I found the article on “Political Science” at the New York Times interesting. Essentially the article is about allegations that the US government has been systematically distorting scientific views. With a petition by some 7000+ scientists alleging such behavior, this is clearly a significant concern.

One thing not mentioned explicitly in this discussion is that there are fundamental cultural differences between academic research and the rest of the world. In academic research, careful, clear thought is valued. This value is achieved by both formal and informal mechanisms. One example of a formal mechanism is peer review.

In contrast, in the land of politics, the basic value is agreement. It is only with some amount of agreement that a new law can be passed or other actions can be taken. Since Science (with a capital ‘S’) has accomplished many things, it can be a significant tool in persuading people. This makes it compelling for a politician to use science as a mechanism for pushing agreement on their viewpoint.

Most scientists would not mind if their research is used in a public debate. The difficulty arises when the use of science is not representative of the beliefs of scientists. This can happen in many ways. For example, agreement is uncommon in research which implies that it is almost always possible, by carefully picking and choosing, to find one scientist who supports almost any viewpoint.

Such misrepresentations of scientific beliefs about the world violate the fundamental value of “careful, clear thought”, so they are regarded as fundamentally dangerous to the process of research. Naturally, fundamentally dangerous things are sensitive issues which can easily lead to large petitions.

This combination of mismatched values is what appears to be happening. It is less clear what should be done about it.

One response has been (as the article title suggests) politicization of science and scientists. For example the Union of Concerned Scientists (which organized the petition) has a viewpoint and is pushing it. As another example, anecdotal evidence suggests a strong majority of scientists in the US voted against Bush in the last presidential election.

I would prefer a different approach, which is essentially a separation of responsibilities. Given a sufficient separation of powers, scientists should be the most reliable source for describing and predicting the outcomes of some courses of action and the impact of new technologies. What is done with such information is up to the rest of the world. This style of “sharply defined well-separated powers” has worked fairly well elsewhere. Supreme court judges (who specialize in interpretation of law) are, by design, relatively unaffectable by the rest of politics. A newer example is the federal reserve board, which has been relatively unaffected by changes in politics, even though it is easy to imagine its powers could dramatically affect election outcomes. This last example is a matter of custom rather than constitutional law.

Neither of the above examples is perfect—the separation of powers has failed on multiple occasions. Nevertheless, it seems to be a useful ideal.

8/23/2005

(Dis)similarities between academia and open source programmers

Tags: General jl@ 2:14 am

Martin Pool and I recently discussed the similarities and differences between academia and open source programming.

Similarities:

  1. Cost profile Research and programming share approximately the same cost profile: A large upfront effort is required to produce something useful, and then “anyone” can use it. (The “anyone” is not quite right for either group because only sufficiently technical people could use it.)
  2. Wealth profile A “wealthy” academic or open source programmer is someone who has contributed a lot to other people in research or programs. Much of academia is a “gift culture”: whoever gives the most is most respected.
  3. Problems Both academia and open source programming suffer from similar problems.
    1. Whether (and which) open source programs are used is perhaps too often personality driven rather than driven by capability or usefulness. Similar phenomena can happen in academia with respect to directions of research.
    2. Funding is often a problem for both groups. Academics often invest many hours in writing grants, while open source programmers often simply are not paid.
  4. Both groups of people work in a mixed competitive/collaborative environment.
  5. Both groups use conferences as a significant mechanism of communication.

Given the similarities, it is not too surprising that there is significant cooperation between academia and open source programming, and it is relatively common to crossover from one to the other.

The differences are perhaps more interesting to examine because they may point out where one group can learn from the other.

  1. A few open source projects have achieved significantly larger scales than academia as far as coordination amongst many people over a long time. Big project examples include linux, apache, and mozilla. Groups of people of this scale in academia are typically things like “the ICML community”, or “people working on Bayesian learning”, which are significantly less tightly coupled than any of the above projects. This suggests it may be possible to achieve significantly larger close collaborations in academia.
  2. Academia has managed to secure significantly more funding than open source programmers. Funding typically comes from a mixture of student tuition and government grants. Part of the reason for better funding in academia is that it has been around longer and so been able to accomplish more. Perhaps governments will start funding open source programming more seriously if they produce an equivalent (with respect to societal impact) of the atom bomb.
  3. Academia has a relatively standard career path: grade school education, undergraduate education, graduate education, then apply for a job as a professor at a university. In contrast, the closest thing to a career path for open source programmers is something like “do a bunch of open source projects and become so wildly successful that some company hires you to do the same thing”. This is a difficult path, but perhaps it is slowly becoming easier and there is still much room for improvement.
  4. Open source programmers take significantly more advantage of modern tools for communication. As an example of this, Martin mentioned that perhaps half the people working on Ubuntu have blogs. In academia, they are still a rarity.
  5. Open source programmers have considerably more freedom of location. Academic research is almost always tied to a particular university or lab, while many people who work on open source projects can choose to live essentially anywhere with reasonable internet access.

8/22/2005

Do you believe in induction?

Tags: General jl@ 1:55 am

Foster Provost gave a talk at the ICML metalearning workshop on “metalearning” and the “no free lunch theorem” which seems worth summarizing.

As a review: the no free lunch theorem is the most complicated way we know of to say that a bias is required in order to learn. The simplest way to see this is in a nonprobabilistic setting. If you are given examples of the form (x,y) and you wish to predict y from x then any prediction mechanism errs half the time in expectation over all sequences of examples. The proof of this is very simple: on every example a predictor must make some prediction and by symmetry over the set of sequences it will be wrong half the time and right half the time. The basic idea of this proof has been applied to many other settings.
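To make the symmetry argument concrete, here is a minimal sketch (in Python, with toy deterministic predictors chosen purely for illustration and not taken from the talk) that averages a fixed prediction rule’s error over all binary label sequences of a given length. Any such rule averages an error rate of exactly one half.

```python
# Minimal sketch of the no-free-lunch symmetry argument on a toy problem.
# The predictors below are arbitrary illustrative choices.
from itertools import product

def average_error(predictor, n):
    """Average 0/1 error of a fixed predictor over all 2^n label sequences of length n."""
    total = 0.0
    for labels in product([0, 1], repeat=n):
        errors = sum(predictor(i) != y for i, y in enumerate(labels))
        total += errors / n
    return total / 2 ** n

# Any deterministic rule errs exactly half the time, averaged over all sequences.
print(average_error(lambda i: 0, n=8))      # always predict 0 -> 0.5
print(average_error(lambda i: i % 2, n=8))  # alternate 0/1   -> 0.5
```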

The simplistic interpretation of this theorem which many people jump to is “machine learning is dead”, since there can be no single learning algorithm which can solve all learning problems. This is the wrong way to think about it. In the real world, we do not care about the expectation over all possible sequences, but perhaps instead about some (weighted) expectation over the set of problems we actually encounter. It is entirely possible that we can form a prediction algorithm with good performance over this set of problems.

This is one of the fundamental reasons why experiments are done in machine learning. If we want to access the set of problems we actually encounter, we must do this empirically. Although we must work with the world to understand what a good general-purpose learning algorithm is, quantifying how good the algorithm is may be difficult. In particular, performing well on the last 100 encountered learning problems may say nothing about performing well on the next encountered learning problem.

This is where induction comes in. It has been noted by Hume that there is no mathematical proof that the sun will rise tomorrow which does not rely on unverifiable assumptions about the world. Nevertheless, the belief in sunrise tomorrow is essentially universal. A good general purpose learning algorithm is similar to ‘sunrise’: we can’t prove that we will succeed on the next learning problem encountered, but nevertheless we might believe it for inductive reasons. And we might be right.

8/8/2005

Apprenticeship Reinforcement Learning for Control

Tags: General jl@ 9:34 am

Pieter Abbeel presented a paper with Andrew Ng at ICML on Exploration and Apprenticeship Learning in Reinforcement Learning. The basic idea of this algorithm is:

  1. Collect data from a human controlling a machine.
  2. Build a transition model based upon the experience.
  3. Build a policy which optimizes the transition model.
  4. Evaluate the policy. If it works well, halt, otherwise add the experience into the pool and go to (2).

The paper proves that this technique will converge to some policy with expected performance near human expected performance assuming the world fits certain assumptions (MDP or linear dynamics).
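A minimal sketch of the loop in steps (1)-(4) is below. The helpers fit_transition_model, plan_policy, and evaluate are hypothetical placeholders passed in by the caller; this illustrates the control flow only, not the authors’ actual implementation.

```python
# Sketch of the apprenticeship learning loop (steps 1-4 above).
# fit_transition_model, plan_policy, and evaluate are hypothetical placeholders.
def apprenticeship_loop(expert_trajectories, performance_threshold,
                        fit_transition_model, plan_policy, evaluate, max_iters=50):
    experience = list(expert_trajectories)        # (1) start from human demonstrations
    policy = None
    for _ in range(max_iters):
        model = fit_transition_model(experience)  # (2) learn dynamics from all data so far
        policy = plan_policy(model)               # (3) optimize a policy against the model
        reward, trajectory = evaluate(policy)     # (4) run the policy on the real system
        if reward >= performance_threshold:       # good enough: halt
            return policy
        experience.append(trajectory)             # otherwise add the experience and repeat
    return policy
```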

This general idea of apprenticeship learning (i.e. incorporating data from an expert) seems very compelling because (a) humans often learn this way and (b) much harder problems can be solved. For (a), the notion of teaching is about transferring knowledge from an expert to novices, often via demonstration. To see (b), note that we can create intricate reinforcement learning problems where a particular sequence of actions must be taken to achieve a goal. A novice might be able to memorize this sequence given just one demonstration even though it would require experience exponential in the length of the sequence to discover the key sequence accidentally.

Andrew Ng’s group has exploited this to make this very fun picture.
(Yeah, that’s a helicopter flying upside down, under computer control.)

As far as this particular paper, one question occurs to me. There is a general principle of learning which says we should avoid “double approximation”, such as occurs in step (3) where we build an approximate policy on an approximate model. Is there a way to fuse steps (2) and (3) to achieve faster or better learning?

7/27/2005

Not goal metrics

Tags: General jl@ 9:10 am

One of the confusing things about research is that progress is very hard to measure. One of the consequences of being in a hard-to-measure environment is that the wrong things are often measured.

  1. Lines of Code The classical example of this phenomenon is the old lines-of-code-produced metric for programming. It is easy to imagine systems for producing many lines of code with very little work that accomplish very little.
  2. Paper count In academia, a “paper count” is an analog of “lines of code”, and it suffers from the same failure modes. The obvious failure mode here is that we end up with a large number of uninteresting papers since people end up spending a lot of time optimizing this metric.
  3. Complexity Another metric is “complexity” (in the eye of a reviewer) of a paper. There is a common temptation to make a method appear more complex than it is in order for reviewers to judge it worthy of publication. The failure mode here is unclean thinking. Simple effective methods are often overlooked in favor of complex relatively ineffective methods. This is simply wrong for any field. (Discussion at Lance’s blog.)
  4. Acceptance Rate “Acceptance rate” is the number of papers accepted/number of papers submitted. A low acceptance rate is often considered desirable for a conference. But:
    1. It’s easy to skew an acceptance rate by adding (or inviting) many weak or bogus papers.
    2. It’s very difficult to judge what, exactly, is good work in the long term. Consequently, a low acceptance rate can retard progress by simply raising the bar too high for what turns out to be a good idea when it is more fully developed. (Consider the limit where only one paper is accepted per year…)
    3. Accept/reject decisions can become more “political” and less about judging the merits of a paper/idea. With a low acceptance ratio, a strong objection by any one of several reviewers might torpedo a paper. The consequence of this is that papers become noncontroversial with a tendency towards incremental improvements.
    4. A low acceptance rate tends to spawn a multiplicity of conferences in one area. There is a strong multiplicity of learning-related conferences.

    (see also How to increase the acceptance ratios at top conferences?)

  5. Citation count Counting citations is somewhat better than counting papers because it is some evidence that an idea is actually useful. This has been particularly aided by automated citation counting systems like scholar.google.com and http://citeseer.ist.psu.edu/. However, there are difficulties—citation counts can be optimized using self-citation and “societies of mutual admiration” (groups of people who agree implicitly or explicitly to cite each other). Citations are also sometimes negative of the form “here we fix bad idea X”.
  6. See also the Academic Mechanism Design post for other ideas.

These metrics do have some meaning. A programmer who writes no lines of code isn’t very good. An academic who produces no papers isn’t very good. A conference that doesn’t aid information filtration isn’t helpful. Hard problems often require complex solutions. Important papers are often cited.

Nevertheless, optimizing these metrics is not beneficial for a field of research. In thinking about this, we must clearly differentiate 1) what is good for a field of research (solving important problems) and 2) what is good for individual researchers (getting jobs). The essential point here is that there is a disparity.

Any individual in academia cannot avoid being judged by these metrics. Attempts by an individual or a small group of individuals to ignore these metrics are unlikely to change the system (and likely to result in the individual or small group being judged badly).

I don’t believe there is an easy fix to this problem. The best we can hope for is incremental progress which takes the form of the leadership in the academic community introducing new, saner metrics. This is a difficult thing, particularly because any academic leader must have succeeded in the old system. Nevertheless, it must happen if academic-style research is to flourish.

In the spirit of being constructive, I’ll make one proposal which may address the “complexity” problem: judge the importance of a piece of work independent of the method. For a conference paper this might be done by changing the review process to have one “technical reviewer” and several “importance reviewers” rather than 3 or 4 reviewers. The “importance reviewer” has an easier job than the current standard: they must simply understand the problem being solved and rate how important this problem is. The technical reviewer’s job is harder than the current standard: they must verify that all claims of solution to the problem are met. Overall, the amount of work by reviewers would stay constant, and perhaps we would avoid the preference for complex solutions.

7/21/2005

Six Months

Tags: General jl@ 10:03 pm

This is the 6 month point in the “run a research blog” experiment, so it seems like a good point to take stock and assess.

One fundamental question is: “Is it worth it?” The idea of running a research blog will never become widely popular and useful unless it actually aids research. On the negative side, composing ideas for a post and maintaining a blog takes a significant amount of time. On the positive side, the process might yield better research because there is an opportunity for better, faster feedback implying better, faster thinking.

My answer at the moment is a provisional “yes”. Running the blog has been incidentally helpful in several ways:

  1. It is sometimes educational (example).
  2. More often, the process of composing thoughts well enough to post simply aids thinking. This has resulted in a couple solutions to problems of interest (and perhaps more over time). If you really want to solve a problem, letting the world know is helpful. This isn’t necessarily because the world will help you solve it, but it’s helpful nevertheless.
  3. In addition, posts by others have helped frame thinking about “What are important problems people care about?”, and why. In the end, working on the right problem is invaluable.

7/14/2005

What Learning Theory might do

Tags: General jl@ 1:48 pm

I wanted to expand on this post and some of the previous problems/research directions about where learning theory might make large strides.

  1. Why theory? The essential reason for theory is “intuition extension”. A very good applied learning person can master some particular application domain yielding the best computer algorithms for solving that problem. A very good theory can take the intuitions discovered by this and other applied learning people and extend them to new domains in a relatively automatic fashion. To do this, we take these basic intuitions and try to find a mathematical model that:
    1. Explains the basic intuitions.
    2. Makes new testable predictions about how to learn.
    3. Succeeds in so learning.

    This is “intuition extension”: taking what we have learned somewhere else and applying it in new domains. It is fundamentally useful to everyone because it increases the level of automation in solving problems.

  2. Where next for learning theory? I like the analogy with physics. Back before we-the-humans knew much, people would experiment occasionally and learn to design new things by slow evolution. At some point the physics model arose: you try to build mathematical models of what is happening and then make predictions based on the models. This was wildly successful for physics. For machine learning, it has only been moderately successful. We have some formalisms which are of some use in addressing novel learning problems, but the overall process of doing machine learning is not very close to “automatic”. The good news is that over the last 20 years a much richer set of positive examples of successful applied machine learning has developed. Thus, there are many good intuitions from which we can hope to generalize. In the physics analogy, the year is (perhaps) 1900. Here are a few specific issues:
    1. What is the “right” mathematical model of learning? (in analogy, What is the “right” mathematical model of physical phenomena?”) The models we currently use have their compelling points but typically fail to capture all of the relevant details. This is a very hard question to address, but it should be actively considered and any progress may be very helpful. Examples of this include:
      1. What is the “right” model of active learning? We know almost nothing except there is great potential.
      2. What is the “right” model of Reinforcement learning? Again, we know very little in comparison to what we want to know—a fully automatic general RL solver.

      The notion of “right” here is partially theoretical (can we derive efficient algorithms?) and partially empirical (do they actually work?).

    2. How do we refine the empirical observations and intuitions of applied learning?
      1. How should we think about “prior”? The Bayesian answer seems unconvincing. At a minimum, information used to create a Bayesian prior often does not come in the form of a Bayesian prior, and so some translation system must be developed.
      2. How can we develop big learning systems that solve big problems? Some form of structure seems necessary, but the right form is still unclear. What theory governs the design of such systems?
    3. How do we take existing theoretical insights and translate them into practical algorithms?
      1. The method of linear projection into spaces has been studied theoretically. Is it useful empirically? (A small sketch of one instance, random projection, follows this list.)
      2. The online learning setting seems theoretically compelling and, at least sometimes, empirically validated. What concerns remain to be addressed to make this a useful technology?
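As a concrete illustration of the projection question above, here is a small sketch of random linear projection, one instance of the idea. The dimensions and the Gaussian data are arbitrary choices of mine; the point is simply to check empirically how well pairwise distances survive the projection.

```python
# Sketch: project high-dimensional points with a random Gaussian matrix and
# compare a few pairwise distances before and after. All sizes are arbitrary.
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 200, 10000, 300                      # points, original dim, projected dim
X = rng.standard_normal((n, d))

R = rng.standard_normal((d, k)) / np.sqrt(k)   # random projection matrix
Y = X @ R                                      # project down to k dimensions

for i, j in [(0, 1), (2, 3), (4, 5)]:
    orig = np.linalg.norm(X[i] - X[j])
    proj = np.linalg.norm(Y[i] - Y[j])
    print(f"pair ({i},{j}): original {orig:.1f}, projected {proj:.1f}, ratio {proj/orig:.3f}")
```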

We should keep in mind that there is a real chance the limits of machine learning are lower bounded by human learning. Getting from here to there of course will require a bit of work, some of which might be greatly aided by mathematical consideration.

7/13/2005

“Sister Conference” presentations

Tags: General jl@ 8:23 am

Some of the “sister conference” presentations at AAAI have been great. Roughly speaking, the conference organizers asked other conference organizers to come give a summary of their conference. Many different AI-related conferences accepted. The presenters typically discuss some of the background and goals of the conference then mention the results from a few papers they liked. This is great because it provides a mechanism to get a digested overview of the work of several thousand researchers—something which is simply available nowhere else.

Based on these presentations, it looks like there is a significant component of (and opportunity for) applied machine learning in AIIDE, IUI, and ACL.

There was also some discussion of having a super-colocation event similar to FCRC, but centered on AI & Learning. This seems like a fine idea. The field is fractured across so many different conferences that the mixing of a super-colocation seems likely to be helpful for research.

7/11/2005

AAAI blog

Tags: General jl@ 10:11 am

The AAAI conference is running a student blog which looks like a fun experiment.

7/10/2005

Thinking the Unthought

Tags: General jl@ 9:10 am

One thing common to much research is that the researcher must be the first person ever to have some thought. How do you think of something that has never been thought of? There seems to be no methodical manner of doing this, but there are some tricks.

  1. The easiest method is to just have some connection come to you. There is a trick here however: you should write it down and fill out the idea immediately because it can just as easily go away.
  2. A harder method is to set aside a block of time and simply think about an idea. Distraction elimination is essential here because thinking about the unthought is hard work which your mind will avoid.
  3. Another common method is conversation. Sometimes the process of verbalizing brings new ideas up, and sometimes whoever you are talking to replies in just the right way. This method is dangerous though—you must speak to someone who helps you think rather than someone who occupies your thoughts.
  4. Try to rephrase the problem so the answer is simple. This is one aspect of giving up. Failing fast is better than failing slow.

There are also general ‘context development’ techniques which are not specifically helpful for your problem, but which are generally helpful for related problems.

  1. Understand the multiple motivations for working on some topic, when they exist.
  2. Question the “rightness” of every related thing. This is fundamental to finding good judgement in what you work on.
  3. Let a little bit of chaos into your life. Once in a while, attend a random conference, talk to people who you would not otherwise talk to, etc…

7/7/2005

The Limits of Learning Theory

Tags: General jl@ 8:33 am

Suppose we had an infinitely powerful mathematician sitting in a room and proving theorems about learning. Could he solve machine learning?

The answer is “no”. This answer is both obvious and sometimes underappreciated.

There are several ways to conclude that some bias is necessary in order to successfully learn. For example, suppose we are trying to solve classification. At prediction time, we observe some features X and want to make a prediction of either 0 or 1. Bias is what makes us prefer one answer over the other based on past experience. In order to learn we must:

  1. Have a bias. Always predicting that 0 is as likely as 1 is useless.
  2. Have the “right” bias. Predicting 1 when the answer is 0 is also not helpful.

The implication of “have a bias” is that we can not design effective learning algorithms with “a uniform prior over all possibilities”. The implication of “have the ‘right’ bias” is that our mathematician fails since “right” is defined with respect to the solutions to problems encountered in the real world. The same effect occurs in various sciences such as physics—a mathematician can not solve physics because the “right” answer is defined by the world.

A similar question is “Can an entirely empirical approach solve machine learning?”. The answer to this is “yes”, as long as we accept the evolution of humans and that a “solution” to machine learning is human-level learning ability.

A related question is then “Is mathematics useful in solving machine learning?” I believe the answer is “yes”. Although mathematics can not tell us what the “right” bias is, it can:

  1. Give us computational shortcuts relevant to machine learning.
  2. Abstract empirical observations of what an empirically good bias is, allowing transference to new domains.

There is a reasonable hope that solving mathematics related to learning implies we can reach a good machine learning system in time shorter than the evolution of a human.

All of these observations imply that the process of solving machine learning must be partially empirical. (What works on real problems?) Anyone hoping to do so must either engage in real-world experiments or listen carefully to people who engage in real-world experiments. A reasonable model here is physics which has benefited from a combined mathematical and empirical study.

7/4/2005

The Health of COLT

Tags: General jl@ 7:57 am

The health of COLT (Conference on Learning Theory or Computational Learning Theory depending on who you ask) has been questioned over the last few years. Low points for the conference occurred when EuroCOLT merged with COLT in 2001, and the attendance at the 2002 Sydney COLT fell to a new low. This occurred in the general context of machine learning conferences rising in both number and size over the last decade.

Any discussion of why COLT has had difficulties is inherently controversial as is any story about well-intentioned people making the wrong decisions. Nevertheless, this may be worth discussing in the hope of avoiding problems in the future and general understanding. In any such discussion there is a strong tendency to identify with a conference/community in a patriotic manner that is detrimental to thinking. Keep in mind that conferences exist to further research.

My understanding (I wasn’t around) is that COLT started as a subcommunity of the computer science theory community. This implies several things:

  1. There was a basic tension facing authors: Do you submit to COLT or to FOCS or STOC which are the “big” theory conferences?
  2. The research programs in COLT were motivated by theoretical concerns (rather than, say, practical experience). This includes motivations like understanding the combinatorics of some models of learning and the relationship with crypto.

This worked well in the beginning when new research programs were being defined and new learning models were under investigation. What went wrong from there is less clear.

  1. Perhaps the community shifted focus from thinking about new learning models to simply trying to find solutions in older models, and this went stale.
  2. Perhaps some critical motivations were left out. Many of the learning models under investigation at COLT strike empirically motivated people as implausibly useful.
  3. Perhaps the conference/community was not inviting enough to new forms of learning theory. Many pieces of learning theory have not appeared at COLT over the last 20 years.

These concerns have been addressed since the low point of COLT, but the long term health is still questionable: ICML has been accepting learning theory with plausible empirical motivations, and a mathematical learning theory conference has appeared, so there are several choices of venue available to authors.

The good news is that this year’s COLT appeared healthy. The topics covered by the program were diverse and often interesting. Several of the papers seem quite relevant to the practice of machine learning. Perhaps an even better measure is that there were many younger people in attendance.

7/1/2005

The Role of Impromptu Talks

Tags: General jl@ 9:29 pm

COLT had an impromptu session which seemed as interesting as or more interesting than any other single technical session (despite being only an hour long). There are several roles that an impromptu session can play, including:

  1. Announcing new work since the paper deadline. Letting this happen now rather than later helps aid the process of research.
  2. Discussing a paper that was rejected. Reviewers err sometimes and an impromptu session provides a means to remedy that.
  3. Entertainment. We all like to have a bit of fun.

For design, the following seem important:

  1. Impromptu speakers should not have much time. At COLT, it was 8 minutes, but I have seen even 5 work well.
  2. The entire impromptu session should not last too long because the format is dense and promotes restlessness. A half hour or hour can work well.

Impromptu talks are a mechanism to let a little bit of chaos into the schedule. They will be chaotic in content, presentation, and usefulness. The fundamental advantage of this chaos is that it provides a means for covering material that the planned program did not (or could not). This seems like a “bargain use of time” considering the short duration. One caveat is that it is unclear how well this mechanism can scale to large conferences.

6/10/2005

Workshops are not Conferences

Tags: General jl@ 9:09 am

… and you should use that fact.

A workshop differs from a conference in that it is about a focused group of people worrying about a focused topic. It also differs in that a workshop is typically a “one-time affair” rather than a series. (The Snowbird learning workshop counts as a conference in this respect.)

A common failure mode of both organizers and speakers at a workshop is to treat it as a conference. This is “ok”, but it is not really taking advantage of the situation. Here are some things I’ve learned:

  1. For speakers: A smaller audience means the talk can be more interactive. Interactive means a better chance to avoid losing your audience and a more interesting presentation (because you can adapt to your audience). Greater focus amongst the participants means you can get to the heart of the matter more easily, and discuss tradeoffs more carefully. Unlike at conferences, relevance is valued more than newness.
  2. For organizers: Not everything needs to be in a conference style presentation format (i.e. regularly spaced talks of 20-30 minute duration). Significant (and variable) question time, different talk durations, flexible rescheduling, and panel discussions can all work well.