Machine Learning (Theory)

2/24/2006

A Fundamentalist Organization of Machine Learning

Tags: Machine Learning jl@ 12:54 pm

There are several different flavors of Machine Learning classes. Many classes are of the ‘zoo’ sort: many different learning algorithms are presented. Others avoid the zoo by not covering the full scope of machine learning.

This is my view of what makes a good machine learning class, along with why. I’d like to specifically invite comment on whether things are missing, misemphasized, or misplaced.

| Phase | Subject | Why? |
|---|---|---|
| Introduction | What is a machine learning problem? | A good understanding of the characteristics of machine learning problems seems essential. Characteristics include: a data source, some hope the data is predictive, and a need for generalization. This is probably best taught in a case study manner: lay out the specifics of some problem and then ask “Is this a machine learning problem?” |
| Introduction | Machine Learning Problem Identification | Identifying and recognizing the type of learning problem is (obviously) a very important step in solving such problems. People need to be familiar with the concepts of ‘regression’, ‘classification’, ‘cost sensitive classification’, ‘reinforcement learning’, etc… A good organization of these things is possible, but not yet well done. |
| Introduction | Example algorithm 1 | To really understand machine learning, a couple of learning algorithms must be understood in detail. |
| Introduction | Example algorithm 2 | Ditto. The reason why the number is “2” and not “1” or “3” is that 2 is the minimum number required to make people naturally aware of the degrees of freedom available in learning algorithm design. |
| Analysis | Bias for Learning | The need for a good bias is one of the defining characteristics of learning. This includes discussing the means to specify bias (via Bayesian priors, choice of features, graphical models, etc…). This statement is generic so it will always apply to one degree or another. |
| Analysis | Learning can be boosted | This is the boosting observation: that it is possible to bootstrap predictive ability to create a better overall system. This statement is similarly generic. |
| Analysis | Learning can be transformed | This is the reductions observation: that the ability to solve one kind of learning problem implies the ability to solve other kinds of learning problems. This statement is similarly generic. |
| Analysis | Learning can be preserved | This is the online learning with experts observation: that we can have a master algorithm which preserves the best learning performance of subalgorithms. This statement is again generic. |
| Analysis | Overfitting | Learning algorithms can easily overfit to existing training data. How to analyze this (with an IID assumption), and how to avoid it are very important for success. |
| Analysis | Hardness of Learning | It turns out that there are several different ways in which machine learning can be hard, including computational and information theoretic hardness. Some of PAC learning is relevant here. An understanding of how and why learning algorithms can fail seems important to understand the process. |
| Applications | Vision | One example of how learning is applied to solve vision problems. |
| Applications | Language | Ditto for language problems. |
| Applications | Robotics | Ditto for robotics. |
| Applications | Speech | Ditto for speech. |
| Applications | Businesses | Ditto for businesses. |
| | Where is machine learning going? | Insert predictions of the future here. It should be understood that the field of machine learning is changing rapidly. |
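
To make the “learning can be preserved” observation a bit more concrete, here is a minimal sketch of one master algorithm of this kind, the exponential-weights (Hedge) rule. The function name `hedge`, the learning rate `eta`, and the list-of-lists loss format are assumptions made for illustration; the outline itself does not prescribe any particular algorithm.

```python
import math


def hedge(expert_losses, eta=0.5):
    """Exponential-weights master over a fixed set of experts (illustrative sketch).

    expert_losses: list of rounds; each round is a list with one loss in [0, 1]
                   per expert (hypothetical input format).
    Returns (expected cumulative loss of the master, cumulative loss of the
    best single expert in hindsight).
    """
    n_experts = len(expert_losses[0])
    weights = [1.0] * n_experts      # start with no preference among experts
    totals = [0.0] * n_experts       # cumulative loss of each expert
    master_loss = 0.0

    for losses in expert_losses:
        norm = sum(weights)
        probs = [w / norm for w in weights]
        # The master follows expert i with probability probs[i],
        # so its expected loss this round is the weighted average.
        master_loss += sum(p * l for p, l in zip(probs, losses))
        # Experts that did badly lose weight exponentially fast.
        weights = [w * math.exp(-eta * l) for w, l in zip(weights, losses)]
        totals = [t + l for t, l in zip(totals, losses)]

    return master_loss, min(totals)


if __name__ == "__main__":
    # Three toy experts that err on every 2nd, 3rd, and 10th round respectively.
    rounds = [[float(t % 2 == 0), float(t % 3 == 0), float(t % 10 == 0)]
              for t in range(100)]
    master, best = hedge(rounds)
    print(f"master loss {master:.2f}, best expert loss {best:.2f}")
```

With losses in [0, 1], the master's expected cumulative loss exceeds that of the best expert by at most ln(N)/eta + eta*T/8 for N experts and T rounds, with no assumption on how the losses are generated.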

The emphasis here is on fundamentals: generally applicable mathematical statements and understandings of the learning problem. Given that emphasis, the ‘applications’ section could be cut without harming the integrity of the purpose.
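
As one example of what a generally applicable mathematical statement can look like, the overfitting topic can be tied to a simple held-out test set bound. The sketch below assumes the test examples are IID and were never used in training, and it applies Hoeffding's inequality; the name `holdout_error_bound` and its parameters are hypothetical, chosen only for this illustration.

```python
import math


def holdout_error_bound(n_errors, n_test, delta=0.05):
    """Upper bound on the true error rate from a held-out test set (sketch).

    Assumes the n_test examples are drawn IID and independently of training.
    By Hoeffding's inequality, with probability at least 1 - delta:
        true_error <= n_errors / n_test + sqrt(ln(1 / delta) / (2 * n_test))
    """
    if n_test <= 0 or not 0 < delta < 1:
        raise ValueError("need n_test > 0 and 0 < delta < 1")
    observed = n_errors / n_test
    slack = math.sqrt(math.log(1.0 / delta) / (2.0 * n_test))
    return min(1.0, observed + slack)


if __name__ == "__main__":
    # Same observed error rate, more test data: the bound tightens.
    for m in (100, 1000, 10000):
        print(m, holdout_error_bound(n_errors=m // 20, n_test=m))
```

The bound says nothing about training error, which is exactly why a low training error alone does not rule out overfitting.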

12 Comments to “A Fundamentalist Organization of Machine Learning”
  1. Hector says:

    I have two questions. Can you elaborate on the statement that ML Problem Identification cannot be well done yet? And, what do you think are the natural prerequisites for a class like this?

  2. Mentifex says:

    The artificial mind that I am working on takes the following two approaches to Machine Learning. First, the underlying algorithms of the artificial mind learn new concepts as a matter of course, and then the intellect of the AI seeks out knowledge by means of the Ask module and its related sub-modules, such as wtAuxSDo.

  3. Bianca Zadrozny says:

    John, I think you (with co-authors :) should write a textbook following this outline, since current textbooks follow the zoo approach or focus on one kind of learning algorithm. I am sure the book would be a success.
    It would be nice to have something on sequential learning, since this is also a very common type of learning problem.

  4. jl says:

    I think explaining problem identification well requires quite a bit of work.

    Natural prerequisites are:

    1. familiarity with programming.
    2. familiarity with statistics.

  5. Arindam says:

    I agree with Bianca – a good *textbook* in Machine Learning is sorely missing. While there are great books on special topics (I am told there are 4 books on graphical models coming out this year, and Vapnik is writing another book), the books on the general area need to be majorly updated, or someone needs to write a new book.

    As for the class (or when you guys write the book) – it’s best to get input from someone who has taught machine learning for several years. What looks best offline may not work in the classroom setting. Remember, the i.i.d. assumption does not hold in an ‘adversarial’ setting :)

  6. hal says:

    I think that overall that looks good: I would enjoy taking this course now and would really have enjoyed it several years ago.

    I’m a bit surprised that overfitting comes so late. This seems like one of the most important basic issues, and it might be difficult to discuss, say, example algorithms without talking about overfitting. I’m also curious what you think two good example algorithms would be. There are so many to choose from, and different ones have different prereqs (i.e., understanding SVMs without understanding duality and optimization is hard). And there are so many “basis vectors” along which algorithms differ, it seems difficult to get a reasonable sample.

    I’m also curious how you feel about the “historical interest” issue. I personally enjoy such information, even if relegated to a section at the end of each “chapter.” In a classroom setting, I think that a bit of history can go a long way to explaining why things are the way they are.

  7. I think overfitting would be more interesting prior to the contextual discussion of boosting, transformation, and preservation. Especially because you could discuss overfitting in these contexts.

    If you flesh out this outline, you should email it to me because I will be a willing consumer.

  8. DrewBagnell says:

    I second Hal’s comment: overfitting is perhaps the fundamental issue in machine learning.

  9. jl says:

    Perhaps this is pointing out a gap in our understanding: the only way we really know how to analyze overfitting is with respect to an IID sample assumption. And yet, overfitting is more fundamental than the IID assumption.

    How do we analyze overfitting without making an IID assumption?

  10. pingva says:

    I was wondering where the NFL (no free lunch) and ugly-duckling theorems would fit in here, if at all.

  11. Myriam Abramson says:

    A machine learning book that does not have a “zoo” approach I think is Pat Langley’s Elements of Machine Learning.

  12. Zachary says:

    As a PhD student (music composition, minor in computer science) who just took his first Machine Learning class, this outline looks outstanding. It’s always hard to think back to when one didn’t know what one knows now, but I believe this organization would have clarified the material and made it less overwhelming. And I add my voice to the clamor for a good textbook, a repeated topic of discussion and frustration last semester.

    My class took the zoo approach. While most students who stuck with it were able (with difficulty) to wrap their heads around both a new fundamental concept and an algorithm in which it became apparent each week or so, I’d prefer (at least for an “Intro to ML” class) to focus more on the concepts, problems, and applications than on algorithms.

    On the other hand (such is academia), I would not have time to take a second ML class, and if I had a grasp on the conceptual framework without having coded many algorithms to make my knowledge concrete, I would probably feel a little cheated, ending the semester thirsty to write some code and see it in action.
