A Fundamentalist Organization of Machine Learning

There are several different flavors of Machine Learning classes. Many classes are of the ‘zoo’ sort: many different learning algorithms are presented. Others avoid the zoo by not covering the full scope of machine learning.

This is my view of what makes a good machine learning class, along with why. I’d like to specifically invite comment on whether things are missing, misemphasized, or misplaced.

Phase	Subject	Why?
Introduction	What is a machine learning problem?	A good understanding of the characteristics of machine learning problems seems essential. Characteristics include: a data source, some hope the data is predictive, and a need for generalization. This is probably best taught in a case study manner: lay out the specifics of some problem and then ask “Is this a machine learning problem?”
Introduction	Machine Learning Problem Identification	Identifying and recognizing the type of learning problem is (obviously) a very important step in solving it. People need to be familiar with the concepts of ‘regression’, ‘classification’, ‘cost sensitive classification’, ‘reinforcement learning’, etc… A good organization of these things is possible, but not yet well done.
Introduction	Example algorithm 1	To really understand machine learning, a couple of learning algorithms must be understood in detail.
Introduction	Example algorithm 2	Ditto. The reason why the number is “2” and not “1” or “3” is that 2 is the minimum number required to make people naturally aware of the degrees of freedom available in learning algorithm design.
Analysis	Bias for Learning	The need for a good bias is one of the defining characteristics of learning. This includes discussing the means to specify bias (via Bayesian priors, choice of features, graphical models, etc…). This statement is generic so it will always apply to one degree or another.
Analysis	Learning can be boosted.	This is the boosting observation: that it is possible to bootstrap predictive ability to create a better overall system. This statement is similarly generic.
Analysis	Learning can be transformed	This is the reductions observation: that the ability to solve one kind of learning problem implies the ability to solve other kinds of learning problems. This statement is similarly generic.
Analysis	Learning can be preserved	This is the online learning with experts observation: that we can have a master algorithm which preserves the best learning performance of subalgorithms. This statement is again generic.
Analysis	Overfitting	Learning algorithms can easily overfit to existing training data. How to analyze this (with an IID assumption), and how to avoid it are very important for success.
Analysis	Hardness of Learning	It turns out that there are several different ways in which machine learning can be hard including computational and information theoretic hardness. Some of PAC learning is relevant here. An understanding of how and why learning algorithms can fail seems important to understand the process.
Applications	Vision	One example of how learning is applied to solve vision problems.
Applications	Language	Ditto for language problems.
Applications	Robotics	Ditto for robotics.
Applications	Speech	Ditto for speech.
Applications	Businesses	Ditto for businesses.
	Where is machine learning going?	Insert predictions of the future here. It should be understood that the field of machine learning is changing rapidly.
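
To make the ‘learning can be preserved’ row concrete, here is a minimal sketch of an exponential-weights master (often called Hedge) in Python. The function name `hedge`, the learning rate `eta`, and the layout of the loss table are assumptions made for illustration; the outline itself does not prescribe a particular algorithm. When every loss lies in [0, 1], the master's total expected loss over T rounds with N experts is at most the best expert's total loss plus ln(N)/eta + eta*T/8.

```python
import math


def hedge(expert_losses, eta=0.5):
    """Run an exponential-weights master over a table of expert losses.

    expert_losses[t][i] is the loss in [0, 1] of expert i at round t
    (a hypothetical layout chosen for this sketch).
    Returns (master_loss, best_expert_loss), both summed over all rounds.
    """
    n = len(expert_losses[0])
    weights = [1.0] * n
    totals = [0.0] * n
    master_loss = 0.0
    for losses in expert_losses:
        z = sum(weights)
        probs = [w / z for w in weights]
        # Expected loss when the master follows expert i with probability probs[i].
        master_loss += sum(p * l for p, l in zip(probs, losses))
        # Shrink each expert's weight exponentially in the loss it just suffered.
        weights = [w * math.exp(-eta * l) for w, l in zip(weights, losses)]
        totals = [s + l for s, l in zip(totals, losses)]
    return master_loss, min(totals)


if __name__ == "__main__":
    # Three made-up experts: one steadily good, one mediocre, one erratic.
    table = [[0.1, 0.5, float(t % 2)] for t in range(200)]
    master, best = hedge(table)
    print("master loss:", master, "best expert loss:", best)
```

The master never needs to know in advance which subalgorithm is good; it only needs to observe the losses after each round.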

The emphasis here is on fundamentals: generally applicable mathematical statements and understandings of the learning problem. Given that emphasis, the ‘applications’ section could be cut without harming the integrity of the purpose.
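
As one illustration of such a generally applicable statement, the reductions observation can be sketched in a few lines of Python: a one-against-all reduction turns any binary learner into a multiclass learner. Everything named here (`train_perceptron`, `score`, `one_against_all_train`, `one_against_all_predict`) is hypothetical and chosen for the sketch, and the perceptron is just a stand-in for whatever binary learner is available.

```python
def score(w, x):
    """Linear score of features x under weights w (last weight is the bias)."""
    return sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))


def train_perceptron(examples, epochs=1000):
    """Stand-in binary learner: examples are (features, label), label in {-1, +1}."""
    dim = len(examples[0][0])
    w = [0.0] * (dim + 1)
    for _ in range(epochs):
        mistakes = 0
        for x, y in examples:
            if y * score(w, x) <= 0:
                # Standard perceptron update on a misclassified example.
                w = [wi + y * xi for wi, xi in zip(w, list(x) + [1.0])]
                mistakes += 1
        if mistakes == 0:
            break  # every training example is now classified correctly
    return w


def one_against_all_train(examples, num_classes, binary_learner=train_perceptron):
    """Reduce multiclass learning to num_classes binary problems."""
    models = []
    for c in range(num_classes):
        # Relabel: class c becomes +1, every other class becomes -1.
        relabeled = [(x, 1 if y == c else -1) for x, y in examples]
        models.append(binary_learner(relabeled))
    return models


def one_against_all_predict(models, x):
    """Predict the class whose binary model gives the highest score."""
    scores = [score(w, x) for w in models]
    return max(range(len(scores)), key=scores.__getitem__)


if __name__ == "__main__":
    # Made-up data: three well-separated clusters in the plane.
    data = [((2, 0), 0), ((2, 1), 0), ((-2, 0), 1),
            ((-2, 1), 1), ((0, -3), 2), ((1, -3), 2)]
    models = one_against_all_train(data, num_classes=3)
    print([one_against_all_predict(models, x) for x, _ in data])
```

The multiclass code never looks inside the binary learner, which is the point of the reduction: any improvement in binary classification carries over unchanged.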

12 Replies to “A Fundamentalist Organization of Machine Learning”

Hector says:

2/24/2006 at 3:43 pm

I have two questions. Can you elaborate on the statement that ML Problem Identification cannot be well done yet? And, what do you think are the natural prerequisites for a class like this?
Mentifex says:

2/24/2006 at 7:18 pm

The artificial mind that I am working on takes the following two approaches to Machine Learning. First, the underlying algorithms of the artificial mind learn new concepts as a matter of course, and then the intellect of the AI seeks out knowledge by means of the Ask module and its related sub-modules, such as wtAuxSDo.
Bianca Zadrozny says:

2/25/2006 at 12:17 am

John, I think you (with co-authors 🙂) should write a textbook following this outline, since current textbooks follow the zoo approach or focus on one kind of learning algorithm. I am sure the book would be a success.
It would be nice to have something on sequential learning, since this is also a very common type of learning problem.
jl says:

2/25/2006 at 9:15 am
I think explaining problem identification well requires quite a bit of work.

Natural prerequisites are:
1. familiarity with programming.
2. familiarity with statistics.
Arindam says:

2/26/2006 at 12:23 pm

I agree with Bianca – a good *textbook* in Machine Learning is sorely missing. While there are great books on special topics (I am told – there are 4 books on graphical models that are coming out this year, Vapnik is writing another book), the books on the general area need to be majorly updated, or someone needs to write a new book.

As for the class (or when you guys write the book) – it’s best to get input from someone who has taught machine learning for several years. What looks best offline may not work in the classroom setting. Remember, the i.i.d. assumption does not hold in an ‘adversarial’ setting 🙂
hal says:

2/26/2006 at 2:31 pm

I think that overall that looks good: I would enjoy taking this course now and would really have enjoyed it several years ago.

I’m a bit surprised that overfitting comes so late. This seems like one of the most important basic issues, and it might be difficult to discuss, say, example algorithms without talking about overfitting. I’m also curious what you think two good example algorithms would be. There are so many to choose from, and different ones have different prereqs (e.g., understanding SVMs without understanding duality and optimization is hard). And there are so many “basis vectors” along which algorithms differ, it seems difficult to get a reasonable sample.

I’m also curious how you feel about the “historical interest” issue. I personally enjoy such information, even if relegated to a section at the end of each “chapter.” In a classroom setting, I think that a bit of history can go a long way to explaining why things are the way they are.
Steve Purpura says:

2/26/2006 at 3:17 pm

I think overfitting would be more interesting prior to the contextual discussion of boosting, transformation, and preservation. Especially because you could discuss overfitting in these contexts.

If you flesh out this outline, you should email it to me because I will be a willing consumer.
DrewBagnell says:

2/26/2006 at 7:44 pm

I second Hal’s comment: overfitting is perhaps the fundamental issue in machine learning.
jl says:

2/27/2006 at 10:22 am

Perhaps this is pointing out a gap in our understanding: the only way we really know how to analyze overfitting is with respect to an IID sample assumption. And yet, overfitting is more fundamental than the IID assumption.

How do we analyze overfitting without making an IID assumption?
pingva says:

3/1/2006 at 10:19 pm

I was wondering where would NFL and ugly-duckling theorems fit in here, if at all.
Myriam Abramson says:

3/2/2006 at 1:08 pm

A machine learning book that does not have a “zoo” approach I think is Pat Langley’s Elements of Machine Learning.
Zachary says:

2/15/2007 at 6:11 pm

As a PhD student (music composition, minor in computer science) who just took his first Machine Learning class, this outline looks outstanding. It’s always hard to think back to when one didn’t know what one knows now, but I believe this organization would have clarified the material and made it less overwhelming. And I add my voice to the clamor for a good textbook, a repeated topic of discussion and frustration last semester.

My class took the zoo approach. While most students who stuck with it were able (with difficulty) to wrap their heads around both a new fundamental concept and an algorithm in which it became apparent each week or so, I’d prefer (at least for an “Intro to ML” class) to focus more on the concepts, problems, and applications than on algorithms.

On the other hand (such is academia), I would not have time to take a second ML class, and if I had a grasp on the conceptual framework without having coded many algorithms to make my knowledge concrete, I would probably feel a little cheated, ending the semester thirsty to write some code and see it in action.
