All branches of machine learning seem to be united in the idea of using data to make predictions. However, people disagree to some extent about what this means. One way to categorize these different goals is on an axis, where one extreme is “tools to aid a human in using data to do prediction” and the other extreme is “tools to do prediction with no human intervention”. Here is my estimate of where various elements of machine learning fall on this spectrum.
| Human necessary | | Human partially necessary | | Human unnecessary |
|---|---|---|---|---|
| Clustering, data visualization | Bayesian learning, probabilistic models, graphical models | Kernel learning (SVMs, etc.) | Decision trees? | Reinforcement learning |
The exact position of each element is of course debatable. My reasoning is that clustering and data visualization are nearly useless for prediction without a human in the loop. Bayesian, probabilistic, and graphical models generally require a human to sit and think about what a good prior or structure is. Kernel learning approaches have a few standard kernels which often work on simple problems, although significant kernel engineering is sometimes required. I’ve been impressed of late by how ‘black box’ decision trees and boosted decision trees are. The goal of reinforcement learning (as opposed, perhaps, to the reality) is designing completely automated agents.
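To make the ‘black box’ point concrete, here is a minimal sketch of what using such methods without human engineering looks like. It assumes scikit-learn; the dataset and the particular classifiers are illustrative choices of mine, not anything specific from the discussion above. Both models are run with their default settings, with no kernel engineering, prior design, or structure search by a human.

```python
# Minimal sketch: two 'black box' learners run with default settings,
# no human-designed kernel, prior, or structure (assumes scikit-learn).
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# An arbitrary built-in dataset, chosen only for illustration.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for model in (SVC(), GradientBoostingClassifier()):
    model.fit(X_train, y_train)  # defaults only; no tuning by a human
    print(type(model).__name__, model.score(X_test, y_test))
```

The degree to which the defaults work out of the box on a given problem is roughly what the spectrum above is measuring.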
The position in this spectrum gives some idea of the state of progress. Things at the ‘human necessary’ end have been successfully used by many people to solve many learning problems. At the ‘human unnecessary’ end, the systems are finicky and often just won’t work well.
I am most interested in the ‘human unnecessary’ end.