The Humanloop Spectrum of Machine Learning

All branches of machine learning seem to be united in the idea of using data to make predictions. However, people disagree to some extent about what this means. One way to categorize these different goals is on an axis, where one extreme is “tools to aid a human in using data to do prediction” and the other extreme is “tools to do prediction with no human intervention”. Here is my estimate of where various elements of machine learning fall on this spectrum.

Human necessary <---------------------------------------------> Human unnecessary

clustering, data visualization --> Bayesian learning, probabilistic models, graphical models --> kernel learning (SVMs, etc.) --> decision trees? --> reinforcement learning

The exact position of each element is of course debatable. My reasoning is that clustering and data visualization are nearly useless for prediction without a human in the loop. Bayesian, probabilistic, and graphical models generally require a human to sit and think about what makes a good prior or structure. Kernel learning approaches have a few standard kernels which often work on simple problems, although significant kernel engineering is sometimes required. I've been impressed of late by how 'black box' decision trees and boosted decision trees are. The goal of reinforcement learning (as distinct, perhaps, from the reality) is designing completely automated agents.
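As a rough illustration of the 'black box' point, here is a sketch using scikit-learn (a modern library and a synthetic dataset, both my assumptions rather than anything from the post): an SVM with its standard RBF kernel and boosted trees with all-default settings can be run with no kernel engineering or prior design at all.

```python
# Sketch: off-the-shelf learners with default settings only; no human
# kernel engineering or prior design. scikit-learn and the synthetic
# dataset are illustrative assumptions, not from the original post.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

scores = {}
for model in (SVC(), GradientBoostingClassifier()):  # defaults only
    scores[type(model).__name__] = model.fit(X_tr, y_tr).score(X_te, y_te)
print(scores)
```

Both typically do well here without any tuning, which is the sense in which such methods sit toward the 'human unnecessary' end.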

The position on this spectrum gives some idea of the state of progress. Things at the 'human necessary' end have been successfully used by many people to solve many learning problems. At the 'human unnecessary' end, the systems are finicky and often just won't work well.

I am most interested in the ‘human unnecessary’ end.

5 Replies to “The Humanloop Spectrum of Machine Learning”

  1. I was wondering about the need for a human. It seems that the human faculty called for is qualitatively different as you move along the spectrum. A few examples:
    – visual perception of patterns
    – knowledge of causal connections and which variables to include
    – knowledge of relevant feature interactions in order to pick a kernel that reflects them

    What are your thoughts on the degree to which any of these things can be automated (e.g. structure searches for graphical models)? And would they need to be? I'm wondering if the apparent need for a human is really, in part, because one expects a human to explain why the prediction is successful; the methods at the left make that easier (if they work), while for other methods no such demand may be made.

  2. Right now, using a human as part of a learning algorithm seems unavoidable for many problems if you want to make good predictions. For example, it's difficult for me to imagine building Dasher with some fully automated algorithm.

    There are (roughly) two approaches to achieving automation. The first is to take a less automated system and try to automate the parts a human might do. The other is to design a fully automated system from the start. The jury is out on which of these approaches will succeed first. However, there are several pieces of evidence suggesting that retrofitting automation may be harder than designing it in. For example, structure search often seems intractably difficult, both in theory and in practice.

    Some people debate whether any fully automated system can succeed at all. I personally see no reason why computers can't eventually do anything humans can do.

  3. This may sound silly, but a part often played by humans is feature selection. When using a machine learning scheme of any sort on text, for instance, you usually don't just feed in text files and a classification as input; not unless you coded a mechanism for feature extraction first. Many stock-trading algorithms don't use raw quotes and ToS's to learn what to do…
    This is inherently domain-specific to a certain extent, but the same reasons human intervention comes in handy in feature selection may be part of why we don't get fully automated systems to learn well.

  4. Gilad is not silly. What he’s pointing out is that any ‘human unnecessary’ learning method can incorporate a human by the choice of initial features or initial representation. Thus, everything can be mapped into one end of the spectrum.

    Incidentally, it's not totally crazy to apply learning without this step. For example, clustering by compression.

  5. I believe it makes sense to divide the human intervention in current machine learning systems into three steps:
    1) Before learning: choice of representation, feature selection, choice of learning paradigm (unsupervised, supervised, reinforcement) and learning algorithm.
    2) During learning: manual adjustments of learning algorithm parameters are often necessary.
    3) After learning: understanding the models learned and using the models to make decisions.

    Step 3 is where the main difference lies between the left end and the right end of the spectrum John proposes. In reinforcement learning, this step does not exist because the agent learns to act directly in the environment it was built for. In clustering and Bayesian networks, this may be the most important step, because the purpose of these learning algorithms is to help humans understand the domain from which the data was extracted. In supervised learning, depending on the application, humans may be interested in understanding the model itself, but more often they are interested in using the model's predictions to make decisions (which can often be automated).
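The feature-extraction point in comment 3 can be made concrete with a minimal sketch: even the simplest text pipeline needs a hand-coded step turning raw text into features before any learner sees it (the bag-of-words choice and the example sentence are my assumptions, for illustration only).

```python
# A minimal, hand-coded feature-extraction step: raw text -> token counts.
# This is the part a human writes before any learning algorithm runs.
from collections import Counter

def bag_of_words(text: str) -> Counter:
    """Lowercase and split on whitespace, then count tokens."""
    return Counter(text.lower().split())

doc = "The quick brown fox jumps over the lazy dog"
features = bag_of_words(doc)
print(features["the"], features["fox"])
```

The decision to use word counts at all (rather than character n-grams, parse trees, or something domain-specific) is exactly the kind of human choice the comment describes.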
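The 'clustering by compression' mentioned in comment 4 can be sketched with the normalized compression distance (NCD), which needs no hand-chosen features, only a generic compressor. The use of zlib and the toy strings below are my assumptions, not anything from the post.

```python
# Sketch of clustering by compression: the normalized compression
# distance uses a generic compressor (zlib here) in place of
# hand-engineered features.
import zlib

def ncd(x: bytes, y: bytes) -> float:
    """NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y))."""
    cx = len(zlib.compress(x))
    cy = len(zlib.compress(y))
    cxy = len(zlib.compress(x + y))
    return (cxy - min(cx, cy)) / max(cx, cy)

a = b"the quick brown fox jumps over the lazy dog " * 20
b = b"a quick brown fox leaps over a lazy dog " * 20
c = b"colorless green ideas sleep furiously tonight " * 20

# Texts with shared structure compress well together, so related
# strings should come out closer under NCD than unrelated ones.
print(ncd(a, b), ncd(a, c))
```

Pairwise NCD values can then be fed to any standard clustering routine, which is what makes the approach nearly representation-free.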
