This is a proposal for a workshop. It may or may not happen depending on the level of interest. If you are interested, feel free to indicate so (by email or comments).
Description:
Assume(*) that any system for solving large, difficult learning problems must decompose into repeated use of basic elements (i.e., atoms). Many basic questions remain:
- What are the viable basic elements?
- What makes a basic element viable?
- What are the viable principles for the composition of these basic elements?
- What are the viable principles for learning in such systems?
- What problems can this approach handle?
Hal Daume adds:
- Can compositions of atoms be (semi-)automatically constructed?
- When atoms are constructed through reductions, is there some notion of the “naturalness” of the created learning problems?
- Other than Markov fields/graphical models/Bayes nets, is there a good language for representing atoms and their compositions?
The answers to these and related questions remain unclear to me. A workshop gives us a chance to pool what we have learned from some very different approaches to tackling this same basic goal.
(*) As a general principle, it’s very difficult to conceive of any system for solving any large problem which does not decompose.
Plan Sketch:
- A two day workshop with unhurried presentations and discussion seems appropriate, especially given the diversity of approaches.
- TTI-Chicago may be able to help with costs.
The above two points suggest having a workshop on a {Friday, Saturday} or {Saturday, Sunday} at TTI-Chicago.
Do “decision stump”-like mechanisms or committee machines working on trivial but different spaces qualify as “Atomic Learning”?
My apologies if my first comment here qualifies as childish, lame, awkward, too simple for the regular readers, or any combination of the above.
Cool blog, John. Cheers, Matti.
I would say they ‘qualify’ if they can be (re)used in bigger structures. I don’t know how to do that at the moment, but I can’t rule out the possibility either.
To a first approximation, I’d lay it out like this:
1. “Core” methods, which make up the internals of even the simplest models:
1.1. Optimization methods, such as Conjugate Gradient Ascent, Simulated Annealing, SMO, etc.
1.2. Parameterized functions, like Gaussians, Sigmoids, etc.
1.3. Lookup Tables
1.4. Data Manipulators, like sorters, filters, linear algebra routines, etc.
2. Basic Models, which are either complete or partial solutions to the learning problem, however trivial, that cannot be decomposed into smaller sets of models:
2.1. Single-layer Perceptrons
2.2. Bayesian Networks, including HMMs (but not with outputs! this is an example of a partial model)
2.3. Gaussian Mixtures
2.4. SVMs
2.5. Decision Trees
2.6. Log-linear or Maximum Entropy models
2.7. Many More…
**Note: Any of these basic models, if it works on only one dimension of the data, is a ‘decision stump’ that can be combined in a framework like AdaBoost (a sketch follows this list).
3. Compound Models, which are complete solutions to the learning problem, built up from combinations of basic models:
3.1. HMMs with Gaussian Mixture outputs (speech recognition)
3.2. Neural Networks (Multi-layer perceptrons)
3.3. AdaBoost using any models from #2 or #3.
3.4. Many More…
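To make the stump note above concrete, here is a minimal sketch of a one-dimensional decision stump as the atomic learner, reused inside AdaBoost. This is illustrative only: the names (`fit_stump`, `adaboost`) and the brute-force threshold search are my own assumptions, not a reference implementation.

```python
# Minimal AdaBoost-over-stumps sketch (illustrative, not a fixed API).
# X: (n, d) float array; y: (n,) array of labels in {-1, +1}.
import numpy as np

def fit_stump(X, y, w):
    """Find the single-feature threshold classifier minimizing weighted error."""
    n, d = X.shape
    best = (0, 0.0, 1, np.inf)  # (feature, threshold, sign, weighted error)
    for f in range(d):
        for t in np.unique(X[:, f]):
            pred = np.where(X[:, f] > t, 1, -1)
            for s in (1, -1):
                err = np.sum(w[(s * pred) != y])
                if err < best[3]:
                    best = (f, t, s, err)
    return best

def adaboost(X, y, rounds=20):
    """Combine stumps into a weighted committee (Freund & Schapire's AdaBoost)."""
    n = len(y)
    w = np.full(n, 1.0 / n)
    ensemble = []
    for _ in range(rounds):
        f, t, s, err = fit_stump(X, y, w)
        err = min(max(err, 1e-12), 1 - 1e-12)   # keep the log well-defined
        alpha = 0.5 * np.log((1 - err) / err)
        pred = s * np.where(X[:, f] > t, 1, -1)
        w = w * np.exp(-alpha * y * pred)        # upweight mistakes
        w /= w.sum()
        ensemble.append((alpha, f, t, s))
    return ensemble

def predict(ensemble, X):
    """Weighted vote of the atomic stumps."""
    score = sum(a * s * np.where(X[:, f] > t, 1, -1) for a, f, t, s in ensemble)
    return np.sign(score)
```

The point of the sketch is the reuse pattern: each round refits the same atomic element on reweighted data, so the composition never needs to know anything about the atom beyond its predictions.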
-Jared Maguire
That’s a most interesting idea for a workshop! Some thoughts. In brief, I’d go a bit deeper into the fundamentals. Models are already quite high up.
How is the data represented? What is a concept? What is a variable? What is a value? What is an instance? We can learn to extract instances, and we can learn to construct new variables. What are the canonical data representations (networks? tables? matrices? relational tables? ontologies)?
What is the learning problem? What is the loss function? We can learn loss functions, too. What is the measure of similarity? How can it be constructed?
How do you express the bias/prior? How can a human being provide the background knowledge? How do you enable interaction between the learning human and the learning machine? Do we want alien intelligence?
Interested, yes.
Especially if we can relax the word “learning” to incorporate search, optimization and design more generally.
What you’re talking about, by the way, sounds a lot like an Alexandrian pattern language for learning. And I agree with Aleks: be prepared to wrestle with a number of philosophical entities that are usually presumed and often neglected, like representation theory and learning (search) performance measures.
There seem to be two fascinating, but largely different, types of “atomic learning” that are being discussed. I would roughly categorize these as (1) taking atomic learners that solve problem X and combining them in some way to create a better way to solve X; (2) taking atomic learners that solve problem X and combining them in some way to create a way to solve a more “complex” problem than X itself.
Examples of (1) are many of the relationships John described above: neurons : neural nets; logistic regression : deep belief networks; weak learners : boosted learners. Examples of (2) are, for instance, those techniques found in structured prediction (among other tasks): logistic regression : conditional random fields; SVMs : max-margin Markov nets; perceptron : Collins’ structured perceptron.
I (and this is a completely personal view) am more interested in (2). Much is already known about (1): weak/strong learning theory, multilayer NNs, etc., especially as concerns the typical PAC-style analyses. As far as I am aware, significantly less is known for (2); hence the sketch below.
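As a hedged sketch of the (2)-style composition, assuming a toy part-of-speech setup: the atom is an ordinary linear scorer over local features, and the composition is Viterbi decoding plus Collins’ whole-sequence perceptron update. The tagset, feature map `phi`, and data here are invented for illustration; none of them come from the discussion above.

```python
# Toy structured perceptron (Collins-style), composing a linear atom over
# local features into a sequence labeler. All names are illustrative.
from collections import defaultdict

TAGS = ["N", "V"]  # toy tagset (assumed)

def phi(word, tag, prev_tag):
    """Atomic feature map: emission and transition indicator features."""
    return [("emit", word, tag), ("trans", prev_tag, tag)]

def score(w, feats):
    """The atom: a plain linear scorer over sparse indicator features."""
    return sum(w[f] for f in feats)

def viterbi(words, w):
    """Compose per-position atomic scores into the best whole tag sequence."""
    # chart[t] = (best score of a sequence ending in tag t, that sequence)
    chart = {t: (score(w, phi(words[0], t, "<s>")), [t]) for t in TAGS}
    for word in words[1:]:
        chart = {
            t: max(
                (chart[pt][0] + score(w, phi(word, t, pt)), chart[pt][1] + [t])
                for pt in TAGS
            )
            for t in TAGS
        }
    return max(chart.values())[1]

def train(data, epochs=5):
    """Whole-sequence perceptron update: add gold features, subtract predicted."""
    w = defaultdict(float)
    for _ in range(epochs):
        for words, gold in data:
            pred = viterbi(words, w)
            if pred != gold:
                prev_g, prev_p = "<s>", "<s>"
                for word, g, p in zip(words, gold, pred):
                    for f in phi(word, g, prev_g):
                        w[f] += 1.0
                    for f in phi(word, p, prev_p):
                        w[f] -= 1.0
                    prev_g, prev_p = g, p
    return w

if __name__ == "__main__":
    data = [(["dogs", "bark"], ["N", "V"]), (["cats", "sleep"], ["N", "V"])]
    w = train(data)
    print(viterbi(["dogs", "sleep"], w))  # expect ["N", "V"] on this toy data
```

Note what changed relative to the (1)-style AdaBoost sketch: the atom (a linear scorer) is untouched, but the composition (Viterbi plus a whole-sequence update) solves a strictly more complex problem than the atom does on its own.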