Several recent papers have shown that SVM-like optimizations can be used to handle several large families of loss functions.
This is a good thing, because it is implausible that the loss function imposed by the world cannot be taken into account when solving a prediction problem. Even people accustomed to the hard-core Bayesian approach to learning often note that some approximations are almost inevitable in specifying a prior and/or integrating to achieve a posterior. Taking into account how the system will be evaluated allows both computational effort and design effort to be focused on improving performance.
A current laundry list of capabilities includes:
- 2002: multiclass SVMs including arbitrary cost matrices (see the sketch after this list)
- ICML 2003: Hidden Markov Models
- NIPS 2003: Markov Networks (see some discussion)
- EMNLP 2004: Context-free grammars
- ICML 2004: Any loss (with much computation)
- ICML 2005: Any constrained linear prediction model (that’s my own name for it)
- ICML 2005: Any loss dependent on a contingency table
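To give a concrete feel for the first item, here is a minimal sketch of a cost-sensitive multiclass SVM trained by stochastic subgradient descent. It is not the algorithm from any of the papers above; the function names and the simple training loop are my own illustration of how an arbitrary cost matrix can enter a margin-rescaled hinge loss.

```python
# A minimal sketch (not the cited papers' algorithms) of a cost-sensitive
# multiclass SVM trained by stochastic subgradient descent.  C[y, k] is the
# cost of predicting class k when the true class is y (zero diagonal).
import numpy as np

def train_cost_sensitive_svm(X, y, C, lam=0.01, lr=0.1, epochs=20, seed=0):
    """X: (n, d) features; y: (n,) integer labels; C: (k, k) cost matrix."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = np.zeros((C.shape[0], d))
    for _ in range(epochs):
        for i in rng.permutation(n):
            scores = W @ X[i]
            # Margin rescaling: a competing class must be beaten by a margin
            # proportional to the cost of confusing it with the true class.
            violations = scores - scores[y[i]] + C[y[i]]
            violations[y[i]] = 0.0
            k = int(np.argmax(violations))
            W *= 1.0 - lr * lam              # shrinkage from the L2 regularizer
            if violations[k] > 0:            # hinge subgradient step
                W[k] -= lr * X[i]
                W[y[i]] += lr * X[i]
    return W

def predict(W, X):
    return np.argmax(X @ W.T, axis=1)
```

With a 0/1 cost matrix this reduces to the ordinary multiclass hinge loss; non-uniform entries in C push the margins toward the mistakes that are expensive.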
I am personally interested in how this relates to the learning reductions work, which has similar goals but operates at a different level of abstraction (the learning problem rather than the algorithmic mechanism). The difference in abstraction implies that anything solvable by reduction should be solvable by a direct algorithmic mechanism. However, comparing and contrasting the results I know of, what is solvable via reduction to classification versus what is solvable via direct SVM-like methods currently seems incomparable. Several questions follow:
1. Can SVMs be tuned to directly solve (example-dependent) cost-sensitive classification? Obviously, they can be tuned indirectly via reduction, but it is easy to imagine more tractable direct optimizations.
2. How efficiently can learning reductions be used to solve structured prediction problems? Structured prediction problems are instances of cost-sensitive classification, but the regret transform obtained from this naive embedding is too weak to be of interest (a sketch of the embedding follows these questions).
3. Are there any problems efficiently solvable by SVM-like algorithms which are not efficiently solvable via learning reductions?
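To make question 2 concrete, here is a tiny sketch of the naive embedding it refers to: every complete structured output becomes one “class” of a cost-sensitive classification problem, with cost equal to the structured loss. The names below are my own illustration, not any published construction; the point is simply how quickly the set of “classes” grows with output size.

```python
# Sketch of the naive embedding of sequence labeling into cost-sensitive
# classification: every complete label sequence is a "class", and its cost
# is the Hamming loss against the true sequence.  The class count grows as
# |labels| ** length, which is why the naive embedding is unattractive.
from itertools import product

def sequence_to_cost_sensitive(true_seq, label_set):
    """Return (classes, costs) for all possible label sequences."""
    classes = list(product(label_set, repeat=len(true_seq)))
    costs = [sum(a != b for a, b in zip(seq, true_seq)) for seq in classes]
    return classes, costs

# A 3-position sequence over 2 labels already yields 2**3 = 8 "classes".
classes, costs = sequence_to_cost_sensitive(("N", "V", "N"), ("N", "V"))
```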
The answer to question 1 is “yes”. Alex Smola showed how the ICML 2004 ‘any loss’ technique can be applied to example-dependent losses (and randomly sampled losses), giving it the full generality of cost-sensitive classification.
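For a concrete picture of what “example-dependent” buys: in the sketch above the cost row C[y] was shared by all examples with label y, whereas in the example-dependent setting each example carries its own cost vector. Below is a hypothetical variant of the earlier training loop (my own illustration, not Smola’s construction).

```python
# Example-dependent variant of the earlier sketch: instead of a shared cost
# matrix, each example i supplies its own cost vector cost[i], where
# cost[i][k] is the loss of predicting class k on that particular example.
import numpy as np

def train_example_dependent_svm(X, y, cost, lam=0.01, lr=0.1, epochs=20, seed=0):
    """X: (n, d); y: (n,) integer labels; cost: (n, k) per-example cost vectors."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = np.zeros((cost.shape[1], d))
    for _ in range(epochs):
        for i in rng.permutation(n):
            scores = W @ X[i]
            violations = scores - scores[y[i]] + cost[i]   # per-example costs
            violations[y[i]] = 0.0
            k = int(np.argmax(violations))
            W *= 1.0 - lr * lam
            if violations[k] > 0:
                W[k] -= lr * X[i]
                W[y[i]] += lr * X[i]
    return W
```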