Breakfast at the Quad Club, lunch at TTI-C, dinner in groups.
9:00am | Breakfast@Quad Club | |||
Overview Talks | ||||
9:30 am | Nick Hopper | Machine Learning and Reductions in Cryptography | Modern cryptography can be viewed as the complement of machine learning: given apparently strong primitives (concept classes that appear hard to learn, even weakly), the cryptographer's goal is to construct a cryptographic component from these primitives so that finding a weakness in the new component (weakly learning some supposedly hidden information) is provably equivalent to finding a weakness in the primitives used to construct it. Thus, in a sense, every cryptographic security proof is a reduction between weakly learning two related concept classes. Over the past twenty-five years, the cryptographic community has developed a set of tools for thinking about these reductions, and its own set of answers to the central questions of this workshop. This talk will highlight some famous cryptographic reductions, along with cryptographic tools and approaches for thinking about reductions. | |
10:15 am | John Langford | Classification as an Atom | Several fields such as digital circuit design and 3-d graphics
have greatly benefited from a reductionist approach: find the
simplest element capable of constructing desired behaviors and
use it. This approach is typically fruitful because it becomes
unnecessary to design a new element for a new problem. Instead, you
reuse old components to construct new solutions to new problems.
Is classification an atom for machine learning? We look at the set of problems reducible to classification and the advantages of using classification with respect to other primitives. |
|
11:00 am | Imre Risi Kondor | To the RKHS it's all the same | It is now generally recognized that one of the principal advantages of kernel-based learning is that the same algorithms can support a wide variety of input and output spaces. I will discuss how a broad range of learning problems (learning on manifolds, groups, and so on) can be reduced to the beloved SVM or Gaussian processes. In particular, I am going to talk about solving ranking problems with kernels. | |
11:45am | Lunch@TTI-C | |||
Reinforcement Learning Talks | ||||
1:30pm | Alan Fern and Robert Givan | Reinforcement Learning Viewed as Classification: Empirical Results for Large Relational Markov Decision Processes | We present and empirically evaluate a heuristic reduction from reinforcement learning to cost-sensitive classification. Existing reductions that are based on approximate policy iteration typically represent policies via cost functions and produce a sequence of cost-function regression problems. Instead, our approach represents policies directly as state-action mappings and produces a sequence of cost-sensitive classification problems. We argue that reducing to classification rather than regression can have practical benefits as it is often easier to specify an effective policy space via a direct policy-language bias rather than a cost-function bias. Empirically our technique can select good stationary policies for very large relational Markov decision processes (MDPs) where previous techniques fail. In particular, we induce high-quality domain-specific planners for classical planning domains (both deterministic and stochastic variants) by solving such domains as extremely large MDPs. | paper |
2:15pm | Michail Lagoudakis (joint with Ron Parr) | Reinforcement Learning as Classification: Leveraging Modern Classifiers | The basic tools of machine learning appear in the inner loop of most reinforcement learning algorithms, typically in the form of Monte Carlo methods or function approximation techniques. To a large extent, however, current reinforcement learning algorithms draw upon machine learning techniques that are at least ten years old and, with a few exceptions, very little has been done to exploit recent advances in classification learning for the purposes of reinforcement learning. We use a variant of approximate policy iteration based on rollouts that allows us to use a pure classification learner, such as a support vector machine (SVM), in the inner loop of the algorithm. We argue that the use of SVMs, particularly in combination with the kernel trick, can make it easier to apply reinforcement learning as an ``out-of-the-box'' technique, without extensive feature engineering. Our approach opens the door to modern classification methods, but does not preclude the use of classical methods. We present experimental results in the pendulum balancing and bicycle riding domains using both SVMs and neural networks for classifiers. | ICML 2003 paper |
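The rollout-to-classification loop described above can be sketched on a toy problem. The chain MDP, the fixed initial policy, and the tabular stand-in for the classifier below are illustrative assumptions, not the talk's pendulum or bicycle domains or its SVM learner:

```python
# Rollout-based policy improvement reduced to classification (sketch).
# Hypothetical toy domain: a 5-state chain; action +1 moves right, -1 moves
# left; reward for arriving at state 4.
states, actions = range(5), (+1, -1)
step = lambda s, a: min(4, max(0, s + a))
reward = lambda s, a: 1.0 if step(s, a) == 4 else 0.0

def rollout_q(s, a, policy, horizon=15, gamma=0.95):
    """Return of taking action a at s and then following policy.
    One rollout suffices because this toy MDP is deterministic; in general
    many Monte Carlo rollouts are averaged."""
    cur, act, acc, g = s, a, 0.0, 1.0
    for _ in range(horizon):
        acc += g * reward(cur, act)
        cur = step(cur, act)
        act = policy(cur)
        g *= gamma
    return acc

# One approximate-policy-iteration step: label each state with its empirically
# best action, then fit any classifier (an SVM, say) to these (state, action)
# pairs.  A plain dict stands in for the classifier here.
policy = lambda s: -1                  # a poor initial policy: always move left
train = [(s, max(actions, key=lambda a: rollout_q(s, a, policy)))
         for s in states]
improved = dict(train)
```

Iterating this step, rolling out the improved policy to relabel states, gives the approximate policy iteration loop; generalization across states by the classifier is what the reduction buys.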
3:00pm | Drew Bagnell | Policy Search by Dynamic Programming | Policy search approaches to reinforcement learning represent a
promising method for solving POMDPs and large MDPs. Unfortunately, many
methods in common use suffer from poor scaling in required
computational and sample resources. We show that if a ``baseline
distribution'' is given (indicating roughly how often we expect a good
policy to visit each state), then we can derive an efficient policy
search algorithm for which we can provide non-trivial performance
guarantees. Our work leverages the central insight of dynamic
programming, the principle of optimality, without recourse to finding
fixed points of the Bellman functional equation. In particular, our
algorithm, which we call Policy Search by Dynamic Programming (PSDP),
is in the spirit of the traditional dynamic programming approach to
solving MDPs where values are ``backed up.'' In PSDP, it is the policy
which is backed up. Each backup is in essence a weighted classification problem; in this way we are able to reduce reinforcement learning with
a baseline distribution to a finite sequence of supervised learning
problems. We will conclude with applications of the algorithm to
robotic planning and control in partially observed domains.
This is joint work with Sham Kakade, Andrew Ng and Jeff Schneider. |
paper |
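A minimal sketch of the PSDP backup on a toy finite-horizon problem; the chain MDP, the uniform baseline weights, and the tabular policy class below are illustrative assumptions, not the talk's robotics domains:

```python
# PSDP sketch: back up the *policy*, not the value function.  Each backup at
# time t labels states with the action maximizing future value under the
# already-chosen policies for t+1, ..., T-1: a weighted classification problem.
T = 5                                       # horizon
states, actions = range(4), (+1, -1)
step = lambda s, a: min(3, max(0, s + a))
reward = lambda s, a: 1.0 if step(s, a) == 3 else 0.0
baseline = {s: 1.0 / len(states) for s in states}   # roughly how often we
                                                    # expect a good policy at s

def value(s, t, pis):
    """Value of running pis[t], pis[t+1], ... from state s at time t."""
    if t == T:
        return 0.0
    a = pis[t][s]
    return reward(s, a) + value(step(s, a), t + 1, pis)

pis = [dict() for _ in range(T)]
for t in reversed(range(T)):                # backward in time, as in DP
    for s in states:
        scores = {a: reward(s, a) + value(step(s, a), t + 1, pis)
                  for a in actions}
        pis[t][s] = max(scores, key=scores.get)
    # With this tabular policy class the baseline weights never bind; with a
    # restricted classifier they would weight the classification loss so that
    # states a good policy visits often get classified correctly.
```

Evaluating `value(0, 0, pis)` afterwards reports the performance of the nonstationary policy the backups produced.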
3:45pm | John Langford | Families of Reductions of Reinforcement Learning to Classification | There are several ways to reduce reinforcement learning to classification in the presence of some form of side information about the world. Some of these methods have advantages in precision, some in computational complexity, and some in sample requirements. I will review and compare them all at a high level. | paper paper |
4:30pm | Adam Klivans | In Search of New Algorithms for PAC Learning |
A concept class has polynomial threshold function degree d if for
every element of the concept class there exists a real valued,
multivariate polynomial of degree d such the polynomial is positive on
exactly the set of inputs for which the concept is TRUE (and negative
otherwise).
Recent results for classical problems such as learning DNF or learning intersections of halfspaces suggest that giving a reduction from a concept class to a polynomial threshold function is a powerful approach for designing algorithms. Unfortunately, it also seems to be the only approach for designing PAC learning algorithms. We will show that, with the notable exception of the concept class of parity functions, essentially all known PAC algorithms (in the distribution-free model) can be rephrased as reductions to polynomial threshold functions. We will describe recent polynomial threshold function constructions for concept classes such as DNF formulae, intersections of halfspaces, and decision lists, and show that even minor variations of this approach can yield interesting new algorithms. (Joint work with Ryan O'Donnell and Rocco Servedio) |
|
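As a concrete instance of the representation defined above, a conjunction over {0,1}^n already has polynomial threshold function degree 1. The three-variable check below is my own illustrative example, not one from the talk:

```python
# The conjunction x1 AND x2 AND x3 as a degree-1 polynomial threshold
# function: p(x) = x1 + x2 + x3 - 2.5 is positive on exactly the TRUE inputs.
from itertools import product

def conjunction(x):
    return all(x)

def p(x):
    return sum(x) - 2.5          # degree-1 polynomial

# Exhaustive check over {0,1}^3 that sign(p) matches the concept.
for x in product((0, 1), repeat=3):
    assert (p(x) > 0) == conjunction(x)
```

Constructions for DNF or intersections of halfspaces have this same shape but need higher degree; roughly speaking, the learning algorithm is then linear programming over the monomials of that degree.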
5:15pm | Dinner |
8:30am | Breakfast@Quad Club | |||
Theory | ||||
9:30 am | Shai Ben-David | An Informational Complexity Notion of Reduction for Concept Classes | Several parameters defined
for a concept class may be viewed as reflecting the informational
complexity of the class. The VC-dimension is probably the prime
example of such a parameter. Some other examples are:
This talk is based on joint work with Ami Litman. |
|
10:15 am | Phil Long | Weighing evidence in the absence of a gold standard |
Many propositions in biology can be supported by a variety of
different sorts of evidence. It is often useful to collect together
large numbers of such propositions, together with the evidence
supporting them, into databases to be used in other analyses. In this
case, it becomes necessary to automatically decide which propositions
are supported well enough to be included. This can involve weighing
evidence of varying strength. In some important cases, there is
effectively no definitive source of evidence, no "gold standard," that
can be used for evaluating the others. Examples of problems with this
property are (1) pairing up equivalent genes between species, and (2)
predicting which pairs of proteins interact.
Previously, Mitina and Vereshchagin proposed and analyzed a theoretical model of a problem with some key properties in common with these applications. To an extent, a more detailed theoretical formulation of the above problems can be reduced to a problem similar to theirs, to which their techniques can be applied. This leads naturally to an algorithm, though some further work was needed to make it practical. I will describe simulation results for this algorithm on artificial data, evaluation of the method through its application to mapping orthologs, and some open problems. |
|
11:00 am | Guy Lebanon | The Ranking Poset Framework for Classification and Ranking | We introduce a new framework for combining rankers that generalizes several known models, such as discrete AdaBoost, logistic regression, error-correcting output codes, and the Mallows model. Using the new framework, we study properties of the above models and derive a previously unknown Bayesian interpretation. In addition, the framework suggests new models for ranking and classification. One such model, cranking, is demonstrated in the context of meta-search and classification. Other proposed models extend the framework to incomplete and categorical rankings and apply it to the collaborative filtering problem. | NIPS 2002 paper |
11:45am | Lunch@TTI-C | |||
Instantiations | ||||
1:30pm | Adam Kalai | Boosting with Noise |
One of the most common complaints about boosting is that it seems to
perform poorly on noisy data sets. We show theoretically exactly how far
one can boost in the presence of random classification noise. Under a
slightly different model of weak learning, we show that one can boost to
arbitrary accuracy in the presence of noise.
In newer results, we give a noise-tolerant boosting algorithm that has the same theoretical efficiency as mainstream boosting algorithms such as AdaBoost. Joint work with Rocco Servedio. |
paper |
2:15pm | Bianca Zadrozny | Reductions as changes in the distribution of examples |
Recent reductions from cost-sensitive learning and reinforcement learning
to zero/one-loss classifier learning involve as a key component a change
in the distribution of examples. The boosting reduction from strong
classifier learning to weak classifier learning is also carried out as a
series of distribution changes. Besides being interesting theoretically as
a possible way of unifying different reductions into a common framework,
the observation that many reductions in machine learning involve
distribution changes has implications for the use of reductions in
practice. In order to achieve good results through the use of reductions,
our "core learning algorithm" (the one we reduce to) must be able to
learn effectively with respect to a distribution of examples that is
different from the one from which we can directly draw examples. In this
talk, I will compare different ways of implementing this with existing
classifier learners and present a general ensemble learning scheme based
on rejection sampling that yields very good results in practice for a
variety of base classifier learners.
This talk is based on joint work with John Langford and Naoki Abe. |
|
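The rejection-sampling idea described above can be sketched in a few lines: the core learner only ever sees examples accepted with probability proportional to an importance weight, so it effectively trains on the changed distribution. The skewed target and uniform source below are illustrative choices, not the talk's scheme:

```python
import random

random.seed(0)

def rejection_sample(draw, weight, max_weight, n):
    """Draw from the source until n examples are accepted; each candidate is
    kept with probability weight(x) / max_weight, so accepted examples are
    distributed proportionally to weight(x) times the source density."""
    out = []
    while len(out) < n:
        x = draw()
        if random.random() < weight(x) / max_weight:
            out.append(x)
    return out

# Source: uniform over {0,...,9}.  Target: probability proportional to x.
draw = lambda: random.randrange(10)
weight = lambda x: float(x)
samples = rejection_sample(draw, weight, max_weight=9.0, n=5000)

# Under the target, the mean is sum(x*x for x in range(10)) / sum(range(10))
# = 285 / 45, about 6.33; the accepted-sample mean should land close to that.
```

The cost of the scheme is the rejected draws: the closer `max_weight` is to the typical weight, the fewer samples are wasted.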
3:00 pm | David McAllester | Collective Classification | The collective classification problem arises when a set of items are to be simultaneously classified taking into account correlations between them. For example, labeling the words in a sentence into parts of speech or classifying the web pages in a department web site into professors and students. We review some existing approaches to this problem and consider a particular reduction to classical binary classification. | |
3:45pm | Naoki Abe | Reduction of unsupervised learning to classification with application to outlier detection |
We present a formulation of unsupervised learning which can be reduced to
classification: namely, a classification algorithm with a performance
guarantee can be transformed into one for unsupervised learning.
While the reduction is valid for various forms of unsupervised learning,
including clustering and density estimation, it is most natural to apply it
to the problem of outlier detection. We demonstrate its practical
significance by applying it to a prototypical application of outlier
detection: intrusion detection. Our experiments show that our method based
on the proposed reduction attains excellent predictive performance, as
compared to the best performing methods for this problem in the literature.
Joint work with John Langford. |
|
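One common way to realize such a reduction is to label the real data 1, draw artificial "background" points labeled 0, train any classifier, and flag real points the classifier cannot distinguish from background. The sketch below uses my own illustrative assumptions (1-D Gaussian data, a uniform background, a histogram classifier), not the talk's intrusion-detection setup:

```python
import random

random.seed(1)

# Real data: 2000 draws from a standard Gaussian, plus one planted outlier.
real = [random.gauss(0.0, 1.0) for _ in range(2000)] + [8.0]
lo, hi = min(real), max(real)

# Artificial negatives: uniform background over the data's range.
background = [random.uniform(lo, hi) for _ in range(2000)]

bins = 20
def bin_of(x):
    return min(bins - 1, int((x - lo) / (hi - lo) * bins))

pos = [0] * bins                    # real counts per bin
neg = [0] * bins                    # background counts per bin
for x in real:
    pos[bin_of(x)] += 1
for x in background:
    neg[bin_of(x)] += 1

def p_real(x):
    """Histogram classifier's smoothed estimate of P(real | x)."""
    b = bin_of(x)
    return (pos[b] + 1) / (pos[b] + neg[b] + 2)

# Real points the classifier scores close to background are outlier candidates.
outliers = [x for x in real if p_real(x) < 0.05]
```

Any classifier with a performance guarantee can replace the histogram, which is the point of the reduction.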
4:30pm | Valentina Bayer | Cost-sensitive Learning: Reduction from Supervised Learning to an Acyclic MDP |
We define cost-sensitive learning with attribute costs
and misclassification costs as the problem of learning
diagnostic policies minimizing expected total costs.
In this formulation, cost-sensitive learning is harder to solve than standard supervised learning. Special cases of cost-sensitive learning problems include classifiers minimizing 0/1 loss, classifiers sensitive only to attribute costs, and classifiers sensitive only to misclassification costs. This cost-sensitive learning problem can be reduced to an acyclic MDP with a single start state (the state in which no attributes have been measured). Model-free reinforcement learning algorithms take too long to converge due to the stochastic nature of classification actions. Dynamic programming methods can be used to solve this MDP, but it is more efficient to perform heuristic search. We are interested in learning cost-sensitive diagnostic policies from datasets of labeled examples. We show on realistic problems that AO*-based systematic search learns good diagnostic policies when additional safeguards against overfitting are provided. |
paper |
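A minimal sketch of the acyclic state space and a dynamic-programming solve, on a hypothetical two-attribute diagnosis problem. The XOR-like distribution and the costs are my own illustrative choices; the talk's AO* search would expand this same DAG selectively, guided by a heuristic:

```python
# States of the acyclic MDP are sets of measured attribute values; actions are
# "measure attribute i" (paying ATTR_COST) or "classify as y" (paying the
# expected misclassification cost).  Memoized recursion = DP over the DAG.
from functools import lru_cache
from itertools import product

ATTR_COST = 1.0
MISCLASS_COST = 10.0

def prob(x1, x2, y):
    """Joint distribution: attributes uniform, label = x1 XOR x2, 10% noise."""
    return 0.25 * (0.9 if y == (x1 ^ x2) else 0.1)

@lru_cache(maxsize=None)
def expected_cost(known):           # known: sorted tuple of (index, value)
    worlds = [w for w in product((0, 1), repeat=3)
              if all(w[i] == v for i, v in known)]
    total = sum(prob(*w) for w in worlds)
    # Classify now: pay the expected misclassification cost of the best guess.
    best = min(sum(prob(*w) for w in worlds if w[2] != y) / total
               * MISCLASS_COST for y in (0, 1))
    # Or measure an unmeasured attribute, then continue optimally.
    measured = {i for i, _ in known}
    for i in (0, 1):
        if i not in measured:
            cost = ATTR_COST
            for v in (0, 1):
                p_v = sum(prob(*w) for w in worlds if w[i] == v) / total
                cost += p_v * expected_cost(tuple(sorted(known + ((i, v),))))
            best = min(best, cost)
    return best
```

For XOR neither attribute predicts the label alone, so the optimal diagnostic policy measures both (cost 2) and then misclassifies 10% of the time (expected cost 1), a total of 3, whereas classifying blind would cost 5.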
5:15pm | Dinner |