Automated Labeling

One of the common trends in machine learning has been an emphasis on the use of unlabeled data. The argument goes something like “there aren’t many labeled web pages out there, but there are a huge number of web pages, so we must find a way to take advantage of them.” There are several standard approaches for doing this:

  1. Unsupervised Learning. You use only unlabeled data. In a typical application, you cluster the data and hope that the clusters somehow correspond to what you care about.
  2. Semisupervised Learning. You use both unlabeled and labeled data to build a predictor. The unlabeled data influences the learned predictor in some way.
  3. Active Learning. You have unlabeled data and access to a labeling oracle. You interactively choose which examples to label so as to optimize prediction accuracy.

It seems there is a fourth approach worth serious investigation—automated labeling. The approach goes as follows:

  1. Identify some subset of observed values to predict from the others.
  2. Build a predictor.
  3. Use the output of the predictor to define a new prediction problem.
  4. Repeat…
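
To make the loop concrete, here is a minimal sketch in Python. The train routine, the dict-of-observations representation, and the "predicted_" naming are all hypothetical placeholders, not any particular system:

    # Minimal sketch of the automated labeling loop. `train` is a hypothetical
    # stand-in for any supervised learning algorithm: it takes features and
    # labels and returns a predictor (a callable).
    def automated_labeling(data, target, rounds, train):
        """data: list of dicts of observed values; target: key to predict first."""
        predictors = []
        for _ in range(rounds):
            # 1. Identify a subset of observed values (the current target)
            #    to predict from the others.
            X = [{k: v for k, v in x.items() if k != target} for x in data]
            y = [x[target] for x in data]
            # 2. Build a predictor.
            h = train(X, y)
            predictors.append(h)
            # 3. The predictor's outputs become new observed values,
            #    defining the next prediction problem.
            target = "predicted_" + target
            for x, features in zip(data, X):
                x[target] = h(features)
        # 4. Repeat... (handled by the loop above)
        return predictors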

Examples of this sort seem to come up in robotics very naturally. An extreme version of this is:

  1. Predict nearby things given touch sensor output.
  2. Predict medium distance things given the nearby predictor.
  3. Predict far distance things given the medium distance predictor.

Some of the participants in the LAGR project are using this approach.

A less extreme version appeared in the DARPA grand challenge winner, where the output of a laser range finder was used to build a road-or-not predictor for camera images.
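
As a hedged illustration of that label-transfer pattern (an invented toy, not the winning team’s actual pipeline), the laser’s short-range geometry can label image patches which then train a long-range camera predictor:

    # Sketch: a short-range sensor (laser range finder) labels training data
    # for a long-range one (camera). The flatness rule is an invented placeholder.
    def label_patches_with_lidar(patches, flatness_threshold=0.05):
        """patches: list of (pixels, lidar_heights) pairs for image regions
        the laser can see. Returns (pixels, is_road) supervised pairs."""
        labeled = []
        for pixels, heights in patches:
            # A nearly flat height profile under a patch suggests drivable road.
            is_road = max(heights) - min(heights) < flatness_threshold
            labeled.append((pixels, is_road))
        return labeled
    # These pairs feed an ordinary supervised learner, whose camera-based
    # predictions then extend "road or not" far beyond the laser's range.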

These automated labeling techniques transform an unsupervised learning problem into a supervised learning problem, which has huge implications: we understand supervised learning much better and can bring to bear a host of techniques.

The set of work on automated labeling is sketchy—right now it is mostly just an observed-as-useful technique for which we have no general understanding. Some relevant bits of algorithm and theory are:

  1. Reinforcement learning to classification reductions, which convert rewards into labels.
  2. Cotraining, which considers a setting with multiple data sources. When predictors built on different data sources agree on unlabeled data, an inferred label is automatically created (a sketch of this step follows the list).
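
A minimal sketch of the co-training labeling step, with train again a hypothetical stand-in for any supervised learner (real co-training, as in Blum and Mitchell’s formulation, adds confidence thresholds and iterates):

    # Two predictors, each trained on a different view (data source), label an
    # unlabeled example whenever they agree on it.
    def cotrain_label(labeled, unlabeled, train):
        """labeled: list of ((view1, view2), y); unlabeled: list of (view1, view2)."""
        h1 = train([v1 for (v1, _), _ in labeled], [y for _, y in labeled])
        h2 = train([v2 for (_, v2), _ in labeled], [y for _, y in labeled])
        inferred = []
        for v1, v2 in unlabeled:
            # Agreement across views is treated as an automatically created label.
            if h1(v1) == h2(v2):
                inferred.append(((v1, v2), h1(v1)))
        return inferred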

It’s easy to imagine that undiscovered algorithms and theory exist to guide and use this empirically useful technique.

Yes, I am applying

Every year about now, hundreds of people apply for research/teaching jobs, with the timing governed by the university recruitment schedule. This time, it’s my turn—the hat’s in the ring, I am a contender, etc… What I have heard is that this year is good in both directions—both an increased supply of and an increased demand for machine learning expertise.

I consider this post a bit of an abuse as it is neither about general research nor machine learning. Please forgive me this once.

My hope is that I will learn about new places interested in funding basic research—it’s easy to imagine that I have overlooked possibilities.

I am not dogmatic about where I end up. Several earlier posts detail what I think of as a good research environment, so I will avoid a repeat. A few more details seem important:

  1. Application. There is often a tension between basic research and immediate application. This tension is not as strong as might be expected in my case. As evidence, many of my coauthors from the last few years are trying to solve particular learning problems, and I strongly care about whether and where a learning theory is useful in practice.
  2. Duration. I would like my next move to be of indefinite duration.

Feel free to email me (jl@hunch.net) if there is a possibility you think I should consider.

Workshops as Franchise Conferences

Founding a successful new conference is extraordinarily difficult. As a conference founder, you must manage to attract a significant number of good papers—enough to entice participants into returning next year and (generally) to grow the conference. For someone choosing to participate in a new conference, there is a very significant decision to make: do you send a paper to some new conference with no guarantee that the conference will work out? Or do you send it to another (possibly less related) conference that you are sure will work?

The conference founding problem is a joint agreement problem with a very significant barrier. Workshops are a way around this problem, and workshops attached to conferences are a particularly effective means for this. A workshop at a conference is sure to have people available to speak and attend, and is sure to have a large audience. Presenting work at a workshop is not generally exclusive: it can also be presented at a conference. For someone considering participation, the only overhead is the direct time and effort involved.

All of the above says that workshops are much easier than conferences, but it does not address a critical question: “Why run a workshop at a conference rather than just a session at the conference?” A session at the conference would have all the above advantages.

There is one more very significant and direct advantage of a workshop over a special session: workshops are run by people who have a direct and significant interest in their success. The workshop organizers do the hard work of developing a topic, soliciting speakers, and deciding what the program will be. The organizers’ reputations are then built on the success or flop of the workshop. This “direct and significant interest” aspect of a workshop is the basic reason why franchise systems (think 7-Eleven or McDonald’s) are common and successful.

What does this observation imply about how things could be? For example, we could imagine a conference that is “all workshops”. Instead of having a program committee and program chair, the conference might just have a program chair who accepts or rejects workshop chairs, who then organize their own workshop/session. This mode doesn’t seem to exist, which is always cautionary, but on the other hand it’s not clear this mode has even been tried. NIPS is probably the conference closest to using this approach. For example, a significant number of people attend only the workshops at NIPS.

More NIPS Papers II

I thought this was a very good NIPS with many excellent papers. The following are a few NIPS papers which I liked and hope to study more carefully when I get the chance. The list is not exhaustive and in no particular order…

  • Preconditioner Approximations for Probabilistic Graphical Models.
    Pradeep Ravikumar and John Lafferty.
    I thought the use of preconditioner methods from solving linear systems in the context of approximate inference was novel and interesting. The results look good and I’d like to understand the limitations.
  • Rodeo: Sparse nonparametric regression in high dimensions.
    John Lafferty and Larry Wasserman.
    A very interesting approach to feature selection in nonparametric regression from a frequentist framework. The use of lengthscale variables in each dimension reminds me a lot of ‘Automatic Relevance Determination’ in Gaussian process regression — it would be interesting to compare Rodeo to ARD in GPs.
  • Interpolating between types and tokens by estimating power law generators.
    Goldwater, S., Griffiths, T. L., & Johnson, M.
    I had wondered how Chinese restaurant processes and Pitman-Yor processes related to Zipf’s plots and power laws for word frequencies. This paper seems to have the answers. (A brief note on the power-law connection follows this list.)
  • A Bayesian spatial scan statistic.
    Daniel B. Neill, Andrew W. Moore, and Gregory F. Cooper.
    When I first learned about spatial scan statistics I wondered what a Bayesian counterpart would be. I liked the fact that their method was simple, more accurate, and much faster than the usual frequentist method.
  • Q-Clustering.
    M. Narasimhan, N. Jojic and J. Bilmes.
    A very interesting application of submodular function optimization to clustering. This feels like a hot area.
  • Worst-Case Bounds for Gaussian Process Models.
    Sham M. Kakade, Matthias W. Seeger, & Dean P. Foster.
    It’s useful for Gaussian process practitioners to know that their approaches don’t do silly things when viewed from a worst-case frequentist setting. This paper provides some relevant theoretical results.
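
On the types/tokens bullet above: the power-law connection can be stated compactly. As a hedged note from memory (a standard asymptotic for the two-parameter process, not quoted from the paper), under a Pitman-Yor process with discount d and concentration θ, the expected number of distinct types K_n among n tokens grows as

    \mathbb{E}[K_n] \;\sim\; \frac{\Gamma(\theta + 1)}{d\,\Gamma(\theta + d)}\, n^d, \qquad 0 < d < 1,

while the d = 0 case (the Chinese restaurant process of a Dirichlet process) gives only logarithmic growth, \mathbb{E}[K_n] \sim \theta \log n. The n^d rate is the power-law behavior matching Zipf-style word frequency plots.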

More NIPS Papers

Let me add to John’s post with a few of my own favourites from this year’s conference. First, let me say that Sanjoy’s talk, Coarse Sample Complexity Bounds for Active Learning, was also one of my favourites, as was the Forgetron paper.

I also really enjoyed the last third of Christos’ talk on the complexity of finding Nash equilibria.

And, speaking of tagging, I think the U.Mass Citeseer replacement system Rexa from the demo track is very cool.

Finally, let me add my recommendations for specific papers:

  • Z. Ghahramani, K. Heller: Bayesian Sets [no preprint]
    (A very elegant probabilistic information retrieval style model of which objects are “most like” a given subset of objects.)
  • T. Griffiths, Z. Ghahramani: Infinite Latent Feature Models and the Indian Buffet Process [preprint]
    (A Dirichlet style prior over infinite binary matrices with beautiful exchangeability properties.)
  • K. Weinberger, J. Blitzer, L. Saul: Distance Metric Learning for Large Margin Nearest Neighbor Classification [preprint]
    (A nice idea about how to learn a linear transformation of your feature space which brings nearby points of the same class closer together and sends nearby points of differing classes further apart. Convex. Kilian gave a very nice talk on this. A rough sketch of the objective follows this list.)
  • D. Blei, J. Lafferty: Correlated Topic Models [preprint]
    (Nice trick using the lognormal to induce correlations on the simplex, applied to topic models for text.)
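
For the Weinberger, Blitzer, and Saul bullet above, here is a rough sketch of the convex objective, reconstructed from memory rather than quoted from the paper. With a Mahalanobis metric d_M(x, x') = (x - x')^T M (x - x') and M constrained positive semidefinite, one solves roughly

    \min_{M \succeq 0} \; \sum_{i,\, j \leadsto i} d_M(x_i, x_j) \;+\; c \sum_{i,\, j \leadsto i,\, l} (1 - y_{il}) \big[ 1 + d_M(x_i, x_j) - d_M(x_i, x_l) \big]_+

where j ⇝ i marks the target neighbors of x_i, y_{il} indicates whether x_i and x_l share a label, and [z]_+ = max(z, 0). The first term pulls same-class target neighbors together; the hinge term pushes differently labeled points at least a unit margin further away. The positive semidefinite constraint on M makes this a semidefinite program, hence convex.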

I’ll also post in the comments a list of other papers that caught my eye but which I haven’t looked at closely enough to be able to out-and-out recommend.