Machine Learning (Theory)

12/11/2005

More NIPS Papers

Tags: Papers | roweis @ 1:14 am

Let me add to John’s post a few of my own favourites
from this year’s conference. First, Sanjoy’s talk, Coarse Sample Complexity Bounds for Active
Learning, was also one of my favourites, as was the Forgettron paper.

I also really enjoyed the last third of Christos’ talk
on the complexity of finding Nash equilibria.

And, speaking of tagging, I think the U.Mass Citeseer replacement system
Rexa from the demo track is very cool.

Finally, let me add my recommendations for specific papers:

  • Z. Ghahramani, K. Heller: Bayesian Sets
    [no preprint]
    (A very elegant probabilistic information retrieval style model
    of which objects are “most like” a given subset of objects.)
  • T. Griffiths, Z. Ghahramani: Infinite Latent Feature Models and
    the Indian Buffet Process
    [preprint]
    (A Dirichlet-style prior over infinite binary matrices with
    beautiful exchangeability properties.)
  • K. Weinberger, J. Blitzer, L. Saul: Distance Metric Learning for
    Large Margin Nearest Neighbor Classification
    [preprint]
    (A nice idea about how to learn a linear transformation of your
    feature space which brings nearby points of the same class closer
    together and sends nearby points of differing classes further
    apart. Convex. Kilian gave a very nice talk on this.)
  • D. Blei, J. Lafferty: Correlated Topic Models
    [preprint]
    (Nice trick using the logistic normal to induce correlations on the simplex,
    applied to topic models for text.)
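To make “most like a given subset” a little more concrete, here is a minimal sketch of a Bayesian Sets style score, assuming binary feature vectors with an independent Beta-Bernoulli model for each feature. A candidate x is scored by log p(x | query set) - log p(x), i.e. how much more probable x becomes once the query items have been seen. The hyperparameters alpha and beta are hand-picked assumptions, and the function names are hypothetical; this is not code from the paper.

```python
import math


def bayesian_sets_log_score(x, query, alpha, beta):
    """log p(x | query) - log p(x) for binary feature vectors.

    Assumes each feature j is an independent Bernoulli with a
    Beta(alpha[j], beta[j]) prior (hyperparameters hand-picked here).
    """
    n = len(query)
    score = 0.0
    for j, x_j in enumerate(x):
        ones = sum(q[j] for q in query)            # query items that have feature j
        a_post = alpha[j] + ones                   # Beta posterior after seeing the query set
        b_post = beta[j] + n - ones
        p_post = a_post / (a_post + b_post)        # P(x_j = 1 | query)
        p_prior = alpha[j] / (alpha[j] + beta[j])  # P(x_j = 1) under the prior alone
        if x_j:
            score += math.log(p_post) - math.log(p_prior)
        else:
            score += math.log(1.0 - p_post) - math.log(1.0 - p_prior)
    return score


def rank_by_bayesian_sets(items, query, alpha, beta):
    """Return item names sorted from most to least 'like' the query set."""
    scores = {name: bayesian_sets_log_score(vec, query, alpha, beta)
              for name, vec in items.items()}
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    # Hypothetical 3-feature items; the query set is two copies of [1, 1, 0].
    items = {"a": [1, 1, 0], "b": [0, 1, 0], "c": [0, 0, 1]}
    query = [[1, 1, 0], [1, 1, 0]]
    print(rank_by_bayesian_sets(items, query, [1.0] * 3, [1.0] * 3))  # ['a', 'b', 'c']
```

Because the features are modeled independently, the log score is just a sum of per-feature terms, which is what makes this kind of model cheap enough for retrieval over large collections.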
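For the Indian Buffet Process entry, here is a minimal sketch of drawing a binary matrix using the sequential “customers and dishes” description: customer i takes each previously sampled dish k with probability m_k / i, where m_k counts the earlier customers who took it, and then tries a Poisson(alpha / i) number of new dishes. Rows are objects and columns are latent features, so the number of columns is unbounded a priori but finite in any single draw. The function names and the small Poisson helper are assumptions for illustration. The exchangeability properties concern the distribution over matrices, not anything a single sample can show.

```python
import math
import random


def sample_poisson(lam, rng):
    """Knuth's multiplication method; adequate for the small rates used here."""
    threshold = math.exp(-lam)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


def sample_ibp(num_customers, alpha, rng):
    """Draw a binary matrix Z (customers x dishes) from the Indian Buffet Process."""
    dish_counts = []  # m_k: how many customers so far took dish k
    rows = []
    for i in range(1, num_customers + 1):
        row = []
        # Existing dishes: take dish k with probability m_k / i.
        for k in range(len(dish_counts)):
            take = rng.random() < dish_counts[k] / i
            row.append(1 if take else 0)
            if take:
                dish_counts[k] += 1
        # New dishes: Poisson(alpha / i) of them, all taken by customer i.
        new_dishes = sample_poisson(alpha / i, rng)
        row.extend([1] * new_dishes)
        dish_counts.extend([1] * new_dishes)
        rows.append(row)
    # Pad earlier rows with zeros for dishes introduced later.
    width = len(dish_counts)
    return [row + [0] * (width - len(row)) for row in rows]


if __name__ == "__main__":
    z = sample_ibp(6, 2.0, random.Random(0))
    for row in z:
        print("".join(str(v) for v in row))
```

The expected number of dishes per customer stays alpha no matter how many customers arrive, while the total number of columns grows only logarithmically with the number of rows.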
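The correlation trick in the Correlated Topic Models entry can be sketched as follows, assuming the logistic normal construction: draw eta from a multivariate Gaussian N(mu, Sigma) and map it onto the simplex with a softmax. Off-diagonal entries of Sigma then let topic proportions rise and fall together, something a Dirichlet, whose components are always negatively correlated, cannot do. This only draws topic proportions and says nothing about how the paper fits the model; the function names and the example covariance are hypothetical.

```python
import math
import random


def cholesky(a):
    """Lower-triangular L with L L^T = a, for a small symmetric positive definite a."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def sample_logistic_normal(mu, sigma, rng):
    """Draw eta ~ N(mu, sigma), then return softmax(eta), a point on the simplex."""
    low = cholesky(sigma)  # recomputed per call for simplicity
    z = [rng.gauss(0.0, 1.0) for _ in mu]
    eta = [mu[i] + sum(low[i][k] * z[k] for k in range(i + 1)) for i in range(len(mu))]
    top = max(eta)  # subtract the max for numerical stability
    weights = [math.exp(e - top) for e in eta]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    # Hypothetical 3-topic example: topics 0 and 1 tend to co-occur, topic 2 avoids both.
    mu = [0.0, 0.0, 0.0]
    sigma = [[1.0, 0.9, -0.9],
             [0.9, 1.0, -0.9],
             [-0.9, -0.9, 1.0]]
    rng = random.Random(0)
    for _ in range(3):
        print([round(t, 3) for t in sample_logistic_normal(mu, sigma, rng)])
```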
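To make the Large Margin Nearest Neighbor idea concrete, here is a sketch of its objective under a few stated assumptions. Distances are d_M(x, y) = (x - y)^T M (x - y) with M = L^T L positive semidefinite; each point has k fixed “target neighbors” of its own class, chosen by Euclidean distance; a pull term shrinks distances to target neighbors, and a hinge push term penalizes any differently labeled point that comes within a unit margin of them. Because d_M is linear in M, the objective is convex in M, which is presumably the sense of “Convex” above. The weight c and the function names are hypothetical, and the optimization itself is left out; the example only compares the loss of two hand-chosen metrics.

```python
def squared_metric_distance(x, y, m):
    """d_M(x, y) = (x - y)^T M (x - y) for a d x d matrix M (lists of lists)."""
    diff = [a - b for a, b in zip(x, y)]
    return sum(diff[r] * m[r][c] * diff[c]
               for r in range(len(diff)) for c in range(len(diff)))


def target_neighbors(points, labels, k):
    """For each point, the k nearest same-class points under plain Euclidean distance."""
    targets = []
    for i, x in enumerate(points):
        same = [j for j in range(len(points)) if j != i and labels[j] == labels[i]]
        same.sort(key=lambda j: sum((a - b) ** 2 for a, b in zip(x, points[j])))
        targets.append(same[:k])
    return targets


def lmnn_loss(points, labels, targets, m, c=1.0):
    """Pull term plus c times the hinge push term, evaluated at metric M."""
    pull = 0.0
    push = 0.0
    for i, neighbors in enumerate(targets):
        for j in neighbors:
            d_ij = squared_metric_distance(points[i], points[j], m)
            pull += d_ij  # bring target neighbors closer
            for l in range(len(points)):
                if labels[l] != labels[i]:
                    d_il = squared_metric_distance(points[i], points[l], m)
                    # penalize differently labeled points inside the unit margin
                    push += max(0.0, 1.0 + d_ij - d_il)
    return pull + c * push


if __name__ == "__main__":
    # Feature 0 separates the classes; feature 1 is large-scale noise.
    points = [(0.0, 0.0), (0.0, 3.0), (1.0, 0.0), (1.0, 3.0)]
    labels = ["a", "a", "b", "b"]
    targets = target_neighbors(points, labels, k=1)
    identity = [[1.0, 0.0], [0.0, 1.0]]
    shrink_noise = [[1.0, 0.0], [0.0, 0.01]]
    print(lmnn_loss(points, labels, targets, identity))      # 72.0
    print(lmnn_loss(points, labels, targets, shrink_noise))  # roughly 0.72
```

Down-weighting the noisy feature pulls same-class points together and leaves the other class outside the margin, so the loss drops sharply; a learner would search over all positive semidefinite M rather than compare two candidates by hand.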

I’ll also post in the comments a list of other papers that caught my eye but
which I haven’t looked at closely enough to be able to out-and-out
recommend.

6 Comments to “More NIPS Papers”
  1. Kevembuangga says:

    I think the U.Mass Citeseer replacement system Rexa from the demo track is very cool.

    Would be very cool…
    It is not yet really open:
    “Access to Rexa will be unrestricted later in March.

    If a demonstration is needed for NSF-related reasons, you may obtain a private login/password by having your NSF program director contact Andrew McCallum at mccallum@cs.umass.edu.”

    This mostly means that they haven’t “grokked” the Internet yet.
    Any restriction that prevents, or even just slows down, the adoption of a new feature or service is detrimental: competitors will have plenty of time to settle in, no matter how “superior” the restricted service promises to be.

    P.S. Your link is wrong: “http://hunch.net/rexa.info” has a spurious “hunch.net/” that should be removed.

  2. The “restriction” is simply that the service is not ready for public use yet. As for “competitors,” lighten up. Rexa is a research project, not a commercial service.

  3. Kevembuangga says:

    Oh, well! Fernando, didn’t you notice that competition happens even in non-commercial activities?
    Never had any colleagues publishing on the very same topics as you?
    Even good friends…
    Competition is the name of the game everywhere!
    BTW, since you work on “machine-learnable models of language”, what do you think of Jeffrey Elman’s “An alternative view of the mental lexicon”?
    http://crl.ucsd.edu/~elman/Papers/elman_tics_opinion_2004.pdf

  4. hal says:

    This is getting a bit OT, but I think Andrew’s doing the right thing. They’re in the process of moving from about 100k indexed docs to 300k, which is substantial. They’re also moving from a dumb coreference system (one that thinks I am the same person as Henry Randel III or something, because of first-letter, last-name matching [it doesn't realize III isn't a last name]) to a good one. If they had released something last summer with 50k docs and bad coref, people would have visited, found it interesting but not worthwhile overall, and never come back. Despite current trends in, say, software, to release early and release often, this is often not the best strategy, especially for something that already has competition (CiteSeer, Google Scholar), has a non-zero cost of adoption, and whose primary audience is rather picky (academics).

  5. Kevembuangga says:

    This is getting a bit OT,

    Not so sure; this started with Chaitanya Sai’s comment on a previous post, http://hunch.net/?p=148#comments
    with the general idea of “ease of sharing and disseminating information” and “the idea of tagging posts with something specific”.
    This is only remotely related to the kind of service provided by CiteSeer or Google Scholar; it is closer to what happens on del.icio.us, with a focus on academic work.
    Did you try, or even look at, CiteULike?
    Some academics may be too picky (or too “academic”?), but many already aren’t!
    The whole point is not who publishes or cites what, but who reads what and how they categorize it.
    The value of this is that it is live and current: full of noise and maybe even a bit of crap, but a lot of gems too.
    http://www.citeulike.org/
    And, I am NOT affiliated with this Richard Cameron guy in any way.

  6. I too am very excited about Rexa, based on the demo.

    I enjoyed the Bayesian Sets talk, although, like the questioner in the audience, I thought they were a little unfair to suggest that their results were better than Google’s. For example, FOZZIE BEAR is an excellent response to the query ANIMAL, since they are both Muppets. Fortunately, since Google’s algorithm isn’t published, I don’t think they have a huge burden to show that their method is better.
