Kolmogorov Complexity and Googling

Machine learning makes the New Scientist. From the article:

COMPUTERS can learn the meaning of words simply by plugging into Google. The finding could bring forward the day that true artificial intelligence is developed….
But Paul Vitanyi and Rudi Cilibrasi of the National Institute for Mathematics and Computer Science in Amsterdam, the Netherlands, realised that a Google search can be used to measure how closely two words relate to each other. For instance, imagine a computer needs to understand what a hat is.

You can read the paper at KC Google.
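
For a rough idea of how search counts can become a distance between words, here is a minimal sketch of the normalized Google distance as it is usually stated: NGD(x, y) = (max(log f(x), log f(y)) - log f(x, y)) / (log N - min(log f(x), log f(y))), where f counts pages containing the term(s) and N is the number of pages indexed. The function name and the hit counts below are made up for illustration. Real counts would have to come from a search engine, which this sketch does not query.

```python
import math

def normalized_google_distance(fx, fy, fxy, n):
    """Distance between two terms computed from page counts.

    fx, fy -- pages containing each term alone (hypothetical inputs)
    fxy    -- pages containing both terms
    n      -- total number of pages indexed
    """
    if fx <= 0 or fy <= 0 or fxy <= 0:
        # Terms that never occur (or never co-occur) are treated as infinitely far apart.
        return math.inf
    lx, ly, lxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(lx, ly) - lxy) / (math.log(n) - min(lx, ly))

# Made-up counts: the rarer term always appears alongside the more common one.
print(normalized_google_distance(100, 10_000, 100, 1_000_000))
```

Terms that always appear together score near 0, and the score grows as co-occurrence becomes rarer relative to the individual counts.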

Hat tip: Kolmogorov Mailing List

Any thoughts on the paper?

Why I decided to run a weblog.

I have decided to run a weblog on machine learning and learning theory research. Here are some reasons:

1) Weblogs enable new functionality:

  • Public comment on papers. No mechanism for this exists at conferences and most journals. I have encountered it once for a science paper. Some communities have mailing lists supporting this, but not machine learning or learning theory. I have often read papers and found myself wishing there were some way to see others’ questions and read the replies.
  • Conference shortlists. One of the most common conversations at a conference is “what did you find interesting?” There is no explicit mechanism for sharing this information at conferences, and it’s easy to imagine that it would be handy to do so.
  • Evaluation and comment on research directions. Papers are almost exclusively about new research, rather than evaluation (and consideration) of research directions. This last role is satisfied by funding agencies to some extent, but that is a private debate of a subset of the community. It’s easy to imagine that a public debate would be more thorough and thoughtful, producing better decisions.
  • Public collaboration. It may be feasible to use a weblog as a mechanism for public research on a scale smaller than a paper. Currently, most research in machine learning is done by one or a few closely working and privately communicating authors. Weblogs provide a natural generalization where anyone who is interested may be able to contribute.
  • The things not thought of. Weblogs provide new capabilities, and it is natural to miss the impact of these capabilities until a number of people have thought about and used them.

I intend to experiment with these capabilities.

2) Weblogs have the potential to be revolutionary. Here is a comparison of the different mechanisms of communication in a table.

mechanism | speed | scope | permanency | information filtration
journal papers | 6 months to years | Anyone with interest and access | Very permanent | reviewed
conference papers | 4-6 months | Attendees (and often any with interest) | Permanent | reviewed
workshops | 1-6 months | Attendees | Typically transient | inspected
mailing lists | a few days | Anyone subscribed (or reading archives) | Semipermanent (with archives) | inspected
personal discussion | thought speed | Whoever is there then | Transient | not reviewed
weblog | thought speed | Anyone with interest | Semipermanent | not reviewed

Weblogs achieve “best we can imagine” in every category except permanency and quality control. Furthermore, the weaknesses are not inherent to the medium, and are being actively addressed.

Permalinks are the equivalent of a citation, providing a semipermanent pointer to a piece of content. This is only ‘semi’ because the _author_ of the content can typically revise it at any moment in the future, and the pointer is only as permanent as the website itself.
Trackback is an explicit method for creating the reverse lookup table of citations: who cites this?
In addition, there are several mechanisms for information filtration such as “post is reposted in another weblog” and experimental moderation schemes.
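
The “reverse lookup table” that trackback maintains can be pictured as an inverted citation map. The sketch below is only an illustration of that data structure: the post names and the `build_cited_by` helper are hypothetical, and actual trackback builds the table incrementally, with the citing site notifying the cited one, rather than by inverting a complete map as done here.

```python
from collections import defaultdict

def build_cited_by(cites):
    """Invert a 'post -> posts it cites' map into 'post -> posts citing it'."""
    cited_by = defaultdict(set)
    for source, targets in cites.items():
        for target in targets:
            cited_by[target].add(source)
    # Sort each list of citing posts so the result is stable and easy to read.
    return {target: sorted(sources) for target, sources in cited_by.items()}

# Hypothetical posts and links, for illustration only.
cites = {
    "post-a": ["post-c"],
    "post-b": ["post-c", "post-a"],
}
print(build_cited_by(cites))
```

Looking up a post in the result answers “who cites this?” directly, which is the question a forward-only citation list cannot answer.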

The same forces driving academia into desiring permanent indelible records and very careful information filtration exist for blogs. These forces may produce the ‘missing pieces’, making weblogs very compelling for academic purposes.

3) Lance Fortnow told me so.