Machine learning makes the New Scientist. From the article:
COMPUTERS can learn the meaning of words simply by plugging into Google. The finding could bring forward the day that true artificial intelligence is developed….
But Paul Vitanyi and Rudi Cilibrasi of the National Institute for Mathematics and Computer Science in Amsterdam, the Netherlands, realised that a Google search can be used to measure how closely two words relate to each other. For instance, imagine a computer needs to understand what a hat is.
You can read the paper at KC Google.
Hat tip: Kolmogorov Mailing List
Any thoughts on the paper?
The meaning of “meaning” in this paper isn’t what I naively expect. I tend to expect something like “the dictionary entry for a word”. Here, “meaning” is captured implicitly, by a nonmetric distance between pairs of words.
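For concreteness, here is a minimal Python sketch of the paper’s normalized Google distance (NGD), computed from raw page counts; the hit counts below are made up purely for illustration:

```python
import math

def ngd(f_x, f_y, f_xy, n):
    """Normalized Google distance from raw page counts.

    f_x, f_y: number of pages containing each term alone
    f_xy:     number of pages containing both terms
    n:        (rough) total number of pages indexed
    """
    log_fx, log_fy, log_fxy = math.log(f_x), math.log(f_y), math.log(f_xy)
    return (max(log_fx, log_fy) - log_fxy) / (math.log(n) - min(log_fx, log_fy))

# Made-up counts: "hat" and "head" co-occur on many pages,
# so their distance comes out small (0 would mean identical usage).
print(ngd(f_x=10_000_000, f_y=50_000_000, f_xy=4_000_000, n=8_000_000_000))
```

Note that this really is nonmetric: the triangle inequality can fail for particular combinations of counts.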
There are other common approaches to computing nonmetric distances between words, such as “latent semantic indexing” (essentially a singular value decomposition of a term-document matrix), which works from the frequencies of individual words in much smaller sets of documents.
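For comparison, a minimal sketch of the LSI idea on a tiny term-document matrix (the terms and counts are invented for illustration):

```python
import numpy as np

# Tiny made-up term-document count matrix: rows = terms, columns = documents.
terms = ["hat", "head", "banana"]
A = np.array([[3, 0, 1, 0],   # "hat"
              [2, 1, 1, 0],   # "head"
              [0, 4, 0, 2]],  # "banana"
             dtype=float)

# LSI: a truncated SVD projects terms into a low-rank "latent" space.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
term_vecs = U[:, :k] * s[:k]  # each row is a term's latent vector

def cosine_distance(u, v):
    return 1.0 - np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

# Terms that co-occur across documents end up close in the latent space.
print(terms[0], terms[1], cosine_distance(term_vecs[0], term_vecs[1]))  # small
print(terms[0], terms[2], cosine_distance(term_vecs[0], term_vecs[2]))  # larger
```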
The nice thing here is the source of information, and how you might use it. One common theme of learning is that using a lot of information naively is often better than using a small amount of information with great care.
You’re right about the definition; it’s quite interesting. It’s also worth pointing out that there have been some rather more sophisticated uses of this “Google distribution” in the recent past. For instance, Intel Research has shown they can build HMMs from this data:
http://seattleweb.intel-research.net/people/fishkin/pubs_files/www04_guide.pdf