Rexa is live

Rexa is now publicly available. Anyone can create an account and log in.

Rexa is similar in functionality to Citeseer and Google Scholar, with more emphasis on using machine learning for intelligent information extraction. For example, Rexa can automatically display a picture from an author’s homepage when the author is searched for.

JMLR is a success

In 2001, the “Journal of Machine Learning Research” (JMLR) was created in reaction to inflexible publisher policies at MLJ. Essentially, once the internet made distribution nearly free, the bottleneck in disseminating research shifted from publishing to the research itself. The declaration of independence accompanying this move expresses the reasons in greater detail.

MLJ has substantially changed its policies in reaction. In particular, authors no longer assign copyright to the publisher (*), and MLJ regularly sponsors student “best paper awards” with cash prizes across several conferences. This is an advantage of MLJ over JMLR: MLJ can afford to fund cash prizes for the machine learning community. The remaining disadvantage is that reading an MLJ paper sometimes requires searching for the author’s website, where the free version is available. In contrast, JMLR articles are freely available to everyone from the JMLR website. Whether this disadvantage cancels the advantage is debatable, but essentially no one working on machine learning disputes the following: the changes brought by the creation of JMLR have been positive for the general machine learning community.

This model can and should be emulated in other areas of research where publishers are not behaving in a sufficiently constructive manner. Doing so requires two vital ingredients: a consensus of leaders to support a new journal and the willingness to spend the time and effort setting it up. Presumably, the editors of JMLR have learned some lessons about how to do this and are willing to share them.

(*) Back in the day, it was typical to be forced to sign over all rights to your journal paper, then ignore the agreement and place it on your homepage anyway. The natural act of placing your paper on your webpage is no longer illegal.

Progress in Machine Translation

I just visited ISI, where Daniel Marcu and others are working on machine translation. Apparently, machine translation is rapidly improving. A particularly dramatic jump came between 2002 and 2003, when systems switched from word-based translation to phrase-based translation. From a (now famous) slide by Charles Wayne at DARPA (which funds much of the work on machine translation), here is some anecdotal evidence:

2002 system output:

insistent Wednesday may recurred her trips to Libya tomorrow for flying.

Cairo 6-4 ( AFP ) – An official announced today in the Egyptian lines company for flying Tuesday is a company “insistent for flying” may resumed a consideration of a day Wednesday tomorrow her trips to Libya of Security Council decision trace international the imposed ban comment.

And said the official “the institution sent a speech to Ministry of Foreign Affairs of lifting on Libya air, a situation her recieving replying are so a trip will pull to Libya a morning Wednesday.”

2003 system output:

Egyptair has tomorrow to Resume Its flight to Libya.

Cairo 4-6 (AFP) – said an official at the Egyptian Aviation Company today that the company egyptair may resume as of tomorrow, Wednesday its flight to Libya after the International Security Council resolution to the suspension of the embargo imposed on Libya.

“The official said that the company had sent a letter to the Ministry of Foreign Affairs, information on the lifting of the air embargo on Libya, where it had received a response, the firt take off a trip to Libya on Wednesday morning”.

Machine translation systems are becoming effective at the level of “produces mostly understandable although broken output”. Two obvious applications arise:

  1. Web browsing. A service might deliver translations of web pages into your native language. Babelfish is a first attempt. When properly integrated into the web browser, it will appear as if every webpage uses your native language (although maybe in a broken-but-understandable way).
  2. Instant messaging. An instant message service might deliver translations into whichever language you specify, allowing communication with more people (sketched below).
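As a rough sketch of the second application: everything here is hypothetical, and translate() stands in for whatever MT service is actually used.

```python
# Toy sketch of a translating instant-message relay. translate() is a
# hypothetical stand-in for a real MT service; no actual API is implied.

def translate(text: str, target_lang: str) -> str:
    # Placeholder: a real implementation would call an MT engine here.
    return f"[{target_lang}] {text}"

def relay(message: str, sender: str, languages: dict[str, str]) -> dict[str, str]:
    """Deliver one message, translated into each recipient's preferred language."""
    return {user: translate(message, lang)
            for user, lang in languages.items() if user != sender}

# Each participant reads the conversation in their own language.
print(relay("flights to Libya resume tomorrow", "alice",
            {"alice": "en", "bob": "fr", "carol": "ar"}))
```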

At this point, the feasibility of these applications is a matter of engineering and “who pays for it” coordination rather than technology development. There remain significant research challenges in tackling less-studied language pairs and in improving the existing technology. We could imagine a point in the near future (10 years?) where the machine translation version of a Turing test is passed: humans cannot distinguish between a machine-translated sentence and a human-translated sentence. A key observation here is that machine translation does not require full machine understanding of natural language.

The source of machine translation success seems to be a combination of better models (switching to phrase-based translation made a huge leap), application of machine learning technology, and big increases in the quantity of data available.
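To make “phrase-based” concrete: rather than translating word by word, the system translates multi-word chunks and picks the combination that balances translation-model probability against language-model fluency. Below is a minimal toy sketch; the phrase table and all scores are invented for illustration, and real decoders add reordering models, unknown-word handling, and large-scale search.

```python
import itertools
import math

# Invented toy phrase table: source phrase -> [(candidate translation, log prob)].
PHRASE_TABLE = {
    "may resume": [("may resume", math.log(0.7)), ("may resumed", math.log(0.3))],
    "its flights to libya": [("its flight to libya", math.log(0.6)),
                             ("her trips to libya", math.log(0.4))],
}

# Invented toy "language model": a fluency score per candidate phrase.
LM_SCORE = {"may resume": -1.0, "may resumed": -3.0,
            "its flight to libya": -1.5, "her trips to libya": -4.0}

def decode(source_phrases):
    """Choose the combination of phrase translations maximizing
    translation-model log-probability plus language-model score."""
    options = [PHRASE_TABLE[p] for p in source_phrases]
    best = max(itertools.product(*options),
               key=lambda cand: sum(lp + LM_SCORE[t] for t, lp in cand))
    return " ".join(t for t, _ in best)

print(decode(["may resume", "its flights to libya"]))
# -> "may resume its flight to libya"
```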

Bounds greater than 1

Nati Srebro and Shai Ben-David have a paper at COLT which, in the appendix, proves something very striking: several previous error bounds are always greater than 1.

Background: One branch of learning theory focuses on theorems which:

  1. Assume samples are drawn IID from an unknown distribution D.
  2. Fix a set of classifiers.
  3. Find a high-probability bound on the maximum true error rate (with respect to D) as a function of the empirical error rate on the training set.
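As a concrete instance of the kind of theorem meant, consider the classic finite-class Occam/Hoeffding bound (stated here as background; it is not one of the bounds criticized below). With m IID samples S drawn from D and a finite classifier set H, with probability at least 1 − δ:

```latex
% e(h,D) = true error rate of h under D; \hat{e}(h,S) = empirical error on S
\forall h \in H:\quad
e(h, D) \;\le\; \hat{e}(h, S) + \sqrt{\frac{\ln|H| + \ln\frac{1}{\delta}}{2m}}
```

Even this simple bound becomes vacuous when ln|H| + ln(1/δ) > 2m, since error rates lie in [0,1]; the failure discussed next is subtler, with bounds exceeding 1 for every parameter setting.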

Many of these bounds become extremely complex and hairy.

Current: Everyone working on this subject wants “tighter bounds”; however, there are different definitions of “tighter”. Some groups focus on “functional tightness” (getting the right functional dependence between the size of the training set and a parameterization of the hypothesis space), while others focus on “practical tightness” (finding bounds which work well on practical problems). (I am definitely in the second camp.)

One of the dangers of striving for “functional tightness” is that the bound can come to depend on strangely interrelated parameters. In fact, apparently these interrelations can become so complex that the bounds end up always larger than 1 (some bounds here and here).

It seems we should ask: “Why are we doing the math?” If it is done just to get a paper accepted, that is unsatisfying. The real value of the math comes when it guides us in designing learning algorithms. Math yielding bounds greater than 1 provides dangerously weak motivation for learning algorithm design.

There is, however, a significant danger in reading too much into this “oops”.

  1. There exist some reasonable arguments (not made here) for aiming at functional tightness.
  2. The value of the research a person does is more related to the best work they have done than the worst.

What is state?

In reinforcement learning (and sometimes other settings), there is a notion of “state”. Based upon the state, various predictions are made, such as “Which action should be taken next?” or “How much cumulative reward do I expect if I take some action from this state?” Given how central state is, it is worth examining what it means. There are actually several distinct options, and it turns out the choice of definition is very important in motivating different pieces of work.
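The second prediction has a standard formalization; in the common discounted setting (one choice among several), with discount factor γ ∈ [0,1) and per-step rewards r_t:

```latex
% expected cumulative discounted reward from taking action a in state s
Q(s, a) \;=\; \mathbb{E}\left[\sum_{t=0}^{\infty} \gamma^{t} r_t \;\middle|\; s_0 = s,\ a_0 = a\right]
```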

  1. Newtonian State. State is the physical pose of the world. Under this definition, there are very many states, often too many for explicit representation. This is also the definition typically used in games.
  2. Abstracted State. State is an abstracted physical state of the world. “Is the door open or closed?” “Are you in room A or not?” The number of states is much smaller here. A basic issue here is: “How do you compute the state from observations?”
  3. Mathematical State. State is a sufficient statistic of observations for making necessary predictions.
  4. Internal State. State is the internal belief/understanding/etc. which changes an agent’s actions in different circumstances. A natural question is: “How do you learn a state?” This is like the mathematical version of state, except that portions of the statistic which cannot be learned are irrelevant.
  5. There are no states. There are only observations (one of which might be a reward) and actions. This is more reasonable than it might sound because state is a derived quantity and the necessity of that derivation is unclear. PSRs are an example. (A sketch contrasting the last three options follows this list.)
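To see how the last three options differ operationally, here is a toy sketch; the update rule, the “reward” encoding, and the prediction are all invented for illustration and match no particular paper.

```python
# Options 3/4: the agent maintains a state, updated at every step. Here the
# "state" is just the last observation -- a deliberately crude statistic that
# suffices only when the world is Markov in its observations.
def update_state(state, action, observation):
    return observation

# Option 5 (the PSR-like view): no state object at all; predictions are
# computed directly from the action/observation history.
def predict_reward(history):
    # Toy prediction: the fraction of past steps that produced a reward.
    outcomes = [obs == "reward" for _, obs in history]
    return sum(outcomes) / len(outcomes) if outcomes else 0.0

history = [("left", "wall"), ("right", "reward"), ("right", "reward")]
print(predict_reward(history))  # -> 0.666...
```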

The different questions these notions of state motivate can have large practical consequences for the algorithms used. It is not yet clear what the “right” notion of state is; we just don’t know what works well. I am most interested in “internal state” at the moment.