I just visited ISI where Daniel Marcu and others are working on machine translation. Apparently, machine translation is rapidly improving. A particularly dramatic year was 2002->2003 when systems switched from word-based translation to phrase-based translation. From a (now famous) slide by Charles Wayne at DARPA (which funds much of the work on machine translation) here is some anecdotal evidence:
2002 | 2003 |
insistent Wednesday may recurred her trips to Libya tomorrow for flying.
Cairo 6-4 ( AFP ) – An official announced today in the Egyptian lines company for flying Tuesday is a company “insistent for flying” may resumed a consideration of a day Wednesday tomorrow her trips to Libya of Security Council decision trace international the imposed ban comment. And said the official “the institution sent a speech to Ministry of Foreign Affairs of lifting on Libya air, a situation her recieving replying are so a trip will pull to Libya a morning Wednesday.” |
Egyptair has tomorrow to Resume Its flight to Libya.
Cairo 4-6 (AFP) – said an official at the Egyptian Aviation Company today that the company egyptair may resume as of tomorrow, Wednesday its flight to Libya after the International Security Council resolution to the suspension of the embargo imposed on Libya. “The official said that the company had sent a letter to the Ministry of Foreign Affairs, information on the lifting of the air embargo on Libya, where it had received a response, the firt take off a trip to Libya on Wednesday morning”. |
The machine translation systems are becoming effective at the “produces mostly understandable although broken output”. Two obvious application arise:
- Web browsing. A service might deliver translations of web pages into your native language. babelfish is a first attempt. When properly integrated into
the web browser, it will appear as if every webpage uses your native language (although maybe in a broken-but-understandable way). - Instant messaging. An instant message service might deliver translations into whichever language you specify allowing communication with more people.
At this point, the feasibility of these applications is a matter of engineering and “who pays for it” coordination rather than technology development. There remain significant research challenges in tackling nonstudied language pairs and in improving the existing technology. We could imagine a point in the near future (10 years?) where the machine translation version of a Turing test is passed: humans can not distinguish between a machine translated sentence and a human translated sentence. A key observation here is that machine translation does not require full machine understanding of natural language.
The source of machine translation success seems to be a combination of better models (switching to phrase-based translation made a huge leap), application of machine learning technology, and big increases in the quantity of data available.