Progress in Machine Translation

I just visited ISI where Daniel Marcu and others are working on machine translation. Apparently, machine translation is rapidly improving. A particularly dramatic year was 2002->2003 when systems switched from word-based translation to phrase-based translation. From a (now famous) slide by Charles Wayne at DARPA (which funds much of the work on machine translation) here is some anecdotal evidence:

2002:

insistent Wednesday may recurred her trips to Libya tomorrow for flying.

Cairo 6-4 ( AFP ) – An official announced today in the Egyptian lines company for flying Tuesday is a company “insistent for flying” may resumed a consideration of a day Wednesday tomorrow her trips to Libya of Security Council decision trace international the imposed ban comment.

And said the official “the institution sent a speech to Ministry of Foreign Affairs of lifting on Libya air, a situation her recieving replying are so a trip will pull to Libya a morning Wednesday.”

2003:

Egyptair has tomorrow to Resume Its flight to Libya.

Cairo 4-6 (AFP) – said an official at the Egyptian Aviation Company today that the company egyptair may resume as of tomorrow, Wednesday its flight to Libya after the International Security Council resolution to the suspension of the embargo imposed on Libya.

“The official said that the company had sent a letter to the Ministry of Foreign Affairs, information on the lifting of the air embargo on Libya, where it had received a response, the firt take off a trip to Libya on Wednesday morning”.

Machine translation systems are becoming effective at the “produces mostly understandable although broken output” level. Two obvious applications arise:

  1. Web browsing. A service might deliver translations of web pages into your native language. Babelfish is a first attempt. When properly integrated into the web browser, it will appear as if every webpage uses your native language (although maybe in a broken-but-understandable way).
  2. Instant messaging. An instant message service might deliver translations into whichever language you specify, allowing communication with more people.

At this point, the feasibility of these applications is a matter of engineering and “who pays for it” coordination rather than technology development. There remain significant research challenges in tackling unstudied language pairs and in improving the existing technology. We could imagine a point in the near future (10 years?) where the machine translation version of a Turing test is passed: humans cannot distinguish between a machine translated sentence and a human translated sentence. A key observation here is that machine translation does not require full machine understanding of natural language.
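One way to make the “cannot distinguish” criterion concrete is to ask judges to label sentences as machine or human translated and check whether their accuracy differs from coin flipping. The sketch below is only an illustration: the function names, the 0.05 threshold, and the use of an exact two-sided binomial test are my assumptions, not an established evaluation protocol. Note also that failing to reject chance-level guessing is weaker than proving indistinguishability.

```python
from math import comb


def binomial_two_sided_p(correct, total, p=0.5):
    """Exact two-sided binomial p-value for `correct` right guesses out of `total`.

    Sums the probability of every outcome that is no more likely than the
    observed one under the null hypothesis of guessing with success rate p.
    """
    probs = [comb(total, k) * p**k * (1 - p) ** (total - k) for k in range(total + 1)]
    observed = probs[correct]
    # Small relative tolerance so ties are not lost to floating point error.
    return min(1.0, sum(q for q in probs if q <= observed * (1 + 1e-9)))


def passes_mt_turing_test(correct, total, alpha=0.05):
    """Hypothetical pass rule: judges' accuracy is statistically consistent with chance."""
    return binomial_two_sided_p(correct, total) >= alpha


if __name__ == "__main__":
    # Judges were right on 52 of 100 sentences: consistent with guessing.
    print(passes_mt_turing_test(52, 100))
    # Judges were right on 90 of 100 sentences: clearly distinguishable.
    print(passes_mt_turing_test(90, 100))
```

With a small number of judged sentences almost any system would “pass” under this rule, so a real evaluation would need enough sentences to detect a modest advantage over chance.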

The source of machine translation success seems to be a combination of better models (switching to phrase-based translation made a huge leap), application of machine learning technology, and big increases in the quantity of data available.
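To give a feel for why phrases help, here is a toy sketch contrasting word-by-word lookup with greedy longest-phrase lookup. Everything in it is invented for illustration: the French-to-English tables are hand-written rather than learned, and the greedy matcher stands in for the probabilistic scoring, reordering, and language models that real phrase-based systems use.

```python
# Hand-written toy tables (assumptions for illustration, not learned from data).
WORD_TABLE = {
    "la": "the",
    "pomme": "apple",
    "de": "of",
    "terre": "earth",
    "est": "is",
    "chaude": "hot",
}

# The phrase table contains every single word plus one multi-word phrase.
PHRASE_TABLE = {(word,): target for word, target in WORD_TABLE.items()}
PHRASE_TABLE[("pomme", "de", "terre")] = "potato"


def translate_word_based(sentence):
    """Translate each word independently; unknown words pass through unchanged."""
    return " ".join(WORD_TABLE.get(word, word) for word in sentence.split())


def translate_phrase_based(sentence, max_len=3):
    """Greedy left-to-right decoding that prefers the longest matching phrase."""
    words = sentence.split()
    output = []
    i = 0
    while i < len(words):
        # Try the longest candidate phrase first, then shorter ones.
        for n in range(min(max_len, len(words) - i), 0, -1):
            chunk = tuple(words[i : i + n])
            if chunk in PHRASE_TABLE:
                output.append(PHRASE_TABLE[chunk])
                i += n
                break
        else:
            # No phrase matched, not even a single word: copy the word through.
            output.append(words[i])
            i += 1
    return " ".join(output)


if __name__ == "__main__":
    source = "la pomme de terre est chaude"
    print(translate_word_based(source))    # the apple of earth is hot
    print(translate_phrase_based(source))  # the potato is hot
```

The word-based output is understandable only if you already know the idiom, while the phrase-based output reads naturally, which is a small-scale version of the 2002 versus 2003 contrast above.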

8 Replies to “Progress in Machine Translation”

  1. Machine Translation (MT) in Mind.Forth will be predicated on the idea that “there must be only one deep conceptual mindcore but there may be multiple lexicons and multiple syntactic superstructures in the AI Mind.” Under this idea, thought occurs independently of language but finds expression in one language for communication or in multiple languages for machine translation.

  2. A key observation here is that machine translation does not require full machine understanding of natural language.

    I would refine that to: Adequately human understandable machine translation does not require full machine understanding. The research questions then become: What is the distribution of adequacy and how does it vary (e.g. as a function of type of text)? I suspect that the MT of a philosophical argument would be much less adequate than the MT of a concrete, factual text.

    One of the possible causes of “broken output” would be circumstances in which the MT actually needed machine understanding in order to produce correct output. The broken output is adequate because the human reader is able to sufficiently reconstruct the meaning from the context. This makes the full understanding of the reader central to the adequacy of the translation. I suspect that the need for machine understanding would be more dramatically demonstrated if you translated the text through more than one language (A->B->C) because of the lack of human semantic cleanup on the intermediate stages.

  3. What is “machine understanding of natural language”? What degree of “NL understanding” is required for adequate translation? If that were well defined, we could build more reliable MT products, couldn’t we?

  4. This site is very interesting to me because it talks about MT. MT is new to me. I hope to gain more knowledge from this forum. If somebody has related resources, please send them to me. Thanks.

  5. I’m one of the “human translators”, and I have to confess that more than once I have had to translate a text without fully understanding it (I mean a specialized text in a subject domain I was unacquainted with). In fact, understanding the grammatical structure of a phrase and having a good dictionary, I could manage it, and what is more, not badly 🙂 But a translation with “full understanding” would be much more reliable. In fact, there are a lot of formal links in a language (associations, connotations, context rules for choosing one translation or another, etc.) which can be used as a framework for a translation, and a great many of these links have not yet been discovered, or have been discovered but not taken into account by MT developers. That’s why we can see considerable progress in machine translation, and will continue to see it, I guess, over the next ten years.

  6. MT reached new heights recently (this topic is from 2006). Many developers now combine different methods and the results are getting better. Specialists now even discuss advantages and disadvantages of the different methods. We can’t speak of “full understanding” yet.

  7. You can also refer to for more info on the topic -both from developers, linguists and human translators.
