Research about Machine Translation

Machine Translation

Machine translation, sometimes referred to by the abbreviation MT, is a sub-field of computational linguistics that investigates the use of computer software to translate text or speech from one natural language to another. At its most basic level, MT performs simple substitution of words in one natural language for words in another. Using corpus techniques, more complex translations may be attempted, allowing for better handling of differences in linguistic typology, phrase recognition, and translation of idioms, as well as the isolation of anomalies.
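
To make the word-for-word idea concrete, the sketch below applies a tiny, invented English-to-Spanish dictionary to a single sentence; the vocabulary and the example sentence are illustrative only and not drawn from any real system.

    # Toy word-for-word substitution, the most basic form of MT described above.
    # The miniature English-to-Spanish dictionary is invented for illustration.
    dictionary = {
        "the": "el",
        "cat": "gato",
        "drinks": "bebe",
        "milk": "leche",
    }

    def substitute(sentence):
        # Replace each known word with its dictionary entry; keep unknown words as-is.
        return " ".join(dictionary.get(word, word) for word in sentence.lower().split())

    print(substitute("The cat drinks milk"))  # -> "el gato bebe leche"

Even in this tiny case the output ignores word order, agreement, and idiom, which is why the corpus techniques and rule systems discussed below are needed.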

Current machine translation software often allows for customisation by domain or profession (such as weather reports), improving output by limiting the scope of allowable substitutions. This technique is particularly effective in domains where formal or formulaic language is used. It follows that machine translation of government and legal documents more readily produces usable output than does conversation or less standardised text.

Improved output quality can also be achieved by human intervention: for example, some systems are able to translate more accurately if the user has unambiguously identified which words in the text are names. With the assistance of these techniques, MT has proven useful as a tool to assist human translators and, in a very limited number of cases, can even produce output that can be used as is (e.g., weather reports).
"What is Machine Translation?
Machine translation (MT) is the application of computers to the task of translating texts from one natural language to another. One of the very earliest pursuits in computer science, MT has proved to be an elusive goal, but today a number of systems are available which produce output which, if not perfect, is of sufficient quality to be useful in a number of specific domains." A definition from the European Association for Machine Translation (EAMT), "an organization that serves the growing community of people interested in MT and translation tools, including users, developers, and researchers of this increasingly viable technology."

History of Machine Translation

The idea of machine translation may be traced back to the 17th century. In 1629, René Descartes proposed a universal language, with equivalent ideas in different tongues sharing one symbol. The Georgetown experiment of 1954 involved the fully automatic translation of more than sixty Russian sentences into English. The experiment was a great success and ushered in an era of substantial funding for machine-translation research. The authors claimed that within three to five years, machine translation would be a solved problem.

Real progress was much slower, however, and after the ALPAC report (1966), which found that ten years of research had failed to fulfill expectations, funding was greatly reduced. Beginning in the late 1980s, as computational power increased and became less expensive, more interest was shown in statistical models for machine translation.

The idea of using digital computers for translation of natural languages was proposed as early as 1946 by A. D. Booth and possibly others. The Georgetown experiment was by no means the first such application, and a demonstration was made in 1954 on the APEXC machine at Birkbeck College (University of London) of a rudimentary translation of English into French. Several papers on the topic were published at the time, and even articles in popular journals (see for example Wireless World, Sept. 1955, Cleave and Zacharov). A similar application, also pioneered at Birkbeck College at the time, was reading and composing Braille texts by computer.

Approaches

[Figure: pyramid showing the comparative depths of intermediary representation, with interlingual machine translation at the peak, followed by transfer-based, then direct translation.]

Machine translation can use a method based on linguistic rules, which means that words are translated in a linguistic way: the most suitable words of the target language replace the ones in the source language.
It is often argued that the success of machine translation requires the problem of natural language understanding to be solved first.
Generally, rule-based methods parse a text, usually creating an intermediary, symbolic representation, from which the text in the target language is generated. According to the nature of the intermediary representation, an approach is described as interlingual machine translation or transfer-based machine translation. These methods require extensive lexicons with morphological, syntactic, and semantic information, and large sets of rules.
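
As a rough sketch of the transfer-based pipeline just described, the toy code below splits the work into analysis, transfer, and generation; the three-word lexicon, the single reordering rule, and the example phrase are invented for illustration and stand in for the extensive lexicons and rule sets real systems require.

    # Minimal sketch of a transfer-based pipeline: analyse the source phrase into a
    # symbolic representation, transfer it into the target language, then generate text.
    # Lexicon and rules are invented for illustration (English -> French).
    LEXICON = {"a": "une", "white": "blanche", "house": "maison"}

    def analyse(phrase):
        # Analysis: a real system would build a full parse; here we handle only a
        # determiner-adjective-noun phrase.
        det, adj, noun = phrase.lower().split()
        return {"det": det, "adj": adj, "noun": noun}

    def transfer(structure):
        # Transfer: map each source word through the bilingual lexicon.
        return {role: LEXICON.get(word, word) for role, word in structure.items()}

    def generate(structure):
        # Generation: linearise the structure, placing the adjective after the noun
        # as French word order typically requires.
        return " ".join([structure["det"], structure["noun"], structure["adj"]])

    print(generate(transfer(analyse("a white house"))))  # -> "une maison blanche"
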
Given enough data, machine translation programs often work well enough for a native speaker of one language to get the approximate meaning of what was written by a native speaker of another. The difficulty is getting enough data of the right kind to support the particular method. For example, the large multilingual corpus of data needed for statistical methods to work is not necessary for grammar-based methods. Grammar-based methods, in turn, need a skilled linguist to carefully design the grammar that they use.
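
As a minimal sketch of the statistical idea, the snippet below scores candidate translations by a translation-model probability multiplied by a language-model probability, the classic noisy-channel decomposition; the candidate sentences and all probabilities are invented for illustration rather than estimated from a real corpus.

    # Toy noisy-channel scoring for statistical MT: choose the target sentence e
    # that maximises P(f|e) * P(e). All numbers below are invented for illustration.
    candidates = {
        # candidate translation: (translation model P(f|e), language model P(e))
        "the cat drinks milk": (0.4, 0.05),
        "the cat drink milk":  (0.5, 0.001),  # closer word-for-word, but poor English
    }

    def score(probabilities):
        p_f_given_e, p_e = probabilities
        return p_f_given_e * p_e

    best = max(candidates, key=lambda sentence: score(candidates[sentence]))
    print(best)  # -> "the cat drinks milk"

In a real system both probability tables are estimated from large aligned corpora, which is exactly the data requirement described above.
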
To translate between closely related languages, a technique referred to as shallow-transfer machine translation may be used.


Disambiguation

Word-sense disambiguation concerns finding a suitable translation when a word can have more than one meaning. The problem was first raised in the 1950s by Yehoshua Bar-Hillel.[2] He pointed out that without a "universal encyclopedia", a machine would never be able to distinguish between the two meanings of a word.[3] Today there are numerous approaches designed to overcome this problem. They can be approximately divided into "shallow" approaches and "deep" approaches.

Shallow approaches assume no knowledge of the text. They simply apply statistical methods to the words surrounding the ambiguous word. Deep approaches presume a comprehensive knowledge of the word. So far, shallow approaches have been more successful.
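
As a sketch of what a shallow approach might look like, the code below scores each sense of an ambiguous word by how many of its associated context words appear around it; the sense inventory and the example sentence are invented for illustration.

    # Sketch of a "shallow" word-sense disambiguation approach: pick the sense whose
    # associated context words overlap most with the words around the ambiguous term.
    # The sense inventory and example sentence are invented for illustration.
    SENSES = {
        "bank (financial institution)": {"money", "loan", "deposit", "account"},
        "bank (river bank)":            {"river", "water", "shore", "fishing"},
    }

    def disambiguate(sentence):
        context = set(sentence.lower().split())
        # Count overlapping context words for each sense and keep the best match.
        return max(SENSES, key=lambda sense: len(SENSES[sense] & context))

    print(disambiguate("she opened an account at the bank to deposit money"))
    # -> "bank (financial institution)"

A deep approach would instead reason about what the sentence actually describes, which is why it demands far more knowledge than simple co-occurrence statistics.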

The late Claude Piron, a long-time translator for the United Nations and the World Health Organization, wrote that machine translation, at its best, automates the easier part of a translator's job; the harder and more time-consuming part usually involves doing extensive research to resolve ambiguities in the source text, which the grammatical and lexical exigencies of the target language require to be resolved:

Why does a translator need a whole workday to translate five pages, and not an hour or two? ... About 90% of an average text corresponds to these simple conditions. But unfortunately, there's the other 10%. It's that part that requires six [more] hours of work. There are the ambiguities one has to resolve. For instance, the author of the source text, an Australian physician, cited the example of an epidemic which was declared during World War II in a "Japanese prisoner of war camp". Was he talking about an American camp with Japanese prisoners or a Japanese camp with American prisoners? The English has two senses. It's necessary therefore to do research, maybe to the extent of a phone call to Australia. [4]
The ideal deep approach would require the translation software to do all the research necessary for this kind of disambiguation on its own; but this would require a higher degree of AI than has yet been attained. A shallow approach that simply guessed at the sense of the ambiguous English phrase that Piron mentions (based, perhaps, on which kind of prisoner-of-war camp is more often mentioned in a given corpus) would have a reasonable chance of guessing wrong fairly often. A shallow approach that simply asked the user about each ambiguity would, by Piron's estimate, automate only about 25% of a professional translator's job, leaving the harder 75% still to be done by a human.

Further Reading

Me Translate Pretty One Day - Spanish to English? French to Russian? Computers haven't been up to the task. But a New York firm with an ingenious algorithm and a really big dictionary is finally cracking the code. By Evan Ratliff. Wired (December 2006; Issue 14.12). "Jaime Carbonell, chief science officer of Meaningful Machines, hunches over his laptop in the company's midtown Manhattan offices, waiting for it to decode a message from the perpetrators of a grisly terrorist attack. Running software that took four years and millions of dollars to develop, Carbonell's machine -- or rather, the server farm it's connected to a few miles away -- is attempting a task that has bedeviled computer scientists for half a century. The message isn't encrypted or scrambled or hidden among thousands of documents. It's simply written in Spanish:.... Language translation is a tricky problem, not only for a piece of software but also for the human mind."
Mark my words. The Economist (February 16, 2007). "For those who put their faith in technology, therefore, it was encouraging to hear Shinzo Abe, Japan’s prime minister, demonstrate his linguistic skills a few weeks ago with a palm-sized gizmo that provided instantaneous translations of spoken Japanese into near-flawless English and Chinese. ... [T]he fact that a pocket-sized device could interpret tourist-type phrases accurately and on the fly, from one language to several others, says much about the improvements that have been made lately in machine translation. This device, developed by the Advanced Telecommunications Research Institute International near Kyoto.... Machine translation has been an elusive goal since the earliest days of computer science. ... The main drivers for this more pragmatic approach to machine translation have been the enlargement of the European Union and the spread of the internet. Both have generated a pressing need for cheap and cheerful translations between numerous languages. In turn, this has spawned a wealth of new translation approaches."
Translation Tools - New Approaches to an Old Discipline. Automated translation tools have been around for a long time, and new techniques are boosting their performance. But use them with caution. By Gary Anthes. Computerworld (August 13, 2007). "Language translation software isn’t likely to allow you to lay off your bilingual staffers -- at least not right away. But applied with discrimination and lots of preparation, translation tools can be fantastic productivity aids. And researchers say new approaches to this old discipline are greatly improving the performance of the tools. Ford Motor Co. began using 'machine translation' software in 1998 and has so far translated 5 million automobile assembly instructions into Spanish, German, Portuguese and Mexican Spanish. Assembly manuals are updated in English every day, and their translations -- some 5,000 pages a day -- are beamed overnight to plants around the world. 'It wouldn’t be feasible to do this all manually,' says Nestor Rychtyckyj, a technical specialist in artificial intelligence (AI) at Ford. ... Systran’s tool uses a tried-and-true translation technique called rules-based translation. ... Statistical machine translation is a newer technique that’s not yet in widespread use. It uses collections of documents and their translations to 'train' software. Over time, these data-driven systems 'learn' what makes a good translation and what doesn’t and then use probability and statistics to decide which of several possible translations of a given word or phrase is most likely correct based on context. ... 'The new direction in the research community is to see how you can combine these purely statistical techniques with some linguistic knowledge,' says Steve Richardson, a senior researcher at Microsoft. 'It’s modeling the rules with the statistical methods.' ... Automated translation in the corporate world succeeds to the extent that users are willing to carefully customize systems to their unique needs and vocabularies, he says. And the technology is most appropriate when translations don’t have to be perfect. 'We have serviced thousands and thousands of customers with articles we have machine-translated,' Richardson says. 'It’s not perfect, but it’s good enough. They get an answer without calling in. What’s that worth to the company' ... [H]ybrid systems, which combine translation memories and machine translation based on rules or statistics or both, are the wave of the future, researchers say, and they are becoming more sophisticated and complex. ... In essence, SRI’s approach is to do machine translations with the best available rules-based and statistical-based systems, and then have another system that 'adjudicates' among them in real time to find the best translation."
"The Center for Machine Translation (CMT) is a research branch of the School of Computer Science [at Carnegie Mellon University] devoted to basic and applied research in all aspects of natural language processing, with a primary focus on machine translation, speech processing, and information retrieval. Containing a unique mix of academic and industrial researchers specializing in various aspects of computer science, artificial intelligence, computational linguistics and theoretical linguistics...."
Association for Machine Translation in the Americas. "AMTA is an association dedicated to anyone interested in the translation of languages using computers in some way. This includes people with translation needs, commercial system developers, researchers, sponsors, and people studying, evaluating, and understanding the science of machine translation (MT) and educating the public on important scientific techniques and principles involved. ... AMTA has members in Canada, Latin America, and the United States. It is the regional component of a worldwide network headed by the International Association for Machine Translation (IAMT)."
Automating Knowledge Acquisition for Machine Translation. By Kevin Knight. AI Magazine, 18(4): Winter 1997, 81-96. "Machine translation of human languages (for example, Japanese, English, Spanish) was one of the earliest goals of computer science research, and it remains an elusive one. Like many AI tasks, translation requires an immense amount of knowledge about language and the world. Recent approaches to machine translation frequently make use of text-based learning algorithms to fully or partially automate the acquisition of knowledge. This article illustrates these approaches."