Machine Translation – How it Works, What Users Expect, and What They G…

Machine translation (MT) systems are now ubiquitous. This ubiquity is due to a combination of increased need for translation in today’s global marketplace, and an exponential growth in computing strength that has made such systems viable. And under the right circumstances, MT systems are a powerful tool. They offer low-quality translations in situations where low-quality translation is better than no translation at all, or where a rough translation of a large document delivered in seconds or minutes is more useful than a good translation delivered in three weeks’ time.

Unfortunately, despite the extensive accessibility of MT, it is clear that the purpose and limitations of such systems are frequently misunderstood, and their capability widely overestimated. In this article, I want to give a fleeting overview of how MT systems work and consequently how they can be put to best use. Then, I’ll present some data on how Internet-based MT is being used right now, and show that there is a chasm between the intended and actual use of such systems, and that users nevertheless need educating on how to use MT systems effectively.

How machine translation works

You might have expected that a computer translation program would use grammatical rules of the languages in question, combining them with some kind of in-memory “dictionary” to produce the resulting translation. And indeed, that’s essentially how some earlier systems worked. But most modern MT systems truly take a statistical approach that is quite “linguistically blind”. Essentially, the system is trained on a corpus of example translations. The consequence is a statistical form that incorporates information such as:

– “when the words (a, b, c) occur in series in a sentence, there is an X% chance that the words (d, e, f) will occur in series in the translation” (N.B. there don’t have to be the same number of words in each pair);

– “given two subsequent words (a, b) in the target language, if information (a) ends in -X, there is an X% chance that information (b) will end in -Y”.

Given a huge body of such observations, the system can then translate a sentence by considering various candidate translations– made by stringing words together almost at random (in reality, via some ‘naive selection’ course of action)– and choosing the statistically most likely option.

On hearing this high-level description of how MT works, most people are surprised that such a “linguistically blind” approach works at all. What’s already more surprising is that it typically works better than rule-based systems. This is partly because relying on grammatical examination itself introduces errors into the equation (automated examination is not completely accurate, and humans don’t always agree on how to analyse a sentence). And training a system on “bare text” allows you to base a system on far more data than would otherwise be possible: corpora of grammatically analysed texts are small and few and far between; pages of “bare text” are obtainable in their trillions.

However, what this approach does average is that the quality of translations is very dependent on how well elements of the source text are represented in the data originally used to aim the system. If you accidentally kind he will returned or vous avez demander (instead of he will return or vous avez needé), the system will be hampered by the fact that sequences such as will returned are doubtful to have occurred many times in the training corpus (or worse, may have occurred with a completely different meaning, as in they needed his will returned to the solicitor). And since the system has little concept of grammar (to work out, for example, that returned is a form of return, and “the infinitive is likely after he will”), it in effect has little to go on.

Similarly, you may ask the system to translate a sentence that is perfectly grammatical and shared in everyday use, but which includes features that happen not to have been shared in the training corpus. MT systems are typically trained on the types of text for which human translations are freely obtainable, such as technical or business documents, or transcripts of meetings of multilingual parliaments and conferences. This gives MT systems a natural bias towards certain types of formal or technical text. And already if everyday vocabulary is nevertheless covered by the training corpus, the grammar of everyday speech (such as using tú instead of usted in Spanish, or using the present tense instead of the future tense in various languages) may not.

MT systems in practice

Researches and developers of computer translation systems have always been aware that one of the biggest dangers is public misperception of their purpose and limitations. Somers (2003)[1], observing the use of MT on the web and in chat rooms, comments that: “This increased visibility of MT has had a number of side effets. […] There is certainly a need to educate the general public about the low quality of raw MT, and, importantly, why the quality is so low.” Observing MT in use in 2009, there’s sadly little evidence that users’ awareness of these issues has improved.

As an illustration, I’ll present a small sample of data from a Spanish-English MT service that I make obtainable at the Español-Inglés web site. The service works by taking the user’s input, applying some “cleanup” processes (such as correcting some shared orthographical errors and decoding shared instances of “SMS-speak”), and then looking for translations in (a) a bank of examples from the site’s Spanish-English dictionary, and (b) a MT engine. Currently, Google Translate is used for the MT engine, although a custom engine may be used in the future. The figures I present here are from an examination of 549 Spanish-English queries presented to the system from machines in Mexico[2]– in other words, we assume that most users are translating from their native language.

First, what are people using the MT system for? For each query, I attempted a “best guess” at the user’s purpose for translating the query. In many situations, the purpose is quite obvious; in a few situations, there is clearly ambiguity. With that caveat, I estimate that in about 88% of situations, the intended use is fairly clear-cut, and categorise these uses as follows:

  • Looking up a single information or term: 38%
  • Translating a formal text: 23%
  • Internet chat session: 18%
  • Homework: 9%

A surprising (if not upsetting!) observation is that in such a large proportion of situations, users are using the translator to look up a single information or term. In fact, 30% of queries consisted of a single information. The finding is a little surprising given that the site in question also has a Spanish-English dictionary, and indicates that users confuse the purpose of dictionaries and translators. Although not represented in the raw figures, there were clearly some situations of consecutive searches where it appeared that a user was deliberately splitting up a sentence or phrase that would have probably been better translated if left together. Perhaps as a consequence of student over-drilling on dictionary usage, we see, for example, a query for cuarto para (“quarter to”) followed closest by a query for a number. There is clearly a need to educate students and users in general on the difference between the electronic dictionary and the machine translator[3]: in particular, that a dictionary will guide the user to choosing the appropriate translation given the context, but requires single-information or single-phrase lookups, while a translator generally works best on whole sentences and given a single information or term, will simply report the statistically most shared translation.

I calculate that in less than a quarter of situations, users are using the MT system for its “trained-for” purpose of translating or gisting a formal text (and are entering an complete sentence, or at the minimum uncompletely sentence instead of an secluded noun phrase). Of course, it’s impossible to know whether any of these translations were then intended for publication without further proof, which definitely isn’t the purpose of the system.

The use for translating formal texts is now almost rivalled by the use to translate informal on-line chat sessions– a context for which MT systems are typically not trained. The on-line chat context poses particular problems for MT systems, since features such as non-standard spelling, without of punctuation and presence of colloquialisms not found in other written settings are shared. For chat sessions to be translated effectively would probably require a dedicated system trained on a more appropriate (and possibly custom-built) corpus.

It’s not too surprising that students are using MT systems to do their homework. But it’s interesting to observe to what extent and how. In fact, use for homework incudes a combination of “fair use” (understanding an exercise) with an attempt to “get the computer to do their homework” (with predictably dire results in some situations). Queries categorised as homework include sentences which are clearly instructions to exercises, plus certain sentences explaining unimportant generalities that would be uncommon in a text or conversation, but which are typical in beginners’ homework exercises.

at any rate the use, an issue for system users and designers alike is the frequency of errors in the source text which are liable to make difficulty the translation. In fact, over 40% of queries contained such errors, with some queries containing several. The most shared errors were the following (queries for single words and terms were excluded in calculating these figures):

  • Missing accents: 14% of queries
  • Missing punctuation: 13%
  • Other orthographical error: 8%
  • Grammatically incomplete sentence: 8%

Bearing in mind that in the majority of situations, users where translating from their native language, users appear to underestimate the importance of using standard orthography to give the best chance of a good translation. More subtly, users do not always understand that the translation of one information can depend on another, and that the translator’s job is more difficult if grammatical constituents are incomplete, so that queries such as hoy es día de are not uncommon. Such queries make difficulty translation because the chance of a sentence in the training corpus with, say, a “dangling” preposition like this will be slim.

Lessons to be learnt…?

At present, there’s nevertheless a mismatch between the performance of MT systems and the expectations of users. I see responsibility for closing this gap as lying in the hands both of developers and of users and educators. Users need to think more about making their source sentences “MT-friendly” and learn how to estimate the output of MT systems. Language courses need to address these issues: learning to use computer translation tools effectively needs to be seen as a applicable part of learning to use a language. And developers, including myself, need to think about how we can make the tools we offer better appropriate to language users’ needs.

Notes

[1] Somers (2003), “Machine Translation: the Latest Developments” in The Oxford Handbook of Computational Linguistics, OUP.

[2] This strange number is simply because queries matching the selection criteria were captured with random probability within a fixed time frame. It should be noted that the system for deducing a machine’s country from its IP address is not completely accurate.

[3] If the user enters a single information into the system in question, a message is displayed beneath the translation suggesting that the user would get a better consequence by using the site’s dictionary.

Leave a Reply