FOCUS: CAT TOOLS

Machine precision

Why there's never been a better time to dip your toe into the machine translation water. By Dragoș Ciobanu

Imagine you are going for a dip at your favourite pool. You are looking forward to it: you've been practising (you even took some lessons); the calm water beats any other way to relax; and swimming is a complete form of exercise, without the need for expensive equipment. You may wonder what this has got to do with translating, and even more so with an article on Machine Translation (MT), but bear with me a moment longer.

The trip is full of surprises: the membership fee has increased; two enthusiastic beginners are creating huge waves that almost drown you; you are deafened by tunes blasting through the speakers; and you are required to vacate the pool within 20 minutes. Sound familiar? What should you do now? You could abandon the activity altogether, or you could look into the other options available. Instead of paying membership fees to the nearest facility, do your research and pick the best choice for you. In relation to MT, this would be the tool that gives you the best results for the language pair(s) and text genre(s) you work with.

Just as acquiring goggles and earplugs can make swimming easier, CAT tools can aid translation. In that analogy, MT might equate to an underwater MP3 player: it sounds fancy, but such players are quite easy to get hold of nowadays and reasonably priced (thanks to technological developments), though not yet perfect. It's the same with MT.

Technological evolution

In the 1950s, researchers had high hopes for MT – the 1954 Georgetown-IBM experiment,1 working on Russian-English, looked particularly promising. However, as the decades passed, it became obvious that the first approach to MT – ie, to hard-code all the linguistic rules applying to a particular language pair and direction – was extremely expensive and time-consuming. Known as Rule-Based MT (RBMT), it also came with significant challenges, as rules started interfering with each other and introducing noise into the output.

As the internet grew, ever larger corpora of both bilingual parallel texts and monolingual texts became accessible. At the same time, computing power kept improving. A new MT approach emerged: Statistical Machine Translation (SMT). The question 'Which rules should be used to translate this sentence from language X into language Y?' evolved into: 'Which is the best translation from this multitude of statistically probable target-language versions?'

Despite the complexity of the question, SMT gained traction in the 1980s and became popular after 2000, when it was introduced by Google and then Microsoft. Both companies base their SMT systems on the whole of the internet, with no distinction (yet) between domains and their specific terminologies. A welcome intermezzo was offered by the EU-funded free, open-source Moses SMT engine.2 Quite a few other companies now offer bespoke MT systems, which can be trained on individual domains for much more accurate results.

If you are wondering how much better custom systems can be compared to the Big Two, you may not like the next sentence. Both Google and Microsoft are constantly updating their systems, so any comparison becomes outdated even before the MT output reaches the human evaluators. What researchers and the industry have been reporting, though, are increases in productivity rates for translators adopting MT.
These range from a 15-20 percent increase for translators switching from a TM-only scenario to MT+TM,3 to a 328 percent increase for translators using MT compared to those working without CAT tools.4 (These statistics may, however, be influenced by the quality of the source texts and of the post-editing guidelines.)

But SMT alone is not the future. SMT at its best relies on huge collections of data: it cleverly builds a statistical model for each language pair and direction from tens of millions of translated words, together with similar quantities of authentic target-language texts. However, there is still room for RBMT in some language pairs – especially where the target language is highly inflected or, even more challenging, agglutinative, as with Hungarian and Finnish.
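For readers curious what 'choosing the best translation from a multitude of statistically probable versions' looks like in practice, here is a minimal toy sketch in Python of the 'noisy channel' idea that underlies statistical engines such as Moses. Every sentence and probability below is invented purely for illustration – real systems estimate such scores from millions of words of parallel and monolingual text – but the trade-off being weighed is the real one: faithfulness to the source versus fluency in the target language.

# A toy illustration of the SMT question above: given several statistically
# plausible target-language candidates, pick the one that maximises the
# product of a translation-model score and a language-model score (the
# classic 'noisy channel' formulation). All sentences and probabilities
# here are invented purely for illustration.

# Hypothetical translation-model scores: how well each candidate
# accounts for the (unseen) source sentence.
translation_model = {
    "the house is white": 0.40,
    "the house white is": 0.45,  # a shade higher: it mirrors the source word order
    "white is the house": 0.15,
}

# Hypothetical language-model scores: how natural each candidate
# sounds as target-language text, learnt from monolingual corpora.
language_model = {
    "the house is white": 0.60,
    "the house white is": 0.05,  # fluent English is far more probable
    "white is the house": 0.10,
}

def rank_candidates(tm, lm):
    """Rank candidates by translation-model score x language-model score."""
    scored = {c: tm[c] * lm[c] for c in tm}
    return sorted(scored.items(), key=lambda item: item[1], reverse=True)

for candidate, score in rank_candidates(translation_model, language_model):
    print(f"{score:.3f}  {candidate}")
# Prints "the house is white" first (0.240): the language model's
# preference for fluent English outweighs the slightly more literal
# candidate's higher translation-model score.

Real engines perform this weighing over vast numbers of candidate translations per sentence, with far richer models than these three-entry tables, but the principle – balancing statistical evidence about faithfulness against statistical evidence about fluency – is the same.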