The Linguist 55,6

SMT systems are based on a translation model, learned from a large bilingual corpus of source texts and their translations, and on a probabilistic model of the target language, learned from a large monolingual corpus of texts. Such 'learning' is done in a so-called 'training' phase. In a second 'tuning' phase, system developers work out the optimal weight that should be assigned to each model to get the best outcome. When the system is called upon to translate new text (in a third phase called 'decoding'), it searches for the most probable target-language sentence given a particular source sentence, the models it has learned and the weights assigned to them. SMT systems thus have a tri-partite architecture and involve a lot of tuning to find the optimal weights for different models.

The models used are based on n-grams – i.e. strings of one, two, three or n words that appear contiguously in the training data used. SMTs can have difficulty handling discontinuous dependencies, such as that between 'threw' and 'out' in the sentence 'She threw all her old clothes out'. This is due to the relatively limited amount of context used to build models, and to the fact that the n-grams are translated largely independently of each other and don't necessarily correspond to any kind of structural unit.

SMT systems are also known to perform poorly for agglutinative and highly inflected languages, as they have no principled way of handling grammatical agreement. Other problems include word drop, where a system fails to translate a given source word, and inconsistency, where the same source-language word is translated in two different ways, sometimes in the same sentence. These are precisely the kind of errors that human post-editors are employed to fix.

The editing environments used by post-editors are often the same as those used by translators, namely the interfaces provided by TM tools. Although they are distinct technologies, the lines between TM and SMT are blurring somewhat, as it is now common for translators to be fed automatic translations directly from an SMT system when their translation memory does not contain a match for the source sentence. TM and SMT are also intimately connected by the fact that the translation memories that translators build up over time can become training data for their very own (or someone else's) SMT engine.

Dominating the field

Despite known problems, SMT systems have come to dominate the field of machine translation, outperforming previously leading systems. In the last two years, however, there has been a new kid on the block: neural machine translation (NMT).

Like SMTs, NMT systems learn how to translate from pre-existing source texts and their translations. They have a simpler architecture than SMTs, however, and don't use models based on n-grams. Instead, they use artificial neural networks in which individual nodes that can hold single words, phrases or whole sentences are connected with other nodes in the network. The connections between nodes are strengthened via bilingual training data. When it comes to translating new inputs, the system reads through the source-language sentence one word at a time, then starts outputting one target word at a time until it reaches the end of the sentence. NMT systems thus process full sentences (rather than n-grams). They handle morphology, lexical selection and word-order phenomena (including discontinuous dependencies) better than SMTs, but they take much longer and much more computing power to train.
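To make the two contrasting designs concrete, here is a minimal Python sketch, not drawn from any particular MT toolkit. The first part shows the n-grams an SMT system learns from, and scores one candidate translation as the weighted combination of model scores that the decoder searches over; the model names, probabilities and weights are invented purely for illustration.

    import math

    def ngrams(tokens, n):
        # All contiguous runs of n words in a token list, as described above.
        return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

    print(ngrams("she threw all her old clothes out".split(), 2))
    # 'threw' and 'out' never fall inside the same bigram, which is why
    # discontinuous dependencies are hard for n-gram-based models.

    def loglinear_score(model_probs, weights):
        # Decoding searches for the target sentence with the highest
        # weighted sum of log model scores; the weights are the ones
        # worked out in the tuning phase.
        return sum(weights[m] * math.log(p) for m, p in model_probs.items())

    # Invented numbers for a single candidate translation:
    print(loglinear_score({"translation_model": 0.02, "language_model": 0.005},
                          {"translation_model": 0.6, "language_model": 0.4}))

The NMT loop can be sketched in the same schematic spirit. The sketch below is not a neural network: encode_step and decode_step are hypothetical stand-ins for the learned network components. The point is only that the whole source sentence is consumed before target words are emitted one at a time.

    def translate(source_tokens, encode_step, decode_step, max_len=50):
        # Encoding: read the source sentence one word at a time into a
        # single running state; no n-grams are involved.
        state = None
        for word in source_tokens:
            state = encode_step(state, word)
        # Decoding: emit one target word at a time until the
        # end-of-sentence marker appears or a length limit is reached.
        output, prev = [], "<s>"
        for _ in range(max_len):
            prev, state = decode_step(state, prev)
            if prev == "</s>":
                break
            output.append(prev)
        return output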
Such training demands are problems that large corporations can overcome and, in late September 2016, Google announced that all Chinese-to-English translation delivered by Google Translate for mobile and web apps would henceforth be powered by Google Neural Machine Translation (GNMT).6 However, problems like word drop, mistranslations (especially of rare words) and contextually inappropriate translations can still occur. There may still be work, in other words, for post-editors. To date, however, we have little or no knowledge of what it is like to work as a post-editor of NMT output.

Training implications

But back to our questions: what does all this mean for the training of future translators and interpreters? And what might a career in post-editing look like? To answer the first question it is worth looking to the field of labour economics. It used to be the case that routine work was considered particularly

[Photo: Dorothy Kenny at Stationers' Hall (left)]
