The Linguist is a languages magazine for professional linguists, translators, interpreters, language professionals, language teachers, trainers, students and academics with articles on translation, interpreting, business, government, technology
Issue link: https://thelinguist.uberflip.com/i/354134
that will be read by the public, must be edited by their team of 40 translators. Lewis admits that machine translation sometimes produces what looks like 'gibberish', but while users may mock that output, at Microsoft they are kinder to their machines. "You have to say, 'This is reasonable output based on what it knows'," he says. And when you understand the wizardry that goes on behind the scenes, it's hard not to agree.

Rhodri Glyn Thomas, Welsh Assembly Commissioner with responsibility for the Welsh language, explains the benefits of machine translation for the Assembly.

We passed legislation a couple of years ago on the Official Languages Bill, and we now have an official languages policy based on two principles: that everybody working here is able to use either English or Welsh, according to their preference; and that everybody understands what is being said or written here. We saw machine translation as a way of facilitating that.

In the Assembly, as in Wales, the majority of the people are non-Welsh speakers. So if Welsh-speakers were emailing a group of people, they felt they had to do it in English. Now people can just put the email into the machine translator. The translation will not be perfect but it will be useable.

It was a real challenge to establish a situation where we could ensure that documentation in Welsh was available at the same time as English. Because, if paperwork is arriving later, it puts pressure on people to use the English version. Machine translation will allow people to have a useable working document, so there won't be a need for people to translate every piece of paper which flies around the Assembly. These are documents that we use for work but don't have any further value, so there isn't a need for a perfect translation. Official documents have to be edited in order to ensure they are correct. Now we've got people editing documents rather than translating word for word.
Vol/53 No/4 2014 AUGUST/SEPTEMBER The Linguist 15 FEATURES

when the definite article yr follows a vowel it becomes 'r (as in arwyddo'r = 'to sign the'), while possessive pronouns such as eu contract following the preposition o (e.g. o'u = 'of their'). "If we separate everything, they are subject to being reordered in very strange ways," says Schwartz. Yet once the problem had been identified, the solution was relatively simple, and by adapting the word breaker to Welsh, they improved the output dramatically.

Microsoft's statistical system has another trick up its sleeve: a parser that assigns constituent parts to sentences (head noun, direct object, subject, etc) in English, and provides information about the relationships between those constituent parts. "A sentence is not just a flat string of words – it's a complex system of relationships – and the parser codifies that in some ways," explains Lewis. This helps the translation engine to output case, sentence order and other linguistic characteristics correctly. "In a lot of languages, a subject is treated differently than a direct object," he adds. "So if you have linguistic information going in, even for only one of the two languages, you can prune out the invalid output by saying, 'This output doesn't occur with direct objects.'"

With Welsh, sentence order was a particular problem because, unlike English, which has a subject-verb-object (SVO) order, Welsh has a VSO order. "The more similar two languages appear to a statistical system, the better results you are going to get," says Schwartz. The solution was to re-order the English sentences. "The machine is basically taking the words in English and jumbling them up so they are falling in a more Welsh-like order," says Lewis.

A further problem arose because many proper names correspond to common nouns, for example Awen ('inspiration, muse'), Medi ('September') and Haf ('summer').
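The overlap between names and common nouns can be made concrete with a small sketch. The Python below is a minimal illustration, not a description of Microsoft's system: it stores the three examples above with their glosses and reports which words in a sentence could be read either way. The function name `ambiguous_tokens` and the two sample sentences are assumptions made for illustration.

```python
# Words that are both Welsh personal names and common nouns.
# The three entries and their glosses come from the article;
# everything else in this sketch is illustrative.
NAME_OR_NOUN = {
    "awen": "inspiration, muse",
    "medi": "September",
    "haf": "summer",
}

PUNCTUATION = ".,;:!?"


def ambiguous_tokens(sentence):
    """Return (word, gloss) pairs for words that may be a name or a noun."""
    hits = []
    for token in sentence.split():
        word = token.strip(PUNCTUATION)
        # Compare case-insensitively: capitalisation alone cannot settle
        # the question, which is why such words are ambiguous.
        gloss = NAME_OR_NOUN.get(word.lower())
        if gloss is not None:
            hits.append((word, gloss))
    return hits


# Hypothetical sample sentences: a person called Haf, then the season.
print(ambiguous_tokens("Mae Haf yn canu."))   # [('Haf', 'summer')]
print(ambiguous_tokens("Mae'r haf yn hir."))  # [('haf', 'summer')]
```

Both sentences are flagged, even though only the first refers to a person, so a lookup on its own cannot tell the two readings apart.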
The system uses certain clues, such as upper/lower case or whether there is an article before the noun, but that doesn't always help. The issue is much harder for German, where all nouns are capitalised and an article may appear before a proper noun. In some cases, an individual 'fix' has to be added. "However, you would not want to override the automatic translation of the names of months and seasons in favour of the occasional person's name, so we needed to add surnames to those examples," says Siân Richards, Senior Project Manager at the Welsh Assembly.

So how good is the translation? "Machine translation always has errors," Lewis admits. "But given the amount of data that we were working with, it is quite remarkable." They do not aim for perfection, but for a functional translation that offers meaning, not style. To assess this, the output is compared to test data produced by human translators and then scored, by both human evaluators and an automated scoring mechanism, on a scale of 1-4, where 1 is 'gibberish', 3 is 'readable content' and 4 is a perfect translation. "We can't hope to achieve 'readable' for all our languages, so we look at this magic number around 2.5, where you get a good sense of what's being said," says Lewis.

For the Welsh Assembly, this means that the automated system works well enough on emails and internal working documents that would not otherwise be translated. But official and published documents, or anything

The rules that the system makes up are not necessarily linguistically motivated

NEW TECHNOLOGY: The launch of Microsoft's Welsh translator at the Senedd on 21 February (opposite page); and (left) Will Lewis poses at the launch with Rhodri Glyn Thomas AM (r) and Rosemary Butler AM

PROVIDING A SERVICE