The Linguist

The Linguist 53,4

The Linguist is a languages magazine for professional linguists, translators, interpreters, language professionals, language teachers, trainers, students and academics, with articles on translation, interpreting, business, government and technology.


The Linguist AUGUST/SEPTEMBER

FEATURES

All systems go

Find out how translation engines are made for each new language, as Miranda Moore meets Microsoft

Will Lewis, Principal Program Manager for Microsoft Translator, is talking to me about 'magic'. Specifically, the sort of hocus pocus that goes into creating a machine that can translate information from one language into another at a functional level of quality. As with all conjuring tricks, the secret is altogether more mundane: data, data and more data.

Microsoft's engines are statistical translators, meaning that they learn by comparing parallel documents to find out the most probable translation for each word. "It starts to see regular co-occurrences. So, over time, the probability of diolch showing up in a Welsh document every time 'thanks' or 'thank you' shows up in the English increases, until finally it becomes very confident that one is a translation of the other," explains Lewis.

This is enhanced by 'contextual clues', which increase the probability of a particular word being correct. Lewis gives the example of the German Wand versus Mauer, for which there is only one word in English: 'wall'. Rather than learning that Wand refers to an inside wall and Mauer to an outside wall, as a human might, the machine learns by seeing the co-occurrence of, for example, 'bedroom' or 'kitchen' and Wand, compared to 'Berlin Wall' and Mauer.

It takes a lot of data to get such subtleties right: ideally, tens of millions of sentence pairs. And that data has to be of a high quality to make sure that the system is not inputting errors. For Lewis, this is why the Welsh engine works so well. It was launched in partnership with the Welsh Assembly, which gave Microsoft access to most of its documentation. Crucially, Assembly staff also did their own quality checks to ensure that they were providing good, clean translations.

The results have transformed the Assembly's working practices since the Welsh translator launched in February, enabling them to function more bilingually than ever. And because they use the translation engine every day, Assembly staff and elected members have been crucial 'eyes on the ground', reporting bugs to Microsoft via fortnightly teleconferences and emails in between.

"I rely on negative feedback from users, although they often misdiagnose problems because they don't know how the system works," comments Lee Schwartz, a Linguist at Microsoft's Seattle offices. "We build a system and then we give it to a native speaker who looks through it to identify any problems. But it's extremely difficult for them to see patterns because the nature of statistical machine translation means that the rules the system makes up are not necessarily linguistically motivated," she explains. Schwartz, on the other hand, understands how the engines work – and therefore what sorts of problems to expect – but does not necessarily speak the languages she is working on.

Adding linguistic clues

Where Lewis concentrates on improving the engines using data, Schwartz gives them additional linguistic information so they can avoid mistakes, such as translating nouns as prepositions. For Welsh, she used a list of common function words. However, the biggest issue has been Welsh's contracted forms, which Microsoft's word breaker was separating at the apostrophe. For example,
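The word-breaker problem described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Microsoft's word breaker: the two regular expressions, the function names `naive_break` and `aware_break`, and the sample phrase "Croeso i'r Cynulliad" ("Welcome to the Assembly") are all chosen here for demonstration and do not come from the article.

```python
import re

# Hypothetical naive breaker: a token is either a run of word characters
# or a single punctuation mark, so an apostrophe always splits a word.
NAIVE = re.compile(r"\w+|[^\w\s]")

# Hypothetical apostrophe-aware breaker: an apostrophe (straight or curly)
# sandwiched between word characters stays inside the token.
AWARE = re.compile(r"\w+(?:['’]\w+)*|[^\w\s]")


def naive_break(text):
    """Split text into tokens, breaking contracted forms at the apostrophe."""
    return NAIVE.findall(text)


def aware_break(text):
    """Split text into tokens, keeping contracted forms such as i'r whole."""
    return AWARE.findall(text)


sample = "Croeso i'r Cynulliad"  # illustrative phrase, not from the article
print(naive_break(sample))  # ['Croeso', 'i', "'", 'r', 'Cynulliad']
print(aware_break(sample))  # ['Croeso', "i'r", 'Cynulliad']
```

With the naive pattern, a statistical engine would see the stray fragments 'i' and 'r' rather than the single form i'r, so its co-occurrence counts would attach to the wrong units. Keeping the contraction whole is one plausible way to avoid that.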
