The Linguist

The Linguist 59,2 - April/May 2020




Google Translate, which takes audio in the selected language as input and returns the text translated into English. The translator takes the output and times how long it takes to edit and correct it. We refer to this as post-editing speech translation (PEST).

We assessed the word error rate (WER) and accuracy of IBM ASR and Google ASR using the NIST sclite scoring software (see table 1), and found that Google performs better in general, and as much as 10% better for French. The documents processed by Google are more than 90% correct, with the exception of 'Spanish 2'. The machine translation output was evaluated and post-edited by humans, thus mitigating the risk of any translation errors.

The difference in performance between the two Spanish documents comes down to their content: 'Spanish 1' is the Mexican President's speech, while 'Spanish 2' is the Bolivian minister's interview, which is articulated less clearly. We found that high ASR performance is a strong indicator that the human effort needed to edit a document for accuracy will be reduced, even for more difficult transcription and content, such as the TV interview.

Time-saving results

Table 2 shows the number of words per minute for each task. ASR with basic correction followed by human translation (task 1) is clearly the slowest of the three methods, averaging 12 words per minute. Note that from a translation quality standpoint, PEMT appears to be very reliable and relatively quick for French and Spanish. Task 2 is slightly faster than task 1, with an average of 15 words per minute. Task 3, which skips the transcription correction step and proceeds directly to post-editing, proved to be the quickest way to obtain a correct translation for the audio documents we studied.
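For readers curious about the metric behind table 1: WER is the minimum number of word-level substitutions, insertions and deletions needed to turn the ASR output into the reference transcript, divided by the number of words in the reference. The NIST sclite tool computes this (with detailed alignment reports); the sketch below is a minimal stand-alone illustration of the same idea, not the NIST implementation.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance divided by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = minimum edits to turn the first i reference words
    # into the first j hypothesis words
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution
    return dp[len(ref)][len(hyp)] / len(ref)
```

Note that 'correct' and WER need not sum to 100: an insertion raises the error rate without making any reference word incorrect, which is why some column pairs in table 1 exceed 100 when added together.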
The timing for speech-to-text translation with post-editing appears to be highly competitive (up to 60 words a minute), which leads us to conclude that this approach is compelling and promises to support more efficient audio translation workflows. ASR and MT have the potential to increase translator productivity, but the ASR language being used is critical, since this affects the outcome: the better the ASR and MT models, the more time can be saved. However, automatic systems such as ASR and MT still require human inspection and editing in the workflow, and further research is necessary to assess their efficacy.

In the light of these encouraging results, speech technologists and researchers need to join efforts to develop robust ASR systems at both the acoustic and language model levels. Moreover, ASR systems need to be fully developed for languages with fewer resources. The research community continues to make progress in audio translation, and further work remains. Nonetheless, this study will be helpful to decision-makers who need to modernise their processes, increase the productivity of their workforce and improve the overall quality of audio translation in certain fields.
Notes
1 www.youtube.com/watch?v=K8Ea_RXktgg
2 www.youtube.com/watch?v=qUeqwMl-_U0
3 www.youtube.com/watch?v=kGz9diiTV-M
4 www.youtube.com/watch?v=pxqw4TaqK1A

ACCURACY OF IBM AND GOOGLE SPEECH RECOGNITION (TABLE 1)

                         French 1   French 2   Spanish 1   Spanish 2   Serbian
IBM ASR: correct (%)         87.2       82.0        94.4        80.3       n/a
IBM ASR: word error rate     12.8       22.3         6.5        22.2       n/a
Google ASR: correct (%)      96.7       93.9        95.6        81.7      90.4
Google ASR: word error rate   3.3        6.1         4.4        18.3      10.4

NUMBER OF WORDS PER MINUTE FOR EACH TASK (TABLE 2)

                                    French 1   French 2   Spanish 1   Spanish 2   Serbian
Task 1: ASR + human translation         12          7          13          19       n/a
Task 2: ASR + MT + PEMT                 15         15          26           7        11
Task 3: Google ASR/MT + PEST            60.0       48.3        68.6        29.0      15.6

ASR = automatic speech recognition; MT = machine translation; PEMT = post-editing MT; PEST = post-editing speech translation; speech translation = MT of ASR transcription
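The productivity gap in table 2 can be summarised with a quick back-of-the-envelope calculation. The per-document rates below are taken from the table; the averaging across documents is our own illustration (the article quotes averages of roughly 12 and 15 words per minute for tasks 1 and 2).

```python
# Words per minute per document, taken from table 2
# (order: French 1, French 2, Spanish 1, Spanish 2, Serbian).
wpm = {
    "Task 1: ASR + human translation": [12, 7, 13, 19],  # no Serbian figure
    "Task 2: ASR + MT + PEMT": [15, 15, 26, 7, 11],
    "Task 3: Google ASR/MT + PEST": [60.0, 48.3, 68.6, 29.0, 15.6],
}

for task, rates in wpm.items():
    avg = sum(rates) / len(rates)
    print(f"{task}: {avg:.1f} words/minute on average")
```

On these figures, task 3 averages about 44 words per minute across the five documents, roughly three times the rate of either of the other two workflows.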
