ARTIFICIAL INTERPRETING
Embrace the machine
As remote interpreting platforms start to offer AI services,
Jonathan Downie outlines how interpreters might respond
Conference interpreting was in shock.
At the start of 2023, KUDO, a
leading remote interpreting platform,
announced that they had launched "the
World's first fully integrated artificial
intelligence speech translator". Many
interpreters were furious. It seemed that a
platform built to offer them work had now
created a service that would take work
away. Despite demos and reassurances,
doubts remained.
Later, another remote interpreting
platform, Interprefy, released their own
machine interpreting solution. This time, the
response was much more muted. But now
the precedent has been set. The days when
human and machine interpreting were
completely separate are over. But what does
this mean for human interpreters, and how should we adjust?
Back to the facts
To understand what this means for human
interpreters, we need to know a bit about
how machine interpreting works. Machine
interpreting uses one of two models. The first
is the cascade model. This takes in speech,
converts it to text, passes the text through
machine translation, and then reads out the
translation through automatic speech
synthesis. The second is the rarer end-to-end
model. This takes in sound, analyses it and
then uses that to create sounds in the other
language. There is no written stage.
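For readers who like to see the shape of the pipeline, here is a minimal sketch in Python of the two architectures just described. The function names are hypothetical placeholders, not any vendor's actual API; a real system would plug speech recognition, machine translation and speech synthesis engines into each stage.

    # Minimal sketch of the two machine interpreting architectures.
    # All component functions are hypothetical placeholders, not a real API.

    def speech_to_text(audio: bytes, lang: str) -> str:
        """Placeholder for an automatic speech recognition engine."""
        raise NotImplementedError

    def machine_translate(text: str, source: str, target: str) -> str:
        """Placeholder for a machine translation engine."""
        raise NotImplementedError

    def text_to_speech(text: str, lang: str) -> bytes:
        """Placeholder for a speech synthesis engine."""
        raise NotImplementedError

    def speech_to_speech(audio: bytes, source: str, target: str) -> bytes:
        """Placeholder for a single end-to-end speech translation model."""
        raise NotImplementedError

    def cascade_interpret(audio: bytes, source: str, target: str) -> bytes:
        """Cascade model: speech -> text -> translated text -> speech.
        Anything that cannot be written down (intonation, emphasis,
        emotion) is discarded at the first stage."""
        text = speech_to_text(audio, source)
        translated = machine_translate(text, source, target)
        return text_to_speech(translated, target)

    def end_to_end_interpret(audio: bytes, source: str, target: str) -> bytes:
        """End-to-end model: one model maps source speech directly to
        target speech, with no written intermediate stage."""
        return speech_to_speech(audio, source, target)

The design difference is visible in the sketch: the cascade version can only pass on what survives the conversion to text, while the end-to-end version works on the sound itself.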
The cascade model ignores emotion,
intonation, accent, speed, volume and
emphasis. It may be able to handle tonal
languages but anything that cannot be
represented in simple text is discarded.
Cascade model systems are therefore very poor in situations that need emotional sensitivity or clever use of tone of voice, sarcasm, humour or timing. End-to-end
models can, at least in principle, get over
most of those hurdles. In theory, anything
that can be heard can be processed. These
models might also perform better with
accents and faster speech.
Neither model can check if people
understand the output, but they can be
customised for different clients and fields.
Neither model will do anything to adjust to
social context, such as who is speaking to
whom, differences in status and the
emotional resonance of what is going on. Yet
both promise to handle numbers, names and
specialist terminology as well as, if not better
than, most trained humans.
In short, machine interpreting will soon beat
humans at any kind of terminological and
numerical accuracy we might care to measure.
We will still beat them at customising our
work for the audience, reading the room,
speaking beautifully and pausing naturally.
Understanding client attitudes
It should come as no surprise that the one
video of a test of machine interpreting under
semi-realistic conditions showed exactly what
we might expect. In a video for Wired last
year,1 two interpreters found that, while KUDO's system was excellent at picking up specific terms, it did a poor job of prioritising information and tended to produce some unnatural turns of phrase.
The overall result was that, while machine
interpreting would do well enough as a stop-
gap, it wasn't reliable as a human