18 The Linguist Vol/61 No/2 2022
thelinguist.uberflip.com
ONLINE INCLUSION
Deborah Anderson looks at why it's so important for languages
to be in Unicode, and the work being done to get them there
Most speakers of modern
European languages today can
send text messages, email and
documents back and forth over the internet,
and be relatively confident that the letters
and symbols will be received as typed. This is
because the major European languages use
the Roman script, which is generally very well
supported on computers and devices.
However, those languages that are written
with less commonly used scripts typically
encounter problems: instead of getting the
letters and symbols they expect, nonsense
characters or boxes (known as 'tofu') may
appear, making text difficult or even
impossible to read.
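As a rough illustration of where "nonsense characters" can come from, the Python sketch below encodes a Greek word as UTF-8 bytes and then decodes those bytes as if they were Latin-1 (ISO 8859-1), a legacy single-byte encoding. This is an assumed scenario chosen for illustration, not the Bété case pictured here. The bytes arrive intact, but the receiving side reads them with the wrong character set. Tofu boxes are a different symptom: they usually mean the characters arrived correctly but no installed font can draw them.

```python
# Sketch: garbling caused by decoding bytes with the wrong encoding.
# The sample word and the choice of Latin-1 as the "wrong" encoding
# are illustrative assumptions.

original = "\u0395\u03bb\u03bb\u03b7\u03bd\u03b9\u03ba\u03ac"  # "Ελληνικά" (Greek)

# The sender stores or transmits the text as UTF-8 bytes
# (each of these Greek letters takes two bytes).
data = original.encode("utf-8")

# A receiver that assumes Latin-1 maps every byte to one character,
# so each Greek letter turns into two unrelated characters.
garbled = data.decode("latin-1")

print(len(original), len(data), len(garbled))
print(repr(garbled))

# Decoding with the encoding that was actually used recovers the text.
restored = data.decode("utf-8")
print(restored == original)
```

The point of the sketch is that the data itself was never damaged: both sides simply need to agree on how bytes map to characters, which is what a shared standard provides.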
The problem of sending and receiving texts
in different scripts electronically was apparent
by the 1970s and 1980s, when businesses,
governments, linguists and others were not
able to exchange text data easily or reliably.
To resolve this problem, an international
standard was developed: the Unicode
Standard and its close relative, ISO/IEC 10646.
Unicode is today supported on all modern
operating systems. It serves as the backbone
for sending and receiving text electronically in
the various languages of the world, and is the
foundation upon which fonts, keyboards and
software rely. In essence, Unicode enables
typing in different languages in text messages,
emails, webpages and word-processing
documents across platforms, and it also makes
search and cut-and-paste capabilities possible.
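Unicode assigns every character it encodes a unique number (a code point) and a formal name, independent of font, platform or software. The short Python sketch below uses the standard library's unicodedata module to look these up, and shows that a search matches code points rather than the way text is drawn on screen. The sample characters and sentence are arbitrary choices for illustration.

```python
import unicodedata

# Each encoded character has a fixed code point and a formal name.
for ch in ["\u0416", "\u03ac"]:  # Cyrillic Zhe, Greek alpha with tonos
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))

# Because sender and receiver agree on the code points, searching for
# a word works on any system that supports Unicode.
message = "Встреча в Москве завтра"  # Russian: "Meeting in Moscow tomorrow"
print("Москве" in message)
print(message.index("Москве"))

# Copying the text through UTF-8 bytes and back leaves it unchanged,
# which is what makes cut-and-paste between programs reliable.
print(message.encode("utf-8").decode("utf-8") == message)
```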
The major scripts, such as Latin, Cyrillic,
and Chinese/Japanese/Korean ideographs,
are included, but many less common scripts
are not. As a result, sending critical health
information in the Bété script, for example,
would not be possible unless one employed
a workaround, such as a non-standard font –
and this still doesn't guarantee that the
original text is received as expected.
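A non-standard font workaround typically draws the shapes of the unsupported script in place of characters that already have code points, often Latin letters and symbols. The stored text is still those Latin characters, so a device without the special font simply shows them as Latin. The Python sketch below assumes a hypothetical font that draws script glyphs at the positions of "Æ", "È" and "É"; this mapping is invented for illustration and is not the actual Bété font. It also shows that a code point Unicode has not assigned carries no name at all.

```python
import unicodedata

# Hypothetical "font hack": the sender's special font draws script
# glyphs in place of these Latin-1 characters. The mapping is
# invented for illustration; it is not any real Bete font.
typed_with_hack_font = "\u00c6\u00c8\u00c9"  # looks like the script only with that font

# What every other system sees: ordinary Latin characters.
for ch in typed_with_hack_font:
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))

# Software also treats the text as Latin letters, so nothing marks
# it as belonging to the intended script.
print(typed_with_hack_font.isalpha())

# A code point Unicode has not assigned has category "Cn" and no name.
unassigned = "\u0378"
print(unicodedata.category(unassigned))
print(unicodedata.name(unassigned, "<no name>"))
```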
A vital initiative
To address the problems facing communities
of lesser-used scripts, I started a project called
the Script Encoding Initiative (SEI) at UC
Berkeley's Department of Linguistics in 2002.
SEI has had support from the National
Endowment for the Humanities. The project
Supported scripts
INCOMPREHENSIBLE 'TOFU'
Text typed in Bété script with a non-Unicode
font (left), and how the same text appears
when sent via email to a mobile device (right)
© SHUTTERSTOCK