The Linguist is a languages magazine for professional linguists, translators, interpreters, language professionals, language teachers, trainers, students and academics with articles on translation, interpreting, business, government, technology
Issue link: https://thelinguist.uberflip.com/i/220935
FOCUS: CAT TOOLS

Working together

How easy is it to switch between CAT tools, and which offer the best interoperability, asks Angelika Zerfaß

Wouldn't it be nice if it didn't matter which translation tool you used to get the job done? If the data could be moved between tools without any loss or difficulties? That would be true interoperability. But even though we have standard formats for exchanging translation memory data through TMX and for exchanging translation files through XLIFF, interoperability is still something that has only partly been achieved. There are some considerations to be taken into account when moving between tools, and as soon as conversions come into play, the chance of something going wrong – or at least not exactly right – increases significantly.

Exchange files

Let us take TMX (Translation Memory Exchange format), which is used to transport translation memory (TM) data between tools. It works well for segment pairs, but whether formatting information can be exchanged and reused depends on the tools involved. To test this, we translated the same files (in Word, InDesign, html and XML) using various TM tools. We set up the tools so that metadata would be saved with each segment pair (a text field containing the file name and a pick-list field containing two values: website and marketing). Then we exported the TMs to TMX and imported them into all the other tools.

The interoperability comparison chart opposite shows the exchange between two of the tools tested and what file formats yield the best results. Admittedly the samples were quite small. For InDesign INX, you get the highest recycling rates when moving from Trados 2009/2011 to memoQ, whereas moving from memoQ to Trados 2007 (not shown in the chart) provides the lowest rates. Exchanging TM data generated from Word or html files usually yields higher recycling rates than exchanging data from other file formats.
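To make the test setup more concrete, here is a minimal Python sketch of what such a TMX export might look like and how the segment pairs and their metadata could be read back. The element names (tu, tuv, seg, prop, bpt, ept) come from TMX 1.4; everything else is an assumption made for illustration: the tool name, the property types x-filename and x-category, the sample sentences, and the helper functions plain_text and read_units. Real tools choose their own property names and write formatting in their own way, which is exactly where exchanges can lose information.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under this namespaced key.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical TMX export: one translation unit with two metadata fields
# (a file name and a pick-list value) and bold formatting stored as
# paired inline elements (bpt/ept). All values here are invented.
SAMPLE_TMX = """<tmx version="1.4">
  <header creationtool="ExampleTool" creationtoolversion="1.0"
          segtype="sentence" o-tmf="example" adminlang="en-GB"
          srclang="en-GB" datatype="plaintext"/>
  <body>
    <tu>
      <prop type="x-filename">brochure.docx</prop>
      <prop type="x-category">marketing</prop>
      <tuv xml:lang="en-GB">
        <seg>Click <bpt i="1" type="bold">&lt;b&gt;</bpt>Save<ept i="1">&lt;/b&gt;</ept> to finish.</seg>
      </tuv>
      <tuv xml:lang="de-DE">
        <seg>Klicken Sie auf <bpt i="1" type="bold">&lt;b&gt;</bpt>Speichern<ept i="1">&lt;/b&gt;</ept>, um den Vorgang abzuschließen.</seg>
      </tuv>
    </tu>
  </body>
</tmx>
"""


def plain_text(seg):
    """Return the segment text without its inline formatting elements.

    Only flat inline elements are handled: the content of each child
    (the tool-specific formatting code) is skipped, and the text that
    follows the child is kept.
    """
    parts = [seg.text or ""]
    for inline in seg:
        parts.append(inline.tail or "")
    return "".join(parts)


def read_units(tmx_string):
    """Collect metadata and plain-text segments for every translation unit."""
    root = ET.fromstring(tmx_string)
    units = []
    for tu in root.iter("tu"):
        props = {p.get("type"): (p.text or "") for p in tu.findall("prop")}
        segments = {}
        has_inline_tags = False
        for tuv in tu.findall("tuv"):
            seg = tuv.find("seg")
            if seg is None:
                continue
            segments[tuv.get(XML_LANG)] = plain_text(seg)
            # A seg element with child elements carries formatting or placeholders.
            has_inline_tags = has_inline_tags or len(seg) > 0
        units.append(
            {"props": props, "segments": segments, "has_inline_tags": has_inline_tags}
        )
    return units


if __name__ == "__main__":
    for unit in read_units(SAMPLE_TMX):
        print(unit["props"])
        print(unit["segments"])
        print("inline tags present:", unit["has_inline_tags"])
```

Running a check like this on the exports from two tools gives a quick way to see whether the metadata fields and the inline formatting survived the round trip, before any match rates are compared.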
The reasons for the lower match rates are manifold. One cause can be the segmentation rules – ie, the way a tool splits the text into segments. Another can be the way the tool saves formatting information to a TMX file – whether it is given as explicit information, such as 'bold', or as a non-specific placeholder (eg, '[1]'), meaning only that there was some kind of formatting at this place. These issues can make a difference during translation when it comes to matching a segment from the document to the one in the TM. The same goes for all other kinds of placeholders, eg, for images, tags and references.

There is not much we can do about this. One solution would be to clear the TMX file of all tags and formatting during import, if a text-only-import feature is available. Another, more involved, technique would be to search and replace the way tags are written in one tool with the way they are written in another before importing. But this usually only makes sense for large TMs where the match rate would otherwise decrease by 15 percent or more (for example, when moving from FrameMaker to XML format).

Bilingual formats

When it comes to translation files, we mostly have to deal with something derived from XLIFF (the XML Localisation Interchange File Format), which was designed to be a bilingual container for translations. It is similar to TMX but with possibilities to save additional information, such as the status of a translation (confirmed, edited, rejected) or the history of the changes within a segment. Take a look at the file formats some of the tools produce and you will see that they are using XLIFF as the basis for saving the

DECEMBER 2013/JANUARY 2014 www.iol.org.uk