Translation memory is a database for storing translated content in aligned pairs of source and target language segments.
Translation memory (TM) is now commonplace in the translation and localization industry. It is not machine translation, but rather a technology that helps translators work faster, better, and more consistently. It stores how a section of text, called a segment, was previously translated and then offers that translation for reuse when the software identifies the same or a similar segment. Segments that repeat exactly are translation memory matches. Fuzzy matches are segments that have partially changed but that the software still identifies as similar to previously translated segments. TM minimizes unnecessary typing and ensures better consistency for highly repetitive texts, such as technical manuals and software help files, which ultimately saves time and money.
Benefits of Translation Memory
The benefits of TM are:
- Lower translation costs – by leveraging previously translated content;
- Quicker turnaround times – your translators will spend less time on repeated content;
- Increased consistency – by reusing approved translations, your organization keeps terminology and phrasing uniform across its content.
TM tools improve your organization’s translation process considerably. Any type of content may have repeated segments, but software applications, websites, user manuals, and technical guides tend to have more than others. They are also usually updated regularly, which makes the use of TM especially cost- and time-effective. TM tools also let you easily calculate the size of a new translation project, including the number of new and repeated words, and estimate the cost and time required to complete it.
TM is a common and essential feature in Computer-Aided Translation (CAT) tools. It allows the reuse of repetitive information both within a single document and across multiple documents. The quality of results from TM depends on the quality of previous translations, so it pays to store your best human translations and disseminate them across your organization.
Structure of Translation Memory
TM processes content in segments, such as phrases or sentences, and not at the word level. Translators will still need context to ensure that the matches are appropriate for reuse in terms of gender, number (singular or plural), and usage (for example, the word “bank” can refer to a riverbank or to a financial institution). For more creative texts like marketing copy, it is common to use whole paragraphs as segments instead. The reason for this is that translators may need to change the order of entire sentences to produce a translation that has the same flair as the original but in a slightly modified structure.
TM organizes segments into categories of new content, fuzzy matches, and repeated content. Each level gets progressively easier to translate and, correspondingly, is charged at lower rates.
- New Content: The brown dog ran up the hill.
- Fuzzy Match: The brown dog raced up the hill.
- Repeat Content / Exact Match: The brown dog ran up the hill.
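The three categories above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `classify` function, the in-memory `set` standing in for the TM, and the 0.75 cutoff are all hypothetical, and the standard library's `difflib.SequenceMatcher` is used as a stand-in for whatever similarity measure a real TM tool applies.

```python
from difflib import SequenceMatcher


def classify(segment, memory, fuzzy_threshold=0.75):
    """Label a source segment as 'repeat', 'fuzzy', or 'new'.

    `memory` is a set of previously translated source segments.
    `fuzzy_threshold` is an assumed cutoff, not a value taken from any tool.
    """
    if segment in memory:
        return "repeat"
    # Best similarity ratio (0.0 to 1.0) against everything already stored.
    best = max(
        (SequenceMatcher(None, segment, stored).ratio() for stored in memory),
        default=0.0,
    )
    return "fuzzy" if best >= fuzzy_threshold else "new"


# Process the example sentences in order, storing each one after it is handled.
memory = set()
incoming = [
    "The brown dog ran up the hill.",
    "The brown dog raced up the hill.",
    "The brown dog ran up the hill.",
]
results = []
for segment in incoming:
    results.append(classify(segment, memory))
    memory.add(segment)

print(results)
```

The first sentence is new because the memory is empty, the second is close enough to the first to count as fuzzy, and the third repeats a stored segment exactly.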
Fuzzy matches are segments similar to previously translated text, but not exactly the same. TM tools give percentage scores depending on the degree of similarity, i.e. 100% would mean the segment is exactly the same and 80% would be almost the same. While the concept is simple, the calculation of the “distance” between segments is not trivial. For example, what is the distance between “All rights reserved 2010” and “All rights reserved 2011”? Or between “Monitors should be switched on” and “The monitor should be switched on”? TM tools use complex algorithms to calculate this distance.
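One simple way to put a number on this distance is Levenshtein edit distance, converted to a percentage. The sketch below is an assumption for illustration, not the algorithm used by any particular TM tool, and the function names are hypothetical. It shows why the calculation is not trivial: the same pair of segments scores differently depending on whether words or characters are compared.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences (strings or word lists)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def similarity(source, candidate, by_words=True):
    """Similarity as a percentage: 100 means identical, 0 means nothing shared."""
    a = source.split() if by_words else list(source)
    b = candidate.split() if by_words else list(candidate)
    longest = max(len(a), len(b))
    if longest == 0:
        return 100.0
    return 100.0 * (1 - edit_distance(a, b) / longest)


pairs = [
    ("All rights reserved 2010", "All rights reserved 2011"),
    ("Monitors should be switched on", "The monitor should be switched on"),
]
for old, new in pairs:
    print(old, "|", new)
    print("  word level:     ", round(similarity(old, new), 1))
    print("  character level:", round(similarity(old, new, by_words=False), 1))
```

At word level the copyright line scores 75% because one of its four words changed, while at character level it scores close to 96% because only one of 24 characters changed. Choices like these (tokenization, case sensitivity, how numbers and tags are treated) are part of what makes production algorithms more involved than this sketch.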
Within the fuzzy-match category there are different match levels, expressed as similarity percentages. In most cases, the following tiers are used:
- 75–84% Match
- 85–94% Match
- 95–99% Match
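Because each tier is typically charged at a different rate, a project analysis often boils down to bucketing every segment by its score and computing a weighted word count. The sketch below assumes scores are whole percentages and that anything under 75% counts as new content. The `RATE_FACTOR` values are purely hypothetical placeholders, since actual discounts are set by agreement between client and provider.

```python
def match_tier(score):
    """Map a whole-number similarity percentage to one of the tiers listed above."""
    if score >= 100:
        return "exact"
    if score >= 95:
        return "95-99%"
    if score >= 85:
        return "85-94%"
    if score >= 75:
        return "75-84%"
    return "new"  # assumption: below 75% is treated as new content


# Hypothetical fractions of the full per-word rate, for illustration only.
RATE_FACTOR = {
    "new": 1.0,
    "75-84%": 0.75,
    "85-94%": 0.5,
    "95-99%": 0.25,
    "exact": 0.125,
}


def weighted_word_count(segments):
    """Sum word counts scaled by tier; `segments` holds (word_count, score) pairs."""
    return sum(words * RATE_FACTOR[match_tier(score)] for words, score in segments)


# Example project: 400 new words, 200 words at 88%, 800 words of exact matches.
project = [(400, 0), (200, 88), (800, 100)]
print(weighted_word_count(project))  # 400*1.0 + 200*0.5 + 800*0.125 = 600.0
```

Multiplying the weighted word count by the full per-word rate gives a rough cost estimate for the project.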
Exact matches occur when a segment exactly matches a previously completed translation. They are used to autofill translations from previously translated source texts. Some users only apply exact matches automatically if both the previous and the following segments are also exact matches. This is referred to as an in-context exact (ICE) or 110% match.
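The in-context rule described above can be sketched as a check on each segment's neighbors. This is a simplified illustration: `label_matches` is a hypothetical helper, the memory is a plain set of source segments, and the first and last segments of a document are assumed never to qualify because they lack one neighbor. Real tools may store and compare context differently.

```python
def label_matches(segments, memory):
    """Label each segment 'ice', 'exact', or 'other'.

    A segment is 'ice' only if it and both of its neighbors are exact matches.
    """
    is_exact = [segment in memory for segment in segments]
    labels = []
    for i, exact in enumerate(is_exact):
        if not exact:
            labels.append("other")
            continue
        prev_exact = i > 0 and is_exact[i - 1]
        next_exact = i < len(is_exact) - 1 and is_exact[i + 1]
        labels.append("ice" if prev_exact and next_exact else "exact")
    return labels


memory = {"Open the lid.", "Press start.", "Close the lid."}
document = ["Open the lid.", "Press start.", "Close the lid.", "Wait one minute."]
labels = label_matches(document, memory)
print(labels)  # ['exact', 'ice', 'exact', 'other']
```

Only "Press start." is surrounded by exact matches on both sides, so it is the only segment that would be applied automatically under this stricter rule.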
The most common format for translation memory files is Translation Memory eXchange (TMX), a cross-platform interchange format compatible with most TM systems. Translation memory databases are generally treated as work made for hire and therefore belong to the client, not the translation provider. They can be stored locally, on a network, or in the cloud. Network and cloud storage have the advantage of allowing access from any connected machine and let collaborators use and update TMs at the same time.
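Since TMX is XML, the aligned pairs can be read with a general-purpose XML parser. The sketch below uses Python's standard `xml.etree.ElementTree` on a small hand-written TMX 1.4 sample. The sample content, the header attribute values, and the `read_pairs` helper are made up for illustration, and inline formatting tags inside a segment are simply flattened to text.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under the XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hand-written sample; the header attribute values are placeholders.
SAMPLE_TMX = """<tmx version="1.4">
  <header creationtool="example" creationtoolversion="1.0" segtype="sentence"
          o-tmf="example" adminlang="en" srclang="en" datatype="plaintext"/>
  <body>
    <tu>
      <tuv xml:lang="en"><seg>The brown dog ran up the hill.</seg></tuv>
      <tuv xml:lang="fr"><seg>Le chien brun a couru en haut de la colline.</seg></tuv>
    </tu>
  </body>
</tmx>"""


def read_pairs(tmx_text, source_lang, target_lang):
    """Return (source, target) segment pairs for the requested languages."""
    root = ET.fromstring(tmx_text)
    pairs = []
    for tu in root.iter("tu"):  # one translation unit per aligned pair
        by_lang = {}
        for tuv in tu.findall("tuv"):  # one variant per language
            seg = tuv.find("seg")
            if seg is not None:
                # itertext() flattens any inline markup inside the segment.
                by_lang[tuv.get(XML_LANG)] = "".join(seg.itertext())
        if source_lang in by_lang and target_lang in by_lang:
            pairs.append((by_lang[source_lang], by_lang[target_lang]))
    return pairs


pairs = read_pairs(SAMPLE_TMX, "en", "fr")
print(pairs)
```

Each `tu` element holds one translation unit, and each `tuv` inside it holds the segment in one language, which mirrors the aligned source and target pairs that a TM stores.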