Like many other translation companies, Andovar handles a wide range of content types for its clients. We have a team of localization engineers who use various tools to extract text for translation from files in any format and reintegrate it into the same formats once translation is done. However, unexpected challenges sometimes occur, and they are not always of a technical nature. One of them arises when clients ask us to “translate a website”.

What they want is clear – there is a website online in one language, and they would like it to be available in one or more additional languages. While this seems simple enough, it is more complicated than it sounds for a number of reasons.

What is a Website?

For a visitor, a website is what you see when you open an internet browser and type in an address, such as www.andovar.com. From a technical point of view, however, a website is a collection of files in different formats that contain content and instructions for how that content should be displayed. By content I mean all the text and images, and nowadays also the video and audio, that visitors can see and hear when they access the website. By instructions I mean rules on what size and font the text should be in, where the different content types are located on the screen, what happens when a visitor clicks on something, etc.

A browser (Internet Explorer, Safari, Chrome, etc.) is software that knows how to interpret these instructions and display the content to the user who visits the website. In general, all browsers follow similar rules, but occasionally their creators have different ideas, which is why a certain website may look different depending on the browser you use.

What is Website Localization?

When people say they want a website localized, they usually mean the content. Simply put, they want all the text the visitor sees to be translated into a new language. In some cases, however, changing how the content is presented may be important as well, so that different colors, layouts or styles are used when the language changes.

Let’s consider the basic scenario – that we are only talking about translating text.

The first step is to identify the text for translation. Often, it doesn’t make sense to translate all of it. Think about all the old blog posts, case studies, comments under articles and other content that may not need to be translated. Additionally, when a website has a lot of content, translation may become too costly and troublesome, and companies often decide to have a more basic version with less content in the new language. This is up to the client and has to be clarified early on.

How Is a Website Localized?

Once the decision on what text to localize has been made, can the translation agency get started on the work? Not yet.

Translation of a website is not done “on the website” after all. It is done in specialized software (computer-assisted translation tools, or CAT tools), and you start by importing the files that hold the content. This is the point where some people get confused. Let me answer the most common questions:

Q: If you have the website’s address like www.andovar.com, can’t you just copy and paste all the text into a file and then translate it?

A: While it is indeed possible to simply select all visible text with the mouse pointer and right-click to copy and paste it into a file, this is probably the worst way to do it, because:

  1. Often a visitor will not see all the content on the website. The text may change depending on whether the visitor clicked on something or even what geographic location they are browsing from, so it’s easy to miss some text that should be included.
  2. Many websites display the exact same text on many pages, such as menus, footers, sidebars, and other elements that are shown to the visitor regardless of where on the website they are. If you copy and paste everything from every page, you will include this content many times over.
  3. Some websites contain areas that are password-protected or otherwise inaccessible or hard to find. If you rely on someone clicking around and copying content, it is very likely they will miss something.
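
To illustrate the second point, here is a minimal Python sketch of how repeated text could be collapsed so that each piece is counted only once. The function name, the page names and the sample text are made up for this example; real CAT tools handle repetitions with their own, more sophisticated logic.

```python
def collect_unique_segments(pages):
    """Map each distinct text segment to the list of pages it appears on.

    `pages` is a dict of page name -> list of text segments
    (a hypothetical structure chosen for this illustration).
    """
    unique = {}
    for page_name, segments in pages.items():
        for segment in segments:
            # Normalize whitespace so "About  us" and "About us" match.
            key = " ".join(segment.split())
            if not key:
                continue  # skip empty segments
            unique.setdefault(key, []).append(page_name)
    return unique


# Two made-up pages that share a menu ("Home", "About us", "Contact").
sample_pages = {
    "home": ["Home", "About us", "Welcome to our site", "Contact"],
    "about": ["Home", "About us", "We are a small team", "Contact"],
}

unique = collect_unique_segments(sample_pages)
total = sum(len(segments) for segments in sample_pages.values())
print(f"{total} segments copied, {len(unique)} unique")
```

In this made-up case, copying both pages by hand would put the three menu items into the file twice, even though each only needs to be translated once.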

Q: If it’s not a good idea to copy and paste text, then what about downloading all the website files and then extracting text from them?

A: This is a better solution, and every translation agency should be able to analyze and extract text from files in common website formats, such as .html or .php. However, there are two possible problems with this. The first is that rare and complex formats take more time and effort to analyze, and the second is that a visitor might not be able to access all the files. Some files are only accessible via the website’s hosting server, which is usually password-protected.
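
As a rough idea of what extracting text from an .html file involves, the sketch below uses only Python’s standard library. It is a simplified illustration, not the way CAT tools or our engineers actually process files: the class and function names are invented for this example, it splits text at every tag (real tools keep inline formatting together with the sentence), and it only looks at the alt and title attributes.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> content."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.segments = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1
        # Some translatable text lives in attributes, not between tags.
        for name, value in attrs:
            if name in ("alt", "title") and value and value.strip():
                self.segments.append(value.strip())

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = " ".join(data.split())  # collapse whitespace
        if text and self._skip_depth == 0:
            self.segments.append(text)


def extract_text(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.segments


sample_html = (
    "<html><head><title>Home</title><style>p{color:red}</style></head>"
    "<body><p>Hello <b>world</b></p><img src='a.png' alt='Logo'>"
    "<script>var x = 1;</script></body></html>"
)
segments = extract_text(sample_html)
print(segments)
```

Even this small example shows why the file format matters: the style rules and the script are instructions rather than content and must be left out, while the image’s alt text is content that a visitor never sees as ordinary text on the page.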

Q: If the above two approaches are not correct, what is the right way to do it?

A: This depends on how the website is built. Nowadays, many websites are created and managed using a Content Management System (CMS) such as WordPress, Drupal (we wrote about Drupal here) or Sitecore. Even though they are all different, each one provides a way to export and import content for localization. In the case of custom-made websites not built on any off-the-shelf CMS, the best way is usually to access the hosting server and copy all the website files, which we then analyze ourselves.
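
As one example of what a CMS export looks like, WordPress’s built-in export tool produces an XML file in which each post or page is an item with a title and a body. The Python sketch below reads the titles and bodies from such a file. It assumes the standard WordPress export layout; the function name and the sample content are invented for this illustration, and in practice many WordPress sites use a dedicated localization plugin instead.

```python
import xml.etree.ElementTree as ET

# Namespace WordPress export files use for the <content:encoded> element.
CONTENT_NS = "{http://purl.org/rss/1.0/modules/content/}"


def read_export_items(xml_text):
    """Return the title and body of every <item> in the export."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        body = item.findtext(CONTENT_NS + "encoded", default="")
        items.append({"title": title.strip(), "body": body.strip()})
    return items


# A heavily shortened, made-up export with a single page.
sample_export = """
<rss version="2.0"
     xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>Example Site</title>
    <item>
      <title>About us</title>
      <content:encoded><![CDATA[<p>We are a small team.</p>]]></content:encoded>
    </item>
  </channel>
</rss>
"""

items = read_export_items(sample_export)
print(items)
```

Note that the body still contains HTML markup, so an export like this is a starting point for text extraction rather than a finished list of sentences to translate.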

Conclusion

Website localization is often a more complex challenge than people think, and this blog post only scratches the surface of the issue. Andovar has years of experience in dealing with different types of websites and we have created more in-depth documents describing best practices. Feel free to get in touch with us for a consultation.