Jul 20, 2011
The rise of Google Translate has been a double-edged sword for purveyors of Machine Translation systems. On the one hand, it has brought mass awareness that MT can offer acceptable quality for certain purposes, and that MT quality for gisting is significantly better than it was in the past. On the other hand, it has fed expectations that MT is both free of charge and improving every year.
Two announcements from Google over the past year are set to change the game. First, Andreas Zollmann of Google Translate admitted that Google's brute-force method of indexing the world's content was unlikely to yield much better translations. Google had already captured as much as 10% of the available bilingual data in some languages, and its results showed that doubling the volume of training data typically yielded a quality improvement of just 0.5%, so Google could not significantly increase quality through data volume alone. Second came the announcement that the Google Translate API would be shut down due to "abuse of terms", shortly followed by the decision to make it a paid service.
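To see why a 0.5% gain per doubling leaves so little headroom, here is a rough back-of-the-envelope calculation. It is a sketch under stated assumptions, not Google's own analysis: it assumes the gain per doubling stays constant at 0.5% and that gains simply add up across doublings, and it uses the 10% figure quoted above. The function name `max_gain_from_more_data` is hypothetical.

```python
import math

# Figures from the paragraph above; the constant-gain-per-doubling model
# is an illustrative assumption, not a published Google formula.
GAIN_PER_DOUBLING = 0.5  # percent quality gain per doubling of training data
SHARE_CAPTURED = 0.10    # fraction of the available bilingual data already used


def max_gain_from_more_data(share_captured: float, gain_per_doubling: float) -> float:
    """Return the extra quality gain (in percent) from training on ALL
    available data, assuming each doubling adds the same fixed gain."""
    # Number of doublings needed to go from the current share to 100%.
    doublings_left = math.log2(1.0 / share_captured)
    return doublings_left * gain_per_doubling


gain = max_gain_from_more_data(SHARE_CAPTURED, GAIN_PER_DOUBLING)
print(f"{gain:.2f}%")  # prints 1.66%
```

Under these assumptions, even capturing every remaining scrap of bilingual text (about 3.3 more doublings) would add only around 1.7%. Since that remaining data would still have to be found and acquired, the figure is best read as an optimistic ceiling.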
Industry blogs such as Common Sense Advisory's Global Watchtower initially asserted that Google's goal was to enter the paid MT space, but Andovar's MT partner, AsiaOnline, makes a more convincing case that the change is a strategic move to avoid polluting the internet with uncontrolled and unidentified MT. You can read the analysis by AsiaOnline's CEO, Dion Wiggins, here.
What does this mean for the advancement of MT going forward?
- Further improvements in quality will come down to customization and domain specialization rather than Google’s brute force methodology.
- The perception of “MT is free” will fade as it is understood that the extra quality of customized, domain specialized MT is worth paying for.
- MT companies will receive additional investment as the specter of competing with Google’s free translation services fades
At Andovar we expect that, over the coming decade, MT with human post-editing will become increasingly important as a cheaper alternative to professional human translation plus editing in subject domains that are less quality-sensitive or more intrinsically suited to MT.
Will users accept Machine Translation solutions?
It has always been hard to judge the acceptability and utility of lower-than-professional-quality translations, whether raw MT or human post-edited work. When people are searching for information, as in legal discovery or engineers looking for technical solutions, practice has shown that there is wide tolerance for imperfect output. But it is unknown whether consumers of mass-market applications will show similar tolerance. Will online gamers accept in-game chat that works most of the time? Will tourists seeking facts about a destination choose a site that is comprehensive and up to date but translated with MT over a local site written in perfect language but offering far less information? It is likely that much will depend on the language and market size. MT quality is certainly higher in some languages, but the availability of, and competition from, native-language applications may be a stronger indicator. When users must choose between reading usable MT and reading English they have largely forgotten since their school days, MT looks more attractive.
Which applications and subjects are best suited to MT?
At present, MT companies are universally poor at identifying up front which applications are unlikely to be suited to MT, preferring to couch any reservations in disclaimers and vague advice that greater volumes of training data will produce better results.
For MT usage to penetrate further into the everyday world, MT companies need to become more adept at identifying which deployments are unlikely to succeed. Successful MT depends on a complex blend of factors: terminology, the training data available, end users' tolerance of errors, the length and complexity of typical grammatical constructions, and feedback and improvement cycles. A traditional caveat about MT is that its success is inversely proportional to the hype that promotes it, but clearer methods of appraising suitability are sorely needed.
How will the improvement curve change?
Statistical MT (SMT) has breathed new life into the field. Decades of frustration with rule-based systems gave way to a new optimism as SMT delivered higher-quality, more natural-sounding general-purpose translation that every internet user could see.
Now that MT is evolving into a more complex, hybrid mixture of techniques and domain specializations, will we see the same dramatic improvements as in recent years, or will we revert to the "not in my lifetime" expectations of the pre-SMT era?
Despite all the improvements, machine translation solutions are still a long way from being a viable replacement for human translators in most mainstream uses. However, they have proved their worth in enough real-world situations that every Language Service Provider should be able to assess MT viability rather than rejecting it out of hand.
If you want to learn more about MT and the various workflow permutations that might make it viable for your business, please don’t hesitate to contact us for a free consultation.