Translation Initiative for COVID-19

Providing machine-readable translation data related to the COVID-19 pandemic

In response to the ongoing crisis, several academic (Carnegie Mellon University, George Mason University, Johns Hopkins University) and industry (Amazon, Appen, Facebook, Google, Microsoft, Translated) partners have joined forces with Translators without Borders to prepare COVID-19 materials in a variety of the world’s languages, for use by professional translators and for training state-of-the-art Machine Translation (MT) models. The focus is on making emergency and crisis-related content available in as many languages as possible. The collected, curated, and translated content, covering nearly 90 languages, will be available to the professional translation community as well as the MT research community.

To this end, we have so far created:

Translation Memories for the Translation Community

We have combined the terminologies and other translation data to create translation memories in .tmx format for the majority of the language pairs.
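For readers who want to load a translation memory programmatically, here is a minimal sketch in Python using only the standard library. It assumes the files follow the usual TMX layout (`tu` units containing `tuv` variants tagged with `xml:lang`, each holding a `seg`). The sample document, its sentences, and the function name `read_tmx_pairs` are illustrative assumptions and are not taken from the released data.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under the XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical two-language sample, for illustration only.
SAMPLE_TMX = """<?xml version="1.0" encoding="UTF-8"?>
<tmx version="1.4">
  <header creationtool="example" creationtoolversion="1.0" segtype="sentence"
          o-tmf="example" adminlang="en" srclang="en" datatype="plaintext"/>
  <body>
    <tu>
      <tuv xml:lang="en"><seg>Wash your hands often.</seg></tuv>
      <tuv xml:lang="fr"><seg>Lavez-vous souvent les mains.</seg></tuv>
    </tu>
  </body>
</tmx>
"""


def read_tmx_pairs(tmx_bytes, src_lang, tgt_lang):
    """Return (source, target) segment pairs for the two language codes."""
    root = ET.fromstring(tmx_bytes)
    pairs = []
    for tu in root.iter("tu"):
        segments = {}
        for tuv in tu.findall("tuv"):
            # Older TMX versions use a plain "lang" attribute instead.
            lang = tuv.get(XML_LANG) or tuv.get("lang")
            seg = tuv.find("seg")
            if lang and seg is not None:
                # itertext() also collects text nested in inline markup.
                segments[lang.lower()] = "".join(seg.itertext()).strip()
        if src_lang in segments and tgt_lang in segments:
            pairs.append((segments[src_lang], segments[tgt_lang]))
    return pairs


if __name__ == "__main__":
    for source, target in read_tmx_pairs(SAMPLE_TMX.encode("utf-8"), "en", "fr"):
        print(source, "->", target)
```

To read a file on disk, pass its raw bytes (for example `open(path, "rb").read()`). The language codes used in the actual files may differ from the plain `en` and `fr` shown here, so check the `xml:lang` values before filtering.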

Translated Terminologies

Translations of COVID-19-related terms in dozens of languages and locales, provided by Facebook and Google.

TICO-19 Translation Benchmark

The benchmark will include 30 documents (3071 sentences, 69.7k words) translated from English into 36 languages: Amharic, Arabic (Modern Standard), Bengali, Chinese (Simplified), Dari, Dinka, Farsi, French (European), Hausa, Hindi, Indonesian, Kanuri, Khmer (Central), Kinyarwanda, Kurdish Kurmanji, Kurdish Sorani, Lingala, Luganda, Malay, Marathi, Myanmar, Nepali, Nigerian Fulfulde, Nuer, Oromo, Pashto, Portuguese (Brazilian), Russian, Somali, Spanish (Latin American), Swahili, Congolese Swahili, Tagalog, Tamil, Tigrinya, Urdu, Zulu.

Other data sources

Other COVID-19-related collections from our contributors and our friends (which might not be available under a permissive license!) are listed here.

Media Communications

The effort has been featured in:

Contact

Contact us at: tico19 [dot] 2020 [at] gmail [dot] com.

Invitation for Contributions

We make a public call for community contributions to the TICO-19 project.

All community contributions will be properly acknowledged and labeled as such.

Contributors

       
GMU, CMU LTI, JHU, Translators Without Borders, Appen, Amazon AWS, Facebook, Google, Microsoft, Translated

After the first phase of the project is completed, we will make a call for further community contributions. Stay tuned!

License

All content is made publicly available through a Creative Commons CC0 license.