Providing machine-readable translation data related to the COVID-19 pandemic
In response to the ongoing crisis, several academic (Carnegie Mellon University, George Mason University, Johns Hopkins University) and industry (Amazon, Appen, Facebook, Google, Microsoft, Translated) partners have teamed up with Translators without Borders to prepare COVID-19 materials in a variety of the world’s languages, for use by professional translators and for training state-of-the-art Machine Translation (MT) models. The focus is on making emergency and crisis-related content available in as many languages as possible. The collected, curated, and translated content across nearly 90 languages will be available to the professional translation community as well as the MT research community.
To this end, we have so far created:
Translation memories in .tmx format for the majority of the language pairs, created by combining the terminologies and other translation data.
Translations of COVID-19-related terms in dozens of languages and locales, provided by Facebook and Google.
The benchmark will include 30 documents (3071 sentences, 69.7k words) translated from English into 36 languages: Amharic, Arabic (Modern Standard), Bengali, Chinese (Simplified), Dari, Dinka, Farsi, French (European), Hausa, Hindi, Indonesian, Kanuri, Khmer (Central), Kinyarwanda, Kurdish Kurmanji, Kurdish Sorani, Lingala, Luganda, Malay, Marathi, Myanmar, Nepali, Nigerian Fulfulde, Nuer, Oromo, Pashto, Portuguese (Brazilian), Russian, Somali, Spanish (Latin American), Swahili, Congolese Swahili, Tagalog, Tamil, Tigrinya, Urdu, Zulu.
Other COVID-19-related collections from our contributors and our friends (which might not be available under a permissive license!) are listed here.
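For readers who want to use the .tmx translation memories as MT training data, the following is a minimal sketch of how sentence pairs could be extracted from a TMX document with the Python standard library. It assumes the usual TMX layout (`tu` units containing `tuv` variants with an `xml:lang` attribute and a `seg` child). The function name `read_tmx_pairs`, the inline sample, and the language codes `en` and `fr` are illustrative assumptions, not taken from the released files, whose codes and structure should be checked before use.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under the XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def read_tmx_pairs(tmx_text, src_lang, tgt_lang):
    """Return (source, target) segment pairs found in a TMX string.

    Hypothetical helper: language codes are compared case-insensitively,
    and units missing either language are skipped.
    """
    src_lang, tgt_lang = src_lang.lower(), tgt_lang.lower()
    root = ET.fromstring(tmx_text)
    pairs = []
    for tu in root.iter("tu"):
        segments = {}
        for tuv in tu.findall("tuv"):
            # Older TMX versions use a plain "lang" attribute instead.
            lang = tuv.get(XML_LANG) or tuv.get("lang")
            seg = tuv.find("seg")
            if lang is None or seg is None:
                continue
            # itertext() also collects text nested inside inline markup.
            segments[lang.lower()] = "".join(seg.itertext()).strip()
        if src_lang in segments and tgt_lang in segments:
            pairs.append((segments[src_lang], segments[tgt_lang]))
    return pairs


# Tiny made-up sample; real files will be larger and may use other codes.
SAMPLE_TMX = """<tmx version="1.4">
  <header creationtool="example" creationtoolversion="1.0"
          segtype="sentence" o-tmf="none" adminlang="en"
          srclang="en" datatype="plaintext"/>
  <body>
    <tu>
      <tuv xml:lang="en"><seg>Wash your hands.</seg></tuv>
      <tuv xml:lang="fr"><seg>Lavez-vous les mains.</seg></tuv>
    </tu>
  </body>
</tmx>"""

if __name__ == "__main__":
    for source, target in read_tmx_pairs(SAMPLE_TMX, "en", "fr"):
        print(source, "\t", target)
```

To process a file on disk, the same function can be applied to the file's contents after reading it as text; for very large memories, a streaming parser such as `xml.etree.ElementTree.iterparse` may be a better fit.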
The effort has been featured in:
Contact us at: tico19 [dot] 2020 [at] gmail [dot] com
We make a public call for community contributions to the TICO-19 project.
All community contributions will be properly acknowledged and labeled as such.
After the first phase of the project is completed, we will make a call for further community contributions. Stay tuned!
All content is made publicly available through a Creative Commons CC0 license.