Other COVID-19-related collections from our contributors and friends (these might not be available under a permissive license!):
- Translators without Borders has compiled glossaries and is starting to provide translations.
- Neulab members and other collaborators have collected several resources (especially crawled monolingual news data) in this GitHub repo.
- Microsoft has published COVID-19-related desktop searches from Bing. They are available here.
- The Endangered Languages Project has aggregated community-produced information in more than 600 languages! Data here.
- TAUS has compiled a corpus of COVID-19-related parallel sentences. Available here. Note that these corpora are published under the CC BY-NC 4.0 license, which means the data can be shared and modified only for non-commercial purposes.
- An international team of scientists trying to estimate the number of cases with COVID-19 symptoms in different countries has put out surveys in 57 languages. (HT: @juliakreutzer)
- The COVID-19 Myth Busters in World Languages has information in 60+ languages.
- The EMEA corpus provides PDF conversions of documents from the European Medicines Agency (22 languages, 231 bitexts).
- SketchEngine has collected an English in-domain corpus.
- Amazon has created a public data lake for analysis of COVID-19 data.
- Achim Ruopp has crawled public COVID-19 parallel data between English and Spanish (US), Vietnamese, Korean, and Chinese. Details here and data here.
The effort has been featured in: