Multilingual Open Text Release 1: Public Domain News in 44 Languages

We present Multilingual Open Text (MOT), a new multilingual corpus containing text in 44 languages, many of which have limited existing text resources for natural language processing. The first release of the corpus contains over 2.8 million news …

SeqScore: Addressing Barriers to Reproducible Named Entity Recognition Evaluation

To address a looming crisis of unreproducible evaluation for named entity recognition, we propose guidelines and introduce SeqScore, a software package to improve reproducibility. The guidelines we propose are extremely simple and center around …

MasakhaNER: Named Entity Recognition for African Languages

We take a step towards addressing the under-representation of the African continent in NLP research by bringing together different stakeholders to create the first large, publicly available, high-quality dataset for named entity recognition …

Saral: A low-resource cross-lingual domain-focused information retrieval system for effective rapid document triage

With the increasing democratization of electronic media, vast information resources are available in less-frequently-taught languages such as Swahili or Somali. That information, which may be crucially important and not available elsewhere, can be …

Real-World Causal Relationship Discovery from Text

Automatic extraction of causal relations from text has the potential to aid in the understanding of complex scenarios, but to date there has been limited work exploring extraction from natural data at scale. We describe a system that implements a …