Abstract
Bridging the performance gap between high- and low-resource languages has been the focus of much previous work. Typological features from databases such as the World Atlas of Language Structures (WALS) are a prime candidate for bridging this gap, as such data exists even for very low-resource languages. However, previous work has found only minor benefits from using typological information. Our hypothesis is that a model trained in a cross-lingual setting will pick up on typological cues from the input data, thus overshadowing the utility of explicitly using such features. We verify this hypothesis by blinding a model to typological information and investigating how cross-lingual sharing and performance are impacted. Our model is based on a cross-lingual architecture in which the latent weights governing the sharing between languages are learnt during training. We show that (i) preventing this model from exploiting typology severely reduces performance, while a control experiment reaffirms that (ii) encouraging sharing according to typology somewhat improves performance.
Original language | English |
---|---|
Title of host publication | Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics |
Editors | Paola Merlo, Jörg Tiedemann, Reut Tsarfaty |
Publisher | Association for Computational Linguistics |
Publication date | 21 Apr 2021 |
Pages | 480-486 |
Publication status | Published - 21 Apr 2021 |
Event | Conference of the European Chapter of the Association for Computational Linguistics, 21 Apr 2021 → 23 Apr 2021, conference number 16, https://2021.eacl.org/ |
Conference
Conference | Conference of the European Chapter of the Association for Computational Linguistics |
---|---|
Number | 16 |
Period | 21/04/2021 → 23/04/2021 |
Internet address | https://2021.eacl.org/ |
Keywords
- Natural Language Processing
- Machine Learning
- Computational Linguistics