Abstract
In semantic typology, colexification refers to words with multiple meanings, either related (polysemy) or unrelated (homophony). Studies of cross-linguistic colexification have yielded insights into, e.g., psychology, historical linguistics and cognitive science (Xu et al., 2020; Brochhagen and Boleda, 2022; Schapper and Koptjevskaja-Tamm, 2022). While NLP research up until now has mainly focused on integrating syntactic typology (Naseem et al., 2012; Ponti et al., 2019; Chaudhary et al., 2019; Üstün et al., 2020; Ansell et al., 2021; Oncevay et al., 2022), we here investigate the potential of incorporating semantic typology, of which colexification is an example. We propose a framework for constructing a large-scale synset graph and learning language representations with node embedding algorithms. We demonstrate that cross-lingual colexification patterns provide a distinct signal for modelling language similarity and predicting typological features. Our representations achieve a 9.97% performance gain in predicting lexico-semantic typological features and, as expected, contain a weaker syntactic signal. This study is the first attempt to learn language representations and model language similarities using semantic typology at a large scale, setting a new direction for multilingual NLP, especially for low-resource languages.
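To make the notion of colexification concrete, the sketch below builds a tiny graph whose edges link concepts expressed by the same word form in a language, then compares languages by the concept pairs they colexify. It is only an illustration: the lexicon, the language names (`lang_a`, `lang_b`, `lang_c`) and the word forms are hypothetical, and the Jaccard overlap is a deliberately simple stand-in for the node embedding algorithms mentioned in the abstract, not the authors' method.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy lexicon: language -> word form -> concepts it expresses.
# Illustrative only; these entries are not taken from the paper's data.
LEXICON = {
    "lang_a": {"w1": {"TREE", "WOOD"}, "w2": {"HAND", "ARM"}},
    "lang_b": {"w3": {"TREE", "WOOD"}, "w4": {"HAND"}, "w5": {"ARM"}},
    "lang_c": {"w6": {"HAND", "ARM"}, "w7": {"TREE"}, "w8": {"WOOD"}},
}


def colexification_edges(lexicon):
    """Map each colexified concept pair to the set of languages showing it."""
    edges = defaultdict(set)
    for lang, words in lexicon.items():
        for concepts in words.values():
            # A word with several concepts colexifies every pair of them.
            for a, b in combinations(sorted(concepts), 2):
                edges[(a, b)].add(lang)
    return dict(edges)


def jaccard_similarity(edges, lang_x, lang_y):
    """Jaccard overlap of the concept pairs colexified by two languages.

    Assumption: a simple set overlap used here in place of learned
    node embeddings, purely for illustration.
    """
    pairs_x = {pair for pair, langs in edges.items() if lang_x in langs}
    pairs_y = {pair for pair, langs in edges.items() if lang_y in langs}
    union = pairs_x | pairs_y
    return len(pairs_x & pairs_y) / len(union) if union else 0.0


edges = colexification_edges(LEXICON)
sim_ab = jaccard_similarity(edges, "lang_a", "lang_b")
sim_bc = jaccard_similarity(edges, "lang_b", "lang_c")
```

In this toy setup, `lang_a` and `lang_b` share the TREE/WOOD colexification and so score higher than `lang_b` and `lang_c`, which share none. A learned embedding over a much larger graph would replace the set overlap in practice.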
Original language | English |
---|---|
Title of host publication | Proceedings of the 24th Nordic Conference on Computational Linguistics (NoDaLiDa) |
Publisher | Association for Computational Linguistics |
Publication date | 22 May 2023 |
Pages | 673-684 |
Publication status | Published - 22 May 2023 |
Event | The 24th Nordic Conference on Computational Linguistics - Tórshavn, Faroe Islands. Duration: 22 May 2023 → 24 May 2023. https://www.nodalida2023.fo/ |
Conference
Conference | The 24th Nordic Conference on Computational Linguistics |
---|---|
Country/Territory | Faroe Islands |
City | Tórshavn |
Period | 22/05/2023 → 24/05/2023 |
Internet address | https://www.nodalida2023.fo/ |
Keywords
- Natural Language Processing
- Semantic Typology
Fingerprint
Dive into the research topics of 'Colex2Lang: Language Embeddings from Semantic Typology'. Together they form a unique fingerprint.
Projects
- 1 Active
- Multilingual Modelling for Resource-Poor Languages
  Bjerva, J., Lent, H. C., Chen, Y., Ploeger, E., Fekete, M. R. & Lavrinovics, E.
  01/09/2022 → 31/08/2025
  Project: Research
Prizes
- EliteForsk- Elite Research Travel Grant 2024
  Chen, Yiyi (Recipient), 26 Feb 2024
  Prize: Research, education and innovation prizes