Abstract
In semantic typology, colexification refers to words with multiple meanings, either related (polysemy) or unrelated (homophony). Studies of cross-linguistic colexification have yielded insights into, e.g., psychology, historical linguistics and cognitive science (Xu et al., 2020; Brochhagen and Boleda, 2022; Schapper and Koptjevskaja-Tamm, 2022). While NLP research up until now has mainly focused on integrating syntactic typology (Naseem et al., 2012; Ponti et al., 2019; Chaudhary et al., 2019; Üstün et al., 2020; Ansell et al., 2021; Oncevay et al., 2022), we here investigate the potential of incorporating semantic typology, of which colexification is an example. We propose a framework for constructing a large-scale synset graph and learning language representations with node embedding algorithms. We demonstrate that cross-lingual colexification patterns provide a distinct signal for modelling language similarity and predicting typological features. Our representations achieve a 9.97% performance gain in predicting lexico-semantic typological features and, as expected, contain a weaker syntactic signal. This study is the first attempt to learn language representations and model language similarities using semantic typology at a large scale, setting a new direction for multilingual NLP, especially for low-resource languages.
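To make the idea of a colexification graph concrete, the following is a minimal sketch and not the authors' implementation. It assumes a hypothetical toy lexicon (`TOY_LEXICON`, with invented language names and word forms), treats two concepts as colexified when one word form expresses both, and compares languages by the Jaccard overlap of their colexification patterns. The Jaccard comparison is a simple stand-in for the node embedding algorithms mentioned in the abstract.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy lexicon (invented for illustration only):
# language -> word form -> set of concepts that the form expresses.
TOY_LEXICON = {
    "lang_a": {"w1": {"TREE", "WOOD"}, "w2": {"HAND", "ARM"}},
    "lang_b": {"x1": {"TREE", "WOOD"}, "x2": {"MOON", "MONTH"}},
    "lang_c": {
        "y1": {"MOON", "MONTH"},
        "y2": {"HAND", "ARM"},
        "y3": {"TREE", "WOOD"},
    },
}


def colexification_edges(lexicon):
    """Map each colexified concept pair to the set of languages attesting it."""
    edges = defaultdict(set)
    for language, words in lexicon.items():
        for concepts in words.values():
            # Every pair of concepts sharing one word form is one graph edge.
            for pair in combinations(sorted(concepts), 2):
                edges[pair].add(language)
    return dict(edges)


def language_profiles(edges):
    """Invert the edge map: language -> set of concept pairs it colexifies."""
    profiles = defaultdict(set)
    for pair, languages in edges.items():
        for language in languages:
            profiles[language].add(pair)
    return dict(profiles)


def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


edges = colexification_edges(TOY_LEXICON)
profiles = language_profiles(edges)
similarity_a_c = jaccard(profiles["lang_a"], profiles["lang_c"])
similarity_a_b = jaccard(profiles["lang_a"], profiles["lang_b"])
```

In this toy data, `lang_a` shares two of three colexification patterns with `lang_c` but only one of three with `lang_b`, so it comes out as more similar to `lang_c`. A learned embedding would replace the set-overlap step with dense vectors, but the input graph has the same shape.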
Original language | English |
---|---|
Title | Proceedings of the 24th Nordic Conference on Computational Linguistics (NoDaLiDa) |
Publisher | Association for Computational Linguistics |
Publication date | 22 May 2023 |
Pages | 673-684 |
Status | Published - 22 May 2023 |
Event | The 24th Nordic Conference on Computational Linguistics - Tórshavn, Faroe Islands. Duration: 22 May 2023 → 24 May 2023. https://www.nodalida2023.fo/ |
Conference
Conference | The 24th Nordic Conference on Computational Linguistics |
---|---|
Country/Territory | Faroe Islands |
City | Tórshavn |
Period | 22/05/2023 → 24/05/2023 |
Internet address | https://www.nodalida2023.fo/ |
Fingerprint
Dive into the research topics of 'Colex2Lang: Language Embeddings from Semantic Typology'. Together they form a unique fingerprint.
Projects
- 1 Active
- Multilingual Modelling for Resource-Poor Languages
  Bjerva, J., Lent, H. C., Chen, Y., Ploeger, E., Fekete, M. R. & Lavrinovics, E.
  01/09/2022 → 31/08/2025
  Projects: Project › Research
Prizes
- EliteForsk Travel Grant 2024 (EliteForsk-rejsestipendium 2024)
  Chen, Yiyi (Recipient), 26 Feb 2024
  Prize: Research, education and innovation awards