Multilingual Gradient Word-Order Typology from Universal Dependencies

Emi Baylor*, Esther Ploeger*, Johannes Bjerva

*Corresponding author for this work

Research output: Contribution to book/anthology/report/conference proceeding > Article in proceeding > Research > peer-review

Abstract

While information from the field of linguistic typology has the potential to improve performance on NLP tasks, reliable typological data is a prerequisite. Existing typological databases, including WALS and Grambank, suffer from inconsistencies primarily caused by their categorical format. Furthermore, typological categorisations by definition differ significantly from the continuous nature of phenomena, as found in natural language corpora. In this paper, we introduce a new seed dataset made up of continuous-valued data, rather than categorical data, that can better reflect the variability of language. While this initial dataset focuses on word-order typology, we also present the methodology used to create the dataset, which can be easily adapted to generate data for a broader set of features and languages.
Original language: English
Title of host publication: Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics : EACL
Volume: 2
Place of Publication: St. Julian’s, Malta
Publisher: Association for Computational Linguistics
Publication date: 17 Mar 2024
ISBN (Electronic): 979-8-89176-093-6
Publication status: Published - 17 Mar 2024
Event: The 18th Conference of the European Chapter of the Association for Computational Linguistics - Radisson Blu, St. Julian's, Malta
Duration: 17 Mar 2024 to 22 Mar 2024
https://2024.eacl.org/

Conference

Conference: The 18th Conference of the European Chapter of the Association for Computational Linguistics
Location: Radisson Blu
Country/Territory: Malta
City: St. Julian's
Period: 17/03/2024 to 22/03/2024

Keywords

  • NLP
  • Typology
