Quantifying Synthesis and Fusion and their Impact on Machine Translation

Arturo Oncevay*, Duygu Ataman, Niels van Berkel, Barry Haddow, Alexandra Birch, Johannes Bjerva

*Corresponding author

Publication: Contribution to book/anthology/report/conference proceeding > Article in proceedings > Research > Peer-reviewed

2 Citations (Scopus)

Abstract

Theoretical work in morphological typology offers the possibility of measuring morphological diversity on a continuous scale. However, literature in Natural Language Processing (NLP) typically labels a whole language with a strict type of morphology, e.g. fusional or agglutinative. In this work, we propose to reduce the rigidity of such claims by quantifying morphological typology at the word and segment level. We consider the approach of Payne (2017), which classifies morphology using two indices: synthesis (e.g. analytic to polysynthetic) and fusion (agglutinative to fusional). For computing synthesis, we test unsupervised and supervised morphological segmentation methods for English, German and Turkish, whereas for fusion, we propose a semi-automatic method using Spanish as a case study. Then, we analyse the relationship between machine translation quality and the degree of synthesis and fusion at the word level (nouns and verbs for English-Turkish, and verbs for English-Spanish) and the segment level (the previous language pairs plus English-German, in both directions). We complement the word-level analysis with human evaluation, and overall, we observe a consistent impact of both indices on machine translation quality.

Original language: English
Title: NAACL 2022 - 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Proceedings of the Conference
Number of pages: 14
Publisher: Association for Computational Linguistics
Publication date: 2022
Pages: 1308-1321
ISBN (Electronic): 9781955917711
Status: Published - 2022
Event: 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL 2022 - Seattle, USA
Duration: 10 Jul 2022 to 15 Jul 2022

Conference

Conference: 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL 2022
Country/Territory: USA
City: Seattle
Period: 10/07/2022 to 15/07/2022
Sponsor: Amazon, Bloomberg, et al., Google Research, LIVE PERSON, Meta
Name: NAACL 2022 - 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Proceedings of the Conference

Bibliographical note

Publisher Copyright:
© 2022 Association for Computational Linguistics.
