Description
Only a handful of the world’s languages benefit from today’s language technology, such as online search, automatic translation, and recent advances in generative AI, as exemplified by ChatGPT. Because state-of-the-art solutions require massive amounts of data, all other languages will continue to be marginalised. For instance, it is not feasible for a community the size of the Faroe Islands to generate enough textual data for modern NLP approaches to work.
This talk touches upon the core challenges in this setting and considers a few possible solutions. We will first look at how we can use what we know about language from the field of linguistic typology, before considering approaches to resource creation for truly low-resource language communities.
We will look at Creoles, a type of natural language that notably evolved from historical linguistic contact between unrelated languages and is spoken by approximately 180 million people. Creoles typically lack a standardized written form and are frequently stigmatized due to historical ties with colonization and slavery. Whereas a large portion of the world’s languages can be characterised as low-resource, Creoles are typically in a no-resource scenario.
Because of these challenges, typical data-hungry approaches to NLP do not extend to Creoles. How can we develop technology to include such smaller communities?
Period | 9 Nov 2023 |
---|---|
Event title | Digital Tech Summit 2023: AI Transforming Business |
Event type | Exhibition |
Conference number | 3 |
Location | Copenhagen, Denmark |
Degree of Recognition | National |
Keywords
- NLP
- Low-resource NLP
- Linguistic Diversity
- Creoles
Related content
Projects
- Multilingual Modelling for Resource-Poor Languages
  Project: Research

Publications
- CreoleVal: Multilingual Multitask Benchmarks for Creoles
  Research output: Contribution to journal › Journal article › Research › peer-review
- CreoleVal: Multilingual Multitask Benchmarks for Creoles
  Research output: Working paper/Preprint › Preprint