Enabling NLP for Small Language Communities

Activity: Talks and presentations > Conference presentations

Description

Only a handful of the world’s languages benefit from modern language technology, such as online search, automatic translation, and recent advances in generative AI, as exemplified by ChatGPT.

Due to the massive amounts of data required by state-of-the-art solutions, most other languages will continue to be marginalised. For instance, it is not feasible for a community the size of the Faroe Islands to generate enough textual data for modern NLP approaches to work.

This talk touches upon the core challenges in this setting and considers a few possible solutions. We will first look at how we can use what the field of linguistic typology tells us about language, before considering approaches to resource creation for truly low-resource language communities.
We will look at Creoles, a type of natural language spoken by approximately 180 million people, which notably evolved from historical linguistic contact between unrelated languages. Creoles typically lack standardization of written language, and are frequently stigmatized due to historical ties with colonization and slavery. Whereas a large portion of the world’s languages can be characterised as low-resource, Creoles typically are in a no-resource scenario.

Because of these challenges, typical data-hungry approaches to NLP do not extend to Creoles. How can we develop technology to include such smaller communities?
Period: 9 Nov 2023
Event title: Digital Tech Summit 2023: AI Transforming Business
Event type: Exhibition
Conference number: 3
Location: Copenhagen, Denmark
Degree of Recognition: National

Keywords

  • NLP
  • Low-resource NLP
  • Linguistic Diversity
  • Creoles