Abstract
Textual data is often represented as real-numbered embeddings in NLP, particularly with the popularity of large language models (LLMs) and Embeddings as a Service (EaaS). However, storing sensitive information as embeddings can be susceptible to security breaches, as research shows that text can be reconstructed from embeddings, even without knowledge of the underlying model. While defense mechanisms have been explored, these are exclusively focused on English, leaving other languages potentially exposed to attacks. This work explores LLM security through multilingual embedding inversion. We define the problem of black-box multilingual and crosslingual inversion attacks, and explore their potential implications. Our findings suggest that multilingual LLMs may be more vulnerable to inversion attacks, in part because English-based defenses may be ineffective. To alleviate this, we propose a simple masking defense effective for both monolingual and multilingual models. This study is the first to investigate multilingual inversion attacks, shedding light on the differences in attacks and defenses across monolingual and multilingual settings.
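For context, the threat model behind the abstract can be sketched in a few lines: an EaaS client sends text to a black-box encoder and stores the returned real-valued vector, while an attacker who obtains that vector tries to reconstruct the original text with an inversion model. The snippet below is a minimal, hypothetical illustration of this setting and of a generic masking-style defense; `toy_encode`, `mask_embedding`, and the noise-overwrite mechanism are assumptions made for illustration and are not the paper's actual implementation.

```python
# Hypothetical sketch of the EaaS setting and a generic masking defense.
# NOTE: toy_encode and mask_embedding are illustrative stand-ins, NOT the
# method proposed in the paper; only the overall threat model follows the abstract.
import numpy as np

def toy_encode(text: str, dim: int = 768) -> np.ndarray:
    """Stand-in for a black-box EaaS encoder: text -> real-valued embedding."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    return rng.standard_normal(dim).astype(np.float32)

def mask_embedding(emb: np.ndarray, ratio: float = 0.1, seed: int = 0) -> np.ndarray:
    """Generic masking defense (assumption): overwrite a random subset of
    dimensions with noise before the embedding is stored or shared, so an
    inversion model trained on clean embeddings receives a perturbed input."""
    rng = np.random.default_rng(seed)
    masked = emb.copy()
    idx = rng.choice(emb.shape[0], size=int(ratio * emb.shape[0]), replace=False)
    masked[idx] = rng.standard_normal(idx.shape[0]).astype(np.float32)
    return masked

if __name__ == "__main__":
    emb = toy_encode("sensitive multilingual text")   # what an undefended client stores
    defended = mask_embedding(emb, ratio=0.1)         # what a defended client stores
    # An inversion attacker would apply a learned model f: embedding -> text
    # to `emb` (undefended) or `defended` (masked) at this point.
    print("cosine(original, masked) =",
          float(emb @ defended / (np.linalg.norm(emb) * np.linalg.norm(defended))))
```

The design point the sketch tries to convey is that the defense operates on the client side, so downstream similarity-based use of the embedding is largely preserved while an inversion model sees a perturbed vector; the exact masking mechanism and ratio used in the paper may differ.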
Original language | English |
---|---|
Title of host publication | Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (ACL): Volume 1: Long Papers |
Number of pages | 20 |
Place of publication | Bangkok, Thailand |
Publisher | Association for Computational Linguistics |
Publication date | 11 Aug 2024 |
Pages | 7808-7827 |
ISBN (Electronic) | 979-8-89176-094-3 |
DOIs | |
Publication status | Published - 11 Aug 2024 |
Event | 62nd Annual Meeting of the Association for Computational Linguistics, Bangkok, Thailand, 11 Aug 2024 → 16 Aug 2024 (Conference number: 62), https://2024.aclweb.org/ |
Conference
Conference | 62nd Annual Meeting of the Association for Computational Linguistics |
---|---|
Number | 62 |
Country/Territory | Thailand |
City | Bangkok |
Period | 11/08/2024 → 16/08/2024 |
Internet address | https://2024.aclweb.org/ |
Keywords
- LLM Security
- NLP Security
- LLMsec
- NLPsec
- Inversion Attack
- Language Models
- LLM
Projects
- Multilingual Modelling for Resource-Poor Languages (Active)
  Bjerva, J. (PI), Lent, H. C. (Project Participant), Chen, Y. (Project Participant), Ploeger, E. (Project Participant), Fekete, M. R. (Project Participant) & Lavrinovics, E. (Project Participant)
  01/09/2022 → 31/08/2025
  Project: Research
Activities
- Large Language Model Security in a Multilingual World
  Lent, H. C. (Lecturer)
  31 Oct 2024
  Activity: Talks and presentations › Conference presentations
- Secure Integration of Large Language Models
  Bjerva, J. (Speaker)
  26 Aug 2024
  Activity: Talks and presentations › Guest lecturers
- Text Embedding Inversion Security for Multilingual Language Models
  Chen, Y. (Lecturer)
  12 Aug 2024
  Activity: Talks and presentations › Conference presentations