Linguistically Motivated Language Model Security

Project details

Layman's description

Imagine a world where generative AI (gAI) exists free from concerns about safety, security, privacy, trustworthiness, misinformation, or bias. We are far from this vision today: negative consequences of gAI, such as deepfakes, are ever present in the news cycle. A new threat has emerged recently, as language models in particular can be attacked by malicious actors, leading to leakage of private data, manipulation of users, and even risks of medical misdiagnosis. For instance, an attacker can modify prompts so as to ‘trick’ a model into releasing private data. The project aims to draw upon the field of linguistics to detect and mitigate such attacks, relying on the hypothesis that there are identifiable linguistic patterns in signals that attempt to negatively affect a gAI model. If we can identify and detect such patterns, we may hold the key to secure language-driven AI in the future.
Acronym: LM2-SEC
Status: Not started
Effective start/end date: 01/09/2025 – 31/08/2030

Funding

  • Novo Nordisk Foundation: DKK 9,904,249.00
