Overview
Large Language Models (LLMs) have demonstrated strong capabilities in understanding and generating natural language, excelling at tasks such as question answering, text generation, and classification. However, these models are predominantly English-centric and exhibit limited proficiency in non-English and low-resource languages. Despite their problem-solving abilities and potential to boost productivity, LLMs also pose substantial risks by generating harmful, toxic, biased, or nonfactual content. Moreover, even robust proprietary models have proven susceptible to jailbreaks via sophisticated prompt engineering, particularly attacks that leverage low-resource languages.
This study addresses these challenges along two fronts: (1) advancing multilingualism in LLMs by exploring the dynamics of cross-lingual knowledge sharing, and (2) probing the vulnerabilities that enable jailbreaks. On the first front, the research investigates whether new low-resource languages can be taught to pretrained LLMs using Parameter-Efficient Fine-Tuning (PEFT) techniques such as Low-Rank Adaptation (LoRA), and examines the extent to which factual knowledge is shared between languages within multilingual pretrained LLMs, including the role of donor languages and the impact of script families.

On the second front, the study explores the limitations of the attention mechanism in transformer-based models and seeks to develop a unified threat model for vulnerabilities analogous to cognitive overload in human cognition. Building on this, the research constructs proof-of-concept attacks targeting multilingual LLMs to better understand and mitigate potential exploits. It further examines whether safety-training a pretrained LLM in one language can confer safety across related languages, and develops methodologies for adapting safety guardrails, such as Llama Guard, to low-resource languages, assessing their challenges and effectiveness. Together, these investigations aim to produce safer and more robust multilingual LLMs and to support their responsible, effective deployment across diverse linguistic contexts.
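As a concrete illustration of the PEFT direction, below is a minimal sketch, assuming the Hugging Face `transformers` and `peft` libraries, of attaching LoRA adapters to a pretrained causal language model before fine-tuning it on a new language. The base model name and LoRA hyperparameters are illustrative placeholders, not the project's actual configuration.

```python
# Minimal sketch: wrapping a pretrained causal LM with LoRA adapters via the
# Hugging Face `peft` library. Model name and hyperparameters are assumptions
# for illustration, not the project's exact setup.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base_model_name = "meta-llama/Llama-2-7b-hf"  # assumed base model

tokenizer = AutoTokenizer.from_pretrained(base_model_name)
model = AutoModelForCausalLM.from_pretrained(
    base_model_name,
    torch_dtype=torch.float16,
    device_map="auto",
)

# LoRA injects small trainable low-rank matrices into selected projection
# layers; the base weights stay frozen, so only a tiny fraction of parameters
# is updated when teaching the model a new language.
lora_config = LoraConfig(
    r=16,                                  # rank of the low-rank update (assumed)
    lora_alpha=32,                         # scaling factor (assumed)
    target_modules=["q_proj", "v_proj"],   # attention projections to adapt
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # reports the small trainable fraction
```

The wrapped model can then be fine-tuned with a standard `transformers` training loop on instruction data in the target language.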
Current Team Members:
PI: Vahid Behzadan
Affiliate Organizations:
N/A
Nepali 33B Model: https://huggingface.co/saillab/Nepali_33B/
Persian 33B Model: https://huggingface.co/saillab/g33b_persian
Code: TaCo
TL;DR: Translation-Assisted Cross-Linguality, a method for efficient multilingual LLMs
Published Paper: OpenReview
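The released checkpoints above can be loaded directly from the Hugging Face Hub. Below is a minimal inference sketch, assuming the repository hosts full merged weights (if it instead hosts LoRA adapters, they would be loaded onto the base model with `peft.PeftModel.from_pretrained`). The prompt and generation settings are placeholders; consult the model card for the expected prompt template.

```python
# Minimal sketch: loading the released Persian 33B model for inference.
# Settings are illustrative; a 33B model requires multiple GPUs or offloading.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "saillab/g33b_persian"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",  # spread layers across available devices
)

prompt = "..."  # an instruction in Persian, following the model card's template
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```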
Tools and Datasets:
Code: N/A
TL;DR: Sandwich Attack: Multi-language Mixture Adaptive Attack on LLMs
Published Paper: Sandwich Attack: Multi-language Mixture Adaptive Attack on LLMs
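For illustration only, the sketch below shows the general sandwich prompt layout studied in the paper: several questions, each in a different language, are concatenated into one prompt, with the question of interest placed in the middle slot. All questions here are harmless placeholders, and the languages and wording are assumptions for the demo, not the paper's exact prompts.

```python
# Benign sketch of the sandwich-style prompt structure: a question of
# interest is embedded in the middle of unrelated questions, each in a
# different language. Every question below is a harmless placeholder.
questions = [
    ("French",  "Quelle est la capitale de la France ?"),
    ("Nepali",  "नेपालको राजधानी कुन हो ?"),
    ("German",  "Was ist die Hauptstadt von Deutschland?"),  # middle slot probed by the attack
    ("Swahili", "Mji mkuu wa Kenya ni upi?"),
    ("Spanish", "¿Cuál es la capital de España?"),
]

prompt = "Please answer each of the following questions in its own language:\n"
for i, (_, question) in enumerate(questions, start=1):
    prompt += f"{i}. {question}\n"
print(prompt)
```

Understanding this layout is useful defensively: safety classifiers and guardrails for low-resource languages can be evaluated against mixed-language prompts of this shape.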