Overview
Large Language Models (LLMs) have demonstrated strong capabilities in understanding and generating natural language, excelling at tasks such as question answering, text generation, and classification. However, these models are predominantly English-centric and exhibit limited proficiency in non-English and low-resource languages. Despite their problem-solving abilities and potential to boost productivity, LLMs also pose substantial risks by generating harmful, toxic, biased, or nonfactual content. Moreover, even robust proprietary models have proven susceptible to jailbreaks via sophisticated prompt engineering, particularly attacks that leverage low-resource languages.
This study addresses these challenges along two fronts: (1) advancing multilingualism in LLMs by exploring the dynamics of cross-lingual knowledge sharing, and (2) probing the vulnerabilities that enable jailbreaks. On the first front, the research investigates whether new low-resource languages can be taught to pretrained LLMs using Parameter-Efficient Fine-Tuning (PEFT) techniques such as Low-Rank Adaptation (LoRA), and examines the extent to which factual knowledge is shared between languages within multilingual pretrained LLMs, including the role of donor languages and the impact of script families.

On the second front, the study explores the limitations of the attention mechanism in transformer-based models and seeks to develop a unified threat model for vulnerabilities analogous to cognitive overload in human cognition. Building on this, the research constructs proof-of-concept attacks targeting multilingual LLMs to better understand and mitigate potential exploits. It further examines whether safety-training a pretrained LLM in one language can confer safety across related languages, and develops methodologies for adapting safety guardrails, such as Llama Guard, to low-resource languages, assessing their challenges and effectiveness. Together, these investigations aim to produce safer and more robust multilingual LLMs and to support their responsible, effective deployment across diverse linguistic contexts.
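As a concrete illustration of the PEFT direction, below is a minimal sketch, assuming the Hugging Face `transformers` and `peft` libraries, of attaching LoRA adapters to a pretrained causal language model before fine-tuning it on a new language. The base model name and LoRA hyperparameters are illustrative placeholders, not the project's actual configuration.

```python
# Minimal sketch: wrapping a pretrained causal LM with LoRA adapters via the
# Hugging Face `peft` library. Model name and hyperparameters are assumptions
# for illustration, not the project's exact setup.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base_model_name = "meta-llama/Llama-2-7b-hf"  # assumed base model

tokenizer = AutoTokenizer.from_pretrained(base_model_name)
model = AutoModelForCausalLM.from_pretrained(
    base_model_name,
    torch_dtype=torch.float16,
    device_map="auto",
)

# LoRA injects small trainable low-rank matrices into selected projection
# layers; the base weights stay frozen, so only a tiny fraction of parameters
# is updated when teaching the model a new language.
lora_config = LoraConfig(
    r=16,                                  # rank of the low-rank update (assumed)
    lora_alpha=32,                         # scaling factor (assumed)
    target_modules=["q_proj", "v_proj"],   # attention projections to adapt
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # reports the small trainable fraction
```

The wrapped model can then be fine-tuned with a standard `transformers` training loop on instruction data in the target language.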
Current Team Members:
PI: Vahid Behzadan
Affiliate Organizations:
N/A
Nepali 33B Model: https://huggingface.co/saillab/Nepali_33B/
Persian 33B Model: https://huggingface.co/saillab/g33b_persian
Code: TaCo
TL;DR: Translation-Assisted Cross-Linguality, a method for efficient multilingual LLMs
Published Paper: OpenReview
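The released checkpoints above can be loaded directly from the Hugging Face Hub. Below is a minimal inference sketch, assuming the repository hosts full merged weights (if it instead hosts LoRA adapters, they would be loaded onto the base model with `peft.PeftModel.from_pretrained`). The prompt and generation settings are placeholders; consult the model card for the expected prompt template.

```python
# Minimal sketch: loading the released Persian 33B model for inference.
# Settings are illustrative; a 33B model requires multiple GPUs or offloading.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "saillab/g33b_persian"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",  # spread layers across available devices
)

prompt = "..."  # an instruction in Persian, following the model card's template
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```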
Tools and Datasets:
Code: N/A
TL;DR: Sandwich Attack: Multi-language Mixture Adaptive Attack on LLMs
Published Paper: Sandwich Attack: Multi-language Mixture Adaptive Attack on LLMs
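For illustration only, the sketch below shows the general sandwich prompt layout studied in the paper: several questions, each in a different language, are concatenated into one prompt, with the question of interest placed in the middle slot. All questions here are harmless placeholders, and the languages and wording are assumptions for the demo, not the paper's exact prompts.

```python
# Benign sketch of the sandwich-style prompt structure: a question of
# interest is embedded in the middle of unrelated questions, each in a
# different language. Every question below is a harmless placeholder.
questions = [
    ("French",  "Quelle est la capitale de la France ?"),
    ("Nepali",  "नेपालको राजधानी कुन हो ?"),
    ("German",  "Was ist die Hauptstadt von Deutschland?"),  # middle slot probed by the attack
    ("Swahili", "Mji mkuu wa Kenya ni upi?"),
    ("Spanish", "¿Cuál es la capital de España?"),
]

prompt = "Please answer each of the following questions in its own language:\n"
for i, (_, question) in enumerate(questions, start=1):
    prompt += f"{i}. {question}\n"
print(prompt)
```

Understanding this layout is useful defensively: safety classifiers and guardrails for low-resource languages can be evaluated against mixed-language prompts of this shape.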