Breaking the Curse of Multilinguality in Language Models
Terra Blevins
PhD Candidate, University of Washington
Abstract: While language models (or LMs, à la ChatGPT) have become the predominant tool in natural language processing, their performance on non-English languages increasingly lags behind their English capabilities. This gap is due to the curse of multilinguality, which harms individual language performance in multilingual models through inter-language competition for model capacity. In this talk, I examine how current language models do and don't capture different languages and present new methods for fair modeling of all languages.
First, I demonstrate how LMs become multilingual through their data and training dynamics. Specifically, I show how data contamination teaches ostensibly English models cross-lingual information; I then characterize when multilingual models learn (and forget) languages during training to uncover how the curse of multilinguality develops. These analyses provide key insights into developing more equitable multilingual models, and I propose Cross-Lingual Expert Language Models (X-ELM), a new language modeling approach that explicitly allocates model resources to reduce language competition.
Bio: Terra Blevins is a Ph.D. candidate in the Paul G. Allen School of Computer Science and Engineering at the University of Washington. Her research focuses on linguistic analysis of language models and multilingual NLP, with the underlying aim of using analysis to build better, more equitable multilingual systems. She has received the NSF Graduate Research Fellowship for her research and previously worked as a visiting researcher at Facebook AI Research (FAIR).