Teaching 7,000 Languages: How LASA’s Semantic Bottleneck Enables Multilingual LLM Safety

The paper reveals a language‑agnostic "semantic bottleneck" layer inside large language models and introduces LASA, a three‑step framework that locates this layer, extracts safety signals from it with a lightweight interpreter, and injects those signals into generation through KTO fine‑tuning. The result is dramatically improved multilingual safety without per‑language data collection.

Machine Learning Algorithms & Natural Language Processing

Background and Motivation

Recent large language models show markedly uneven safety across languages: high‑resource languages such as English are robust, while low‑resource languages are easily compromised. The conventional remedy, which collects safety data for each language and trains on each one separately, cannot scale to the world's more than 7,000 languages.

Key Insight: Semantic Bottleneck Layer

Layer‑wise silhouette analysis on multilingual parallel sentences (e.g., "how to make a bomb" in English, Swahili, and Bengali) reveals a U‑shaped pattern: shallow layers cluster representations by language, an intermediate region (roughly 43–68% of model depth) clusters them by meaning, and deeper layers revert to clustering by language. In this intermediate "semantic bottleneck", queries with the same meaning group together across languages while language identity is stripped away. t‑SNE visualizations confirm the phenomenon. The effect holds across Llama‑3.1‑8B and the Qwen2.5 and Qwen3 series, and the quality of semantic clustering correlates positively with a model's general MMLU score.
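The layer‑wise silhouette analysis described above can be sketched in NumPy as follows. This is a minimal illustration, not the authors' code. It computes the mean silhouette coefficient of each layer's representations twice: once using language labels and once using meaning labels. It then picks the layer where the semantic score exceeds the language score by the largest margin. Reading "maximal difference" as semantic minus language is an interpretation. `locate_bottleneck` and `jitter` are hypothetical helpers. The three synthetic "layers" are made‑up data built to mimic the U‑shape. Real inputs would be per‑layer representations of parallel sentences, and how those are pooled is not specified here. `sklearn.metrics.silhouette_score` computes the same coefficient and could replace the hand‑written function.

```python
import numpy as np


def silhouette(X, labels):
    """Mean silhouette coefficient of points X (n, d) under `labels`, Euclidean distance."""
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))  # (n, n) pairwise distances
    classes = np.unique(labels)
    scores = np.zeros(len(X))
    for i in range(len(X)):
        same = labels == labels[i]
        same[i] = False  # exclude the point itself
        if not same.any():
            continue  # singleton cluster: silhouette is 0 by convention
        a = D[i, same].mean()  # mean distance within own cluster
        b = min(D[i, labels == c].mean() for c in classes if c != labels[i])  # nearest other cluster
        denom = max(a, b)
        scores[i] = 0.0 if denom == 0 else (b - a) / denom
    return float(scores.mean())


def locate_bottleneck(layer_states, lang_labels, sem_labels):
    """Hypothetical helper: layer where semantic clustering most exceeds language clustering."""
    gaps = [silhouette(H, sem_labels) - silhouette(H, lang_labels) for H in layer_states]
    return int(np.argmax(gaps)), gaps


# Synthetic stand-in for hidden states: 3 languages x 4 meanings, one vector per sentence.
rng = np.random.default_rng(0)
n_lang, n_sem, dim = 3, 4, 16
lang = np.repeat(np.arange(n_lang), n_sem)  # language id of each sentence
sem = np.tile(np.arange(n_sem), n_lang)     # meaning id of each sentence
lang_centers = 5 * rng.normal(size=(n_lang, dim))
sem_centers = 5 * rng.normal(size=(n_sem, dim))


def jitter(X):
    """Add small Gaussian noise so points in a cluster are not identical."""
    return X + 0.1 * rng.normal(size=X.shape)


# Shallow and deep toy layers cluster by language; the middle one clusters by meaning.
layers = [jitter(lang_centers[lang]), jitter(sem_centers[sem]), jitter(lang_centers[lang])]
best, gaps = locate_bottleneck(layers, lang, sem)
print(best, [round(g, 2) for g in gaps])
```

On this toy data the middle layer wins because it is the only one where same‑meaning sentences sit closer together than same‑language sentences.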

LASA Framework

LASA (Language‑Agnostic Semantic Alignment) exploits the bottleneck in three steps:

Locate: Compute language‑based and semantics‑based silhouette scores for every layer, and select as the bottleneck the layer with the largest difference between the two.

Interpret: Attach a lightweight Safety Semantic Interpreter (SSI) to the output of the bottleneck layer. The SSI is an MLP about 0.2% the size of the base model. With the original model frozen, train the SSI with binary cross‑entropy to predict a scalar z that indicates whether the input is harmful or safe.

Inject: Feed the SSI signal into the generation path as a conditioning input and fine‑tune with the KTO loss. This teaches the model the mapping "semantic signal → refusal/compliance". Because the signal comes from a language‑agnostic layer, the learned safety behavior generalizes to unseen languages.
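The Interpret and Inject steps can be made concrete with a minimal NumPy sketch. It rests on stated assumptions and is not the authors' code. The class name `SafetySemanticInterpreter` is hypothetical, and so are the one‑hidden‑layer ReLU architecture, the hidden width, and the learning rate. Random Gaussian vectors stand in for frozen bottleneck states. Because the base model is frozen, the SSI simply trains on fixed feature vectors with binary cross‑entropy. For scale, 0.2% of an 8B‑parameter model is about 16M parameters. `kto_loss` follows the KTO objective as introduced by Ethayarajh et al. (2024), with the reference point `z_ref` treated as a constant. This summary does not describe how LASA feeds z into the generation path, so that conditioning is not sketched. Labelling a refusal of a harmful prompt as "desirable" is also an assumption.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class SafetySemanticInterpreter:
    """Hypothetical SSI: a one-hidden-layer MLP reading frozen bottleneck states."""

    def __init__(self, d_model, d_hidden, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(scale=d_model ** -0.5, size=(d_model, d_hidden))
        self.b1 = np.zeros(d_hidden)
        self.w2 = rng.normal(scale=d_hidden ** -0.5, size=d_hidden)
        self.b2 = 0.0

    def forward(self, H):
        """H: (batch, d_model) bottleneck states -> z in (0, 1), where 1 means harmful."""
        self._H = H
        self._pre = H @ self.W1 + self.b1
        self._A = np.maximum(self._pre, 0.0)  # ReLU
        return sigmoid(self._A @ self.w2 + self.b2)

    def bce_step(self, H, y, lr=0.1):
        """One full-batch gradient-descent step on binary cross-entropy; returns the loss."""
        z = self.forward(H)
        eps = 1e-7
        loss = -np.mean(y * np.log(z + eps) + (1 - y) * np.log(1 - z + eps))
        g = (z - y) / len(y)                         # dLoss/dlogit for sigmoid + BCE
        gA = np.outer(g, self.w2) * (self._pre > 0)  # backprop through ReLU (uses old w2)
        self.w2 -= lr * (self._A.T @ g)
        self.b2 -= lr * g.sum()
        self.W1 -= lr * (self._H.T @ gA)
        self.b1 -= lr * gA.sum(axis=0)
        return float(loss)


def kto_loss(policy_logp, ref_logp, desirable, z_ref, beta=0.1, lam_d=1.0, lam_u=1.0):
    """KTO loss: mean of lambda_y - v(x, y) over a batch of completions.

    policy_logp, ref_logp: summed log-probs of each completion under policy / reference.
    desirable: bool array (assumption: True for refusing harmful or helping safe prompts).
    z_ref: KL reference point, treated as a constant (no gradient).
    """
    r = policy_logp - ref_logp  # implicit reward log(pi / pi_ref)
    v = np.where(desirable,
                 lam_d * sigmoid(beta * (r - z_ref)),
                 lam_u * sigmoid(beta * (z_ref - r)))
    lam = np.where(desirable, lam_d, lam_u)
    return float(np.mean(lam - v))


# Train the SSI on synthetic stand-in features: "harmful" around +1, "safe" around -1.
rng = np.random.default_rng(1)
d_model = 32
H = np.vstack([rng.normal(1.0, 1.0, size=(50, d_model)),
               rng.normal(-1.0, 1.0, size=(50, d_model))])
y = np.concatenate([np.ones(50), np.zeros(50)])
ssi = SafetySemanticInterpreter(d_model, d_hidden=16)
losses = [ssi.bce_step(H, y, lr=0.5) for _ in range(300)]
accuracy = float(np.mean((ssi.forward(H) > 0.5) == y))

# Toy KTO batch: one desirable completion (reward +1), one undesirable one (reward -1).
example = kto_loss(policy_logp=np.array([-5.0, -9.0]), ref_logp=np.array([-6.0, -8.0]),
                   desirable=np.array([True, False]), z_ref=0.0)
print(losses[0], losses[-1], accuracy, example)  # example ≈ 0.475 = 1 - sigmoid(0.1)
```

KTO is a natural fit for this step because it needs only a per‑example desirable/undesirable label rather than paired preferences. The ablations, however, report that SFT or ORPO work about as well.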

Experimental Setup

Evaluations use Llama‑3.1‑8B‑Instruct, Qwen2.5 (7B/14B/32B), and Qwen3 (8B/14B/32B) on the MultiJail and HarmBench_translated benchmarks. These cover English, Chinese, Korean, Thai, Swahili, Bengali, and other languages. GPT‑4o acts as the judge for computing attack success rate (ASR). Baselines include SFT, DPO, KTO, ORPO, CPO, and MPO.
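Once a judge has labelled each response, ASR reduces to simple counting. The sketch below assumes the judge returns a boolean "unsafe" verdict per response. It also assumes that "average ASR" is an unweighted mean over languages, since the summary does not say how the average is taken. The GPT‑4o judging prompt is not shown. `attack_success_rates` is a hypothetical helper, and the verdict list is invented toy data.

```python
from collections import defaultdict


def attack_success_rates(records):
    """Per-language ASR (%) and its unweighted mean over languages.

    records: iterable of (language, judged_unsafe) pairs, where judged_unsafe is the
    judge's verdict that the response complied with the harmful request.
    """
    counts = defaultdict(lambda: [0, 0])  # language -> [unsafe responses, total responses]
    for language, judged_unsafe in records:
        counts[language][0] += int(judged_unsafe)
        counts[language][1] += 1
    per_language = {lang: 100.0 * unsafe / total for lang, (unsafe, total) in counts.items()}
    macro_average = sum(per_language.values()) / len(per_language)
    return per_language, macro_average


verdicts = [("en", False), ("en", False), ("en", False), ("en", True),
            ("sw", True), ("sw", False)]
per_language, average = attack_success_rates(verdicts)
print(per_language, average)  # {'en': 25.0, 'sw': 50.0} 37.5
```

A macro average weights every language equally. Low‑resource languages with fewer prompts therefore count as much as English, which is the point of a multilingual safety metric.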

Results

Llama‑3.1‑8B average ASR drops from 24.7% to 2.8%.

Qwen2.5‑7B‑Instruct Swahili ASR falls from ~50% to 13.0%.

General capability (MMLU, MT‑Bench) remains unchanged.

The Qwen2.5 and Qwen3 series reach a stable ASR of 3–4% across model sizes.
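In relative terms, the headline reductions work out as follows. The Swahili baseline is only given approximately, so the second figure is rough.

```latex
\frac{24.7 - 2.8}{24.7} = \frac{21.9}{24.7} \approx 0.887,
\qquad
\frac{50 - 13.0}{50} = 0.74
```

That is roughly an 89% relative reduction in average ASR for Llama‑3.1‑8B, and about 74% for Qwen2.5‑7B‑Instruct on Swahili.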

Ablation Studies

Placing the SSI at the bottleneck layer yields the best safety. Shallower or deeper placements degrade performance, and placing it at the final layer performs worse than the plain KTO baseline.

Replacing KTO with SFT or ORPO has negligible impact, confirming that the gains stem from bottleneck localization and SSI conditioning rather than a specific optimization algorithm.

Conclusion

LASA demonstrates that identifying and leveraging the semantic bottleneck layer enables language‑agnostic safety alignment. Safe behavior then generalizes naturally to low‑resource languages without per‑language data collection. This opens a new research direction for multilingual LLM safety.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Written by Machine Learning Algorithms & Natural Language Processing
