Network Intelligence Research Center (NIRC)
Jan 14, 2026 · Artificial Intelligence

From Black‑Box Guessing to Quantitative Deconstruction: Unveiling the Mystery Inside Large Language Models

At EMNLP 2025, the BUPT NIRC team presented a paper introducing the ARR metric, which quantitatively separates latent reasoning from factual shortcuts in LLMs and uses Logit Lens and Attention Knockout to reveal the distinct internal pathways behind each; the post also shares the team's conference experience.

ARR metric · Attention Knockout · EMNLP 2025
6 min read
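
For readers new to one of the tools named above, here is a minimal Logit Lens sketch: project each layer's hidden state through the model's final layer norm and unembedding matrix to see which token that layer currently favors. GPT-2 via Hugging Face transformers, the prompt, and the last-token focus are illustrative assumptions, not the paper's actual setup.

```python
# Minimal Logit Lens sketch (assumes GPT-2 from Hugging Face transformers).
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

model = GPT2LMHeadModel.from_pretrained("gpt2")
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model.eval()

prompt = "The capital of France is"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)

ln_f = model.transformer.ln_f    # final layer norm
unembed = model.lm_head.weight   # [vocab_size, hidden_size], tied to the input embedding

# out.hidden_states holds the embedding output plus one hidden state per layer.
for layer, h in enumerate(out.hidden_states):
    # Project the last-token hidden state of this layer into vocabulary space.
    logits = ln_f(h[0, -1]) @ unembed.T
    top_id = int(logits.argmax())
    print(f"layer {layer:2d}: top next token = {tokenizer.decode([top_id])!r}")
```

Watching how the top token shifts layer by layer is the intuition the ARR analysis builds on: different depths of the network "commit" to an answer at different points depending on whether the model is reasoning or retrieving a shortcut.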
Network Intelligence Research Center (NIRC)
Nov 11, 2025 · Artificial Intelligence

What Is Mechanistic Interpretability and Why It Matters for Large Language Models

The article defines mechanistic interpretability as reverse‑engineering LLMs to reveal how they represent knowledge and make decisions, explains its importance for transparency, risk mitigation, and model improvement, and surveys key techniques such as causal tracing, zero ablation, noise injection, and logit‑lens methods with illustrative examples.

causal tracing · large language models · logit lens
8 min read
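
As a taste of the zeroing interventions the article surveys, here is a minimal zero‑ablation sketch: knock out one attention layer's contribution to the residual stream with a forward hook and compare the model's next‑token prediction before and after. GPT-2, the choice of layer, and the prompt are assumptions for illustration, not the article's exact examples.

```python
# Minimal zero-ablation sketch (assumes GPT-2 from Hugging Face transformers;
# the ablated layer and prompt are hypothetical choices for illustration).
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

model = GPT2LMHeadModel.from_pretrained("gpt2")
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model.eval()

prompt = "The Eiffel Tower is located in the city of"
inputs = tokenizer(prompt, return_tensors="pt")

def last_token_logits():
    with torch.no_grad():
        return model(**inputs).logits[0, -1]

def zero_attn_output(module, module_inputs, output):
    # The attention module's first return value is its contribution to the
    # residual stream; zeroing it knocks out this layer's attention entirely.
    if isinstance(output, tuple):
        return (torch.zeros_like(output[0]),) + output[1:]
    return torch.zeros_like(output)

clean = last_token_logits()

layer_to_ablate = 6  # hypothetical layer index
handle = model.transformer.h[layer_to_ablate].attn.register_forward_hook(zero_attn_output)
ablated = last_token_logits()
handle.remove()

for name, logits in [("clean", clean), (f"layer {layer_to_ablate} attention zeroed", ablated)]:
    top_id = int(logits.argmax())
    print(f"{name}: top next token = {tokenizer.decode([top_id])!r}")
```

The same scaffolding extends to the noise‑injection variant of causal tracing: instead of returning zeros, the hook would add Gaussian noise to the relevant activations and measure how much the correct answer's probability drops.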