From Post‑hoc to Intrinsic: Cutting‑Edge Advances in Making Large Language Models More Transparent
This article surveys recent progress in intrinsic interpretability for large language models. It contrasts traditional post‑hoc analysis with design‑level approaches that embed transparency into model architecture, training objectives, and information flow, then outlines five core design paradigms and their challenges.
