Jan 28, 2026 · Artificial Intelligence

BLM‑Guard: Explainable Multimodal Ad Moderation Using Chain‑of‑Thought and Policy‑Aligned RL

The paper introduces BLM‑Guard, an explainable multimodal ad‑moderation framework for short‑video ads. It combines interleaved‑modal chain‑of‑thought reasoning with a policy‑aligned reinforcement‑learning reward to detect hidden cross‑modal violations. The paper also presents a new benchmark, on which BLM‑Guard is reported to achieve state‑of‑the‑art performance across multiple risk scenarios.

Chain-of-Thought · ad risk detection · benchmark

12 min read
