How RAVEN Leverages Reinforcement Reasoning for Precise Ad Video Violation Grounding
RAVEN is a reinforcement‑reasoning framework that combines curriculum learning with hierarchical rewards to enable multimodal large language models to accurately locate and classify violation segments in advertisement videos, even under noisy, large‑scale industrial data.
Background and Challenges
Detecting violations in advertisement videos requires pinpointing the exact time span of each offending segment and correctly classifying its type. Traditional small‑scale supervised models struggle with noisy annotations and poor generalisation, while fine‑tuned multimodal LLMs are sensitive to label noise, suffer catastrophic forgetting, and lack explicit reasoning.
RAVEN Overview
The Tencent Advertising Technology team proposes RAVEN (Robust Advertisement Video Violation Temporal Grounding via Reinforcement Reasoning), a multimodal LLM framework that activates temporal reasoning without relying on manually annotated reasoning data. RAVEN integrates curriculum reinforcement learning and a hierarchical reward mechanism to achieve precise temporal grounding and robust classification.
Core Innovations
Curriculum Reinforcement Learning : A three‑stage training pipeline (precise data → coarse data → full dataset) gradually increases task difficulty, stabilising learning on noisy industrial data.
Hierarchical Rewards : Combines format rewards (enforcing <think> and <answer> structures), accuracy rewards (temporal IoU, boundary alignment, category consistency), and dynamic weighting across stages.
Structured Reasoning Mechanism : The model generates a full reasoning chain inside <think> tags and outputs a structured result inside <answer>, providing interpretability and logical consistency.
Reward Design Details
Format Reward
Reasoning must be enclosed in <think> tags.
Final answer must follow the pattern <answer>{category:..., interval:...}</answer>.
Temporal keywords "temporal start" and "temporal end" are required.
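The three format rules above can be sketched as a single check. This is a minimal illustration, not RAVEN's actual implementation: the binary 0/1 scoring, the function name format_reward, and the choice to accept the temporal keywords anywhere in the response are all assumptions, since the article does not specify them.

```python
import re

# Assumed layout: exactly one <think> block followed by one <answer> block.
_LAYOUT = re.compile(
    r"^\s*<think>(?P<think>.+?)</think>\s*<answer>(?P<answer>.+?)</answer>\s*$",
    re.DOTALL,
)
REQUIRED_KEYWORDS = ("temporal start", "temporal end")


def format_reward(response: str) -> float:
    """Return 1.0 if the response satisfies all format rules, else 0.0 (assumed binary scoring)."""
    # Reject missing or repeated tags before matching the overall layout.
    for tag in ("<think>", "</think>", "<answer>", "</answer>"):
        if response.count(tag) != 1:
            return 0.0
    match = _LAYOUT.match(response)
    if match is None:
        return 0.0
    # The answer must mention both fields of the {category:..., interval:...} pattern.
    answer = match.group("answer")
    if "category" not in answer or "interval" not in answer:
        return 0.0
    # Assumption: the temporal keywords may appear anywhere in the response.
    text = response.lower()
    if not all(keyword in text for keyword in REQUIRED_KEYWORDS):
        return 0.0
    return 1.0
```

A binary score keeps the format signal simple and leaves partial credit to the accuracy rewards; a graded variant that scores each rule separately would be an equally plausible design.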
Accuracy Rewards
Temporal IoU Reward : Measures overlap between predicted and ground‑truth intervals.
Boundary Alignment Reward : Encourages exact start/end matching.
Category Consistency Reward : Ensures predicted violation categories match the ground truth.
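The three accuracy rewards can be illustrated as follows. Temporal IoU is the standard intersection-over-union of two intervals; the boundary-alignment form (a linear decay within a hypothetical tolerance window, in seconds) and the case-insensitive category match are assumptions made for this sketch, as the article does not give the exact formulas.

```python
from typing import Tuple

Interval = Tuple[float, float]  # (start, end) in seconds


def temporal_iou(pred: Interval, gt: Interval) -> float:
    """Intersection over union of two time intervals; 0.0 for invalid or disjoint spans."""
    if pred[1] <= pred[0] or gt[1] <= gt[0]:
        return 0.0
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union


def boundary_alignment(pred: Interval, gt: Interval, tolerance: float = 1.0) -> float:
    """Assumed form: each boundary scores 1.0 when exact and decays linearly
    to 0.0 at `tolerance` seconds of error; the two scores are averaged."""
    def score(p: float, g: float) -> float:
        return max(0.0, 1.0 - abs(p - g) / tolerance)

    return 0.5 * (score(pred[0], gt[0]) + score(pred[1], gt[1]))


def category_consistency(pred: str, gt: str) -> float:
    """1.0 when the predicted category equals the ground truth (case-insensitive), else 0.0."""
    return 1.0 if pred.strip().lower() == gt.strip().lower() else 0.0
```

For example, a prediction of (2, 6) against a ground truth of (4, 8) overlaps for 2 seconds out of a 6-second union, giving a temporal IoU of 1/3. The boundary term complements IoU because two predictions with similar overlap can still differ in how closely their start and end points match.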
Curriculum Training Stages
Stage 1 – Precise Annotations : Train on a small, accurately labelled subset. Rewards focus on all three accuracy components.
Stage 2 – Coarse Annotations : Train on large, noisy data. Rewards simplify to overall position and boundary alignment.
Stage 3 – Full Dataset Fine‑tuning : Combine precise and coarse data; balance all reward components for robust performance.
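One way to picture stage-dependent reward weighting is a lookup table of weights per curriculum stage. The sketch below is hypothetical: the numeric weights are placeholders, the names RewardWeights, STAGE_WEIGHTS, and total_reward are invented for illustration, and dropping the category term in Stage 2 is only one reading of "overall position and boundary alignment".

```python
from dataclasses import asdict, dataclass
from typing import Dict


@dataclass(frozen=True)
class RewardWeights:
    fmt: float       # format reward
    iou: float       # temporal IoU reward (overall position)
    boundary: float  # boundary alignment reward
    category: float  # category consistency reward


# Placeholder weights, not values from the RAVEN work.
STAGE_WEIGHTS: Dict[int, RewardWeights] = {
    1: RewardWeights(fmt=0.5, iou=1.0, boundary=1.0, category=1.0),  # precise annotations
    2: RewardWeights(fmt=0.5, iou=1.0, boundary=1.0, category=0.0),  # coarse annotations
    3: RewardWeights(fmt=1.0, iou=1.0, boundary=1.0, category=1.0),  # full dataset
}


def total_reward(stage: int, components: Dict[str, float]) -> float:
    """Weighted average of the reward components for the given curriculum stage."""
    weights = asdict(STAGE_WEIGHTS[stage])
    weighted = sum(weight * components[name] for name, weight in weights.items())
    return weighted / sum(weights.values())
```

Normalising by the sum of weights keeps the total in the same range at every stage, so the reward scale does not jump when training moves from one stage to the next.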
Experimental Validation
Offline Tests
RAVEN was compared against baselines LLaVA‑v1.5, Qwen2‑VL‑7B, and Qwen2.5‑VL‑7B (including their supervised‑fine‑tuned versions). It achieved superior violation‑category accuracy and temporal‑grounding precision, demonstrating the effectiveness of curriculum RL for robustness.
Online A/B Tests
RAVEN was deployed on Tencent’s ad‑review platform, serving 20% of traffic. It outperformed both a smaller model and Qwen2.5‑VL‑7B‑SFT, improving category precision and recall and achieving 8.5% higher interval accuracy.
Generalisation Study
RL‑trained RAVEN retained broader capabilities than the SFT models, achieving higher accuracy on out‑of‑domain violation categories (e.g., low‑quality content, prohibited goods).
Ablation of Rewards and Curriculum
Removing either the format reward or the boundary‑alignment reward reduced performance, confirming that both matter. Training without the curriculum caused a 4.7% drop in temporal IoU, highlighting the need for progressive learning.
Conclusion
RAVEN demonstrates that combining curriculum reinforcement learning with hierarchical rewards enables multimodal LLMs to perform robust temporal grounding of ad‑video violations without extensive human‑annotated reasoning data. The framework achieves state‑of‑the‑art accuracy, mitigates catastrophic forgetting, and offers a scalable solution for real‑world ad compliance systems.
Tencent Advertising Technology
Official hub of Tencent Advertising Technology, sharing the team's latest cutting-edge achievements and advertising technology applications.
