
Ant Insurance and Zhejiang University’s AAAI 2025 Papers Tackle Hallucination in Large Vision‑Language and Video Models

Two collaborative papers by Ant Insurance and Zhejiang University were accepted at AAAI 2025, introducing the MoLE decoding framework to reduce hallucination in large vision‑language models and the MHBench benchmark plus Motion Contrastive Decoding to address motion hallucination in video large language models, advancing reliable AI‑driven insurance claim processing.


Recently, the 39th Annual AAAI Conference on Artificial Intelligence (AAAI 2025) announced its paper acceptance results; two papers co-authored by Ant Insurance and Zhejiang University were accepted. The conference received 12,957 submissions and accepted 3,032 papers, an acceptance rate of 23.4%.

In health‑insurance claim processing, large vision‑language models (LVLMs) are widely used to automatically handle claim applications, analyze medical documents, and determine eligibility. However, hallucination—where generated suggestions conflict with actual medical records or policy details—remains a critical issue. To mitigate this, Ant Insurance partnered with Zhejiang University to propose innovative solutions.

Paper 1: MoLE: Decoding by Mixture of Layer Experts Alleviates Hallucination in Large Vision‑Language Models

The authors analyze the layer-by-layer decoding of LVLMs and identify sources of hallucination during inference, particularly failures in injecting factual information as generated sequences grow longer and the original prompt is gradually forgotten. They introduce a training-free decoding method called Mixture of Layer Experts (MoLE), which employs a heuristic gating mechanism to dynamically select multiple LVLM layers as expert layers: a final-expert layer, a second-opinion expert layer, and a prompt-preserving expert layer. The collaboration of these experts improves the robustness and credibility of generation.

The final‑expert layer refines the ultimate output, the second‑opinion expert layer provides alternative insights (analogous to seeking a second opinion in medical or claim decisions), and the prompt‑preserving expert layer retains the original input to prevent prompt forgetting that can cause hallucination in long sequences.
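The idea of mixing predictions from several layers can be illustrated with a minimal sketch. The function below is a hypothetical simplification, not the paper's implementation: it assumes we already have next-token logits from three chosen layers and combines them with fixed illustrative gate weights (the actual MoLE gating is heuristic and dynamic).

```python
import numpy as np

def mole_decode(layer_logits, gate_weights=(0.6, 0.25, 0.15)):
    """Illustrative sketch of mixing three 'expert' layers' logits.

    layer_logits: dict with hypothetical keys 'final', 'second_opinion',
    and 'prompt_preserving', each a (vocab_size,) array of next-token
    logits taken from a different decoder layer.
    gate_weights: assumed fixed weights; the real method gates dynamically.
    """
    w_final, w_second, w_prompt = gate_weights
    mixed = (w_final * layer_logits["final"]
             + w_second * layer_logits["second_opinion"]
             + w_prompt * layer_logits["prompt_preserving"])
    # Numerically stable softmax over the mixed logits, then greedy pick.
    probs = np.exp(mixed - mixed.max())
    probs /= probs.sum()
    return int(np.argmax(probs)), probs
```

Here the final layer dominates, while the second-opinion layer can overrule it when a mid-depth layer disagrees strongly, and the prompt-preserving layer keeps early-layer evidence of the input in play.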

Paper 2: MHBench: Demystifying Motion Hallucination in VideoLLMs

The second paper introduces the concept of “motion hallucination,” where video large language models (VideoLLMs) generate implausible actions due to insufficient motion perception. To evaluate this, the authors created MHBench, a benchmark comprising 1,200 videos across 20 action categories, using adversarial triplet videos (original, opposite, incomplete) for comprehensive assessment.

Additionally, the authors propose Motion Contrastive Decoding (MotionCD), which leverages bidirectional motion cancellation between an original video and its reversed version to construct a model that removes motion influence while preserving visual information, effectively suppressing motion hallucination.
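The contrastive step can be sketched as follows. This is a hedged approximation under common contrastive-decoding conventions, not the paper's code: it assumes we have next-token logits from the model given the original video and given its time-reversed version, and a hypothetical contrast strength `alpha`.

```python
import numpy as np

def motion_contrastive_decode(logits_orig, logits_reversed, alpha=1.0):
    """Illustrative contrastive-decoding sketch for motion hallucination.

    Predictions that the time-reversed video supports equally well carry
    little genuine motion evidence, so they are penalized; predictions
    unique to the forward-time video are amplified.
    alpha: assumed contrast strength (hypothetical parameter).
    """
    contrast = (1 + alpha) * logits_orig - alpha * logits_reversed
    return int(np.argmax(contrast))
```

In this toy form, a token that scores high under both the original and reversed videos (motion-agnostic, hence hallucination-prone) is suppressed relative to one supported only by the forward-time footage.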

These joint research outcomes provide innovative solutions for intelligent claim processing, especially in reducing hallucination, and are expected to improve the accuracy and efficiency of claim audits, driving the insurance industry toward more precise and effective AI‑enabled operations.

Tags: AI research, AAAI 2025, hallucination, large vision-language models, video LLMs
Written by

AntTech

Technology is the core driver of Ant's future creation.
