Kuaishou Tech
Dec 4, 2025 · Artificial Intelligence
Can a Tree‑Reasoned Model Master Video Emotion Understanding?
The paper introduces VidEmo, a multimodal video foundation model that uses a two‑stage emotion‑clue‑guided reasoning framework and a large emotion‑centric dataset (Emo‑CFG) to achieve state‑of‑the‑art performance on facial attribute, expression, and fine‑grained emotion tasks, surpassing Gemini 2.0.
AI · Dataset · computer vision
15 min read
