Tagged articles
3 articles
Machine Heart
Apr 27, 2026 · Artificial Intelligence

Why Traditional Video Captions Fail and How MTSS Solves the Problem

The article introduces Multi-Stream Scene Script (MTSS), a structured JSON‑based video description paradigm that replaces monolithic captions. It explains the design principles, outlines the advantages over monolithic captions, and presents experimental evidence of significant gains in both video understanding and video generation tasks.

MTSS · Multimodal AI · Video Generation
8 min read
Alimama Tech
Oct 29, 2025 · Artificial Intelligence

LLM Breakthroughs at EMNLP 2025: Embedding Compression, Complex Instructions, Knowledge Scaling

At EMNLP 2025 in Suzhou, Taobao's booth features four AI papers: a novel embedding compression framework, an automatic iterative refinement method for complex instruction generation, a knowledge infusion scaling law for large language models, and a video caption optimization approach for text‑to‑video generation.

Large Language Models · embedding compression · instruction generation
7 min read
DataFunSummit
Sep 17, 2024 · Artificial Intelligence

Multimodal Video Understanding for Real-World Surveillance: Tasks, Dataset, Models, and Challenges

This article surveys multimodal video understanding for real-world surveillance. It covers task definitions, the new UCA multimodal surveillance dataset, baseline models for video moment localization, captioning, and anomaly detection, experimental results, open challenges, and future research directions.

AI models · multimodal video understanding · surveillance dataset
19 min read