Tagged articles
3 articles
Page 1 of 1
Machine Learning Algorithms & Natural Language Processing
Machine Learning Algorithms & Natural Language Processing
Mar 18, 2026 · Artificial Intelligence

Breaking the ‘See‑then‑Think’ Barrier: Real‑Time ‘See‑and‑Think’ for VLMs (CVPR 2026)

The paper introduces TaYS (Think‑as‑You‑See), a streaming chain‑of‑thought framework that replaces the traditional “watch‑then‑think” video inference pipeline with a parallel, real‑time “watch‑and‑think” approach, dramatically reducing latency and improving accuracy on complex video reasoning tasks.

Dual KV-CacheReal-time VideoStreaming Inference
0 likes · 8 min read
Breaking the ‘See‑then‑Think’ Barrier: Real‑Time ‘See‑and‑Think’ for VLMs (CVPR 2026)
Alibaba Cloud Big Data AI Platform
Alibaba Cloud Big Data AI Platform
Dec 16, 2025 · Artificial Intelligence

How CosyVoice 2.0 Cuts First‑Chunk Latency for High‑Fidelity Voice Cloning

CosyVoice 2.0, Alibaba DAMO Academy's next‑gen high‑fidelity speech synthesis model, introduces architecture decoupling, streaming generation, reference‑audio caching and dynamic load balancing to dramatically reduce first‑packet latency and improve real‑time factor while supporting multi‑language voice cloning.

AI model optimizationLow latencyStreaming Inference
0 likes · 9 min read
How CosyVoice 2.0 Cuts First‑Chunk Latency for High‑Fidelity Voice Cloning
58 Tech
58 Tech
Jan 12, 2023 · Artificial Intelligence

Efficient Conformer for End‑to‑End Speech Recognition: Model, Implementation, Streaming Inference, and Experimental Results

This article presents a comprehensive overview of the Efficient Conformer model for large‑scale end‑to‑end speech recognition, detailing its architectural improvements such as progressive downsampling and grouped multi‑head self‑attention, the PyTorch implementation in WeNet, streaming inference handling, experimental CER gains on AISHELL‑1 and production data, and future development plans.

ASREfficient ConformerModel Optimization
0 likes · 16 min read
Efficient Conformer for End‑to‑End Speech Recognition: Model, Implementation, Streaming Inference, and Experimental Results