Tagged articles

Streaming Inference

3 articles · Page 1 of 1

Machine Learning Algorithms & Natural Language Processing

Mar 18, 2026 · Artificial Intelligence

Breaking the ‘See‑then‑Think’ Barrier: Real‑Time ‘See‑and‑Think’ for VLMs (CVPR 2026)

The paper introduces TaYS (Think‑as‑You‑See), a streaming chain‑of‑thought framework for video reasoning. It replaces the traditional “watch‑then‑think” inference pipeline with a parallel, real‑time “watch‑and‑think” approach, which reduces latency and improves accuracy on complex video reasoning tasks.

Chain-of-Thought · Dual KV-Cache · Streaming Inference

0 likes · 8 min read

Alibaba Cloud Big Data AI Platform

Dec 16, 2025 · Artificial Intelligence

How CosyVoice 2.0 Cuts First‑Chunk Latency for High‑Fidelity Voice Cloning

CosyVoice 2.0, Alibaba DAMO Academy's next‑generation high‑fidelity speech synthesis model, introduces architecture decoupling, streaming generation, reference‑audio caching, and dynamic load balancing. Together these reduce first‑packet latency and improve the real‑time factor while supporting multilingual voice cloning.

AI model optimization · Streaming Inference · low-latency

0 likes · 9 min read

58 Tech

Jan 12, 2023 · Artificial Intelligence

Efficient Conformer for End‑to‑End Speech Recognition: Model, Implementation, Streaming Inference, and Experimental Results

This article gives an overview of the Efficient Conformer model for large‑scale end‑to‑end speech recognition. It covers the architectural improvements (progressive downsampling and grouped multi‑head self‑attention), the PyTorch implementation in WeNet, streaming inference handling, CER gains on AISHELL‑1 and production data, and future development plans.

ASR · Efficient Conformer · Model Optimization

0 likes · 16 min read
