Tagged articles
18 articles
Weekly Large Model Application
May 5, 2026 · Artificial Intelligence

Where Is End‑to‑End Speech AI Heading? Product vs Engineering Perspectives

The article clarifies the dual meaning of “end‑to‑end” in speech AI—product simplicity and engineering unification—then outlines six emerging trends, from real‑time conversational latency to multilingual robustness, token‑based audio pipelines, voice‑specific security, edge privacy, and the growing importance of data quality and reproducibility.

Edge Computing · End-to-End · Speech AI
0 likes · 8 min read
Lao Guo's Learning Space
Apr 21, 2026 · Artificial Intelligence

HappyOyster: Build an Explorable Interactive World with a Single Prompt

Alibaba’s ATH team unveiled HappyOyster, a real‑time world‑model platform that lets users generate and explore interactive 3D environments from a single sentence or image, offering two modes—Wander for exploration and Direct for creation—while detailing its streaming architecture, multimodal foundation, competitive advantages, use cases, and current limitations.

AI video · Game Development · generative AI
0 likes · 11 min read
Machine Heart
Apr 20, 2026 · Artificial Intelligence

AURA: Real-Time Video Understanding Shifts from Post-Play Q&A to Continuous Interaction

AURA introduces an always‑on video LLM that processes streams frame‑by‑frame and decides when to stay silent or answer. It uses a dual sliding‑window context and a Silent‑Speech Balanced Loss, achieves state‑of‑the‑art scores on StreamingBench, OVO‑Bench and OmniMMI, and runs at 2 FPS with ~312 ms end‑to‑end latency on two 80 GB GPUs.

AURA · Benchmark · Silent-Speech Loss
0 likes · 15 min read
21CTO
Nov 4, 2025 · Artificial Intelligence

LongCat-Flash-Omni: How an Open-Source 560B Model Achieves Real-Time Multimodal Mastery

LongCat-Flash-Omni, an open‑source 560 billion‑parameter multimodal model, combines an efficient Shortcut‑Connected MoE architecture with advanced perception and speech modules to deliver low‑latency real‑time audio‑video interaction and state‑of‑the‑art performance across text, image, video, and audio tasks.

Multimodal AI · audio-visual processing · efficient inference
0 likes · 10 min read
Meituan Technology Team
Nov 3, 2025 · Artificial Intelligence

LongCat-Flash-Omni: 560B Open‑Source Multimodal Model with Real‑Time Interaction

LongCat-Flash-Omni, the latest open‑source model from Meituan, combines a 560 billion‑parameter architecture, efficient multimodal perception and speech reconstruction modules, and a progressive training strategy to deliver real‑time audio‑video interaction and state‑of‑the‑art performance across text, image, audio, and video tasks.

AI · Benchmark · large language model
0 likes · 9 min read
Instant Consumer Technology Team
Oct 31, 2025 · Cloud Computing

How WebRTC Enables Millisecond‑Level Dual‑Direction Streaming in Cloud‑Based Mobile Testing

This article explains how a cloud testing platform leverages WebRTC to achieve sub‑200 ms bidirectional video transmission, enabling ultra‑low‑latency screen casting and remote camera feed replacement for mobile devices, and details the architecture, optimizations, performance gains, and future enhancements.

Mobile Automation · WebRTC · cloud testing
0 likes · 20 min read
Instant Consumer Technology Team
Jun 19, 2025 · Artificial Intelligence

Exploring II-Agent: An Open‑Source AI Agent Framework for Multi‑Domain Automation

II-Agent is an open‑source, multi‑domain AI agent framework that leverages powerful large language models, a rich toolset, planning‑and‑reflection mechanisms, and advanced context management to enable autonomous task execution, real‑time interaction, and seamless integration across development, data analysis, and enterprise workflows.

AI Agent · Automation · Context management
0 likes · 21 min read
KooFE Frontend Team
May 22, 2025 · Artificial Intelligence

How AG-UI Protocol Bridges AI Agents and User Interfaces for Real‑Time Collaboration

The AG-UI (Agent User Interaction) protocol standardizes communication between backend AI agents and front‑end interfaces using a single JSON event stream, addressing real‑time streaming, tool orchestration, shared state, concurrency, security, and framework fragmentation to enable seamless human‑agent collaboration.

AG-UI · AI agents · Backend
0 likes · 8 min read
AI Frontier Lectures
Apr 10, 2025 · Artificial Intelligence

How WonderTurbo Generates Interactive 3D Worlds in Just 0.72 Seconds

WonderTurbo introduces a real‑time 3D scene generation pipeline that accelerates both geometry and appearance modeling to under a second per view, using StepSplat, QuickDepth, and FastPaint modules, achieving up to 15× speedup while maintaining high visual quality.

3D generation · Computer Vision · Depth Completion
0 likes · 16 min read
DataFunSummit
Dec 25, 2024 · Artificial Intelligence

Design and Implementation of a Multimodal Real-Time Voice AI Teammate for Naraka: Bladepoint

This article explains the design, implementation, and underlying Agent‑Oriented‑Programming framework of NetEase Fuxi’s multimodal real‑time voice AI teammate for the mobile game ‘Naraka: Bladepoint’, highlighting its capabilities such as autonomous navigation, combat assistance, natural dialogue, teaching, and broader applications of voice technology in games.

Naraka Bladepoint · agent-oriented programming · game AI
0 likes · 12 min read
Bilibili Tech
Sep 13, 2024 · Backend Development

Architectural Evolution of Bilibili Live Interaction Center

To solve duplicated functionality, legacy code, and scalability limits in Bilibili’s live‑streaming interaction services, the team created a unified Interaction Center. It abstracts RTC; consolidates session, link, UI, scoring and role management; introduces a shared state machine and tracing; and evolves in phases toward an extensible architecture with higher performance and maintainability.

Performance Monitoring · RTC · live streaming
0 likes · 22 min read
Bilibili Tech
May 30, 2023 · Backend Development

Evolution of Interactive Live Streaming: Bilibili's Open Platform Journey

Bilibili’s live‑streaming tech team created an open interactive platform—spurred by the 600,000‑viewer success of Xiu Gou Nightclub—that supports hang‑up, host‑enhanced, and tool‑assisted streams, provides SDKs, APIs, data‑compliant authentication, tackles latency and rendering challenges, and now explores advertising, sponsorship and game‑promotion models to sustain its ecosystem.

SDK Development · WebSocket communication · data compliance
0 likes · 13 min read
DataFunSummit
Dec 9, 2022 · Artificial Intelligence

Volcano Engine Virtual Digital Human Technology Overview

This article provides a comprehensive overview of Volcano Engine's virtual digital human platform, detailing its definition, AI‑driven and human‑driven classifications, 2D and 3D technical architectures, multi‑modal perception, interaction capabilities, application scenarios, and future development directions.

2D avatar · 3D Avatar · Computer Vision
0 likes · 15 min read
Baidu Geek Talk
Sep 7, 2022 · Artificial Intelligence

Design and Architecture of AI Digital Human Live Streaming System

The paper presents a cloud‑native architecture for AI‑driven digital‑human live‑streaming, detailing three‑layer asset, interaction, and media modules, real‑time script and Q&A scheduling, fault‑tolerant rendering and control services, and demonstrates how virtual anchors can deliver continuous, lifelike 24/7 e‑commerce streams.

AI · Digital Human · System Architecture
0 likes · 21 min read
Tencent Cloud Developer
Sep 4, 2020 · Frontend Development

Introducing TWebLive: Tencent Cloud Web Live Interactive SDK

TWebLive, Tencent Cloud’s new web‑live interactive SDK, bundles TRTC, TIM and TCPlayer to let developers add push streaming, low‑latency WebRTC or CDN playback, and real‑time chat or bullet‑screen interaction with simple APIs, demo projects and open‑source code, replacing legacy Flash solutions.

JavaScript · SDK · Tencent Cloud
0 likes · 11 min read
Youku Technology
Nov 21, 2019 · Industry Insights

How Alibaba Delivered a Global 4K Dolby‑Atmos Live Stream to 200+ Countries

Alibaba Entertainment’s 2019 Double‑11 "Cat Night" showcased a suite of cutting‑edge streaming technologies—including multi‑angle frame alignment, Dolby‑Atmos audio, low‑latency SRT transport, smart bitrate, edge‑cloud distribution, and a zero‑loss quality‑assurance system—that enabled a seamless 4K experience for viewers in over 200 countries.

Dolby Atmos · edge-cloud · global scale
0 likes · 9 min read
Alibaba Cloud Developer
Jan 16, 2019 · Artificial Intelligence

How Alibaba’s AliPlayStudio Powers Real‑Time AI Video Interactions on Mobile

This article details the research and engineering behind Alibaba's AliPlayStudio, a video‑interactive platform that combines computer‑vision algorithms such as human parsing, gesture and pose detection, and controllable style transfer, all optimized for real‑time deployment on low‑power mobile and embedded devices.

Mobile AI · gesture recognition · pose estimation
0 likes · 17 min read
Alibaba Cloud Developer
Jan 3, 2017 · Backend Development

How Alibaba Engineered Real‑Time, Cross‑Device Interaction for the 2016 Double‑11 Live Show

The article details Alibaba's technical innovations for the 2016 Double‑11 live event, covering two‑way audience interaction, time‑offset synchronization, massive real‑time like ranking, AR cross‑screen features, and the custom internet‑director console that together enabled seamless, high‑concurrency, multi‑platform engagement.

AR · Backend Engineering · high concurrency
0 likes · 14 min read