Tagged articles

Real-time Interaction

21 articles · Page 1 of 1

Jul 3, 2026 · Artificial Intelligence

How an AI Agent Turned a Live Stream into a Real‑Time Interactive Show for 935,000 Viewers

A two‑hour Douyin live broadcast demonstrated an AI‑driven interactive game in which the AI acted as scriptwriter, host and scheduler, handling multimodal inputs, real‑time state management and a fault‑tolerant runtime. The stream reached 935k total exposures and 29k peak concurrent viewers while redefining live‑stream participation.

AI Agent · Agent Runtime · Complexity Engineering

0 likes · 17 min read

JD Cloud Developers

Jun 23, 2026 · Artificial Intelligence

From Q&A to Real‑Time Seeing & Speaking: JD’s First Open‑Source JoyAI‑VL‑Interaction

JD’s open‑source JoyAI‑VL‑Interaction transforms large‑model AI from static question‑answering to continuous, on‑scene observation, proactive judgment, and real‑time response, offering agent delegation and achieving up to 87.9% win rate against leading video assistants in live benchmarks.

AI assistant · Benchmark · Multimodal AI

0 likes · 9 min read

Machine Learning Algorithms & Natural Language Processing

May 22, 2026 · Artificial Intelligence

Li Mu Returns to Bilibili with a Real-Time AI Avatar

Li Mu (known as 沐神, "Mu Shen") returns to Bilibili after a year to showcase Higgs Avatar v1, a fully AI‑generated real‑time digital human that can listen, speak, lip‑sync and display facial expressions. Reported performance is 16 ms per frame on a single H100 GPU, with potential applications ranging from customer service to training; the article also raises ethical considerations about identity and trust.

AI Avatar · Boson AI · Higgs Avatar

0 likes · 7 min read

Weekly Large Model Application

May 5, 2026 · Artificial Intelligence

Where Is End‑to‑End Speech AI Heading? Product vs Engineering Perspectives

The article clarifies the dual meaning of “end‑to‑end” in speech AI—product simplicity and engineering unification—then outlines six emerging trends, from real‑time conversational latency to multilingual robustness, token‑based audio pipelines, voice‑specific security, edge privacy, and the growing importance of data quality and reproducibility.

End-to-End · Large Language Models · Real-time Interaction

0 likes · 8 min read

Lao Guo's Learning Space

Apr 21, 2026 · Artificial Intelligence

HappyOyster: Build an Explorable Interactive World with a Single Prompt

Alibaba’s ATH team unveiled HappyOyster, a real‑time world‑model platform that lets users generate and explore interactive 3D environments from a single sentence or image, offering two modes—Wander for exploration and Direct for creation—while detailing its streaming architecture, multimodal foundation, competitive advantages, use cases, and current limitations.

AI video · Game Development · Generative AI

0 likes · 11 min read

Machine Heart

Apr 20, 2026 · Artificial Intelligence

AURA: Real-Time Video Understanding Shifts from Post-Play Q&A to Continuous Interaction

AURA introduces an always‑on video LLM that processes streams frame‑by‑frame and decides when to stay silent or answer, using a dual sliding‑window context and a Silent‑Speech Balanced Loss. It achieves state‑of‑the‑art scores on StreamingBench, OVO‑Bench and OmniMMI, and runs at 2 FPS with ~312 ms end‑to‑end latency on two 80 GB GPUs.

AURA · Benchmark · Real-time Interaction

0 likes · 15 min read

21CTO

Nov 4, 2025 · Artificial Intelligence

LongCat-Flash-Omni: How an Open-Source 560B Model Achieves Real-Time Multimodal Mastery

LongCat-Flash-Omni, an open‑source 560 billion‑parameter multimodal model, combines efficient Shortcut‑Connected MoE architecture with advanced perception and speech modules to deliver low‑latency real‑time audio‑video interaction and state‑of‑the‑art performance across text, image, video, and audio tasks.

Efficient Inference · Large Language Model · Multimodal AI

0 likes · 10 min read

Meituan Technology Team

Nov 3, 2025 · Artificial Intelligence

LongCat-Flash-Omni: 560B Open‑Source Multimodal Model with Real‑Time Interaction

LongCat-Flash-Omni, the latest open‑source model from Meituan, combines a 560 billion‑parameter architecture, efficient multimodal perception and speech reconstruction modules, and a progressive training strategy to deliver real‑time audio‑video interaction and state‑of‑the‑art performance across text, image, audio, and video tasks.

AI · Benchmark · Large Language Model

0 likes · 9 min read

Instant Consumer Technology Team

Oct 31, 2025 · Cloud Computing

How WebRTC Enables Millisecond‑Level Dual‑Direction Streaming in Cloud‑Based Mobile Testing

This article explains how a cloud testing platform leverages WebRTC to achieve sub‑200 ms bidirectional video transmission, enabling ultra‑low‑latency screen casting and remote camera feed replacement for mobile devices, and details the architecture, optimizations, performance gains, and future enhancements.

Real-time Interaction · WebRTC · cloud testing

0 likes · 20 min read

Instant Consumer Technology Team

Jun 19, 2025 · Artificial Intelligence

Exploring II-Agent: An Open‑Source AI Agent Framework for Multi‑Domain Automation

II-Agent is an open‑source, multi‑domain AI agent framework that leverages powerful large language models, a rich toolset, planning‑and‑reflection mechanisms, and advanced context management to enable autonomous task execution, real‑time interaction, and seamless integration across development, data analysis, and enterprise workflows.

AI Agent · Automation · Context Management

0 likes · 21 min read

KooFE Frontend Team

May 22, 2025 · Artificial Intelligence

How AG-UI Protocol Bridges AI Agents and User Interfaces for Real‑Time Collaboration

The AG-UI (Agent User Interaction) protocol standardizes communication between backend AI agents and front‑end interfaces using a single JSON event stream, addressing real‑time streaming, tool orchestration, shared state, concurrency, security, and framework fragmentation to enable seamless human‑agent collaboration.

AG-UI · AI agents · Real-time Interaction

0 likes · 8 min read

AI Frontier Lectures

Apr 10, 2025 · Artificial Intelligence

How WonderTurbo Generates Interactive 3D Worlds in Just 0.72 Seconds

WonderTurbo introduces a real‑time 3D scene generation pipeline that accelerates both geometry and appearance modeling to under a second per view, using StepSplat, QuickDepth, and FastPaint modules, achieving up to 15× speedup while maintaining high visual quality.

3D generation · Depth Completion · Geometry Modeling

0 likes · 16 min read

DataFunSummit

Dec 25, 2024 · Artificial Intelligence

Design and Implementation of a Multimodal Real-Time Voice AI Teammate for Naraka: Bladepoint

This article explains the design, implementation, and underlying Agent‑Oriented‑Programming framework of NetEase Fuxi’s multimodal real‑time voice AI teammate for the mobile game ‘Naraka: Bladepoint’, highlighting its capabilities such as autonomous navigation, combat assistance, natural dialogue, teaching, and broader applications of voice technology in games.

Naraka Bladepoint · Real-time Interaction · agent-oriented programming

0 likes · 12 min read

Bilibili Tech

Sep 13, 2024 · Backend Development

Architectural Evolution of Bilibili Live Interaction Center

To solve duplicated functionality, legacy code, and scalability limits in Bilibili’s live‑streaming interaction services, the team created a unified Interaction Center that abstracts RTC, consolidates session, link, UI, scoring and role management, introduces a shared state machine and tracing, and evolves through phased, extensible architecture for higher performance and maintainability.

Live Streaming · RTC · Real-time Interaction

0 likes · 22 min read

Bilibili Tech

May 30, 2023 · Backend Development

Evolution of Interactive Live Streaming: Bilibili's Open Platform Journey

Bilibili’s live‑streaming tech team created an open interactive platform, spurred by the 600,000‑viewer success of Xiu Gou Nightclub, that supports hang‑up (unattended), host‑enhanced, and tool‑assisted streams. It provides SDKs, APIs and data‑compliant authentication, tackles latency and rendering challenges, and is now exploring advertising, sponsorship and game‑promotion models to sustain its ecosystem.

Live Streaming · Real-time Interaction · SDK Development

0 likes · 13 min read

DataFunSummit

Dec 9, 2022 · Artificial Intelligence

Volcano Engine Virtual Digital Human Technology Overview

This article provides a comprehensive overview of Volcano Engine's virtual digital human platform, detailing its definition, AI‑driven and human‑driven classifications, 2D and 3D technical architectures, multi‑modal perception, interaction capabilities, application scenarios, and future development directions.

2D avatar · 3D avatar · Multimodal AI

0 likes · 15 min read

Baidu Geek Talk

Sep 7, 2022 · Artificial Intelligence

Design and Architecture of AI Digital Human Live Streaming System

The paper presents a cloud‑native architecture for AI‑driven digital‑human live‑streaming, detailing three‑layer asset, interaction, and media modules, real‑time script and Q&A scheduling, fault‑tolerant rendering and control services, and demonstrates how virtual anchors can deliver continuous, lifelike 24/7 e‑commerce streams.

AI · Live Streaming · Real-time Interaction

0 likes · 21 min read

Tencent Cloud Developer

Sep 4, 2020 · Frontend Development

Introducing TWebLive: Tencent Cloud Web Live Interactive SDK

TWebLive, Tencent Cloud’s new web‑live interactive SDK, bundles TRTC, TIM and TCPlayer to let developers add push streaming, low‑latency WebRTC or CDN playback, and real‑time chat or bullet‑screen interaction with simple APIs, demo projects and open‑source code, replacing legacy Flash solutions.

JavaScript · Real-time Interaction · Tencent Cloud

0 likes · 11 min read

Youku Technology

Nov 21, 2019 · Industry Insights

How Alibaba Delivered a Global 4K Dolby‑Atmos Live Stream to 200+ Countries

Alibaba Entertainment’s 2019 Double‑11 "Cat Night" showcased a suite of cutting‑edge streaming technologies—including multi‑angle frame alignment, Dolby‑Atmos audio, low‑latency SRT transport, smart bitrate, edge‑cloud distribution, and a zero‑loss quality‑assurance system—that enabled a seamless 4K experience for viewers in over 200 countries.

Dolby Atmos · Live Streaming · Real-time Interaction

0 likes · 9 min read

Alibaba Cloud Developer

Jan 16, 2019 · Artificial Intelligence

How Alibaba’s AliPlayStudio Powers Real‑Time AI Video Interactions on Mobile

This article details the research and engineering behind Alibaba's AliPlayStudio, a video‑interactive platform that combines computer‑vision algorithms such as human parsing, gesture and pose detection, and controllable style transfer, all optimized for real‑time deployment on low‑power mobile and embedded devices.

Real-time Interaction · Semantic Segmentation · gesture recognition

0 likes · 17 min read

Alibaba Cloud Developer

Jan 3, 2017 · Backend Development

How Alibaba Engineered Real‑Time, Cross‑Device Interaction for the 2016 Double‑11 Live Show

The article details Alibaba's technical innovations for the 2016 Double‑11 live event, covering two‑way audience interaction, time‑offset synchronization, massive real‑time like ranking, AR cross‑screen features, and the custom internet‑director console that together enabled seamless, high‑concurrency, multi‑platform engagement.

AR · Backend Engineering · High concurrency

0 likes · 14 min read
