Tagged articles

Real-time inference

27 articles · Page 1 of 1

Jun 10, 2026 · Artificial Intelligence

OmniVoice: A Zero‑Shot TTS Paradigm Covering 600+ Languages

OmniVoice introduces a single‑stage, diffusion‑style language model that maps text directly to multi‑codebook acoustic tokens, achieving zero‑shot voice cloning for over 600 languages with high intelligibility and a real‑time factor as low as 0.025, making it suitable for large‑scale multilingual deployment.

Acoustic token · Diffusion language model · Multilingual speech synthesis

0 likes · 8 min read

SuanNi

Jun 6, 2026 · Artificial Intelligence

How JoyAI‑Echo Overcomes Forgetting in Minute‑Long Video Generation

JoyAI‑Echo introduces a cross‑modal audio‑visual memory bank, a three‑stage post‑training pipeline, and a Director Agent to enable consistent, high‑quality, real‑time generation of minute‑level videos, achieving up to 7.5× inference speedup and state‑of‑the‑art benchmark scores.

Director Agent · JoyAI-Echo · Real-time inference

0 likes · 13 min read

SuanNi

May 31, 2026 · Artificial Intelligence

How NVIDIA’s Gamma‑World Turns Single‑Agent Models into Multiplayer Experiences

Gamma‑World introduces a multi‑agent world model that solves identity, interaction, and real‑time inference challenges with parameter‑free geometric encoding, sparse hub attention, and teacher‑student distillation, enabling zero‑shot generalization from two to four agents and achieving 24 FPS interactive video generation.

Gamma-World · Real-time inference · Simplex Rotary Agent Encoding

0 likes · 11 min read

DaTaobao Tech

May 25, 2026 · Artificial Intelligence

Scaling to Ten‑Thousand QPS: Lessons from Building a Real‑Time Product‑Domain Agent

The article details how the product team tackled AI‑driven challenges by designing a two‑layer, event‑driven Function‑Centric Agent architecture that unifies workflow orchestration and capability supply, enabling real‑time inference for billions of items, cutting development cycles to one person‑week, and boosting search conversion rates.

AI Agent · AIFunction · Function Calling

0 likes · 29 min read

Machine Heart

May 14, 2026 · Artificial Intelligence

Introducing TTFA: Hong Kong University’s Open‑Source FASTER Gives VLA Models Instant Reaction

The paper identifies real‑time latency as the main obstacle for deploying VLA models on robots, proposes the TTFA metric and the FASTER framework with a Horizon‑Aware Schedule, mixed scheduling and streaming inference, and demonstrates through extensive GPU and task experiments that TTFA and reaction time can be reduced by up to 3× without sacrificing motion quality.

Embodied AI · FASTER · Real-time inference

0 likes · 14 min read

AI Engineering

May 8, 2026 · Artificial Intelligence

How GPT‑Realtime‑2 Leverages GPT‑5‑Level Reasoning to Redefine Voice AI Architecture

OpenAI’s GPT‑Realtime‑2 embeds GPT‑5‑class reasoning into a continuous‑audio loop, achieving 96.6% accuracy on Big Bench Audio, offering adjustable inference intensity with latency from 1.12 s to 2.33 s, a 128 K context window, and demonstrable gains in real‑world call success rates, while prompting industry debate over pricing and competitive impact.

GPT-5 · GPT-Realtime-2 · Latency

0 likes · 5 min read

Machine Heart

Apr 11, 2026 · Artificial Intelligence

How PiLoT Enables Monocular Drones to Navigate 10 km Drift‑Free and Lock onto Targets

PiLoT, a CVPR 2026 Highlight paper, introduces a neural pixel‑to‑3D registration framework that lets a single‑camera UAV achieve drift‑free 6‑DoF pose and real‑time target locking over 10 km without GNSS, running at 25‑30 FPS on an NVIDIA Jetson Orin and outperforming existing hybrid and absolute‑pose methods.

GNSS-denied navigation · PiLoT · Real-time inference

0 likes · 12 min read

Machine Heart

Apr 2, 2026 · Artificial Intelligence

From Tokens to Revenue: Kuaishou’s GR4AD Pioneers Full‑Stack Generative Recommendation for Ads

GR4AD, Kuaishou’s generative recommendation system, redesigns the entire ad pipeline—from tokenizing multimodal ad material to value‑aware learning, lazy decoding, and dynamic beam search—delivering over 4% revenue lift, higher eCPM, and sub‑100 ms latency for more than 400 million users.

Advertising · Real-time inference · generative recommendation

0 likes · 17 min read

AIWalker

Mar 9, 2026 · Artificial Intelligence

How EFSI‑DETR Achieves 188 FPS and Boosts Small‑Object Detection Accuracy by 5.8%

The article dissects EFSI‑DETR, a UAV small‑object detector that combines simulated frequency processing with dynamic semantic enhancement to overcome pixel scarcity, static fusion, and ignored frequency cues, delivering 188 FPS and a 5.8% APₛ gain on VisDrone while remaining lightweight.

DETR · Real-time inference · UAV vision

0 likes · 16 min read

DataFunSummit

Feb 7, 2026 · Big Data

How Flink Enables Real‑Time AI Inference and Agent Construction

This article explains Apache Flink’s stream processing fundamentals, introduces the open‑source Flink Agents framework for building event‑driven AI agents, details Alibaba Cloud’s Flink AI Function for real‑time LLM inference, and showcases demos, architecture, integration patterns, and practical use cases such as VOC analysis, live‑stream analytics, and intelligent operations.

Apache Flink · Big Data · Cloud Computing

0 likes · 24 min read

Old Zhang's AI Learning

Jan 30, 2026 · Artificial Intelligence

Qwen3-ASR: Open‑Source Speech Recognition Supporting 52 Languages and Dialects, Outperforming Whisper

The Qwen3‑ASR series, now open‑sourced by Alibaba, offers three models (1.7B, 0.6B, and a 0.6B forced aligner) that cover 52 languages and 22 Chinese dialects, support streaming and offline inference, achieve an RTF of 0.064 with 2000× realtime throughput, handle singing with background music, and provide detailed deployment guides, benchmarks, and comparisons with other ASR solutions.

Qwen3-ASR · Real-time inference · forced aligner

0 likes · 15 min read

HyperAI Super Neural

Jan 3, 2026 · Artificial Intelligence

Clone a Voice in 5 seconds with One‑Step Generation: Inside Chatterbox‑Turbo’s High‑Fidelity TTS

Resemble AI’s open‑source Chatterbox‑Turbo reduces TTS generation from ten steps to one, enabling high‑sample‑rate, lossless voice cloning from a 5‑10 second reference while supporting emotional control, paralinguistic tags, and embedded watermarking for real‑time applications across chatbots, games, podcasts, and education.

Chatterbox‑Turbo · Real-time inference · Text‑to‑Speech

0 likes · 7 min read

Tencent Architect

Jul 2, 2025 · Artificial Intelligence

How Tencent’s TEG Shannon Lab Dominated the NTIRE 2025 UGC Video Enhancement Challenge

Tencent TEG Shannon Lab won the NTIRE 2025 UGC Video Enhancement competition with a progressive training framework that combines adaptive color enhancement, high‑speed denoising, and temporal stability under bitrate constraints, achieving top subjective scores, significant inference speed‑ups, and successful INT8 quantization for real‑time deployment.

AI video codec · NTIRE2025 · Quantization

0 likes · 18 min read

iQIYI Technical Product Team

Oct 10, 2024 · Artificial Intelligence

Online Deep Learning (ODL) for Real‑Time Advertising Effectiveness: Challenges and Solutions

iQIYI’s minute‑level online deep‑learning framework overcomes stability, timeliness, compatibility, delayed feedback, catastrophic forgetting, and i.i.d. constraints through high‑availability pipelines, TensorFlow Example serialization, rapid P2P model distribution, flexible scheduling, disaster‑recovery rollbacks, PU‑loss adjustment, and knowledge‑distillation, delivering a 6.2% revenue boost.

Advertising · CTR Prediction · Real-time inference

0 likes · 9 min read

Alibaba Cloud Big Data AI Platform

Jun 5, 2023 · Artificial Intelligence

How Alibaba’s DGS Enables Real‑Time GNN Inference on Massive Dynamic Graphs

The Dynamic Graph Sampling (DGS) service, built on GraphLearn, delivers sub‑20 ms latency for real‑time GNN inference on large, constantly evolving graphs by separating storage from computation, using event‑driven pre‑sampling, lazy multi‑hop concatenation, and a publish‑subscribe architecture that scales linearly across distributed workers.

Alibaba Cloud · Graph Neural Networks · GraphLearn

0 likes · 12 min read

Alipay Experience Technology

Nov 28, 2022 · Artificial Intelligence

Why Edge Intelligence Is Shaping the Future of Mobile Apps

This article explains the concept of edge intelligence, its advantages over cloud‑based AI, the technical challenges of deploying AI on mobile devices, Ant Group's development timeline, core technology stack, and future directions for edge‑cloud collaboration.

Real-time inference · ai-optimization · edge AI

0 likes · 10 min read

DataFunSummit

Sep 9, 2022 · Artificial Intelligence

Wuliang: Tencent's Deep Learning Framework for Real‑Time Large‑Scale Recommendation

The presentation by Tencent expert Yuan Yi details the Wuliang deep learning system for recommendation, covering its background, technical challenges such as massive data and real‑time requirements, the parameter‑server based solutions for training and inference, model compression techniques, and continuous online deployment strategies.

Large‑Scale Training · Parameter Server · Real-time inference

0 likes · 14 min read

Youku Technology

Jun 7, 2022 · Artificial Intelligence

Mobile Real-Time Portrait Segmentation for Youku Bullet Comment Passthrough

To enable real‑time bullet‑comment passthrough on Youku’s mobile app, the team built a million‑scale portrait dataset and designed the AirSegNet series—CPU, GPU, and server variants—using VGG‑style nets, edge‑aware losses, and hybrid CPU‑GPU inference, achieving 0.98 IoU and sub‑15 ms latency on most devices.

MNN Framework · Portrait Segmentation · Real-time inference

0 likes · 13 min read

Baidu App Technology

Nov 25, 2021 · Game Development

Building an AI-Powered Object Hunt Game with Paddle.js and PaddleClas

The article details how to create the AI‑driven “Object Hunt Battle” game by processing data, designing and training a PP‑LCNet model with PaddleClas, converting it for Paddle.js, and integrating real‑time WebGL inference on mobile devices, achieving sub‑50 ms latency and encouraging developers to explore further.

AI game development · Paddle.js · PaddleClas

0 likes · 9 min read

DataFunTalk

Sep 28, 2021 · Artificial Intelligence

Graph Modeling and GCN Exploration at GeeTest (极验): Evolution, Offline and Real‑time Solutions

The talk presents an overview of graph neural network development, explains GeeTest's graph modeling research and evolution, and details offline and real‑time GCN solutions, including self‑supervised training, large‑scale handling, and performance comparisons, highlighting practical applications in fraud detection and risk control.

Anomaly Detection · GCN · Graph Modeling

0 likes · 26 min read

DataFunSummit

Mar 9, 2021 · Artificial Intelligence

Weibo Multimodal Content Understanding Service Architecture and GPU Heterogeneous Cluster Solutions

This article details Weibo's multimodal content understanding platform, covering its massive data challenges, heterogeneous model support, standardized pipelines, platformization, workflow architecture, GPU heterogeneous cluster management, resource scheduling, performance optimization, and full‑stack monitoring to achieve stable, low‑latency AI services at scale.

GPU Cluster · Multimodal AI · Real-time inference

0 likes · 18 min read

DataFunTalk

Aug 27, 2020 · Artificial Intelligence

Model Serving in Real-Time: Insights from Alibaba’s User Interest Center

This article explains Alibaba’s User Interest Center approach to real‑time model serving, detailing how it separates offline sequence modeling from lightweight online inference, uses an online interest‑embedding store, and dramatically reduces latency for recommendation models such as DIEN and MIMN.

Alibaba · Embedding · Real-time inference

0 likes · 8 min read

iQIYI Technical Product Team

Jun 12, 2020 · Artificial Intelligence

Deepthought: An End‑to‑End Machine Learning Platform at iQIYI

Deepthought is iQIYI’s end‑to‑end machine‑learning platform that unifies distributed frameworks, decouples pipeline stages, integrates with Tongtian Tower, and offers visual drag‑and‑drop configuration, evolving from a fraud‑detection prototype to a generic system with real‑time inference, automated hyper‑parameter optimization, and support for large‑scale data across anti‑fraud, recommendation, and analytics workloads.

AI platform · AutoML · Data Engineering

0 likes · 13 min read

DataFunTalk

May 8, 2020 · Artificial Intelligence

Distributed Machine Learning Framework GDBT for High‑Dimensional Real‑Time Recommendation Systems

The article explains how 4Paradigm's distributed machine learning framework GDBT tackles the massive data, high‑dimensional features, and real‑time requirements of modern recommendation systems by leveraging heterogeneous computing, parameter servers, RDMA networking, and optimized workloads.

GDBT · Parameter Server · RDMA

0 likes · 18 min read

Tencent Cloud Developer

Mar 6, 2020 · Artificial Intelligence

WeChat "Scan" Object Detection: Mobile AI Model Design, Optimization, and Deployment

The paper presents a lightweight, anchor‑free CenterNet‑based objectness detector for WeChat’s Scan feature, built on a ShuffleNetV2 backbone with enlarged 5×5 depth‑wise convolutions, a streamlined detection head, and a Pyramid Interpolation Module, then quantized, converted to ONNX, and deployed with NCNN to achieve a 436 KB model running in ~15 ms per frame on an iPhone 8 CPU.

CenterNet · Model Optimization · Real-time inference

0 likes · 12 min read

Alibaba Cloud Developer

Dec 20, 2019 · Artificial Intelligence

How AI-Powered Hand Gesture Detection Drove a Double‑11 Celebrity Rock‑Paper‑Scissors Game

This article details how Alibaba leveraged AI-driven hand‑gesture detection and a lightweight SSD‑based object detection model to create an interactive rock‑paper‑scissors game for Double‑11, addressing challenges of undefined gestures, real‑time mobile performance, and data collection, and achieving over 16 million page views and high accuracy.

Real-time inference · SSD · feature pyramid network

0 likes · 22 min read

Alibaba Cloud Developer

Oct 23, 2018 · Artificial Intelligence

How DFSMN Cuts Speech Synthesis Model Size by 75% While Quadrupling Speed

This paper introduces a Deep Feedforward Sequential Memory Network (DFSMN) for statistical parametric speech synthesis that matches BLSTM quality with only a quarter of the model size and four times faster inference, making it ideal for memory‑constrained, real‑time IoT devices.

DFSMN · IoT devices · Real-time inference

0 likes · 10 min read
