Huya Live Streaming Recommendation Architecture: Business Background, System Design, Vector Retrieval, and Ranking
This article presents a comprehensive overview of Huya's live‑streaming recommendation system, covering business background, overall architecture, vector‑based retrieval, detailed ranking pipeline, technical challenges, deployment strategies, scalability, and future outlook.
Guest Speaker: Li Cha (Huya Live)
Editor: Luo Zhuang (Soul)
Platform: DataFunTalk
Introduction: Hello, I am Li Cha from Huya Live's recommendation engineering team. Huya's live‑stream recommendation focuses on top streamers, emphasizing relationship graphs, textual cues, and long‑term value, which leads to distinct engineering requirements compared with other recommendation scenarios.
The talk covers the following topics:
Business Background
System Architecture
Vector Retrieval
Ranking
Summary and Outlook
01
Business Background
Huya's recommendation scenarios include homepage live recommendations, square video recommendations, and live‑room ad recommendations. Live streaming is a top‑streamer‑centric scenario that values relationship chains, textual cues, and long‑term value, resulting in unique business demands that are reflected in the system architecture.
02
System Architecture
Huya's recommendation pipeline follows the typical industry architecture with some customizations. The ingestion layer handles transparent passing, fusion, degradation, and deduplication. The profiling layer provides long‑term, short‑term, and real‑time user and streamer features. Downstream modules include recall, ranking, re‑ranking, and supporting platform services.
Compared with typical image/video recommendation, Huya requires higher‑frequency deduplication because streamer attributes can change rapidly (e.g., a gamer switching to a talent stream). This imposes stricter timeliness requirements on the deduplication process.
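To make the timeliness point concrete, here is a minimal sketch (hypothetical names, not Huya's implementation) of a deduplication filter that keys on the streamer's current attributes and expires quickly, so a streamer who switches categories can re-enter the candidate pool:

```python
import time
from typing import Optional

class TtlDedup:
    """Deduplication filter with a short TTL; the key includes the
    streamer's current category so an attribute change invalidates
    the old dedup entry."""

    def __init__(self, ttl_seconds: float = 60.0):
        self.ttl = ttl_seconds
        self._seen = {}  # (user_id, streamer_id, category) -> expiry timestamp

    def should_filter(self, user_id: str, streamer_id: str, category: str,
                      now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        key = (user_id, streamer_id, category)
        expiry = self._seen.get(key)
        if expiry is not None and expiry > now:
            return True   # shown recently with the same attributes
        self._seen[key] = now + self.ttl
        return False
```

A streamer who switches from a game category to a talent stream produces a new key, so the earlier impression no longer suppresses them.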
Later sections will dive into vector retrieval and ranking, which cover most of the technical depth of the recommendation system.
03
Vector Retrieval
1. Background
In 2016 Google published the vector‑based retrieval architecture used in YouTube recommendation and search, showing significant gains. Many modern recommendation systems now improve business metrics by optimizing embeddings.
Huya initially used brute‑force retrieval due to a small number of streamers. As the platform grew, brute‑force became infeasible, prompting a shift to vector retrieval at the beginning of this year.
We evaluated Facebook's open‑source Faiss and Google's open‑source ScaNN; ScaNN's algorithmic optimizations (notably its anisotropic vector quantization) better matched our needs.
2. Technical Challenges
Production requires a high‑throughput, low‑latency, highly available system.
Data must be updated quickly to meet vector‑retrieval business needs, and the system must tolerate failures.
Efficient data‑building pipelines are needed to guarantee service quality.
3. Architecture Implementation
We designed a read‑write‑separated, file‑based architecture:
The index builder produces vector embeddings and writes them to binary .npy files, reducing size and simplifying debugging. The builder interacts with models via an SDK and can be used independently for testing.
File distribution uses Alibaba's open‑source Dragonfly for P2P delivery, integrating with the company's file system.
The online server is split into a retrieval engine and an operator module, both accessed via SDK.
Retrieval Engine: Supports ANN and brute‑force search, with load/unload and double‑buffer switching for stability.
Operator Module: Designed with a generic input‑output interface for easy extension and reuse.
Deployment is managed through a control platform, improving iteration speed.
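A toy illustration of the builder's binary .npy output described above (file names and dimensions are invented for the example):

```python
import os
import tempfile
import numpy as np

# Hypothetical builder output step: embeddings are written as binary .npy
# files, which are compact and trivially inspectable with np.load when
# debugging. Real file names, ids, and dimensions would differ.
out_dir = tempfile.mkdtemp()
ids = np.arange(101, 104, dtype=np.int64)               # streamer ids
vecs = np.random.rand(3, 8).astype(np.float32)          # 8-dim embeddings
np.save(os.path.join(out_dir, "ids.npy"), ids)
np.save(os.path.join(out_dir, "vecs.npy"), vecs)

# The online server can memory-map the file instead of copying it into RAM.
loaded = np.load(os.path.join(out_dir, "vecs.npy"), mmap_mode="r")
```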
Online queries use a lock‑free double‑buffered index load, batch processing, pure‑memory computation, LRU caching, and CPU instruction optimizations to achieve high throughput and low latency. Builder and server are decoupled, and the service is stateless for rapid scaling.
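The lock‑free double‑buffered index switch can be sketched in a few lines (a simplification: in CPython, reference assignment is atomic, so a reader sees either the old or the new index, never a partial one):

```python
class DoubleBufferedIndex:
    """Sketch of double-buffer switching: queries read a complete index
    through a single reference while a new version is built off to the
    side and then swapped in with one atomic assignment."""

    def __init__(self, initial_index):
        self._active = initial_index

    def search(self, query):
        index = self._active      # stable snapshot for this query
        return index.get(query)

    def swap(self, new_index):
        self._active = new_index  # readers see old or new, never a mix
```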
Data updates are fast: a 2‑million‑record dataset can be loaded into memory within 5 seconds and distributed in 10 seconds. Files are versioned by timestamp, supporting multi‑version online loading with validation and alerting, completing the whole update cycle within a minute.
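A minimal sketch of the timestamp‑versioned loading with validation (the validation callback is a stand‑in for the real checks and alerting):

```python
def pick_latest_valid(versions, is_valid):
    """Given timestamp-named index versions, choose the newest one that
    passes validation, falling back to older versions if it fails."""
    for name in sorted(versions, reverse=True):   # newest timestamp first
        if is_valid(name):
            return name
    return None   # nothing valid: keep serving the currently loaded index
```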
Offline builder optimizations include a semi‑automatic hyper‑parameter search tool, distributed locking for task acquisition, multi‑process parallel building, and extensive metric validation (latency, recall, etc.). Currently, Top‑20 ANN recall reaches 0.99, build tasks succeed more than 90% of the time, and three builder nodes can complete 50+ tasks within minutes.
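The distributed lock for task acquisition can be approximated, for illustration, with an atomic exclusive‑create on a shared filesystem (production systems more commonly use ZooKeeper, etcd, or Redis; this is only a stand‑in):

```python
import os
import tempfile

class BuildTaskLock:
    """Hypothetical stand-in for the distributed lock builders use to
    claim a build task: os.O_EXCL makes file creation atomic, so only
    one builder can hold the lock at a time."""

    def __init__(self, path: str):
        self.path = path
        self.fd = None

    def try_acquire(self) -> bool:
        try:
            self.fd = os.open(self.path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            return True
        except FileExistsError:
            return False   # another builder already owns this task

    def release(self):
        if self.fd is not None:
            os.close(self.fd)
            os.remove(self.path)
            self.fd = None
```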
Scalability is achieved at service, data, and engine layers: stateless services, distributed lock‑based builder, configurable data shards, standard data‑read APIs, compute‑storage integration, and heterogeneous file distribution.
04
Ranking
1. Data Flow
The ranking pipeline consists of offline training, online scoring, and feature processing. Feature processing extracts long‑term, short‑term, and real‑time user/streamer interests. User profile service uses LRU caching and graceful degradation; streamer profile service employs local double‑buffer caching to handle high read amplification.
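A simplified sketch of the graceful‑degradation path in the user‑profile service (a plain dict stands in for the bounded LRU cache; names are hypothetical):

```python
class DegradingProfileService:
    """On backend failure, serve the last cached profile (or an empty
    default) instead of failing the ranking request."""

    def __init__(self, backend, default=None):
        self.backend = backend
        self.cache = {}              # a real service would bound this (LRU)
        self.default = default or {}

    def get(self, user_id):
        try:
            profile = self.backend(user_id)
            self.cache[user_id] = profile
            return profile
        except Exception:
            # Degrade gracefully: stale data beats a failed request.
            return self.cache.get(user_id, self.default)
```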
2. Features
Features are stored in clear‑text TFRecord files with Protocol Buffers schemas for validation. Offline feature extraction uses JNI to call the same extractor as the online path, ensuring consistency.
3. Inference Optimizations
Integrated the gRPC‑based inference service as a dynamic library to fit the company’s ecosystem.
Applied common community optimizations: model warm‑up and dedicated thread pools.
Bandwidth throttling during peak periods to control model download traffic.
Moved user‑feature copying from the client side to the inference service, reducing bandwidth by over 50 %.
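To see why moving the user‑feature copy server‑side saves so much bandwidth, count the floats the client must send per request (the dimensions below are illustrative, not Huya's actual feature sizes):

```python
def payload_floats(user_dim: int, item_dims: list, copy_on_client: bool) -> int:
    """Rough bandwidth proxy: how many feature floats the client sends.
    With client-side copying the user feature is duplicated once per
    candidate; with server-side copying it is sent exactly once and the
    inference service tiles it across the batch."""
    user_cost = user_dim * (len(item_dims) if copy_on_client else 1)
    return user_cost + sum(item_dims)

# 200 candidates, 100-dim user feature, 10-dim item features (made-up sizes)
before = payload_floats(100, [10] * 200, copy_on_client=True)
after = payload_floats(100, [10] * 200, copy_on_client=False)
```

With these made‑up sizes the request shrinks from 22,000 to 2,100 floats, consistent with the 50%+ saving reported above.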
After these optimizations, the ranking service achieves four‑nines (99.99%) availability and saves more than 50% of data‑transfer bandwidth.
05
Summary and Outlook
We conclude with a brief outlook: the architecture still has many optimization opportunities. We will continue to follow business trends, refine the platform, and improve iteration efficiency.
Thank you for listening.
About Us:
DataFun focuses on sharing and exchange around big data and AI technology. Founded in 2017, it has held over 100 offline and 100 online salons, forums, and conferences in Beijing, Shanghai, Shenzhen, Hangzhou, and other cities, inviting nearly 1,000 experts and scholars. Its WeChat public account DataFunTalk has published over 500 original articles, with millions of reads and over 130,000 followers.