How Huya Live Uses Vector Search and Fine‑Ranking to Power Real‑Time Recommendations
This article explains Huya Live's recommendation architecture, covering the business background, system design, vector-retrieval challenges and their solutions with ScaNN, and the fine‑ranking pipeline, while highlighting performance optimizations, scalability, and future directions for its live‑streaming platform.
Business Background
Huya Live’s recommendation scenarios include homepage live streams, square video recommendations, and in‑stream advertising. These scenarios emphasize relationship graphs, keyword relevance, and long‑term value, which sets them apart from typical image or text recommendation tasks and shapes the engineering architecture.
System Architecture
The platform consists of an ingestion layer (transparent transmission, fusion, degradation, deduplication), a profiling layer (long‑term, short‑term, real‑time user and streamer features), and downstream recall, ranking, and re‑ranking modules supported by platform services.
A key difference is the high‑frequency deduplication required for streamers whose tags change rapidly, which demands low‑latency processing on always‑fresh data.
Vector Retrieval
Background
Inspired by Google’s 2016 YouTube recommendation work, Huya moved from brute‑force retrieval to vector‑based approximate nearest‑neighbor (ANN) search, evaluating Faiss and ScaNN and choosing ScaNN for its algorithmic optimizations.
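As a rough illustration of what this ANN layer looks like, here is a minimal ScaNN sketch. The corpus size, embedding dimensionality, tree and quantization parameters, and variable names are placeholder assumptions for demonstration, not Huya's production settings.

```python
import numpy as np
import scann  # pip install scann

# Toy corpus: 100k item embeddings, 128-dim, L2-normalized so that
# dot product equals cosine similarity.
corpus = np.random.rand(100_000, 128).astype(np.float32)
corpus /= np.linalg.norm(corpus, axis=1, keepdims=True)

# Build an ANN searcher: partition the corpus into leaves (tree),
# score candidates with asymmetric hashing, then re-rank the top
# candidates with exact dot products.
searcher = (
    scann.scann_ops_pybind.builder(corpus, 50, "dot_product")
    .tree(num_leaves=1000, num_leaves_to_search=50, training_sample_size=50_000)
    .score_ah(2, anisotropic_quantization_threshold=0.2)
    .reorder(200)
    .build()
)

# Batch query with user embeddings; returns candidate indices and scores.
user_vectors = np.random.rand(32, 128).astype(np.float32)
neighbors, distances = searcher.search_batched(user_vectors)
print(neighbors.shape)  # (32, 50)
```

The trade-off compared with brute force is the usual ANN one: a small recall loss in exchange for orders-of-magnitude lower query latency.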
Technical Challenges
Need for high‑throughput, low‑latency, highly available service in production.
Fast data updates and fault tolerance to meet real‑time business needs.
Efficient data building while maintaining online service quality.
Architecture Implementation
A read‑write‑separated, file‑based design was adopted:
Index Builder: Generates vectors and builds index files in .npy format for compactness and easy debugging (a minimal sketch follows after this list).
Distribution: Uses Alibaba’s open‑source Dragonfly for P2P file distribution.
Online Server: Split into a retrieval engine (supporting both ANN and brute‑force search) and an operator module, both accessed via an SDK.
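A hedged sketch of what the index-builder output might look like follows; the directory layout, file names, and helper functions are assumptions for illustration, grounded only in the article's mention of .npy files and timestamp-versioned updates.

```python
import time
from pathlib import Path

import numpy as np

def build_index_files(item_ids, embeddings, out_root="/data/ann_index"):
    """Write one timestamp-versioned index build as .npy files.

    .npy is compact and trivially inspectable with numpy, which is what
    makes debugging a bad build straightforward.
    """
    version = time.strftime("%Y%m%d%H%M%S")
    out_dir = Path(out_root) / version
    out_dir.mkdir(parents=True, exist_ok=True)
    np.save(out_dir / "ids.npy", np.asarray(item_ids, dtype=np.int64))
    np.save(out_dir / "vectors.npy", np.asarray(embeddings, dtype=np.float32))
    # A marker file lets the distributor / online server detect a complete build.
    (out_dir / "_SUCCESS").touch()
    return str(out_dir)

def load_index_files(version_dir):
    """Load one index version back into memory on the online server side."""
    ids = np.load(Path(version_dir) / "ids.npy")
    vectors = np.load(Path(version_dir) / "vectors.npy")
    return ids, vectors
```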
The retrieval engine employs double‑buffered lock‑free loading, batch queries, in‑memory computation, LRU caching, and instruction‑set optimizations to achieve high throughput and low latency. Data updates are versioned by timestamp, and multiple index versions can be loaded, with a new version picked up within about a minute.
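The double-buffer loading pattern can be sketched as follows. This is a simplified Python illustration of the idea (readers always hit the currently active index while the loader prepares the standby copy and publishes it with a single reference swap), not Huya's actual C++ engine.

```python
import threading

class DoubleBufferedIndex:
    """Readers always see a complete index; the loader builds a new version
    off to the side and publishes it with one reference swap, so queries
    never block on index updates."""

    def __init__(self, initial_index):
        self._active = initial_index          # what query threads read
        self._swap_lock = threading.Lock()    # serializes writers only

    def search(self, query, top_k=50):
        # Grab a local reference first; even if a swap happens mid-query,
        # this call keeps using the consistent snapshot it started with.
        index = self._active
        return index.search(query, top_k)

    def reload(self, build_new_index):
        # Build the standby buffer without holding any lock that readers need.
        new_index = build_new_index()
        with self._swap_lock:
            old, self._active = self._active, new_index
        # `old` can be released once in-flight queries drain.
        return old
```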
Fine‑Ranking Pipeline
Data Flow
The pipeline runs in three stages: offline training, online scoring, and feature processing. Feature services cache user profiles with an LRU cache plus fallback strategies; streamer profiles use localized caching with double‑buffering to absorb the massive read amplification.
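A hedged sketch of the user-profile lookup path is shown below: an LRU cache in front of a remote profile store, with a default-profile fallback so scoring never blocks on missing features. The remote client, its fetch signature, cache size, and field names are assumptions for illustration.

```python
from cachetools import LRUCache  # pip install cachetools

DEFAULT_PROFILE = {"age_bucket": "unknown", "recent_categories": []}

class UserProfileService:
    def __init__(self, remote_client, cache_size=100_000):
        self._remote = remote_client          # hypothetical RPC client to the profile store
        self._cache = LRUCache(maxsize=cache_size)

    def get_profile(self, user_id):
        # 1. Serve hot users straight from the local LRU cache.
        if user_id in self._cache:
            return self._cache[user_id]
        # 2. On a miss, fetch from the remote profile store.
        try:
            profile = self._remote.fetch(user_id)
        except Exception:
            profile = None
        # 3. Fall back to a default profile so ranking degrades gracefully.
        if profile is None:
            return DEFAULT_PROFILE
        self._cache[user_id] = profile
        return profile
```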
Feature Engineering
Features are serialized to TFRecord using Protocol Buffers for schema validation. Offline processing calls the native feature extractors via JNI so that offline features stay consistent with the online logic.
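Since the article mentions TFRecord with protobuf schemas, a minimal serialization sketch using tf.train.Example may help; the feature names and shapes here are made up for illustration, not Huya's real schema.

```python
import tensorflow as tf

def serialize_sample(user_feats, streamer_ids, label):
    """Pack one training sample into a tf.train.Example protobuf."""
    example = tf.train.Example(features=tf.train.Features(feature={
        "user_dense": tf.train.Feature(
            float_list=tf.train.FloatList(value=user_feats)),
        "streamer_ids": tf.train.Feature(
            int64_list=tf.train.Int64List(value=streamer_ids)),
        "label": tf.train.Feature(
            float_list=tf.train.FloatList(value=[label])),
    }))
    return example.SerializeToString()

# Write a couple of samples to a TFRecord file.
with tf.io.TFRecordWriter("train.tfrecord") as writer:
    writer.write(serialize_sample([0.1, 0.5, 0.3], [1001, 2042], 1.0))
    writer.write(serialize_sample([0.7, 0.2, 0.9], [3015], 0.0))

# Parsing uses the same schema, which is what the protobuf validation buys.
feature_spec = {
    "user_dense": tf.io.FixedLenFeature([3], tf.float32),
    "streamer_ids": tf.io.VarLenFeature(tf.int64),
    "label": tf.io.FixedLenFeature([1], tf.float32),
}
dataset = tf.data.TFRecordDataset("train.tfrecord").map(
    lambda rec: tf.io.parse_single_example(rec, feature_spec))
```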
Inference Optimizations
Integrated inference service as a dynamic library compatible with the company’s gRPC ecosystem.
Applied community‑standard model warm‑up and dedicated thread pools (a warm‑up sketch appears after this list).
Bandwidth‑limited model distribution during peak traffic.
Moved user‑feature copying to the inference service, reducing upstream bandwidth by over 50%.
These optimizations raised service availability to four nines and cut data‑transmission bandwidth by more than half.
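The "community‑standard model warm‑up" most likely refers to TensorFlow Serving's SavedModel warm‑up mechanism, given the TFRecord‑based stack. Assuming that serving setup, a warm‑up file can be generated roughly as sketched below; the model name, signature, input name, and shapes are placeholders.

```python
import tensorflow as tf
from tensorflow_serving.apis import model_pb2, predict_pb2, prediction_log_pb2

# TensorFlow Serving replays records from
# <model_dir>/<version>/assets.extra/tf_serving_warmup_requests at load time,
# so the first real queries do not pay lazy-initialization latency.
def write_warmup_requests(path, model_name="ranker", num_requests=10):
    with tf.io.TFRecordWriter(path) as writer:
        for _ in range(num_requests):
            request = predict_pb2.PredictRequest(
                model_spec=model_pb2.ModelSpec(
                    name=model_name, signature_name="serving_default"),
                inputs={
                    # Placeholder input tensor; real warm-up should use
                    # representative feature payloads.
                    "user_dense": tf.make_tensor_proto(
                        [[0.0] * 128], dtype=tf.float32),
                },
            )
            log = prediction_log_pb2.PredictionLog(
                predict_log=prediction_log_pb2.PredictLog(request=request))
            writer.write(log.SerializeToString())

write_warmup_requests("assets.extra/tf_serving_warmup_requests")
```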
Scalability and Extensibility
Scalability is achieved across three dimensions:
Standard data‑read APIs with customizable operator logic.
Compute‑storage co‑location with generic data‑management abstractions.
File‑level distribution for heterogeneous data sources.
Summary and Outlook
The current architecture meets high‑throughput, low‑latency, and high‑availability requirements, but further optimizations are planned to keep pace with evolving business demands and to continuously improve iteration efficiency.