How Huya Live Uses Vector Search and Fine‑Ranking to Power Real‑Time Recommendations
This article explains Huya Live's recommendation architecture, covering the business background, system design, vector-retrieval challenges and their solutions with ScaNN, and the fine‑ranking pipeline, while highlighting performance optimizations, scalability, and future directions for its live‑streaming platform.
Business Background
Huya Live’s recommendation scenarios include homepage live streams, square video recommendations, and in‑stream advertising. These scenarios emphasize relationship graphs, keyword relevance, and long‑term value, which sets them apart from typical image or text recommendation tasks and shapes the engineering architecture.
System Architecture
The platform consists of an ingestion layer (transparent transmission, fusion, degradation, deduplication), a profiling layer (long‑term, short‑term, real‑time user and streamer features), and downstream recall, ranking, and re‑ranking modules supported by platform services.
A key difference is the high‑frequency deduplication required for streamers whose tags change rapidly, which demands low‑latency processing on always‑fresh data.
Vector Retrieval
Background
Inspired by Google’s 2016 YouTube recommendation work, Huya moved from brute‑force retrieval to vector‑based approximate nearest‑neighbor (ANN) search, evaluating Faiss and ScaNN and choosing ScaNN for its algorithmic optimizations.
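As a rough illustration of what this ANN layer looks like, here is a minimal ScaNN sketch. The corpus size, embedding dimensionality, tree and quantization parameters, and variable names are placeholder assumptions for demonstration, not Huya's production settings.

```python
import numpy as np
import scann  # pip install scann

# Toy corpus: 100k item embeddings, 128-dim, L2-normalized so that
# dot product equals cosine similarity.
corpus = np.random.rand(100_000, 128).astype(np.float32)
corpus /= np.linalg.norm(corpus, axis=1, keepdims=True)

# Build an ANN searcher: partition the corpus into leaves (tree),
# score candidates with asymmetric hashing, then re-rank the top
# candidates with exact dot products.
searcher = (
    scann.scann_ops_pybind.builder(corpus, 50, "dot_product")
    .tree(num_leaves=1000, num_leaves_to_search=50, training_sample_size=50_000)
    .score_ah(2, anisotropic_quantization_threshold=0.2)
    .reorder(200)
    .build()
)

# Batch query with user embeddings; returns candidate indices and scores.
user_vectors = np.random.rand(32, 128).astype(np.float32)
neighbors, distances = searcher.search_batched(user_vectors)
print(neighbors.shape)  # (32, 50)
```

The trade-off compared with brute force is the usual ANN one: a small recall loss in exchange for orders-of-magnitude lower query latency.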
Technical Challenges
Need for high‑throughput, low‑latency, highly available service in production.
Fast data updates and fault tolerance to meet real‑time business needs.
Efficient data building while maintaining online service quality.
Architecture Implementation
A read‑write‑separated, file‑based design was adopted:
Index Builder: Generates vectors and builds index files in .npy format for compactness and easy debugging (a minimal sketch follows after this list).
Distribution: Uses Alibaba’s open‑source Dragonfly for P2P file distribution.
Online Server: Split into a retrieval engine (supporting both ANN and brute‑force search) and an operator module, both accessed via an SDK.
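A hedged sketch of what the index-builder output might look like follows; the directory layout, file names, and helper functions are assumptions for illustration, grounded only in the article's mention of .npy files and timestamp-versioned updates.

```python
import time
from pathlib import Path

import numpy as np

def build_index_files(item_ids, embeddings, out_root="/data/ann_index"):
    """Write one timestamp-versioned index build as .npy files.

    .npy is compact and trivially inspectable with numpy, which is what
    makes debugging a bad build straightforward.
    """
    version = time.strftime("%Y%m%d%H%M%S")
    out_dir = Path(out_root) / version
    out_dir.mkdir(parents=True, exist_ok=True)
    np.save(out_dir / "ids.npy", np.asarray(item_ids, dtype=np.int64))
    np.save(out_dir / "vectors.npy", np.asarray(embeddings, dtype=np.float32))
    # A marker file lets the distributor / online server detect a complete build.
    (out_dir / "_SUCCESS").touch()
    return str(out_dir)

def load_index_files(version_dir):
    """Load one index version back into memory on the online server side."""
    ids = np.load(Path(version_dir) / "ids.npy")
    vectors = np.load(Path(version_dir) / "vectors.npy")
    return ids, vectors
```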
The retrieval engine employs double‑buffered lock‑free loading, batch queries, in‑memory computation, LRU caching, and instruction‑set optimizations to achieve high throughput and low latency. Data updates are versioned by timestamp, and multiple index versions can be loaded, with a new version picked up within about a minute.
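The double-buffer loading pattern can be sketched as follows. This is a simplified Python illustration of the idea (readers always hit the currently active index while the loader prepares the standby copy and publishes it with a single reference swap), not Huya's actual C++ engine.

```python
import threading

class DoubleBufferedIndex:
    """Readers always see a complete index; the loader builds a new version
    off to the side and publishes it with one reference swap, so queries
    never block on index updates."""

    def __init__(self, initial_index):
        self._active = initial_index          # what query threads read
        self._swap_lock = threading.Lock()    # serializes writers only

    def search(self, query, top_k=50):
        # Grab a local reference first; even if a swap happens mid-query,
        # this call keeps using the consistent snapshot it started with.
        index = self._active
        return index.search(query, top_k)

    def reload(self, build_new_index):
        # Build the standby buffer without holding any lock that readers need.
        new_index = build_new_index()
        with self._swap_lock:
            old, self._active = self._active, new_index
        # `old` can be released once in-flight queries drain.
        return old
```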
Fine‑Ranking Pipeline
Data Flow
The pipeline runs in three stages: offline training, online scoring, and feature processing. Feature services cache user profiles with an LRU cache plus fallback strategies; streamer profiles use localized caching with double‑buffering to absorb the massive read amplification.
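A hedged sketch of the user-profile lookup path is shown below: an LRU cache in front of a remote profile store, with a default-profile fallback so scoring never blocks on missing features. The remote client, its fetch signature, cache size, and field names are assumptions for illustration.

```python
from cachetools import LRUCache  # pip install cachetools

DEFAULT_PROFILE = {"age_bucket": "unknown", "recent_categories": []}

class UserProfileService:
    def __init__(self, remote_client, cache_size=100_000):
        self._remote = remote_client          # hypothetical RPC client to the profile store
        self._cache = LRUCache(maxsize=cache_size)

    def get_profile(self, user_id):
        # 1. Serve hot users straight from the local LRU cache.
        if user_id in self._cache:
            return self._cache[user_id]
        # 2. On a miss, fetch from the remote profile store.
        try:
            profile = self._remote.fetch(user_id)
        except Exception:
            profile = None
        # 3. Fall back to a default profile so ranking degrades gracefully.
        if profile is None:
            return DEFAULT_PROFILE
        self._cache[user_id] = profile
        return profile
```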
Feature Engineering
Features are serialized to TFRecord using Protocol Buffers for schema validation. Offline processing calls the native feature extractors via JNI so that offline features stay consistent with the online logic.
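Since the article mentions TFRecord with protobuf schemas, a minimal serialization sketch using tf.train.Example may help; the feature names and shapes here are made up for illustration, not Huya's real schema.

```python
import tensorflow as tf

def serialize_sample(user_feats, streamer_ids, label):
    """Pack one training sample into a tf.train.Example protobuf."""
    example = tf.train.Example(features=tf.train.Features(feature={
        "user_dense": tf.train.Feature(
            float_list=tf.train.FloatList(value=user_feats)),
        "streamer_ids": tf.train.Feature(
            int64_list=tf.train.Int64List(value=streamer_ids)),
        "label": tf.train.Feature(
            float_list=tf.train.FloatList(value=[label])),
    }))
    return example.SerializeToString()

# Write a couple of samples to a TFRecord file.
with tf.io.TFRecordWriter("train.tfrecord") as writer:
    writer.write(serialize_sample([0.1, 0.5, 0.3], [1001, 2042], 1.0))
    writer.write(serialize_sample([0.7, 0.2, 0.9], [3015], 0.0))

# Parsing uses the same schema, which is what the protobuf validation buys.
feature_spec = {
    "user_dense": tf.io.FixedLenFeature([3], tf.float32),
    "streamer_ids": tf.io.VarLenFeature(tf.int64),
    "label": tf.io.FixedLenFeature([1], tf.float32),
}
dataset = tf.data.TFRecordDataset("train.tfrecord").map(
    lambda rec: tf.io.parse_single_example(rec, feature_spec))
```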
Inference Optimizations
Integrated inference service as a dynamic library compatible with the company’s gRPC ecosystem.
Applied community‑standard model warm‑up and dedicated thread pools (a warm‑up sketch appears after this list).
Bandwidth‑limited model distribution during peak traffic.
Moved user‑feature copying to the inference service, reducing upstream bandwidth by over 50%.
These optimizations raised service availability to four nines and cut data‑transmission bandwidth by more than half.
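The "community‑standard model warm‑up" most likely refers to TensorFlow Serving's SavedModel warm‑up mechanism, given the TFRecord‑based stack. Assuming that serving setup, a warm‑up file can be generated roughly as sketched below; the model name, signature, input name, and shapes are placeholders.

```python
import tensorflow as tf
from tensorflow_serving.apis import model_pb2, predict_pb2, prediction_log_pb2

# TensorFlow Serving replays records from
# <model_dir>/<version>/assets.extra/tf_serving_warmup_requests at load time,
# so the first real queries do not pay lazy-initialization latency.
def write_warmup_requests(path, model_name="ranker", num_requests=10):
    with tf.io.TFRecordWriter(path) as writer:
        for _ in range(num_requests):
            request = predict_pb2.PredictRequest(
                model_spec=model_pb2.ModelSpec(
                    name=model_name, signature_name="serving_default"),
                inputs={
                    # Placeholder input tensor; real warm-up should use
                    # representative feature payloads.
                    "user_dense": tf.make_tensor_proto(
                        [[0.0] * 128], dtype=tf.float32),
                },
            )
            log = prediction_log_pb2.PredictionLog(
                predict_log=prediction_log_pb2.PredictLog(request=request))
            writer.write(log.SerializeToString())

write_warmup_requests("assets.extra/tf_serving_warmup_requests")
```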
Scalability and Extensibility
Scalability is achieved across three dimensions:
Standard data‑read APIs with customizable operator logic.
Compute‑storage co‑location with generic data‑management abstractions.
File‑level distribution for heterogeneous data sources.
Summary and Outlook
The current architecture meets high‑throughput, low‑latency, and high‑availability requirements, but further optimizations are planned to keep pace with evolving business demands and to continuously improve iteration efficiency.