How Youku Scales Billions of Video Nodes with Real‑Time Graph Databases

Facing billions of video entities and edges, Youku’s engineering team replaced traditional relational stores with a graph‑based knowledge platform, leveraging Alibaba’s Blink streaming engine and Lindorm to enable real‑time, incremental updates, unified UDF logic, and scalable feature computation for search and recommendation.

Alibaba Cloud Developer

Jun 16, 2020

Background

Video content on Youku forms a massive network with billions of vertices and edges, which traditional relational databases handle poorly. A graph data model fits the business scenario better and supports daily updates on the order of hundreds of millions of messages.

Design Overview

The search and recommendation system builds its indexes from data processed offline and nearline. Traditional data warehousing is neither agile enough nor service‑oriented enough for this, which prompted a shift to a knowledge‑graph‑centric platform that centralizes video, program, user, and element data.

Key Modules

1. Feature Store

The feature store has two layers: a primary incremental feature computation layer that ingests real‑time and offline sources, and a secondary layer for algorithmic features such as relevance, ranking, and recall. Data is stored as vertices and edges, and updates are handled via internal graph queries and unified DataAPI calls.
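The two layers can be pictured with a minimal sketch. Everything here is an assumption made for illustration: the field names (`plays`, `completes`), the rule that real-time values override the offline snapshot, and the derived `completion_rate` feature are not taken from Youku's platform.

```python
def merge_primary(offline: dict, realtime: dict) -> dict:
    """Primary layer (assumed rule): real-time values override the offline snapshot."""
    return {**offline, **realtime}


def secondary_features(primary: dict) -> dict:
    """Secondary layer: derive an algorithmic feature from primary features."""
    plays = primary.get("plays", 0)
    completes = primary.get("completes", 0)
    # Guard against division by zero for videos with no plays yet.
    return {"completion_rate": completes / plays if plays else 0.0}


# Hypothetical inputs: an offline snapshot plus a fresher real-time counter.
offline = {"plays": 100, "completes": 40}
realtime = {"plays": 120}

primary = merge_primary(offline, realtime)
derived = secondary_features(primary)
```

Keeping the derived features in a separate step means only the secondary layer has to be recomputed when a primary value changes.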

2. Component Library

To avoid duplicating code across business lines, a component library provides reusable interfaces. Business logic is encapsulated as UDF‑based arithmetic expressions, which simplifies maintenance and promotes reuse.
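One way to read "UDF‑based arithmetic expressions" is a registry of named functions that are composed into nested expressions. The sketch below assumes that reading; the registry, the `udf` decorator, the tuple expression format, and the `ctr` and `weighted_sum` functions are all hypothetical and not the platform's actual interface.

```python
from typing import Any, Callable, Dict

# Hypothetical registry: names and signatures are illustrative only.
UDF_REGISTRY: Dict[str, Callable[..., Any]] = {}


def udf(name: str):
    """Register a function under a name so any business line can reuse it."""
    def decorator(fn):
        UDF_REGISTRY[name] = fn
        return fn
    return decorator


@udf("ctr")
def click_through_rate(clicks: int, impressions: int) -> float:
    # Guard against division by zero for items never shown.
    return clicks / impressions if impressions else 0.0


@udf("weighted_sum")
def weighted_sum(a: float, b: float, wa: float = 0.5, wb: float = 0.5) -> float:
    return wa * a + wb * b


def evaluate(expr: tuple, row: dict):
    """Evaluate a nested expression of the form (udf_name, arg1, arg2, ...).

    String arguments are looked up as field names in `row`, tuples are
    evaluated recursively, and other values are passed through as literals.
    """
    name, *args = expr
    resolved = []
    for arg in args:
        if isinstance(arg, tuple):
            resolved.append(evaluate(arg, row))
        elif isinstance(arg, str):
            resolved.append(row[arg])
        else:
            resolved.append(arg)
    return UDF_REGISTRY[name](*resolved)


row = {"clicks": 30, "impressions": 120, "quality": 0.75}
score = evaluate(("weighted_sum", ("ctr", "clicks", "impressions"), "quality"), row)
```

Because expressions are plain data, a new business line can define a feature by writing a new expression instead of new code.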

3. Trace & Debug

Each message carries a UUID. The Trace & Debug service aggregates data by UUID and entity ID, so developers can follow a message's processing flow across systems.
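The aggregation by UUID and entity ID can be sketched as two in-memory indexes over the same events. The `TraceStore` class, its method names, and the stage names are assumptions for illustration; the real service's storage and API are not described in the source.

```python
import uuid
from collections import defaultdict


class TraceStore:
    """Toy trace aggregator indexed by message UUID and by entity ID."""

    def __init__(self):
        self._by_trace = defaultdict(list)
        self._by_entity = defaultdict(list)

    def record(self, trace_id: str, entity_id: str, stage: str) -> None:
        event = {"trace_id": trace_id, "entity_id": entity_id, "stage": stage}
        # The same event is reachable through either index.
        self._by_trace[trace_id].append(event)
        self._by_entity[entity_id].append(event)

    def by_trace(self, trace_id: str) -> list:
        return list(self._by_trace.get(trace_id, []))

    def by_entity(self, entity_id: str) -> list:
        return list(self._by_entity.get(entity_id, []))


store = TraceStore()
trace_id = str(uuid.uuid4())  # one UUID per message

# Hypothetical stages a message might pass through.
store.record(trace_id, "video:1", "ingest")
store.record(trace_id, "video:1", "feature_compute")
store.record(trace_id, "program:9", "index_update")

stages = [event["stage"] for event in store.by_trace(trace_id)]
```

Looking up by UUID shows one message's path across systems, while looking up by entity ID shows every message that touched a given video or program.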

Technical Details

The computation framework uses Alibaba’s Blink engine, offering stream‑batch integration, automatic checkpointing, failover recovery, and distributed processing. Storage relies on Lindorm, using secondary indexes to store KV and KKV structures for the knowledge graph.
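To make the KV and KKV terms concrete, here is a minimal in-memory sketch. It does not use Lindorm's API; the class names, key formats, and stored fields are assumptions chosen for illustration. A KV table maps one key to a value, while a KKV table maps a primary key plus a secondary key to a value and allows scanning all secondary keys under one primary key.

```python
from collections import defaultdict


class KVTable:
    """key -> value, e.g. one row of properties per vertex."""

    def __init__(self):
        self._rows = {}

    def put(self, key, value):
        self._rows[key] = value

    def get(self, key):
        return self._rows.get(key)


class KKVTable:
    """primary key -> secondary key -> value, e.g. the edges leaving a vertex."""

    def __init__(self):
        self._rows = defaultdict(dict)

    def put(self, pk, sk, value):
        self._rows[pk][sk] = value

    def get(self, pk, sk):
        return self._rows.get(pk, {}).get(sk)

    def scan(self, pk):
        """Return all secondary keys and values stored under one primary key."""
        return dict(self._rows.get(pk, {}))


# Hypothetical data: a program vertex and its edges to two videos.
vertices = KVTable()
vertices.put("program:9", {"title": "Example Show"})

edges = KKVTable()
edges.put("program:9", "video:1", {"episode": 1})
edges.put("program:9", "video:2", {"episode": 2})

episodes = edges.scan("program:9")
```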

1. Knowledge Graph Storage and Organization

The knowledge graph uses a Labeled Property Graph (LPG) model: Lindorm tables represent vertices (videos, programs, persons), and edges are represented via secondary indexes.
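In an LPG, both vertices and edges carry a label and a set of properties. The following sketch models that in memory; the `PropertyGraph` class, the edge labels (`has_video`, `acted_by`), and the sample data are hypothetical, and the dictionary keyed by source and edge label merely stands in for a secondary index.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Vertex:
    id: str
    label: str  # e.g. "video", "program", "person"
    properties: dict = field(default_factory=dict)


@dataclass
class Edge:
    src: str
    dst: str
    label: str
    properties: dict = field(default_factory=dict)


class PropertyGraph:
    def __init__(self):
        self.vertices = {}
        # (source id, edge label) -> edges; stands in for a secondary index.
        self._out = defaultdict(list)

    def add_vertex(self, vertex: Vertex) -> None:
        self.vertices[vertex.id] = vertex

    def add_edge(self, edge: Edge) -> None:
        if edge.src not in self.vertices or edge.dst not in self.vertices:
            raise KeyError("both endpoints must exist before adding an edge")
        self._out[(edge.src, edge.label)].append(edge)

    def neighbors(self, src: str, edge_label: str) -> list:
        """Follow outgoing edges with the given label and return target vertices."""
        return [self.vertices[e.dst] for e in self._out.get((src, edge_label), [])]


graph = PropertyGraph()
graph.add_vertex(Vertex("program:9", "program", {"title": "Example Show"}))
graph.add_vertex(Vertex("video:1", "video", {"episode": 1}))
graph.add_vertex(Vertex("person:3", "person", {"name": "Example Actor"}))
graph.add_edge(Edge("program:9", "video:1", "has_video"))
graph.add_edge(Edge("program:9", "person:3", "acted_by", {"role": "lead"}))

video_ids = [v.id for v in graph.neighbors("program:9", "has_video")]
```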

2. Computation and Update Strategy

The architecture combines full and incremental computation, which reduces query pressure on upstream systems. Updates propagate through cascading messages; MetaQ provides at‑least‑once delivery, and fast‑retry mechanisms handle transient failures.
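At‑least‑once delivery means a consumer may see the same message more than once, so handlers are usually written to be idempotent. The sketch below simulates that with a plain in-memory queue rather than MetaQ's client; the dependency map, the version check used as an idempotency guard, and the simulated transient failure are all assumptions for illustration.

```python
from collections import deque

# Hypothetical dependency map: updating a program cascades to its videos.
DOWNSTREAM = {"program:9": ["video:1", "video:2"]}

store = {}        # entity_id -> latest applied version
attempts = {}     # (entity_id, version) -> number of write attempts
applied_log = []  # order in which updates took effect


def flaky_write(entity_id: str, version: int) -> None:
    """Simulated storage write that fails once for video:2."""
    key = (entity_id, version)
    attempts[key] = attempts.get(key, 0) + 1
    if entity_id == "video:2" and attempts[key] == 1:
        raise TimeoutError("transient failure")


def handle(msg: dict, queue: deque) -> None:
    entity_id, version = msg["entity_id"], msg["version"]
    # Idempotency guard: a redelivered message must not be applied twice.
    if store.get(entity_id, -1) >= version:
        return
    flaky_write(entity_id, version)
    store[entity_id] = version
    applied_log.append(entity_id)
    # Cascade: emit one update message per dependent entity.
    for child in DOWNSTREAM.get(entity_id, []):
        queue.append({"entity_id": child, "version": version})


def run(queue: deque, max_retries: int = 3) -> None:
    while queue:
        msg = queue.popleft()
        for _ in range(max_retries):
            try:
                handle(msg, queue)
                break
            except TimeoutError:
                continue  # fast retry of the same message
        else:
            raise RuntimeError(f"giving up on {msg}")


queue = deque([{"entity_id": "program:9", "version": 1}])
queue.append({"entity_id": "program:9", "version": 1})  # simulated duplicate delivery
run(queue)
```

The duplicate message is dropped by the version check, and the failed write for `video:2` succeeds on the second attempt without the program update being reapplied.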

3. Unified UDF

Business logic is encapsulated in a single UDF implementation, enabling reuse across offline and real‑time pipelines and ensuring consistency.
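The idea can be sketched as one function that both a batch job and a streaming job call. This is plain Python rather than Blink's UDF interface, and `normalize_title` with its whitespace and lowercase rules is a hypothetical piece of business logic.

```python
from typing import Iterable, Iterator


def normalize_title(record: dict) -> dict:
    """Single source of business logic shared by both pipelines."""
    # Collapse runs of whitespace and lowercase the title.
    title = " ".join(record["title"].split()).lower()
    return {**record, "title_norm": title}


def run_batch(records: list) -> list:
    """Offline pipeline: process a complete dataset at once."""
    return [normalize_title(r) for r in records]


def run_stream(records: Iterable[dict]) -> Iterator[dict]:
    """Real-time pipeline: process records one at a time as they arrive."""
    for record in records:
        yield normalize_title(record)


data = [{"id": 1, "title": "  Hello   World "}, {"id": 2, "title": "YOUKU"}]
batch_out = run_batch(data)
stream_out = list(run_stream(iter(data)))
```

Since both pipelines call the same function, a change to the logic cannot make offline and real-time results drift apart.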

Summary & Outlook

The graph‑based feature and index update platform breaks from traditional data‑warehouse modeling and emphasizes knowledge‑centric, business‑centric, and service‑oriented design. It is already applied in Youku search and ticketing scenarios. Future work includes deeper integration of graph neural networks, representation learning, and DSL‑driven business self‑service to further enhance real‑time inference and incremental indexing.


