Graph‑Based Real‑Time Content Update Architecture at Youku: Challenges, Design, and Practice
This technical presentation explains how Youku tackles the massive, real‑time update problem of video‑content graphs by adopting a graph‑database architecture, sub‑graph partitioning, schema‑driven logical views, and Flink‑based pipelines to achieve second‑level updates for billions of entities and attributes.
The session, presented by Ao Xiang (Alibaba Entertainment Technical Expert) and organized by DataFunTalk, introduces the difficulty of real‑time aggregation over social/media relationship entities and the massive graph structure of Youku's video content.
Traditional relational aggregation is infeasible: it requires cascaded multi‑table queries, and at a scale of billions of vertices and hundreds of billions of edges these joins cause severe performance bottlenecks and high programming complexity.
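The contrast can be sketched with a toy example (the table and entity names are illustrative, not Youku's actual schema): a relational model needs one cascaded lookup per hop, while a graph adjacency structure answers the same aggregation by direct neighbor traversal.

```python
# Toy illustration: aggregate "all videos of every actor in a show".

# Relational style: separate "tables" joined by foreign keys.
show_actor = {"show1": ["actorA", "actorB"]}               # show  -> actors
actor_video = {"actorA": ["v1", "v2"], "actorB": ["v3"]}   # actor -> videos

def relational_aggregate(show_id):
    # Each hop is another join; an aggregation of depth k needs k cascaded queries.
    videos = []
    for actor in show_actor.get(show_id, []):       # join #1: show -> actor
        videos.extend(actor_video.get(actor, []))   # join #2: actor -> video
    return videos

# Graph style: one adjacency structure, one traversal, no joins.
adjacency = {
    "show1": ["actorA", "actorB"],
    "actorA": ["v1", "v2"],
    "actorB": ["v3"],
}

def graph_aggregate(start, depth=2):
    frontier = [start]
    for _ in range(depth):
        frontier = [n for v in frontier for n in adjacency.get(v, [])]
    return frontier
```

At toy scale both run instantly; the point is that the join count in the relational version grows with traversal depth and table size, which is what breaks down at hundreds of billions of edges.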
The core of the solution is a graph architecture that leverages knowledge graphs and graph databases. Two modeling approaches are discussed: RDF (triples) and LPG (Labeled Property Graph). LPG is chosen for its industrial support and ability to uniquely identify vertices, making it suitable for large‑scale, real‑time updates.
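The two modeling options can be sketched side by side (all identifiers here are illustrative). In RDF, every fact is a subject‑predicate‑object triple; in LPG, vertices carry a unique id, a label, and a property map, so a point update touches exactly one vertex record.

```python
# RDF: attributes and relationships are all flattened into one triple store.
rdf_triples = [
    ("video:1", "title", "Some Show E01"),
    ("video:1", "belongs_to", "show:9"),
    ("show:9", "name", "Some Show"),
]

# LPG: vertices are uniquely identified records with labels and property
# maps; edges are first-class objects. This is the model the talk adopts,
# since in-place point updates suit large-scale real-time pipelines.
vertices = {
    "video:1": {"label": "Video", "props": {"title": "Some Show E01"}},
    "show:9":  {"label": "Show",  "props": {"name": "Some Show"}},
}
edges = [
    {"src": "video:1", "dst": "show:9", "label": "BELONGS_TO", "props": {}},
]

def update_property(vertex_id, key, value):
    # A point update touches exactly one vertex record -- no triple rewrite.
    vertices[vertex_id]["props"][key] = value

update_property("video:1", "title", "Some Show E01 (4K)")
```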
Sub‑graph partitioning (edge‑cut or vertex‑cut) is used to split the hundred‑billion‑scale graph into manageable pieces, balancing load and minimizing cross‑edge cuts.
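A minimal edge‑cut sketch (hash placement chosen for illustration; a production partitioner would balance load while actively minimizing the cut): vertices are assigned to partitions, and any edge whose endpoints land in different partitions becomes a cross‑partition "cut" edge that costs a remote hop.

```python
NUM_PARTITIONS = 3

def partition_of(vertex_id: str) -> int:
    # Deterministic hash so every worker computes the same assignment.
    return sum(vertex_id.encode()) % NUM_PARTITIONS

def count_cut_edges(edges):
    # Each cut edge implies cross-partition communication at query time,
    # which is why partitioners try to minimize this number.
    return sum(1 for u, v in edges if partition_of(u) != partition_of(v))
```

Vertex‑cut partitioning inverts the trade‑off: edges are assigned to partitions and high‑degree vertices are replicated across them, which suits the skewed degree distributions typical of media graphs.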
The design introduces a logical view (KG schema) that decouples graph management from physical storage. The schema describes metadata, enables sub‑graph assembly, and isolates logical sub‑graphs from the full graph, while physical storage uses KV tables for vertices and edge tables for relationships.
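The decoupling idea can be sketched as follows (table and field names are invented for illustration): the schema is the only place that knows which KV vertex table and edge table back each entity type, so sub‑graph assembly code never hard‑codes storage details.

```python
# Logical view: entity type -> physical vertex/edge tables.
SCHEMA = {
    "Video": {"vertex_table": "kv_video",
              "edges": {"BELONGS_TO": "edge_video_show"}},
    "Show":  {"vertex_table": "kv_show", "edges": {}},
}

# Stand-ins for the physical KV vertex tables and edge tables.
STORAGE = {
    "kv_video": {"video:1": {"title": "E01"}},
    "kv_show":  {"show:9": {"name": "Some Show"}},
    "edge_video_show": {"video:1": ["show:9"]},
}

def load_subgraph(entity_type, vertex_id):
    """Assemble a logical sub-graph for one entity using only the schema."""
    spec = SCHEMA[entity_type]
    vertex = STORAGE[spec["vertex_table"]][vertex_id]
    neighbors = {
        edge_label: STORAGE[table].get(vertex_id, [])
        for edge_label, table in spec["edges"].items()
    }
    return {"id": vertex_id, "props": vertex, "out": neighbors}
```

Because callers only see the schema, the physical layout (KV tables for vertices, edge tables for relationships) can be reorganized without touching assembly logic.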
Critical techniques include distributed DDL locking (WinLock) with multi‑version strategies, broadcast coordination via Zookeeper, and handling concurrent schema updates across Flink jobs.
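The multi‑version part of this can be illustrated with a small registry sketch. Here a local `threading.Lock` stands in for the ZooKeeper‑coordinated distributed lock the talk calls WinLock; the essential property is that DDL writers publish new immutable schema versions under the lock, while running Flink jobs keep reading the version they pinned at startup.

```python
import threading

class SchemaRegistry:
    """Illustrative multi-version schema store (not Youku's implementation)."""

    def __init__(self, initial_schema):
        self._lock = threading.Lock()         # stand-in for the distributed DDL lock
        self._versions = {1: initial_schema}  # version -> immutable schema snapshot
        self._current = 1

    def publish(self, schema):
        """DDL path: take the lock, write a new version, never mutate old ones."""
        with self._lock:
            self._current += 1
            self._versions[self._current] = schema
            return self._current

    def get(self, version=None):
        """Readers pin a version; long-running jobs stay on old schemas safely."""
        return self._versions[version if version is not None else self._current]
```

In the real system the publish step would also broadcast the new version to subscribed jobs via ZooKeeper watches, so concurrent Flink jobs converge on the new schema without a stop‑the‑world pause.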
For content updates, the end‑to‑end pipeline ingests data (content and behavior), de‑duplicates messages, builds entity‑relationship graphs, and processes updates in Flink. A diff‑based mechanism isolates changed attributes, and a fast‑slow channel separates hot (e.g., view count) from cold data to achieve second‑level update latency.
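The diff plus fast/slow routing step can be sketched like this (field names such as `view_count` are illustrative): first isolate the attributes that actually changed, then send hot counters down the fast channel and everything else down the slow one.

```python
# Hot fields that must reach readers with second-level latency.
HOT_FIELDS = {"view_count", "like_count"}

def diff(old: dict, new: dict) -> dict:
    """Return only the attributes whose values changed."""
    return {k: v for k, v in new.items() if old.get(k) != v}

def route(changed: dict):
    """Split a change set into fast-channel and slow-channel updates."""
    fast = {k: v for k, v in changed.items() if k in HOT_FIELDS}
    slow = {k: v for k, v in changed.items() if k not in HOT_FIELDS}
    return fast, slow

old = {"title": "E01", "view_count": 100, "tags": ["drama"]}
new = {"title": "E01", "view_count": 101, "tags": ["drama", "hot"]}
fast, slow = route(diff(old, new))
# fast carries only the hot counter; the unchanged title is dropped entirely.
```

Because the diff drops unchanged attributes before routing, the fast channel stays small even when upstream messages resend full entity snapshots.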
Practical experience highlights sub‑graph decomposition, selective redundancy to reduce edge traversals, automated schema orchestration, and balancing fine‑grained updates with management overhead.
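Selective redundancy, the second of those points, trades write amplification for read locality. A sketch with invented fields: copying a show's name onto each episode vertex lets an episode list render without traversing back to the Show vertex, at the cost of fanning out renames to every redundant copy.

```python
shows = {"show:9": {"name": "Some Show"}}
episodes = {
    # "show_name" is the selectively redundant copy of shows["show:9"]["name"].
    "video:1": {"title": "E01", "show_id": "show:9", "show_name": "Some Show"},
}

def read_episode_show_name(video_id):
    # Hot read path: no edge traversal, served from the episode vertex itself.
    return episodes[video_id]["show_name"]

def rename_show(show_id, new_name):
    # Write amplification: update the source plus every redundant copy.
    shows[show_id]["name"] = new_name
    for ep in episodes.values():
        if ep["show_id"] == show_id:
            ep["show_name"] = new_name
```

This is worthwhile only for attributes that are read far more often than written, which is the "balancing fine‑grained updates with management overhead" judgment the talk describes.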
In conclusion, the graph‑based approach enables real‑time, second‑level updates of tens of millions of attributes across billions of vertices, offering a scalable blueprint for large‑scale video or social platforms.
DataFunTalk
Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.