Technical Deep Dive into the Youku Media Asset Platform: Storage, Search, and Data Aggregation
The article details Youku’s new media‑asset platform, which replaces a fragmented MySQL‑based system with a domain‑driven entity model stored in Ali‑HBase, uses Elasticsearch for flexible front‑end and back‑end search, and adds an aggregation layer that unifies diverse data sources and reusable computation tasks. The result is a high‑availability, low‑latency service that handles billions of API calls per day.
The article presents a comprehensive technical analysis of the media‑asset platform (媒资中台) that powers Youku’s entire video catalog. It outlines the challenges of the legacy media‑asset system, which relied on dozens of MySQL tables and suffered from inflexible schema changes, limited search capabilities, and poor data completeness.
To address these issues, the new platform introduces a domain‑driven entity model that abstracts media assets into entities such as programs, videos, persons, and insertion points. Each entity has attributes classified as basic, enum, relational, or multi‑value, so both entities and their fields can be extended dynamically without each change becoming a development bottleneck.
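The entity model described above can be sketched roughly as follows. This is a minimal illustration, not the platform's actual implementation: the class names (`AttributeKind`, `AttributeDef`, `EntityDef`), the attribute names, and the validation rules are all assumptions made for the example. Only the four attribute categories come from the article.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Dict, Optional, Tuple


class AttributeKind(Enum):
    """The four attribute categories named in the article."""
    BASIC = "basic"            # plain scalar such as a title
    ENUM = "enum"              # value restricted to a fixed set
    RELATIONAL = "relational"  # reference to another entity by id
    MULTI_VALUE = "multi"      # list of values, e.g. tags


@dataclass(frozen=True)
class AttributeDef:
    # Hypothetical attribute definition; field names are illustrative.
    name: str
    kind: AttributeKind
    allowed: Tuple[str, ...] = ()        # used only for ENUM
    target_entity: Optional[str] = None  # used only for RELATIONAL


@dataclass
class EntityDef:
    name: str
    attributes: Dict[str, AttributeDef] = field(default_factory=dict)

    def add_attribute(self, attr: AttributeDef) -> None:
        """Extend the entity at runtime by registering a new definition."""
        self.attributes[attr.name] = attr

    def validate(self, record: Dict[str, Any]) -> None:
        """Raise ValueError if a record does not match the definitions."""
        for key, value in record.items():
            attr = self.attributes.get(key)
            if attr is None:
                raise ValueError(f"unknown attribute: {key}")
            if attr.kind is AttributeKind.ENUM and value not in attr.allowed:
                raise ValueError(f"{key}: {value!r} not in {attr.allowed}")
            if attr.kind is AttributeKind.MULTI_VALUE and not isinstance(value, list):
                raise ValueError(f"{key}: expected a list")
            if attr.kind is AttributeKind.RELATIONAL and not isinstance(value, str):
                raise ValueError(f"{key}: expected an entity id string")


# Define a "video" entity and extend it one attribute at a time.
video = EntityDef("video")
video.add_attribute(AttributeDef("title", AttributeKind.BASIC))
video.add_attribute(AttributeDef("status", AttributeKind.ENUM, allowed=("draft", "published")))
video.add_attribute(AttributeDef("program_id", AttributeKind.RELATIONAL, target_entity="program"))
video.add_attribute(AttributeDef("tags", AttributeKind.MULTI_VALUE))

video.validate({"title": "Episode 1", "status": "published", "program_id": "p-1", "tags": ["drama"]})
```

Because definitions are data rather than table columns, adding an attribute in this sketch is a registration call rather than a schema migration, which is the property the article attributes to the new model.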
For storage, the platform migrates from MySQL to Ali‑HBase, allowing dynamic table and column creation and supporting the rapid addition of new entities and attributes. A Java‑based SDK pre‑compiles field metadata, so downstream developers can query data without worrying about field names or types.
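To illustrate why a wide-column store suits dynamically added attributes, and how pre-generated field metadata can hide column names and types from callers, here is a small sketch. It uses an in-memory stand-in rather than a real HBase client, and it is written in Python although the article's SDK is Java-based. The names `Field`, `VideoFields`, `WideColumnTable`, the `a:` column family, and the `video#1001` row-key format are all hypothetical.

```python
import json
from typing import Any, Callable, Dict, NamedTuple


class Field(NamedTuple):
    """Field metadata an SDK could generate ahead of time (hypothetical)."""
    column: str                      # "family:qualifier" in the wide-column table
    encode: Callable[[Any], bytes]
    decode: Callable[[bytes], Any]


def _text(column: str) -> Field:
    return Field(column, lambda v: v.encode("utf-8"), lambda b: b.decode("utf-8"))


def _json(column: str) -> Field:
    return Field(column, lambda v: json.dumps(v).encode("utf-8"), lambda b: json.loads(b))


class VideoFields:
    # Callers reference these constants instead of raw column names and types.
    TITLE = _text("a:title")
    TAGS = _json("a:tags")


class WideColumnTable:
    """In-memory stand-in: each row is a sparse map of column -> bytes."""

    def __init__(self) -> None:
        self._rows: Dict[str, Dict[str, bytes]] = {}

    def put(self, row_key: str, field: Field, value: Any) -> None:
        self._rows.setdefault(row_key, {})[field.column] = field.encode(value)

    def get(self, row_key: str, field: Field, default: Any = None) -> Any:
        cell = self._rows.get(row_key, {}).get(field.column)
        return default if cell is None else field.decode(cell)


table = WideColumnTable()
table.put("video#1001", VideoFields.TITLE, "Episode 1")
table.put("video#1001", VideoFields.TAGS, ["drama", "2020"])

# A new attribute needs only new metadata; existing rows are left untouched.
SUBTITLE_LANG = _text("a:subtitle_lang")
table.put("video#1002", SUBTITLE_LANG, "en")
```

Rows are sparse maps, so `video#1001` simply has no `a:subtitle_lang` cell; no existing row has to be rewritten when a field is introduced.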
The search layer is built on Elasticsearch. It distinguishes front‑end (low‑latency, partial‑field) from back‑end (higher‑latency, full‑field) use cases and runs them on independent clusters to isolate traffic. Index configurations are decoupled from entity definitions, supporting selective indexing, per‑field refresh rates, and a rich query language that handles exact matches, range queries, fuzzy string matches, and array‑element queries. Filter push‑down is used to improve query performance, and search_after is used to support deep pagination.
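The query types and the search_after pagination mentioned above map onto the standard Elasticsearch Query DSL roughly as shown below. This sketch only builds request bodies as plain dictionaries and does not call a cluster. The field names (`status`, `duration`, `tags`, `title`, `publish_time`, `video_id`) and the helper functions are assumptions, and placing non-scoring conditions in the `filter` clause is one reading of "filter push-down", not a confirmed detail of the platform.

```python
from typing import Any, Dict, List, Optional


def build_search_body(
    status: str,
    min_duration: int,
    title_text: str,
    tag: str,
    page_size: int = 20,
    after: Optional[List[Any]] = None,
) -> Dict[str, Any]:
    """Build an Elasticsearch search body (field names are hypothetical)."""
    body: Dict[str, Any] = {
        "size": page_size,
        # Front-end style request: return only a few fields.
        "_source": ["video_id", "title", "duration"],
        "query": {
            "bool": {
                # Filter context: no relevance scoring, results are cacheable.
                "filter": [
                    {"term": {"status": status}},                    # exact match
                    {"range": {"duration": {"gte": min_duration}}},  # range query
                    {"term": {"tags": tag}},  # matches if any array element equals tag
                ],
                "must": [
                    # Fuzzy string match on an analyzed text field.
                    {"match": {"title": {"query": title_text, "fuzziness": "AUTO"}}}
                ],
            }
        },
        # search_after needs a deterministic sort; video_id acts as tiebreaker.
        "sort": [{"publish_time": "desc"}, {"video_id": "asc"}],
    }
    if after is not None:
        body["search_after"] = after
    return body


def next_cursor(hits: List[Dict[str, Any]]) -> Optional[List[Any]]:
    """Return the sort values of the last hit, or None when the page is empty."""
    return hits[-1]["sort"] if hits else None


first = build_search_body("published", 60, "episode", "drama")

# Stand-in for the hits of the first response; each hit carries its sort values.
fake_hits = [{"_source": {"video_id": "v-20"}, "sort": [1600000000000, "v-20"]}]
second = build_search_body("published", 60, "episode", "drama", after=next_cursor(fake_hits))
```

Each subsequent page passes the previous page's last sort values instead of an ever-growing `from` offset, which is what makes deep pagination affordable.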
Data aggregation and computation are handled by a dedicated aggregation layer that abstracts heterogeneous source systems (ODPS tables, HTTP/RPC services, MQ, databases). Producers register data sources and field mappings, while consumers access a unified API. The platform also offers a computation service where common business logic can be packaged as reusable tasks; simple calculations use built‑in methods, whereas complex logic can be supplied via custom code packages with priority control.
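The producer/consumer split and the reusable computation tasks can be sketched as a small registry. Everything here is illustrative: `AggregationLayer`, its method names, the source names, and the sample fields are hypothetical, and the lambdas stand in for real ODPS, RPC, or database fetchers. The meaning of "priority" is also an assumption: tasks run in ascending priority order, so a later task can read an earlier task's output.

```python
from typing import Any, Callable, Dict, List, Tuple

Fetcher = Callable[[str], Dict[str, Any]]
Task = Callable[[Dict[str, Any]], Any]


class AggregationLayer:
    """Hypothetical registry that hides source systems behind one read API."""

    def __init__(self) -> None:
        self._sources: Dict[str, Tuple[Fetcher, Dict[str, str]]] = {}
        self._tasks: List[Tuple[int, str, Task]] = []

    def register_source(self, name: str, fetch: Fetcher, field_mapping: Dict[str, str]) -> None:
        """Producer side: field_mapping maps source field -> unified field."""
        self._sources[name] = (fetch, field_mapping)

    def register_task(self, output_field: str, fn: Task, priority: int = 0) -> None:
        """Register a reusable computation that derives one unified field."""
        self._tasks.append((priority, output_field, fn))

    def get(self, entity_id: str, fields: List[str]) -> Dict[str, Any]:
        """Consumer side: fetch, rename, compute, then project requested fields."""
        merged: Dict[str, Any] = {}
        for fetch, mapping in self._sources.values():
            raw = fetch(entity_id)
            for source_field, unified_field in mapping.items():
                if source_field in raw:
                    merged[unified_field] = raw[source_field]
        # Assumption: lower priority numbers run first.
        for _, output_field, fn in sorted(self._tasks, key=lambda t: t[0]):
            merged[output_field] = fn(merged)
        return {name: merged[name] for name in fields if name in merged}


layer = AggregationLayer()
# Lambdas stand in for a database lookup and an offline statistics table.
layer.register_source("catalog_db", lambda vid: {"name": "Episode 1"}, {"name": "title"})
layer.register_source("stats_table", lambda vid: {"vv": 1_500_000}, {"vv": "play_count"})
layer.register_task("is_hot", lambda rec: rec.get("play_count", 0) > 1_000_000)

result = layer.get("video#1001", ["title", "play_count", "is_hot"])
```

The consumer asks for unified field names only and never needs to know which system a value came from or what the field was called there.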
Overall, the media‑asset platform provides Youku’s business units with a high‑availability, low‑latency, and scalable backend service that supports billions of API calls daily, simplifies data access, and reduces development overhead. The article concludes with a call for continued iteration and deeper integration of business needs.
Youku Technology