
How Hulu Uses Big Data to Power Precise Advertising and Real‑Time Streaming

At a Tsinghua University forum, Hulu presented a comprehensive overview of its big‑data solutions for advertising and streaming, covering challenges of massive, complex data, the limits of MySQL, and advanced techniques using HBase, Protobuf, Redis batch pipelines, and its own MPP engine Nesto for high‑performance, scalable analytics.

Hulu Beijing

On October 27, Hulu was invited to give a keynote at Tsinghua University's Knowledge Engineering and Data Management Frontier Seminar and the “Computing Future” graduate forum, presenting “Big Data Technologies in Advertising and Marketing Scenarios”.

The talk began by outlining the background of traffic acquisition and monetization, leading to the need for rule‑based precise ad targeting.

It first introduced the simplest solution for precise ad placement, built on MySQL: create tables to store user information, inspect the schema, alter fields when needed, and select the target users with SQL SELECT statements.
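The MySQL workflow described above can be sketched in a few lines. This is a minimal illustration, not the schema shown in the talk: Python's built-in `sqlite3` module stands in for MySQL so that the example runs without a server, and the table name, column names, and targeting rule are all hypothetical.

```python
import sqlite3

# ASSUMPTION: sqlite3 stands in for MySQL; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (user_id INTEGER PRIMARY KEY, age INTEGER, region TEXT)"
)

# Inspect the schema (MySQL would use DESCRIBE users instead of PRAGMA).
schema_columns = [row[1] for row in conn.execute("PRAGMA table_info(users)")]

# Alter the table to add a field that a new targeting rule needs.
conn.execute("ALTER TABLE users ADD COLUMN favorite_genre TEXT")

conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?, ?)",
    [
        (1, 24, "US-CA", "comedy"),
        (2, 37, "US-NY", "drama"),
        (3, 29, "US-CA", "drama"),
    ],
)

# A targeting rule expressed as a plain SELECT: drama fans aged 25 to 40.
audience = [
    row[0]
    for row in conn.execute(
        "SELECT user_id FROM users "
        "WHERE favorite_genre = ? AND age BETWEEN ? AND ? ORDER BY user_id",
        ("drama", 25, 40),
    )
]
print(schema_columns, audience)
```

A rule this simple maps cleanly onto one WHERE clause, which is why the relational approach works as a starting point before data volume and rule complexity grow.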

However, production environments face challenges that MySQL alone cannot handle:

Massive data volume: billions of users and terabytes of data.

Complex data formats: such as user viewing histories.

Complicated rule conditions: SQL struggles to express intricate logic.

Performance and scalability: generating thousands of user groups daily based on numerous rules.

To address these pain points, Hulu proposed several improvements.

For complex data formats, HBase’s multi‑version data model can record user behavior: each cell is addressed by row key, column family, and column qualifier, and can hold several timestamped versions of its value. In addition, Protobuf provides a language‑neutral, compact serialization format for encoding and decoding complex structures.
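To make the multi‑version idea concrete, here is a toy in‑memory model of versioned cells. It is only a sketch of the data model, not the HBase client API: the `VersionedTable` class, the `watch:last_show` column, and the sample values are hypothetical.

```python
class VersionedTable:
    """Toy in-memory model of HBase-style versioned cells (not the HBase API)."""

    def __init__(self, max_versions=3):
        self.max_versions = max_versions
        # (row key, "family:qualifier") -> list of (timestamp, value), newest first
        self.cells = {}

    def put(self, row, column, value, ts):
        versions = self.cells.setdefault((row, column), [])
        versions.append((ts, value))
        versions.sort(key=lambda v: v[0], reverse=True)
        del versions[self.max_versions:]  # drop everything but the newest versions

    def get(self, row, column, versions=1):
        return self.cells.get((row, column), [])[:versions]


# HYPOTHETICAL data: one user's most recently watched show, written four times.
table = VersionedTable(max_versions=3)
for ts, show in [(100, "show_a"), (200, "show_b"), (300, "show_c"), (400, "show_d")]:
    table.put("user42", "watch:last_show", show, ts)

latest = table.get("user42", "watch:last_show")
history = table.get("user42", "watch:last_show", versions=3)
print(latest, history)
```

Reading one version returns the current state, while reading several versions of the same cell yields a short behavior history without a separate history table.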

Where SQL cannot express sophisticated rule logic, the talk suggested extending the query syntax.

For performance and scalability, Redis batch and pipeline operations enable sending multiple read/write requests in a single round‑trip, significantly boosting throughput.
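The effect of pipelining on round trips can be shown without a running Redis server. The sketch below uses a hypothetical in‑process `FakeServer` that only counts round trips, plus a small `Pipeline` class that buffers commands on the client side; neither is part of Redis or of any client library, and the key names are made up.

```python
class FakeServer:
    """In-process stand-in for a key-value server that counts round trips."""

    def __init__(self):
        self.data = {}
        self.round_trips = 0

    def send(self, commands):
        # One call models one request/response round trip,
        # however many commands it carries.
        self.round_trips += 1
        replies = []
        for op, key, *args in commands:
            if op == "SET":
                self.data[key] = args[0]
                replies.append(True)
            elif op == "GET":
                replies.append(self.data.get(key))
            else:
                raise ValueError(f"unsupported command: {op}")
        return replies


class Pipeline:
    """Buffers commands client-side and flushes them in a single round trip."""

    def __init__(self, server):
        self.server = server
        self.buffer = []

    def set(self, key, value):
        self.buffer.append(("SET", key, value))
        return self

    def get(self, key):
        self.buffer.append(("GET", key))
        return self

    def execute(self):
        commands, self.buffer = self.buffer, []
        return self.server.send(commands)


user_ids = range(1000)

# Without pipelining: one round trip per write.
naive = FakeServer()
for uid in user_ids:
    naive.send([("SET", f"segment:{uid}", "drama_fans")])

# With pipelining: the same 1000 writes share one round trip.
piped = FakeServer()
pipe = Pipeline(piped)
for uid in user_ids:
    pipe.set(f"segment:{uid}", "drama_fans")
replies = pipe.execute()

print(naive.round_trips, piped.round_trips)
```

With a real client such as redis-py, the corresponding pattern is to obtain a pipeline object with `pipeline()`, queue commands on it, and call `execute()`; batch commands such as MSET and MGET achieve a similar saving for plain key reads and writes.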

Hulu also developed its own massively parallel processing (MPP) engine, Nesto, a distributed OLAP solution designed for nested data. Nesto combines columnar storage and code generation to accelerate processing of TB‑scale nested datasets, delivering near‑real‑time ingestion and sub‑second query latency.
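Columnar storage for nested data can be illustrated with a small example. The sketch below flattens nested records into one array per field plus an offsets array that records which values belong to which record. This is a generic offsets‑based layout chosen for illustration; the source does not describe Nesto's actual storage format, and the record fields here are hypothetical.

```python
# HYPOTHETICAL nested records: each user has a list of viewing events.
records = [
    {"user_id": 1, "views": [{"show": "a", "seconds": 1200}, {"show": "b", "seconds": 300}]},
    {"user_id": 2, "views": []},
    {"user_id": 3, "views": [{"show": "a", "seconds": 2400}]},
]


def to_columns(rows):
    """Split nested records into flat columns plus an offsets array."""
    cols = {"user_id": [], "views.show": [], "views.seconds": [], "views.offsets": [0]}
    for row in rows:
        cols["user_id"].append(row["user_id"])
        for view in row["views"]:
            cols["views.show"].append(view["show"])
            cols["views.seconds"].append(view["seconds"])
        # offsets[i]:offsets[i + 1] is the slice of view values owned by record i
        cols["views.offsets"].append(len(cols["views.show"]))
    return cols


def seconds_per_user(cols):
    """Aggregate watch time while reading only three columns; views.show is never touched."""
    offsets, seconds = cols["views.offsets"], cols["views.seconds"]
    return {
        uid: sum(seconds[offsets[i]:offsets[i + 1]])
        for i, uid in enumerate(cols["user_id"])
    }


striped = to_columns(records)
totals = seconds_per_user(striped)
print(striped["views.offsets"], totals)
```

Because each field lives in its own contiguous array, a query that aggregates watch time skips the show names entirely, which is the main reason columnar layouts suit analytical scans.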

OLAP, originally proposed by Edgar F. Codd, offers high‑speed retrieval and flexible multidimensional analysis of large data volumes. Nesto joins industry peers such as Amazon Redshift, Google Dremel, Oracle OLAP, Microsoft Analysis Services, Druid, Greenplum, Impala, Presto, Apache HAWQ, and Apache Kylin.

Deploying Nesto is straightforward: after placing configuration files and JARs on HDFS, a single submit command with server count and resource specifications launches a distributed cluster.

Beyond advertising, Hulu’s streaming service must manage massive content and user data with stringent real‑time performance and stability requirements. Consequently, Hulu invests heavily in big‑data research, producing proprietary platforms, industrial solutions, and open‑source contributions.
