
Building a Model‑Driven Data Platform at Tubi: From Data Warehouse to Automated Machine Learning

The article describes how Tubi, North America’s largest free streaming service, built a model‑driven data platform from a high‑quality data warehouse, dbt‑based transformations, a Kubernetes‑hosted JupyterHub, low‑latency Scala/Akka services, and automated machine‑learning pipelines, all aimed at accelerating experimentation and decision‑making.

Bitu Technology

Tubi, the largest AVOD (advertising‑based video on demand) streaming company in North America, aims to become a model‑driven organization, using data engineering and machine learning to power its product and business decisions.

In this framing, data is static and describes the past, while models are dynamic and look to the future. Being model‑driven means every decision is backed by rigorous analysis and validated online, with the core focus on productionizing data science and ML experiments.

To get there, the company built a high‑quality data warehouse on a schema‑on‑write approach: a custom event grammar records user interactions, and a pipeline converts raw JSON events to protobuf, enriches them with Scala + Akka services, and loads the results into a warehouse whose transformations are managed with dbt. Spark is reserved for the few complex use cases that need it.
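The schema‑on‑write idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Tubi's pipeline: the `PlayEvent` fields, the allowed actions, and the `device_lookup` table are hypothetical, and a frozen dataclass stands in for the protobuf message. The point is that a malformed event is rejected at write time, so downstream dbt models never have to guess at its shape.

```python
import json
from dataclasses import dataclass, replace

# Hypothetical event shape; the article does not describe Tubi's actual event grammar.
# A frozen dataclass stands in for the protobuf message the real pipeline produces.
@dataclass(frozen=True)
class PlayEvent:
    user_id: str
    content_id: str
    action: str
    ts_ms: int
    platform: str = "unknown"  # filled in later by the enrichment step

ALLOWED_ACTIONS = {"start", "pause", "resume", "finish"}  # hypothetical vocabulary
REQUIRED_FIELDS = (("user_id", str), ("content_id", str), ("action", str), ("ts_ms", int))


class SchemaError(ValueError):
    """Raised when a raw event does not match the schema at write time."""


def parse_event(raw: str) -> PlayEvent:
    """Validate a raw JSON event before it is written; reject anything malformed."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise SchemaError(f"invalid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise SchemaError("event must be a JSON object")
    for key, expected in REQUIRED_FIELDS:
        value = data.get(key)
        # bool is a subclass of int in Python, so reject it explicitly
        if isinstance(value, bool) or not isinstance(value, expected):
            raise SchemaError(f"field {key!r} missing or not {expected.__name__}")
    if data["action"] not in ALLOWED_ACTIONS:
        raise SchemaError(f"unknown action {data['action']!r}")
    return PlayEvent(data["user_id"], data["content_id"], data["action"], data["ts_ms"])


def enrich(event: PlayEvent, device_lookup: dict) -> PlayEvent:
    """Return a copy of the event with a derived field added from a lookup table."""
    return replace(event, platform=device_lookup.get(event.user_id, "unknown"))
```

In a schema‑on‑read design the same checks would run in every query that reads the data; here they run once, at ingestion.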

On top of the warehouse, Tubi deployed Tubi Data Runtime (TDR), a JupyterHub‑based data platform running on Kubernetes. TDR provides a pandas interface to Redshift that uses Python multiprocessing, a custom JupyterLab extension that produces a visualization from a single line (display(df)), and shared notebooks backed by EFS that can be shared by URL.
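The article does not show how the pandas‑Redshift interface works internally, but a common way to parallelize a large read is to split one query into disjoint partitions and fetch them concurrently. The sketch below assumes a hypothetical `event_date` column and a caller‑supplied `fetch` function (in practice, a Redshift cursor call returning rows or a DataFrame). It uses `multiprocessing.pool.ThreadPool`, the thread‑backed variant of the `multiprocessing.Pool` API, because fetching is I/O‑bound and threads avoid pickling a database connection; that is a substitution, not necessarily what TDR does.

```python
from datetime import date, timedelta
from multiprocessing.pool import ThreadPool


def split_date_range(start, end, parts):
    """Split the half-open range [start, end) into at most `parts` contiguous sub-ranges."""
    if parts < 1 or end <= start:
        return []
    total_days = (end - start).days
    step = max(1, -(-total_days // parts))  # ceiling division keeps the chunk count <= parts
    ranges, cursor = [], start
    while cursor < end:
        upper = min(cursor + timedelta(days=step), end)
        ranges.append((cursor, upper))
        cursor = upper
    return ranges


def partition_queries(base_sql, start, end, parts):
    """Append one date predicate per sub-range (hypothetical `event_date` column).
    The dates are generated here, not taken from user input."""
    return [
        f"{base_sql} WHERE event_date >= '{lo.isoformat()}' AND event_date < '{hi.isoformat()}'"
        for lo, hi in split_date_range(start, end, parts)
    ]


def parallel_fetch(queries, fetch, workers=4):
    """Run `fetch` on each query concurrently; results are concatenated in query order."""
    with ThreadPool(workers) as pool:
        chunks = pool.map(fetch, queries)  # map preserves input order
    return [row for chunk in chunks for row in chunk]
```

If `fetch` returned DataFrames instead of row lists, the final flattening step would become `pandas.concat(chunks, ignore_index=True)`.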

To speed up ML iteration, Tubi built two reactive gRPC services on Akka: the Ranking Service, which uses ScyllaDB to keep latency under 30 ms, and the Popper Engine, which runs A/B tests. Both decouple ML experiments from backend deployments.
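One common way an experimentation service decouples experiments from deployments is deterministic, hash‑based assignment: the experiment config, not the backend code, decides which ranking model a user sees. The sketch below is a generic illustration of that technique, not the Popper Engine's actual design; the `Experiment` shape, the bucket count, and the model ids are hypothetical.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Experiment:
    name: str
    # Hypothetical config shape: variant name -> (traffic weight, ranking model id).
    variants: dict


def bucket(experiment_name, user_id, buckets=10_000):
    """Map a user to a stable bucket in [0, buckets). Salting the hash with the
    experiment name keeps assignments independent across experiments."""
    digest = hashlib.sha256(f"{experiment_name}:{user_id}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % buckets


def assign(exp, user_id, buckets=10_000):
    """Return (variant, model_id) for a user; the same inputs always give the same answer."""
    if not exp.variants:
        raise ValueError("experiment has no variants")
    total = sum(weight for weight, _ in exp.variants.values())
    point = bucket(exp.name, user_id, buckets) * total / buckets  # lies in [0, total)
    cumulative = 0.0
    for variant, (weight, model_id) in exp.variants.items():
        cumulative += weight
        if point < cumulative:
            return variant, model_id
    return variant, model_id  # guard against float rounding at the top edge
```

Changing the weights or model ids in the config reroutes traffic without redeploying the service, which illustrates the kind of decoupling described above.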

Next on the roadmap is an automated ML platform: when new features land, it creates candidate models, evaluates them offline, and allows one‑click promotion to an online A/B test, with end‑to‑end tracking and monitoring of every model.
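The roadmap flow (new feature, candidate model, offline gate, one‑click A/B promotion) can be sketched as a small state machine. Everything here is hypothetical: the `ModelRecord` fields, the status names, the `evaluate` callback (standing in for an offline training‑and‑scoring job), and the `min_lift` threshold are illustrative assumptions, not details from the article.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ModelRecord:
    name: str
    features: tuple
    offline_metric: float
    parent: str = ""          # lineage, so every model can be traced end to end
    status: str = "baseline"  # hypothetical lifecycle: rejected_offline | ready_for_ab | in_ab_test


def on_new_feature(feature, baseline, evaluate, min_lift=0.005):
    """When a feature lands, build a candidate (baseline features + new feature),
    score it offline, and mark it A/B-ready only if it beats the baseline by min_lift."""
    features = baseline.features + (feature,)
    metric = evaluate(features)
    status = "ready_for_ab" if metric - baseline.offline_metric >= min_lift else "rejected_offline"
    return ModelRecord(f"{baseline.name}+{feature}", features, metric,
                       parent=baseline.name, status=status)


def promote(candidate):
    """The 'one click': move an offline-approved candidate into an online A/B test."""
    if candidate.status != "ready_for_ab":
        raise ValueError(f"{candidate.name} is {candidate.status}, not ready for A/B")
    return replace(candidate, status="in_ab_test")
```

Keeping the offline gate separate from `promote` means a human still makes the final call, while the platform does the repetitive candidate building and scoring.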

Looking ahead, Tubi believes model‑driven companies will dominate the next decade, and it invites engineers who value technical challenges, rapid iteration, and a work culture free of the "996" schedule (9 a.m. to 9 p.m., six days a week) to join the team.

Written by Bitu Technology

Bitu Technology is the registered company of Tubi's China team. We are engineers passionate about using advanced technology to improve lives, and we hope this channel helps us connect with readers and grow together.
