Building Popper: Tubi’s Scalable Experiment Platform for Data‑Driven Decision Making
At Tubi, the Popper experiment engine—a Scala‑based, Akka‑powered backend service—combined with a self‑serve UI, automated analysis pipelines, and rigorous health checks, enables teams across ML, mobile, and OTT to run scalable A/B tests, rapidly iterate, and make data‑driven product decisions.
Tubi relies heavily on experimentation to guide product decisions across feature, infrastructure, and machine-learning changes. Over three years, experiment velocity increased 18-fold, and one-third of ML experiments positively impacted key company metrics.
The platform is built around three core components: the Popper experiment engine, a low‑barrier UI for company‑wide self‑service, and automated analysis and QA methods.
Popper Experiment Engine
Popper is a Scala service built on the Akka framework. It is the second iteration of Tubi’s experiment engine, incorporating lessons from an earlier iteration built on Facebook’s open-source PlanOut library, and is named after the philosopher Karl Popper to emphasize falsifiability.
Core Concepts
Experiments are organized into Namespaces, each representing a mutually exclusive set of experiments. Within a namespace, a consistent hash maps users, devices, or IPs to configurable Segments; each segment is assigned to a single experiment and, within it, to a Treatment Group (e.g., control or variant). Experiments can have multiple Phases that gradually increase the allocation percentage, and optional conditions (e.g., "new users only") can further refine segment targeting.
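The two-stage hash-based assignment described above can be sketched as follows. This is a minimal illustration, not Popper's actual code: the names (`segmentOf`, `treatmentOf`), the use of MurmurHash3, and the bucket arithmetic are all assumptions.

```scala
import scala.util.hashing.MurmurHash3

case class Treatment(name: String)
case class Experiment(name: String, treatments: Vector[Treatment])

// Map a possibly negative hash into the range [0, n).
private def bucket(hash: Int, n: Int): Int = ((hash % n) + n) % n

// Hash a unit (user, device, or IP) into one of `segments` buckets.
// Salting with the namespace decorrelates assignments across namespaces.
def segmentOf(namespace: String, unitId: String, segments: Int): Int =
  bucket(MurmurHash3.stringHash(s"$namespace:$unitId"), segments)

// Within an experiment, a second hash (salted with the experiment name)
// deterministically picks the treatment group for a unit.
def treatmentOf(exp: Experiment, unitId: String): Treatment =
  exp.treatments(bucket(MurmurHash3.stringHash(s"${exp.name}:$unitId"), exp.treatments.size))
```

Because both stages are pure functions of the unit ID and configuration, any node can compute the same assignment with no coordination.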
Built‑in Reproducibility
Experiment definitions and namespace configurations are stored as JSON in a Git repository. Popper records every CRUD operation in an append-only sequence, ensuring that a user’s hashed assignment to an experiment and treatment group is deterministic across service restarts, without needing a separate database for segment assignments.
This immutable sequence also simplifies architecture: no inter‑node coordination is required for segment logic, and new experiments can be deployed with near‑zero risk.
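One way to picture the append-only sequence is as an event log that any node can replay to rebuild the segment-to-experiment mapping. The sketch below is hypothetical (the `Op` types and `replay` function are illustrative, not Popper's schema), but it shows why no inter-node coordination is needed: the same ordered log always yields the same state.

```scala
// Hypothetical operations recorded in the append-only sequence.
sealed trait Op
case class Create(experiment: String, segments: Set[Int]) extends Op
case class Stop(experiment: String) extends Op

// Replaying the ordered log deterministically rebuilds which segment
// belongs to which experiment, so every node converges to the same state.
def replay(ops: Seq[Op]): Map[Int, String] =
  ops.foldLeft(Map.empty[Int, String]) {
    case (state, Create(name, segs)) => state ++ segs.map(_ -> name)
    case (state, Stop(name))         => state.filterNot { case (_, n) => n == name }
  }
```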
Making Experiments Accessible
The UI, built with React, abstracts away JSON schema details and coordination steps, allowing non‑experts to create, schedule, and monitor experiments. Configuration‑driven start/end dates decouple deployment from experiment launch, enabling automatic rollout of new configs as soon as code merges to the main branch.
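The decoupling of deployment from launch boils down to a time-window check against the configured dates. A minimal sketch, assuming a `Schedule` shape that is not from the source:

```scala
import java.time.Instant

// Config-driven scheduling: the config can ship as soon as code merges,
// but the experiment only becomes live inside its configured window.
case class Schedule(start: Instant, end: Instant) {
  def isLive(now: Instant): Boolean = !now.isBefore(start) && now.isBefore(end)
}
```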
While the core segmentation logic remains stateless, Popper stores device coverage for QA purposes in a database, facilitating error reproduction and testing.
Decision Support
Clients retrieve experiment configurations from Popper and emit an Exposure Event when a user/device is exposed to a particular namespace, experiment, and treatment. These events flow through a Spark Streaming pipeline on Databricks and Akka Streams services, both written in Scala.
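The shape of an exposure event and its join with engagement data might look like the sketch below. Field names (`deviceId`, `watchSeconds`) and the in-memory join are illustrative assumptions; in production this runs over Spark Streaming and Akka Streams rather than plain collections.

```scala
// Hypothetical exposure record emitted by a client.
case class Exposure(deviceId: String, experiment: String, treatment: String)

// Join exposures to per-device watch time and average by treatment group.
def meanWatchTimeByTreatment(
    exposures: Seq[Exposure],
    watchSeconds: Map[String, Double]
): Map[String, Double] =
  exposures
    .flatMap(e => watchSeconds.get(e.deviceId).map(w => e.treatment -> w))
    .groupBy { case (treatment, _) => treatment }
    .map { case (t, xs) => t -> xs.map(_._2).sum / xs.size }
```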
By joining exposure data with engagement metrics, Tubi calculates key indicators such as watch time, conversion, and retention, segmented by platform or content type. Significance is assessed using CUPED for variance reduction and the Benjamini‑Hochberg procedure to control false discovery rate.
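Both statistical techniques are standard and compact enough to sketch. CUPED subtracts the part of the metric explained by a pre-experiment covariate (θ = cov(X, Y)/var(X)); Benjamini-Hochberg is the step-up procedure that rejects the k smallest p-values where p(k) ≤ (k/m)·q. These are textbook versions, not Tubi's pipeline code.

```scala
// CUPED: Y' = Y - θ(X - mean(X)), where X is a pre-experiment covariate.
def cuped(y: Seq[Double], x: Seq[Double]): Seq[Double] = {
  val mx = x.sum / x.size
  val my = y.sum / y.size
  val cov  = x.zip(y).map { case (xi, yi) => (xi - mx) * (yi - my) }.sum / x.size
  val varX = x.map(xi => (xi - mx) * (xi - mx)).sum / x.size
  val theta = cov / varX
  y.zip(x).map { case (yi, xi) => yi - theta * (xi - mx) }
}

// Benjamini-Hochberg step-up: return the original indices of the
// hypotheses rejected at false discovery rate q.
def benjaminiHochberg(pValues: Seq[Double], q: Double): Set[Int] = {
  val m = pValues.size
  val ranked = pValues.zipWithIndex.sortBy(_._1) // ascending (p, originalIndex)
  val k = ranked.zipWithIndex.lastIndexWhere {
    case ((p, _), i) => p <= (i + 1).toDouble / m * q
  }
  if (k < 0) Set.empty else ranked.take(k + 1).map(_._2).toSet
}
```

Note the step-up behavior: a p-value above its own threshold can still be rejected if a larger one below it passes.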
Experiment Health Checks
To maintain trust, Popper includes automated health checks that surface problematic experiments before they influence decisions. Common failure modes include uneven group sizes, cross‑experiment interference, and biased exposure due to client‑side bugs. A pre‑experiment t‑stat check flags statistically significant pre‑existing differences as a danger signal.
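A pre-experiment check of this kind can be sketched with Welch's t-statistic computed on a metric measured *before* the experiment started; a large |t| (roughly above 1.96) flags pre-existing group differences. This is a generic formulation, not Popper's actual health-check code.

```scala
// Welch's t-statistic for two independent samples with unequal variances.
def welchT(a: Seq[Double], b: Seq[Double]): Double = {
  def meanVar(xs: Seq[Double]): (Double, Double) = {
    val m = xs.sum / xs.size
    val v = xs.map(x => (x - m) * (x - m)).sum / (xs.size - 1) // sample variance
    (m, v)
  }
  val (ma, va) = meanVar(a)
  val (mb, vb) = meanVar(b)
  (ma - mb) / math.sqrt(va / a.size + vb / b.size)
}
```

Randomization should make pre-experiment metrics statistically indistinguishable between groups, so a significant t here is a danger signal rather than a finding.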
Key Lessons
Self‑service drives speed: Decoupling model changes from backend code and using configuration‑driven experiments increased ML experiment throughput fivefold.
Ask important questions: The platform focuses on detecting statistically significant signals that affect core KPIs, aligning experimentation with business impact.
Cross‑platform consistency: A single experiment definition runs on Scala/Elixir backends and Kotlin, Swift, or TypeScript clients across 20 OTT and mobile platforms, ensuring comparable results company‑wide.
Conclusion
The investment in Popper, data pipelines, and analysis dashboards has dramatically boosted Tubi’s productivity and decision quality. Automated health checks, a user‑friendly UI, and a focus on “north‑star” metrics have empowered non‑experts to run reliable experiments, surface problems early, and keep the platform trustworthy.
Teams passionate about building better experimentation tools are invited to join Tubi.
Bitu Technology
Bitu Technology is the registered company of Tubi's China team. We are engineers passionate about using advanced technology to improve lives, and we hope this channel helps us connect with readers and grow together.