Building Popper: Tubi’s Scalable Experimentation Platform
Tubi’s Popper platform combines a Scala‑based experiment engine, reproducible JSON‑stored configurations, a React UI, and data pipelines using Spark and Akka to enable fast, cross‑team A/B testing, automated analysis, health checks, and data‑driven decision making across mobile and OTT services.
At Tubi, every team—from feature development to ML—relies heavily on experiments to guide decisions, and experiment velocity has grown 18‑fold over three years, with a third of ML experiments positively impacting company KPIs.
The experiment system, built collaboratively across teams, consists of three main components: the Popper experiment engine, a UI that lowers the barrier to experimentation, and automated analysis and QA methods.
Popper Experiment Engine
The engine is a backend service written in Scala on the Akka framework. It is the second iteration of the system, incorporating lessons from an earlier implementation based on Facebook's PlanOut framework, and is named after the philosopher Karl Popper to emphasize falsifiability.
"In so far as a scientific statement speaks about reality, it must be falsifiable." (Karl Popper)
Core Concepts
The top‑level concept is a Namespace, representing a set of mutually exclusive experiments, allowing parallel experiments without coordination across teams.
Each Namespace hashes experiment targets (a device, user, IP address, etc.) into configurable Segments. Each segment is assigned to at most one experiment, and targets within an experiment are then split into Treatment Groups (control and variants). Experiments can run in multiple Phases with varying allocation percentages, and conditional rules can further refine which segments are used.
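The two-level hashing described above can be sketched as follows. This is a minimal illustration of the idea, not Popper's real API; all names, the segment count, and the hash choice are assumptions.

```scala
// Minimal sketch of namespace segmentation (illustrative names, not Popper's real API).
object Segmentation {
  val SegmentCount = 1000 // segments per namespace (assumed)

  // Deterministically hash an experiment target (device ID, user ID, IP) into a segment.
  def segmentOf(namespace: String, targetId: String): Int =
    java.lang.Math.floorMod((namespace + ":" + targetId).hashCode, SegmentCount)

  // Each experiment in a namespace owns a disjoint range of segments.
  final case class Experiment(name: String, segments: Range, groups: Vector[String])

  // Resolve a target to (experiment, treatment group), if any experiment owns its segment.
  def assign(experiments: Seq[Experiment], namespace: String,
             targetId: String): Option[(String, String)] = {
    val seg = segmentOf(namespace, targetId)
    experiments.find(_.segments.contains(seg)).map { e =>
      // Hash again per experiment so group assignment is independent of segment layout.
      val g = java.lang.Math.floorMod((e.name + ":" + targetId).hashCode, e.groups.size)
      (e.name, e.groups(g))
    }
  }
}
```

Because both hashes depend only on stable identifiers, the same target always resolves to the same segment and group, which is what makes mutually exclusive experiments possible without any shared state.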
Built‑in Reproducibility
Experiments and namespaces are stored as JSON in a Git repository, with an append‑only sequence tracking all CRUD operations, ensuring deterministic user hashing across service restarts without database‑stored segment assignments.
This persistence simplifies architecture, eliminates node coordination, and makes deployments low‑risk; Tubi has not experienced a failed deployment.
Making Experiments Accessible
Popper abstracts away low‑level details such as JSON schema and coordination steps, allowing non‑experts to create and run experiments easily.
Start and end dates are specified in the configuration, enabling independent deployment of new configs and experiment activation.
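A stored experiment configuration might look roughly like the following. This shape is hypothetical, reconstructed from the concepts the post describes (namespaces, segments, phases, allocation percentages, start and end dates); the actual schema is abstracted away from users.

```json
{
  "namespace": "homescreen",
  "experiment": "new_ranker_v2",
  "treatmentGroups": ["control", "variant"],
  "phases": [
    { "allocation": 0.10, "startDate": "2021-03-01", "endDate": "2021-03-15" },
    { "allocation": 0.50, "startDate": "2021-03-15", "endDate": "2021-04-01" }
  ]
}
```

Because files like this live in Git with an append-only change history, any past assignment can be replayed deterministically from the configuration alone.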
While the core segmentation logic is stateless, a database stores coverage for development and testing devices, aiding QA.
The React UI guides users through the entire workflow, provides a filtered calendar view, and records publication decisions.
Decision Support
Clients fetch configurations from Popper to decide which code branch to run; each experiment exposure generates an event processed by Spark Streaming (on Databricks) and Akka Streams, both written in Scala.
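The client-side flow can be sketched as below. The interface and names are assumptions for illustration, not Popper's real client API; the key point is that branching and exposure logging happen together, so the analysis pipelines only count users who actually saw a treatment.

```scala
// Illustrative client-side flow (API names are assumptions, not Popper's real interface).
final case class Assignment(experiment: String, group: String)

trait PopperClient {
  def assignment(namespace: String, deviceId: String): Option[Assignment]
  def logExposure(a: Assignment, deviceId: String): Unit // feeds the Spark/Akka pipelines
}

def render(popper: PopperClient, deviceId: String): String =
  popper.assignment("homescreen", deviceId) match {
    case Some(a @ Assignment(_, "variant")) =>
      popper.logExposure(a, deviceId)
      "new-homescreen"
    case Some(a) =>
      popper.logExposure(a, deviceId)
      "old-homescreen"
    case None =>
      "old-homescreen" // target not enrolled in any experiment in this namespace
  }
```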
Combining exposure data with engagement metrics yields key indicators such as watch time, conversion, and retention, segmented by platform or content type.
Statistical significance is assessed using CUPED for variance reduction and the Benjamini‑Hochberg procedure to control false discovery rate.
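Both techniques are standard and can be sketched in a few lines; these are textbook implementations, not Tubi's production code. CUPED replaces a metric y with y' = y - theta * (x - mean(x)), where x is a pre-experiment covariate and theta = cov(x, y) / var(x), which preserves the mean while shrinking variance. Benjamini-Hochberg rejects the hypotheses behind the k smallest p-values, where k is the largest rank with p_(k) <= (k / m) * alpha.

```scala
// CUPED variance reduction: subtract the part of metric y predicted by a
// pre-experiment covariate x (e.g. last month's watch time).
def cuped(y: Array[Double], x: Array[Double]): Array[Double] = {
  val my = y.sum / y.length
  val mx = x.sum / x.length
  val cov = y.indices.map(i => (x(i) - mx) * (y(i) - my)).sum / y.length
  val varX = x.map(v => (v - mx) * (v - mx)).sum / x.length
  val theta = cov / varX
  y.indices.map(i => y(i) - theta * (x(i) - mx)).toArray
}

// Benjamini-Hochberg: return the indices of rejected hypotheses at FDR level alpha.
def benjaminiHochberg(pValues: Seq[Double], alpha: Double): Set[Int] = {
  val m = pValues.size
  val ranked = pValues.zipWithIndex.sortBy(_._1)
  val k = ranked.zipWithIndex.lastIndexWhere {
    case ((p, _), rank) => p <= (rank + 1).toDouble / m * alpha
  }
  ranked.take(k + 1).map(_._2).toSet // k == -1 yields the empty set
}
```

Controlling the false discovery rate matters here because each experiment reports many metrics across many segments, so uncorrected per-metric p-values would produce frequent spurious "wins".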
All metrics are automatically calculated and displayed on a BI dashboard, with a subset designated as “North Star” metrics that drive release decisions, and a “no‑harm” rule that blocks releases harming any North Star metric.
Experiment Health Checks
Popper includes a validation system that surfaces problematic experiments before they affect decisions.
Common failure modes include uneven group sizes (sample-ratio mismatch), cross-experiment interference, and biased exposure caused by client bugs; pre-experiment t-stat checks flag these warning signs before results are trusted.
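As one concrete example, the uneven-group-size check can be sketched as a sample-ratio-mismatch test. The post mentions t-stat checks; this sketch uses the equivalent chi-square form for an intended 50/50 split, and the threshold is the standard 5% critical value, not a figure from the post.

```scala
// Sample-ratio-mismatch check: chi-square statistic for an intended 50/50 split.
def srmChiSquare(controlN: Long, variantN: Long): Double = {
  val expected = (controlN + variantN) / 2.0
  math.pow(controlN - expected, 2) / expected +
    math.pow(variantN - expected, 2) / expected
}

// With one degree of freedom, a statistic above ~3.84 means the observed split
// would arise by chance less than 5% of the time: a signal to pause the
// experiment and look for a client-side assignment or logging bug.
def srmAlarm(controlN: Long, variantN: Long): Boolean =
  srmChiSquare(controlN, variantN) > 3.841
```

Counterintuitively, even a small percentage imbalance alarms at scale: a 52/48 split over 10,000 exposures is already far beyond the threshold, which is why such checks catch client bugs that eyeballing dashboards would miss.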
Lessons Learned
Self‑service is key to speed
By making experiment configuration the only required change for most model or feature updates, Tubi increased ML experiment throughput five‑fold.
Ask important questions
The platform focuses on identifying statistically significant signals that impact core KPIs, encouraging teams to prioritize high‑value ideas.
Embrace cross‑platform consistency
Experiments run on more than 20 platforms, with clients written in Scala, Elixir, Kotlin, Swift, and TypeScript, all unified under Popper to ensure consistent terminology and analysis.
Conclusion
The investment in Popper—from the engine to data pipelines and analysis dashboards—has dramatically boosted productivity and decision quality across Tubi, enabling non‑experts to run experiments, health checks to maintain trust, and North Star metrics to align experimentation with business goals.
If you are passionate about building better experimentation tools, consider joining Tubi.
DataFunTalk
Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.