Building Popper: Tubi’s Scalable Experimentation Platform
Tubi’s Popper platform combines a Scala‑based experiment engine, reproducible JSON‑stored configurations, a React UI, and data pipelines using Spark and Akka to enable fast, cross‑team A/B testing, automated analysis, health checks, and data‑driven decision making across mobile and OTT services.
At Tubi, every team—from feature development to ML—relies heavily on experiments to guide decisions, and experiment velocity has grown 18‑fold over three years, with a third of ML experiments positively impacting company KPIs.
The experiment system, built collaboratively across teams, consists of three main components: the Popper experiment engine, a UI that lowers the barrier to experimentation, and automated analysis and QA methods.
Popper Experiment Engine
The engine is a backend service written in Scala on the Akka framework. It is the second iteration of the system, incorporating lessons from an earlier implementation based on Facebook's PlanOut framework, and is named after the philosopher Karl Popper to emphasize falsifiability.
"In so far as a scientific statement speaks about reality, it must be falsifiable." (Karl Popper)
Core Concepts
The top‑level concept is a Namespace, representing a set of mutually exclusive experiments, allowing parallel experiments without coordination across teams.
Each Namespace hashes experiment targets (a device, user, IP address, etc.) into configurable Segments. Each segment is assigned to at most one experiment, and targets within an experiment are then split into Treatment Groups (control and variants). Experiments can run in multiple Phases with varying allocation percentages, and conditional rules can further refine which segments are used.
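The two-level hashing described above can be sketched as follows. This is a minimal illustration of the idea, not Popper's real API; all names, the segment count, and the hash choice are assumptions.

```scala
// Minimal sketch of namespace segmentation (illustrative names, not Popper's real API).
object Segmentation {
  val SegmentCount = 1000 // segments per namespace (assumed)

  // Deterministically hash an experiment target (device ID, user ID, IP) into a segment.
  def segmentOf(namespace: String, targetId: String): Int =
    java.lang.Math.floorMod((namespace + ":" + targetId).hashCode, SegmentCount)

  // Each experiment in a namespace owns a disjoint range of segments.
  final case class Experiment(name: String, segments: Range, groups: Vector[String])

  // Resolve a target to (experiment, treatment group), if any experiment owns its segment.
  def assign(experiments: Seq[Experiment], namespace: String,
             targetId: String): Option[(String, String)] = {
    val seg = segmentOf(namespace, targetId)
    experiments.find(_.segments.contains(seg)).map { e =>
      // Hash again per experiment so group assignment is independent of segment layout.
      val g = java.lang.Math.floorMod((e.name + ":" + targetId).hashCode, e.groups.size)
      (e.name, e.groups(g))
    }
  }
}
```

Because both hashes depend only on stable identifiers, the same target always resolves to the same segment and group, which is what makes mutually exclusive experiments possible without any shared state.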
Built‑in Reproducibility
Experiments and namespaces are stored as JSON in a Git repository, with an append‑only sequence tracking all CRUD operations, ensuring deterministic user hashing across service restarts without database‑stored segment assignments.
This persistence simplifies architecture, eliminates node coordination, and makes deployments low‑risk; Tubi has not experienced a failed deployment.
Making Experiments Accessible
Popper abstracts away low‑level details such as JSON schema and coordination steps, allowing non‑experts to create and run experiments easily.
Start and end dates are specified in the configuration, enabling independent deployment of new configs and experiment activation.
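A stored experiment configuration might look roughly like the following. This shape is hypothetical, reconstructed from the concepts the post describes (namespaces, segments, phases, allocation percentages, start and end dates); the actual schema is abstracted away from users.

```json
{
  "namespace": "homescreen",
  "experiment": "new_ranker_v2",
  "treatmentGroups": ["control", "variant"],
  "phases": [
    { "allocation": 0.10, "startDate": "2021-03-01", "endDate": "2021-03-15" },
    { "allocation": 0.50, "startDate": "2021-03-15", "endDate": "2021-04-01" }
  ]
}
```

Because files like this live in Git with an append-only change history, any past assignment can be replayed deterministically from the configuration alone.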
While the core segmentation logic is stateless, a database stores coverage for development and testing devices, aiding QA.
The React UI guides users through the entire workflow, provides a filtered calendar view, and records publication decisions.
Decision Support
Clients fetch configurations from Popper to decide which code branch to run; each experiment exposure generates an event processed by Spark Streaming (on Databricks) and Akka Streams, both written in Scala.
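The client-side flow can be sketched as below. The interface and names are assumptions for illustration, not Popper's real client API; the key point is that branching and exposure logging happen together, so the analysis pipelines only count users who actually saw a treatment.

```scala
// Illustrative client-side flow (API names are assumptions, not Popper's real interface).
final case class Assignment(experiment: String, group: String)

trait PopperClient {
  def assignment(namespace: String, deviceId: String): Option[Assignment]
  def logExposure(a: Assignment, deviceId: String): Unit // feeds the Spark/Akka pipelines
}

def render(popper: PopperClient, deviceId: String): String =
  popper.assignment("homescreen", deviceId) match {
    case Some(a @ Assignment(_, "variant")) =>
      popper.logExposure(a, deviceId)
      "new-homescreen"
    case Some(a) =>
      popper.logExposure(a, deviceId)
      "old-homescreen"
    case None =>
      "old-homescreen" // target not enrolled in any experiment in this namespace
  }
```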
Combining exposure data with engagement metrics yields key indicators such as watch time, conversion, and retention, segmented by platform or content type.
Statistical significance is assessed using CUPED for variance reduction and the Benjamini‑Hochberg procedure to control false discovery rate.
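Both techniques are standard and can be sketched in a few lines; these are textbook implementations, not Tubi's production code. CUPED replaces a metric y with y' = y - theta * (x - mean(x)), where x is a pre-experiment covariate and theta = cov(x, y) / var(x), which preserves the mean while shrinking variance. Benjamini-Hochberg rejects the hypotheses behind the k smallest p-values, where k is the largest rank with p_(k) <= (k / m) * alpha.

```scala
// CUPED variance reduction: subtract the part of metric y predicted by a
// pre-experiment covariate x (e.g. last month's watch time).
def cuped(y: Array[Double], x: Array[Double]): Array[Double] = {
  val my = y.sum / y.length
  val mx = x.sum / x.length
  val cov = y.indices.map(i => (x(i) - mx) * (y(i) - my)).sum / y.length
  val varX = x.map(v => (v - mx) * (v - mx)).sum / x.length
  val theta = cov / varX
  y.indices.map(i => y(i) - theta * (x(i) - mx)).toArray
}

// Benjamini-Hochberg: return the indices of rejected hypotheses at FDR level alpha.
def benjaminiHochberg(pValues: Seq[Double], alpha: Double): Set[Int] = {
  val m = pValues.size
  val ranked = pValues.zipWithIndex.sortBy(_._1)
  val k = ranked.zipWithIndex.lastIndexWhere {
    case ((p, _), rank) => p <= (rank + 1).toDouble / m * alpha
  }
  ranked.take(k + 1).map(_._2).toSet // k == -1 yields the empty set
}
```

Controlling the false discovery rate matters here because each experiment reports many metrics across many segments, so uncorrected per-metric p-values would produce frequent spurious "wins".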
All metrics are automatically calculated and displayed on a BI dashboard, with a subset designated as “North Star” metrics that drive release decisions, and a “no‑harm” rule that blocks releases harming any North Star metric.
Experiment Health Checks
Popper includes a validation system that surfaces problematic experiments before they affect decisions.
Common failure modes include uneven group sizes (sample-ratio mismatch), cross-experiment interference, and biased exposure caused by client bugs; pre-experiment t-stat checks flag these warning signs before results are trusted.
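As one concrete example, the uneven-group-size check can be sketched as a sample-ratio-mismatch test. The post mentions t-stat checks; this sketch uses the equivalent chi-square form for an intended 50/50 split, and the threshold is the standard 5% critical value, not a figure from the post.

```scala
// Sample-ratio-mismatch check: chi-square statistic for an intended 50/50 split.
def srmChiSquare(controlN: Long, variantN: Long): Double = {
  val expected = (controlN + variantN) / 2.0
  math.pow(controlN - expected, 2) / expected +
    math.pow(variantN - expected, 2) / expected
}

// With one degree of freedom, a statistic above ~3.84 means the observed split
// would arise by chance less than 5% of the time: a signal to pause the
// experiment and look for a client-side assignment or logging bug.
def srmAlarm(controlN: Long, variantN: Long): Boolean =
  srmChiSquare(controlN, variantN) > 3.841
```

Counterintuitively, even a small percentage imbalance alarms at scale: a 52/48 split over 10,000 exposures is already far beyond the threshold, which is why such checks catch client bugs that eyeballing dashboards would miss.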
Lessons Learned
Self‑service is key to speed
By making experiment configuration the only required change for most model or feature updates, Tubi increased ML experiment throughput five‑fold.
Ask important questions
The platform focuses on identifying statistically significant signals that impact core KPIs, encouraging teams to prioritize high‑value ideas.
Embrace cross‑platform consistency
Experiments run on more than 20 platforms, with clients written in Scala, Elixir, Kotlin, Swift, and TypeScript, all unified under Popper to ensure consistent terminology and analysis.
Conclusion
The investment in Popper—from the engine to data pipelines and analysis dashboards—has dramatically boosted productivity and decision quality across Tubi, enabling non‑experts to run experiments, health checks to maintain trust, and North Star metrics to align experimentation with business goals.
If you are passionate about building better experimentation tools, consider joining Tubi.
DataFunTalk
Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.