
Performance Testing and Optimization of Tubi's Real-Time Recommendation Service

This article describes how Tubi's engineering team built and optimized a real-time recommendation backend: ScalaMeter microbenchmarks and wrk2 load tests measure latency, throughput, and error rate, and custom scripts demonstrate scaling the service across multiple machines.

Bitu Technology

Tubi, a free streaming service, relies on a backend Predictor Service to recommend content in real time. This article walks through the performance testing and optimization work done to meet the service's latency and throughput goals.

Microbenchmark – Using ScalaMeter, a benchmark generates 1,000‑10,000 rows of synthetic data and measures the time to run the prediction model. Results show ~23 ms for 1,000 rows and ~222 ms for 10,000 rows, confirming the model is fast enough to proceed.

import org.scalameter.api._

// Benchmark harness; genRows and predictor come from the service under test
object PredictorBenchmark extends Bench.LocalTime {
  val sizes = Gen.range("size")(1000, 10000, 1000)
  val input = for { size <- sizes } yield genRows(size)
  performance of "RealTimeModelServing" in {
    measure method "predict" in {
      using(input) in { rows => rows.map(predictor.predict) }
    }
  }
}
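A quick sanity check on those benchmark numbers (a back-of-the-envelope sketch, not part of the article): at ~23 ms per 1,000 rows and ~222 ms per 10,000 rows, the per-row prediction cost is roughly constant, which is what you want to see before moving on to load testing.

```python
# Back-of-the-envelope check that prediction cost scales linearly with row count.
measurements_ms = {1_000: 23.0, 10_000: 222.0}  # rows -> measured runtime from the benchmark

per_row_us = {rows: ms * 1000 / rows for rows, ms in measurements_ms.items()}
print(per_row_us)  # cost per row in microseconds

# A ratio near 1.0 means no super-linear blow-up as input size grows 10x.
ratio = per_row_us[10_000] / per_row_us[1_000]
print(round(ratio, 2))
```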

Load testing – The team uses wrk2 (a fork of wrk that sustains a constant request rate and records latency accurately, avoiding coordinated omission) to simulate concurrent users and record latency percentiles, throughput (requests per second), and error rate. Tests at 100 req/s, 150 req/s, and 200 req/s show p99 latency rising from ~108 ms to over 180 ms; CPU saturation is identified via the USE method and vmstat.

$ wrk --rate 100 --duration 5m --latency --u_latency --threads 8 --connections 16 --script wrk.lua http://predictor-02.node.tubi:8080/predictor/v1/get-ranking
...
Latency Distribution (HdrHistogram - Recorded Latency)
50.000%   74.43ms
75.000%   87.42ms
90.000%   95.36ms
99.000%  108.42ms
...
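To act on these numbers programmatically, for example failing a CI gate when p99 exceeds the target, the HdrHistogram section of wrk2's output can be parsed with a few lines of Python. This is an illustrative sketch, not part of the article's tooling, and the 150 ms threshold mirrors the SLO mentioned later:

```python
import re

# Sample of wrk2's latency-distribution output, as shown above.
WRK_OUTPUT = """
50.000%   74.43ms
75.000%   87.42ms
90.000%   95.36ms
99.000%  108.42ms
"""

def parse_percentiles(text: str) -> dict[float, float]:
    """Return {percentile: latency_ms} parsed from wrk2 output lines."""
    pattern = re.compile(r"(\d+\.\d+)%\s+([\d.]+)ms")
    return {float(p): float(ms) for p, ms in pattern.findall(text)}

percentiles = parse_percentiles(WRK_OUTPUT)
p99 = percentiles[99.0]
print(p99, "ms:", "OK" if p99 <= 150.0 else "SLO violated")
```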

To verify horizontal scalability, the service was deployed on four identical EC2 instances. A custom wrk2 script distributes traffic evenly across hosts, and the multi‑machine test handles 400 req/s with p99 latency under 150 ms.

-- usage `wrk --latency -c 8 -t 4 -d 1m -R 300 -s wrk.lua http://predictor.service.tubi:8080/predictor/v1/get-ranking`
-- Resolve all addresses behind the hostname so each thread can target a different host
local addrs = wrk.lookup(wrk.host, wrk.port or "http")
...
-- Log which backend address each thread ended up pinned to
function init(args)
   local msg = "thread addr: %s"
   print(msg:format(wrk.thread.addr))
end
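The core idea of the script, pinning each wrk thread to a different resolved address so traffic spreads evenly across hosts, can be sketched in Python (the host list here is hypothetical; in the real script `wrk.lookup` returns the resolved addresses):

```python
# Round-robin assignment of load-generator threads to resolved backend addresses.
# Hypothetical addresses for illustration; wrk.lookup() provides the real list.
addrs = ["10.0.0.1:8080", "10.0.0.2:8080", "10.0.0.3:8080", "10.0.0.4:8080"]

def assign_threads(num_threads: int, addrs: list[str]) -> list[str]:
    """Pin thread i to addrs[i % len(addrs)] so each host gets an even share."""
    return [addrs[i % len(addrs)] for i in range(num_threads)]

assignment = assign_threads(8, addrs)
print(assignment)  # each of the 4 hosts serves exactly 2 of the 8 threads
```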

Autobench2 – A Python wrapper that automates the entire load-testing workflow: warm-up, gradual ramp-up, metric collection, and result visualization.

$ autobench --verbose --connection 8 --thread 4 --duration 1m \
            --script wrk.lua --warmup_duration 1m --low_rate 10 \
            --high_rate 20 --rate_step 10 http://predictor.service.tubi:8080/predictor/v1/get-ranking
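A minimal sketch of the ramp-up logic such a wrapper might implement, building one wrk2 invocation per rate step from `low_rate` up to `high_rate`. The argv layout is an assumption for illustration, not Autobench2's actual implementation:

```python
def build_ramp(low_rate: int, high_rate: int, rate_step: int,
               duration: str, script: str, url: str) -> list[list[str]]:
    """Return one wrk2 argv per rate step, from low_rate up to high_rate inclusive."""
    cmds = []
    for rate in range(low_rate, high_rate + 1, rate_step):
        cmds.append(["wrk", "--latency", "-c", "8", "-t", "4",
                     "-d", duration, "-R", str(rate), "-s", script, url])
    return cmds

plan = build_ramp(10, 20, 10, "1m", "wrk.lua",
                  "http://predictor.service.tubi:8080/predictor/v1/get-ranking")
for cmd in plan:
    print(" ".join(cmd))  # two steps: 10 req/s, then 20 req/s
```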

In summary, the article shows why performance testing is essential for real-time recommendation services: it walks through microbenchmarking, load testing, and horizontal scaling, and introduces the open-source Autobench2 tool to simplify the workflow.

Tags: Backend, scalability, performance testing, load testing, microbenchmark, real-time recommendation
Written by

Bitu Technology

Bitu Technology is the registered company of Tubi's China team. We are engineers passionate about leveraging advanced technology to improve lives, and we hope to use this channel to connect and advance together.
