Optimizing Redis Latency for an Online Feature Store: A Batch Query Case Study
This article describes how Tubi improved the latency of its Redis‑backed online feature store for machine‑learning inference by analyzing query patterns, measuring client‑side bottlenecks, and applying optimizations such as binary Avro encoding, MGET usage, virtual partitioning, and parallel deserialization to meet a sub‑10 ms SLA.
Background: Tubi's movie recommendation system relies on machine‑learning models that consume high‑quality features stored in an online feature store (OFS) backed by Redis.
Feature families are grouped as Entity, Context, and Candidate, each with a distinct query pattern: point lookups for Entity, context-driven batch lookups for Context, and per-context candidate lookups for Candidate.
To keep latency low, different Redis data structures are used; Entity and Context families are stored as simple key‑value pairs, while Candidate families are stored as hash maps.
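The three access patterns can be modeled with a tiny in-memory sketch (the key names and values below are illustrative, not Tubi's actual naming):

```scala
// Entity and Context families: plain key-value pairs
val kv = Map(
  "entity:user:1"   -> "entityRow",
  "context:genre:7" -> "contextRowA",
  "context:genre:8" -> "contextRowB")

// Candidate families: one hash per key, fields keyed by candidate id
val hashes = Map("candidates:ctx:9" -> Map("movie:42" -> "candidateRow"))

// Entity: point lookup (GET)
val entityRow = kv("entity:user:1")

// Context: batch lookup over many keys derived from the context (MGET)
val contextRows = List("context:genre:7", "context:genre:8").flatMap(kv.get)

// Candidate: field lookup inside a hash (HGET / HGETALL)
val candidateRow = hashes("candidates:ctx:9")("movie:42")
```

The Context pattern is the troublesome one: a single request can expand into a batch over many keys.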
Challenge: Context‑family queries become batch operations that can request thousands of rows per request, causing a fan‑out effect and P99 latency above the 10 ms SLA.
Initial approach: store each feature row as a Redis hash and fetch batches with a Lettuce pipeline of HGETALL commands. Scala example:

```scala
import java.util.concurrent.TimeUnit
import io.lettuce.core.{LettuceFutures, RedisURI}
import io.lettuce.core.cluster.RedisClusterClient
import io.lettuce.core.cluster.api.async.RedisClusterAsyncCommands

// Create the RedisClusterClient and an async connection
val redisUri = RedisURI.create("redis://localhost:6379")
val clusterClient = RedisClusterClient.create(redisUri)
val connection = clusterClient.connect()
val asyncCommands: RedisClusterAsyncCommands[String, String] = connection.async()

val keys = List("key1", "key2", "key3", "key4", "key5")

// Disable auto-flush so all HGETALLs are sent as one pipelined batch
asyncCommands.setAutoFlushCommands(false)
val futures = keys.map(key => asyncCommands.hgetall(key))
asyncCommands.flushCommands()
asyncCommands.setAutoFlushCommands(true)

// Wait for all replies to arrive
LettuceFutures.awaitAll(5, TimeUnit.SECONDS, futures: _*)
```

This yielded a P99 latency of 20-30 ms, still above the target.
First optimization: encode rows as Avro binary and store them as plain key‑value pairs, then replace pipeline with a single MGET.
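A minimal sketch of the Avro binary round trip, assuming a hypothetical FeatureRow schema (the field names and metric values here are illustrative, not Tubi's actual schema):

```scala
import java.io.ByteArrayOutputStream
import org.apache.avro.Schema
import org.apache.avro.generic.{GenericData, GenericDatumReader, GenericDatumWriter, GenericRecord}
import org.apache.avro.io.{DecoderFactory, EncoderFactory}

// Hypothetical schema for one feature row
val schema: Schema = new Schema.Parser().parse(
  """{"type":"record","name":"FeatureRow","fields":[
    |  {"name":"userId","type":"string"},
    |  {"name":"score","type":"double"}
    |]}""".stripMargin)

// Encode a row to compact Avro binary for storage as a plain Redis value
def encodeRow(record: GenericRecord): Array[Byte] = {
  val out = new ByteArrayOutputStream()
  val encoder = EncoderFactory.get().binaryEncoder(out, null)
  new GenericDatumWriter[GenericRecord](schema).write(record, encoder)
  encoder.flush()
  out.toByteArray
}

// Decode a value fetched from Redis back into a record
def decodeRow(bytes: Array[Byte]): GenericRecord = {
  val decoder = DecoderFactory.get().binaryDecoder(bytes, null)
  new GenericDatumReader[GenericRecord](schema).read(null, decoder)
}
```

With rows stored this way (using Lettuce's `ByteArrayCodec`), a single `mget(keys: _*)` replaces the whole HGETALL pipeline with one round trip.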
Result: latency dropped to 3‑4 ms, but later feature families with much higher fan‑out (e.g., 855 rows per request) pushed P99 back to 15 ms.
Second attempt: virtual partitioning – split 800 rows into 10 partitions and issue concurrent MGETs to multiple Redis shards. This did not improve latency.
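Virtual partitioning splits one large MGET into several smaller ones issued concurrently. A sketch of the key-splitting step (the partition count and key names are illustrative):

```scala
// Split a large key batch into N virtual partitions so each MGET
// can be issued concurrently against a different shard/connection.
val numPartitions = 10
val keys: Seq[String] = (1 to 800).map(i => s"row:$i")
val partitionSize = math.ceil(keys.size.toDouble / numPartitions).toInt
val partitions: Seq[Seq[String]] = keys.grouped(partitionSize).toSeq

// Each partition would then be fetched with its own concurrent MGET, e.g.:
//   val replies = partitions.map(p => asyncCommands.mget(p: _*))
```

Because the bottleneck turned out not to be the network round trip, splitting the batch this way bought nothing.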
Root‑cause analysis: separate Lettuce “first‑byte latency” (network) from “completion latency” (client processing). The first‑byte latency was low; the bottleneck was client‑side deserialization, which took up to 10 ms.
Verification: added metrics to measure deserialization time, confirming the hypothesis.
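Such instrumentation can be as simple as timing the deserialization step separately and emitting it as its own metric. A hypothetical helper (the metric name and wiring are assumptions, not Tubi's actual metrics code):

```scala
// Run a block and report its duration in milliseconds,
// so deserialization time shows up as a distinct metric.
def timed[T](recordMillis: Double => Unit)(body: => T): T = {
  val start = System.nanoTime()
  try body
  finally recordMillis((System.nanoTime() - start) / 1e6)
}

// Usage sketch: hand the measurement to whatever metrics client is in use,
// e.g. statsd.histogram("ofs.deserialize_ms", ms) -- name is hypothetical.
var lastMs = 0.0
val result = timed(ms => lastMs = ms) {
  Thread.sleep(5) // stand-in for the deserialization work
  "decoded-rows"
}
```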
Final solution: parallelize deserialization using Scala Parallel Collections.
```scala
// In Scala 2.13+, .par requires the scala-parallel-collections module:
// import scala.collection.parallel.CollectionConverters._
if (rows.size > settings.parallelDecodeThreshold) {
  rows.par.map { row => deserializeAvroRow(schema, row.getValue, featureFamily, features) }.seq
} else {
  rows.map { row => deserializeAvroRow(schema, row.getValue, featureFamily, features) }
}
```

After this change, P99 latency fell to 10 ms, meeting the SLA.
Key takeaways: do not optimize blindly; decompose metrics to locate true bottlenecks, adopt an end‑to‑end perspective, validate assumptions with instrumentation, and collaborate with experts.
Bitu Technology
Bitu Technology is the registered company of Tubi's China team. We are engineers passionate about leveraging advanced technology to improve lives, and we hope to use this channel to connect and advance together.