
Design and Optimization of Distributed and Local Shared Variables for Strategy Engine Services

By introducing distributed and local shared variables that propagate user profiles via trace context and coalesce parallel requests, the iQIYI strategy engine eliminates redundant DMP calls, cuts traffic to some DMP services by up to 25%, and lowers P99 latency by nearly 50%, avoiding the kind of per-invocation cost explosion that drove Amazon's Prime Video team back to a monolith for a 90% cost reduction.

iQIYI Technical Product Team

Background

A case study from the Amazon Prime Video team highlighted the scalability challenges of a monitoring tool that processes thousands of live streams daily. The initial solution used a micro‑service architecture, which split the system into three parts: a media converter, a defect detector, and a control orchestrator. However, the cost of invoking AWS Step Functions for each stream proved prohibitive, leading the team to migrate back to a monolithic architecture and achieve a 90% cost reduction.

The same cost‑explosion problem appears in iQIYI’s overseas strategy‑engine, where distributed services cannot share variables, causing repeated DMP (Data Management Platform) calls for the same user profile.

Figure: iQIYI Overseas Strategy Engine Call Relationship

The engine determines whether a user belongs to a specific audience (e.g., Japanese gold members, male, membership expiring in <7 days, anime lover). When a client request arrives, the navigation API fetches the associated strategy, forwards the user ID and device ID to the strategy engine, which then queries DMP for the user profile. This process repeats for page, card, and card‑data requests, leading to multiple identical DMP calls per user request.
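To see the redundancy concretely, here is a minimal Python sketch of the flow described above (the production services are JVM-based, and all names here are illustrative): each stage of one user request triggers its own identical DMP lookup.

```python
# Naive flow before shared variables: every stage of one user request
# re-queries DMP for the same profile (all names are illustrative).
dmp_calls = []

def query_dmp(user_id):
    dmp_calls.append(user_id)           # stand-in for the real DMP RPC
    return {"user_id": user_id, "audience": "jp-gold-anime"}

def strategy_engine(user_id, stage):
    profile = query_dmp(user_id)        # no way to see earlier results
    return f"{stage}:{profile['audience']}"

def handle_request(user_id):
    # Navigation resolves the strategy, then the page, card, and
    # card-data stages each call the strategy engine independently.
    return [strategy_engine(user_id, s) for s in ("page", "card", "card_data")]

handle_request("u123")
print(len(dmp_calls))  # 3 identical DMP calls for one user request
```

Every stage pays the full DMP round trip even though the answer never changes within the request; this is the redundancy the shared variables remove.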

Challenges

1. High request volume to DMP creates excessive traffic.
2. Real-time data requirement: any delay in profile updates (e.g., after a membership purchase) makes the user experience unacceptable.
3. Parallel calls generate simultaneous DMP requests for the same user.

Shared Variable Concept

To eliminate redundant DMP calls, the authors propose two kinds of shared variables:

Distributed Shared Variable: In a serial call chain, the first node retrieves the user profile from DMP and propagates it downstream via a trace context (similar to a TraceId). Subsequent nodes reuse the same data, collapsing the DMP calls at nodes T1…Tn into a single request.
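A minimal sketch of this idea, using Python's `contextvars` as a stand-in for the request-scoped trace context (the real system rides on SkyWalking's trace and is written in Java; all names here are hypothetical):

```python
import contextvars

# Request-scoped "trace context"; in production this rides on the
# SkyWalking trace, here a contextvar stands in for it.
_profile_ctx = contextvars.ContextVar("user_profile", default=None)

dmp_calls = 0  # counts actual DMP round trips

def query_dmp(user_id):
    """Stand-in for the real DMP lookup (hypothetical payload)."""
    global dmp_calls
    dmp_calls += 1
    return {"user_id": user_id, "segment": "gold-member"}

def get_profile(user_id):
    """First node on the chain fetches; downstream nodes reuse."""
    profile = _profile_ctx.get()
    if profile is None:
        profile = query_dmp(user_id)
        _profile_ctx.set(profile)  # propagate downstream via the context
    return profile

# Serial chain: page -> card -> card data, all within one trace
for _node in ("page", "card", "card_data"):
    get_profile("u123")

print(dmp_calls)  # 1: a single DMP call serves the whole chain
```

The business code at each node still just calls `get_profile`; whether the answer came from DMP or from the propagated context is invisible to it.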

Local Shared Variable: In parallel call scenarios, a request queue is created per TraceId. The first request fetches the profile; the remaining requests wait for its result, effectively turning N concurrent DMP calls into one.

Implementation Details

• Trace Context Propagation: The profile is stored in the request-level trace context and passed along the call chain. If a downstream node finds the profile already present, it skips the DMP query.

• SDK Wrapper: The shared-variable logic is encapsulated in an SDK so that business code remains unaware of the optimization.

• Full-Link Tracing with SkyWalking: SkyWalking is used to carry the trace context across micro-services; only a Maven dependency is required to inject the data into the trace.

• Compression of Trace Payload: Because the trace payload adds network overhead, the authors evaluate several compression schemes (Gzip, LZ4, and a custom scheme). Scheme 3 yields the smallest size and is selected.

• Network Impact: Experiments show that when the compressed payload is under 2000 bytes, the additional network cost is negligible. The optimization reduces P99 latency from 25 ms to 2.96 ms and cuts DMP traffic by up to 25% for certain services.
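The size trade-off can be explored with standard-library codecs. A rough sketch (the payload is made up; LZ4 and the authors' custom scheme would need extra libraries, so only stdlib codecs are compared, with the same "pick the smallest" selection):

```python
import bz2
import gzip
import json
import zlib

# A made-up profile payload of the kind carried in the trace context.
profile = json.dumps({
    "uid": "u123", "region": "JP", "tier": "gold",
    "tags": ["anime", "expiring-7d", "male"] * 20,
}).encode()

candidates = {
    "gzip": gzip.compress(profile),
    "zlib": zlib.compress(profile, 9),
    "bz2": bz2.compress(profile),
}
best = min(candidates, key=lambda k: len(candidates[k]))
for name, data in candidates.items():
    print(name, len(data))
print("selected:", best,
      "| under the 2000-byte threshold:", len(candidates[best]) < 2000)
```

As in the article's experiments, what matters is whether the winning scheme keeps the payload under the roughly 2000-byte mark where the added network cost stays negligible.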

Local Shared Variable Optimizations

• Hash-Based Routing at the Gateway: To keep parallel requests for the same user on the same service instance, the gateway hashes a custom field (e.g., qyid), so all requests from the same device are routed to the same backend node, improving the cache hit rate.

• Cache Choices: Existing local-cache libraries such as Caffeine or Guava are used. Only one request per key reaches DMP; the others retrieve the cached profile.

• Parameter Design: With a service QPS of 10 000 across 50 instances (≈200 QPS per instance) and TraceId-grouped request latency typically under 100 ms, a cache TTL of 1 s and a cap of 200 entries per instance ensure that nearly all parallel requests within a trace share the same profile.
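Putting the routing and caching pieces together, a sketch with the stated parameters (Python stand-ins for the gateway hash and a Caffeine-style TTL cache; a fake clock makes the 1 s TTL testable, and all names are illustrative):

```python
import hashlib
import time

INSTANCES = [f"strategy-engine-{i}" for i in range(50)]  # 50 instances, ~200 QPS each

def route(qyid: str) -> str:
    """Gateway-side stable hash on qyid: one device -> one instance."""
    digest = hashlib.md5(qyid.encode()).hexdigest()
    return INSTANCES[int(digest, 16) % len(INSTANCES)]

class TTLCache:
    """Minimal stand-in for Caffeine/Guava with the parameters from the
    text: ttl=1 s, maxsize=200 entries per instance."""
    def __init__(self, maxsize=200, ttl=1.0, clock=time.monotonic):
        self.maxsize, self.ttl, self.clock = maxsize, ttl, clock
        self._data = {}  # key -> (expires_at, value)

    def get(self, key):
        item = self._data.get(key)
        if item and item[0] > self.clock():
            return item[1]
        self._data.pop(key, None)  # drop expired entries lazily
        return None

    def put(self, key, value):
        if len(self._data) >= self.maxsize:
            # Evict the entry closest to expiry (simplified policy).
            self._data.pop(min(self._data, key=lambda k: self._data[k][0]))
        self._data[key] = (self.clock() + self.ttl, value)

# All parallel requests for one device land on the same instance...
assert len({route("qyid-abc") for _ in range(100)}) == 1

# ...where a 1 s TTL covers a <100 ms burst but preserves freshness.
now = [0.0]
cache = TTLCache(clock=lambda: now[0])
cache.put("trace-1", {"tier": "gold"})
now[0] = 0.09
print(cache.get("trace-1"))  # hit: parallel requests within the burst reuse it
now[0] = 1.5
print(cache.get("trace-1"))  # None: expired, the next request re-queries DMP
```

The 1 s TTL is an order of magnitude larger than the sub-100 ms burst it needs to absorb, yet short enough that a profile change (say, a membership purchase) is visible within a second.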

Results

Distributed shared variables reduce DMP traffic by ~25% for service A, ~10% for service B, and ~2% for service C.

Local shared variables achieve traffic reductions of 15.8%–16.7% across three DMP services while preserving real‑time data freshness.

Conclusion

The shared‑variable approach—both distributed and local—successfully lowers DMP service traffic, improves latency (P99 reduced by ~48.8%), and meets strict real‑time data requirements without sacrificing accuracy. The solution combines trace‑context propagation, lightweight compression, gateway hash routing, and proven caching libraries.

Tags: Distributed Systems · Performance Optimization · Microservices · Caching · Service Mesh · Shared Variables · Traceability