How We Rescued a Live‑Streaming Service from 404 Crashes: Real‑World Performance Optimization Strategies

This article walks through the root causes of a live‑streaming outage caused by traffic spikes, explains core performance metrics such as response time and concurrency, and details the systematic set of optimizations (timeout tuning, caching, fallback, retry policies, parallel processing, and API redesign) that restored system stability and reduced latency.

dbaplus Community

Feb 16, 2021

1. What Is Performance Optimization

Software services tend to become slower as user load and feature complexity increase. Without timely optimization, latency spikes and crashes can occur, making performance optimization a continuous concern throughout a system's lifecycle.

1) Performance Measurement Indicators

Two primary dimensions are considered: response time (RT) and concurrency capability.

Response Time (RT)

Response time measures how long a request takes to complete. It can be evaluated with the average response time (AVG) or with percentile metrics, written TPn = m, meaning that n% of requests finish within m milliseconds.

AVG example: for 8 requests taking 3 ms, 5 ms, 5 ms, 5 ms, 6 ms, 6 ms, 6 ms, and 6 ms, the average is (1 × 3 + 3 × 5 + 4 × 6) / 8 = 42 / 8 = 5.25 ms.

Percentile example: TP99 = 5 ms means that 99% of requests finish within 5 ms. For a set of 100 requests taking 1 ms, 2 ms, and so on up to 100 ms, TP95 = 95 ms, because 95 of the 100 requests finish within 95 ms.

Percentiles often reflect overall performance better than averages because they expose long‑tail latency that averages can mask.
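Both RT indicators can be computed directly from raw latency samples. The following is a minimal Java sketch, assuming the nearest-rank definition of a percentile (monitoring tools differ in how they interpolate); the class name `LatencyStats` is illustrative, not from the original system.

```java
import java.util.Arrays;

// Illustrative helpers for the two RT indicators; latencies are in milliseconds.
class LatencyStats {

    // Arithmetic mean (AVG) of all samples.
    static double average(long[] latenciesMs) {
        long sum = 0;
        for (long v : latenciesMs) {
            sum += v;
        }
        return (double) sum / latenciesMs.length;
    }

    // Nearest-rank percentile (TPn): the smallest sample such that at least
    // n% of all samples are less than or equal to it.
    static long percentile(long[] latenciesMs, double n) {
        long[] sorted = latenciesMs.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(n * sorted.length / 100.0); // 1-based rank
        return sorted[Math.max(rank, 1) - 1];
    }

    public static void main(String[] args) {
        long[] sample = {3, 5, 5, 5, 6, 6, 6, 6};
        System.out.println(average(sample)); // 5.25

        long[] oneTo100 = new long[100];
        for (int i = 0; i < oneTo100.length; i++) {
            oneTo100[i] = i + 1;
        }
        System.out.println(percentile(oneTo100, 95)); // 95
        System.out.println(percentile(oneTo100, 99)); // 99
    }
}
```

The `main` method reproduces the AVG and TP95 examples from the text.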

Concurrency Capability

Concurrency capability is measured in QPS (queries per second) or TPS (transactions per second), with TPS being the more common metric in performance evaluation.
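As a rough illustration of the concurrency indicators, the sketch below computes throughput from a request count and an observation window. It also applies Little's law, a standard queueing identity that the original article does not mention, which links this dimension back to RT: average in-flight requests = throughput × average response time. The class name and the sample numbers are made up.

```java
// Illustrative helpers; the class and method names are hypothetical.
class Throughput {

    // QPS/TPS: completed requests (or transactions) per second of observation.
    static double perSecond(long completed, long windowMillis) {
        return completed * 1000.0 / windowMillis;
    }

    // Little's law: in-flight requests = throughput (req/s) x average RT (s).
    static double inFlight(double perSecond, double avgRtMillis) {
        return perSecond * avgRtMillis / 1000.0;
    }

    public static void main(String[] args) {
        double qps = perSecond(12_000, 10_000);   // 12,000 requests in 10 s
        System.out.println(qps);                  // 1200.0
        System.out.println(inFlight(qps, 50));    // 60.0 requests in flight at 50 ms RT
    }
}
```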

2. Essence of Performance Optimization

By analogy with algorithm analysis, response time corresponds to time complexity and concurrency to space complexity. Optimization can therefore be approached from three angles: improving time, improving space, and trading one for the other.

Illustrative road analogy: adding lanes (space) and raising the speed limit (time) both increase a road's throughput.

[Figure: single‑lane road with a speed limit of 50 km/h]

[Figure: single‑lane road with a speed limit of 100 km/h]

3. How We Performed the Optimization

3.1 A Systematic Approach to Finding Optimization Points

The process brings together developers, testers, and operations staff, guided by the product requirements. The steps are:

Identify business scenarios that need optimization.

Collect monitoring and load‑test data for those scenarios.

Locate bottlenecks from the data.

Iterate the above steps until performance goals are met.

3.2 Common Optimization Techniques

Optimizations can be grouped into improving single‑request efficiency and parallelizing multiple requests.

Improving Single‑Request Efficiency

Speed up each node in the call chain: add database indexes, use read/write separation, shard tables, add local or distributed caches (e.g., Guava for local caching, Redis for distributed caching), route complex queries to Elasticsearch, and apply more efficient algorithms or data structures.
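To illustrate the caching item, here is a minimal read-through local cache with a fixed TTL, written against the JDK only as a stand-in for Guava or Redis. `TtlCache` and its loader function are hypothetical; a library cache would also bound the size and coalesce concurrent loads of the same key, which this sketch does not do.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical read-through local cache with a fixed time-to-live.
class TtlCache<K, V> {

    private static final class Entry<V> {
        final V value;
        final long expiresAtNanos;

        Entry(V value, long expiresAtNanos) {
            this.value = value;
            this.expiresAtNanos = expiresAtNanos;
        }
    }

    private final Map<K, Entry<V>> entries = new ConcurrentHashMap<>();
    private final Function<K, V> loader; // the slow source, e.g. a DB query
    private final long ttlNanos;

    TtlCache(Function<K, V> loader, long ttlMillis) {
        this.loader = loader;
        this.ttlNanos = ttlMillis * 1_000_000L;
    }

    V get(K key) {
        long now = System.nanoTime();
        Entry<V> cached = entries.get(key);
        if (cached != null && now - cached.expiresAtNanos < 0) {
            return cached.value;              // fresh hit: no call to the source
        }
        V value = loader.apply(key);          // miss or expired: go to the source
        entries.put(key, new Entry<>(value, now + ttlNanos));
        return value;
    }

    public static void main(String[] args) {
        TtlCache<Long, String> cache = new TtlCache<>(id -> "product-" + id, 30_000); // 30 s TTL
        System.out.println(cache.get(1L)); // loaded from the source
        System.out.println(cache.get(1L)); // served from the cache
    }
}
```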

Reduce redundant queries and batch requests where possible.
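Batching can be shown with a hypothetical store that counts round trips. `ProductStore`, `findById`, and `findByIds` are illustrative names, not a real DAO API; the point is that de-duplicating ids and issuing one batched call replaces N separate queries.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;

// Hypothetical store that counts round trips so the saving is visible.
class ProductStore {
    private final Map<Long, String> rows;
    int roundTrips = 0;

    ProductStore(Map<Long, String> rows) {
        this.rows = rows;
    }

    String findById(long id) {                          // one round trip per id
        roundTrips++;
        return rows.get(id);
    }

    Map<Long, String> findByIds(Collection<Long> ids) { // one round trip for all ids
        roundTrips++;
        Map<Long, String> found = new HashMap<>();
        for (Long id : ids) {
            String row = rows.get(id);
            if (row != null) {
                found.put(id, row);
            }
        }
        return found;
    }
}

class BatchLoading {

    // N+1 style: one query per id, duplicates included.
    static List<String> loadOneByOne(ProductStore store, List<Long> ids) {
        List<String> result = new ArrayList<>();
        for (Long id : ids) {
            result.add(store.findById(id));
        }
        return result;
    }

    // Batched: de-duplicate ids, then fetch them in a single call.
    static Map<Long, String> loadBatched(ProductStore store, List<Long> ids) {
        return store.findByIds(new LinkedHashSet<>(ids));
    }

    public static void main(String[] args) {
        Map<Long, String> rows = Map.of(1L, "a", 2L, "b", 3L, "c");
        List<Long> ids = List.of(1L, 2L, 2L, 3L);
        ProductStore naive = new ProductStore(rows);
        loadOneByOne(naive, ids);
        ProductStore batched = new ProductStore(rows);
        loadBatched(batched, ids);
        System.out.println(naive.roundTrips + " vs " + batched.roundTrips); // 4 vs 1
    }
}
```

With ids [1, 2, 2, 3], the one-by-one version makes four round trips while the batched version makes one.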

Parallelize the internal processing of a request with tools such as Java's CompletableFuture.
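A minimal sketch of parallelizing inside one request with `CompletableFuture`, assuming two independent downstream calls (`fetchPrice` and `fetchStock` are hypothetical and simulated with a short sleep). Passing a dedicated executor, rather than relying on the shared common pool, is a common choice for blocking I/O calls.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class ParallelLookup {

    // Two hypothetical, independent downstream calls (~50 ms each).
    static String fetchPrice(long productId) {
        pause(50);
        return "price-" + productId;
    }

    static String fetchStock(long productId) {
        pause(50);
        return "stock-" + productId;
    }

    // Starts both calls at once and combines them when both finish, so the
    // wall-clock cost is roughly the slower call rather than the sum.
    static String loadDetail(long productId, ExecutorService pool) {
        CompletableFuture<String> price =
            CompletableFuture.supplyAsync(() -> fetchPrice(productId), pool);
        CompletableFuture<String> stock =
            CompletableFuture.supplyAsync(() -> fetchStock(productId), pool);
        return price.thenCombine(stock, (p, s) -> p + "," + s).join();
    }

    private static void pause(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(2); // dedicated pool for I/O-bound calls
        try {
            System.out.println(loadDetail(7L, pool)); // price-7,stock-7
        } finally {
            pool.shutdown();
        }
    }
}
```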

Make non‑critical steps asynchronous via message queues, background threads, or delayed tasks stored in DB/Redis.
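For the background-thread option, a minimal sketch: the critical path returns immediately and a non-critical side effect (here, a hypothetical audit write) is handed to a single background thread. `OrderService` and `auditLog` are illustrative. Unlike a message queue, an in-memory executor loses pending work if the process dies, so durable side effects usually belong in an MQ or a persisted delayed task.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

class OrderService {
    private final ExecutorService background = Executors.newSingleThreadExecutor();
    // Stand-in for a non-critical sink such as an audit log or notification service.
    final BlockingQueue<String> auditLog = new LinkedBlockingQueue<>();

    // Critical path returns right away; the audit write runs on the background thread.
    String placeOrder(String orderId) {
        background.execute(() -> auditLog.add("audit " + orderId));
        return "accepted:" + orderId;
    }

    // Stops accepting work and waits briefly for queued side effects to finish.
    void close() {
        background.shutdown();
        try {
            background.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        OrderService service = new OrderService();
        System.out.println(service.placeOrder("42")); // accepted:42
        service.close();
        System.out.println(service.auditLog);         // [audit 42]
    }
}
```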

Parallel Processing of Multiple Requests

Deploy services in clusters behind load balancers and use thread pools to handle concurrent requests.
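On a single node, a bounded thread pool keeps concurrent request handling under control. The sizes below (4 core threads, 8 maximum, a 100-slot queue) are arbitrary assumptions for illustration, and `RequestPool` is a hypothetical name.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class RequestPool {

    // 4 core threads; up to 8 threads are created only after the 100-slot queue fills.
    // When threads and queue are both saturated, CallerRunsPolicy runs the task on the
    // submitting thread, which slows intake instead of dropping requests.
    static ThreadPoolExecutor create() {
        return new ThreadPoolExecutor(
            4, 8, 60, TimeUnit.SECONDS,
            new ArrayBlockingQueue<>(100),
            new ThreadPoolExecutor.CallerRunsPolicy());
    }

    // Submits n trivial "requests" and returns how many were handled.
    static int handle(int n) {
        ThreadPoolExecutor pool = create();
        AtomicInteger handled = new AtomicInteger();
        for (int i = 0; i < n; i++) {
            pool.execute(handled::incrementAndGet);
        }
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return handled.get();
    }

    public static void main(String[] args) {
        System.out.println(handle(500)); // 500
    }
}
```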

3.3 Case Study: Live‑Streaming Product Detail Page

User flow: view product detail → place order → re‑enter detail page → click live‑stream entry → permission check → join live room. The performance bottleneck was identified in the “product detail” request.

Root causes included missing caches, no degradation (fallback) strategy, heavy downstream queries that returned unnecessary fields, and misuse of APIs.

Optimizations Applied

Weak‑dependency interfaces: set the RPC timeout to (1 + 50%) × the interface's TP99 (or TP95), introduced a two‑level cache (zanKV) with asynchronous refresh, and added a circuit‑breaker fallback that returns default values.
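The weak-dependency rules can be sketched as a timeout calculation plus a simple circuit breaker that returns a default value. This is a simplified illustration, not the team's implementation or a library API: `CircuitBreaker` is hypothetical, the RPC framework is assumed to enforce the timeout and surface it as an exception, and once the open period ends the next call simply goes through as a trial (there is no limited half-open state).

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical guard for a weak dependency: fail fast to a default value.
class CircuitBreaker<T> {
    private final int failureThreshold;  // consecutive failures that open the circuit
    private final long openMillis;       // how long to skip the remote call once open
    private final T fallback;            // default value returned instead of an error
    private final AtomicInteger consecutiveFailures = new AtomicInteger();
    private volatile long openUntilMillis = 0;

    CircuitBreaker(int failureThreshold, long openMillis, T fallback) {
        this.failureThreshold = failureThreshold;
        this.openMillis = openMillis;
        this.fallback = fallback;
    }

    // Timeout rule from the text, (1 + 50%) x TP99 or TP95, rounded down to whole ms.
    static long timeoutMillis(long tpMillis) {
        return tpMillis * 3 / 2;
    }

    // `remote` is expected to throw when the RPC fails or hits its timeout.
    T call(Supplier<T> remote) {
        if (System.currentTimeMillis() < openUntilMillis) {
            return fallback;                       // open: don't touch the dependency
        }
        try {
            T value = remote.get();
            consecutiveFailures.set(0);            // success resets the count
            return value;
        } catch (RuntimeException e) {
            if (consecutiveFailures.incrementAndGet() >= failureThreshold) {
                openUntilMillis = System.currentTimeMillis() + openMillis;
                consecutiveFailures.set(0);
            }
            return fallback;                       // degrade instead of failing the page
        }
    }

    public static void main(String[] args) {
        System.out.println(timeoutMillis(200)); // 300
        CircuitBreaker<String> breaker = new CircuitBreaker<>(3, 10_000, "default");
        System.out.println(breaker.call(() -> "live value")); // live value
    }
}
```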

Strong‑dependency interfaces: set the timeout based on TP99, configured a Dubbo retry count of 2 to 3, and applied a transparent multi‑level cache (TMC) for hot data plus a Guava local cache for stateless queries.
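A bounded retry policy for strong dependencies can be sketched as a small helper. `Retry.withRetries` is a hypothetical name; in Dubbo the comparable knob is the consumer-side `retries` setting, which likewise counts attempts after the first call. Retries add load to a struggling service and are only safe for idempotent operations, so this sketch keeps the count small and deliberately has no backoff.

```java
import java.util.function.Supplier;

class Retry {

    // Runs op once plus up to `retries` more times; rethrows the last failure.
    static <T> T withRetries(Supplier<T> op, int retries) {
        if (retries < 0) {
            throw new IllegalArgumentException("retries must be >= 0");
        }
        RuntimeException last = null;
        for (int attempt = 0; attempt <= retries; attempt++) {
            try {
                return op.get();
            } catch (RuntimeException e) {
                last = e;   // remember the failure and try again
            }
        }
        throw last;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        String result = withRetries(() -> {
            if (++calls[0] < 3) {
                throw new IllegalStateException("transient failure");
            }
            return "ok";
        }, 2);
        System.out.println(result + " after " + calls[0] + " calls"); // ok after 3 calls
    }
}
```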

Parallelized product detail aggregation: split the detail page into four independent sub‑tasks (basic info, A, B, C) and processed them concurrently using an internal parallel framework, then merged the results.
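The four-way split of the detail page can be sketched with `CompletableFuture.allOf`, standing in for the internal parallel framework, whose API the article does not describe. The part names follow the text (basic info, A, B, C); the per-part deadline and the "unavailable" placeholder are assumptions. Note that `completeOnTimeout` (Java 9+) only completes the future early; it does not cancel the underlying call.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

class DetailAggregator {

    // Runs every named part concurrently. A part that throws or misses the deadline
    // contributes "unavailable" instead of failing the whole page.
    static Map<String, String> aggregate(Map<String, Supplier<String>> parts,
                                         long timeoutMillis, ExecutorService pool) {
        Map<String, CompletableFuture<String>> futures = new LinkedHashMap<>();
        parts.forEach((name, task) -> futures.put(name,
            CompletableFuture.supplyAsync(task, pool)
                .completeOnTimeout("unavailable", timeoutMillis, TimeUnit.MILLISECONDS)
                .exceptionally(error -> "unavailable")));

        // Wait for all parts, then merge them in their original order.
        CompletableFuture.allOf(futures.values().toArray(new CompletableFuture<?>[0])).join();
        Map<String, String> page = new LinkedHashMap<>();
        futures.forEach((name, future) -> page.put(name, future.join()));
        return page;
    }

    public static void main(String[] args) {
        Map<String, Supplier<String>> parts = new LinkedHashMap<>();
        parts.put("basic", () -> "basic info");
        parts.put("A", () -> "a");
        parts.put("B", () -> { throw new IllegalStateException("B is down"); });
        parts.put("C", () -> "c");

        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            System.out.println(aggregate(parts, 500, pool));
            // {basic=basic info, A=a, B=unavailable, C=c}
        } finally {
            pool.shutdown();
        }
    }
}
```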

Standardized query APIs: consolidated numerous overlapping query endpoints into three atomic interfaces (coarse‑grained, medium‑grained, and fine‑grained) to enforce consistent usage.
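The three-granularity API can be sketched as one service exposing fixed field sets. `ProductQueries` and the field names are hypothetical, and the in-memory map stands in for the real data source. A real service would also select only each granularity's fields from storage rather than filtering afterwards as this sketch does.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical consolidated query API: three fixed granularities instead of many
// ad-hoc endpoints. Field names are illustrative only.
class ProductQueries {
    private static final List<String> FINE = List.of("id", "title");
    private static final List<String> MEDIUM = List.of("id", "title", "price", "stock");
    private static final List<String> COARSE =
        List.of("id", "title", "price", "stock", "description", "liveRoom");

    private final Map<Long, Map<String, String>> store; // stand-in for the product data source

    ProductQueries(Map<Long, Map<String, String>> store) {
        this.store = store;
    }

    Map<String, String> fine(long productId)   { return project(productId, FINE); }
    Map<String, String> medium(long productId) { return project(productId, MEDIUM); }
    Map<String, String> coarse(long productId) { return project(productId, COARSE); }

    // Copies only the fields allowed for the requested granularity.
    private Map<String, String> project(long productId, List<String> fields) {
        Map<String, String> row = store.getOrDefault(productId, Map.of());
        Map<String, String> result = new LinkedHashMap<>();
        for (String field : fields) {
            if (row.containsKey(field)) {
                result.put(field, row.get(field));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> row = Map.of("id", "1", "title", "Phone", "price", "99", "stock", "5");
        ProductQueries queries = new ProductQueries(Map.of(1L, row));
        System.out.println(queries.fine(1L).size());   // 2
        System.out.println(queries.medium(1L).size()); // 4
    }
}
```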

4. Summary

Performance optimization is an ongoing, case‑by‑case effort. Early stages may be solved with simple indexing; as the system evolves, batch queries, caching, and parallelism become necessary. The key principle is to analyze each case concretely and apply targeted techniques.
