How We Rescued a Live‑Streaming Service from 404 Crashes: Real‑World Performance Optimization Strategies
This article walks through the root causes of a live‑streaming outage caused by traffic spikes, explains core performance metrics such as response time and concurrency, and details a systematic set of optimizations—including timeout tuning, caching, fallback, retry policies, parallel processing, and API redesign—that restored system stability and improved latency.
1. What Is Performance Optimization
Software services tend to become slower as user load and feature complexity increase. Without timely optimization, latency spikes and crashes can occur, making performance optimization a continuous concern throughout a system's lifecycle.
1) Performance Measurement Indicators
Two primary dimensions are considered: response time (RT) and concurrency capability.
Response Time (RT)
Response time measures how long a request takes to complete. It can be evaluated using average response time (AVG) and percentile metrics (TPn=m), where n% of requests finish within m milliseconds.
AVG example: for 8 requests with times 3 ms, 5 ms, 5 ms, 5 ms, 6 ms, 6 ms, 6 ms, 6 ms, the average is (1·3 + 3·5 + 4·6) / 8 = 5.25 ms.
Percentile example: TP99 = 5 ms means that 99% of requests finish within 5 ms. For a set of 100 requests whose response times are 1 ms, 2 ms, ..., 100 ms, TP95 = 95 ms, because 95 of the 100 requests finish within 95 ms.
Percentiles often reflect overall performance better than averages because they expose long‑tail latency that averages can mask.
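The two examples above can be reproduced with a short Java sketch. It assumes the nearest-rank definition of a percentile (the article does not say which definition its monitoring system uses), and the class and method names are hypothetical.

```java
import java.util.Arrays;

final class LatencyStats {
    /** Arithmetic mean of the samples, in the same unit as the input. */
    static double average(double[] samplesMs) {
        if (samplesMs.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        return Arrays.stream(samplesMs).average().getAsDouble();
    }

    /**
     * Nearest-rank percentile (an assumed definition): the smallest sample
     * such that at least p percent of all samples are less than or equal to it.
     */
    static double percentile(double[] samplesMs, double p) {
        if (samplesMs.length == 0 || p <= 0 || p > 100) {
            throw new IllegalArgumentException("need samples and 0 < p <= 100");
        }
        double[] sorted = samplesMs.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p * sorted.length / 100.0);
        return sorted[rank - 1];
    }

    public static void main(String[] args) {
        double[] eight = {3, 5, 5, 5, 6, 6, 6, 6};
        System.out.println(average(eight));          // 5.25

        double[] hundred = new double[100];
        for (int i = 0; i < 100; i++) {
            hundred[i] = i + 1;                      // 1 ms .. 100 ms
        }
        System.out.println(percentile(hundred, 95)); // 95.0
    }
}
```

Adding a single 500 ms outlier to the eight-request sample would move the average a great deal while leaving most percentiles unchanged, which is why the two metrics are usually read together.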
Concurrency Capability
Concurrency capability is measured by QPS (queries per second) or TPS (transactions per second); TPS is the more common of the two for performance evaluation.
2. Essence of Performance Optimization
Analogous to algorithm analysis, response time corresponds to time complexity and concurrency to space complexity. Optimization therefore involves three perspectives: improving time, improving space, and trading one for the other.
A road analogy illustrates this: adding lanes (space) and raising the speed limit (time) both increase throughput.
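One way to make the analogy quantitative is Little's law, which is not cited in the original article and is offered here only as a rough steady-state approximation. The figures of 100 in-flight requests and 50 ms are made up for illustration.

```latex
\text{throughput (QPS)} \approx
  \frac{\text{requests in flight}}{\text{average response time (s)}},
\qquad \text{e.g.} \qquad
  \frac{100}{0.05\ \text{s}} = 2000\ \text{QPS}
```

Under this approximation, halving the response time to 25 ms (the speed limit) or doubling the in-flight requests to 200 (the lane count) would each raise throughput to about 4000 QPS.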
3. How We Performed the Optimization
3.1 Systematic Thinking of Optimization Points
The process involves developers, testers, and operations together with product requirements. The steps are:
Identify business scenarios that need optimization.
Collect monitoring and load‑test data for those scenarios.
Locate bottlenecks from the data.
Iterate the above steps until performance goals are met.
3.2 Common Optimization Techniques
Optimizations can be grouped into improving single‑request efficiency and parallelizing multiple requests.
Improving Single‑Request Efficiency
Speed up each node in the call chain: add database indexes, use read/write separation, shard tables, add local or distributed caches (e.g., Redis, Guava), route complex queries to Elasticsearch, and apply more efficient algorithms or data structures.
Reduce redundant queries and batch requests where possible.
Parallelize the internal processing of a request, for example with Java's CompletableFuture.
Make non‑critical steps asynchronous via message queues, background threads, or delayed tasks stored in DB/Redis.
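As one illustration of the caching item above, here is a minimal read-through local cache with a fixed time-to-live, written with only the JDK. It is a stand-in for what Guava or Redis would provide in practice, not the article's implementation; `TtlCache`, its injected clock, and the loader function are names assumed for this sketch.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.LongSupplier;

/** Hypothetical read-through cache: loads a value on a miss, reuses it until it expires. */
final class TtlCache<K, V> {
    private static final class Entry<V> {
        final V value;
        final long expiresAtMillis;

        Entry(V value, long expiresAtMillis) {
            this.value = value;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final ConcurrentHashMap<K, Entry<V>> entries = new ConcurrentHashMap<>();
    private final long ttlMillis;
    private final LongSupplier clock;      // injected so tests can control time
    private final Function<K, V> loader;   // the slow call being cached, e.g. a DB query

    TtlCache(long ttlMillis, LongSupplier clock, Function<K, V> loader) {
        this.ttlMillis = ttlMillis;
        this.clock = clock;
        this.loader = loader;
    }

    V get(K key) {
        long now = clock.getAsLong();
        // compute() runs atomically per key, so concurrent misses on the same
        // key trigger one load. The load runs while the key's bin is locked,
        // which is acceptable for a sketch but worth revisiting for slow loaders.
        Entry<V> entry = entries.compute(key, (k, old) -> {
            if (old != null && old.expiresAtMillis > now) {
                return old;                                   // still fresh
            }
            return new Entry<V>(loader.apply(k), now + ttlMillis);
        });
        return entry.value;
    }
}
```

In production code the clock would typically be `System::currentTimeMillis`, and a bounded cache with eviction would be preferable to an unbounded map.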
Parallel Processing of Multiple Requests
Deploy services in clusters behind load balancers and use thread pools to handle concurrent requests.
3.3 Case Study: Live‑Streaming Product Detail Page
User flow: view product detail → place order → re‑enter detail page → click live‑stream entry → permission check → join live room. The performance bottleneck was identified in the “product detail” request.
Root causes included missing caches, lack of degradation, heavy downstream queries returning unnecessary fields, and misuse of APIs.
Optimizations Applied
Weak‑dependency interfaces: set the RPC timeout to 1.5 × TP99 (or 1.5 × TP95), that is, the percentile latency plus a 50% margin; introduced a two‑level cache (zanKV) with asynchronous refresh; and added a circuit‑breaker fallback that returns default values.
Strong‑dependency interfaces: set the timeout based on TP99, configured the Dubbo retry count (2‑3 retries), and applied a transparent multi‑level cache (TMC) for hot data plus a Guava local cache for stateless queries.
Parallelized product detail aggregation : split the detail page into four independent sub‑tasks (basic info, A, B, C) and processed them concurrently using an internal parallel framework, then merged the results.
Standardized query APIs : consolidated numerous overlapping query endpoints into three atomic interfaces—coarse‑grained, medium‑grained, and fine‑grained—to enforce consistent usage.
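The timeout, fallback, and parallel-aggregation items above can be sketched together with the JDK's CompletableFuture (Java 9 or later for `completeOnTimeout`). This is not the internal parallel framework the article refers to. The class name, the string results, and the choice to treat basic info as a strong dependency and A, B, C as weak dependencies are assumptions made purely for illustration.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

final class DetailAggregator {
    /** Timeout rule from the text: the percentile latency plus a 50% margin. */
    static long weakTimeoutMillis(long tpMillis) {
        return Math.round(1.5 * tpMillis);
    }

    /**
     * Runs a weak-dependency call on the pool. If it fails or exceeds the
     * timeout, the returned future yields the fallback value instead.
     * Note: the timed-out task itself is not cancelled and keeps running.
     */
    static <T> CompletableFuture<T> weakCall(
            Supplier<T> call, T fallback, long timeoutMillis, ExecutorService pool) {
        return CompletableFuture.supplyAsync(call, pool)
                .completeOnTimeout(fallback, timeoutMillis, TimeUnit.MILLISECONDS)
                .exceptionally(ex -> fallback);
    }

    /** Fetches the four sub-tasks concurrently and merges their results. */
    static String aggregate(
            Supplier<String> basicInfo, Supplier<String> a,
            Supplier<String> b, Supplier<String> c,
            long tp99Millis, ExecutorService pool) {
        long timeout = weakTimeoutMillis(tp99Millis);

        // Assumed strong dependency: no fallback, so a failure propagates.
        CompletableFuture<String> fBasic = CompletableFuture.supplyAsync(basicInfo, pool);
        // Assumed weak dependencies: degrade to default values.
        CompletableFuture<String> fA = weakCall(a, "A:default", timeout, pool);
        CompletableFuture<String> fB = weakCall(b, "B:default", timeout, pool);
        CompletableFuture<String> fC = weakCall(c, "C:default", timeout, pool);

        // Wait for all four; total latency is roughly the slowest branch,
        // not the sum of the branches.
        CompletableFuture.allOf(fBasic, fA, fB, fC).join();
        return String.join("|", fBasic.join(), fA.join(), fB.join(), fC.join());
    }
}
```

A caller would pass one supplier per downstream service and a shared thread pool. With a TP99 of 200 ms, each weak branch is given 300 ms before its default value is used, so a slow or failing weak dependency no longer blocks or breaks the whole detail page.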
4. Summary
Performance optimization is an ongoing, case‑by‑case effort. In the early stages, adding an index is often enough; as the system evolves, batch queries, caching, and parallelism become necessary. The key principle is to analyze each case concretely and apply targeted techniques.
dbaplus Community
