How We Cut Publishing Latency by 600ms: A Real‑World Backend Optimization Case Study
Through profiling with flame graphs, log analysis, and targeted refactoring—including async task handling, rule‑engine tuning, data‑load reduction, and cache redesign—we reduced the 95th‑percentile publishing latency on Baixing.com from around 3 seconds to under 1 second, achieving near‑instant, second‑level publishing.
Background
When users click the publish button on Baixing.com, the request passes through a risk‑control system that performs extensive risk and quality analysis before returning a publishing status. This heavy analysis lengthened the response time, prompting the technical team to launch a "Second‑Kill" optimization project at the end of July.
Current State and Goal
Current State
The 95th‑percentile latency for publish & update operations was about 3 seconds.
Goal
Reduce the 95th‑percentile latency to under 1 second.
Problem Identification
Using flame‑graph profiling and historical slow‑query logs, the team pinpointed two major hot‑spots: the cloud‑association analysis module (data loading) and the keyword‑matching module (matching algorithm).
In a flame graph, the Y‑axis shows call‑stack depth (which function called which); the X‑axis shows the proportion of samples each frame appeared in—width, not left‑to‑right order, indicates time share.
Flame‑Graph Tool
The flame graph revealed the call stacks that consumed the most time, highlighting the cloud‑association analysis and keyword modules as primary targets.
Log Data
Analysis of slow‑query logs exposed inefficient data structures and inadequate caching, while timing logs around key algorithms quantified the performance gap.
Optimization Plan
Asynchronously process risk‑control sub‑services that can be queued.
Optimize usage of the rule‑engine module on the business side.
Improve the cloud‑association analysis module, which has the highest potential but also the highest difficulty.
Optimize the keyword‑matching module.
Acceptance Environment
Before development, real‑time timing points were added to all publishing modules and key risk‑control components to enable precise measurement of improvements.
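The timing points might look like the following. This is a minimal Python sketch (the production system is PHP); the names timing_point and timings are hypothetical, and a real system would ship these measurements to a metrics store rather than a dict:

```python
import time
from contextlib import contextmanager

timings = {}  # stage name -> elapsed milliseconds (illustrative in-memory store)

@contextmanager
def timing_point(name):
    """Record wall-clock time spent in a named publishing stage."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[name] = (time.perf_counter() - start) * 1000.0

# Usage: wrap each risk-control stage so latency changes can be attributed precisely.
with timing_point("keyword_matching"):
    time.sleep(0.01)  # stand-in for the real work
```

Wrapping every stage this way is what lets a later section claim "this change saved 600 ms" with confidence.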
Development and Incremental Improvements
6.1 Extracting Asynchronous Tasks
Identified and async‑ified two risk‑control sub‑services. The async services reduced their 95th‑percentile latency to under 20 ms, but the overall publishing curve showed little change due to noise.
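The async extraction can be sketched as follows—a hedged Python illustration (the real system is PHP and would use a proper message queue, not an in‑process thread); publish and risk_subservice_worker are hypothetical names:

```python
import queue
import threading

task_queue = queue.Queue()
results = []

def risk_subservice_worker():
    # Background worker: drains queued risk-control checks off the publish path.
    while True:
        listing_id = task_queue.get()
        if listing_id is None:  # sentinel to stop the worker
            break
        results.append(("checked", listing_id))  # stand-in for the real analysis
        task_queue.task_done()

threading.Thread(target=risk_subservice_worker, daemon=True).start()

def publish(listing_id):
    # The publish path only enqueues; it no longer waits for the sub-service.
    task_queue.put(listing_id)
    return "published"

status = publish(42)
task_queue.join()  # production code would NOT wait; we join here only to observe
```

The user-facing response returns as soon as the task is enqueued, which is why the sub-services' own p95 dropped below 20 ms.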
6.2 Optimizing Rule‑Engine Usage
Removed obsolete rules and moved some checks to post‑publish, gaining roughly 300 ms of latency reduction.
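One way to express this split is to tag each rule with a stage and keep only blocking rules on the publish path. A hypothetical Python sketch (the rule names and registry shape are invented for illustration):

```python
# Hypothetical rule registry: each rule is (name, stage, check_fn).
rules = [
    ("banned_words", "pre", lambda text: "spam" not in text),
    ("image_quality", "post", lambda text: True),    # deferred to after publish
    ("legacy_check", "obsolete", lambda text: True), # candidate for removal
]

def pre_publish_rules():
    # Only 'pre' rules stay on the blocking publish path.
    return [(name, fn) for name, stage, fn in rules if stage == "pre"]

def run_blocking_checks(text):
    return all(fn(text) for _, fn in pre_publish_rules())
```

Moving a rule from "pre" to "post" removes its full cost from user-perceived latency, which is where the ~300 ms came from.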
6.3 Cloud‑Association Analysis
6.3.1 Reducing Redundant Data Loads
The module loaded excessive data via Data::load. Refactoring the data‑loading logic cut about 600 ms from the overall latency.
6.3.2 Parallelizing Search Requests
Attempted to parallelize HTTP calls to the search service using curl_multi. Although parallelism sped up the search phase in tests, the single‑threaded multiplexing model and increased CPU usage limited real‑world gains.
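For comparison, the same fan-out pattern in Python uses real threads rather than curl_multi's single-threaded multiplexing—an illustrative sketch only (search simulates the HTTP call; the query strings are invented):

```python
from concurrent.futures import ThreadPoolExecutor
import time

def search(query):
    # Stand-in for one HTTP call to the search service (~20 ms each).
    time.sleep(0.02)
    return f"results:{query}"

queries = ["phones", "bikes", "sofas"]

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=len(queries)) as pool:
    results = list(pool.map(search, queries))
parallel_ms = (time.perf_counter() - start) * 1000.0
# With true parallelism this approaches the slowest single call, not the sum.
```

With curl_multi, by contrast, all transfers are driven by one thread, so response parsing and CPU-bound work still serialize—one plausible reason the in-test speedup did not fully survive production load.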
6.3.3 Additional Attempts
Implemented cloud pre‑warming and cloud weighting: pre‑warming yielded a 50‑100 ms improvement, while weighting had negligible impact on speed.
6.4 Keyword Matching Optimization
6.4.1 Matching Algorithm
Replaced the original mb_strpos() approach with a Trie‑tree algorithm, achieving O(m) matching time, where m is the text length.
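A minimal Python sketch of the idea (the production code is PHP). Note this plain trie scan from each position is O(m·k) for maximum keyword length k; reaching the O(m) bound mentioned above requires adding failure links, as in Aho‑Corasick:

```python
def build_trie(words):
    """Build a nested-dict trie; '$' marks end of a keyword."""
    root = {}
    for w in words:
        node = root
        for ch in w:
            node = node.setdefault(ch, {})
        node["$"] = w
    return root

def match_keywords(trie, text):
    """Return keywords found in text by walking the trie from each position."""
    hits = set()
    for i in range(len(text)):
        node = trie
        for ch in text[i:]:
            if ch not in node:
                break
            node = node[ch]
            if "$" in node:
                hits.add(node["$"])
    return hits

trie = build_trie(["spam", "scam", "free"])
found = match_keywords(trie, "free spam offer")
```

Unlike repeated mb_strpos() calls (one pass per keyword), the trie scans the text once regardless of how many keywords are loaded.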
6.4.2 Cache Structure
Serialized the Trie‑tree into Redis, but deserialization via json_decode() became a bottleneck (≈52 ms). Switching to serialize()/unserialize() was slower; using Memcache for direct object storage reduced latency, with Redis as a fallback.
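The trade-off is easy to measure. This Python sketch compares json against pickle decoding of a nested-dict trie—an analogue of the PHP json_decode() vs. unserialize() comparison, not the original benchmark; the sample trie is invented:

```python
import json
import pickle
import time

# A small nested dict standing in for the serialized Trie-tree.
trie = {"s": {"p": {"a": {"m": {"$": "spam"}}}},
        "f": {"r": {"e": {"e": {"$": "free"}}}}}

json_blob = json.dumps(trie)
pickle_blob = pickle.dumps(trie)

def time_decode(fn, blob, n=1000):
    start = time.perf_counter()
    for _ in range(n):
        fn(blob)
    return (time.perf_counter() - start) * 1000.0

json_ms = time_decode(json.loads, json_blob)
pickle_ms = time_decode(pickle.loads, pickle_blob)
# Relative cost varies with payload shape and size; benchmark with the real
# Trie before choosing a format or store.
```

The general lesson matches the article: for a large in-memory structure, deserialization cost on every request can dwarf the lookup itself, which is why direct object storage (or a resident process) wins.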
After these changes, the keyword module contributed an additional 300‑400 ms improvement.
Project Results
Timeline
July 31 – August 16, 2017 (13 workdays). One engineer full‑time, plus business support.
Performance Gains
Across all platforms, the 95th‑percentile latency dropped from ~3 seconds to under 1 second, with individual gains of 600 ms (cloud‑association), 300 ms (rule‑engine), and 300‑400 ms (keyword module).
Further Improvement Thoughts
Turn the keyword module into a long‑running service to keep the Trie‑tree resident in memory.
Explore more efficient Trie implementations to reduce size and boost speed.
Modularize the risk‑control system by media type (text, image, audio, video) to enable finer‑grained concurrency optimizations.
Conclusion
Thorough analysis with flame graphs and real‑time metrics is essential before tackling performance problems.
Establish a solid acceptance environment to attribute latency changes to specific code changes.
For deeper dives into search performance, PHP concurrency, Trie algorithms, and serialization overhead, refer to the sections on cloud‑association and keyword optimizations.
Baixing.com Technical Team