Building a Custom RPC Stress‑Testing Tool: Insights from Meituan
Most of Meituan’s internal RPC services are built on Thrift, and existing open‑source options fell short for stress testing them. The team therefore built a custom tool that automates traffic capture, provides an intuitive UI, aggregates metrics in InfluxDB, and supports both Thrift and HTTP workloads.
Background
Most of Meituan’s internal RPC services are built on Apache Thrift. During routine development, engineers need to stress test these services to uncover potential issues. Existing approaches, such as writing custom Python or Ruby scripts to replay logs or using generic open‑source tools, are time‑consuming, error‑prone, and lack unified reporting.
Problems with Existing Solutions
Heavy code effort to parse logs and reconstruct requests, especially for complex Thrift payloads.
Setup of scripting environments or third‑party tools consumes significant time.
Inconsistent result formats; many tools output raw terminal data that is hard to interpret.
Difficulty sharing test configurations across teams due to environment and code differences.
Evaluation of Open‑Source Tools
We examined several popular stress‑testing frameworks:
JMeter – excellent for HTTP but lacks native Thrift support, requires complex local installation, and is not user‑friendly for our use case.
Twitter’s iago – supports HTTP and Thrift but forces a project per test, presents non‑intuitive results, depends on an outdated Scala version, and has sparse documentation.
Other tools such as Gatling, Grinder, and Locust were also considered but did not align with Meituan’s specific requirements.
Given these gaps, building a simple, easy‑to‑use internal tool became necessary.
Design Goals
Capture live traffic from production services.
Provide an intuitive UI that enables test setup within an hour.
Display clear charts for key performance metrics.
Support both Thrift and HTTP services.
Architecture Overview
The tool follows a three‑stage lifecycle: init, run, and destroy. The init stage prepares resources such as DB connections and clients. The run stage generates requests on multiple threads and records timestamps, from which the framework computes response‑time statistics such as the average, the maximum, and TP90. The destroy stage cleans up resources. A core interface abstracts these stages, so developers implement service‑specific runners while the framework handles concurrency and result aggregation.
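The lifecycle can be sketched in Java as follows. This is a minimal illustration, not the tool's actual API: the article does not give the interface, so the names StressRunner, StressDriver, and execute, along with their signatures, are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical lifecycle contract; the real interface is not shown in the article. */
interface StressRunner {
    void init() throws Exception;     // prepare resources (DB connections, clients)
    void run() throws Exception;      // issue one request against the service under test
    void destroy() throws Exception;  // release resources
}

/** Hypothetical driver: runs a StressRunner on several threads and records latencies. */
final class StressDriver {
    /** Returns every observed latency in nanoseconds, sorted ascending. */
    static List<Long> execute(StressRunner runner, int threads, int requestsPerThread)
            throws Exception {
        runner.init();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<List<Long>>> futures = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                futures.add(pool.submit(() -> {
                    // Each worker keeps its own list, so no locking is needed while timing.
                    List<Long> latencies = new ArrayList<>(requestsPerThread);
                    for (int i = 0; i < requestsPerThread; i++) {
                        long start = System.nanoTime();
                        runner.run();
                        latencies.add(System.nanoTime() - start);
                    }
                    return latencies;
                }));
            }
            // Merge per-thread results once all workers have finished.
            List<Long> all = new ArrayList<>();
            for (Future<List<Long>> f : futures) {
                all.addAll(f.get());
            }
            Collections.sort(all);
            return all;
        } finally {
            pool.shutdown();
            runner.destroy();
        }
    }
}
```

A service‑specific runner would implement the three methods (for example, creating a Thrift client in init and calling it in run), and the sorted latencies can then be fed into whatever aggregation step computes the average, maximum, and percentiles.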
Traffic Capture (VCR)
To simplify replaying real production traffic, the tool introduces a VCR (Video Cassette Recorder) component. VCR serializes incoming requests into JSON and stores them in Redis using a single‑threaded asynchronous writer, minimizing impact on the live service.
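The single‑threaded asynchronous writer could look roughly like the sketch below. It is an illustration under stated assumptions rather than the VCR implementation: the class name AsyncRecorder is hypothetical, and the JSON serializer and the Redis write are passed in as plain functions so the example stays self‑contained. In practice they might be a JSON library call and a Redis list push.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;
import java.util.function.Function;

/** Hypothetical recorder: hands requests to one background thread for serialization and storage. */
final class AsyncRecorder<T> implements AutoCloseable {
    // One worker thread with an unbounded queue; writes happen in submission order.
    private final ExecutorService writer = Executors.newSingleThreadExecutor();
    private final Function<T, String> toJson; // assumed: supplied by the caller (a JSON library)
    private final Consumer<String> sink;      // assumed: supplied by the caller (a Redis write)

    AsyncRecorder(Function<T, String> toJson, Consumer<String> sink) {
        this.toJson = toJson;
        this.sink = sink;
    }

    /** Called on the request path: only enqueues a task, so serialization and I/O run off-thread. */
    void record(T request) {
        writer.execute(() -> {
            try {
                sink.accept(toJson.apply(request));
            } catch (RuntimeException e) {
                // Recording is best effort; a failed write must not affect the live service.
            }
        });
    }

    /** Stops accepting requests and waits up to five seconds for queued writes to finish. */
    @Override
    public void close() throws InterruptedException {
        writer.shutdown();
        writer.awaitTermination(5, TimeUnit.SECONDS);
    }
}
```

Because the queue in this sketch is unbounded, a slow sink would let memory grow under heavy traffic. A production recorder would likely bound the queue and drop or sample requests when it fills.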
Data Aggregation
After a test run, the tool aggregates metrics such as maximum response time, average response time, QPS, and TP90 and TP50 (the 90th and 50th percentile response times). InfluxDB serves as the time‑series backend, which keeps queries simple: for example, a one‑line InfluxQL statement can retrieve TP90 values.
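To make these metrics concrete, the sketch below computes them in Java from a list of response times. It uses the nearest‑rank percentile definition, which is an assumption of this example and may differ from how InfluxDB calculates percentiles. The class and field names are hypothetical. On the InfluxDB side, InfluxQL offers a PERCENTILE() function, so a query such as SELECT PERCENTILE("response_time", 90) FROM "stress_result" would return TP90, with the measurement and field names here being illustrative only.

```java
import java.util.Arrays;

/** Hypothetical summary of one test run; names are illustrative. */
final class LatencySummary {
    final long max;
    final double average;
    final long tp50;
    final long tp90;
    final double qps;

    LatencySummary(long[] latenciesMs, double durationSeconds) {
        if (latenciesMs.length == 0 || durationSeconds <= 0) {
            throw new IllegalArgumentException("need at least one sample and a positive duration");
        }
        long[] sorted = latenciesMs.clone(); // leave the caller's array untouched
        Arrays.sort(sorted);
        max = sorted[sorted.length - 1];
        average = Arrays.stream(sorted).average().orElse(0);
        tp50 = percentile(sorted, 50);
        tp90 = percentile(sorted, 90);
        qps = sorted.length / durationSeconds; // completed requests per second
    }

    /** Nearest-rank percentile: the smallest sample with at least p% of samples at or below it. */
    static long percentile(long[] sorted, int p) {
        // Integer ceiling of p * n / 100 avoids floating-point rounding surprises.
        int rank = (int) (((long) p * sorted.length + 99) / 100);
        return sorted[Math.max(rank, 1) - 1];
    }
}
```

For ten samples of 1 to 10 ms collected over two seconds, this yields a maximum of 10, an average of 5.5, TP50 of 5, TP90 of 9, and a QPS of 5.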
Implementation Details
The tool is packaged as a Maven artifact, making it easy for Java‑based services to consume. Users only need two lines of code to start traffic capture. For isolation, a dedicated machine performs the capture to avoid affecting production latency.
After capture, a web UI lets users inspect collected logs and view details of individual requests.
Performance Metrics Visualization
Results are presented through intuitive charts, allowing users to quickly assess application stability and performance under load.
Conclusion
Since its launch, the custom stress‑testing tool has been adopted by over 20 services, executing hundreds of test runs. Setup time has been reduced to 15–30 minutes per application, improving service stability and freeing developers from cumbersome manual testing processes.
ITPUB