
Kylin: 58’s Integrated Performance Testing Platform – Architecture, Features, and Future Roadmap

The article introduces Kylin, 58’s self‑developed one‑stop performance testing platform, detailing its background, development stages, core modules such as management console, test engine, scheduler, data monitoring and report aggregation, as well as its architecture, SCF support, task scheduling, data collection, reporting mechanisms, practical results and future roadmap.

58 Tech

1. Overview

Kylin is a one‑stop performance testing platform developed by 58, designed to help business lines quickly conduct performance tests, accurately evaluate service performance and capacity, and improve service stability and reliability. The platform includes a management console, data monitoring, report calculation, task scheduling and test engine modules.

2. Background

Demand for interface load testing keeps growing in scenarios such as Spring Festival traffic peaks, flash sales and other surge activities. Traditional tools such as JMeter have a steep learning curve, leave test-case management fragmented, make results hard to share, lose historical data, and cannot easily test SCF, 58's custom RPC service. These problems motivated the creation of Kylin.

3. Development Stages

Since its launch in 2020, Kylin has gone through three stages: initial support (SCF testing, real-time monitoring, report viewing), expansion (HTTP and JMX testing, parameterization, assertions, and dual-engine integration of Gatling and JMeter), and improvement (an architecture redesign that removed the master-slave bottleneck, dynamic resource scaling, and real-time reporting based on Kafka and Flink).

4. Platform Functions

The platform consists of five parts: management console, test engine, scheduler, data monitoring, and report aggregation.

4.1 Management Console

The console provides task management, execution history, case management, resource file management, permission control and data maintenance. It also supports task copy, third-party import, CURL import, case reuse and debugging, along with risk controls such as permission limits and one-click stop.

4.2 Test Engine

Supports multiple protocols (HTTP, SCF RPC, WMB) and script uploads (JMX, Scala).

Offers various pressure models (ramp‑up, time‑based, loop‑based, custom combinations).

Allows parameterization, random variables and custom functions.

Provides rich assertions on code, body, headers, etc.
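
As an illustration of how a ramp-up model and a custom combination of stages might be expressed, the sketch below computes the number of virtual users that should be active at a given moment. The `Stage` type and `users_at` function are hypothetical names chosen for this example; they are not Kylin's actual API.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Stage:
    """One segment of a load plan (hypothetical structure)."""
    duration_s: float
    start_users: int
    end_users: int


def users_at(stages: List[Stage], elapsed_s: float) -> int:
    """Return the target number of virtual users at elapsed_s seconds."""
    if elapsed_s < 0:
        return 0
    remaining = elapsed_s
    for stage in stages:
        if remaining < stage.duration_s:
            # Interpolate linearly between the stage's start and end load.
            fraction = remaining / stage.duration_s
            delta = stage.end_users - stage.start_users
            return round(stage.start_users + delta * fraction)
        remaining -= stage.duration_s
    return 0  # the plan has finished


# A 10-second ramp to 100 users followed by a 30-second hold.
plan = [Stage(10, 0, 100), Stage(30, 100, 100)]
```

A ramp followed by a hold is only one combination; appending further stages to the list models step loads or ramp-downs in the same way.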

4.3 Scheduler

Dispatch receives tasks from the console, splits them into jobs, calculates required resources, creates containers, distributes tasks to agents, monitors execution, and recovers resources after completion.
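
The resource calculation step can be sketched roughly as follows, assuming that each agent container can drive a fixed maximum number of virtual users. That capacity limit, and the name `plan_agent_shares`, are assumptions made for illustration; the source does not describe how Dispatch sizes its containers.

```python
import math
from typing import List


def plan_agent_shares(total_users: int, users_per_agent: int) -> List[int]:
    """Split total_users across the fewest agents that can carry the load.

    Returns one entry per agent with the number of virtual users it runs.
    """
    if users_per_agent <= 0:
        raise ValueError("users_per_agent must be positive")
    if total_users <= 0:
        return []
    agents = math.ceil(total_users / users_per_agent)
    base, extra = divmod(total_users, agents)
    # Spread the remainder so no agent carries more than one extra user.
    return [base + 1 if i < extra else base for i in range(agents)]
```

Balancing the shares evenly, rather than filling agents one by one, keeps any single container from becoming the limiting load generator.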

4.4 Data Monitoring

Real‑time dashboards display QPS, response time, error rate, CPU, memory, network, and SCF‑specific metrics. Execution logs are collected for troubleshooting.

4.5 Report Aggregation

Reports are generated via a custom Kafka‑Flink pipeline, delivering millisecond‑level latency and supporting 27 metrics including percentiles. TreeMap storage optimizes memory usage for high‑QPS scenarios.
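
One plausible reading of the TreeMap optimization is that response times are stored as a sorted map from latency value to occurrence count, so memory grows with the number of distinct latencies rather than with the number of requests. The sketch below illustrates that idea under this assumption. Python has no TreeMap, so a `Counter` with keys sorted at query time stands in for it, and the class name is hypothetical.

```python
import math
from collections import Counter


class LatencyHistogram:
    """Counts samples per distinct latency instead of keeping every sample."""

    def __init__(self) -> None:
        self._counts: Counter = Counter()
        self._total = 0

    def record(self, latency_ms: int) -> None:
        self._counts[latency_ms] += 1
        self._total += 1

    def distinct_values(self) -> int:
        return len(self._counts)

    def percentile(self, p: float) -> int:
        """Nearest-rank percentile, walking latencies in ascending order."""
        if self._total == 0:
            raise ValueError("no samples recorded")
        rank = max(1, math.ceil(p * self._total / 100))
        seen = 0
        for latency in sorted(self._counts):
            seen += self._counts[latency]
            if seen >= rank:
                return latency
        return max(self._counts)  # reached only if p is above 100
```

With millisecond resolution, a million requests that all finish within one second occupy at most a thousand map entries.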

5. Core Design – SCF Support

Testing SCF (58's RPC service) requires a custom sampler extension in JMeter. The sampler handles initialization (service discovery, dependency download, client creation), request processing (parameter building, invocation) and cleanup. The UI helps users by auto-filling service details, validating keys, and providing dropdown suggestions.
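
The three-phase sampler lifecycle can be sketched in a language-neutral way as below. This is a Python stand-in, not JMeter's Java sampler API and not the SCF client: `RpcSampler`, `SampleResult` and the `client_factory` argument are hypothetical, and service discovery and dependency download are reduced to a single factory call.

```python
import time
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class SampleResult:
    ok: bool
    elapsed_ms: float
    response: Any = None
    error: str = ""


class RpcSampler:
    """Initialization, request processing and cleanup for one RPC target."""

    def __init__(self, client_factory: Callable[[], Any]) -> None:
        self._client_factory = client_factory
        self._client = None

    def setup(self) -> None:
        # Stand-in for service discovery, dependency download, client creation.
        self._client = self._client_factory()

    def sample(self, method: str, *args: Any) -> SampleResult:
        if self._client is None:
            raise RuntimeError("call setup() before sample()")
        start = time.perf_counter()
        try:
            response = getattr(self._client, method)(*args)
            ok, error = True, ""
        except Exception as exc:  # a failed call is a result, not a crash
            response, ok, error = None, False, str(exc)
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        return SampleResult(ok, elapsed_ms, response, error)

    def teardown(self) -> None:
        close = getattr(self._client, "close", None)
        if callable(close):
            close()
        self._client = None
```

Creating the client once in `setup` rather than per request keeps connection cost out of the measured response time.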

5.1 Task Scheduling

Dispatch creates jobs, splits them into tasks, launches agents, and synchronizes status. Agents perform pre‑processing, execute the test, and respond to stop commands.
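
A minimal sketch of the agent side is shown below, assuming the stop command arrives as a flag that the agent checks between requests. The status names and the `Agent` class are illustrative only; the source does not specify how Kylin's agents receive commands or report status.

```python
import threading
from typing import Callable


class Agent:
    """Runs a request loop and honours a stop command between requests."""

    def __init__(self, send_request: Callable[[], None], max_iterations: int) -> None:
        self._send_request = send_request
        self._max_iterations = max_iterations
        self._stop_event = threading.Event()
        self.status = "CREATED"
        self.completed = 0

    def stop(self) -> None:
        # Safe to call from another thread, e.g. a command listener.
        self._stop_event.set()

    def run(self) -> None:
        self.status = "PREPARING"  # pre-processing would happen here
        self.status = "RUNNING"
        for _ in range(self._max_iterations):
            if self._stop_event.is_set():
                self.status = "STOPPED"
                return
            self._send_request()
            self.completed += 1
        self.status = "FINISHED"
```

Because the flag is checked once per iteration, a stop takes effect after the in-flight request completes rather than interrupting it.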

5.2 Data Collection & Logging

Machine metrics are collected via 58’s OpenAPI. Logs are streamed with Filebeat to Kafka, persisted in ClickHouse, and visualized for flexible querying. Log verbosity can be adjusted per task to balance detail and performance impact.

5.3 Report Calculation

Kafka‑Flink computes QPS, average/median/max response times, error rates and percentile lines in real time, overcoming JMeter’s latency issues in large‑scale tests.
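
The kind of per-window aggregation described here can be illustrated without a streaming framework. The sketch below groups samples into one-second tumbling windows and computes the metrics named above. It is plain Python rather than Flink code, and the sample tuple layout `(timestamp_ms, elapsed_ms, ok)` is an assumption.

```python
from collections import defaultdict
from statistics import mean, median
from typing import Dict, Iterable, Tuple

Sample = Tuple[int, float, bool]  # (timestamp_ms, elapsed_ms, ok)


def aggregate_per_second(samples: Iterable[Sample]) -> Dict[int, dict]:
    """Group samples into one-second windows and summarise each window."""
    windows = defaultdict(list)
    for ts_ms, elapsed_ms, ok in samples:
        windows[ts_ms // 1000].append((elapsed_ms, ok))

    report = {}
    for second in sorted(windows):
        rows = windows[second]
        latencies = [elapsed for elapsed, _ in rows]
        errors = sum(1 for _, ok in rows if not ok)
        report[second] = {
            "qps": len(rows),  # requests completed within this second
            "avg_ms": mean(latencies),
            "median_ms": median(latencies),
            "max_ms": max(latencies),
            "error_rate": errors / len(rows),
        }
    return report
```

In a streaming job the same computation would run incrementally per window as records arrive, which is what allows results to appear while the test is still running.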

6. Practical Effects

Since V1.0, Kylin has supported major events such as Super-Job Season, the House-Hunting Festival and Spring Festival capacity evaluation. It is deployed in 11 business units and has covered 2,300+ interfaces and 16,000+ test runs, including more than 200 tasks above 10,000 QPS. SCF accounts for about 65% of tasks.

7. Future Plans

Full‑link load testing with multi‑interface scenarios and traffic replay.

Data isolation by routing test data to shadow databases.

Circuit‑breaker safety mechanisms that automatically throttle or stop tests when performance thresholds are breached.
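
Since this mechanism is still a plan, the following is only a sketch of what such a guard could look like, assuming two monitored signals (error rate and P99 response time) and a rule that throttles when a limit is exceeded and stops when it is exceeded by a configurable factor. All names and the two-level rule are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Thresholds:
    max_error_rate: float  # e.g. 0.05 for 5 %
    max_p99_ms: float


def decide(error_rate: float, p99_ms: float, limits: Thresholds,
           stop_factor: float = 2.0) -> str:
    """Return "continue", "throttle" or "stop" for the current readings."""
    if limits.max_error_rate <= 0 or limits.max_p99_ms <= 0:
        raise ValueError("thresholds must be positive")
    # How far the worst signal is from its limit (1.0 means exactly at it).
    worst = max(error_rate / limits.max_error_rate, p99_ms / limits.max_p99_ms)
    if worst >= stop_factor:
        return "stop"
    if worst > 1.0:
        return "throttle"
    return "continue"
```

Separating a throttle level from a stop level lets a test back off under mild degradation while still halting quickly when the target service is clearly in trouble.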

Author Bio

Zhao Ru and Tang Mengqian, Senior Test Engineers at 58.com.
