How Huolala Automated Full‑Link Load Testing to Boost Efficiency and Cut Costs

This article details Huolala's journey from manual, resource‑intensive full‑link load testing to a fully automated, model‑driven platform that improves peak‑capacity verification, reduces testing time and manpower, ensures safety through circuit‑breaker mechanisms, and delivers measurable cost and performance gains.

Huolala Tech

Background and Challenges

Huolala has rapidly grown into the world’s largest closed‑loop logistics platform, and its user numbers and freight orders have grown explosively along the way. System stability became critical, and full‑link load testing became the main way to validate capacity and stability. Rapid business iteration then raised three new demands: verifying peak capacity promptly, running tests more efficiently, and adding safeguards so that a test cannot cause a production incident.

Solution and Goals

Faced with limited reference solutions, Huolala designed an automation framework that aligns with its business characteristics and internal platforms. The goal was to eliminate repetitive manual steps by mapping pre‑, during‑, and post‑test scenarios to platform services.

Key functional requirements include:

Modeling and model‑effect comparison: Identify suitable algorithms to define test traffic models and compare results.

Test task orchestration: Provide a flexible system to arrange tasks with temporal or causal dependencies.

Test task scheduling: Manage over 70 scripts, 300+ load‑generator machines, and millions of virtual orders.

Robust circuit‑breaker: Detect bottlenecks and stop traffic before services collapse.

Capability Building

3.1 Peak Traffic and Load Modeling

The load model defines target QPS ratios for each service and strives to mimic real traffic. Models are built before testing, adjusted during execution, and evaluated after testing.

3.1.1 Peak Traffic Modeling

Manual configuration becomes infeasible at scale; therefore Huolala extracts monitoring data, correlates metrics, and periodically generates peak‑traffic models for each service.

/** Simplified code, core flow only */
public PressureModel getAndSaveOneDayMonitorData(String appId, String env, String name, String batchId,
                                                 long start, long end, long step, MonitorData orderMonitorData) {
    PressureModel model = new PressureModel();
    // Use the whole day unless the order data specifies a custom time window
    String timeScope = StrUtil.isEmpty(orderMonitorData.getTimeScope()) ? MonitorData.TIME_ALL_DAY : MonitorData.TIME_CUSTOM;
    // Retrieve SOA monitoring data and convert to model
    HashMap<String, MetricIntegratorData> monitorQpsSoaData = getMonitorQpsSoaData(env, appId, false, start, end, step);
    List<MonitorData> monitorQpsSoaDataFromMetricDates = getMonitorDataFromMetricDates(appId, env, name, batchId, MonitorData.TYPE_QPS_SOA, MonitorData.TYPE_QPS_SOA, timeScope, monitorQpsSoaData);
    if (!monitorQpsSoaDataFromMetricDates.isEmpty()) {
        monitorDataRepository.saveAll(monitorQpsSoaDataFromMetricDates);
        model = MonitorDataUtil.transitionPressureModel(model, monitorQpsSoaDataFromMetricDates, orderMonitorData);
    }
    // Save other key metrics (CPU, MySQL, Redis, etc.) into the model
    pressureModelRepository.save(model);
    // Return the persisted model so the declared return type is satisfied
    return model;
}

Initial peak‑time algorithms proved insufficient; Huolala introduced a fitted scaling factor based on order volume and service ratios, achieving high alignment between modeled and real traffic.
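The article does not publish the fitting method, so the following is only a minimal sketch of what a fitted scaling factor could look like. It assumes a least-squares fit through the origin between historical order volume and one service's QPS, and then projects that service's QPS at a target order peak. The names `ScalingFactorSketch`, `fitFactor`, and `projectQps` are hypothetical.

```java
/** Hypothetical sketch: fit qps ≈ k * orders by least squares through the origin. */
final class ScalingFactorSketch {

    /** Returns k = sum(orders_i * qps_i) / sum(orders_i^2). */
    static double fitFactor(double[] orders, double[] qps) {
        if (orders.length != qps.length || orders.length == 0) {
            throw new IllegalArgumentException("need equally sized, non-empty samples");
        }
        double numerator = 0.0;
        double denominator = 0.0;
        for (int i = 0; i < orders.length; i++) {
            numerator += orders[i] * qps[i];
            denominator += orders[i] * orders[i];
        }
        if (denominator == 0.0) {
            throw new IllegalArgumentException("order volume is zero in every sample");
        }
        return numerator / denominator;
    }

    /** Projects the service QPS expected at the given order volume. */
    static double projectQps(double factor, double targetOrders) {
        return factor * targetOrders;
    }
}
```

With samples where QPS is always half the order volume, `fitFactor` yields 0.5, and a target of 1,000 orders projects to 500 QPS. A production model would presumably fit per service and per time window.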

3.1.2 Traffic Ratio Adjustment

The system automatically adjusts interface traffic ratios in scripts based on the latest service model, ensuring each test run reflects current business patterns.
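As a rough illustration of the ratio adjustment, the sketch below converts per-interface peak QPS from a service model into percentage shares that a script could use as traffic weights. It assumes the model is available as a simple map; `TrafficRatioSketch` and the interface paths in the usage note are hypothetical, and how the platform actually rewrites its scripts is not described in the article.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch: turn modeled per-interface QPS into percentage weights. */
final class TrafficRatioSketch {

    /** Returns each interface's share of total QPS, in percent (shares sum to 100). */
    static Map<String, Double> toPercentages(Map<String, Double> modelQps) {
        double total = 0.0;
        for (double qps : modelQps.values()) {
            total += qps;
        }
        if (total <= 0.0) {
            throw new IllegalArgumentException("model contains no traffic");
        }
        Map<String, Double> shares = new LinkedHashMap<>();
        for (Map.Entry<String, Double> entry : modelQps.entrySet()) {
            shares.put(entry.getKey(), 100.0 * entry.getValue() / total);
        }
        return shares;
    }
}
```

For example, a model with 300 QPS on `/order/create` and 700 QPS on `/order/query` produces weights of 30 and 70 percent.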

3.1.3 Test Effect Model

After each run, six key dimensions (CPU average, CPU maximum, HTTP, SOA, Redis, and MySQL) are modeled and compared with the target model to spot deviations and guide optimizations.
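The comparison between the measured run and the target model can be pictured as a per-dimension relative deviation. The sketch below assumes both models are maps from dimension name to value and that a single tolerance applies to all dimensions; `ModelDeviationSketch` and the tolerance are illustrative assumptions, not the platform's actual evaluation logic.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch: compare a measured run against the target model. */
final class ModelDeviationSketch {

    /** Relative deviation (actual - target) / target for every comparable dimension. */
    static Map<String, Double> deviations(Map<String, Double> target, Map<String, Double> actual) {
        Map<String, Double> result = new LinkedHashMap<>();
        for (Map.Entry<String, Double> entry : target.entrySet()) {
            Double measured = actual.get(entry.getKey());
            if (measured == null || entry.getValue() == 0.0) {
                continue; // dimension missing or target is zero: nothing to compare
            }
            result.put(entry.getKey(), (measured - entry.getValue()) / entry.getValue());
        }
        return result;
    }

    /** True when every deviation stays within +/- tolerance (e.g. 0.15 for 15%). */
    static boolean withinTolerance(Map<String, Double> deviations, double tolerance) {
        for (double deviation : deviations.values()) {
            if (Math.abs(deviation) > tolerance) {
                return false;
            }
        }
        return true;
    }
}
```

A run that reaches 900 SOA QPS against a target of 1,000 shows a deviation of -10%, which passes a 15% tolerance and fails a 5% one.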

3.2 Automated Task Flow Orchestration

3.2.1 Abstract Task Node

public abstract class AbstractTaskNodeService implements TaskNodeService {
    protected TaskNode taskNode;

    // Constructing a node runs its whole lifecycle in order:
    // init -> preExecute -> execute -> postProcessor.
    // These hooks are overridable and run before subclass field initializers,
    // so subclasses must not rely on their own fields inside them.
    public AbstractTaskNodeService(TaskNode taskNode) {
        this.taskNode = taskNode;
        init();
        preExecute();
        execute();
        postProcessor();
    }
    @Override public void init() { /* parse context */ }
    @Override public void preExecute() { /* pre‑process */ }
    @Override public void execute() { /* concrete task */ }
    @Override public void postProcessor() { /* save state, create next node */ }
    @Override public TaskNode getTaskNode() { return this.taskNode; }
}

3.2.2 Task Flow Composition

Different scenarios (e.g., order‑driver positioning) are expressed as composable task graphs, allowing flexible combination without manual intervention.
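One common way to run a task graph with dependencies is a topological sort. The sketch below uses Kahn's algorithm to derive an execution order; it is an illustration of the idea, not the platform's scheduler, and `TaskGraphSketch` and the task names in the usage note are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: order tasks so that every dependency runs first. */
final class TaskGraphSketch {
    private final Map<String, List<String>> successors = new LinkedHashMap<>();
    private final Map<String, Integer> pendingDeps = new LinkedHashMap<>();

    void addTask(String name) {
        successors.putIfAbsent(name, new ArrayList<>());
        pendingDeps.putIfAbsent(name, 0);
    }

    /** Declares that {@code after} may only start once {@code before} has finished. */
    void addDependency(String before, String after) {
        addTask(before);
        addTask(after);
        successors.get(before).add(after);
        pendingDeps.merge(after, 1, Integer::sum);
    }

    /** Kahn's algorithm: repeatedly emit tasks whose dependencies are all done. */
    List<String> executionOrder() {
        Map<String, Integer> remaining = new LinkedHashMap<>(pendingDeps);
        Deque<String> ready = new ArrayDeque<>();
        for (Map.Entry<String, Integer> entry : remaining.entrySet()) {
            if (entry.getValue() == 0) {
                ready.add(entry.getKey());
            }
        }
        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String task = ready.poll();
            order.add(task);
            for (String next : successors.get(task)) {
                if (remaining.merge(next, -1, Integer::sum) == 0) {
                    ready.add(next);
                }
            }
        }
        if (order.size() != successors.size()) {
            throw new IllegalStateException("task graph contains a cycle");
        }
        return order;
    }
}
```

If `prepareData` must precede both `startOrderTraffic` and `startDriverPositioning`, and both must precede `collectReport`, the resulting order starts with `prepareData` and ends with `collectReport`. A cyclic graph is rejected instead of looping forever.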

3.2.3 Intermediate State Handling

If an urgent pause is required, the platform can halt or finish the current node and resume later.
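A minimal sketch of the "finish the current node, then pause" behaviour follows. It assumes a linear flow whose position is kept in memory; the real platform presumably persists node state (the `postProcessor` hook mentions saving state), and `PausableFlowSketch` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: a flow that pauses between nodes and resumes later. */
final class PausableFlowSketch {
    private final List<Runnable> nodes;
    private int nextIndex = 0;                    // position to resume from
    private volatile boolean pauseRequested = false;

    PausableFlowSketch(List<Runnable> nodes) {
        this.nodes = new ArrayList<>(nodes);
    }

    /** May be called from another thread, or from a node, while the flow runs. */
    void requestPause() {
        pauseRequested = true;
    }

    /** Runs from the saved position; returns true once every node has finished. */
    boolean runOrResume() {
        pauseRequested = false;                   // a new run clears any earlier pause
        while (nextIndex < nodes.size()) {
            nodes.get(nextIndex).run();           // the current node always completes
            nextIndex++;
            if (pauseRequested) {
                break;                            // stop before starting the next node
            }
        }
        return nextIndex == nodes.size();
    }

    int completedNodes() {
        return nextIndex;
    }
}
```

Calling `runOrResume()` again after a pause continues with the first node that has not yet run, so no node is executed twice.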

3.3 Test Task Scheduling

3.3.1 Large‑Scale Cluster Management

Huolala migrated from static ECS clusters to serverless containers, enabling on‑demand provisioning of 500+ load‑generator pods within a minute and reducing hardware costs dramatically.

3.3.2 File Pre‑Distribution

Scripts, data files, and dependent JARs are split, pre‑processed, and deployed to clean environments on each target machine to avoid conflicts.
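The article does not say how files are split. One plausible reading is that test data rows are sharded so that each load generator receives a disjoint slice and no two machines replay the same record. The sketch below shows a round-robin split under that assumption; `DataShardSketch` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: shard data rows across load generators, round-robin. */
final class DataShardSketch {

    /** Returns one list per machine; row i goes to machine i % machines. */
    static List<List<String>> split(List<String> rows, int machines) {
        if (machines <= 0) {
            throw new IllegalArgumentException("machines must be positive");
        }
        List<List<String>> shards = new ArrayList<>();
        for (int i = 0; i < machines; i++) {
            shards.add(new ArrayList<>());
        }
        for (int i = 0; i < rows.size(); i++) {
            shards.get(i % machines).add(rows.get(i));
        }
        return shards;
    }
}
```

Round-robin keeps shard sizes within one row of each other, which helps every machine run out of data at roughly the same time.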

3.3.3 Elastic Traffic Management

When the target QPS is not reached, the platform automatically adds threads or machines (auto‑pressurization). If downstream services receive less traffic than expected because of caching or branching logic upstream, pre‑written supplementary scripts are triggered automatically to make up the shortfall (automatic traffic supplementation).
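The sizing step behind auto-pressurization can be approximated with simple arithmetic. The sketch below assumes throughput scales linearly with thread count and that each machine has a fixed thread cap; both are simplifying assumptions, and `AutoPressureSketch` is a hypothetical name rather than the platform's API.

```java
/** Hypothetical sketch: estimate how much load-generator capacity to add. */
final class AutoPressureSketch {

    /** Extra threads needed to reach targetQps, assuming linear scaling per thread. */
    static int extraThreads(double targetQps, double actualQps, int currentThreads) {
        if (actualQps <= 0.0 || currentThreads <= 0) {
            throw new IllegalArgumentException("need positive measured QPS and thread count");
        }
        if (actualQps >= targetQps) {
            return 0; // target already reached
        }
        double qpsPerThread = actualQps / currentThreads;
        int neededThreads = (int) Math.ceil(targetQps / qpsPerThread);
        return neededThreads - currentThreads;
    }

    /** Extra machines needed to host totalThreads with a per-machine thread cap. */
    static int extraMachines(int totalThreads, int maxThreadsPerMachine, int currentMachines) {
        if (maxThreadsPerMachine <= 0) {
            throw new IllegalArgumentException("maxThreadsPerMachine must be positive");
        }
        int neededMachines = (totalThreads + maxThreadsPerMachine - 1) / maxThreadsPerMachine;
        return Math.max(0, neededMachines - currentMachines);
    }
}
```

With 40 threads producing 800 QPS against a 1,000 QPS target, the estimate is 10 more threads; hosting the resulting 50 threads at 20 threads per machine needs one machine beyond the current two. Real systems rarely scale linearly near saturation, so a loop that re-measures after each step is safer than a single jump.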

3.3.4 Real‑Time Data Collection

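// Load‑generator side: a JMeter backend listener sends metrics to Kafka every second
import java.util.List;
import java.util.concurrent.ScheduledExecutorService;
import lombok.extern.slf4j.Slf4j;
import org.apache.jmeter.samplers.SampleResult;
import org.apache.jmeter.visualizers.backend.AbstractBackendListenerClient;
import org.apache.jmeter.visualizers.backend.BackendListenerContext;

@Slf4j
public class HllBackendListenerClient extends AbstractBackendListenerClient {
    private ScheduledExecutorService schedule; // periodic sender; assumed to be created in setupTest
    @Override public void setupTest(BackendListenerContext ctx) throws Exception { /* settings */ }
    @Override public void handleSampleResults(List<SampleResult> list, BackendListenerContext ctx) { /* send */ }
    @Override public void teardownTest(BackendListenerContext ctx) throws Exception { analysis(); schedule.shutdown(); }
    private void analysis() { /* aggregate performance data */ }
}

// Collector side (separate file): a Kafka Streams processor consumes the metrics and aggregates results
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import lombok.extern.slf4j.Slf4j;
import org.apache.kafka.streams.processor.Processor;
import org.apache.kafka.streams.processor.ProcessorContext;

@Slf4j
public class AnalysisProcessor implements Processor<String, String> {
    private ProcessorContext context;
    private ConcurrentHashMap<String, List<SampleState>> sampleMap = new ConcurrentHashMap<>();
    @Override public void init(ProcessorContext ctx) { this.context = ctx; /* schedule periodic analysis */ }
    @Override public void process(String key, String message) { /* parse and store */ }
    @Override public void close() { }
    private void analysis() { /* summarize and persist performance records */ }
}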

3.4 Dual Safety Guarantees

3.4.1 Active Circuit‑Breaker

The platform monitors error rates and response times; when thresholds are crossed, it automatically aborts the test.
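A threshold-based breaker of this kind can be sketched as follows. The sketch assumes metrics arrive as fixed sampling windows and that the test is aborted only after several consecutive breaching windows, which avoids tripping on a single spike. The consecutive-window rule, the thresholds, and the name `ActiveBreakerSketch` are illustrative assumptions; the article does not specify the platform's actual rules.

```java
/** Hypothetical sketch: abort the test after N consecutive unhealthy windows. */
final class ActiveBreakerSketch {
    private final double maxErrorRate;
    private final double maxAvgLatencyMs;
    private final int consecutiveBreachesToTrip;
    private int breaches = 0;
    private boolean tripped = false;

    ActiveBreakerSketch(double maxErrorRate, double maxAvgLatencyMs, int consecutiveBreachesToTrip) {
        this.maxErrorRate = maxErrorRate;
        this.maxAvgLatencyMs = maxAvgLatencyMs;
        this.consecutiveBreachesToTrip = consecutiveBreachesToTrip;
    }

    /** Feeds one sampling window; returns true once the test should be aborted. */
    boolean record(long requests, long errors, double avgLatencyMs) {
        if (tripped) {
            return true; // stays tripped until a new breaker is created
        }
        double errorRate = requests == 0 ? 0.0 : (double) errors / requests;
        boolean breach = errorRate > maxErrorRate || avgLatencyMs > maxAvgLatencyMs;
        breaches = breach ? breaches + 1 : 0; // a healthy window resets the streak
        if (breaches >= consecutiveBreachesToTrip) {
            tripped = true;
        }
        return tripped;
    }
}
```

With limits of 5% errors, 500 ms average latency, and two consecutive windows, one bad window followed by a healthy one does not trip the breaker, while two bad windows in a row do.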

3.4.2 Passive Circuit‑Breaker

External alerts from monitoring or DB platforms trigger the test‑stop API, providing an additional safety net.
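Because several external systems may raise alerts at almost the same moment, a stop entry point benefits from being idempotent. The sketch below shows one way to latch the first stop request and remember who sent it. It models only the in-process signal, not the HTTP layer of the test-stop API, and `StopSignalSketch` and the source labels are hypothetical.

```java
import java.util.Objects;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical sketch: an idempotent stop signal fed by external alerts. */
final class StopSignalSketch {
    private final AtomicReference<String> stopSource = new AtomicReference<>();

    /** Returns true only for the first caller; later requests are no-ops. */
    boolean requestStop(String source) {
        Objects.requireNonNull(source, "source");
        return stopSource.compareAndSet(null, source);
    }

    /** Polled by the load-test runner between steps. */
    boolean shouldStop() {
        return stopSource.get() != null;
    }

    /** The alert source that triggered the stop, or null if still running. */
    String source() {
        return stopSource.get();
    }
}
```

If a database alert and a monitoring alert both call `requestStop`, only the first is recorded as the trigger, and the runner sees `shouldStop()` return true either way.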

Practical Results

Within a year of launch, the automated full‑link load testing platform delivered significant improvements:

Testing efficiency: Frequency increased from bi‑weekly to weekly with same‑day on‑demand capability.

Cost reduction: Manual involvement dropped by over 80%; containerized load generators cut hardware spend by more than 90%.

Test effectiveness: Coverage rose from 43% to 90% after model‑driven adjustments.

Future Outlook

Future work includes building a performance‑testing large model powered by AI to uncover hidden bottlenecks, and expanding service‑oriented capabilities by integrating with fault‑injection platforms, NOC tools, traffic‑replay systems, and precise testing services.

