How Huolala Automated Full‑Link Load Testing to Boost Efficiency and Cut Costs
This article details Huolala's journey from manual, resource‑intensive full‑link load testing to a fully automated, model‑driven platform that improves peak‑capacity verification, reduces testing time and manpower, ensures safety through circuit‑breaker mechanisms, and delivers measurable cost and performance gains.
Background and Challenges
Huolala has rapidly grown into the world’s largest closed‑loop logistics platform, leading to explosive growth in user numbers and freight orders. Ensuring system stability became critical, and full‑link load testing was essential for validating capacity and stability. However, rapid business iteration raised new demands: timely verification of peak capacity, higher testing efficiency, and safety safeguards to prevent test‑induced production incidents.
Solution and Goals
Faced with limited reference solutions, Huolala designed an automation framework that aligns with its business characteristics and internal platforms. The goal was to eliminate repetitive manual steps by mapping pre‑, during‑, and post‑test scenarios to platform services.
Key functional requirements include:
Modeling and model‑effect comparison: Identify suitable algorithms to define test traffic models and compare results.
Test task orchestration: Provide a flexible system to arrange tasks with temporal or causal dependencies.
Test task scheduling: Manage over 70 scripts, 300+ load‑generator machines, and millions of virtual orders.
Robust circuit‑breaker: Detect bottlenecks and stop traffic before services collapse.
Capability Building
3.1 Peak Traffic and Load Modeling
The load model defines target QPS ratios for each service and strives to mimic real traffic. Models are built before testing, adjusted during execution, and evaluated after testing.
3.1.1 Peak Traffic Modeling
Manual configuration becomes infeasible at scale; therefore Huolala extracts monitoring data, correlates metrics, and periodically generates peak‑traffic models for each service.
/** Simplified code, core flow only */
public PressureModel getAndSaveOneDayMonitorData(String appId, String env, String name, String batchId, long start, long end, long step, MonitorData orderMonitorData) {
    PressureModel model = new PressureModel();
    // Use the whole day unless the order monitoring data specifies a custom time window
    String timeScope = StrUtil.isEmpty(orderMonitorData.getTimeScope()) ? MonitorData.TIME_ALL_DAY : MonitorData.TIME_CUSTOM;
    // Retrieve SOA monitoring data and convert to model
    HashMap<String, MetricIntegratorData> monitorQpsSoaData = getMonitorQpsSoaData(env, appId, false, start, end, step);
    List<MonitorData> monitorQpsSoaDataFromMetricDates = getMonitorDataFromMetricDates(appId, env, name, batchId, MonitorData.TYPE_QPS_SOA, MonitorData.TYPE_QPS_SOA, timeScope, monitorQpsSoaData);
    if (!monitorQpsSoaDataFromMetricDates.isEmpty()) {
        monitorDataRepository.saveAll(monitorQpsSoaDataFromMetricDates);
        model = MonitorDataUtil.transitionPressureModel(model, monitorQpsSoaDataFromMetricDates, orderMonitorData);
    }
    // Save other key metrics (CPU, MySQL, Redis, etc.) into the model
    pressureModelRepository.save(model);
    // The method declares a PressureModel return type, so return the persisted model
    return model;
}
Initial peak‑time algorithms proved insufficient, so Huolala introduced a fitted scaling factor based on order volume and service ratios, which brought the modeled traffic into close alignment with real traffic.
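The article does not give the formula behind the fitted scaling factor. As a minimal sketch, assuming the factor is a least-squares slope through the origin that relates daily order volume to one service's peak QPS, the fit could look like this. The class name, method names, and the linear relationship itself are illustrative assumptions, not Huolala's actual algorithm.

```java
/** Hypothetical sketch: model a service's peak QPS as factor * orderVolume. */
final class PeakScalingFit {

    /**
     * Least-squares slope through the origin: factor = sum(x*y) / sum(x*x),
     * where x is the order volume and y is the observed peak QPS.
     */
    static double fitFactor(double[] orders, double[] peakQps) {
        if (orders.length != peakQps.length || orders.length == 0) {
            throw new IllegalArgumentException("need equally sized, non-empty histories");
        }
        double xy = 0;
        double xx = 0;
        for (int i = 0; i < orders.length; i++) {
            xy += orders[i] * peakQps[i];
            xx += orders[i] * orders[i];
        }
        if (xx == 0) {
            throw new IllegalArgumentException("order volume is zero in every sample");
        }
        return xy / xx;
    }

    /** Projects the peak QPS to expect for a forecast order volume. */
    static double predictPeakQps(double factor, double expectedOrders) {
        return factor * expectedOrders;
    }
}
```

With a factor fitted from recent days, the platform could project the QPS target for a forecast order peak instead of relying on a hand-configured value.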
3.1.2 Traffic Ratio Adjustment
The system automatically adjusts interface traffic ratios in scripts based on the latest service model, ensuring each test run reflects current business patterns.
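One way to picture the ratio adjustment is as a proportional split of a total QPS target across interfaces. This is a sketch under the assumption that the service model exposes one modeled QPS value per interface; the class name and the interface names used with it are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch: split a total QPS target using the ratios in the latest model. */
final class TrafficRatioAdjuster {

    /**
     * @param modelQps       modeled peak QPS per interface (assumed model output)
     * @param totalTargetQps total QPS the test run should generate
     * @return per-interface QPS targets that keep the modeled proportions
     */
    static Map<String, Double> allocate(Map<String, Double> modelQps, double totalTargetQps) {
        double sum = 0;
        for (double qps : modelQps.values()) {
            sum += qps;
        }
        if (sum <= 0) {
            throw new IllegalArgumentException("model contains no traffic");
        }
        Map<String, Double> targets = new LinkedHashMap<>();
        for (Map.Entry<String, Double> entry : modelQps.entrySet()) {
            targets.put(entry.getKey(), totalTargetQps * entry.getValue() / sum);
        }
        return targets;
    }
}
```

Recomputing the allocation before each run keeps script weights tied to the latest model rather than to values edited by hand.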
3.1.3 Test Effect Model
After each run, six key dimensions (CPU Avg/Max, HTTP, SOA, Redis, MySQL) are modeled and compared with the target model to spot deviations and guide optimizations.
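The comparison step can be sketched as a relative-deviation check per dimension. The tolerance, the dimension keys, and the class name below are assumptions for illustration; the article does not state how deviations are scored.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: flag dimensions whose measured value strays from the target model. */
final class EffectModelComparator {

    /** Signed relative deviation of the measured value from the target. */
    static double relativeDeviation(double target, double actual) {
        if (target == 0) {
            return actual == 0 ? 0 : Double.POSITIVE_INFINITY;
        }
        return (actual - target) / target;
    }

    /** Returns dimensions that are missing or deviate by more than the tolerance (0.1 = 10%). */
    static List<String> outOfTolerance(Map<String, Double> target,
                                       Map<String, Double> actual,
                                       double tolerance) {
        List<String> flagged = new ArrayList<>();
        for (Map.Entry<String, Double> entry : target.entrySet()) {
            Double observed = actual.get(entry.getKey());
            if (observed == null
                    || Math.abs(relativeDeviation(entry.getValue(), observed)) > tolerance) {
                flagged.add(entry.getKey());
            }
        }
        return flagged;
    }
}
```

A flagged dimension, for example SOA QPS well below target, points to a script or ratio that needs adjusting before the next run.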
3.2 Automated Task Flow Orchestration
3.2.1 Abstract Task Node
public abstract class AbstractTaskNodeService implements TaskNodeService {
    protected TaskNode taskNode;

    // The whole node lifecycle runs from the constructor. These hooks are overridable,
    // so an override executes before the subclass's own field initializers have run;
    // subclasses should rely only on taskNode inside them.
    public AbstractTaskNodeService(TaskNode taskNode) {
        this.taskNode = taskNode;
        init();
        preExecute();
        execute();
        postProcessor();
    }

    @Override public void init() { /* parse context */ }
    @Override public void preExecute() { /* pre‑process */ }
    @Override public void execute() { /* concrete task */ }
    @Override public void postProcessor() { /* save state, create next node */ }
    @Override public TaskNode getTaskNode() { return this.taskNode; }
}
3.2.2 Task Flow Composition
Different scenarios (e.g., order‑driver positioning) are expressed as composable task graphs, allowing flexible combination without manual intervention.
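A composable task graph can be sketched as a dependency map that is resolved into an execution order with Kahn's topological sort. The class and the task names used with it are hypothetical; the article does not describe the platform's internal graph representation.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: compose tasks with dependencies and derive a valid run order. */
final class TaskFlow {
    private final Map<String, List<String>> downstream = new LinkedHashMap<>();
    private final Map<String, Integer> pendingDeps = new LinkedHashMap<>();

    /** Registers a task and the tasks that must finish before it starts. */
    TaskFlow addTask(String name, String... dependsOn) {
        downstream.putIfAbsent(name, new ArrayList<>());
        pendingDeps.merge(name, dependsOn.length, Integer::sum);
        for (String dep : dependsOn) {
            downstream.computeIfAbsent(dep, k -> new ArrayList<>()).add(name);
            pendingDeps.putIfAbsent(dep, 0);
        }
        return this;
    }

    /** Kahn's algorithm: repeatedly emit tasks whose dependencies are all done. */
    List<String> executionOrder() {
        Map<String, Integer> remaining = new LinkedHashMap<>(pendingDeps);
        Deque<String> ready = new ArrayDeque<>();
        remaining.forEach((task, count) -> {
            if (count == 0) {
                ready.add(task);
            }
        });
        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String task = ready.poll();
            order.add(task);
            for (String next : downstream.get(task)) {
                if (remaining.merge(next, -1, Integer::sum) == 0) {
                    ready.add(next);
                }
            }
        }
        if (order.size() != remaining.size()) {
            throw new IllegalStateException("cycle detected in task flow");
        }
        return order;
    }
}
```

A scenario then becomes a short declaration such as `new TaskFlow().addTask("prepareData").addTask("warmUp", "prepareData")`, and different scenarios can reuse the same task definitions.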
3.2.3 Intermediate State Handling
If an urgent pause is required, the platform can halt or finish the current node and resume later.
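The "finish the current node, then pause and resume later" behavior can be sketched with a checkpoint index and a pause flag that is checked between nodes. This is an illustrative assumption about the mechanism; the class name is hypothetical and the real platform presumably persists its state rather than keeping it in memory.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical sketch: run nodes in order, pausing only at node boundaries. */
final class PausableFlowRunner {
    private final List<Runnable> nodes;
    private final AtomicBoolean pauseRequested = new AtomicBoolean(false);
    private int nextIndex = 0; // checkpoint: index of the next node to run

    PausableFlowRunner(List<Runnable> nodes) {
        this.nodes = new ArrayList<>(nodes);
    }

    /** May be called from another thread or from inside a node. */
    void requestPause() {
        pauseRequested.set(true);
    }

    /** Runs from the checkpoint; returns true once every node has completed. */
    boolean runOrResume() {
        pauseRequested.set(false);
        while (nextIndex < nodes.size()) {
            nodes.get(nextIndex).run();
            nextIndex++;
            if (pauseRequested.get()) {
                break; // the current node finished, stop before the next one
            }
        }
        return nextIndex >= nodes.size();
    }

    int completedNodes() {
        return nextIndex;
    }
}
```

Calling `runOrResume()` again after a pause continues from the checkpoint, so completed nodes are not repeated.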
3.3 Test Task Scheduling
3.3.1 Large‑Scale Cluster Management
Huolala migrated from static ECS clusters to serverless containers, enabling on‑demand provisioning of 500+ load‑generator pods within a minute and reducing hardware costs dramatically.
3.3.2 File Pre‑Distribution
Scripts, data files, and dependent JARs are split, pre‑processed, and deployed to clean environments on each target machine to avoid conflicts.
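The article does not say how data files are split. One plausible approach, shown here purely as an assumption, is round-robin sharding so that each load generator receives a disjoint slice of the test data and no two machines replay the same rows.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: split data rows into one disjoint shard per load generator. */
final class DataSharder {

    /** Row i goes to shard (i % machines), so shard sizes differ by at most one row. */
    static List<List<String>> shard(List<String> rows, int machines) {
        if (machines <= 0) {
            throw new IllegalArgumentException("machines must be positive");
        }
        List<List<String>> shards = new ArrayList<>();
        for (int i = 0; i < machines; i++) {
            shards.add(new ArrayList<>());
        }
        for (int i = 0; i < rows.size(); i++) {
            shards.get(i % machines).add(rows.get(i));
        }
        return shards;
    }
}
```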
3.3.3 Elastic Traffic Management
When the target QPS is not reached, the platform automatically adds threads or machines (auto‑pressurization). If downstream services miss traffic because of caching or branching, pre‑written supplement scripts are triggered automatically (automatic traffic supplementation).
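Auto‑pressurization can be sketched as a simple capacity estimate. The code below assumes throughput scales roughly linearly with thread count, which is a simplification, and both the class name and the per-machine thread cap are hypothetical.

```java
/** Hypothetical sketch: estimate the threads and machines needed to reach a QPS target. */
final class AutoPressurizer {

    /** Extrapolates from the measured per-thread throughput (linear-scaling assumption). */
    static int threadsNeeded(double targetQps, double achievedQps, int currentThreads) {
        if (achievedQps <= 0 || currentThreads <= 0) {
            throw new IllegalArgumentException("need a positive measurement to extrapolate from");
        }
        double qpsPerThread = achievedQps / currentThreads;
        return (int) Math.ceil(targetQps / qpsPerThread);
    }

    /** Ceiling division: machines required when each one runs at most maxThreadsPerMachine. */
    static int machinesNeeded(int totalThreads, int maxThreadsPerMachine) {
        if (maxThreadsPerMachine <= 0) {
            throw new IllegalArgumentException("maxThreadsPerMachine must be positive");
        }
        return (totalThreads + maxThreadsPerMachine - 1) / maxThreadsPerMachine;
    }
}
```

In practice the estimate would be re-evaluated after each adjustment, because the service under test may saturate before the linear projection is reached.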
3.3.4 Real‑Time Data Collection
// Load‑generator sends metrics to Kafka every second
@Slf4j
public class HllBackendListenerClient extends AbstractBackendListenerClient {
    // Assumed declaration: the original listing elides this field, which is
    // presumably initialized in setupTest and drives the periodic send
    private ScheduledExecutorService schedule;

    @Override public void setupTest(BackendListenerContext ctx) throws Exception { /* settings */ }
    @Override public void handleSampleResults(List<SampleResult> list, BackendListenerContext ctx) { /* send */ }
    // Flush a final aggregation, then stop the scheduler
    @Override public void teardownTest(BackendListenerContext ctx) throws Exception { analysis(); schedule.shutdown(); }
    private void analysis() { /* aggregate performance data */ }
}
// Collector consumes Kafka and aggregates results
@Slf4j
public class AnalysisProcessor implements Processor<String, String> {
    private ProcessorContext context;
    private ConcurrentHashMap<String, List<SampleState>> sampleMap = new ConcurrentHashMap<>();

    @Override public void init(ProcessorContext ctx) { this.context = ctx; /* schedule periodic analysis */ }
    @Override public void process(String key, String message) { /* parse and store */ }
    @Override public void close() { }
    private void analysis() { /* summarize and persist performance records */ }
}
3.4 Dual Safety Guarantees
3.4.1 Active Circuit‑Breaker
The platform monitors error rates and response times; when thresholds are crossed, it automatically aborts the test.
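A threshold-based breaker of this kind can be sketched as follows. The thresholds, the requirement of several consecutive breaches before tripping, and the class name are assumptions added for illustration; the article only states that error rates and response times are monitored.

```java
/** Hypothetical sketch: abort the test after consecutive unhealthy sampling windows. */
final class ActiveCircuitBreaker {
    private final double maxErrorRate;
    private final double maxAvgLatencyMs;
    private final int breachesToTrip;
    private int consecutiveBreaches = 0;

    ActiveCircuitBreaker(double maxErrorRate, double maxAvgLatencyMs, int breachesToTrip) {
        this.maxErrorRate = maxErrorRate;
        this.maxAvgLatencyMs = maxAvgLatencyMs;
        this.breachesToTrip = breachesToTrip;
    }

    /**
     * Records one sampling window (for example one second of results).
     * @return true when the test should be aborted
     */
    boolean record(long requests, long errors, double avgLatencyMs) {
        boolean breached = requests > 0
                && ((double) errors / requests > maxErrorRate || avgLatencyMs > maxAvgLatencyMs);
        // A healthy window resets the streak so a single spike does not stop the test
        consecutiveBreaches = breached ? consecutiveBreaches + 1 : 0;
        return consecutiveBreaches >= breachesToTrip;
    }
}
```

Requiring a short streak of breaches is one way to trade reaction time against false aborts; a stricter setup would trip on the first breach.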
3.4.2 Passive Circuit‑Breaker
External alerts from monitoring or DB platforms trigger the test‑stop API, providing an additional safety net.
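On the receiving side, a stop request from an external system can be sketched as an idempotent switch that load generators poll. The class below is a hypothetical illustration, not the platform's actual test-stop API; it records only the first source and reason so the cause of the stop stays traceable when several alerts arrive at once.

```java
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical sketch: idempotent stop switch triggered by external alerts. */
final class ExternalStopSwitch {
    private final AtomicReference<String> stopReason = new AtomicReference<>();

    /**
     * Called when an external alert arrives.
     * @return true if this call stopped the test, false if it was already stopped
     */
    boolean stop(String source, String reason) {
        return stopReason.compareAndSet(null, source + ": " + reason);
    }

    /** Load generators check this before sending the next batch of requests. */
    boolean isStopped() {
        return stopReason.get() != null;
    }

    /** The first recorded source and reason, or null while the test is running. */
    String reason() {
        return stopReason.get();
    }
}
```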
Practical Results
Within a year of launch, the automated full‑link load testing achieved significant improvements:
Testing efficiency: Frequency increased from bi‑weekly to weekly with same‑day on‑demand capability.
Cost reduction: Manual involvement dropped by over 80%; containerized load generators cut hardware spend by more than 90%.
Test effectiveness: Coverage rose from 43% to 90% after model‑driven adjustments.
Future Outlook
Future work includes building a performance‑testing large model powered by AI to uncover hidden bottlenecks, and expanding service‑oriented capabilities by integrating with fault‑injection platforms, NOC tools, traffic‑replay systems, and precise testing services.
