Scaling Automated API Testing for a Microservice Architecture Serving Millions of Merchants
This article outlines the background, testing strategy, and practical implementation of automated API testing in a large-scale microservice environment. It covers the shift from the traditional test pyramid to a honeycomb model, technology choices, test case design, mock servers, platform management, and measures to prevent test suite decay.
1. Background
ShouQianBa serves millions of merchants through a large microservice architecture comprising hundreds of backend services written in Node.js, Java, Go, and Python, backed by MySQL, MongoDB, Elasticsearch, Kafka, Redis, Apollo, RabbitMQ, and other middleware. As product complexity grew, traditional functional testing became costly and inefficient, prompting the adoption of automated testing to detect deep-seated issues earlier and shorten fix time.
2. Testing Strategy
Automated testing is a generic term that includes unit testing, API testing, web testing, etc. In a microservice architecture, instead of the traditional test pyramid, we favor a honeycomb layered model.
Reasons:
In microservice projects, a service is a "unit"; interfaces expose unit capabilities and enable communication; orchestrating interface calls implements business logic.
Unit tests require a large volume of code that developers struggle to maintain, and they cover neither integration points nor end-to-end business scenarios, so their return on investment is limited.
When interfaces are defined early, test engineers can design and develop API test cases early, achieving left‑shift testing and earlier issue detection.
API automation testing offers early involvement, low maintenance cost, and comprehensive business logic coverage, making it our primary focus.
2.1 Refined Testing Strategy for Microservices
Beyond emphasizing API testing, we further refine the automation strategy according to the layered characteristics of microservices.
System architecture overview:
Access Layer: Front‑end entry point (e.g., API gateway) handling authentication, validation, response packaging, and routing, without business logic.
Application Layer: Business services that orchestrate domain services to implement functionality (e.g., merchant onboarding).
Domain Layer: Domain objects with high cohesion and low coupling, implementing business rules (e.g., payment, card, settlement services).
Infrastructure Layer: Databases, caches, message queues, etc.
We simplify the view to the access perspective, as shown in the following diagram:
Based on each layer’s responsibilities, we define testing focus:
Access Layer: interface authentication, validation, input/output legality, connectivity, routing correctness.
Application Layer: functional coverage of each interface and end‑to‑end business flow through integrated interface testing.
Domain Layer: business rule, algorithm, and third‑party interaction coverage.
3. Automated Testing Practice
With a clear strategy, we develop automated test cases.
3.1 Technology Stack
We write test cases in Python with the built‑in unittest framework and execute them with pytest. On top of this stack, our framework provides:
Test data object management and relationship handling
Data‑driven testing support
Multi‑dimensional test case sorting for organizing test plans
Multi‑environment execution
Middleware connection pooling
Automated report integration
Platform integration
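As an illustration of the data‑driven support, here is a minimal sketch using unittest's subTest to drive one login test from a table of equivalence classes. The FakeClient is a stand‑in for the framework's real API client, which is not shown in this article:

```python
import unittest

class FakeClient:
    """Stand-in for the framework's API client (assumption, not the real one)."""
    def login(self, username, password):
        # Only the single matching credential pair succeeds in this stub.
        return "SUCC" if (username, password) == ("alice", "s3cret") else "FAIL"

class TestLoginDataDriven(unittest.TestCase):
    # Data-driven testing: each row is one equivalence class.
    cases = [
        ("alice",  "s3cret", "SUCC"),  # valid credentials
        ("alice",  "wrong",  "FAIL"),  # wrong password
        ("nobody", "s3cret", "FAIL"),  # unknown user
    ]

    def test_login(self):
        client = FakeClient()
        for username, password, expected in self.cases:
            # subTest reports each row's failure independently.
            with self.subTest(username=username, password=password):
                self.assertEqual(client.login(username, password), expected)
```

In the real suite the table would come from external data files rather than an inline list, which is what makes the approach scale across environments.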
3.2 API Functional Testing
Key aspects include parameter validation, permission checks, response verification, persistence validation, middleware validation, and logic branch coverage. Effective equivalence class partitioning reduces test effort. For example, a login interface whose implementation succeeds only on a matching username/password pair can be covered by two cases: correct credentials and incorrect credentials.
def test_login_succ(self):
    res = self.client.login(username, password)
    self.assertEqual(res.code, SUCC)

3.3 API Integration Testing
Beyond single‑interface tests, we verify interactions and data flow across services, covering normal business flows, exception flows, call‑chain coverage, and sequence coverage. Example scenario: merchant enrollment, activity registration, payment source onboarding, rate discount, and successful payment.
def test_new_merchant_pay_success(self):
    res = self.merchant_service.create_merchant()
    self.assertEqual(res.code, SUCC)
    another_res = self.another_service.do_something(res.field)
    self.assertEqual(another_res.code, COMPLETE)

3.4 Asset‑Loss Testing
Financial operations (payment, settlement, profit sharing, etc.) require tests to prevent monetary loss due to duplicate submissions, concurrency issues, logic errors, or security flaws. Three scenarios are illustrated:
3.4.1 Duplicate Calls
Idempotent handling ensures only one deduction despite repeated requests.
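Such idempotent handling is typically keyed on the client order number (client_sn). A minimal sketch with hypothetical names, not the actual service implementation:

```python
# Minimal sketch of idempotent payment handling keyed on client_sn.
# Names are hypothetical; the real service deduplicates at the storage layer,
# not in an in-memory dict.
processed = {}  # client_sn -> result of the first successful call

def pay(amount, client_sn):
    if client_sn in processed:
        # Replayed request: return the prior result without a second deduction.
        return processed[client_sn]
    result = {"code": "SUCC", "amount": amount}  # the single real deduction
    processed[client_sn] = result
    return result
```
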
Test simulates multiple identical payment requests and asserts balance changes reflect a single successful deduction.
def test_pay_more_time(self):
    old_balance = self.client.get_merchant_balance(merchant_id)  # balance before paying
    for _ in range(5):  # replay the same payment request with the same client_sn
        self.client.pay(amount, client_sn)
    current_balance = self.client.get_merchant_balance(merchant_id)  # balance after
    assert current_balance == old_balance + amount  # only one payment was credited

3.4.2 Concurrent Requests to the Same Interface
Concurrent identical payment requests must result in a single deduction.
def test_pay_concurrency(self):
    old_balance = self.client.get_merchant_balance(merchant_id)
    # ten identical one-cent payments with the same client_sn, fired concurrently
    pool = [threading.Thread(target=self.client.pay, args=(1, client_sn))
            for _ in range(10)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    current_balance = self.client.get_merchant_balance(merchant_id)
    assert current_balance == old_balance + 1  # only one payment was credited

3.4.3 Concurrent Calls to Multiple Financial Interfaces
Simultaneous add, reduce, and refund operations on the same account must preserve balance consistency.
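The concurrent_* helpers used in the test below are not shown in the original; a plausible sketch of one of them using a thread pool, with the helper name and the client.add signature assumed:

```python
from concurrent.futures import ThreadPoolExecutor

def concurrent_add(client, amount, n):
    """Fire n concurrent credit requests against the same account.

    Hypothetical helper: concurrent_reduce and concurrent_refund would
    follow the same pattern with client.reduce / client.refund.
    """
    with ThreadPoolExecutor(max_workers=n) as pool:
        futures = [pool.submit(client.add, amount) for _ in range(n)]
        for f in futures:
            f.result()  # re-raise any exception from the worker threads
```
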
def test_account(self):
    old_balance = self.client.get_merchant_balance(merchant_id)
    concurrent_add(self.client, add_amount, a)        # a concurrent credits
    concurrent_reduce(self.client, reduce_amount, b)  # b concurrent debits
    concurrent_refund(self.client, refund_amount, c)  # c concurrent refunds
    current_balance = self.client.get_merchant_balance(merchant_id)
    assert current_balance == old_balance + a * add_amount - b * reduce_amount - c * refund_amount

3.5 Mocking Third‑Party Interfaces
We built a Go‑based Mock Server to simulate third‑party responses, enabling testing of failure scenarios such as account errors or payment refusals.
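The @mock decorator used below belongs to our framework; as a simplified in-process sketch (the real decorator registers the rule with the Go-based Mock Server, presumably over HTTP), it could look like this:

```python
import functools

# In-process stand-in for the Mock Server's rule store (assumption).
mock_rules = {}

def mock(key, value):
    """Install a mock rule before the test runs and remove it afterwards."""
    def wrap(fn):
        @functools.wraps(fn)
        def inner(*args, **kwargs):
            mock_rules[key] = value       # e.g. force RETURN_CODE=ACCOUNT_ERROR
            try:
                return fn(*args, **kwargs)
            finally:
                mock_rules.pop(key, None)  # always restore default behaviour
        return inner
    return wrap
```

The try/finally guarantees the rule is cleaned up even when the test fails, so one test's mock cannot leak into the next.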
@mock("RETURN_CODE", "ACCOUNT_ERROR")
def test_pay_abnormal(self):
    response = self.client.pay()
    self.assertEqual(response.code, FAIL)

4. Platform Management
4.1 Automated Execution Platform
We developed Zepar, a generic platform for test case presentation, plan management, and automated/manual execution, improving test case reuse and cross‑team accessibility.
Test plan management
Execution records
4.2 Reporting Platform
We integrated the open‑source ReportPortal to collect logs, analyze results, and visualize metrics across frameworks such as TestNG, Pytest, JUnit, etc., helping identify flaky tests and accelerate remediation.
5. Anti‑Decay Measures
To avoid test suite decay, we enforce PEP‑8 style, AIR principles, comprehensive docstrings, code review, decorator usage to hide technical details, and encourage Pythonic code. All test engineers are responsible for maintaining the usability of their automation assets.
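As one illustration of using decorators to hide technical detail, a hypothetical retry helper keeps eventual-consistency workarounds out of test bodies (the helper name and its use are assumptions, not part of our framework as published):

```python
import functools
import time

def retry(times=3, delay=0.1):
    """Retry an assertion-raising check a few times (hypothetical helper).

    Hides the wait-and-retry plumbing for eventually consistent state,
    so the test body contains only business assertions.
    """
    def wrap(fn):
        @functools.wraps(fn)
        def inner(*args, **kwargs):
            last = None
            for _ in range(times):
                try:
                    return fn(*args, **kwargs)
                except AssertionError as exc:  # e.g. replication lag
                    last = exc
                    time.sleep(delay)
            raise last
        return inner
    return wrap
```
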
6. Summary and Outlook
Our automated API test suite now exceeds 30,000 cases with over 95% availability and more than 70% coverage of core services. As test volume and asynchronous scenarios grow, we are developing a distributed execution framework to sustain performance.
Automated testing has become a vital asset for accelerating delivery while ensuring quality, and we will continue to refine and expand our capabilities.