How We Revamped a Large‑Scale API Automation Platform with Kubernetes and Tekton
This article details the evolution of a high‑traffic API automation testing platform, covering challenges such as multi‑environment isolation, execution speed, quality assessment, and stability, and explains how the team leveraged Kubernetes, Rancher, Tekton, precise testing, and modern reporting to dramatically improve efficiency and reliability.
Preface
Interface automation testing is a core pillar for quality assurance in fast‑growing internet companies. As business complexity and scale increase, the platform must evolve to support multi‑environment deployment, boost test efficiency, and provide rapid quality feedback.
Chapter 1: Iteration Background
After initial success with jar‑based acceleration, Klov+ExtentReport dashboards, and precise testing recommendations, the team identified three main pain points: network isolation, long test runtimes, and subjective quality evaluation.
Chapter 2: Solution Gathering
2.1 Environment Isolation
Two completely separate environments—an isolated test network and a cross‑cloud overseas environment—caused collaboration friction and complex deployments.
Team collaboration difficulty: Teams could not work in the same environment, leading to repeated deployments.
Increased deployment complexity: Manual configuration was error‑prone and time‑consuming.
2.2 Need for Faster Execution
Test case volume grew, extending run times from minutes to hours, causing release delays and resource waste.
Release delays: Automation became a bottleneck for production releases.
Resource waste: Redundant test execution consumed valuable resources.
2.3 Quality Assessment Challenges
Increasing test cases made manual, subjective quality judgments unreliable, leading to delayed feedback and production issues.
To address these, the team implemented three solutions:
Network isolation resolution: Deployed new Rancher and Kubernetes clusters, integrated services, and opened network connections where possible.
Execution acceleration: Parallelized compilation and report generation, reducing minutes‑level steps to seconds.
Precise testing standards: Analyzed code changes to recommend impacted interfaces, enforced full coverage, and pushed real‑time results.
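The execution‑acceleration idea above boils down to running independent steps (compilation, report generation) concurrently instead of sequentially. The following is a minimal sketch of that pattern; `ParallelSteps` and its step list are illustrative, not the platform's actual code:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Minimal sketch: run independent pipeline steps concurrently so total
// wall time approaches the slowest step instead of the sum of all steps.
public class ParallelSteps {
    // Submits each step to a thread pool and waits for all to finish.
    // Returns elapsed wall time in milliseconds.
    public static long runAll(List<Runnable> steps) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(steps.size());
        long start = System.nanoTime();
        List<Future<?>> futures = new ArrayList<>();
        for (Runnable s : steps) futures.add(pool.submit(s));
        for (Future<?> f : futures) f.get(); // propagate any step failure
        pool.shutdown();
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) throws Exception {
        Runnable step = () -> {
            try { Thread.sleep(200); } catch (InterruptedException e) { }
        };
        // Three 200 ms steps in parallel finish in roughly 200 ms, not 600 ms.
        long ms = runAll(List.of(step, step, step));
        System.out.println("elapsed ~" + ms + " ms");
    }
}
```

The same shape applies whether the "steps" are Maven compilations or report writers: the speed‑up comes purely from removing the serial ordering between steps that do not depend on each other.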
Chapter 3: Leveraging Cluster Capabilities
3.1 Rapid Cluster Replication
To support diverse business lines (domestic, international, and “small‑pull”), the team built a rapid cluster replication process, ensuring consistency, quick rollout, and elastic scaling across environments.
3.2 Admission/Exit Gate for Test Quality
All interface automation cases must pass before a service can be released, turning the exit gate into a hard quality standard.
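The gate rule itself is simple: zero failures, and at least one case actually executed. A minimal sketch of that check, with illustrative names rather than the platform's real API:

```java
// Sketch of the "exit gate" rule: a service may release only when every
// interface automation case for it has passed. Names are illustrative.
public class ReleaseGate {
    public record SuiteResult(int total, int passed) { }

    // Hard gate: no failures, and no release without any coverage at all.
    public static boolean canRelease(SuiteResult r) {
        return r.total() > 0 && r.passed() == r.total();
    }

    public static void main(String[] args) {
        System.out.println(canRelease(new SuiteResult(120, 120))); // true
        System.out.println(canRelease(new SuiteResult(120, 119))); // false: one failure blocks release
        System.out.println(canRelease(new SuiteResult(0, 0)));     // false: no coverage, no release
    }
}
```

Treating the empty suite as a failure matters: without it, a service with no automation at all would trivially "pass" the gate.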
3.3 Precise Testing to Boost Coverage
By establishing unified automation standards and adapting strategies for high‑dependency projects, the team improved coverage while avoiding over‑reliance on independent interface recommendations.
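The core of the recommendation step can be pictured as a reverse lookup over a dependency map: given the classes touched by a change, return every interface that depends on at least one of them. The map below is a toy example; the real platform derives it from code analysis:

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Sketch of precise testing: from a map of interface -> classes it depends on,
// recommend exactly the interfaces impacted by a set of changed classes.
public class ImpactRecommender {
    public static Set<String> impacted(Map<String, Set<String>> interfaceDeps,
                                       Set<String> changedClasses) {
        Set<String> out = new TreeSet<>();
        for (var e : interfaceDeps.entrySet()) {
            // Recommend the interface if any of its dependencies changed.
            if (!Collections.disjoint(e.getValue(), changedClasses)) {
                out.add(e.getKey());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> deps = Map.of(
            "/price/confirm", Set.of("PriceService", "CouponService"),
            "/order/create",  Set.of("OrderService"));
        System.out.println(impacted(deps, Set.of("CouponService"))); // [/price/confirm]
    }
}
```

The "high‑dependency project" caveat in the text shows up here too: when many interfaces share the same heavily‑used classes, this lookup degenerates toward "run everything", which is why the team avoided over‑relying on it.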
3.4 Task Management for Uncovered Interfaces
Uncovered interfaces generate automatic tasks; once coverage is added, tasks close automatically, achieving a fully automated test‑case lifecycle.
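The lifecycle described above is a reconciliation loop: compare the full interface list against the coverage report, open a task for anything uncovered, and close the task the moment coverage appears. A dependency‑free sketch, with an in‑memory map standing in for whatever task store the platform actually uses:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Sketch of the automated task lifecycle: interfaces without automation
// coverage get an open task; once coverage is added, the task closes itself.
public class CoverageTasks {
    private final Map<String, Boolean> openTasks = new HashMap<>(); // interface -> task open

    // Reconcile the task list against the current coverage report.
    public void sync(Set<String> allInterfaces, Set<String> coveredInterfaces) {
        for (String api : allInterfaces) {
            if (coveredInterfaces.contains(api)) {
                openTasks.remove(api);    // coverage added: close task automatically
            } else {
                openTasks.put(api, true); // still uncovered: ensure a task exists
            }
        }
    }

    public Set<String> open() { return openTasks.keySet(); }

    public static void main(String[] args) {
        CoverageTasks tasks = new CoverageTasks();
        tasks.sync(Set.of("/order/create", "/order/cancel"), Set.of("/order/create"));
        System.out.println(tasks.open()); // [/order/cancel]
        tasks.sync(Set.of("/order/create", "/order/cancel"),
                   Set.of("/order/create", "/order/cancel"));
        System.out.println(tasks.open()); // []
    }
}
```

Because the loop is idempotent, it can run on every coverage report without duplicating tasks, which is what makes the lifecycle fully automatic.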
Chapter 4: Task Execution
4.1 Build‑Free Task Templates
Using Kubernetes resource scheduling and Tekton pipelines, the platform runs tests in isolated containers without a build step, downloading pre‑built artifacts from OSS.
```yaml
apiVersion: tekton.dev/v1beta1
kind: Task
metadata:
  name: auto-test-api-jar-java
  namespace: tekton-pipelines
spec:
  params:
    - name: gitId
      type: string
    - name: commitId
      type: string
    - name: appId
      type: string
    - name: tektonId
      type: string
  resources:
    inputs:
      - name: maven-test
        type: git
  steps:
    - image: 'harbor.xxx.com/basic/maven-mitmproxy:api-jar-24070101'
      script: |
        #!/usr/bin/env bash
        start=`date +%s`
        python3 /home/downloadJar.py $(params.gitId) $(params.commitId) | tee /workspace/maven-test/testNG.txt
        ...
```

4.2 Parsing Test Cases into the Database
Implemented ExtentTestNGIReporterListener to capture test metadata and push JSON payloads to a backend service.
```java
public class ExtentTestNGIReporterListener implements IReporter, ITestListener {
    private ExtentReports extent;
    private List<TestData> testDataList = new ArrayList<>();
    // ...

    public void generateReport(List<XmlSuite> xmlSuites, List<ISuite> suites, String outputDirectory) {
        // build Extent report and send data to MongoDB/Klov
    }
}
```

Test cases are also stored as XML suites for flexible execution.
```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE suite SYSTEM "http://testng.org/testng-1.0.dtd">
<suite name="【乐高】自动化" verbose="1" parallel="classes" thread-count="10">
  <test name="组装case重跑">
    <classes>
      <class name="com.huolala.qaautotest.TestCase.BfeCustomerApplicationQuerySvc.CommodityPriceFacade.ConfirmEvaluate">
        <methods>
          <include name="ConfirmEvaluate_001_001"/>
        </methods>
      </class>
    </classes>
    <listeners>
      <listener class-name="com.huolala.qaautotest.utils.ExtentTestNGIReporterListener"/>
    </listeners>
  </test>
</suite>
```

Chapter 5: Stability Governance
Operational incidents such as expired Rancher certificates, Harbor SSL expiration, DNS resolution failures, and pod IP exhaustion were resolved through certificate rotation, CoreDNS restart, and automated pod cleanup scripts.
```shell
openssl req -newkey rsa:4096 -nodes -sha256 -keyout ca.key -subj "/C=CN/ST=HB/O=QC/CN=your.domain.com" -x509 -days 3650 -out ca.crt
openssl req -x509 -new -nodes -key ca.key -subj "/C=CN/ST=HB/O=QC/CN=your.domain.com" -sha256 -days 100000 -out ca.crt
```

Chapter 6: Splitting for Speed
6.1 Moving Builds into CI/CD
By moving compilation and packaging to CI/CD pipelines and storing built JARs in OSS, runtime compilation was eliminated, cutting test execution time by ~30%.
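At test time the runner then only needs to resolve and fetch the JAR that CI already published. The sketch below shows the resolution logic; the OSS endpoint, bucket layout, and class names are assumptions for illustration, not the platform's real naming scheme:

```java
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;

// Sketch of the build-free step: instead of compiling at test time, the
// runner fetches a JAR that CI already built and uploaded to OSS.
public class PrebuiltJarFetcher {
    // Hypothetical endpoint; the real bucket and layout will differ.
    static final String OSS_BASE = "https://oss.example.com/test-artifacts";

    // CI publishes one JAR per (gitId, commitId); the runner derives the same key.
    public static URI artifactUri(String gitId, String commitId) {
        return URI.create(OSS_BASE + "/" + gitId + "/" + commitId + "/tests.jar");
    }

    // Skip the download entirely when the JAR is already cached on the node.
    public static boolean needsDownload(Path cacheDir, String commitId) {
        return !Files.exists(cacheDir.resolve(commitId + "-tests.jar"));
    }

    public static void main(String[] args) {
        System.out.println(artifactUri("order-svc", "a1b2c3d"));
        // The actual fetch would use java.net.http.HttpClient against this URI.
    }
}
```

Keying artifacts by commit also makes reruns reproducible: re‑executing a suite for an old commit fetches exactly the JAR that was tested originally.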
6.2 Klov Report Management
Replaced the Jenkins+TestNG report chain with direct MongoDB ingestion and Klov front‑end, achieving faster generation and richer visualizations.
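Conceptually, each case result becomes a small document that the listener serializes and ships to the ingestion backend. A dependency‑free sketch of that payload; the field names are illustrative, not Klov's actual schema:

```java
// Sketch of the report-ingestion payload: the TestNG listener serializes each
// case result and posts it to a backend that writes MongoDB for Klov to read.
public class ResultPayload {
    // Hand-rolled JSON keeps the sketch dependency-free; the real listener
    // would use a proper JSON library and escape field values.
    public static String toJson(String suite, String testCase, String status, long durationMs) {
        return String.format(
            "{\"suite\":\"%s\",\"case\":\"%s\",\"status\":\"%s\",\"durationMs\":%d}",
            suite, testCase, status, durationMs);
    }

    public static void main(String[] args) {
        System.out.println(toJson("order-api", "ConfirmEvaluate_001_001", "PASS", 312));
    }
}
```

Because results land in MongoDB as they are produced, Klov can render dashboards incrementally instead of waiting for a Jenkins job to finish and publish a static report.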
Chapter 7: Report Presentation
Adopting ExtentReports with Klov provided modern dashboards, visual charts, interactive navigation, and clear grouping, vastly improving user experience over the legacy Jenkins+TestNG reports.
Chapter 8: Past and Future
The platform’s evolution resembles a resilient tree, continuously adapting through architectural upgrades, precise‑testing integration, and governance loops (dubbed “double‑card double‑wait”). Future goals include tighter linkage between test cases and business services, and making that relationship measurable and easy to read.
Chapter 9: Returning to the Origin
Interface automation remains the foundational theme, guiding all subsequent quality‑assurance practices within the organization.
