How We Revamped a Large‑Scale API Automation Platform with Kubernetes and Tekton
This article details the evolution of a high‑traffic API automation testing platform, covering challenges such as multi‑environment isolation, execution speed, quality assessment, and stability, and explains how the team leveraged Kubernetes, Rancher, Tekton, precise testing, and modern reporting to dramatically improve efficiency and reliability.
Preface
Interface automation testing is a core pillar for quality assurance in fast‑growing internet companies. As business complexity and scale increase, the platform must evolve to support multi‑environment deployment, boost test efficiency, and provide rapid quality feedback.
Chapter 1: Iteration Background
After initial success with jar‑based acceleration, Klov+ExtentReport dashboards, and precise testing recommendations, the team identified three main pain points: network isolation, long test runtimes, and subjective quality evaluation.
Chapter 2: Solution Gathering
2.1 Environment Isolation
Two completely separate environments—an isolated test network and a cross‑cloud overseas environment—caused collaboration friction and complex deployments.
Team collaboration difficulty: Teams could not work in the same environment, leading to repeated deployments.
Increased deployment complexity: Manual configuration was error‑prone and time‑consuming.
2.2 Need for Faster Execution
Test case volume grew, extending run times from minutes to hours, causing release delays and resource waste.
Release delays: Automation became a bottleneck for production releases.
Resource waste: Redundant test execution consumed valuable resources.
2.3 Quality Assessment Challenges
Increasing test cases made manual, subjective quality judgments unreliable, leading to delayed feedback and production issues.
To address these, the team implemented three solutions:
Network isolation resolution: Deployed new Rancher and Kubernetes clusters, integrated services, and opened network connections where possible.
Execution acceleration: Parallelized compilation and report generation, reducing minutes‑level steps to seconds.
Precise testing standards: Analyzed code changes to recommend impacted interfaces, enforced full coverage, and pushed real‑time results.
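The execution‑acceleration idea above boils down to running independent steps (compilation, report generation) concurrently instead of sequentially. The following is a minimal sketch of that pattern; `ParallelSteps` and its step list are illustrative, not the platform's actual code:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Minimal sketch: run independent pipeline steps concurrently so total
// wall time approaches the slowest step instead of the sum of all steps.
public class ParallelSteps {
    // Submits each step to a thread pool and waits for all to finish.
    // Returns elapsed wall time in milliseconds.
    public static long runAll(List<Runnable> steps) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(steps.size());
        long start = System.nanoTime();
        List<Future<?>> futures = new ArrayList<>();
        for (Runnable s : steps) futures.add(pool.submit(s));
        for (Future<?> f : futures) f.get(); // propagate any step failure
        pool.shutdown();
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) throws Exception {
        Runnable step = () -> {
            try { Thread.sleep(200); } catch (InterruptedException e) { }
        };
        // Three 200 ms steps in parallel finish in roughly 200 ms, not 600 ms.
        long ms = runAll(List.of(step, step, step));
        System.out.println("elapsed ~" + ms + " ms");
    }
}
```

The same shape applies whether the "steps" are Maven compilations or report writers: the speed‑up comes purely from removing the serial ordering between steps that do not depend on each other.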
Chapter 3: Leveraging Cluster Capabilities
3.1 Rapid Cluster Replication
To support diverse business lines (domestic, international, and “small‑pull”), the team built a rapid cluster replication process, ensuring consistency, quick rollout, and elastic scaling across environments.
3.2 Admission/Exit Gate for Test Quality
All interface automation cases must pass before a service can be released, turning the exit gate into a hard quality standard.
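The gate rule itself is simple: zero failures, and at least one case actually executed. A minimal sketch of that check, with illustrative names rather than the platform's real API:

```java
// Sketch of the "exit gate" rule: a service may release only when every
// interface automation case for it has passed. Names are illustrative.
public class ReleaseGate {
    public record SuiteResult(int total, int passed) { }

    // Hard gate: no failures, and no release without any coverage at all.
    public static boolean canRelease(SuiteResult r) {
        return r.total() > 0 && r.passed() == r.total();
    }

    public static void main(String[] args) {
        System.out.println(canRelease(new SuiteResult(120, 120))); // true
        System.out.println(canRelease(new SuiteResult(120, 119))); // false: one failure blocks release
        System.out.println(canRelease(new SuiteResult(0, 0)));     // false: no coverage, no release
    }
}
```

Treating the empty suite as a failure matters: without it, a service with no automation at all would trivially "pass" the gate.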
3.3 Precise Testing to Boost Coverage
By establishing unified automation standards and adapting strategies for high‑dependency projects, the team improved coverage while avoiding over‑reliance on independent interface recommendations.
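The core of the recommendation step can be pictured as a reverse lookup over a dependency map: given the classes touched by a change, return every interface that depends on at least one of them. The map below is a toy example; the real platform derives it from code analysis:

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Sketch of precise testing: from a map of interface -> classes it depends on,
// recommend exactly the interfaces impacted by a set of changed classes.
public class ImpactRecommender {
    public static Set<String> impacted(Map<String, Set<String>> interfaceDeps,
                                       Set<String> changedClasses) {
        Set<String> out = new TreeSet<>();
        for (var e : interfaceDeps.entrySet()) {
            // Recommend the interface if any of its dependencies changed.
            if (!Collections.disjoint(e.getValue(), changedClasses)) {
                out.add(e.getKey());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> deps = Map.of(
            "/price/confirm", Set.of("PriceService", "CouponService"),
            "/order/create",  Set.of("OrderService"));
        System.out.println(impacted(deps, Set.of("CouponService"))); // [/price/confirm]
    }
}
```

The "high‑dependency project" caveat in the text shows up here too: when many interfaces share the same heavily‑used classes, this lookup degenerates toward "run everything", which is why the team avoided over‑relying on it.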
3.4 Task Management for Uncovered Interfaces
Uncovered interfaces generate automatic tasks; once coverage is added, tasks close automatically, achieving a fully automated test‑case lifecycle.
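The lifecycle described above is a reconciliation loop: compare the full interface list against the coverage report, open a task for anything uncovered, and close the task the moment coverage appears. A dependency‑free sketch, with an in‑memory map standing in for whatever task store the platform actually uses:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Sketch of the automated task lifecycle: interfaces without automation
// coverage get an open task; once coverage is added, the task closes itself.
public class CoverageTasks {
    private final Map<String, Boolean> openTasks = new HashMap<>(); // interface -> task open

    // Reconcile the task list against the current coverage report.
    public void sync(Set<String> allInterfaces, Set<String> coveredInterfaces) {
        for (String api : allInterfaces) {
            if (coveredInterfaces.contains(api)) {
                openTasks.remove(api);    // coverage added: close task automatically
            } else {
                openTasks.put(api, true); // still uncovered: ensure a task exists
            }
        }
    }

    public Set<String> open() { return openTasks.keySet(); }

    public static void main(String[] args) {
        CoverageTasks tasks = new CoverageTasks();
        tasks.sync(Set.of("/order/create", "/order/cancel"), Set.of("/order/create"));
        System.out.println(tasks.open()); // [/order/cancel]
        tasks.sync(Set.of("/order/create", "/order/cancel"),
                   Set.of("/order/create", "/order/cancel"));
        System.out.println(tasks.open()); // []
    }
}
```

Because the loop is idempotent, it can run on every coverage report without duplicating tasks, which is what makes the lifecycle fully automatic.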
Chapter 4: Task Execution
4.1 Build‑Free Task Templates
Using Kubernetes resource scheduling and Tekton pipelines, the platform runs tests in isolated containers without a build step, downloading pre‑built artifacts from OSS.
```yaml
apiVersion: tekton.dev/v1beta1
kind: Task
metadata:
  name: auto-test-api-jar-java
  namespace: tekton-pipelines
spec:
  params:
    - name: gitId
      type: string
    - name: commitId
      type: string
    - name: appId
      type: string
    - name: tektonId
      type: string
  resources:
    inputs:
      - name: maven-test
        type: git
  steps:
    - image: 'harbor.xxx.com/basic/maven-mitmproxy:api-jar-24070101'
      script: |
        #!/usr/bin/env bash
        start=`date +%s`
        python3 /home/downloadJar.py $(params.gitId) $(params.commitId) | tee /workspace/maven-test/testNG.txt
        ...
```

4.2 Parsing Test Cases into the Database
Implemented ExtentTestNGIReporterListener to capture test metadata and push JSON payloads to a backend service.
```java
public class ExtentTestNGIReporterListener implements IReporter, ITestListener {
    private ExtentReports extent;
    private List<TestData> testDataList = new ArrayList<>();
    // ...

    public void generateReport(List<XmlSuite> xmlSuites, List<ISuite> suites, String outputDirectory) {
        // build Extent report and send data to MongoDB/Klov
    }
}
```

Test cases are also stored as XML suites for flexible execution.
```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE suite SYSTEM "http://testng.org/testng-1.0.dtd">
<suite name="【乐高】自动化" verbose="1" parallel="classes" thread-count="10">
  <test name="组装case重跑">
    <classes>
      <class name="com.huolala.qaautotest.TestCase.BfeCustomerApplicationQuerySvc.CommodityPriceFacade.ConfirmEvaluate">
        <methods>
          <include name="ConfirmEvaluate_001_001"/>
        </methods>
      </class>
    </classes>
    <listeners>
      <listener class-name="com.huolala.qaautotest.utils.ExtentTestNGIReporterListener"/>
    </listeners>
  </test>
</suite>
```

Chapter 5: Stability Governance
Operational incidents such as expired Rancher certificates, Harbor SSL expiration, DNS resolution failures, and pod IP exhaustion were resolved through certificate rotation, CoreDNS restart, and automated pod cleanup scripts.
```shell
openssl req -newkey rsa:4096 -nodes -sha256 -keyout ca.key -subj "/C=CN/ST=HB/O=QC/CN=your.domain.com" -x509 -days 3650 -out ca.crt
openssl req -x509 -new -nodes -key ca.key -subj "/C=CN/ST=HB/O=QC/CN=your.domain.com" -sha256 -days 100000 -out ca.crt
```

Chapter 6: Splitting for Speed
6.1 Moving Builds into CI/CD
By moving compilation and packaging to CI/CD pipelines and storing built JARs in OSS, runtime compilation was eliminated, cutting test execution time by ~30%.
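At test time the runner then only needs to resolve and fetch the JAR that CI already published. The sketch below shows the resolution logic; the OSS endpoint, bucket layout, and class names are assumptions for illustration, not the platform's real naming scheme:

```java
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;

// Sketch of the build-free step: instead of compiling at test time, the
// runner fetches a JAR that CI already built and uploaded to OSS.
public class PrebuiltJarFetcher {
    // Hypothetical endpoint; the real bucket and layout will differ.
    static final String OSS_BASE = "https://oss.example.com/test-artifacts";

    // CI publishes one JAR per (gitId, commitId); the runner derives the same key.
    public static URI artifactUri(String gitId, String commitId) {
        return URI.create(OSS_BASE + "/" + gitId + "/" + commitId + "/tests.jar");
    }

    // Skip the download entirely when the JAR is already cached on the node.
    public static boolean needsDownload(Path cacheDir, String commitId) {
        return !Files.exists(cacheDir.resolve(commitId + "-tests.jar"));
    }

    public static void main(String[] args) {
        System.out.println(artifactUri("order-svc", "a1b2c3d"));
        // The actual fetch would use java.net.http.HttpClient against this URI.
    }
}
```

Keying artifacts by commit also makes reruns reproducible: re‑executing a suite for an old commit fetches exactly the JAR that was tested originally.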
6.2 Klov Report Management
Replaced the Jenkins+TestNG report chain with direct MongoDB ingestion and Klov front‑end, achieving faster generation and richer visualizations.
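Conceptually, each case result becomes a small document that the listener serializes and ships to the ingestion backend. A dependency‑free sketch of that payload; the field names are illustrative, not Klov's actual schema:

```java
// Sketch of the report-ingestion payload: the TestNG listener serializes each
// case result and posts it to a backend that writes MongoDB for Klov to read.
public class ResultPayload {
    // Hand-rolled JSON keeps the sketch dependency-free; the real listener
    // would use a proper JSON library and escape field values.
    public static String toJson(String suite, String testCase, String status, long durationMs) {
        return String.format(
            "{\"suite\":\"%s\",\"case\":\"%s\",\"status\":\"%s\",\"durationMs\":%d}",
            suite, testCase, status, durationMs);
    }

    public static void main(String[] args) {
        System.out.println(toJson("order-api", "ConfirmEvaluate_001_001", "PASS", 312));
    }
}
```

Because results land in MongoDB as they are produced, Klov can render dashboards incrementally instead of waiting for a Jenkins job to finish and publish a static report.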
Chapter 7: Report Presentation
Adopting ExtentReports with Klov provided modern dashboards, visual charts, interactive navigation, and clear grouping, vastly improving user experience over the legacy Jenkins+TestNG reports.
Chapter 8: Past and Future
The platform’s evolution resembles a resilient tree, continuously adapting through architectural upgrades, precise‑testing integration, and governance loops (dubbed “double‑card double‑wait”). Future goals include tighter linkage between test cases and business services, and making that relationship measurable and easy to read.
Chapter 9: Returning to the Origin
Interface automation remains the foundational theme, guiding all subsequent quality‑assurance practices within the organization.
