
Leveraging Real Online Traffic for Quality Assurance and Efficiency in Online Education Platforms

This article explains how an online education quality team uses real user traffic to build a comprehensive platform that improves CI/CD maturity, automates testing, captures and replays traffic, and provides scalable services for continuous quality and efficiency across the entire development lifecycle.

TAL Education Technology

The online education quality team focuses on improving online quality and development efficiency, using real user traffic to strengthen CI/CD maturity and overall team effectiveness.

Preface

Technical measures protect core business principles by examining problems at their root.

An automation service that is not stable is like a chicken rib: too little value to rely on, yet hard to throw away.

Delivery without coverage analysis is like running naked.

CI/CD without continuous testing capability is a disaster.

Core Capabilities Based on Traffic Assurance

1. Traffic Recording and Collection

Real online user traffic is the foundation for service verification, performance testing, and technical refactoring. Two main traffic collection methods are used:

Main-path duplication (log reporting, AOP in application code, framework-level duplication such as Dubbo or a service mesh, and open-source tools such as rdebug and jvm-sandbox).

Side‑path duplication (network protocol stack capture).

Main‑path duplication offers strong control and flexibility but is tightly coupled to business and language stacks, potentially impacting performance. Side‑path duplication is business‑agnostic and non‑intrusive but incurs higher parsing overhead and processing cost.

For online education, the side‑path approach based on Nginx access logs was chosen because it provides multi‑dimensional traffic capture without service intrusion.
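As a rough illustration of log-based capture, the sketch below turns one access log line into a replay-ready record. It assumes Nginx's default `combined` log format; the article does not state which log format or fields the team uses, and `parse_access_line` is a hypothetical helper name.

```python
import re
from typing import Optional

# Assumption: Nginx's default "combined" log format. A real gateway would likely
# use a custom log_format that also records tokens, trace IDs, and request bodies.
COMBINED = re.compile(
    r'(?P<remote_addr>\S+) \S+ (?P<remote_user>\S+) \[(?P<time_local>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<uri>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<body_bytes>\d+) '
    r'"(?P<referer>[^"]*)" "(?P<user_agent>[^"]*)"'
)

def parse_access_line(line: str) -> Optional[dict]:
    """Return the fields needed for replay, or None if the line does not match."""
    m = COMBINED.match(line)
    if m is None:
        return None
    # Split the request URI into path and query string.
    path, _, query = m.group("uri").partition("?")
    return {
        "time_local": m.group("time_local"),
        "method": m.group("method"),
        "path": path,
        "query": query,
        "status": int(m.group("status")),
    }

sample = ('203.0.113.7 - - [10/Oct/2020:13:55:36 +0800] '
          '"GET /api/course/list?page=2 HTTP/1.1" 200 512 "-" "Mozilla/5.0"')
record = parse_access_line(sample)
```

Lines that do not match return `None` rather than raising, so a cleaning stage can count and drop malformed entries without stopping the stream.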

Challenges

Collecting traffic across multiple dimensions.

Identifying traffic identities.

Ensuring call sequence control.

Adapting quickly to different business unit architectures.

Traffic Capture and Recording Architecture

The architecture follows a streaming model. Gateway logs are cleaned, and their core fields are transformed into replay-ready streams. Tokens are parsed to identify whose traffic it is, and linking each token to its traceId allows a user's behavior and call chain to be captured precisely. Data is colored (tagged) and sensitive values are replaced to protect privacy.
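The identification and linkage steps can be sketched roughly as follows. The token layout (`<user_id>:<opaque part>`), the record fields, and the function names are all assumptions made for illustration; the article does not describe the real token format.

```python
from collections import defaultdict

def user_from_token(token: str) -> str:
    # Assumption: tokens look like "<user_id>:<opaque part>". Real tokens would
    # need to be decoded or resolved through an account service.
    return token.split(":", 1)[0]

def build_user_streams(records):
    """Group cleaned log records by user and order each user's calls by time."""
    grouped = defaultdict(list)
    for rec in records:
        grouped[user_from_token(rec["token"])].append(rec)
    streams = {}
    for user, recs in grouped.items():
        ordered = sorted(recs, key=lambda r: r["ts"])
        # "seq" is the sequencing identifier used later to replay calls in order.
        streams[user] = [
            {"seq": i, "trace_id": r["trace_id"], "path": r["path"]}
            for i, r in enumerate(ordered)
        ]
    return streams

records = [
    {"token": "u42:abc", "trace_id": "t-2", "ts": 1602309337.0, "path": "/api/answer/submit"},
    {"token": "u7:xyz", "trace_id": "t-3", "ts": 1602309338.0, "path": "/api/class/enter"},
    {"token": "u42:abc", "trace_id": "t-1", "ts": 1602309336.0, "path": "/api/class/enter"},
]
streams = build_user_streams(records)
```

Keeping the traceId next to each sequence number means a replayed call can still be matched to the original call chain it came from.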

Provided Capabilities

User‑based traffic collection.

Scenario‑based traffic collection.

Traffic sequencing identifiers.

OpenAPI to empower other platforms.

2. Business Model Construction

The team builds models of the core write scenarios and combines them with traffic data, which enables stable replay, performance testing, and data construction for internal platforms.

Challenges include classifying model features, ensuring no impact on users or business, efficient data replacement, and adaptability to business changes.

Using the large‑class live broadcast scenario as an example, core fields are manually labeled, traffic is recolored, and replay‑dependent scenarios are constructed. The model preserves real traffic characteristics while replacing sensitive data.

Classification Basis

Feature data (user answer records, courseware, teacher actions, etc.).

Course data (course ID, session ID, etc.).

Fixed‑step offset data (student ID, teacher ID, etc.).

When business changes add new base data, the classification system and custom replacement algorithm keep the model stable, avoiding the need for constant script maintenance.
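A minimal sketch of classification-driven replacement is shown below. The field names, the course mapping, and the offset value are hypothetical; only the three classes (feature, course, fixed-step offset) come from the text above.

```python
# Hypothetical field classification; the article labels core fields manually.
FIELD_CLASS = {
    "answer": "feature",     # kept as recorded to preserve traffic characteristics
    "course_id": "course",   # mapped onto test courses
    "session_id": "course",
    "student_id": "offset",  # shifted by a fixed step
    "teacher_id": "offset",
}

# Assumed mapping from online courses and sessions to test ones.
COURSE_MAP = {"course_id": {1001: 9001}, "session_id": {55: 7055}}

# Assumed fixed step that moves real IDs into a reserved test-account range.
ID_OFFSET = 10_000_000

def replace_fields(payload: dict) -> dict:
    """Rewrite one recorded request body according to each field's class."""
    out = {}
    for key, value in payload.items():
        kind = FIELD_CLASS.get(key)
        if kind == "course":
            # Raises KeyError for an unmapped course, so gaps surface early.
            out[key] = COURSE_MAP[key][value]
        elif kind == "offset":
            out[key] = value + ID_OFFSET
        else:
            # Feature fields and unclassified fields pass through unchanged.
            out[key] = value
    return out

replaced = replace_fields(
    {"student_id": 123, "course_id": 1001, "session_id": 55, "answer": "B"}
)
```

With this shape, a new base field only needs an entry in the classification table rather than a change to every replay script, which is the stability property the paragraph above describes.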

3. Traffic Replay Execution

Replay runs on processed traffic data, applying thread control to balance link chaining and execution efficiency.
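One way to picture this trade-off is to keep each user's calls strictly ordered while running different users concurrently. The sketch below does that with a thread pool; `fake_send` stands in for a real HTTP call, and the one-task-per-user policy is an assumption, not the platform's documented behavior.

```python
from concurrent.futures import ThreadPoolExecutor

def replay_user(user, calls, send):
    """Replay one user's calls strictly in recorded order."""
    return user, [send(user, call) for call in calls]

def replay_all(streams, send, max_workers=4):
    # One task per user: order is preserved within a user, users run concurrently.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [
            pool.submit(replay_user, user, calls, send)
            for user, calls in streams.items()
        ]
        return dict(f.result() for f in futures)

def fake_send(user, call):
    # Stand-in for a real request against the replay environment.
    return f"{user}:{call}:ok"

results = replay_all({"u42": ["enter", "answer"], "u7": ["enter"]}, fake_send)
```

Raising `max_workers` increases throughput across users without affecting the order of calls inside any single user's chain.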

Challenges

Propagating upstream/downstream data.

Mapping traffic identity to replay account identity.

Maintaining data lifecycle.

For the large-class live broadcast scenario, the replay flow includes result collection and validation. JSON schemas are generated automatically and validation rules are unified, so failures raise alerts and configuration errors are reduced.
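To make the schema step concrete, the sketch below infers a small JSON-Schema-like description from one sample response and checks later responses against it. It is a simplified stand-in written with the standard library only; the function names are hypothetical and it covers just types and required keys, not the full JSON Schema vocabulary.

```python
def json_type(value):
    """Name the JSON type of a decoded Python value."""
    if isinstance(value, bool):  # bool is a subclass of int, so check it first
        return "boolean"
    if isinstance(value, int):
        return "integer"
    if isinstance(value, float):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    if isinstance(value, dict):
        return "object"
    if value is None:
        return "null"
    raise TypeError(f"not a JSON value: {value!r}")

def infer_schema(value):
    """Infer a schema from one sample; arrays are assumed to be homogeneous."""
    kind = json_type(value)
    schema = {"type": kind}
    if kind == "object":
        schema["properties"] = {k: infer_schema(v) for k, v in value.items()}
        schema["required"] = sorted(value)
    elif kind == "array" and value:
        schema["items"] = infer_schema(value[0])
    return schema

def validate(value, schema, path="$"):
    """Return a list of violations; an empty list means the value conforms."""
    actual, expected = json_type(value), schema["type"]
    # As in JSON Schema, an integer is also an acceptable "number".
    if actual != expected and not (expected == "number" and actual == "integer"):
        return [f"{path}: expected {expected}, got {actual}"]
    errors = []
    if actual == "object":
        for key in schema["required"]:
            if key not in value:
                errors.append(f"{path}.{key}: missing")
            else:
                errors.extend(validate(value[key], schema["properties"][key], f"{path}.{key}"))
    elif actual == "array" and "items" in schema:
        for i, item in enumerate(value):
            errors.extend(validate(item, schema["items"], f"{path}[{i}]"))
    return errors

baseline = {"code": 0, "data": {"score": 9.5, "items": [{"id": 1}]}}
schema = infer_schema(baseline)
problems = validate({"code": "0", "data": {"score": 9, "items": [{"id": "x"}]}}, schema)
```

Because the schema is derived from recorded responses, nobody has to write validation rules by hand, which is where configuration errors usually creep in.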

4. Traffic Comparison Service

The Diff capability compares replay responses against historical baselines to detect anomalies. It supports structured comparison, ignored fields, and array comparison, and its results feed the training data for anomaly detection.

Challenges

Noise in response values (random data, timestamps).

Support for multiple comparison rule types.

Change impact calculation.

Precise extraction from massive results.
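The comparison rules listed above can be sketched as a recursive diff that takes a set of ignored paths (for noisy fields such as timestamps) and a set of arrays to compare without regard to order. The path syntax and parameter names are assumptions for illustration, not the Diff service's actual interface.

```python
import json

def diff(baseline, replay, ignore=frozenset(), unordered=frozenset(), path="$"):
    """Return the paths where replay differs from baseline.

    ignore:    paths skipped entirely (timestamps, random values, ...)
    unordered: paths of arrays compared without regard to element order
    """
    if path in ignore:
        return []
    if isinstance(baseline, dict) and isinstance(replay, dict):
        out = []
        for key in sorted(set(baseline) | set(replay)):
            child = f"{path}.{key}"
            if child in ignore:
                continue
            if key not in baseline or key not in replay:
                out.append(child)  # key added or removed
            else:
                out.extend(diff(baseline[key], replay[key], ignore, unordered, child))
        return out
    if isinstance(baseline, list) and isinstance(replay, list):
        if path in unordered:
            # Order-insensitive: compare the canonical JSON of each element.
            def canon(items):
                return sorted(json.dumps(x, sort_keys=True) for x in items)
            return [] if canon(baseline) == canon(replay) else [path]
        if len(baseline) != len(replay):
            return [path]
        out = []
        for i, (b, r) in enumerate(zip(baseline, replay)):
            out.extend(diff(b, r, ignore, unordered, f"{path}[{i}]"))
        return out
    return [] if baseline == replay else [path]

base = {"code": 0, "ts": 1602309336, "data": {"tags": ["a", "b"], "score": 90}}
new = {"code": 0, "ts": 1602999999, "data": {"tags": ["b", "a"], "score": 85}}
changed = diff(base, new, ignore={"$.ts"}, unordered={"$.data.tags"})
```

Returning paths rather than a boolean keeps the output small enough to aggregate across a large number of replayed responses. One limitation of this sketch is that ignoring a field inside an array element requires the exact index in the path.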

Turning Capabilities into Services and Platforms

To keep up with rapid business iteration, core capabilities are offered as services and platforms, enhancing overall quality and efficiency.

Backend Architecture

The Conan platform provides traffic replay, data construction, pressure modeling, and other services, supporting performance testing, interface testing, and online health checks. It uses a distributed architecture with master‑worker nodes, load balancing, and Kubernetes for dynamic scaling and high availability.

Online Traffic‑Based Pressure Model and Data

The platform builds performance models from real online traffic, enabling accurate pressure testing without the distortion of synthetic data, and supports model scaling for various performance expectations.
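A simple way to read "model scaling" is to keep the recorded mix of endpoints and scale only the total rate. The sketch below computes each endpoint's share of recorded traffic and projects it onto a target QPS; the endpoint paths, the target value, and the function names are made up for the example.

```python
from collections import Counter

def build_pressure_model(paths):
    """Share of total traffic per endpoint, taken from recorded request paths."""
    counts = Counter(paths)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no recorded traffic to build a model from")
    return {path: n / total for path, n in counts.items()}

def scale_model(model, target_qps):
    """Per-endpoint QPS for a target total, keeping the recorded traffic mix."""
    return {path: share * target_qps for path, share in model.items()}

# Hypothetical recorded paths: 20% enter, 60% submit, 20% chat.
recorded = (
    ["/api/class/enter"] * 2
    + ["/api/answer/submit"] * 6
    + ["/api/chat/send"] * 2
)
model = build_pressure_model(recorded)
plan = scale_model(model, target_qps=5000)
```

Because the shares come from real traffic, a pressure test driven by `plan` stresses endpoints in the same proportions that users do, only at a higher overall rate.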

Other Platform Capabilities

Additional backend features reduce usage cost, offering simple interactions, OpenAPI access, and data download capabilities.

Conclusion

Improving quality and efficiency requires both technology and people. Platforms such as Conan help teams handle frequent, complex demands. The team plans to open-source the platform in early 2021 to invite broader community participation.

TAL Education is a technology-driven education company committed to the mission of 'making education better through love and technology'. The TAL technology team has always been dedicated to educational technology research and innovation. This is the external platform of the TAL technology team, sharing weekly curated technical articles and recruitment information.
