How LLMs Transform Traffic Replay Testing for Backend Services
This article walks through the challenges of traditional traffic replay, explains the design of a conventional replay system, and then details a novel LLM‑powered solution that automates data preparation, script generation, validation, and continuous integration for backend service testing.
Preface
Many developers have heard of traffic replay but find it harder to implement than most engineering tasks, because it depends heavily on internal backend services, environment conditions, and system architecture.
Purpose of the Traffic Replay System
Traffic replay records real online requests and replays them in an offline environment to verify that interfaces still behave correctly. It preserves authenticity (real user agents, user profiles, and request diversity), provides reference data for writing test scripts, supports large‑scale regression, and raises confidence in testing.
Traditional Traffic Replay System
The team built a traditional system that records traffic via Nginx, writes logs to Kafka, extracts and flattens JSON responses into a response_shape, stores data, and runs replay using Celery tasks. The process includes test case collection, de‑duplication of response shapes, request headers, and parameters, followed by strict and fuzzy matching of responses.
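To make response_shape concrete, here is a minimal sketch of flattening a nested JSON body into dotted key paths; the helper name and sample data are illustrative, not the team's actual code:

# A minimal sketch of response_shape extraction; names are illustrative.
from typing import Any

def flatten_keys(obj: Any, prefix: str = "") -> list[str]:
    """Recursively collect dotted key paths from a JSON-like object."""
    paths: list[str] = []
    if isinstance(obj, dict):
        for key, value in obj.items():
            child = f"{prefix}.{key}" if prefix else key
            paths.extend(flatten_keys(value, child))
    elif isinstance(obj, list) and obj:
        # Treat the first element as representative of the list's shape.
        paths.extend(flatten_keys(obj[0], prefix))
    else:
        paths.append(prefix)
    return paths

print(flatten_keys({"data": {"user": "u1", "name": "alice", "age": 30}}))
# -> ['data.user', 'data.name', 'data.age']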
Key steps:
Collect test case set from the testing platform.
De‑duplicate response shapes, request headers, and parameters.
Execute requests and compare responses.
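The comparison step can reuse the flatten_keys sketch above: strict matching compares full bodies, while fuzzy matching compares only the key structure. This is a hedged reading of the article, not the team's exact rules:

# Illustrative strict vs. fuzzy response comparison (assumed semantics).

def strict_match(recorded: dict, replayed: dict) -> bool:
    # Strict: the replayed body must equal the recorded one exactly.
    return recorded == replayed

def fuzzy_match(recorded: dict, replayed: dict) -> bool:
    # Fuzzy: only the response_shape (set of key paths) must match, so
    # volatile values such as timestamps do not cause false failures.
    return set(flatten_keys(recorded)) == set(flatten_keys(replayed))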
Code example of the test case template:
{
"host": "xxxx",
"request_path": "/a/b/c",
"request_headers": ["..."],
"request_params": ["..."],
"request_method": "POST",
"response_shape": ["data.user,data.name,data.age"]
}LLM‑Based Traffic Replay System
The traditional approach suffers from two major problems: inability to achieve both precise and generic validation, and difficulty handling stateful interfaces. To address these, the team introduced large language models (LLMs) to generate test scripts, perform intelligent de‑duplication, and assist in error analysis.
The data preparation stage extracts unique response shapes from Elasticsearch, joins them with request data from MySQL, de‑duplicates tokens and parameters, and randomly supplements a small number of records (5‑6) to keep the AI prompts stable.
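A sketch of that join-and-deduplicate step, with stub inputs standing in for the Elasticsearch and MySQL queries; every field name here is an assumption:

# Sketch of the data-preparation join; field names and inputs are assumed.
import random

def prepare_cases(shapes_from_es: list[dict],
                  requests_from_mysql: list[dict],
                  supplement: int = 6) -> list[dict]:
    # Index recorded requests by interface path for the join.
    by_path: dict[str, list[dict]] = {}
    for req in requests_from_mysql:
        by_path.setdefault(req["request_path"], []).append(req)

    cases, seen = [], set()
    for shape in shapes_from_es:
        for req in by_path.get(shape["request_path"], []):
            # Drop per-user tokens so they don't defeat de-duplication.
            headers = {k: v for k, v in req["request_headers"].items()
                       if k.lower() not in ("authorization", "cookie")}
            key = (req["request_path"],
                   frozenset(req["request_params"].items()),
                   tuple(shape["response_shape"]))
            if key not in seen:
                seen.add(key)
                cases.append({**req, "request_headers": headers,
                              "response_shape": shape["response_shape"]})

    # Randomly add a handful of extra records (the article cites 5-6)
    # so the prompt later fed to the LLM stays a stable size.
    extras = random.sample(requests_from_mysql,
                           k=min(supplement, len(requests_from_mysql)))
    return cases + extras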
The data storage stage aggregates each day's data, feeds it to an AI workflow (DIFY) that produces test scripts, and stores the scripts with metadata indicating whether each one is new, updated, or needs human review.
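As an assumed example, handing one aggregated case to a DIFY-style workflow over HTTP and tagging the returned script might look like this; the endpoint, payload shape, and output field name are guesses about a typical deployment, not details from the article:

# Hypothetical call to a DIFY-style workflow; endpoint, payload shape,
# and output field name are assumptions, not confirmed by the article.
import json
import requests

DIFY_URL = "https://dify.internal/v1/workflows/run"  # placeholder endpoint
API_KEY = "app-xxxx"                                 # placeholder key

def generate_script(case: dict, existing_script: str | None) -> dict:
    resp = requests.post(
        DIFY_URL,
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={"inputs": {"case": json.dumps(case, ensure_ascii=False)},
              "response_mode": "blocking", "user": "traffic-replay"},
        timeout=120,
    )
    resp.raise_for_status()
    script = resp.json()["data"]["outputs"]["script"]  # assumed output name

    # Metadata lets reviewers focus on scripts that are new or changed.
    if existing_script is None:
        status = "new"
    elif script != existing_script:
        status = "updated"  # large diffs could be routed to human review
    else:
        status = "unchanged"
    return {"script": script, "status": status}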
The replay stage binds test cases to the front‑end testing platform, executes the scripts, collects logs, and uses LLMs to summarize errors and generate alerts.
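One way the error-summarization step could work, sketched against an OpenAI-compatible chat API; the client setup, model name, and prompt are mine, not the team's:

# Sketch of LLM-assisted error triage; client, model, and prompt are assumed.
from openai import OpenAI

client = OpenAI()  # assumes an OpenAI-compatible endpoint is configured

def summarize_failures(failure_logs: list[str]) -> str:
    prompt = (
        "You are triaging traffic-replay failures for backend interfaces. "
        "Group them by likely root cause (environment issue, data drift, "
        "non-idempotent interface, real regression) and suggest next steps.\n\n"
        + "\n---\n".join(failure_logs[:50])  # cap the prompt size
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

The resulting summary can then feed the alerting step, for example via a webhook into the team's chat channel.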
Current Results and Future Plans
The system is already integrated into the DevOps pipeline and covers 257 backend interfaces with 583 generated scripts. Remaining challenges include a growing review workload, three‑day data gaps, and instability caused by non‑idempotent interfaces.
The team aims to reduce manual review time, shorten the data gaps, and improve the stability of replay results.