How LLMs Transform Traffic Replay Testing for Backend Services
This article walks through the challenges of traditional traffic replay, explains the design of a conventional replay system, and then details a novel LLM‑powered solution that automates data preparation, script generation, validation, and continuous integration for backend service testing.
Preface
Many developers have heard of traffic replay but find it harder to implement than most engineering tasks, because it depends heavily on internal backend services, environment conditions, and system architecture.
Purpose of the Traffic Replay System
Traffic replay records real online requests and replays them in an offline environment to verify that interfaces still behave correctly. It preserves authenticity (real user agents, user profiles, and request diversity), provides reference data for writing test scripts, supports large‑scale regression, and raises confidence in testing.
Traditional Traffic Replay System
The team built a traditional system that records traffic via Nginx, writes logs to Kafka, extracts and flattens JSON responses into a response_shape, stores data, and runs replay using Celery tasks. The process includes test case collection, de‑duplication of response shapes, request headers, and parameters, followed by strict and fuzzy matching of responses.
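To make response_shape concrete, here is a minimal sketch of flattening a nested JSON body into dotted key paths; the helper name and sample data are illustrative, not the team's actual code:

# A minimal sketch of response_shape extraction; names are illustrative.
from typing import Any

def flatten_keys(obj: Any, prefix: str = "") -> list[str]:
    """Recursively collect dotted key paths from a JSON-like object."""
    paths: list[str] = []
    if isinstance(obj, dict):
        for key, value in obj.items():
            child = f"{prefix}.{key}" if prefix else key
            paths.extend(flatten_keys(value, child))
    elif isinstance(obj, list) and obj:
        # Treat the first element as representative of the list's shape.
        paths.extend(flatten_keys(obj[0], prefix))
    else:
        paths.append(prefix)
    return paths

print(flatten_keys({"data": {"user": "u1", "name": "alice", "age": 30}}))
# -> ['data.user', 'data.name', 'data.age']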
Key steps:
Collect test case set from the testing platform.
De‑duplicate response shapes, request headers, and parameters.
Execute requests and compare responses.
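The comparison step can reuse the flatten_keys sketch above: strict matching compares full bodies, while fuzzy matching compares only the key structure. This is a hedged reading of the article, not the team's exact rules:

# Illustrative strict vs. fuzzy response comparison (assumed semantics).

def strict_match(recorded: dict, replayed: dict) -> bool:
    # Strict: the replayed body must equal the recorded one exactly.
    return recorded == replayed

def fuzzy_match(recorded: dict, replayed: dict) -> bool:
    # Fuzzy: only the response_shape (set of key paths) must match, so
    # volatile values such as timestamps do not cause false failures.
    return set(flatten_keys(recorded)) == set(flatten_keys(replayed))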
Code example of the test case template:
{
"host": "xxxx",
"request_path": "/a/b/c",
"request_headers": ["..."],
"request_params": ["..."],
"request_method": "POST",
"response_shape": ["data.user,data.name,data.age"]
}LLM‑Based Traffic Replay System
The traditional approach suffers from two major problems: inability to achieve both precise and generic validation, and difficulty handling stateful interfaces. To address these, the team introduced large language models (LLMs) to generate test scripts, perform intelligent de‑duplication, and assist in error analysis.
The data preparation stage extracts unique response shapes from Elasticsearch, joins them with request data from MySQL, de‑duplicates tokens and parameters, and randomly supplements a small number of records (5‑6) to keep the AI prompts stable.
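A sketch of that join-and-deduplicate step, with stub inputs standing in for the Elasticsearch and MySQL queries; every field name here is an assumption:

# Sketch of the data-preparation join; field names and inputs are assumed.
import random

def prepare_cases(shapes_from_es: list[dict],
                  requests_from_mysql: list[dict],
                  supplement: int = 6) -> list[dict]:
    # Index recorded requests by interface path for the join.
    by_path: dict[str, list[dict]] = {}
    for req in requests_from_mysql:
        by_path.setdefault(req["request_path"], []).append(req)

    cases, seen = [], set()
    for shape in shapes_from_es:
        for req in by_path.get(shape["request_path"], []):
            # Drop per-user tokens so they don't defeat de-duplication.
            headers = {k: v for k, v in req["request_headers"].items()
                       if k.lower() not in ("authorization", "cookie")}
            key = (req["request_path"],
                   frozenset(req["request_params"].items()),
                   tuple(shape["response_shape"]))
            if key not in seen:
                seen.add(key)
                cases.append({**req, "request_headers": headers,
                              "response_shape": shape["response_shape"]})

    # Randomly add a handful of extra records (the article cites 5-6)
    # so the prompt later fed to the LLM stays a stable size.
    extras = random.sample(requests_from_mysql,
                           k=min(supplement, len(requests_from_mysql)))
    return cases + extras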
The data storage stage aggregates each day's data, feeds it to an AI workflow (DIFY) that produces test scripts, and stores the scripts with metadata indicating whether each one is new, updated, or needs human review.
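As an assumed example, handing one aggregated case to a DIFY-style workflow over HTTP and tagging the returned script might look like this; the endpoint, payload shape, and output field name are guesses about a typical deployment, not details from the article:

# Hypothetical call to a DIFY-style workflow; endpoint, payload shape,
# and output field name are assumptions, not confirmed by the article.
import json
import requests

DIFY_URL = "https://dify.internal/v1/workflows/run"  # placeholder endpoint
API_KEY = "app-xxxx"                                 # placeholder key

def generate_script(case: dict, existing_script: str | None) -> dict:
    resp = requests.post(
        DIFY_URL,
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={"inputs": {"case": json.dumps(case, ensure_ascii=False)},
              "response_mode": "blocking", "user": "traffic-replay"},
        timeout=120,
    )
    resp.raise_for_status()
    script = resp.json()["data"]["outputs"]["script"]  # assumed output name

    # Metadata lets reviewers focus on scripts that are new or changed.
    if existing_script is None:
        status = "new"
    elif script != existing_script:
        status = "updated"  # large diffs could be routed to human review
    else:
        status = "unchanged"
    return {"script": script, "status": status}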
The replay stage binds test cases to the front‑end testing platform, executes the scripts, collects logs, and uses LLMs to summarize errors and generate alerts.
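One way the error-summarization step could work, sketched against an OpenAI-compatible chat API; the client setup, model name, and prompt are mine, not the team's:

# Sketch of LLM-assisted error triage; client, model, and prompt are assumed.
from openai import OpenAI

client = OpenAI()  # assumes an OpenAI-compatible endpoint is configured

def summarize_failures(failure_logs: list[str]) -> str:
    prompt = (
        "You are triaging traffic-replay failures for backend interfaces. "
        "Group them by likely root cause (environment issue, data drift, "
        "non-idempotent interface, real regression) and suggest next steps.\n\n"
        + "\n---\n".join(failure_logs[:50])  # cap the prompt size
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

The resulting summary can then feed the alerting step, for example via a webhook into the team's chat channel.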
Current Results and Future Plans
The system is already integrated into the DevOps pipeline and covers 257 backend interfaces with 583 generated scripts. Remaining challenges include a growing review workload, three‑day data gaps, and instability caused by non‑idempotent interfaces.
The team aims to reduce manual review time, shorten the data gaps, and improve the stability of replay results.