
Mastering Full‑Link Stress Testing and Stability Assurance for Large‑Scale Promotions

This guide details a comprehensive approach to stability assurance and test innovation for large-scale promotional events, covering full-link stress testing, functional pre-runs, loss prevention, fault drills, zero-to-one efficiency innovation, and systematic quality-assurance thinking.


3. Thinking Chapter – Stability Assurance & Test Innovation

3.1 Stability Assurance & Promotion Guarantee

Three‑dimensional design of guarantee

Horizontal: generic platform testing solutions, such as those for transaction middleware and search middleware.
Industry: business-specific testing solutions focusing on loss prevention, contingency plans, strong/weak dependencies, full-link stress testing, and more.

Test technology innovation

Some overlap between solutions is tolerated, but the aim is to eliminate manual error through efficient, technology-driven solutions; years of test-technology development have produced a rich set of promotion quality-assurance practices.

3.1.1 Full‑Link Stress Test

Full-link stress testing is the core weapon of promotion quality assurance and has evolved into a routine, internationalized, service-oriented system.

3.1.1.1 Overview

Full-link stress testing brings the entire system (frontend, backend, middleware, database) into test scope and uses HTTP requests to simulate massive real-world traffic, identifying bottlenecks and validating capacity.

Business architecture

Basic principles

Entry: frontend HTTP requests such as the detail page, the confirm-order page, and order submission.
Middle: eagleeye as the main line, transmitting the full-link marker through HSF, tddl, notify, and other middleware.
Endpoint: shadow tables; when traffic reaches the storage layer, tddl routes data carrying the full-link marker to shadow tables named __test_<original_table_name>.

Stress marker transmission principle

The HTTP request carries the eagleeye marker (tb_eagleeyex_t=1); tbsession embeds the marker into the eagleeye context (t=1); middleware propagates the t=1 marker along the call chain; business logic reads the marker to decide whether to follow full-link test logic; at the DB layer, tddl checks the marker: if t=1, it routes to the shadow table, otherwise to the official table (see the sketch below).
Login and non-login flows are distinguished; logged-in users are pre-heated via collective login and carry the marker cookie.
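
To make the routing rule concrete, here is a minimal Java sketch of the shadow-table decision. It is illustrative only, not actual tddl code; the thread-local context and the method names are hypothetical stand-ins for the real middleware APIs.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Minimal sketch of shadow-table routing, assuming a thread-local
 * trace context in the spirit of eagleeye. Hypothetical API, not
 * actual tddl code.
 */
public final class ShadowTableRouter {

    private static final String STRESS_KEY = "t";
    private static final String SHADOW_PREFIX = "__test_";

    // Hypothetical stand-in for the eagleeye trace context.
    private static final ThreadLocal<Map<String, String>> TRACE_CONTEXT =
            ThreadLocal.withInitial(HashMap::new);

    // Middleware would set t=1 when the request carried tb_eagleeyex_t=1.
    public static void markStressTraffic() {
        TRACE_CONTEXT.get().put(STRESS_KEY, "1");
    }

    // DB layer: stress traffic goes to __test_<table>, real traffic to <table>.
    public static String routeTable(String logicalTable) {
        boolean stress = "1".equals(TRACE_CONTEXT.get().get(STRESS_KEY));
        return stress ? SHADOW_PREFIX + logicalTable : logicalTable;
    }

    public static void main(String[] args) {
        System.out.println(routeTable("trade_order"));  // trade_order
        markStressTraffic();                            // simulate t=1 arriving
        System.out.println(routeTable("trade_order"));  // __test_trade_order
    }
}
```

The key design property this illustrates: business code and the storage layer share one marker carried by the trace context, so stress data never needs a separate database, only separate tables.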

3.1.1.2 Terminology

Shadow table: stores full-link test data; it resides in the same database as the official table and is named __test_<original_table_name>.

Full-link test marker: tb_eagleeyex_t=1 (eagleeye context t=1).

Full-link function marker: tb_eagleeyex_f=1 (a sub-marker of t, used for functional checks without bypassing).

Traffic construction interface: provides APIs to build complex HTTP requests with dynamic parameters (e.g., product, buyer, seller, SKU, address).

Single-link HTTP request standards:

Executable URL.

Clear condition flags (eagleeye marker, login requirement, GET/POST).

URL count ≥ 2× the required test volume.

Provision timing based on URL count (e.g., <5 URLs: 0.5 day ahead, 5‑10 URLs: 1 day ahead, etc.).
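
As an illustration of the standards above, the following sketch uses Java's built-in HttpClient to build a single-link GET request that carries the full-link test marker as a cookie. The endpoint URL and parameters are assumptions for illustration; in practice the traffic construction interface generates such requests with dynamic product, buyer, and SKU data.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Sketch of a single-link stress request carrying the full-link test
// marker. URL and parameters are illustrative assumptions.
public class StressRequestExample {
    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();

        HttpRequest request = HttpRequest.newBuilder()
                // Hypothetical detail-page URL; the traffic construction
                // interface would fill in product, buyer, and SKU data.
                .uri(URI.create("https://example.com/detail?itemId=12345"))
                // Full-link test marker: middleware propagates t=1 and
                // tddl routes any writes to the __test_ shadow tables.
                .header("Cookie", "tb_eagleeyex_t=1")
                .GET()
                .build();

        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode());
    }
}
```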

3.1.2 Functional Pre‑run

Annual full‑scale rehearsals involve product, operations, development, testing, and CCO to validate core promotion scenarios, improving efficiency and bug detection.

3.1.2.1 Module Architecture & Platform

Pre‑run organization management: case entry, task allocation, data construction, account assignment, business settings, progress dashboard.

Pre‑run execution guidance: automatic task and account claiming, business execution, issue tracking.

Pre‑run issue investigation: reproduce and analyze bugs, auto‑populate additional bug information.
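
As a small illustration of the "auto-populate additional bug information" idea, the sketch below models an issue record whose context fields (test account, entry URL, trace id) the platform could fill in automatically; all field names here are hypothetical.

```java
// Hypothetical issue record; in the platform these context fields would
// be auto-populated when a tester files a pre-run bug.
public record PreRunIssue(
        String caseId,       // pre-run case being executed
        String testAccount,  // auto-claimed account used for the run
        String entryUrl,     // failing page or API entry
        String traceId,      // full-link trace id for investigation
        String description   // tester's own observation
) {
    public static void main(String[] args) {
        PreRunIssue issue = new PreRunIssue(
                "case-1024", "test_buyer_01",
                "https://example.com/confirmOrder", "trace-abc123",
                "Coupon not applied on confirm-order page");
        System.out.println(issue);
    }
}
```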

3.1.3 Contingency Plan Special Topic

Each BU has its own contingency plans, and an incorrect plan poses high risk; strict execution discipline, permission controls, and AB-role (primary/backup owner) mechanisms are essential.

3.1.4 Loss Prevention

3.1.4.1 How to Prevent Loss During Promotions?

1. Understand the promotion goals, set loss-prevention targets, and coordinate with domain owners on the time plan, loss targets, and discipline.

2. Review historical loss issues and the progress of their remediation.

3. Define loss-point templates, review them with the central loss-prevention lead, and guide domain owners through them.

4. Test the loss points, complete the monitoring items, and add loss cases to the dedicated test suite (a monitoring-rule sketch follows this list).

5. Map the core loss chains, review them, track loss issues found during pre-runs, assess risk, and sync with the promotion team.

6. Arrange the on-call plan, monitor loss alerts (e.g., from BCP), and record and evaluate risks.

7. Post-promotion review: analyze the issues of the day and propose improvements.
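
To make the monitoring idea concrete, here is a minimal sketch of a funds-loss reconciliation rule in the spirit of a BCP-style check: the amount actually charged must equal the order total minus the discount. The field names and amounts are illustrative assumptions, not the platform's actual rule format.

```java
import java.math.BigDecimal;

// Minimal sketch of a funds-loss reconciliation rule: the charged amount
// must equal order total minus discount. Field names are illustrative.
public class FundsLossCheck {

    static boolean isConsistent(BigDecimal orderTotal,
                                BigDecimal discount,
                                BigDecimal charged) {
        BigDecimal expected = orderTotal.subtract(discount);
        // Any mismatch is a potential funds-loss point and should alert.
        return expected.compareTo(charged) == 0;
    }

    public static void main(String[] args) {
        BigDecimal total = new BigDecimal("100.00");
        BigDecimal discount = new BigDecimal("20.00");
        // 100.00 - 20.00 = 80.00 charged: consistent.
        System.out.println(isConsistent(total, discount, new BigDecimal("80.00"))); // true
        // Charging 70.00 would mean a 10.00 loss: alert.
        System.out.println(isConsistent(total, discount, new BigDecimal("70.00"))); // false
    }
}
```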

3.1.5 Fault Drills

Goal: improve how systems, processes, and people respond to incidents, enabling rapid detection, containment, and recovery, and enhancing overall robustness.

3.1.5.2 Drill Process Standards

3.1.5.3 Attack‑Defense Drill Example

Preparation methods

1. Analyze link‑monitoring relationship, design drill scenarios.

2. Simulate injection in pre‑release environment, ensure fault triggers.

3. Validate in production‑grade environment, ensure monitoring alarms fire.

4. Archive scenario in the MK platform.

Traffic simulation: inject fault traffic and configure the attack strategy.

Business owners must identify which services caused the alerts; new emergency scenarios are defined from what the drill uncovers.

Fault simulation proceeds step by step: first in the pre-release environment, then in the production-grade environment (a simplified injection sketch follows).
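
A minimal sketch of the kind of fault injection a drill might use: a wrapper that adds latency and throws when a drill switch is on, so that monitoring and alerting can be validated. This is illustrative only; real drills use a dedicated fault-injection platform rather than hand-written wrappers, and all names here are hypothetical.

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.function.Supplier;

// Illustrative fault-injection wrapper for a drill scenario: when the
// drill switch is on, a fraction of calls get extra latency and an
// exception so monitoring and alerting can be validated.
public class FaultInjector {

    private final double faultRatio;          // fraction of calls to disturb
    private volatile boolean drillOn = false; // flipped by the drill platform

    public FaultInjector(double faultRatio) { this.faultRatio = faultRatio; }

    public void startDrill() { drillOn = true; }

    public <T> T call(Supplier<T> downstream) {
        if (drillOn && ThreadLocalRandom.current().nextDouble() < faultRatio) {
            try {
                Thread.sleep(300);            // simulate a slow dependency
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            throw new RuntimeException("injected fault: dependency timeout");
        }
        return downstream.get();              // normal path
    }

    public static void main(String[] args) {
        FaultInjector injector = new FaultInjector(1.0); // disturb every call
        injector.startDrill();
        try {
            injector.call(() -> "ok");
        } catch (RuntimeException e) {
            System.out.println("monitoring should alert on: " + e.getMessage());
        }
    }
}
```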

3.1.5.5 Expected Recovery Actions

Recovery actions are used to assess how developers handle the incident; the blue team's (defending side's) proposed solutions must be accurate.

3.2 From 0‑1 to Efficiency Innovation

Efficiency innovation grows out of real testing pain points in the business. This section traces the evolution of an automated efficiency product: abstracting solutions from concrete problems, building a platform, and iterating through practice.

Step 1: Origin

Keywords: foresight, pressure, dreams, reality. A review of interface testing 1.0 (high barrier to entry), 2.0 (visual UI, but still rigid), and 3.0 (big-data-driven intelligent testing).

Step 2: Transformation

Keywords: efficiency goals, AI exploration, capability integration. Early versions of the product focused on tooling; later stages aim at broader user value and competitive differentiation.

Step 3: Breakthrough

When facing challenges, reconsider the product's essence and focus on its value:

The market is broad, and room for product differentiation exists.

Many user pain points remain unsolved; collaborate to tackle the technical challenges.

AI exploration still has a long way to go; innovation drives competitiveness.

Core value: empower non-TMF technical testers and solve their quality and efficiency problems.

Competitiveness: continuous innovation in intelligent testing.

Decision: open up collaboration further.

Step 4: Deepening, Intelligence, Openness

3.3 Systematic Quality Assurance Thinking

Quality assurance is a precise discipline with its own knowledge system; systematic thinking expands every quality-related factor and closes the loop on each of them.

Step 1: List all inputs – business background, complexity, technical maturity, team maturity, current quality.

Step 2: Integrate inputs, devise solutions, close loops.

Step 3: Deploy and schedule.

Step 4: Consolidate, reflect, optimize.

Tags: operations, stress testing, fault drills, full-link testing, efficiency innovation