
How Red‑Blue Drills Boost Securities Ops: From Capacity Testing to Full‑Scale Automation

Lin Ying, a senior test manager at Guoxin Securities, shares insights from his GOPS 2021 talk on the securities industry's digital transformation, current IT challenges, and a comprehensive red‑blue exercise strategy that combines full‑link load testing, automated workflows, and proactive monitoring to ensure system stability during market peaks.

Efficient Ops

1. Digital Transformation of Securities

In recent years, the securities industry has undergone a digital transformation driven by internet disruption and regulatory pressure, moving toward self‑service, intelligent, and agile solutions that improve convenience, efficiency, and personalized wealth management.

2. IT Landscape in Securities

Regulators and users both demand high stability: even a five‑minute outage can trigger severe penalties. Market volatility produces unpredictable load spikes, teams have limited insight into real system capacity, and complex architectures make fault isolation difficult. These pressures are prompting a shift to a dual‑state IT model.

3. Red‑Blue Exercise for Application Operations

In a red‑blue drill, the red side simulates peak market traffic to uncover the system's maximum capacity and its bottlenecks, while the blue side builds comprehensive monitoring and fault‑location tools to improve emergency response.
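As a rough illustration of what "uncovering maximum capacity" can mean in practice, the sketch below picks the highest-throughput load step that still meets a service-level target. The step data, field names, and thresholds (`max_error_rate`, `max_p99_ms`) are hypothetical and are not taken from the talk.

```python
def max_capacity(steps, max_error_rate=0.01, max_p99_ms=500):
    """Return the load step with the highest TPS that still meets the targets.

    The thresholds are illustrative assumptions, not values from the talk.
    """
    passing = [
        s for s in steps
        if s["error_rate"] <= max_error_rate and s["p99_ms"] <= max_p99_ms
    ]
    if not passing:
        return None
    return max(passing, key=lambda s: s["tps"])


# Hypothetical results from a stepped load test.
steps = [
    {"users": 100, "tps": 1000, "error_rate": 0.000, "p99_ms": 80},
    {"users": 200, "tps": 1900, "error_rate": 0.001, "p99_ms": 120},
    {"users": 400, "tps": 3200, "error_rate": 0.005, "p99_ms": 300},
    {"users": 800, "tps": 3400, "error_rate": 0.040, "p99_ms": 1500},
]

best = max_capacity(steps)
print(best["users"], best["tps"])
```

The last step delivers slightly more throughput but breaches both targets, so it marks the bottleneck region rather than usable capacity.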

Red Team Design

The design focuses on automated, platform‑based, and standardized load‑testing processes, using full‑link traffic replay derived from big‑data user distribution analysis. The solution consists of four parts:

Data construction using a big‑data platform to model user distribution.

Test cluster with two‑site three‑center deployment supporting dynamic horizontal scaling.

Application flow covering access layer, channel, application, middleware, and backend.

Monitoring system built on Prometheus and big‑data for full‑stack visibility.

Security measures include a hardware whitelist and configurable application‑layer interception. Automation chains together environment preparation, environment recovery, and hardware checks.

Data for load generation is extracted from production, desensitized, and used to build realistic traffic models.
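A minimal sketch of that pipeline, assuming a simplified log format: account IDs are replaced with a salted hash, personal fields are dropped, and the share of each action becomes the sampling weight for generated traffic. The record fields, the salt, and the function names are all hypothetical.

```python
import hashlib
import random
from collections import Counter


def desensitize(record, salt="drill-salt"):
    """Hash the account ID with a salt and keep only non-personal fields."""
    digest = hashlib.sha256((salt + record["account_id"]).encode()).hexdigest()
    return {"account_id": digest[:16], "action": record["action"]}


def build_traffic_model(records):
    """Return each action's share of the total request count."""
    counts = Counter(r["action"] for r in records)
    total = sum(counts.values())
    return {action: n / total for action, n in counts.items()}


def sample_actions(model, n, seed=0):
    """Draw n actions following the modeled distribution."""
    rng = random.Random(seed)
    actions = list(model)
    weights = [model[a] for a in actions]
    return rng.choices(actions, weights=weights, k=n)


# Hypothetical production log extract.
production_log = [
    {"account_id": "A1001", "name": "user-a", "action": "quote"},
    {"account_id": "A1002", "name": "user-b", "action": "quote"},
    {"account_id": "A1001", "name": "user-a", "action": "order"},
    {"account_id": "A1003", "name": "user-c", "action": "quote"},
]

clean = [desensitize(r) for r in production_log]
model = build_traffic_model(clean)
replay = sample_actions(model, 10)
```

Hashing rather than deleting the account ID keeps requests from the same user linked, which matters when the replay needs to preserve per-user behavior.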

Tool selection criteria: high concurrency, multi‑protocol support (HTTPS, HTTP, private TCP), extensibility, low learning curve, and open‑source cost‑effectiveness. Deployment follows a two‑site three‑center architecture with container‑based dynamic scaling.

During testing, mechanisms such as traffic tagging, middleware adaptation, and shadow databases isolate test data from production.
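The isolation idea can be sketched as a router that checks a tag on each request and sends tagged writes to a shadow store. This is only an illustration: the header name `x-drill-flag`, the `Store` class, and the in-memory lists standing in for databases are assumptions, not the mechanism Guoxin Securities actually uses.

```python
from dataclasses import dataclass, field

TEST_TAG_HEADER = "x-drill-flag"  # hypothetical tag name


@dataclass
class Request:
    headers: dict
    payload: dict


@dataclass
class Store:
    name: str
    rows: list = field(default_factory=list)


class ShadowRouter:
    """Send tagged test traffic to the shadow store, everything else to production."""

    def __init__(self, production, shadow):
        self.production = production
        self.shadow = shadow

    def is_test(self, request):
        return request.headers.get(TEST_TAG_HEADER) == "1"

    def write(self, request):
        target = self.shadow if self.is_test(request) else self.production
        target.rows.append(request.payload)
        return target.name


prod = Store("orders")
shadow = Store("orders_shadow")
router = ShadowRouter(prod, shadow)

router.write(Request({TEST_TAG_HEADER: "1"}, {"order_id": 1}))
router.write(Request({}, {"order_id": 2}))
```

For this to work end to end, the tag has to survive every hop, which is why the middleware layer also needs adapting to pass it along.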

Blue Team Design

The blue side focuses on fault prevention, detection, localization, and remediation through lifecycle capacity management, continuous baseline updates, and alerting.
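One simple way to picture a continuously updated baseline with alerting is an exponentially weighted moving average that raises an alert when a sample exceeds the baseline by a fixed ratio. The smoothing factor, the alert ratio, and the class name are hypothetical choices for illustration; the talk does not specify how baselines are computed.

```python
class CapacityBaseline:
    """Moving baseline that flags samples far above the recent norm."""

    def __init__(self, initial, alpha=0.2, alert_ratio=1.5):
        self.value = float(initial)
        self.alpha = alpha              # weight given to each new sample
        self.alert_ratio = alert_ratio  # breach when sample > baseline * ratio

    def observe(self, sample):
        """Return True on a breach; otherwise fold the sample into the baseline."""
        breached = sample > self.value * self.alert_ratio
        if not breached:
            self.value = (1 - self.alpha) * self.value + self.alpha * sample
        return breached


baseline = CapacityBaseline(initial=100)
normal = baseline.observe(110)  # within range, baseline drifts upward
spike = baseline.observe(200)   # well above the baseline, raises an alert
```

Breaching samples are deliberately left out of the update so that a single spike does not inflate the baseline and mask the next one.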

Monitoring spans four layers: data collection, processing, big‑data aggregation, and event handling, enabling rapid incident response and post‑mortem analysis.

Fault localization proceeds from system‑level metrics to component‑level details and finally to individual user transactions.
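The drill-down order described above can be sketched as three successive checks on the same transaction records: a system-level average, a per-component average, and finally the slowest individual transaction. The record fields, component names, and the use of mean latency as the metric are assumptions made for this example.

```python
from collections import defaultdict
from statistics import mean


def localize(transactions, system_threshold_ms):
    """Drill down from system level to component level to a single transaction."""
    system_avg = mean(t["latency_ms"] for t in transactions)
    if system_avg <= system_threshold_ms:
        return None  # system-level metric is healthy, nothing to localize

    by_component = defaultdict(list)
    for t in transactions:
        by_component[t["component"]].append(t)

    worst = max(
        by_component,
        key=lambda c: mean(t["latency_ms"] for t in by_component[c]),
    )
    slowest = max(by_component[worst], key=lambda t: t["latency_ms"])
    return {
        "system_avg_ms": system_avg,
        "component": worst,
        "transaction": slowest["txn_id"],
    }


# Hypothetical trace records.
transactions = [
    {"txn_id": "t1", "component": "gateway", "latency_ms": 20},
    {"txn_id": "t2", "component": "gateway", "latency_ms": 30},
    {"txn_id": "t3", "component": "order-service", "latency_ms": 400},
    {"txn_id": "t4", "component": "order-service", "latency_ms": 900},
    {"txn_id": "t5", "component": "quote-service", "latency_ms": 50},
]

finding = localize(transactions, system_threshold_ms=200)
```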

When unexpected traffic surges occur, strategies include auto‑scaling, cost reduction, traffic limiting via custom frameworks, and rapid service degradation.
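Traffic limiting and degradation are often combined: requests within the limit get the full handler, and the rest get a cheaper fallback. The sketch below uses a token bucket, a common limiting technique swapped in here for illustration; the talk mentions custom frameworks without describing them, and the class and handler names are hypothetical. Time is passed in explicitly so the behavior is easy to follow.

```python
class TokenBucket:
    """Allow up to `capacity` burst requests, refilled at `rate_per_sec`."""

    def __init__(self, rate_per_sec, capacity, now=0.0):
        self.rate = rate_per_sec
        self.capacity = capacity
        self.tokens = float(capacity)
        self.last = now

    def allow(self, now):
        # Refill for the time elapsed since the last call, capped at capacity.
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


def handle(bucket, now, full_handler, degraded_handler):
    """Serve the full response when allowed, otherwise the degraded one."""
    return full_handler() if bucket.allow(now) else degraded_handler()


bucket = TokenBucket(rate_per_sec=1, capacity=2)
results = [
    handle(bucket, t, lambda: "live quote", lambda: "cached quote")
    for t in (0.0, 0.0, 0.0, 1.0)
]
```

The third request arrives before any tokens have been refilled, so it is served from the degraded path; one second later a token is available again.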

4. Outcomes at Guoxin Securities

Full‑link load testing is used in semi‑annual production drills and monthly disaster‑recovery exercises, covering over 10 application systems, hundreds of components, and thousands of hosts. System tuning increased primary data‑center throughput threefold, and new data centers achieved 4‑5× performance gains, successfully handling the 2020 market peak.

The capacity‑awareness journey progressed through four stages: ignorance, desperation, enlightenment, and a stable plateau. It culminated in proactive capacity baselines, automated testing pipelines, and refined fault‑location expertise.

Tags: monitoring, operations, DevOps, capacity testing, red-blue exercise
Written by

Efficient Ops

This public account is maintained by Xiaotianguo and friends, regularly publishing widely-read original technical articles. We focus on operations transformation and accompany you throughout your operations career, growing together happily.
