
Building High‑Availability Systems in Securities: Practices and Tech Choices

This article examines the unique regulatory and operational characteristics of the securities industry and explains how careful technology selection—ranging from native database log replay to third‑party replication and big‑data platforms—enables robust high‑availability architectures, illustrated with real‑world practices from Dongfang Securities.

Efficient Ops

Reliable systems are the foundation of a stable, fast-growing business, and high availability is what keeps them running normally. The securities industry has its own characteristics and rules, so a workable design needs both sound technology selection and a close fit with business realities.

1. Industry Attributes of the Securities Sector

The sector is under strict supervision by the China Securities Regulatory Commission, which imposes rigorous, comprehensive regulatory requirements. The regulator issues numerous IT management measures and industry standards, mandating business continuity, disaster‑recovery capabilities, regular drills, backup metrics, and periodic audits.

Beyond regulation, rapid market standardization and innovation push securities firms to add many new systems (over 50 per year at Dongfang Securities), which raises the security and technical demands on operations. The key challenge is building high-availability capabilities that keep pace with this growth.

Specific operational characteristics of securities information systems are:

First, a long change window. Unlike services that must run 24/7, securities trading stops outside weekday trading hours, which leaves weekday off-hours plus a 48-hour weekend window and allows relatively long periods for system changes.

Second, frequent testing directly in production. Long change windows make frequent testing possible, and much of it runs in the production environment. Exchanges require members to test new business in production, which leads to many rollbacks that stress high-availability systems. Dongfang Securities rebuilds its high-availability infrastructure hundreds of times a year, far more often than banks do.

Third, staff reduction. As new technologies arrive, operations have shifted from dispersed teams to centralized, intelligent models, which raises skill requirements and drives further staff consolidation.

2. High‑Availability Technology Selection

Given these traits, technology choices must be made carefully. Most systems consist of an access layer, application layer, data layer, and hardware layer. All layers except the data layer are stateless, making high‑availability straightforward through redundancy. The data layer stores state and requires specialized high‑availability solutions.
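As a rough illustration of why stateless layers are easy to protect with redundancy, the sketch below retries a request against a list of interchangeable instances. The function and node names (`call_with_failover`, `healthy_node`, `failed_node`) are hypothetical and stand in for real application servers; this is a minimal sketch, not a description of any production load balancer.

```python
from typing import Callable, Sequence


class AllReplicasFailed(Exception):
    """Raised when no redundant instance could serve the request."""


def call_with_failover(replicas: Sequence[Callable[[str], str]], request: str) -> str:
    """Try each redundant, stateless replica in order until one succeeds."""
    errors = []
    for replica in replicas:
        try:
            # Any replica can serve any request because no state lives on the node.
            return replica(request)
        except ConnectionError as exc:
            errors.append(exc)
    raise AllReplicasFailed(f"{len(errors)} replicas failed")


def healthy_node(request: str) -> str:
    # Hypothetical application instance that is up.
    return f"handled:{request}"


def failed_node(request: str) -> str:
    # Hypothetical application instance that is down.
    raise ConnectionError("node down")


result = call_with_failover([failed_node, healthy_node], "order-query")
print(result)
```

The same trick does not work for the data layer: a standby database is only useful if it already holds the primary's state, which is what the replication techniques below address.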

Four main streams of data‑layer high‑availability technologies are identified:

1. Native database log replay – Databases write logs before committing transactions; replicating these logs to a standby system provides high availability. Advantages: built‑in, accurate, efficient. Limitation: only works for homogeneous databases.

2. Third‑party log parsing and replay – Tools like i2Active capture binary logs via APIs, parse them, re‑package, and replay to target systems (e.g., databases, files, Kafka). Advantages: supports heterogeneous databases and multiple target types. Limitation: accuracy depends on vendor parsing, and the extra processing reduces efficiency.

3. Third‑party file‑system synchronization – Early solutions used hardware‑level storage replication. Modern software like i2COOPY simulates hardware replication by capturing I/O at the OS level and transmitting it over the network, providing high availability for both applications and databases without being tied to specific hardware.

4. Integrated big‑data platforms – Solutions such as Hadoop, MPP data warehouses, and NewSQL distributed databases offer built‑in high‑availability as part of a larger ecosystem, though they can be costly and heavyweight.
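The log-replay idea behind the first two streams can be sketched in a few lines. The toy model below assumes a key-value "database" whose primary appends a log record before applying each change, and a standby that replays shipped records in order. All class and field names (`Primary`, `Standby`, `LogRecord`, `lsn`) are illustrative assumptions and do not correspond to any specific database product or to i2Active.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class LogRecord:
    lsn: int      # log sequence number, starting at 1 and contiguous
    key: str
    value: str


@dataclass
class Primary:
    data: dict = field(default_factory=dict)
    log: list = field(default_factory=list)

    def commit(self, key: str, value: str) -> LogRecord:
        record = LogRecord(lsn=len(self.log) + 1, key=key, value=value)
        self.log.append(record)   # write the log first
        self.data[key] = value    # then apply the change
        return record

    def records_after(self, lsn: int) -> list:
        # Records are stored in LSN order, so slicing skips the first `lsn` records.
        return self.log[lsn:]


@dataclass
class Standby:
    data: dict = field(default_factory=dict)
    applied_lsn: int = 0

    def replay(self, records: list) -> None:
        for record in records:
            if record.lsn <= self.applied_lsn:
                continue  # already applied, so replay is idempotent
            if record.lsn != self.applied_lsn + 1:
                raise ValueError("gap in shipped log")
            self.data[record.key] = record.value
            self.applied_lsn = record.lsn


primary = Primary()
standby = Standby()
primary.commit("acct:1001", "100")
primary.commit("acct:1001", "80")
standby.replay(primary.records_after(standby.applied_lsn))
print(standby.data == primary.data)
```

A third-party tool would sit between `records_after` and `replay`, parsing the primary's log format and translating each record for a different kind of target. That extra translation step is where the accuracy and efficiency concerns noted above come from.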

3. Scenario Applications

Technology choices must match concrete scenarios. In the securities industry, high‑availability scenarios fall into two categories: many simple systems that together create complexity, and tightly coupled multi‑system business flows.

From an operations perspective, we re‑classify the four technical streams into two groups: "white stone" solutions that require ongoing maintenance (e.g., database replication) and "black stone" solutions that are maintenance‑free after deployment (e.g., file‑system copy). A balanced architecture uses both, prioritizing business requirements.

Principles:

Combine white‑stone and black‑stone approaches to avoid single points of failure.

Let business needs dictate the proportion of each.

Example: an over‑the‑counter trading platform consists of a main database plus separate account and transaction subsystems. Each subsystem requires read/write separation, leading to a "five‑white‑one‑black" high‑availability design. After optimization, it was reduced to a "three‑white‑three‑black" architecture, significantly lowering operational effort.

Black-stone solutions can also reduce resource waste by reusing storage snapshots for testing. Instead of a multi-hour cycle of backup, restore, and desensitization (data masking), a snapshot of existing black-stone data can be mounted directly on a desensitization server. This cuts delivery time from 7-8 hours to about one hour and enables dynamic allocation of disaster-recovery resources.
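One way to picture the snapshot approach is a copy-on-write view over the replicated volume: the masking job writes only to the snapshot, while replication keeps updating the underlying copy. The sketch below is a simplified in-memory model under that assumption; `ReplicaVolume`, `Snapshot`, and `mask_name` are hypothetical names, and real storage snapshots work at the block-device level.

```python
class Snapshot:
    def __init__(self, volume: "ReplicaVolume") -> None:
        self._volume = volume
        self._preserved = {}  # original blocks overwritten on the volume after the snapshot
        self._overlay = {}    # writes made to the snapshot itself (e.g. masked data)

    def _preserve(self, block_id: int, old_content) -> None:
        # Keep only the first pre-snapshot version of each block.
        self._preserved.setdefault(block_id, old_content)

    def read(self, block_id: int):
        if block_id in self._overlay:
            return self._overlay[block_id]
        if block_id in self._preserved:
            return self._preserved[block_id]
        return self._volume.read(block_id)

    def write(self, block_id: int, content: str) -> None:
        self._overlay[block_id] = content


class ReplicaVolume:
    def __init__(self, blocks: dict) -> None:
        self._blocks = dict(blocks)
        self._snapshots = []

    def read(self, block_id: int):
        return self._blocks[block_id]

    def write(self, block_id: int, content: str) -> None:
        # Copy-on-write: save the old block for every snapshot before overwriting.
        for snap in self._snapshots:
            snap._preserve(block_id, self._blocks.get(block_id))
        self._blocks[block_id] = content

    def snapshot(self) -> Snapshot:
        snap = Snapshot(self)
        self._snapshots.append(snap)
        return snap


def mask_name(name: str) -> str:
    # Hypothetical masking rule: keep the first character only.
    return name[0] + "*" * (len(name) - 1)


volume = ReplicaVolume({1: "Zhang San", 2: "Li Si"})
test_view = volume.snapshot()
test_view.write(1, mask_name(test_view.read(1)))  # masking touches the snapshot only
volume.write(2, "Wang Wu")                        # replication keeps updating the volume
print(test_view.read(1), test_view.read(2), volume.read(1), volume.read(2))
```

No data is copied when the snapshot is taken, which is consistent with the large drop in delivery time described above.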

4. From Simplicity to Complexity and Back

High‑availability operations have evolved through four stages: manual command execution, standardization via scripts, automation through orchestration platforms, and finally intelligent operations driven by data analysis.

Automation relies on scheduled or event‑driven workflows, while intelligent operations aim to reduce human intervention by leveraging unified monitoring (Zabbix) and operational big‑data analytics platforms to enable automated decision‑making.
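The step from event-driven automation to automated decision-making can be sketched as a small rule that consumes monitoring events and decides when to act. The example assumes a hypothetical stream of health-check events and a simple "N consecutive failures" rule; the names `HealthEvent` and `FailoverDecider`, the threshold of 3, and the system name `otc-db` are illustrative, and the sketch does not call the Zabbix API.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class HealthEvent:
    system: str
    healthy: bool


class FailoverDecider:
    """Trigger a failover after `threshold` consecutive failed checks."""

    def __init__(self, threshold: int = 3) -> None:
        self.threshold = threshold
        self._failures = defaultdict(int)

    def observe(self, event: HealthEvent) -> str:
        """Return 'failover' when the rule fires, otherwise 'none'."""
        if event.healthy:
            self._failures[event.system] = 0  # a healthy check resets the streak
            return "none"
        self._failures[event.system] += 1
        if self._failures[event.system] >= self.threshold:
            self._failures[event.system] = 0  # reset so one outage triggers one action
            return "failover"
        return "none"


decider = FailoverDecider(threshold=3)
events = [
    HealthEvent("otc-db", False),
    HealthEvent("otc-db", False),
    HealthEvent("otc-db", True),   # transient blip recovers, no action
    HealthEvent("otc-db", False),
    HealthEvent("otc-db", False),
    HealthEvent("otc-db", False),  # third consecutive failure
]
actions = [decider.observe(e) for e in events]
print(actions)
```

In practice the rule would be informed by historical operational data rather than a fixed threshold, but the shape is the same: monitoring events go in, and a decision that previously needed a human comes out.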

In summary, even the most complex high‑availability systems are built from simple, well‑designed components. Effective planning involves decomposing, aggregating, and iteratively refining these components to achieve a robust yet manageable architecture.

Written by

Efficient Ops

This public account is maintained by Xiaotianguo and friends, regularly publishing widely-read original technical articles. We focus on operations transformation and accompany you throughout your operations career, growing together happily.
