
How SF Express Transformed Its Database Operations: From Legacy to Open‑Source, Distributed, and Intelligent Ops

This talk details SF Express’s journey from heterogeneous legacy databases to standardized open‑source, distributed architectures and intelligent operations, covering standardization, migration to open‑source, scaling with Mycat, automated resource pooling, and the ThinkDB platform that drives proactive, automated DBA workflows.

Efficient Ops

Preface

Today’s theme is “Carrying Heavy Loads Forward – The Path of Change in SF Express’s Database Operations”. The company’s rapid growth and diversification (express, cold‑chain, warehousing, finance) have multiplied data instances, demanding a series of technical transformations.

1. From Non‑Standard to Standard

1.1 The chaotic early years

Running many different database types fragmented the DBA team's work, and user needs went unaddressed as a result.

1.2 Moving toward standards

We reduced DB varieties, chose a commercial DB as the standard, unified HA and disaster‑recovery, monitored infrastructure health, and built a closed‑loop proactive prevention process that freed DBAs for higher‑value tasks.

First, we cut DB types and selected the strongest commercial DB as the standard.

Second, we unified HA and disaster‑recovery to avoid custom solutions causing operational errors.

Third, we monitored basic resources (storage, network, hosts) and addressed issues promptly.

Fourth, we established a proactive prevention loop, improving stability and allowing DBAs to focus on meaningful work.

2. From Traditional to Open‑Source

2.1 De‑commercializing

The decision to adopt open source came from the top down. We piloted it on the core warehouse system, relying mainly on internal talent rather than external MySQL experts.

The database fundamentals that mattered most were transaction control and indexing.

We kept business logic out of the DB, used vertical data source splitting, and employed SAN for HA while developing a non‑SAN automatic failover solution.

2.2 Operations‑oriented development

We built a dedicated ops‑dev team that blended experienced engineers with fresh graduates and paired ops rigor with dev creativity. Within it, separate but collaborating teams took charge of configuration, monitoring, capacity, hardware‑software labs, interaction, disaster recovery, and restoration.

3. From Centralized to Distributed

3.1 Scaling limits of single instances

In our production environment, a single MySQL instance topped out at roughly 5,000 TPS. Once vertical splitting reached its limits, we partitioned data by region, which increased operational complexity.
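
Regional partitioning can be pictured as a lookup from a record's region to the instance that holds that region's data. The sketch below is only an illustration: the region codes, instance names, and function names are hypothetical, and the talk does not describe how regions were actually mapped to instances.

```python
# Hypothetical region-to-instance map; names are illustrative only.
REGION_TO_INSTANCE = {
    "south": "mysql-south-01",
    "east": "mysql-east-01",
    "north": "mysql-north-01",
}


def route(region: str) -> str:
    """Return the instance that stores data for the given region."""
    try:
        return REGION_TO_INSTANCE[region]
    except KeyError:
        raise ValueError(f"no instance configured for region {region!r}") from None


def instances_for_query(regions: list[str]) -> list[str]:
    """List every instance a query spanning these regions must visit."""
    return sorted({route(r) for r in regions})
```

A query that spans several regions has to visit several instances, which is one way to see where the added operational complexity comes from.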

3.2 Mycat as a proxy

After extensive testing, we adopted Mycat as the middleware, extending it with an SQL firewall, large‑data aggregation, and performance tweaks; the order‑processing system now handles >200 k TPS during peaks.

3.3 Large‑data aggregation

We moved large sorts from the heap to external storage, which lets the proxy aggregate billions of rows without running out of memory (OOM).
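
The general technique behind sorting outside the heap is an external merge sort: sort bounded chunks in memory, spill each to disk, then stream a merge of the spilled runs. The following is a minimal sketch of that idea under simplifying assumptions (integer keys, one temporary file per run). It is not the code added to Mycat, and the function names and the `chunk_size` parameter are hypothetical.

```python
import heapq
import os
import tempfile
from typing import Iterable, Iterator, List


def _spill(sorted_chunk: List[int]) -> str:
    """Write one sorted chunk to a temporary file and return its path."""
    fd, path = tempfile.mkstemp(suffix=".run")
    with os.fdopen(fd, "w") as f:
        for value in sorted_chunk:
            f.write(f"{value}\n")
    return path


def _read_run(path: str) -> Iterator[int]:
    """Stream integers back from a spilled run, one line at a time."""
    with open(path) as f:
        for line in f:
            yield int(line)


def external_sort(values: Iterable[int], chunk_size: int) -> Iterator[int]:
    """Yield values in sorted order, buffering about chunk_size at a time."""
    paths: List[str] = []
    chunk: List[int] = []
    for value in values:
        chunk.append(value)
        if len(chunk) >= chunk_size:
            paths.append(_spill(sorted(chunk)))
            chunk = []
    if chunk:
        paths.append(_spill(sorted(chunk)))
    try:
        # heapq.merge pulls lazily, holding one pending item per run.
        yield from heapq.merge(*(_read_run(p) for p in paths))
    finally:
        for p in paths:
            os.remove(p)
```

Memory use is bounded by the chunk size plus one item per run, rather than by the total number of rows being sorted.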

3.4 SQL firewall

The embedded firewall enforces coding standards, blocks queries that lack a usable index, and captures problematic SQL while it is still in the development environment.
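
As a rough illustration of rule-based SQL checking, the sketch below applies three example rules with regular expressions. The rules, the `INDEXED_COLUMNS` catalogue, and the table and column names are all assumptions made for this example; they are not the firewall's actual rule set, and a production check would rely on a real SQL parser and the optimizer's plan rather than pattern matching.

```python
import re
from typing import Dict, List, Set

# Hypothetical index catalogue: table name -> indexed columns.
INDEXED_COLUMNS: Dict[str, Set[str]] = {
    "orders": {"id", "waybill_no"},
}


def check_sql(sql: str) -> List[str]:
    """Return rule violations; an empty list means the statement passes."""
    violations: List[str] = []
    # Normalize whitespace and case so the patterns below stay simple.
    lower = " ".join(sql.strip().rstrip(";").split()).lower()

    if re.match(r"select\s+\*", lower):
        violations.append("SELECT * is not allowed")

    has_where = re.search(r"\bwhere\b", lower) is not None
    if re.match(r"(update|delete)\b", lower) and not has_where:
        violations.append("UPDATE/DELETE requires a WHERE clause")

    # Very rough index check, limited to single-table SELECT ... WHERE.
    m = re.match(r"select\b.*?\bfrom\s+(\w+)\s+where\s+(.*)", lower)
    if m:
        table, where = m.group(1), m.group(2)
        indexed = INDEXED_COLUMNS.get(table, set())
        # Collect identifiers that appear on the left of a comparison.
        columns = set(re.findall(r"\b(\w+)\s*(?:=|<|>|\bin\b|\blike\b)", where))
        if not columns & indexed:
            violations.append(f"no indexed column in WHERE for table {table}")
    return violations
```

For example, `check_sql("DELETE FROM orders")` reports the missing WHERE clause, while a lookup by `waybill_no` passes all three rules.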

4. Intelligent Operations

4.1 Collaborative platform

ThinkDB provides configuration discovery, real‑time monitoring, capacity forecasting, hardware‑software labs, self‑service portals, automated disaster‑recovery, and point‑in‑time restoration.

4.2 Automation of HA and resource pools

Using MGR (MySQL Group Replication), semi‑sync replication, and dual‑heartbeat checks, we achieve automatic failover; resource pools balance instances based on usage thresholds and peak forecasts.
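
One plausible reading of a dual-heartbeat check is that failover starts only when two independent probes both fail repeatedly, so that a single flaky channel does not trigger it. The sketch below assumes exactly that; the class name, the two-channel interface, and the `threshold` of consecutive failures are hypothetical and are not taken from the talk.

```python
from dataclasses import dataclass


@dataclass
class FailoverDetector:
    """Signal failover only after both heartbeat channels have failed
    for `threshold` consecutive rounds."""

    threshold: int = 3
    consecutive_failures: int = 0

    def observe(self, channel_a_ok: bool, channel_b_ok: bool) -> bool:
        """Record one heartbeat round; return True if failover should start."""
        if channel_a_ok or channel_b_ok:
            # At least one path still reaches the primary: treat it as alive.
            self.consecutive_failures = 0
            return False
        self.consecutive_failures += 1
        return self.consecutive_failures >= self.threshold
```

Requiring both channels to fail trades a slightly slower reaction for fewer false failovers, which matters because an unnecessary switch is itself a source of risk.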

Resource‑pool logic controls Pctfree/Pctused limits, automatically relocating instances when thresholds are exceeded and integrating capacity predictions for peak demand.
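
Threshold-driven relocation can be sketched as a small planning function. This version assumes a single `high_water` usage threshold per host and takes each instance's forecast peak usage as a fraction of host capacity; those inputs, the function name, and the choice to move the smallest instance first are assumptions for illustration rather than ThinkDB's actual policy.

```python
from typing import Dict, List, Tuple


def plan_moves(
    hosts: Dict[str, Dict[str, float]],
    high_water: float,
) -> List[Tuple[str, str, str]]:
    """Plan (instance, source_host, target_host) moves for overloaded hosts.

    `hosts` maps host name -> {instance name: forecast peak usage as a
    fraction of that host's capacity}.
    """
    placement = {h: dict(insts) for h, insts in hosts.items()}
    load = {h: sum(insts.values()) for h, insts in placement.items()}
    moves: List[Tuple[str, str, str]] = []

    # Handle the most heavily loaded hosts first.
    for src in sorted(load, key=load.get, reverse=True):
        while load[src] > high_water and placement[src]:
            # Move the smallest instance first to keep each move cheap.
            inst = min(placement[src], key=placement[src].get)
            size = placement[src][inst]
            # Only consider targets that stay at or below the threshold.
            candidates = [
                h for h in load if h != src and load[h] + size <= high_water
            ]
            if not candidates:
                break  # Nowhere safe to move; leave for manual handling.
            dst = min(candidates, key=load.get)
            placement[dst][inst] = placement[src].pop(inst)
            load[src] -= size
            load[dst] += size
            moves.append((inst, src, dst))
    return moves
```

Feeding forecast peak usage into the plan, rather than current usage, is what lets the pool rebalance ahead of a peak instead of during it.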

4.3 SQL quality control

Mycat’s firewall blocks bad SQL in dev, release pipelines include code review, and production scores drive continuous optimization across business systems.

4.4 Operational insights

By treating upstream issues as our own, we reduced annual incidents from 20 to zero over three years, showing that solid foundational ops enable higher‑level innovation.

Written by

Efficient Ops

This public account is maintained by Xiaotianguo and friends, regularly publishing widely-read original technical articles. We focus on operations transformation and accompany you throughout your operations career, growing together happily.
