
How a Chinese Telecom Payment Platform Mastered Cloud Migration in 8 Hours

This article details the end‑to‑end cloud migration of China Telecom's payment platform, covering pre‑migration challenges, architectural redesign, data‑sync strategies, the eight‑hour cut‑over process, post‑migration performance gains, and future DBaaS plans, all based on a 2017 DBAplus conference talk.

dbaplus Community

Background

In 2016-2017, China Telecom Group’s payment platform was migrated from a fragmented on-premises environment to a cloud-native architecture. The effort covered everything the platform had accumulated over its six-year history, including applications, middleware, and data services.

Pre‑migration Issues

Unclear service boundaries: Network, system, server, and hardware resources were not consistently defined, causing operational friction.

High database coupling: Each business application owned its own database instance, leading to resource waste and complex maintenance.

Excessive shared-disk architecture: Expensive storage and complex cabling limited scalability and introduced single points of failure.

Non-standard application configuration: Inconsistent connection handling (plain JDBC connections vs. c3p0 pooling) and legacy service routing (e.g., through F5 load balancers) prevented unified management.

Preparation (≈8 months)

Define and enforce basic service boundaries for network, compute, storage, and connection‑pool resources.

Decouple database services: replace direct IP connections with internal DNS, vertically split business systems, and restrict each business to its own database and user.

Partial storage redesign: retain high‑end storage for latency‑sensitive core workloads; migrate non‑core services to open‑source solutions such as MongoDB and Redis.

Standardize middleware and service‑invocation patterns (e.g., Dubbo) across all applications.
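The move from direct IP connections to internal DNS can be illustrated with a small audit helper. This is a minimal sketch, not the team's actual tooling: the IP addresses, DNS names, and the `IP_TO_DNS` mapping are hypothetical, and it only handles `jdbc:mysql://host:port/db` style URLs.

```python
import ipaddress
from urllib.parse import urlsplit

# Hypothetical mapping from legacy database IPs to internal DNS names.
IP_TO_DNS = {
    "10.0.12.31": "orders-db.pay.internal",
    "10.0.12.47": "account-db.pay.internal",
}


def host_of(jdbc_url):
    """Return the host part of a 'jdbc:mysql://host:port/db' style URL."""
    # Drop the 'jdbc:' prefix so urlsplit sees an ordinary scheme://host URL.
    plain = jdbc_url[len("jdbc:"):] if jdbc_url.startswith("jdbc:") else jdbc_url
    return urlsplit(plain).hostname or ""


def uses_raw_ip(jdbc_url):
    """True when the URL points at a literal IP address instead of a name."""
    try:
        ipaddress.ip_address(host_of(jdbc_url))
        return True
    except ValueError:
        return False


def to_dns_url(jdbc_url):
    """Rewrite an IP-based URL to its DNS equivalent; leave DNS URLs untouched."""
    if not uses_raw_ip(jdbc_url):
        return jdbc_url
    host = host_of(jdbc_url)
    if host not in IP_TO_DNS:
        raise KeyError("no DNS name registered for " + host)
    return jdbc_url.replace(host, IP_TO_DNS[host], 1)
```

Once every application resolves its database by name, a database can be moved to a new host by updating a DNS record rather than redeploying application configuration.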

Key Technical Solutions

Network Redesign

Adopted Cisco ACI to build a network with three separate planes (data, management, and business), using endpoint groups (EPGs) for multi-tenant isolation. Separating the planes enables automatic policy enforcement and resource isolation.

Cross‑Region Data Synchronization

Two pipelines were implemented:

Core systems: Oracle Active Data Guard (ADG) with a four-layer replication topology (primary → standby → compression → WAN link) guaranteeing zero data loss during cut-over.

Non-core workloads: A custom pipeline that captures MySQL binlog changes with Canal (extended for internal needs) and replicates them to MongoDB for unstructured data, while MySQL-to-MySQL sync handles relational data.
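The binlog-to-MongoDB path for non-core workloads can be sketched as a pure transformation from a row-change event to document operations. The talk does not describe the internal Canal extensions, so everything here is an assumption: the event shape is a simplified dictionary loosely modeled on Canal's JSON output, and the operation format is hypothetical rather than a real driver API.

```python
def to_mongo_ops(event):
    """Translate one row-change event into a list of document operations.

    Assumed event shape (hypothetical, simplified):
      {"database": str, "table": str, "type": "INSERT" | "UPDATE" | "DELETE",
       "pkNames": [str, ...], "data": [ {column: value, ...}, ... ]}
    """
    collection = event["database"] + "." + event["table"]
    ops = []
    for row in event["data"]:
        # The primary key identifies the target document.
        key = {name: row[name] for name in event["pkNames"]}
        if event["type"] in ("INSERT", "UPDATE"):
            # Upsert keeps replays idempotent: applying the same event twice
            # leaves the same document behind.
            ops.append({"op": "upsert", "collection": collection,
                        "filter": key, "doc": dict(row)})
        elif event["type"] == "DELETE":
            ops.append({"op": "delete", "collection": collection, "filter": key})
        else:
            raise ValueError("unsupported event type: " + str(event["type"]))
    return ops
```

Treating inserts and updates as upserts is a common choice for this kind of pipeline, because a consumer that restarts and re-reads part of the binlog does not create duplicates.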

Open‑Source Stack

MySQL ecosystem: Canal for binlog capture, Sharding‑JDBC (replacing MyCAT) for transparent sharding.

Redis: Sentinel for high availability and Redis Cluster for horizontal scaling; a 9-node, 27-instance cluster supports high-throughput risk-control workloads.

MongoDB: Deployed for billing and other unstructured data; custom management module provides one‑click scaling.
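To make the horizontal scaling in Redis Cluster concrete: each key maps to one of 16384 hash slots via CRC16 of the key modulo 16384, and only the text inside the first non-empty `{...}` hash tag is hashed when one is present. The sketch below reproduces that mapping with the standard library; the example key names are hypothetical, and how the platform distributes slots over its 27 instances is not stated in the talk.

```python
import binascii

SLOT_COUNT = 16384  # fixed number of hash slots in Redis Cluster


def key_slot(key):
    """Return the Redis Cluster hash slot for a key, honoring hash tags."""
    data = key.encode("utf-8")
    start = data.find(b"{")
    if start != -1:
        end = data.find(b"}", start + 1)
        # Only a non-empty tag counts; "{}" means the whole key is hashed.
        if end != -1 and end != start + 1:
            data = data[start + 1:end]
    # crc_hqx with an initial value of 0 is the CRC16 (XMODEM) variant.
    return binascii.crc_hqx(data, 0) % SLOT_COUNT
```

Keys that share a hash tag, such as `{user1000}.profile` and `{user1000}.limits`, land in the same slot, which is what allows multi-key operations on related risk-control data to stay on a single instance.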

Monitoring & Automation

Extended Zabbix with automatic service discovery and template assignment based on VM tags and DNS records.

Built a DBaaS provisioning portal powered by Ansible; the portal translates asset requests into Ansible playbooks that provision VMs, install database software, and apply configuration templates.

Containerized and virtualized database services where feasible, laying the groundwork for future Kubernetes adoption.
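The tag- and DNS-driven template assignment described for Zabbix can be sketched as a lookup that runs before a host is registered. This is an illustration under stated assumptions: the template names, the role tags, and the DNS naming convention (`<role>-<purpose>-<nn>`) are all hypothetical, and the call to the monitoring system itself is left out.

```python
# Hypothetical template names; a real deployment would use its own.
BASE_TEMPLATES = ["tpl-os-linux"]
ROLE_TEMPLATES = {
    "mysql": ["tpl-mysql"],
    "redis": ["tpl-redis"],
    "mongodb": ["tpl-mongodb"],
}


def detect_roles(vm_tags, dns_name):
    """Collect service roles from VM tags, with the DNS name as a fallback."""
    roles = {tag for tag in vm_tags if tag in ROLE_TEMPLATES}
    # Assumed naming convention: 'redis-risk-01.pay.internal' -> 'redis'.
    prefix = dns_name.split(".", 1)[0].split("-", 1)[0]
    if prefix in ROLE_TEMPLATES:
        roles.add(prefix)
    return roles


def templates_for(vm_tags, dns_name):
    """Return the ordered list of templates to attach to a discovered host."""
    templates = list(BASE_TEMPLATES)
    for role in sorted(detect_roles(vm_tags, dns_name)):
        templates.extend(ROLE_TEMPLATES[role])
    return templates
```

For example, `templates_for(["prod", "mysql"], "orders-db.pay.internal")` yields the base template plus the MySQL one, so a newly provisioned VM is monitored without anyone selecting templates by hand.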

Migration Execution (Eight‑Hour Cut‑over)

After four rehearsal runs, the production cut‑over was performed at the end of October 2016. Key metrics:

Data volume: ~150 TB across 94 database instances.

Live data transfer: 2.5 hours of the eight-hour cut-over window.

Validation: A proprietary checksum algorithm ran 1,300+ compliance checks and detected no data loss.

Incident: A core user‑account database failed to start due to a bug in the automation platform; the issue was resolved manually within 30 minutes.
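The validation step relied on a proprietary checksum algorithm that the talk does not describe. As a generic stand-in, the sketch below compares source and target tables using an order-independent checksum (row count plus the sum of per-row SHA-256 digests). The function names and the in-memory row format are assumptions made for illustration.

```python
import hashlib

MODULUS = 1 << 256  # keep the running sum within 256 bits


def table_checksum(rows):
    """Return (row_count, hex_digest) that does not depend on row order."""
    total = 0
    count = 0
    for row in rows:
        # Join columns with a separator that is unlikely to appear in data.
        encoded = "\x1f".join("NULL" if v is None else str(v) for v in row)
        digest = hashlib.sha256(encoded.encode("utf-8")).digest()
        total = (total + int.from_bytes(digest, "big")) % MODULUS
        count += 1
    return count, format(total, "064x")


def mismatched_tables(source, target):
    """Return the sorted names of tables that differ or exist on one side only.

    Both arguments map table name -> iterable of row tuples.
    """
    bad = []
    for name in sorted(set(source) | set(target)):
        if name not in source or name not in target:
            bad.append(name)
        elif table_checksum(source[name]) != table_checksum(target[name]):
            bad.append(name)
    return bad
```

Summing digests instead of hashing rows in sequence means the two sides can be scanned in different orders, which matters when source and target use different storage engines or index layouts.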

Post‑migration Outcomes

Provisioning time reduced from days to minutes (VM pool in 15 min, scaling in 3‑5 min).

525‑day promotional events now handle traffic spikes without performance degradation.

Established a DBaaS layer on top of IaaS, with plans to further virtualize and containerize database services.

Achieved tenant, software, and resource isolation; implemented comprehensive backup and disaster‑recovery policies.

Future Directions

Expand the DBaaS offering, integrating cloud‑native data platforms (e.g., Kubernetes‑based operators).

Refine database high-availability mechanisms, moving from Oracle ADG toward multi-master or distributed MySQL solutions.

Replace legacy components such as MyCAT with maintainable alternatives (continue Sharding‑JDBC development).

Automate full lifecycle management of Redis and MongoDB clusters, including seamless scaling and rolling upgrades.
