Seamless High‑Concurrency Database Migration for Massive Domestic Transformation
This talk outlines the strategic shift toward domestic, high‑performance databases and compares shared‑everything, shared‑nothing, and shared‑storage architectures. It then presents a seven‑step migration framework that covers selection, testing, full and incremental sync, application refactoring, dual‑write or middleware switching, and post‑migration observability, with the goal of migrating massive, high‑concurrency workloads seamlessly and with low impact.
Background of Database Domestic Transformation
National strategies require autonomous, secure, and efficient core technologies. At the same time, rapid business growth pushes single‑node databases to their limits, which forces teams to split data vertically, horizontally, or by business function.
Mainstream Database Architectures
Industry‑standard architectures fall into three categories: Shared‑Everything, Shared‑Nothing, and Shared‑Storage.
Shared‑Everything : Classic single‑node design where CPU, memory, and I/O are shared; any hardware bottleneck becomes a database bottleneck.
Shared‑Nothing : Each node owns its own CPU, memory, and storage, and a proxy layer typically distributes data across the nodes; examples include TiDB and OceanBase.
Shared‑Storage : Uses a shared storage layer (e.g., AWS Aurora, Alibaba PolarDB) and usually runs only in the cloud.
Key Components of Shared‑Nothing Architecture
The architecture consists of three main parts:
GTM : Global transaction manager handling distributed transaction IDs and snapshots.
Proxy/Compute Nodes : Initially perform routing, later add capabilities such as distributed transaction optimization, push‑down computation, and SQL parsing.
Storage Nodes : Often built on open‑source databases like MySQL or PostgreSQL; stable but require careful sharding key selection during migration.
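The role of the sharding key can be illustrated with a minimal sketch of how a proxy node might route rows to storage nodes. It assumes simple hash‑based sharding; the function names `shard_for` and `route` and the field name `uid` are hypothetical and are not taken from the talk or from any specific product.

```python
import hashlib


def shard_for(key: str, shard_count: int) -> int:
    """Map a sharding-key value to a shard index using a stable hash."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


def route(rows, key_field, shard_count):
    """Group rows by target shard, as a proxy node would before forwarding."""
    plan = {i: [] for i in range(shard_count)}
    for row in rows:
        plan[shard_for(str(row[key_field]), shard_count)].append(row)
    return plan


if __name__ == "__main__":
    rows = [{"uid": i, "amount": i * 10} for i in range(10)]
    for shard, batch in route(rows, "uid", 4).items():
        print(shard, [r["uid"] for r in batch])
```

Because every row with the same key value lands on the same shard, a key that matches the dominant query pattern keeps most queries on a single node. A poorly chosen key tends to produce cross‑shard queries or uneven shards.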
Challenges in Domestic Database Migration
Deep coupling between business logic and the data layer makes migration painful. A practical seven‑step migration process is:
Selection → Testing → Synchronization → Refactoring → Gray Release → Launch → Guarantee
1. Selection
Focus on stability, efficiency, cost, and ecosystem. Technical factors are essential, but non‑technical considerations (vendor support, compliance) also influence the final choice.
2. Testing
Conduct functional, availability, maintainability, and performance tests. Combine offline tests with traffic replay to ensure the candidate meets real‑world demands.
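Traffic replay can be sketched as running the same recorded statements against the current database and the candidate, then comparing results. The sketch below assumes two hypothetical callables, `run_old` and `run_new`, that each take a SQL string and return a list of rows. It compares results without regard to row order, which is a simplification: statements with ORDER BY would need an order‑sensitive comparison.

```python
import time


def replay(queries, run_old, run_new):
    """Replay recorded queries against both systems and collect mismatches."""
    report = {"total": 0, "mismatches": [], "new_latency_s": []}
    for sql in queries:
        expected = run_old(sql)
        start = time.perf_counter()
        actual = run_new(sql)
        report["new_latency_s"].append(time.perf_counter() - start)
        report["total"] += 1
        # Order-insensitive comparison of the two result sets.
        if sorted(map(repr, expected)) != sorted(map(repr, actual)):
            report["mismatches"].append(sql)
    return report


if __name__ == "__main__":
    # In-memory stand-ins for the two databases (illustrative only).
    old = {"SELECT 1": [(1,)], "SELECT 2": [(2,), (3,)]}
    new = {"SELECT 1": [(1,)], "SELECT 2": [(3,), (2,)]}
    result = replay(list(old), old.get, new.get)
    print(result["total"], result["mismatches"])
```

The recorded latencies can feed the performance test, while the mismatch list feeds the functional test.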
3. Data Synchronization
Split into full‑data sync and incremental sync. Analyze data to identify historical, hot, and cold segments, then decide which parts can be migrated immediately and which can be deferred.
Incremental sync is implemented via a log‑listener that writes changes to a middleware such as Kafka; downstream services subscribe to these topics.
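The two phases can be sketched with in‑memory dictionaries standing in for the source and target tables. The event shape (`offset`, `op`, `pk`, `row`) is an assumption made for illustration; a real log listener and Kafka consumer would define their own format. The important property shown is that applying changes is idempotent, so a replayed event does no harm.

```python
def full_sync(source, target, batch_size=2):
    """Copy existing rows in primary-key order, one batch at a time."""
    keys = sorted(source)
    for i in range(0, len(keys), batch_size):
        for k in keys[i:i + batch_size]:
            target[k] = dict(source[k])


def apply_changes(target, events, last_offset=-1):
    """Apply change events in offset order, skipping already-applied ones."""
    for ev in sorted(events, key=lambda e: e["offset"]):
        if ev["offset"] <= last_offset:
            continue  # replayed event, already applied
        if ev["op"] == "delete":
            target.pop(ev["pk"], None)
        else:  # insert and update are both treated as upserts
            target[ev["pk"]] = dict(ev["row"])
        last_offset = ev["offset"]
    return last_offset


if __name__ == "__main__":
    source = {1: {"v": "a"}, 2: {"v": "b"}, 3: {"v": "c"}}
    target = {}
    full_sync(source, target)
    events = [
        {"offset": 0, "op": "update", "pk": 1, "row": {"v": "a2"}},
        {"offset": 1, "op": "delete", "pk": 2},
        {"offset": 2, "op": "insert", "pk": 4, "row": {"v": "d"}},
    ]
    print(apply_changes(target, events), target)
```

Persisting the returned offset lets the consumer resume after a restart without reapplying older changes.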
4. Application Refactoring
Key concerns include driver compatibility, SQL dialect differences, data object mapping, and API changes. On the database side, address sharding, hot‑cold separation, read/write splitting, and query optimization.
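As one illustration of the database‑side work, read/write splitting can be sketched as a small router that sends plain SELECT statements to replicas and everything else to the primary. The class name `ReadWriteRouter` and the node names are hypothetical, and the keyword check is deliberately naive; a production router would parse SQL properly and account for transactions and replication lag.

```python
import itertools


class ReadWriteRouter:
    """Pick a target node for a statement: replicas for reads, primary otherwise."""

    def __init__(self, primary, replicas):
        self.primary = primary
        self._replicas = itertools.cycle(replicas) if replicas else None

    def target_for(self, sql: str) -> str:
        stripped = sql.strip()
        first = stripped.split(None, 1)[0].upper() if stripped else ""
        is_plain_read = first == "SELECT" and "FOR UPDATE" not in stripped.upper()
        if is_plain_read and self._replicas is not None:
            return next(self._replicas)  # round-robin across replicas
        return self.primary


if __name__ == "__main__":
    router = ReadWriteRouter("primary", ["replica-1", "replica-2"])
    for stmt in ["SELECT * FROM orders", "UPDATE orders SET paid = 1", "SELECT 1"]:
        print(stmt, "->", router.target_for(stmt))
```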
5. Switching Schemes
Two practical approaches are used:
Middleware‑Based Switch : Add a middleware layer between the application and the database, gradually migrate traffic to the middleware, then sync data to the new database and perform read‑write splitting transparently.
Dual‑Write (Application‑Level) Switch : Deploy the new database, perform full and incremental sync, enable simultaneous writes to both databases, verify consistency, then cut over to the new database as the primary.
Both schemes allow rollback within seconds by switching back the middleware target or toggling the dual‑write flag.
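The dual‑write approach and its flag‑based rollback can be sketched as follows. Dictionaries stand in for the two databases, and the class name `DualWriter` and the flags `dual_write` and `read_from_new` are hypothetical names chosen for illustration. This is a sketch of the idea under those assumptions, not the speaker's implementation.

```python
class DualWriter:
    """Write to both databases behind flags so cutover and rollback are toggles."""

    def __init__(self, old_db, new_db):
        self.old_db, self.new_db = old_db, new_db
        self.dual_write = False     # when True, every write also goes to the other database
        self.read_from_new = False  # when True, the new database is the primary
        self.failed_keys = []       # secondary writes that need repair later

    def write(self, key, value):
        if self.read_from_new:
            primary, secondary = self.new_db, self.old_db
        else:
            primary, secondary = self.old_db, self.new_db
        primary[key] = value
        if self.dual_write:
            try:
                secondary[key] = value
            except Exception:
                self.failed_keys.append(key)

    def read(self, key):
        return (self.new_db if self.read_from_new else self.old_db).get(key)

    def diff(self):
        """Return keys whose values differ between the two databases."""
        keys = set(self.old_db) | set(self.new_db)
        return sorted(k for k in keys if self.old_db.get(k) != self.new_db.get(k))


if __name__ == "__main__":
    old, new = {"a": 1}, {"a": 1}   # state after full and incremental sync
    writer = DualWriter(old, new)
    writer.dual_write = True
    writer.write("b", 2)
    print(writer.diff())            # consistency check before cutover
    writer.read_from_new = True     # cut over
    writer.read_from_new = False    # roll back by flipping the flag
```

Keeping dual writes enabled after cutover is what makes the rollback cheap: the old database keeps receiving every write, so switching the flag back does not lose data.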
6. Post‑Launch Guarantee
Focus on observability (logging, tracing, metrics) and controllability (rapid fault detection, isolation, and recovery). Build monitoring for resources, business metrics, and call‑chain analysis.
Establish an incident‑response framework that classifies issues (minor, moderate, critical) and provides atomic runbooks (e.g., SQL kill, master‑slave switch). Integrate AI‑driven recommendations to suggest appropriate runbooks.
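A classification step that maps an incident to atomic runbooks might look like the sketch below. The thresholds, the metric names, and the runbook identifiers are all illustrative assumptions; real values would come from the team's own service‑level objectives, and the AI‑driven recommendation mentioned above is replaced here by a plain lookup table.

```python
# Hypothetical runbook catalogue keyed by severity level.
RUNBOOKS = {
    "minor": ["notify_on_call"],
    "moderate": ["kill_slow_sql"],
    "critical": ["kill_slow_sql", "switch_primary_replica"],
}


def classify(error_rate: float, p99_latency_ms: float) -> str:
    """Classify an incident using illustrative thresholds."""
    if error_rate >= 0.05 or p99_latency_ms >= 2000:
        return "critical"
    if error_rate >= 0.01 or p99_latency_ms >= 500:
        return "moderate"
    return "minor"


def recommend(error_rate: float, p99_latency_ms: float):
    """Return the severity and the runbooks suggested for it."""
    severity = classify(error_rate, p99_latency_ms)
    return severity, RUNBOOKS[severity]


if __name__ == "__main__":
    print(recommend(0.002, 120))
    print(recommend(0.02, 300))
    print(recommend(0.10, 3500))
```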
Key Takeaways
Fit the problem : Centralized databases remain the best fit when the data volume is no more than a few hundred gigabytes to a few terabytes.
No silver bullet : Use the right tool for each job (Redis for caching, ClickHouse for ad‑hoc analytics, and so on), and synchronize data to a big‑data platform when needed.
Break down operational silos : Align technology with business, and adopt a holistic view of operations to achieve efficient, reliable service.
Thank you for listening.
Efficient Ops
This public account is maintained by Xiaotianguo and friends, regularly publishing widely-read original technical articles. We focus on operations transformation and accompany you throughout your operations career, growing together happily.
