How to Migrate to AntDB with a Six‑Step Progressive Strategy and Optimize Key Parameters
This article outlines a six‑stage, progressive migration plan for replacing legacy databases with AntDB, covering deployment, extensive testing, SQL refactoring, data migration, go‑live procedures, and monitoring, while providing detailed tuning recommendations for dozens of critical database parameters.
Progressive Migration Plan
In response to the national push for self‑reliant technology, the authors' province accelerated the replacement of commercial databases with domestic alternatives, using AntDB 6.3.9 on BC Linux For Euler 21.10. A one‑master, two‑standby architecture was deployed, with one replica in a separate data center to ensure high availability.
Phase 1 – Deployment
AntDB 6.3.9 was installed on the specified OS, using a one‑master‑two‑standby topology. One standby node resides in a different rack to mitigate environment‑related failures.
Phase 2 – Testing
A comprehensive test suite of 67 items in 12 categories covered areas including functional correctness, compatibility, performance, stress, high availability, scalability, disaster recovery, operations, ecosystem tools, and security.
Machine configuration: not reproduced in this version of the article.
Stress Test Results
On the tested hardware, throughput peaked at 400 000 TPC‑C orders when the number of client processes reached 512. Beyond that point CPU and memory usage kept rising while I/O was already saturated, so performance degraded. A sharp increase in CPU context switches confirmed that the test limit had been reached.
Parameter Recommendations
shared_buffers – Set to 40 % of physical memory (e.g., shared_buffers = '200GB') to allocate sufficient shared memory for AntDB.
effective_cache_size – Approx. 70 % of RAM (e.g., effective_cache_size = '400GB'); influences the optimizer’s estimate of available cache.
max_wal_size – Increase to accommodate larger WAL bursts (e.g., max_wal_size = '64GB').
checkpoint_completion_target – Set between 0.5‑0.9 depending on I/O bandwidth (recommended 0.9) to smooth checkpoint writes.
checkpoint_timeout – Extend to reduce checkpoint frequency (recommended '15min').
wal_keep_size – Retain enough WAL segments for standby replication (recommended '512GB').
max_parallel_maintenance_workers – Caps the parallel workers a single maintenance command such as CREATE INDEX may start (recommended '8').
max_parallel_workers – Up to 128, but not exceeding half the CPU cores (recommended '128').
max_parallel_workers_per_gather – Default 2 is sufficient for most workloads (recommended '2').
maintenance_work_mem – Set to 4 GB to speed up VACUUM/CREATE INDEX operations (recommended '4GB').
bgwriter_delay – Reduce to 10 ms on fast I/O systems (recommended '10ms').
autovacuum_max_workers – Use 5 workers for balanced cleaning (recommended '5').
autovacuum_analyze_scale_factor – 0.01 for timely statistics (recommended '0.01').
autovacuum_analyze_threshold – 1000 rows (recommended '1000').
autovacuum_naptime – 1 min for small‑table environments (recommended '1min').
autovacuum_vacuum_cost_delay – 10 ms to balance I/O load (recommended '10ms').
autovacuum_vacuum_cost_limit – 4000 for SSDs, higher than HDDs (recommended '4000').
autovacuum_vacuum_scale_factor – 0.01, so that vacuum is triggered once roughly 1 % of a table's rows are dead, before bloat builds up on large tables (recommended '0.01').
autovacuum_vacuum_threshold – 1000 rows (recommended '1000').
work_mem – 16 MB per sort/hash operation (recommended '16MB').
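The ratio-based recommendations above can be turned into concrete values with a few lines of arithmetic. The following is a minimal sketch, not the authors' tooling: the function names and the 500 GB / 256-core host are hypothetical, so the figures will not match the article's example values exactly. The autovacuum helper assumes AntDB follows the PostgreSQL rule that a table is vacuumed once its dead tuples exceed threshold + scale_factor × row count.

```python
# Sketch: derive a few of the recommended settings from host specs.
# Function names and the example host are illustrative assumptions.

def recommend_settings(total_ram_gb: int, cpu_cores: int) -> dict:
    """Apply the ratios from the list above using integer arithmetic."""
    return {
        "shared_buffers": f"{total_ram_gb * 40 // 100}GB",         # 40 % of RAM
        "effective_cache_size": f"{total_ram_gb * 70 // 100}GB",   # ~70 % of RAM
        "max_parallel_workers": str(min(128, cpu_cores // 2)),     # <= half the cores
    }


def render_conf(settings: dict) -> str:
    """Format settings as postgresql.conf-style lines."""
    return "\n".join(f"{name} = '{value}'" for name, value in settings.items())


def autovacuum_trigger_rows(reltuples: int,
                            threshold: int = 1000,
                            scale_factor: float = 0.01) -> float:
    """Dead-tuple count at which autovacuum vacuums a table (PostgreSQL rule)."""
    return threshold + scale_factor * reltuples


if __name__ == "__main__":
    print(render_conf(recommend_settings(total_ram_gb=500, cpu_cores=256)))
    # With the recommended values, a 10-million-row table is vacuumed
    # after roughly 101 000 dead rows.
    print(autovacuum_trigger_rows(10_000_000))
```

Integer arithmetic is used for the memory ratios so the rendered values are whole gigabytes; the results are starting points to validate under load, not fixed answers.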
High‑Availability Testing
Various HA scenarios were exercised, confirming failover and recovery capabilities.
Business Testing
Tests were run on BigCloud Enterprise Linux For Euler 21.10 LTS with ESB workloads. Success rate reached 99.99 % and latency stayed within 15 ms; a few timeouts were due to upstream service limits, not migration issues.
Phase 3 – Refactoring
SQL code and stored procedures were adapted to AntDB syntax while preserving existing behavior. Compatibility tests produced an impact assessment report, guiding the refactoring effort according to internal development standards.
Phase 4 – Migration
A full‑load (MTK) plus incremental (DSG) approach ensured strict data consistency. After the full load, DSG captured changes, parsed and stored them locally, then replayed logs to the target cluster, using primary keys to guarantee uniqueness.
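Why primary keys matter for the replay step can be shown with a small model. This is a sketch under stated assumptions, not DSG's actual implementation: the change-record format and the `apply_changes` helper are hypothetical, and the target cluster is modeled as an in-memory dictionary keyed by primary key.

```python
from typing import Dict, Iterable, Optional, Tuple

Row = Dict[str, object]
# Hypothetical change record: (operation, primary key, new row or None for deletes)
Change = Tuple[str, int, Optional[Row]]


def apply_changes(target: Dict[int, Row], changes: Iterable[Change]) -> Dict[int, Row]:
    """Replay captured changes onto the target, keyed by primary key."""
    for op, pk, row in changes:
        if op in ("insert", "update"):
            target[pk] = dict(row)      # upsert: the key guarantees one row per pk
        elif op == "delete":
            target.pop(pk, None)        # deleting a missing row is a no-op
        else:
            raise ValueError(f"unknown operation: {op}")
    return target


if __name__ == "__main__":
    # State after the full load
    target = {1: {"name": "alice"}, 2: {"name": "bob"}}
    # Changes captured while the full load was running
    log = [
        ("update", 1, {"name": "alice-2"}),
        ("insert", 3, {"name": "carol"}),
        ("delete", 2, None),
    ]
    apply_changes(target, log)
    apply_changes(target, log)          # replaying again leaves the state unchanged
    print(target)
```

Because every operation is addressed by primary key, replaying the same log twice yields the same final state, which is what makes an overlap between the full load and the incremental capture safe in this model.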
Phase 5 – Go‑Live
A dual‑write mechanism synchronized the old and new databases. Traffic was shifted gradually using F5 load balancers, with staged roll‑outs, node upgrades, and rolling replacements. After a stabilization period, the dual‑write was disabled and the legacy system decommissioned.
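The article does not say at which layer the dual write was implemented. As a rough illustration only, the sketch below models it in application code with two in-memory stores; the `DualWriter` class, its attribute names, and the `diff` consistency check are all hypothetical.

```python
_MISSING = object()  # sentinel so a stored None is not confused with "absent"


class DualWriter:
    """Write to a primary store and mirror every write to a second store."""

    def __init__(self, primary: dict, mirror: dict):
        self.primary = primary
        self.mirror = mirror
        self.mirroring = True       # switched off after the stabilization period
        self.mirror_errors = []     # mirror failures never block the primary write

    def write(self, key: str, value) -> None:
        self.primary[key] = value
        if not self.mirroring:
            return
        try:
            self.mirror[key] = value
        except Exception as exc:
            self.mirror_errors.append((key, repr(exc)))

    def diff(self) -> list:
        """Return the keys whose values differ between the two stores."""
        keys = set(self.primary) | set(self.mirror)
        return sorted(
            k for k in keys
            if self.primary.get(k, _MISSING) != self.mirror.get(k, _MISSING)
        )


if __name__ == "__main__":
    legacy, antdb = {}, {}
    writer = DualWriter(primary=legacy, mirror=antdb)
    writer.write("order:1", {"status": "new"})
    writer.write("order:2", {"status": "paid"})
    print(writer.diff())            # empty list while both stores agree
```

A periodic comparison like `diff` is one way to decide when the stabilization period can end; the real check would compare database rows rather than dictionary entries.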
Phase 6 – Monitoring
A unified database operations platform was built to bring AntDB, Oracle, GoldenDB, MySQL, GaussDB, OceanBase, and other databases under a single management view. The platform provides performance dashboards, load analysis, and alerts. Key AntDB monitoring points are illustrated below.
Typical Issues and Solutions
1. Merge syntax not supported
AntDB does not accept the MERGE keyword. Rewrite statements using INSERT … ON CONFLICT DO ….
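The rewrite can be tried out without a database server. The sketch below uses Python's built-in `sqlite3` module purely as a stand-in, because SQLite (3.24 or later) accepts the same PostgreSQL-style `INSERT … ON CONFLICT … DO UPDATE` form; the `account` table and its columns are hypothetical, and on AntDB the driver's placeholder style would differ from SQLite's `?`.

```python
import sqlite3

# Oracle-style statement being replaced (shown for comparison only):
#   MERGE INTO account t
#   USING (SELECT 1 AS id, 150 AS balance FROM dual) s
#   ON (t.id = s.id)
#   WHEN MATCHED THEN UPDATE SET t.balance = s.balance
#   WHEN NOT MATCHED THEN INSERT (id, balance) VALUES (s.id, s.balance);

UPSERT = """
INSERT INTO account (id, balance) VALUES (?, ?)
ON CONFLICT (id) DO UPDATE SET balance = excluded.balance
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER NOT NULL)")
conn.execute("INSERT INTO account (id, balance) VALUES (1, 100)")

conn.execute(UPSERT, (1, 150))   # id 1 exists: the row is updated
conn.execute(UPSERT, (2, 75))    # id 2 is new: the row is inserted

rows = conn.execute("SELECT id, balance FROM account ORDER BY id").fetchall()
print(rows)  # [(1, 150), (2, 75)]
```

Note that `ON CONFLICT` needs a unique or primary-key constraint on the conflict column, whereas `MERGE` can match on an arbitrary join condition, so statements that merge on non-unique columns need a closer look during refactoring.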
2. Long‑lived connections consume memory
Each session caches metadata for every object it has accessed, so long‑lived connections accumulate memory over time. Recommended actions: release idle connections automatically, provide a SQL command that lets users clear the cache, and implement LRU eviction for the relcache.
3. High buff/cache usage
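The authors attribute this to an oversized effective_cache_size and recommend adjusting it to 50‑75 % of total RAM. Note that in PostgreSQL this parameter is only a planner estimate and does not itself allocate memory.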
4. MTK timeout (org.h2.jdbc.JdbcSQLTimeoutException)
When the Oracle source contains too many tables, MTK scans stall and eventually deadlock. Replace the MTK package with a newer antcdc-db-sync.tar build and restart the services.
5. Multi‑instance MTK installation issue
Running several MTK instances under different OS users causes permission errors on /tmp/vertx‑cache. Disable the kafkasql process, which is unnecessary for full‑load migrations.
6. Invalid UTF‑8 byte sequence (0x00)
AntDB‑T rejects null bytes. Either change column type to bytea or enable replace.zero.char=true in config/application.properties to strip nulls.
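For pipelines that pre-process rows before loading, the same cleanup can be done in code. This is a minimal sketch of the idea, not of what `replace.zero.char` does internally; the helper names are hypothetical. Only text values are touched, so binary columns destined for bytea keep their null bytes.

```python
def strip_nul(value):
    """Remove NUL (0x00) characters from text values; leave other types alone."""
    if isinstance(value, str):
        return value.replace("\x00", "")
    return value


def clean_row(row: tuple) -> tuple:
    """Apply strip_nul to every column of a row."""
    return tuple(strip_nul(v) for v in row)


if __name__ == "__main__":
    # The str column loses its NUL; the bytes column and None pass through.
    print(clean_row((1, "ab\x00c", b"\x00\x01", None)))
```

Stripping changes the stored data, so it is worth checking with the application owners whether the null bytes carry meaning before choosing this route over bytea.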
7. Oracle 19c “ORA‑00942” during MTK migration
The migration account lacks the privileges needed to query Oracle dictionary views. Temporary workaround: grant DBA rights. Long‑term solution: coordinate with the vendor to obtain a minimal privilege list.
8. pip3 list fails with “No module named pip._internal”
The error occurs after installing Etcd/Patroni for AntDB HA. Ensure the Python environment is correctly configured and the required modules are installed.
9. Security baseline – umask configuration
Verify that /etc/profile ends with umask 027 to meet compliance; adjust as needed.
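A check like this is easy to automate across hosts. The sketch below is one possible approach with hypothetical helper names: it takes the file content as a string, treats the last `umask` line as the effective one, and compares it with the required value. It deliberately ignores shell conditionals, so a profile that sets umask inside an `if` block still needs a manual look.

```python
import re
from typing import Optional

# Matches lines such as "umask 027" or "    umask 0027  # baseline"
UMASK_RE = re.compile(r"^\s*umask\s+([0-7]{3,4})\s*(?:#.*)?$")


def effective_umask(profile_text: str) -> Optional[str]:
    """Return the value of the last umask line, or None if there is none."""
    last = None
    for line in profile_text.splitlines():
        match = UMASK_RE.match(line)
        if match:
            last = match.group(1)
    return last


def is_compliant(profile_text: str, required: str = "027") -> bool:
    """True if the last umask setting equals the required octal value."""
    value = effective_umask(profile_text)
    return value is not None and int(value, 8) == int(required, 8)


if __name__ == "__main__":
    sample = "# /etc/profile\numask 022\nexport PATH\numask 027\n"
    print(effective_umask(sample), is_compliant(sample))
```

On a real host the text would come from reading /etc/profile; passing it in as a string keeps the function easy to test.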
Conclusion
Domestic databases have been successfully deployed across multiple business systems without any rollback. The migration process proved reliable under production load, confirming that the new databases meet core operational requirements and reinforcing confidence for future migrations. The experience also yielded a systematic, repeatable migration methodology for subsequent projects.
dbaplus Community