
How We Overcame Real‑World Challenges in a Large‑Scale Oracle Database Cutover

This article recounts the migration of an Oracle 10g database that had been in production for over seven years. It covers the project background, team turmoil, topology redesign, security constraints, data-sync strategies, custom tools, high-fidelity testing, unexpected failures, and the lessons learned for reliable operations.

Efficient Ops

Overview

This talk walks through a real database cutover: the project background, the data-sync solutions, tool development, simulation testing, and the psychological side of a cutover, along with the behind-the-scenes stories of the engineering work.

Project Background

The enterprise support system had been running for over seven years. Its core Oracle 10g database ran on a midrange server (a "minicomputer") with a disk array, and neither had been touched in that time. Rapid business growth, rising load, and aging hardware caused frequent failures: CPU idle time dropped to 0% and disk I/O was heavy, putting huge pressure on the operations team.

Main Difficulties

Difficulty 1: Team Turmoil

The original operations team left en masse. The new team inherited the system with no knowledge of it and worked under heavy psychological and business pressure.

Difficulty 2: Topology Redesign

The original star topology (database and app server at the core, surrounded by about 100 collection servers) had to be redesigned into separated internal and external networks.

Difficulty 3: Security Constraints

Upgrading Oracle from 10g to 11g introduced strict password policies (60‑day rotation, lockout after repeated failures), which conflicted with the need for uninterrupted service during migration.

Pre‑Migration Preparations

1. Strengthen Monitoring – Define key business and infrastructure metrics, and simulate high-frequency jobs to expose hidden issues.

2. Train New Staff – Pause migration to focus on onboarding and rebuilding the team.

3. Build a Global View

Redraw system architecture based on independent research.

Identify all stakeholders through extensive interviews.

Data Synchronization Strategy

Synchronization was implemented with a combination of Oracle GoldenGate (OGG), database links (DBLINK), and custom migration scripts.

Oracle GoldenGate

The initial plan was to rely entirely on OGG, but limitations appeared during trials:

Massive historical data made real‑time consistency difficult.

Unpartitioned large tables and many unnecessary tables hindered extraction.

EXP/IMP

Example exp/imp commands were shown as images in the original.

DATABASE LINK

Provides a simple channel between old and new databases.

Custom Migration Program

For massive tables (e.g., 40 million rows per partition), the data is sliced into 100,000-row batches that are pushed in parallel. Keeping each commit small means a failed batch can be retried quickly and the undo tablespace does not balloon.
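The batching idea can be sketched in a few lines of Python. This is a minimal illustration, not the team's actual program: it assumes the table has a numeric key that can be split into ranges, and `copy_batch` is a hypothetical callable standing in for whatever copies one range (for example an `INSERT ... SELECT` over a database link followed by a commit).

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterator, List, Tuple

BATCH_ROWS = 100_000  # batch size mentioned in the article


def batch_ranges(min_id: int, max_id: int, size: int = BATCH_ROWS) -> Iterator[Tuple[int, int]]:
    """Yield inclusive (lo, hi) key ranges that together cover [min_id, max_id]."""
    lo = min_id
    while lo <= max_id:
        hi = min(lo + size - 1, max_id)
        yield lo, hi
        lo = hi + 1


def copy_with_retry(copy_batch: Callable[[int, int], None],
                    lo: int, hi: int, attempts: int = 3) -> bool:
    """Run one batch; on failure, retry only that small range."""
    for _ in range(attempts):
        try:
            copy_batch(lo, hi)  # hypothetical: copy rows lo..hi, then commit
            return True
        except Exception:
            continue
    return False


def migrate(copy_batch: Callable[[int, int], None],
            min_id: int, max_id: int, workers: int = 4) -> List[Tuple[int, int]]:
    """Push all batches in parallel; return the ranges that still failed."""
    ranges = list(batch_ranges(min_id, max_id))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda r: copy_with_retry(copy_batch, *r), ranges))
    return [r for r, ok in zip(ranges, results) if not ok]
```

With these numbers, a 40-million-row partition becomes 400 independent batches, so one failure costs a retry of 100,000 rows rather than a restart of the whole partition.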

Tool Development & Testing

1. Forwarding Component

A daemon on the jump server listens on a port and forwards traffic to the next hop. The original illustrated this with pseudocode in an image.
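A forwarding daemon of this kind might look roughly like the following Python sketch. It is an assumption-laden illustration rather than the component the team built: the function names, the thread-per-connection design, and the buffer size are all choices made here for brevity.

```python
import socket
import threading


def pipe(src: socket.socket, dst: socket.socket) -> None:
    """Copy bytes from src to dst until src closes, then close dst's write side."""
    try:
        while True:
            data = src.recv(65536)
            if not data:
                break
            dst.sendall(data)
    except OSError:
        pass
    finally:
        try:
            dst.shutdown(socket.SHUT_WR)
        except OSError:
            pass


def handle(client: socket.socket, next_hop: tuple) -> None:
    """Connect to the next hop and relay traffic in both directions."""
    try:
        upstream = socket.create_connection(next_hop)
    except OSError:
        client.close()
        return
    back = threading.Thread(target=pipe, args=(upstream, client), daemon=True)
    back.start()
    pipe(client, upstream)
    back.join()
    client.close()
    upstream.close()


def serve(listener: socket.socket, next_hop: tuple) -> None:
    """Accept connections until the listener is closed; one thread per connection."""
    while True:
        try:
            client, _ = listener.accept()
        except OSError:
            return
        threading.Thread(target=handle, args=(client, next_hop), daemon=True).start()
```

A caller would create the listening socket with `socket.create_server((host, port))` and pass it to `serve` together with the `(host, port)` of the next hop; both addresses are deployment-specific and not given in the article.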

2. Cutover Tools

The cutover tools automated configuration collection, path monitoring, pre- and post-cutover checks, connection switching and rollback, and load testing.
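As one example of what a pre-/post-cutover check can do, the sketch below compares a configuration snapshot taken before the switch with one taken after it, and reports anything that differs from the planned changes. The function name and the keys used in the usage example are hypothetical; the article does not describe the real tool's data model.

```python
from typing import Dict, List


def diff_config(before: Dict[str, str], after: Dict[str, str],
                expected_changes: Dict[str, str]) -> List[str]:
    """Return human-readable problems; an empty list means the check passed.

    A key passes if it holds its planned new value, or, when no change
    was planned for it, the same value it had before the cutover.
    """
    problems = []
    for key in sorted(set(before) | set(after)):
        old, new = before.get(key), after.get(key)
        want = expected_changes.get(key, old)
        if new != want:
            problems.append(f"{key}: expected {want!r}, found {new!r} (was {old!r})")
    return problems
```

For instance, `diff_config({"db_host": "old-db"}, {"db_host": "new-db"}, {"db_host": "new-db"})` returns an empty list, and calling it with empty `expected_changes` after a rollback verifies that everything is back to its original value.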

3. High‑Fidelity Simulation

Dual‑database parallel ingestion on all collection servers simulated production concurrency, exposing performance bottlenecks before the actual cutover.
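The dual-ingestion idea can be illustrated with a small fan-out helper that sends each record to both databases and tracks per-target timing and errors. This is a simplified sketch under stated assumptions: the sinks are hypothetical callables that write one record, and records are written sequentially here, whereas the real test ran concurrently across all collection servers.

```python
import time
from typing import Callable, Dict, Iterable

Sink = Callable[[dict], None]  # hypothetical: writes one record to one database


def dual_ingest(records: Iterable[dict], sinks: Dict[str, Sink]) -> Dict[str, dict]:
    """Send every record to every sink; collect per-sink counts and elapsed time."""
    stats = {name: {"ok": 0, "errors": 0, "seconds": 0.0} for name in sinks}
    for record in records:
        for name, write in sinks.items():
            start = time.perf_counter()
            try:
                write(record)
                stats[name]["ok"] += 1
            except Exception:
                # A failure on one database must not block the other.
                stats[name]["errors"] += 1
            stats[name]["seconds"] += time.perf_counter() - start
    return stats
```

Comparing the `seconds` and `errors` figures for the old and new targets under the same input is one way to spot a bottleneck on the new side before production traffic depends on it.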

Unexpected Failures

The first cutover succeeded functionally but caused a dramatic drop in key business throughput due to a 100‑fold increase in connection latency. Investigation revealed many hanging processes and table locks, forcing a rollback.

Ghost Process

A forgotten monitoring script triggered Oracle 11g’s “failed‑login delay” policy, locking accounts and rendering the new database unusable.

Post‑Cutover Insights

Thorough configuration checks, simulated execution, and a well‑practiced rollback plan allowed rapid recovery.

Understanding every process in the system eliminated hidden “ghost” processes.

Conclusion

Although the migration was not perfect, several key takeaways emerged:

Knowledge Graph – Documentation alone is insufficient; practitioners need a holistic view of tools and their interactions.

Flexibility – Creative alternatives (network separation, permission negotiations, real‑time data copy) are essential.

Resilience – Persistence, willingness to confront unknowns, and acceptance of failure are critical for success.

Written by

Efficient Ops

This public account is maintained by Xiaotianguo and friends, regularly publishing widely-read original technical articles. We focus on operations transformation and accompany you throughout your operations career, growing together happily.
