How a Midnight Migration Saved Millions: Lessons in Problem‑Solving for Developers
A senior engineer recounts a high‑pressure overnight data migration from an overloaded legacy platform to a new micro‑service system, covering the technical challenges, the rapid troubleshooting, the multithreading workarounds, and the broader lessons about what truly makes a programmer great.
1. The Core Skill: Solving Problems
When a reader asked what makes a programmer truly strong, the author answered simply: the ability to solve problems. He illustrates this with a story about two developers debating how to check network connectivity between servers, ultimately writing a Java‑based ping tool.
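The article does not show the tool itself, so the following is only a minimal sketch of what a Java connectivity check might look like. It tests TCP reachability of a host and port rather than sending ICMP echo packets; the class name `PortPing`, the method `canConnect`, and the default host, port, and timeout are all illustrative assumptions.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical sketch: not the tool from the story, just one way to check
// whether a server accepts TCP connections on a given port.
final class PortPing {

    /** Returns true if a TCP connection to host:port succeeds within timeoutMs. */
    static boolean canConnect(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            // Refused, timed out, or unresolvable host: treat all as unreachable.
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumed defaults for illustration only.
        String host = args.length > 0 ? args[0] : "localhost";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 8080;
        System.out.println(host + ":" + port + " reachable: " + canConnect(host, port, 2000));
    }
}
```

Checking a specific port is often more useful between application servers than a raw ping, because it also reveals firewall rules and whether the target service is actually listening.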
2. A Late‑Night Technical Story
Old platform vs. new platform – The company ran an aging Oracle‑based system that was designed for 1‑2 billion daily transactions but now handled 40 billion. After years of incremental optimizations, the architecture could no longer scale, prompting the development of a new micro‑service platform built on MySQL HA and hundreds of services.
The migration had to be seamless, likened to changing a car’s wheels while driving at highway speed. A sudden policy change forced the team to accelerate the migration schedule.
3. Midnight Migration Execution
The team prepared a migration tool that could move merchants from the old to the new platform in batches. On New Year’s Eve, they planned to migrate the remaining millions of merchants in a single, uninterrupted window.
After extensive pre‑testing, the migration began at 1 am. The initial batches of agents migrated successfully, but throughput then dropped sharply to only 100k merchants per half hour, which threatened to stretch the operation over several days.
Realizing the urgency, the team analyzed the logs and discovered that the migration program processed agents one at a time: each agent’s internal work was multithreaded, but the main loop over agents had no parallelism.
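The difference between a sequential main loop and a parallel one can be sketched as follows. This is not the team's migration program (they deliberately avoided changing code that night); `migrateAgent`, the agent IDs, and the worker count are hypothetical placeholders.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

final class ParallelMigrationSketch {

    // Hypothetical stand-in for the per-agent migration; the real work
    // (reading from the old platform, writing to the new one) would go here.
    static void migrateAgent(String agentId) {
    }

    // The bottleneck described above: agents are handled strictly one after another.
    static void migrateSequentially(List<String> agentIds) {
        for (String id : agentIds) {
            migrateAgent(id);
        }
    }

    // Same work, but up to `workers` agents are migrated at the same time.
    // Returns the number of agents that completed.
    static int migrateInParallel(List<String> agentIds, int workers) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        AtomicInteger completed = new AtomicInteger();
        try {
            List<Future<?>> futures = new ArrayList<>();
            for (String id : agentIds) {
                futures.add(pool.submit(() -> {
                    migrateAgent(id);
                    completed.incrementAndGet();
                }));
            }
            for (Future<?> f : futures) {
                f.get(); // wait for each task and surface any failure
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while migrating", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("an agent migration failed", e.getCause());
        } finally {
            pool.shutdown();
        }
        return completed.get();
    }
}
```

A fixed-size pool keeps the number of simultaneous agents bounded, which matters when the downstream database is the real limit.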
4. Rapid Remedy: Manual Multithreading
Instead of rewriting code, the engineers opened multiple browser windows, each invoking the migration servlet with a different agent ID. Because the servlet container handles each HTTP request on its own thread, this achieved concurrent agent migration without code changes.
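The team did this by hand in browsers. A scripted equivalent of the same idea, firing one request per agent so that the server runs them concurrently, might look like the sketch below. The base URL and the `agentId` query parameter are assumptions, since the article does not describe the servlet's actual interface.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.stream.Collectors;

final class ConcurrentTriggerSketch {

    // Builds the request URI for one agent; the parameter name is hypothetical.
    static URI migrationUri(String baseUrl, String agentId) {
        return URI.create(baseUrl + "?agentId="
                + URLEncoder.encode(agentId, StandardCharsets.UTF_8));
    }

    // Sends all requests without waiting in between, then collects the
    // HTTP status codes in the same order as the agent IDs.
    static List<Integer> triggerAll(String baseUrl, List<String> agentIds) {
        HttpClient client = HttpClient.newHttpClient();
        List<CompletableFuture<Integer>> pending = agentIds.stream()
                .map(id -> HttpRequest.newBuilder(migrationUri(baseUrl, id)).GET().build())
                .map(req -> client.sendAsync(req, HttpResponse.BodyHandlers.discarding())
                        .thenApply(HttpResponse::statusCode))
                .collect(Collectors.toList());
        return pending.stream()
                .map(CompletableFuture::join)
                .collect(Collectors.toList());
    }
}
```

Each in-flight request occupies one request thread on the server, which is exactly what each browser window did in the story.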
Testing with a few agents succeeded, but scaling to dozens introduced occasional errors due to shared mutable state. The root cause was identified as non‑thread‑safe global variables in the servlet.
Fixing this involved wrapping the shared data in ThreadLocal, giving each thread its own isolated copy.
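A minimal sketch of that kind of fix, assuming the shared state was something like a working list held in a static field (the article does not say what the variables actually were, and all names here are hypothetical):

```java
import java.util.ArrayList;
import java.util.List;

final class MigrationContextSketch {

    // Unsafe variant: a single list shared by every request thread, so
    // concurrent migrations would overwrite each other's entries.
    // static List<String> currentBatch = new ArrayList<>();

    // Safer variant: each thread lazily gets its own private list.
    private static final ThreadLocal<List<String>> CURRENT_BATCH =
            ThreadLocal.withInitial(ArrayList::new);

    // Hypothetical per-request work: collects this agent's merchants into
    // the thread's own batch and returns how many were queued.
    static int migrate(String agentId, List<String> merchantIds) {
        List<String> batch = CURRENT_BATCH.get();
        try {
            batch.clear();
            for (String merchantId : merchantIds) {
                batch.add(agentId + ":" + merchantId);
            }
            return batch.size();
        } finally {
            // Servlet containers reuse pooled threads, so clear the value
            // to avoid leaking state into the next request.
            CURRENT_BATCH.remove();
        }
    }
}
```

The `remove()` call in `finally` is easy to forget, but without it a pooled thread can carry one request's data into the next.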
5. Scaling Across Servers
When more than six concurrent servlet threads strained a single Tomcat instance, the team deployed the migration UI on ten separate servers. Each server handled a subset of agents, and the engineers manually entered agent groups into the web pages, effectively distributing the load.
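The engineers split the agents across servers by hand. As an illustration only, the same distribution could be computed with a simple round-robin partition; the class and method names below are hypothetical, and the article does not say how the groups were actually chosen.

```java
import java.util.ArrayList;
import java.util.List;

final class AgentPartitionSketch {

    // Deals agent IDs out to serverCount groups in round-robin order,
    // so group sizes differ by at most one.
    static List<List<String>> partition(List<String> agentIds, int serverCount) {
        if (serverCount <= 0) {
            throw new IllegalArgumentException("serverCount must be positive");
        }
        List<List<String>> groups = new ArrayList<>();
        for (int i = 0; i < serverCount; i++) {
            groups.add(new ArrayList<>());
        }
        for (int i = 0; i < agentIds.size(); i++) {
            groups.get(i % serverCount).add(agentIds.get(i));
        }
        return groups;
    }
}
```

With ten servers, `partition(agentIds, 10)` would yield the ten groups to enter into the ten migration pages. Round-robin balances the number of agents per server, not the number of merchants, so agents with very large merchant counts could still skew the load.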
Within two hours the migration of all agents completed, and by 6 am the new platform was fully operational. Subsequent monitoring showed only minor, non‑critical issues.
6. Reflections on What Makes a Great Programmer
The author emphasizes that technical knowledge alone is insufficient; the ability to stay calm under pressure, analyze logs, devise quick workarounds, and execute reliably distinguishes top engineers. Practicing problem‑solving in real incidents, documenting lessons, and continuously refining one’s approach are essential for growth.
ITPUB
Official ITPUB account sharing technical insights, community news, and exciting events.
