How to Seamlessly Migrate Elasticsearch from Cloud to On‑Premises Without Downtime
This article walks through a practical, step‑by‑step migration of an Elasticsearch cluster from a public‑cloud environment to a self‑hosted data‑center, covering strategy, configuration changes, node role separation, manual data transfer, and post‑migration re‑enabling of automatic balancing to ensure a smooth, low‑impact transition.
Preface
Elasticsearch automatically rebalances shards when nodes join or leave a cluster. That is convenient day to day, but during a migration it has to be kept under control. The author, an Elastic‑Stack power user and Elasticsearch‑certified engineer, describes a real‑world migration from a public‑cloud Elasticsearch cluster to a self‑built data center that kept services available throughout.
Background
The existing big‑data platform runs on a public cloud, where Elasticsearch serves most external queries and part of the real‑time computation. Business requirements called for moving the cluster to an on‑premises environment without degrading the user experience. Two factors shaped the work:
Custom API services are tightly coupled to the Elasticsearch cluster.
The whole Elasticsearch cluster, data and nodes alike, has to be migrated.
Migration Strategy
The migration emphasizes stability over speed and follows these high‑level actions:
Disable automatic shard rebalancing.
Start new on‑premises nodes and join them to the existing cluster.
Manually move data to the new nodes.
Switch external traffic to the new nodes.
Shut down the public‑cloud nodes.
Re‑enable automatic balancing.
Migration Steps
All steps must be performed strictly in this order. Changing the sequence risks unplanned shard movement and cluster instability.
1. Original Cluster Architecture
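In the original cluster, master‑eligible nodes are kept separate from data nodes so that heavy query and indexing load on the data nodes cannot become a bottleneck for cluster management.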
# Master node settings (elasticsearch.yml; legacy role syntax, replaced by node.roles from 7.9)
node.master: true
node.data: false
# Data node settings
node.master: false
node.data: true
2. Configure New Cluster
The new nodes keep the same master/data role separation. Their discovery host list includes both the old and the new master nodes so that they can find and join the existing cluster.
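# Master discovery (Zen discovery in 6.x; 7.x uses discovery.seed_hosts)
# port is the transport port, 9300 by default
discovery.zen.ping.unicast.hosts: ["old_master_ip:port", "new_master_ip:port"]
When several instances share one physical server, cap the processors each instance uses so that its thread pools match its share of the CPU cores: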
# Processors per instance: at most CPU_cores / instance_count, e.g. 32 cores / 4 instances = 8
# (renamed node.processors in later 7.x releases)
processors: 8
3. Disable Cluster Auto‑Balancing
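Both settings in this step are dynamic cluster settings, so on a running cluster they are usually changed through the cluster settings API (PUT _cluster/settings) rather than in elasticsearch.yml. The Python sketch below builds that request body and sends it with the standard library. It assumes a node reachable at http://localhost:9200 without authentication, and the helper names allocation_settings and put_cluster_settings are hypothetical. Calling the same helper with enabled=True produces the body for step 9.

```python
import json
import urllib.request

# Assumption: a node reachable over plain HTTP without authentication.
ES_URL = "http://localhost:9200"


def allocation_settings(enabled: bool) -> dict:
    """Persistent cluster-settings body that switches shard allocation and
    rebalancing off (step 3, enabled=False) or back on (step 9, enabled=True)."""
    value = "all" if enabled else "none"
    return {
        "persistent": {
            "cluster.routing.allocation.enable": value,
            "cluster.routing.rebalance.enable": value,
        }
    }


def put_cluster_settings(body: dict, base_url: str = ES_URL) -> dict:
    """PUT the body to /_cluster/settings and return the decoded JSON reply."""
    request = urllib.request.Request(
        f"{base_url}/_cluster/settings",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


if __name__ == "__main__":
    # Only prints the step 3 body; call put_cluster_settings(...) to apply it.
    print(json.dumps(allocation_settings(enabled=False), indent=2))
```

Persistent settings survive a full cluster restart, which suits a migration that may run for days; transient cluster settings are deprecated in recent releases.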
# Disable all shard allocation (dynamic cluster setting)
cluster.routing.allocation.enable: none
# Disable shard rebalancing (dynamic cluster setting)
cluster.routing.rebalance.enable: none
4. Start New Data Nodes
With allocation and rebalancing disabled, the new data nodes can be started safely: they join the cluster but receive no shards. Give each new node custom attributes so that later steps can target it.
# Node attributes
node.attr.rack: rack1
node.attr.zone: zone1
node.attr.disk: ssd1
5. Switch Cluster Access
Three groups of clients need to be repointed:
Hadoop‑ES connector: recreate the Hive‑ES mapping tables with the new node IPs (the es.nodes property).
Custom API services: point the proxy at the new data nodes.
Real‑time compute (Kafka): start consumers on the new nodes, then stop the old ones.
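Repointing these clients requires the addresses of the new data nodes. Because step 4 tagged the new nodes with custom attributes, one option is to read them back from the _cat/nodeattrs API. The Python sketch below assumes that only the new nodes carry the attribute value it filters on (zone1 here), that nodes serve HTTP on port 9200, and that no authentication is needed; the helper names and sample rows are hypothetical. The resulting string has the comma‑separated host:port form that ES‑Hadoop's es.nodes accepts.

```python
import json
import urllib.request

# Assumption: a node reachable over plain HTTP without authentication.
ES_URL = "http://localhost:9200"


def fetch_nodeattrs(base_url: str = ES_URL) -> list:
    """Rows of GET /_cat/nodeattrs?format=json (keys: node, host, ip, attr, value)."""
    with urllib.request.urlopen(f"{base_url}/_cat/nodeattrs?format=json") as response:
        return json.load(response)


def ips_with_attr(rows: list, attr: str, value: str) -> list:
    """Sorted, de-duplicated IPs of nodes whose custom attribute equals value."""
    return sorted({row["ip"] for row in rows if row["attr"] == attr and row["value"] == value})


def es_nodes_setting(ips: list, http_port: int = 9200) -> str:
    """Comma-separated host:port list, e.g. for es.nodes or a proxy upstream list."""
    return ",".join(f"{ip}:{http_port}" for ip in ips)


if __name__ == "__main__":
    # Hypothetical rows; on a live cluster use fetch_nodeattrs() instead.
    rows = [
        {"node": "new-data-1", "host": "10.1.0.11", "ip": "10.1.0.11", "attr": "zone", "value": "zone1"},
        {"node": "new-data-2", "host": "10.1.0.12", "ip": "10.1.0.12", "attr": "zone", "value": "zone1"},
        {"node": "old-data-1", "host": "172.16.0.5", "ip": "172.16.0.5", "attr": "zone", "value": "cloud"},
    ]
    print(es_nodes_setting(ips_with_attr(rows, "zone", "zone1")))
```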
6. Manual Data Transfer
Reasons for manual transfer:
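Limited bandwidth across network segments (between the cloud and the data center).
Large indices (hundreds of GB each) would overload disk and network I/O if moved all at once.
While old and new nodes coexist, unrestricted shard movement between them would generate excessive network traffic.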
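Prioritise small, offline, or rarely queried indices, and limit how many indices move in parallel so the transfer stays within the available bandwidth. Shards relocate in response to allocation filters only while allocation is permitted, so during this step cluster.routing.allocation.enable must be set back to all; keeping cluster.routing.rebalance.enable at none ensures that only the indices you filter actually move.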
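The prioritisation and parallelism rules in this step can be made concrete in two pieces: a batching plan that moves the smallest indices first, and recovery throttling. indices.recovery.max_bytes_per_sec and cluster.routing.allocation.node_concurrent_recoveries are real dynamic Elasticsearch settings, but the batching function, the 50% bandwidth share, the concurrency of 1 and all sizes below are illustrative assumptions, not the author's figures. link_mb_per_s is in megabytes per second (divide a Mbit/s figure by 8).

```python
import json


def plan_batches(index_sizes: dict, max_batch_bytes: int) -> list:
    """Group indices into migration batches, smallest first. A batch is closed
    before it would exceed max_batch_bytes, so an index larger than the cap
    ends up in a batch of its own."""
    batches, current, current_size = [], [], 0
    for name, size in sorted(index_sizes.items(), key=lambda item: (item[1], item[0])):
        if current and current_size + size > max_batch_bytes:
            batches.append(current)
            current, current_size = [], 0
        current.append(name)
        current_size += size
    if current:
        batches.append(current)
    return batches


def recovery_throttle(link_mb_per_s: float, receiving_nodes: int, share: float = 0.5) -> dict:
    """Persistent cluster settings that give shard recovery at most `share` of
    the link. indices.recovery.max_bytes_per_sec is enforced per node, so the
    budget is divided among the nodes receiving shards."""
    per_node_mb = max(1, int(link_mb_per_s * share / receiving_nodes))
    return {
        "persistent": {
            "indices.recovery.max_bytes_per_sec": f"{per_node_mb}mb",
            # Hypothetical choice: one incoming and one outgoing recovery per node.
            "cluster.routing.allocation.node_concurrent_recoveries": 1,
        }
    }


if __name__ == "__main__":
    GB = 1024 ** 3
    # Hypothetical index sizes in bytes, e.g. from GET _cat/indices?bytes=b.
    sizes = {"logs-archive": 300 * GB, "dict": 2 * GB, "users": 40 * GB, "orders": 120 * GB}
    for number, batch in enumerate(plan_batches(sizes, max_batch_bytes=150 * GB), start=1):
        print(number, batch)
    print(json.dumps(recovery_throttle(link_mb_per_s=100, receiving_nodes=4), indent=2))
```

Each batch would then receive the include._ip filter for the new nodes, and the next batch starts only after the previous one has finished relocating.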
# Per‑index settings (PUT /<index>/_settings)
# Restrict an index being migrated to the new nodes
"index.routing.allocation.include._ip": "new_node_ip1,new_node_ip2"
# Keep indices not yet migrated on the old nodes
"index.routing.allocation.include._ip": "old_node_ip1,old_node_ip2"
7. Shut Down Old Data Nodes
After confirming that every index has been moved and all traffic has been switched, shut down the old data nodes gradually, one at a time.
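Before powering off an old data node, it is worth confirming that no shard still lives there and that nothing is mid‑relocation. A minimal Python sketch using the _cat/shards API; the old IP set, the sample rows and the helper names are hypothetical, and a node is assumed reachable at http://localhost:9200 without authentication.

```python
import json
import urllib.request

# Assumption: a node reachable over plain HTTP without authentication.
ES_URL = "http://localhost:9200"


def fetch_shards(base_url: str = ES_URL) -> list:
    """Rows of GET /_cat/shards in JSON, limited to the columns used below."""
    url = f"{base_url}/_cat/shards?format=json&h=index,shard,prirep,state,ip,node"
    with urllib.request.urlopen(url) as response:
        return json.load(response)


def blocking_shards(shard_rows: list, old_ips: set) -> list:
    """Shards that should block shutting down the old data nodes: any shard
    still on an old IP, plus any shard not yet STARTED (RELOCATING,
    INITIALIZING or UNASSIGNED)."""
    return [
        row for row in shard_rows
        if row.get("ip") in old_ips or row.get("state") != "STARTED"
    ]


if __name__ == "__main__":
    old_ips = {"172.16.0.5", "172.16.0.6"}  # hypothetical old data-node IPs
    rows = [  # hypothetical rows; on a live cluster use fetch_shards()
        {"index": "orders", "shard": "0", "prirep": "p", "state": "STARTED", "ip": "10.1.0.11", "node": "new-data-1"},
        {"index": "orders", "shard": "0", "prirep": "r", "state": "STARTED", "ip": "10.1.0.12", "node": "new-data-2"},
        {"index": "logs-archive", "shard": "1", "prirep": "p", "state": "RELOCATING", "ip": "172.16.0.5", "node": "old-data-1"},
    ]
    remaining = blocking_shards(rows, old_ips)
    print("safe to shut down" if not remaining else f"{len(remaining)} shard(s) still blocking")
```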
8. Start New Master Nodes
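Start the new master‑eligible nodes one at a time and retire the old standby masters as you go. With Zen discovery, update discovery.zen.minimum_master_nodes after each change so that it always equals a strict majority of the current master‑eligible nodes; this is what prevents split‑brain.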
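With Zen discovery (Elasticsearch 6.x, as implied by the discovery.zen.* setting in step 2), the quorum is master_eligible // 2 + 1, so the value of discovery.zen.minimum_master_nodes has to follow every master that is started or retired. The Python sketch below only does that bookkeeping. The node names are hypothetical, the alternating start/retire order is one reasonable reading of this step, and the resulting value would be applied through PUT _cluster/settings, where this setting is dynamic in 6.x. From 7.0 onward the setting is ignored because Elasticsearch manages the voting configuration itself.

```python
def minimum_master_nodes(master_eligible: int) -> int:
    """Zen discovery quorum: a strict majority of master-eligible nodes."""
    if master_eligible < 1:
        raise ValueError("at least one master-eligible node is required")
    return master_eligible // 2 + 1


def rotation_plan(old_masters: list, new_masters: list) -> list:
    """Alternate 'start one new master' / 'retire one old master' and return,
    for each step, (action, master-eligible count, quorum to set).
    List the currently elected old master last so it is retired after the
    standbys."""
    if not new_masters:
        raise ValueError("need at least one new master-eligible node")
    plan, eligible, pending_old = [], len(old_masters), list(old_masters)
    for new in new_masters:
        eligible += 1
        plan.append((f"start {new}", eligible, minimum_master_nodes(eligible)))
        if pending_old:
            eligible -= 1
            plan.append((f"retire {pending_old.pop(0)}", eligible, minimum_master_nodes(eligible)))
    for old in pending_old:  # more old masters than new ones
        eligible -= 1
        plan.append((f"retire {old}", eligible, minimum_master_nodes(eligible)))
    return plan


if __name__ == "__main__":
    for action, eligible, quorum in rotation_plan(["old-m1", "old-m2", "old-m3"],
                                                  ["new-m1", "new-m2", "new-m3"]):
        print(f"{action:<14} master-eligible={eligible} minimum_master_nodes={quorum}")
```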
9. Re‑enable Auto‑Balancing
# Re‑enable allocation and rebalancing (all is the default for both)
cluster.routing.allocation.enable: all
cluster.routing.rebalance.enable: all
Key Elasticsearch Concepts Used
Cluster elasticity: Nodes can join or leave without interrupting service, as long as every shard still has an available copy.
Master election: Only one master‑eligible node is the elected master at any time; the others stand by.
Node roles: Master, Data, Ingest, Coordinating‑only, Voting‑only, Machine Learning.
Routing: Each request is forwarded to the nodes that hold the relevant shards.
Shards and replicas: Data can be moved one shard at a time, which makes migration granular and spreads out the I/O load.
Conclusion
After the migration, the on‑premises cluster delivered significantly higher throughput, parallel writes several times faster, and less interference between queries and writes. The experience shows that although Elastic's documentation is thorough, a real migration still requires hands‑on experimentation, a solid understanding of shard allocation, and careful sequencing of operations.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
dbaplus Community
