elasticdumpWeb: A New Web Tool for Cross‑Cluster Elasticsearch to Easysearch Index Migration

elasticdumpWeb is a web‑based utility that automates cross‑cluster index migration between Elasticsearch (including version 9.0.0) and Easysearch, handling version compatibility, field type conversion, data validation, performance tuning, and SSL issues while providing visual progress and detailed reports.

Mingyi World Elasticsearch

elasticdumpWeb Overview

elasticdumpWeb is a web‑based tool for migrating indexes between Elasticsearch and Easysearch clusters, supporting migrations from Elasticsearch 9.0.0 to various Easysearch versions.

Core Technical Details

Version Compatibility Handling

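Elasticsearch 9.0.0 introduces settings absent in earlier versions (e.g., 7.10.2). elasticdumpWeb maintains a blacklist of incompatible settings and automatically filters index.creation_date, index.uuid, index.version.created, and index.default_pipeline before creating the target index. The first three are generated by the source cluster and cannot be supplied when an index is created; index.default_pipeline names an ingest pipeline that may not exist on the target. In the author's tests, business-related settings remained compatible; only this system-level metadata had to be removed.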
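The filtering step could look roughly like the sketch below. It is not the tool's actual code: the names BLOCKED_SETTINGS, flatten, and filter_settings are hypothetical, and the blacklist contains only the four settings named above, whereas the real list is presumably longer.

```python
from typing import Any, Dict

# Settings the article names as filtered; the tool's real blacklist may differ.
BLOCKED_SETTINGS = {
    "index.creation_date",
    "index.uuid",
    "index.version.created",
    "index.default_pipeline",
}


def flatten(settings: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Flatten nested settings into dotted keys,
    e.g. {"index": {"uuid": "x"}} becomes {"index.uuid": "x"}."""
    flat: Dict[str, Any] = {}
    for key, value in settings.items():
        dotted = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, dotted + "."))
        else:
            flat[dotted] = value
    return flat


def filter_settings(settings: Dict[str, Any]) -> Dict[str, Any]:
    """Return flat settings with every blacklisted key removed."""
    return {
        key: value
        for key, value in flatten(settings).items()
        if key not in BLOCKED_SETTINGS
    }
```

Flattening first means one set lookup per setting is enough, regardless of whether the source cluster returned nested or dotted keys.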

Data Migration Implementation

The tool combines the scroll API with the bulk API. The scroll API pages through the source index and returns documents in batches; the bulk API writes each batch to the target cluster in a single request, which reduces round trips. The recommended batch_size is 1000 to 5000 documents: smaller batches increase network overhead, while larger ones consume excessive memory. The loop repeats until the scroll returns no more documents.
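A minimal sketch of such a scroll-and-bulk loop follows. It assumes client objects that expose search, scroll, clear_scroll, and bulk with the keyword names used by the 7.x-style Python Elasticsearch client; newer client versions may prefer different keywords. The function names scroll_batches, to_bulk_body, and migrate are hypothetical and not taken from elasticdumpWeb.

```python
from typing import Any, Dict, Iterator, List


def scroll_batches(client: Any, index: str, batch_size: int = 1000,
                   keep_alive: str = "2m") -> Iterator[List[Dict[str, Any]]]:
    """Yield batches of hits from `index` until the scroll is exhausted."""
    resp = client.search(index=index, scroll=keep_alive, size=batch_size,
                         body={"query": {"match_all": {}}})
    scroll_id = resp.get("_scroll_id")
    try:
        while resp["hits"]["hits"]:
            yield resp["hits"]["hits"]
            resp = client.scroll(scroll_id=scroll_id, scroll=keep_alive)
            scroll_id = resp.get("_scroll_id", scroll_id)
    finally:
        # Release the server-side scroll context even if the caller stops early.
        if scroll_id:
            client.clear_scroll(scroll_id=scroll_id)


def to_bulk_body(hits: List[Dict[str, Any]], target_index: str) -> List[Dict[str, Any]]:
    """Build the alternating action/document list that the bulk API expects."""
    body: List[Dict[str, Any]] = []
    for hit in hits:
        body.append({"index": {"_index": target_index, "_id": hit["_id"]}})
        body.append(hit["_source"])
    return body


def migrate(source: Any, target: Any, source_index: str, target_index: str,
            batch_size: int = 1000) -> int:
    """Copy all documents and return how many were sent to the target."""
    copied = 0
    for hits in scroll_batches(source, source_index, batch_size):
        resp = target.bulk(body=to_bulk_body(hits, target_index))
        if resp.get("errors"):
            raise RuntimeError("bulk request reported item failures")
        copied += len(hits)
    return copied
```

Reusing the source _id on the target keeps the copy idempotent: rerunning a failed batch overwrites documents instead of duplicating them.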

Field Type Conversion

To bridge version differences, field types that the target does not support are downgraded. When migrating from ES 9.0.0 to a target at the ES 7.10.2 level, the wildcard type is converted to keyword, and the version type is also mapped to keyword. A type-mapping table automates these conversions.
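One way to apply such a type-mapping table is a recursive walk over the index mapping. The sketch below is illustrative only: TYPE_DOWNGRADES holds just the two conversions mentioned above, and downgrade_mapping is a hypothetical name, not the tool's API.

```python
from typing import Any, Dict

# Conversions named in the article; a real table would likely have more entries.
TYPE_DOWNGRADES = {"wildcard": "keyword", "version": "keyword"}


def downgrade_mapping(mapping: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `mapping` with unsupported field types replaced.

    Nested dicts (properties, multi-fields) are walked recursively. Lists such
    as dynamic_templates are copied as-is in this sketch.
    """
    converted: Dict[str, Any] = {}
    for key, value in mapping.items():
        if isinstance(value, dict):
            converted[key] = downgrade_mapping(value)
        elif key == "type" and value in TYPE_DOWNGRADES:
            converted[key] = TYPE_DOWNGRADES[value]
        else:
            converted[key] = value
    return converted
```

A field that happens to be named type is safe here, because its value is a dict (the field definition) rather than a string, so it is recursed into instead of being rewritten.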

Data Validation Mechanism

Two-layer verification is used. First, the document counts of the source and target indexes must match exactly. Second, 100 documents are randomly sampled and their _source fields compared. Validation must run after the target index has been refreshed; until then, recently written documents are not yet visible to search and count requests, so a correct migration can appear to have lost data.
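The two checks can be sketched as a pure function. Everything here is an assumption made for illustration: validate_migration, its parameters, and the idea of passing in a dict of source documents plus a fetch_target lookup are hypothetical, and how the tool actually draws its sample is not described in the article.

```python
import random
from typing import Any, Callable, Dict, List, Optional


def validate_migration(source_count: int, target_count: int,
                       source_docs: Dict[str, Dict[str, Any]],
                       fetch_target: Callable[[str], Optional[Dict[str, Any]]],
                       sample_size: int = 100,
                       rng: Optional[random.Random] = None) -> List[str]:
    """Return a list of problems; an empty list means both checks passed.

    source_docs maps _id to _source for candidate documents on the source.
    fetch_target returns the target's _source for an _id, or None if missing.
    """
    problems: List[str] = []

    # Layer 1: document counts must match exactly.
    if source_count != target_count:
        problems.append(
            f"count mismatch: source={source_count} target={target_count}")

    # Layer 2: compare _source for a random sample of documents.
    rng = rng or random.Random()
    sample_ids = rng.sample(sorted(source_docs),
                            min(sample_size, len(source_docs)))
    for doc_id in sample_ids:
        if fetch_target(doc_id) != source_docs[doc_id]:
            problems.append(f"_source mismatch for _id={doc_id}")
    return problems
```

With the Python client, the counts would typically come from client.count(index=...)["count"] after calling client.indices.refresh(index=...) on the target; those calls are left outside the function so the comparison logic stays testable without a cluster.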

Performance Optimizations

Memory usage is controlled by setting an appropriate batch_size, streaming batches through generators so that only one batch is held in memory at a time, and promptly clearing scroll contexts. Network performance is improved by enabling HTTP compression, configuring sensible timeouts, and adding retry mechanisms. Empirical tests indicate that network quality is the dominant factor; a well-provisioned LAN can achieve tens of thousands of documents per second.
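The article does not say how its retry mechanism works. A common approach, shown here purely as an assumption, is exponential backoff around each network call; with_retries and its parameters are hypothetical names.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(operation: Callable[[], T], attempts: int = 3,
                 base_delay: float = 1.0,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Call `operation`, retrying on failure with exponential backoff.

    Waits base_delay, then 2 * base_delay, and so on between attempts.
    Catching Exception is deliberately broad for this sketch; real code
    should catch only the client's connection and timeout errors.
    """
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except Exception:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** (attempt - 1))
    raise ValueError("attempts must be at least 1")
```

A bulk write could then be wrapped as with_retries(lambda: target.bulk(body=batch)), where target and batch stand for whatever client and payload the caller already has.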

SSL Certificate Handling

When clusters use self-signed certificates, the Python client's default certificate verification rejects them and the connection fails. The tool therefore sets verify_certs=False in the connection configuration. This disables protection against man-in-the-middle attacks, which is acceptable only on trusted internal networks; production environments should use CA-signed certificates.
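As a rough illustration, the connection options might be assembled as below. The helper connection_kwargs is hypothetical; verify_certs, ca_certs, and ssl_show_warn are keyword arguments of the Python Elasticsearch client, but authentication and timeout options are omitted because their names vary between client versions.

```python
from typing import Any, Dict, Optional


def connection_kwargs(url: str, ca_certs: Optional[str] = None) -> Dict[str, Any]:
    """Build client keyword arguments for a cluster reachable at `url`.

    With a CA bundle path, certificates are verified against it. Without one,
    verification is switched off, which suits trusted internal networks only.
    """
    kwargs: Dict[str, Any] = {"hosts": [url]}
    if ca_certs:
        kwargs["verify_certs"] = True
        kwargs["ca_certs"] = ca_certs
    else:
        kwargs["verify_certs"] = False
        # Silence the client's warning about unverified TLS connections.
        kwargs["ssl_show_warn"] = False
    return kwargs
```

Assuming the elasticsearch package is installed, a client would then be created with Elasticsearch(**connection_kwargs("https://localhost:9200")), where the URL is a placeholder.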

Results and Future Work

Internal testing migrated dozens of indexes containing millions of documents without data loss or errors. Planned extensions include incremental migration, scheduled and real‑time sync modes, broader version support, and multi‑threaded concurrent migration to further boost performance. The source code will be open‑sourced.

