ECP (Elasticsearch Chain Planning) System: Design, Features, and Implementation for Efficient Index Management
The article introduces the ECP system, a backend platform built on Elasticsearch that standardizes, automates, and visualizes index refresh workflows, addressing manual bottlenecks, data cleaning challenges, and coupling issues while providing task management, permission control, and environment isolation for high‑efficiency index operations.
1. Business Background
Zhuanzhuan, a leading circular‑economy company in China, uses a middle‑platform architecture in which the middle platform provides generic transaction capabilities and the front‑end business teams explore new offerings. The transaction middle platform includes services such as order, promotion, and payment, each with dozens of Elasticsearch indices holding billions of records.
Rapid business growth made manual support for Elasticsearch (ES) requirements untenable, leading to the creation of the Elasticsearch Chain Planning (ECP) system.
2. Current Situation and Problems
2.1 Current Overview
Index rebuilding traditionally requires a lengthy 12‑step manual process, including identifying the index, editing templates, creating new indices, updating write handlers, configuring dual‑write, exporting IDs via Shell/Python scripts, uploading data, and finally switching aliases.
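The last manual step, switching aliases, is usually done as one atomic request so that the alias never points at no index. The sketch below only builds the body for Elasticsearch's `POST /_aliases` endpoint; the alias and index names are hypothetical, and the article does not show how ECP issues this call.

```python
from typing import Any


def build_alias_switch(alias: str, old_index: str, new_index: str) -> dict[str, Any]:
    """Build the request body for an atomic alias switch.

    Both actions travel in a single POST /_aliases request, so the alias
    moves from the old index to the new one in one step.
    """
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


# Hypothetical names, used only for illustration.
body = build_alias_switch("order", "order_v1", "order_v2")
```

Sending `body` with any HTTP or Elasticsearch client completes the switch; readers that query the alias need no change.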
2.2 Existing Problems
Manual ID‑export scripts run into memory and disk limits, and the sandbox disables MySQL commands for security reasons.
Index rebuilding is costly and slow: rebuilding an order index takes 5‑7 days.
No visibility into cleaning progress; cannot estimate completion time.
Lack of checkpoint‑resume; failures require manual intervention.
Mixing bulk cleaning and incremental data in the same queue can impact online services.
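One common way to keep an ID export within memory limits is keyset pagination: read a bounded batch, remember the last ID, and continue from there. The sketch below is an illustration under stated assumptions, not ECP's implementation. It uses the standard-library `sqlite3` module as a stand-in for MySQL, and the `orders` table and its positive integer `id` column are hypothetical.

```python
import sqlite3
from typing import Iterator


def export_ids(conn: sqlite3.Connection, table: str, batch_size: int) -> Iterator[list[int]]:
    """Yield primary keys in ascending batches using keyset pagination.

    Only one batch is held in memory at a time. The table name is
    interpolated directly, so it must come from trusted configuration.
    """
    last_id = 0  # assumes IDs are positive integers
    while True:
        rows = conn.execute(
            f"SELECT id FROM {table} WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, batch_size),
        ).fetchall()
        if not rows:
            return
        batch = [row[0] for row in rows]
        yield batch
        last_id = batch[-1]  # resume point for the next query


# In-memory demo data standing in for a real source table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO orders (id) VALUES (?)", [(i,) for i in range(1, 8)])
batches = list(export_ids(conn, "orders", batch_size=3))
```

Because each query starts from the last ID seen, the same value can double as a checkpoint if the export is interrupted.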
3. Solution Idea
Abstract Process Steps: Standardize, automate, and visualize the index refresh workflow to improve efficiency and accuracy.
System Empowerment: Provide task management features such as interruption recovery, progress visualization, QPS throttling, and heartbeat detection.
Isolation of Bulk and Incremental Data: Use tag‑based traffic routing to separate cleaning of historic data from live incremental data.
Permission Control and Data Consolidation: Integrate with the company’s unified permission system for managing data sources, scripts, ES clusters, templates, tasks, and operation logs.
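Interruption recovery and QPS throttling can be combined in a single processing loop. The following is a minimal sketch under assumptions: the `checkpoint` dictionary, the `next_batch` key, and the `run_batches` function are hypothetical names, and a real system would persist the checkpoint rather than keep it in memory.

```python
import time
from typing import Callable, Iterable


def run_batches(
    batches: Iterable[list[int]],
    handle: Callable[[list[int]], None],
    checkpoint: dict,
    max_qps: float,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Process batches in order, skipping finished ones and pacing the rest."""
    for number, batch in enumerate(batches):
        if number < checkpoint.get("next_batch", 0):
            continue  # already completed in an earlier run
        handle(batch)
        checkpoint["next_batch"] = number + 1  # would be persisted in a real system
        sleep(len(batch) / max_qps)  # keeps the average rate at or below max_qps


processed: list[int] = []
pauses: list[float] = []
checkpoint = {"next_batch": 1}  # pretend batch 0 finished before an interruption
run_batches(
    [[1, 2], [3, 4], [5]],
    processed.extend,
    checkpoint,
    max_qps=100,
    sleep=pauses.append,  # record pauses instead of sleeping, for the demo
)
```

Injecting `sleep` keeps the demo instant and makes the pacing easy to inspect; in production the default `time.sleep` applies.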
4. ECP in Practice
4.1 What Is the ECP System?
ECP (Elasticsearch Chain Planning) is a platform for managing Elasticsearch data‑transfer chains, helping developers efficiently handle index creation, data cleaning, and index rebuilding tasks.
4.2 ECP System Functions
4.2.1 Task Management
Supports ES index creation, data cleaning, and index rebuilding tasks with modules for alias switching, dual‑write management, progress visualization, pause/resume, QPS limiting, and automatic recovery.
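Progress visualization needs little more than a processed count, a total, and the elapsed time. The sketch below estimates the remaining time from average throughput; the function name and the sample figures are illustrative assumptions, not values from ECP.

```python
def estimate_progress(done: int, total: int, elapsed_seconds: float) -> tuple[float, float]:
    """Return (percent complete, estimated seconds remaining).

    The estimate assumes the average throughput so far continues unchanged.
    """
    if total <= 0 or done <= 0 or elapsed_seconds <= 0:
        return 0.0, float("inf")  # nothing measured yet, so no estimate
    rate = done / elapsed_seconds  # documents per second so far
    remaining = max(total - done, 0) / rate
    return 100.0 * min(done, total) / total, remaining


# Illustrative numbers: 250,000 of 1,000,000 documents in 500 seconds.
percent, eta = estimate_progress(done=250_000, total=1_000_000, elapsed_seconds=500.0)
```

With these inputs the task is a quarter done and, at 500 documents per second, has about 25 minutes left.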
4.2.2 Data Source and Script Management
Manages database connection info and SQL scripts for source data extraction, offering connection testing and syntax validation.
4.2.3 Cluster and Index Management
Provides an overview of each index: name, alias, disk usage, cluster, shard count, health status, and department ownership.
4.2.4 Index Template Management
Centralizes management of index templates used during index creation.
4.3 Problems Solved by ECP
4.3.1 Eliminated Manual Bottlenecks in ES Index Rebuilding
Automated ID export, script execution, and RPC triggering, removing the need for manual monitoring and retry, thus increasing efficiency and standardization.
4.3.2 Isolated Bulk Cleaning from Live Traffic
Used tag‑based routing to separate historic data cleaning from incremental data, preventing cleaning spikes from affecting user‑facing services.
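Tag-based routing can be pictured as choosing a queue from a tag carried on each message. The article does not describe ECP's routing mechanism, so the tag values and queue names below are hypothetical and the sketch shows only the general idea.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    doc_id: int
    tag: str  # hypothetical values: "bulk" for cleaning, "live" for incremental writes


def route(message: Message) -> str:
    """Pick a queue by tag so bulk cleaning never shares a queue with live writes."""
    # Anything not explicitly tagged "bulk" stays on the live path.
    return "es-sync-bulk" if message.tag == "bulk" else "es-sync-live"


queues = [route(Message(1, "live")), route(Message(2, "bulk"))]
```

Because the two queues are consumed independently, a spike in cleaning traffic backs up only the bulk queue.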
4.3.3 Consolidated Scattered Indexes, Templates, and Scripts
Centralized assets to reduce time spent searching for previous scripts/templates, improving response speed and knowledge retention.
4.4 Terminology
4.4.1 Task
A defined activity with clear goals, time limits, and progress tracking, such as bulk ID cleaning or index building.
4.4.2 Index Cluster and Index
An abstract definition (cluster) and its concrete instances (indices), similar to interfaces and classes in Java.
4.4.3 Data Source
Sources include ID source, text source, and MySQL source.
4.4.4 Script
A combination of a MySQL source and an SQL script, used to read source data.
4.5 Overall Design
4.6 System Demonstration
4.6.1 Create Task
4.6.2 Execute Task
5. Conclusion
5.1 Summary
ECP is a platform for managing Elasticsearch data‑transfer chains, offering a more efficient and convenient data‑cleaning solution that will continue to evolve with business needs.
5.2 Roadmap
Version 1.0 is in internal testing; future plans include scheduled cleaning tasks, reindex support, alias rollback, and data consistency checks.
5.3 Acknowledgements
Thanks to teammate Yan Zhan and the transaction team for their contributions and to the low‑code platform from Zhuanzhuan FE.
Zhuanzhuan Tech
A platform for Zhuanzhuan R&D and industry peers to learn and exchange technology, regularly sharing frontline experience and cutting‑edge topics. We welcome practical discussions and sharing; contact waterystone with any questions.