Elasticsearch Overview, Comparison, Maintenance Challenges, Deployment Strategies, and Automation Management Platform
This document provides a comprehensive technical overview of Elasticsearch, comparing it with Solr and ClickHouse, detailing common operational pain points and configuration solutions, describing containerized and ECK deployments, and outlining a company‑wide automation platform for cluster provisioning, monitoring, index and security management, with future directions for lifecycle and backup strategies.
1. Background Introduction
1. What is Elasticsearch?
Elasticsearch is the world’s most popular open‑source, distributed, RESTful full‑text search engine built on Lucene. It also functions as a distributed document database in which every field is indexed and searchable. It can scale to hundreds of nodes handling petabytes of data, and it stores, searches, and analyzes massive datasets in near real time.
Key advantages include horizontal scalability (add a server and start the process), a shard mechanism similar to HDFS for efficient distribution, and high availability through replica shards that keep the cluster running when a node fails.
2. Elasticsearch vs Solr
Both products provide full‑text search with inverted indexes. The comparison highlights that Elasticsearch offers non‑blocking real‑time index creation, more stable performance under dynamic data loads, built‑in distributed management (no external Zookeeper), a focus on core search features with optional third‑party plugins, and a built‑in Zen discovery mechanism.
Comparison Item | Solr | Elasticsearch
Real‑time index creation | IO blocking, lower efficiency | Non‑blocking, high efficiency
Dynamic data addition | Search efficiency degrades | Stable performance
System management | Uses Zookeeper | Built‑in distributed management
Feature scope | Officially provided features | Core search focus, third‑party plugins for extensions
3. Elasticsearch vs ClickHouse
The comparison covers simple and complex query QPS, latency, data isolation, maintenance, shard support, and resource utilization. ClickHouse shows higher QPS for complex queries, stronger data isolation, automatic TTL‑based data expiry, and higher resource utilization, while Elasticsearch offers higher simple‑query QPS and built‑in distributed capabilities.
4. Search Engine Ranking
5. Elasticsearch Scale at Yiche
The production environment currently supports versions 6.8 and 7.11, maintains 80 clusters, over 1,000 nodes, and 400 TB of containerized storage.
6. Company Application Scenarios
Key use cases include security/business‑log analysis, site‑wide data search, and log/metric analysis for dashboards. The team is migrating log analysis from Elasticsearch to ClickHouse because of higher complex‑query QPS, automatic TTL, and better compression.
2. Elasticsearch Maintenance Pain Points
1. Over‑reliance on Manual Troubleshooting
Problem: Night‑time non‑green alerts caused by massive index creation.
Solution: Pre‑create indices using index templates to avoid bulk creation at midnight.
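One way to produce the `log_index_list` that the snippet in this subsection iterates over is to derive tomorrow's daily index names ahead of time, so a scheduled job can create them during the day instead of at midnight. This is a minimal sketch: the `<prefix>-YYYY.MM.DD` naming pattern and the prefixes are assumptions for illustration, not the author's actual scheme.

```python
from datetime import date, timedelta


def build_daily_index_names(prefixes, day):
    """Return one index name per prefix for the given day."""
    suffix = day.strftime("%Y.%m.%d")
    return [f"{prefix}-{suffix}" for prefix in prefixes]


# Hypothetical log prefixes; replace with the ones your applications use.
prefixes = ["app-log", "nginx-log"]
tomorrow = date.today() + timedelta(days=1)
log_index_list = build_daily_index_names(prefixes, tomorrow)
```

Running this well before midnight spreads index creation out, so the cluster is not hit with a burst of shard allocations at the date rollover.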
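from elasticsearch import Elasticsearch

es = Elasticsearch("http://192.168.1.1:9200")
# log_index_list (names to pre-create) and mappings (index body) are defined elsewhere
for log_index_name in log_index_list:
    # Skip indices that already exist so the job can be re-run safely
    if not es.indices.exists(index=log_index_name):
        es.indices.create(index=log_index_name, body=mappings)
2. Unexpected Node Process Crashes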
Cause: Mixed role deployment. Solution: Separate roles (client, master, data) with explicit node settings.
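A quick audit for mixed roles can be sketched as a pure function over each node's role settings. The node names and the input dictionary here are hypothetical; in practice the settings would come from each node's elasticsearch.yml. The sketch relies on the fact that `node.master` and `node.data` default to true when they are not set.

```python
def find_mixed_role_nodes(nodes):
    """Return names of nodes that are both master-eligible and data nodes."""
    mixed = []
    for name, settings in nodes.items():
        # Both legacy role settings default to true when omitted.
        is_master = settings.get("node.master", True)
        is_data = settings.get("node.data", True)
        if is_master and is_data:
            mixed.append(name)
    return mixed


# Hypothetical inventory of nodes and their role settings.
nodes = {
    "es-master-1": {"node.master": True, "node.data": False, "node.ingest": False},
    "es-data-1": {"node.master": False, "node.data": True, "node.ingest": True},
    "es-legacy-1": {"node.master": True, "node.data": True, "node.ingest": True},
}
print(find_mixed_role_nodes(nodes))  # prints ['es-legacy-1']
```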
# elasticsearch.yml role settings (each node gets only the block for its own type)
# Client node
node.master: false
node.data: false
node.ingest: true
# Master node
node.master: true
node.data: false
node.ingest: false
# Data node
node.master: false
node.data: true
node.ingest: true
3. Index Read‑Only or Creation Failures
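The root cause is disk usage crossing Elasticsearch's disk watermarks (defaults: low 85 %, high 90 %, flood stage 95 %). Above the low watermark, no new shards are allocated to the node; above the high watermark, shards are relocated away from it; and once the flood‑stage watermark is reached, Elasticsearch sets index.blocks.read_only_allow_delete to true on every index with a shard on that node, so writes are rejected.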
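To make the thresholds concrete, the sketch below classifies a node's disk usage against the default watermarks (85 % low, 90 % high, 95 % flood stage, the last being the point at which the read‑only block is applied). The function name and the use of `>=` at the boundaries are simplifications for illustration, not Elasticsearch's exact comparison logic.

```python
def disk_watermark_stage(used_percent, low=85.0, high=90.0, flood_stage=95.0):
    """Classify disk usage against the disk watermarks (simplified)."""
    if used_percent >= flood_stage:
        return "flood_stage"  # indices on the node become read-only
    if used_percent >= high:
        return "high"  # shards are relocated away from the node
    if used_percent >= low:
        return "low"  # no new shards are allocated to the node
    return "ok"


for usage in (80, 87, 92, 96):
    print(usage, disk_watermark_stage(usage))
```

A monitoring job could alert at the "low" stage, leaving time to add capacity or delete old indices before writes start failing.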
# Raise the low and high watermarks
PUT _cluster/settings
{
"persistent": {
"cluster.routing.allocation.disk.watermark.low": "90%",
"cluster.routing.allocation.disk.watermark.high": "95%",
"cluster.info.update.interval": "1m"
}
}
# Reset read‑only flag
PUT _all/_settings
{
"index.blocks.read_only_allow_delete": false
}
4. Master Node Split‑Brain
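Split‑brain occurs when a network partition or an unresponsive master lets two groups of nodes each elect their own master. In 6.x, set discovery.zen.minimum_master_nodes to (master_eligible_nodes / 2) + 1, rounded down, so that only a majority of master‑eligible nodes can elect a master; when fewer than that many are reachable, no master is elected. In 7.x the setting is ignored because the new cluster coordination layer (Zen2) manages the quorum itself.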
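The quorum formula is simple integer arithmetic; a small sketch (the function name is illustrative) makes the rounding explicit:

```python
def minimum_master_nodes(master_eligible):
    """Quorum size: a strict majority of the master-eligible nodes."""
    if master_eligible < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible // 2 + 1


for n in (1, 2, 3, 5):
    print(n, minimum_master_nodes(n))
```

With three master‑eligible nodes the quorum is 2, so the cluster tolerates the loss of one. With two nodes the quorum is also 2, so losing either one leaves the cluster without a master, which is why an odd number of dedicated master nodes is the usual choice.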
# Verify master
GET /_cat/master?v
# View nodes
GET _cat/nodes
# Set quorum (6.x only; ignored in 7.x)
PUT /_cluster/settings
{
"persistent": {
"discovery.zen.minimum_master_nodes": 2
}
}
5. High Cost of Management and Maintenance
Issues include plugin installation, node scaling, and configuration complexity. Solutions involve pre‑installing common plugins (IK Analyzer, Elasticsearch‑SQL, Smart Chinese, ICU), using containerized deployments for dynamic scaling, and standardizing JVM and thread‑pool settings.
# Install plugins
./bin/elasticsearch-plugin install https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v6.8.13/elasticsearch-analysis-ik-6.8.13.zip
./bin/elasticsearch-plugin install https://github.com/NLPchina/elasticsearch-sql/releases/download/6.8.13.0/elasticsearch-sql-6.8.13.0.zip
./bin/elasticsearch-plugin install analysis-smartcn
./bin/elasticsearch-plugin install analysis-icu
6. GC Frequency and Swap Usage
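Switch to G1GC for better heap management, and lock the process memory with bootstrap.memory_lock: true so the heap is never swapped out (the setting was named bootstrap.mlockall before 5.0).
# JVM options (jvm.options)
-XX:+UseG1GC
-XX:MaxGCPauseMillis=50
# Prevent the heap from being swapped out (elasticsearch.yml, Linux/Unix)
bootstrap.memory_lock: true
7. Data Migration Across Clusters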
Configure reindex.remote.whitelist to allow cross‑cluster reindexing; the setting requires a node restart.
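When many indices have to be migrated, the reindex request body shown in this subsection can be generated per index instead of edited by hand. The helper below is a sketch that mirrors that request; the function name, the example host, and the credentials are placeholders.

```python
def build_remote_reindex_body(host, username, password, index, batch_size=2000):
    """Build a reindex-from-remote request body for a single index."""
    return {
        "source": {
            "remote": {
                "host": host,  # scheme, host, and port of the old cluster
                "username": username,
                "password": password,
                "socket_timeout": "1m",
                "connect_timeout": "10s",
            },
            "index": index,
            "size": batch_size,  # documents fetched per scroll batch
            "query": {"match_all": {}},
        },
        "dest": {"index": index},
    }


# Placeholder connection details for the old cluster.
body = build_remote_reindex_body(
    "http://192.168.1.10:9200", "migrator", "change-me", "app-log-2021.03.02"
)
```

The resulting dictionary can be sent with the `_reindex` API, for example through the Python client's `reindex` method. For large indices it is usually worth running the request asynchronously and polling the resulting task.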
# elasticsearch.yml
reindex.remote.whitelist: "192.169.*,192.168.*"
# Example reindex request
POST _reindex?pretty
{
"source": {
"remote": {
"host": "${oldClusterHost}",
"username": "${oldClusterUser}",
"password": "${oldClusterPass}",
"socket_timeout": "1m",
"connect_timeout": "10s"
},
"index": "${indexName}",
"size": 2000,
"query": { "match_all": {} }
},
"dest": { "index": "${indexName}" }
}
8. Version Fragmentation and Upgrade
Multiple legacy versions (2.x, 5.x, 6.x) cause maintenance difficulty. The plan is to standardize on 6.8 and 7.11, leveraging new features such as index sorting, shrink API, native REST client, and Zen2 coordination.
3. Elasticsearch Container Deployment
1. Official Container Image Optimizations
The Dockerfile modifies the distribution type, adds configuration files, certificates, optimized JVM options, and pre‑installs common plugins.
# Dockerfile snippets
# Mark the distribution as docker (the grep fails the build if the expected line is missing)
RUN grep ES_DISTRIBUTION_TYPE=tar /usr/share/elasticsearch/bin/elasticsearch-env && \
    sed -i 's/ES_DISTRIBUTION_TYPE=tar/ES_DISTRIBUTION_TYPE=docker/' /usr/share/elasticsearch/bin/elasticsearch-env
RUN mkdir -p config data logs && chmod 0775 config data logs
# Configuration, certificates, and tuned JVM options
COPY config/elasticsearch.yml config/log4j2.properties config/
COPY config/certs/elastic-certificates.p12 config/certs/
COPY config/jvm.options config/
# Pre-install common plugins
RUN ./bin/elasticsearch-plugin install https://github.com/NLPchina/elasticsearch-sql/releases/download/6.8.13.0/elasticsearch-sql-6.8.13.0.zip && \
    ./bin/elasticsearch-plugin install analysis-smartcn && \
    ./bin/elasticsearch-plugin install analysis-icu && \
    ./bin/elasticsearch-plugin install repository-s3 && \
    ./bin/elasticsearch-plugin install https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v6.8.13/elasticsearch-analysis-ik-6.8.13.zip
2. Container Storage Options
HostPath and LocalVolume are evaluated; both have limitations such as lack of lifecycle management and capacity isolation. OpenEBS is chosen as a CSI solution that provides block‑level storage with per‑volume controllers.
3. ECK Cluster Deployment
Elastic Cloud on Kubernetes (ECK) is used to declaratively manage Elasticsearch clusters. The YAML snippet defines the cluster name and version and, for each node set (only the master node set is shown here), the node roles, resource requests, node selection, tolerations, persistent storage, and an init container that sets kernel parameters.
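The manifest in this section shows only the master node set. As a sketch of how the other node sets could be generated from the same template, the helper below builds the `nodeSets` entries as plain Python dictionaries that can be dumped to YAML. The node counts for the data and ingest sets are assumptions, and the pod template is omitted for brevity.

```python
def node_set(name, count, master, data, ingest, storage="100Gi"):
    """Build one ECK nodeSet entry with role settings and a data volume claim."""
    return {
        "name": name,
        "count": count,
        "config": {
            "node.master": master,
            "node.data": data,
            "node.ingest": ingest,
        },
        "volumeClaimTemplates": [{
            "metadata": {"name": "elasticsearch-data"},
            "spec": {
                "accessModes": ["ReadWriteOnce"],
                "resources": {"requests": {"storage": storage}},
                "storageClassName": "openebs-es-lvm",
            },
        }],
    }


# Counts for the data and ingest node sets are hypothetical.
node_sets = [
    node_set("master", 3, master=True, data=False, ingest=False),
    node_set("data", 3, master=False, data=True, ingest=True),
    node_set("ingest", 2, master=False, data=False, ingest=True),
]
```

Generating node sets from one function keeps the role combinations consistent with the role separation described in the maintenance section.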
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: my-es-cluster
  namespace: elastic-system
spec:
  version: 7.11.0
  nodeSets:
  - name: master
    count: 3
    config:
      node.master: true
      node.data: false
      node.ingest: false
    podTemplate:
      spec:
        nodeSelector:
          schedule-only: "elasticsearch"
        tolerations:
        - key: "schedule-only"
          operator: "Equal"
          value: "elasticsearch"
          effect: "NoSchedule"
        initContainers:
        - name: sysctl
          securityContext:
            privileged: true
          command: ["sh", "-c", "ulimit -unlimited && sysctl -w vm.max_map_count=262144 vm.dirty_ratio=10 vm.swappiness=0 vm.dirty_background_ratio=5 && chown -R elasticsearch:root /usr/share/elasticsearch/data /usr/share/elasticsearch/logs"]
        containers:
        - name: elasticsearch
          resources:
            requests:
              memory: 4Gi
              cpu: 2
            limits:
              memory: 8Gi
          env:
          - name: ES_JAVA_OPTS
            value: "-Xms2g -Xmx2g"
          volumeMounts:
          - name: elasticsearch-data
            mountPath: /usr/share/elasticsearch/data
    volumeClaimTemplates:
    - metadata:
        name: elasticsearch-data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 100Gi
        storageClassName: openebs-es-lvm
4. Elasticsearch Automation Management Platform
1. Cluster Ticketing and Provisioning
Users submit a ticket and select a package. After approval, the platform automatically creates the cluster via ECK, binds it to a service tree, and returns the connection and credential details, which the team reports improves provisioning efficiency by roughly 400 %.
2. Cluster Management Features
Dashboard shows health status, storage usage, index count, shard count, and provides quick links to Kibana. Administrators can modify common settings (e.g., shard count per node, disk watermarks) and view real‑time metrics such as JVM usage, CPU usage, query/write latency, pending tasks, GC stats, and request rates.
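As an illustration of how a dashboard row might be derived, the sketch below condenses a cluster health response into a few display fields. The field names follow the `_cluster/health` API response; the sample values and the `needs_attention` flag are made up for the example and are not taken from the platform itself.

```python
def summarize_health(health):
    """Reduce a _cluster/health response to the fields a dashboard row needs."""
    return {
        "cluster": health["cluster_name"],
        "status": health["status"],
        "nodes": health["number_of_nodes"],
        "active_shards": health["active_shards"],
        "unassigned_shards": health["unassigned_shards"],
        "pending_tasks": health["number_of_pending_tasks"],
        # Anything other than green deserves a closer look.
        "needs_attention": health["status"] != "green",
    }


# Trimmed sample response with made-up values.
sample = {
    "cluster_name": "my-es-cluster",
    "status": "yellow",
    "number_of_nodes": 6,
    "active_shards": 120,
    "unassigned_shards": 4,
    "number_of_pending_tasks": 0,
}
row = summarize_health(sample)
```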
3. Index Management
Functions include creating indices, viewing settings and mappings, opening/closing/deleting indices, and checking index storage size.
4. Security and Access Control
Supports user creation, role assignment, and permission granularity (read‑only, write‑only, full access). Security is enabled via xpack.security.enabled: true and TLS settings in elasticsearch.yml.
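The three permission levels could map onto Elasticsearch's built‑in index privileges `read`, `write`, and `all`. The sketch below builds role definitions in the shape expected by the security create‑role API; the mapping itself, the level names, and the index pattern are assumptions rather than the platform's actual configuration.

```python
# Assumed mapping from the platform's permission levels to index privileges.
PRIVILEGES = {
    "read-only": ["read"],
    "write-only": ["write"],
    "full-access": ["all"],
}


def build_role_body(index_pattern, level):
    """Build a role definition granting one permission level on an index pattern."""
    if level not in PRIVILEGES:
        raise ValueError(f"unknown permission level: {level}")
    return {
        "indices": [
            {"names": [index_pattern], "privileges": PRIVILEGES[level]}
        ]
    }


# Hypothetical index pattern for one team's logs.
role = build_role_body("app-log-*", "read-only")
```

Scoping each role to an index pattern keeps tenants on a shared cluster from reading or writing each other's indices.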
# elasticsearch.yml security settings
xpack.security.enabled: true
xpack.security.transport.ssl.enabled: true
xpack.security.transport.ssl.verification_mode: certificate
xpack.security.transport.ssl.keystore.path: certs/elastic-certificates.p12
xpack.security.transport.ssl.truststore.path: certs/elastic-certificates.p12
5. Future Exploration
Planned features include full lifecycle management (freezing indices, hot‑cold tiering), advanced backup to S3 or HDFS, data import/export pipelines (Kafka, MySQL), and automated billing integration.
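For the S3 backup direction, once the repository-s3 plugin is installed and credentials are in the keystore, a snapshot repository still has to be registered. A minimal sketch of the registration body follows; the bucket name and base path are placeholders, and `client` refers to the `default` S3 client whose keys are stored in the keystore.

```python
def build_s3_repository_body(bucket, base_path="", client="default"):
    """Build the body for registering an S3 snapshot repository."""
    settings = {"bucket": bucket, "client": client}
    if base_path:
        # Optional prefix inside the bucket, e.g. one per cluster.
        settings["base_path"] = base_path
    return {"type": "s3", "settings": settings}


# Placeholder bucket and path.
repo = build_s3_repository_body("es-snapshots", base_path="my-es-cluster")
```

The body would be sent with a PUT request to `_snapshot/<repository-name>`. Giving each cluster its own base path lets several clusters share a bucket without overwriting one another's snapshots.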
# Example S3 repository plugin installation
bin/elasticsearch-plugin install repository-s3
# Add credentials to keystore
bin/elasticsearch-keystore add s3.client.default.access_key
bin/elasticsearch-keystore add s3.client.default.secret_key
Author: Chen Kewei – middleware operations and development specialist at Yiche.com, focusing on private‑cloud Kafka, RocketMQ, Elasticsearch, ClickHouse, etc.
Yiche Technology
Official account of Yiche Technology, regularly sharing the team's technical practices and insights.
