
Elasticsearch Overview, Comparison, Maintenance Challenges, Deployment Strategies, and Automation Management Platform

This document provides a comprehensive technical overview of Elasticsearch, comparing it with Solr and ClickHouse, detailing common operational pain points and configuration solutions, describing containerized and ECK deployments, and outlining a company‑wide automation platform for cluster provisioning, monitoring, index and security management, with future directions for lifecycle and backup strategies.

Yiche Technology

1. Background Introduction

1. What is Elasticsearch?

Elasticsearch is one of the most popular open-source, distributed, RESTful full-text search engines, built on Lucene. It also functions as a distributed document database in which every field is indexed and searchable by default. It can scale to hundreds of nodes handling petabytes of data, and it stores, searches, and analyzes massive datasets in near real time.

Key advantages include horizontal scalability (add a server and start the process), a shard mechanism similar to HDFS for efficient distribution, and high availability through replica shards that keep the cluster running when a node fails.
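
As a rough illustration of how a shard mechanism spreads documents across a cluster, the sketch below maps a routing value (by default the document ID) to a primary shard by hashing it and taking the remainder modulo the shard count. Elasticsearch uses a Murmur3 hash internally; `zlib.crc32` is only a stand-in here, and `route_to_shard` is a hypothetical helper, not a client API.

```python
import zlib


def route_to_shard(routing_value: str, number_of_primary_shards: int) -> int:
    """Pick a primary shard for a routing value (simplified model)."""
    if number_of_primary_shards < 1:
        raise ValueError("an index needs at least one primary shard")
    # crc32 is a stand-in for the Murmur3 hash Elasticsearch uses internally.
    hashed = zlib.crc32(routing_value.encode("utf-8"))
    return hashed % number_of_primary_shards
```

Because the result depends on the shard count, the number of primary shards cannot simply be changed after index creation without reindexing or using the shrink/split APIs.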

2. Elasticsearch vs Solr

Both products provide full‑text search with inverted indexes. The comparison highlights that Elasticsearch offers non‑blocking real‑time index creation, more stable performance under dynamic data loads, built‑in distributed management (no external Zookeeper), a focus on core search features with optional third‑party plugins, and a built‑in Zen discovery mechanism.

| Comparison Item | Solr | Elasticsearch |
| --- | --- | --- |
| Real-time index creation | IO blocking, lower efficiency | Non-blocking, high efficiency |
| Dynamic data addition | Search efficiency degrades | Stable performance |
| System management | Uses Zookeeper | Built-in distributed management |
| Feature scope | Officially provided features | Core search focus, third-party plugins for extensions |

3. Elasticsearch vs ClickHouse

The comparison covers simple-query and complex-query QPS, latency, data isolation, maintenance, shard support, and resource utilization. ClickHouse shows higher QPS for complex queries, stronger data isolation, automatic TTL for data lifecycle, and higher resource utilization, while Elasticsearch offers higher simple-query QPS and built-in distributed capabilities.

4. Search Engine Ranking

[Figure: Search Engine Ranking Diagram]

5. Elasticsearch Scale at Yiche

The production environment currently supports versions 6.8 and 7.11, maintains 80 clusters, over 1,000 nodes, and 400 TB of containerized storage.

6. Company Application Scenarios

Key use cases include security/business‑log analysis, site‑wide data search, and log/metric analysis for dashboards. The team is migrating log analysis from Elasticsearch to ClickHouse because of higher complex‑query QPS, automatic TTL, and better compression.

2. Elasticsearch Maintenance Pain Points

1. Over‑reliance on Manual Troubleshooting

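Problem: Night-time alerts for non-green cluster status, caused by a large number of indices being created at the same moment (the midnight rollover of date-based indices).

Solution: Pre-create the next day's indices ahead of time, with mappings supplied by index templates, so that no bulk creation happens at midnight.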

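from elasticsearch import Elasticsearch

# Assumed to be defined by the caller: log_index_list (index names to pre-create)
# and mappings (the settings/mappings body for those indices).
es = Elasticsearch('192.168.1.1:9200')
for log_index_name in log_index_list:
    # ignore=400 keeps the loop going when an index already exists
    es.indices.create(index=log_index_name, body=mappings, ignore=400)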
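
The index names fed to a pre-creation loop like the one above have to be computed before midnight. The sketch below builds the next day's names from a list of prefixes. The `<prefix>-YYYY.MM.DD` naming scheme and the helper name `next_day_index_names` are assumptions for illustration; the article does not state the naming convention actually used.

```python
from datetime import date, timedelta
from typing import List


def next_day_index_names(prefixes: List[str], today: date) -> List[str]:
    """Return tomorrow's index names, one per prefix."""
    tomorrow = today + timedelta(days=1)
    # Assumed daily naming scheme: <prefix>-YYYY.MM.DD
    suffix = tomorrow.strftime("%Y.%m.%d")
    return [f"{prefix}-{suffix}" for prefix in prefixes]
```

A scheduled job could call `next_day_index_names(prefixes, date.today())` some hours before midnight and pass the result to the creation loop, spreading the work outside the rollover window.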

2. Unexpected Node Process Crashes

Cause: Mixed role deployment. Solution: Separate roles (client, master, data) with explicit node settings.

# elasticsearch.yml role settings (one group per node type)

# Client node
node.master: false
node.data: false
node.ingest: true

# Master node
node.master: true
node.data: false
node.ingest: false

# Data node
node.master: false
node.data: true
node.ingest: true

3. Index Read‑Only or Creation Failures

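The root cause is disk usage crossing the allocation watermarks (defaults: low 85 %, high 90 %, flood stage 95 %). Above the low watermark, no new shards are allocated to the node; above the high watermark, shards are relocated away from it; and once the flood-stage watermark is exceeded, Elasticsearch sets index.blocks.read_only_allow_delete to true on every index that has a shard on that node. Raising the watermarks only buys time; disk space still has to be freed or added.

# Raise the low and high watermarks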
PUT _cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.disk.watermark.low": "90%",
    "cluster.routing.allocation.disk.watermark.high": "95%",
    "cluster.info.update.interval": "1m"
  }
}

# Reset read‑only flag
PUT _all/_settings
{
  "index.blocks.read_only_allow_delete": false
}
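
To make the thresholds concrete, the sketch below classifies a node's disk usage against the three watermarks, using the Elasticsearch defaults (85 %, 90 %, 95 %). `disk_watermark_state` is a hypothetical helper for monitoring logic, and treating each threshold as inclusive is a simplifying assumption.

```python
def disk_watermark_state(used_percent: float,
                         low: float = 85.0,
                         high: float = 90.0,
                         flood_stage: float = 95.0) -> str:
    """Classify disk usage as 'ok', 'low', 'high', or 'flood_stage'."""
    if not low <= high <= flood_stage:
        raise ValueError("watermarks must satisfy low <= high <= flood_stage")
    if used_percent >= flood_stage:
        return "flood_stage"  # indices on the node become read-only
    if used_percent >= high:
        return "high"         # shards are relocated away from the node
    if used_percent >= low:
        return "low"          # no new shards are allocated to the node
    return "ok"
```

Alerting on the "low" state gives operators time to clean up or expand storage before indices are blocked.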

4. Master Node Split‑Brain

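Split-brain occurs when a network partition lets two separate groups of master-eligible nodes each elect their own master. Requiring a quorum prevents this: on 6.x clusters, set discovery.zen.minimum_master_nodes to (master_eligible_nodes / 2) + 1, using integer division, so that only a group holding a majority can elect a master. On 7.x the setting is ignored, because the Zen2 coordination layer manages the quorum automatically.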

# Verify master
GET /_cat/master?v

# View nodes
GET _cat/nodes

# Set quorum
PUT /_cluster/settings
{
  "persistent": {
    "discovery.zen.minimum_master_nodes": 2
  }
}
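
The quorum formula is easy to get wrong by one, so a minimal sketch of the arithmetic may help. `minimum_master_nodes` is a hypothetical helper name; the formula itself is the (master-eligible nodes / 2) + 1 rule described above, with integer division.

```python
def minimum_master_nodes(master_eligible_nodes: int) -> int:
    """Return the quorum size for a given number of master-eligible nodes."""
    if master_eligible_nodes < 1:
        raise ValueError("at least one master-eligible node is required")
    # Integer division: a strict majority of the master-eligible nodes.
    return master_eligible_nodes // 2 + 1
```

With three master-eligible nodes the quorum is 2, which matches the value used in the settings request above. With only two, the quorum is also 2, so losing either node leaves the cluster without a master; this is why three dedicated masters is the usual minimum.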

5. High Cost of Management and Maintenance

Issues include plugin installation, node scaling, and configuration complexity. Solutions involve pre‑installing common plugins (IK Analyzer, Elasticsearch‑SQL, Smart Chinese, ICU), using containerized deployments for dynamic scaling, and standardizing JVM and thread‑pool settings.

# Install plugins
./bin/elasticsearch-plugin install https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v6.8.13/elasticsearch-analysis-ik-6.8.13.zip
./bin/elasticsearch-plugin install https://github.com/NLPchina/elasticsearch-sql/releases/download/6.8.13.0/elasticsearch-sql-6.8.13.0.zip
./bin/elasticsearch-plugin install analysis-smartcn
./bin/elasticsearch-plugin install analysis-icu

6. GC Frequency and Swap Usage

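Switch to G1GC for better heap management, and prevent the JVM heap from being swapped out by locking process memory with bootstrap.memory_lock: true (the setting was named bootstrap.mlockall before Elasticsearch 5.0, so the old name does not apply to 6.8 or 7.11).

# JVM options (jvm.options)
-XX:+UseG1GC
-XX:MaxGCPauseMillis=50

# Lock process memory to prevent swapping (elasticsearch.yml)
bootstrap.memory_lock: true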
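
Memory locking can fail silently, for example when the process lacks the required memlock limit, so it is worth verifying after startup. The nodes info API (`GET _nodes?filter_path=**.mlockall`) reports a `process.mlockall` flag per node. The sketch below scans such a response for nodes where the lock did not take effect; `nodes_without_memory_lock` is a hypothetical helper, and the input is assumed to be the parsed JSON of that request.

```python
from typing import Any, Dict, List


def nodes_without_memory_lock(nodes_info: Dict[str, Any]) -> List[str]:
    """Return the IDs of nodes whose process.mlockall flag is not true."""
    unlocked = []
    for node_id, info in nodes_info.get("nodes", {}).items():
        # A missing flag is treated as "not locked" to stay on the safe side.
        if not info.get("process", {}).get("mlockall", False):
            unlocked.append(node_id)
    return sorted(unlocked)
```

An empty result means every node reported a successful memory lock.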

7. Data Migration Across Clusters

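To allow cross-cluster reindexing, add the source cluster's addresses to reindex.remote.whitelist in elasticsearch.yml on the destination cluster. This is a static setting, so changing it requires a node restart. In the reindex request, the remote host must include the scheme and port, for example http://host:9200.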

# elasticsearch.yml
reindex.remote.whitelist: "192.169.*,192.168.*"

# Example reindex request
POST _reindex?pretty
{
  "source": {
    "remote": {
      "host": "${oldClusterHost}",
      "username": "${oldClusterUser}",
      "password": "${oldClusterPass}",
      "socket_timeout": "1m",
      "connect_timeout": "10s"
    },
    "index": "${indexName}",
    "size": 2000,
    "query": { "match_all": {} }
  },
  "dest": { "index": "${indexName}" }
}
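
When many indices are migrated, the request body above is typically generated rather than typed by hand. The sketch below builds the same body from parameters and rejects a remote host without a scheme, which is a common cause of failed requests. `build_remote_reindex_body` is a hypothetical helper; the timeouts and batch size simply mirror the example request.

```python
from typing import Any, Dict


def build_remote_reindex_body(host: str, username: str, password: str,
                              index: str, batch_size: int = 2000) -> Dict[str, Any]:
    """Build a reindex-from-remote body that copies one index under the same name."""
    if not host.startswith(("http://", "https://")):
        raise ValueError("remote host must include a scheme, e.g. http://host:9200")
    return {
        "source": {
            "remote": {
                "host": host,
                "username": username,
                "password": password,
                "socket_timeout": "1m",
                "connect_timeout": "10s",
            },
            "index": index,
            "size": batch_size,  # documents fetched per scroll batch
            "query": {"match_all": {}},
        },
        "dest": {"index": index},
    }
```

With the Python client, the result could be passed as `es.reindex(body=..., wait_for_completion=False)` so that large migrations run as background tasks.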

8. Version Fragmentation and Upgrade

Multiple legacy versions (2.x, 5.x, 6.x) cause maintenance difficulty. The plan is to standardize on 6.8 and 7.11, leveraging new features such as index sorting, shrink API, native REST client, and Zen2 coordination.

3. Elasticsearch Container Deployment

1. Official Container Image Optimizations

The Dockerfile modifies the distribution type, adds configuration files, certificates, optimized JVM options, and pre‑installs common plugins.

# Dockerfile snippets
RUN grep ES_DISTRIBUTION_TYPE=tar /usr/share/elasticsearch/bin/elasticsearch-env && \
    sed -i 's/ES_DISTRIBUTION_TYPE=tar/ES_DISTRIBUTION_TYPE=docker/' /usr/share/elasticsearch/bin/elasticsearch-env

RUN mkdir -p config data logs && chmod 0775 config data logs
COPY config/elasticsearch.yml config/log4j2.properties config/
COPY config/certs/elastic-certificates.p12 config/certs/
COPY config/jvm.options config/
RUN ./bin/elasticsearch-plugin install https://github.com/NLPchina/elasticsearch-sql/releases/download/6.8.13.0/elasticsearch-sql-6.8.13.0.zip && \
    ./bin/elasticsearch-plugin install analysis-smartcn && \
    ./bin/elasticsearch-plugin install analysis-icu && \
    ./bin/elasticsearch-plugin install repository-s3 && \
    ./bin/elasticsearch-plugin install https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v6.8.13/elasticsearch-analysis-ik-6.8.13.zip

2. Container Storage Options

HostPath and LocalVolume are evaluated; both have limitations such as lack of lifecycle management and capacity isolation. OpenEBS is chosen as a CSI solution that provides block‑level storage with per‑volume controllers.

3. ECK Cluster Deployment

Elastic Cloud on Kubernetes (ECK) is used to declaratively manage Elasticsearch clusters. The YAML snippet defines the cluster name, version, node sets (master, data, ingest), resource requests, affinity, tolerations, and init containers that set system limits.

apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: my-es-cluster
  namespace: elastic-system
spec:
  version: 7.11.0
  nodeSets:
  - name: master
    count: 3
    config:
      node.master: true
      node.data: false
      node.ingest: false
    podTemplate:
      spec:
        nodeSelector:
          schedule-only: "elasticsearch"
        tolerations:
        - key: "schedule-only"
          operator: "Equal"
          value: "elasticsearch"
          effect: "NoSchedule"
        initContainers:
        - name: sysctl
          securityContext:
            privileged: true
          command: ["sh", "-c", "ulimit -unlimited && sysctl -w vm.max_map_count=262144 vm.dirty_ratio=10 vm.swappiness=0 vm.dirty_background_ratio=5 && chown -R elasticsearch:root /usr/share/elasticsearch/data /usr/share/elasticsearch/logs"]
        containers:
        - name: elasticsearch
          resources:
            requests:
              memory: 4Gi
              cpu: 2
            limits:
              memory: 8Gi
          env:
          - name: ES_JAVA_OPTS
            value: "-Xms2g -Xmx2g"
          volumeMounts:
          - name: elasticsearch-data
            mountPath: /usr/share/elasticsearch/data
    volumeClaimTemplates:
    - metadata:
        name: elasticsearch-data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 100Gi
        storageClassName: openebs-es-lvm

4. Elasticsearch Automation Management Platform

1. Cluster Ticketing and Provisioning

Users submit a ticket and select a package. After approval, the platform automatically creates the cluster through ECK, binds it to a service tree, and returns the connection and credential details. The team reports an efficiency gain of roughly 400 % over manual provisioning.
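
One way to picture the provisioning step is a function that turns an approved ticket into an ECK manifest. The sketch below is only an illustration under stated assumptions: the package catalogue (`PACKAGES`), its sizes, and the helper `build_cluster_manifest` are hypothetical, since the article does not describe the platform's real packages or code. The manifest fields follow the ECK example shown earlier.

```python
from typing import Any, Dict

# Hypothetical package catalogue; the real packages are not described in the article.
PACKAGES = {
    "small": {"data_nodes": 2, "storage": "100Gi"},
    "large": {"data_nodes": 6, "storage": "500Gi"},
}


def build_cluster_manifest(name: str, package: str,
                           version: str = "7.11.0") -> Dict[str, Any]:
    """Build an ECK Elasticsearch manifest (as a dict) for a ticket's package."""
    if package not in PACKAGES:
        raise ValueError(f"unknown package: {package}")
    spec = PACKAGES[package]
    claim = {
        "metadata": {"name": "elasticsearch-data"},
        "spec": {
            "accessModes": ["ReadWriteOnce"],
            "resources": {"requests": {"storage": spec["storage"]}},
            "storageClassName": "openebs-es-lvm",
        },
    }
    return {
        "apiVersion": "elasticsearch.k8s.elastic.co/v1",
        "kind": "Elasticsearch",
        "metadata": {"name": name, "namespace": "elastic-system"},
        "spec": {
            "version": version,
            "nodeSets": [
                {"name": "master", "count": 3,
                 "config": {"node.master": True, "node.data": False,
                            "node.ingest": False}},
                {"name": "data", "count": spec["data_nodes"],
                 "config": {"node.master": False, "node.data": True,
                            "node.ingest": True},
                 "volumeClaimTemplates": [claim]},
            ],
        },
    }
```

The resulting dict could be serialized to YAML and applied to the Kubernetes API, leaving the ECK operator to create and reconcile the cluster.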

2. Cluster Management Features

The dashboard shows each cluster's health status, storage usage, index count, and shard count, and provides quick links to Kibana. Administrators can modify common settings (e.g., shard count per node, disk watermarks) and view real-time metrics such as JVM usage, CPU usage, query/write latency, pending tasks, GC stats, and request rates.
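
A dashboard row of this kind can be derived from the cluster health API (`GET _cluster/health`). The sketch below condenses a parsed health response into the fields a status panel might display and flags any non-green cluster. `summarize_health` and the output keys are hypothetical; the input field names are those returned by the health API.

```python
from typing import Any, Dict


def summarize_health(health: Dict[str, Any]) -> Dict[str, Any]:
    """Condense a _cluster/health response into dashboard fields."""
    status = health.get("status", "unknown")
    return {
        "status": status,
        # Anything other than green (yellow, red, unknown) should raise an alert.
        "needs_attention": status != "green",
        "nodes": health.get("number_of_nodes", 0),
        "active_shards": health.get("active_shards", 0),
        "unassigned_shards": health.get("unassigned_shards", 0),
        "pending_tasks": health.get("number_of_pending_tasks", 0),
    }
```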

3. Index Management

Functions include creating indices, viewing settings and mappings, opening/closing/deleting indices, and checking index storage size.

4. Security and Access Control

Supports user creation, role assignment, and permission granularity (read‑only, write‑only, full access). Security is enabled via xpack.security.enabled: true and TLS settings in elasticsearch.yml.

# elasticsearch.yml security settings
xpack.security.enabled: true
xpack.security.transport.ssl.enabled: true
xpack.security.transport.ssl.verification_mode: certificate
xpack.security.transport.ssl.keystore.path: certs/elastic-certificates.p12
xpack.security.transport.ssl.truststore.path: certs/elastic-certificates.p12
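
The three permission levels can be expressed as role definitions for the security role API (`PUT /_security/role/<name>` on 7.x, `/_xpack/security/role/<name>` on 6.x). The sketch below builds the request body for one level. The mapping from level names to index privileges is an assumption, not the platform's actual definition, and `build_role_body` is a hypothetical helper; `read`, `view_index_metadata`, `write`, and `all` are built-in index privileges.

```python
from typing import Any, Dict, List

# Assumed mapping from the platform's permission levels to index privileges.
LEVEL_PRIVILEGES = {
    "read_only": ["read", "view_index_metadata"],
    "write_only": ["write"],
    "full_access": ["all"],
}


def build_role_body(index_patterns: List[str], level: str) -> Dict[str, Any]:
    """Build a role definition granting one permission level on some indices."""
    if level not in LEVEL_PRIVILEGES:
        raise ValueError(f"unknown permission level: {level}")
    if not index_patterns:
        raise ValueError("at least one index pattern is required")
    return {
        "indices": [
            {"names": list(index_patterns), "privileges": LEVEL_PRIVILEGES[level]}
        ]
    }
```

For example, `build_role_body(["app-log-*"], "read_only")` yields a role that can search matching indices but cannot modify them.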

5. Future Exploration

Planned features include full lifecycle management (freezing indices, hot‑cold tiering), advanced backup to S3 or HDFS, data import/export pipelines (Kafka, MySQL), and automated billing integration.

# Example S3 repository plugin installation
bin/elasticsearch-plugin install repository-s3
# Add credentials to keystore
bin/elasticsearch-keystore add s3.client.default.access_key
bin/elasticsearch-keystore add s3.client.default.secret_key
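
Once the plugin is installed and the credentials are in the keystore, a snapshot repository still has to be registered with `PUT _snapshot/<repository_name>`. The sketch below builds that request body. The bucket and path values passed in are placeholders, and `build_s3_repository_body` is a hypothetical helper; `bucket`, `base_path`, and `client` are settings of the S3 repository type.

```python
from typing import Any, Dict


def build_s3_repository_body(bucket: str, base_path: str = "",
                             client: str = "default") -> Dict[str, Any]:
    """Build the body for registering an S3 snapshot repository."""
    if not bucket:
        raise ValueError("bucket name is required")
    # "client" selects the keystore entries named s3.client.<client>.*
    settings = {"bucket": bucket, "client": client}
    if base_path:
        settings["base_path"] = base_path
    return {"type": "s3", "settings": settings}
```

With the Python client, the body could be sent through `es.snapshot.create_repository(repository=..., body=...)`.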

Author: Chen Kewei – middleware operations and development specialist at Yiche.com, focusing on private‑cloud Kafka, RocketMQ, Elasticsearch, ClickHouse, etc.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
