Tagged articles
6 articles
dbaplus Community
Mar 5, 2024 · Operations

How to Recover a Failing Elasticsearch Cluster: Master Loss, Shard Corruption, and More

This guide explains Elasticsearch cluster architecture, node roles, and metadata storage, then details step‑by‑step recovery procedures for master‑node loss, complete master outage, data‑node failures, shard allocation problems, corrupted shards, translog issues, and missing segment files, including relevant API commands and tool usage.

Cluster Recovery · Data Node · Elasticsearch
17 min read
vivo Internet Technology
Nov 22, 2023 · Operations

Investigation and Resolution of Elasticsearch node_concurrent_recoveries Performance Issue

The team traced read‑request timeouts to a single overloaded Elasticsearch node, where an excessively high node_concurrent_recoveries setting allowed too many shard recoveries and disk‑watermark‑driven relocations to run at once. They resolved the issue by lowering concurrent recoveries, enabling adaptive replica selection, and adjusting allocation settings.

CPU · Cluster · Disk Watermark
16 min read
ITPUB
Jul 12, 2023 · Databases

How We Migrated a Multi‑Petabyte Elasticsearch Cluster Across Data Centers Without Downtime

This article details the end‑to‑end process of moving Qunar's massive Elasticsearch logging cluster from a saturated data center to a new facility. It covers background constraints, migration planning, manual and automated steps, performance‑tuning parameters, shard‑balancing techniques, and the final outcomes.

Cluster Migration · Data center · Elasticsearch
21 min read
dbaplus Community
Apr 24, 2023 · Operations

Why Your Elasticsearch Cluster Stalls at Red and How to Recover It Fast

An Elasticsearch cluster at a large foreign enterprise, holding 10 TB of data across 200 shards, got stuck in a red state after a restart. The article walks through the diagnosis and a step‑by‑step recovery plan covering shard actions, recovery API tuning, delayed allocation, speed limits, and cautious index deletion to restore normal operation.

Cluster Recovery · Index Management · Recovery API
10 min read
MaGe Linux Operations
Oct 12, 2020 · Databases

How to Diagnose and Fix Elasticsearch Cluster Health Issues

This guide explains how to monitor Elasticsearch cluster health, interpret green/yellow/red statuses, troubleshoot unassigned shards, adjust JVM and system settings, resolve common configuration errors, and use scripts and APIs to keep your ELK stack stable and performant.

Elasticsearch · Shard Allocation · Split-Brain Prevention
28 min read
Architect
May 15, 2020 · Databases

Understanding Elasticsearch Architecture: Segments, Translog, Refresh, Shard Allocation and Cluster Operations

This article provides a comprehensive overview of Elasticsearch's internal architecture, explaining how data flows from memory buffers to Lucene segments, the role of refresh and translog for durability, segment merging strategies, shard routing, replica consistency, allocation controls, hot‑cold data separation, and cluster discovery settings.

Cluster Management · Elasticsearch · Segments
23 min read