Tagged articles

Shard Allocation

6 articles · Page 1 of 1

Mar 5, 2024 · Operations

How to Recover a Failing Elasticsearch Cluster: Master Loss, Shard Corruption, and More

This guide explains Elasticsearch cluster architecture, node roles, and metadata storage, then details step‑by‑step recovery procedures for master‑node loss, complete master outage, data‑node failures, shard allocation problems, corrupted shards, translog issues, and missing segment files, including relevant API commands and tool usage.

Cluster Recovery · Data Node · Elasticsearch

0 likes · 17 min read

vivo Internet Technology

Nov 22, 2023 · Operations

Investigation and Resolution of Elasticsearch node_concurrent_recoveries Performance Issue

The team traced read‑request timeouts to a single overloaded Elasticsearch node where an excessively high node_concurrent_recoveries setting caused many simultaneous shard recoveries and disk‑watermark‑driven relocations, and resolved the issue by lowering concurrent recoveries, enabling adaptive replica selection, and adjusting allocation settings.

CPU · Disk Watermark · Elasticsearch

0 likes · 16 min read

ITPUB

Jul 12, 2023 · Databases

How We Migrated a Multi‑Petabyte Elasticsearch Cluster Across Data Centers Without Downtime

This article details the end‑to‑end process of moving Qunar's massive Elasticsearch logging cluster from a saturated data center to a new facility, covering background constraints, migration planning, manual and automated steps, performance‑tuning parameters, shard‑balancing techniques, and the final outcomes.

Cluster Migration · Data Center · Elasticsearch

0 likes · 21 min read

dbaplus Community

Apr 24, 2023 · Operations

Why Your Elasticsearch Cluster Stalls at Red and How to Recover It Fast

An Elasticsearch cluster at a large foreign enterprise, holding 10 TB of data across 200 shards, got stuck in a red state after a restart. The article walks through the diagnosis and a step‑by‑step recovery plan that covers shard actions, recovery API tuning, delayed allocation, recovery speed limits, and cautious index deletion to restore normal operation.

Cluster Recovery · Index management · Recovery API

0 likes · 10 min read

MaGe Linux Operations

Oct 12, 2020 · Databases

How to Diagnose and Fix Elasticsearch Cluster Health Issues

This guide explains how to monitor Elasticsearch cluster health, interpret green/yellow/red statuses, troubleshoot unassigned shards, adjust JVM and system settings, resolve common configuration errors, and use scripts and APIs to keep your ELK stack stable and performant.

Cluster health · Elasticsearch · JVM Tuning

0 likes · 28 min read

Architect

May 15, 2020 · Databases

Understanding Elasticsearch Architecture: Segments, Translog, Refresh, Shard Allocation and Cluster Operations

This article provides a comprehensive overview of Elasticsearch's internal architecture, explaining how data flows from memory buffers to Lucene segments, the role of refresh and translog for durability, segment merging strategies, shard routing, replica consistency, allocation controls, hot‑cold data separation, and cluster discovery settings.

Elasticsearch · Indexing · Segments

0 likes · 23 min read
