Tagged articles
162 articles
MaGe Linux Operations
May 3, 2026 · Cloud Native

How to Troubleshoot Kubernetes NotReady Nodes: A Complete Step‑by‑Step Guide

This article walks Kubernetes operators through a systematic investigation of NotReady node symptoms, explaining the kubelet status mechanism, detailing each diagnostic step—from verifying node conditions with kubectl to checking kubelet, container runtime, resources, network, and certificates—and providing concrete remediation and preventive measures.

Kubernetes · NotReady · containerd
0 likes · 35 min read
ITPUB
Apr 27, 2026 · Cloud Native

Why Skipping Backups Makes Kubernetes Operations Impossible

The article explains that running production Kubernetes clusters without regular backup and recovery plans exposes businesses to severe risks such as cluster failures, data loss, and prolonged downtime, and it details practical etcd physical and Velero logical backup strategies to mitigate these threats.

Backup · Cloud Native · Kubernetes
0 likes · 9 min read
Raymond Ops
Mar 6, 2026 · Cloud Native

Scaling Kubernetes from 1k to 5k Nodes: Complete Performance Tuning Playbook

This article presents a comprehensive, real‑world guide for expanding a Kubernetes cluster from 1,000 to 5,000 nodes, covering control‑plane HA, etcd optimization, network and scheduler tuning, monitoring, and automation, with detailed configurations, code snippets, and a step‑by‑step case study of a large‑scale production environment.

CNI · Control Plane · cluster scaling
0 likes · 22 min read
Code Wrench
Jan 13, 2026 · Backend Development

Unlocking etcd: Deep Dive into Go’s Distributed Key‑Value Engine

This article offers a thorough source‑code walkthrough of etcd v3.5+, revealing how its Go‑based architecture implements the Raft consensus algorithm, MVCC storage with BoltDB, efficient network communication via rafthttp, and Go concurrency patterns, while providing practical operational insights for performance tuning and reliability.

Backend · Go · MVCC
0 likes · 12 min read
Ray's Galactic Tech
Dec 12, 2025 · Cloud Native

Inside the Kubernetes Master: A Complete Breakdown of Core Components

Master nodes act as the brain of a Kubernetes cluster, hosting essential components such as kube‑apiserver, etcd, kube‑scheduler, kube‑controller‑manager and optionally cloud‑controller‑manager, each with distinct roles, high‑availability designs, security considerations, and operational workflows that together orchestrate and maintain cluster state.

Control Plane · Master Node · Scheduler
0 likes · 8 min read
Ray's Galactic Tech
Nov 30, 2025 · Cloud Native

Mastering etcd: The Core of Kubernetes State Management and High‑Availability

etcd is the distributed, strongly consistent key‑value store that serves as Kubernetes' single source of truth, handling all cluster state data; this guide explains its architecture, data model, watch mechanism, high‑availability deployment, backup, monitoring, security, and operational best practices for reliable cluster management.

Kubernetes · distributed storage · etcd
0 likes · 8 min read
dbaplus Community
Nov 24, 2025 · Operations

How We Rescued a Critical etcd Outage in 4 Hours: Step‑by‑Step Recovery Guide

A midnight Kubernetes disaster caused API server timeouts, etcd health failures, and a full service outage, prompting a detailed investigation, root‑cause analysis of massive database fragmentation, and a four‑stage emergency recovery that restored the cluster within 4 hours while outlining preventive measures.

Kubernetes · Operations · database fragmentation
0 likes · 10 min read
Ops Community
Oct 12, 2025 · Operations

When etcd Certificates Expire: How One Failure Crippled an Entire Kubernetes Cluster

A midnight alarm revealed that an expired etcd TLS certificate caused a cascade of failures across a Kubernetes cluster, leading to a full outage that took over half an hour to diagnose, remediate, and restore, highlighting the critical need for proactive certificate management and automated monitoring.

Cluster Recovery · Kubernetes · certificate expiration
0 likes · 44 min read
Raymond Ops
Sep 16, 2025 · Cloud Native

How to Build a Secure High‑Availability Etcd Cluster on Linux

This guide walks through installing etcd, configuring a three‑node high‑availability cluster with TLS certificates, setting up host files, disabling SELinux and firewalld, creating a Certificate Authority using cfssl, generating node certificates, distributing them, and finally deploying and verifying the cluster on Linux systems.

Certificate · Cloud Native · Linux
0 likes · 19 min read
Code Wrench
Sep 5, 2025 · Backend Development

Mastering Distributed Locks in Go: Principles, Implementations, and Pitfalls

This article explains the fundamentals of distributed locks, compares Redis, etcd, ZooKeeper and database approaches, provides practical Go code examples, highlights common mistakes, and offers optimization tips so developers can confidently apply the right locking strategy in real-world systems.

Backend · Golang · ZooKeeper
0 likes · 12 min read
MaGe Linux Operations
Jul 23, 2025 · Operations

How We Rescued a Crashed K8s Cluster: etcd 100% Fragmentation Recovery

This article details a P0 production incident where a Kubernetes cluster became completely unresponsive due to 100% etcd database fragmentation, describing the step‑by‑step diagnosis, emergency recovery actions, root‑cause analysis, and long‑term preventive measures for reliable cluster operation.

Cluster Recovery · Kubernetes · Operations
0 likes · 12 min read
Baidu Tech Salon
Jun 17, 2025 · Operations

How Baidu Scaled Its Vertical Search: Elastic Scheduling and Data Management Secrets

This article explains how Baidu's vertical search platform tackled massive data growth and scaling challenges by redesigning its data management system, introducing elastic scheduling, decoupling ETCD access, implementing auto‑scaling, and advancing shard expansion to improve performance, stability, and cost efficiency.

Auto Scaling · Data Management · Search Architecture
0 likes · 18 min read
Full-Stack DevOps & Kubernetes
May 28, 2025 · Operations

How to Fix etcd “NOSPACE” Errors in Kubernetes Clusters

When a Kubernetes cluster’s etcd reaches its default 2 GB quota, it triggers a “NOSPACE” alarm that blocks all write operations, causing critical services to fail; this guide explains the root cause, how to diagnose the issue with etcdctl, and step‑by‑step remediation including compaction, defragmentation, and quota expansion.

Kubernetes · NOSPACE · compaction
0 likes · 7 min read
Linux Ops Smart Journey
Apr 25, 2025 · Cloud Native

How to Seamlessly Migrate Calico from etcd to Kubernetes Datastore

Learn step‑by‑step how to transition Calico’s data store from etcd to Kubernetes, covering prerequisite checks, locking the datastore, exporting and importing data, reconfiguring calicoctl, applying the new manifests, and unlocking the store, while highlighting benefits and tips for a smooth migration.

Calico · Kubernetes · Network Policy
0 likes · 11 min read
MaGe Linux Operations
Mar 13, 2025 · Operations

How to Build a Secure High‑Availability Etcd Cluster on Linux

This guide walks through installing etcd, generating TLS certificates with cfssl, configuring static, dynamic, or DNS‑based discovery, setting up systemd service files for three nodes, and verifying cluster health using etcdctl, providing a complete step‑by‑step deployment for a production‑grade, cloud‑native key‑value store.

TLS · etcd · high availability
0 likes · 19 min read
Linux Cloud Computing Practice
Nov 5, 2024 · Cloud Native

How to Build a High‑Availability Kubernetes Cluster: Kubeadm & Binary Package Guide

This comprehensive tutorial walks you through planning, preparing hardware, choosing deployment methods, and step‑by‑step installation of a highly available Kubernetes cluster using kubeadm and manual binary packages, covering system initialization, certificate generation, component configuration, CNI networking, and cluster verification.

CNI · Docker · Kubernetes
0 likes · 28 min read
Su San Talks Tech
Sep 30, 2024 · Backend Development

How JD’s Hotkey Framework Detects and Pushes Hot Data in Milliseconds

JD’s Hotkey framework provides millisecond‑level detection and cluster‑wide push of hot data, users, and interfaces, dramatically reducing backend query load, improving performance, and supporting scenarios such as local caching and rate limiting, with proven scalability demonstrated in large‑scale e‑commerce promotions.

Backend Performance · Java · distributed caching
0 likes · 7 min read
FunTester
Jul 15, 2024 · Backend Development

Comparison and Practical Guide to Java etcd Clients

This article compares popular Java etcd client libraries, evaluates their features, performance, and suitability, and provides a hands‑on tutorial using jetcd with Maven dependencies, code examples for watching and reading keys, and discusses runtime considerations such as thread handling.

Distributed · Java · etcd
0 likes · 10 min read
FunTester
Jul 1, 2024 · Cloud Native

Mastering etcd with Go: From Basics to Distributed Locks

This article introduces etcd as a reliable distributed key‑value store built on Raft, outlines its key features and common use cases such as service discovery and configuration management, and provides a complete Go tutorial covering dependency setup, server launch, client implementation, read/write testing, and distributed lock usage.

Configuration Management · Go · Raft
0 likes · 10 min read
Su San Talks Tech
Jun 11, 2024 · Backend Development

Choosing the Right Service Registry: Zookeeper, Eureka, Nacos, Consul, and Etcd Compared

This comprehensive guide explains the fundamentals, CAP trade‑offs, and core algorithms of service registries, then details Zookeeper, Eureka, Nacos, Consul, and Etcd features, compares them across health checks, multi‑datacenter support, KV storage, and provides practical selection advice for developers and architects.

Consul · Nacos · ZooKeeper
0 likes · 23 min read
Ops Development Stories
Apr 12, 2024 · Cloud Native

Mastering etcd: Architecture, Monitoring & Performance Tuning

This article provides a comprehensive overview of etcd—including its origins, role in Kubernetes, version evolution, layered architecture, key terminology, operational commands, monitoring metrics, benchmarking procedures, disk‑performance testing, and tuning recommendations—for building reliable cloud‑native clusters.

Benchmark · cloud-native · distributed storage
0 likes · 17 min read
Liangxu Linux
Mar 7, 2024 · Operations

How Upgrading EBS Volumes Boosted etcd Write Performance by 30%

A technical deep‑dive shows how a team managing dozens of Kubernetes clusters diagnosed a write‑ahead‑log bottleneck in etcd, measured IOPS and latency with etcdctl and fio, upgraded gp2 volumes to gp3, and discovered diminishing returns beyond 3000 IOPS while explaining the role of fdatasync in storage performance.

AWS · EBS · IOPS
0 likes · 11 min read
Architect
Feb 29, 2024 · Cloud Native

Which Service Registry Should You Choose? Zookeeper, Eureka, Nacos, Consul, or Etcd

This comprehensive guide analyzes the core concepts, CAP trade‑offs, consensus algorithms, and practical deployment details of Zookeeper, Eureka, Nacos, Consul, and Etcd, providing concrete examples and selection criteria to help engineers and architects decide the most suitable service registry for their micro‑service environments.

CAP theorem · Consul · Microservices
0 likes · 26 min read
Beike Product & Technology
Jan 29, 2024 · Information Security

Kubernetes Security Risks and Hardening Recommendations

This article analyzes Kubernetes security threats from cloud, cluster, and container perspectives, enumerates high‑risk permissions, default privileged accounts, and insecure configurations, and provides concrete hardening steps such as least‑privilege RAM policies, etcd encryption, RBAC tightening, and workload isolation measures.

CloudNative · Kubernetes · PodSecurity
0 likes · 31 min read
Tencent Cloud Developer
Jan 24, 2024 · Backend Development

Understanding the Safety of Redis Distributed Locks and the Redlock Debate

Redis distributed locks require unique identifiers, atomic Lua releases, and TTL refreshes to avoid deadlocks, while the Redlock algorithm adds majority quorum but remains vulnerable to clock drift and client pauses, so critical systems should combine it with fencing tokens or version checks for true safety.

Redlock · ZooKeeper · concurrency
0 likes · 36 min read
Efficient Ops
Dec 13, 2023 · Cloud Native

How to Build Your Own Kubernetes‑Style Container Orchestration System

This article walks through the evolution from a single‑machine Java monolith to a distributed, container‑based platform, detailing master‑worker roles, core Kubernetes‑like components, networking, scheduling, and plug‑ins for a complete cloud‑native orchestration solution.

Cloud Native · Kubernetes · container orchestration
0 likes · 8 min read
Aikesheng Open Source Community
Dec 6, 2023 · Backend Development

Comparison of Consistency Read Implementations in Consul and etcd

This article compares the consistency read mechanisms of the distributed key‑value stores Consul and etcd, detailing Consul’s three read modes and leader‑forwarding logic, and explaining etcd’s serializable and linearizable reads, including the internal notification and index‑checking processes.

Backend Development · Consistency Read · Consul
0 likes · 6 min read
Efficient Ops
Dec 4, 2023 · Cloud Native

How Does a Kubernetes Pod Get Created? Step‑by‑Step Walkthrough

This article walks through the complete Kubernetes pod creation workflow, from submitting the YAML with kubectl to the API server, storing the definition in etcd, scheduling, kubelet orchestration, container runtime delegation, CNI networking, health probing, and endpoint setup for services.

CNI · Kubernetes · Pod Lifecycle
0 likes · 3 min read
DevOps Cloud Academy
Aug 2, 2023 · Cloud Native

Backing Up and Restoring etcd in a Kubernetes Cluster

This tutorial walks through installing the etcd client, creating an Nginx deployment for verification, backing up the etcd data store, validating the backup, and restoring the backup to a Kubernetes cluster while handling component shutdown and restart procedures.

Cloud Native · DevOps · Kubernetes
0 likes · 14 min read
Efficient Ops
Jul 11, 2023 · Operations

Why Did Our kube-apiserver OOM? A Deep Dive into Kubernetes Control-Plane Failures

This article details a real-world Kubernetes control‑plane outage where kube‑apiserver repeatedly OOM‑killed, explores cluster metrics, logs, heap and goroutine profiles, hypothesizes root causes such as etcd latency and DeleteCollection memory leaks, and offers step‑by‑step troubleshooting and prevention guidance.

OOM · etcd · kube-apiserver
0 likes · 21 min read
Open Source Linux
Apr 21, 2023 · Cloud Native

Mastering Kubernetes Architecture: How Control Plane and Worker Nodes Work Together

This article explains the core components of Kubernetes architecture—including the control plane (etcd, API server, controller manager, scheduler) and worker node components (kubelet, kube-proxy, container runtimes)—detailing their roles, interactions, and best‑practice considerations for maintaining healthy, scalable clusters.

Control Plane · Kubernetes · Scheduler
0 likes · 12 min read
Liangxu Linux
Apr 16, 2023 · Backend Development

Mastering API Gateways: Concepts, Features, and a Traefik‑Based Custom Solution

This article provides a comprehensive overview of API gateway fundamentals, compares popular open‑source gateways, and details a custom Traefik‑based microservice gateway architecture with routing, authentication, protocol conversion, and high‑performance connection pooling.

Backend Architecture · Traefik · api-gateway
0 likes · 18 min read
Cloud Native Technology Community
Feb 1, 2023 · Cloud Native

Why Is Kubernetes So Hard to Master? A Step‑by‑Step Overview

This article breaks down the core concepts of Kubernetes—including its master‑worker architecture, pod scheduling, etcd storage, service exposure, scaling mechanisms, and controller interactions—through a series of clear questions and illustrated answers to help beginners grasp the platform’s complexity.

Cloud Native · Kubernetes · Pod Scheduling
0 likes · 8 min read
MaGe Linux Operations
Nov 6, 2022 · Cloud Native

How to Safely Shut Down and Restart a Kubernetes Cluster

This guide walks you through the essential steps, commands, and precautions for safely draining nodes, backing up applications, CRDs, and etcd, then shutting down and later restarting a Kubernetes cluster while avoiding common pitfalls.

Backup · Cluster Maintenance · Kubernetes
0 likes · 6 min read
Open Source Linux
Oct 14, 2022 · Cloud Native

Why Did Our kube-apiserver OOM? A Deep Dive into Kubernetes Control‑Plane Failures

On September 10, 2021, a Kubernetes cluster experienced intermittent kubectl hangs caused by kube-apiserver OOM kills, leading to cascading control-plane failures; this article details the environment, observed metrics, log analysis, code inspection of DeleteCollection, and provides troubleshooting steps to prevent similar incidents.

OOM · cloud-native · etcd
0 likes · 21 min read
Practical DevOps Architecture
Sep 15, 2022 · Cloud Native

Brief Overview of etcd and Kubernetes: Features, Use Cases, and Core Components

This article provides a concise overview of etcd and Kubernetes, detailing etcd’s features and use‑cases, explaining Kubernetes fundamentals, its relationship with Docker, and describing key components such as Minikube, Kubectl, Kubelet, common deployment methods, and the platform’s cluster management architecture.

Cloud Native · Distributed Systems · etcd
0 likes · 6 min read
Top Architect
Sep 3, 2022 · Backend Development

Implementing Distributed Locks with Redis, Zookeeper, and etcd

The article explains how to build reliable distributed locks using Redis, Zookeeper, and etcd, describing the essential concepts of mutual exclusion, safety, and liveness, showing code examples, highlighting common issues, and comparing each solution's advantages and drawbacks.

ZooKeeper · concurrency · distributed-lock
0 likes · 6 min read
Tencent Cloud Developer
Aug 29, 2022 · Cloud Computing

High‑Availability DNS Solutions on Tencent Cloud: BIND and CoreDNS with ETCD

The article details two high‑availability DNS implementations for Tencent Cloud—an intelligent BIND‑based server and a CoreDNS solution backed by an ETCD cluster—covering DNS fundamentals, installation steps, configuration files, zone creation, health checks, and verification of internal and external name resolution across multi‑AZ deployments.

BIND · CoreDNS · DNS
0 likes · 24 min read
Efficient Ops
Aug 9, 2022 · Operations

Why Did kube-apiserver OOM? A Deep Dive into Kubernetes Control-Plane Failures

This article analyzes a September 2021 incident where a Kubernetes cluster’s kube-apiserver repeatedly OOM-killed, causing kubectl hangs, by examining cluster specs, monitoring data, logs, heap and goroutine profiles, and the DeleteCollection implementation, ultimately offering troubleshooting steps and preventive measures for control-plane stability.

Goroutine · OOM · cloud-native
0 likes · 20 min read
Architecture Digest
Jul 27, 2022 · Databases

Comprehensive Guide to etcd: Overview, Architecture, Deployment, and Usage

This article provides a detailed introduction to etcd, covering its purpose as a highly‑available distributed key‑value store, core Raft‑based architecture, key concepts, common application scenarios, step‑by‑step installation and cluster deployment, as well as essential command‑line operations for managing data, backups, and cluster members.

Backup · Deployment · Kubernetes
0 likes · 26 min read
Top Architect
Jul 23, 2022 · Cloud Native

Comprehensive Guide to etcd: Overview, Architecture, Deployment, and Usage

This article provides a detailed introduction to etcd, covering its purpose as a highly available distributed key‑value store, core concepts like Raft consensus, key features, common use cases such as service discovery and configuration management, step‑by‑step installation for single‑node and cluster deployments, and essential etcdctl commands for managing data and cluster members.

Cloud Native · Configuration Management · Distributed Systems
0 likes · 24 min read
Architect
Jul 21, 2022 · Cloud Native

Comprehensive Guide to etcd: Overview, Architecture, Installation, and Usage

This article provides a thorough introduction to etcd, covering its purpose, history, core features, key terminology, internal architecture, common application scenarios such as service discovery and distributed locking, step‑by‑step installation and cluster deployment, essential command‑line operations, backup procedures, and practical recommendations.

Installation · Kubernetes · command-line
0 likes · 25 min read
Open Source Linux
Jun 16, 2022 · Cloud Native

Mastering Kubernetes Control Plane: etcd, API Server, Scheduler, and Nodes

This article explains the key Kubernetes control‑plane components—including etcd, the API Server, Controller Manager, Scheduler, as well as worker‑node components like Kubelet, kube‑proxy, and the container runtime—detailing their roles, interactions, and the underlying mechanisms such as Raft consensus and admission control.

API Server · Control Plane · Kubernetes
0 likes · 10 min read
MaGe Linux Operations
May 25, 2022 · Operations

Why Kubernetes LIST Requests Can Cripple Your Cluster and How to Fix Them

This article examines how heavy LIST operations in unstructured storage systems like Ceph and etcd consume massive I/O, network and CPU, threaten cluster stability, and offers detailed code analysis, performance testing, and practical tuning recommendations to keep large‑scale Kubernetes clusters reliable.

Kubernetes · List · Scalability
0 likes · 29 min read
Yiche Technology
May 20, 2022 · Cloud Native

APISIX API Gateway: Architecture, Features, Performance Comparison, and Future Outlook

This article introduces the APISIX API gateway, explaining its cloud‑native architecture built on OpenResty and Etcd, the advantages over traditional monolithic service frameworks, detailed feature breakdowns, performance benchmark comparisons with OpenResty, multi‑cluster management practices, usage scenarios, monitoring, logging, and future development directions.

APISIX · Cloud Native · OpenResty
0 likes · 12 min read
Open Source Linux
May 12, 2022 · Cloud Native

Mastering Kubernetes Control Plane: etcd, API Server, Scheduler & More

This article explains the core components of the Kubernetes control plane—including etcd, the API Server, Controller Manager, Scheduler—as well as key worker‑node components like Kubelet, kube‑proxy, and the container runtime, detailing their roles, interactions, and essential functions.

API Server · Control Plane · Kubernetes
0 likes · 11 min read
Cloud Native Technology Community
May 10, 2022 · Cloud Native

How PayPal Scaled Kubernetes to 4,100 Nodes and 200k Pods

PayPal’s engineering team detailed their journey of scaling Kubernetes from a few hundred nodes to over 4,100 nodes and 200,000 Pods, describing cluster topology, workload generation, API server bottlenecks, controller manager and scheduler tuning, extensive etcd optimizations, and the resulting performance gains that met Kubernetes SLOs.

Cloud Native · Kubernetes · PayPal
0 likes · 13 min read
Architecture Digest
Apr 25, 2022 · Cloud Native

Kubernetes Architecture Overview and Detailed Components

This article explains the goals, design principles, and detailed components of Kubernetes architecture, covering its control plane, API server, etcd store, scheduler, kubelet, container runtime, and kube-proxy, and summarizes how these parts work together to provide a scalable, portable, and automated container orchestration platform.

Control Plane · Kubernetes · container orchestration
0 likes · 12 min read
Open Source Linux
Mar 17, 2022 · Cloud Native

How PayPal Scaled Kubernetes to 4,000 Nodes and 200,000 Pods

PayPal’s engineering team detailed their journey of scaling Kubernetes from a few hundred nodes to over 4,000 nodes and 200,000 pods, describing the cluster topology, workload generation, bottlenecks in the API server, controller manager, scheduler, and etcd, and the optimizations that enabled stable performance at massive scale.

Cloud Native · Kubernetes · PayPal
0 likes · 12 min read
Architect
Feb 18, 2022 · Cloud Native

Large‑Scale etcd Cluster Performance Optimization and Pod Data Splitting in Ant Group’s Sigma

This article describes how Ant Group tackled the performance ceiling of its massive Sigma Kubernetes clusters by horizontally splitting etcd storage for Pods, Leases and Events, redesigning watch handling to avoid component restarts, and using snapshot‑based migration to preserve data integrity while reducing latency.

Cluster Performance · Data Migration · Kubernetes
0 likes · 27 min read
Top Architect
Feb 17, 2022 · Cloud Native

Understanding etcd: Features, Use Cases, and Comparison with Zookeeper

This article provides a comprehensive overview of etcd, describing its purpose as a distributed, reliable key‑value store, outlining its core features, detailing multiple real‑world scenarios such as service discovery, configuration management, load balancing, distributed locking, and comparing its advantages over Zookeeper.

Raft · Zookeeper comparison · distributed key-value store
0 likes · 15 min read
ITFLY8 Architecture Home
Feb 15, 2022 · Operations

Why etcd Is the Backbone of Modern Distributed Systems

This article explains what etcd is, its origins, core features such as simplicity, security, speed, and reliability, and details eight practical scenarios—including service discovery, messaging, load balancing, distributed coordination, locks, queues, monitoring, and leader election—showing why it often outperforms Zookeeper in cloud‑native environments.

Raft · etcd · key-value store
0 likes · 15 min read
Architecture Digest
Feb 13, 2022 · Cloud Native

What Is etcd? Features, Use Cases, and Comparison with Zookeeper

This article explains the distributed key‑value store etcd, its origin, core characteristics such as simplicity, security, speed and Raft‑based reliability, and details eight practical scenarios—including service discovery, pub/sub, load balancing, distributed locks and leader election—while also comparing it with Zookeeper.

Configuration Management · Raft · distributed key-value store
0 likes · 15 min read
Architect
Jan 12, 2022 · Cloud Native

Service Governance and etcd: Architecture, Core Technologies, and Large‑Scale Implementation

This article explains service governance concepts, the challenges of managing thousands of micro‑services, introduces etcd and its Raft‑based consistency model, details BoltDB storage internals, and describes Baidu's large‑scale Tianlu platform with its high‑availability, performance, scalability, and operational metrics.

Distributed Systems · etcd · service governance
0 likes · 21 min read
Efficient Ops
Nov 30, 2021 · Cloud Native

How to Safely Backup and Restore etcd in a Kubernetes Cluster

This guide explains why etcd is critical for Kubernetes, walks through creating snapshots with etcdctl, automating backups via scripts and cron, and details step‑by‑step procedures for restoring a failed etcd cluster, including stopping services, cleaning data directories, and restarting components to recover the whole cluster.

Backup, Restore, cloud-native
0 likes · 16 min read
Baidu Intelligent Testing
Nov 16, 2021 · Cloud Native

Service Governance and etcd: Concepts, Raft & BoltDB Implementation, and Large‑Scale Practices at Baidu

This article introduces service governance fundamentals, explains how etcd’s Raft‑based consensus and BoltDB storage work, compares etcd with ZooKeeper and Consul, and describes Baidu’s large‑scale, high‑availability, high‑performance service‑governance platform built on these technologies.

BoltDB, Cloud Native, Raft
0 likes · 20 min read
Baidu Geek Talk
Nov 10, 2021 · Operations

How etcd Powers Scalable Service Governance: Raft, BoltDB, and Real‑World Practices

This article explores service governance fundamentals, examines why etcd’s Raft‑based consensus and BoltDB storage make it ideal for large‑scale systems, compares it with ZooKeeper and Consul, and shares Baidu’s practical architecture, performance tricks, and operational metrics for high‑availability, high‑performance service management.

BoltDB, Distributed Systems, Performance Optimization
0 likes · 23 min read
360 Tech Engineering
Sep 9, 2021 · Databases

PostgreSQL High‑Availability Cluster Deployment with Patroni and Etcd

This article details the design, deployment, configuration, operation, monitoring, and backup of a PostgreSQL high‑availability cluster built on Patroni, Etcd, and LVS at 360, covering hardware layout, software versions, installation steps, parameter tuning, fail‑over testing, and future outlook.

Backup, Cluster, Patroni
0 likes · 16 min read
High Availability Architecture
Aug 31, 2021 · Cloud Native

High‑Availability Architecture for etcd in Ant Group’s Massive Kubernetes Clusters

The article describes how Ant Group operates one of the world's largest Kubernetes deployments, with over 10,000 nodes, details the performance challenges of the etcd key‑value store at such scale, and outlines a comprehensive set of hardware upgrades, configuration tuning, monitoring, data‑splitting, and future distributed‑etcd strategies to achieve robust high‑availability.

etcd, performance tuning, scale-out
0 likes · 21 min read
Open Source Linux
Jul 25, 2021 · Cloud Native

Demystifying Kubernetes: Core Components and How They Work Together

This article provides a concise, question‑driven overview of Kubernetes, explaining the roles of master and worker nodes, pod networking, scheduling, storage with etcd, service exposure, scaling mechanisms, and how the various controllers collaborate to manage a cloud‑native cluster.

Cloud Native, Kubernetes, Pods
0 likes · 10 min read
Full-Stack DevOps & Kubernetes
Jul 19, 2021 · Cloud Native

Mastering Kubernetes Node Isolation, Scaling, and Rolling Updates – Practical Commands and Tips

This guide walks through essential Kubernetes operations such as isolating and recovering nodes, expanding clusters with new nodes, dynamically scaling Pods, managing Labels, scheduling Pods to specific Nodes, performing rolling updates, and configuring high‑availability for etcd and Master components, all with concrete command‑line examples and YAML snippets.

Kubernetes, Node Management, Rolling Update
0 likes · 19 min read
Code Ape Tech Column
Jul 15, 2021 · Operations

What Really Caused Bilibili’s Sudden Outage? A Deep Dive into the Technical Failure

The article analyzes Bilibili's recent half‑hour service disruption, explores technical rumors such as an etcd crash, examines Kubernetes‑based cloud‑native infrastructure, reviews similar historic outages, and offers expert recommendations for improving high‑availability and disaster‑recovery in large‑scale internet services.

Bilibili, Cloud Native, Kubernetes
0 likes · 8 min read
Ops Development Stories
Jun 16, 2021 · Backend Development

How Raft Achieves Consensus: Leader Election, Log Replication, and State Machine Explained

This article explains the core mechanisms of the Raft consensus algorithm—including leader election, log replication, safety guarantees, message structures, state transitions, and key Go implementations in etcd-raft—providing code examples and detailed analysis of functions such as becomeLeader, tickElection, and appendEntry.

Consensus, Distributed Systems, Go
0 likes · 21 min read
Open Source Linux
May 30, 2021 · Cloud Native

What Is etcd? Features, Use Cases, and How It Powers Kubernetes

This article explains etcd as a highly available distributed key‑value store, outlines its simple, secure, fast, and reliable characteristics, describes typical scenarios such as service discovery and distributed locking, and then provides a comprehensive overview of Kubernetes architecture, components, deployment methods, security, networking, storage, and operational best practices.

Kubernetes, container orchestration, etcd
0 likes · 45 min read
Efficient Ops
May 6, 2021 · Operations

How to Safely Backup and Restore etcd in a Kubernetes Cluster

This guide explains why etcd backup is critical for Kubernetes disaster recovery, walks through snapshot creation, distribution, scheduled cron jobs, and provides a step‑by‑step procedure to restore the cluster on all nodes, ensuring services resume correctly.

Backup, Cluster, Kubernetes
0 likes · 14 min read
360 Zhihui Cloud Developer
Mar 31, 2021 · Operations

How to Efficiently Backup and Restore Your Kubernetes Cluster with Velero and Other Tools

Accidental namespace deletions in Kubernetes can cause massive data loss, but by using etcd snapshots, resource‑level backup tools like Velero, PX‑Backup, and Kasten, and configuring scheduled backups, hooks, and PVC migration, you can protect clusters, streamline recovery, and avoid painful manual redeployments.

Backup, Cluster Migration, Kubernetes
0 likes · 12 min read
360 Quality & Efficiency
Mar 12, 2021 · Backend Development

Distributed Lock Implementations with Redis, Etcd, and Zookeeper

This article explains the concept of distributed locks, outlines common application scenarios, and provides detailed Java implementations using Redis (including Redisson and RedLock), etcd, and ZooKeeper, complete with code examples and a comparative summary of their advantages and drawbacks.

Backend, distributed-lock, etcd
0 likes · 14 min read
MaGe Linux Operations
Mar 8, 2021 · Operations

How to Build a Highly Available etcd Cluster with SSL Security

This guide explains the fundamentals of etcd, its Raft‑based architecture, cluster planning, secure certificate generation, installation steps, service configuration, and verification commands to deploy a reliable, SSL‑protected etcd cluster for service discovery and configuration management.

Cluster, Configuration Management, Raft
0 likes · 16 min read
360 Smart Cloud
Feb 25, 2021 · Backend Development

Understanding Distributed Locks: Concepts, System Classification, and Implementations with Redis and etcd/Zookeeper

This article explains the fundamentals of distributed locks, compares lock implementations based on asynchronous replication and Paxos protocols, and provides practical Redis and etcd/Zookeeper examples—including exclusive and shared lock mechanisms, code snippets, and usage considerations for reliability and safety.

Backend, ZooKeeper, concurrency
0 likes · 9 min read
Open Source Linux
Feb 20, 2021 · Cloud Native

Fix Inconsistent Kubernetes rc/deployment/service Deletions and Etcd Failures

This guide walks through troubleshooting Kubernetes issues such as partially deleted resources, resetting etcd, apiserver start failures due to missing ServiceAccount certificates, SELinux permission errors, ServiceAccount key generation, etcd startup errors, host trust configuration, and resource limit pitfalls, providing concrete commands and scripts for each problem.

Cluster Management, Kubernetes, Linux
0 likes · 17 min read
JD Tech
Feb 8, 2021 · Big Data

JD Remote Shuffle Service: Design, Implementation, and Performance Evaluation

This article presents JD's self‑developed Remote Shuffle Service for Spark, detailing its architecture, goals, implementation details, performance benchmarks, and real‑world production case studies that demonstrate its impact on shuffle efficiency and system stability in large‑scale data processing.

Distributed Systems, Remote Shuffle Service, Shuffle Optimization
0 likes · 17 min read