Tagged articles

162 articles

May 3, 2026 · Cloud Native

How to Troubleshoot Kubernetes NotReady Nodes: A Complete Step‑by‑Step Guide

This article walks Kubernetes operators through a systematic investigation of NotReady node symptoms, explaining the kubelet status mechanism, detailing each diagnostic step—from verifying node conditions with kubectl to checking kubelet, container runtime, resources, network, and certificates—and providing concrete remediation and preventive measures.

Kubernetes · NotReady · containerd

0 likes · 35 min read

ITPUB

Apr 27, 2026 · Cloud Native

Why Skipping Backups Makes Kubernetes Operations Impossible

The article explains that running production Kubernetes clusters without regular backup and recovery plans exposes businesses to severe risks such as cluster failures, data loss, and prolonged downtime, and it details practical etcd physical and Velero logical backup strategies to mitigate these threats.

Backup · Cloud Native · Kubernetes

0 likes · 9 min read

Raymond Ops

Mar 6, 2026 · Cloud Native

Scaling Kubernetes from 1k to 5k Nodes: Complete Performance Tuning Playbook

This article presents a comprehensive, real‑world guide for expanding a Kubernetes cluster from 1,000 to 5,000 nodes, covering control‑plane HA, etcd optimization, network and scheduler tuning, monitoring, and automation, with detailed configurations, code snippets, and a step‑by‑step case study of a large‑scale production environment.

CNI · Control Plane · cluster scaling

0 likes · 22 min read

Code Wrench

Jan 13, 2026 · Backend Development

Unlocking etcd: Deep Dive into Go’s Distributed Key‑Value Engine

This article offers a thorough source‑code walkthrough of etcd v3.5+, revealing how its Go‑based architecture implements the Raft consensus algorithm, MVCC storage with BoltDB, efficient network communication via rafthttp, and Go concurrency patterns, while providing practical operational insights for performance tuning and reliability.

Backend · Go · MVCC

0 likes · 12 min read

Ray's Galactic Tech

Jan 10, 2026 · Cloud Native

Unlocking Kubernetes: Deep Dive into Core Modules and Extension Mechanisms

This article explores Kubernetes' core components—API Server, Controller Manager, Scheduler, and etcd—detailing their source‑code architecture, key mechanisms, extension points, performance tips, and practical steps for building operators, custom schedulers, and API extensions.

API Server · Cloud Native · Operator

0 likes · 10 min read

Ray's Galactic Tech

Jan 10, 2026 · Cloud Native

Inside Kubernetes Control Plane: API Server, Scheduler, and Controller Manager Explained

An in‑depth look at Kubernetes’ control plane reveals how the API Server, Scheduler, and Controller Manager work together to manage cluster state, handle authentication, schedule pods, and ensure convergence, with practical HA tips, advanced features, and real‑world deployment workflows.

API Server · Control Plane · Controller Manager

0 likes · 9 min read

Ray's Galactic Tech

Dec 12, 2025 · Cloud Native

Inside the Kubernetes Master: A Complete Breakdown of Core Components

Master nodes act as the brain of a Kubernetes cluster, hosting essential components such as kube‑apiserver, etcd, kube‑scheduler, kube‑controller‑manager and optionally cloud‑controller‑manager, each with distinct roles, high‑availability designs, security considerations, and operational workflows that together orchestrate and maintain cluster state.

Control Plane · Master Node · Scheduler

0 likes · 8 min read

Ray's Galactic Tech

Nov 30, 2025 · Cloud Native

Mastering etcd: The Core of Kubernetes State Management and High‑Availability

etcd is the distributed, strongly consistent key‑value store that serves as Kubernetes' single source of truth, handling all cluster state data; this guide explains its architecture, data model, watch mechanism, high‑availability deployment, backup, monitoring, security, and operational best practices for reliable cluster management.

Kubernetes · distributed storage · etcd

0 likes · 8 min read

dbaplus Community

Nov 24, 2025 · Operations

How We Rescued a Critical etcd Outage in 4 Hours: Step‑by‑Step Recovery Guide

A midnight Kubernetes disaster caused API server timeouts, etcd health failures, and a full service outage, prompting a detailed investigation, root‑cause analysis of massive database fragmentation, and a four‑stage emergency recovery that restored the cluster within 4 hours while outlining preventive measures.

Kubernetes · Operations · database fragmentation

0 likes · 10 min read

Go Development Architecture Practice

Oct 29, 2025 · Backend Development

How to Build a Go API Gateway with Micro Framework: Step‑by‑Step Guide

This article explains the role of an API gateway in microservice architectures, walks through configuring the micro framework’s gateway with options like address, namespace and handler, shows how to modify server and client code, and demonstrates testing the gateway using curl.

Go · Micro · Microservices

0 likes · 9 min read

MaGe Linux Operations

Oct 14, 2025 · Cloud Native

Scaling Kubernetes from 1,000 to 5,000 Nodes: Real‑World Performance Tuning Guide

This article details a step‑by‑step, production‑grade guide for expanding a Kubernetes cluster from 1,000 to 5,000 nodes, covering control‑plane HA, etcd tuning, network and scheduler optimizations, monitoring, and real‑world case studies to achieve stable, high‑performance large‑scale deployments.

Control Plane · Kubernetes · Scheduler

0 likes · 27 min read

Ops Community

Oct 12, 2025 · Operations

When etcd Certificates Expire: How One Failure Crippled an Entire Kubernetes Cluster

A midnight alarm revealed that an expired etcd TLS certificate caused a cascade of failures across a Kubernetes cluster, leading to a full outage that took over half an hour to diagnose, remediate, and restore, highlighting the critical need for proactive certificate management and automated monitoring.

Cluster Recovery · Kubernetes · certificate expiration

0 likes · 44 min read

Raymond Ops

Sep 16, 2025 · Cloud Native

How to Build a Secure High‑Availability Etcd Cluster on Linux

This guide walks through installing etcd, configuring a three‑node high‑availability cluster with TLS certificates, setting up host files, disabling SELinux and firewalld, creating a Certificate Authority using cfssl, generating node certificates, distributing them, and finally deploying and verifying the cluster on Linux systems.

Certificate · Cloud Native · Linux

0 likes · 19 min read

Code Wrench

Sep 5, 2025 · Backend Development

Mastering Distributed Locks in Go: Principles, Implementations, and Pitfalls

This article explains the fundamentals of distributed locks, compares Redis, etcd, ZooKeeper and database approaches, provides practical Go code examples, highlights common mistakes, and offers optimization tips so developers can confidently apply the right locking strategy in real-world systems.

Backend · Golang · ZooKeeper

0 likes · 12 min read

360 Zhihui Cloud Developer

Aug 25, 2025 · Cloud Native

How a Unified Layer‑7 Load Balancer Powers Hybrid Cloud Traffic Management

This article explains the design, features, and deployment architecture of a unified Layer‑7 load‑balancing service that supports both classic and VPC networks, offering intelligent routing, session persistence, health checks, hot‑config reload, and high availability for cloud‑native environments.

Cloud Native · Layer 7 · Service Architecture

0 likes · 7 min read

MaGe Linux Operations

Jul 23, 2025 · Operations

How We Rescued a Crashed K8s Cluster: etcd 100% Fragmentation Recovery

This article details a P0 production incident where a Kubernetes cluster became completely unresponsive due to 100% etcd database fragmentation, describing the step‑by‑step diagnosis, emergency recovery actions, root‑cause analysis, and long‑term preventive measures for reliable cluster operation.

Cluster Recovery · Kubernetes · Operations

0 likes · 12 min read

Baidu Tech Salon

Jun 17, 2025 · Operations

How Baidu Scaled Its Vertical Search: Elastic Scheduling and Data Management Secrets

This article explains how Baidu's vertical search platform tackled massive data growth and scaling challenges by redesigning its data management system, introducing elastic scheduling, decoupling ETCD access, implementing auto‑scaling, and advancing shard expansion to improve performance, stability, and cost efficiency.

Auto Scaling · Data Management · Search Architecture

0 likes · 18 min read

Full-Stack DevOps & Kubernetes

May 28, 2025 · Operations

How to Fix etcd “NOSPACE” Errors in Kubernetes Clusters

When a Kubernetes cluster’s etcd reaches its default 2 GB quota, it triggers a “NOSPACE” alarm that blocks all write operations, causing critical services to fail; this guide explains the root cause, how to diagnose the issue with etcdctl, and step‑by‑step remediation including compaction, defragmentation, and quota expansion.

Kubernetes · NOSPACE · compaction

0 likes · 7 min read

360 Zhihui Cloud Developer

May 26, 2025 · Cloud Native

Expose Kubernetes Pod Domains Internally with CoreDNS and etcd

This article outlines a step‑by‑step solution for exposing pod domain names inside a corporate network using CoreDNS with an etcd backend, including server and agent deployment, configuration, verification, and practical usage recommendations.

Cloud Native · CoreDNS · DNS

0 likes · 6 min read

Cloud Native Technology Community

May 15, 2025 · Operations

How to Precisely Recover a Single Kubernetes Resource from an etcd Snapshot in 5 Steps

This guide explains how to extract and restore a specific Kubernetes resource from an etcd snapshot using a lightweight, step‑by‑step process that avoids full‑cluster recovery, minimizes downtime, and works with tools like etcdctl, auger, and kubectl.

CLI · DevOps · Kubernetes

0 likes · 8 min read

Linux Ops Smart Journey

Apr 25, 2025 · Cloud Native

How to Seamlessly Migrate Calico from etcd to Kubernetes Datastore

Learn step‑by‑step how to transition Calico’s data store from etcd to Kubernetes, covering prerequisite checks, locking the datastore, exporting and importing data, reconfiguring calicoctl, applying the new manifests, and unlocking the store, while highlighting benefits and tips for a smooth migration.

Calico · Kubernetes · Network Policy

0 likes · 11 min read

MaGe Linux Operations

Mar 13, 2025 · Operations

How to Build a Secure High‑Availability Etcd Cluster on Linux

This guide walks through installing etcd, generating TLS certificates with cfssl, configuring static, dynamic, or DNS‑based discovery, setting up systemd service files for three nodes, and verifying cluster health using etcdctl, providing a complete step‑by‑step deployment for a production‑grade, cloud‑native key‑value store.

TLS · etcd · high availability

0 likes · 19 min read

Su San Talks Tech

Feb 24, 2025 · Backend Development

How JD’s Hotkey Framework Detects and Mitigates Hot Data in Milliseconds

The JD App backend hotkey framework provides millisecond‑level detection of bursty hot data, users, and interfaces, pushes the hot keys to all JVMs in the cluster, and dramatically reduces database query pressure while improving overall application performance.

Backend · Distributed · HotKey

0 likes · 8 min read

360 Zhihui Cloud Developer

Feb 14, 2025 · Backend Development

How go-zero Extends gRPC: Architecture, Integration, and Service Startup

This article explains why extending gRPC is necessary, outlines the go‑zero directory structure, describes how go‑zero adapts gRPC through a wrapper generated by goctl, and walks through service initialization and startup, highlighting metrics, etcd registration, interceptors, and health checks.

Backend Development · Go · Microservices

0 likes · 9 min read

Linux Cloud Computing Practice

Nov 5, 2024 · Cloud Native

How to Build a High‑Availability Kubernetes Cluster: Kubeadm & Binary Package Guide

This comprehensive tutorial walks you through planning, preparing hardware, choosing deployment methods, and step‑by‑step installation of a highly available Kubernetes cluster using kubeadm and manual binary packages, covering system initialization, certificate generation, component configuration, CNI networking, and cluster verification.

CNI · Docker · Kubernetes

0 likes · 28 min read

Su San Talks Tech

Sep 30, 2024 · Backend Development

How JD’s Hotkey Framework Detects and Pushes Hot Data in Milliseconds

JD’s Hotkey framework provides millisecond‑level detection and cluster‑wide push of hot data, users, and interfaces, dramatically reducing backend query load, improving performance, and supporting scenarios such as local caching and rate limiting, with proven scalability demonstrated in large‑scale e‑commerce promotions.

Backend Performance · Java · distributed caching

0 likes · 7 min read

Java Backend Technology

Sep 23, 2024 · Backend Development

How JD’s Hotkey Framework Detects and Mitigates Hot Data in Milliseconds

The JD App backend hotkey framework instantly detects bursty hot data, users, and interfaces, pushes the hot keys to all JVMs in the cluster, and dramatically reduces database load while supporting fine‑grained rate limiting and caching across distributed services.

Backend · Distributed · HotKey

0 likes · 7 min read

FunTester

Jul 15, 2024 · Backend Development

Comparison and Practical Guide to Java etcd Clients

This article compares popular Java etcd client libraries, evaluates their features, performance, and suitability, and provides a hands‑on tutorial using jetcd with Maven dependencies, code examples for watching and reading keys, and discusses runtime considerations such as thread handling.

Distributed · Java · etcd

0 likes · 10 min read

MaGe Linux Operations

Jul 13, 2024 · Operations

Boosting etcd Write Performance on AWS: From gp2 Limits to GP3 and Beyond

This article details how a team evaluated and improved etcd cluster write performance on AWS by testing gp2 volume IOPS limits, using etcdctl and fio, upgrading to GP3, and analyzing latency and throughput to identify storage bottlenecks and achieve faster synchronization.

AWS · GP3 · IOPS

0 likes · 9 min read

FunTester

Jul 1, 2024 · Cloud Native

Mastering etcd with Go: From Basics to Distributed Locks

This article introduces etcd as a reliable distributed key‑value store built on Raft, outlines its key features and common use cases such as service discovery and configuration management, and provides a complete Go tutorial covering dependency setup, server launch, client implementation, read/write testing, and distributed lock usage.

Configuration Management · Go · Raft

0 likes · 10 min read

Su San Talks Tech

Jun 11, 2024 · Backend Development

Choosing the Right Service Registry: Zookeeper, Eureka, Nacos, Consul, and Etcd Compared

This comprehensive guide explains the fundamentals, CAP trade‑offs, and core algorithms of service registries, then details Zookeeper, Eureka, Nacos, Consul, and Etcd features, compares them across health checks, multi‑datacenter support, KV storage, and provides practical selection advice for developers and architects.

Consul · Nacos · ZooKeeper

0 likes · 23 min read

Ops Development Stories

Apr 12, 2024 · Cloud Native

Mastering etcd: Architecture, Monitoring & Performance Tuning

This article provides a comprehensive overview of etcd—including its origins, role in Kubernetes, version evolution, layered architecture, key terminology, operational commands, monitoring metrics, benchmarking procedures, disk‑performance testing, and tuning recommendations—for building reliable cloud‑native clusters.

Benchmark · cloud-native · distributed storage

0 likes · 17 min read

dbaplus Community

Apr 1, 2024 · Cloud Native

Uncovering Kubernetes List Ordering: WatchCache, WatchList, and Hidden Costs

This article explains why Kubernetes list results are alphabetically ordered, how the WatchCache and WatchList mechanisms affect ordering and performance, and examines the underlying Etcd behavior, code implementations, and ongoing community efforts to improve consistency and latency.

Cloud Native · Kubernetes · List Ordering

0 likes · 11 min read

Open Source Linux

Mar 25, 2024 · Cloud Native

How to Safely Backup and Restore Etcd in Kubernetes: A Step‑by‑Step Guide

This article explains why regular Etcd snapshots are essential for Kubernetes disaster recovery and provides detailed, command‑line procedures for restoring Etcd data on both single‑node and high‑availability clusters, including necessary configuration adjustments and verification steps.

Backup · Operations · Restore

0 likes · 13 min read

Liangxu Linux

Mar 7, 2024 · Operations

How Upgrading EBS Volumes Boosted etcd Write Performance by 30%

A technical deep‑dive shows how a team managing dozens of Kubernetes clusters diagnosed a write‑ahead‑log bottleneck in etcd, measured IOPS and latency with etcdctl and fio, upgraded gp2 volumes to gp3, and discovered diminishing returns beyond 3000 IOPS while explaining the role of fdatasync in storage performance.

AWS · EBS · IOPS

0 likes · 11 min read

Architect

Feb 29, 2024 · Cloud Native

Which Service Registry Should You Choose? Zookeeper, Eureka, Nacos, Consul, or Etcd

This comprehensive guide analyzes the core concepts, CAP trade‑offs, consensus algorithms, and practical deployment details of Zookeeper, Eureka, Nacos, Consul, and Etcd, providing concrete examples and selection criteria to help engineers and architects decide the most suitable service registry for their micro‑service environments.

CAP theorem · Consul · Microservices

0 likes · 26 min read

Beike Product & Technology

Jan 29, 2024 · Information Security

Kubernetes Security Risks and Hardening Recommendations

This article analyzes Kubernetes security threats from cloud, cluster, and container perspectives, enumerates high‑risk permissions, default privileged accounts, and insecure configurations, and provides concrete hardening steps such as least‑privilege RAM policies, etcd encryption, RBAC tightening, and workload isolation measures.

CloudNative · Kubernetes · PodSecurity

0 likes · 31 min read

MaGe Linux Operations

Jan 27, 2024 · Operations

Why Upgrading EBS Volumes Boosted etcd Write Performance—and What Still Limits It

This article details how upgrading AWS EBS volumes from gp2 to GP3 and adjusting instance types improved etcd cluster write throughput, analyzes IOPS bottlenecks using iostat and fio, and explains why further IOPS gains remain constrained by storage and OS caching.

AWS · EBS · GP3

0 likes · 11 min read

Tencent Cloud Developer

Jan 24, 2024 · Backend Development

Understanding the Safety of Redis Distributed Locks and the Redlock Debate

Redis distributed locks require unique identifiers, atomic Lua releases, and TTL refreshes to avoid deadlocks, while the Redlock algorithm adds majority quorum but remains vulnerable to clock drift and client pauses, so critical systems should combine it with fencing tokens or version checks for true safety.

Redlock · ZooKeeper · concurrency

0 likes · 36 min read
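
The fencing-token idea mentioned in the summary above can be sketched in a few lines. This is a minimal in-memory illustration under stated assumptions, not code from the listed article: `LockService` and `FencedStore` are hypothetical names, and in a real system the token would presumably be issued by the lock backend and checked by the storage layer.

```python
class LockService:
    """Hypothetical lock service: every grant carries a strictly increasing token."""

    def __init__(self):
        self._counter = 0

    def acquire(self):
        self._counter += 1
        return self._counter


class FencedStore:
    """Hypothetical storage service that rejects writes carrying a stale token."""

    def __init__(self):
        self.highest_token = 0
        self.value = None

    def write(self, token, value):
        # A token older than the newest one already seen means the caller's
        # lock has been superseded, so the write is refused.
        if token < self.highest_token:
            return False
        self.highest_token = token
        self.value = value
        return True


locks = LockService()
store = FencedStore()

token_a = locks.acquire()  # client A is granted the lock, then stalls (e.g. a long pause)
token_b = locks.acquire()  # A's lease is assumed to have expired; client B is granted the lock

accepted_b = store.write(token_b, "written by B")
accepted_a = store.write(token_a, "written by A")  # A resumes with a stale token
```

The design point is that safety no longer depends on the paused client noticing that its lease expired: the store itself refuses the late write.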

Su San Talks Tech

Dec 22, 2023 · Cloud Native

Choosing the Right Service Registry: Zookeeper, Eureka, Nacos, Consul & Etcd Compared

This comprehensive guide explains the core concepts, architecture, CAP trade‑offs, and practical features of five popular service registries—Zookeeper, Eureka, Nacos, Consul, and Etcd—helping engineers and architects select the most suitable solution for microservice environments.

Consul · Nacos · Registry

0 likes · 23 min read

Efficient Ops

Dec 13, 2023 · Cloud Native

How to Build Your Own Kubernetes‑Style Container Orchestration System

This article walks through the evolution from a single‑machine Java monolith to a distributed, container‑based platform, detailing master‑worker roles, core Kubernetes‑like components, networking, scheduling, and plug‑ins for a complete cloud‑native orchestration solution.

Cloud Native · Kubernetes · container orchestration

0 likes · 8 min read

Aikesheng Open Source Community

Dec 6, 2023 · Backend Development

Comparison of Consistency Read Implementations in Consul and etcd

This article compares the consistent-read mechanisms of the distributed key‑value stores Consul and etcd, detailing Consul’s three read modes and leader‑forwarding logic, and explaining etcd’s serializable and linearizable reads, including the internal notification and index‑checking processes.

Backend Development · Consistency Read · Consul

0 likes · 6 min read

Efficient Ops

Dec 4, 2023 · Cloud Native

How Does a Kubernetes Pod Get Created? Step‑by‑Step Walkthrough

This article walks through the complete Kubernetes pod creation workflow, from submitting the YAML with kubectl to the API server, storing the definition in etcd, scheduling, kubelet orchestration, container runtime delegation, CNI networking, health probing, and endpoint setup for services.

CNI · Kubernetes · Pod Lifecycle

0 likes · 3 min read

IT Services Circle

Nov 3, 2023 · Databases

Resolving Compatibility Issues Between etcd v3.3/v3.4, gRPC, and Protobuf

This article analyses the frequent compatibility problems that arise when using etcd v3.3/v3.4 together with newer gRPC and protobuf versions, explains their root causes, and presents the solution introduced in etcd v3.5 with modular Go packages.

Compatibility · etcd · gRPC

0 likes · 8 min read

Test Development Learning Exchange

Oct 31, 2023 · Cloud Native

Understanding Kubernetes Master Components: API Server, etcd, Scheduler, and More

This article explains the key components running on a Kubernetes master node—including the API Server, etcd, kube‑scheduler, kube‑controller‑manager, and Cloud Provider—detailing their roles, how they interact, and providing practical curl and kubectl commands for common operations.

API Server · Cloud Provider · Controller Manager

0 likes · 13 min read

Open Source Linux

Oct 12, 2023 · Operations

Automate etcd Snapshots and Store to MinIO with Kubernetes CronJobs

This guide shows how to create daily etcd snapshots on a Kubernetes cluster, upload them to MinIO, and orchestrate the whole process with a Python script, CronJob, Docker, Drone CI/CD, and ArgoCD for seamless backup automation.

CronJob · Docker · Kubernetes

0 likes · 5 min read

DevOps Cloud Academy

Aug 2, 2023 · Cloud Native

Backing Up and Restoring etcd in a Kubernetes Cluster

This tutorial walks through installing the etcd client, creating an Nginx deployment for verification, backing up the etcd data store, validating the backup, and restoring the backup to a Kubernetes cluster while handling component shutdown and restart procedures.

Cloud Native · DevOps · Kubernetes

0 likes · 14 min read

Efficient Ops

Jul 11, 2023 · Operations

Why Did Our kube-apiserver OOM? A Deep Dive into Kubernetes Control-Plane Failures

This article details a real-world Kubernetes control‑plane outage where kube‑apiserver repeatedly OOM‑killed, explores cluster metrics, logs, heap and goroutine profiles, hypothesizes root causes such as etcd latency and DeleteCollection memory leaks, and offers step‑by‑step troubleshooting and prevention guidance.

OOM · etcd · kube-apiserver

0 likes · 21 min read

Open Source Linux

Apr 21, 2023 · Cloud Native

Mastering Kubernetes Architecture: How Control Plane and Worker Nodes Work Together

This article explains the core components of Kubernetes architecture—including the control plane (etcd, API server, controller manager, scheduler) and worker node components (kubelet, kube-proxy, container runtimes)—detailing their roles, interactions, and best‑practice considerations for maintaining healthy, scalable clusters.

Control Plane · Kubernetes · Scheduler

0 likes · 12 min read

Liangxu Linux

Apr 16, 2023 · Backend Development

Mastering API Gateways: Concepts, Features, and a Traefik‑Based Custom Solution

This article provides a comprehensive overview of API gateway fundamentals, compares popular open‑source gateways, and details a custom Traefik‑based microservice gateway architecture with routing, authentication, protocol conversion, and high‑performance connection pooling.

Backend Architecture · Traefik · api-gateway

0 likes · 18 min read

Efficient Ops

Feb 7, 2023 · Operations

Why Did kube-apiserver OOM? A Deep Dive into Kubernetes Control‑Plane Failures

This article details a real‑world Kubernetes control‑plane outage where kube‑apiserver repeatedly OOM‑killed, examines cluster metrics, logs, heap and goroutine profiles, explores root‑cause hypotheses such as etcd latency and DeleteCollection memory leaks, and offers practical prevention steps.

OOM · Profiling · etcd

0 likes · 19 min read

Cloud Native Technology Community

Feb 1, 2023 · Cloud Native

Why Is Kubernetes So Hard to Master? A Step‑by‑Step Overview

This article breaks down the core concepts of Kubernetes—including its master‑worker architecture, pod scheduling, etcd storage, service exposure, scaling mechanisms, and controller interactions—through a series of clear questions and illustrated answers to help beginners grasp the platform’s complexity.

Cloud Native · Kubernetes · Pod Scheduling

0 likes · 8 min read

JD Cloud Developers

Dec 19, 2022 · Cloud Native

Why etcd Is the Backbone of Cloud‑Native Service Discovery and Coordination

This article explains what etcd is, compares it with Zookeeper, describes its architecture and core components such as WAL, snapshots and boltdb, outlines its key features, and shows how it powers service registration, watch mechanisms, cluster monitoring and leader election in cloud‑native systems.

Cloud Native · Raft · etcd

0 likes · 10 min read

MaGe Linux Operations

Nov 6, 2022 · Cloud Native

How to Safely Shut Down and Restart a Kubernetes Cluster

This guide walks you through the essential steps, commands, and precautions for safely draining nodes, backing up applications, CRDs, and etcd, then shutting down and later restarting a Kubernetes cluster while avoiding common pitfalls.

Backup · Cluster Maintenance · Kubernetes

0 likes · 6 min read

Open Source Linux

Oct 14, 2022 · Cloud Native

Why Did Our kube-apiserver OOM? A Deep Dive into Kubernetes Control‑Plane Failures

On September 10, 2021, a Kubernetes cluster experienced intermittent kubectl hangs caused by kube-apiserver OOM kills, which led to cascading control-plane failures; this article details the environment, observed metrics, log analysis, and code inspection of DeleteCollection, and provides troubleshooting steps to prevent similar incidents.

OOM · cloud-native · etcd

0 likes · 21 min read

Practical DevOps Architecture

Sep 15, 2022 · Cloud Native

Brief Overview of etcd and Kubernetes: Features, Use Cases, and Core Components

This article provides a concise overview of etcd and Kubernetes, detailing etcd’s features and use‑cases, explaining Kubernetes fundamentals, its relationship with Docker, and describing key components such as Minikube, Kubectl, Kubelet, common deployment methods, and the platform’s cluster management architecture.

Cloud Native · Distributed Systems · etcd

0 likes · 6 min read

Top Architect

Sep 3, 2022 · Backend Development

Implementing Distributed Locks with Redis, Zookeeper, and etcd

The article explains how to build reliable distributed locks using Redis, Zookeeper, and etcd, describing the essential concepts of mutual exclusion, safety, and liveness, showing code examples, highlighting common issues, and comparing each solution's advantages and drawbacks.

ZooKeeper · concurrency · distributed-lock

0 likes · 6 min read

Tencent Cloud Developer

Aug 29, 2022 · Cloud Computing

High‑Availability DNS Solutions on Tencent Cloud: BIND and CoreDNS with ETCD

The article details two high‑availability DNS implementations for Tencent Cloud—an intelligent BIND‑based server and a CoreDNS solution backed by an ETCD cluster—covering DNS fundamentals, installation steps, configuration files, zone creation, health checks, and verification of internal and external name resolution across multi‑AZ deployments.

BIND · CoreDNS · DNS

0 likes · 24 min read

Efficient Ops

Aug 9, 2022 · Operations

Why Did kube-apiserver OOM? A Deep Dive into Kubernetes Control-Plane Failures

This article analyzes a September 2021 incident where a Kubernetes cluster’s kube-apiserver repeatedly OOM-killed, causing kubectl hangs, by examining cluster specs, monitoring data, logs, heap and goroutine profiles, and the DeleteCollection implementation, ultimately offering troubleshooting steps and preventive measures for control-plane stability.

Goroutine · OOM · cloud-native

0 likes · 20 min read

Architecture Digest

Jul 27, 2022 · Databases

Comprehensive Guide to etcd: Overview, Architecture, Deployment, and Usage

This article provides a detailed introduction to etcd, covering its purpose as a highly‑available distributed key‑value store, core Raft‑based architecture, key concepts, common application scenarios, step‑by‑step installation and cluster deployment, as well as essential command‑line operations for managing data, backups, and cluster members.

Backup · Deployment · Kubernetes

0 likes · 26 min read

Top Architect

Jul 23, 2022 · Cloud Native

Comprehensive Guide to etcd: Overview, Architecture, Deployment, and Usage

This article provides a detailed introduction to etcd, covering its purpose as a highly available distributed key‑value store, core concepts like Raft consensus, key features, common use cases such as service discovery and configuration management, step‑by‑step installation for single‑node and cluster deployments, and essential etcdctl commands for managing data and cluster members.

Cloud Native · Configuration Management · Distributed Systems

0 likes · 24 min read

Architect

Jul 21, 2022 · Cloud Native

Comprehensive Guide to etcd: Overview, Architecture, Installation, and Usage

This article provides a thorough introduction to etcd, covering its purpose, history, core features, key terminology, internal architecture, common application scenarios such as service discovery and distributed locking, step‑by‑step installation and cluster deployment, essential command‑line operations, backup procedures, and practical recommendations.

Installation · Kubernetes · command-line

0 likes · 25 min read

Open Source Linux

Jun 16, 2022 · Cloud Native

Mastering Kubernetes Control Plane: etcd, API Server, Scheduler, and Nodes

This article explains the key Kubernetes control‑plane components—including etcd, the API Server, Controller Manager, Scheduler, as well as worker‑node components like Kubelet, kube‑proxy, and the container runtime—detailing their roles, interactions, and the underlying mechanisms such as Raft consensus and admission control.

API Server · Control Plane · Kubernetes

0 likes · 10 min read

MaGe Linux Operations

May 25, 2022 · Operations

Why Kubernetes LIST Requests Can Cripple Your Cluster and How to Fix Them

This article examines how heavy LIST operations in unstructured storage systems like Ceph and etcd consume massive I/O, network and CPU, threaten cluster stability, and offers detailed code analysis, performance testing, and practical tuning recommendations to keep large‑scale Kubernetes clusters reliable.

Kubernetes · List · Scalability

0 likes · 29 min read

Yiche Technology

May 20, 2022 · Cloud Native

APISIX API Gateway: Architecture, Features, Performance Comparison, and Future Outlook

This article introduces the APISIX API gateway, explaining its cloud‑native architecture built on OpenResty and Etcd, the advantages over traditional monolithic service frameworks, detailed feature breakdowns, performance benchmark comparisons with OpenResty, multi‑cluster management practices, usage scenarios, monitoring, logging, and future development directions.

APISIX · Cloud Native · OpenResty

0 likes · 12 min read

Open Source Linux

May 12, 2022 · Cloud Native

Mastering Kubernetes Control Plane: etcd, API Server, Scheduler & More

This article explains the core components of the Kubernetes control plane—including etcd, the API Server, Controller Manager, Scheduler—as well as key worker‑node components like Kubelet, kube‑proxy, and the container runtime, detailing their roles, interactions, and essential functions.

API Server · Control Plane · Kubernetes

0 likes · 11 min read

Cloud Native Technology Community

May 10, 2022 · Cloud Native

How PayPal Scaled Kubernetes to 4,100 Nodes and 200k Pods

PayPal’s engineering team detailed their journey of scaling Kubernetes from a few hundred nodes to over 4,100 nodes and 200,000 Pods, describing cluster topology, workload generation, API server bottlenecks, controller manager and scheduler tuning, extensive etcd optimizations, and the resulting performance gains that met Kubernetes SLOs.

Cloud Native · Kubernetes · PayPal

0 likes · 13 min read

Architecture Digest

Apr 25, 2022 · Cloud Native

Kubernetes Architecture Overview and Detailed Components

This article explains the goals, design principles, and detailed components of Kubernetes architecture, covering its control plane, API server, etcd store, scheduler, kubelet, container runtime, and kube-proxy, and summarizes how these parts work together to provide a scalable, portable, and automated container orchestration platform.

Control Plane · Kubernetes · container orchestration

0 likes · 12 min read

Open Source Linux

Mar 17, 2022 · Cloud Native

How PayPal Scaled Kubernetes to 4,000 Nodes and 200,000 Pods

PayPal’s engineering team detailed their journey of scaling Kubernetes from a few hundred nodes to over 4,000 nodes and 200,000 pods, describing the cluster topology, workload generation, bottlenecks in the API server, controller manager, scheduler, and etcd, and the optimizations that enabled stable performance at massive scale.

Cloud Native · Kubernetes · PayPal

0 likes · 12 min read

Architect

Feb 18, 2022 · Cloud Native

Large‑Scale etcd Cluster Performance Optimization and Pod Data Splitting in Ant Group’s Sigma

This article describes how Ant Group tackled the performance ceiling of its massive Sigma Kubernetes clusters by horizontally splitting etcd storage for Pods, Leases and Events, redesigning watch handling to avoid component restarts, and using snapshot‑based migration to preserve data integrity while reducing latency.

Cluster Performance · Data Migration · Kubernetes

0 likes · 27 min read

Top Architect

Feb 17, 2022 · Cloud Native

Understanding etcd: Features, Use Cases, and Comparison with Zookeeper

This article provides a comprehensive overview of etcd, describing its purpose as a distributed, reliable key‑value store, outlining its core features, detailing multiple real‑world scenarios such as service discovery, configuration management, load balancing, distributed locking, and comparing its advantages over Zookeeper.

Raft · Zookeeper comparison · distributed key-value store

0 likes · 15 min read

ITFLY8 Architecture Home

Feb 15, 2022 · Operations

Why etcd Is the Backbone of Modern Distributed Systems

This article explains what etcd is, its origins, core features such as simplicity, security, speed, and reliability, and details eight practical scenarios—including service discovery, messaging, load balancing, distributed coordination, locks, queues, monitoring, and leader election—showing why it often outperforms Zookeeper in cloud‑native environments.

Raft · etcd · key-value store

0 likes · 15 min read

Architecture Digest

Feb 13, 2022 · Cloud Native

What Is etcd? Features, Use Cases, and Comparison with Zookeeper

This article explains the distributed key‑value store etcd, its origin, core characteristics such as simplicity, security, speed and Raft‑based reliability, and details eight practical scenarios—including service discovery, pub/sub, load balancing, distributed locks and leader election—while also comparing it with Zookeeper.

Configuration Management · Raft · distributed key-value store

0 likes · 15 min read

Architect

Jan 12, 2022 · Cloud Native

Service Governance and etcd: Architecture, Core Technologies, and Large‑Scale Implementation

This article explains service governance concepts, the challenges of managing thousands of micro‑services, introduces etcd and its Raft‑based consistency model, details BoltDB storage internals, and describes Baidu's large‑scale Tianlu platform with its high‑availability, performance, scalability, and operational metrics.

Distributed Systems · etcd · service governance

0 likes · 21 min read

Efficient Ops

Nov 30, 2021 · Cloud Native

How to Safely Backup and Restore etcd in a Kubernetes Cluster

This guide explains why etcd is critical for Kubernetes, walks through creating snapshots with etcdctl, automating backups via scripts and cron, and details step‑by‑step procedures for restoring a failed etcd cluster, including stopping services, cleaning data directories, and restarting components to recover the whole cluster.

Backup · Restore · cloud-native

0 likes · 16 min read

Baidu Intelligent Testing

Nov 16, 2021 · Cloud Native

Service Governance and etcd: Concepts, Raft & BoltDB Implementation, and Large‑Scale Practices at Baidu

This article introduces service governance fundamentals, explains how etcd’s Raft‑based consensus and BoltDB storage work, compares etcd with ZooKeeper and Consul, and describes Baidu’s large‑scale, high‑availability, high‑performance service‑governance platform built on these technologies.

BoltDB · Cloud Native · Raft

0 likes · 20 min read

Baidu Geek Talk

Nov 10, 2021 · Operations

How etcd Powers Scalable Service Governance: Raft, BoltDB, and Real‑World Practices

This article explores service governance fundamentals, examines why etcd’s Raft‑based consensus and BoltDB storage make it ideal for large‑scale systems, compares it with ZooKeeper and Consul, and shares Baidu’s practical architecture, performance tricks, and operational metrics for high‑availability, high‑performance service management.

BoltDB · Distributed Systems · Performance Optimization

0 likes · 23 min read

Efficient Ops

Nov 9, 2021 · Operations

How Ant Group Scales etcd for 10k‑Node Kubernetes Clusters: High‑Availability Secrets

This article examines Ant Group's strategies for achieving high availability of the etcd key‑value store in a massive 10,000‑node Kubernetes cluster, detailing challenges, performance metrics, filesystem upgrades, tuning parameters, operational platform insights, and future directions for distributed etcd deployments.

Kubernetes · etcd · large scale

0 likes · 21 min read

Liangxu Linux

Oct 17, 2021 · Cloud Native

How to Scale Kubernetes Clusters: Node Quotas, Kernel Tweaks, and Best Practices

This guide explains how to prepare large‑scale Kubernetes clusters on public clouds by expanding node quotas, tuning kernel parameters, configuring high‑availability etcd, adjusting kube‑apiserver limits, and applying pod‑level resource and affinity best practices.

Kernel Parameters · KubeAPIServer · Kubernetes

0 likes · 8 min read

360 Tech Engineering

Sep 9, 2021 · Databases

PostgreSQL High‑Availability Cluster Deployment with Patroni and Etcd

This article details the design, deployment, configuration, operation, monitoring, and backup of a PostgreSQL high‑availability cluster built on Patroni, Etcd, and LVS at 360, covering hardware layout, software versions, installation steps, parameter tuning, fail‑over testing, and future outlook.

Backup · Cluster · Patroni

0 likes · 16 min read

MaGe Linux Operations

Sep 8, 2021 · Cloud Native

How to Scale Kubernetes Clusters: Quotas, Kernel Tweaks, and Best Practices

This guide outlines essential steps for scaling large Kubernetes clusters on public clouds, covering node quota adjustments, kernel parameter tuning, etcd high‑availability setup, API server and pod configurations, and best‑practice recommendations to ensure stable performance as node counts grow.

Kubernetes · cluster scaling · etcd

0 likes · 7 min read

High Availability Architecture

Aug 31, 2021 · Cloud Native

High‑Availability Architecture for etcd in Ant Group’s Massive Kubernetes Clusters

The article describes how Ant Group operates one of the world's largest Kubernetes deployments, with over 10,000 nodes, details the performance challenges of the etcd key‑value store at such scale, and outlines a comprehensive set of hardware upgrades, configuration tuning, monitoring, data‑splitting, and future distributed‑etcd strategies to achieve robust high‑availability.

etcd · performance tuning · scale-out

0 likes · 21 min read

High Availability Architecture

Aug 20, 2021 · Cloud Native

Apache APISIX Service Mesh: Architecture, Challenges, and the apisix-mesh-agent Solution

This article introduces Apache APISIX, examines the challenges of using it as a data‑plane in a service‑mesh architecture, presents the apisix‑mesh‑agent as an intermediary solution, and outlines the advantages, design, and future roadmap of an APISIX‑based cloud‑native service mesh.

Apache APISIX · Cloud Native · Service Mesh

0 likes · 14 min read

Open Source Linux

Jul 25, 2021 · Cloud Native

Demystifying Kubernetes: Core Components and How They Work Together

This article provides a concise, question‑driven overview of Kubernetes, explaining the roles of master and worker nodes, pod networking, scheduling, storage with etcd, service exposure, scaling mechanisms, and how the various controllers collaborate to manage a cloud‑native cluster.

Cloud Native · Kubernetes · Pods

0 likes · 10 min read

Ops Development Stories

Jul 21, 2021 · Cloud Native

How to Build, Manage, and Recover an etcd Cluster with TLS on CentOS

This guide walks you through setting up a three‑node etcd cluster on CentOS 7 using static configuration and self‑signed TLS certificates, covering member addition and removal, data backup via snapshots, and full cluster restoration from those snapshots.

CentOS · Cluster · TLS

0 likes · 25 min read

Full-Stack DevOps & Kubernetes

Jul 19, 2021 · Cloud Native

Mastering Kubernetes Node Isolation, Scaling, and Rolling Updates – Practical Commands and Tips

This guide walks through essential Kubernetes operations such as isolating and recovering nodes, expanding clusters with new nodes, dynamically scaling Pods, managing Labels, scheduling Pods to specific Nodes, performing rolling updates, and configuring high‑availability for etcd and Master components, all with concrete command‑line examples and YAML snippets.

Kubernetes · Node Management · Rolling Update

0 likes · 19 min read

Code Ape Tech Column

Jul 15, 2021 · Operations

What Really Caused Bilibili’s Sudden Outage? A Deep Dive into the Technical Failure

The article analyzes Bilibili's recent half‑hour service disruption, explores technical rumors such as an etcd crash, examines Kubernetes‑based cloud‑native infrastructure, reviews similar historic outages, and offers expert recommendations for improving high‑availability and disaster‑recovery in large‑scale internet services.

Bilibili · Cloud Native · Kubernetes

0 likes · 8 min read

Full-Stack DevOps & Kubernetes

Jul 13, 2021 · Operations

How to Upgrade a High‑Availability Kubernetes Cluster and Etcd with Zero Downtime

This guide walks through upgrading a HA Kubernetes cluster—from updating kubeadm, kubelet, and kubectl on master and worker nodes—to safely migrating an etcd cluster, covering version compatibility, backup procedures, and step‑by‑step commands to minimize service interruption.

HA · Kubernetes · etcd

0 likes · 12 min read

Ops Development Stories

Jul 1, 2021 · Databases

Understanding Database Write-Ahead Logs (WAL) and Their Implementation in etcd

This article explains common database logging mechanisms such as MySQL redo logs and binlogs, compares them with Redis AOF and etcd's Raft‑based WAL, and provides an in‑depth analysis of etcd's WAL source code, including key structures, creation process, record types, encoding, and file pipeline management.

Database Logs · Go · Raft

0 likes · 18 min read

High Availability Architecture

Jun 30, 2021 · Databases

Resolving gRPC‑gateway Limits and mTLS Certificate Issues in etcd 3.x for Apache APISIX

This article explains how etcd 3.x switched its external API to gRPC, the challenges of using its gRPC‑gateway for HTTP requests in Apache APISIX, the default message size limit causing sync failures, and the certificate configuration pitfalls that were fixed through a PR merged in v3.5.0.

Apache APISIX · Backend · HTTP API

0 likes · 8 min read

Ops Development Stories

Jun 16, 2021 · Backend Development

How Raft Achieves Consensus: Leader Election, Log Replication, and State Machine Explained

This article explains the core mechanisms of the Raft consensus algorithm—including leader election, log replication, safety guarantees, message structures, state transitions, and key Go implementations in etcd-raft—providing code examples and detailed analysis of functions such as becomeLeader, tickElection, and appendEntry.

Consensus · Distributed Systems · Go

0 likes · 21 min read

Open Source Linux

May 30, 2021 · Cloud Native

What Is etcd? Features, Use Cases, and How It Powers Kubernetes

This article explains etcd as a highly available distributed key‑value store, outlines its simple, secure, fast, and reliable characteristics, describes typical scenarios such as service discovery and distributed locking, and then provides a comprehensive overview of Kubernetes architecture, components, deployment methods, security, networking, storage, and operational best practices.

Kubernetes · container orchestration · etcd

0 likes · 45 min read

Efficient Ops

May 6, 2021 · Operations

How to Safely Backup and Restore etcd in a Kubernetes Cluster

This guide explains why etcd backup is critical for Kubernetes disaster recovery, walks through snapshot creation, distribution, scheduled cron jobs, and provides a step‑by‑step procedure to restore the cluster on all nodes, ensuring services resume correctly.

Backup · Cluster · Kubernetes

0 likes · 14 min read

NiuNiu MaTe

Apr 8, 2021 · Backend Development

Master Go: A Structured Learning Path from Beginner to Cloud‑Native Expert

This guide presents a concise, step‑by‑step roadmap for mastering Go, covering beginner fundamentals, essential packages, and advanced cloud‑native projects like Etcd, Docker, Kubernetes, and Istio, while offering practical study tips and career insights.

Backend Development · Docker · Go

0 likes · 10 min read

360 Zhihui Cloud Developer

Mar 31, 2021 · Operations

How to Efficiently Backup and Restore Your Kubernetes Cluster with Velero and Other Tools

Accidental namespace deletions in Kubernetes can cause massive data loss, but by using etcd snapshots, resource‑level backup tools like Velero, PX‑Backup, and Kasten, and configuring scheduled backups, hooks, and PVC migration, you can protect clusters, streamline recovery, and avoid painful manual redeployments.

Backup · Cluster Migration · Kubernetes

0 likes · 12 min read

360 Quality & Efficiency

Mar 12, 2021 · Backend Development

Distributed Lock Implementations with Redis, Etcd, and Zookeeper

This article explains the concept of distributed locks, outlines common application scenarios, and provides detailed Java implementations using Redis (including Redisson and RedLock), Etcd, and Zookeeper, complete with code examples and a comparative summary of their advantages and drawbacks.

Backend · distributed-lock · etcd

0 likes · 14 min read

MaGe Linux Operations

Mar 8, 2021 · Operations

How to Build a Highly Available etcd Cluster with SSL Security

This guide explains the fundamentals of etcd, its Raft‑based architecture, cluster planning, secure certificate generation, installation steps, service configuration, and verification commands to deploy a reliable, SSL‑protected etcd cluster for service discovery and configuration management.

Cluster · Configuration Management · Raft

0 likes · 16 min read

360 Smart Cloud

Feb 25, 2021 · Backend Development

Understanding Distributed Locks: Concepts, System Classification, and Implementations with Redis and etcd/Zookeeper

This article explains the fundamentals of distributed locks, compares lock implementations based on asynchronous replication and Paxos protocols, and provides practical Redis and etcd/Zookeeper examples—including exclusive and shared lock mechanisms, code snippets, and usage considerations for reliability and safety.

Backend · ZooKeeper · concurrency

0 likes · 9 min read

Open Source Linux

Feb 20, 2021 · Cloud Native

Fix Inconsistent Kubernetes rc/deployment/service Deletions and Etcd Failures

This guide walks through troubleshooting Kubernetes issues such as partially deleted resources, resetting etcd, apiserver start failures due to missing ServiceAccount certificates, SELinux permission errors, ServiceAccount key generation, etcd startup errors, host trust configuration, and resource limit pitfalls, providing concrete commands and scripts for each problem.

Cluster Management · Kubernetes · Linux

0 likes · 17 min read

JD Tech

Feb 8, 2021 · Big Data

JD Remote Shuffle Service: Design, Implementation, and Performance Evaluation

This article presents JD's self‑developed Remote Shuffle Service for Spark, detailing its architecture, goals, implementation details, performance benchmarks, and real‑world production case studies that demonstrate its impact on shuffle efficiency and system stability in large‑scale data processing.

Distributed Systems · Remote Shuffle Service · Shuffle Optimization

0 likes · 17 min read
