Tagged articles

cluster upgrade

19 articles · Page 1 of 1
MaGe Linux Operations
Jun 6, 2026 · Operations

Kubernetes etcd Operations Guide: From Backup & Restore to Cluster Performance Tuning

This comprehensive guide walks Kubernetes operators through the role of etcd, version compatibility, manual and automated backup strategies, disaster‑recovery procedures, performance tuning parameters, monitoring with Prometheus and Grafana, common failure troubleshooting, upgrade paths, and data‑at‑rest encryption, providing concrete commands and best‑practice recommendations for production clusters.

Encryption · Etcd · Monitoring
0 likes · 47 min read
vivo Internet Technology
May 13, 2026 · Big Data

How Vivo Upgraded a Million‑Node YARN Cluster: Architecture, Scheduler Switch, and Performance Optimizations

This article details Vivo's end‑to‑end upgrade of a YARN 2.6.0 cluster to a modern version for a million‑node, hundred‑thousand‑tasks‑per‑day platform, covering architectural evolution, scheduler migration, compatibility fixes, performance tuning, and service‑continuity strategies.

Big Data · Capacity Scheduler · Hadoop
0 likes · 28 min read
MaGe Linux Operations
Dec 29, 2024 · Cloud Native

Step-by-Step Guide to Upgrading a Kubernetes Cluster to v1.15.12

This guide walks through downloading the latest Kubernetes packages, preparing master and node services, adjusting nginx proxy settings, safely cordoning and draining nodes, installing the new version, updating certificates and scripts, restarting services, and rebalancing pods to complete a seamless cluster upgrade to v1.15.12.

Node Maintenance · cluster upgrade · kubectl
0 likes · 15 min read
DeWu Technology
Jul 5, 2024 · Databases

StarRocks 2.5.13 Cross-Cluster Upgrade and Data Migration Practices

The article outlines a cross‑cluster upgrade to StarRocks 2.5.13 and weighs its resource and stability costs. It presents two migration schemes (external tables and a Flink connector), covers planning, parallel execution, and validation steps, and reports a successful migration of over 10 TB at 2 Gb/s across ten nodes. It closes by noting future automation and CDC enhancements.

Data Migration · External Table · Flink
0 likes · 15 min read
dbaplus Community
Dec 17, 2023 · Operations

Why Kubernetes Needs an LTS Release: Balancing Stability and Speed

The article examines the rapid Kubernetes upgrade cycle and the operational challenges it creates for teams, argues for a long‑term support (LTS) version, weighs the pros and cons, and proposes compromise solutions to improve cluster stability without sacrificing innovation.

LTS · Operations · cluster upgrade
0 likes · 10 min read
Su San Talks Tech
Dec 6, 2023 · Operations

What Went Wrong in Didi’s 12‑Hour Outage? Lessons on Kubernetes Upgrades and Cost‑Cutting

An in‑depth review of Didi’s 12‑hour P0 outage reveals how a mistaken Kubernetes version downgrade during an in‑place upgrade caused master node failure, discusses cluster isolation, upgrade strategies, and the role of cost‑cutting pressures, offering practical lessons for large‑scale operations.

Operations · cluster upgrade · cost management
0 likes · 7 min read
ITPUB
Dec 5, 2023 · Cloud Native

Prevent Massive K8s Outages: Scale, Redundancy, and Embrace Restarts

The article analyzes the November 27 Didi outage caused by an aggressive Kubernetes upgrade, then presents four engineering principles—controlling cluster size, eliminating single points of failure, treating restarts as normal, and decoupling data and control planes—to build more resilient cloud‑native systems.

Cloud Native · cluster upgrade · fault tolerance
0 likes · 13 min read
Didi Tech
Oct 17, 2023 · Cloud Native

How Didi’s Elastic Cloud Scales Millions of Nodes with Advanced k8s Scheduling

This article details Didi’s Elastic Cloud container platform, explaining its Kubernetes‑based scheduling architecture, custom pre‑filter and scoring extensions, service‑profiling driven placement, rescheduling mechanisms, rule‑engine integration, upgrade strategies from k8s 1.12 to 1.20, and the stability framework that keeps a massive multi‑tenant fleet running reliably.

Elastic Cloud · Rule Engine · Scheduling
0 likes · 16 min read
Past Memory Big Data
Aug 15, 2022 · Big Data

How Pinterest Scaled a Hadoop Upgrade Across 17k Nodes

Pinterest’s Monarch batch‑processing platform, built on over 17 k YARN nodes in AWS, was upgraded from Hadoop 2.7.1 to 2.10.0 using a phased, cluster‑by‑cluster strategy that balanced minimal downtime, extensive validation, and custom patches to handle compatibility and dependency issues.

AWS EC2 · Big Data · Hadoop
0 likes · 18 min read
Ops Development Stories
Aug 9, 2022 · Cloud Native

Master Kubernetes Cluster: Install, Upgrade, Backup, and Restore Step‑by‑Step

This comprehensive guide walks you through installing a Kubernetes cluster with kubeadm, configuring containerd, initializing master and worker nodes, deploying Calico networking and the Dashboard, performing upgrades, renewing certificates, adding or removing nodes, and backing up both etcd data and cluster manifests using scripts and Velero.

Calico · Kubernetes Dashboard · Velero
0 likes · 36 min read
Hulu Beijing
Jul 7, 2022 · Big Data

How Hulu Upgraded Hadoop 2.6 to 3.0: Lessons in Compatibility and Migration

This article details Hulu's five‑year journey from Hadoop 2.6 to 3.3.2, covering major feature evolutions, the original cluster architecture, a comprehensive upgrade plan, compatibility challenges across HDFS, YARN, Hive, Spark and Flink, and the testing and rollout strategies that ensured a smooth migration.

Big Data · Flink · Hadoop
0 likes · 17 min read
MaGe Linux Operations
Jan 30, 2022 · Cloud Native

Upgrade a Kubernetes Cluster from v1.22 to v1.23 the Hard Way

This step‑by‑step tutorial explains how to upgrade a Kubernetes cluster from version 1.22 to 1.23 using the hard‑way approach, covering prerequisites, master and worker node procedures, package handling, and verification commands to ensure a successful upgrade.

Hard Way · cluster upgrade · devops
0 likes · 8 min read
Sohu Tech Products
Dec 22, 2021 · Cloud Native

Zero‑Downtime Upgrade of Large‑Scale Kubernetes Clusters from v1.10 to v1.17

This article details the challenges, strategies, and step‑by‑step procedures for upgrading a 1,000‑node Kubernetes cluster from version 1.10 to 1.17 without service interruption, covering compatibility checks, in‑place versus replacement upgrades, container‑restart avoidance, pod eviction handling, and TCP connection issues.

CNCF · Version Skew · Zero Downtime
0 likes · 22 min read
vivo Internet Technology
Dec 16, 2021 · Cloud Native

vivo Kubernetes Cluster Zero-Downtime Upgrade from v1.10 to v1.17: Practices and Solutions

Vivo’s internet team performed a zero‑downtime, in‑place upgrade of a 1,000‑node Kubernetes cluster from v1.10 to v1.17 by analyzing changelogs, backporting fixes, adjusting kubelet hash validation, adding tolerations, ensuring node labels, and using staged binary rollout, completing the process in roughly ten minutes.

Cloud Native · Container Orchestration · K8s migration
0 likes · 19 min read
JD Tech
Mar 20, 2021 · Big Data

Implementing Erasure Coding in HDFS: Migration Strategy, Testing Framework, and Data Lifecycle Management

This article details JD's practical experience migrating HDFS to erasure coding, covering the decision between upgrade and porting, the step‑by‑step upgrade and rollback procedures, automated testing, a custom data‑lifecycle management system for hot‑warm‑cold data, and comprehensive data‑integrity safeguards to achieve significant storage cost reductions while maintaining production reliability.

Data Lifecycle Management · HDFS · Storage Optimization
0 likes · 17 min read
Alibaba Cloud Native
Sep 30, 2020 · Cloud Native

Kubernetes Cluster Upgrade Guide: Pre‑Checks, Methods & Step‑by‑Step

This article explains why Kubernetes clusters need regular upgrades, outlines the challenges, details essential pre‑upgrade health checks for core components, nodes and cloud resources, compares in‑place and replacement upgrade strategies with their pros and cons, and presents a three‑stage upgrade process covering master, worker and core system components.

Cloud Native · Pre‑check · Replacement Upgrade
0 likes · 13 min read
dbaplus Community
Mar 13, 2019 · Operations

How We Upgraded a 100‑Node Hadoop Cluster with Ansible and Ambari

This article details the step‑by‑step process of modernizing a large‑scale Hadoop deployment—identifying legacy pain points, evaluating three migration strategies, selecting an in‑place upgrade using Ambari‑managed HDP, and automating the entire workflow with Ansible to minimize downtime and operational risk.

Ambari · Ansible · Hadoop
0 likes · 13 min read