Tag: Cluster Scaling


Code Ape Tech Column
Dec 4, 2023 · Cloud Native

Analysis of Didi’s Kubernetes Outage and General Mitigation Strategies

The article reviews Didi’s 12‑hour P0 outage caused by a Kubernetes upgrade failure in a massive cluster, discusses the root causes, and proposes general solutions such as federation, careful upgrade planning, and multi‑master designs to avoid similar incidents.

Cloud Native · Cluster Scaling · Kubernetes
8 min read

Top Architect
May 15, 2023 · Backend Development

Comprehensive Guide to Kafka: Architecture, Performance Tuning, and Operational Practices

This article provides an in-depth overview of Kafka, covering its core value as a message queue, fundamental concepts, cluster architecture, producer and consumer configurations, scaling strategies, monitoring tools, and practical operational commands for building and maintaining high‑throughput, highly available streaming systems.

Cluster Scaling · Kafka · Message Queue
31 min read

Aikesheng Open Source Community
Mar 1, 2023 · Operations

Guide to Expanding an OceanBase Cluster: Adding Zones and Resources

This article provides a step‑by‑step guide for scaling an OceanBase cluster, covering both the GUI ("white‑screen") and command‑line ("black‑screen") methods for adding zones (replicas) and resources (OBServers), including configuration file preparation, deployment commands, zone addition, verification queries, and procedures for both expansion and contraction.

Cluster Scaling · Database Operations · Observer
12 min read

Efficient Ops
Jan 12, 2023 · Cloud Native

How to Scale Kubernetes Clusters: Node Quotas, Kernel Tweaks, and Etcd Best Practices

This guide explains how to adjust node quotas, kernel parameters, and etcd configurations for large Kubernetes clusters, covering cloud provider limits, GCE and Alibaba Cloud settings, API server tuning, and pod resource best practices to ensure reliable scaling and performance.

Cluster Scaling · Kubernetes · Node Quotas
7 min read

58 Tech
Nov 17, 2022 · Backend Development

Design and Migration Strategies for the WLock Distributed Lock Service

The article presents the architecture of WLock, a Paxos‑based distributed lock service, analyzes key isolation schemes, evaluates cluster expansion and splitting, and details a multi‑step key migration process—including forward and reverse migration, node scaling, and consistency safeguards—to achieve high‑availability and isolated lock handling in multi‑tenant environments.

Cluster Scaling · Distributed Lock · Key Migration
18 min read

Bilibili Tech
Apr 9, 2022 · Big Data

Bilibili Presto on Hadoop: Architecture, Scaling, and Performance Enhancements

Bilibili’s Presto on Hadoop combines a multi‑engine offline platform with Kubernetes‑managed YARN scheduling, Ranger security, and a custom dispatcher, scaling to over 400 nodes that handle 160K daily queries on 10 PB. The team added coordinator HA, resource‑group punishment, query limits, Alluxio caching, dynamic filtering, and numerous SQL‑level enhancements, and plans future work on auto‑scaling and materialized‑view automation.

Big Data · Cluster Scaling · Hadoop
30 min read

Efficient Ops
Mar 28, 2022 · Cloud Native

How to Scale Kubernetes Clusters: Quotas, Kernel Tweaks, and Etcd Best Practices

This guide explains how to adjust node quotas, tune kernel parameters, configure high‑availability etcd clusters, and set optimal Kube‑APIServer and Pod settings for large‑scale Kubernetes deployments, ensuring stability and performance as the cluster grows.

Cloud Native · Cluster Scaling · Kubernetes
8 min read

DataFunTalk
Mar 18, 2022 · Big Data

Scaling LinkedIn’s Hadoop YARN Cluster Beyond 10,000 Nodes: Challenges and Solutions

This article examines how LinkedIn tackled severe scheduling slowdowns when its Hadoop YARN cluster grew to nearly 10,000 nodes, analyzes the root causes of resource‑manager bottlenecks, and describes the fairness‑redefinition and scheduling‑logic patches that restored throughput and scalability.

Big Data · Cluster Scaling · Hadoop
13 min read

vivo Internet Technology
Feb 9, 2022 · Databases

Redis Optimization for Vivo Push Platform: Architecture, Bottlenecks, and Solutions

To sustain Vivo Push Platform’s massive real‑time traffic, engineers re‑architected two Redis clusters, trimmed capacity by 58%, split clusters, randomized hotspot‑prone keys, and introduced three‑level caching, cutting peak CPU load by 15%, halving response time, and improving overall Redis efficiency during peak loads.

Cluster Scaling · Hot Key Mitigation · Performance Optimization
15 min read

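The Vivo summary mentions randomizing hotspot‑prone keys. As a minimal sketch of one common form of that idea (not the Vivo team's actual implementation), a hot key can be stored under several suffixed copies so that reads spread across Redis Cluster slots instead of hammering a single node. The suffix scheme, the copy count, and the plain dict standing in for a Redis client are all hypothetical.

```python
import random
from typing import MutableMapping, Optional

# Hypothetical copy count; the article summary gives no number.
COPIES = 8


def copy_keys(key: str, copies: int = COPIES) -> list[str]:
    """Return every suffixed copy of a hot key.

    Avoid "{...}" in the key: Redis Cluster hash tags would force all
    copies into the same slot and defeat the point of spreading them.
    """
    return [f"{key}:{i}" for i in range(copies)]


def write_hot(store: MutableMapping[str, str], key: str, value: str,
              copies: int = COPIES) -> None:
    """Write the value to every copy so that any copy can serve a read."""
    for k in copy_keys(key, copies):
        store[k] = value


def read_key(key: str, copies: int = COPIES,
             rng: Optional[random.Random] = None) -> str:
    """Pick one copy at random, spreading read traffic across the copies."""
    r = rng if rng is not None else random
    return f"{key}:{r.randrange(copies)}"


def read_hot(store: MutableMapping[str, str], key: str,
             copies: int = COPIES,
             rng: Optional[random.Random] = None) -> Optional[str]:
    """Read one randomly chosen copy (the dict stands in for a Redis client)."""
    return store.get(read_key(key, copies, rng))
```

For example, after `write_hot(store, "push:config", "v1")`, every call to `read_hot(store, "push:config")` returns `"v1"` from whichever copy was picked. The trade‑off is that each write costs one operation per copy and the copies can briefly diverge during an update, so this pattern suits read‑heavy keys.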
政采云技术 (Zhengcaiyun Technology)
Nov 11, 2021 · Cloud Native

Cluster Scaling, Backup, and Upgrade Using Sealer Clusterfile

This article explains how to scale, back up, and upgrade Kubernetes clusters with Sealer by modifying the Clusterfile, using join/delete commands for both ALI_CLOUD and BAREMETAL providers, and configuring backup plugins and upgrade workflows.

Backup · Cloud Native · Cluster Scaling
7 min read

Efficient Ops
Aug 11, 2021 · Operations

Scaling Kubernetes Clusters: Node Quotas, Kernel Tweaks & Etcd Tips

This guide outlines how to prepare large‑scale Kubernetes clusters on public clouds by increasing node quotas, adjusting kernel parameters, configuring high‑availability etcd with the etcd‑operator, tuning kube‑apiserver settings, and applying pod‑level best practices for resource limits and affinity.

Cluster Scaling · Kubernetes · Operations
8 min read

Efficient Ops
Aug 24, 2020 · Operations

How to Scale Elasticsearch for PB‑Level Game Logs: Real‑World Strategies & Lessons

This article walks through a mid‑size gaming company's journey of deploying, tuning, and scaling an Elasticsearch cluster for massive log volumes, covering hot‑cold node architecture, ILM policies, shard management, Logstash‑Kafka optimization, emergency expansions, and the promise of searchable snapshots to achieve petabyte‑scale storage with cost efficiency.

Big Data · Cluster Scaling · Elasticsearch
28 min read

Tencent Cloud Developer
Jul 29, 2020 · Big Data

Case Study: Optimizing Tencent Cloud Elasticsearch for High‑Volume Game Log Analytics

To handle a gaming company's million‑QPS log stream, the team built a hot‑cold Tencent Cloud Elasticsearch cluster with ILM‑driven tiering, scaled CPU/heap, reduced shard count via shrink and replica tweaks, tuned Logstash‑Kafka pipelines, and employed COS snapshots and searchable snapshots, achieving stable performance and lower cost.

Big Data · Cluster Scaling · Elasticsearch
29 min read

Big Data Technology Architecture
Jun 4, 2020 · Big Data

58.com Big Data Offline Computing Platform: Architecture, Scaling, Optimization, and Cross‑Data‑Center Migration

This article presents a comprehensive case study of 58.com’s massive Hadoop‑based offline computing platform, detailing its architecture, scaling challenges, performance‑tuning measures, YARN and SparkSQL upgrades, and the systematic cross‑data‑center migration of thousands of nodes and petabytes of data.

Big Data · Cluster Scaling · Hadoop
23 min read

Big Data Technology Architecture
Sep 9, 2019 · Operations

Investigation and Resolution of Partial Queue Consumption after RocketMQ Topic Expansion

This article details a real‑world RocketMQ case where expanding a topic's queue count caused two consumer groups to miss messages on one broker, explains the root cause of missing subscription metadata after cluster scaling, and outlines the manual steps taken to restore full consumption.

Cluster Scaling · Consumer Lag · Message Queue
8 min read

Architecture Digest
May 31, 2019 · Operations

Running a 400+ Node Elasticsearch Cluster: Architecture, Scaling, and Performance Tuning

Meltwater details how it processes millions of daily media posts using a custom‑tuned Elasticsearch 1.7.6 cluster of over 400 nodes on AWS, covering data volume, query complexity, node configuration, indexing strategy, performance optimizations, and lessons learned for large‑scale search deployments.

AWS · Big Data · Cluster Scaling
12 min read