Articles tagged "Cluster Management"
186 articles
Raymond Ops
Dec 27, 2025 · Cloud Native

15 Powerful kubectl Tricks to Master Kubernetes Management

Learn 15 practical kubectl techniques—from resource shortcuts and context switching to advanced JSONPath queries, custom output formats, and efficient alias configurations—that enable Kubernetes administrators to streamline cluster management, improve debugging, and boost operational productivity.

CLI · Cluster Management · DevOps
0 likes · 12 min read
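A JSONPath template such as `{.items[*].metadata.name}` walks the JSON document that `kubectl get pods -o json` would return. The Python sketch below mimics two such templates on a hand-written sample. The sample data and the helper names `pod_names` and `running_pods` are hypothetical; on a real cluster kubectl evaluates the template itself when you pass `-o jsonpath='...'`.

```python
import json

# Hypothetical, trimmed-down output of `kubectl get pods -o json`.
SAMPLE = """
{
  "kind": "List",
  "items": [
    {"metadata": {"name": "web-1"}, "status": {"phase": "Running"}},
    {"metadata": {"name": "web-2"}, "status": {"phase": "Pending"}},
    {"metadata": {"name": "db-0"},  "status": {"phase": "Running"}}
  ]
}
"""


def pod_names(doc: dict) -> str:
    """Mimics the template {.items[*].metadata.name}.

    kubectl prints the matches of a [*] wildcard separated by single spaces.
    """
    return " ".join(item["metadata"]["name"] for item in doc.get("items", []))


def running_pods(doc: dict) -> list:
    """Mimics the filter template {.items[?(@.status.phase=="Running")].metadata.name}."""
    return [
        item["metadata"]["name"]
        for item in doc.get("items", [])
        if item.get("status", {}).get("phase") == "Running"
    ]


if __name__ == "__main__":
    doc = json.loads(SAMPLE)
    print(pod_names(doc))
    print(running_pods(doc))
```

On a live cluster, the first function corresponds to `kubectl get pods -o jsonpath='{.items[*].metadata.name}'`.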
Linux Cloud Computing Practice
Dec 5, 2025 · Operations

Essential Ceph Command Cheat Sheet for Cluster Management

This guide provides a concise collection of essential Ceph commands for starting services, checking health and status, managing monitors, metadata servers, and OSDs, as well as creating admin users, purging nodes, and handling CRUSH maps, enabling administrators to efficiently operate and troubleshoot a Ceph storage cluster.

Ceph · Cluster Management · Linux
0 likes · 6 min read
Rare Earth Juejin Tech Community
Nov 19, 2025 · Backend Development

Master Elasticsearch: Index Design, Field Types, and Cluster Management Tips

An experienced engineer shares practical Elasticsearch insights covering index design with aliases and routing, field type choices, query optimization techniques, pagination strategies, real‑time refresh settings, memory limits, and cluster management, offering concrete examples and actionable recommendations for robust search implementations.

Cluster Management · Elasticsearch · field types
0 likes · 12 min read
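Routing decides which primary shard stores a document. Elasticsearch documents the formula as `routing_factor = num_routing_shards / num_primary_shards` and `shard_num = (hash(_routing) % num_routing_shards) / routing_factor`, where `_routing` defaults to the document `_id`. The Python sketch below applies that formula. Elasticsearch hashes with Murmur3, so the CRC32 stand-in used here (an assumption made only to stay within the standard library) will not reproduce real shard numbers; the function names are hypothetical.

```python
import zlib
from typing import Optional


def stand_in_hash(routing: str) -> int:
    # Assumption: real Elasticsearch uses Murmur3 here. CRC32 is only a
    # standard-library placeholder, so shard numbers differ from a real cluster.
    return zlib.crc32(routing.encode("utf-8"))


def shard_for(routing: str, num_primary_shards: int,
              num_routing_shards: Optional[int] = None) -> int:
    """Apply the documented routing formula:

    routing_factor = num_routing_shards / num_primary_shards
    shard_num = (hash(_routing) % num_routing_shards) / routing_factor
    """
    if num_routing_shards is None:
        # With no separate routing-shard count the formula reduces to
        # hash(_routing) % num_primary_shards.
        num_routing_shards = num_primary_shards
    if num_routing_shards % num_primary_shards != 0:
        raise ValueError("num_routing_shards must be a multiple of num_primary_shards")
    routing_factor = num_routing_shards // num_primary_shards
    return (stand_in_hash(routing) % num_routing_shards) // routing_factor


if __name__ == "__main__":
    # Documents sharing a routing value (e.g. a tenant ID) always share a shard.
    print(shard_for("tenant-42", 5), shard_for("tenant-42", 5))
```

Because every document with the same routing value lands on the same shard, a search that passes that routing value can be sent to one shard instead of all of them.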
DevOps Coach
Oct 28, 2025 · Cloud Native

20 Essential Kubernetes Tips to Boost Security, Reliability, and Manageability

This guide presents twenty practical Kubernetes best‑practice tips covering productivity shortcuts, resource limits, health probes, node draining, PodDisruptionBudgets, RBAC hardening, read‑only ConfigMaps/Secrets, non‑root containers, network policies, image version pinning, secret rotation, centralized logging, etcd backups, resource cleanup, and secure access methods.

Cluster Management · DevOps · Kubernetes
0 likes · 8 min read
Ray's Galactic Tech
Sep 20, 2025 · Operations

How to Safely Upgrade a ZooKeeper Node’s IP Without Disrupting the Cluster

This guide explains why changing a ZooKeeper node’s IP requires updating the configuration on all members, then walks through a step‑by‑step procedure—including stopping the target node, editing zoo.cfg on every server, restarting the remaining nodes, and verifying the quorum—plus best‑practice tips for Kubernetes deployments.

Cluster Management · IP upgrade · Kubernetes
0 likes · 7 min read
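The procedure hinges on editing the `server.N=...` entries in every member's zoo.cfg. A minimal Python sketch of that per-file edit follows, assuming the classic `server.N=host:peerPort:electionPort` layout with an optional suffix such as `;2181`; the function name `change_server_ip` is hypothetical and IPv6 addresses are not handled.

```python
import re

# Matches e.g. "server.3=10.0.0.3:2888:3888" or "server.3=zk3:2888:3888;2181".
# Assumption: hosts are IPv4 addresses or hostnames (no IPv6 colons).
SERVER_LINE = re.compile(r"^(server\.(\d+)=)([^:]+)(:.*)$")


def change_server_ip(cfg_text: str, server_id: int, new_host: str) -> str:
    """Return zoo.cfg text with the host of server.<server_id> replaced.

    Ports and any trailing role or client-port suffix are kept unchanged.
    Raises KeyError if that server id is not present.
    """
    out, found = [], False
    for line in cfg_text.splitlines():
        m = SERVER_LINE.match(line.strip())
        if m and int(m.group(2)) == server_id:
            line = f"{m.group(1)}{new_host}{m.group(4)}"
            found = True
        out.append(line)
    if not found:
        raise KeyError(f"server.{server_id} not found in config")
    return "\n".join(out) + "\n"
```

The same edited text would go to every ensemble member before the restarts the article describes, so that all servers agree on the new address.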
Mingyi World Elasticsearch
Aug 30, 2025 · Operations

INFINI Console FAQ: Enterprise‑Grade Unified Elasticsearch Management

The article introduces INFINI Console, an open-source, lightweight platform for unified, multi-cluster and cross-version Elasticsearch governance. It compares the console with Kibana, details deployment options and enterprise-level features such as monitoring, alerting, and security, and analyzes cost advantages and practical migration scenarios.

Cluster Management · Cost Optimization · Elasticsearch
0 likes · 13 min read
DataFunSummit
Aug 28, 2025 · Artificial Intelligence

How We Scaled AI Compute to Millions of Nodes with Ray on WeChat

This article explains how Tencent's WeChat team built the Astra platform on Ray to manage millions of AI compute nodes, addressing challenges of massive scale, heterogeneous GPU resources, low‑priority node instability, deployment complexity, and cost, while detailing architecture, scheduling strategies, and practical usage examples.

AI scaling · Cluster Management · Ray
0 likes · 21 min read
360 Zhihui Cloud Developer
Aug 6, 2025 · Cloud Native

Step‑by‑Step Rancher Deployment for Multi‑Cluster Kubernetes Management

This guide explains the background of multi‑IDC Kubernetes clusters, why a unified platform like Rancher is needed, and provides detailed step‑by‑step instructions for single‑node, high‑availability RKE, lightweight K3s deployments, Helm installation, cert‑manager setup, ingress configuration, and best‑practice recommendations.

Cluster Management · HA deployment · Kubernetes
0 likes · 12 min read
MaGe Linux Operations
Jul 21, 2025 · Cloud Native

Master Kubernetes with Essential Commands: Efficient Container Cluster Management

This comprehensive guide walks operations engineers through essential Kubernetes commands, covering cluster inspection, pod lifecycle, service and network handling, storage configuration, troubleshooting, performance monitoring, scaling, security, and automation, enabling efficient and expert management of containerized clusters.

Cluster Management · Kubernetes · Operations
0 likes · 17 min read
Raymond Ops
Jul 19, 2025 · Cloud Native

Step-by-Step Guide to Upgrading Kubernetes Nodes to v1.15.12

This tutorial walks you through downloading the latest Kubernetes packages, preparing master and node services, adjusting nginx proxy settings, cordoning and draining nodes, replacing binaries and certificates, restarting services, and verifying the upgrade across a two‑node cluster.

Cluster Management · Kubernetes · NGINX
0 likes · 13 min read
Raymond Ops
Jun 19, 2025 · Operations

Master Kubernetes Cluster Management: Essential kubectl Commands Explained

This guide walks you through essential kubectl commands for viewing cluster status, inspecting resources, creating and modifying objects, labeling, annotating, and launching pods, providing practical examples and command syntax to help you manage Kubernetes clusters effectively.

Cluster Management · DevOps · Pod
0 likes · 14 min read
Mingyi World Elasticsearch
Jun 18, 2025 · Operations

Comprehensively Manage Elasticsearch 9.X with INFINI Console

The article provides a detailed technical overview of INFINI Console, an open‑source, lightweight governance platform that enables multi‑cluster, cross‑version management, dynamic registration, monitoring, alerting, and developer tools for Elasticsearch 9.X, comparing it with Kibana and highlighting deployment simplicity across various OS and CPU architectures.

Cluster Management · Cross-Version Support · Deployment
0 likes · 11 min read
DevOps Operations Practice
Jun 16, 2025 · Cloud Native

Mastering Kubernetes: 6 Essential Tools for Cluster Management

This article introduces six indispensable tools—kubectl, Helm, Prometheus + Grafana, Istio, Velero, and K9s—that simplify Kubernetes cluster management by covering resource handling, monitoring, networking, security, backup, and interactive UI, helping readers efficiently operate production‑grade clusters.

Cloud Native · Cluster Management · DevOps
0 likes · 7 min read
Mingyi World Elasticsearch
Jun 4, 2025 · Operations

When Should You Deploy Dedicated Coordinating Nodes in Elasticsearch?

The article explains what Elasticsearch coordinating nodes are, why dedicated coordinating‑only nodes can off‑load HTTP handling from data and master nodes to reduce load, lower latency and simplify client configuration, and outlines the associated hardware and cluster‑state costs, usage scenarios, deployment steps and monitoring tips.

Cluster Management · Coordinating Node · Elasticsearch
0 likes · 12 min read
Efficient Ops
May 12, 2025 · Cloud Native

Master Kubernetes Management with Kuboard: Visual UI Guide & Installation

Kuboard is a web-based visual tool for managing Kubernetes clusters, offering multiple authentication methods, multi-cluster support, microservice layering, and storage integration. The guide explains how to install it with Docker, add clusters via KubeConfig, and inspect workloads, and shows how the UI simplifies complex command-line operations.

Cloud Native · Cluster Management · Docker
0 likes · 5 min read
Linux Cloud Computing Practice
Apr 10, 2025 · Cloud Computing

Unlock Scalable, Reliable Storage: A Complete Guide to Deploying Ceph

This article provides a comprehensive overview of Ceph distributed storage, covering storage fundamentals, Ceph architecture, advantages, version lifecycle, and step‑by‑step deployment using ceph‑deploy, including environment preparation, monitor and OSD setup, manager configuration, and dashboard activation.

Ceph · Cluster Management · Dashboard
0 likes · 28 min read
Tencent Cloud Middleware
Apr 9, 2025 · Operations

How TDMQ Pulsar’s Cluster‑Level and Topic‑Partition Throttling Keeps Your Messaging System Stable

This article explains why high‑throughput producers and consumers can saturate CPU, memory, network and disk I/O in TDMQ Pulsar clusters, describes the built‑in cluster‑level distributed and topic‑partition rate‑limiting mechanisms, and provides practical guidance for configuration, monitoring, and troubleshooting.

Cluster Management · Message Queue · Operations
0 likes · 12 min read
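The summary does not say which algorithm TDMQ Pulsar uses for its topic-partition limits, so the sketch below is a generic token bucket, a common way to cap throughput per partition, and not TDMQ's implementation. The class name and parameters are hypothetical, and time is passed in explicitly so the behavior is deterministic.

```python
from dataclasses import dataclass, field


@dataclass
class TokenBucket:
    """Generic token-bucket limiter (hypothetical, not TDMQ Pulsar's code).

    rate:     tokens added per second (e.g. messages or bytes per second)
    capacity: maximum burst size
    """
    rate: float
    capacity: float
    tokens: float = field(init=False)
    last: float = 0.0

    def __post_init__(self) -> None:
        self.tokens = self.capacity  # start full so an initial burst is allowed

    def try_acquire(self, now: float, amount: float = 1.0) -> bool:
        # Refill in proportion to elapsed time, never beyond capacity.
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= amount:
            self.tokens -= amount
            return True
        return False  # caller should delay or reject the request (throttle)
```

A cluster-level limit additionally has to share one budget across brokers, which this single-process sketch does not attempt.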
Raymond Ops
Mar 30, 2025 · Operations

Mastering Elasticsearch Data Sync and Cluster Architecture: 3 Strategies Explained

This article explains three Elasticsearch data‑synchronization methods, compares their pros and cons, and then dives into ES cluster structure, node roles, shard allocation, distributed queries, split‑brain handling, and fault‑tolerance mechanisms, providing a comprehensive guide for developers and ops engineers.

Cluster Management · Distributed Systems · Elasticsearch
0 likes · 9 min read
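Split-brain protection rests on majority voting: a side of a network partition may elect a master only if it holds a strict majority of the master-eligible nodes, and two disjoint partitions cannot both hold one. Before Elasticsearch 7.0 this was configured by setting `discovery.zen.minimum_master_nodes` to that majority; 7.0 and later maintain the voting configuration automatically. The helper names below are hypothetical.

```python
def quorum(master_eligible: int) -> int:
    """Smallest strict majority of master-eligible nodes: floor(n / 2) + 1."""
    if master_eligible < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible // 2 + 1


def can_elect_master(partition_size: int, master_eligible: int) -> bool:
    """A partition may elect a master only if it holds a quorum."""
    return partition_size >= quorum(master_eligible)
```

For example, with three master-eligible nodes split 2/1, only the two-node side can elect a master, so the cluster never ends up with two masters.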
Mingyi World Elasticsearch
Mar 29, 2025 · Operations

How to Reset a Forgotten INFINI Console Password

The article explains two ways to recover access to INFINI Console when the password is lost: locating the original console_configuration.json file to retrieve the stored credentials, or using the built‑in Reset Password feature in the user management UI, with step‑by‑step instructions and screenshots.

Cluster Management · INFINI Console · admin guide
0 likes · 5 min read
Cloud Native Technology Community
Mar 18, 2025 · Cloud Native

Best Practices for Managing Core Services in Large‑Scale Kubernetes Deployments

Scaling Kubernetes across dozens or hundreds of clusters requires standardized core services—networking, security, observability, and automation—so organizations should adopt templated configurations, GitOps tools, centralized monitoring, and automated certificate management to reduce complexity, improve security, and lower operational overhead.

Automation · Cluster Management · GitOps
0 likes · 8 min read
dbaplus Community
Feb 13, 2025 · Databases

Automating Redis Resource Balancing to Cut DBA Effort

To handle growing memory pressure across thousands of Redis servers, the platform implements an automated, daily resource‑balancing scheduler that selects overloaded hosts, chooses optimal nodes based on instance count, tier, and placement rules, then safely migrates them through a multi‑step process with rigorous validation.

Automation · Cluster Management · Database operations
0 likes · 14 min read
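The summary names the inputs of the scheduler (overloaded hosts, instance count, tier, placement rules) but not its exact algorithm. The Python sketch below is one hypothetical greedy planner built from those inputs: it moves the largest instances off hosts above a memory threshold to the same-tier host with the fewest instances that stays under the threshold and holds no other member of the same replication group. The threshold, the anti-affinity rule, and all names are assumptions, and the planner only produces a move list; it does not model the article's multi-step migration and validation.

```python
from dataclasses import dataclass, field


@dataclass
class Instance:
    name: str
    group: str       # hypothetical: the master/replica set this instance belongs to
    mem_gb: float


@dataclass
class Host:
    name: str
    capacity_gb: float
    tier: str
    instances: list = field(default_factory=list)

    def usage(self, extra_gb: float = 0.0) -> float:
        return (sum(i.mem_gb for i in self.instances) + extra_gb) / self.capacity_gb


def plan_moves(hosts: list, threshold: float = 0.8) -> list:
    """Greedy plan of (instance, source, target) moves; nothing is migrated."""
    moves = []
    for src in sorted(hosts, key=lambda h: h.usage(), reverse=True):
        # Move the largest instances first until the source is under the threshold.
        for inst in sorted(src.instances, key=lambda i: i.mem_gb, reverse=True):
            if src.usage() <= threshold:
                break
            candidates = [
                h for h in hosts
                if h is not src
                and h.tier == src.tier                                # same tier
                and all(i.group != inst.group for i in h.instances)  # anti-affinity
                and h.usage(inst.mem_gb) <= threshold                 # stays under limit
            ]
            if not candidates:
                continue
            # Prefer the target with the fewest instances, then the lowest usage.
            dst = min(candidates, key=lambda h: (len(h.instances), h.usage()))
            src.instances.remove(inst)
            dst.instances.append(inst)
            moves.append((inst.name, src.name, dst.name))
    return moves
```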
Mingyi World Elasticsearch
Feb 11, 2025 · Operations

How to Ace the Elastic Certified Engineer Exam: Full 8.15 Syllabus Breakdown and Fast‑Track Tips

This guide dissects the Elastic Certified Engineer 8.15 exam syllabus, explains each core topic—from searchable snapshots and async search to ILM policies and cross‑cluster replication—while offering a step‑by‑step study roadmap, hands‑on lab ideas, and resource recommendations to help candidates pass efficiently.

8.15 · Cluster Management · Elastic Certified Engineer
0 likes · 6 min read
Architect
Dec 27, 2024 · Big Data

Fault Self‑Healing System for Large‑Scale Big Data Clusters

This article describes the design, architecture, and technical implementation of BMR's fault self‑healing platform, which automatically collects data, analyzes failures, defines decision rules, and executes safe recovery workflows to improve reliability and efficiency of massive, heterogeneous big‑data environments.

Big Data · Cluster Management · fault self-healing
0 likes · 16 min read
Bilibili Tech
Dec 10, 2024 · Big Data

Fault Self‑Healing System for Bilibili's Large‑Scale Big Data Cluster (BMR)

Bilibili's fault self-healing platform for its massive BMR big-data cluster (over 10,000 machines and 1 EB of storage) adds near-real-time fault discovery, intelligent diagnosis, and automated workflow handling, dramatically cutting resolution time, improving stability across services, and scaling to dozens of automated repairs per day.

BMR · Cluster Management · fault self-healing
0 likes · 16 min read
System Architect Go
Nov 6, 2024 · Cloud Native

How Kubernetes Extended Resources Enable Custom Scheduling (and Their Limits)

This article explains how Kubernetes Extended Resources let you define custom resource types, describes the creation, synchronization, and scheduling workflow, highlights the non‑real‑time allocatable status behavior, and discusses practical limitations and the role of Device Plugins and Operators.

Cluster Management · Custom Scheduling · Device Plugin
0 likes · 6 min read
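To make the workflow concrete: the Kubernetes documentation advertises an extended resource by sending a JSON Patch (`Content-Type: application/json-patch+json`) to `/api/v1/nodes/<node>/status`, where the `/` in a name such as `example.com/dongle` is escaped as `~1`. The Python sketch below builds that patch body and a matching container resource block. The resource name and function names are illustrative, and `fits` is a simplified view of why allocatable status is not real-time: the scheduler counts pod requests against the advertised number rather than observing actual device usage.

```python
import json


def json_pointer_escape(key: str) -> str:
    # RFC 6901: "~" becomes "~0", then "/" becomes "~1".
    return key.replace("~", "~0").replace("/", "~1")


def capacity_patch(resource: str, quantity: int) -> str:
    """JSON Patch body for PATCH /api/v1/nodes/<node>/status."""
    return json.dumps([{
        "op": "add",
        "path": "/status/capacity/" + json_pointer_escape(resource),
        "value": str(quantity),
    }])


def container_resources(resource: str, quantity: int) -> dict:
    # Extended resources take integer quantities and cannot be overcommitted,
    # so the request and the limit are set to the same value.
    return {"requests": {resource: str(quantity)}, "limits": {resource: str(quantity)}}


def fits(allocatable: int, already_requested: list, request: int) -> bool:
    """Simplified scheduler check against the node's advertised allocatable."""
    return allocatable - sum(already_requested) >= request


if __name__ == "__main__":
    print(capacity_patch("example.com/dongle", 4))
```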
Bilibili Tech
Oct 29, 2024 · Big Data

Bilibili One‑Stop Big Data Cluster Management Platform (BMR): Architecture, Modules, and Future Outlook

Bilibili's One‑Stop Big Data Cluster Management Platform (BMR) unifies cluster, metadata, intelligent operations, and custom managers to oversee 50+ services, 10,000 machines, exabyte storage, and millions of cores, using cloud‑native containers, fault prediction, and resource‑sharing techniques to boost efficiency, stability, and cost savings.

BMR · Cluster Management · DevOps
0 likes · 17 min read
Baidu Geek Talk
Oct 9, 2024 · Artificial Intelligence

How Baidu’s Baige 4.0 Architecture Redefines AI Compute Efficiency

This article analyzes Baidu's Baige 4.0 AI infrastructure, detailing its four‑layer architecture, XMAN 5.0 hardware, HPN network, BCCL communication library, and AIAK inference upgrades, and explains how these innovations address large‑model training and inference challenges while boosting performance, utilization, and cost efficiency.

AI Infrastructure · Cluster Management · GPU Acceleration
0 likes · 16 min read
Architects' Tech Alliance
Sep 12, 2024 · Industry Insights

Managing and Optimizing Large‑Scale AI Compute Clusters: Practical Insights

This article examines the key pain points of massive AI compute clusters—including heterogeneous hardware compatibility, efficient scheduling, training and inference acceleration, and fault‑tolerant operations—while presenting practical management and performance‑tuning strategies, a cloud‑native AI platform implementation, and future directions for the ecosystem.

AI computing · Cluster Management · Operations
0 likes · 7 min read
Mike Chen's Internet Architecture
Aug 29, 2024 · Cloud Native

Mastering Kubernetes: Core Concepts, Architecture, and Real‑World Use Cases

This article provides a comprehensive overview of Kubernetes (K8S), covering its origins, key problems it solves, master‑node architecture, core components such as kube‑apiserver, scheduler, controllers, node agents, and practical applications like CI/CD integration, multi‑tenant and micro‑service deployments.

Cloud Native · Cluster Management · Kubernetes
0 likes · 9 min read
Rare Earth Juejin Tech Community
Aug 6, 2024 · Operations

ZooKeeper Core Concepts: Data Model, Node Types, Sessions, Cluster, Election, ZAB, Watch, ACL, and Distributed Lock Patterns

This article explains ZooKeeper's hierarchical data model, node types, session mechanism, cluster roles and election process, ZAB protocol, watch mechanism, ACL permissions, and common distributed lock implementations, providing a comprehensive overview of its core concepts and practical usage.

ACL · Cluster Management · Coordination Service
0 likes · 17 min read
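The most common ZooKeeper lock pattern has each client create an ephemeral sequential node under a lock path; the client whose node carries the lowest sequence number holds the lock, and every other client watches only its immediate predecessor, so a release wakes one waiter instead of all of them. The Python sketch below covers just that decision step. The `lock-` prefix and the function names are assumptions, and the actual ZooKeeper calls (create, getChildren, exists with a watch) are left out.

```python
from typing import List, Optional, Tuple


def sequence_of(node: str) -> int:
    # ZooKeeper appends a 10-digit, zero-padded counter to sequential nodes.
    return int(node[-10:])


def lock_decision(children: List[str], mine: str) -> Tuple[bool, Optional[str]]:
    """Return (acquired, node_to_watch) for the sequential-ephemeral lock recipe."""
    ordered = sorted(children, key=sequence_of)
    idx = ordered.index(mine)
    if idx == 0:
        return True, None               # lowest sequence number: lock acquired
    return False, ordered[idx - 1]      # watch only the immediate predecessor
```

When the watched predecessor disappears (released or its session expired), the client re-reads the children and runs the same decision again.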
Bilibili Tech
Jul 19, 2024 · Big Data

Bilibili's One-Stop Big Data Cluster Management Platform (BMR) - Architecture and Implementation

Bilibili's one-stop Big Data Cluster Management Platform (BMR) consolidates HDFS, Spark, Flink, ClickHouse, Kafka, and other services into a unified system. The platform evolved through four stages (standardization, metadata-driven construction, containerization, and observability) that addressed node consistency, scaling, fault self-healing, and resource optimization, and it delivers elastic scaling and automated start/stop, with further cost-saving and stability enhancements planned.

Cluster Management · Observability · Resource Optimization
0 likes · 12 min read
DevOps Cloud Academy
Jun 18, 2024 · Operations

Essential kubectl Commands for DevOps Engineers

This guide presents a comprehensive collection of the most important and frequently used kubectl commands, explaining how to retrieve version information, manage clusters, list resources, manipulate contexts, create, update, patch, scale, expose, delete, and debug Kubernetes objects, as well as format output and control verbosity, enabling DevOps engineers to efficiently operate Kubernetes clusters.

CLI · Cluster Management · DevOps
0 likes · 14 min read
Baidu Geek Talk
Apr 24, 2024 · Industry Insights

How Baidu’s New AI OS “WanYuan” Redefines Intelligent Computing

At the Create 2024 Baidu AI Developer Conference, Baidu unveiled its next‑generation intelligent computing operating system WanYuan, detailing its cluster‑scale management, GPU‑centric performance, integrated large‑model services, and a layered architecture that aims to simplify AI‑native application development and accelerate the AI era.

AI · Baidu · Cluster Management
0 likes · 12 min read
Practical DevOps Architecture
Apr 18, 2024 · Cloud Native

Kubernetes Source Code Deep Dive and Secondary Development Course Outline

This curriculum provides a comprehensive, step-by-step exploration of Kubernetes internals, including the kubeadm core source, Go module management, the cobra library, the kubeadm init/join processes, client-go components, code generators, custom resources, operators, and practical deployment automation, with the aim of mastering cluster setup, configuration, and advanced development.

Cluster Management · Go · client-go
0 likes · 10 min read
NewBeeNLP
Mar 8, 2024 · Industry Insights

Why Building LLMs Is Like Buying a Hardware Lottery – Lessons from a Startup

The article recounts Yi Tay’s experience founding Reka and building large language models from scratch, highlighting the unpredictable quality of GPU clusters, the challenges of multi‑cluster orchestration, code‑base choices, and how startups must rely on fast, intuition‑driven experimentation to succeed.

Cluster Management · GPU · Hardware
0 likes · 12 min read
dbaplus Community
Feb 26, 2024 · Cloud Native

10 Hard‑Earned Lessons from 3 Years Managing Kubernetes Clusters

After three years of hands‑on Kubernetes administration, the author shares ten practical lessons covering cloud‑hosted clusters, infrastructure‑as‑code, Helm chart usage, service mesh decisions, tool selection, resource limits, stateless design, HPA configuration, and upgrade strategies to help both newcomers and seasoned engineers manage clusters effectively.

Cloud Native · Cluster Management · Kubernetes
0 likes · 7 min read
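HPA configuration is easier to reason about once the core formula is in view. The Kubernetes documentation gives it as `desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue)`, with no change made while the ratio stays within a tolerance (0.1 by default). The Python sketch below implements that formula plus min/max clamping; the default `min_replicas` and `max_replicas` values are hypothetical, and the real controller's stabilization windows and per-pod metric handling are omitted.

```python
import math


def desired_replicas(current_replicas: int, current_metric: float, target_metric: float,
                     min_replicas: int = 1, max_replicas: int = 10,
                     tolerance: float = 0.1) -> int:
    """Core HPA scaling formula from the Kubernetes documentation."""
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas  # close enough to the target: do not scale
    desired = math.ceil(current_replicas * ratio)
    # Clamp to the configured bounds (hypothetical defaults above).
    return max(min_replicas, min(max_replicas, desired))
```

For example, 4 replicas averaging 200m CPU against a 100m target scale to 8, while 3 replicas at 50m against 100m scale down to 2.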
Ops Development Stories
Feb 2, 2024 · Cloud Native

Essential kubectl Commands for Efficient Kubernetes Management

This guide compiles a comprehensive set of kubectl and Docker commands for retrieving logs, sorting pods, managing secrets, cleaning resources, debugging, port forwarding, and performing cluster maintenance tasks, helping administrators streamline Kubernetes operations and troubleshoot issues effectively.

CLI · Cloud Native · Cluster Management
0 likes · 15 min read
Didi Tech
Jan 9, 2024 · Big Data

Introducing Apache Pulsar: Technical Benefits and Solutions for Didi Big Data Messaging System

Apache Pulsar, a cloud-native distributed messaging platform, solves the bottlenecks of Didi Big Data's DKafka by separating compute and storage and by using sequential log writes, heterogeneous disks, multi-level caching, bundle-based load balancing, and automatic scaling. This dramatically improves stability, though it makes monitoring more complex.

Apache Pulsar · Cluster Management · DKafka
0 likes · 17 min read
dbaplus Community
Dec 20, 2023 · Operations

Scaling Kafka to 1000+ Nodes: Governance, Auto‑Balancing & Tiered Storage

This article outlines how a large‑scale Kafka deployment of over a thousand machines across dozens of clusters was engineered for stability and efficiency through a custom Guardian controller that adds partition‑level throttling, automatic balancing, multi‑tenant isolation, cross‑IDC management, tiered storage, audit capabilities, and fully automated operational workflows.

Cluster Management · Kafka · Operations
0 likes · 21 min read
WeiLi Technology Team
Nov 1, 2023 · Big Data

How to Diagnose and Resolve HDFS Safe Mode Issues

This guide explains why HDFS enters safe mode after a DataNode failure, describes the safe‑mode state and its exit conditions, and provides step‑by‑step commands and troubleshooting procedures to analyze, fix, and recover from safe‑mode incidents in Hadoop clusters.

Big Data · Cluster Management · HDFS
0 likes · 10 min read
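The safe-mode exit condition can be sketched as a simple check. The NameNode stays in safe mode until the fraction of blocks that have reached the minimal replication (`dfs.namenode.replication.min`, default 1) is at least `dfs.namenode.safemode.threshold-pct` (default 0.999) and at least `dfs.namenode.safemode.min.datanodes` DataNodes (default 0) are live; it then waits an extra `dfs.namenode.safemode.extension` period (30 s by default). The Python function below is a simplified, hypothetical model of that check and does not model the extension period.

```python
def can_leave_safemode(safe_blocks: int, total_blocks: int, live_datanodes: int,
                       threshold_pct: float = 0.999, min_datanodes: int = 0) -> bool:
    """Simplified model of the NameNode safe-mode exit thresholds.

    safe_blocks: blocks that have reached the minimal replication.
    threshold_pct and min_datanodes mirror dfs.namenode.safemode.threshold-pct
    and dfs.namenode.safemode.min.datanodes. The extension wait is not modeled.
    """
    if live_datanodes < min_datanodes:
        return False
    if total_blocks == 0:
        return True  # nothing to wait for
    return safe_blocks / total_blocks >= threshold_pct
```

On a real cluster, `hdfs dfsadmin -safemode get` reports the current state, and `hdfs dfsadmin -safemode leave` forces an exit, which is only sensible once you know why blocks are missing.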
Efficient Ops
Sep 17, 2023 · Cloud Native

Top 9 Essential Kubernetes Tools to Streamline Your Cloud‑Native Workflows

Explore nine indispensable Kubernetes tools—including Kubie, Kubespray, Helm, Minikube, K3s, Kustomize, KOps, Prometheus, and krew—that simplify cluster management, accelerate deployments, and enhance efficiency, helping you choose the right solution for smoother, more productive cloud‑native operations.

Cluster Management · Kubernetes · Prometheus
0 likes · 6 min read
Aikesheng Open Source Community
Jul 3, 2023 · Databases

Replacing OCP Nodes Using the ANTMAN Tool in OceanBase Cloud Platform

This article provides a step‑by‑step guide on how to replace OceanBase Cloud Platform (OCP) nodes using the ANTMAN tool, covering environment preparation, configuration adjustments, execution of management scripts, tenant migration, cleanup of old services, and troubleshooting tips for a seamless database cluster upgrade.

ANTMAN · Cluster Management · Docker
0 likes · 25 min read
Liangxu Linux
Jul 2, 2023 · Cloud Native

Mastering kubectl: Essential Commands for Kubernetes Management

This guide explains what kubectl is, how it interacts with the Kubernetes API server, and provides a categorized list of essential commands for retrieving information, debugging, state management, scaling, deployment, and security, helping users efficiently operate and automate K8s clusters.

Cloud Native · Cluster Management · DevOps
0 likes · 5 min read
Test Development Learning Exchange
Jun 29, 2023 · Cloud Native

Essential Kubernetes Commands for Testers: 50 Commands with Practical Examples

This article presents a comprehensive collection of 50 essential kubectl commands covering cluster, namespace, pod, deployment, service, ConfigMap, secret, volume, logging, debugging, scaling, configuration, and cleanup operations, providing testers with practical examples to efficiently manage and troubleshoot Kubernetes environments.

Cluster Management · kubectl · testing
0 likes · 9 min read
High Availability Architecture
May 26, 2023 · Big Data

Amiya: Dynamic Overcommit Component for Bilibili Offline Big Data Cluster Resource Scheduling

This article introduces Amiya, a self-developed overcommit component that dynamically increases YARN memory and vCore capacity on Bilibili's offline big-data clusters, details its architecture and the key implementation of its overcommit, eviction, and mixed-deployment strategies, and evaluates its impact on resource utilization.

Cluster Management · Overcommit · Resource Optimization
0 likes · 22 min read
Bilibili Tech
May 23, 2023 · Big Data

Amiya: Dynamic Overcommit Component for Bilibili Offline Big Data Cluster

Amiya, a self-developed dynamic overcommit component for Bilibili's offline big-data cluster, inflates the reported resources of under-utilized nodes and scales them back when load rises. It adds roughly 683 TB of memory and 137k vCores, raising per-node memory by 15% and CPU usage by more than 20% while keeping eviction rates below 3%.

Amiya · Bilibili · Cluster Management
0 likes · 22 min read
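The summary describes the idea (report more capacity than the hardware has while real usage is low, and pull it back as load rises) without giving a formula. The Python sketch below is a hypothetical policy in that spirit, not Amiya's published algorithm: the maximum overcommit ratio, the low/high usage watermarks, and the eviction trigger are all assumed values chosen for illustration.

```python
def reported_capacity(physical: float, used: float, max_ratio: float = 0.3,
                      low: float = 0.5, high: float = 0.8) -> float:
    """Hypothetical overcommit policy (not Amiya's actual algorithm).

    Advertise up to physical * (1 + max_ratio) while real usage is below
    `low`, shrink the extra capacity linearly as usage approaches `high`,
    and advertise only the physical capacity at or above `high`.
    """
    util = used / physical
    if util <= low:
        extra = max_ratio
    elif util >= high:
        extra = 0.0
    else:
        extra = max_ratio * (high - util) / (high - low)
    return physical * (1 + extra)


def should_evict(physical: float, used: float, evict_at: float = 0.95) -> bool:
    """Hypothetical trigger: evict low-priority containers when real usage
    comes close to the physical limit."""
    return used / physical >= evict_at
```

Shrinking the advertised capacity gradually, rather than switching overcommit off at one threshold, avoids large swings in what the resource manager believes a node can accept.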
Cloud Native Technology Community
May 17, 2023 · Cloud Native

Why Do You Need Kubernetes Multi‑Cluster? Core Challenges and Design Principles

This article explains the motivations behind Kubernetes multi‑cluster deployments, outlines common use cases such as isolation and high‑availability, and analyzes core management elements including deployment models, control‑plane architectures, network connectivity, service discovery, cross‑cluster scheduling, application model extensions, and treating clusters as resources.

Cloud Native · Cluster Management · Kubernetes
0 likes · 23 min read
ITPUB
Apr 5, 2023 · Operations

Automating TiDB Operations: From Manual Pain Points to a Scalable Platform

This article details how Zhaozhuan's DBA team transformed TiDB cluster management by addressing metadata, resource allocation, upgrade, and alert challenges through a comprehensive automation platform that streamlines work orders, node operations, scaling, monitoring, and alert handling, ultimately reducing manual effort and improving reliability.

Alerting · Cluster Management · TiDB
0 likes · 22 min read
MaGe Linux Operations
Mar 30, 2023 · Cloud Native

Why Is Kubernetes So Hard to Master? A Beginner’s Q&A Guide

This article explains the core concepts of Kubernetes—including its architecture, node communication, pod scheduling, data storage, service exposure, scaling, and controller coordination—through a series of clear questions and answers, helping beginners grasp why the platform feels complex.

Cloud Native · Cluster Management · Pod Scheduling
0 likes · 9 min read
Architecture Digest
Mar 20, 2023 · Cloud Native

Kubernetes: What It Is and Why It’s Hard to Get Started

This article provides a concise, question‑and‑answer overview of Kubernetes, explaining its role as a distributed container‑orchestration system, the architecture of master and worker nodes, core components such as etcd, kube‑apiserver, scheduler, controllers, and how services, pods, labels, and scaling operate within a cluster.

Cloud Native · Cluster Management · Controllers
0 likes · 8 min read
21CTO
Feb 10, 2023 · Cloud Native

Why Kubernetes Is So Hard to Master: A Beginner’s Q&A Walkthrough

This article introduces Kubernetes fundamentals through a series of questions and answers, covering its architecture, node communication, pod scheduling, data storage, external access, scaling mechanisms, and component coordination, all illustrated with clear diagrams.

Cluster Management · Containers · Kubernetes
0 likes · 9 min read
Top Architect
Feb 7, 2023 · Cloud Native

Understanding Kubernetes: Core Concepts and Architecture

This article provides a concise, question‑driven overview of Kubernetes, covering its architecture, node and master communication, pod fundamentals, scheduling, storage via etcd, service exposure, scaling mechanisms, and the roles of core components such as kube‑apiserver, kubelet, kube‑proxy and controllers.

Cloud Native · Cluster Management · Containers
0 likes · 9 min read
Open Source Linux
Dec 30, 2022 · Operations

Top 7 Kubernetes Management Tools to Simplify Cluster Operations

This article introduces seven popular Kubernetes management solutions—including K9s, Rancher, the native Dashboard with Kubectl and Kubeadm, Helm, KubeSpray, Kontena Lens, and WKSctl—detailing their key features, usage scenarios, and how they help streamline cluster monitoring, deployment, scaling, and security across cloud‑native environments.

Cluster Management, DevOps, Kubernetes
0 likes · 9 min read
Architecture Digest
Nov 30, 2022 · Backend Development

Meituan Kafka at Scale: Challenges and Optimizations for Latency, Cluster Management, and Reliability

This article details Meituan's large‑scale Kafka deployment—over 15,000 machines and petabyte‑level daily traffic—its operational challenges such as slow nodes, load imbalance, and resource contention, and the comprehensive read/write latency, system‑level, and cluster‑management optimizations implemented to improve performance and reliability.

Cluster Management, Distributed Systems, Kafka
0 likes · 22 min read
ITPUB
Nov 23, 2022 · Backend Development

How Zookeeper Elects Its Leader: A Human Election Analogy Explained

This article explains Zookeeper's leader election mechanism by comparing it to human voting, detailing the four core concepts, the role of zxid, the step‑by‑step process during startup and runtime failures, and the key terms every interviewee should know.

Backend Development, Cluster Management, leader election
0 likes · 11 min read
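The vote ordering behind the ZooKeeper leader‑election card above can be sketched in a few lines. This is a simplified, single‑round illustration that assumes every server sees every other server's initial vote; the `Vote` class and the `run_election` and `is_better` functions are hypothetical names, not ZooKeeper's API. The comparison follows the order used by ZooKeeper's FastLeaderElection (higher epoch first, then higher zxid, then higher server id), and a leader is accepted only with a strict majority.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Vote:
    """A simplified vote proposing `sid` as leader (hypothetical type)."""
    epoch: int  # epoch of the proposed leader
    zxid: int   # last transaction id (zxid) the proposed leader has seen
    sid: int    # server id (myid) of the proposed leader


def is_better(new: Vote, cur: Vote) -> bool:
    """Higher epoch wins, then higher zxid, then higher server id."""
    return (new.epoch, new.zxid, new.sid) > (cur.epoch, cur.zxid, cur.sid)


def run_election(last_zxids: dict[int, int], epoch: int = 1) -> int:
    """last_zxids maps server id -> last zxid; returns the elected leader's id."""
    # Every server first proposes itself and broadcasts that proposal.
    proposals = [Vote(epoch, zxid, sid) for sid, zxid in last_zxids.items()]
    final_choice: dict[int, int] = {}
    for sid, zxid in last_zxids.items():
        best = Vote(epoch, zxid, sid)
        # On receiving a better proposal, a server switches its vote to it.
        for p in proposals:
            if is_better(p, best):
                best = p
        final_choice[sid] = best.sid
    # A leader needs votes from a strict majority of the ensemble.
    quorum = len(last_zxids) // 2 + 1
    leader, votes = Counter(final_choice.values()).most_common(1)[0]
    if votes < quorum:
        raise RuntimeError("no candidate reached a quorum")
    return leader


# Servers 2 and 3 share the newest zxid; the higher server id (3) breaks the tie.
print(run_election({1: 100, 2: 105, 3: 105}))  # 3
```

Putting the zxid ahead of the server id in the comparison is what makes the server with the most up‑to‑date data win, so a newly elected leader does not have to discard committed transactions.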
Java Architect Essentials
Nov 11, 2022 · Big Data

Meituan Kafka at Scale: Challenges and Optimizations for Latency, Cluster Management, and Reliability

This article details Meituan's large‑scale Kafka deployment, describing the current state, performance challenges such as slow nodes and disk imbalance, and the comprehensive optimizations applied—including read/write latency reductions, migration pipelines, fetcher isolation, SSD caching, RAID acceleration, cgroup isolation, full‑link monitoring, service lifecycle management, and TOR disaster recovery—to improve reliability and prepare for future growth.

Cluster Management, Kafka, Latency Reduction
0 likes · 21 min read
Java High-Performance Architecture
Oct 11, 2022 · Operations

How Meituan Optimized Kafka for Massive Scale: Reducing Latency and Managing Clusters

This article details Meituan's real‑world challenges with a 15,000‑node Kafka deployment and explains the application‑layer and system‑layer optimizations—such as disk balancing, migration pipeline acceleration, fetcher isolation, RAID acceleration, cgroup isolation, and an SSD‑based cache—that together dramatically cut read/write latency and simplify large‑scale cluster management.

Cluster Management, Meituan, Streaming
0 likes · 23 min read
Code Ape Tech Column
Sep 24, 2022 · Operations

Overview of Redis Monitoring, Data Migration, and Cluster Management Tools

This article introduces essential Redis operational tools, covering real‑time monitoring with the INFO command and a Prometheus exporter, data migration with redis-shake, consistency checking with redis-full-check, and cluster management through CacheCloud, with practical guidance for administrators.

Cluster Management, Data Migration, Operations
0 likes · 10 min read
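To make the INFO‑based monitoring in the Redis card concrete, here is a minimal sketch that parses the text returned by `INFO` (for example, captured from `redis-cli INFO`) into per‑section dictionaries and derives a cache hit ratio. It assumes the standard INFO layout of `# Section` header lines followed by `field:value` lines; `parse_info` and `hit_ratio` are hypothetical helpers, not part of any Redis client library. `keyspace_hits` and `keyspace_misses` are fields of the Stats section.

```python
def parse_info(raw: str) -> dict[str, dict[str, str]]:
    """Parse Redis INFO text into {section_name: {field: value}}."""
    sections: dict[str, dict[str, str]] = {}
    current = sections.setdefault("default", {})  # fields seen before any header
    for line in raw.splitlines():
        line = line.strip()
        if not line:
            continue  # blank lines separate sections
        if line.startswith("#"):
            # "# Stats" starts a new section; names are lower-cased here by choice.
            current = sections.setdefault(line.lstrip("#").strip().lower(), {})
            continue
        field, sep, value = line.partition(":")  # split on the first colon only
        if sep:
            current[field] = value
    return sections


def hit_ratio(info: dict[str, dict[str, str]]) -> float:
    """keyspace_hits / (keyspace_hits + keyspace_misses); 0.0 with no lookups."""
    stats = info.get("stats", {})
    hits = int(stats.get("keyspace_hits", 0))
    misses = int(stats.get("keyspace_misses", 0))
    total = hits + misses
    return hits / total if total else 0.0


sample = (
    "# Clients\r\nconnected_clients:3\r\n\r\n"
    "# Stats\r\nkeyspace_hits:90\r\nkeyspace_misses:10\r\n"
)
info = parse_info(sample)
print(info["clients"]["connected_clients"], hit_ratio(info))  # 3 0.9
```

Splitting only on the first colon keeps values such as file paths intact; nested values like the per‑database `db0:keys=...` lines are left as raw strings for the caller to split further.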
vivo Internet Technology
Sep 14, 2022 · Big Data

Exploring and Practicing Apache Pulsar at vivo: Cluster Management, Monitoring, and Optimization

The vivo big‑data team details how they migrated massive real‑time workloads from Kafka to Apache Pulsar, describing cluster‑level bundle and ledger management, retention policies, a Prometheus‑Kafka‑Druid monitoring pipeline, load‑balancing tweaks, client tuning, rapid broker‑failure recovery, and future cloud‑native tracing and migration plans.

Apache Pulsar, Big Data, Cluster Management
0 likes · 19 min read
Python Crawling & Data Mining
Aug 12, 2022 · Big Data

Master the Big Data Ecosystem: 9 Core Technology Frameworks Explained

This article provides a comprehensive overview of the big data ecosystem, detailing nine essential technology categories—including data collection, storage, computation, analysis, resource management, retrieval, underlying infrastructure, and cluster installation—while comparing popular tools and illustrating their typical use‑cases with diagrams.

Cluster Management, data collection, data storage
0 likes · 11 min read
Meituan Technology Team
Aug 11, 2022 · Cloud Native

LAR: Load Auto-Regulator System for Resource Utilization and Service Quality

The article analyzes Meituan’s self‑designed Load Auto‑Regulator (LAR), detailing its tiered resource‑pool architecture, dynamic load‑to‑static‑resource mapping, and QoS mechanisms that together raise data‑center CPU utilization by 5‑10% while keeping online service quality stable, and discusses its deployment in online and mixed‑workload scenarios.

Cloud Native, Cluster Management, Kubernetes
0 likes · 28 min read
Meituan Technology Team
Aug 4, 2022 · Big Data

Optimizing Kafka for Large-Scale Data Platforms at Meituan

The article details Meituan's massive Kafka deployment—over 15,000 machines handling more than 30 PB of daily data—its performance and management challenges, and the comprehensive application‑layer, system‑layer, and hybrid‑layer optimizations Meituan implemented to reduce read/write latency and improve large‑scale cluster reliability.

Cluster Management, Data Platform, Full‑Link Monitoring
0 likes · 25 min read
NetEase Game Operations Platform
Jun 10, 2022 · Databases

Apache Doris Deployment and Optimization at NetEase Interactive Entertainment

This article details NetEase Interactive Entertainment's adoption of Apache Doris for large‑scale game data analytics, covering background, Doris architecture, cluster governance, tablet and compaction tuning, scaling strategies, monitoring, alerting, and fault‑handling practices to improve performance and stability.

Apache Doris, Big Data, Cluster Management
0 likes · 22 min read
vivo Internet Technology
Jun 8, 2022 · Cloud Native

Vivo’s Large‑Scale Kubernetes Operator Practice for Multi‑Data‑Center Cluster Management

Vivo replaced error‑prone manual Ansible playbooks with a custom Kubernetes Operator that uses declarative CRDs and modular Ansible scripts to automate the full lifecycle—deployment, scaling, upgrades, and recovery—of thousands of nodes across multiple data‑centers, supported by extensive CI testing and future kubeadm integration.

Ansible, CI/CD, Cloud Native
0 likes · 14 min read
vivo Internet Technology
May 31, 2022 · Big Data

Kafka Load Balancing and Cruise Control: Concepts, Manual Migration, and Deployment

Kafka’s server‑side load imbalance stems from static replica placement on broker disks, and correcting it by migrating replicas by hand does not scale. Cruise Control automates metric collection, analysis, and execution of fine‑grained rebalance plans, including broker decommissioning and leader dispersion, allowing large clusters to expand and operate efficiently.

Big Data, Cluster Management, Cruise Control
0 likes · 21 min read
dbaplus Community
May 12, 2022 · Big Data

How Bilibili Scaled Presto on Hadoop: Architecture, Optimizations, and Performance Gains

This article details Bilibili's end‑to‑end Presto on Hadoop architecture, covering the multi‑engine SQL stack, dispatcher routing, cluster scale, stability enhancements such as coordinator HA and a real‑time query punishment mechanism, query limits, Hive UDF compatibility, insert‑overwrite support, Alluxio caching, multi‑datacenter routing, query result caching, the RaptorX local cache, JDK upgrades, dynamic filtering, and the future roadmap, illustrating how these changes boosted query throughput and reduced latency.

Big Data, Cluster Management, Distributed Systems
0 likes · 32 min read
Aikesheng Open Source Community
Apr 7, 2022 · Databases

TiDB 2.1.x to 4.0.13 Upgrade and Data Migration Guide

This article provides a comprehensive step‑by‑step guide for senior DBAs to upgrade an online TiDB 2.1.x cluster to version 4.0.13 via data migration, detailing environment assessment, configuration changes, component deployment, full and incremental data transfer, consistency verification, permission synchronization, and traffic switchover.

Ansible, Cluster Management, Database Upgrade
0 likes · 26 min read
IT Services Circle
Apr 3, 2022 · Cloud Native

Understanding Kubernetes Federation: kubefed and Karmada Multi‑Cluster Management

This article explains why Kubernetes single‑cluster scalability is limited to about 5,000 nodes, introduces the concept of multi‑cluster federation, compares the legacy kubefed project with the actively maintained Karmada solution, and shows how policies and replica‑scheduling enable flexible cross‑AZ deployments and failover.

Cloud Native, Cluster Management, Federation
0 likes · 13 min read
Architect
Feb 6, 2022 · Big Data

Elasticsearch Overview: Architecture, Core Concepts, and Performance Optimization

This article provides a comprehensive introduction to Elasticsearch, covering data types, Lucene fundamentals, inverted indexes, cluster components, node roles, shard and replica mechanisms, mapping, installation, health monitoring, write path, storage strategies, segment management, refresh and translog processes, as well as practical performance and JVM tuning tips.

Cluster Management, Distributed Search, Elasticsearch
0 likes · 37 min read
DataFunTalk
Feb 1, 2022 · Big Data

Kafka at Meituan: Practices, Challenges, and Optimizations for Large‑Scale Data Platforms

This article presents Meituan's large‑scale Kafka deployment, describing the current state and challenges of massive data ingestion, detailing latency‑reduction techniques, cluster‑level optimizations, SSD‑based caching, isolation strategies, full‑link monitoring, lifecycle management, and future directions for high availability.

Cluster Management, Kafka, Meituan
0 likes · 22 min read
MaGe Linux Operations
Jan 28, 2022 · Cloud Native

Top 7 Kubernetes Management Tools to Simplify Cluster Operations

Discover the most popular Kubernetes management solutions—including K9s, Rancher, Dashboard, Helm, Kubespray, Lens, and WKSctl—detailing their features, deployment options, and how they streamline cluster monitoring, scaling, and security for cloud-native environments and improve operational efficiency.

Cloud Native, Cluster Management, DevOps
0 likes · 9 min read
Yiche Technology
Jan 11, 2022 · Databases

Elasticsearch Overview, Comparison, Maintenance Challenges, Deployment Strategies, and Automation Management Platform

This document provides a comprehensive technical overview of Elasticsearch, comparing it with Solr and ClickHouse, detailing common operational pain points and configuration solutions, describing containerized and ECK deployments, and outlining a company‑wide automation platform for cluster provisioning, monitoring, index and security management, with future directions for lifecycle and backup strategies.

Automation, Cluster Management, Kubernetes
0 likes · 31 min read
21CTO
Jan 4, 2022 · Operations

Deploy Searchable Snapshots in Elasticsearch 7.14: A Complete Step‑by‑Step Guide

This article explains the principles behind Elasticsearch searchable snapshots, details the DataTier model and node role optimizations, and provides a full practical walkthrough—including cluster setup, COS repository creation, ILM policy configuration, index templates, mounting strategies, and performance considerations—using ES 7.14.2.

Cluster Management, Data Tier, Elasticsearch
0 likes · 15 min read
Efficient Ops
Jan 3, 2022 · Operations

Master Elasticsearch Cluster: Essential Commands for Health, Tasks, and Settings

This article explains how to manage Tencent Cloud Elasticsearch clusters by using key APIs to check health status, monitor pending tasks, retrieve metadata, view statistics, adjust shard allocation, modify cluster settings, and control tasks, providing practical command examples and detailed explanations for effective operations.

API, Cluster Management, settings
0 likes · 19 min read
政采云技术
Nov 9, 2021 · Cloud Native

Design and Usage of Clusterfile in Sealer for Cluster Configuration and Plugins

This article explains the design principles of Sealer's Clusterfile, details its configuration parameters, demonstrates how to inject additional settings and environment variables, and describes the supported plugins for customizing Kubernetes clusters, providing practical examples and code snippets.

Cloud Native, Cluster Management, Clusterfile
0 likes · 10 min read
Alibaba Cloud Native
Oct 29, 2021 · Cloud Native

Unified Management & Secure Governance for Alibaba Cloud ACK and On-Prem Kubernetes

This article explains how cloud‑native technologies enable a unified control plane for Alibaba Cloud ACK clusters and self‑built Kubernetes clusters, detailing the ACK registered‑cluster architecture, one‑way registration, non‑managed security mechanisms, step‑by‑step cluster onboarding, and consistent security governance across environments.

ACK, Cloud Native, Cluster Management
0 likes · 11 min read
Tencent Cloud Developer
Oct 8, 2021 · Operations

Unveiling Kafka’s Controller: Architecture, Election, and Monitoring Deep Dive

This article provides a comprehensive technical analysis of Kafka’s Controller component, covering its background, core responsibilities, data storage, election process, version‑specific improvements, monitoring techniques, and key source‑code excerpts to help engineers understand and manage Kafka clusters effectively.

Cluster Management, Controller, Distributed Systems
0 likes · 27 min read
MaGe Linux Operations
Oct 5, 2021 · Cloud Native

Unlock Advanced kubectl Tricks for Faster Kubernetes Management

This article shares a collection of powerful kubectl commands and tips—including API debugging, status‑based pod filtering and deletion, node‑specific pod listing, distribution counting with awk, and proxy usage—to help experienced Kubernetes users work more efficiently and avoid manual API client coding.

CLI, Cluster Management, DevOps
0 likes · 7 min read
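The kubectl card mentions counting the distribution of pods across nodes with awk. As an alternative under stated assumptions, the sketch below counts pods per node from the JSON that `kubectl get pods -A -o json` prints, reading each pod's `spec.nodeName` instead of relying on column positions in the table output. `pods_per_node` is a hypothetical helper, and the sample input is made up for illustration.

```python
import json
from collections import Counter


def pods_per_node(kubectl_json: str) -> Counter:
    """Count pods per node from `kubectl get pods -A -o json` output."""
    pod_list = json.loads(kubectl_json)
    counts: Counter = Counter()
    for pod in pod_list.get("items", []):
        # Pods that have not been scheduled yet have no spec.nodeName.
        node = pod.get("spec", {}).get("nodeName") or "<unscheduled>"
        counts[node] += 1
    return counts


# Made-up sample shaped like a heavily trimmed PodList.
sample = json.dumps({"kind": "PodList", "items": [
    {"metadata": {"name": "web-1"}, "spec": {"nodeName": "node-a"}},
    {"metadata": {"name": "web-2"}, "spec": {"nodeName": "node-a"}},
    {"metadata": {"name": "db-0"}, "spec": {"nodeName": "node-b"}},
    {"metadata": {"name": "job-x"}, "spec": {}},
]})
for node, n in pods_per_node(sample).most_common():
    print(f"{n:>4} {node}")
```

Against a live cluster, the same function could take `subprocess.run(["kubectl", "get", "pods", "-A", "-o", "json"], capture_output=True, text=True, check=True).stdout` as its input. Parsing JSON avoids breakage when a table column (for example RESTARTS) contains spaces.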
DevOps Cloud Academy
Sep 21, 2021 · Operations

Practical Elasticsearch Operations and Performance Tuning Guide

This article extends previous Elasticsearch cheat sheets with practical commands and step‑by‑step instructions for shard allocation, replica adjustment, cluster settings, slow‑log configuration, mapping routing, force merge, bulk writes, refresh intervals, translog durability, heap sizing, disk‑space monitoring, and troubleshooting strategies.

Cluster Management, Elasticsearch, Operations
0 likes · 7 min read
Selected Java Interview Questions
Sep 7, 2021 · Big Data

Elasticsearch Basics: Core Concepts, Indexing, Write and Search Processes, Cluster Management and Performance Tips

This article provides a comprehensive overview of Elasticsearch, covering its fundamental architecture, key concepts such as indices, shards and replicas, the complete write and search workflows, consistency mechanisms, master node election, and practical performance‑tuning recommendations for large‑scale deployments.

Big Data, Cluster Management, Elasticsearch
0 likes · 15 min read
Tencent Cloud Developer
Aug 26, 2021 · Big Data

Recap of Shenzhen Elasticsearch Meetup – Community Growth, Compression Optimization, Real‑time Data Fusion, and Cluster Practices

The first Shenzhen Elasticsearch meetup on August 21, 2021, jointly hosted by the ES Chinese community and Tencent Cloud, gathered experts from Tencent, Tapdata, ByteDance and Vivo to showcase rapid community growth, compression‑encoding optimizations, real‑time ES‑MongoDB data fusion, custom kernel extensions, large‑scale cluster practices, and concluded with extensive Q&A and networking.

Big Data, Cluster Management, Elasticsearch
0 likes · 11 min read
Senior Brother's Insights
Jul 28, 2021 · Operations

How Zookeeper Prevents Split‑Brain: Inside Quorum‑Based Leader Election

This article explains the split‑brain phenomenon in distributed clusters, uses Zookeeper as a case study to illustrate how network partitions can create multiple leaders, and details Zookeeper's majority‑quorum mechanism, node count considerations, and common strategies for avoiding split‑brain scenarios.

Cluster Management, Distributed Systems, Split-Brain
0 likes · 13 min read
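The majority‑quorum arithmetic behind the split‑brain card can be checked with a few helper functions. This is a sketch of the general rule (a quorum is a strict majority, ⌊n/2⌋ + 1 of n voting servers), not ZooKeeper code, and the function names are hypothetical. It shows why a 4‑node ensemble tolerates no more failures than a 3‑node one, and why at most one side of a network partition can hold a quorum.

```python
def quorum_size(ensemble: int) -> int:
    """Smallest strict majority of an ensemble of `ensemble` voting servers."""
    return ensemble // 2 + 1


def tolerated_failures(ensemble: int) -> int:
    """How many servers can fail while the rest still form a quorum."""
    return ensemble - quorum_size(ensemble)


def partitions_with_quorum(partition_sizes: list[int]) -> list[int]:
    """Indices of partitions large enough to elect a leader on their own."""
    q = quorum_size(sum(partition_sizes))
    return [i for i, size in enumerate(partition_sizes) if size >= q]


for n in range(3, 8):
    print(n, quorum_size(n), tolerated_failures(n))
# 5 servers split 3/2: only the 3-node side keeps a leader.
print(partitions_with_quorum([3, 2]))  # [0]
# 4 servers split 2/2: neither side has a quorum, so there is no split brain, but no leader either.
print(partitions_with_quorum([2, 2]))  # []
```

Two disjoint groups cannot both contain a strict majority, so this rule prevents two simultaneous leaders; the cost is that an even split leaves the whole ensemble without a leader, which is one reason odd ensemble sizes are common.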
Sohu Tech Products
Jul 14, 2021 · Cloud Native

Limitations and Challenges of Kubernetes in Cluster Management and Application Scenarios

The article examines Kubernetes' widespread adoption, outlines its scalability and multi‑cluster management constraints, discusses practical application scenarios such as deployment models, batch scheduling, and hard multi‑tenancy, and highlights the gaps that still limit its use in large‑scale production environments.

Cloud Native, Cluster Management, Kubernetes
0 likes · 21 min read
JD Retail Technology
Jun 9, 2021 · Big Data

JD OLAP High‑Availability Practices: ClickHouse and Doris Deployment, Architecture, and Future Plans

This article details JD's OLAP implementation using ClickHouse as the primary engine and Doris as a secondary engine, covering business scenarios, selection criteria, multi‑tenant deployment, high‑availability architecture, encountered challenges, and future roadmap for cloud‑native, scalable analytics.

ClickHouse, Cloud Native, Cluster Management
0 likes · 17 min read
dbaplus Community
Apr 29, 2021 · Operations

How 58.com Scaled Elasticsearch: Cluster Optimization, Automation, and Real‑World Practices

This article details 58.com’s journey with Elasticsearch, covering the challenges of disparate deployments, common problems like disk exhaustion and write slowdown, the governance and automation platform they built, development standards, service architecture, real‑world application cases, and future plans for version upgrades and intelligent diagnostics.

Cluster Management, Elasticsearch, Index Lifecycle
0 likes · 19 min read