Tagged articles
1413 articles
Architects' Tech Alliance
May 16, 2026 · Industry Insights

Designing a 2026 Ultra‑Large Green AI Data Center: Full Infrastructure Blueprint

This article presents a comprehensive 2026 design plan for an ultra‑large green AI data center with 5,000 cabinets, 150 MW IT load, and 200 MW capacity, detailing market drivers, core metrics, six design principles, site and power architecture, liquid‑cooling, networking, security, and AI‑driven autonomous operations.

2026 design · AI data center · DCIM
0 likes · 5 min read
Ops Community
May 9, 2026 · Operations

Achieve Seamless Nginx High Availability with Keepalived: A Practical Guide

This article walks through building a simple, cost‑effective high‑availability solution for Nginx using Keepalived’s VRRP‑based VIP failover, covering environment setup, configuration of master and backup nodes, health‑check scripts, testing procedures, troubleshooting tips, and rollback steps.

Linux · NGINX · failover
0 likes · 29 min read
dbaplus Community
Apr 28, 2026 · Backend Development

Designing High‑Availability for Unreliable Third‑Party Services

When downstream APIs are unstable and slow, this article walks through building a dedicated defensive layer that provides a unified abstraction, client‑side governance (rate limiting, retries with idempotency checks), comprehensive observability, and mock‑based testing to keep your system highly available and interview‑ready.

Microservices · Mock Testing · Observability
0 likes · 22 min read
Java Backend Full-Stack
Apr 27, 2026 · Databases

Proven Redis Tuning Techniques for Production Environments

This article compiles practical, interview‑ready Redis tuning tips—from strict memory limits and eviction policies to avoiding big keys, hot keys, slow commands, and optimizing persistence, networking, and high‑availability settings—so you can confidently handle Redis performance questions in real‑world deployments.

Configuration · Memory Management · high availability
0 likes · 9 min read
Lobster Programming
Apr 15, 2026 · Databases

Choosing the Right Redis Architecture: From Single Node to Cluster

This article reviews the main Redis deployment options—including single‑node, master‑slave with Sentinel, sharding via consistent hashing, and Redis Cluster—explaining their advantages, high‑availability mechanisms, scalability limits, and recommending suitable scenarios for each architecture.

Cluster · Deployment · high availability
0 likes · 7 min read
Ops Community
Apr 9, 2026 · Operations

Mastering Nginx Reverse Proxy: From Basics to Advanced Load Balancing and High Availability

This comprehensive guide explains the fundamentals of reverse proxy, walks through Nginx configuration, load‑balancing algorithms, health‑check setups, caching strategies, session‑persistence methods, high‑availability designs, performance tuning, monitoring, and troubleshooting, providing practical code snippets for real‑world deployments.

NGINX · health check · high availability
0 likes · 30 min read
Ops Community
Mar 27, 2026 · Backend Development

Master Nginx Reverse Proxy on Ubuntu 24.04 & Rocky Linux 9.4 – From Installation to Monitoring

This comprehensive guide walks you through installing Nginx 1.27 on Ubuntu 24.04 LTS and Rocky Linux 9.4, configuring reverse proxy, load balancing, SSL/TLS, WebSocket and gRPC support, tuning kernel and Nginx parameters, setting up health checks, high‑availability with Keepalived, and monitoring with Prometheus and Grafana, all with ready‑to‑use code snippets and scripts.

NGINX · SSL · high availability
0 likes · 59 min read
Cognitive Technology Team
Mar 27, 2026 · Operations

How to Build a Rock‑Solid High‑Availability Architecture: Redundancy, Defense, and Smooth Deployments

This article breaks down high‑availability architecture into redundancy, defensive degradation, and release mechanisms, offering concrete techniques, real‑world failure case studies, and step‑by‑step configurations to ensure continuous service even under heavy load or component failures.

Kubernetes · ci/cd · circuit breaker
0 likes · 16 min read
Raymond Ops
Mar 19, 2026 · Operations

Zero‑Downtime HAProxy Load Balancing: Complete L4/L7 Deployment Guide

This guide walks through installing HAProxy 2.x, configuring L4 TCP and L7 HTTP/HTTPS load balancing for web, MySQL, and Redis, setting up health checks, session persistence, monitoring, high‑availability with Keepalived, performance tuning, security hardening, and step‑by‑step zero‑downtime deployment and rollback procedures.

HAProxy · Zero Downtime · health checks
0 likes · 36 min read
MaGe Linux Operations
Mar 12, 2026 · Backend Development

How to Deploy vLLM Inference Service on Kubernetes with Ingress and Service Load Balancing

This guide walks through deploying a production‑grade vLLM inference service on Kubernetes, covering GPU resource scheduling, Service and Ingress configuration, session affinity, health checks, performance tuning, scaling, monitoring, fault‑tolerance, and best‑practice recommendations for high‑availability AI workloads.

GPU · Ingress · Kubernetes
0 likes · 47 min read
ITPUB
Mar 3, 2026 · Databases

Why Is Installing Modern Databases Still So Painful?

Even in 2026, installing databases like Oracle remains a complex, error‑prone process, and this article explores the historical roots, recent AI‑assisted attempts, and four key reasons why database installation still challenges engineers.

AI · DevOps · Installation
0 likes · 8 min read
ITPUB
Feb 15, 2026 · Backend Development

Mastering Message Queues: From Flash‑Sale Basics to RabbitMQ Production

This guide walks through why a high‑traffic flash‑sale system needs a message queue, explains the three core benefits of async processing, decoupling and traffic‑shaping, and then details RabbitMQ installation, common work patterns, durability, idempotency, ordering, dead‑letter handling, high‑availability clustering and advanced features such as delayed and priority queues.

Backend Development · Message Queue · RabbitMQ
0 likes · 16 min read
ITPUB
Feb 5, 2026 · Databases

Master Oracle 19c RAC Architecture in 41 Diagrams – Quick Technical Guide

This article translates and consolidates Oracle Real Application Clusters 19c Technical Architecture, using 41 detailed diagrams to explain RAC concepts, configurations, cluster components, storage options, tools, and management commands for building and operating high‑availability Oracle databases.

ASM · Enterprise Manager · Grid Infrastructure
0 likes · 52 min read
ITPUB
Jan 31, 2026 · Databases

How OpenAI Scaled PostgreSQL to Support 800 Million Users and Millions of QPS

OpenAI’s engineering team expanded a single‑primary PostgreSQL cluster with nearly 50 read‑only replicas, migrated write‑heavy workloads to Azure Cosmos DB, and applied extensive optimizations to reliably serve the global traffic of ChatGPT and the OpenAI API for 800 million users at multi‑million queries per second.

Azure · PostgreSQL · Read Replicas
0 likes · 24 min read
MaGe Linux Operations
Jan 30, 2026 · Cloud Computing

Mastering Alibaba Cloud SLB: Build High‑Availability Load Balancing with Terraform

This guide walks through Alibaba Cloud SLB’s architecture, product variants, environment prerequisites, and step‑by‑step Terraform provisioning for CLB, ALB, and NLB, covering health checks, HTTPS setup, traffic routing, performance testing, best practices, security hardening, monitoring, and disaster‑recovery procedures.

Alibaba Cloud · SLB · Terraform
0 likes · 28 min read
Java Architect Handbook
Jan 28, 2026 · Databases

How to Prevent Redis Split‑Brain Disasters with min‑replicas‑to‑write

This article explains the Redis split‑brain problem that can occur in master‑replica clusters, outlines the points interviewers look for, and provides a detailed solution using the min‑replicas‑to‑write (formerly min‑slaves‑to‑write) configuration to sacrifice write availability for data consistency, along with best‑practice recommendations and common pitfalls.

Configuration · Distributed Systems · Split-Brain
0 likes · 12 min read
Architect Chen
Jan 26, 2026 · Databases

Mastering MySQL Master‑Slave Replication: Architecture, Threads, and Setup

This article explains MySQL master‑slave replication, covering its purpose for high availability and read‑write separation, typical one‑master‑multiple‑slaves architecture, the binlog‑based synchronization mechanism, and the roles of the master’s dump thread and the slave’s I/O and SQL threads.

Database Architecture · binary log · high availability
0 likes · 3 min read
Ray's Galactic Tech
Jan 25, 2026 · Operations

Why Redis High Availability Fails: Split‑Brain and Replication Storm Explained

The article examines the two most dangerous production failures in Redis high‑availability—split‑brain and replication storm—explaining their causes, real‑world impact, and practical engineering safeguards such as write‑protection parameters, network isolation, backlog sizing, and cascading replication.

Replication Storm · Split-Brain · Write Protection
0 likes · 7 min read
Ops Community
Jan 22, 2026 · Operations

Master HAProxy 3.0: From System Tuning to Advanced Load‑Balancing Practices

This comprehensive guide walks you through HAProxy 3.0’s new features, hardware and OS requirements, step‑by‑step installation, detailed global, frontend, backend configurations, health‑check optimization, monitoring with Prometheus, troubleshooting tips, backup strategies, and best‑practice recommendations for high‑performance load balancing in production environments.

HAProxy · Linux · high availability
0 likes · 29 min read
Ray's Galactic Tech
Jan 20, 2026 · Databases

Mastering Redis High Availability: Replication, Sentinel, and Cluster Deep Dive

This guide walks through Redis's evolution from single‑node replication to Sentinel and native Cluster, explaining each architecture's principles, configuration steps, advantages, drawbacks, performance trade‑offs, and practical deployment recommendations for building highly available and scalable caching systems.

Cluster · Replication · high availability
0 likes · 11 min read
java1234
Jan 10, 2026 · Backend Development

Designing a Highly Available Service Registry: Key Principles and Java Example

This article explains how to design a highly available service registry for microservice architectures, covering high‑availability mechanisms, performance optimizations, scalability strategies, core registry functions, and provides a complete Java Spring Boot implementation using Redis.

Java · Microservices · Spring Boot
0 likes · 6 min read
Raymond Ops
Jan 10, 2026 · Operations

Designing Enterprise‑Grade RabbitMQ HA: Architecture, Config, and Best Practices

This guide explains why high availability is critical for RabbitMQ in micro‑service environments, compares cluster modes, provides step‑by‑step commands for building a resilient three‑node cluster, and covers monitoring, failover, performance tuning, and common pitfalls to ensure reliable message delivery.

Cluster · RabbitMQ · high availability
0 likes · 12 min read
ITPUB
Jan 5, 2026 · Backend Development

How Apache Pulsar Solved Our Financial Messaging Challenges

Facing limited visibility, routing, and security in traditional MQ-based financial systems, a company evaluated its requirements (identity control, routing, auditing, low latency, scalability, ordering, and replay) and chose Apache Pulsar for its multi‑cluster support, compute‑storage separation, pluggable authentication, rich API, and Pulsar Functions. The article outlines the team's practical experience and solutions.

Apache Pulsar · Messaging · distributed architecture
0 likes · 15 min read
Cognitive Technology Team
Dec 30, 2025 · Backend Development

How to Prevent Message Queue Reordering: 4 Proven High‑Availability Solutions

This article examines why message queue ordering failures can corrupt data and cause outages, explains four root causes such as concurrent consumption and partitioning, and presents four production‑tested high‑availability patterns—including ordered messages, pre‑condition checks, state‑machine driving, and monitoring—to reliably mitigate MQ disorder.

Backend · high availability · ordering
0 likes · 9 min read
Ray's Galactic Tech
Dec 29, 2025 · Databases

Mastering PostgreSQL Backup & Replication: A Complete Enterprise Guide

This in‑depth enterprise guide explains why backup and replication are critical for PostgreSQL, compares physical backup, logical backup, and logical replication methods, provides step‑by‑step command examples, and outlines high‑availability architectures, automation scripts, disaster‑recovery procedures, monitoring queries, and common pitfalls to ensure robust data protection.

PostgreSQL · Replication · disaster recovery
0 likes · 8 min read
Xiao Liu Lab
Dec 26, 2025 · Operations

How to Achieve RabbitMQ High Availability with HAProxy: A Step‑by‑Step Guide

This tutorial explains why HAProxy is essential for RabbitMQ clusters, walks through installing HAProxy on Ubuntu, configuring load‑balancing and health‑check parameters, integrating with Java applications, and validating automatic failover to ensure high availability and efficient resource utilization.

HAProxy · Java · Linux
0 likes · 8 min read
Xiao Liu Lab
Dec 23, 2025 · Databases

Mastering Redis Master‑Slave Replication: Core Concepts, Workflow, and Configuration

This article explains how Redis master‑slave replication provides hot backup, read‑write separation, high availability, and horizontal scaling by detailing its three‑stage workflow, full and partial synchronization mechanisms, key configuration options, and practical analogies for clear understanding.

Replication · data synchronization · database
0 likes · 11 min read
Ray's Galactic Tech
Dec 23, 2025 · Operations

20 Essential Kubernetes Ops Tips to Keep Production Clusters Stable

This guide compiles twenty practical Kubernetes operations tips drawn from real‑world production experience, covering high availability, performance tuning, monitoring, automation, security, and advanced learning to help teams build and maintain reliable, resilient clusters.

Ops · Security · high availability
0 likes · 8 min read
Raymond Ops
Dec 23, 2025 · Databases

Master MySQL in Production: From Configuration Tuning to SQL Performance Optimization

This comprehensive guide walks you through a real‑world MySQL outage, then details step‑by‑step configuration tweaks, InnoDB parameter tuning, connection and thread settings, index design, query rewrites, monitoring scripts, backup strategies, high‑availability replication, and essential tooling to keep your database fast and reliable.

Database Configuration · high availability · monitoring
0 likes · 13 min read
Raymond Ops
Dec 22, 2025 · Operations

Build a High‑Availability Prometheus Monitoring System from Scratch: Pitfalls & Performance Tuning

This guide walks you through constructing a production‑grade, highly available Prometheus monitoring stack, covering architecture choices, sharding strategies, common pitfalls such as memory bloat, query latency and storage growth, and provides concrete tuning steps, Kubernetes deployment examples, and advanced optimisation techniques.

Alerting · Kubernetes · Prometheus
0 likes · 11 min read
Raymond Ops
Dec 17, 2025 · Operations

Build a Production‑Ready Prometheus HA Architecture with Federation and Remote Storage

Learn how to design and implement a robust, production‑grade Prometheus high‑availability solution using a federated global cluster, multiple business‑level instances, remote storage with Thanos or VictoriaMetrics, Docker‑Compose deployment, health‑check scripts, performance metrics, alerting rules, and best‑practice operational guidelines.

Docker Compose · Federation · Remote Storage
0 likes · 17 min read
Mike Chen's Internet Architecture
Dec 16, 2025 · Databases

Designing a Million‑QPS Database Architecture: Sharding, Caching, and High Availability

This article explains how to architect a database system that can sustain tens of millions of queries per second by combining sharding, read‑write separation, multi‑layer caching, traffic shaping, and robust high‑availability strategies to keep most requests off the database and ensure reliable data storage.

high availability · performance
0 likes · 5 min read
360 Zhihui Cloud Developer
Dec 15, 2025 · Databases

How TiDB Achieves Multi‑Datacenter High Availability with Multi‑Raft and TiCDC

This article explains TiDB's distributed, financial‑grade high‑availability architecture, covering single‑cluster same‑zone multi‑datacenter deployment, cross‑cluster DTS synchronization, underlying Raft and label mechanisms, configuration examples, performance trade‑offs, and real‑world monitoring results on the HULK cloud platform.

TiCDC · TiDB · distributed database
0 likes · 16 min read
Ray's Galactic Tech
Dec 12, 2025 · Cloud Native

Inside the Kubernetes Master: A Complete Breakdown of Core Components

Master nodes act as the brain of a Kubernetes cluster, hosting essential components such as kube‑apiserver, etcd, kube‑scheduler, kube‑controller‑manager and optionally cloud‑controller‑manager, each with distinct roles, high‑availability designs, security considerations, and operational workflows that together orchestrate and maintain cluster state.

Control Plane · Master Node · Scheduler
0 likes · 8 min read
Java Web Project
Dec 7, 2025 · Databases

What Makes TiDB a NewSQL Powerhouse? A Deep Dive into Architecture, Features, and Use Cases

This article analyzes TiDB as a distributed NewSQL database, explaining the evolution from traditional SQL to NoSQL and NewSQL, detailing TiDB's core components, elastic scaling, ACID transactions, HTAP capabilities, high‑availability design, compatibility with MySQL, real‑world use cases, and its limitations compared to conventional databases.

HTAP · MySQL compatibility · NewSQL
0 likes · 24 min read
Ctrip Technology
Dec 5, 2025 · Databases

How Ctrip’s DRC Enables High‑Performance Cross‑Region MySQL Replication

This article explains the design and implementation of Ctrip's Data Replication Center (DRC), a MySQL‑based high‑availability system that solves cross‑region data loop, progress tracking, concurrency, DDL handling, and conflict resolution to achieve low‑latency, reliable data replication for global travel services.

Distributed Systems · GTID · cross-region
0 likes · 21 min read
Architect's Journey
Dec 1, 2025 · Backend Development

Designing Three‑High Systems: Practical Performance Tuning and Fault‑Tolerant Architecture

The article breaks down the design logic and implementation steps for high‑performance, high‑concurrency, and high‑availability systems, covering bottleneck identification, read/write optimization, three‑dimensional scaling, and concrete fault‑tolerance strategies to build resilient, scalable services.

System Architecture · fault tolerance · high availability
0 likes · 15 min read
Old Meng AI Explorer
Nov 26, 2025 · Operations

How Alertmanager Turns Chaos into Calm: Mastering Alert Management for DevOps

Alertmanager, the official Prometheus alert manager, consolidates redundant alerts, supports silencing, inhibition, multi‑channel routing, and high‑availability clustering, enabling DevOps teams to quickly pinpoint critical issues, reduce noise, and streamline incident response across large server fleets with simple YAML configuration and command‑line tools.

Alert Management · Alertmanager · DevOps
0 likes · 10 min read
DevOps Coach
Nov 11, 2025 · Cloud Computing

Why the US‑East‑1 AWS Outage Happened and How to Guard Against It

On October 19‑20 a massive AWS failure in the US‑East‑1 region crippled a large portion of the internet, exposing how a faulty internal monitoring tool, DynamoDB’s lack of cross‑region replication, and unchecked retry storms can cascade into a widespread outage, and offering concrete operational lessons for cloud teams.

AWS · DynamoDB · Outage
0 likes · 7 min read
MaGe Linux Operations
Nov 9, 2025 · Backend Development

How to Stop Redis Cache Penetration, Breakdown, and Avalanche – Proven Solutions Inside

This comprehensive guide explains the causes of Redis cache penetration, breakdown, and avalanche, and provides production‑tested solutions such as Bloom filters, distributed locks, logical expiration, random TTL, cache pre‑warming, multi‑level caching, high‑availability deployment, monitoring, and backup strategies.

Spring Boot · bloom-filter · high availability
0 likes · 42 min read
Ops Community
Nov 9, 2025 · Operations

How to Achieve 99.99% Uptime with Keepalived Dual‑Node HA

This guide explains how to design a high‑availability architecture using Keepalived's VRRP‑based active‑passive failover, covering technical features, applicable scenarios, environment requirements, step‑by‑step installation and configuration for services like Nginx, MySQL and Redis, plus best practices, troubleshooting, monitoring and backup strategies.

NGINX · VRRP · high availability
0 likes · 46 min read
Ops Community
Nov 8, 2025 · Operations

Mastering Nginx Reverse Proxy & Load Balancing: Best Practices for High‑Performance Deployments

This comprehensive guide walks you through Nginx reverse proxy and load balancing fundamentals, key features, suitable scenarios, environment prerequisites, step‑by‑step installation, core configuration, performance tuning, security hardening, high‑availability designs, troubleshooting, monitoring, backup strategies, real‑world case studies, and advanced learning paths for production‑grade deployments.

Performance Optimization · Security · high availability
0 likes · 56 min read
MaGe Linux Operations
Nov 8, 2025 · Backend Development

Mastering Redis Cache: Prevent Penetration, Breakdown, and Avalanche with Proven Solutions

This comprehensive guide explains the three major Redis cache issues—penetration, breakdown, and avalanche—detailing their causes, impacts, and production‑ready solutions such as Bloom filters, distributed locks, logical expiration, random TTL, multi‑level caching, high‑availability setups, monitoring, backup, and best‑practice recommendations.

Performance Optimization · Spring Boot · bloom-filter
0 likes · 56 min read
MaGe Linux Operations
Nov 5, 2025 · Databases

Deploy Redis Sentinel for High Availability in 30 Minutes – Step‑by‑Step Guide

Learn how to set up Redis Sentinel for high‑availability caching, covering prerequisites, anti‑patterns, detailed configuration of master, replicas and Sentinel nodes, firewall rules, monitoring, failover testing, troubleshooting, performance tuning, backup, rollback and best practices—all achievable within a 30‑minute deployment.

Linux · Replication · failover
0 likes · 38 min read
Top Architect
Nov 3, 2025 · Operations

How to Build Nginx High Availability with Keepalived on Two VMs

This guide walks through installing Nginx on two CentOS 7 virtual machines, configuring keepalived for VRRP‑based high availability, creating a virtual IP, and demonstrating failover scenarios to ensure continuous web service availability in production environments.

Linux · NGINX · VRRP
0 likes · 10 min read
Linux Ops Smart Journey
Nov 3, 2025 · Cloud Native

How to Build a Production-Ready High-Availability Keycloak Cluster

Learn step‑by‑step how to design and deploy a production‑grade, high‑availability Keycloak cluster using external databases, distributed session management with Infinispan, HAProxy reverse proxy, TLS termination, and Docker‑Compose orchestration, ensuring scalability, fault tolerance, and secure identity management for cloud‑native applications.

Cloud Native · DevOps · Docker Compose
0 likes · 8 min read
MaGe Linux Operations
Nov 1, 2025 · Operations

Zero‑Downtime HAProxy Load Balancing: Full 4‑Layer & 7‑Layer Deployment Guide

This guide walks through installing HAProxy, configuring both layer‑4 TCP and layer‑7 HTTP/HTTPS load balancing with health checks, session persistence, advanced algorithms, high‑availability via Keepalived, monitoring with HAProxy stats and Prometheus, performance tuning, security hardening, and step‑by‑step rollback procedures for zero‑downtime deployments.

HAProxy · Ops · Zero Downtime
0 likes · 36 min read
DataFunSummit
Oct 29, 2025 · Big Data

How Huolala Scaled to 40PB: Inside Their Evolving Big Data Storage Architecture

Huolala, founded in 2013, runs a massive cross‑cloud hybrid big‑data storage platform of over 40 PB across 3,000+ machines, evolving through four online‑storage phases, robust HA design, performance‑cost optimizations, AI vector storage, and a cost‑governance system that saved more than half of its storage expenses.

AI vector storage · Big Data · Cost Optimization
0 likes · 18 min read
Senior Brother's Insights
Oct 27, 2025 · Databases

How Does MySQL Power High‑Performance OLTP Workloads?

This article explains what OLTP (Online Transaction Processing) is, outlines its key characteristics, and details how MySQL—through ACID‑compliant transactions, the InnoDB storage engine, various indexing strategies, fast locking mechanisms, query optimization, and high‑availability features—effectively supports high‑concurrency, low‑latency transactional workloads.

Database Transactions · InnoDB · OLTP
0 likes · 9 min read
Ops Community
Oct 23, 2025 · Operations

Zero‑Downtime Nginx Load Balancing: Build a 99.99% HA Architecture

This guide walks through designing and implementing a highly available Nginx load‑balancing solution—covering applicable scenarios, prerequisites, environment matrix, step‑by‑step configuration of Nginx, Keepalived, SSL termination, health checks, monitoring, performance tuning, security hardening, troubleshooting, and a concise list of best‑practice recommendations.

SSL · high availability · keepalived
0 likes · 29 min read
Ray's Galactic Tech
Oct 17, 2025 · Backend Development

Prevent Redis Cache Avalanche, Penetration & Breakdown: A Practical High‑Availability Guide

This guide explains the three major Redis cache failure patterns—avalanche, penetration, and breakdown—detailing their causes and offering concrete mitigation techniques such as staggered TTLs, empty‑object caching, Bloom filters, logical expiration, distributed locks, high‑availability clusters, and comprehensive monitoring to ensure robust high‑availability systems.

Cache · cache-avalanche · cache-breakdown
0 likes · 7 min read
dbaplus Community
Oct 16, 2025 · Backend Development

How to Build a Billion‑Scale Open Platform: Architecture, Caching, and Resilience

This article presents a step‑by‑step engineering guide for designing, evolving, and operating a high‑traffic open platform, covering three‑layer decoupled architecture, multi‑level caching, asynchronous message queues, distributed transaction models, high‑availability strategies, and phased rollout plans to sustain billions of daily API calls.

Distributed Systems · Open Platform · caching
0 likes · 20 min read
Su San Talks Tech
Oct 10, 2025 · Operations

How to Boost System Stability: Observability, Resilience, and High‑Availability Strategies

This comprehensive guide explains how to improve system stability and reduce online incidents by building observability, implementing distributed tracing, applying rate‑limiting and circuit‑breaker patterns, adopting blue‑green and gray deployments, managing data consistency with distributed transactions, planning capacity, optimizing performance, and preparing emergency response plans.

Deployment Strategies · Distributed Tracing · Distributed Transactions
0 likes · 19 min read
dbaplus Community
Oct 5, 2025 · Cloud Native

Binary Deployment vs kubeadm: Which Kubernetes Setup Fits Your Enterprise?

This article compares manual binary deployment and kubeadm‑based installation of Kubernetes, covering core architectural differences, high‑availability designs, upgrade procedures, security models, enterprise scenario‑driven selection criteria, practical implementation steps, and concluding recommendations for choosing the most suitable approach.

Enterprise · Kubernetes · Security
0 likes · 14 min read
Architecture Breakthrough
Sep 28, 2025 · Operations

How to Build an Organizational High‑Availability Mechanism for Banking IT Production Issues

This article outlines a comprehensive, step‑by‑step framework for establishing a high‑availability system in large‑scale banking IT, covering goal definition, logical architecture, service classification, key activity identification, capability upgrades, monitoring, emergency‑response asset creation, technical debt tracking, and periodic post‑mortem redesign.

Operations · Process Design · Technical Debt
0 likes · 10 min read
Ray's Galactic Tech
Sep 27, 2025 · Databases

Master PostgreSQL Streaming Replication: Step‑by‑Step Setup Guide

This comprehensive guide explains PostgreSQL streaming replication concepts, required environment, primary and standby configuration commands, verification queries, failover procedures, and production best‑practice recommendations, enabling you to build a reliable high‑availability database cluster.

Database Replication · PostgreSQL · Streaming Replication
0 likes · 7 min read
JD Tech
Sep 26, 2025 · Operations

Avoiding High‑Availability Pitfalls: Real‑World JD Lessons and Solutions

This article examines common high‑availability challenges across applications, databases, caches, message queues, containers, and GC, presenting real JD engineering cases, root‑cause analyses, and practical mitigation strategies to help engineers design more resilient systems.

Message Queue · database · fault tolerance
0 likes · 37 min read
Wukong Talks Architecture
Sep 24, 2025 · Databases

How Meiyou Scaled Overseas Messaging with TiDB Architecture

Meiyou, a leading women‑health platform, migrated its overseas messaging system and other core services from MySQL to TiDB, detailing the selection process, performance testing, deployment configurations, and the resulting gains in scalability, latency, high availability, and reduced operational costs.

TiDB · database migration · high availability
0 likes · 12 min read
Raymond Ops
Sep 22, 2025 · Databases

Master‑Slave, Sentinel, and Sharding: Complete Guide to Redis Cluster Architectures

This article explains Redis’s three clustering options—master‑slave replication, Sentinel high‑availability, and sharding—detailing their architectures, setup steps, synchronization mechanisms, advantages, drawbacks, and common interview questions, helping readers choose and implement the right solution for high‑performance, scalable data storage.

Cluster · Replication · high availability
0 likes · 18 min read
MaGe Linux Operations
Sep 22, 2025 · Databases

Redis Ops Survival Guide: From Data Loss Nightmares to Mastering High‑Availability

This comprehensive guide walks you through real‑world Redis failure stories, explains why Redis is a critical backbone for modern applications, and provides step‑by‑step high‑availability designs, troubleshooting mind maps, monitoring setups, security hardening, automation scripts, cloud‑native deployments, and future‑proofing tips for engineers.

high availability · performance tuning · redis
0 likes · 35 min read
Ops Community
Sep 19, 2025 · Operations

From Midnight Outage to Zero Downtime: Mastering NFS High‑Availability

This article recounts a critical NFS failure that caused massive loss, then walks through practical high‑availability designs—including Keepalived + DRBD, GlusterFS migration, and cloud‑native CSI storage—while sharing real‑world pitfalls, monitoring strategies, and forward‑looking recommendations for resilient file‑system operations.

Distributed File System · NFS · high availability
0 likes · 12 min read
Tech Freedom Circle
Sep 19, 2025 · Interview Experience

Designing a Rock‑Solid High‑Availability Solution for Unreliable Third‑Party Services

When third‑party services frequently fail, this article walks through a systematic high‑availability design—including an ACL anti‑corruption layer, strategy‑pattern master‑slave routing, precise rate limiting, circuit‑breaker fallback, full observability, async degradation, and mock testing—to keep calls to external dependencies stable even when the providers are not.

ACL · Mock Testing · Strategy Pattern
0 likes · 24 min read
Ops Community
Sep 17, 2025 · Operations

Mastering System Fault Tolerance: From Theory to Production‑Ready High‑Availability

This comprehensive guide explores the philosophy, core patterns, and practical techniques for designing fault‑tolerant, highly available systems, covering circuit breakers, retries, rate limiting, monitoring, cloud‑native deployment, and real‑world case studies to help engineers build resilient production architectures.

Cloud Native · circuit breaker · fault tolerance
0 likes · 24 min read
Raymond Ops
Sep 16, 2025 · Cloud Native

How to Build a Secure High‑Availability Etcd Cluster on Linux

This guide walks through installing etcd, configuring a three‑node high‑availability cluster with TLS certificates, setting up host files, disabling SELinux and firewalld, creating a Certificate Authority using cfssl, generating node certificates, distributing them, and finally deploying and verifying the cluster on Linux systems.

Certificate · Cloud Native · Linux
0 likes · 19 min read
Raymond Ops
Sep 13, 2025 · Operations

How to Build a High‑Availability RabbitMQ Cluster on CentOS with Docker

This guide walks through the full process of analyzing requirements, selecting self‑hosted servers, preparing CentOS nodes, installing Docker and Docker‑Compose, configuring RabbitMQ, and deploying a three‑node high‑availability RabbitMQ cluster with detailed commands and configuration files.

Cluster · Docker · Docker Compose
0 likes · 12 min read
Mike Chen's Internet Architecture
Sep 11, 2025 · Operations

Mastering Load Balancing: Single, Dual, and Multi‑Layer Architectures Explained

This article explains the fundamentals of load balancing, describing single‑layer, dual‑layer, and multi‑layer architectures, their advantages, disadvantages, and suitable scenarios, helping readers choose the right design based on traffic volume, availability, security, topology, budget, and operational capabilities.

Operations · Scalability · high availability
0 likes · 6 min read
Aikesheng Open Source Community
Sep 10, 2025 · Databases

Master SQL Server Operations: From Installation to High‑Availability

The Aikesheng open‑source community announces a giveaway of the technical book “SQL Server Operations Guide”, detailing its four‑part content on installation, performance tuning, security, multimodal data, and high‑availability architecture, authored by veteran DBA Lin Yonghua, and inviting beginners, developers, and educators to participate.

Book Giveaway · Database Administration · Performance Optimization
0 likes · 12 min read
MaGe Linux Operations
Sep 8, 2025 · Big Data

Build Enterprise‑Grade HDFS HA and Optimize YARN Scheduling from Scratch

This comprehensive guide walks you through constructing a fault‑tolerant HDFS high‑availability architecture, configuring dual NameNodes with ZooKeeper and JournalNode clusters, fine‑tuning YARN resource schedulers, implementing monitoring, automated failover testing, and performance optimization, all backed by real‑world production experiences and code examples.

Big Data Operations · HDFS · YARN
0 likes · 24 min read
MaGe Linux Operations
Sep 6, 2025 · Databases

How to Build a High‑Availability MySQL Master‑Slave Cluster and Automate Failover

This guide walks through the reasons for MySQL master‑slave replication, explains its core mechanisms, details step‑by‑step environment planning, configuration, data initialization, replication setup, monitoring, failover with MHA, read‑write splitting using ProxySQL, performance tuning, troubleshooting, and best‑practice recommendations for enterprise‑grade high availability.

Replication · failover · high availability
0 likes · 27 min read
Raymond Ops
Sep 5, 2025 · Databases

Why Redis Needs a Cluster: Step‑by‑Step Setup, Configuration & Best Practices

This guide explains the need for Redis clustering to achieve high availability, walks through Redis 3.0's decentralized cluster configuration, shows how to modify redis.conf, start multiple nodes, create the cluster, use hash slots, handle failures, and connect via Java Jedis, highlighting both advantages and limitations.

Cluster · Configuration · Java
0 likes · 13 min read
JD Tech Talk
Sep 4, 2025 · Operations

Avoid Common High‑Availability Pitfalls: Real‑World JD Practices and Solutions

This article analyzes the multi‑dimensional challenges of building high‑availability systems—covering applications, databases, caches, message queues, containers, GC, and more—by sharing real JD engineering scenarios, common failure patterns, and concrete mitigation strategies to help engineers design more resilient services.

Backend · Distributed Systems · fault tolerance
0 likes · 36 min read
JD Cloud Developers
Sep 4, 2025 · Operations

Mastering High‑Availability: JD Real‑World Pitfalls & Fixes for Apps, DBs, Cache & MQ

This article shares JD's practical high‑availability architecture lessons, detailing common pitfalls across applications, databases, caches, RPC frameworks, containers, data centers, GC, and message queues, and provides concrete troubleshooting steps and optimization techniques to help engineers design more resilient, fault‑tolerant systems.

Backend · System Design · fault tolerance
0 likes · 36 min read
JD Retail Technology
Sep 4, 2025 · Operations

Mastering High Availability: Real-World Pitfalls and Solutions from JD's Production Systems

This article walks through the challenges of building high‑availability systems—covering applications, databases, caches, message queues, containers, GC, and more—using JD’s production experiences to highlight common pitfalls, root‑cause analyses, and practical mitigation strategies for engineers seeking resilient architecture.

Cache · Distributed Systems · JDK
0 likes · 37 min read
Raymond Ops
Sep 1, 2025 · Operations

Mastering Keepalived: A Complete Guide to VRRP‑Based High Availability with LVS

This tutorial explains how Keepalived provides targeted high‑availability for LVS clusters by implementing VRRP, details its architecture, walks through installation, configuration of VRRP and virtual servers, shows health‑check scripts, and demonstrates testing of fail‑over and load‑balancing behavior.

IPVS · LVS · Linux
0 likes · 16 min read
360 Zhihui Cloud Developer
Aug 28, 2025 · Cloud Computing

How VPC Private DNS Powers Secure, Scalable Cloud Networks

VPC private DNS provides an isolated, internal name resolution service for cloud resources, enabling secure, efficient communication, private domain management, recursive queries, and seamless integration with public DNS, while offering advantages such as enhanced security, flexible architecture, simplified operations, high availability, and support for hybrid cloud scenarios.

Private DNS · VPC · cloud networking
0 likes · 12 min read
Xiaohongshu Tech REDtech
Aug 27, 2025 · Databases

How RedHub Revolutionizes Database Access for Billion‑User Scale

RedHub is a next‑generation database proxy built by Xiaohongshu that unifies fragmented access methods, leverages PolarDB‑X for distributed SQL execution, and delivers high‑performance, highly available, and easily observable database connectivity, enabling seamless migration and advanced features for massive‑scale workloads.

Database Proxy · Distributed SQL · Observability
0 likes · 15 min read
Ops Community
Aug 26, 2025 · Databases

5 Redis High‑Availability Architectures – Why Most Fail and the Hidden Solution

This article examines why single‑node Redis is a reliability nightmare, then rigorously evaluates five high‑availability architectures—including Sentinel, Redis Cluster, Codis, Redis Enterprise, and cloud‑native services—detailing their scenarios, pros, cons, performance metrics, deployment scripts, monitoring setups, and a decision‑making guide to help you choose the optimal solution.

Cluster · high availability · performance
0 likes · 14 min read
Tech Freedom Circle
Aug 24, 2025 · Operations

How a Misconfigured Nacos Cluster Cost $170 Million: A Deep P0 Incident Postmortem

A leading financial platform suffered a six‑hour outage and $170 million loss when its Nacos service‑registry cluster entered a split‑brain state due to network partition, exposing flaws in AP‑mode deployment, monitoring gaps, and cascading failures that were later resolved through Raft migration, multi‑active architecture, and client‑side resilience.

Distributed Systems · Microservices · Nacos
0 likes · 32 min read
Ops Community
Aug 21, 2025 · Databases

Redis Sentinel vs Cluster: Choosing the Right High‑Availability Architecture

This guide compares Redis Sentinel and Redis Cluster deployment modes, detailing architecture diagrams, performance benchmarks, configuration steps, operational trade‑offs, and selection criteria to help engineers decide the optimal high‑availability solution for their workloads.

Cluster · Deployment · high availability
0 likes · 13 min read
Ops Community
Aug 20, 2025 · Databases

How MySQL Master‑Slave Replication and Read‑Write Splitting Turn a Single Server into a High‑Availability Architecture

This article walks through why a single MySQL instance often fails under load, explains the fundamentals of asynchronous master‑slave replication and read‑write splitting, provides step‑by‑step configuration scripts, highlights common pitfalls with solutions, and shows advanced optimization and monitoring techniques for building a scalable, high‑availability MySQL architecture.

ProxySQL · high availability · mysql
0 likes · 16 min read
Alibaba Cloud Infrastructure
Aug 20, 2025 · Cloud Computing

How Alibaba Cloud Achieves Rock‑Solid IaaS Stability: Design Principles, Metrics, and Engineering Practices

This article explains Alibaba Cloud's comprehensive approach to IaaS stability, covering shared responsibility with customers, availability metrics, design principles, compute, storage, and network engineering practices that together deliver rock‑solid reliability for millions of workloads.

IaaS · System Design · high availability
0 likes · 56 min read