Tagged articles
1413 articles
Alibaba Cloud Native
Aug 19, 2025 · Artificial Intelligence

Boost Dify AI App Performance with Higress AI Gateway: A Full-Scale High‑Availability Guide

This guide explains why Dify’s system components and model services become performance bottlenecks at scale, and how integrating the Higress AI gateway can provide protocol standardization, observability, security, and stability features to achieve full‑stack high availability for AI applications.

AI gateway · Cloud Native · Dify
0 likes · 16 min read
MaGe Linux Operations
Aug 14, 2025 · Backend Development

Designing Enterprise‑Grade RabbitMQ High‑Availability: Architecture & Best Practices

This article explores why high availability is critical for RabbitMQ in micro‑service environments, presents a full HA architecture diagram, compares cluster modes, details mirror‑queue and quorum‑queue configurations, walks through production‑grade setup steps, performance tuning, monitoring, network‑partition handling, failover procedures, and shares practical lessons learned.

Cluster · RabbitMQ · high availability
0 likes · 14 min read
Raymond Ops
Aug 11, 2025 · Operations

Mastering Redis Sentinel: Automatic Failover and High Availability Explained

This article provides a comprehensive guide to Redis Sentinel, covering its purpose, architecture, monitoring functions, discovery mechanisms, failover process, leader election, configuration options, and practical commands for achieving reliable high‑availability in Redis deployments.

Operations · failover · high availability
0 likes · 17 min read
StarRocks
Aug 6, 2025 · Databases

How Qunar Migrated to StarRocks: Architecture, Performance Gains & Best Practices

This article details Qunar's transition to StarRocks as a unified OLAP engine, covering the business background, engine evaluation, architecture redesign, observability, high‑availability strategies, query‑performance optimizations, real‑world application cases, community contributions, and future plans.

Data Platform · OLAP · Observability
0 likes · 21 min read
Tech Freedom Circle
Aug 4, 2025 · Operations

How Do Projects Achieve High Availability Without Multi‑Site Active‑Active? – A Meituan Interview Question

The article analyzes high‑availability concepts, from single‑machine risks to multi‑site active‑active architectures, compares cold and hot backup strategies, discusses network latency challenges, and presents Ele.me’s cell‑based, sharding‑driven multi‑region solution with concrete examples, tables, and code snippets.

cell-based architecture · data replication · disaster recovery
0 likes · 28 min read
MaGe Linux Operations
Aug 3, 2025 · Operations

Avoid 3 Hidden Nginx+Keepalived HA Pitfalls That 90% of Ops Encounter

This article reveals three hard‑to‑detect pitfalls in Nginx + Keepalived high‑availability setups—split‑brain caused by network partitions, inadequate health‑check scripts, and unsafe configuration‑sync timing—provides real‑world incident examples, and offers complete, battle‑tested solutions with ready‑to‑use scripts.

Configuration Sync · NGINX · Split-Brain
0 likes · 16 min read
360 Zhihui Cloud Developer
Jul 30, 2025 · Databases

Seamless Multi-DataCenter Database Migration: Strategies and Domain Scheduling

Learn how to execute a zero‑downtime, risk‑controlled database migration across data centers using pre‑expansion, a cross‑data‑center master switchover, intelligent domain scheduling, and step‑by‑step operational guides (covering VIP handling, global vs. zone‑specific domains, and post‑migration validation) to ensure continuous service and resource elasticity.

Domain Scheduling · Zero Downtime · database migration
0 likes · 13 min read
MaGe Linux Operations
Jul 27, 2025 · Databases

Master MySQL Performance Tuning & Troubleshooting on Linux: A Complete Guide

This comprehensive guide walks you through why MySQL performance matters, how to benchmark and establish baselines, apply Linux system and MySQL configuration optimizations, fine‑tune SQL queries, diagnose common failures, set up robust monitoring, and implement high‑availability architectures for production environments.

Database Optimization · high availability · mysql
0 likes · 18 min read
MaGe Linux Operations
Jul 26, 2025 · Operations

How to Build a High‑Availability Prometheus Monitoring System: Pitfalls & Performance Tuning

This article walks you through building a production‑grade, highly available Prometheus monitoring system, covering architecture design with federation and sharding, common pitfalls such as memory bloat, query latency and storage growth, plus practical tuning, deployment, alerting and advanced optimization techniques.

Kubernetes · high availability · performance tuning
0 likes · 10 min read
Architect's Guide
Jul 21, 2025 · Operations

How to Achieve Five Nines: Practical High‑Availability Strategies for Modern Web Systems

This article explains key high‑availability concepts such as availability metrics, microservice modularization, load balancing, rate limiting, circuit breaking, isolation, retry strategies, rollback plans, stress testing, monitoring, and on‑call processes, providing concrete design guidelines for building resilient internet services.

Circuit Breaking · Microservices · high availability
0 likes · 12 min read
Su San Talks Tech
Jul 19, 2025 · Operations

Mastering Load Balancing: Architecture, Algorithms, and Real-World Pitfalls

This article explores the four‑layer load‑balancing architecture, five common algorithms (Round Robin, Weighted Round Robin, Least Connections, Consistent Hashing, and AI‑driven adaptive load balancing), high‑availability design, hard‑to‑spot pitfalls, and a self‑built load balancer implementation, providing practical code examples and best‑practice guidelines.

Backend Architecture · Operations · distributed algorithms
0 likes · 10 min read
macrozheng
Jul 12, 2025 · Databases

NewSQL vs Middleware Sharding: Which Architecture Truly Wins?

This article objectively compares middleware‑based sharding with NewSQL distributed databases, examining their architectures, transaction support, CAP implications, high‑availability, scaling, storage engines, and ecosystem maturity to help readers decide which solution best fits their workload.

CAP theorem · Distributed Transactions · NewSQL
0 likes · 19 min read
Raymond Ops
Jul 11, 2025 · Operations

Mastering Keepalived: Complete Guide to High‑Availability Load Balancing

This tutorial explains Keepalived’s VRRP‑based failover, IPVS rule generation, health‑checking, script integration, installation methods, detailed configuration files, notification handling, logging, split‑brain prevention, and VRRP scripting for building robust high‑availability clusters on Linux.

IPVS · VRRP · high availability
0 likes · 26 min read

Demystifying Consistency Models: From Linear to Eventual in Distributed Systems

This article explores consistency in distributed systems, breaking down the main consistency models (linearizable, sequential, causal, and eventual), explaining their definitions and practical implications, and showing how they guide the design of high‑availability architectures and data replication strategies.

Consistency · Distributed Systems · consistency models
0 likes · 13 min read
MaGe Linux Operations
Jul 6, 2025 · Operations

Master Kafka Production: High‑Availability Cluster Deployment & Ops Best Practices

This comprehensive guide walks operations engineers through designing, deploying, and managing a high‑availability Kafka production cluster, covering automated ZooKeeper and Kafka installation scripts, performance tuning for producers and consumers, monitoring with Prometheus and Grafana, and automated health checks and recovery procedures.

high availability · production deployment
0 likes · 28 min read
Lin is Dream
Jul 4, 2025 · Databases

Master Redis High Availability: Complete Guide to Sentinel and Cluster Deployment

This article explains why single‑node Redis can become a single point of failure, compares Redis Sentinel and Redis Cluster deployment options, provides step‑by‑step Docker deployment scripts, details Sentinel’s inner workings, demonstrates failover verification, and shares best‑practice recommendations for production environments.

Cluster · Deployment · high availability
0 likes · 31 min read
Efficient Ops
Jun 21, 2025 · Operations

What a Lychee Delivery Tale Teaches About DevOps and Operations

Through a vivid analogy of transporting lychees to ancient Chang’an, the article illustrates how operations teams must negotiate SLAs, automate monitoring, design high‑availability pipelines, document responsibilities, and avoid the endless cycle of blame, offering practical DevOps strategies for managing zero‑budget, zero‑resource projects.

DevOps · Operations Management · SLA
0 likes · 5 min read
Mike Chen's Internet Architecture
Jun 10, 2025 · Operations

Mastering Load Balancing: From Single‑Layer to Billion‑Scale Architectures

This article explains the essential role of load balancing in modern distributed systems and walks through single‑layer, double‑layer, and billion‑scale architectures, highlighting their design principles, benefits, trade‑offs, and typical deployment scenarios for high‑availability and high‑performance applications.

LVS · NGINX · Scalability
0 likes · 6 min read
Liangxu Linux
Jun 5, 2025 · Databases

Choosing the Right MySQL HA Solution: MHA, Percona XtraDB Cluster, and Galera

An in‑depth comparison of three popular MySQL high‑availability architectures—MHA, Percona XtraDB Cluster (PXC), and Galera Cluster—covers their principles, architectures, strengths, limitations, deployment scenarios, and best‑practice recommendations to help you select the optimal solution for your production environment.

Database Replication · Galera · MHA
0 likes · 10 min read
Raymond Ops
Jun 4, 2025 · Operations

Mastering SFTP: Complete Planning, Configuration, and High‑Availability Guide

This guide walks you through SFTP server planning, user naming conventions, directory structures, SSH configuration, account creation, permission setup, client usage, log auditing, rotation, connection limits, monitoring, and high‑availability deployment across multiple servers, providing ready‑to‑run commands and scripts.

ACL · Linux · SFTP
0 likes · 14 min read
Instant Consumer Technology Team
Jun 4, 2025 · Databases

Achieving High Availability for MySQL & Redis on MaShang Cloud with Distributed Sentinel

This article explains MaShang Cloud's RDS high‑availability design, detailing the distributed sentinel monitoring system, proxy layer, multi‑AZ disaster‑recovery strategies, and real‑world case studies that demonstrate how MySQL and Redis services maintain continuous, consistent access with minimal RTO and RPO.

Database Proxy · Distributed Sentinel · RDS
0 likes · 16 min read
MaGe Linux Operations
Jun 2, 2025 · Operations

How to Deploy a High‑Availability MinIO Distributed Cluster on Rocky 9

This guide walks you through deploying a highly available MinIO distributed object storage cluster on Rocky 9, covering prerequisites, environment preparation, user and directory setup, configuration files, systemd service creation, testing, Nginx load balancing, and verification of cluster health.

Minio · distributed storage · high availability
0 likes · 20 min read
Amap Tech
May 27, 2025 · Databases

OceanBase Unitization: Building the Next Generation of Online Map Applications

This paper presents the design, implementation, and experimental evaluation of OceanBase's unitization architecture for large‑scale online map services, demonstrating superior disaster‑recovery, high‑throughput OLTP/OLAP performance, and storage efficiency compared with competing distributed databases.

OceanBase · Online Maps · Performance Evaluation
0 likes · 24 min read
Xiaokun's Architecture Exploration Notes
May 25, 2025 · Fundamentals

How Consensus, CAP, and BASE Shape High‑Availability Architecture

This article explains the role of consensus algorithms in achieving high‑availability through redundancy and automatic failover, clarifies distributed consistency, explores the CAP theorem and its C component, and introduces the BASE theory as a practical complement for eventual consistency in modern distributed systems.

BASE theory · CAP theorem · Consensus
0 likes · 10 min read
IT Xianyu
May 20, 2025 · Operations

Building a Three‑Server High‑Availability MySQL Cluster with HAProxy on AlmaLinux

This guide explains why three servers are needed for high availability, walks through hardware and software preparation, network configuration, MySQL master‑slave replication setup, HAProxy load‑balancing, and firewall/SELinux adjustments, providing complete command‑line examples for each step.

AlmaLinux · HAProxy · Linux operations
0 likes · 8 min read
Tencent Technical Engineering
May 19, 2025 · Cloud Native

How Tencent’s TGW Delivers 3× Faster Throughput and Near‑Zero Downtime at Scale

The USENIX‑selected paper on Tencent’s TGW cloud gateway reveals how a modular, multi‑layer architecture achieves up to 2.9‑fold throughput gains, seconds‑level elastic scaling, lossless hot migration, and sub‑second fault recovery, offering a blueprint for resilient large‑scale cloud networking.

Cloud Gateway · State Migration · Tencent
0 likes · 16 min read
Architect
Apr 30, 2025 · Databases

Redis Core Architecture, Data Types, Persistence, High Availability, and Performance Optimization

This comprehensive guide explains Redis's core architecture, the underlying implementation of its various data types, persistence mechanisms (RDB and AOF), high‑availability solutions such as replication, Sentinel and Cluster, as well as performance‑monitoring techniques and common optimization strategies.

Data Structures · Persistence · high availability
0 likes · 48 min read
Java Captain
Apr 17, 2025 · Databases

Choosing Between Sharding Middleware and NewSQL Distributed Databases: An Objective Comparison

This article objectively compares middleware‑based sharding with NewSQL distributed databases, examining their architectural differences, transaction models, high‑availability mechanisms, scaling, SQL support, storage engines, and maturity to help practitioners decide which approach best fits their workload and operational constraints.

Database Architecture · Distributed Transactions · NewSQL
0 likes · 17 min read
Cognitive Technology Team
Apr 13, 2025 · Backend Development

Understanding RocketMQ Master‑Slave Architecture and High‑Availability Mechanisms

This article explains how RocketMQ achieves high availability and data reliability through its master‑slave broker design, covering synchronous and asynchronous replication, flush strategies, transaction messaging, automatic failover with DLedger, and read‑write separation for load balancing in distributed systems.

Distributed Systems · Master‑Slave · RocketMQ
0 likes · 7 min read
FunTester
Apr 12, 2025 · Operations

How to Design Effective Fault‑Testing Cases for Resilient Distributed Systems

This article explains why fault testing is essential for modern distributed and cloud environments, outlines core goals, design principles, common fault categories, practical implementation strategies such as chaos engineering and gray releases, and shows how to analyze results to continuously improve system reliability.

Distributed Systems · chaos engineering · fault testing
0 likes · 18 min read
Liangxu Linux
Apr 8, 2025 · Databases

How to Build a High‑Availability Redis Cluster Without Centralized Configuration

This guide explains why Redis clustering is needed for capacity, concurrency and failover, describes Redis 3.0's decentralized cluster architecture, provides step‑by‑step commands to configure, launch and combine six nodes into a cluster, demonstrates slot calculations, client usage with Jedis, and outlines fault recovery, pros and cons, and cleanup procedures.

Cluster · DevOps · Jedis
0 likes · 24 min read
Java Backend Full-Stack
Apr 8, 2025 · Backend Development

Interview Question: Designing a Service Registry

The article walks through the need for a service registry in a micro‑service scenario, explains how services register and discover each other, discusses high‑availability deployment, and compares push, pull, and long‑polling mechanisms for dynamic detection of service instances.

Microservices · high availability · long polling
0 likes · 10 min read
Ma Wei Says
Apr 8, 2025 · Operations

Mastering High Availability: 4 Failover Patterns Explained

Understanding high‑availability architectures involves mastering replication and fail‑over, balancing RTO and RPO, and choosing among four patterns—Active‑Standby, Active‑Active, Cold Standby, and Hot Standby—each with distinct synchronization, load‑balancing, and cost considerations for reliable system design.

Active-Active · Replication · active standby
0 likes · 9 min read
Alibaba Cloud Native
Apr 6, 2025 · Cloud Native

How ZEEK’s Cloud‑Native Architecture Boosted App Stability and Agility

This article details ZEEK's cloud‑native transformation, covering the strategic shift to open‑source standards, unified microservice architecture, high‑availability practices, upgraded traffic gateways, visual data analysis, car‑network data collection, and AI‑assisted development, illustrating how these steps enhanced system stability, scalability, and development efficiency.

AI · Cloud Native · Microservices
0 likes · 22 min read
The Dominant Programmer
Mar 22, 2025 · Databases

Master Redis Interview Questions: From Basics to Advanced, Ace Your Interview

This article compiles the most frequently asked Redis interview questions, covering fundamentals, data structures, persistence mechanisms, high‑availability features, clustering, performance tuning, and troubleshooting, providing clear explanations and practical guidance to help candidates confidently tackle any Redis interview.

Data Structures · Performance Optimization · Persistence
0 likes · 8 min read
Amap Tech
Mar 21, 2025 · Mobile Development

Gaode Map Terminal Architecture: Achieving Ultra‑Stable, High‑Performance, and Efficient Mobile Mapping

Gaode Map’s new integrated container architecture, combined with on‑demand loading, package slimming, and multi‑system/device/language support, delivers ultra‑stable, high‑availability navigation with second‑level startup, halved binary size and traffic, enabling efficient, cross‑platform mobile mapping for diverse hardware.

Container Architecture · Mobile Development · Performance Optimization
0 likes · 12 min read
Java Architect Essentials
Mar 14, 2025 · Databases

Comparing NewSQL Databases with Middleware‑Based Sharding: Advantages, Trade‑offs, and Selection Guidance

This article objectively compares NewSQL distributed databases with traditional middleware‑based sharding solutions, examining their architectures, distributed transaction handling, high‑availability, scaling, storage engines, and ecosystem maturity, and provides guidance on selecting the appropriate approach based on consistency, growth, operational capacity, and performance requirements.

Database Architecture · Distributed Transactions · NewSQL
0 likes · 19 min read
FunTester
Mar 14, 2025 · Operations

Fault Testing: Enhancing System Resilience through Controlled Failure Simulations

The article explains how fault testing—by deliberately injecting failures in a controlled environment—helps identify system weaknesses, validates post‑mortem improvements, and drives architectural optimization, thereby increasing high‑availability and resilience of modern internet services.

Operations · chaos engineering · fault testing
0 likes · 8 min read
Top Architect
Mar 13, 2025 · Databases

Choosing Between NewSQL Databases and Middleware‑Based Sharding: Advantages, Trade‑offs and Practical Guidance

The article objectively compares NewSQL distributed databases with middleware‑plus‑sharding solutions, covering architectural differences, distributed transaction handling, high‑availability, scaling, SQL support, storage engines, maturity, and provides a decision‑making checklist to help engineers select the most suitable approach for their workloads.

NewSQL · Scalability · distributed databases
0 likes · 23 min read
MaGe Linux Operations
Mar 13, 2025 · Operations

How to Build a Secure High‑Availability Etcd Cluster on Linux

This guide walks through installing etcd, generating TLS certificates with cfssl, configuring static, dynamic, or DNS‑based discovery, setting up systemd service files for three nodes, and verifying cluster health using etcdctl, providing a complete step‑by‑step deployment for a production‑grade, cloud‑native key‑value store.

TLS · etcd · high availability
0 likes · 19 min read
Java Web Project
Mar 6, 2025 · Databases

NewSQL vs Middleware Sharding: Which Architecture Truly Wins?

This article objectively compares NewSQL databases with middleware‑based sharding, dissecting their core architectures, distributed transaction handling, high‑availability designs, scaling mechanisms, SQL support, storage engines, and maturity to help engineers decide the most suitable solution for their workloads.

CAP theorem · Database Architecture · Distributed Transactions
0 likes · 20 min read
Code Ape Tech Column
Mar 5, 2025 · Backend Development

Design and Evolution of an Enterprise Unified Push Service

The article describes the evolution from modular push modules to a framework‑based and finally a service‑oriented unified push platform, detailing its architecture, functional and non‑functional requirements, component responsibilities, and deployment considerations for high‑performance, scalable enterprise notification systems.

Microservices · high availability · push notifications
0 likes · 14 min read
Architecture Digest
Mar 3, 2025 · Databases

NewSQL vs Middleware Sharding: A Comparative Analysis of Distributed Databases

This article objectively compares NewSQL distributed databases with traditional middleware‑based sharding solutions, examining their architectures, distributed transaction support, high availability, scaling, SQL capabilities, and maturity to help readers decide which approach best fits their workload and operational constraints.

NewSQL · distributed databases · high availability
0 likes · 18 min read
Cognitive Technology Team
Feb 28, 2025 · Artificial Intelligence

Design and High‑Availability Architecture of Alibaba LangEngine AI Application Framework

This article introduces Alibaba's LangEngine, a pure Java AI application framework, detailing its high‑availability gateway architecture, communication protocols, streaming and non‑streaming output, multi‑level metadata caching, asynchronous and serverless designs, and future open‑source roadmap, offering practical guidance for building robust AI services.

AI Framework · LLM · LangEngine
0 likes · 11 min read
Sanyou's Java Diary
Feb 20, 2025 · Databases

How Redis Sentinel Ensures Automatic Failover and High Availability

Redis Sentinel provides a robust high‑availability solution by monitoring master‑slave clusters, automatically detecting failures, electing leaders, and performing failover, while using quorum voting, Pub/Sub communication, and configuration provisioning to ensure seamless master promotion and client redirection without manual intervention.

database · failover · high availability
0 likes · 16 min read
MaGe Linux Operations
Jan 27, 2025 · Operations

Redis Sentinel Deep Dive: High‑Availability Architecture & Automatic Failover

This article explains Redis Sentinel’s role as the official high‑availability solution, detailing its monitoring, notification, automatic failover mechanisms, discovery processes, connection types, down‑state classifications, failover steps, leader election, master selection rules, and data consistency guarantees.

Operations · failover · high availability
0 likes · 18 min read
Architect
Jan 26, 2025 · Databases

Optimizing Redis Cluster Slot Migration to Reduce Latency and Improve High Availability

This article analyzes the latency and availability problems of native Redis cluster slot migration, proposes a master‑slave synchronization based redesign that batches slot transfers, reduces ASK/MOVED redirection and topology‑change overhead, and validates the solution with performance tests showing smoother latency and higher reliability.

Cluster · Database Optimization · Slot Migration
0 likes · 16 min read
Architect
Jan 23, 2025 · Operations

Designing High‑Availability Systems: Architecture, Capacity Planning, and Fault‑Tolerance Guide

This article presents a comprehensive guide to building high‑availability systems, covering availability metrics, fault prevention, detection and recovery, capacity evaluation, layered architecture design, service tiering, resilience mechanisms, and operational best practices for reliable service delivery.

Operations · System Architecture · capacity planning
0 likes · 34 min read
dbaplus Community
Jan 14, 2025 · Backend Development

Mastering High‑Performance, High‑Concurrency, High‑Availability Backend Systems

This article shares a backend engineer's practical methodology for building systems that simultaneously achieve high performance, high concurrency, and high availability, covering performance optimization, scaling strategies, fault‑tolerance techniques, and real‑world case studies from business‑facing (B‑side) and consumer‑facing (C‑side) logistics platforms.

DDD · System Design · caching
0 likes · 27 min read
Raymond Ops
Jan 11, 2025 · Operations

How to Build a Highly Available Load Balancer with LVS and Keepalived

This tutorial explains how to design and deploy a high‑availability web cluster using Linux Virtual Server (LVS) and Keepalived, covering terminology, test environment setup, detailed configuration steps, HA testing procedures, and a concise summary of the solution.

LVS · Linux · high availability
0 likes · 11 min read
IT Architects Alliance
Jan 9, 2025 · Operations

Load Balancing Strategies for High Availability in Distributed Systems

This article explores the challenges and opportunities of distributed architectures and explains how various static and dynamic load‑balancing strategies, hardware and software balancers, redundancy, health checks, and failover mechanisms together ensure high availability, illustrated with real‑world e‑commerce and live‑streaming case studies and future trends.

Operations · System Architecture · high availability
0 likes · 20 min read
JD Tech
Jan 9, 2025 · Databases

Challenges and Practices of Distributed Data Systems: Master‑Slave Replication, Partitioning, and High‑Availability Strategies

This article examines the core challenges of distributed data systems—including consistency, availability, and partition tolerance—then details master‑slave replication mechanisms for MySQL and Redis, various replication modes and binlog formats, and explores data partitioning, sharding, and hot‑key mitigation techniques for scalable, high‑availability deployments.

Replication · databases · high availability
0 likes · 23 min read
IT Architects Alliance
Jan 7, 2025 · Industry Insights

Why Multi-Active Architecture Matters and How to Build It

The article explains why multi‑active (active‑active) architecture is essential for modern enterprises, outlines its evolution from single‑server setups, details core principles like redundancy and data synchronization, compares common deployment patterns, examines industry use cases, and discusses challenges and mitigation strategies.

Data Consistency · Distributed Systems · cloud computing
0 likes · 21 min read
Tencent Cloud Developer
Jan 7, 2025 · Operations

Designing High‑Availability Systems: Principles, Architecture, and Operations

This comprehensive guide explains how to design, build, and operate high‑availability systems by covering availability metrics, fault‑tolerance strategies, capacity planning, code and data layer architecture, automated testing, monitoring, and clear role responsibilities to ensure services stay reliable and resilient under load.

Cloud Native · SRE · System Design
0 likes · 32 min read
dbaplus Community
Jan 1, 2025 · Backend Development

Mastering Multi-Active Data Architecture: Reducing Write Latency and Ensuring High Availability

This article examines the challenges of building multi‑active distributed systems, focusing on the data layer’s role in high availability, write‑latency, sharding, isolation, replication strategies, and routing decisions, and provides concrete architectural patterns and practical guidelines for robust backend design.

Distributed Systems · Latency · data replication
0 likes · 23 min read
IT Architects Alliance
Dec 29, 2024 · Operations

Design Principles and Key Technologies for High‑Availability Systems

The article explains why 24/7 high‑availability systems are essential for modern enterprises and details core design principles, layered architecture, and critical technologies such as redundancy, load balancing, caching, elastic scaling, monitoring, and fault‑tolerance to ensure continuous, reliable service.

System Design · cloud computing · high availability
0 likes · 23 min read
Alibaba Cloud Infrastructure
Dec 25, 2024 · Cloud Native

Ensuring Stability of Large‑Scale Kubernetes Clusters: Lessons from the OpenAI Incident and Alibaba Cloud Practices

This article analyses the OpenAI large‑scale Kubernetes outage, explains the inherent risks of massive K8s clusters, and presents Alibaba Cloud's architectural enhancements, observability improvements, and best‑practice guidelines for achieving high availability and reliable operation of Kubernetes clusters with thousands of nodes.

Cloud Native · Kubernetes · Large-Scale Clusters
0 likes · 21 min read
IT Architects Alliance
Dec 24, 2024 · Cloud Native

Unlock Scalable, Highly Available IT Architecture: Key Strategies Explained

This article examines the modern challenges of IT architecture and presents proven techniques—microservices, container orchestration, distributed caching, redundancy, load balancing, and automated fault recovery—illustrated with Amazon and Google case studies, while forecasting future AI and cloud‑native trends.

Cloud Native · Microservices · Scalability
0 likes · 10 min read
Efficient Ops
Nov 19, 2024 · Operations

Mastering System Stability: Proven SRE Practices for Reliable, High‑Availability Services

This article explains how system stability depends on architecture and code details, defines SLA and the “nines” metric, outlines Google’s SRE hierarchy, and provides practical governance steps—including development and release processes, high‑availability design, capacity planning, monitoring, incident response, and team culture—to achieve reliable, high‑availability services.

SRE · capacity planning · high availability
0 likes · 34 min read
Bilibili Tech
Nov 19, 2024 · Operations

Building a Lightweight Disaster‑Recovery Drill System at Bilibili: Architecture, Practices, and Lessons

Bilibili’s infrastructure team created a lightweight, multi‑layered disaster‑recovery drill platform—combining an atomic fault library, scenario catalogs, chaos‑experiment orchestration, real‑time observation, and a product‑level interface—backed by standardized governance and CI‑integrated automation, cutting drill preparation from weeks to days and boosting weekly resilience testing across the organization.

disaster recovery · high availability · site reliability
0 likes · 39 min read
Cognitive Technology Team
Nov 15, 2024 · Operations

Building Redundancy in Applications to Avoid Single Points of Failure

The article explains how to design resilient applications by identifying critical paths, adding redundant components, using formulas for overall availability, and applying best‑practice recommendations such as multi‑zone/region deployment, load‑balanced VMs, database replication, and thorough testing of failover mechanisms.

cloud architecture · high availability · load balancing
0 likes · 6 min read
58 Tech
Nov 8, 2024 · Operations

Design and Optimization of an App Operation Platform: Ensuring High Availability, Performance, and Scalability

This article details the architecture, challenges, and optimization techniques of an app operation platform, covering its dual-engine design, caching strategies, and high‑availability principles that reduce response time to under 4 ms while supporting massive concurrent traffic.

App Operations · Distributed Systems · Performance Optimization
0 likes · 7 min read
Tencent Cloud Middleware
Oct 22, 2024 · Operations

Scaling Apache Pulsar on Tencent Cloud: Multi‑Network Access, Cluster Migration & HA Tips

This article details Tencent Cloud engineers' technical solutions for large‑scale Apache Pulsar deployments, covering multi‑network access challenges, a routing‑addressing redesign, product deployment models, a four‑step cluster migration process with subscription‑progress compensation, and high‑availability best practices such as rack‑aware and cross‑AZ replica distribution.

Apache Pulsar · Cluster Migration · Message Queue
0 likes · 11 min read
Architect
Oct 17, 2024 · Operations

Designing Multi‑Active Distributed Systems: Key Factors and Replication Strategies

This article analyzes the architectural challenges of building large‑scale distributed systems with multi‑active (cross‑city) capabilities, focusing on data‑layer design, write latency, replication models, sharding techniques, and routing impacts to guide reliable, high‑performance infrastructure decisions.

Distributed Systems · architecture · data replication
0 likes · 22 min read
ITPUB
Oct 15, 2024 · Databases

Choosing the Right Database High‑Availability Architecture: Lessons from GBase 8s

The article explores the evolution of database high‑availability architectures, compares mainstream solutions like Oracle's HA, RAC and ADG, examines domestic offerings such as GBase 8s with HAC, RHAC and SSC clusters, and provides practical guidance for selecting cost‑effective HA designs to ensure continuous business operations.

Enterprise · GBase · HA Architecture
0 likes · 14 min read
Tencent Cloud Developer
Oct 15, 2024 · Industry Insights

Why Write Latency Drives Multi‑Active Distributed Architecture Design

This article analyzes how write latency, write volume, isolation, and data replication strategies influence the design of multi‑active distributed systems, offering practical guidance on sharding, synchronous and asynchronous replication, routing, and architecture selection for high availability and performance across regions.

Distributed Systems · data replication · high availability
0 likes · 23 min read
IT Services Circle
Oct 4, 2024 · Databases

Understanding Redis Split‑Brain: Causes, Data Loss, and Prevention Strategies

This article explains Redis split‑brain behavior, describing its definition, causes such as network failures and Sentinel elections, the resulting data loss during master‑slave switches, and practical prevention measures including quorum configuration, timeout tuning, network monitoring, proxy layers, and the min-slaves-to-write and min-slaves-max-lag settings.

Master‑Slave · Split-Brain · database
0 likes · 7 min read
Alibaba Cloud Infrastructure
Sep 30, 2024 · Cloud Native

Best Practices for High Availability and Stability in Alibaba Cloud Container Service for Kubernetes (ACK)

This article presents a comprehensive overview of high‑availability design patterns and best‑practice recommendations for Alibaba Cloud Container Service for Kubernetes (ACK), covering common error scenarios, single‑cluster and multi‑cluster architectures, workload resilience, monitoring, and real‑world case studies.

ACK · Cloud Native · Kubernetes
0 likes · 13 min read
Open Source Linux
Sep 20, 2024 · Databases

Redis Master‑Slave Replication and Sentinel: How They Work and Scale

This article explains Redis master‑slave replication, synchronization steps, handling of network partitions, and how Sentinel provides automatic failover through monitoring, leader election, and notification, offering strategies to reduce master load and ensure high availability.

Master‑Slave · Replication · database
0 likes · 9 min read
Bilibili Tech
Sep 10, 2024 · Backend Development

Design and Implementation of a Scalable Reward System for Bilibili Live Platform

The paper presents a scalable, message‑queue‑driven reward system for Bilibili Live that unifies diverse reward types and distribution scenarios through standardized APIs, layered fast/slow queues, idempotent processing, multi‑stage retries, and comprehensive monitoring to ensure low latency, no over‑issuance, and reliable delivery.

Bilibili · Idempotency · Message Queue
0 likes · 16 min read
dbaplus Community
Sep 7, 2024 · Operations

What Hidden Costs Do You Face When Chasing 5‑Nines Availability?

Achieving five‑nines (99.999%) uptime demands massive capital, operational, and human investment. This article breaks down the infrastructure, monitoring, testing, and staffing costs, and explains why the marginal benefit diminishes sharply as availability targets rise.

availability engineering · five nines · high availability
0 likes · 8 min read
JD Tech Talk
Sep 4, 2024 · Backend Development

Methodology and Practices for Building High‑Performance, High‑Concurrency, High‑Availability Backend Systems

This article shares a backend‑centric methodology and practical experiences for constructing systems that simultaneously achieve high performance, high concurrency, and high availability, covering performance optimization, read/write strategies, scaling techniques, fault‑tolerance mechanisms, and deployment considerations.

Backend · Microservices · System Design
0 likes · 24 min read
JD Tech
Sep 3, 2024 · Backend Development

Designing High‑Performance, High‑Concurrency, High‑Availability Backend Systems: Methodologies and Practices

This article shares a backend engineer’s comprehensive methodology and practical experiences for building systems that simultaneously achieve high performance, high concurrency, and high availability, covering performance optimization, caching strategies, scaling techniques, fault tolerance, and operational best practices across application, storage, and deployment layers.

Scalability · System Design · high availability
0 likes · 28 min read
Volcano Engine Developer Services
Sep 2, 2024 · Operations

How ByteDance Scales Disaster Recovery: From Single Data Center to Multi‑Region Active‑Active

This article details ByteDance’s disaster‑recovery evolution, from a single‑data‑center deployment to same‑city multi‑data‑center setups and finally to active‑active multi‑region architectures, explaining the challenges, specific failure scenarios, and the strategic practices used to ensure continuous service during outages.

Infrastructure · Operations · disaster recovery
0 likes · 15 min read
macrozheng
Aug 23, 2024 · Databases

NewSQL vs Middleware Sharding: Which Architecture Truly Wins?

This article objectively compares NewSQL databases with middleware‑based sharding solutions, examining architecture, distributed transactions, CAP constraints, high availability, scaling, SQL support, storage engines, and maturity to help readers choose the right approach for their workloads.

CAP theorem · NewSQL · distributed databases
0 likes · 19 min read
360 Zhihui Cloud Developer
Aug 8, 2024 · Big Data

How to Migrate HBase and HDFS Clusters Safely Without Downtime

This guide details a step‑by‑step migration plan for HBase and HDFS clusters, covering background, high‑availability architecture, role assignments, expansion and shrinkage of ZooKeeper and JournalNode, NameNode and DataNode migration, rolling restarts, and common upgrade pitfalls.

Big Data · Cluster Migration · HBase
0 likes · 12 min read
Liangxu Linux
Aug 3, 2024 · Operations

Build a Highly Available Web Cluster with LVS and Keepalived on CentOS

This guide explains how to create a high‑availability web load‑balancing cluster using Linux Virtual Server (LVS) and Keepalived on CentOS, covering background, terminology, environment setup, detailed configuration steps for master and backup nodes, real‑server preparation, HA testing, and final conclusions.

CentOS · IPVS · LVS
0 likes · 12 min read
Linux Ops Smart Journey
Jul 30, 2024 · Cloud Native

Unveiling Kubernetes: Inside the Cosmic Architecture Powering Cloud Native Apps

In the era of digital transformation, Kubernetes has become essential to modern cloud computing, and this article demystifies its inner workings by detailing its master and node components, service discovery, storage orchestration, networking, high availability, flexible resource management, and thriving ecosystem.

Cloud Native · Kubernetes · architecture
0 likes · 5 min read
Liangxu Linux
Jul 29, 2024 · Databases

How to Build a Reliable MySQL Master‑Master Cluster with Keepalived Failover

This guide walks through the complete process of creating a MySQL dual‑master replication cluster, configuring replication users, synchronizing binary logs, setting up keepalived for virtual IP failover, and testing both data consistency and high‑availability monitoring.

Master-Master Replication · database clustering · high availability
0 likes · 8 min read
Efficient Ops
Jul 28, 2024 · Operations

Building a Resilient, High‑Performance Website: Domains, CDN, Security & Ops

This guide outlines a comprehensive, step‑by‑step strategy for creating a highly available, secure, and scalable website—from buying and protecting multiple domains, configuring DNS and CDN, setting up image and database servers, to implementing monitoring, redundancy, high‑concurrency testing, and disaster‑recovery plans.

CDN · high availability · monitoring
0 likes · 13 min read
Tencent Cloud Developer
Jul 25, 2024 · Databases

Redis: Features, Use Cases, Evolution, Architecture, Data Types, Commands, and Tencent Cloud Redis

Redis is a high‑performance, in‑memory NoSQL key‑value store offering persistence, rich data types, advanced structures, and robust commands, supporting caching, session storage, pub/sub, and leaderboards, while evolving through replication, Sentinel, clustering, and multithreaded proxies, with Tencent Cloud providing scalable, highly available managed Redis services.

Cloud Services · Data Structures · In-Memory Database
0 likes · 9 min read