Tagged articles
153 articles
MaGe Linux Operations
Jul 14, 2020 · Operations

How Keepalived Enables High-Availability Load Balancing with VRRP

Keepalived, originally designed for LVS load balancing, provides VRRP-based high availability by managing LVS nodes, performing health checks, and offering failover for services such as Nginx, HAProxy, and MySQL, while also addressing split‑brain scenarios and non‑preemptive configurations.

Operations · VRRP · failover
0 likes · 10 min read
Top Architect
Apr 14, 2020 · Databases

Designing a High‑Availability Redis Service with Sentinel

This article explains how to build a highly available Redis service by analyzing failure scenarios, comparing a single instance with master‑slave setups that use one or multiple Sentinel processes, and finally presenting a three‑Sentinel architecture that keeps the service running despite node or network outages.

architecture · failover · high availability
0 likes · 11 min read
21CTO
Apr 6, 2020 · Operations

How Alipay Achieved Near‑Zero Downtime with Multi‑Datacenter Failover Architecture

This article explains the evolution of Alipay's high‑availability and disaster‑recovery architecture—from a simple single‑datacenter design to a multi‑datacenter, unit‑based system with failover and blue‑green deployment—highlighting the challenges, solutions, and operational benefits that enable continuous service during massive traffic spikes.

Alipay architecture · Blue‑Green deployment · Distributed Systems
0 likes · 17 min read
dbaplus Community
Mar 22, 2020 · Backend Development

Designing Multi‑Data‑Center Redis Cache with Strong Consistency and Failover

This article walks through the evolution of a Redis‑based cache layer for multi‑data‑center deployments, addressing consistency, safety, performance, disk‑space, data loops, timestamp versioning, master‑slave failover, and global numeric aggregation, and culminates in a ready‑to‑use middleware solution.

Cache Consistency · Logical Clock · failover
0 likes · 19 min read
Big Data Technology Architecture
Feb 11, 2020 · Databases

JDHBase Multi‑Active Architecture and Asynchronous Replication Practices

This article describes JDHBase’s large‑scale KV storage architecture, its HBase‑based asynchronous replication mechanism, multi‑active cluster design, client‑side routing via Fox Manager, automatic failover strategies, dynamic replication tuning, and serial replication techniques to ensure data consistency across geographically distributed data centers.

Consistency · Dynamic Tuning · HBase Replication
0 likes · 12 min read
Java High-Performance Architecture
Dec 17, 2019 · Backend Development

Understanding Kafka Topic Architecture: Partitions, Replication, and Failover

This article explains Kafka's topic architecture, detailing how topics are split into partitions for scalability and parallelism, the role of logs, key-based and round-robin partitioning, replication with leaders, followers, ISR, and how these mechanisms enable fault‑tolerance and high‑performance consumer failover.

Backend · Kafka · Partition
0 likes · 7 min read
360 Zhihui Cloud Developer
Jul 30, 2019 · Databases

How QiYun Enhances OpenStack Trove for Seamless Master‑Slave Database Deployment

This article explains OpenStack Trove’s role as a Database-as-a-Service platform, outlines its core components, and details QiYun’s custom enhancements—including automated backup, monitoring, and a streamlined single-API master-slave instance creation with isolated networks and VIP-based failover for improved security and availability.

Backup · Database-as-a-Service · Master‑Slave
0 likes · 5 min read
58 Tech
Jul 8, 2019 · Databases

Design and Implementation of WMHA: A Modified MySQL High‑Availability Solution

This article explains the need for high‑availability MySQL services, critiques the original in‑house HA approach, and details how the mature MHA framework was extended into WMHA with added VIP monitoring, enhanced failover procedures, richer notifications, and a reorganized deployment structure to improve reliability and reduce DBA intervention.

Database operations · MHA · WMHA
0 likes · 9 min read
MaGe Linux Operations
Mar 8, 2019 · Operations

Mastering High‑Availability Clusters: Resources, Constraints, and Failure Handling

This article explains the principles and components of high‑availability (HA) clusters, covering active/standby nodes, resource stickiness and constraints, heartbeat and quorum mechanisms, split‑brain avoidance, failure detection methods, and the minimal setup required for a reliable web‑service HA deployment.

Heartbeat · Operations · Resource Management
0 likes · 14 min read
UC Tech Team
Oct 23, 2018 · Operations

Understanding Faults and Fault Isolation Strategies in Distributed Systems

The article explains what constitutes a fault, introduces key metrics such as RPO and RTO, and describes various fault isolation principles, patterns, and practical examples—including dependency degradation, failover, dynamic adjustment, fast‑fail, caching, rate limiting, and resource isolation—to improve system reliability.

Operations · RPO · RTO
0 likes · 14 min read
ITPUB
Jun 22, 2018 · Databases

How to Build a Highly Available Redis Service with Sentinel and Virtual IP

This article explains how to design and implement a highly available Redis deployment using master‑slave replication, multiple Redis Sentinel instances, and a virtual IP to provide seamless failover while maintaining simple client connectivity, covering failure scenarios, architecture choices, and practical configuration tips.

database · failover · high availability
0 likes · 12 min read
21CTO
May 9, 2018 · Operations

How Alipay Built Seamless High Availability and Disaster Recovery for Millions of Transactions

This article examines Alipay's evolution from a simple single‑datacenter setup to a multi‑active, unit‑based architecture, detailing the technical challenges of high availability, disaster recovery, failover design, and blue‑green deployment, and how these solutions enable continuous service during massive traffic spikes such as Double 11.

Alipay · Blue‑Green deployment · Distributed Systems
0 likes · 17 min read
Architecture Digest
May 9, 2018 · Operations

High Availability and Disaster Recovery Architecture: The Evolution of Alipay’s System Design

This article examines the importance of high‑availability and disaster‑recovery architectures, tracing Alipay’s evolution from a simple load‑balanced setup through multi‑datacenter, failover, and unit‑based designs that address scalability, data consistency, and continuous service delivery challenges.

Distributed Systems · Scalability · disaster recovery
0 likes · 16 min read
ITPUB
Apr 14, 2018 · Databases

Designing a Highly Available Redis Service with Sentinel and Multi‑Sentinel Architecture

This article explains how to define high availability for Redis, enumerates typical failure scenarios, compares four deployment patterns—from a single instance to a three‑sentinel setup—and provides practical steps, diagrams, and tips for achieving reliable Redis service using Sentinel and virtual IP failover.

architecture · database · failover
0 likes · 14 min read
Architecture Digest
Apr 5, 2018 · Databases

Designing a Highly Available Redis Service Using Sentinel

This article explains how to build a highly available Redis deployment by defining HA requirements, analyzing failure scenarios, and progressively implementing solutions from a single instance to a three‑sentinel architecture with virtual IP failover for seamless client access.

failover · high availability · sentinel
0 likes · 11 min read
Architecture Digest
Mar 29, 2018 · Databases

Designing a High‑Availability Redis Service with Sentinel

This article explains how to build a highly available Redis deployment using Redis Sentinel, compares several architectural options, and details the final three‑sentinel design that tolerates node, process, and network failures while keeping client access simple.

Infrastructure · failover · high availability
0 likes · 12 min read
Architecture Digest
Dec 27, 2017 · Backend Development

Handling Transactions, Failover, and Exactly‑Once Semantics in Distributed Systems

This article explores how distributed systems determine node liveness, manage failover and recovery, and implement at‑most‑once, at‑least‑once, and exactly‑once processing guarantees (including opaque transactions and two‑phase commit), using examples from Kafka, ZooKeeper, and big‑data pipelines.

Big Data · Distributed Systems · Exactly-Once
0 likes · 15 min read
MaGe Linux Operations
Dec 21, 2017 · Operations

Mastering High Availability Clusters: Key Concepts, Resource Management, and Failure Handling

This article explains how high‑availability (HA) clusters provide redundancy for directors, RS‑servers, databases and storage, covering active‑passive node roles, resource stickiness, constraints, quorum voting, split‑brain avoidance, failure detection methods, and essential configuration tips.

Cluster · Operations · Resource Management
0 likes · 12 min read
JD Retail Technology
Oct 16, 2017 · Databases

Design and Evolution of JD Elastic Database: Architecture, Sharding, and Automatic Failover

This article details the evolution of JD's Elastic Database, describing the challenges of scaling MySQL, the staged solutions including sharding, JProxy, and the final elastic architecture with services like Topology, JED‑Gate, and JED‑Tablet, and explains its query processing, dynamic resharding, and automatic failover mechanisms.

Elastic Architecture · Query Processing · databases
0 likes · 11 min read
Architecture Digest
Jun 16, 2017 · Databases

Redis High‑Availability Architecture and Best Practices

This article explains Redis fundamentals, details the Sentinel mechanism, and compares several high‑availability deployment patterns (Sentinel with DNS or VIP, client‑direct connections, Keepalived/HAProxy, Redis Cluster, Twemproxy, and Codis), weighing their advantages and drawbacks and offering practical best‑practice recommendations for reliable production use.

Database Architecture · best practices · failover
0 likes · 12 min read
ITPUB
May 24, 2017 · Databases

How to Build a Redis High‑Availability Cluster with Sentinel and VIP

This guide walks through setting up a Redis high‑availability solution using master‑slave replication, Redis Sentinel for automatic failover, and a floating VIP to provide a stable endpoint, covering environment preparation, configuration files, firewall rules, testing, and client integration.

Linux · failover · high availability
0 likes · 10 min read
Architecture Digest
May 22, 2017 · Databases

Building a High‑Availability Redis System with Sentinel and VIP

This guide demonstrates how to configure a highly available Redis deployment using master‑slave replication, Redis Sentinel for automatic failover, and virtual IP (VIP) migration, covering environment setup, configuration files, firewall adjustments, testing procedures, and client connection strategies.

database · failover · redis
0 likes · 11 min read
dbaplus Community
Mar 9, 2017 · Databases

Why Redis Redlock May Not Be Safe: A Deep Dive into the Redlock Debate

An in‑depth review of the heated debate between Redis creator antirez and distributed‑systems expert Martin Kleppmann over the safety of Redis’s Redlock algorithm, covering single‑node lock pitfalls, failover issues, timing assumptions, fencing tokens, and practical recommendations for when to use Redlock versus simpler locks.

Consistency · Redlock · distributed-lock
0 likes · 25 min read
MaGe Linux Operations
Dec 26, 2016 · Databases

Mastering MySQL High Availability with MHA: Step‑By‑Step Setup Guide

This article introduces MHA (Master High Availability) for MySQL, explains its architecture, outlines required hardware and software configurations, provides detailed commands to set up master and slave nodes, create configuration files, and demonstrates how to start and verify the high‑availability cluster.

Database Replication · Linux · MHA
0 likes · 8 min read
360 Zhihui Cloud Developer
Nov 17, 2016 · Operations

Why Large Redis Instances Cause Disasters and How to Prevent Them

This article examines the operational challenges of oversized Redis instances—including slow failover, prolonged slave resynchronization, network‑induced avalanches, and persistence blocking—and offers practical mitigation strategies such as key expiration, data compression, and using high‑performance alternatives like Pika.

Database operations · Memory Management · Performance Optimization
0 likes · 9 min read
ITPUB
Oct 28, 2016 · Databases

Step‑by‑Step Oracle Data Guard Switchover and Failover Guide

This article provides a detailed, hands‑on walkthrough of Oracle Data Guard switchover in normal operation and the subsequent steps to convert the original primary to a standby, including all necessary SQL commands, instance restarts, and verification queries.

Data Guard · Oracle · SQL
0 likes · 6 min read
Architecture Digest
Aug 5, 2016 · Backend Development

Implementation Principles and Architecture of the Diamond Configuration Management System

The article explains Diamond, a simple, reliable, and easy‑to‑use distributed configuration management system used inside Taobao, detailing its features, persistence and disaster‑recovery mechanisms, overall architecture, client‑side subscription code, and the internal processes that keep configuration data synchronized.

Backend · Configuration Management · Java
0 likes · 10 min read
dbaplus Community
Jul 21, 2016 · Databases

How MHA Delivers Fast, Zero‑Data‑Loss MySQL High Availability

This article explains MHA’s architecture, failover workflow, comparison with other MySQL HA solutions, and its six key advantages, showing how it can switch masters within seconds while preserving data consistency without altering MySQL settings or adding many servers.

MHA · failover · high availability
0 likes · 9 min read
ITPUB
Jun 25, 2016 · Operations

Why Large Redis Deployments Fail: Failover, Scaling, and Memory Pitfalls

The article examines how oversized Redis instances cause catastrophic failures during primary node crashes, scaling bursts, and network issues, explains the costly re‑synchronization steps, presents real‑world timing data, and offers practical memory‑reduction strategies to keep Redis operations reliable.

failover · redis · scaling
0 likes · 8 min read
21CTO
May 8, 2016 · Databases

Which MySQL High‑Availability Architecture Is Right for You? A Comprehensive Guide

The article reviews common MySQL high‑availability solutions—including shared‑storage SAN, DRBD disk replication, keepalived/heartbeat, MHA, ZooKeeper‑based HA, Galera/PXC clusters, and proxy middleware—detailing their architectures, advantages, limitations, and suitability for different business and operational requirements.

Cluster · Database Replication · HA Architecture
0 likes · 19 min read
ITPUB
Apr 19, 2016 · Databases

Mastering SQL Server Log Shipping: Setup, Jobs, and Troubleshooting

This comprehensive guide explains how SQL Server log shipping works, details the roles of primary, secondary, and monitor servers, walks through each job type, discusses execution intervals and data‑loss implications, and provides step‑by‑step failover and troubleshooting procedures.

Backup Jobs · Log Shipping · SQL Server
0 likes · 26 min read
21CTO
Mar 11, 2016 · Databases

How to Build Reliable MySQL HA: Replication, Monitoring, and Failover Strategies

This article explores practical MySQL high‑availability solutions, covering asynchronous and semi‑synchronous replication, monitoring with Keepalived or ZooKeeper, failover decision criteria, GTID and pseudo‑GTID techniques, and lessons learned from real‑world deployments.

GTID · HA · Pseudo GTID
0 likes · 13 min read
Efficient Ops
Mar 2, 2016 · Databases

How DBMP Automates MySQL Management and Cuts DBA Workload

This article explains why the DBMP platform was created to automate MySQL operations, describes its architecture and key features such as host management, instance groups, backup, slow‑query handling, and scheduled tasks, and outlines future optimization directions and common technical Q&A.

Backup · database automation · failover
0 likes · 14 min read
Architects' Tech Alliance
Sep 8, 2015 · Operations

Advanced Load Balancing and Link Failover for DDBoost

The article explains how to create an application‑level interface group for DDBoost to aggregate multiple Data Domain IP interfaces into a private network group, achieving load balancing, fault‑tolerant data transfer, and notes performance considerations such as avoiding mixed‑capacity links.

DDBoost · Data Domain · failover
0 likes · 3 min read
Architect
Aug 31, 2015 · Databases

MySQL High Availability: Replication, Monitoring, and Failover Strategies

This article discusses MySQL high‑availability solutions, covering asynchronous and semi‑synchronous replication, monitoring with Keepalived, ZooKeeper, and custom agents, failover procedures using binlog positions, GTID and pseudo‑GTID techniques, and the author's practical experiences and future plans.

GTID · Replication · database
0 likes · 17 min read
Baidu Tech Salon
Apr 22, 2014 · Operations

Baidu's Optimization of MooseFS and Redis: Architecture Improvements and Performance Enhancement

At Baidu’s 49th Technical Salon, Cheng Yishi explained how the company revamped its MooseFS and Redis systems by adding a Shadow Master to split reads from writes, introducing Slave nodes for failover, and deploying a Redis proxy middleware, thereby dramatically improving performance, scalability, and high‑availability for critical services.

Baidu · MooseFS · Shadow Master
0 likes · 6 min read