Tagged articles
257 articles
ITPUB
Jan 5, 2022 · Operations

Why Contingency Planning Beats System Optimization: Lessons from Xi'an One‑Code Collapse

The recent collapse of Xi'an’s One‑Code health system highlighted that system failures often stem from blocked pipelines rather than database overload, and the article argues that robust manual contingency plans—such as alternative mini‑programs or simple backup apps—are essential to prevent small glitches from becoming crises.

IT infrastructure · contingency planning · disaster recovery
9 min read
Tencent Database Technology
Dec 31, 2021 · Databases

Practices and Exploration of Disaster Recovery in Cloud‑Native Database TDSQL‑C (formerly CynosDB)

This article examines the architecture differences between traditional MySQL and the cloud‑native TDSQL‑C database, outlines MySQL disaster‑recovery deployment models, and details TDSQL‑C’s multi‑dimensional disaster‑recovery system, including its agent‑scheduler design, cross‑AZ switching challenges, and mitigation strategies.

TDSQL-C · cloud-native database · disaster recovery
10 min read
Architect
Dec 31, 2021 · Operations

Understanding Distributed System High Availability: From Single‑Node to Multi‑Active Architecture

This article explains the principles, evolution, and implementation details of high‑availability architectures—from basic single‑node setups to multi‑active, cross‑region deployments—covering redundancy, disaster recovery, data synchronization, routing strategies, and the challenges of achieving true geo‑distributed active‑active systems.

Active-Active · Distributed Systems · System Architecture
30 min read
Tencent Architect
Dec 30, 2021 · Databases

Practices and Exploration of Disaster Recovery in Tencent Cloud‑Native Database TDSQL‑C (formerly CynosDB)

This article examines the architecture differences between cloud‑native TDSQL‑C and traditional MySQL, outlines TDSQL‑C’s elastic, serverless, low‑latency features, compares MySQL disaster‑recovery models, and details the multi‑dimensional disaster‑recovery system and its cross‑AZ/Region challenges and solutions.

TDSQL-C · cloud-native database · disaster recovery
9 min read
Cloud Native Technology Community
Nov 18, 2021 · Cloud Computing

Multi-Cloud Strategy: Concepts, Benefits, Use Cases, Challenges, and Best Practices

This article explains the multi‑cloud concept, how it works, its advantages such as disaster recovery, cost optimization and avoiding vendor lock‑in, the differences from hybrid cloud, common use cases, implementation challenges, and practical best‑practice guidelines for planning and managing a multi‑cloud environment.

cloud computing · cloud strategy · data sovereignty
15 min read
NiuNiu MaTe
Nov 17, 2021 · Databases

Mastering MySQL Disaster Recovery: Replication Modes and Strategies

This article explains MySQL disaster‑recovery techniques, covering cold and hot backups, same‑city versus remote setups, master‑slave topologies, async, semi‑sync and full‑sync replication, the MAR strong‑sync approach, and practical recommendations for building resilient two‑city three‑center architectures.

Replication · database · disaster recovery
10 min read
High Availability Architecture
Nov 5, 2021 · Cloud Native

Multi-Cloud Active‑Active Architecture: Design, Benefits, and Challenges

The article examines why multi‑cloud active‑active (multi‑active) deployments are essential for high availability, outlines common disaster‑recovery patterns such as primary‑backup and active‑active, details the technical workflow of traffic routing, business and storage layers, and discusses the practical advantages and drawbacks of this approach.

Active-Active · architecture · cloud-native
10 min read
Efficient Ops
Oct 28, 2021 · Operations

Why Geo‑Active‑Active Architecture Is the Key to Ultra‑High System Availability

This article explains the principles behind geo‑active‑active (multi‑active) architectures, covering system availability metrics, redundancy strategies from single‑node backups to same‑city and cross‑city active‑active deployments, data‑sync challenges, routing and sharding techniques, and how these designs dramatically improve reliability and scalability.

Distributed Systems · System Design · disaster recovery
37 min read
Full-Stack Internet Architecture
Oct 20, 2021 · Operations

Understanding Geo-Distributed Active-Active Architecture: Principles, Risks, and Implementation Strategies

This article explains the concept of geo-distributed active‑active (multi‑active) systems, covering architectural principles, availability metrics, redundancy techniques such as master‑slave replication, cold and hot disaster recovery, same‑city and cross‑city active‑active setups, data synchronization challenges, and practical routing and sharding methods to achieve high availability and scalability.

Active-Active · System Architecture · disaster recovery
29 min read
21CTO
Sep 19, 2021 · Databases

From Two‑Site Three‑Center to Three‑Site Five‑Center: NetBank’s Database Architecture Evolution

NetBank’s database deployment has evolved from a simple two‑site three‑center disaster‑recovery model to a sophisticated three‑site five‑center architecture, incorporating distributed databases, multi‑tenant isolation, transaction consistency, latency optimization, and containerized deployment to achieve high availability, scalability, and cost efficiency.

Database Architecture · Performance Optimization · containerization
19 min read
ITPUB
Sep 17, 2021 · Databases

How NetBank Scaled Its Database: From Two‑Site Three‑Center to Three‑Site Five‑Center Architecture

This article details NetBank's evolution of database deployment—from early distributed setups to a unitized, cloud‑native architecture—covering disaster‑recovery upgrades, distributed database design, multi‑tenant strategies, containerized migration, and the performance and operational impacts of moving to a three‑site five‑center model.

containerization · disaster recovery · distributed databases
20 min read
Beike Product & Technology
Sep 17, 2021 · Frontend Development

Flutter for Web: Architecture, Platform Issues, and Disaster‑Recovery Solutions at Beike

This article describes how Beike's Flutter team leveraged Flutter for Web to enable rapid online issue mitigation, detailing the compilation pipeline, platform‑specific challenges such as operating‑system detection and dart:io limitations, and the multi‑module disaster‑recovery architecture they built.

Platform Channel · Web · aop
13 min read
Architects' Tech Alliance
Aug 15, 2021 · Operations

Enterprise Multi‑Data Center Evolution: From Two‑Region Three‑Center to Distributed Active/Active Architecture

The article explains how enterprises are moving from traditional primary‑backup and two‑region three‑center data‑center models toward distributed active/active data‑center architectures to achieve continuous 24/7 operations, higher resource utilization, and fault‑transparent services, while outlining the technical and organizational challenges involved.

Active-Active · IT Operations · disaster recovery
10 min read
Xianyu Technology
Aug 12, 2021 · Frontend Development

Automatic Front-end Disaster Recovery Solution Overview

The automatic front‑end disaster‑recovery solution combines an npm tool with a visual backend: it generates API fallback data on demand, uses a whitelist and static parameters to target backups, and syncs the results to developers. After deployment, backup coverage rose from ~30% to ~70%, with 80% of backups automated.

Automation · data backup · disaster recovery
6 min read
Qingyun Technology Community
Aug 10, 2021 · Cloud Native

New Oriental’s Blueprint for Stateful Services in Kubernetes: Custom Operators & XLSS

This article details New Oriental's approach to building stateful services on Kubernetes, covering the challenges of native storage, the use of custom Operators, the design of the XLSS local storage solution, backup and disaster‑recovery workflows, and a multi‑phase roadmap for large‑scale stateful middleware deployment.

Backup · Cloud Native Storage · Custom Operator
16 min read
Aotu Lab
Jul 15, 2021 · Frontend Development

How JD’s Frontend Team Delivered 16 High‑Traffic 618 Event Halls in Record Time

This article details how JD's front‑end team tackled the 2021 618 shopping festival by using Taro3 for cross‑platform H5 and mini‑program development, implementing disaster‑recovery services, intelligent UI personalization, pull‑to‑refresh, and efficient collaboration practices to launch sixteen high‑performance event halls quickly and reliably.

Collaboration · cross‑platform · disaster recovery
14 min read
IT Architects Alliance
Jul 8, 2021 · Operations

Mastering High Availability: From Cold Backup to Multi‑Region Active‑Active

This article analyzes various high‑availability strategies for stateful backend services—covering cold backup, dual‑machine hot standby, same‑city active‑active, remote active‑active, and multi‑region active‑active architectures—detailing their benefits, limitations, and practical implementation considerations.

Active-Active · System Design · backend operations
14 min read
DataFunTalk
Jul 8, 2021 · Big Data

Design and Evolution of ByteDance's Multi‑Datacenter HDFS Architecture

This article explains how ByteDance extended the Apache HDFS architecture with a multi‑datacenter design, introducing components such as DanceNN, NNProxy, and BookKeeper to achieve scalable storage, cross‑datacenter data placement, and rack‑level disaster recovery for petabyte‑scale workloads.

ByteDance · HDFS · big data storage
13 min read
IT Architects Alliance
Jun 28, 2021 · Industry Insights

WeChat Moments' Billion-Visit Architecture: Disaster Recovery & Flexible Scaling

The article analyzes WeChat Moments' massive image and video services, detailing its OC/IDC architecture, holiday traffic challenges, software and hardware safeguards, disaster‑recovery mechanisms, retry policies, and a series of flexible strategies—including compression format changes, bitrate reduction, buffer pools, and timeline throttling—to sustain billions of daily accesses.

Flexible Scaling · Video Bitrate · WeChat Moments
13 min read
New Oriental Technology
Jun 4, 2021 · Cloud Native

Overview of XDF Local Storage Service (xlss) Architecture, Components, and Disaster Recovery Workflow

The article introduces xlss, a high‑performance, highly‑available Kubernetes local storage solution, details its core components, application scenarios, custom scheduler design, backup and recovery processes, and provides code snippets and CRD examples for implementing resilient stateful workloads.

Cloud Native · Kubernetes · Scheduler
14 min read
ITFLY8 Architecture Home
May 22, 2021 · Operations

How Active‑Active Data Centers Boost Resilience and Resource Efficiency

The article explains hot standby, cold standby, and active‑active (dual‑active) data center architectures, compares their advantages and drawbacks, outlines deployment challenges, and highlights the role of cloud computing and automation in achieving high availability and optimal resource utilization.

Active-Active · cloud computing · disaster recovery
12 min read
Architects' Tech Alliance
May 14, 2021 · Industry Insights

Why Distributed Active/Active Data Centers Are the Future of Enterprise IT

The article examines how enterprises are moving from traditional primary‑backup and two‑site‑three‑center architectures toward distributed active/active data centers, outlining the concepts of distribution and multi‑activity, the technical challenges involved, and the operational benefits of higher availability and resource efficiency.

Active-Active · IT Operations · cloud computing
9 min read
Xianyu Technology
May 13, 2021 · Frontend Development

Front-End Disaster Recovery for Page Stability

To prevent page failures and white‑screen errors, the team built a front‑end SDK that fetches fallback data from OSS + CDN, offers configurable black/white‑list rules, lightweight validation, and a visual backend, cutting error rates from over 8% to 0.55% and dramatically improving interface stability.

CDN · OSS · SDK
9 min read
Volcano Engine Developer Services
May 10, 2021 · Databases

How Distributed Databases Powered Douyin’s Spring Festival Red‑Envelope Event

In a May 15 meetup, ByteDance engineer Ma Haoxiang discussed his background, the culture at ByteDance, recommended resources, and detailed how distributed databases differ from traditional relational databases, highlighting their massive capacity, low cost, high performance, and the specific performance and disaster‑recovery challenges faced during Douyin’s Spring Festival red‑envelope activity.

Douyin · Scalability · Spring Festival
7 min read
Architects' Tech Alliance
May 6, 2021 · Operations

Key Technical Considerations for Dual‑Active Data Center Architecture

The article explains dual‑active data‑center disaster‑recovery architecture, covering SAN vs NAS storage options, distance, network, performance, true active‑active versus active‑passive designs, and multipathing considerations. It also provides a downloadable implementation guide for practitioners.

Dual-Active · disaster recovery · storage
7 min read
Java Interview Crash Guide
Apr 30, 2021 · Operations

How Do Large Internet Companies Achieve Cross‑Region Multi‑Active High Availability?

The article explains why large internet firms adopt cross‑region multi‑active architectures for high availability, compares cold backup, hot standby, same‑city active‑active, and cross‑region active‑active solutions, discusses their trade‑offs, and presents practical design patterns and questions for implementing such systems.

Distributed Systems · Operations · disaster recovery
15 min read
dbaplus Community
Apr 22, 2021 · Operations

Achieving True Multi‑Region Active‑Active: Bidirectional Sync Across Three Data Centers

This article explains how to implement a true multi‑region active‑active architecture by enabling bidirectional data synchronization among three or more data centers, covering CAP trade‑offs, distributed ID generation algorithms, center closure strategies, eventual consistency mechanisms, and a disaster‑recovery design.

CAP theorem · Distributed Systems · data synchronization
16 min read
Programmer DD
Apr 2, 2021 · Operations

Why a Data Center Fire Can Sink Your Startup: Disaster Recovery Lessons

The article uses the OVH data‑center fire as a stark reminder that startups must design robust data disaster‑recovery strategies, explaining why backups, off‑site storage, and proper architectural planning are essential to prevent catastrophic data loss and potential business collapse.

Operations · System Architecture · data backup
8 min read
Programmer DD
Mar 27, 2021 · Operations

Disaster Recovery vs Backup: Key Differences, Types, and Levels Explained

This article explains what disaster recovery is, how it differs from backup, outlines the various classifications of disaster recovery and backup, and details the six practical differences and four backup levels that organizations should consider to ensure business continuity and data protection.

Backup · Data Protection · IT Operations
9 min read
Architecture Digest
Mar 25, 2021 · Big Data

Uber's Multi-Region Kafka Architecture and Disaster Recovery

This article explains how Uber built a multi‑region Kafka infrastructure with disaster‑recovery capabilities, detailing its replication topology, active/active and active/passive consumption modes, offset‑management service, and the challenges of ensuring reliable, low‑latency data streaming across regions.

Data Streaming · Kafka · Offset Management
9 min read
21CTO
Mar 24, 2021 · Backend Development

Mastering Backend High Availability: From Cold Backups to Multi‑Active Deployments

This article examines stateful backend services and compares various high‑availability strategies—including cold backup, dual‑machine hot standby, same‑city and cross‑city active‑active, and multi‑active architectures—highlighting their benefits, drawbacks, and practical implementation considerations.

Backend Architecture · cold backup · disaster recovery
14 min read
ITPUB
Mar 22, 2021 · Operations

How to Achieve High Availability for Stateful Backend Services: From Cold Backup to Multi‑Active

This article explains the evolution of high‑availability strategies for stateful backend services, comparing cold backup, dual‑machine hot standby, same‑city active‑active, cross‑city active‑active and multi‑active solutions, and discusses their trade‑offs, implementation details, and practical considerations.

System Design · active standby · cold backup
15 min read
Alibaba Cloud Developer
Mar 14, 2021 · Cloud Computing

Which MySQL Tables Need Cross‑Cloud Sync? A Disaster Recovery Guide

This article explains how to identify which MySQL tables in an Alibaba Cloud RDS environment should be synchronized across clouds and which can be excluded, covering key concepts, design and operational practices, a real‑world failure case, and recommended mitigation and improvement steps for application‑level disaster recovery.

DTS · RDS · data synchronization
20 min read
MaGe Linux Operations
Mar 1, 2021 · Backend Development

Mastering High Availability: From Cold Backup to Multi‑Active Architecture

This article examines high‑availability strategies for stateful backend services, covering cold backup, dual‑machine hot standby, same‑city active‑active, and remote multi‑active solutions, while discussing their benefits, trade‑offs, and architectural patterns for resilient distributed systems.

Backend Architecture · active standby · cold backup
14 min read
Architects' Tech Alliance
Feb 28, 2021 · Cloud Computing

Disaster Recovery Technologies: SDS, Ceph RBD Mirror, Containers, Hyper‑Converged Infrastructure, Cloud & Edge Computing, and Blockchain

This article surveys modern disaster‑recovery techniques, explaining how software‑defined storage, Ceph RBD Mirror, container platforms, hyper‑converged infrastructure, cloud and edge computing, and blockchain can be combined to achieve seamless, fault‑tolerant data protection across on‑premise and cloud environments.

Blockchain · Ceph · Edge Computing
14 min read
Programmer DD
Feb 20, 2021 · Big Data

How Uber Built a Multi‑Region Kafka Architecture for Disaster Recovery

Uber operates one of the world’s largest Kafka deployments, handling trillions of messages daily, and has engineered a multi‑region deployment with active/active and active/passive consumption modes, offset management, and uReplicator to ensure high availability and seamless disaster recovery across data centers.

Active-Active · Active-Passive · Kafka
10 min read
Architects' Tech Alliance
Feb 17, 2021 · Databases

How Alipay Handles 540k TPS: Inside the LDC Architecture, Unitization and CAP Analysis

This article dissects Alipay's massive Double‑11 payment surge, explaining how its Logical Data Center (LDC) and unit‑based architecture—RZone, GZone, and CZone—scale to hundreds of thousands of transactions per second, manage traffic routing, implement disaster‑recovery, and navigate the CAP theorem using OceanBase and Paxos.

CAP theorem · Distributed Systems · LDC architecture
39 min read
AntTech
Feb 5, 2021 · Databases

OceanBase 2020 Review: Record‑Breaking Performance, Independent Operation, Ecosystem Expansion, and Advanced Disaster‑Recovery

In 2020 OceanBase achieved a world‑record TPC‑C benchmark of 707 million tpmC, spun off as an independent company, attracted dozens of marquee customers, built a four‑layer ecosystem, delivered ultra‑high performance for enterprises, and introduced Paxos‑based disaster‑recovery that guarantees RPO = 0 and minute‑level RTO.

Ecosystem · OceanBase · Paxos
13 min read
Alibaba Cloud Developer
Feb 3, 2021 · Operations

How to Build True Multi‑Region Active‑Active Architecture with Bidirectional Sync

This article explains why true multi‑region active‑active requires data to be bidirectionally synchronized across three or more centers, and details a multi‑center disaster‑recovery architecture, distributed ID generation algorithms, CAP considerations, and techniques for achieving eventual consistency.

Distributed Systems · data synchronization · disaster recovery
14 min read
Alibaba Cloud Native
Dec 21, 2020 · Operations

How to Build Multi‑Site High Availability with AHAS‑MSHA: Real‑World E‑Commerce Cases

This article explains the challenges of achieving high availability in unreliable environments, introduces disaster‑tolerance concepts and RPO/RTO metrics, describes Alibaba Cloud's AHAS‑MSHA multi‑site solution and its key features, and walks through two e‑commerce case studies that demonstrate implementation steps, fault‑injection drills, and recovery verification.

AHAS · MSHA · Multi‑Site
14 min read
FunTester
Dec 12, 2020 · Operations

Why Redundancy Is the Key to Effective Disaster Recovery in IT Systems

The article explains that disaster recovery for information systems relies on redundancy across hardware, energy, and data, classifies natural, human, and technical disasters, defines critical metrics such as RTO and RPO, and outlines the technologies, architectures, and maturity levels needed to ensure business continuity.

RPO · RTO · business continuity
29 min read
Architects' Tech Alliance
Nov 30, 2020 · Industry Insights

Cut Storage Costs and Boost Disaster Recovery with Deduplication and Encryption

Data deduplication eliminates redundant data blocks to lower storage and bandwidth costs, while source‑ and transmission‑level encryption safeguards data in transit and at rest; the article also compares hardware vs software deduplication, various storage architectures (DAS, SAN, NAS, object and distributed storage) and their trade‑offs.

Backup · NAS · SAN
15 min read
Programmer DD
Nov 30, 2020 · Operations

Mastering High Availability: From Cold Backup to Multi‑Active Deployments

This article explains how backend services can be classified as stateless or stateful and explores a range of high‑availability strategies—from simple cold backups and active‑standby setups to same‑city, cross‑city, and multi‑active architectures—highlighting their trade‑offs and implementation considerations.

backend services · disaster recovery · high availability
14 min read
Architects' Tech Alliance
Nov 22, 2020 · Databases

Database High Availability: HADR, HACMP, Data Replication, Storage DR, and DPF Solutions

This article provides a comprehensive overview of database high‑availability techniques—including DB2 HADR, HACMP clustering, SQL and Q replication, storage‑layer disaster recovery, and DPF considerations—explaining their features, suitable scenarios, and how they can be combined to achieve robust end‑to‑end resilience.

DB2 · HADR · Replication
11 min read
dbaplus Community
Nov 19, 2020 · Big Data

How Banks Can Tame Petabytes of Unstructured Data: Architecture and Best Practices

This article presents a comprehensive design and deployment plan for a bank's unstructured data service platform, covering data growth challenges, lifecycle management, three‑tier storage architecture, Elasticsearch indexing, fault‑tolerant disaster recovery, monitoring, and future development directions.

Elasticsearch · disaster recovery · storage architecture
19 min read
Java Architect Essentials
Nov 8, 2020 · Operations

What Happens If You Destroy All of Alipay’s Storage Servers? A Deep Dive into Data Center Architecture and Disaster Recovery

The article explores the consequences of destroying Alipay’s storage servers, detailing typical financial data center architectures, backup strategies, power redundancy, fire suppression systems, and the practical challenges of crippling such facilities, while highlighting regulatory and physical security measures.

Backup · Data center · Fire Suppression
8 min read
IT Architects Alliance
Nov 1, 2020 · Industry Insights

What Are the Five Core Data Replication Techniques for Disaster Recovery?

This article breaks down the five major data replication approaches—application‑level, host‑level, database‑level, storage‑gateway, and storage‑media—detailing their principles, advantages, drawbacks, and typical use cases to help professionals design effective disaster‑recovery solutions.

Backup · Replication Techniques · data replication
12 min read
Suning Technology
Oct 30, 2020 · Operations

Designing Suning’s Multi‑Data‑Center Active‑Active Architecture for Scalable E‑Commerce

Suning built a multi‑data‑center active‑active solution that combines primary‑backup, same‑city active‑active, and full multi‑active modes, defines top‑level design goals, values and principles, and implements a comprehensive architecture, routing, high‑availability, hybrid‑cloud and disaster‑recovery strategy to support massive e‑commerce growth.

Multi-Data Center · cloud architecture · disaster recovery
22 min read
Aikesheng Open Source Community
Oct 30, 2020 · Databases

Configuring MySQL MGR with Asynchronous Replication Automatic Failover for Multi‑Site Disaster Recovery

This article explains how MySQL Group Replication (MGR) can provide zero‑RPO high‑availability within a city‑scale data center, why it needs asynchronous replication for WAN‑scale disaster recovery, and walks through a step‑by‑step setup—including code examples—for automatic failover of asynchronous replication channels.

Asynchronous Replication · MGR · database high availability
6 min read
Programmer DD
Oct 23, 2020 · Databases

When a Mistyped Function Wiped a Production Database – Lessons Learned

A Keepthescore founder accidentally ran a local‑only database reset function on production, causing thousands of scores to vanish, but daily DigitalOcean backups enabled a rapid restore, illustrating the perils of unsafe code and the vital role of reliable backups.

Backup · Python · disaster recovery
4 min read
vivo Internet Technology
Sep 10, 2020 · Operations

Multi-Active High Availability Architecture: Scenarios, Solutions, and Evaluation

Multi‑active high‑availability architectures—ranging from same‑city dual‑active and two‑site three‑center setups to fully remote multi‑active deployments—provide continuous 24/7 service by replicating data across sites, but introduce latency, consistency, routing, and cost complexities that require careful unit‑based design, synchronized storage, and sophisticated traffic management.

System Architecture · data synchronization · disaster recovery
0 likes · 17 min read
Dada Group Technology
Sep 9, 2020 · Cloud Computing

Dada Dual-Cloud Active-Active Disaster Recovery: Architecture, Practices, and Lessons Learned

This article details Dada's dual‑cloud active‑active disaster‑recovery implementation, explaining high availability versus disaster recovery, describing the first‑phase architecture and challenges, and outlining the second‑phase enhancements such as multi‑data‑center Consul, bidirectional database replication, precise load‑balancing, tool adaptations, capacity elasticity, and future plans.

Active-Active · Consul · Database Replication
0 likes · 13 min read
Efficient Ops
Aug 11, 2020 · Operations

How Multi‑Cloud Disaster Recovery Boosts Site Availability: Lessons from Real‑World DR Drills

This article shares a detailed case study of building multi‑cloud site disaster‑recovery and fault‑drill practices at Kaixin Network, covering high‑availability concepts, architectural redesign, pain points, automated one‑click switching, and future self‑healing with chaos engineering to improve reliability.

Operations · disaster recovery · fault drills
0 likes · 15 min read
IT Architects Alliance
Aug 6, 2020 · Operations

Eight Essential Steps for Successful Disaster Recovery Drills

This guide outlines eight practical steps—including defining scope, forming a planning team, setting clear objectives, designing realistic scenarios, creating evaluation checklists, assigning roles, conducting pre‑drill briefings, and performing post‑drill reviews—to help organizations execute effective, repeatable disaster recovery exercises that strengthen business continuity.

Operations · Planning · best practices
0 likes · 9 min read
Programmer DD
Aug 1, 2020 · Databases

Inside Ant Financial’s LDC Architecture: Scaling Double‑11 Payments with OceanBase and CAP Theory

This article explains how Ant Financial’s logical data center (LDC) and unitized architecture, combined with OceanBase’s Paxos‑based consensus, enable the massive TPS growth for Double‑11 payments while addressing sharding, CAP trade‑offs, traffic diversion, and multi‑site disaster recovery.

Ant Financial · CAP theorem · Distributed Systems
0 likes · 37 min read
Full-Stack Internet Architecture
Jul 22, 2020 · Operations

Building a Comprehensive High‑Availability System: Disaster Recovery, Capacity Planning, Online Protection, and Fault Drills

This article explains how to construct a truly high‑availability architecture for modern distributed, cloud‑native services by covering disaster‑recovery principles, capacity planning with realistic load testing, online traffic protection, and systematic fault‑drill practices.

Fault Injection · capacity planning · disaster recovery
0 likes · 13 min read
Top Architect
Jul 14, 2020 · Databases

Understanding Alipay’s LDC Architecture, Unitization, and CAP Analysis

The article explains how Alipay achieves massive payment throughput during Double‑11 by using logical data centers (LDC), unit‑based system design, multi‑active disaster recovery, and CAP‑theorem analysis, highlighting the role of OceanBase and Paxos in ensuring consistency and availability.

CAP theorem · Distributed Systems · High TPS
0 likes · 37 min read
JD Retail Technology
Jun 5, 2020 · Operations

How JD Cloud Engineered a Seamless 618 Shopping Surge: Ops Strategies & Disaster Drills

This article details JD Cloud's comprehensive operational preparation for the 618 shopping festival, covering early resource procurement, hardware fault management, network and CDN scaling, extensive capacity‑testing, disaster‑recovery drills, and cross‑departmental coordination that together ensured stable service during massive traffic spikes.

Infrastructure · capacity planning · cloud operations
0 likes · 8 min read
ITPUB
Jun 1, 2020 · Databases

How Ant Financial Scales Payments: Data Patterns and Disaster‑Recovery Strategies

This article examines how Ant Financial redesigned its payment system’s data layer after retiring mainframes, detailing RTO/RPO goals, CAP trade‑offs, vertical and horizontal sharding, blacklist‑based accounting DR, failover mechanisms for transactions, and the role of OceanBase in achieving strong consistency and low‑latency recovery.

data modeling · database sharding · disaster recovery
0 likes · 17 min read
ITPUB
Apr 13, 2020 · Databases

How Ant Financial Scales to Hundreds of Thousands TPS with LDC, Unitization, and CAP Mastery

This article explains how Ant Financial’s LDC (Logical Data Center) architecture, unitized RZone/GZone/CZone design, OceanBase database, and CAP-aware strategies enable the payment platform to handle Double‑11 traffic peaks of over 540,000 transactions per second while ensuring high availability, disaster recovery, and eventual consistency.

CAP theorem · High TPS · LDC
0 likes · 37 min read
21CTO
Apr 6, 2020 · Operations

How Alipay Achieved Near‑Zero Downtime with Multi‑Datacenter Failover Architecture

This article explains the evolution of Alipay's high‑availability and disaster‑recovery architecture—from a simple single‑datacenter design to a multi‑datacenter, unit‑based system with failover and blue‑green deployment—highlighting the challenges, solutions, and operational benefits that enable continuous service during massive traffic spikes.

Alipay architecture · Blue‑Green deployment · Distributed Systems
0 likes · 17 min read
FunTester
Mar 30, 2020 · Operations

How Virtualization Transforms Software Testing: Benefits, Types, and Common Pitfalls

The article explains what virtualization is, outlines its main types, and shows how it enables efficient software testing by consolidating servers, improving disaster recovery, saving time, increasing availability, reducing complexity, and protecting data, while also noting potential driver, memory, and performance issues.

Software Testing · Virtualization · disaster recovery
0 likes · 7 min read
Top Architect
Mar 17, 2020 · Databases

Understanding Ant Financial’s LDC Architecture: Unitization, CAP Analysis, and High‑TPS Design

This article explains how Ant Financial’s massive Double‑11 payment traffic is handled through logical data centers (LDC), unit‑based architecture (RZone, GZone, CZone), traffic routing, disaster‑recovery strategies, and a CAP analysis that highlights the role of OceanBase’s Paxos‑based consensus in achieving high availability and eventual consistency.

CAP theorem · Distributed Systems · OceanBase
0 likes · 36 min read
Efficient Ops
Feb 4, 2020 · Databases

How Multi‑Active Database Architecture Is Redefining Bank Disaster Recovery

In this interview, a senior database expert from Huaxia Bank shares twelve years of experience and explains how moving from traditional replication to multi‑active, real‑time consistent data centers, combined with automation and mobile remote operations, is transforming banking database reliability and security.

Database operations · disaster recovery · high availability
0 likes · 9 min read
dbaplus Community
Feb 2, 2020 · Databases

JDHBase Multi‑Active Disaster Recovery: Replication, Auto‑Failover & Consistency

JDHBase, JD.com’s large‑scale KV store, powers billions of daily reads and writes across 7,000 nodes, and this article details its multi‑active, cross‑region architecture—including HBase replication fundamentals, Fox Manager routing, automatic failover policies, dynamic replication tuning, and serial replication to ensure strong consistency.

Database Architecture · HBase · Replication
0 likes · 15 min read
Top Architect
Jan 6, 2020 · Backend Development

Alipay’s LDC Architecture: High‑TPS Design, Unitization, and CAP Analysis

The article explains how Alipay’s Logical Data Center (LDC) architecture, with its RZone, GZone, and CZone unitization, combined with OceanBase’s Paxos‑based consensus, enables massive TPS growth, traffic diversion, and disaster‑recovery while navigating the CAP theorem constraints.

CAP theorem · Distributed Systems · High TPS
0 likes · 35 min read
Tencent Cloud Developer
Dec 11, 2019 · Frontend Development

Comprehensive Practice of WeChat Mini Program Performance Monitoring System

The article describes a full‑stack performance monitoring system for WeChat Mini Programs, as presented by Niu Tifa. It covers Mini Program architecture fundamentals, a monitoring architecture built on a JS SDK, Druid, and Elasticsearch, and practical applications such as load timing, error handling, and fallback strategies, supported by dashboards and alerts, with an emphasis on low request volume and non‑intrusive monitoring.

JS SDK · Performance Monitoring · System Architecture
0 likes · 13 min read
Programmer DD
Dec 8, 2019 · Operations

Can Your Money Survive a Bombed Alipay Server? Inside Data Center Redundancy

The article explores how Alipay’s financial data is protected through multi‑site data centers, hot and cold backups, and disaster‑recovery mechanisms, explaining why destroying a single server—or even multiple facilities—won’t instantly erase users’ funds, and outlining the lengths required to truly cripple the system.

Backup · Data center · disaster recovery
0 likes · 10 min read
MaGe Linux Operations
Dec 5, 2019 · Operations

When Alipay Crashed: Lessons on High Availability and Disaster Recovery

On December 5th, Alipay experienced a brief outage that sent users into panic, prompting a humorous recount of personal losses, meme images, and a reminder of the critical importance of high‑availability architecture and disaster‑recovery planning for large‑scale financial services.

Alipay outage · Financial Services · Operations
0 likes · 3 min read
System Architect Go
Nov 17, 2019 · Databases

Handling Single Point Failures and Disaster Recovery in InfluxDB

To mitigate the inherent single‑point‑failure risk of the open‑source InfluxDB community edition, the article proposes deploying multiple InfluxDB instances with concurrent client writes, tracking failed writes, temporarily storing them, and using custom workers to replay data, while addressing timeout, data consistency, and storage considerations.

Data Consistency · InfluxDB · Time Series Database
0 likes · 3 min read
Architecture Digest
Nov 16, 2019 · Operations

What Happens If Alipay’s Data Centers Are Physically Destroyed? A Deep Dive into Redundancy and Disaster Recovery

The article examines how Alipay’s financial data would survive a physical destruction of its servers by explaining multi‑site data center architectures, hot and cold backups, power redundancy, fire‑suppression systems, and the role of partner banks in data recovery, highlighting the extensive resilience measures in modern financial infrastructures.

Alipay · Data center · Operations
0 likes · 8 min read
Big Data Technology & Architecture
Nov 3, 2019 · Backend Development

RocketMQ Practices and Disaster‑Recovery Architecture at ByteDance

This article summarizes Shen Hui’s presentation on how ByteDance adopted RocketMQ in a massive micro‑service environment, detailing the business background, reasons for choosing RocketMQ, the proxy‑based deployment, encountered challenges, and the multi‑data‑center disaster‑recovery solutions implemented.

Message Queue · Proxy Architecture · RocketMQ
0 likes · 13 min read
dbaplus Community
Oct 29, 2019 · Cloud Computing

How to Achieve AWS Cross‑Region Disaster Recovery with CloudEndure

This guide explains CloudEndure’s features and walks through a step‑by‑step example of configuring AWS cross‑region disaster recovery, covering initial project setup, data replication, test and recovery mode switching, and the failback process, while highlighting networking and security considerations.

AWS · CloudEndure · Replication
0 likes · 16 min read
Big Data Technology & Architecture
Oct 21, 2019 · Databases

High‑Availability Practices of Alibaba HBase: Large Clusters, MTTF/MTTR, Disaster Recovery, and Extreme Experience

This article reviews Alibaba HBase's evolution toward high availability, covering large‑cluster architecture, reliability metrics (MTTF/MTTR), disaster‑recovery strategies such as data replication and traffic switching, performance optimizations for extreme latency requirements, and lessons learned for building resilient distributed database services.

Distributed Systems · HBase · Performance Optimization
0 likes · 20 min read
Architects' Tech Alliance
Oct 2, 2019 · Operations

Understanding Disaster Tolerance, Fault Tolerance, and Disaster Recovery: Concepts, Differences, and Implementation Strategies

This article explains the definitions of disaster tolerance, fault tolerance, and disaster recovery, compares their purposes, discusses backup versus disaster‑tolerance solutions, outlines key metrics such as RTO and RPO, and presents common architectural and investment considerations for building resilient enterprise systems.

Backup · IT Operations · RPO
0 likes · 8 min read
dbaplus Community
Sep 25, 2019 · Databases

Master MySQL Backup & Disaster Recovery: Strategies, Tools, and Automation

Effective MySQL backup and disaster recovery are essential for protecting critical business data; this guide explains backup types, tools like Percona XtraBackup, scheduling, local and remote strategies, incremental processes, preparation and restoration steps, and introduces a platform for automated backup management.

Automation · Backup · Database Management
0 likes · 26 min read
Architects' Tech Alliance
Jul 31, 2019 · Databases

Overview of Five Common Data Replication Technologies

This article introduces the global data replication market, explains synchronous and asynchronous replication, and details five typical replication techniques—host‑based, application/middleware‑based, database‑based, storage‑gateway‑based, and storage‑media‑based—highlighting their principles, advantages, and trade‑offs for disaster‑recovery planning.

Asynchronous Replication · data replication · database
0 likes · 11 min read
Alibaba Cloud Developer
Jun 18, 2019 · Operations

Why Designing for Failure Is the Key to Resilient Systems

The article explains how anticipating and engineering for diverse failure scenarios—from hardware faults and software bugs to traffic spikes and external attacks—can dramatically improve system reliability, reduce downtime, and protect business continuity in modern distributed and cloud environments.

disaster recovery · failure design · monitoring
0 likes · 12 min read
Tencent Cloud Developer
May 24, 2019 · Cloud Computing

How Tencent Cloud Elasticsearch Enables Multi‑AZ Disaster Recovery

Tencent Cloud Elasticsearch now supports cross‑availability‑zone deployment, requiring even‑numbered data nodes, dedicated master nodes, and replica settings to ensure continuous service when a zone fails, with detailed steps for quick setup and region limitations explained.

Elasticsearch · Multi‑AZ · Tencent Cloud
0 likes · 6 min read
Efficient Ops
Apr 1, 2019 · Operations

How Dual‑Mode IT Is Redefining Disaster Recovery in Banking

The article examines how the rise of financial technology and dual‑mode IT is reshaping banking disaster‑recovery strategies, comparing active‑active and two‑site‑three‑center architectures, and proposing an integrated solution to improve resource utilization, reduce risk, and ensure continuous service.

Banking · business continuity · cloud
0 likes · 9 min read
Efficient Ops
Mar 23, 2019 · Operations

How to Build a Bank Ops SWAT Team for 5‑Minute Incident Recovery

This article explains how a bank can create a specialized Operations SWAT team, define its role, adopt seven essential “weapons” such as layered monitoring, intelligent alerts, communication protocols, automation, and disaster‑recovery tactics, and continuously train the team to meet strict five‑minute recovery targets.

Automation · SWAT team · bank operations
0 likes · 21 min read
Efficient Ops
Mar 18, 2019 · Operations

How to Build a Bank Ops SWAT Team for Rapid Incident Recovery

This article explains how a bank can create a specialized SWAT‑style operations team, define its roles, adopt seven essential "weapons" such as monitoring and intelligent alerts, and apply ten tactical processes—from communication to automation—to meet strict five‑minute recovery and regulatory requirements.

Automation · SWAT team · bank operations
0 likes · 21 min read
Efficient Ops
Mar 17, 2019 · Operations

Why Cold-Standby Disaster Recovery Fails and How Active‑Active Architecture Wins

Modern cloud outages reveal that cold‑standby or simple multi‑cloud promises often provide only psychological comfort; achieving true high availability requires active‑active designs with local traffic handling, data partitioning, and low‑latency synchronization, while balancing cost, complexity, and physical distance constraints.

Active-Active · Latency · data synchronization
0 likes · 10 min read