Tagged articles

257 articles

Jan 5, 2022 · Operations

Why Contingency Planning Beats System Optimization: Lessons from Xi'an One‑Code Collapse

The recent collapse of Xi'an’s One‑Code health system highlighted that system failures often stem from blocked pipelines rather than database overload, and the article argues that robust manual contingency plans—such as alternative mini‑programs or simple backup apps—are essential to prevent small glitches from becoming crises.

IT infrastructure · contingency planning · disaster recovery

0 likes · 9 min read

Architects' Tech Alliance

Jan 1, 2022 · Operations

Disaster Recovery (DR) Fundamentals: Definitions, Roles, Metrics, and Implementation

This article provides a comprehensive overview of disaster recovery, covering its definition, the distinction between backup and DR, their respective roles, key metrics such as RPO and RTO, various replication technologies, and practical implementation methods across storage, network, and host layers.

Backup · RPO · RTO

0 likes · 20 min read

Tencent Database Technology

Dec 31, 2021 · Databases

Practices and Exploration of Disaster Recovery in Cloud‑Native Database TDSQL‑C (formerly CynosDB)

This article examines the architecture differences between traditional MySQL and the cloud‑native TDSQL‑C database, outlines MySQL disaster‑recovery deployment models, and details TDSQL‑C’s multi‑dimensional disaster‑recovery system, including its agent‑scheduler design, cross‑AZ switching challenges, and mitigation strategies.

TDSQL-C · cloud-native database · disaster recovery

0 likes · 10 min read

Architect

Dec 31, 2021 · Operations

Understanding Distributed System High Availability: From Single‑Node to Multi‑Active Architecture

This article explains the principles, evolution, and implementation details of high‑availability architectures—from basic single‑node setups to multi‑active, cross‑region deployments—covering redundancy, disaster recovery, data synchronization, routing strategies, and the challenges of achieving true geo‑distributed active‑active systems.

Active-Active · Distributed Systems · System Architecture

0 likes · 30 min read

Tencent Architect

Dec 30, 2021 · Databases

Practices and Exploration of Disaster Recovery in Tencent Cloud‑Native Database TDSQL‑C (formerly CynosDB)

This article examines the architecture differences between cloud‑native TDSQL‑C and traditional MySQL, outlines TDSQL‑C’s elastic, serverless, low‑latency features, compares MySQL disaster‑recovery models, and details the multi‑dimensional disaster‑recovery system and its cross‑AZ/Region challenges and solutions.

TDSQL-C · cloud-native database · disaster recovery

0 likes · 9 min read

Cloud Native Technology Community

Nov 18, 2021 · Cloud Computing

Multi-Cloud Strategy: Concepts, Benefits, Use Cases, Challenges, and Best Practices

This article explains the multi‑cloud concept, how it works, its advantages such as disaster recovery, cost optimization and avoiding vendor lock‑in, the differences from hybrid cloud, common use cases, implementation challenges, and practical best‑practice guidelines for planning and managing a multi‑cloud environment.

cloud computing · cloud strategy · data sovereignty

0 likes · 15 min read

NiuNiu MaTe

Nov 17, 2021 · Databases

Mastering MySQL Disaster Recovery: Replication Modes and Strategies

This article explains MySQL disaster‑recovery techniques, covering cold and hot backups, same‑city versus remote setups, master‑slave topologies, async, semi‑sync and full‑sync replication, the MAR strong‑sync approach, and practical recommendations for building resilient two‑city three‑center architectures.

Replication · database · disaster recovery

0 likes · 10 min read

High Availability Architecture

Nov 5, 2021 · Cloud Native

Multi-Cloud Active‑Active Architecture: Design, Benefits, and Challenges

The article examines why multi‑cloud active‑active (multi‑active) deployments are essential for high availability, outlines common disaster‑recovery patterns such as primary‑backup and active‑active, details the technical workflow of traffic routing, business and storage layers, and discusses the practical advantages and drawbacks of this approach.

Active-Active · architecture · cloud-native

0 likes · 10 min read

Java High-Performance Architecture

Nov 4, 2021 · Backend Development

How Alipay Handles 540k TPS on Double 11: Inside the LDC Architecture

This article explains how Ant Financial’s Alipay scales to hundreds of thousands of transactions per second during Double 11 by using logical data centers (LDC), unitized architecture, CAP theorem analysis, OceanBase database, and multi‑region disaster‑recovery strategies.

CAP theorem · High TPS · LDC

0 likes · 37 min read

Efficient Ops

Oct 28, 2021 · Operations

Why Geo‑Active‑Active Architecture Is the Key to Ultra‑High System Availability

This article explains the principles behind geo‑active‑active (multi‑active) architectures, covering system availability metrics, redundancy strategies from single‑node backups to same‑city and cross‑city active‑active deployments, data‑sync challenges, routing and sharding techniques, and how these designs dramatically improve reliability and scalability.

Distributed Systems · System Design · disaster recovery

0 likes · 37 min read

Full-Stack Internet Architecture

Oct 20, 2021 · Operations

Understanding Geo-Distributed Active-Active Architecture: Principles, Risks, and Implementation Strategies

This article explains the concept of geo-distributed active‑active (multi‑active) systems, covering architectural principles, availability metrics, redundancy techniques such as master‑slave replication, cold and hot disaster recovery, same‑city and cross‑city active‑active setups, data synchronization challenges, and practical routing and sharding methods to achieve high availability and scalability.

Active-Active · System Architecture · disaster recovery

0 likes · 29 min read

21CTO

Sep 19, 2021 · Databases

From Two‑Site Three‑Center to Three‑Site Five‑Center: NetBank’s Database Architecture Evolution

NetBank’s database deployment has evolved from a simple two‑site three‑center disaster‑recovery model to a sophisticated three‑site five‑center architecture, incorporating distributed databases, multi‑tenant isolation, transaction consistency, latency optimization, and containerized deployment to achieve high availability, scalability, and cost efficiency.

Database Architecture · Performance Optimization · containerization

0 likes · 19 min read

ITPUB

Sep 17, 2021 · Databases

How NetBank Scaled Its Database: From Two‑Site Three‑Center to Three‑Site Five‑Center Architecture

This article details NetBank's evolution of database deployment—from early distributed setups to a unitized, cloud‑native architecture—covering disaster‑recovery upgrades, distributed database design, multi‑tenant strategies, containerized migration, and the performance and operational impacts of moving to a three‑site five‑center model.

containerization · disaster recovery · distributed databases

0 likes · 20 min read

Beike Product & Technology

Sep 17, 2021 · Frontend Development

Flutter for Web: Architecture, Platform Issues, and Disaster‑Recovery Solutions at Beike

This article describes how Beike's Flutter team leveraged Flutter for Web to enable rapid online issue mitigation, detailing the compilation pipeline, platform‑specific challenges such as operating‑system detection and dart:io limitations, and the multi‑module disaster‑recovery architecture they built.

Platform Channel · Web · aop

0 likes · 13 min read

Architects' Tech Alliance

Aug 26, 2021 · Databases

Mastering Financial-Grade Database Disaster Recovery: Strategies and Techniques

This article provides a comprehensive technical overview of financial‑grade database disaster recovery, covering backup and recovery methods, MySQL replication options, automatic failover architectures, distributed transaction protection, and application‑level stress mitigation techniques.

Backup · Distributed Transactions · Financial

0 likes · 18 min read

Architects' Tech Alliance

Aug 18, 2021 · Operations

Understanding Disaster Tolerance, Fault Tolerance, and Disaster Recovery: A Practical Guide

This article explains the concepts of disaster tolerance, fault tolerance, and disaster recovery, compares them with backup strategies, outlines key metrics such as RTO and RPO, and presents common architectures and planning considerations for building resilient enterprise systems.

Backup · RPO · RTO

0 likes · 9 min read

Architects' Tech Alliance

Aug 15, 2021 · Operations

Enterprise Multi‑Data Center Evolution: From Two‑Region Three‑Center to Distributed Active/Active Architecture

The article explains how enterprises are moving from traditional primary‑backup and two‑region three‑center data‑center models toward distributed active/active data‑center architectures to achieve continuous 24/7 operations, higher resource utilization, and fault‑transparent services, while outlining the technical and organizational challenges involved.

Active-Active · IT Operations · disaster recovery

0 likes · 10 min read

Xianyu Technology

Aug 12, 2021 · Frontend Development

Automatic Front-end Disaster Recovery Solution Overview

The automatic front‑end disaster‑recovery solution packages an npm tool and a visual backend that generate on‑demand API fallback data, use a whitelist and static parameters to target backups, and sync results to developers. After deployment, coverage rose from ~30% to ~70%, and 80% of backups were automated.

Automation · data backup · disaster recovery

0 likes · 6 min read

Qingyun Technology Community

Aug 10, 2021 · Cloud Native

New Oriental’s Blueprint for Stateful Services in Kubernetes: Custom Operators & XLSS

This article details New Oriental's approach to building stateful services on Kubernetes, covering the challenges of native storage, the use of custom Operators, the design of the XLSS local storage solution, backup and disaster‑recovery workflows, and a multi‑phase roadmap for large‑scale stateful middleware deployment.

Backup · Cloud Native Storage · Custom Operator

0 likes · 16 min read

Open Source Linux

Jul 26, 2021 · Operations

How Floods Tested Zhengzhou’s Telecom Backbone and What It Reveals About Network Resilience

Severe flooding in Zhengzhou crippled core telecom facilities, prompting emergency repairs, backup HLR deployment, and temporary authentication shutdowns, while highlighting the critical role of network resilience and disaster‑recovery strategies for maintaining communication services during natural disasters.

Backup · HLR · Network Resilience

0 likes · 7 min read

Aotu Lab

Jul 15, 2021 · Frontend Development

How JD’s Frontend Team Delivered 16 High‑Traffic 618 Event Halls in Record Time

This article details how JD's front‑end team tackled the 2021 618 shopping festival by using Taro3 for cross‑platform H5 and mini‑program development, implementing disaster‑recovery services, intelligent UI personalization, pull‑to‑refresh, and efficient collaboration practices to launch sixteen high‑performance event halls quickly and reliably.

Collaboration · cross‑platform · disaster recovery

0 likes · 14 min read

IT Architects Alliance

Jul 8, 2021 · Operations

Mastering High Availability: From Cold Backup to Multi‑Region Active‑Active

This article analyzes various high‑availability strategies for stateful backend services—covering cold backup, dual‑machine hot standby, same‑city active‑active, remote active‑active, and multi‑region active‑active architectures—detailing their benefits, limitations, and practical implementation considerations.

Active-Active · System Design · backend operations

0 likes · 14 min read

DataFunTalk

Jul 8, 2021 · Big Data

Design and Evolution of ByteDance's Multi‑Datacenter HDFS Architecture

This article explains how ByteDance extended the Apache HDFS architecture with a multi‑datacenter design, introducing components such as DanceNN, NNProxy, and BookKeeper to achieve scalable storage, cross‑datacenter data placement, and rack‑level disaster recovery for petabyte‑scale workloads.

ByteDance · HDFS · big data storage

0 likes · 13 min read

IT Architects Alliance

Jun 28, 2021 · Industry Insights

WeChat Moments' Billion-Visit Architecture: Disaster Recovery & Flexible Scaling

The article analyzes WeChat Moments' massive image and video services, detailing its OC/IDC architecture, holiday traffic challenges, software and hardware safeguards, disaster‑recovery mechanisms, retry policies, and a series of flexible strategies—including compression format changes, bitrate reduction, buffer pools, and timeline throttling—to sustain billions of daily accesses.

Flexible Scaling · Video Bitrate · WeChat Moments

0 likes · 13 min read

Efficient Ops

Jun 16, 2021 · Databases

Mastering ElasticSearch Data Migration and Disaster Recovery: Practical Strategies

This article presents a comprehensive guide to synchronizing heterogeneous data sources with ElasticSearch, migrating clusters across environments, and implementing robust disaster‑recovery solutions for both intra‑city and inter‑city high‑availability scenarios.

Big Data · Cluster Sync · Data Migration

0 likes · 16 min read

Code Ape Tech Column

Jun 9, 2021 · Operations

Understanding Disaster Recovery vs. Backup: Key Differences and Best Practices

This article explains what disaster recovery is, distinguishes it from backup, outlines classification types, compares their core differences, and details four DR maturity levels with their advantages and drawbacks to help organizations build resilient data protection strategies.

Backup · Data Protection · Operations

0 likes · 10 min read

New Oriental Technology

Jun 4, 2021 · Cloud Native

Overview of XDF Local Storage Service (xlss) Architecture, Components, and Disaster Recovery Workflow

The article introduces xlss, a high‑performance, highly‑available Kubernetes local storage solution, details its core components, application scenarios, custom scheduler design, backup and recovery processes, and provides code snippets and CRD examples for implementing resilient stateful workloads.

Cloud Native · Kubernetes · Scheduler

0 likes · 14 min read

ITFLY8 Architecture Home

May 22, 2021 · Operations

How Active‑Active Data Centers Boost Resilience and Resource Efficiency

The article explains hot standby, cold standby, and active‑active (dual‑active) data center architectures, compares their advantages and drawbacks, outlines deployment challenges, and highlights the role of cloud computing and automation in achieving high availability and optimal resource utilization.

Active-Active · cloud computing · disaster recovery

0 likes · 12 min read

Architects' Tech Alliance

May 14, 2021 · Industry Insights

Why Distributed Active/Active Data Centers Are the Future of Enterprise IT

The article examines how enterprises are moving from traditional primary‑backup and two‑site‑three‑center architectures toward distributed active/active data centers, outlining the concepts of distribution and multi‑activity, the technical challenges involved, and the operational benefits of higher availability and resource efficiency.

Active-Active · IT Operations · cloud computing

0 likes · 9 min read

Xianyu Technology

May 13, 2021 · Frontend Development

Front-End Disaster Recovery for Page Stability

To prevent page failures and white‑screen errors, the team built a front‑end SDK that fetches fallback data from OSS + CDN, offers configurable black/white‑list rules, lightweight validation, and a visual backend, cutting error rates from over 8% to 0.55% and dramatically improving interface stability.

CDN · OSS · SDK

0 likes · 9 min read

Volcano Engine Developer Services

May 10, 2021 · Databases

How Distributed Databases Powered Douyin’s Spring Festival Red‑Envelope Event

In a May 15 meetup, ByteDance engineer Ma Haoxiang discussed his background, the culture at ByteDance, recommended resources, and detailed how distributed databases differ from traditional relational databases, highlighting their massive capacity, low cost, high performance, and the specific performance and disaster‑recovery challenges faced during Douyin’s Spring Festival red‑envelope activity.

Douyin · Scalability · Spring Festival

0 likes · 7 min read

Architects' Tech Alliance

May 6, 2021 · Operations

Key Technical Considerations for Dual‑Active Data Center Architecture

The article explains dual‑active data‑center disaster‑recovery architecture, covering SAN vs NAS storage options, distance, network, performance, true active‑active versus active‑passive designs, multipathing considerations, and provides a downloadable comprehensive guide to implementation for practitioners.

Dual-Active · disaster recovery · storage

0 likes · 7 min read

Java Interview Crash Guide

Apr 30, 2021 · Operations

How Do Large Internet Companies Achieve Cross‑Region Multi‑Active High Availability?

The article explains why large internet firms adopt cross‑region multi‑active architectures for high availability, compares cold backup, hot standby, same‑city active‑active, and cross‑region active‑active solutions, discusses their trade‑offs, and presents practical design patterns and questions for implementing such systems.

Distributed Systems · Operations · disaster recovery

0 likes · 15 min read

dbaplus Community

Apr 22, 2021 · Operations

Achieving True Multi‑Region Active‑Active: Bidirectional Sync Across Three Data Centers

This article explains how to implement a true multi‑region active‑active architecture by enabling bidirectional data synchronization among three or more data centers, covering CAP trade‑offs, distributed ID generation algorithms, center closure strategies, final consistency mechanisms, and a disaster‑recovery design.

CAP theorem · Distributed Systems · data synchronization

0 likes · 16 min read

Architects' Tech Alliance

Apr 18, 2021 · Operations

Active‑Active Dual Data Center Architecture: Requirements and Technical Overview

The article explains the active‑active dual‑data‑center solution as the preferred disaster‑recovery option, detailing its storage choices, distance, network, performance, true versus pseudo active‑active modes, and multipathing requirements, and offers a downloadable technical guide.

Dual-Active · disaster recovery · high availability

0 likes · 6 min read

Programmer DD

Apr 2, 2021 · Operations

Why a Data Center Fire Can Sink Your Startup: Disaster Recovery Lessons

The article uses the OVH data‑center fire as a stark reminder that startups must design robust data disaster‑recovery strategies, explaining why backups, off‑site storage, and proper architectural planning are essential to prevent catastrophic data loss and potential business collapse.

Operations · System Architecture · data backup

0 likes · 8 min read

Programmer DD

Mar 27, 2021 · Operations

Disaster Recovery vs Backup: Key Differences, Types, and Levels Explained

This article explains what disaster recovery is, how it differs from backup, outlines the various classifications of disaster recovery and backup, and details the six practical differences and four backup levels that organizations should consider to ensure business continuity and data protection.

Backup · Data Protection · IT Operations

0 likes · 9 min read

Architecture Digest

Mar 25, 2021 · Big Data

Uber's Multi-Region Kafka Architecture and Disaster Recovery

This article explains how Uber built a multi‑region Kafka infrastructure with disaster‑recovery capabilities, detailing its replication topology, active/active and active/passive consumption modes, offset‑management service, and the challenges of ensuring reliable, low‑latency data streaming across regions.

Data Streaming · Kafka · Offset Management

0 likes · 9 min read

21CTO

Mar 24, 2021 · Backend Development

Mastering Backend High Availability: From Cold Backups to Multi‑Active Deployments

This article examines stateful backend services and compares various high‑availability strategies—including cold backup, dual‑machine hot standby, same‑city and cross‑city active‑active, and multi‑active architectures—highlighting their benefits, drawbacks, and practical implementation considerations.

Backend Architecture · cold backup · disaster recovery

0 likes · 14 min read

ITPUB

Mar 22, 2021 · Operations

How to Achieve High Availability for Stateful Backend Services: From Cold Backup to Multi‑Active

This article explains the evolution of high‑availability strategies for stateful backend services, comparing cold backup, dual‑machine hot standby, same‑city active‑active, cross‑city active‑active and multi‑active solutions, and discusses their trade‑offs, implementation details, and practical considerations.

System Design · active standby · cold backup

0 likes · 15 min read

Alibaba Cloud Developer

Mar 14, 2021 · Cloud Computing

Which MySQL Tables Need Cross‑Cloud Sync? A Disaster Recovery Guide

This article explains how to identify which MySQL tables in an Alibaba Cloud RDS environment should be synchronized across clouds and which can be excluded, covering key concepts, design and operational practices, a real‑world failure case, and recommended mitigation and improvement steps for application‑level disaster recovery.

DTS · RDS · data synchronization

0 likes · 20 min read

MaGe Linux Operations

Mar 1, 2021 · Backend Development

Mastering High Availability: From Cold Backup to Multi‑Active Architecture

This article examines high‑availability strategies for stateful backend services, covering cold backup, dual‑machine hot standby, same‑city active‑active, and remote multi‑active solutions, while discussing their benefits, trade‑offs, and architectural patterns for resilient distributed systems.

Backend Architecture · active standby · cold backup

0 likes · 14 min read

Architects' Tech Alliance

Feb 28, 2021 · Cloud Computing

Disaster Recovery Technologies: SDS, Ceph RBD Mirror, Containers, Hyper‑Converged Infrastructure, Cloud & Edge Computing, and Blockchain

This article surveys modern disaster‑recovery techniques, explaining how software‑defined storage, Ceph RBD Mirror, container platforms, hyper‑converged infrastructure, cloud and edge computing, and blockchain can be combined to achieve seamless, fault‑tolerant data protection across on‑premise and cloud environments.

Blockchain · Ceph · Edge Computing

0 likes · 14 min read

Programmer DD

Feb 20, 2021 · Big Data

How Uber Built a Multi‑Region Kafka Architecture for Disaster Recovery

Uber operates the world’s largest Kafka cluster, handling trillions of messages daily, and has engineered a multi‑region deployment with active/active and active/passive consumption modes, offset management, and uReplicator to ensure high‑availability and seamless disaster recovery across data centers.

Active-Active · Active-Passive · Kafka

0 likes · 10 min read

Architects' Tech Alliance

Feb 17, 2021 · Databases

How Alipay Handles 540k TPS: Inside the LDC Architecture, Unitization and CAP Analysis

This article dissects Alipay's massive Double‑11 payment surge, explaining how its Logical Data Center (LDC) and unit‑based architecture—RZone, GZone, and CZone—scale to hundreds of thousands of transactions per second, manage traffic routing, implement disaster‑recovery, and navigate the CAP theorem using OceanBase and Paxos.

CAP theorem · Distributed Systems · LDC architecture

0 likes · 39 min read

AntTech

Feb 5, 2021 · Databases

OceanBase 2020 Review: Record‑Breaking Performance, Independent Operation, Ecosystem Expansion, and Advanced Disaster‑Recovery

In 2020, OceanBase set a world‑record TPC‑C benchmark result of 707 million tpmC, spun off as an independent company, attracted dozens of marquee customers, built a four‑layer ecosystem, delivered ultra‑high performance for enterprises, and introduced Paxos‑based disaster recovery that guarantees RPO = 0 and minute‑level RTO.

Ecosystem · OceanBase · Paxos

0 likes · 13 min read

Alibaba Cloud Developer

Feb 3, 2021 · Operations

How to Build True Multi‑Region Active‑Active Architecture with Bidirectional Sync

This article explains why true multi‑region active‑active requires data to be bidirectionally synchronized across three or more centers, and details a multi‑center disaster‑recovery architecture, distributed ID generation algorithms, CAP considerations, and techniques for achieving eventual consistency.

Distributed Systems · data synchronization · disaster recovery

0 likes · 14 min read

Alibaba Cloud Native

Dec 21, 2020 · Operations

How to Build Multi‑Site High Availability with AHAS‑MSHA: Real‑World E‑Commerce Cases

This article explains the challenges of achieving high availability in unreliable environments, introduces disaster‑tolerance concepts and RPO/RTO metrics, describes Alibaba Cloud's AHAS‑MSHA multi‑site solution and its key features, and walks through two e‑commerce case studies that demonstrate implementation steps, fault‑injection drills, and recovery verification.

AHAS · MSHA · Multi‑Site

0 likes · 14 min read

FunTester

Dec 12, 2020 · Operations

Why Redundancy Is the Key to Effective Disaster Recovery in IT Systems

The article explains that disaster recovery for information systems relies on redundancy across hardware, energy, and data, classifies natural, human, and technical disasters, defines critical metrics such as RTO and RPO, and outlines the technologies, architectures, and maturity levels needed to ensure business continuity.

RPO · RTO · business continuity

0 likes · 29 min read

Architects' Tech Alliance

Nov 30, 2020 · Industry Insights

Cut Storage Costs and Boost Disaster Recovery with Deduplication and Encryption

Data deduplication eliminates redundant data blocks to lower storage and bandwidth costs, while source‑ and transmission‑level encryption safeguards data in transit and at rest; the article also compares hardware vs software deduplication, various storage architectures (DAS, SAN, NAS, object and distributed storage) and their trade‑offs.

Backup · NAS · SAN

0 likes · 15 min read

Programmer DD

Nov 30, 2020 · Operations

Mastering High Availability: From Cold Backup to Multi‑Active Deployments

This article explains how backend services can be classified as stateless or stateful and explores a range of high‑availability strategies—from simple cold backups and active‑standby setups to same‑city, cross‑city, and multi‑active architectures—highlighting their trade‑offs and implementation considerations.

backend services · disaster recovery · high availability

0 likes · 14 min read

IT Architects Alliance

Nov 22, 2020 · Operations

How Do Big Internet Companies Achieve Cross‑Region Multi‑Active HA?

This article analyzes the evolution of high‑availability deployment—from cold backup to cross‑region multi‑active—explaining the trade‑offs of each solution, the challenges of stateful services, and real‑world architectures used by companies like Alibaba and Eleme.

architecture · cross-region · disaster recovery

0 likes · 15 min read

Architects' Tech Alliance

Nov 22, 2020 · Databases

Database High Availability: HADR, HACMP, Data Replication, Storage DR, and DPF Solutions

This article provides a comprehensive overview of database high‑availability techniques—including DB2 HADR, HACMP clustering, SQL and Q replication, storage‑layer disaster recovery, and DPF considerations—explaining their features, suitable scenarios, and how they can be combined to achieve robust end‑to‑end resilience.

DB2 · HADR · Replication

0 likes · 11 min read

dbaplus Community

Nov 19, 2020 · Big Data

How Banks Can Tame Petabytes of Unstructured Data: Architecture and Best Practices

This article presents a comprehensive design and deployment plan for a bank's unstructured data service platform, covering data growth challenges, lifecycle management, three‑tier storage architecture, Elasticsearch indexing, fault‑tolerant disaster recovery, monitoring, and future development directions.

Elasticsearch · disaster recovery · storage architecture

0 likes · 19 min read

Java Architect Essentials

Nov 8, 2020 · Operations

What Happens If You Destroy All of Alipay’s Storage Servers? A Deep Dive into Data Center Architecture and Disaster Recovery

The article explores the consequences of destroying Alipay’s storage servers, detailing typical financial data center architectures, backup strategies, power redundancy, fire suppression systems, and the practical challenges of crippling such facilities, while highlighting regulatory and physical security measures.

Backup · Data center · Fire Suppression

0 likes · 8 min read

IT Architects Alliance

Nov 1, 2020 · Industry Insights

What Are the Five Core Data Replication Techniques for Disaster Recovery?

This article breaks down the five major data replication approaches—application‑level, host‑level, database‑level, storage‑gateway, and storage‑media—detailing their principles, advantages, drawbacks, and typical use cases to help professionals design effective disaster‑recovery solutions.

Backup · Replication Techniques · data replication

0 likes · 12 min read

Suning Technology

Oct 30, 2020 · Operations

Designing Suning’s Multi‑Data‑Center Active‑Active Architecture for Scalable E‑Commerce

Suning built a multi‑data‑center active‑active solution that combines primary‑backup, same‑city active‑active, and full multi‑active modes, defines top‑level design goals, values and principles, and implements a comprehensive architecture, routing, high‑availability, hybrid‑cloud and disaster‑recovery strategy to support massive e‑commerce growth.

Multi-Data Center · cloud architecture · disaster recovery

0 likes · 22 min read

Aikesheng Open Source Community

Oct 30, 2020 · Databases

Configuring MySQL MGR with Asynchronous Replication Automatic Failover for Multi‑Site Disaster Recovery

This article explains how MySQL Group Replication (MGR) can provide zero‑RPO high‑availability within a city‑scale data center, why it needs asynchronous replication for WAN‑scale disaster recovery, and walks through a step‑by‑step setup—including code examples—for automatic failover of asynchronous replication channels.

Asynchronous Replication · MGR · database high availability

0 likes · 6 min read

Programmer DD

Oct 23, 2020 · Databases

When a Mistyped Function Wiped a Production Database – Lessons Learned

A Keepthescore founder accidentally ran a local‑only database reset function on production, causing thousands of scores to vanish, but daily DigitalOcean backups enabled a rapid restore, illustrating the perils of unsafe code and the vital role of reliable backups.

Backup · Python · disaster recovery

0 likes · 4 min read

vivo Internet Technology

Sep 10, 2020 · Operations

Multi-Active High Availability Architecture: Scenarios, Solutions, and Evaluation

Multi‑active high‑availability architectures—ranging from same‑city dual‑active and two‑site three‑center setups to fully remote multi‑active deployments—provide continuous 24/7 service by replicating data across sites, but introduce latency, consistency, routing, and cost complexities that require careful unit‑based design, synchronized storage, and sophisticated traffic management.

System Architecture · data synchronization · disaster recovery

0 likes · 17 min read

Dada Group Technology

Sep 9, 2020 · Cloud Computing

Dada Dual-Cloud Active-Active Disaster Recovery: Architecture, Practices, and Lessons Learned

This article details Dada's dual‑cloud active‑active disaster‑recovery implementation, explaining high availability versus disaster recovery, describing the first‑phase architecture and challenges, and outlining the second‑phase enhancements such as multi‑data‑center Consul, bidirectional database replication, precise load‑balancing, tool adaptations, capacity elasticity, and future plans.

Active-Active · Consul · Database Replication

0 likes · 13 min read

Efficient Ops

Aug 11, 2020 · Operations

How Multi‑Cloud Disaster Recovery Boosts Site Availability: Lessons from Real‑World DR Drills

This article shares a detailed case study of building multi‑cloud site disaster‑recovery and fault‑drill practices at Kaixin Network, covering high‑availability concepts, architectural redesign, pain points, automated one‑click switching, and future self‑healing with chaos engineering to improve reliability.

Operations · disaster recovery · fault drills

0 likes · 15 min read

IT Architects Alliance

Aug 6, 2020 · Operations

Eight Essential Steps for Successful Disaster Recovery Drills

This guide outlines eight practical steps—including defining scope, forming a planning team, setting clear objectives, designing realistic scenarios, creating evaluation checklists, assigning roles, conducting pre‑drill briefings, and performing post‑drill reviews—to help organizations execute effective, repeatable disaster recovery exercises that strengthen business continuity.

Operations · Planning · best practices

0 likes · 9 min read

Programmer DD

Aug 1, 2020 · Databases

Inside Ant Financial’s LDC Architecture: Scaling Double‑11 Payments with OceanBase and CAP Theory

This article explains how Ant Financial’s logical data center (LDC) and unitized architecture, combined with OceanBase’s Paxos‑based consensus, enable the massive TPS growth for Double‑11 payments while addressing sharding, CAP trade‑offs, traffic diversion, and multi‑site disaster recovery.

Ant Financial · CAP theorem · Distributed Systems

0 likes · 37 min read

Full-Stack Internet Architecture

Jul 22, 2020 · Operations

Building a Comprehensive High‑Availability System: Disaster Recovery, Capacity Planning, Online Protection, and Fault Drills

This article explains how to construct a truly high‑availability architecture for modern distributed, cloud‑native services by covering disaster‑recovery principles, capacity planning with realistic load testing, online traffic protection, and systematic fault‑drill practices.

Fault Injection · capacity planning · disaster recovery

0 likes · 13 min read

Top Architect

Jul 14, 2020 · Databases

Understanding Alipay’s LDC Architecture, Unitization, and CAP Analysis

The article explains how Alipay achieves massive payment throughput during Double‑11 by using logical data centers (LDC), unit‑based system design, multi‑active disaster recovery, and CAP‑theorem analysis, highlighting the role of OceanBase and Paxos in ensuring consistency and availability.

CAP theorem · Distributed Systems · High TPS

0 likes · 37 min read

JD Retail Technology

Jun 16, 2020 · Backend Development

Technical Strategies for Scaling and Optimizing JD.com Advertising Systems During the 618 Promotion

The article details JD.com's advertising division's comprehensive backend engineering efforts—including traffic handling, data pipeline upgrades, memory optimization, and disaster‑recovery designs—to ensure system stability and performance during the high‑traffic 618 sales event.

Advertising · Backend · System optimization

0 likes · 9 min read

JD Retail Technology

Jun 5, 2020 · Operations

How JD Cloud Engineered a Seamless 618 Shopping Surge: Ops Strategies & Disaster Drills

This article details JD Cloud's comprehensive operational preparation for the 618 shopping festival, covering early resource procurement, hardware fault management, network and CDN scaling, extensive capacity‑testing, disaster‑recovery drills, and cross‑departmental coordination that together ensured stable service during massive traffic spikes.

Infrastructure · capacity planning · cloud operations

0 likes · 8 min read

ITPUB

Jun 1, 2020 · Databases

How Ant Financial Scales Payments: Data Patterns and Disaster‑Recovery Strategies

This article examines how Ant Financial redesigned its payment system’s data layer after retiring mainframes, detailing RTO/RPO goals, CAP trade‑offs, vertical and horizontal sharding, blacklist‑based accounting DR, failover mechanisms for transactions, and the role of OceanBase in achieving strong consistency and low‑latency recovery.

data modeling · database sharding · disaster recovery

0 likes · 17 min read

Architects' Tech Alliance

Apr 27, 2020 · Operations

Mastering Disaster Recovery: A Complete Guide to Business Continuity Planning

This article provides a comprehensive, step‑by‑step methodology for building disaster‑recovery capabilities, covering business continuity planning, risk and impact analysis, design of recovery strategies, implementation phases, testing drills, and ongoing support to ensure uninterrupted business operations.

IT planning · Operations · Risk analysis

0 likes · 13 min read

ITPUB

Apr 13, 2020 · Databases

How Ant Financial Scales to Hundreds of Thousands TPS with LDC, Unitization, and CAP Mastery

This article explains how Ant Financial’s LDC (Logical Data Center) architecture, unitized RZone/GZone/CZone design, OceanBase database, and CAP‑aware strategies enable the payment platform to handle Double‑11 traffic peaks of over 540,000 transactions per second while ensuring high availability, disaster recovery, and eventual consistency.

CAP theorem · High TPS · LDC

0 likes · 37 min read

21CTO

Apr 6, 2020 · Operations

How Alipay Achieved Near‑Zero Downtime with Multi‑Datacenter Failover Architecture

This article explains the evolution of Alipay's high‑availability and disaster‑recovery architecture—from a simple single‑datacenter design to a multi‑datacenter, unit‑based system with failover and blue‑green deployment—highlighting the challenges, solutions, and operational benefits that enable continuous service during massive traffic spikes.

Alipay architecture · Blue‑Green deployment · Distributed Systems

0 likes · 17 min read

FunTester

Mar 30, 2020 · Operations

How Virtualization Transforms Software Testing: Benefits, Types, and Common Pitfalls

The article explains what virtualization is, outlines its main types, and shows how it enables efficient software testing by consolidating servers, improving disaster recovery, saving time, increasing availability, reducing complexity, and protecting data, while also noting potential driver, memory, and performance issues.

Software Testing · Virtualization · disaster recovery

0 likes · 7 min read

Top Architect

Mar 17, 2020 · Databases

Understanding Ant Financial’s LDC Architecture: Unitization, CAP Analysis, and High‑TPS Design

This article explains how Ant Financial’s massive Double‑11 payment traffic is handled through logical data centers (LDC), unit‑based architecture (RZone, GZone, CZone), traffic routing, disaster‑recovery strategies, and a CAP analysis that highlights the role of OceanBase’s Paxos‑based consensus in achieving high availability and eventual consistency.

CAP theorem · Distributed Systems · OceanBase

0 likes · 36 min read

Architects Research Society

Mar 5, 2020 · Operations

Comparative Analysis of Commvault and Veritas NetBackup for Business Disaster Recovery

This article examines the key differences between Commvault and Veritas NetBackup, evaluating their features, cloud integration, endpoint support, cost, and suitability for small versus large enterprises to help organizations choose the most appropriate backup and disaster recovery solution.

Backup · Commvault · Data Protection

0 likes · 9 min read

Efficient Ops

Mar 2, 2020 · Databases

Why Did the Weimob Data Deletion Take So Long? A Deep Dive into Database Recovery Challenges

The article analyzes the recent Weimob data‑deletion incident, explaining why recovery is complex, comparing on‑premise, hybrid, and full‑cloud database architectures, and outlining the technical steps and obstacles involved in restoring massive lost data.

Database Recovery · Operations · cloud computing

0 likes · 11 min read

Efficient Ops

Feb 4, 2020 · Databases

How Multi‑Active Database Architecture Is Redefining Bank Disaster Recovery

In this interview, a senior database expert from Huaxia Bank shares twelve years of experience and explains how moving from traditional replication to multi‑active, real‑time consistent data centers, combined with automation and mobile remote operations, is transforming banking database reliability and security.

Database operations · disaster recovery · high availability

0 likes · 9 min read

dbaplus Community

Feb 2, 2020 · Databases

JDHBase Multi‑Active Disaster Recovery: Replication, Auto‑Failover & Consistency

JDHBase, JD.com’s large‑scale KV store, powers billions of daily reads and writes across 7,000 nodes, and this article details its multi‑active, cross‑region architecture—including HBase replication fundamentals, Fox Manager routing, automatic failover policies, dynamic replication tuning, and serial replication to ensure strong consistency.

Database Architecture · HBase · Replication

0 likes · 15 min read

Top Architect

Jan 6, 2020 · Backend Development

Alipay’s LDC Architecture: High‑TPS Design, Unitization, and CAP Analysis

The article explains how Alipay’s Logical Data Center (LDC) architecture, with its RZone, GZone, and CZone unitization, combined with OceanBase’s Paxos‑based consensus, enables massive TPS growth, traffic diversion, and disaster‑recovery while navigating the CAP theorem constraints.

CAP theorem · Distributed Systems · High TPS

0 likes · 35 min read

Tencent Cloud Developer

Dec 11, 2019 · Frontend Development

Comprehensive Practice of WeChat Mini Program Performance Monitoring System

The article describes a full‑stack performance monitoring system for WeChat Mini Programs, as presented by Niu Tifa. It covers Mini Program architecture fundamentals, a monitoring architecture built on a JS SDK, Druid, and Elasticsearch, and practical applications such as load timing, error handling, and fallback strategies, supported by dashboards and alerts, with an emphasis on low request volume and non‑intrusive monitoring.

JS SDK · Performance Monitoring · System Architecture

0 likes · 13 min read

Programmer DD

Dec 8, 2019 · Operations

Can Your Money Survive a Bombed Alipay Server? Inside Data Center Redundancy

The article explores how Alipay’s financial data is protected through multi‑site data centers, hot and cold backups, and disaster‑recovery mechanisms, explaining why destroying a single server—or even multiple facilities—won’t instantly erase users’ funds, and outlining the lengths required to truly cripple the system.

Backup · Data center · disaster recovery

0 likes · 10 min read

MaGe Linux Operations

Dec 5, 2019 · Operations

When Alipay Crashed: Lessons on High Availability and Disaster Recovery

On December 5, a brief Alipay outage sent users into a panic. The article offers a humorous recount of personal losses, complete with meme images, and a reminder of the critical importance of high‑availability architecture and disaster‑recovery planning for large‑scale financial services.

Alipay outage · Financial Services · Operations

0 likes · 3 min read

dbaplus Community

Dec 2, 2019 · Backend Development

Why ByteDance Chose RocketMQ: Architecture, Proxy Design, and Disaster Recovery

This article explains ByteDance's shift to RocketMQ, detailing the business drivers, technical advantages, proxy layer implementation for microservices, encountered challenges, and the disaster‑recovery strategies adopted to ensure high availability and performance.

Proxy · RocketMQ · disaster recovery

0 likes · 15 min read

System Architect Go

Nov 17, 2019 · Databases

Handling Single Point Failures and Disaster Recovery in InfluxDB

To mitigate the inherent single‑point‑failure risk of the open‑source InfluxDB community edition, the article proposes deploying multiple InfluxDB instances with concurrent client writes, tracking failed writes, temporarily storing them, and using custom workers to replay data, while addressing timeout, data consistency, and storage considerations.

Data Consistency · InfluxDB · Time Series Database

0 likes · 3 min read

Architecture Digest

Nov 16, 2019 · Operations

What Happens If Alipay’s Data Centers Are Physically Destroyed? A Deep Dive into Redundancy and Disaster Recovery

The article examines how Alipay’s financial data would survive a physical destruction of its servers by explaining multi‑site data center architectures, hot and cold backups, power redundancy, fire‑suppression systems, and the role of partner banks in data recovery, highlighting the extensive resilience measures in modern financial infrastructures.

Alipay · Data center · Operations

0 likes · 8 min read

Python Programming Learning Circle

Nov 10, 2019 · Operations

What Happens If Alipay’s Servers Are Bombed? Inside Data Center Redundancy

The article explains how financial platforms like Alipay protect user funds through multi‑site data centers, hot and cold backups, power redundancy, fire‑suppression systems, and strict location standards, showing why destroying a single server would not erase all stored money.

Data center · Operations · disaster recovery

0 likes · 9 min read

Java Backend Technology

Nov 10, 2019 · Information Security

What Happens If Alipay’s Servers Are Destroyed? Inside Data‑Center Resilience

The article explains how Alipay’s financial system uses multi‑site, multi‑center architectures, hot‑standby, active‑active, and cold‑backup strategies, along with stringent A‑class data‑center standards, to ensure that even catastrophic physical attacks cannot erase users' money.

Alipay · Backup · Data center

0 likes · 9 min read

Big Data Technology & Architecture

Nov 3, 2019 · Backend Development

RocketMQ Practices and Disaster‑Recovery Architecture at ByteDance

This article summarizes Shen Hui’s presentation on how ByteDance adopted RocketMQ in a massive micro‑service environment, detailing the business background, reasons for choosing RocketMQ, the proxy‑based deployment, encountered challenges, and the multi‑data‑center disaster‑recovery solutions implemented.

Message Queue · Proxy Architecture · RocketMQ

0 likes · 13 min read

dbaplus Community

Oct 29, 2019 · Cloud Computing

How to Achieve AWS Cross‑Region Disaster Recovery with CloudEndure

This guide explains CloudEndure’s features and walks through a step‑by‑step example of configuring AWS cross‑region disaster recovery, covering initial project setup, data replication, test and recovery mode switching, and the failback process, while highlighting networking and security considerations.

AWS · CloudEndure · Replication

0 likes · 16 min read

Big Data Technology & Architecture

Oct 21, 2019 · Databases

High‑Availability Practices of Alibaba HBase: Large Clusters, MTTF/MTTR, Disaster Recovery, and Extreme Experience

This article reviews Alibaba HBase's evolution toward high availability, covering large‑cluster architecture, reliability metrics (MTTF/MTTR), disaster‑recovery strategies such as data replication and traffic switching, performance optimizations for extreme latency requirements, and lessons learned for building resilient distributed database services.

Distributed Systems · HBase · Performance Optimization

0 likes · 20 min read

Architects' Tech Alliance

Oct 2, 2019 · Operations

Understanding Disaster Tolerance, Fault Tolerance, and Disaster Recovery: Concepts, Differences, and Implementation Strategies

This article explains the definitions of disaster tolerance, fault tolerance, and disaster recovery, compares their purposes, discusses backup versus disaster‑tolerance solutions, outlines key metrics such as RTO and RPO, and presents common architectural and investment considerations for building resilient enterprise systems.

Backup · IT Operations · RPO

0 likes · 8 min read

dbaplus Community

Sep 25, 2019 · Databases

Master MySQL Backup & Disaster Recovery: Strategies, Tools, and Automation

Effective MySQL backup and disaster recovery are essential for protecting critical business data; this guide explains backup types, tools like Percona XtraBackup, scheduling, local and remote strategies, incremental processes, preparation and restoration steps, and introduces a platform for automated backup management.

Automation · Backup · Database Management

0 likes · 26 min read

Architects' Tech Alliance

Jul 31, 2019 · Databases

Overview of Five Common Data Replication Technologies

This article introduces the global data replication market, explains synchronous and asynchronous replication, and details five typical replication techniques—host‑based, application/middleware‑based, database‑based, storage‑gateway‑based, and storage‑media‑based—highlighting their principles, advantages, and trade‑offs for disaster‑recovery planning.

Asynchronous Replication · data replication · database

0 likes · 11 min read

Architects' Tech Alliance

Jul 7, 2019 · Operations

Designing Disaster Recovery Communication Links: Distance, Bandwidth, and Multiplexing Strategies

The article explains how to select and design disaster‑recovery communication links by evaluating distance, bandwidth, transmission media, and multiplexing techniques such as DWDM, FDM, and TDM, while balancing cost, reliability, and application requirements.

DWDM · Multiplexing · communication link

0 likes · 9 min read

Alibaba Cloud Developer

Jun 18, 2019 · Operations

Why Designing for Failure Is the Key to Resilient Systems

The article explains how anticipating and engineering for diverse failure scenarios—from hardware faults and software bugs to traffic spikes and external attacks—can dramatically improve system reliability, reduce downtime, and protect business continuity in modern distributed and cloud environments.

disaster recovery · failure design · monitoring

0 likes · 12 min read

Tencent Cloud Developer

May 24, 2019 · Cloud Computing

How Tencent Cloud Elasticsearch Enables Multi‑AZ Disaster Recovery

Tencent Cloud Elasticsearch now supports cross‑availability‑zone deployment, which requires an even number of data nodes, dedicated master nodes, and replica settings to keep the service running when a zone fails; the article details the steps for quick setup and explains the region limitations.

Elasticsearch · Multi‑AZ · Tencent Cloud

0 likes · 6 min read

Efficient Ops

Apr 1, 2019 · Operations

How Dual‑Mode IT Is Redefining Disaster Recovery in Banking

The article examines how the rise of financial technology and dual‑mode IT is reshaping banking disaster‑recovery strategies, comparing active‑active and two‑site‑three‑center architectures, and proposing an integrated solution to improve resource utilization, reduce risk, and ensure continuous service.

Banking · business continuity · cloud

0 likes · 9 min read

Efficient Ops

Mar 23, 2019 · Operations

How to Build a Bank Ops SWAT Team for 5‑Minute Incident Recovery

This article explains how a bank can create a specialized Operations SWAT team, define its role, adopt seven essential “weapons” such as layered monitoring, intelligent alerts, communication protocols, automation, and disaster‑recovery tactics, and continuously train the team to meet strict five‑minute recovery targets.

Automation · SWAT team · bank operations

0 likes · 21 min read

Efficient Ops

Mar 18, 2019 · Operations

How to Build a Bank Ops SWAT Team for Rapid Incident Recovery

This article explains how a bank can create a specialized SWAT‑style operations team, define its roles, adopt seven essential "weapons" such as monitoring and intelligent alerts, and apply ten tactical processes—from communication to automation—to meet strict five‑minute recovery and regulatory requirements.

Automation · SWAT team · bank operations

0 likes · 21 min read

Efficient Ops

Mar 17, 2019 · Operations

Why Cold-Standby Disaster Recovery Fails and How Active‑Active Architecture Wins

Modern cloud outages reveal that cold‑standby or simple multi‑cloud promises often provide only psychological comfort; achieving true high availability requires active‑active designs with local traffic handling, data partitioning, and low‑latency synchronization, while balancing cost, complexity, and physical distance constraints.

Active-Active · Latency · data synchronization

0 likes · 10 min read
