Tagged articles
1414 articles
Page 3 of 15
Tencent Cloud Developer
Jul 25, 2024 · Databases

Redis: Features, Use Cases, Evolution, Architecture, Data Types, Commands, and Tencent Cloud Redis

Redis is a high‑performance, in‑memory NoSQL key‑value store with persistence, rich data types, and a broad command set. It supports caching, session storage, pub/sub, and leaderboards; its architecture has evolved through replication, Sentinel, clustering, and multithreaded proxies, and Tencent Cloud offers it as a scalable, highly available managed service.

Cloud Services · Data Structures · In-Memory Database
0 likes · 9 min read
JD Cloud Developers
Jul 24, 2024 · Operations

How JD.com’s Buffalo Scheduler Achieves High‑Performance, Scalable DAG Orchestration

Buffalo, JD.com’s in‑house distributed DAG scheduler, handles massive task volumes and complex dependencies through a dual‑layer entity model, instance‑based execution, tiered scheduling, a high‑availability architecture, event‑driven processing, in‑memory processing, and cold‑hot data separation, delivering scalable, low‑latency ETL orchestration.

DAG scheduling · Distributed Systems · ETL orchestration
0 likes · 12 min read
JD Tech
Jul 23, 2024 · Big Data

Design and Architecture of JD's Buffalo Distributed Workflow Scheduling System

This article examines JD's self‑developed Buffalo distributed workflow scheduling system for big‑data ETL, detailing its two‑layer entity model, instance‑based scheduling, high‑availability three‑layer architecture, performance optimizations, cold‑hot data separation, and open APIs to support massive, complex data pipelines.

Big Data · Scheduling · high availability
0 likes · 11 min read
JD Retail Technology
Jul 22, 2024 · Big Data

Design and Architecture of JD's Buffalo Distributed Workflow Scheduling System

The article introduces JD's Buffalo distributed workflow scheduling system, detailing its dual-layer entity model, instance-based scheduling, high‑availability three‑tier architecture, performance optimizations such as horizontal scaling and event‑driven execution, as well as cold‑hot data separation and open APIs for future enhancements.

Buffalo · Distributed Scheduling · JD
0 likes · 10 min read
Architecture and Beyond
Jul 21, 2024 · Operations

Mastering Backend Stability: 7 Essential Practices for High Availability

This comprehensive guide outlines the seven key pillars—operations, high‑availability architecture, capacity governance, change management, risk governance, fault management, and chaos engineering—that together form a systematic approach to building and maintaining a reliable, 24‑hour backend system.

Operations · backend stability · capacity planning
0 likes · 40 min read
Huolala Tech
Jul 11, 2024 · Operations

How LApiGateway Achieves 99.999% Uptime: Architecture, SLA & Risk Mitigation

LApiGateway, Huolala's internal micro‑service gateway, achieves five‑nine availability through a dual‑plane architecture, comprehensive monitoring, SLA definition, risk classification, heartbeat health checks, traffic migration strategies, strict change governance, and regular fault drills, all detailed in this technical overview.

LApiGateway · Microservice Gateway · SLA
0 likes · 9 min read
Su San Talks Tech
Jul 6, 2024 · Backend Development

Mastering High Availability: 10 Essential Design Techniques for Scalable Systems

This article explains ten core techniques—system splitting, decoupling, asynchrony, retry, compensation, backup, multi‑active strategies, isolation, rate limiting, circuit breaking, and degradation—that together enable robust, high‑availability architectures for modern backend services.

Distributed Systems · System Design · fault tolerance
0 likes · 12 min read
Ctrip Technology
Jul 5, 2024 · Backend Development

Design and Optimization of Ctrip Ticket Booking Transaction System for Flash‑Sale Events

This article examines the challenges faced by Ctrip’s ticket reservation transaction system during flash‑sale events and details the architectural optimizations—including Redis caching, database load reduction, supplier integration, and multi‑layer traffic throttling—that ensure system stability, strong consistency, and high availability under extreme concurrency.

Data Consistency · System Architecture · high availability
0 likes · 16 min read
Aikesheng Open Source Community
Jun 27, 2024 · Databases

Evaluation of OceanBase Arbitration Service in a 2F1A Deployment: Fault Injection Experiments and Recovery Procedures

This article presents a detailed experimental study of OceanBase's Arbitration Service in a 2F1A (two full‑function replicas plus one arbitration node) configuration, examining how the system behaves when one or both full‑function replicas fail, how log‑stream degradation and permanent offline mechanisms work, and how normal service is restored after node recovery.

Arbitration Service · Fault Injection · OceanBase
0 likes · 17 min read
Top Architect
Jun 26, 2024 · Backend Development

High Availability Traffic Governance: Circuit Breakers, Isolation, Retries, Timeouts, and Rate Limiting

This article explains how to achieve high‑availability in microservice systems through traffic governance techniques such as circuit breakers, various isolation strategies, retry mechanisms, timeout controls, and rate‑limiting, illustrating each concept with examples, formulas, and pseudo‑code.

Retry · Timeout · circuit breaker
0 likes · 31 min read
Architect
Jun 24, 2024 · Operations

Traffic Governance and High‑Availability Strategies for Microservices

This article explains how traffic governance—including circuit breaking, isolation, retry mechanisms, degradation, timeout control, and rate limiting—helps microservice systems achieve the three‑high goals of high performance, high availability, and easy scalability, using concrete formulas, algorithms, and practical examples.

Microservices · Retry · Timeout
0 likes · 29 min read
ITPUB
Jun 15, 2024 · Databases

Resolving Oracle RAC VIP Failover and SCAN IP Load‑Balancing Issues

This article walks through real‑world Oracle RAC failures caused by misconfigured VIP failover and SCAN IP load‑balancing, explains how to diagnose the symptoms, provides correct TAF and listener settings, and highlights essential configuration tips to ensure reliable high‑availability operation.

Database Configuration · Oracle · RAC
0 likes · 9 min read
iQIYI Technical Product Team
Jun 14, 2024 · Operations

Stability Assurance Practices for the 2024 CCTV Spring Festival Gala Live Stream

The 2024 CCTV Spring Festival Gala live stream employed comprehensive stability assurance practices across signal encoding, CDN distribution, request handling, and playback—using multi‑source encoding, multi‑level origin redundancy, multi‑cluster HA, and P2P‑augmented delivery—to handle massive QPS spikes, ensure high availability, and provide a resilient, high‑quality viewing experience.

Backend Architecture · CDN · P2P
0 likes · 24 min read
Mike Chen's Internet Architecture
Jun 12, 2024 · Backend Development

Comprehensive Guide to Nginx Configuration, Reverse Proxy, Load Balancing, and High‑Availability Clusters

This article provides a detailed tutorial on Nginx, covering its core features, configuration file structure, and practical examples for reverse proxy, load balancing, static‑dynamic separation, and high‑availability clustering with code snippets and deployment steps.

Backend Development · Nginx · high availability
0 likes · 11 min read
Architecture Breakthrough
Jun 11, 2024 · R&D Management

Why Your Technical Presentation Fails and How the MECE Framework Saves It

The article reveals common pitfalls engineers face when presenting technical solutions—over‑focusing on details, ignoring business value and operational concerns—and shows how applying the MECE principle across value, technology, project, and operation dimensions creates a complete, persuasive report.

MECE framework · communication skills · high availability
0 likes · 7 min read
Tencent Cloud Developer
Jun 7, 2024 · Cloud Native

Multi-AZ High‑Availability Architecture of Tencent Cloud TDMQ for Apache Pulsar

Tencent Cloud TDMQ for Apache Pulsar achieves multi‑AZ high availability by containerizing ZooKeeper, BookKeeper and Brokers, using managed ZK, persistent cloud disks and elastic NICs, enforcing quorum and rack‑aware replicas, and planning cross‑region bidirectional replication to ensure seamless disaster recovery and continuous messaging.

Cloud Native · Multi‑AZ · Pulsar
0 likes · 15 min read
Sanyou's Java Diary
Jun 3, 2024 · Backend Development

Understanding the Full Lifecycle of a RocketMQ Message: From Production to Deletion

This article walks through every stage of a RocketMQ message—from producer creation, routing, queue selection, and storage with zero‑copy techniques, through high‑availability replication, consumption modes, ordering guarantees, and finally automatic cleanup—providing code examples and architectural diagrams for each step.

Backend Development · RocketMQ · Zero-Copy
0 likes · 26 min read
Bilibili Tech
May 31, 2024 · Backend Development

Design and High‑Availability Practices of Bilibili's Video Submission System

Bilibili’s video submission platform uses a layered micro‑service architecture with a DAG‑based scheduler, extensive observability, and HA tactics such as sharding, 64‑bit ID migration, full‑link stress tests, chaos engineering, and multi‑active data‑center deployment, while tooling like trace correlation and automated alerts ensures stability and guides future hybrid‑cloud migration.

Backend Architecture · Bilibili · DAG
0 likes · 35 min read
Su San Talks Tech
May 30, 2024 · Backend Development

Why Single‑Server Apps Fail: Master Load Balancing with Nginx and LVS

This article walks through the evolution from a single‑Tomcat deployment to a multi‑layer load‑balancing architecture using Nginx, a gateway, LVS, and DNS, explaining static‑dynamic separation, high‑availability strategies, and performance trade‑offs for scalable backend systems.

Backend Architecture · LVS · Nginx
0 likes · 11 min read
Efficient Ops
May 28, 2024 · Operations

How to Build a Resilient High‑Traffic Website: Domains, CDN, Monitoring, and Security

This guide outlines practical steps for creating a highly available, secure, and scalable website—including domain strategy, CDN deployment, image caching, data‑center selection, monitoring, attack mitigation, redundancy, server configuration, database replication, testing environments, disaster‑recovery planning, and high‑concurrency testing.

high availability · monitoring · website infrastructure
0 likes · 12 min read
ITPUB
May 24, 2024 · Databases

Master PostgreSQL High Availability with Pacemaker & Corosync: A Step‑by‑Step Guide

This tutorial walks through building a PostgreSQL high‑availability cluster using Pacemaker and Corosync, covering architecture overview, component installation, cluster status checks, data synchronization verification, failover handling, and common maintenance tasks, with concrete commands and screenshots.

Cluster · Corosync · Pacemaker
0 likes · 7 min read
iQIYI Technical Product Team
May 24, 2024 · Operations

High Availability and Disaster Recovery Practices of iQIYI's Video Relay Service (VRS)

iQIYI’s Video Relay Service ensures uninterrupted video playback by employing a two‑region, three‑center hybrid cloud architecture, multi‑layer storage, cross‑AZ retry mechanisms, protective rate‑limiting and degradation paths, layered monitoring, and rigorous stress‑testing and chaos engineering to achieve high availability and disaster recovery.

Backend Architecture · Cloud Native · Video Streaming
0 likes · 18 min read
Laravel Tech Community
May 21, 2024 · Databases

MongoDB Replication Set and Sharding Configuration Guide

This article provides a comprehensive step‑by‑step guide to setting up MongoDB replica sets and sharded clusters, explaining the architecture, member roles, configuration files, initialization commands, and operational procedures for ensuring data redundancy, high availability, and horizontal scaling.

Cluster · MongoDB · Replication
0 likes · 29 min read
DevOps Operations Practice
May 19, 2024 · Operations

High‑Availability Solutions for Prometheus Monitoring

Prometheus, a leading monitoring system, can achieve high availability through several common architectures—including dual-node with external storage, federated mode with external storage, and multi-node clusters combined with Thanos and object storage—each offering data persistence and load distribution to enhance system stability and performance.

External Storage · Prometheus · Thanos
0 likes · 3 min read
MaGe Linux Operations
May 19, 2024 · Databases

How to Deploy Xenon: A Raft‑Based MySQL HA Solution with Semi‑Sync and Parallel Replication

This guide walks through deploying Xenon, an open‑source Raft‑based MySQL high‑availability solution, covering environment setup, installation of Go and Percona XtraBackup, configuring Xenon’s JSON, starting the cluster, monitoring status, and troubleshooting backup failures caused by misconfigured host settings.

Backup · Database Replication · Go
0 likes · 8 min read
Cognitive Technology Team
May 16, 2024 · Operations

Core Principles of High‑Availability Architecture Design

These core principles—minimal dependency, weak dependency, distribution, rate limiting, degradable design, balanced risk, fault prevention and isolation, no single point of failure, self‑protection, automatic failover, and retry/idempotency/compensation—guide the design of highly available systems by reducing risk, ensuring redundancy, and protecting services at all layers.

Operations · Reliability · System Design
0 likes · 3 min read
Selected Java Interview Questions
May 10, 2024 · Databases

Comparing NewSQL Databases with Middleware‑Based Sharding: Advantages, Limitations, and Practical Guidance

This article objectively compares NewSQL databases and middleware‑plus‑sharding architectures, examining their core principles, distributed transaction handling, high‑availability mechanisms, scaling and sharding strategies, SQL support, storage engines, and maturity to help engineers decide which solution fits their workload.

Database Architecture · Distributed Transactions · NewSQL
0 likes · 18 min read
Sanyou's Java Diary
May 9, 2024 · Databases

From Single Node to Cluster: Mastering Redis Architecture Evolution

This article walks you through Redis’s architectural journey—from a simple single‑node setup, through persistence mechanisms, master‑slave replication, Sentinel‑driven automatic failover, and finally sharding with Redis Cluster—explaining each component’s purpose, trade‑offs, and how they collectively boost performance and reliability.

Cluster · Persistence · Replication
0 likes · 18 min read
DevOps Cloud Academy
May 6, 2024 · Cloud Native

How to Deploy a Highly Available Application on Kubernetes

This article explains key Kubernetes configurations—such as pod replicas, pod anti‑affinity, deployment strategies, graceful termination, probes, resource allocation, scaling, and disruption budgets—to achieve high availability and zero‑downtime deployments for containerized applications in production.

Cloud Native · Kubernetes · Probes
0 likes · 20 min read
JD Retail Technology
Apr 26, 2024 · Operations

How Isolation Principles Boost System High Availability: Real-World Cases

This article explains the concept of high availability, defines the isolation principle, outlines its implementation across various layers, and presents concrete case studies—including vertical data‑center redesign, dual‑cluster Elasticsearch migration, traffic grouping, and hot‑cold data segregation—to illustrate how isolation improves system resilience.

Backend · Case Study · Operations
0 likes · 15 min read
Java Captain
Apr 26, 2024 · Databases

Choosing Between Sharding Middleware and NewSQL Distributed Databases: Advantages, Trade‑offs, and Use Cases

This article objectively compares middleware‑based sharding with modern NewSQL distributed databases, examining their architectural differences, performance, transaction support, scalability, high‑availability, and operational considerations, to help practitioners decide which approach best fits their workload and organizational constraints.

Database Architecture · Distributed Transactions · NewSQL
0 likes · 20 min read
dbaplus Community
Apr 25, 2024 · Operations

How We Built Same‑City Active‑Active Architecture for a High‑Volume Transaction Platform

This article details the background, design principles, overall architecture, concrete refactoring steps, launch process, results, and emerging challenges of implementing a same‑city active‑active solution to improve reliability, load balancing, disaster recovery, and cost efficiency for a large‑scale transaction system.

Active-Active · Blue‑Green deployment · Multi‑AZ
0 likes · 23 min read
Architect
Apr 22, 2024 · Operations

Flow Governance and High‑Availability Strategies for Microservice Systems

This article explains how to achieve high availability in microservice architectures by applying flow governance techniques such as circuit breaking, isolation, retry policies, degradation, timeout management, and rate limiting, while detailing key metrics like MTBF and MTTR and providing practical implementation guidance.

Flow Control · Microservices · Retry
0 likes · 30 min read
Selected Java Interview Questions
Apr 21, 2024 · Backend Development

Designing an Enterprise‑Level Unified Notification Service Architecture

This article systematically outlines the requirements, evolution stages, functional and non‑functional specifications, and component design of a scalable, high‑availability enterprise notification platform that supports multi‑channel push (email, SMS, chat, WeChat, DingTalk, etc.) through a microservice‑based architecture.

Messaging · Notification · Scalability
0 likes · 12 min read
Architecture Digest
Apr 19, 2024 · Databases

Comparing NewSQL Distributed Databases with Middleware‑Based Sharding: Advantages, Trade‑offs, and Use Cases

The article objectively compares NewSQL distributed databases with traditional middleware‑based sharding solutions, examining their architectural differences, distributed transaction support, performance, scalability, high‑availability mechanisms, storage engines, and practical suitability for various application scenarios.

CAP theorem · NewSQL · Scalability
0 likes · 18 min read
Efficient Ops
Apr 14, 2024 · Operations

How to Ensure System Stability and High Availability: An SRE Playbook

This article explains the definitions of stability and high availability, distinguishes their relationship, outlines key performance indicators, and provides a comprehensive framework—including fault prevention, detection, and recovery, as well as design, coding, testing, monitoring, and emergency response practices—to help teams build reliable, highly available systems.

SRE · capacity planning · high availability
0 likes · 10 min read
Mike Chen's Internet Architecture
Apr 11, 2024 · Databases

Mastering Redis Sentinel: Ensuring Automatic High Availability

This article explains Redis Sentinel’s role in providing monitoring, notifications, automatic failover, and configuration updates to achieve high availability, detailing its heartbeat mechanism, master‑down detection, leader election, failover selection criteria, and the trade‑offs of using this solution.

database · failover · high availability
0 likes · 6 min read
Architecture & Thinking
Apr 10, 2024 · Operations

How Redis Sentinel Ensures Automatic Failover and High Availability

Redis Sentinel provides automatic monitoring, fault detection, and failover for Redis master‑slave clusters, enabling high availability by electing a new master when the original fails, using sdown/odown states, quorum voting, and pub/sub communication to keep services running with minimal downtime.

failover · high availability · monitoring
0 likes · 11 min read
JD Retail Technology
Apr 8, 2024 · Backend Development

Applying the Weak Dependency Principle for High Availability in Microservices

This article explains the weak dependency principle, contrasts it with the less‑dependency principle, and presents concrete microservice architecture strategies—including module splitting, independent deployment, asynchronous messaging, interface abstraction, fault‑tolerance, and governance—to improve system flexibility, scalability, and high availability.

Microservices · architecture · high availability
0 likes · 14 min read
MaGe Linux Operations
Apr 8, 2024 · Operations

Build a Highly Available Load Balancer with LVS and Keepalived

This guide explains how to design and deploy a highly available web load‑balancing cluster using Linux Virtual Server (LVS) together with Keepalived, covering architecture, required software, configuration steps for both master and backup nodes, real‑server setup, and HA testing procedures.

LVS · Linux · high availability
0 likes · 12 min read
Architect
Apr 4, 2024 · Backend Development

Mastering High Availability: 9 Essential Design Techniques for Scalable Systems

The article walks through nine practical techniques—system splitting, decoupling, asynchronous processing, retry, compensation, backup, multi‑active deployment, rate limiting, circuit breaking, and degradation—explaining why each is needed, how they are implemented in real‑world microservice architectures, and what trade‑offs to consider.

Distributed Systems · Microservices · System Design
0 likes · 13 min read
FunTester
Mar 29, 2024 · Operations

Implementing Chaos Engineering in WeChat Pay: Practices, Challenges, and Outcomes

This article describes how WeChat Pay applied chaos engineering to improve system reliability, detailing the business scenario, challenges of controlling fault injection radius, practical solutions, risk assessment, automation, and the resulting business and tool achievements.

Fault Injection · Operations · WeChat Pay
0 likes · 18 min read
DeWu Technology
Mar 25, 2024 · Cloud Native

Design and Implementation of Same‑City Dual‑Active Architecture for a Transaction Platform

The paper details a same‑city dual‑active architecture for a high‑traffic transaction platform, combining blue‑green and dual‑cluster deployment with zone‑aware routing, middleware transformations, and a gradual traffic‑coloring release process that achieved near‑50/50 traffic split, stable performance, minimal cost, and outlines remaining challenges.

Deployment · Dual-Active · cloud-native
0 likes · 20 min read
Tencent Cloud Developer
Mar 19, 2024 · Operations

Chaos Engineering in WeChat Pay: Design, Implementation, and Results

WeChat Pay’s team adopted Netflix‑style chaos engineering, building an automated, YAML‑driven fault‑injection platform that isolates experiments in multi‑zone partitions, enabling over 500 safe experiments in 2021‑2022, uncovering critical bugs across core services while maintaining five‑nine availability and zero production incidents.

Automation · Fault Injection · Reliability
0 likes · 18 min read
dbaplus Community
Mar 18, 2024 · Operations

How to Build a Resilient, High‑Traffic Web Infrastructure: A Step‑by‑Step Ops Guide

This guide outlines a complete, practical workflow for acquiring multiple domains, configuring DNS, deploying CDN and image caches, selecting data‑center locations, setting up redundant servers, implementing monitoring, handling DDoS attacks, planning capacity, securing systems, and organizing an operations team to ensure high availability for large‑scale web services.

CDN · Server Configuration · Web infrastructure
0 likes · 12 min read
Huolala Tech
Mar 14, 2024 · Cloud Native

HuoLala’s Cost‑Effective Multi‑Zone High Availability via Multi‑Lane Architecture

This article explains how HuoLala designed a cost‑effective multi‑zone high‑availability solution called the multi‑lane architecture, detailing its goals, deployment of services across availability zones, use of Consul for service discovery, Apollo for configuration, traffic scheduling strategies, and how it differs from traditional active‑active setups.

Cloud Native · Configuration Management · high availability
0 likes · 13 min read
Linux Cloud Computing Practice
Mar 13, 2024 · Databases

Unlocking Redis: Architecture, High Availability, and Persistence Explained

This article provides a comprehensive overview of Redis, covering its core concepts, deployment architectures—including single instance, high‑availability, Sentinel, and cluster setups—its replication mechanisms, gossip protocol, and the various persistence options such as RDB, AOF, and fork‑based snapshots.

Cluster · In-Memory Database · Persistence
0 likes · 17 min read
Mike Chen's Internet Architecture
Mar 6, 2024 · Cloud Computing

Understanding IaaS: Definition, Features, Core Technologies, and Application Scenarios

This article provides a comprehensive overview of IaaS, detailing its definition, core characteristics, underlying technologies such as virtualization and automation, and common use cases, while highlighting benefits like cost reduction, elasticity, high availability, and security in cloud environments.

Automation · IaaS · Infrastructure as a Service
0 likes · 8 min read
Architects' Tech Alliance
Feb 24, 2024 · Operations

How the Two‑Site Three‑Center Disaster Recovery Model Boosts Business Continuity

The article explains the two‑site three‑center disaster‑recovery architecture—comprising a production site, a same‑city backup, and a remote backup—detailing synchronous and asynchronous data replication, failover capabilities, Oracle Data Guard implementation, and why this hybrid approach delivers superior RPO, RTO, and availability for enterprises.

Infrastructure · Oracle Data Guard · RPO
0 likes · 6 min read
MaGe Linux Operations
Feb 14, 2024 · Databases

Unlocking Redis: Core Concepts, Architecture, and Persistence Explained

This article introduces Redis as an in‑memory key‑value data‑structure server, explains its primary use cases, walks through deployment options such as single instances, high‑availability, Sentinel and Cluster, and details its persistence mechanisms including RDB, AOF and forking.

Cluster · In-Memory Database · Persistence
0 likes · 16 min read
Architects' Tech Alliance
Feb 13, 2024 · Operations

What Makes Enterprise Storage Systems Reliable and Scalable?

As enterprise data volumes surge, modern storage systems must deliver high availability, fault tolerance, multi‑protocol support, backup, snapshot, and cloning capabilities, often through distributed architectures that boost reliability, scalability, and cost efficiency while ensuring rapid data recovery.

Enterprise Storage · Storage Systems · data backup
0 likes · 4 min read
ITPUB
Feb 13, 2024 · Databases

Achieve Seamless Second‑Level Database Scaling for High‑Throughput Microservices

This guide explains how to design a high‑concurrency, high‑throughput internet architecture that keeps the database highly available with dual‑master synchronization and virtual IPs, and how to shard horizontally and expand the cluster smoothly within seconds through configuration changes, reloads, and cleanup steps.

Microservices · databases · high availability
0 likes · 8 min read
JavaEdge
Feb 7, 2024 · Backend Development

Designing a High‑Availability Payment System: Flow, Optimization, and Fault Tolerance

This article details the end‑to‑end design of a payment system, covering transaction flow, horizontal and vertical pre‑optimizations, task scheduling, sharding strategies, data structures, high‑availability mechanisms such as channel isolation and Hystrix, and future planning for dynamic scaling and intelligent routing.

Backend Architecture · Elastic-Job · Hystrix
0 likes · 12 min read
MaGe Linux Operations
Feb 7, 2024 · Databases

How to Build a Real‑Time Data Guard System for Dameng Database

This guide walks through setting up a Dameng data‑guard service using a primary, standby, and monitor server, covering data preparation, configuration of dm.ini, dmmal.ini, dmarch.ini, dmwatcher.ini, starting services, OGUID setup, mode switching, and monitoring to achieve high‑availability replication.

Backup · Dameng · Data Guard
0 likes · 12 min read
Alibaba Cloud Developer
Feb 1, 2024 · Databases

Why Redis Dominates Modern Caching: Architecture, Strategies, and Pitfalls

This article provides a comprehensive technical overview of Redis, covering its high‑performance in‑memory design, rich data structures, persistence options, transaction support, eviction policies, common caching patterns, distributed locking techniques, and high‑availability solutions such as Sentinel and Cluster, while also comparing it with alternatives like Memcached, Tair, Guava, EVCache and ETCD.

Persistence · high availability · redis
0 likes · 35 min read
Baidu Geek Talk
Jan 29, 2024 · Databases

BTS (Baidu Table Storage): Architecture and Core Technologies

BTS (Baidu Table Storage) is Baidu Intelligent Cloud’s high‑performance, low‑cost semi‑structured NoSQL service. It evolved from a single‑table model to a multi‑model one (wide tables, time series, and soon documents), features a three‑layer architecture that separates compute from storage, multi‑level caching, and hot‑backup HA, and supports massive IoT, AI, autonomous‑driving, and monitoring workloads.

BTS · Baidu Table Storage · Database Architecture
0 likes · 21 min read
Efficient Ops
Jan 23, 2024 · Operations

Why Building Truly High‑Availability Systems Is Harder Than You Think

The article examines why 2023 saw a surge in major online outages, linking layoffs and cost‑cutting to lost expertise, and explores how entropy and Murphy's Law make perpetual high availability impossible without continuous, systematic investment and cultural change.

SRE · Technical Debt · high availability
0 likes · 13 min read
DevOps
Jan 12, 2024 · Operations

Why Building a Never‑Failing System Is Impossible and How to Pursue Continuous High Availability

The article analyzes why truly never‑failing systems cannot exist, citing entropy and Murphy's Law, examines the organizational and technical obstacles to continuous high availability, and offers practical cultural and engineering practices such as testing, code review, monitoring, and regular system health checks to mitigate risk.

Murphy's Law · Operations · SRE
0 likes · 14 min read
Tencent Cloud Developer
Jan 10, 2024 · Operations

The Challenges of Building Continuously Available Systems: Entropy, Murphy's Law, and the 'Divine Doctor Paradox'

Building continuously available systems in 2023 is hampered by entropy‑driven technical debt and Murphy’s Law failures, and the “Divine Doctor Paradox” shows that successful availability work goes unnoticed while blame follows any outage, making cultural commitment—not just technology—the essential solution.

Murphy's Law · SRE · Technical Debt
0 likes · 14 min read
Programmer DD
Jan 5, 2024 · Operations

Master the ‘Three Highs’: Availability, Throughput, and Scalability in System Design

This article explains the essential "three high" goals of system design—high availability, high throughput, and high scalability—detailing their meanings, common architectural patterns such as Hot‑Hot, Hot‑Warm, leader‑based clusters, and practical techniques like caching, async processing, and micro‑service isolation to build robust, scalable services.

Backend Architecture · High Throughput · Scalability
0 likes · 6 min read
Volcano Engine Developer Services
Dec 27, 2023 · Cloud Native

How ByConity Achieves Leader Election with Shared Storage and CAS

This article explains how ByConity uses a high‑availability shared KV store and CAS operations to implement a lightweight, fault‑tolerant leader election mechanism that eliminates the need for external services like Zookeeper, simplifies node management, and ensures safe leader transitions in a cloud‑native data warehouse.

ByConity · Cloud Native · distributed consensus
0 likes · 21 min read
Architect
Dec 23, 2023 · Backend Development

Architecture Evolution and Challenges of Meituan's Code Hosting Platform

This article details Meituan's Code platform evolution from a single‑machine setup to a multi‑machine and finally a distributed, sharded architecture, describing the scalability and availability challenges faced and the comprehensive engineering solutions implemented to achieve high‑performance, high‑availability code hosting for millions of repositories.

Git · code hosting · distributed architecture
0 likes · 21 min read
DataFunSummit
Dec 23, 2023 · Databases

REDTao: A Scalable Graph Storage System for Trillion‑Scale Social Networks at Xiaohongshu

This article presents REDTao, Xiaohongshu's self‑built graph storage solution that unifies graph queries, reduces development duplication, and delivers low‑latency, high‑availability access to a trillion‑scale social graph through a three‑layer architecture, distributed cache, and cloud‑native deployment.

Cloud Native · Scalability · distributed cache
0 likes · 15 min read
dbaplus Community
Dec 19, 2023 · Databases

How GitHub Upgraded 1,200 MySQL Servers from 5.7 to 8.0 Without Downtime

GitHub detailed a year‑long, multi‑team effort to upgrade over 1,200 MySQL hosts from version 5.7 to 8.0, describing the motivations, infrastructure scale, preparation steps, a staged rollout plan, rollback strategies, challenges faced, and key lessons learned for large‑scale database migrations.

Automation · Database Upgrade · GitHub
0 likes · 16 min read
21CTO
Dec 15, 2023 · Databases

How GitHub Upgraded 1,200 MySQL Servers to 8.0 Without Downtime

GitHub’s engineering team detailed a year‑long, multi‑team effort to upgrade over 1,200 MySQL hosts from 5.7 to 8.0, preserving high availability, SLO compliance, and rollback capability while introducing new features and performance improvements.

GitHub · Replication · database migration
0 likes · 17 min read
Refining Core Development Skills
Dec 11, 2023 · Databases

Understanding Tencent Cloud's TDSQL Distributed Relational Database Architecture and Features

This article explains the motivations behind distributed databases in high‑risk industries, introduces Tencent Cloud's TDSQL as a market leader, and details its architecture, including load balancing, the SQL engine, compute and storage separation, Raft‑based strong consistency, lossless upgrades, high‑availability failover, and an intelligent DBA platform.

TDSQL · high availability · intelligent DBA
0 likes · 8 min read
FunTester
Dec 10, 2023 · Databases

How GitHub Upgraded 1,200 MySQL Servers to 8.0 Without Downtime

GitHub detailed a year‑long, multi‑team effort to upgrade over 1,200 MySQL hosts from 5.7 to 8.0 using phased rollouts, automated testing, compatibility checks, and rollback mechanisms while maintaining strict SLOs and high‑availability requirements.

GitHub · Operations · database migration
0 likes · 16 min read
Tencent Cloud Developer
Nov 30, 2023 · Cloud Computing

X's Cloud Cost Reduction and the Shift Toward On‑Premises: Implications for Cloud Computing Trends

X (formerly Twitter) cut its monthly cloud spending by 60% by moving workloads and storage to on‑premises infrastructure. The move has ignited debate over whether de‑clouding is viable for all enterprises, whether it signals an inflection point in cloud computing, and what strategies should guide firms as they balance high availability, disaster recovery, and cost efficiency. Leading industry experts take up these questions in the upcoming TVP Tech Sleepless Nights series.

Cloud Native · Cost Optimization · cloud repatriation
0 likes · 7 min read
Efficient Ops
Nov 28, 2023 · Databases

Mastering Redis: Core Features, Caching Strategies, and High Availability

This article provides a comprehensive overview of Redis, covering its architecture, key features, data types, caching use cases, common pitfalls such as consistency, avalanche, penetration and breakdown, as well as performance reasons, eviction policies, persistence options, replication, and Sentinel high‑availability mechanisms.

caching · high availability · performance
0 likes · 13 min read
Top Architecture Tech Stack
Nov 27, 2023 · Operations

Designing Multi-Active Cross‑Region Architecture: Scenarios, Patterns, and Practical Techniques

This article explains the motivations, application scenarios, architectural patterns (same‑city, cross‑city, and cross‑country), and concrete design techniques for building multi‑active cross‑region systems that ensure high availability and graceful degradation during extreme failures.

Distributed Systems · data synchronization · disaster recovery
0 likes · 32 min read
Top Architecture Tech Stack
Nov 26, 2023 · Operations

Understanding High Availability and High Performance: Complexity, Redundancy, and Decision Strategies

This article examines the inherent complexity of achieving high availability and high performance in distributed systems, explaining redundancy techniques, storage consistency challenges, various state‑decision models, and the trade‑offs involved in scaling single‑machine and cluster architectures.

Distributed Systems · System Design · high availability
0 likes · 27 min read
Open Source Tech Hub
Nov 24, 2023 · Backend Development

Why Switch from Linux Crontab to Workerman Crontab for High‑Availability Scheduling

This article compares traditional Linux crontab with the PHP‑based Workerman Crontab, highlighting crontab's limitations in high availability, load balancing, and permission control, and showing how Workerman Crontab offers second‑level precision, dynamic management, distributed deployment, and better performance for modern task scheduling.

Backend · PHP · Workerman
0 likes · 7 min read
Architect
Nov 23, 2023 · Databases

Inside Our High‑Performance Self‑Built Redis System: Architecture, Features & Ops

This article details the design and implementation of a self‑managed Redis KV cache system spanning tens of terabytes, covering its Proxy‑based architecture, ConfigServer high‑availability via Raft, Redis‑Proxy slot routing, async‑fork optimizations, data migration strategies, and a comprehensive automation platform for deployment, scaling, monitoring, and stability governance.

Automation · Distributed Systems · high availability
0 likes · 24 min read
Zhuanzhuan Tech
Nov 22, 2023 · Backend Development

Improving Stability and High Availability of an Advertising Billing System: Architecture Upgrade and Optimizations

This article describes the background, problems, and a series of architectural upgrades—including MQ replacement, thread‑pool isolation, Redis/TiKV redundancy, and Spark‑based compensation—to enhance the stability, scalability, and high‑availability of an advertising billing system.

Advertising · Backend · Message Queue
0 likes · 12 min read
Top Architecture Tech Stack
Nov 22, 2023 · Operations

Designing Multi‑Active (Active‑Active) Architecture Across Regions: Scenarios, Patterns, and Practical Techniques

This article explains the motivations, application scenarios, architectural patterns, and step‑by‑step design techniques for building geographically distributed active‑active systems that can survive extreme failures while balancing cost, complexity, and data consistency requirements.

Active-Active · Distributed Systems · System Design
0 likes · 32 min read
Senior Tony
Nov 21, 2023 · Operations

How to Shrink Failure Scope with Circuit Breakers, Degradation, and Link Splitting

This article explains how to reduce the impact of failures in distributed systems by simplifying service links, applying circuit‑breaker mechanisms, implementing graceful degradation, performing core‑link isolation, and, as a last resort, switching to a minimal MVP version to keep essential functionality alive.

Operations · circuit breaker · degradation
0 likes · 11 min read
DaTaobao Tech
Nov 17, 2023 · Artificial Intelligence

Marketing Technology Architecture and Challenges at Taobao

Taobao’s marketing technology team built a platform‑centric architecture that separates merchant acquisition, ad placement, benefits, and scene construction, enabling 80‑90% feature reuse while tackling challenges such as massive merchant onboarding, real‑time rule validation, price consistency, ultra‑high‑concurrency lottery draws, low‑end device rendering, and AI‑driven asset creation.

AI · Marketing · System Architecture
0 likes · 16 min read
政采云技术
Nov 16, 2023 · Fundamentals

Comprehensive Guide to Software Architecture Design and Practices

This article provides an extensive overview of software architecture, covering its definition, history, core concepts, design principles, complexity sources, design process, performance, high availability, scalability, and practical implementation techniques for large‑scale web systems.

Microservices · Performance Optimization · Scalability
0 likes · 24 min read
Architect's Guide
Nov 15, 2023 · Databases

Smooth 2N Database Scaling and High Availability with MariaDB, Keepalived, and Sharding

This article presents five expansion strategies—shutdown, write‑stop, log‑based, dual‑write, and smooth 2N—detailing step‑by‑step procedures for MariaDB installation, master‑master replication, dynamic data‑source configuration, and Keepalived high‑availability setup, enabling seamless horizontal scaling and minimal service disruption for large‑scale databases.

MariaDB · database scaling · high availability
0 likes · 30 min read
Senior Tony
Nov 14, 2023 · Operations

Master Availability, Reliability, and Stability for High‑Availability Systems

Understanding the differences between system availability, reliability, and stability is essential for building resilient services; this guide explains each concept, illustrates their distinctions with examples, and outlines practical strategies such as rate limiting, anti‑scraping, timeout settings, system inspections, and fault post‑mortems to reduce failures and downtime.

Availability · Reliability · high availability
0 likes · 11 min read
Su San Talks Tech
Nov 13, 2023 · Operations

What Alibaba Cloud’s Epic Outage Reveals About Building Truly Resilient Systems

An unprecedented Alibaba Cloud outage that crippled services like Aliyun Drive, Taobao, and DingTalk highlighted the critical need for high‑availability, multi‑region architectures, prompting a detailed look at the incident timeline, affected products, and practical design lessons for ensuring resilient cloud deployments.

System Design · cloud outage · high availability
0 likes · 5 min read
Architecture Digest
Nov 11, 2023 · Databases

Redis: From Cache to Distributed Data Store – Benefits, Persistence, and Use Cases

This article explains how Redis evolved from a simple cache to a high‑performance distributed data store, covering its architecture, persistence models, scalability, high‑availability features, complex data structures, and the trade‑offs of using it as a primary database versus a traditional relational system.

Distributed Systems · Persistence · caching
0 likes · 9 min read
JD Retail Technology
Nov 3, 2023 · Backend Development

Order System Architecture Overview and Design

This document outlines the business scope, value, overall and real‑time data layer architecture, design advantages, data model, extensibility, and future challenges of the order system, emphasizing decoupling, high availability, scalability, and cost control.

Backend Architecture · Scalability · databases
0 likes · 12 min read
Aikesheng Open Source Community
Oct 31, 2023 · Databases

MySQL Disaster Recovery: Multi‑Region Three‑Center Replication and RTO/RPO Optimization

This article explains the principles of disaster recovery for MySQL, covering RTO/RPO metrics, national backup level standards, common master‑slave topologies, a comparative analysis of high‑availability solutions, and a detailed three‑center multi‑region replication design with code patches to avoid replication loops.

RPO · RTO · Replication
0 likes · 17 min read
JD Tech
Oct 30, 2023 · Operations

High‑Availability Assurance for E‑Commerce Mega‑Promotion Systems

This article outlines a systematic approach to ensuring high availability for e‑commerce mega‑promotion events, covering historical context, business model analysis, goal setting, strategic planning, tactical execution, and growth, with detailed evaluation of marketing, transaction, fulfillment, and monitoring processes.

Performance Monitoring · e‑commerce · high availability
0 likes · 22 min read
JD Tech
Oct 25, 2023 · Backend Development

Design and Implementation of JD Logistics Order System Architecture for High Scalability and Availability

The article details JD Logistics' order system redesign using a four‑layer transaction architecture, describing its decoupled backend, unified data model, high‑availability components such as CQRS, Redis, JMQ, HBase, and Elasticsearch, and outlines design advantages, extensible data modeling, future challenges, and overall performance outcomes.

Backend Architecture · Distributed Systems · Order Management
0 likes · 10 min read