Tagged articles
Sanyou's Java Diary
Jan 9, 2023 · Backend Development

Unveiling the Full Lifecycle of a RocketMQ Message: From Production to Deletion

This article walks through every stage of a RocketMQ message—how producers create and route messages, the storage mechanisms and zero‑copy techniques used by brokers, high‑availability modes, consumption models, ordering guarantees, and the automatic cleanup policies that finally retire messages.

Message Queue · RocketMQ · high availability
0 likes · 26 min read
Top Architect
Jan 8, 2023 · Databases

Understanding Redis: Architecture, Deployment Options, Persistence Models, and High Availability

This article provides a comprehensive overview of Redis, covering its definition as an in‑memory data structure server, various deployment topologies such as single instance, high‑availability, Sentinel, and Cluster, as well as detailed explanations of persistence mechanisms, forking, replication, and practical considerations for scaling and reliability.

Cluster · Persistence · Replication
0 likes · 18 min read
Tencent Cloud Developer
Jan 5, 2023 · Cloud Native

QQ Music High-Availability Architecture Overview

QQ Music achieves high availability by layering redundant multi‑datacenter architecture, proactive chaos‑engineering toolchains, and comprehensive observability—including metrics, logging, tracing and profiling—while employing service grading, adaptive retry windows and EMA‑based dynamic timeouts to gracefully handle faults across its massive micro‑service ecosystem.

Distributed Systems · Microservices · Observability
0 likes · 24 min read
Java High-Performance Architecture
Jan 2, 2023 · Backend Development

How to Build a High‑Availability Payment System with Smart Routing

This article explains how a fintech payment platform achieves high availability and optimal channel selection by using decision‑tree routing, sliding‑window negative‑feedback, pressure‑detection services, and component fallback strategies such as RabbitMQ with Redis, supporting millions of daily transactions.

Backend Architecture · Operations · Routing Algorithm
0 likes · 13 min read
Efficient Ops
Dec 29, 2022 · Operations

How eBay Scales Its Event Platform with ClickHouse and Kubernetes

This article details eBay's event platform architecture, explaining why a dedicated event system is needed, how ClickHouse provides high‑performance storage, the use of Kubernetes CRDs for cross‑region high availability, data routing, read/write separation, and query optimizations with LogQL.

ClickHouse · Event Platform · Kubernetes
0 likes · 18 min read
ITPUB
Dec 21, 2022 · Big Data

How Bilibili Optimized Flink Runtime for Massive Real‑Time Jobs

This article details Bilibili's extensive enhancements to the Flink runtime—including checkpoint recoverability, max‑parallelism calculations, State Processor API extensions, Full and Regional Checkpoints, hybrid HA, task‑level recovery, load‑balanced partitioners, and large‑scale cluster maintenance—to improve reliability and performance of its billion‑scale streaming workloads.

Big Data · Checkpoint · Flink
0 likes · 33 min read
Architecture Digest
Dec 21, 2022 · Operations

Designing High‑Availability Systems: Principles and Practices Across Six Layers

This article systematically explores high‑availability system design from development standards, capacity planning, application services, storage, product strategies, operations deployment, to incident response, presenting key concepts, architectural patterns, and practical guidelines for building resilient services.

Deployment · Operations · System Design
0 likes · 27 min read
Inke Technology
Dec 19, 2022 · Backend Development

How to Build a Highly Available, Stable, and Observable SMS Service

This article explains how to design a high‑availability SMS system by identifying stability bottlenecks, defining reliability goals, implementing failover strategies for Redis, MySQL and external services, establishing a comprehensive observability framework, and measuring key quality metrics to ensure 99.99% uptime.

Backend · Metrics · Observability
0 likes · 11 min read
vivo Internet Technology
Dec 14, 2022 · Cloud Native

Vivo’s Cloud‑Native Container Practices: High‑Availability, Automation, and Platform Evolution

Vivo’s cloud‑native journey, detailed from its 2018 machine‑learning pilot to a large‑scale container ecosystem, showcases how high‑availability design, automated multi‑cluster operations, CI/CD pipelines, and unified traffic ingress have dramatically improved efficiency, reduced costs, and enabled rapid, scalable AI‑driven services across the business.

Container · Kubernetes · automation
0 likes · 19 min read
Efficient Ops
Dec 12, 2022 · Operations

How Bilibili Built a 5‑Year SRE Journey: High‑Availability, Multi‑Active, and Capacity Management

This article chronicles Bilibili's five‑year evolution of Site Reliability Engineering, detailing the introduction of SRE culture, the construction of high‑availability and multi‑active architectures, capacity management with Kubernetes, VPA/HPA, incident case studies, and the ongoing transformation of SRE practices across the organization.

Kubernetes · Operations · SRE
0 likes · 24 min read
Java High-Performance Architecture
Dec 5, 2022 · Backend Development

How to Build a High‑Performance, Highly Available Membership System with ES, Redis, and MySQL

This article explains how a large‑scale membership system achieves high performance and high availability by using a dual‑center Elasticsearch cluster, traffic‑isolated three‑cluster architecture, Redis caching with dual‑center clusters, and a MySQL partitioned dual‑center setup, while also detailing optimization, migration, and fine‑grained flow‑control strategies.

Backend Architecture · Elasticsearch · MySQL
0 likes · 21 min read
DataFunSummit
Dec 4, 2022 · Big Data

Star River Data Scheduling Platform: Architecture, Evolution, and Intelligent Operations at 58.com

This article details the design, evolution, and core capabilities of 58.com's self‑developed Star River data scheduling platform, covering its positioning, architectural challenges, high‑availability master design, intelligent monitoring, baseline management, and future roadmap for big‑data operations.

Intelligent Operations · Resource Isolation · high availability
0 likes · 15 min read
High Availability Architecture
Dec 2, 2022 · Operations

High‑Availability Design and Implementation of the BIGO Backbone Network

This article explains how BIGO’s backbone network achieves high availability through a three‑layer design—control‑plane HA using ETCD‑based Raft leader election, data‑plane HA with MPLS SR‑Policy and intermediate Route‑Reflection layers, and business‑level HA that combines traffic, optimization, and fault scheduling to ensure seamless service continuity.

MPLS · SDN · SR-Policy
0 likes · 19 min read
Aikesheng Open Source Community
Dec 1, 2022 · Databases

Understanding Redis Cluster Architecture: High Availability, Data Partitioning, and Proxy Strategies

This article explains the fundamental concepts of Redis cluster architecture, covering high‑availability with Sentinel, data partitioning methods, proxy‑based sharding techniques, the mechanics of Redis Cluster without a central node, and practical considerations for multi‑key operations in a distributed environment.

Cluster · Data Partitioning · Proxy
0 likes · 9 min read
Bilibili Tech
Nov 29, 2022 · Big Data

How Bilibili Supercharged Flink: Checkpoint, HA, and Runtime Optimizations

This article details Bilibili's extensive enhancements to Flink's runtime—including checkpoint recoverability, operator ID stability, state processor extensions, hybrid high‑availability, regional checkpointing, and load‑based channel selection—to improve scalability, reliability, and operational efficiency of large‑scale streaming jobs.

Big Data · Checkpoint · Flink
0 likes · 32 min read
Baidu Intelligent Cloud Tech Hub
Nov 28, 2022 · Cloud Computing

How Baidu’s ARIES Powers Exabyte-Scale Cloud Storage for Baidu Netdisk

This article presents a comprehensive overview of Baidu’s ARIES storage platform, detailing its design philosophy, architecture, key concepts, and engineering challenges, and explains how it underpins Baidu Netdisk’s massive data‑plane storage with high availability, cost‑performance trade‑offs, and robust monitoring.

Distributed Systems · Resource Management · cloud storage
0 likes · 36 min read
HelloTech
Nov 22, 2022 · Operations

Guidelines for Incident Postmortem and Fault Review

The incident postmortem guideline advocates a dialectical view of failures, rapid low‑severity recovery, and a structured process—covering background, impact scope, timeline replay, deep root‑cause analysis, SMART improvement actions, responsibility assignment, and PDCA‑validated closure—to enhance system resilience, team anti‑fragility, and knowledge sharing.

MTBF · MTTR · Operations
0 likes · 15 min read
Programmer DD
Nov 21, 2022 · Databases

How to Achieve Seamless Horizontal Scaling with 2N Expansion and Keepalived

This guide explains how to scale a sharded database horizontally by adding new nodes, handling data migration with stop‑service, stop‑write, log‑based, dual‑write, and smooth 2N strategies, and implementing high‑availability using MariaDB double‑master replication and Keepalived, complete with configuration examples and code snippets.

Data Migration · MariaDB · high availability
0 likes · 35 min read
Bilibili Tech
Nov 15, 2022 · Operations

Technical Assurance for Bilibili S12 Live Streaming Event: Architecture, Resource Management, and High Availability

To ensure “tea‑time” reliability for Bilibili’s 2022 S12 League of Legends championship, a cross‑functional technical‑assurance project introduced shared resource pools, CPUSET removal, multi‑instance HA architecture, adaptive throttling, chaos‑engineered fault injection, a new Golang gateway, extensive load testing, and coordinated on‑site duty, delivering uninterrupted live streaming without forced throttling.

SRE · chaos engineering · high availability
0 likes · 20 min read
21CTO
Nov 8, 2022 · Operations

Building a Billion‑User Membership System: ES, Redis & MySQL High‑Availability

This article details how a large‑scale membership platform achieves high performance and near‑zero downtime by employing dual‑center Elasticsearch clusters, traffic‑isolated ES architectures, deep ES optimizations, Redis caching with distributed locks, and a seamless MySQL migration with partitioned, dual‑center databases.

Operations · System Architecture · high availability
0 likes · 20 min read
Top Architect
Nov 7, 2022 · Cloud Native

Step‑by‑Step Deployment of a Highly Available Kubernetes Cluster with Nginx/Keepalived Load Balancer, Flannel CNI, IPVS, Dashboard, and Harbor Registry

This comprehensive guide walks you through installing Docker and containerd, configuring yum repositories, setting up kubeadm/kubelet/kubectl, initializing a multi‑master Kubernetes cluster, enabling Flannel CNI and IPVS, building a Nginx‑Keepalived HA load balancer, deploying the Kubernetes dashboard, configuring NFS storage with a dynamic provisioner, and installing a secure Harbor image registry for private images.

Flannel · Harbor · Kubernetes
0 likes · 44 min read
ITPUB
Nov 6, 2022 · Databases

Master‑Slave, Sentinel, and Cluster: Unlocking Redis High Availability

This guide explains Redis high‑availability mechanisms, covering master‑slave replication, the Sentinel monitoring and automatic failover process, and the Redis Cluster sharding architecture, including hash slots, MOVED/ASK redirection, gossip communication, and practical considerations such as data consistency, network partitions, and slot allocation.

Cluster · Gossip · Hash Slots
0 likes · 25 min read
Top Architect
Nov 4, 2022 · Cloud Native

Step-by-Step Guide to Deploy a High‑Availability Kubernetes Cluster with Dashboard, Nginx/Keepalived, NFS, Harbor, and Ingress

This comprehensive tutorial walks through preparing hosts, installing Docker and containerd, setting up Kubernetes components, initializing a HA master cluster, configuring networking, deploying the Kubernetes dashboard, NFS storage, Harbor registry, and an Nginx/Keepalived load balancer, all with detailed commands and configuration files.

Cloud Native · Docker · Harbor
0 likes · 41 min read
Shopee Tech Team
Nov 3, 2022 · Cloud Native

Design of a Cloud-Native High-Availability Service Naming Center

The article describes ShopeePay’s cloud‑native, high‑availability Naming Center built on sRPC, which replaces etcd with an in‑memory, peer‑to‑peer service registry that uses gossip heartbeats, self‑protection, and a coordinator for eventual consistency, rapid recovery, and seamless upgrades.

Consistency · Naming Service · cloud-native
0 likes · 18 min read
DataFunTalk
Nov 3, 2022 · Databases

Enhancing ClickHouse High Availability: Reducing ZooKeeper Load, Faster Recovery, and Additional Reliability Improvements

ByteDance’s article details the high‑availability challenges of ClickHouse in large‑scale deployments—such as frequent failures, long recovery times, and operational complexity—and explains three key enhancements: a new HaMergeTree engine to lessen ZooKeeper load, RocksDB‑based metadata persistence for faster restarts, and additional reliability features like HaKafka and monitoring tools.

ClickHouse · Database Engineering · HaMergeTree
0 likes · 10 min read
Architecture Digest
Oct 30, 2022 · Backend Development

High‑Availability Architecture for a Large‑Scale Membership System

This article details the design and implementation of a high‑availability, high‑performance membership system that serves billions of users across multiple platforms, covering Elasticsearch dual‑center clusters, traffic‑isolated three‑cluster setups, Redis caching strategies, MySQL dual‑center partitioning, and advanced flow‑control and degradation mechanisms.

Elasticsearch · Partitioning · System Architecture
0 likes · 18 min read
Efficient Ops
Oct 25, 2022 · Cloud Native

How Guangdong Mobile Built a Resilient Container Cloud from Scratch

This article details Guangdong Mobile's end‑to‑end journey of designing, constructing, and operating a production‑grade container cloud platform, covering architecture decisions, monitoring, logging, high‑availability, scaling, network optimization, upgrade challenges, and lessons learned for cloud‑native practitioners.

Cloud Native · DevOps · Kubernetes
0 likes · 26 min read
Top Architect
Oct 19, 2022 · Databases

MySQL Replication, High Availability, and Sharding: Concepts and Solutions

This article explains the evolution from single‑node MySQL databases to master‑slave replication, various replication modes, high‑availability strategies, and both vertical and horizontal sharding techniques, while discussing the associated challenges such as distributed transactions, routing, and operational complexity.

Distributed Transactions · MySQL · Replication
0 likes · 11 min read
Architect's Guide
Oct 19, 2022 · Databases

Database Scaling, Replication, High Availability, and Sharding Overview

This article explains why single‑node databases cannot keep up with rapid business growth and describes MySQL replication methods, high‑availability solutions such as MHA and MGR, and the challenges and techniques of vertical and horizontal sharding for large‑scale systems.

MySQL · Replication · database scaling
0 likes · 11 min read
Aikesheng Open Source Community
Oct 18, 2022 · Databases

Why MySQL Remains Popular and Its Logical Architecture

The article explains MySQL's continued dominance in the database market, presents recent developer survey statistics, describes its three‑layer logical architecture, and introduces the new edition of the "High Performance MySQL" book as a guide for modern database professionals.

Database Architecture · MySQL · Open-source
0 likes · 7 min read
DeWu Technology
Oct 17, 2022 · Operations

High Availability: Principles and Practices for System Stability

High availability—measured in nines of uptime—requires partitioning systems, decoupling components, choosing robust technologies, deploying redundant instances with automatic failover, capacity planning, rapid scaling, traffic shaping, resource isolation, global protection, observability, and disciplined change management to achieve stable, resilient services.

Observability · capacity planning · change management
0 likes · 10 min read
Open Source Linux
Oct 11, 2022 · Cloud Native

How to Build a Fully HA Kubernetes Cluster with Nginx, Keepalived, and Harbor

This step‑by‑step guide walks you through deploying a production‑grade Kubernetes environment, covering node preparation, Docker and containerd setup, kubeadm initialization, high‑availability configuration with Nginx and Keepalived, installing the dashboard, and setting up a private Harbor registry with NFS storage, all using cloud‑native best practices.

Docker · Harbor · Kubernetes
0 likes · 41 min read
Programmer DD
Oct 11, 2022 · Operations

How to Achieve High Availability for Stateful Backend Services?

This article explores various high‑availability strategies for stateful backend services, comparing cold backup, active/standby, same‑city active‑active, and multi‑site active‑active solutions, discussing their benefits, limitations, and practical implementation examples from large‑scale internet companies.

Active-Active · Backend Architecture · disaster recovery
0 likes · 17 min read
DevOps Cloud Academy
Oct 4, 2022 · Operations

Production Considerations for Deploying Linkerd: HA, Helm Charts, Prometheus, and Multi‑Cluster

This article explains how to prepare Linkerd for production use by covering high‑availability deployment, Helm chart installation, Prometheus metric handling, external Prometheus integration, multi‑cluster communication, and additional operational best‑practices such as resource tuning and security considerations.

Kubernetes · Linkerd · Multi‑Cluster
0 likes · 12 min read
MaGe Linux Operations
Sep 30, 2022 · Operations

Mastering High Availability: From Cold Backups to Multi-Region Active-Active

This article examines high‑availability strategies for stateful backend services, comparing cold backup, hot standby, same‑city active‑active, cross‑region active‑active, and multi‑active architectures, highlighting their advantages, limitations, and practical implementation considerations such as downtime, data loss, synchronization overhead, and conflict resolution.

Active-Active · Backend Architecture · high availability
0 likes · 14 min read
Bilibili Tech
Sep 30, 2022 · Databases

Database Failure Management: Types, Mitigation Strategies, and Bilibili’s Practices

The article outlines common database and cache failures—such as instance outages, replication lag, data corruption, and cache avalanches—while detailing Bilibili’s mitigation strategies including high‑availability architectures, scaling, multi‑active designs, proxy controls, slow‑query alerts, fault‑injection drills, and ongoing resilience improvements.

Bilibili · Cache · MySQL
0 likes · 17 min read
Laravel Tech Community
Sep 27, 2022 · Databases

Understanding Redis: Overview, Architecture, and Persistence Model

Redis is an open‑source in‑memory key‑value data‑structure server that serves as a cache, primary database, and messaging system; this article explains its core concepts, deployment options (single instance, HA, Sentinel, Cluster), and persistence mechanisms (RDB, AOF, and hybrid approaches).

In-Memory Database · Persistence · clustering
0 likes · 18 min read
Tencent Cloud Developer
Sep 27, 2022 · Big Data

GooseFS: Accelerating Cloud Storage for Big Data and Data Lake Platforms

GooseFS, Tencent Cloud’s Hadoop‑compatible storage accelerator, adds a local NVMe‑SSD cache layer to cloud‑native data lakes, letting users boost query speeds by up to 46 % and cut backend bandwidth by 200 Gbps without code changes, as demonstrated by a music‑industry customer’s 200‑node deployment caching ten million files.

Cost reduction · Data Lake · GooseFS
0 likes · 16 min read
DeWu Technology
Sep 26, 2022 · Cloud Native

DeWu's High‑Availability Architecture Evolution

DeWu’s tech team describes how their e‑commerce platform grew from a simple PHP monolith to a containerized active‑active, multi‑region system with hot‑standby failover, comprehensive governance, full‑link stress testing, and detailed big‑sale preparation, illustrating a systematic, evolving high‑availability architecture that balances scalability, disaster recovery, and business continuity.

Microservices · System Architecture · disaster recovery
0 likes · 21 min read
Architects' Tech Alliance
Sep 19, 2022 · Operations

Fundamentals of Data Replication, Backup, and Disaster Recovery

This article explains the core concepts of disaster recovery and data backup—including RTO, RPO, recovery levels, cloud disaster recovery, backup types, copy data management, deduplication, compression, and block/file/database backup—while also noting related commercial offerings.

Copy Data Management · RPO · RTO
0 likes · 13 min read
Senior Brother's Insights
Sep 14, 2022 · Backend Development

From Single Server to Cloud‑Native: 14 Stages of Scaling a Large‑Scale Website

This article walks through the evolution of a high‑traffic e‑commerce site—from a single‑machine setup to cloud‑native microservices—detailing each architectural milestone, the problems it solves, key technologies involved, and design principles for building scalable, highly available systems.

Cloud Native · Scalability · architecture evolution
0 likes · 22 min read
Huolala Tech
Sep 8, 2022 · Databases

Why Build Your Own Database Middleware in the Multi‑Cloud Era?

The article explains why, contrary to common belief, the rise of multi‑cloud environments actually demands self‑built database middleware to ensure seamless adaptation, vendor neutrality, high availability, and cost‑effective scalability for growing enterprise workloads.

Database Middleware · Operations · Scalability
0 likes · 18 min read
Alibaba Cloud Big Data AI Platform
Sep 5, 2022 · Big Data

Scaling Alibaba TCC to Millions of RPS with a High‑Availability Real‑Time Data Warehouse

This article details how Alibaba's TCC platform evolved its architecture over multiple phases—from a legacy database to a high‑availability real‑time data warehouse built on Flink and Hologres—highlighting the challenges, solutions, and cost‑saving measures that enabled millions of RPS, terabytes of storage, and sub‑second query latency.

Flink · Hologres · Real-Time
0 likes · 21 min read
IT Architects Alliance
Sep 4, 2022 · Databases

Mastering MySQL: From Replication to High Availability and Sharding Strategies

This article examines why single-node databases no longer meet modern internet workloads, explores MySQL replication models (master‑slave, asynchronous, semi‑synchronous, group replication), discusses high‑availability solutions such as MHA, MGR and Orchestrator, and outlines vertical and horizontal sharding techniques along with their trade‑offs.

Database Architecture · MGR · MHA
0 likes · 13 min read
Tongcheng Travel Technology Center
Sep 2, 2022 · Cloud Computing

Design and Implementation of a Scalable, High‑Availability Object Storage Service (OSS) Based on SeaweedFS

This article describes the design goals, technology selection, architecture, high‑availability mechanisms, performance testing, cost optimization, and seamless migration strategy of a new object storage service built on SeaweedFS to support billions of files with low latency and high reliability.

Cloud Native · S3 Compatibility · Scalability
0 likes · 16 min read
Open Source Linux
Sep 1, 2022 · Operations

What’s New in Zabbix 6.0? Enhanced Monitoring, HA, AI & Cloud Features Explained

Zabbix 6.0 introduces a suite of enhancements—including high‑availability clustering, advanced business‑service monitoring with SLA calculations, root‑cause analysis, machine‑learning‑based anomaly detection, Kubernetes templates, a redesigned audit log, TLS certificate checks, UI improvements, customizable branding, and new integrations—aimed at boosting operational visibility and efficiency across cloud and on‑premise environments.

Kubernetes · Operations · Zabbix
0 likes · 12 min read
Alibaba Cloud Developer
Aug 30, 2022 · Operations

Mastering Cloud Business Stability: Proven Methods & Real-World Cases

This whitepaper presents a comprehensive methodology for ensuring cloud‑based business stability, covering conceptual frameworks, fault‑management processes, change‑control standards, and detailed industry case studies such as new‑game launches, container deployments, live‑event streaming, and high‑availability architecture design.

Case Studies · change control · cloud stability
0 likes · 2 min read
Tencent Cloud Developer
Aug 29, 2022 · Cloud Computing

High‑Availability DNS Solutions on Tencent Cloud: BIND and CoreDNS with ETCD

The article details two high‑availability DNS implementations for Tencent Cloud—an intelligent BIND‑based server and a CoreDNS solution backed by an ETCD cluster—covering DNS fundamentals, installation steps, configuration files, zone creation, health checks, and verification of internal and external name resolution across multi‑AZ deployments.

BIND · CoreDNS · DNS
0 likes · 24 min read
Programmer DD
Aug 13, 2022 · Operations

Master Nginx Rewrite, Anti-Hotlinking, and Keepalived HA

This guide explains Nginx rewrite syntax and flags, provides practical rewrite examples, demonstrates how to configure anti‑hotlinking, outlines static‑dynamic resource separation with caching, and shows step‑by‑step installation and configuration of Keepalived for high‑availability Nginx clusters, including required scripts and host settings.

anti-hotlinking · high availability · keepalived
0 likes · 27 min read
Wukong Talks Architecture
Aug 10, 2022 · Backend Development

Designing API Gateways: Comparison, Key Considerations, and Best Practices

This article explains how to design API gateways by comparing common gateway solutions, outlining essential design aspects such as routing, service discovery, load balancing, resilience, security, and high performance, availability, and scalability, and summarizing the strengths of OpenResty, Kong, Zuul, and Spring Cloud Gateway.

Kong · Spring Cloud Gateway · api-gateway
0 likes · 24 min read
Architecture Digest
Jul 28, 2022 · Backend Development

Design and Smooth Migration of a High‑Availability Message Middleware Platform from RabbitMQ to RocketMQ

This article details the challenges of scaling RabbitMQ, the evaluation of RocketMQ versus Pulsar, the architectural design of a new high‑availability message middleware platform, and the step‑by‑step smooth migration strategy that enables higher throughput, richer features, and lower operational costs.

RabbitMQ · RocketMQ · high availability
0 likes · 12 min read
Xiao Lou's Tech Notes
Jul 26, 2022 · Backend Development

How to Assemble a Production‑Ready Service Registry from Scratch

This article walks through the complete design of a service registry—from requirement analysis and interface definition to push mechanisms, health‑check strategies, long‑connection technology choices, data storage options, and high‑availability considerations—providing a practical blueprint for building a production‑grade registry.

gRPC · high availability · service discovery
0 likes · 15 min read
Senior Brother's Insights
Jul 24, 2022 · Databases

Understanding Redis High Availability: Master‑Slave, Sentinel, and Cluster Explained

This article explains why single‑node Redis suffers from single‑point failures, describes the master‑slave replication model, details the Sentinel automatic failover mechanism, compares sharding solutions such as client‑side sharding, Twemproxy, and Codis, and outlines the features and deployment considerations of Redis Cluster.

Cluster · database · high availability
0 likes · 19 min read
IT Services Circle
Jul 18, 2022 · Databases

Migrating MySQL Dual-Master High Availability to Master‑Slave Architecture: Lessons Learned and Simple Conversion Steps

After a month of testing a MySQL dual‑master high‑availability setup, the author details the numerous pitfalls encountered—including primary key collisions, sync failures, and data inconsistencies—and explains why they switched to a simpler master‑slave configuration, providing step‑by‑step instructions for the conversion.

Database Replication · Master‑Slave · MySQL
0 likes · 8 min read
MaGe Linux Operations
Jul 17, 2022 · Databases

Understanding Redis High Availability: Master‑Slave, Sentinel, and Cluster Explained

This article explains how Redis tackles single‑point failures with master‑slave replication, introduces Sentinel for automatic failover, compares client‑side and proxy sharding solutions like Twemproxy and Codis, and details the native Redis Cluster architecture for true distributed storage and high availability.

Cluster · high availability · redis
0 likes · 18 min read
Top Architect
Jul 15, 2022 · Backend Development

Design and Evolution of Meituan's Real-Time Logistics Distributed System

This article details Meituan's real‑time logistics platform architecture, covering its background, distributed system design, high‑availability deployment, AI‑driven optimization, and future challenges, while sharing practical solutions for scalability, fault tolerance, and operational efficiency in a high‑concurrency environment.

Distributed Systems · Meituan · Microservices
0 likes · 9 min read
High Availability Architecture
Jul 12, 2022 · Operations

Postmortem of the July 13, 2021 Bilibili SLB Outage: Timeline, Root Cause, and Improvement Measures

This article details the July 13, 2021 Bilibili service outage caused by a Lua‑based SLB CPU spike, describing the incident timeline, root‑cause analysis of a weight‑zero bug, mitigation steps including new SLB deployment, and the subsequent operational and architectural improvements.

Load Balancer · Lua · Root Cause Analysis
0 likes · 17 min read
Bilibili Tech
Jul 12, 2022 · Operations

Bilibili SLB Outage Postmortem (July 13, 2021): Timeline, Root Cause, and Improvements

On July 13, 2021, Bilibili’s L7 SLB crashed when a recent Lua deployment set a balancer weight to the string “0”, producing a NaN value that triggered an infinite loop and 100% CPU usage, prompting emergency restarts, a fresh cluster rollout, and long‑term safeguards such as automated provisioning, stricter Lua validation, and enhanced multi‑active disaster‑recovery processes.

Load Balancer · Root Cause Analysis · SLB
0 likes · 17 min read
Architect
Jul 11, 2022 · Databases

Understanding Redis: Features, Use Cases, and Architectural Evolution

Redis is an open‑source, in‑memory data store that supports various data types, persistence, replication, Sentinel, clustering, Lua scripting, pipelines, and distributed locks, and the article walks through its evolution from simple caching to a high‑availability, horizontally scalable database solution.

Cluster · In-Memory Database · caching
0 likes · 12 min read
Efficient Ops
Jul 6, 2022 · Databases

How DataBus Enables Real-Time, Scalable Database Synchronization for Oracle Migration

DataBus is a real‑time data synchronization framework designed to support Oracle de‑commissioning, micro‑service migration, and heterogeneous storage engines by providing high‑availability CDC, flexible data pipelines, and seamless full‑to‑incremental migration across multiple source and target databases.

CDC · data synchronization · database migration
0 likes · 19 min read
ITPUB
Jul 3, 2022 · Operations

How Keepalived Achieves High‑Availability Load Balancing with LVS and VRRP

This article explains Keepalived’s high‑availability architecture, detailing its integration with LVS/IPVS, VRRP‑based master election, configuration parameters like state, priority, nopreempt, and weight, and how traffic is forwarded and balanced across real servers using various scheduling algorithms.

IPVS · LVS · VRRP
0 likes · 15 min read
Sanyou's Java Diary
Jun 30, 2022 · Databases

Mastering Redis High Availability: Sharding, Consistent Hashing, and Sentinel Explained

This article explains Redis high‑availability strategies, covering basic hash sharding, the advantages of consistent hashing, client‑side versus proxy‑based partitioning, master‑slave replication, and the Sentinel failover mechanism, with diagrams illustrating node addition, removal, and failover decision processes.

consistent hashing · database · high availability
0 likes · 10 min read
Architecture Digest
Jun 30, 2022 · Information Security

Design and Implementation of a Jump Server Using Linux PAM for Secure Access

This article presents a jump server solution that leverages Linux PAM to intercept authentication, outlines its micro‑service architecture, describes login, command, and privilege flows for Linux, Windows, MySQL, Redis and network devices, and discusses permission rules, high‑availability design, and security advantages.

Jump Server · Microservices · high availability
0 likes · 15 min read
IT Architects Alliance
Jun 27, 2022 · Databases

Mastering Cluster Terminology and Database Cluster Architectures

This article explains core cluster concepts, the benefits of building database clusters, and common cluster types. It compares scalable architectures such as Oracle RAC, MySQL Cluster, and sharding, then covers CAP/BASE theory and cross‑database transaction strategies for high availability and performance.

CAP theorem · Database Cluster · MySQL Cluster
0 likes · 22 min read
Architecture Digest
Jun 27, 2022 · Backend Development

Design and Evolution of Baidu Comment Platform: Architecture, Performance Optimization, and Stability

This article details the architecture, design principles, performance enhancements, and reliability strategies of Baidu's comment middle‑platform, which supports billions of daily requests across dozens of products while ensuring high availability, low latency, and continuous iterative development.

Backend Architecture · Comment System · distributed services
0 likes · 17 min read
Architect's Guide
Jun 26, 2022 · Backend Development

Building a Million‑Message‑Per‑Second RabbitMQ Service: Architecture, Scaling, and High Availability

This article explains how to design and operate a RabbitMQ cluster capable of handling millions of messages per second by describing RabbitMQ fundamentals, Google‑scale deployment, sharding and consistent‑hash plugins, high‑availability mirroring, federation, and integration with Spring AMQP, while also covering practical deployment scenarios and performance trade‑offs.

Federation · Message Queue · RabbitMQ
0 likes · 23 min read
Architecture Digest
Jun 23, 2022 · Backend Development

Design and Implementation of Baidu App’s Personal Wallet: Architecture, Data Synchronization, Caching, and High‑Availability Strategies

This article presents a comprehensive case study of Baidu App’s personal wallet, detailing its background, business goals, system architecture, data‑synchronization mechanisms, multi‑level caching, read‑write separation, consistency guarantees, configuration management, and database sharding to achieve high availability and scalable performance.

Backend · System Architecture · caching
0 likes · 18 min read
HaoDF Tech Team
Jun 21, 2022 · Operations

Evolution and High‑Availability Construction of the Haodafu Offline Message Push System

This article describes how the Haodafu offline push service grew from a simple PHP notification tool into a robust, highly‑available micro‑service platform by redesigning architecture, adopting vendor push channels, adding message‑queue reliability, implementing comprehensive monitoring, observability, and a fault‑diagnosis platform to ensure delivery rates and operational stability.

Mobile Backend · Observability · SRE
0 likes · 21 min read
IT Services Circle
Jun 21, 2022 · Databases

MySQL High‑Availability Incident Review and Resolution in a Dual‑Master Setup with Keepalived

This article recounts a MySQL high‑availability incident in a dual‑master environment, explains how missing binary‑log index files caused replication failures, and details step‑by‑step troubleshooting, directory recreation, binlog position correction, and configuration improvements to restore reliable database operation.

MySQL · Replication · databases
0 likes · 8 min read
Baidu Geek Talk
Jun 20, 2022 · Backend Development

How Baidu’s “My Wallet” Unified User Assets with Scalable Backend Architecture

This article details the design and implementation of Baidu App’s “My Wallet” feature, covering its background, business goals, system architecture, data synchronization, multi‑level caching, read‑write separation, data consistency, configurability, and database sharding to achieve high availability and performance for billions of users.

Backend · Read-Write Separation · Scalability
0 likes · 18 min read
Aikesheng Open Source Community
Jun 17, 2022 · Databases

MySQL Database High‑Availability Architecture Exploration – Highlights from the 2022 Gdevops Global Agile Operations Summit

The 2022 Gdevops Global Agile Operations Summit in Guangzhou featured a technical session by Zheng Zengquan of Aikesheng on MySQL high‑availability architecture, covering data consistency, production‑ready solutions, and a comparison of MGR versus traditional master‑slave setups, with the full PPT available for download.

Database Architecture · Gdevops Summit · MySQL
0 likes · 3 min read
Sanyou's Java Diary
Jun 15, 2022 · Databases

How to Build MySQL Master‑Master HA with Keepalived and Docker

This tutorial walks through setting up a highly available MySQL master‑master cluster using Docker containers, configuring MySQL replication, and employing Keepalived for automatic failover and virtual IP management, complete with step‑by‑step commands, configuration files, and troubleshooting tips.

Docker · Linux · Master-Master Replication
0 likes · 23 min read
vivo Internet Technology
Jun 15, 2022 · Cloud Native

Vivo Container Cluster Monitoring Architecture and Cloud‑Native Observability Practices

Vivo’s cloud‑native monitoring solution combines high‑availability Prometheus clusters, VictoriaMetrics storage, Grafana visualization, and a custom leader‑election adapter to deduplicate data while forwarding metrics to Kafka and OLAP systems, addressing large‑scale performance, scalability, and integration challenges and paving the way for AIOps.

Cloud Native Monitoring · Kubernetes · Observability
0 likes · 18 min read
DataFunSummit
Jun 2, 2022 · Databases

An In‑Depth Overview of Apache BookKeeper: Architecture, Features, and Use Cases

This article provides a comprehensive technical overview of Apache BookKeeper, covering its role as a distributed append‑only log service, core concepts, high‑availability mechanisms, storage‑media evolution, comparisons with Raft, and community resources, while illustrating its use in Pulsar and large‑scale data platforms.

Apache BookKeeper · Data Infrastructure · Distributed Log
0 likes · 12 min read
DataFunTalk
May 30, 2022 · Big Data

ByteGraph: ByteDance’s Self‑Developed Graph Database – Architecture, Data Model, Query Language, and Operational Challenges

This article introduces ByteDance’s self‑developed graph database ByteGraph, covering its fundamentals, use‑case scenarios, data model and Gremlin query language, architecture and implementation details, and key challenges such as indexing, hot‑spot handling, resource allocation, high availability, and offline‑online data fusion.

ByteGraph · Graph Database · Gremlin
0 likes · 14 min read
IT Architects Alliance
May 28, 2022 · Operations

Why Circuit Breaking and Degradation Are Essential for High‑Availability Microservices

The article explains how microservice architectures can suffer from cascading failures, why circuit breaking and degradation are critical for protecting service availability, compares popular libraries such as Sentinel, Hystrix and Resilience4j, and dives deep into Sentinel's degradation implementation, rule definition, data collection, verification, and execution flow.

Circuit Breaking · Microservices · Resilience
0 likes · 12 min read
Architecture Digest
May 19, 2022 · Operations

Designing High‑Availability Stateless Services: Redundancy, Load Balancing, Scaling, and Monitoring

The article explains how to build highly available stateless services by using redundant deployment, vertical and horizontal scaling, appropriate load‑balancing algorithms, monitoring, and automated recovery, and also discusses high‑concurrency identification, CDN/OSS usage, and practical recommendations for cloud‑native environments.

Vertical Scaling · high availability · horizontal scaling
0 likes · 11 min read
IT Architects Alliance
May 15, 2022 · R&D Management

Mastering Technical Architecture: Strategic & Tactical Design Principles

This article explains how technical architecture transforms product requirements into concrete systems, tackles uncertainty in technology choices, and presents strategic principles—fit, simplicity, evolution—alongside tactical guidelines for high concurrency, high availability, and business design, supported by logical and physical diagram examples.

System Design · Technical architecture · design principles
0 likes · 15 min read
Xiaolei Talks DB
May 15, 2022 · Databases

How TiDB Achieves Multi-Active High Availability Across Multiple Data Centers

This article explains TiDB's multi‑active high‑availability architectures, including same‑city dual‑center, triple‑center, and two‑region three‑center deployments. It details hard requirements, RPO/RTO goals, placement‑rule configurations, and practical disaster‑recovery recommendations for distributed database clusters, and describes how adaptive sync modes affect failover performance.

Placement Rule · TiDB · high availability
0 likes · 12 min read
Top Architect
May 14, 2022 · Backend Development

Strategic and Tactical Design Principles for Technical Architecture

This article explains how to design robust technical architectures by addressing strategic principles such as suitability, simplicity, and evolution, and tactical guidelines covering high concurrency, high availability, and business design, while illustrating logical and physical architecture diagrams for real‑world systems.

Software Architecture · System Design · design principles
0 likes · 14 min read
JavaEdge
May 14, 2022 · Backend Development

Unveiling Kafka’s Triple‑High Architecture: Availability, Performance, and Concurrency

This article breaks down Kafka’s high‑availability, high‑performance, and high‑concurrency design, covering controller and leader election, replica and ISR mechanisms, ACK settings, the Reactor NIO model, zero‑copy I/O, compression, producer batching, memory‑pooling, and the multi‑layer network threading architecture.

Kafka · Network Concurrency · Producer Batching
0 likes · 21 min read