Tagged articles
1414 articles
Page 7 of 15
IT Architects Alliance
Nov 5, 2021 · Operations

Introduction to Linux Virtual Server (LVS): Architecture and Features

This article provides a comprehensive overview of Linux Virtual Server (LVS), covering its basic concepts, three‑tier architecture, load‑balancing techniques, scheduling algorithms, high availability, reliability, and suitable deployment environments for building high‑performance, scalable server clusters.
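The summary mentions LVS scheduling algorithms without naming them. One of the standard IPVS schedulers is weighted round robin (`wrr`). The Python sketch below models its interleaved selection loop under stated assumptions: the server names and integer weights are illustrative, and connection tracking is ignored. It is not the kernel's IPVS code.

```python
from functools import reduce
from math import gcd


class WeightedRoundRobin:
    """Sketch of interleaved weighted round-robin selection (LVS 'wrr' style).

    Server names and weights are illustrative assumptions.
    """

    def __init__(self, servers):
        # servers: list of (name, weight) tuples with non-negative integer weights
        self.servers = list(servers)
        weights = [w for _, w in self.servers]
        self.max_weight = max(weights)
        self.step = reduce(gcd, weights)  # gcd of all weights
        self.index = -1                   # index of the last selected server
        self.current_weight = 0           # current weight threshold

    def pick(self):
        if self.max_weight == 0:
            return None  # every server has weight 0 (quiesced)
        n = len(self.servers)
        while True:
            self.index = (self.index + 1) % n
            if self.index == 0:
                # Finished a pass: lower the threshold by the gcd of the weights
                self.current_weight -= self.step
                if self.current_weight <= 0:
                    self.current_weight = self.max_weight
            name, weight = self.servers[self.index]
            if weight >= self.current_weight:
                return name
```

With weights 4, 3 and 2, one full cycle of nine picks is `A A B A B C A B C`: each real server receives traffic in proportion to its weight, and picks stay interleaved rather than bunched.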

LVS · Server Clustering · high availability
14 min read
Tencent Qidian Tech Team
Nov 1, 2021 · Backend Development

How to Build a Scalable Distributed Timer with Redis and Time Wheel

This article explains the design of a distributed timer service using a time‑wheel data structure stored in Redis, covering application scenarios, required features, architecture components such as access layer, scheduler, worker, and management center, and detailing reliability and performance techniques.
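To make the time‑wheel idea concrete, here is a minimal single‑level hashed timing wheel in Python. It keeps slots in an in‑memory list rather than Redis, so it only shows the slot and round arithmetic; the class and method names are hypothetical, not the article's API.

```python
class TimingWheel:
    """Minimal single-level hashed timing wheel (in memory, hypothetical names).

    The article stores the wheel in Redis; this sketch uses a Python list only
    to show how a delay maps to a slot plus a number of full "rounds".
    """

    def __init__(self, num_slots):
        self.num_slots = num_slots
        self.slots = [[] for _ in range(num_slots)]
        self.cursor = 0  # slot the wheel currently points at

    def schedule(self, delay_ticks, task):
        if delay_ticks < 1:
            raise ValueError("delay must be at least one tick")
        slot = (self.cursor + delay_ticks) % self.num_slots
        # Delays longer than one revolution wait extra full rounds
        rounds = (delay_ticks - 1) // self.num_slots
        self.slots[slot].append([rounds, task])

    def tick(self):
        """Advance one slot and return the tasks that are now due."""
        self.cursor = (self.cursor + 1) % self.num_slots
        due, pending = [], []
        for entry in self.slots[self.cursor]:
            if entry[0] == 0:
                due.append(entry[1])
            else:
                entry[0] -= 1  # one fewer revolution to wait
                pending.append(entry)
        self.slots[self.cursor] = pending
        return due
```

In a Redis‑backed variant, each slot would plausibly become its own key that a scheduler drains once per tick; that mapping is an assumption, since the summary does not describe the key layout.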

Kafka · Time Wheel · distributed timer
13 min read
IT Architects Alliance
Oct 31, 2021 · Operations

How to Build a Highly Available Redis Service with Sentinel – A Practical Guide

This article explains why Redis needs high availability, defines common failure scenarios, compares several HA architectures—including single‑instance, master‑slave with one or multiple Sentinel processes, and VIP‑based solutions—and provides step‑by‑step guidance for deploying a robust Redis Sentinel cluster.
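Running several Sentinel processes matters because failover is gated on agreement. The helpers below are a simplified sketch of the two rules described in the Redis Sentinel documentation: a master is objectively down (ODOWN) once at least `quorum` Sentinels report it unreachable, and the Sentinel that performs the failover must win votes from a majority of Sentinels (and at least `quorum`, if that is larger). Epochs, timeouts and the vote-request protocol are omitted.

```python
def is_objectively_down(sdown_reports, quorum):
    """ODOWN: at least `quorum` Sentinels (this one included) consider the master unreachable."""
    return sdown_reports >= quorum


def can_lead_failover(votes, total_sentinels, quorum):
    """The elected leader needs a majority of all Sentinels, and at least
    `quorum` votes when quorum is larger than that majority."""
    majority = total_sentinels // 2 + 1
    return votes >= max(majority, quorum)
```

With three Sentinels and `quorum` 2, losing one Sentinel still leaves the two votes needed to fail over, whereas a lone Sentinel is itself a single point of failure.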

Backend · Operations · architecture
13 min read
Java High-Performance Architecture
Oct 31, 2021 · Backend Development

Technical Architecture Mastery: Strategic & Tactical Design Principles

This article explores how to transform product requirements into robust technical architectures by outlining strategic principles—appropriateness, simplicity, evolution—and tactical guidelines such as high concurrency, high availability, and business design, while addressing uncertainty, component complexity, and practical implementation with Java‑centric examples.

Scalability · architecture · design principles
13 min read
Top Architect
Oct 27, 2021 · Backend Development

Technical Architecture Design Principles: Strategy, Tactics, and Practical Guidelines

This article explains how to design robust technical architectures by applying strategic principles of suitability, simplicity, and evolution, and tactical principles covering high concurrency, high availability, and business design, while illustrating logical and physical architecture diagrams and offering practical implementation advice.

Software Architecture · System Design · design principles
15 min read
Qingyun Technology Community
Oct 26, 2021 · Cloud Native

What Is Cloud‑Native Storage and Why It Matters for Modern Applications

Cloud‑native storage is a set of storage technologies designed for cloud‑native environments, offering high availability, scalability, performance, consistency, durability, and dynamic deployment options across public, private, and on‑premises solutions, and addressing the unique challenges of stateful applications running on Kubernetes and similar platforms.

Cloud Native · Scalability · high availability
10 min read
DataFunSummit
Oct 21, 2021 · Big Data

Presto High‑Performance Engine Practice at Meitu: Technical Selection, HA Design, and Cross‑Cluster Scheduling

This article details Meitu's adoption of the Presto ad‑hoc ROLAP engine, comparing it with Hive on Spark and Impala, describing two coordinator high‑availability solutions, and explaining the cross‑cluster scheduling architecture that leverages idle Presto resources to improve overall big‑data processing efficiency.

Big Data · Cross-Cluster Scheduling · Presto
16 min read
dbaplus Community
Oct 20, 2021 · Big Data

How JD Achieves ClickHouse High‑Availability for Billion‑Scale OLAP

JD's OLAP platform runs on ClickHouse and Doris across 3,000 servers, handling billions of daily queries and petabytes of data, and this article details the selection criteria, cluster deployment models, high‑availability architecture, operational challenges, and future roadmap.

Big Data · Cluster Deployment · Distributed Systems
21 min read
Baidu Geek Talk
Oct 20, 2021 · Operations

Practical Strategies for Building High‑Availability Systems

This article presents a comprehensive, step‑by‑step guide on improving system reliability through early fault detection, scope reduction, frequency reduction, and rapid incident handling, using real‑world practices from Baidu's commercial hosting platform.

Log Standardization · Operations · capacity planning
20 min read
MaGe Linux Operations
Oct 16, 2021 · Operations

Why Does One Kafka Broker Failure Halt All Consumers? HA & Replication Explained

The article examines Kafka’s high‑availability mechanisms, detailing its multi‑replica design, ISR synchronization, leader election, and the critical role of the __consumer_offsets topic, and explains why a single broker outage can render the entire cluster unusable unless replication factors are properly configured.
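Why one broker can stop consumption becomes clearer once you see that each consumer group is pinned to a single partition of the internal `__consumer_offsets` topic. The sketch below mirrors the classic group coordinator's mapping, `abs(groupId.hashCode()) % offsets.topic.num.partitions` (default 50), with Java's 32‑bit string hash reproduced in Python for BMP characters. The availability check is a deliberate simplification that ignores ISR membership and leader‑election details.

```python
def java_string_hash(s):
    """Java's String.hashCode() with 32-bit signed overflow (BMP characters only)."""
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    return h - (1 << 32) if h >= (1 << 31) else h


def offsets_partition_for(group_id, num_partitions=50):
    """Partition of __consumer_offsets that stores this group's offsets."""
    h = java_string_hash(group_id)
    # Kafka's Utils.abs maps Integer.MIN_VALUE to 0 instead of overflowing
    h = 0 if h == -(1 << 31) else abs(h)
    return h % num_partitions


def group_can_commit(group_id, partition_replicas, live_brokers, num_partitions=50):
    """Simplified: the group keeps working if any replica of its offsets partition is alive."""
    partition = offsets_partition_for(group_id, num_partitions)
    return any(broker in live_brokers for broker in partition_replicas[partition])
```

If `__consumer_offsets` has replication factor 1, every group that hashes to a partition on the failed broker loses its coordinator at once, even though its data topics may be perfectly healthy.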

Consumer Offset · Distributed Systems · Kafka
10 min read
Open Source Linux
Oct 14, 2021 · Operations

How to Build High‑Availability Load Balancing with Keepalived & HAProxy

This guide explains how to combine the open‑source tools Keepalived and HAProxy to create a highly available software load‑balancing solution, covering the underlying concepts, installation steps, configuration files, health‑check scripts, session persistence, SSL offloading, and traffic routing techniques.

HAProxy · Linux · Networking
28 min read
Xianyu Technology
Oct 14, 2021 · Cloud Computing

Multi-Region High Availability Architecture for Xianyu Recommendation Service

The Xianyu recommendation service was re‑architected into an active‑active, multi‑region high‑availability system—using a unified access‑layer router, centralizing long‑tail dependencies, keeping data unsharded, refactoring caches and MySQL replication, and adhering to traffic‑closed‑loop and availability‑first principles—to overcome latency, improve scalability, and ensure low‑cost disaster recovery across two regions and three data centers.

Database Replication · System Architecture · high availability
15 min read
IT Architects Alliance
Oct 12, 2021 · Backend Development

Technical Summary of Large-Scale Distributed Website Architecture

This article provides a comprehensive overview of large‑scale distributed website architecture, covering its characteristics, design goals, architectural patterns, performance, high‑availability, scalability, extensibility, security, agility, evolution stages, and practical implementation techniques such as caching, load balancing, database sharding, service‑orientation and message queues.

Distributed Systems · Scalability · caching
23 min read
ByteDance ADFE Team
Oct 12, 2021 · Fundamentals

Designing for Failure: Principles, Organizational Practices, and Technical Solutions

This article examines why failure is inevitable in software systems, proposes a mindset of failure‑oriented design, outlines organizational roles and processes to mitigate incidents, and presents concrete technical techniques such as distributed locking and traffic shaping to build resilient, high‑availability services.

Distributed Systems · failure design · high availability
25 min read
MaGe Linux Operations
Oct 9, 2021 · Operations

How to Build High‑Availability Load Balancing with HAProxy and Keepalived

This guide explains how to configure HAProxy for high‑performance TCP/HTTP load balancing and combine it with Keepalived to achieve high‑availability using VRRP, covering installation, core features, health checks, session persistence, SSL offloading, routing rules, and practical configuration examples.

HAProxy · high availability · keepalived
27 min read
Efficient Ops
Oct 8, 2021 · Operations

Why a Single Kafka Broker Failure Can Halt the Entire Cluster

This article explains Kafka's high‑availability architecture, covering multi‑replica redundancy, ISR synchronization, producer ACK settings, and the critical role of the __consumer_offsets topic, and shows how to configure replication factors to prevent a single‑node outage from stopping consumption.
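The interplay between the producer `acks` setting, the ISR and `min.insync.replicas` can be summarized as a small predicate. This is a simplified model of broker behavior written for illustration (it ignores timeouts, retries and unclean leader election), not Kafka source code.

```python
def write_accepted(acks, leader_alive, isr_size, min_insync_replicas):
    """Simplified check: can a produce request to this partition succeed?"""
    if not leader_alive:
        return False  # partition offline: no leader to accept the write
    if acks in (0, 1):
        return True   # only the leader (or nobody) has to acknowledge
    if acks in (-1, "all"):
        # acks=all: the leader rejects the write (NotEnoughReplicas) when the
        # ISR has shrunk below min.insync.replicas
        return isr_size >= min_insync_replicas
    raise ValueError(f"unsupported acks value: {acks!r}")
```

With replication factor 3, `min.insync.replicas=2` and `acks=all`, one broker can fail and writes continue; a second failure shrinks the ISR to 1, and `acks=all` writes are rejected instead of being silently under‑replicated.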

Consumer Offset · Kafka · Replication
11 min read
Tongcheng Travel Technology Center
Sep 30, 2021 · Operations

High‑Availability Architecture Design for the Integrated Membership System of Tongcheng and eLong

This article details the design and implementation of a high‑performance, highly available membership system for the merged Tongcheng‑eLong platform, covering Elasticsearch dual‑center clusters, traffic‑isolated three‑cluster architecture, deep ES optimizations, Redis caching and dual‑center clusters, MySQL dual‑center partitioning, migration strategies, and future fine‑grained flow‑control and degradation measures.

Elasticsearch · System Architecture · high availability
21 min read
Full-Stack Internet Architecture
Sep 29, 2021 · Databases

Redis Interview Questions and Core Concepts: Data Types, Performance, Persistence, High Availability, and Common Use Cases

This article provides a comprehensive overview of Redis, covering its definition, basic and special data structures, performance optimizations, expiration and eviction policies, common application scenarios, persistence mechanisms, high‑availability architectures, distributed lock implementations, transaction handling, and related algorithms such as Redlock and Bloom filters.

Cache · Data Structures · distributed-lock
45 min read
Top Architect
Sep 27, 2021 · Backend Development

Best Practices for Designing, Securing and Scaling Java Backend APIs

This article explains how to design robust Java backend APIs, covering interface definition, request/response formats, error handling, token generation, digital signing, interceptor chains, rate limiting, HTTPS migration, and strategies for high concurrency and high availability such as load balancing, clustering and caching.
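The summary lists rate limiting alongside the interceptor chain but does not say which algorithm the article uses. A token bucket is one common choice; the Python sketch below assumes one bucket per client and an injectable clock so it can be tested deterministically. It is not thread-safe.

```python
import time


class TokenBucket:
    """Token-bucket limiter: refills `rate` tokens per second, bursts up to `capacity`."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock        # injectable for tests
        self.tokens = capacity    # start full so an initial burst is allowed
        self.last = clock()

    def allow(self, cost=1):
        now = self.clock()
        # Refill in proportion to elapsed time, never above capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

In a Java service the equivalent check would naturally sit in an interceptor in front of the controllers, returning an error response when `allow()` is false; that placement is an assumption about the article's design.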

Spring Boot · api-design · backend-development
16 min read
Java Interview Crash Guide
Sep 24, 2021 · Databases

Why Redis Is More Than a Cache: From Basics to Clustering

Redis, an open‑source in‑memory data store, serves as a database, cache, and message broker, offering rich data types, persistence, replication, Sentinel, and clustering; this article walks through its evolution from simple caching to high‑availability, distributed architectures, and advanced client features.

Cluster · In-Memory Database · high availability
15 min read
Full-Stack Internet Architecture
Sep 22, 2021 · Databases

Common Redis Interview Questions and Answers

This article provides a comprehensive list of typical Redis interview questions covering its features, performance, data structures, supported data types, common use cases, eviction policies, persistence methods, clustering, high‑availability mechanisms, transaction handling, and comparisons with local caches like Guava and Caffeine.

Cache · Data Structures · Persistence
15 min read
ITPUB
Sep 17, 2021 · Databases

How NetBank Scaled Its Database: From Two‑Site Three‑Center to Three‑Site Five‑Center Architecture

This article details NetBank's evolution of database deployment—from early distributed setups to a unitized, cloud‑native architecture—covering disaster‑recovery upgrades, distributed database design, multi‑tenant strategies, containerized migration, and the performance and operational impacts of moving to a three‑site five‑center model.

containerization · disaster recovery · distributed databases
20 min read
TAL Education Technology
Sep 16, 2021 · Backend Development

Design and Architecture of MQProxy: A Distributed Message Queue Proxy for Kafka

MQProxy is a Java‑based distributed message‑queue proxy built on Apache Kafka that abstracts underlying queue selection, protocols, and health monitoring, offering developers a simple SDK with produce/consume/commit APIs, advanced features like delayed and dead‑letter queues, and a scalable architecture for high availability.

Distributed Systems · MQProxy · Message Queue
17 min read
IT Architects Alliance
Sep 12, 2021 · Operations

Mastering Service Degradation: Keep Your System Available Under Heavy Load

This article explains the concept of service degradation, defines SLA levels including "six nines" (99.9999%) availability, and details practical strategies such as fallback data, rate‑limiting, timeout handling, read/write degradation, retry mechanisms, and front‑end techniques to maintain high availability during traffic spikes.
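"Six nines" means 99.9999% availability. Converting an availability target into a downtime budget is simple arithmetic; the helper below assumes a 365‑day year.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # non-leap year


def allowed_downtime_seconds(nines, period_seconds=SECONDS_PER_YEAR):
    """Downtime budget for an availability target with `nines` nines (6 -> 99.9999%)."""
    unavailability = 10 ** (-nines)  # e.g. 1e-6 for six nines
    return period_seconds * unavailability
```

Six nines leaves about 31.5 seconds of downtime per year, versus roughly 8.76 hours for three nines, which is why degradation and fallback paths have to engage automatically rather than wait for an operator.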

Fallback · Microservices · SLA
14 min read
Architect
Sep 11, 2021 · Operations

Understanding Service Degradation and Its Practical Strategies

This article explains the concept of service degradation, its relationship with rate limiting and SLA, and presents various practical mitigation techniques such as fallback data, rate‑limit throttling, timeout handling, fault isolation, retry mechanisms, feature switches, read/write degradation, and front‑end strategies to maintain high availability during traffic spikes or component failures.
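Of the techniques listed, fault isolation via a circuit breaker is the one with explicit state. A minimal sketch, assuming a consecutive‑failure threshold and a fixed reset timeout (production libraries typically use sliding windows and richer configuration); it is not thread-safe.

```python
import time


class CircuitBreaker:
    """Minimal closed / open / half-open circuit breaker (illustrative only)."""

    def __init__(self, failure_threshold=3, reset_timeout=10.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.state = "closed"
        self.failures = 0
        self.opened_at = 0.0

    def call(self, func, fallback):
        if self.state == "open":
            if self.clock() - self.opened_at < self.reset_timeout:
                return fallback()       # fail fast while the circuit is open
            self.state = "half_open"    # let one trial request through
        try:
            result = func()
        except Exception:
            self._record_failure()
            return fallback()
        # Success closes the circuit and resets the failure count
        self.state = "closed"
        self.failures = 0
        return result

    def _record_failure(self):
        self.failures += 1
        if self.state == "half_open" or self.failures >= self.failure_threshold:
            self.state = "open"
            self.opened_at = self.clock()
```

While open, callers get the fallback immediately instead of piling up on a failing dependency, which is what keeps one broken component from exhausting threads across the whole service.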

Fallback · SLA · circuit breaker
13 min read
IT Architects Alliance
Sep 11, 2021 · Backend Development

Technical Summary of Large-Scale Distributed Website Architecture

This article provides a comprehensive technical overview of large‑scale distributed website architecture, covering its characteristics, design goals, architectural patterns, performance, high availability, scalability, extensibility, security, agility, and a detailed example evolution from a single‑server setup to a multi‑layer, service‑oriented system.

Distributed Systems · Scalability · Service Architecture
22 min read
dbaplus Community
Sep 8, 2021 · Operations

Why Does a Single Kafka Broker Crash Bring Down All Consumers?

An in‑depth look at Kafka’s high‑availability mechanisms reveals how multi‑replica design, ISR leader election, and the request.required.acks setting interact, why a single broker failure (especially of a broker hosting the __consumer_offsets topic) can halt consumption, and how to configure replication factors to prevent such outages.

Distributed Systems · ISR · Kafka
10 min read
Top Architect
Sep 8, 2021 · Backend Development

Understanding Apache RocketMQ Architecture: Components, Routing, and Message Flow

This article provides a comprehensive overview of Apache RocketMQ, detailing its core components—Namesrv, Broker, Producer, and Consumer—explaining routing registration, message storage, queue allocation strategies, and key concepts such as topics, tags, and consumer types, while comparing it with Kafka.

Kafka Comparison · Message Queue · Messaging Middleware
14 min read
DataFunTalk
Sep 4, 2021 · Big Data

High‑Availability Practices of ClickHouse in JD.com: Architecture, Deployment, and Operations

The article details JD.com’s large‑scale OLAP strategy using ClickHouse as the primary engine and Doris as a secondary engine, covering application scenarios, component selection criteria, cluster deployment models, high‑availability architecture, fault‑handling procedures, performance tuning, and future cloud‑native plans.

Big Data · Cluster Deployment · OLAP
19 min read
HelloTech
Sep 2, 2021 · Operations

How Production Full‑Link Load Testing Guarantees High Availability at Scale

The article explains why large‑scale services must conduct production full‑link load testing, describes its evolution from ad‑hoc trials to standardized monthly practices, and details the technical and procedural steps—including traffic modeling, JMeter usage, middleware tagging, and responsibility mapping—that ensure reliable capacity planning and risk mitigation.

Microservices · Operations · capacity planning
13 min read
HomeTech
Sep 1, 2021 · Databases

Case Study: TiDB Deployment for the 2021 "818 Global Auto Festival"

This case study details how Autohome leveraged TiDB 5.1.1 with a three‑data‑center, five‑replica HTAP architecture to support the high‑traffic 818 Global Auto Festival, covering background, business requirements, database selection, system design, performance challenges, solutions, and post‑event insights.

HTAP · Performance Testing · TiCDC
11 min read
IT Architects Alliance
Aug 30, 2021 · Backend Development

Tinyid: A High‑Performance Distributed ID Generation System

Tinyid is a Java‑based distributed ID generator that uses a database segment algorithm, supports multiple master databases, offers both HTTP and Java client interfaces, and provides high throughput and availability for billions of IDs daily.
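The database segment approach can be sketched in a few lines. The code below is a simplified in‑memory model, not Tinyid's implementation: `SegmentStore` is a hypothetical stand‑in for the database row that tracks `max_id` and `step`, and the `delta`/`remainder` pair shows one way several master databases can hand out non‑overlapping IDs. Preloading of the next segment, which production allocators commonly do, is omitted, and the generator is not thread-safe.

```python
import threading


class SegmentStore:
    """Hypothetical stand-in for one master DB row: max_id, step, delta, remainder."""

    def __init__(self, step, delta=1, remainder=0):
        self.max_id = 0
        self.step = step
        self.delta = delta
        self.remainder = remainder
        self._lock = threading.Lock()  # plays the role of the DB's atomic UPDATE

    def fetch_segment(self):
        # Roughly: UPDATE ... SET max_id = max_id + step, executed atomically
        with self._lock:
            start = self.max_id
            self.max_id += self.step
            return start, self.max_id  # this caller owns IDs in (start, end]


class SegmentIdGenerator:
    def __init__(self, store):
        self.store = store
        self.current = self.end = 0

    def next_id(self):
        while True:
            if self.current >= self.end:
                # Segment exhausted: one DB round trip buys `step` more IDs
                self.current, self.end = self.store.fetch_segment()
            self.current += 1
            # Each DB only issues IDs with id % delta == remainder, so two
            # masters configured with different remainders never collide
            if self.current % self.store.delta == self.store.remainder:
                return self.current
```

Because most calls are served from memory and only one call per segment touches the database, throughput scales with `step`; the trade-off is that IDs left in a segment are skipped if the process restarts.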

Microservices · distributed-id · high availability
11 min read
Java Architecture Diary
Aug 30, 2021 · Databases

How to Build a High‑Availability GreatSQL MGR Cluster with Docker‑Compose

This article explains the role of distributed architecture for high‑performance internet systems, introduces GreatSQL as a native distributed relational database, compares it with MySQL, and provides step‑by‑step Docker‑Compose instructions to set up, start, and verify a three‑node MGR cluster, plus integration with the PIG microservice platform.

Docker Compose · GreatSQL · MGR
8 min read
Programmer DD
Aug 28, 2021 · Databases

How Redis Master‑Slave Replication Works: Handshake, Sync, and Code Walkthrough

Redis, the high‑performance open‑source key‑value store, uses a master‑slave replication mechanism that ensures data redundancy, read/write separation, fault recovery, and high‑availability; this article explains its handshake process, synchronization phases, replication states, and key source‑code functions in detail.
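The heart of the synchronization phase is the master's choice between a partial and a full resynchronization. The function below is a simplified sketch of that decision: it ignores the secondary replication ID Redis keeps after a failover and the `+CONTINUE` variant that carries a new replication ID. Here `requested_offset` is the offset of the next byte the replica needs, and `backlog_start` is the oldest offset still held in the replication backlog.

```python
def psync_reply(master_replid, master_offset, backlog_start,
                requested_replid, requested_offset):
    """Simplified master-side handling of `PSYNC <replid> <offset>`."""
    can_continue = (
        requested_replid == master_replid
        # The missing bytes must still be in the backlog
        and backlog_start <= requested_offset <= master_offset + 1
    )
    if can_continue:
        return "+CONTINUE"  # stream only the missing bytes from the backlog
    # Unknown history or offset too old: send an RDB snapshot, then the stream
    return f"+FULLRESYNC {master_replid} {master_offset}"
```

A replica connecting for the first time sends `PSYNC ? -1`, which can never match, so it always receives a full resynchronization.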

Code Walkthrough · Master‑Slave · PSYNC
12 min read
dbaplus Community
Aug 25, 2021 · Databases

Master‑Slave, Replica Set, and Sharding: How MongoDB Achieves High Availability

This article explains MongoDB's evolution from Master‑Slave to Replica Set and Sharding architectures, detailing how each model provides high availability, data reliability, and scalability, and offers practical configuration tips to ensure strong consistency and minimal downtime in production deployments.

Database Architecture · MongoDB · Replica Set
20 min read
IT Architects Alliance
Aug 24, 2021 · Backend Development

Design and Technical Specification of a High‑Throughput Message Center

This article presents a comprehensive design for a high‑availability message center that targets 10,000 messages per second inbound throughput and 1,000 messages per second outbound delivery, detailing technical goals, functional requirements, technology selection, architectural diagrams, and implementation guidelines using RocketMQ, Elasticsearch, Spring Cloud Gateway, MySQL, Docker, and Kubernetes.

Backend Architecture · Kubernetes · high availability
5 min read
Top Architect
Aug 24, 2021 · Databases

Redis Architecture Options and Deployment Guide

This article reviews Redis's latest features and compares various deployment architectures—including single‑replica, dual‑replica, cluster, and read‑write‑separation modes—detailing their reliability, performance characteristics, suitable use cases, and provides a Java Jedis example for configuring a direct‑connect cluster.

Cluster · Jedis · architecture
12 min read
NetEase Smart Enterprise Tech+
Aug 24, 2021 · Information Security

How NetEase Cloud IM SDK Prevents DNS Hijacking with HttpDNS High‑Availability

This article explains the DNS hijacking threat, shares a real incident affecting NetEase Cloud IM, and details a comprehensive high‑availability architecture—including HttpDNS, laddered HTTP requests, caching strategies, and SNI handling—that protects the SDK from DNS attacks and ensures reliable service.

Cache · DNS hijacking · HTTPDNS
13 min read
Efficient Ops
Aug 23, 2021 · Operations

Master HAProxy: Build High‑Performance L7/L4 Load Balancers & HA Clusters

This guide introduces HAProxy, an open‑source L4/L7 load balancer, and walks through its core features, performance and stability characteristics, step‑by‑step installation on CentOS 7, configuration of both L7 and L4 balancing, monitoring, and setting up high‑availability with Keepalived.

HAProxy · Linux · Operations
27 min read
Liangxu Linux
Aug 22, 2021 · Operations

Build Nginx High Availability with Keepalived on Linux

This guide explains how to achieve high availability for Nginx by deploying a dual‑machine keepalived setup, covering the concepts of HA, VRRP, configuration of keepalived on master and backup nodes, a health‑check script, and step‑by‑step commands to test automatic failover.
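The failover in this setup comes from two rules working together: a keepalived `vrrp_script` health check adjusts a node's priority by its `weight`, and VRRP makes the highest-priority node master, breaking ties by the higher IP address. The sketch below models both under simplifying assumptions (preemption enabled; priority limits, advertisement timers and the VIP itself ignored); node names and addresses are hypothetical.

```python
import ipaddress


def effective_priority(base_priority, script_weight, check_passed):
    """keepalived-style priority adjustment from a vrrp_script health check.

    Negative weight: subtract |weight| while the check fails.
    Positive weight: add weight while the check passes.
    """
    if script_weight > 0 and check_passed:
        return base_priority + script_weight
    if script_weight < 0 and not check_passed:
        return base_priority + script_weight
    return base_priority


def elect_master(nodes):
    """VRRP election: highest priority wins; ties go to the higher IPv4 address."""
    winner = max(nodes, key=lambda n: (n["priority"], ipaddress.IPv4Address(n["ip"])))
    return winner["name"]
```

For example, with master priority 100, backup priority 90 and `weight -30`, an Nginx failure on the master drops it to 70, so the backup (90) takes over the virtual IP; once Nginx recovers, the master returns to 100 and preempts again.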

Linux · VRRP · failover
9 min read
ITFLY8 Architecture Home
Aug 20, 2021 · Operations

From Taobao to the Cloud: Secrets of Building Ultra‑High‑Availability Systems

This talk shares practical high‑availability strategies learned from Alibaba’s Taobao platform and Alibaba Cloud, covering traditional IDC stability, cache and disaster‑recovery designs, cloud‑native fault‑tolerance, performance‑capacity trade‑offs, traffic shaping, multi‑region replication, and lessons from real‑world incidents like GitLab failures.

Alibaba · cloud architecture · fault tolerance
21 min read
TAL Education Technology
Aug 19, 2021 · Operations

Comprehensive SRE Guide for Summer and Winter High‑Load Periods in an Online Education Platform

This document outlines a comprehensive SRE‑driven operational framework for ensuring stable, high‑availability online education services during peak summer and winter periods, detailing pre‑, during‑, and post‑maintenance phases, architectural principles, load testing, monitoring, capacity management, safety hardening, chaos engineering, incident response, and post‑mortem practices.

Load Testing · SRE · capacity planning
17 min read
ITFLY8 Architecture Home
Aug 19, 2021 · Operations

How Alibaba Conquered Double 11: Scaling to 175k TPS with High‑Availability Architecture

Alibaba’s eight‑year Double 11 journey illustrates how the company tackled exponential business growth by inventing high‑availability middleware, precise capacity planning, unit‑based deployment, online stress testing, hybrid‑cloud elasticity, and intelligent runtime control to balance throughput, cost, and user experience during the midnight peak.

Distributed Systems · capacity planning · cloud scaling
23 min read
IT Architects Alliance
Aug 17, 2021 · Backend Development

Meituan Instant Logistics: Distributed System Architecture, Practices, and Future Challenges

The article details Meituan’s five‑year evolution of its instant logistics platform, describing the distributed backend architecture, AI‑driven optimization, scalability and high‑availability practices, as well as future challenges in microservice complexity and operational automation.

Distributed Systems · Logistics · Microservices
10 min read
Ctrip Technology
Aug 17, 2021 · Databases

Sharding and Database Refactoring for High‑Volume Train Ticket Orders at Ctrip

This article describes how Ctrip's senior backend engineer designed and implemented horizontal database sharding, a service‑level proxy, dual‑read/write mechanisms, and a staged migration process to overcome order‑database bottlenecks, improve scalability, and ensure high availability for the rapidly growing international train‑ticket business.

Backend · high availability · middleware
16 min read
Aikesheng Open Source Community
Aug 17, 2021 · Databases

Design and Implementation of a Cloud‑Native MySQL Container Platform for High Availability and Resource Efficiency

The article describes how a bank built a Kubernetes‑based, containerized MySQL service platform (CDD) to improve database high availability, resource utilization, automated operations, and agile delivery by addressing network, storage, scheduling, and management challenges through custom networking, hybrid storage, scheduler extensions, and multi‑AZ deployment.

Cloud Native · Kubernetes · containerization
16 min read
Architects' Tech Alliance
Aug 15, 2021 · Operations

Enterprise Multi‑Data Center Evolution: From Two‑Region Three‑Center to Distributed Active/Active Architecture

The article explains how enterprises are moving from traditional primary‑backup and two‑region three‑center data‑center models toward distributed active/active data‑center architectures to achieve continuous 24/7 operations, higher resource utilization, and fault‑transparent services, while outlining the technical and organizational challenges involved.

Active-Active · IT Operations · disaster recovery
10 min read
21CTO
Aug 14, 2021 · Backend Development

Designing a System That Scales to 100 Million Users: Key Strategies

This guide explains how to build a highly available, scalable architecture for supporting hundreds of millions of users, covering decoupling, redundancy, vertical and horizontal scaling, load balancing, database replication, sharding, caching, CDN, GeoDNS, and best practices for progressive system expansion.

CDN · Database Replication · Scalability
19 min read
Laravel Tech Community
Aug 12, 2021 · Backend Development

Cache Penetration, Cache Breakdown, and Cache Avalanche: Concepts and Solutions

The article explains the concepts of cache penetration, cache breakdown, and cache avalanche in Redis‑based systems, analyzes the performance problems they cause under high concurrency, and presents practical mitigation techniques such as Bloom filters, caching empty objects, distributed locks, high‑availability clusters, rate limiting, and data pre‑warming.
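Two of the listed defenses against cache penetration, a Bloom filter and caching empty objects, combine naturally in one read path. A minimal sketch, assuming plain dicts stand in for Redis and the database and that the filter has been pre-loaded with every valid key; a production filter would be sized from the expected key count and target false-positive rate, and cached empty values would carry a short TTL.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter: no false negatives, some false positives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key):
        # Derive k bit positions from one SHA-256 digest (double hashing)
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key))


def get_product(product_id, bloom, cache, db):
    """Cache-aside read guarded by a Bloom filter (dicts stand in for Redis/DB)."""
    if not bloom.might_contain(product_id):
        return None                   # definitely absent: skip cache and DB
    if product_id in cache:
        return cache[product_id]      # may be a cached None ("empty object")
    value = db.get(product_id)
    cache[product_id] = value         # cache misses too, so repeats never hit the DB
    return value
```

Because a Bloom filter never returns a false negative, a "no" answer can safely skip both the cache and the database; the rare false positive falls through to the empty-object cache instead.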

Backend · Cache · bloom-filter
6 min read
MaGe Linux Operations
Aug 11, 2021 · Databases

Understanding Cloud MySQL: Instance Types, Replication Modes, and High Availability

This article explains the different Cloud MySQL instance architectures, details asynchronous, semi‑synchronous, and strong synchronous replication, describes high‑availability failover mechanisms, outlines upgrade procedures, and covers binlog usage, rollback methods, slow‑query optimization, and storage fragmentation.

Backup · Instance Types · Replication
11 min read
Baidu Geek Talk
Aug 9, 2021 · Databases

BaikalDB Implementation Practice at Tongcheng Yilong: High Availability, HTAP, Performance and Cost Optimization

Tongcheng Yilong’s BaikalDB deployment combines multi‑Raft high availability, HTAP support, and shared‑nothing scalability to deliver over 72K TPS for OLTP and ten‑fold faster OLAP queries, while cutting operational costs by up to a hundredfold through a dual‑center layout, columnar storage, and cloud‑native elasticity.

BaikalDB · Columnar Storage · HTAP
27 min read
Ops Development Stories
Aug 5, 2021 · Cloud Native

How to Deploy NFS Subdir External Provisioner on Kubernetes with HA

This guide walks through deploying the NFS‑subdir‑external‑provisioner on Kubernetes, covering migration to the new repository, configuring storage classes with subdirectory support, applying RBAC resources, creating PVCs, enabling high‑availability with leader election, and troubleshooting common mount errors.

Cloud Native · Kubernetes · NFS
0 likes · 14 min read
Programmer DD
Aug 2, 2021 · Backend Development

High Availability for Elastic Job Lite: Active‑Standby and Dual‑Data‑Center Design

This article explains how to make single‑node Elastic Job Lite deployments highly available, covering ZooKeeper‑based sharding, active‑standby strategies for dual‑data‑center setups, custom sharding implementations, and priority scheduling so that tasks run reliably across both the primary and backup sites.

Distributed Systems · Elastic-Job · Job Scheduling
0 likes · 14 min read
ITFLY8 Architecture Home
Aug 1, 2021 · Backend Development

Designing a Scalable Cloud Shopping Cart: Architecture, Caching, and Payment Strategies

This article details the design and architecture of a cloud‑based shopping cart system, covering its functional modules, layered and cluster designs, technical goals such as stability and elasticity, three‑level caching, asynchronous checks, storage heterogeneity, and payment‑processing solutions.

Shopping Cart · System Design · backend-development
0 likes · 4 min read
HelloTech
Jul 30, 2021 · Operations

Foundations of High Availability: Defining and Managing Strong and Weak Service Dependencies

The article defines strong versus weak service dependencies; outlines how to govern them through dependency discovery, fault injection, and refactoring; recommends front‑end and back‑end fault‑tolerance measures such as timeouts and circuit breakers; describes isolation and manual degradation switches; explains how to verify the dependency classification; and notes current middleware gaps before closing with hiring information.

Backend · Fault Injection · Service Dependency
0 likes · 10 min read
Baidu Intelligent Testing
Jul 29, 2021 · Backend Development

Building High‑Availability Architecture for Baidu Feed Online Recommendation System

This article describes how Baidu engineered a flexible, multi‑level fault‑tolerant architecture, including dynamic retry scheduling, coordination across multiple recall channels, ranking‑layer degradation, and cross‑IDC multi‑master storage, to achieve five‑nines (99.999%) availability for its large‑scale feed recommendation service.

Cloud Native · dynamic retry · fault tolerance
0 likes · 16 min read
Efficient Ops
Jul 27, 2021 · Databases

Mastering Tencent Cloud MySQL: Instance Types, Replication, HA & Upgrades

This guide explains Tencent Cloud MySQL's three instance types, its replication modes (asynchronous, semi‑synchronous, and strong synchronous), high‑availability failover mechanisms, upgrade procedures, binlog management, restore options, slow‑query tuning, and storage‑space fragmentation, with practical advice for running cloud databases reliably.

Binlog · Instance Upgrade · Replication
0 likes · 11 min read
Laravel Tech Community
Jul 23, 2021 · Backend Development

Cache Penetration, Cache Breakdown, and Cache Avalanche: Concepts and Mitigation Strategies

The article explains cache penetration, cache breakdown, and cache avalanche in Redis‑based systems, describes the load each can push onto the backing database, and presents practical mitigation techniques such as Bloom filters, caching empty objects, keeping hot keys from expiring, distributed locks, high‑availability clusters, rate limiting, and data pre‑warming.

Backend · Cache · bloom-filter
0 likes · 6 min read
Aikesheng Open Source Community
Jul 23, 2021 · Databases

Implementing Conversion Between MySQL Group Replication (MGR) and Semi‑Synchronous Replication

This guide demonstrates how to switch a MySQL 5.7.32 deployment between Group Replication (MGR) and semi‑synchronous replication, covering environment checks, node configuration, plugin installation, replication setup, validation, and the limitations encountered when combining the two modes.

Configuration · Group Replication · Semi‑synchronous Replication
0 likes · 9 min read
Full-Stack DevOps & Kubernetes
Jul 19, 2021 · Cloud Native

Mastering Kubernetes Node Isolation, Scaling, and Rolling Updates – Practical Commands and Tips

This guide walks through essential Kubernetes operations such as isolating and recovering nodes, adding new nodes to a cluster, dynamically scaling Pods, managing labels, scheduling Pods onto specific Nodes, performing rolling updates, and configuring high availability for etcd and the Master components, all with concrete command‑line examples and YAML snippets.

Kubernetes · Node Management · Rolling Update
0 likes · 19 min read
macrozheng
Jul 18, 2021 · Operations

Why Did Bilibili Crash? A Developer’s Deep Dive into High‑Availability Failures

In this article, a programmer recounts the July 2021 Bilibili outage, analyzes its timeline, proposes root‑cause hypotheses such as a CDN failure and an avalanche of cascading failures along the service call chain, shares insights from the platform’s high‑availability architecture, and outlines preventive techniques for building more resilient backend systems.

Bilibili · CDN · Operations
0 likes · 10 min read
21CTO
Jul 16, 2021 · Operations

What Bilibili’s Outage Teaches About Achieving True High Availability

The article analyzes Bilibili’s July 2021 service outage, explains why high availability matters, introduces the key metrics MTBF (mean time between failures) and MTTR (mean time to repair), and outlines practical strategies including redundancy, rate limiting, isolation, failover, timeout control, circuit breaking, degradation, and multi‑region deployment for building resilient systems.

MTBF · MTTR · Operations
0 likes · 18 min read
Open Source Linux
Jul 15, 2021 · Backend Development

How Kernel-Level Content-Based Load Balancing Boosts Server Performance

This article explains the principles and implementation of content‑based request distribution with Linux IPVS and the kernel‑mode KTCPVS, covering the TCP gateway and TCP connection‑migration approaches, scheduling algorithms, high‑availability mechanisms, and performance benefits such as higher cache hit rates and better scalability.

IPVS · KTCPVS · Networking
0 likes · 23 min read
Youzan Coder
Jul 15, 2021 · Backend Development

Message Queue Architecture Comparison: NSQ, Kafka, and RocketMQ in Distributed Systems

The article compares the architectures of NSQ (Youzan's fork), Kafka, and RocketMQ, detailing their coordination mechanisms, storage models, consistency guarantees, and operational trade‑offs, and recommends Kafka for log and big‑data workloads, RocketMQ for very large numbers of topics, and NSQ for extensibility and lightweight deployment.

Distributed Systems · Kafka · Message Queue
0 likes · 16 min read
Top Architect
Jul 14, 2021 · Databases

Redis Read‑Write Separation Architecture: Star vs. Chain Replication

This article explains Alibaba Cloud's Redis read‑write separation architecture, comparing star and chain replication models, their performance and scalability trade‑offs, and how transparent compatibility, high availability, and high performance are achieved through redis‑proxy, HA monitoring, and optimized binlog replication.

Read-Write Separation · databases · high availability
0 likes · 8 min read
Wukong Talks Architecture
Jul 14, 2021 · Operations

Understanding High Availability: Lessons from the Bilibili Outage

This article analyzes Bilibili's July 2021 service disruption, explains the concept and quantitative metrics of high availability, and outlines practical techniques such as rate limiting, isolation, failover, timeout control, circuit breaking, degradation, and multi‑region active‑active deployment to improve system reliability.

Distributed Systems · HA · MTBF
0 likes · 13 min read
IT Architects Alliance
Jul 10, 2021 · Operations

Building a High‑Availability Redis Service with Sentinel

This article explains how to design and deploy a highly available Redis architecture using Sentinel, covering failure scenarios, evaluation of common HA solutions, step‑by‑step configurations from a single‑node setup to a three‑Sentinel deployment, and practical tips such as using virtual IPs for seamless client access.

Backend · DevOps · high availability
0 likes · 12 min read
IT Architects Alliance
Jul 8, 2021 · Operations

Mastering High Availability: From Cold Backup to Multi‑Region Active‑Active

This article analyzes high‑availability strategies for stateful backend services, covering cold backup, dual‑machine hot standby, same‑city active‑active, cross‑region (two‑site) active‑active, and multi‑region active‑active architectures, and details the benefits, limitations, and practical implementation considerations of each.

Active-Active · System Design · backend operations
0 likes · 14 min read
21CTO
Jul 8, 2021 · Backend Development

How a Cloud Shopping Cart Achieves Scalability, Reliability, and Performance

This article explains the architecture of a cloud‑based shopping cart, covering its functional roles, layered and cluster designs, distributed technical architecture goals, three‑tier caching, heterogeneous storage, payment solutions, Nginx+Lua aggregation, anti‑bot measures, and multi‑dimensional user feature identification.

Shopping Cart · caching · cloud computing
0 likes · 5 min read
Open Source Linux
Jul 5, 2021 · Operations

Designing Scalable, High‑Availability Network Services with Linux LVS

This article explains the principles and architecture of scalable, high‑availability network services using Linux Virtual Server (LVS), covering definitions, requirements, load‑balancing mechanisms, cluster components, geographic distribution, BGP routing, and practical deployment considerations for web, media, cache, and mail services.

LVS · high availability · load balancing
0 likes · 25 min read
Java High-Performance Architecture
Jul 3, 2021 · Backend Development

Building a High‑Throughput, Highly Available Messaging Center with RocketMQ & Elasticsearch

This article outlines the technical, business, and product goals for a messaging center, presents a prototype and functional requirements, evaluates RocketMQ and Elasticsearch as core technologies, and details the architectural design, underlying frameworks, and DevOps strategies (including Spring Cloud Gateway, Kubernetes, and Docker) aimed at 10,000 msg/s upstream throughput, 1,000 msg/s downstream delivery, and 100% availability.

Elasticsearch · Kubernetes · Messaging System
0 likes · 5 min read
Dada Group Technology
Jul 2, 2021 · Backend Development

Design and Implementation of a High‑Availability Coupon Platform with Distributed Storage (JimDB)

This article describes the architecture and optimization of JD.com’s coupon platform, covering the JimDB distributed in‑memory database for core storage, a massive distributed task system for product coupons, high‑availability strategies for store coupons, and the overall middle‑platform design that ensures scalability, low latency, and data consistency across millions of daily transactions.

Backend · Coupon · System Architecture
0 likes · 8 min read
DataFunTalk
Jun 30, 2021 · Big Data

Kuaishou Havok Data Service Platform and Its High‑Availability Assurance System

The article introduces Kuaishou's Havok data‑service platform—a one‑stop, configuration‑driven solution that lowers development barriers—and details the comprehensive high‑availability architecture, including hierarchical isolation, elastic scaling, link grading, disaster recovery, and rate‑limiting mechanisms that enable zero‑failure support for large‑scale events.

Data Platform · Havok · Kuaishou
0 likes · 11 min read
Alibaba Cloud Native
Jun 30, 2021 · Operations

How We Built a Dual‑Center, High‑Availability RocketMQ Platform

This article explains why RocketMQ was chosen, describes how widely it is used, details the design and implementation of a same‑city dual‑center architecture in which clients produce to and consume from the nearest center, and outlines failover mechanisms, governance practices, lessons learned, and future plans for the messaging platform.

Dual Center · Message Queue · Operations
0 likes · 15 min read
IT Architects Alliance
Jun 29, 2021 · Operations

Understanding High Availability: Compute and Storage Strategies Explained

This article defines high availability, explains why four nines (99.99%) is a common target, and divides HA into compute and storage solutions, detailing common architectures such as active‑passive, master‑slave, and symmetric and asymmetric clusters, as well as various storage replication patterns.

Infrastructure · compute HA · high availability
0 likes · 3 min read
MaGe Linux Operations
Jun 27, 2021 · Operations

Master HAProxy: From Installation to High‑Availability Load Balancing

This article introduces HAProxy as a free, high‑performance load balancer, explains its core L4/L7 features, walks through installation on CentOS 7, shows detailed configuration for HTTP and TCP modes, covers logging, log rotation, health checks, session persistence, monitoring, and demonstrates high‑availability setup using Keepalived.

HAProxy · high availability
0 likes · 27 min read
TAL Education Technology
Jun 24, 2021 · Backend Development

Mature Applications of Real-Time Audio/Video in Education: TalRTC Architecture, High Availability, and Network Optimization

The presentation details the TalRTC real‑time communication platform used in online education, covering its product overview, three‑layer architecture, high‑availability and weak‑network strategies, as well as special optimizations for teaching scenarios that improve audio‑video quality and reliability.

Education Technology · RTC · Weak Network Optimization
0 likes · 9 min read
Ctrip Technology
Jun 24, 2021 · Backend Development

Design and Implementation of Distributed Cache with Eventual and Strong Consistency at Ctrip Finance

This article presents Ctrip Finance's design of a unified high‑availability Redis cache service, covering both eventual‑consistency and strong‑consistency scenarios, the overall architecture, data‑accuracy, completeness and availability mechanisms, lock handling, fault‑tolerant updates, and operational recovery strategies.

Consistency · Microservices · distributed cache
0 likes · 26 min read
Architecture Digest
Jun 24, 2021 · Big Data

Kuaishou's Big Data Service Platform: Architecture, Key Technologies, and Future Outlook

This article introduces Kuaishou's effort to offer its data platform as a service, outlining the challenges data engineers faced, the platform's architecture, and key technologies such as configuration‑driven development, multi‑mode APIs, data acceleration, and high‑availability mechanisms, and closes with a summary of results and future directions.

Big Data · Data Acceleration · Data Platform
0 likes · 12 min read
Java Architect Essentials
Jun 23, 2021 · Databases

How Redis Read‑Write Separation Boosts Performance and Cuts Costs

This article explains the background, architecture, and replication models of Redis read‑write separation, compares star and chain replication, and outlines its transparent compatibility, high availability, and performance benefits while noting consistency trade‑offs for read‑heavy workloads.

Database Architecture · Read-Write Separation · high availability
0 likes · 9 min read
Sohu Tech Products
Jun 23, 2021 · Cloud Native

Containerizing Stateful Services on Kubernetes: Challenges, Solutions, and Best Practices

This article examines the difficulties of running stateful services such as Redis, etcd, and MySQL on Kubernetes and presents practical solutions—including workload selection, CRD/operator extensions, scheduling strategies, high‑availability mechanisms, performance‑optimized networking and storage, and chaos‑engineering validation—to achieve reliable, high‑performance containerized deployments.

CRD · Kubernetes · Networking
0 likes · 32 min read