Tagged articles
1414 articles
Huolala Tech
Oct 12, 2023 · Backend Development

Inside LApiGateway: Building a Scalable Microservice Gateway for High Availability

LApiGateway is Huolala's custom Java-based microservice gateway built on Spring Cloud Gateway, featuring service discovery via Consul, centralized configuration, a rich plugin system, DSL-driven routing, high‑availability designs like physical and service isolation, gray‑scale migration, traffic pre‑warming, and adaptive load balancing to ensure stable, efficient API management.

DSL · Java · Microservice Gateway
12 min read

JD Cloud Developers
Sep 28, 2023 · Backend Development

Designing a Scalable, High‑Availability Order System: Architecture Insights

This article details the design of a decoupled, high‑availability order system, covering business scope, value propositions, layered architecture, real‑time data layer, read/write separation, caching, messaging, search, multi‑tenant support, data security, and future challenges such as personalized queries and cost‑effective scaling.

Backend Architecture · Scalability · data modeling
12 min read

HomeTech
Sep 27, 2023 · Backend Development

Design and Evolution of a High‑Availability SMS Platform at AutoHome

This article details the architectural evolution, high‑availability strategies, fault‑monitoring mechanisms, and performance optimizations of AutoHome's enterprise SMS platform, covering its migration from .NET to Java, service decomposition with Kafka, multi‑datacenter deployment, and operational safeguards for large‑scale events.

Backend · Kafka · SMS
9 min read

Alibaba Cloud Native
Sep 24, 2023 · Cloud Computing

Designing Highly Available Cloud‑Native Applications on Alibaba Cloud ACK

This article explains how to build robust, highly available cloud‑native applications on Alibaba Cloud Container Service for Kubernetes (ACK) by covering architecture principles, multi‑zone cluster design, Kubernetes HA features such as topology spread constraints and pod anti‑affinity, storage strategies, load‑balancing, virtual nodes, health probes, monitoring, and multi‑cluster deployment patterns.

ACK · Cloud Native · Kubernetes
35 min read

ITPUB
Sep 22, 2023 · Databases

Where Does Database Innovation Come From? Exploring the Future of Distributed Databases

The article examines the driving forces behind database innovation, emphasizing the role of inherent shortcomings, AI integration, and the emergence of third‑generation distributed databases that aim for minimalist ease of use, controllable latency, high availability, and 100% data correctness.

Distributed Systems · Innovation · data correctness
11 min read

Architect
Sep 21, 2023 · Databases

Redis Deep Dive: 20 Classic Interview Questions Explained

This article provides a comprehensive technical walkthrough of Redis, covering its core concepts, data structures, performance tricks, caching pitfalls, persistence options, high‑availability architectures, distributed locking mechanisms, and consistency strategies, all illustrated with concrete examples and code snippets.

Data Structures · Distributed Locks · Persistence
50 min read

JD Tech
Sep 11, 2023 · Big Data

Construction and High-Fidelity Load Testing of Real-Time Data Dual-Stream

This article explains how to build a dual‑stream real‑time data pipeline for big‑data applications, defines construction standards, and details a three‑step high‑fidelity load‑testing process that ensures stability and high availability during peak promotional periods.

Load Testing · dual-stream · high availability
10 min read

Java Architect Essentials
Sep 10, 2023 · Databases

Why TiDB Is the NewSQL Database Redefining OLTP & OLAP

This article provides a comprehensive technical overview of NewSQL concepts and TiDB, covering its origins, core features, distributed architecture, high‑availability design, MySQL compatibility, unsupported MySQL functionalities, configuration defaults, and real‑world application scenarios for both OLTP and OLAP workloads.

HTAP · MySQL compatibility · NewSQL
24 min read

Top Architect
Sep 7, 2023 · Backend Development

Design and Implementation of an Enterprise Unified Push Notification Service

The article outlines the design and evolution of an enterprise‑level unified push notification service, detailing its progression from a modular to a service‑oriented architecture, its multi‑channel support, its performance‑focused non‑functional requirements, and its components: clients, notification, template, distribution, priority queues, adapters, analytics, and database layers.

Backend Architecture · Microservices · Push Notification
16 min read

Huolala Tech
Sep 7, 2023 · Big Data

How Huolala Ensures Doris Stability: Real-World Big Data Practices

This article details Huolala's big‑data architecture and the practical measures—ranging from background analysis and stability challenges to case studies, discovery mechanisms, capacity planning, high‑availability, and automation—that the company employs to guarantee Doris's reliability and performance across its rapidly growing logistics platform.

Big Data · OLAP · capacity planning
15 min read

Efficient Ops
Aug 30, 2023 · Databases

Understanding Redis: Architecture, Clustering, and Persistence Explained

This article introduces Redis as an open‑source in‑memory key‑value store, explains its data‑structure server nature, outlines common deployment options—including single instance, high‑availability, Sentinel, and Cluster—describes replication mechanisms, and details persistence models such as RDB, AOF, and hybrid approaches.

Cluster · In-Memory Database · Persistence
18 min read

Aikesheng Open Source Community
Aug 28, 2023 · Databases

MySQL InnoDB Cluster Read Replicas: Adding, Configuring, and Managing Read‑Only Nodes

This article explains how MySQL InnoDB Cluster 8.1 introduces read‑only replica instances, shows step‑by‑step commands to create and configure them, describes their failover behavior, routing options with MySQL Router, health‑check isolation, replication‑lag handling, and how to hide replicas from traffic.

InnoDB Cluster · MySQL Router · Replication
17 min read

JD Retail Technology
Aug 24, 2023 · Operations

High‑Availability Strategies for E‑commerce Large‑Scale Promotion Systems

This article outlines a comprehensive framework for preparing e‑commerce platforms for major sales events, covering the history of promotions, business models, system chain segmentation, stability goals, strategic planning, tactical measures, growth promotion, and reference resources to ensure high availability and reliable user experience.

e‑commerce · high availability · large‑scale promotion
19 min read

Test Development Learning Exchange
Aug 23, 2023 · Operations

Server Monitoring Strategies and Tools Using Python

This article outlines key strategies and Python tools for server monitoring, including defining metrics, utilizing libraries like psutil and requests, log analysis, load testing with Locust and PyTest, and implementing automated alerts for high availability.

Load Testing · Python · Server Monitoring
4 min read

Architect
Aug 19, 2023 · Databases

Deep Dive into MySQL Replication: Mechanisms, Performance, and Real‑World Optimizations

This article thoroughly examines MySQL replication, detailing binlog formats, event types, replication workflows, semi‑synchronous and parallel replication techniques, performance benchmarks, and practical implementation steps such as fake‑slave registration and connection‑pool enhancements, while illustrating each concept with concrete examples and code snippets.

Binlog · Performance Optimization · Replication
30 min read

Code Ape Tech Column
Aug 15, 2023 · Operations

High‑Availability Architecture for a Billion‑Scale Membership System: Dual‑Center ES, Redis, and MySQL Solutions

This article details the design and implementation of a highly available, high‑performance membership system serving over a billion users, covering dual‑center Elasticsearch clusters, traffic‑isolated three‑cluster ES architecture, Redis dual‑center caching, MySQL partitioned clusters, migration strategies, and refined flow‑control and degradation mechanisms.

Distributed Systems · Elasticsearch · high availability
20 min read

Java Architect Essentials
Aug 10, 2023 · Backend Development

Unlock Nginx Power: From Installation to High‑Performance Load Balancing and High Availability

This comprehensive guide walks you through the challenges of monolithic deployments, explains core Nginx concepts, shows step‑by‑step installation, configures reverse proxy, static‑dynamic separation, compression, buffering, caching, security features, SSL, high‑availability with keepalived, and essential performance tuning for production‑grade servers.

Nginx · Performance Optimization · high availability
43 min read

FunTester
Aug 10, 2023 · Backend Development

How QQ Music Scaled Its Comment System for Celebrity Live Events

This article details the architectural redesign of QQ Music's comment platform—migrating to MongoDB, introducing threaded comments, and employing caching and message‑queue decoupling—to handle massive read/write spikes during celebrity live‑drop events while maintaining high availability and performance.

Backend Architecture · Comment System · Message Queue
8 min read

Tencent Cloud Developer
Aug 9, 2023 · Backend Development

Designing a High‑Availability Comment System for QQ Music: Architecture, Challenges, and Optimizations

QQ Music’s comment system was re‑engineered with a MongoDB backend, cache layer, asynchronous writes, split read/write services, priority queues and rate‑limiting, enabling it to endure celebrity‑driven traffic spikes, maintain data consistency, and deliver high‑availability, low‑latency user experiences.

Backend Architecture · Comment System · Message Queue
7 min read

Java Interview Crash Guide
Aug 8, 2023 · Operations

How We Built 99.99% High Availability for a Billion‑User Membership System

This article details the end‑to‑end high‑availability architecture—including dual‑center Elasticsearch clusters, Redis caching with distributed locks, and a dual‑center MySQL partitioned setup—that enables a membership platform serving billions of users to sustain massive traffic while ensuring data consistency and rapid recovery.

Elasticsearch · Scalability · System Architecture
21 min read

Didi Tech
Aug 7, 2023 · Backend Development

How Didi Achieved Cross‑Datacenter Elasticsearch Replication for Strong Consistency

This article explains Didi's self‑developed DCDR system that replicates Elasticsearch indices across data‑center clusters, detailing its design goals, core mechanisms, chain construction, historical data recovery, real‑time sync, and data‑quality validation to ensure high availability and strong consistency.

Cross‑Datacenter Replication · DCDR · Data Consistency
15 min read

DataFunSummit
Jul 27, 2023 · Backend Development

Building a High‑Availability ClickHouse Cluster with RaftKeeper

This article explains how RaftKeeper leverages the Raft consensus algorithm to create a high‑availability, high‑performance ClickHouse cluster across multiple data centers, covering project background, architecture, core features, performance optimizations, and real‑world deployment results.

Backend Development · ClickHouse · Cross-DataCenter
17 min read

Tech Architecture Stories
Jul 23, 2023 · Backend Development

Beyond Scale: Rethinking Architecture Boundaries for Massive Services

This article reflects on years of designing large‑scale backend systems at Tencent, discussing how to define clear architecture boundaries, ensure high availability, integrate diverse technologies, and use observability and monitoring to continuously evolve and improve massive service architectures.

Distributed Systems · Observability · System Design
25 min read

Alibaba Terminal Technology
Jul 21, 2023 · Cloud Native

How Tengine-Ingress Boosts Cloud‑Native Traffic with Zero‑Downtime Updates

Tengine-Ingress, Alibaba’s cloud‑native ingress gateway built on Tengine‑Proxy, replaces the legacy Tengine gateway by delivering dynamic, loss‑less configuration updates, high‑availability gray‑release mechanisms, global consistency checks, and significant performance gains in TLS handshake latency, CPU usage, and memory consumption across large‑scale deployments.

Ingress · Kubernetes · cloud-native
19 min read

AI Cyberspace
Jul 17, 2023 · Operations

Mastering VRRP: How to Ensure Router Redundancy and Prevent Split‑Brain Failures

This article explains the VRRP protocol’s core concepts, state machine, election process, and multi‑master HA mode, provides step‑by‑step Linux router configuration examples—including group creation, priority, interface tracking, preempt mode, timers, and learning—plus an overview of Keepalived’s architecture and split‑brain mitigation strategies.

Network Protocols · VRRP · high availability
19 min read

Selected Java Interview Questions
Jul 15, 2023 · Operations

High‑Availability Architecture for a Large‑Scale Membership System

The article describes how a membership system serving billions of users across multiple platforms achieves high performance and high availability through dual‑center Elasticsearch clusters, traffic‑isolated three‑cluster ES architecture, Redis caching with distributed locks, dual‑center MySQL partitioning, and fine‑grained flow‑control and degradation strategies.

Backend Architecture · Distributed Systems · Elasticsearch
25 min read

Architect
Jul 14, 2023 · Databases

From Single‑Node to Scalable Redis Cluster: A Step‑by‑Step Architecture Guide

This article walks through Redis's evolution from a simple single‑instance cache to a highly available, high‑performance cluster, explaining persistence mechanisms (RDB, AOF, hybrid), master‑slave replication, Sentinel automatic failover, and sharding strategies with concrete examples and trade‑offs.

Database Architecture · Persistence · Replication
20 min read

Ximalaya Technology Team
Jul 13, 2023 · Databases

Evolution of Ximalaya KV Storage and XCache Architecture

Ximalaya’s KV storage progressed from a simple Redis master‑slave setup to client‑side sharding, then adopted Codis clustering for elastic scaling, integrated Pika’s disk‑based store with cold‑hot separation, introduced KV‑blob separation, fast‑slow command pools, second‑level expansion, ehash fields, large‑key circuit breaking, multi‑active data‑center replication, and now targets cloud‑native deployment, advanced features, and AI‑driven operations.

Codis · KV storage · Pika
19 min read

21CTO
Jul 12, 2023 · Fundamentals

Why Repeating Code Is Killing Your Projects: Master DRY, SOLID, and High‑Availability Design

This article explores essential software engineering practices—including the DRY principle, SOLID design principles, common design patterns, high‑availability architecture, automation, effective communication, and career development—offering concrete examples, code snippets, and actionable advice to help developers write cleaner, more maintainable, and scalable code.

DRY · Design Patterns · SOLID
31 min read

JD Retail Technology
Jul 11, 2023 · Operations

Technical Strategies for Ensuring System Stability During the 618 Promotion

The article analyzes the importance of the 618 sales event, identifies factors that threaten system stability such as traffic spikes, massive data, complex scenarios, long delivery chains and low tolerance, and proposes comprehensive application, storage, and operational measures—including unitization, monitoring, logging, fast‑fail, rate‑limiting, degradation, database and cache designs, and emergency processes—to guarantee reliable service during the promotion.

Scalability · high availability · large‑scale promotion
14 min read

Tencent Cloud Developer
Jul 3, 2023 · Fundamentals

Architectural Design Basics: High‑Performance, High‑Availability, and Scalability Patterns

Drawing on the book 'From Zero to Architecture' and personal experience, this article outlines practical design methods that define core concepts, address performance, availability and scalability complexities, and detail patterns for storage and compute optimization, fault‑tolerant interfaces, modular architectures, and evolutionary refactoring to build sustainable high‑quality systems.

Scalability · architecture · high availability
45 min read

ITPUB
Jul 1, 2023 · Databases

Mastering MySQL Master‑Slave Replication: Principles, Delays, and Solutions

This article explains MySQL master‑slave architecture, why it’s used, the replication process, consistency challenges, causes of replication lag, and practical strategies—including binlog formats, mixed mode, monitoring, caching, and failover setups—to optimize performance and ensure high availability.

Binlog · Lag · Master‑Slave
11 min read

Alibaba Cloud Native
Jun 26, 2023 · Cloud Native

How RocketMQ Evolved Its High‑Availability Architecture for Cloud‑Native Deployments

This article examines RocketMQ's high‑availability evolution—from early master‑slave and Raft‑based designs to the v5 DLedger fusion model—detailing replica groups, data sharding, election mechanisms, replication strategies, metric trade‑offs, log‑divergence handling, controller roles, heartbeat optimizations, and comparisons with Kafka and Pulsar, all illustrated with diagrams and code snippets.

Cloud Native · DLedger · Distributed Systems
36 min read

dbaplus Community
Jun 12, 2023 · Databases

How a Redis Memory Upgrade Triggered Data Loss: Sentinel Failover Lessons

A recent Redis deployment faced memory expansion, a master‑slave switch, and unexpected data loss when the new master entered read‑only mode, prompting a deep dive into sentinel behavior, maxmemory settings, and replica‑ignore‑maxmemory nuances to prevent similar failures.

Memory Upgrade · failover · high availability
12 min read

Open Source Linux
Jun 9, 2023 · Backend Development

How We Built a High‑Availability Membership System for Billions of Users

This article details the design and implementation of a highly available, high‑performance membership platform serving over a billion users, covering Elasticsearch dual‑center clusters, traffic‑isolated clusters, deep ES optimizations, Redis caching strategies, MySQL dual‑center partitioning, seamless migration, and fine‑grained flow‑control and degradation mechanisms.

Elasticsearch · caching · high availability
21 min read

JD Tech
Jun 7, 2023 · Operations

Practical Guide to Achieving High Availability in Software Delivery

This article explains the concept of high availability, outlines the challenges of collaborative delivery, architectural design, coding practices, secure release, and deployment operations, and provides concrete steps, process standards, emergency plans, and self‑check tools to ensure reliable, fault‑tolerant software systems.

Collaboration · Deployment · architecture
13 min read

MaGe Linux Operations
May 31, 2023 · Operations

How We Achieved 20k TPS High‑Availability for a Billion‑User Membership System

This article details the design and implementation of a highly available, high‑performance membership system that serves over a billion users, covering Elasticsearch dual‑center HA, traffic‑isolated clusters, Redis caching, MySQL dual‑center partitioning, seamless migration, and refined flow‑control and degradation strategies.

Elasticsearch · System Architecture · high availability
19 min read

Tencent Cloud Developer
May 24, 2023 · Backend Development

Backend Development Best Practices: DRY, SOLID, High Availability, and Design Patterns

A senior Tencent Cloud backend engineer outlines practical best‑practice guidelines—applying DRY and SOLID principles, leveraging common design patterns, designing high‑availability architectures, automating workflows, prioritizing value‑driven development, fostering clear communication, ensuring reliability, and encouraging continuous learning—to write clean, maintainable, and resilient backend systems.

Backend · Career Development · DRY
30 min read

Huolala Tech
May 18, 2023 · Databases

When to Adopt Distributed Databases? A Practical Guide to Choosing the Right Architecture

This article examines why traditional single‑node databases struggle with growing data volumes, outlines the three main distributed‑database architectures, compares their trade‑offs in availability, consistency, scalability and operational complexity, and offers practical criteria for deciding whether a distributed solution is truly needed.

Cloud Databases · Database Architecture · HTAP
25 min read

Full-Stack DevOps & Kubernetes
May 18, 2023 · Cloud Native

How to Build a High‑Availability E‑Commerce Platform on Kubernetes

This guide explains how to design and deploy a highly available, scalable e‑commerce platform on Kubernetes by containerizing services, planning clusters, configuring replication, load balancing, persistent storage, monitoring, security, CI/CD pipelines, and provides complete YAML examples for frontend, backend, and database components.

Cloud Native · Deployment · Kubernetes
8 min read

Architecture Digest
May 16, 2023 · Backend Development

High‑Availability Architecture for a Membership System: Dual‑Center ES Cluster, Redis Caching, MySQL Migration, and Fine‑Grained Flow Control

This article presents a comprehensive engineering case study of a high‑traffic membership system, detailing the dual‑center Elasticsearch high‑availability design, traffic‑isolated three‑cluster ES architecture, Redis caching strategy, dual‑center MySQL partitioning and migration plan, abnormal member relationship governance, and future fine‑grained flow‑control and downgrade policies.

Backend Architecture · Data Migration · Elasticsearch
19 min read

Practical DevOps Architecture
May 16, 2023 · Databases

Redis Course Curriculum Overview: Distributed Locks, High Availability, Clustering, Persistence, and Advanced Projects

This article outlines a comprehensive Redis training program covering fundamentals, distributed lock implementation, high‑availability mechanisms, clustering, persistence strategies, and practical projects such as Bloom filter integration and flash‑sale systems, providing learners with the knowledge to master advanced Redis usage.

Persistence · bloom-filter · clustering
5 min read

Meituan Technology Team
May 11, 2023 · Databases

Meituan Database High Availability System

Meituan’s high‑availability database system, built on a multi‑region Raft‑group architecture, addresses rapid instance growth and stringent availability by deploying three‑node HA cores, micro‑services, and MGR clusters with AZ‑ and region‑level disaster recovery, while employing multi‑channel fault detection, weighted election, and semi‑synchronous consistency mechanisms, and outlines future moves toward decentralized proxies and fully clustered designs.

HA deployment · Raft · Ripple
20 min read

Java High-Performance Architecture
May 7, 2023 · Backend Development

How We Built a Billion‑User High‑Availability Membership System with Dual‑Center ES, Redis, and MySQL

This article details the design and implementation of a high‑performance, highly available membership platform serving billions of users, covering dual‑center Elasticsearch clusters, traffic‑isolated three‑cluster ES architecture, Redis caching strategies, MySQL dual‑center partitioning, seamless migration, and fine‑grained flow‑control and degradation mechanisms.

Backend Engineering · Elasticsearch · Scalable Design
21 min read

dbaplus Community
May 4, 2023 · Databases

How HuoLa La Built a Hybrid‑Cloud Database Middleware for Massive MySQL Scale

This article details HuoLa La's journey of designing, implementing, and evolving a hybrid‑cloud self‑built database middleware that unifies multi‑cloud environments, achieves up to 1024× MySQL horizontal scaling, and addresses challenges such as multi‑language stacks, high availability, SQL governance, and multi‑AZ deployment.

Cloud Native · Database Middleware · Multi‑AZ
17 min read

Su San Talks Tech
May 2, 2023 · Databases

Master Redis Interview Questions: Performance, Persistence, High Availability & More

This comprehensive guide covers why Redis is fast, its underlying data structures, single‑threaded and epoll models, cache eviction policies, persistence mechanisms, high‑availability architecture, performance bottlenecks, distributed locking, cache avalanche/penetration strategies, data skew handling, and bitmap applications, providing essential knowledge for interview preparation.

caching · database · high availability
23 min read

Top Architect
Apr 30, 2023 · Backend Development

Kafka Core Concepts, Architecture, Performance, and Operational Practices

This article provides a comprehensive overview of Kafka, covering its core value as a message queue, fundamental concepts, cluster architecture, log storage mechanisms, zero‑copy data transfer, high‑throughput and high‑availability design, consumer group behavior, rebalance strategies, and practical operational commands for managing topics, partitions, and offsets.

Backend · Distributed Systems · Kafka
31 min read

Tencent Cloud Middleware
Apr 25, 2023 · Big Data

How to Achieve High Availability for Kafka Across Data Centers: Architectures, Trade‑offs, and Solutions

This article explains Kafka's cross‑data‑center high‑availability options, compares stretched and connected cluster designs, outlines typical failure scenarios, and reviews both community and commercial replication solutions, helping architects choose the most suitable deployment for their specific requirements.

Connected Cluster · Cross‑Data‑Center · Kafka
24 min read

IT Architects Alliance
Apr 14, 2023 · Databases

Comprehensive Guide to Database Horizontal Scaling, Smooth 2N Expansion, and Keepalived High‑Availability Configuration

This technical guide explains how to scale a sharded database horizontally by introducing five expansion schemes—including shutdown, write‑stop, log‑based, dual‑write, and smooth 2N approaches—covers MariaDB installation, master‑master replication setup, dynamic data‑source configuration with ShardingJDBC, and detailed Keepalived high‑availability configuration for seamless service continuity.

Dual Write · Dynamic Data Source · MariaDB
31 min read

Java Architect Essentials
Apr 12, 2023 · Operations

High‑Availability Architecture for a Billion‑Scale Membership System

This article details the design and implementation of a high‑availability, billion‑scale membership system, covering Elasticsearch dual‑center clusters, traffic‑isolated architectures, deep ES optimizations, Redis caching strategies, MySQL migration with dual‑center partitioning, abnormal member relationship handling, and future fine‑grained flow‑control and degradation plans.

Distributed SystemsElasticsearchFlow Control
0 likes · 20 min read
Efficient Ops
Apr 12, 2023 · Operations

Building Highly Available Prometheus Monitoring with Thanos: A Practical Guide

This article explains why native Prometheus HA solutions fall short for large, multi‑region clusters and shows how to use Thanos components—including sidecar, query, store gateway, and compactor—to achieve long‑term storage, unlimited scaling, a global view, and non‑intrusive integration with existing Prometheus deployments.

Kubernetes · Observability · Prometheus
0 likes · 22 min read
vivo Internet Technology
Apr 12, 2023 · Backend Development

vivo Transaction Platform: Architecture Design and Key Technical Solutions

The article details vivo’s transition from a monolithic mall to a micro‑service transaction platform, outlining a multi‑tenant architecture with ShardingSphere‑sharded MySQL, Snowflake IDs, Elasticsearch search, configurable state machines, generic delayed tasks, Seata and local‑message distributed transactions, plus high‑availability safeguards, emphasizing pragmatic solution selection.

Distributed Transactions · Multi-Tenant Architecture · Seata
0 likes · 11 min read
Tencent Cloud Developer
Apr 12, 2023 · Backend Development

Designing Scalable Backend Architecture and Push Systems: Boundaries, Organization, and Feedback

The article by Tencent backend expert Lv Yuanfang explains how to design scalable mobile‑internet backend architectures and push services by defining clear protocol boundaries, aligning system components with team organization through a conversion‑layer service, and implementing comprehensive feedback loops—including health metrics, monitoring, and data‑driven analysis—to ensure high availability and low coupling.

Backend Architecture · L5 · Microservices
0 likes · 16 min read
Architecture Digest
Apr 11, 2023 · Databases

Understanding MySQL Replication: Principles, Mechanisms, and Practical Applications

This article explains MySQL replication’s background, binlog formats, event types, positioning methods, asynchronous and semi‑synchronous workflows, parallel replication techniques, and real‑world deployment strategies such as HA components, middleware, remote binlog copying, and data‑transfer services, providing a comprehensive guide for building highly available and scalable MySQL infrastructures.

Binlog · Replication · database
0 likes · 26 min read
IT Services Circle
Apr 2, 2023 · Databases

Understanding MySQL Master‑Slave Replication: Principles, Lag, and Failover

This article explains MySQL master‑slave replication, covering its architecture, binlog‑based replication process, causes and mitigation of replication lag, and strategies for master‑slave failover, helping readers grasp why and how to use replication for read/write separation, high availability, and backup.

Master‑Slave · Replication · database
0 likes · 12 min read
Volcano Engine Developer Services
Mar 29, 2023 · Backend Development

How ByteHouse Achieves High‑Availability Real‑Time Data Ingestion with HaKafka

ByteHouse evolved its real‑time import pipeline from a community ClickHouse architecture to a custom HaKafka engine and a cloud‑native design, addressing node failures, read‑write conflicts, scaling costs, and latency by introducing two‑level concurrency, memory tables, exactly‑once semantics, and robust fault‑tolerance.

Distributed Systems · Kafka · Real-time Ingestion
0 likes · 15 min read
DataFunTalk
Mar 29, 2023 · Big Data

Evolution of ByteHouse Real‑Time Ingestion: From Internal Demands to a Cloud‑Native Architecture

This article details the motivation, architectural evolution, and technical implementations of ByteHouse's real‑time ingestion pipeline, covering internal business requirements, distributed‑system challenges, the custom HaKafka engine, memory‑table optimizations, and the transition to a cloud‑native design that delivers high availability, low‑latency, and exactly‑once semantics.

ByteHouse · Kafka · Real-time Ingestion
0 likes · 13 min read
Cloud Native Technology Community
Mar 28, 2023 · Cloud Native

How to Set Up Multi‑Cluster Networking with Kube‑OVN OVN‑IC

This guide explains how to enable cross‑cluster pod communication in Kubernetes using Kube‑OVN's OVN‑IC feature, covering prerequisites, single‑node and high‑availability database deployment, automatic and manual route configuration, and cleanup procedures with concrete Docker/Containerd commands and ConfigMap examples.

Cloud Native · Kube-OVN · Kubernetes
0 likes · 15 min read
Code Ape Tech Column
Mar 27, 2023 · Databases

Horizontal Database Scaling Strategies and Practical Implementation with MariaDB, Keepalived, and ShardingJDBC

This article explains how to expand a sharded database from three to four nodes, compares five migration schemes—including stop‑service, stop‑write, log‑based, dual‑write, and smooth 2N approaches—and provides step‑by‑step instructions for MariaDB installation, master‑slave configuration, Keepalived high‑availability setup, and dynamic data‑source integration using ShardingJDBC.

MariaDB · ShardingJDBC · database sharding
0 likes · 33 min read
vivo Internet Technology
Mar 22, 2023 · Operations

Design and Implementation of a Multi‑Layer Load Balancing Platform (VGW)

The article details the design of VGW, a multi‑layer load‑balancing platform that combines Layer‑7 Nginx, Layer‑4 LVS in FULLNAT mode, and Layer‑3 network devices to achieve business reliability, fault isolation via BGP‑announced VIPs, and high throughput with DPDK, while providing redundancy at the server, link, and cluster levels.

4-layer · BGP · DPDK
0 likes · 20 min read
Volcano Engine Developer Services
Mar 16, 2023 · Databases

How ByteDance’s Abase Achieves Extreme High Availability in KV Storage

This article explains the evolution, architecture, and high‑availability solutions of ByteDance’s Abase KV storage system, detailing its multi‑write design, leader‑less approach, multi‑region deployment, consistency mechanisms, performance optimizations, and real‑world metrics that support billions of requests per second.

ByteDance · Distributed Systems · KV storage
0 likes · 20 min read
Programmer DD
Mar 16, 2023 · Operations

Why High Availability Matters: Building Fault‑Tolerant Cloud Systems

The article explains how system failures like bugs, security breaches, and cloud outages can cripple businesses, and outlines the concepts of fault tolerance and disaster recovery as essential components of high‑availability architectures to ensure continuous service and protect revenue.

disaster recovery · fault tolerance · high availability
0 likes · 7 min read
JD Retail Technology
Mar 16, 2023 · Operations

Ensuring High Availability in Software: Collaboration, Architecture, Implementation, and Operational Practices

This article explains the concept of high availability, outlines the challenges of achieving it in complex software delivery chains, and provides practical guidance on improving collaboration efficiency, establishing process standards, designing robust architecture, implementing disciplined coding, executing safe releases, and maintaining operational safeguards.

Collaboration · Deployment · architecture
0 likes · 11 min read
ITPUB
Mar 14, 2023 · Big Data

How to Build Real-Time Active‑Active Disaster Recovery for OLAP MPP Clusters

This article explains why disaster‑recovery and active‑active architectures are essential for OLAP MPP data‑warehouse clusters, outlines the specific RPO/RTO requirements for batch and real‑time workloads, and compares several data‑synchronization techniques and active‑active deployment models with their advantages and drawbacks.

Active-Active · MPP · OLAP
0 likes · 12 min read
IT Architects Alliance
Mar 14, 2023 · Operations

Key Practices for Achieving High Availability in Internet Services

The article outlines essential high‑availability techniques for internet‑scale systems, covering availability metrics, microservice modularization, database redundancy, load balancing, rate limiting, circuit breaking, isolation, retry strategies, rollback plans, stress testing, monitoring, and on‑call procedures.

Operations · System Design · high availability
0 likes · 10 min read
Tencent Cloud Developer
Mar 13, 2023 · Cloud Computing

Design Principles for High‑Availability System Architecture

The article outlines a comprehensive high‑availability architecture framework across six layers—development standards, application services, storage, product fallback, operations deployment, and emergency response—detailing design principles such as stateless services, elastic scaling, redundant storage, robust monitoring, gray releases, and chaos engineering to ensure resilient, continuously available systems.

Deployment · Scalability · System Architecture
0 likes · 25 min read
dbaplus Community
Mar 5, 2023 · Backend Development

How to Achieve Fast and Stable MySQL Data Center Migration at Scale

This article details the background, migration options, and step‑by‑step automated procedures used by a large‑scale e‑commerce platform to safely move over 400 MySQL clusters, comparing expansion‑plus‑master‑slave switching with cascading replication and explaining the chosen fast, reliable solution.

Automation · Cascading Replication · database migration
0 likes · 9 min read
Laravel Tech Community
Mar 1, 2023 · Operations

Comprehensive Guide to Installing Nginx, Configuring Reverse Proxy, Load Balancing, SSL, and High‑Availability with Keepalived and LVS

This article provides a step‑by‑step tutorial on installing Nginx, setting up reverse proxy and various load‑balancing methods, configuring upstream directives, enabling SSL, and building high‑availability clusters using Keepalived and LVS with detailed command examples and configuration snippets.

LVS · Nginx · SSL
0 likes · 21 min read
Java High-Performance Architecture
Mar 1, 2023 · Backend Development

Designing a High‑Performance Membership System Using ES, Redis, and MySQL

This article details how a large‑scale membership platform achieves high performance and high availability by employing a dual‑center Elasticsearch cluster, traffic‑isolated ES clusters, deep ES optimizations, Redis caching with distributed locks, dual‑center MySQL partitioning, seamless data migration, and fine‑grained flow‑control and degradation strategies.

Backend · System Architecture · high availability
0 likes · 21 min read
dbaplus Community
Feb 26, 2023 · Backend Development

Why Redis Is More Than a Cache: Architecture, Persistence, and Scaling Explained

This article provides a comprehensive overview of Redis, covering its role as an in‑memory data‑structure server, various deployment topologies such as single instances, high‑availability, Sentinel, replication, and clustering, as well as its persistence mechanisms including RDB, AOF, and fork‑based snapshots.

Cluster · In-Memory Database · Persistence
0 likes · 17 min read
Open Source Linux
Feb 20, 2023 · Operations

Master Nginx: From Basics to Advanced Load Balancing, Caching and High Availability

This comprehensive guide walks you through Nginx fundamentals, environment setup, reverse‑proxy load balancing, static‑dynamic separation, resource compression, buffering, caching, IP whitelist/blacklist, cross‑origin handling, anti‑hotlinking, large‑file transfer, SSL configuration, high‑availability with Keepalived, and performance‑tuning techniques for production‑grade deployments.

Performance Optimization · high availability · reverse proxy
0 likes · 44 min read
Architect
Feb 15, 2023 · Backend Development

Comprehensive Nginx Guide: Installation, Configuration, Load Balancing, Caching, Security, and Performance Optimization

This extensive tutorial walks through Nginx fundamentals, environment setup, reverse‑proxy load balancing, static‑dynamic separation, resource compression, buffering, proxy caching, IP blacklisting and whitelisting, anti‑hotlinking, large‑file handling, SSL configuration, high availability with Keepalived, and key performance‑tuning techniques for production deployments.

Nginx · Performance Optimization · SSL
0 likes · 44 min read
Architecture Digest
Feb 10, 2023 · Operations

Design and Implementation of Vivo Jenkins Scheduler for High Availability and Resource Scheduling

This article analyzes common Jenkins high‑availability challenges, reviews existing industry solutions, and presents Vivo's own Jenkins Scheduler architecture—including API‑gateway, event center, scheduling algorithms, flow‑control, and callback mechanisms—demonstrating its production deployment and future container‑based evolution.

DevOps · Jenkins · Resource Management
0 likes · 12 min read
Bilibili Tech
Feb 7, 2023 · Cloud Native

Bilibili Configuration Center (Config & Paladin): Architecture, Features, and Performance

Bilibili’s Config Center evolved from the 2017 Config v1 monolith—offering unified UI, MySQL storage, and long‑polling—to the Raft‑based Paladin v2, which adds lifecycle management, tenant isolation, incremental publishing, high‑throughput caching, multi‑active deployment, validation and rich tooling, handling hundreds of thousands of configs and tens of thousands of concurrent clients with sub‑50 ms push latency while planning deeper K8s integration.

Distributed Systems · Microservices · Paladin
0 likes · 15 min read
21CTO
Feb 5, 2023 · Backend Development

How Meituan Scaled Its Code Hosting Platform to Millions of Repositories

This article details Meituan's three‑stage evolution of its self‑developed Code platform—from a single‑machine service to a multi‑machine read‑write‑separated system and finally to a distributed, sharded architecture—highlighting the scalability and high‑availability challenges faced and the engineering solutions implemented.

Backend Architecture · Distributed Systems · Scalability
0 likes · 24 min read
Meituan Technology Team
Feb 2, 2023 · R&D Management

Design and Evolution of Meituan's Distributed Code Hosting Platform

Meituan’s home‑grown Code platform evolved from a single‑server Git service to a distributed, sharded system with multi‑active replication, using Go‑based HTTP/SSH proxies, gRPC communication, and version‑based routing to achieve horizontal scalability, high availability, and millions of daily Git operations.

Distributed Systems · Git · Meituan
0 likes · 22 min read
ITPUB
Jan 31, 2023 · Databases

How Pigsty Turns PostgreSQL into a Cost‑Effective Open‑Source RDS Alternative

Pigsty is an open‑source platform that upgrades PostgreSQL across six dimensions—observability, reliability, availability, maintainability, extensibility, and interoperability—delivering enterprise‑grade features, built‑in monitoring, automatic failover, backup, and performance tuning while cutting cloud database costs dramatically.

Cost Optimization · Observability · PostgreSQL
0 likes · 22 min read
Java High-Performance Architecture
Jan 28, 2023 · Backend Development

Unlock Nginx Mastery: Load Balancing, Caching, SSL, and High‑Availability Explained

This comprehensive guide walks you through Nginx fundamentals: installation, load balancing, static asset handling, compression, buffering, caching, IP access control, cross‑origin support, anti‑hotlinking, large file handling, SSL setup, high availability with Keepalived, and performance tuning techniques for robust backend services.

Backend · Proxy · SSL
0 likes · 44 min read
Java High-Performance Architecture
Jan 24, 2023 · Backend Development

How to Build Highly Available Backend APIs: 10 Essential Design Principles

This article explains why high availability is crucial for backend services and outlines ten practical design principles—including dependency control, avoiding single points, load balancing, isolation, rate limiting, circuit breaking, async processing, degradation, gray release, and chaos engineering—to help developers create resilient APIs.

Backend · api-design · fault tolerance
0 likes · 10 min read
21CTO
Jan 19, 2023 · Backend Development

How Baidu Built a Scalable Asset Wallet for 100M+ Users: Architecture & Lessons

This article details the end‑to‑end design and implementation of Baidu App’s personal wallet, covering background, business flow, system architecture, data synchronization, multi‑level caching, read‑write separation, consistency mechanisms, configuration management, and database sharding to achieve high availability for more than 100 million users.

Backend · System Architecture · caching
0 likes · 16 min read
Top Architect
Jan 19, 2023 · Backend Development

Comprehensive Guide to Nginx: Installation, Configuration, and Performance Optimization

This extensive tutorial walks through installing Nginx from source, setting up the environment, configuring reverse‑proxy load balancing, static resource handling, compression, buffering, caching, IP blacklists and whitelists, anti‑hotlinking, large file transfer, SSL certificates, high availability with Keepalived, and advanced performance tuning techniques.

Nginx · Performance Optimization · SSL
0 likes · 44 min read
Tencent Cloud Developer
Jan 16, 2023 · Cloud Native

Scaling Sheep, Sheep, Sheep to Support 100 Million Daily Active Users: A Tencent Cloud Case Study

Tencent Cloud helped the viral game Sheep, Sheep, Sheep (《羊了个羊》) scale from 5,000 QPS to support over 100 million daily active users in a week by using serverless Kubernetes auto‑scaling, real‑time logging, WAF/Anti‑DDoS protection, CDN, and read‑write separation with Redis, achieving high performance, availability, and scalability.

CDN · Cloud Native · TKE Serverless
0 likes · 12 min read
Alibaba Cloud Native
Jan 13, 2023 · Cloud Native

Mastering Nacos: From Origins to High‑Availability Configuration Management

This article provides a comprehensive overview of Nacos, covering its origins, evolution of the configuration center, typical use cases, step‑by‑step integration methods, troubleshooting tips, high‑availability mechanisms, commercial MSE advantages, migration guides, and the upcoming features in Nacos 3.0.

Cloud Native · Configuration Management · Java
0 likes · 14 min read
Top Architect
Jan 12, 2023 · Operations

Comprehensive Guide to Installing Nginx, Configuring Reverse Proxy, Load Balancing, and High Availability with Keepalived and LVS

This article provides a step‑by‑step tutorial on installing Nginx, setting up reverse proxy and various load‑balancing methods, configuring upstream directives, deploying Keepalived for high‑availability failover, and building an LVS‑DR cluster to achieve robust, production‑grade traffic distribution.

LVS · Nginx · high availability
0 likes · 25 min read
Efficient Ops
Jan 10, 2023 · Big Data

Why a Single Kafka Broker Failure Can Halt All Consumers – Deep Dive into HA

This article explains Kafka's multi‑replica design, ISR mechanism, leader election rules, and producer acknowledgment settings, then shows how the built‑in __consumer_offsets topic with a single replica can cause a whole cluster to become unavailable when one broker crashes, and offers practical fixes.

Consumer Offsets · ISR · Kafka
0 likes · 9 min read