Tagged articles

2122 articles

Jul 15, 2023 · Operations

High‑Availability Architecture for a Large‑Scale Membership System

The article describes how a membership system serving billions of users across multiple platforms achieves high performance and high availability through dual‑center Elasticsearch clusters, traffic‑isolated three‑cluster ES architecture, Redis caching with distributed locks, dual‑center MySQL partitioning, and fine‑grained flow‑control and degradation strategies.

Backend Architecture · Distributed Systems · Elasticsearch

0 likes · 25 min read

Zhengcaiyun Technology

Jul 10, 2023 · Cloud Computing

RocketMQ High Availability Mechanism: DLedger and Raft-Based Leader Election

This article explores RocketMQ's high availability mechanism, focusing on DLedger's implementation of Raft-based leader election for distributed message middleware, covering both traditional and Controller mode architectures.

Cloud Computing · DLedger · Distributed Systems

0 likes · 18 min read

NetEase Cloud Music Tech Team

Jul 10, 2023 · Big Data

Design and Implementation of the Log Reporting, Collection, and Distribution Pipeline in NetEase Cloud Music's Corona Front‑end Monitoring System

The article details NetEase Cloud Music’s Corona monitoring pipeline, explaining how SDKs report logs via an HTTP service, how a transmission layer normalizes and stores them, how a Flume‑like collector forwards logs to HBase and Kafka, and how Flink tasks shard and filter streams for various monitoring services while handling traffic spikes and offering an independent Node.js channel for other business units.

Distributed Systems · Flink · Frontend

0 likes · 10 min read

Spring Full-Stack Practical Cases

Jul 9, 2023 · Backend Development

Mastering System Scalability: How the AKF Cube Guides X, Y, Z Expansions

This article explains the AKF Cube methodology for scaling distributed systems, detailing how horizontal replication (X axis), functional decomposition (Y axis), and user‑based sharding (Z axis) can be combined to boost performance while balancing implementation costs.

AKF cube · Distributed Systems · Scalability

0 likes · 12 min read

Architects Research Society

Jul 7, 2023 · Operations

Design Patterns and Principles for Building Large‑Scale Systems

This article outlines key design patterns and principles—such as scalability, idempotency, asynchronous processing, health checks, circuit breakers, feature flags, bulkheads, service discovery, retries, metrics, rate limiting, back‑pressure, and canary releases—that enable large‑scale, reliable, and resilient distributed systems.

Distributed Systems · Observability · Reliability

0 likes · 16 min read

Xiao Lou's Tech Notes

Jul 7, 2023 · Databases

How Didi Cut ClickHouse CPU Usage by 90% with a Simple Thread Check Fix

This article walks through how Didi identified excessive CPU consumption by ClickHouse background move threads, diagnosed the root cause using top and pstack, and applied a lightweight code guard that reduced CPU load from 30% to under 5%, improving overall cluster performance.

CPU · ClickHouse · Distributed Systems

0 likes · 9 min read

Zhengcaiyun Technology

Jun 29, 2023 · Backend Development

Understanding RocketMQ: Architecture, Modules, and Deployment Essentials

This article provides a comprehensive overview of RocketMQ, covering its origin, core concepts, component roles, cluster deployment architecture, workflow, feature details, persistence mechanisms, and cleanup policies, offering developers a solid foundation for using this high‑performance messaging middleware.

Apache · Architecture · Distributed Systems

0 likes · 18 min read

Alibaba Cloud Native

Jun 26, 2023 · Cloud Native

How RocketMQ Evolved Its High‑Availability Architecture for Cloud‑Native Deployments

This article examines RocketMQ's high‑availability evolution—from early master‑slave and Raft‑based designs to the v5 DLedger fusion model—detailing replica groups, data sharding, election mechanisms, replication strategies, metric trade‑offs, log‑divergence handling, controller roles, heartbeat optimizations, and comparisons with Kafka and Pulsar, all illustrated with diagrams and code snippets.

Cloud Native · DLedger · Distributed Systems

0 likes · 36 min read

DataFunSummit

Jun 21, 2023 · Databases

Forum on Building Ultra‑Scale Storage Systems: Insights from Baidu, Meituan, Ant Group, Xiaomi and Baidu Cloud

The forum gathers senior experts from Baidu, Meituan, Ant Group, Xiaomi and Baidu Cloud to share practical experiences and future trends on constructing ultra‑large‑scale file, block, KV and NoSQL storage systems, focusing on low‑cost, high‑performance solutions and architectural challenges.

Distributed Systems · KV storage · block storage

0 likes · 8 min read

dbaplus Community

Jun 20, 2023 · Operations

How Agricultural Bank Built a Chaos Engineering Platform for Resilience

The article outlines the Agricultural Bank of China's initiative to adopt chaos engineering, describing the challenges of modern distributed systems, the design and capabilities of their in‑house chaos platform, product research, industry comparisons, practical use cases across development, operations and disaster recovery, and future development directions.

Cloud Native · Distributed Systems · Platform Development

0 likes · 14 min read

Open Source Linux

Jun 20, 2023 · Operations

Mastering Load Balancing: Types, Architectures, and Algorithms Explained

This article explains why high‑performance clusters are complex, introduces the three main load‑balancing categories—DNS, hardware, and software—describes their definitions, advantages, and drawbacks, outlines typical combined architectures, and reviews common load‑balancing algorithms such as round‑robin, weighted round‑robin, least‑load, performance‑optimal, and hash‑based methods.

Algorithms · Distributed Systems · Hardware

0 likes · 6 min read

FunTester

Jun 19, 2023 · Big Data

Kafka Architecture and Core Concepts: Brokers, Producers, Consumers, Topics, Partitions, Replicas, and Reliability

This article provides a comprehensive overview of Kafka's architecture and fundamental concepts, covering its overall structure, key components such as brokers, producers, consumers, topics, partitions, replicas, leader‑follower synchronization, offset handling, message storage at both logical and physical layers, as well as producer and consumer workflows, partition assignment strategies, rebalancing, log management, zero‑copy I/O, and reliability mechanisms.

Distributed Systems · Kafka · Log Management

0 likes · 22 min read

Baidu Geek Talk

Jun 19, 2023 · Operations

How Baidu’s Tianyan Log Service Overcomes ELK’s Scaling and Performance Limits

This article examines the challenges of logging in distributed services, compares the traditional ELK stack with Baidu's Tianyan solution, details Tianyan's architecture (including Ingest, Store, Consumer, Elastic Agent, Fleet, APM, Beats, and Disruptor‑based high‑throughput pipelines), and covers resource isolation, dynamic cleanup, and best‑practice recommendations for building a scalable, low‑latency log platform.

Distributed Systems · Elastic Stack · Log Management

0 likes · 26 min read

Open Source Linux

Jun 16, 2023 · Backend Development

How Netflix’s Cloud Gateway Cuts Errors with Adaptive Load Balancing

Netflix’s cloud‑gateway team redesigned its load‑balancing stack—combining client latency, server utilization, and probabilistic choice‑of‑2 algorithms—to dramatically lower error rates, improve request distribution, and enhance fault‑tolerance for millions of requests per second.

Distributed Systems · Netflix · adaptive algorithms

0 likes · 19 min read

ITPUB

Jun 15, 2023 · Databases

How Domestic Databases Are Shaping China’s Financial Digital Transformation

Amid China’s push for digital and domestic technology, the article examines the evolution of native database products, the opportunities and challenges they face—especially in the financial sector—and how policy, cloud‑native architectures, distributed systems, and multi‑cloud demands are driving the next wave of innovation.

China · Digital Transformation · Distributed Systems

0 likes · 10 min read

Goodme Frontend Team

Jun 14, 2023 · Frontend Development

Top 5 Must‑Read Articles for Frontend Engineers: Gateways, Raft, Svelte & More

This newsletter curates five high‑quality articles covering gateway system design, Google I/O front‑end highlights, the Raft consensus algorithm, Svelte internals, and advanced image‑loading techniques, inviting readers to share and discuss valuable industry insights.

Backend Integration · Distributed Systems · Frontend Development

0 likes · 5 min read

Sanyou's Java Diary

Jun 12, 2023 · Backend Development

Master RocketMQ 4.9.x Consumption: Architecture, Load Balancing, and Retry Strategies

This article walks through RocketMQ 4.9.x’s consumption architecture, explaining the roles of NameServer, Broker, Producer and Consumer, the publish‑subscribe model, storage structures, load‑balancing algorithms, long‑polling, concurrent and ordered consumption, progress persistence, and the built‑in retry mechanism.

Consumer · Distributed Systems · Message Queue

0 likes · 28 min read

DeWu Technology

Jun 7, 2023 · Backend Development

Ensuring Data Consistency Across Microservices: Strategies and Design Principles

This article examines why data consistency between microservices is critical, defines key terminology, and presents two practical approaches—business‑side final consistency and platform‑side final consistency—detailing their core ideas, design principles, workflow diagrams, and real‑world implementation considerations such as idempotency, storage choices, latency tolerance, state‑machine design, concurrency control, and observability.

Data Consistency · Distributed Systems · Idempotency

0 likes · 17 min read

Code Ape Tech Column

Jun 6, 2023 · Backend Development

Business Compensation Mechanisms: Rollback and Retry Strategies in Distributed Systems

The article explains business compensation mechanisms in distributed microservice architectures, detailing rollback and retry approaches, their implementation patterns and strategies, and practical considerations for achieving eventual consistency while handling failures. It also outlines best practices for idempotency, monitoring, and workflow engine design.

Distributed Systems · Retry · business compensation

0 likes · 14 min read

Top Architect

Jun 5, 2023 · Big Data

Deep Dive into Kafka’s High Reliability and High Performance Mechanisms

This article comprehensively explores Kafka’s core concepts, architecture, and the techniques it employs—such as ack strategies, replica synchronization, high‑watermark, leader‑epoch, zero‑copy, batch sending, compression, and reactor‑based networking—to achieve both strong reliability and high throughput in distributed messaging systems.

Distributed Systems · Kafka · Message Queue

0 likes · 31 min read

Alibaba Cloud Developer

Jun 5, 2023 · Databases

Mastering Cache Consistency: Strategies to Prevent Stale Data in High‑Concurrency Systems

This article examines why cache‑database consistency problems arise under high concurrency, compares common update orders, explains delayed double‑delete and cache‑aside patterns, and presents practical solutions such as retry mechanisms, message queues, and MySQL binlog subscription to keep data synchronized.

Cache Consistency · Distributed Systems · cache-aside

0 likes · 11 min read

Alibaba Cloud Big Data AI Platform

Jun 5, 2023 · Artificial Intelligence

How Alibaba’s DGS Enables Real‑Time GNN Inference on Massive Dynamic Graphs

The Dynamic Graph Sampling (DGS) service, built on GraphLearn, delivers sub‑20 ms latency for real‑time GNN inference on large, constantly evolving graphs by separating storage from computation, using event‑driven pre‑sampling, lazy multi‑hop concatenation, and a publish‑subscribe architecture that scales linearly across distributed workers.

Alibaba Cloud · Distributed Systems · GraphLearn

0 likes · 12 min read

Architects Research Society

Jun 4, 2023 · Big Data

Understanding Transactions in Apache Kafka

This article explains the design, semantics, and practical usage of Apache Kafka's transaction API, covering why transactions are needed for exactly‑once processing, the underlying atomic multi‑partition writes, zombie fencing, consumer guarantees, Java API details, performance considerations, and operational best practices.

Apache Kafka · Distributed Systems · Exactly-Once

0 likes · 19 min read

MaGe Linux Operations

Jun 1, 2023 · Backend Development

How Netflix’s New Load‑Balancing Algorithm Cuts Errors by Orders of Magnitude

Netflix’s cloud‑gateway team redesigned Zuul’s load balancing by combining client latency, server utilization, and the choice‑of‑2 and Join‑the‑Shortest‑Queue algorithms, and by adding server‑reported metrics, adaptive thresholds and statistical decay. In production, the changes dramatically reduced error rates and latency and improved traffic distribution.

Distributed Systems · Netflix · Performance

0 likes · 20 min read

Liangxu Linux

May 28, 2023 · Backend Development

Preventing Redis Cache Penetration, Avalanche, and Thundering Herd

This article explains the causes of Redis cache penetration, avalanche, and thundering herd, and provides practical mitigation strategies such as caching null values, using white‑lists, Bloom filters, pre‑warming hot keys, staggered expirations, multi‑level caching, and lock mechanisms.

Backend · Cache · Distributed Systems

0 likes · 7 min read

Didi Tech

May 26, 2023 · Big Data

Design and Optimization of Didi's Spatial‑Temporal Supply‑Demand System

Didi’s redesigned Spatial‑Temporal Supply‑Demand System replaces a single‑Redis bottleneck with a multi‑cluster routing layer, semantic sharding, multi‑level caching and delayed queues, achieving higher horizontal scalability, fault isolation, ~30 % latency reduction, increased cache hit rates, fewer query nodes, and faster, code‑free feature configuration.

Configuration Management · Distributed Systems · Golang

0 likes · 19 min read

dbaplus Community

May 25, 2023 · Databases

15 Powerful Redis Patterns for Scalable Backend Systems

This article presents fifteen practical Redis usage patterns—including caching, distributed sessions, locks, global IDs, counters, bitmaps, shopping carts, timelines, message queues, lotteries, likes, product tags, filtering, follow relationships, and ranking—illustrating how each can be implemented with commands and code snippets to build efficient, scalable backend services.

Data Structures · Distributed Systems

0 likes · 9 min read

Code Ape Tech Column

May 22, 2023 · Backend Development

Vertical Performance Optimization: Load Balancing Architecture and Practices

This article explores the evolution of load‑balancing architectures from Alibaba’s early systems to modern micro‑service meshes, detailing DNS, hardware, and software solutions, common algorithms, and real‑world case studies such as Double‑11, China Railway 12306, WeChat red packets, and Douyin, highlighting performance, scalability, and reliability considerations.

Distributed Systems · Service Mesh

0 likes · 17 min read

DataFunTalk

May 21, 2023 · Databases

Graph Database Storage Techniques and Practices with Galaxybase

This article introduces RDF and property graph models, explains the core goals of graph database storage, compares mainstream storage solutions such as array, linked‑list and LSM‑Tree approaches, and presents practical deployment experiences of the Galaxybase distributed graph database.

Distributed Systems · Galaxybase · Graph Database

0 likes · 23 min read

Selected Java Interview Questions

May 17, 2023 · Backend Development

Effective Cache Strategies for Large Distributed Systems

This article explains how to design and use various client‑side, CDN, and server‑side caching techniques—including HTTP Cache‑Control, Redis data structures, cache consistency patterns, and mitigation of cache penetration, breakdown, and avalanche—to improve performance and reliability of high‑traffic distributed applications.

CDN · Cache Consistency · Distributed Systems

0 likes · 23 min read

ITPUB

May 10, 2023 · Cloud Native

How Meituan’s MStore Achieves Scalable Storage‑Compute Separation in Cloud‑Native Environments

This article explains how Meituan’s storage team designed the MStore distributed storage platform to separate storage and compute, addressing scaling, cost, and reliability challenges of monolithic architectures, and details its cloud‑native components, data model, performance optimizations, observability, and the derived EBS block‑storage service.

Distributed Systems · MStore · Performance

0 likes · 16 min read

HelloTech

May 8, 2023 · Artificial Intelligence

One‑Stop AI Platform for Cloud, Edge, Mobile, Flink, and Application Intelligence: Architecture, Challenges, and Solutions

The article presents a comprehensive one‑stop AI platform that unifies training, model, feature, and decision services across cloud, edge, mobile, Flink, and application environments, detailing its architecture, the limitations of cloud‑centric inference, the advantages of localized inference, and the challenges and solutions for model and feature localization, SDK design, and future AutoML enhancements.

AI Platform · Distributed Systems · Flink

0 likes · 17 min read

Architects Research Society

May 6, 2023 · Databases

Understanding Eventual Consistency in Apache CouchDB

This article explains how Apache CouchDB achieves eventual consistency through its MVCC architecture, CAP theorem trade‑offs, incremental replication, and document‑level versioning, illustrating concepts such as local consistency, conflict resolution, and practical use‑cases for building scalable distributed systems.

CouchDB · Distributed Systems · MVCC

0 likes · 21 min read

Architect's Guide

May 6, 2023 · Backend Development

Implementing a WebSocket Load‑Balancing Library for Microservice Architectures

This article introduces a Spring‑Boot library that uses a configuration annotation to enable WebSocket load‑balancing across microservice instances, detailing its usage, abstract design, connection management, message routing, and extensibility for various long‑connection protocols.

Distributed Systems · java · spring-boot

0 likes · 12 min read

Top Architect

May 3, 2023 · Backend Development

Understanding RPC: Principles, Implementation Details, and Code Walkthrough

This article explains the fundamentals of Remote Procedure Call (RPC), covering its definition, core challenges, service registration and discovery with Zookeeper, client proxy generation, network transmission using Netty, serialization and compression, server-side request handling via reflection or Javassist, and performance comparisons between proxy strategies.

Distributed Systems · Javassist · Netty

0 likes · 24 min read

Top Architect

Apr 30, 2023 · Backend Development

Kafka Core Concepts, Architecture, Performance, and Operational Practices

This article provides a comprehensive overview of Kafka, covering its core value as a message queue, fundamental concepts, cluster architecture, log storage mechanisms, zero‑copy data transfer, high‑throughput and high‑availability design, consumer group behavior, rebalance strategies, and practical operational commands for managing topics, partitions, and offsets.

Backend · Distributed Systems · Kafka

0 likes · 31 min read

AntTech

Apr 28, 2023 · Information Security

Threshold Proxy Re‑Encryption (TPRE) with National Cryptographic Algorithms for Secure Data Sharing

The article explains how cryptographic access control, especially a hybrid‑encrypted Threshold Proxy Re‑Encryption scheme built on national SM2/SM3/SM4 algorithms, offers high‑strength, decentralized, and efficient data authorization and sharing, addressing the limitations of traditional role‑based models.

Distributed Systems · access control · cryptography

0 likes · 5 min read

Architects Research Society

Apr 25, 2023 · Big Data

Understanding Transactions in Apache Kafka: Semantics, API, and Practical Guidance

This article explains the purpose, semantics, and design of Apache Kafka's transaction API, describes how exactly‑once processing is achieved in stream‑processing applications, outlines the Java client usage, and discusses the internal components, performance considerations, and best‑practice tips for developers.

Distributed Systems · Exactly-Once · Kafka

0 likes · 16 min read

Architecture Digest

Apr 23, 2023 · Backend Development

Kafka Core Concepts, Architecture, Performance Optimizations, and Production Deployment Guide

This article provides a comprehensive technical overview of Kafka, covering its core message‑queue value, architecture components such as producers, consumers, topics, partitions and replication, high‑performance mechanisms like zero‑copy and OS cache, resource planning for disks, memory, CPU and network, operational tools and commands, consumer‑group management, rebalance strategies, and internal scheduling mechanisms such as the time‑wheel.

Backend Architecture · Distributed Systems · Kafka

0 likes · 30 min read

Big Data Technology Architecture

Apr 22, 2023 · Big Data

Deep Dive into Kafka’s High Reliability and High Performance Mechanisms

This article comprehensively explores Kafka’s core architecture, explaining how asynchronous decoupling and traffic shaping are achieved, detailing the roles of producers, brokers, consumers, and ZooKeeper, and analyzing the reliability and performance techniques such as ACK policies, replication, idempotent and transactional producers, page‑cache flushing, zero‑copy, compression, batching, and load‑balancing strategies.

Distributed Systems · Message Queue · Reliability

0 likes · 31 min read

JD Retail Technology

Apr 19, 2023 · Databases

Understanding Distributed Data Consistency: CAP, BASE, and Transaction Solutions

This article explains why achieving data consistency in modern distributed systems is challenging, reviews ACID properties of local databases, discusses the CAP theorem and BASE theory, examines event ordering mechanisms, and compares practical solutions such as two‑phase commit, XA, local message tables, and MQ‑based transaction models.

BASE theorem · CAP theorem · Data Consistency

0 likes · 19 min read

Huawei Cloud Developer Alliance

Apr 19, 2023 · Databases

Huawei GaussDB Wins Top Science Award: Cutting‑Edge Cloud‑Native Innovations

Huawei Cloud’s GaussDB, a high‑performance, highly available, elastic, AI‑optimized and secure cloud‑native distributed database, earned the top Science & Technology Progress award, showcasing breakthrough innovations and wide adoption in major financial institutions.

AI Optimization · Distributed Systems · GaussDB

0 likes · 6 min read

Huolala Tech

Apr 17, 2023 · Big Data

How HuoLala Accelerated Ad‑hoc Queries with a Hybrid Offline Engine

This article describes how HuoLala identified slow ad‑hoc query performance in its Hive‑on‑Tez stack, surveyed comparable industry solutions, and built a multi‑engine hybrid offline service that dramatically improves query latency. It also outlines the service's architecture, key design decisions, production impact, and future roadmap.

Big Data · Distributed Systems · SQL Routing

0 likes · 12 min read

Sanyou's Java Diary

Apr 13, 2023 · Backend Development

Master Dubbo: High‑Performance Java RPC Framework Explained

This comprehensive guide introduces Dubbo, a high‑performance Java RPC framework, covering its core concepts, architecture, configuration methods, load‑balancing and fault‑tolerance strategies, underlying communication mechanisms, and extension points, helping developers build robust distributed applications.

Distributed Systems · Dubbo · Java RPC

0 likes · 24 min read

ITPUB

Apr 13, 2023 · Fundamentals

Mastering Distributed Transactions: From CAP to BASE and Practical Solutions

This article explains distributed transactions, the reasons they arise, the CAP and BASE theories that guide consistency trade‑offs, and outlines strong, eventual, and weak consistency solutions along with popular frameworks for implementing them in modern distributed systems.

BASE theory · CAP theory · Distributed Systems

0 likes · 11 min read

Zhengcaiyun Technology

Apr 13, 2023 · Backend Development

Understanding the AKF Scale Cube: X, Y, Z Axes for System Scalability and Their Application to Kafka and Redis

The article explains the AKF Scale Cube model—horizontal replication (X axis), functional decomposition (Y axis), and data/service partitioning (Z axis)—and demonstrates how these three scaling dimensions can be applied to backend systems such as Kafka and Redis to achieve high availability, performance, and fault isolation.

AKF Scale Cube · Distributed Systems · Kafka

0 likes · 19 min read

IT Architects Alliance

Apr 13, 2023 · Backend Development

WebSocket Load Balancing Across Microservices Using a Single Annotation

This article explains how to solve the WebSocket message‑delivery problem in a micro‑service environment by introducing a lightweight library that uses a custom annotation to automatically forward messages between service instances, with detailed design, configuration, and code examples.

Backend · Distributed Systems · Microservices

0 likes · 12 min read

Java Architect Essentials

Apr 12, 2023 · Operations

High‑Availability Architecture for a Billion‑Scale Membership System

This article details the design and implementation of a high‑availability, billion‑scale membership system, covering Elasticsearch dual‑center clusters, traffic‑isolated architectures, deep ES optimizations, Redis caching strategies, MySQL migration with dual‑center partitioning, abnormal member relationship handling, and future fine‑grained flow‑control and degradation plans.

Distributed Systems · Elasticsearch · Flow Control

0 likes · 20 min read

HomeTech

Apr 5, 2023 · Backend Development

Design and Implementation of a Real‑Time Cache Update System Based on Kafka and Distributed Cache

This article presents a comprehensive design and implementation of a real‑time cache update system that leverages Kafka‑driven database change streams, a centralized cache scheduling center, executor registration, broadcast and fail‑over scheduling, and a lightweight SDK to achieve millisecond‑level cache consistency for C‑end services.

Backend · Cache · Distributed Systems

0 likes · 10 min read

Wukong Talks Architecture

Apr 4, 2023 · Fundamentals

Understanding the Raft Consensus Algorithm: Roles, Leader Election, and Fault Handling

This article explains the Raft consensus algorithm, detailing its roles, leader election process, term management, fault handling, and how it ensures consistency in both single‑node and multi‑node distributed systems for modern cloud‑native applications.

Consensus · Distributed Systems · Raft

0 likes · 12 min read

DataFunTalk

Apr 2, 2023 · Backend Development

Introducing RaftKeeper: A High‑Performance Raft‑Based Distributed Coordination Service

RaftKeeper is an open‑source, Raft‑based distributed consensus service implemented in C++ that offers double the throughput, sub‑second latency, five‑nines availability, and full ZooKeeper compatibility, targeting high‑performance OLAP workloads and large‑scale backend scenarios.

Backend · Consensus · Distributed Systems

0 likes · 5 min read

Alibaba Cloud Developer

Mar 31, 2023 · Backend Development

How Alibaba’s Custom Three‑Layer Distribution Boosts Scheduled Task Efficiency

This article walks through Alibaba's evolution from single‑machine scheduled jobs to a customized three‑layer distributed task framework, detailing classifications, Spring scheduling examples, batch processing integration, cluster distribution mechanics, and optimization techniques that maximize resource utilization and achieve smooth, balanced task execution.

Antscheduler · Distributed Systems · spring

0 likes · 16 min read

Open Source Linux

Mar 31, 2023 · Fundamentals

What Is a Network Operating System? Concepts, Functions, and Key Examples

This article explains the concept of network operating systems, traces their historical development, outlines core functions such as resource sharing, communication, security, and management, describes typical architectures, and introduces major examples like UNIX, Linux, NetWare, and Windows Server.

Distributed Systems · NOS · Network Operating System

0 likes · 8 min read

IT Services Circle

Mar 30, 2023 · Databases

Interview Review: Core Concepts of MySQL, OS, Networking, Redis, and Distributed Systems

This article compiles a technical interview recap covering MySQL MVCC, atomicity and persistence mechanisms, operating‑system page cache and deadlock concepts, TCP reliability and flow‑control, Redis persistence and clustering, as well as distributed transaction and consensus fundamentals.

Distributed Systems · MySQL · Networking

0 likes · 14 min read

Code Ape Tech Column

Mar 30, 2023 · Backend Development

How to Ensure No Message Loss in MQ Systems – Interview Guide and Practical Solutions

This article explains the common interview question of guaranteeing 100% message reliability in MQ middleware such as Kafka or RabbitMQ, outlines the three lifecycle stages of a message, and discusses detection mechanisms, ID generation, idempotent consumption, and message backlog handling, with concrete design patterns and practical examples.

Distributed Systems · Idempotency · Kafka

0 likes · 12 min read

Volcano Engine Developer Services

Mar 29, 2023 · Backend Development

How ByteHouse Achieves High‑Availability Real‑Time Data Ingestion with HaKafka

ByteHouse evolved its real‑time import pipeline from a community ClickHouse architecture to a custom HaKafka engine and a cloud‑native design, addressing node failures, read‑write conflicts, scaling costs, and latency by introducing two‑level concurrency, memory tables, exactly‑once semantics, and robust fault‑tolerance.

Distributed Systems · Kafka · Real-time Ingestion

0 likes · 15 min read

MaGe Linux Operations

Mar 27, 2023 · Backend Development

Mastering Rate Limiting: Concepts, Algorithms, and Real-World Implementations

This article explains the fundamental concepts of rate limiting, including time and resource dimensions, various rule types such as QPS, connection count, bandwidth, black/white lists, and distributed considerations, then details common algorithms like token bucket, leaky bucket, sliding window, and practical implementations using Nginx, Guava, Redis, and Sentinel.

Backend · Distributed Systems · Token Bucket

0 likes · 16 min read

Top Architect

Mar 27, 2023 · Big Data

Kafka Architecture, Performance Optimization, and Production Deployment Guide

This article provides a comprehensive overview of Kafka’s core concepts, high‑performance design, cluster planning, resource evaluation, deployment steps, producer and consumer configurations, fault‑tolerance mechanisms, and operational tools, offering practical guidance for building and managing a high‑throughput Kafka production environment.

Cluster Deployment · Consumer · Distributed Systems

0 likes · 31 min read

dbaplus Community

Mar 22, 2023 · Databases

Scaling an Airline Ticket Order Database: From Monolith to 64‑Shard Sharding

The article details how a rapidly growing airline ticket order system was re‑architected by identifying performance bottlenecks, applying vertical and horizontal sharding, optimizing cache layers, implementing dual‑write mechanisms, and planning a phased migration to achieve ten‑fold QPS growth while reducing resource usage and operational risk.

Cache Optimization · Distributed Systems · Dual Write

0 likes · 38 min read

Architect

Mar 21, 2023 · Operations

Log Management, Observability, and APM Practices in Distributed Systems

This article explains what logs are, when to record them, their value in large‑scale architectures, and how to build effective logging, metrics, and tracing platforms using tools such as ELK, Prometheus, and SkyWalking, while also presenting good and bad logging practices and sample batch‑log retrieval code.

APM · Distributed Systems · ELK

0 likes · 20 min read

Volcano Engine Developer Services

Mar 16, 2023 · Databases

How ByteDance’s Abase Achieves Extreme High Availability in KV Storage

This article explains the evolution, architecture, and high‑availability solutions of ByteDance’s Abase KV storage system, detailing its multi‑write design, leader‑less approach, multi‑region deployment, consistency mechanisms, performance optimizations, and real‑world metrics that support billions of requests per second.

ByteDance · Distributed Systems · KV storage

0 likes · 20 min read

DataFunSummit

Mar 15, 2023 · Databases

Abase: ByteDance’s Large‑Scale Online KV Storage System – Architecture, High Availability, and Key Technologies

This article introduces Abase, ByteDance’s massive online KV storage system, detailing its evolution from a single‑cluster KV service to a multi‑region, multi‑tenant platform, and explains the high‑availability challenges and the leaderless multi‑write architecture, hybrid logical clocks, quorum settings, and performance optimizations that enable hundred‑billion QPS and sub‑10 ms latency.

ABase · ByteDance · Database Architecture

0 likes · 19 min read

Architects Research Society

Mar 15, 2023 · Big Data

Understanding Transactions in Apache Kafka: Semantics, API, and Practical Considerations

This article explains why exactly‑once semantics are needed for stream‑processing applications, describes Kafka's transactional model and semantics, details the Java transaction API and its usage, and discusses the internal components, performance trade‑offs, and practical guidelines for building reliable Kafka‑based pipelines.

Distributed Systems · Exactly-Once · Kafka

0 likes · 17 min read

ITPUB

Mar 14, 2023 · Fundamentals

Master Distributed Systems: CAP, BASE, Locks, Transactions, Paxos & Raft

This comprehensive guide explores core distributed system concepts—including the CAP theorem and its trade‑offs, BASE consistency, various distributed lock strategies, multiple transaction patterns such as 2PC, 3PC, TCC and Seata, as well as consensus algorithms Paxos and Raft, while also covering idempotency and rate‑limiting techniques.

CAP theorem · Distributed Systems · Distributed Transactions

0 likes · 29 min read

Programmer DD

Mar 14, 2023 · Backend Development

Why This Spring ‘Full‑Stack’ Book Is a Must‑Read for Java Developers

The article reviews a comprehensive Spring framework book that covers the entire Spring family—from core concepts and data access to web development and cloud‑native microservices—using a practical, localized approach and a large milk‑tea shop case study to guide both beginners and experienced Java developers.

Distributed Systems · Spring Framework · backend-development

0 likes · 10 min read

Bilibili Tech

Mar 14, 2023 · Big Data

Bilibili HDFS Erasure Coding Strategy and Implementation

Bilibili reduced petabyte‑scale storage costs by back‑porting erasure‑coding patches to its HDFS 2.8.4 cluster, deploying a parallel EC‑enabled cluster, adding a data‑proxy service, intelligent routing and block‑checking, and automating cold‑data migration, while noting write overhead and planning native acceleration.

Big Data · Data Reliability · Distributed Systems

0 likes · 14 min read

FunTester

Mar 13, 2023 · Operations

How Chaos Engineering Can Strengthen System Reliability: A Practical Guide

This article explains the origins and principles of chaos engineering, illustrates how fault‑injection scenarios expose system weaknesses, outlines step‑by‑step implementation—from tool selection and metric definition to execution and post‑mortem—and highlights its role in achieving high‑availability service level agreements.

DevOps · Distributed Systems · Fault Injection

0 likes · 10 min read

JavaEdge

Mar 8, 2023 · Backend Development

Choosing CP vs AP for Service Discovery: When to Use Zookeeper or a Message Bus

This article explains the importance of service discovery in high‑availability systems, compares DNS, VIP, Zookeeper‑based CP solutions and message‑bus‑based AP approaches, outlines their registration and subscription workflows, highlights scalability and consistency trade‑offs, and provides practical guidance for designing robust registration centers.

AP · Distributed Systems · Message Bus

0 likes · 14 min read

Top Architect

Mar 8, 2023 · Backend Development

Implementing Rate Limiting with Redis: setnx, ZSet Sliding Window, and Token‑Bucket Approaches

This article explains three Redis‑based rate‑limiting techniques—using setnx for simple counters, leveraging ZSet for a sliding‑window algorithm, and applying a token‑bucket pattern with List—provides Java code examples for each method, discusses their advantages and drawbacks, and shows how to integrate them into backend services.

Distributed Systems · backend-development · java

0 likes · 7 min read

MaGe Linux Operations

Feb 25, 2023 · Backend Development

Mastering Rate Limiting: Strategies, Algorithms, and Real‑World Implementations

This article explains how rate limiting protects system availability by controlling traffic flow, introduces common patterns such as circuit breaking, service degradation, delay and privilege handling, compares cache, degradation, and rate limiting, and details popular algorithms and practical code implementations for both single‑node and distributed environments.

Distributed Systems · Guava · Token Bucket

0 likes · 13 min read

JD Tech

Feb 23, 2023 · Backend Development

Comprehensive Guide to Scheduling Tasks: Algorithms, Java Implementations, and Distributed Solutions

This article provides an in‑depth overview of scheduled task processing, covering common business scenarios, fundamental principles, single‑machine algorithms such as min‑heap and time‑wheel, Java utilities like Timer, DelayQueue, ScheduledExecutorService, Spring Task, Quartz, and distributed approaches using Redis, Elastic‑Job, and XXL‑Job.

Distributed Systems · Time Wheel · cron

0 likes · 22 min read

Alibaba Cloud Native

Feb 23, 2023 · Cloud Native

How OpenYurt Enables Large‑Scale Edge Computing for Longyuan Power

This article explains how OpenYurt, an unobtrusive cloud‑native edge platform, integrates with the CNStack technology hub to deliver high‑availability, offline‑autonomous, and programmable edge services for Longyuan Power’s massive multi‑province server fleet.

CNStack · Cloud Native · Distributed Systems

0 likes · 10 min read

MaGe Linux Operations

Feb 19, 2023 · Backend Development

Mastering API Protection: Rate Limiting, Caching, and Degradation for E‑Commerce Spikes

When a product suddenly surges in demand, this guide explains how to safeguard e‑commerce APIs using rate‑limiting algorithms (leaky bucket, token bucket, sliding window), Nginx and Java semaphore controls, distributed throttling with message queues, service degradation strategies, and caching techniques to maintain stability.

Distributed Systems · e‑commerce · rate limiting

0 likes · 11 min read

ITPUB

Feb 13, 2023 · Fundamentals

How a Bat-Borne Virus Explains the Gossip Protocol in Distributed Systems

Using a fictional coronavirus carried by a bat, the article illustrates the Gossip protocol’s mechanisms—direct mail, anti-entropy, and epidemic spread—to explain how distributed systems achieve eventual consistency, highlighting advantages, drawbacks, and practical considerations for storage components like Cassandra.

Anti-entropy · Distributed Systems · Gossip Protocol

0 likes · 10 min read

DataFunSummit

Feb 13, 2023 · Big Data

ClickHouse in Self‑Service Analytics: Architecture, Optimization Practices and Future Roadmap at ZuanZuan Platform

This article details how ZuanZuan leveraged ClickHouse as the core OLAP engine for its massive self‑service analytics platform, covering OLAP engine selection criteria, system architecture, real‑world use cases, performance tuning, operational challenges, and future development plans.

Analytics · Big Data · ClickHouse

0 likes · 16 min read

Baidu Intelligent Cloud Tech Hub

Feb 8, 2023 · Cloud Computing

How Baidu’s Next‑Gen Metadata Engine Powers Trillion‑Object Object Storage

This article details the architecture of Baidu Object Storage (BOS), the challenges of its legacy metadata system, and the design of a new generation metadata engine that enables trillion‑object buckets, million‑QPS performance, hierarchical namespaces, and intelligent lifecycle management.

Baidu · Distributed Systems · cloud storage

0 likes · 14 min read

Bilibili Tech

Feb 7, 2023 · Cloud Native

Bilibili Configuration Center (Config & Paladin): Architecture, Features, and Performance

Bilibili’s Config Center evolved from the 2017 Config v1 monolith—offering unified UI, MySQL storage, and long‑polling—to the Raft‑based Paladin v2, which adds lifecycle management, tenant isolation, incremental publishing, high‑throughput caching, multi‑active deployment, validation and rich tooling, handling hundreds of thousands of configs and tens of thousands of concurrent clients with sub‑50 ms push latency while planning deeper K8s integration.

Distributed Systems · Microservices · Paladin

0 likes · 15 min read

IT Architects Alliance

Feb 6, 2023 · Cloud Native

What Is Kubernetes and Why Is It Hard to Get Started?

This article introduces Kubernetes, the container‑based distributed cluster management system that originated at Google. It explains the architecture and core components such as the Master, Nodes, Pods, Services, and etcd, and details how communication, scheduling, storage, external access, scaling, and controller coordination work together.

Cloud Native · Distributed Systems · Kubernetes

0 likes · 8 min read

Code Ape Tech Column

Feb 6, 2023 · Backend Development

Understanding the Basic Structure and Technical Stack of RPC Architecture

This article explains the fundamental components of RPC architecture, details the client‑server roles, communication protocols, serialization methods, transport protocols, and synchronous versus asynchronous invocation patterns, providing a comprehensive guide for building a custom RPC framework.

Distributed Systems · RPC · Transport Protocol

0 likes · 12 min read

21CTO

Feb 5, 2023 · Backend Development

How Meituan Scaled Its Code Hosting Platform to Millions of Repositories

This article details Meituan's three‑stage evolution of its self‑developed Code platform—from a single‑machine service to a multi‑machine read‑write‑separated system and finally to a distributed, sharded architecture—highlighting the scalability and high‑availability challenges faced and the engineering solutions implemented.

Backend Architecture · Distributed Systems · Scalability

0 likes · 24 min read

Programmer DD

Feb 3, 2023 · Databases

Preventing Redis Cache Failures: Avalanche, Penetration, and Breakdown Solutions

This article explains the three main Redis cache anomalies—cache avalanche, cache penetration, and cache breakdown—detailing their symptoms, root causes, and practical mitigation strategies with Java code examples and architectural recommendations.

Cache · Distributed Systems · cache-avalanche

0 likes · 16 min read

Alibaba Cloud Developer

Feb 3, 2023 · Cloud Computing

Rethinking Cloud Computing: How Alibaba’s CIPU Redefines Compute Power

This article revisits cloud computing by tracing the evolution of compute power, exploring Alibaba Cloud’s infrastructure breakthroughs such as the CIPU processor and its core platforms, and analyzing how these advances reshape elastic, big‑data, high‑performance, and AI workloads while highlighting trust, cost, and self‑service challenges.

Alibaba Cloud · CIPU · Cloud Computing

0 likes · 32 min read

Java Architect Essentials

Feb 2, 2023 · Backend Development

Comparison of Distributed Task Scheduling Frameworks: Elastic‑Job vs XXL‑Job and Other Solutions

This article examines common business scenarios that require timed execution, introduces single‑machine and distributed scheduling frameworks such as Timer, ScheduledExecutorService, Spring, Quartz, TBSchedule, Elastic‑Job, Saturn and XXL‑Job, and provides a detailed feature‑by‑feature comparison to help choose the most suitable solution.

Backend · Comparison · Distributed Systems

0 likes · 14 min read

Meituan Technology Team

Feb 2, 2023 · R&D Management

Design and Evolution of Meituan's Distributed Code Hosting Platform

Meituan’s home‑grown Code platform evolved from a single‑server Git service to a distributed, sharded system with multi‑active replication, using Go‑based HTTP/SSH proxies, gRPC communication, and version‑based routing to achieve horizontal scalability, high availability, and millions of daily Git operations.

Distributed Systems · Git · Meituan

0 likes · 22 min read

JD Tech

Feb 2, 2023 · Fundamentals

Understanding the Byzantine Generals Problem and the Raft Consensus Algorithm

This article explains the Byzantine Generals problem, its fault‑tolerance limits, and how the Raft consensus algorithm solves a simplified version of the problem through leader election, log replication, and safety mechanisms, while also comparing Raft with Paxos, ZAB, and PBFT and providing Go code examples.

Byzantine Generals · Consensus Algorithm · Distributed Systems

0 likes · 20 min read

vivo Internet Technology

Jan 30, 2023 · Backend Development

Dubbo ZooKeeper Registry Implementation Principle Analysis

The article dissects Dubbo’s ZooKeeperRegistry by tracing its inheritance from AbstractRegistry through FailbackRegistry to CacheableFailbackRegistry, detailing local memory‑disk caching, retry logic via a timing wheel, URL‑push optimizations, and the ZooKeeper‑based ephemeral node and watcher mechanisms that enable dynamic service discovery, while also covering core ZooKeeper concepts.

Distributed Systems · Dubbo · RPC Framework

0 likes · 20 min read

Architect's Guide

Jan 28, 2023 · Backend Development

Implementing a Simple Java RPC Framework with Zookeeper, Netty, and Javassist

This article walks through the design and implementation of a lightweight Java RPC framework, covering core concepts such as service registration and discovery with Zookeeper, network communication via Netty, serialization, compression, dynamic proxy generation using Javassist, and performance comparisons between reflection and bytecode‑generated proxies.

Distributed Systems · Javassist · Netty

0 likes · 23 min read

Top Architect

Jan 19, 2023 · Backend Development

Implementing a Simple Java RPC Framework: Architecture, Service Registration, Proxy Generation, and Network Transport

This article explains the principles and implementation of a lightweight Java RPC framework, covering service registration with Zookeeper, client-side dynamic proxies, serialization, compression, Netty-based network transport, and both reflection and Javassist proxy generation, with extensive code examples and performance comparison.

Distributed Systems · Javassist · Netty

0 likes · 25 min read

DataFunTalk

Jan 19, 2023 · Big Data

Tencent Alluxio: Accelerating the Next Generation of Big Data and AI

This article presents a comprehensive overview of Tencent's Alluxio project, covering the evolution of big‑data architecture, recent Alluxio research progress, typical deployment cases, and future work, while highlighting performance improvements, integration with cloud and AI workloads, and community contributions.

AI · Alluxio · Big Data

0 likes · 21 min read

Architect

Jan 18, 2023 · Databases

Design and Architecture of Bilibili's High‑Performance KV Storage System

This article presents the background, overall architecture, partitioning strategies, Raft‑based replication, binlog support, multi‑active deployment, bulk‑load mechanisms, storage‑engine optimizations, load‑balancing policies, and failure detection and recovery techniques of a high‑reliability, high‑throughput key‑value store used at Bilibili.

Distributed Systems · KV storage · Partitioning

0 likes · 22 min read

DeWu Technology

Jan 16, 2023 · Cloud Native

Nacos Service Registration and Discovery: Principles and Implementation

The article explains Nacos’s open‑source service registry and discovery mechanisms, detailing client auto‑configuration, registration and health‑check workflows, server‑side instance handling, asynchronous copy‑on‑write processing, heartbeat cleanup, and cluster synchronization, while comparing its AP/CP capabilities to Zookeeper and Eureka.

Distributed Systems · Microservices · Nacos

0 likes · 55 min read

MaGe Linux Operations

Jan 13, 2023 · Fundamentals

Why ULID Beats UUID: A Deep Dive into Unique, Sortable IDs

This article explains what ULID is, why it is often preferred over UUID (it combines a timestamp with randomness to produce collision‑resistant, lexicographically sortable identifiers), details its specification, binary layout, and encoding, and shows practical Python usage and common application scenarios.

Distributed Systems · ULID · unique identifier

0 likes · 8 min read

Architect

Jan 12, 2023 · Operations

Critical Path Analysis for Latency Optimization in Large Distributed Systems

This article explains common latency analysis techniques, details the principles and implementation of critical path tracing, and demonstrates its practical application in Baidu App's recommendation service to efficiently identify and reduce performance bottlenecks in complex distributed architectures.

Distributed Systems · critical-path · latency analysis

0 likes · 14 min read

Programmer DD

Jan 11, 2023 · Databases

Redis Deep Dive: Pipelines, Pub/Sub, Persistence, Locks & Cluster

This comprehensive guide explores Redis fundamentals and advanced features, covering pipelines for reduced RTT, publish/subscribe messaging, key expiration strategies, transaction behavior, persistence mechanisms (RDB, AOF, hybrid), distributed locking techniques, sentinel high‑availability, and cluster sharding, with practical code examples and diagrams.

Distributed Systems · database · redis

0 likes · 47 min read

Mike Chen's Internet Architecture

Jan 10, 2023 · Backend Development

Key Differences Between RPC and Message Queues (MQ) in Distributed Systems

This article explains the core distinctions between Remote Procedure Call (RPC) and Message Queue (MQ) technologies, covering their architectures, communication patterns, functional features, and performance considerations, and outlines typical use cases such as synchronous calls, decoupling, traffic shaping, and asynchronous processing in distributed systems.

Distributed Systems · Message Queue · Microservices

0 likes · 6 min read

Aikesheng Open Source Community

Jan 10, 2023 · Databases

Cassandra Multi‑Data‑Center Fault Tolerance Experiment and Analysis

This article presents a step‑by‑step experiment on a Cassandra cluster spanning two data centers, demonstrating how token ownership, data distribution, and fault‑tolerance behave when nodes fail or are removed, and explains the observed owns percentages and replication effects.

Distributed Systems · NoSQL · cassandra

0 likes · 15 min read

Architect

Jan 8, 2023 · Backend Development

Rethinking Microservices: From Hype to Core Architectural Principles

This article critically examines the microservices movement, tracing its historical roots, debunking common hype, and arguing that the true value lies in modular design, clear team ownership, and disciplined architectural practices rather than merely scaling distributed systems.

Distributed Systems · Microservices · Software Architecture

0 likes · 23 min read

Tencent Cloud Developer

Jan 5, 2023 · Cloud Native

QQ Music High-Availability Architecture Overview

QQ Music achieves high availability by layering redundant multi‑datacenter architecture, proactive chaos‑engineering toolchains, and comprehensive observability—including metrics, logging, tracing and profiling—while employing service grading, adaptive retry windows and EMA‑based dynamic timeouts to gracefully handle faults across its massive micro‑service ecosystem.

Distributed Systems · Microservices · Observability

0 likes · 24 min read

DataFunTalk

Jan 3, 2023 · Big Data

Tencent Unified Big Data Scheduling Platform – Architecture, Design, and Operations

The article presents an in‑depth overview of Tencent's self‑developed Unified Scheduling Platform, detailing its system architecture, design challenges, performance optimizations, resource‑fair scheduling mechanisms, operational metrics, future roadmap, and a Q&A session that together illustrate how the platform enables massive offline data processing at scale.

Big Data · Distributed Systems · Performance Optimization

0 likes · 18 min read

Top Architect

Jan 2, 2023 · Big Data

Optimizing Kafka at Meituan: Challenges and Solutions for a Large‑Scale Data Platform

This article details Meituan's use of Kafka as a unified data cache and distribution layer, outlines the challenges of massive scale and latency, and presents comprehensive optimizations across application, system, and cluster management layers, including disk balancing, migration acceleration, fetcher isolation, and full‑link monitoring.

Big Data · Distributed Systems · Kafka

0 likes · 22 min read
