Tagged articles

2122 articles

Page 3 of 22

Apr 3, 2025 · Fundamentals

Understanding CAP Theory and BASE: Data Consistency in Distributed Systems

This article explains the CAP theorem and its practical extension BASE, describing their core concepts, trade‑off combinations, typical components such as ZooKeeper, Eureka, and Nacos, and engineering techniques like asynchronous replication, Saga, and idempotent design for building highly available distributed systems.

Availability · BASE · CAP theorem

0 likes · 5 min read

Mingyi World Elasticsearch

Apr 1, 2025 · Big Data

Elasticsearch Unveiled: Learn Search Engine Basics Through Comics

This visual guide walks readers through Elasticsearch fundamentals, from architecture and indexing to clustering, query DSL, aggregations, and performance tuning, and also touches on security considerations, multilingual support, and real‑time search capabilities. Comic-style illustrations simplify each concept for easy understanding.

Big Data · Distributed Systems · Elasticsearch

0 likes · 2 min read

Raymond Ops

Mar 30, 2025 · Operations

Mastering Elasticsearch Data Sync and Cluster Architecture: 3 Strategies Explained

This article explains three Elasticsearch data‑synchronization methods, compares their pros and cons, and then dives into ES cluster structure, node roles, shard allocation, distributed queries, split‑brain handling, and fault‑tolerance mechanisms, providing a comprehensive guide for developers and ops engineers.

Cluster Management · Distributed Systems · Elasticsearch

0 likes · 9 min read

Ma Wei Says

Mar 30, 2025 · Fundamentals

How Kafka 4.0’s KRaft Replaces ZooKeeper with Raft Consensus

Kafka 4.0 runs exclusively on KRaft, a ZooKeeper‑free metadata layer built on the Raft consensus algorithm. This article details role transitions, leader election, log replication, controller and broker responsibilities, and fault‑tolerance mechanisms, and explains how KRaft enables a more scalable and self‑managed architecture for large‑scale distributed streaming.

Consensus Algorithm · Distributed Systems · KRaft

0 likes · 13 min read

Ma Wei Says

Mar 28, 2025 · Backend Development

Choosing the Right Message Queue: Kafka vs RocketMQ vs RabbitMQ Explained

This article compares Kafka, RocketMQ, and RabbitMQ, detailing their architectures, performance characteristics, strengths, and ideal use‑cases to help engineers select the most suitable message‑queue solution for high‑throughput, fault‑tolerant, and real‑time processing scenarios.

Distributed Systems · Event Streaming · High Throughput

0 likes · 11 min read

Bilibili Tech

Mar 25, 2025 · Cloud Native

Technical Case Study: Accelerating Live‑to‑VOD Conversion for the 2025 Spring Festival Gala

By replacing the legacy FLV recorder with an m3u8‑based service, introducing a short‑segment, state‑machine transcoder, and deploying an event‑driven proxy and instant‑clipping UI, the team cut the 4‑hour‑40‑minute Spring Festival Gala’s live‑to‑VOD processing from 41 minutes to about eight minutes, achieving roughly a five‑fold speedup.

Distributed Systems · Performance Optimization · VOD conversion

0 likes · 21 min read

DataFunSummit

Mar 20, 2025 · Artificial Intelligence

Evolution of AI Training Stability and Baidu Baige’s Full-Stack Solutions for Large-Scale Model Training

The article traces the evolution of AI training stability from early manual operations on small GPU clusters to sophisticated, fault‑tolerant infrastructures for thousand‑card and ten‑thousand‑card models, detailing Baidu Baige’s metrics, monitoring, eBPF‑based diagnostics, and checkpoint strategies that reduce invalid training time and accelerate fault recovery.

Distributed Systems · Large-Scale Training · checkpointing

0 likes · 22 min read

Mike Chen's Internet Architecture

Mar 20, 2025 · Backend Development

Comprehensive Guide to Apache Zookeeper: Architecture, Use Cases, and Commands

This article provides an in‑depth overview of Apache ZooKeeper, covering its core concepts, common application scenarios such as pub/sub, configuration management and naming services, detailed architecture including znodes and node types, the watch mechanism, ZAB consensus protocol, and practical usage examples with Maven dependencies, Java client code, and command‑line operations.

Backend Development · Coordination Service · Distributed Systems

0 likes · 9 min read

Mike Chen's Internet Architecture

Mar 20, 2025 · Backend Development

Master Apache Kafka: Architecture, Setup, and Essential Commands Explained

This article provides a comprehensive overview of Apache Kafka, covering its core concepts, architecture, common use cases, installation steps, and the essential command-line operations for managing topics and brokers reliably in production environments.

Backend Development · Distributed Systems · Kafka

0 likes · 9 min read

FunTester

Mar 18, 2025 · Operations

How to Build a Fault‑Isolation Shield for High‑Traffic Distributed Systems

The article explains how to construct a comprehensive fault‑isolation and protection system for modern distributed applications, covering entry‑side rate limiting, exit‑side circuit breaking, internal resource isolation, monitoring, chaos‑engineering validation, and automatic self‑healing mechanisms using tools such as Sentinel, Nginx, Hystrix, SkyWalking, Prometheus and Kubernetes.

Circuit Breaking · Distributed Systems · Microservices

0 likes · 7 min read
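
The exit-side circuit breaking named in the summary above can be illustrated with a minimal Python sketch. It is not taken from the article and does not use Sentinel or Hystrix; the `CircuitBreaker` class, its parameters, and the injectable `clock` are hypothetical names chosen for illustration.

```python
import time


class CircuitOpenError(RuntimeError):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    """Illustrative breaker: opens after N consecutive failures,
    then allows one trial call once reset_timeout has elapsed."""

    def __init__(self, failure_threshold=3, reset_timeout=30.0,
                 clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self._clock = clock          # injectable so tests can control time
        self._failures = 0
        self._opened_at = None       # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_timeout:
                raise CircuitOpenError("circuit open; call rejected")
            # Timeout elapsed: fall through and let one trial call run.
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._failures += 1
            trial_failed = self._opened_at is not None
            if trial_failed or self._failures >= self.failure_threshold:
                self._opened_at = self._clock()   # open (or re-open)
            raise
        # Success closes the circuit and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

While the circuit is open, callers fail fast instead of piling up on a dependency that is already struggling. This sketch is single-threaded; a shared breaker would need locking.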

Baidu Geek Talk

Mar 17, 2025 · Industry Insights

From Manual Restarts to Automated Fault Tolerance: The Evolution of AI Training Stability

This article traces the decade‑long evolution of AI training stability—from early small‑model manual operations to large‑scale, multi‑thousand‑GPU clusters—detailing metrics like invalid training time, fault‑tolerance architectures, eBPF‑based hidden‑fault detection, BCCL enhancements, multi‑level restart strategies, and trigger‑based checkpointing that together shrink downtime from minutes to seconds.

AI training · Distributed Systems · Infrastructure

0 likes · 22 min read

Dual-Track Product Journal

Mar 14, 2025 · Operations

How Bad Inventory Sync Can Kill Your E‑commerce Business—and 3 Fixes to Save It

This article examines how delayed or inconsistent inventory synchronization leads to costly overselling and deadstock in e‑commerce, presents three destructive synchronization patterns, and offers a step‑by‑step guide—including real‑time messaging, distributed locks, rule‑engine integration, and intelligent alerts—to transform inventory management from a liability into a self‑healing system.

Backend · Distributed Systems · Operations

0 likes · 8 min read

Su San Talks Tech

Mar 14, 2025 · Backend Development

Ensuring Idempotency in Distributed Systems: Patterns, Code, and Best Practices

This article explains the concept of idempotency, outlines scenarios where it is essential, analyzes common causes of idempotency problems, and presents a comprehensive set of solutions—including unique constraints, optimistic and pessimistic locks, distributed locks, token mechanisms, state machines, deduplication tables, and global request IDs—accompanied by practical code examples and database design guidelines.

Backend · Distributed Systems · Idempotency

0 likes · 14 min read
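
Of the solutions listed in the summary above, the deduplication table keyed by a global request ID is the simplest to sketch. The Python below is an in-memory illustration only, not the article's code; `IdempotentProcessor`, `debit`, and the `req-1` ID are hypothetical, and a real system would back the table with a unique constraint in a shared store.

```python
import threading


class IdempotentProcessor:
    """Runs a handler at most once per request ID (in-memory dedup table)."""

    def __init__(self):
        self._results = {}             # request_id -> cached result
        self._lock = threading.Lock()

    def process(self, request_id, handler, *args):
        # Holding the lock while the handler runs keeps the sketch simple:
        # a concurrent duplicate waits, then receives the cached result.
        with self._lock:
            if request_id in self._results:
                return self._results[request_id]
            result = handler(*args)
            self._results[request_id] = result
            return result


balance = {"alice": 100}


def debit(account, amount):
    """Non-idempotent side effect: running it twice would double-charge."""
    balance[account] -= amount
    return balance[account]


processor = IdempotentProcessor()
first = processor.process("req-1", debit, "alice", 30)
retry = processor.process("req-1", debit, "alice", 30)   # duplicate delivery
```

The retried request returns the stored result of the first execution, so the account is debited only once.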

Architect's Guide

Feb 26, 2025 · Backend Development

Why Microservices Are More About Organizational Structure Than Pure Technology

The article critically examines the hype around microservices, arguing that their true value lies in promoting modularity, clear ownership, and organizational clarity rather than solving inherent technical problems, and it traces these ideas back to classic software engineering principles and modern operational challenges.

Distributed Systems · Scalability · Software Architecture

0 likes · 22 min read

Ops Development & AI Practice

Feb 20, 2025 · Backend Development

Mastering Apache RocketMQ: Ports, Commands, and Monitoring Tips

This guide explains the key port configurations of Apache RocketMQ brokers, details essential mqadmin commands for managing topics, checking status, and monitoring consumer progress, and provides practical examples to help administrators efficiently operate and troubleshoot RocketMQ clusters.

CLI · Distributed Systems · Message Queue

0 likes · 7 min read

Radish, Keep Going!

Feb 16, 2025 · Fundamentals

Master Consistent Hashing: Principles, Virtual Nodes, and Go Implementation

Consistent hashing, a cornerstone of distributed systems, balances load, enhances scalability, and minimizes data migration; this article explains its fundamentals, the drawbacks of basic implementations, the role of virtual nodes, and provides a complete Go-language example with code for adding, removing, and locating nodes.

Distributed Systems · consistent hashing · load balancing

0 likes · 11 min read
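
The article's example is written in Go. As a language-neutral illustration of the same idea, here is a minimal Python sketch of a hash ring with virtual nodes. The `ConsistentHashRing` class, the `node#i` virtual-node naming, and the choice of MD5 are assumptions made for this sketch, not details from the article.

```python
import bisect
import hashlib


class ConsistentHashRing:
    """Hash ring where each physical node owns several virtual positions."""

    def __init__(self, virtual_nodes=100):
        self.virtual_nodes = virtual_nodes
        self._ring = []     # sorted hash positions on the ring
        self._owner = {}    # hash position -> physical node name

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)

    def add_node(self, node):
        for i in range(self.virtual_nodes):
            pos = self._hash(f"{node}#{i}")
            if pos in self._owner:
                continue    # position already taken (hash collision)
            bisect.insort(self._ring, pos)
            self._owner[pos] = node

    def remove_node(self, node):
        for i in range(self.virtual_nodes):
            pos = self._hash(f"{node}#{i}")
            if self._owner.get(pos) == node:
                del self._owner[pos]
                self._ring.remove(pos)

    def get_node(self, key):
        """Return the node owning the first position clockwise of the key."""
        if not self._ring:
            return None
        idx = bisect.bisect_right(self._ring, self._hash(key))
        return self._owner[self._ring[idx % len(self._ring)]]
```

Removing a node only reassigns the keys that node owned; every other key keeps its placement, which is the minimal-migration property the summary refers to.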

Top Architect

Feb 12, 2025 · Backend Development

Payment System Architecture Overview and Core Components

This article presents a comprehensive overview of a typical payment system architecture, detailing the division between transaction and payment cores, their interactions, service governance, data consistency, asynchronous processing, and practical production practices for building stable, scalable backend payment services.

Backend · Distributed Systems · architecture

0 likes · 9 min read

Alimama Tech

Feb 12, 2025 · Artificial Intelligence

HighService: A High‑Performance Pythonic AI Service Framework for Model Inference and Global Resource Scheduling

HighService, Alibaba’s Pythonic AI service framework, accelerates large‑model inference and maximizes GPU utilization by separating CPU‑GPU processes, offering out‑of‑the‑box quantization, parallelism and caching, and dynamically reallocating idle GPUs across clusters through a master‑worker scheduler to keep online latency low while boosting offline throughput for diffusion and LLM workloads.

AI Service · Distributed Systems · Python

0 likes · 16 min read

Java Backend Technology

Feb 12, 2025 · Backend Development

How Does RPC Work? A Step‑by‑Step Java Implementation with Netty and Zookeeper

This article explains RPC fundamentals, detailing a Java example that uses Spring for bean management, Netty for high‑performance network communication, and ZooKeeper for service registration and discovery, and walks through both server and client implementations with code snippets and test results.

Distributed Systems · Netty · RPC

0 likes · 16 min read

Cognitive Technology Team

Feb 2, 2025 · Fundamentals

Common Misconceptions in Distributed System Design and Their Solutions

Designing distributed systems often falls prey to misconceptions such as assuming reliable networks, zero latency, unlimited bandwidth, inherent security, static topology, zero transmission cost, and full autonomy, but applying retries, idempotency, message queues, encryption, dynamic discovery, caching, and time protocols can mitigate these issues.

Consensus · Distributed Systems · Latency

0 likes · 5 min read

Code Mala Tang

Jan 31, 2025 · Backend Development

Mastering Retry Patterns in Node.js: From Basics to Advanced Strategies

This article explores the retry pattern as a resilient design technique for distributed systems, detailing its fundamentals, a simple Node.js implementation, and advanced strategies such as exponential backoff with jitter, circuit breaker integration, comprehensive logging, and best‑practice guidelines for robust error handling.

Distributed Systems · Error Handling · Node.js

0 likes · 11 min read
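
The article implements its retry logic in Node.js. The exponential backoff with jitter that the summary mentions can be sketched in a few lines of Python. The function name, its parameters, and the use of "full jitter" (a random delay between zero and the exponential cap) are assumptions for illustration rather than the article's implementation.

```python
import random
import time


def retry_with_backoff(operation, max_attempts=5, base_delay=0.1,
                       max_delay=2.0, sleep=time.sleep):
    """Call operation() until it succeeds or max_attempts is reached.

    The wait before retry k is drawn uniformly from
    [0, min(max_delay, base_delay * 2**k)], so concurrent clients
    do not retry in lockstep.
    """
    for attempt in range(max_attempts):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts - 1:
                raise                      # out of attempts: surface the error
            cap = min(max_delay, base_delay * (2 ** attempt))
            sleep(random.uniform(0, cap))  # full jitter
```

The `sleep` parameter is injectable so that callers and tests can observe the delays without actually waiting. In practice the `except` clause would be narrowed to errors known to be transient.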

JD Tech

Jan 23, 2025 · Databases

Comprehensive Migration Plan for MongoDB to Alternative Data Stores

This article presents a complete MongoDB migration solution, detailing the migration rhythm, code refactoring using a decorator pattern, data source replacement with JImKV, MySQL and ES, bulk and incremental data transfer strategies, and deployment safeguards such as monitoring, gray release, and rollback to ensure a seamless cut‑over without service disruption.

Data Migration · Decorator Pattern · Distributed Systems

0 likes · 8 min read

Tencent Cloud Developer

Jan 22, 2025 · Cloud Native

Rate Limiting: Concepts, Algorithms, and Distributed Solutions

Rate limiting protects micro‑service stability by rejecting excess traffic, using algorithms such as fixed‑window, sliding‑window, leaky‑bucket and token‑bucket, and can be deployed locally or distributed via Redis, load‑balancers, or coordination services, each offering different trade‑offs in precision, scalability, and complexity.

Distributed Systems · Golang · Microservices

0 likes · 31 min read
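
Of the algorithms listed in the summary above, the token bucket is sketched below as a local, single-process limiter in Python. This is an illustration under stated assumptions, not the article's code: the `TokenBucket` class, its `rate` and `capacity` parameters, and the injectable `clock` are hypothetical, and a distributed variant would keep the counters in a shared store such as Redis.

```python
import time


class TokenBucket:
    """Refills `rate` tokens per second up to `capacity`;
    each allowed request spends `cost` tokens."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self._clock = clock                 # injectable for deterministic tests
        self._tokens = float(capacity)      # start with a full bucket
        self._last = clock()

    def allow(self, cost=1):
        now = self._clock()
        # Lazily refill based on elapsed time instead of using a timer thread.
        elapsed = now - self._last
        self._tokens = min(self.capacity, self._tokens + elapsed * self.rate)
        self._last = now
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False
```

The bucket size bounds how large a burst can be, while the refill rate bounds the sustained throughput. The sketch is not thread-safe; concurrent callers would need a lock around `allow`.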

JD Cloud Developers

Jan 22, 2025 · Backend Development

Mastering High-Concurrency Inventory Deduction for Flash Sale Systems

This article explores practical strategies for handling the high‑concurrency inventory deduction problem in flash‑sale scenarios, covering lock‑based approaches, Redis caching, partitioned stock management, asynchronous updates, and distributed scaling techniques to prevent overselling and improve throughput.

Backend Architecture · Distributed Systems · high concurrency

0 likes · 11 min read

Alibaba Cloud Developer

Jan 16, 2025 · Fundamentals

Master Distributed Systems: Theory, Design Patterns, and Microservice Architecture

This comprehensive guide explores the fundamentals of distributed systems, covering theoretical foundations, architecture design patterns, consistency models, scalability, deployment, operations, and practical engineering practices for building robust microservice‑based solutions.

Consistency · Distributed Systems · architecture

0 likes · 32 min read

Architecture & Thinking

Jan 14, 2025 · Backend Development

Master RocketMQ Basic Messages: Lifecycle, Code Samples & Use Cases

This guide explains Apache RocketMQ’s ordinary message concept, its full lifecycle, how to create topics, Java code for sending and receiving messages, key configuration tips, and real‑world scenarios such as asynchronous decoupling and traffic‑shaping for micro‑service architectures.

Distributed Systems · Java · Message Queue

0 likes · 9 min read

Alibaba Cloud Developer

Jan 14, 2025 · Databases

Why Logs Are the New Database: Shared Log Architecture in Distributed Systems

This article explores how modern distributed databases treat logs as foundational storage components, examines industry best practices from Aurora DSQL, DynamoDB, and OceanBase, and abstracts the essential properties and design considerations for building a universal, durable, and linearizable log module.

Distributed Systems · log architecture · shared log

0 likes · 19 min read

Alibaba Cloud Developer

Jan 10, 2025 · Databases

Boost System Performance: Using Locality Principles to Cut Database Queries

This article explains the locality principle (temporal and spatial locality) and shows how applying it to caching and data access in distributed systems can dramatically reduce database query volume, improve latency, and achieve up to 84% performance gains while managing memory and GC overhead.

Distributed Systems · Performance Optimization · caching

0 likes · 21 min read

Architect

Jan 9, 2025 · Industry Insights

How to Ensure Immediate Reads After Writes in Multi-Active Architectures

This article analyzes the "write‑after‑immediate‑read" challenge in multi‑active disaster‑recovery setups, breaks down solution directions, presents a three‑city five‑center case study, and outlines a four‑step model—distinguish scenarios, mark written data, assess latency, and enable near‑by access—to achieve consistent, low‑latency reads.

Backend · Data Consistency · Distributed Systems

0 likes · 15 min read

IT Architects Alliance

Jan 8, 2025 · Backend Development

Mastering High Concurrency in Distributed Systems: Strategies & Real-World Cases

This article explores the challenges of handling massive simultaneous requests in distributed architectures and presents practical solutions such as load balancing, distributed caching, asynchronous processing, and sharding, illustrated with case studies from major e‑commerce and social platforms.

Backend Architecture · Distributed Systems · asynchronous processing

0 likes · 20 min read

vivo Internet Technology

Jan 8, 2025 · Cloud Native

vivo Internet Technology 2024 Year-End Technical Review

In its 2024 year‑end review, vivo Internet Technology published 44 technical articles, delivered 19 presentations on cloud‑native, AI, security and more, joined major open‑source foundations such as Linux, CNCF and CCF, and grew a community of over 70,000 technical professionals.

Distributed Systems · Java · Performance Optimization

0 likes · 8 min read

IT Architects Alliance

Jan 7, 2025 · Industry Insights

Why Multi-Active Architecture Matters and How to Build It

The article explains why multi‑active (active‑active) architecture is essential for modern enterprises, outlines its evolution from single‑server setups, details core principles like redundancy and data synchronization, compares common deployment patterns, examines industry use cases, and discusses challenges and mitigation strategies.

Data Consistency · Distributed Systems · cloud computing

0 likes · 21 min read

Su San Talks Tech

Jan 7, 2025 · Databases

7 Common Pitfalls of Database Sharding and How to Solve Them

This article examines seven typical challenges that arise after implementing database sharding—such as global ID conflicts, cross‑shard queries, distributed transactions, shard‑key design, data migration, pagination, and operational overhead—and provides practical solutions and code examples to address each issue.

Distributed Systems · ID generation · database scaling

0 likes · 12 min read

IT Architects Alliance

Jan 6, 2025 · Fundamentals

Mastering the CAP Theorem: Balancing Consistency, Availability, and Partition Tolerance

This in‑depth guide explains the CAP theorem’s three pillars (Consistency, Availability, and Partition Tolerance), illustrates why a system can guarantee at most two of them at the same time, shows real‑world trade‑offs across e‑commerce, finance, and social platforms, and introduces the complementary BASE model for practical system design.

Availability · BASE model · CAP theorem

0 likes · 15 min read

Architect

Jan 3, 2025 · Operations

Designing Multi‑Active Distributed Systems: Overcoming Write Latency and Data Replication Challenges

This article analyzes the architectural impact of cross‑city multi‑active deployments, focusing on data‑layer design, write latency, sharding strategies, replication topologies, and routing considerations to achieve high availability, performance, and scalability in large‑scale distributed systems.

Distributed Systems · data replication · multi-active architecture

0 likes · 22 min read

Architecture Digest

Jan 3, 2025 · Operations

Designing High‑Availability Architecture with Rate Limiting, Circuit Breaking, and Degradation Strategies

This article explains how to build a highly available distributed e‑commerce system by using load‑balancing, redundant servers, rate‑limiting techniques, circuit‑breaker patterns, and graceful degradation methods, and provides concrete Spring Cloud and Java code examples for each strategy.

Backend · Circuit Breaking · Distributed Systems

0 likes · 18 min read

dbaplus Community

Jan 1, 2025 · Backend Development

Mastering Multi-Active Data Architecture: Reducing Write Latency and Ensuring High Availability

This article examines the challenges of building multi‑active distributed systems, focusing on the data layer’s role in high availability, write‑latency, sharding, isolation, replication strategies, and routing decisions, and provides concrete architectural patterns and practical guidelines for robust backend design.

Distributed Systems · Latency · data replication

0 likes · 23 min read

BirdNest Tech Talk

Dec 29, 2024 · Fundamentals

Unlocking Distributed System Design: 20 Core Patterns Explained

This article distills the key design patterns behind distributed systems—covering replication, partitioning, consensus, and fault‑tolerance—by presenting each pattern’s problem statement, concrete solution, trade‑offs, and technical considerations, all illustrated with real‑world examples from projects like Kafka and Cassandra.

Consensus · Design Patterns · Distributed Systems

0 likes · 18 min read

macrozheng

Dec 28, 2024 · Operations

What Makes China’s 12306 Railway Ticketing System So Resilient?

The article examines China’s 12306 railway ticketing platform, tracing its evolution from early Unix‑based reservation software to a massive, real‑time, three‑tier distributed system that handles billions of requests during peak travel periods, highlighting its architectural challenges, high‑concurrency solutions, and unique national centralization.

China · Distributed Systems · high concurrency

0 likes · 9 min read

Selected Java Interview Questions

Dec 24, 2024 · Backend Development

Design and Implementation of a Custom Distributed Job Scheduling Framework (k‑job)

This article introduces the motivation, architecture, technology choices, and key implementation details of a lightweight, highly extensible distributed job scheduling framework built on gRPC, Protobuf, a custom name‑server, and a bespoke message‑queue, addressing limitations of existing solutions like Quartz, XXL‑Job, and PowerJob.

Distributed Systems · Java · Job Scheduling

0 likes · 14 min read

JavaEdge

Dec 23, 2024 · Backend Development

How Meta Achieves Near‑Perfect Cache Consistency: Lessons from Polaris

This article explains why cache consistency is critical for Meta, how the company measures and monitors consistency, the design of the Polaris system that detects and resolves stale cache entries, and provides a concrete Python‑style example illustrating the challenges and solutions.

Consistency · Distributed Systems · Meta

0 likes · 14 min read

Architecture Digest

Dec 22, 2024 · Backend Development

Technical Overview and History of China’s 12306 Railway Ticketing System

The article provides a detailed, informal yet informative overview of the evolution, architecture, and massive scale challenges of China’s 12306 railway ticketing platform, tracing its roots from early Unix‑based systems to modern distributed backend solutions handling billions of requests during peak travel periods.

Backend Architecture · China · Distributed Systems

0 likes · 9 min read

Xiaohongshu Tech REDtech

Dec 19, 2024 · Databases

Data Consistency Verification Practices and Implementation at Xiaohongshu

Xiaohongshu built a lock‑free, non‑disruptive data‑consistency verification tool that automatically selects optimal methods, handles heterogeneous sources and dynamic changes, performs full and incremental checks via chunked checksums or row‑by‑row comparison, quickly isolates mismatches, and supports automatic remediation, ensuring reliable migrations and sharding.

Data Consistency · Distributed Systems · data validation

0 likes · 16 min read

Java Tech Enthusiast

Dec 17, 2024 · Databases

DBOS – Database‑Oriented Operating System

DBOS, a Database‑Oriented Operating System proposed by Matei Zaharia and Michael Stonebraker, builds the OS atop a distributed, ACID‑compliant database, storing all system and application state in tables, which simplifies scaling, ensures strong consistency, improves debugging, and reduces attack surface for cloud‑native workloads.

DBOS · Distributed Systems · Operating System

0 likes · 8 min read

MaGe Linux Operations

Dec 14, 2024 · Big Data

Master Kafka: From Core Concepts to Real-World Deployment

This comprehensive guide explains Kafka’s architecture, core APIs, topics and partitions, deployment steps, multi‑broker clustering, and practical use cases such as messaging, log aggregation, stream processing, and data import/export with Kafka Connect, providing a hands‑on tutorial for developers and engineers.

Distributed Systems · Installation · Kafka

0 likes · 30 min read

Tencent Cloud Developer

Dec 12, 2024 · Backend Development

Common Rate Limiting Algorithms: Fixed Window, Sliding Window, Sliding Log, Leaky Bucket, and Token Bucket

The article examines five common rate‑limiting algorithms—Fixed Window, Sliding Window, Sliding Log, Leaky Bucket, and Token Bucket—detailing their principles, pros and cons, and providing complete C++ implementations to help developers choose the best approach for controlling traffic bursts and ensuring system stability.

Backend · C++ · Distributed Systems

0 likes · 14 min read
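
The article provides C++ implementations. For readers who want the gist of one of the listed algorithms, the Sliding Log, here is a compact Python sketch. The `SlidingLogLimiter` class and its `limit`, `window`, and `clock` parameters are hypothetical names for this illustration, not the article's API.

```python
import time
from collections import deque


class SlidingLogLimiter:
    """Allows at most `limit` requests in any trailing `window` seconds
    by keeping a timestamp log of accepted requests."""

    def __init__(self, limit, window, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self._clock = clock      # injectable for deterministic tests
        self._log = deque()      # timestamps of accepted requests, oldest first

    def allow(self):
        now = self._clock()
        # Evict timestamps that have aged out of the trailing window.
        while self._log and now - self._log[0] >= self.window:
            self._log.popleft()
        if len(self._log) < self.limit:
            self._log.append(now)
            return True
        return False
```

Unlike a fixed window, this approach has no boundary burst, at the cost of storing one timestamp per accepted request.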

JD Tech Talk

Dec 11, 2024 · Backend Development

Analysis of Message Queue Disorder Issues and Practical Solutions

This article examines the root causes of message queue disorder in distributed systems, illustrates real‑world impacts such as data loss during migration, and presents concrete mitigation strategies including ordered messaging, pre‑processing checks, state‑machine handling, and monitoring to improve system reliability.

Distributed Systems · Message Queue · Reliability

0 likes · 9 min read

FunTester

Dec 5, 2024 · Backend Development

Understanding Aeron: A High‑Performance Messaging Framework and Its Advantages

Aeron is an open‑source, low‑latency, high‑throughput messaging framework that leverages zero‑copy memory, shared‑memory IPC and UDP transport to deliver microsecond‑level latency for finance, gaming, and distributed systems, offering a simple API and powerful performance features.

Aeron · Distributed Systems · High-Performance Messaging

0 likes · 9 min read

Architecture & Thinking

Dec 5, 2024 · Backend Development

Understanding Apache RocketMQ: Domain Model, Communication & Message Patterns

This article explains Apache RocketMQ's core components—including producers, topics, queues, and consumer groups—covers synchronous RPC versus asynchronous messaging, compares point‑to‑point and publish‑subscribe transmission models, and highlights their suitable scenarios and trade‑offs.

Backend Development · Distributed Systems · Message Queue

0 likes · 9 min read

Sanyou's Java Diary

Dec 2, 2024 · Big Data

Understanding Kafka: Core Architecture, Storage, and Reliability Explained

This article provides a comprehensive overview of Kafka, covering its overall structure, key components such as brokers, producers, consumers, topics, partitions, replicas, leader‑follower mechanics, logical and physical storage models, producer and consumer workflows, configuration parameters, partition assignment strategies, rebalancing, log retention and compaction, indexing, zero‑copy transmission, and the reliability concepts that ensure data durability.

Data Streaming · Distributed Systems · Kafka

0 likes · 18 min read

BirdNest Tech Talk

Dec 1, 2024 · Fundamentals

How to Exchange RDMA Connection Parameters: Methods, Pros, and Pitfalls

Establishing an RDMA connection requires exchanging key parameters such as LID, QP number, and memory keys, and this article systematically outlines the essential information, compares six exchange methods—from static configuration to distributed services—and evaluates their advantages, drawbacks, and suitable scenarios.

Distributed Systems · InfiniBand · Networking

0 likes · 7 min read

Architecture and Beyond

Nov 30, 2024 · Artificial Intelligence

Scalable Engineering Architecture for AIGC Products: Principles, Design, and Implementation

This article examines why scalability is a core requirement for AIGC products and presents a comprehensive engineering architecture—including modular design, distributed systems, resource scheduling, queue management, and layered architecture—to achieve high performance, cost efficiency, and long‑term maintainability.

AIGC · Distributed Systems · Scalability

0 likes · 20 min read

Lobster Programming

Nov 28, 2024 · Fundamentals

How Paxos Guarantees Strong Consistency in Distributed Systems

This article explains the Paxos consensus algorithm, detailing its roles (proposer, acceptor, learner), the two-phase prepare and accept process, handling of proposal numbers, and how it ensures strong consistency across distributed nodes through examples and diagrams.

Distributed Systems · Paxos · algorithm

0 likes · 9 min read

Tencent Cloud Developer

Nov 27, 2024 · Databases

Analyzing the Write‑After‑Read Consistency Challenge in Multi‑Active Distributed Architectures

The article examines the write‑after‑read consistency problem in multi‑active cross‑region systems, compares single‑write‑single‑read routing, quorum‑based multi‑write‑multi‑read, and read‑while‑copy methods, explains why primary‑secondary replication is preferred, and proposes a four‑step framework—scenario flagging, data marking, latency evaluation, and near‑by asynchronous replication—to meet WAR requirements efficiently.

Consistency · Database Replication · Distributed Systems

0 likes · 12 min read

Alibaba Cloud Developer

Nov 25, 2024 · Fundamentals

How ZooKeeper Guarantees Sequential Order in Distributed Read‑Write Locks

This article explores how Alibaba’s Nuwa service and the open‑source ZooKeeper implement distributed read‑write locks using sequential files, detailing the challenges of maintaining cversion order during failover and the community’s solutions to ensure consistency and reliability.

Consensus · Distributed Systems · ZooKeeper

0 likes · 14 min read

Architecture & Thinking

Nov 25, 2024 · Backend Development

Mastering RocketMQ: Core Concepts, Comparison, and Java Implementation

This comprehensive guide introduces RocketMQ's architecture, compares it with RabbitMQ and Kafka, outlines typical use cases, explains key concepts such as producers, brokers, consumers, topics, tags, and offsets, and provides complete Java code examples for building producers and consumers.

Backend Development · Distributed Systems · Java

0 likes · 14 min read

Top Architect

Nov 23, 2024 · Backend Development

Integrating Spring Boot with XXL-Job for Distributed Task Scheduling

This article explains how to integrate Spring Boot with the open‑source XXL‑Job distributed task scheduler, covering XXL‑Job fundamentals, configuration of the admin console and executor, Maven dependencies, property settings, code examples, @XxlJob annotation parameters, and best practices; it also includes additional promotional material.

Backend Development · Distributed Systems · Java

0 likes · 16 min read

DataFunSummit

Nov 22, 2024 · Artificial Intelligence

EasyRec Recommendation Algorithm Training and Inference Optimization

This article presents a comprehensive overview of EasyRec’s recommendation system architecture, detailing training and inference optimizations, embedding parallelism, CPU/GPU placement strategies, online learning pipelines, and network compression techniques that together improve scalability, latency, and cost efficiency.

Distributed Systems · EasyRec · Inference Optimization

0 likes · 15 min read

Zhuanzhuan Tech

Nov 20, 2024 · Backend Development

Design and Implementation of a High‑Performance Message Notification System

This article presents a comprehensive design of a high‑performance, fault‑tolerant message notification system, covering service partitioning, system architecture, idempotent processing, dynamic error detection, thread‑pool management, retry mechanisms, and stability measures such as traffic‑spike handling, resource isolation, third‑party protection, monitoring, and active‑active deployment.

Backend Architecture · Distributed Systems · Java

0 likes · 16 min read

Top Architect

Nov 20, 2024 · Big Data

Understanding Distributed Systems and Kafka: Architecture, Message Ordering, and Java Consumer Practices

This article explains the fundamentals of distributed systems, introduces Apache Kafka's architecture and components, discusses how Kafka ensures ordered message consumption, and provides Java consumer configuration tips to maintain message order, offering practical guidance for backend developers working with streaming data.

Big Data · Distributed Systems · Java

0 likes · 11 min read

JavaEdge

Nov 16, 2024 · Backend Development

How Netflix Built a Low‑Latency Distributed Counter Service at Scale

This article explains Netflix's distributed counter abstraction built on their time‑series service, detailing use cases, API design, counter types, implementation methods, control‑plane configuration, performance results, and future work to achieve near‑real‑time, low‑latency counting at massive scale.

Backend Architecture · Distributed Systems · Low latency

0 likes · 25 min read

Architecture & Thinking

Nov 15, 2024 · Databases

How Baidu’s TDE‑ClickHouse Delivers Sub‑Second Analytics on Billion‑Row Datasets

This article explains how Baidu’s TDE‑ClickHouse, as a core engine of the Turing 3.0 ecosystem, overcomes platform fragmentation, quality issues, and usability challenges through the OneData+ development paradigm, multi‑level aggregation, projection, query‑caching, bulk‑load ingestion, and a cloud‑native architecture to achieve sub‑second query response for massive data volumes.

Big Data · ClickHouse · Cloud Native

0 likes · 22 min read

Volcano Engine Developer Services

Nov 14, 2024 · Cloud Computing

Why Edge Cloud Is the Next Frontier: Trends, Challenges, and Solutions

This article examines the evolution of edge cloud from its early CDN roots to modern edge-native operating systems, outlines the business drivers and technical challenges such as massive node management, lightweight constraints, weak network environments, and multi‑shape compute needs, and presents the architecture, key components, and future directions of edge cloud solutions.

Distributed Systems · Edge Computing · Resource Management

0 likes · 22 min read

Cognitive Technology Team

Nov 14, 2024 · Operations

Designing Self‑Healing Applications for Fault Tolerance in Distributed Systems

To ensure distributed applications can recover automatically from hardware, network, or service failures, this guide outlines three core capabilities—fault detection, graceful handling, and monitoring—plus practical strategies such as asynchronous component separation, retries, circuit breakers, isolation, load shedding, failover, compensation, checkpointing, graceful degradation, rate limiting, leader election, fault injection, chaos engineering, and use of availability zones.

Cloud Native · Distributed Systems · Operations

0 likes · 7 min read

DeWu Technology

Nov 13, 2024 · Backend Development

Evolution of Rainbow Bridge Architecture: Building a Self‑Managed Metadata Center and SDK Enhancements

The new Rainbow Bridge architecture replaces the SLB‑based load‑balancing model with a self‑managed, multi‑AZ metadata center and enhanced SDK that aggregates node health, provides zone‑aware weighted routing, supports rapid failover and manual overrides, and delivers faster recovery and scalable traffic handling.

Distributed Systems · load balancing · metadata

0 likes · 11 min read

Baidu Tech Salon

Nov 8, 2024 · Cloud Computing

Design and Evolution of Baidu Canghai Storage Unified Technology Stack

Baidu Canghai Storage’s unified technology stack—comprising a meta‑aware distributed metadata layer, a hybrid single‑node‑distributed namespace, and an online erasure‑coding data layer—delivers AI‑driven, high‑performance, low‑cost, ZB‑scale cloud storage by modularizing metadata, namespace, and data services for object, file, and block workloads.

Baidu · Distributed Systems · Microservices

0 likes · 16 min read

Huolala Tech

Nov 8, 2024 · Backend Development

How Huolala Built a Scalable Real‑Time Reconciliation Platform for Millions of Daily Transactions

Huolala’s real‑time reconciliation platform tackles massive daily transaction volumes by addressing distributed system consistency, high‑throughput data ingestion, dynamic cluster scaling, and security safeguards, enabling sub‑second settlement verification across hundreds of services.

Backend Architecture · Data Consistency · Distributed Systems

0 likes · 10 min read

58 Tech

Nov 8, 2024 · Operations

Design and Optimization of an App Operation Platform: Ensuring High Availability, Performance, and Scalability

This article details the architecture, challenges, and optimization techniques of an app operation platform, covering its dual-engine design, caching strategies, and high‑availability principles that reduce response time to under 4 ms while supporting massive concurrent traffic.

App Operations · Distributed Systems · Performance Optimization

0 likes · 7 min read

Tencent Cloud Developer

Nov 7, 2024 · Backend Development

Cache Consistency Strategies and Best Practices for the Cache‑Aside Pattern

The article explains cache‑aside consistency challenges and compares four update strategies—DB‑then‑cache, cache‑then‑DB, DB‑then‑delete, and delete‑then‑DB—showing that deleting the cache after a successful DB write offers the smallest inconsistency window, while recommending TTLs, message‑queue invalidation, and multi‑key coordination for robust eventual consistency.

Cache Consistency · Distributed Systems · cache-aside

0 likes · 20 min read

Baidu Geek Talk

Nov 6, 2024 · Cloud Computing

Baidu Canghai Storage Unified Technology Base: Architecture and Evolution of Metadata, Namespace, and Data Layers

Baidu’s Canghai Storage unifies metadata, hierarchical namespace, and data layers into a Meta‑Aware, three‑generation architecture that scales to trillions of metadata items and zettabyte‑scale data, using a distributed transactional KV store, single‑machine‑distributed namespace, and online erasure‑coding micro‑services to deliver high performance, low cost, and seamless scalability.

Big Data · Distributed Systems · NewSQL

0 likes · 18 min read

Architect

Nov 3, 2024 · Backend Development

How Ctrip Scaled Its Ticket Booking System for Flash‑Sale Events

This article analyzes the challenges Ctrip faced when handling massive traffic during ticket flash‑sale events and details the architectural upgrades, caching strategies, database optimizations, supplier integration safeguards, and traffic‑control mechanisms that enabled stable, fast, and consistent booking experiences.

Backend · Distributed Systems · System Architecture

0 likes · 18 min read

Architect

Oct 31, 2024 · Cloud Native

Designing a Resilient Stateful Distributed System for Cloud‑Native Environments

This article analyzes the motivations, models, and design considerations for building stateful distributed architectures—covering microservices, service discovery, access‑layer isolation, fault tolerance, scaling, and deployment strategies—to help architects create reliable, low‑latency cloud‑native systems.

Cloud Native · Distributed Systems · Microservices

0 likes · 33 min read

JD Retail Technology

Oct 31, 2024 · Big Data

JDQ Kafka Bandwidth Throttling Architecture and Optimization

This article presents an in‑depth analysis of Kafka's native throttling mechanisms, identifies their limitations in large‑scale e‑commerce scenarios, and introduces JDQ's multi‑dimensional, dynamic throttling architecture that ensures stable throughput and priority‑aware bandwidth management across broker failures and traffic spikes.

Distributed Systems · JDQ · Kafka

0 likes · 17 min read

Tencent Cloud Developer

Oct 31, 2024 · Backend Development

Monolith vs Microservices: Evolution of Architecture and How to Choose

The article traces software architecture from early distributed systems through monoliths, SOA, microservices and serverless, explaining why each paradigm arose, the trade‑offs they entail, and how to decide between monolith and microservices based on team size, expertise, organizational structure, and business needs.

Distributed Systems · Microservices · Software Architecture

0 likes · 25 min read

Architect

Oct 30, 2024 · Backend Development

How to Build Distributed WebSocket Messaging with Spring, Redis, and Kafka

This article explains how to enable cross‑node WebSocket communication in a distributed Spring application by using a message queue (Redis or Kafka) to broadcast messages, tracking user connections with Redis, and providing a complete step‑by‑step implementation with code samples and configuration details.

Distributed Systems · Java · Kafka

0 likes · 20 min read

Baidu Geek Talk

Oct 30, 2024 · Cloud Computing

Baidu Cloud Infrastructure for AI-Native Era

Baidu Intelligent Cloud outlines how its evolving, high-performance infrastructure—featuring rapid 3-minute instance provisioning, over 200 GB bandwidth, elastic computing, specialized storage, and AI-driven MLOps tools—enables AI-native model training and deployment across booming sectors such as automotive and finance, supporting the industry’s shift to AI-centric cloud services.

Case Studies · Distributed Systems · MLOps

0 likes · 9 min read

Tencent Cloud Middleware

Oct 30, 2024 · Backend Development

How Kafka Guarantees High Reliability and Performance – A Deep Technical Dive

This article thoroughly examines Apache Kafka’s architecture, covering its macro components, ack strategies, replication mechanisms, high‑watermark handling, leader election, and performance optimizations such as batch sending, compression, PageCache, zero‑copy, mmap and sendfile, while also explaining common pitfalls like data loss and log corruption.

Distributed Systems · Kafka · Message Queue

0 likes · 31 min read

Tencent Cloud Developer

Oct 22, 2024 · Industry Insights

Designing Stateful Distributed Systems: Core Principles and Architecture Patterns

This article analyzes the motivations, benefits, and challenges of building stateful distributed systems, compares monolithic, SOA, and microservice models, and provides detailed guidance on access layers, service discovery, fault tolerance, scaling, and data storage for cloud‑native architectures.

Cloud Native · Distributed Systems · Microservices

0 likes · 29 min read

JavaEdge

Oct 21, 2024 · Operations

Why Move Beyond Microservices? Unlocking Resilience with Unitized Architecture

This article explores the advantages of unitized architecture over traditional microservices, detailing how its modular design, dedicated routing layer, and tailored observability practices enhance system resilience, fault‑tolerance, and operational insight for large‑scale distributed applications.

Distributed Systems · Resilience · fault tolerance

0 likes · 17 min read

Baidu Geek Talk

Oct 21, 2024 · Databases

TDE-ClickHouse Optimization Practice at Baidu MEG: Query Performance, Data Import, and Distributed Architecture

Baidu MEG’s TDE‑ClickHouse optimization in the Turing 3.0 ecosystem boosts query speed up to 10×, halves latency, enables billion‑row bulk imports in under two hours, and migrates to a cloud‑native, ZooKeeper‑free architecture supporting 350 k CPU cores, 10 PB storage, and sub‑3‑second responses for 150 k daily BI queries.

Baidu MEG · ClickHouse · Cloud Native

0 likes · 19 min read

Architect

Oct 17, 2024 · Operations

Designing Multi‑Active Distributed Systems: Key Factors and Replication Strategies

This article analyzes the architectural challenges of building large‑scale distributed systems with multi‑active (cross‑city) capabilities, focusing on data‑layer design, write latency, replication models, sharding techniques, and routing impacts to guide reliable, high‑performance infrastructure decisions.

Distributed Systems · architecture · data replication

0 likes · 22 min read

Tencent Cloud Developer

Oct 15, 2024 · Industry Insights

Why Write Latency Drives Multi‑Active Distributed Architecture Design

This article analyzes how write latency, write volume, isolation, and data replication strategies influence the design of multi‑active distributed systems, offering practical guidance on sharding, synchronous and asynchronous replication, routing, and architecture selection for high availability and performance across regions.

Distributed Systems · data replication · high availability

0 likes · 23 min read

Qunar Tech Salon

Oct 10, 2024 · Operations

Design and Architecture of a Distributed Task Scheduling System for Database Automation

This document outlines the terminology, background, requirements, task classifications, state model, and detailed architecture—including TaskScheduler, TaskWorker, and TaskConsole components—of a new distributed task scheduling system designed to replace Celery in a database automation platform, with emphasis on scalability, reliability, and extensibility.

Distributed Systems · Locks · Operations

0 likes · 23 min read

MaGe Linux Operations

Oct 7, 2024 · Operations

Why Choose RocketMQ? Features, Comparisons, and Reliability Explained

This article provides a comprehensive overview of RocketMQ, covering its architecture, key features such as high reliability, low latency and high throughput, comparisons with Kafka, RabbitMQ and ActiveMQ, and detailed mechanisms that ensure message durability, performance, and ordered consumption.

Distributed Systems · Low latency · Message Queue

0 likes · 12 min read

Su San Talks Tech

Oct 5, 2024 · Backend Development

Mastering Idempotency: Design Patterns and Code Solutions for Reliable APIs

Idempotency ensures that repeated API calls produce the same result without side effects, and this guide explains its principles, common scenarios like payments and messaging, root causes of idempotency failures, and multiple implementation strategies—including unique constraints, optimistic and pessimistic locks, distributed locks, token mechanisms, state machines, and deduplication tables—with practical code examples.

Backend · Distributed Systems · Idempotency

0 likes · 14 min read

dbaplus Community

Oct 3, 2024 · Operations

How Netflix Uses Chaos Engineering to Build Resilient Distributed Systems

This article explains Netflix's chaos engineering practice, detailing the challenges of microservice reliability, the implementation of the Chaos Monkey tool, the step‑by‑step methodology, guiding principles, and real‑world outcomes that demonstrate improved system availability.

Chaos Monkey · Distributed Systems · Netflix

0 likes · 6 min read

Open Source Tech Hub

Oct 1, 2024 · Backend Development

Build a Distributed Casbin Watcher with Workerman Redis in PHP

This guide explains how to implement a Casbin Watcher for distributed policy synchronization using Workerman's asynchronous Redis client in PHP, covering the underlying principles, required interfaces, code implementation, and a complete usage example with publish‑subscribe messaging.

Casbin · Distributed Systems · PHP

0 likes · 7 min read

IT Services Circle

Sep 27, 2024 · Operations

Analysis of the Shanghai Stock Exchange Outage and System Design Lessons

The article recounts the Shanghai Stock Exchange’s sudden P0 outage that halted trading, analyzes the causes such as massive order volume and system bottlenecks, and discusses how distributed architectures and message‑queue based queuing can mitigate similar high‑concurrency failures.

Distributed Systems · Operations · high concurrency

0 likes · 6 min read

AntData

Sep 26, 2024 · Databases

Apache HoraeDB (CeresDB): An Open‑Source Distributed Time‑Series Database

Apache HoraeDB (CeresDB) is an open‑source, distributed, high‑availability time‑series database developed by Ant Group, supporting multi‑dimensional queries, compatible with Prometheus and OpenTSDB, and offering SQL and OLAP capabilities for use cases such as APM, IoT monitoring, financial analytics, and AI‑infra observability.

Distributed Systems · Observability · SQL

0 likes · 5 min read

Java Architecture Stack

Sep 26, 2024 · Backend Development

Deep Dive into SOFAJRaft: How Java Implements Multi‑Raft Consensus

This article examines the core implementation of SOFAJRaft, a high‑performance Java library based on the Raft consensus algorithm, covering node initialization, leader election, log replication, snapshot handling, fault recovery, and multi‑Raft‑Group support with detailed code examples.

Consensus · Distributed Systems · Java

0 likes · 13 min read

Baidu Tech Salon

Sep 25, 2024 · Backend Development

Innovative Solutions for Reducing Result Inconsistency in Baidu Search System

The paper introduces a production‑grade framework that uses tiny controlled traffic, feature‑flattening experiments, dynamic debugging, and an automated inspection flywheel to measure each component’s contribution to Baidu’s search result diff‑rate, isolate root causes, and dramatically reduce inconsistency without impacting real users.

Debugging · Distributed Systems · data flattening

0 likes · 13 min read

Baidu Geek Talk

Sep 25, 2024 · Industry Insights

How Baidu Eliminated Search Result Inconsistencies with Data‑Flattening Experiments

Baidu tackled the challenge of search result inconsistency by quantifying diff rates, designing a data‑flattening technique, leveraging fake traffic and dynamic debugging, orchestrating large‑scale experiments, and automating inspection, ultimately identifying all contributing features and dramatically reducing result volatility.

Baidu · Distributed Systems · data flattening

0 likes · 15 min read

Architect

Sep 24, 2024 · Industry Insights

How Bilibili Re‑engineered Its Search Indexing Pipeline for Hour‑Level Turnaround

This article details Bilibili's transformation of its search offline indexing architecture—from a manual, low‑throughput MySQL‑centric process to a distributed, KV‑based, protobuf‑driven pipeline that leverages Taishan storage and Spark, cutting build cycles from days to hours while solving performance, consistency, and maintenance challenges.

Big Data · Distributed Systems · Protobuf

0 likes · 24 min read

Su San Talks Tech

Sep 22, 2024 · Backend Development

Mastering Rate Limiting: From Fixed Windows to Redis Distributed Solutions

This article explains why rate limiting is essential for microservice stability, introduces basic concepts like thresholds and rejection strategies, and walks through multiple algorithms—including fixed‑window, sliding‑window, sliding‑log, leaky‑bucket, token‑bucket—and their Java implementations as well as Redis‑based distributed approaches, complete with code samples and performance considerations.

Backend Development · Distributed Systems · Java

0 likes · 25 min read

Java Tech Enthusiast

Sep 20, 2024 · Backend Development

What Is RPC and Why It Is Not a Protocol

The article clarifies that RPC (Remote Procedure Call) is a mechanism, not a protocol: it abstracts network communication so that remote methods can be invoked like local calls. It illustrates the design with local procedure calls (LPC), dynamic proxies, and request handlers, and shows HTTP as one possible implementation.

Backend Development · Distributed Systems · IPC

0 likes · 6 min read

Deepin Linux

Sep 19, 2024 · Backend Development

Comprehensive Guide to gRPC: Concepts, C++ Implementation, and Real‑World Use Cases

This article explains the limitations of traditional RPC, introduces gRPC and Protocol Buffers, details their architecture and performance advantages, provides step‑by‑step C++ server and client code, and discusses practical scenarios such as microservices, real‑time data processing, and a file‑storage service example.

C++ · Distributed Systems · Microservices

0 likes · 29 min read

FunTester

Sep 18, 2024 · Operations

Overview and Practice of Chaos Engineering

Chaos Engineering introduces controlled failures to test system resilience, covering its history, practical benefits, experiment design, and a comparison of popular open‑source and commercial tools for improving reliability in distributed and cloud‑native environments.

Distributed Systems · Reliability

0 likes · 13 min read

macrozheng

Sep 12, 2024 · Backend Development

How to Design Scalable, Unique Order Numbers for High‑Traffic Systems

This article examines common order‑number generation rules and compares four practical solutions—UUID, database auto‑increment, Snowflake algorithm, and Redis INCR—providing code examples and best‑practice recommendations for building globally unique, fast‑producing identifiers in distributed backend systems.

Distributed Systems · order ID · unique identifier

0 likes · 12 min read

Alibaba Cloud Developer

Sep 11, 2024 · Backend Development

How a Two‑Level Cache Boosted High‑Concurrency Container Performance

By redesigning the caching layer with a two‑level architecture combining local and distributed caches, the author dramatically reduced CPU usage, lowered response times, and increased system capacity under high QPS workloads, while evaluating trade‑offs of various cache strategies, pre‑warming, refresh mechanisms, and operational considerations.

Distributed Systems · Performance Optimization · caching

0 likes · 11 min read

MaGe Linux Operations

Sep 10, 2024 · Backend Development

How Kafka Elects Leaders and Distributes Partitions: A Deep Dive

This article explains Kafka's leader election process, partition assignment strategy, distribution policies, file layout, and the evolution of consumer offset storage, providing a comprehensive overview of how Kafka ensures reliable and efficient message handling in a distributed environment.

Distributed Systems · Kafka · Partition Assignment

0 likes · 5 min read
