Tagged articles

2122 articles

Page 2 of 22

Aug 28, 2025 · Fundamentals

5 Architecture Elements, 15 Design Principles & 6 Common Pitfalls

This article explains the essential components of software architecture—elements, structure, and connections—while presenting fifteen universal design principles, practical guidelines for monolithic, distributed, and microservice systems, and six common architectural mistakes to avoid, helping teams build scalable, reliable, and maintainable solutions.

Distributed Systems · Microservices · Scalability

0 likes · 21 min read

Tencent Cloud Developer

Aug 27, 2025 · Fundamentals

Mastering Software Architecture: 15 Universal Principles, Common Pitfalls, and Evolution from Monolith to Microservices

This article explains the core concept of software architecture as elements, structure, and connections, distinguishes systems, subsystems, modules, components and frameworks, compares architecture classifications, describes the evolution from monolithic to distributed and microservice designs, presents fifteen practical design principles, and warns about six typical architectural pitfalls.

Distributed Systems · Microservices · Software Architecture

0 likes · 23 min read

Architect's Guide

Aug 25, 2025 · Fundamentals

19 Essential Distributed System Design Patterns You Must Know

This article explores nineteen core design patterns for distributed systems—including Bloom filters, consistent hashing, quorum, leader‑follower, heartbeat, fencing, WAL, segmented logs, high‑water mark, leases, gossip, Phi accrual detection, split‑brain handling, checksums, CAP and PACELC theorems, hinted handoff, read repair, and Merkle trees—explaining their purpose, operation, and typical use cases.

Consistency · Distributed Systems · fault tolerance

0 likes · 14 min read

Tech Freedom Circle

Aug 24, 2025 · Operations

How a Misconfigured Nacos Cluster Cost $170 Million: A Deep P0 Incident Postmortem

A leading financial platform suffered a six‑hour outage and a $170 million loss when its Nacos service‑registry cluster entered a split‑brain state due to a network partition, exposing flaws in AP‑mode deployment, monitoring gaps, and cascading failures that were later resolved through Raft migration, multi‑active architecture, and client‑side resilience.

Distributed Systems · Microservices · Nacos

0 likes · 32 min read

Big Data Technology Tribe

Aug 22, 2025 · Backend Development

How StarRocks Keeps Metadata Consistent Across FE Nodes

This article explains the roles of StarRocks FE and BE nodes, details the metadata stored in FE, describes the leader‑follower‑observer architecture, and shows how BDB JE replication, journal logs, and checkpoint mechanisms ensure metadata synchronization and durability even after node failures.

BDB JE · Distributed Systems · Replication

0 likes · 17 min read

Wukong Talks Architecture

Aug 21, 2025 · Operations

Why LinkedIn Dropped Kafka for Northguard – A Deep Dive into Its Architecture

LinkedIn, the creator of Kafka, has largely abandoned Kafka in favor of a new log storage system called Northguard, whose design mirrors Apache Pulsar with features like storage‑compute separation, log striping, and a multi‑layer data model, offering superior scalability, operability, consistency, and durability for massive data streams.

Apache Pulsar · Distributed Systems · LinkedIn

0 likes · 22 min read

Open Source Tech Hub

Aug 21, 2025 · Backend Development

Build a Scalable Distributed Captcha Login with PHP Webman and Redis

This guide explains how to replace traditional session‑based captcha authentication with a Redis‑backed, token‑driven solution using the high‑performance PHP Webman framework and the tinywan/captcha plugin, covering architecture, generation and verification flows, installation steps, and code examples.

Captcha · Distributed Systems · PHP

0 likes · 8 min read

Xiaokun's Architecture Exploration Notes

Aug 17, 2025 · Backend Development

Why Multi-Leader Replication Causes Conflicts and How to Tackle Them

This article examines how single‑value objects encounter data‑value conflicts in multi‑leader replication architectures, explains the role of causal ordering versus concurrent writes, and outlines the need for global clocks, versioning, and broadcast mechanisms to resolve such conflicts.

Distributed Systems · Version Vector · conflict resolution

0 likes · 7 min read

Mike Chen's Internet Architecture

Aug 17, 2025 · Big Data

Master Kafka: Essential Commands for Starting, Managing Topics, and Messaging

This guide walks you through the core Kafka commands for starting and stopping the service, creating, listing, describing, and deleting topics, as well as producing and consuming messages, while explaining key parameters such as ZooKeeper, partitions, and replication factors.

Big Data · Distributed Systems · Kafka

0 likes · 4 min read

Sohu Tech Products

Aug 13, 2025 · Backend Development

How to Build a Strictly Incremental Distributed ID System with Redis, MySQL, and Nacos

This article examines the challenges of distributed ID generation, compares common solutions like UUID and Snowflake, and presents a custom approach that combines MySQL segment tables, Redis caching, and Nacos switches to achieve high‑performance, strictly incremental IDs with automatic failover.

Backend Development · Distributed Systems · ID generation

0 likes · 11 min read

Xiaokun's Architecture Exploration Notes

Aug 10, 2025 · Databases

How Single-Leader Replication Handles Write Conflicts: Strategies and Insights

This article examines write conflicts in single-leader replication, comparing exclusive and shared data, exploring uniqueness constraints, async replication delays, and various conflict‑resolution techniques such as unique indexes, bitmap mapping, LWW ordering, and multi‑version control for collaborative editing.

Consistency · Distributed Systems · Replication

0 likes · 9 min read

Didi Tech

Aug 7, 2025 · Cloud Native

How HUATUO Revolutionizes Cloud‑Native Observability with Zero‑Impact BPF Tracing

HUATUO, Didi's open‑source cloud‑native observability project, leverages BPF‑based low‑overhead kernel tracing, unified metric and event frameworks, automatic flame‑graph generation, and seamless integration with Prometheus, Grafana and Elasticsearch to provide panoramic, zero‑intrusive monitoring and continuous performance profiling for complex production environments.

BPF · Cloud Native · Distributed Systems

0 likes · 11 min read

Zhuanzhuan Tech

Aug 6, 2025 · Backend Development

Mastering Distributed Caching: Easy-Cache’s Multi‑Level Dynamic Upgrade and Consistency

This article introduces Easy-Cache, a Spring‑AOP based caching framework that eliminates repetitive cache code by offering annotation‑driven operations, multi‑level Redis and local caches, dynamic upgrade/downgrade, elastic expiration, and Lua‑script‑backed consistency mechanisms for high‑availability distributed systems.

Distributed Systems · Lua · redis

0 likes · 18 min read

Xiaokun's Architecture Exploration Notes

Aug 3, 2025 · Fundamentals

Understanding Causal Consistency: Order Guarantees, Lamport Timestamps, and Total Order Broadcast

This article explains the challenges of implementing causal consistency, compares it with linearizability and sequential consistency, describes how order guarantees are enforced in leader‑based replication, introduces Lamport timestamps and total‑order broadcast, and outlines practical approaches for achieving causal consistency in distributed systems.

Consistency · Distributed Systems · Lamport timestamp

0 likes · 14 min read
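
As a rough illustration of the Lamport timestamps mentioned in this summary, the sketch below shows the usual counter rules in Python. It is not taken from the article; the `LamportClock` class and its method names are hypothetical, chosen only for this example.

```python
class LamportClock:
    """Logical clock for one node (hypothetical helper, for illustration)."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.counter = 0

    def tick(self):
        """Local event or message send: advance the counter by one."""
        self.counter += 1
        return (self.counter, self.node_id)

    def receive(self, timestamp):
        """Message receipt: move past the larger of the local and remote counters."""
        remote_counter, _ = timestamp
        self.counter = max(self.counter, remote_counter) + 1
        return (self.counter, self.node_id)
```

Comparing the `(counter, node_id)` tuples gives a total order that is consistent with causality, with the node ID breaking ties between concurrent events.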

Big Data Technology Tribe

Jul 30, 2025 · Backend Development

How InfiniFS Optimizes Metadata Access with Optimistic Cache and Lazy Invalidation

This article explains InfiniFS's cache organization for directory metadata, its optimistic cache usage, and the lazy invalidation mechanism that broadcasts rename updates to a few metadata servers, enabling scalable and efficient metadata services in large‑scale distributed file systems.

Cache Design · Distributed Systems · Metadata Caching

0 likes · 7 min read

Xiaokun's Architecture Exploration Notes

Jul 27, 2025 · Fundamentals

Can Multi-Leader and Leaderless Replication Achieve Linear Consistency?

This article examines why multi‑leader and leaderless replication models struggle to provide linearizability (linear consistency), explores write‑conflict handling, quorum‑based NWR mechanisms, sloppy quorum and hinted‑handoff techniques, and summarizes the trade‑offs involved in achieving strong consistency across distributed data centers.

Distributed Systems · Multi-Leader · Replication

0 likes · 15 min read
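
The quorum‑based NWR mechanism named in this summary is usually stated as the condition w + r > n. A minimal Python sketch follows, assuming replica replies arrive as `(version, value)` pairs; the function names are hypothetical and not from the article.

```python
def quorum_overlaps(n, w, r):
    """True if every read quorum must intersect every write quorum."""
    return w + r > n


def read_latest(replies, r):
    """Return the value with the highest version among at least r replies.

    Each reply is a (version, value) pair; this shape is an assumption
    made for the example.
    """
    if len(replies) < r:
        raise ValueError("not enough replicas answered")
    return max(replies, key=lambda reply: reply[0])[1]
```

The overlap condition only ensures that a read quorum includes at least one replica holding the latest acknowledged write. On its own it does not give linearizability, which fits the summary's point that these models struggle to provide it.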

Architect

Jul 24, 2025 · Backend Development

Mastering RPC: From Basics to Building Your Own Framework

This article explains what RPC is, why RPC frameworks are needed, the underlying principles and technologies such as dynamic proxies, serialization, NIO communication, service registration, governance and routing, and walks through a simple hand‑crafted RPC implementation with a comparison of popular frameworks.

Distributed Systems · RPC

0 likes · 18 min read

Architect's Guide

Jul 24, 2025 · Backend Development

7 Proven Strategies to Prevent Overselling in High‑Concurrency Flash Sales (SpringBoot)

This article explores high‑concurrency flash‑sale scenarios, demonstrates why naïve @Transactional and lock usage can still cause overselling, and presents seven concrete implementations—including improved lock, AOP lock, two pessimistic‑lock variants, optimistic lock, a blocking queue, and a Disruptor queue—complete with SpringBoot code, JMeter testing results, and performance analysis.

Distributed Systems · SpringBoot · concurrency

0 likes · 23 min read

DaTaobao Tech

Jul 23, 2025 · Artificial Intelligence

How Alibaba’s New Distributed Agent Framework Solves 2C AI Challenges

Alibaba introduces the ali‑langengine‑dflow framework, a hybrid distributed‑agent architecture that moves core intelligence to the cloud while keeping execution reachable on heterogeneous client devices, addressing data‑isolation, latency and security issues of existing cloud‑VM and local‑agent solutions for consumer‑facing (2C) internet services.

AI · Agent · Distributed Systems

0 likes · 21 min read

Su San Talks Tech

Jul 22, 2025 · Backend Development

10 Common Microservice Pitfalls and How to Avoid Them

This article shares ten frequent microservice problems—from improper service splitting and distributed transaction failures to configuration chaos, logging fragmentation, database sharing, API incompatibility, CI bottlenecks, missing monitoring, and team collaboration issues—offering concrete solutions, best‑practice principles, and code examples to help engineers build robust, maintainable microservice systems.

Backend Architecture · Distributed Systems · Microservices

0 likes · 11 min read

Xiaokun's Architecture Exploration Notes

Jul 20, 2025 · Databases

How to Achieve Linear Consistency in Single-Leader Replication: Challenges and Solutions

This article examines eventual consistency and linearizability (linear consistency) in leader‑based replication, explains read‑your‑writes and shared‑data scenarios, discusses replication lag, failover trade‑offs, multi‑data‑center risks, and shows how consensus systems like ZooKeeper and etcd implement true linearizability.

Distributed Systems · Replication · databases

0 likes · 13 min read

Java Tech Enthusiast

Jul 19, 2025 · Big Data

How to De‑duplicate 1 Billion QQ Numbers with Bitmap, Bloom Filter, and Distributed Solutions

This article explores multiple techniques, including bitmap indexing, Bloom filters, external sorting, Spark, Redis, and a hierarchical bitmap architecture, to efficiently deduplicate one billion QQ numbers while balancing memory usage, accuracy, and processing speed.

Bitmap · Distributed Systems · bloom-filter

0 likes · 12 min read

Kuaishou Tech

Jul 17, 2025 · Artificial Intelligence

How DHPS Boosted Online Inference Throughput by 270% with RDMA

This article details the design and evolution of DHPS, Kuaishou's load‑balanced, RDMA‑based high‑performance service architecture, explaining its network, storage, and traffic‑scheduling innovations that deliver over 270% query‑throughput improvement, lower latency, reduced CPU usage, and nearly five‑nines availability for large‑scale AI inference workloads.

Distributed Systems · RDMA · Storage Engine

0 likes · 17 min read

Su San Talks Tech

Jul 17, 2025 · Big Data

How to De‑Duplicate 1 Billion QQ Numbers Using Under 1 GB of Memory

This article explores multiple techniques—including bitmap indexing, Bloom filters, external sorting, Spark, and layered bitmap structures—to efficiently remove duplicate QQ numbers from a dataset of up to one billion entries while keeping memory usage below a gigabyte and maintaining high accuracy.

Bitmap · Distributed Systems · Spark

0 likes · 12 min read
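
The bitmap technique listed in this summary can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the article's code: it assumes the numbers are non‑negative integers no larger than `max_value`, and the function names are hypothetical.

```python
def dedupe_with_bitmap(numbers, max_value):
    """Yield each number in [0, max_value] the first time it appears."""
    bitmap = bytearray(max_value // 8 + 1)  # one bit per possible value
    for n in numbers:
        byte_index, bit_index = divmod(n, 8)
        mask = 1 << bit_index
        if not bitmap[byte_index] & mask:   # bit still clear: first sighting
            bitmap[byte_index] |= mask
            yield n


def bitmap_bytes(max_value):
    """Memory the bitmap needs, in bytes."""
    return max_value // 8 + 1
```

For a 32‑bit value range, `bitmap_bytes(2**32 - 1)` is 2**29 bytes (512 MiB), which is how a bitmap can stay under the 1 GB budget named in the title.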

Xiaokun's Architecture Exploration Notes

Jul 13, 2025 · Databases

Why Transaction Consistency Matters: From ACID to Distributed Consensus

This article explores the purpose of database transactions, the various failure scenarios they guard against, the nuances of consistency across ACID, replica and CAP models, and how atomic commit and consensus mechanisms ensure reliable distributed data processing.

ACID · Consensus · Consistency

0 likes · 11 min read

Su San Talks Tech

Jul 13, 2025 · Backend Development

8 Proven Retry Strategies to Prevent Costly Failures in Distributed Systems

Discover why improper retry logic can cause massive financial losses, learn eight practical retry solutions—from simple loops to advanced Resilience4j and distributed lock techniques—and see how to avoid retry storms, ensure idempotency, and protect resources in high‑traffic backend services.

Distributed Systems · Idempotency · Resilience

0 likes · 13 min read

IT Services Circle

Jul 11, 2025 · Backend Development

10 Essential System Design Trade‑offs Every Engineer Should Master

Understanding system design trade‑offs is crucial for building robust software; this article examines ten common compromises—from vertical vs. horizontal scaling and SQL vs. NoSQL to CAP theorem, consistency models, REST vs. GraphQL, stateful vs. stateless architectures, caching strategies, and synchronous vs. asynchronous processing—highlighting their benefits and drawbacks.

Backend Architecture · Distributed Systems · Scalability

0 likes · 10 min read

IT Architects Alliance

Jul 10, 2025 · Cloud Native

Inside Alibaba’s Tech Stack: Cloud‑Native Architecture Behind Billions of Transactions

This article examines Alibaba's extensive cloud‑native technology stack—including distributed computing, storage, middleware, real‑time data processing, AI platforms, performance engineering, and security—revealing how its architects design systems that handle massive transaction volumes during events like Double 11.

Big Data · Distributed Systems · Microservices

0 likes · 12 min read

Big Data Technology Tribe

Jul 9, 2025 · Backend Development

Mastering Idempotency: Design Patterns & Best Practices for Reliable Distributed Systems

This comprehensive guide explains the concept of idempotency, why it is essential in distributed and micro‑service architectures, and provides practical patterns, code examples, and best‑practice recommendations for HTTP, databases, messaging, caching, and service‑mesh implementations.

Backend · Design Patterns · Distributed Systems

0 likes · 21 min read

IT Architects Alliance

Jul 8, 2025 · Cloud Native

Why Do Big‑Tech Architects Earn Six Figures? The Skills That Set Them Apart

The article explores why architects at leading tech firms command six‑figure salaries while those in traditional companies earn far less, highlighting gaps in technical depth, massive data handling, performance optimization, business insight, continuous learning, and the scarcity of true senior architects.

Big Data · Career Development · Distributed Systems

0 likes · 9 min read

Big Data Technology Tribe

Jul 8, 2025 · Operations

Mastering Retry Strategies: Why Exponential Backoff Is Essential for Reliable Systems

This article explains the purpose of retry mechanisms, why exponential backoff is crucial for handling transient failures, compares common backoff strategies, details key parameters such as base delay, max delay, multiplier and jitter, and provides a Java example that demonstrates their practical effects.

Distributed Systems · Java · Retry

0 likes · 6 min read
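
The four parameters this summary names (base delay, max delay, multiplier, and jitter) combine roughly as in the Python sketch below. The article's own example is in Java; this version, including the function names and default values, is an assumption made purely for illustration.

```python
import random


def backoff_delay(attempt, base_delay=0.1, max_delay=10.0, multiplier=2.0,
                  jitter=0.0, rng=random):
    """Return the delay in seconds before retry number `attempt` (0-based)."""
    # Exponential growth, capped at max_delay.
    delay = min(max_delay, base_delay * (multiplier ** attempt))
    # Jitter scales the delay by a random factor in [1 - jitter, 1 + jitter]
    # so that many clients do not retry at the same instant.
    if jitter:
        delay *= 1.0 + rng.uniform(-jitter, jitter)
    return min(delay, max_delay)


def schedule(attempts, **kwargs):
    """Delays for the first `attempts` retries."""
    return [backoff_delay(i, **kwargs) for i in range(attempts)]
```

With the defaults and no jitter, the delays double from 0.1 s until they reach the 10 s cap; a non‑zero `jitter` spreads each delay around that value.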

Practical DevOps Architecture

Jul 8, 2025 · Big Data

Master High‑Performance E‑Commerce Search with Elasticsearch & SpringBoot

This comprehensive course teaches developers how to design and implement a high‑throughput, scalable search engine for e‑commerce platforms using Elasticsearch and SpringBoot, covering architecture, data modeling, performance tuning, and advanced features such as autocomplete, fuzzy correction, price filtering, and sales reporting.

Distributed Systems · Elasticsearch · SpringBoot

0 likes · 8 min read

Xiaokun's Architecture Exploration Notes

Jul 6, 2025 · Databases

Demystifying Consistency Models: From Linear to Eventual in Distributed Systems

This article explores the concept of consistency in distributed systems, breaking down the main consistency models (linearizable, sequential, causal, and eventual) and explaining their definitions, practical implications, and how they guide the design of high‑availability architectures and data replication strategies.

Consistency · Distributed Systems · consistency models

0 likes · 13 min read

IT Architects Alliance

Jul 6, 2025 · Backend Development

Why Microservices Are the Secret to Higher Salaries and Scalable Systems

Microservices have become the standard architecture for large internet companies, offering superior scalability, maintainability, and team autonomy compared to monolithic systems, while demanding a broad tech stack—including service discovery, API gateways, container orchestration, and distributed transaction handling—making expertise in this area highly lucrative.

Backend Architecture · Cloud Native · DevOps

0 likes · 9 min read

Deepin Linux

Jul 4, 2025 · Backend Development

Mastering Protocol Buffers in C++: Installation, Data Types, and Real‑World Use Cases

This comprehensive guide explains what Protocol Buffers are, why they outperform JSON and XML, how to install and configure the library, the supported data types, code generation for multiple languages, practical C++ examples, and typical scenarios such as distributed systems, storage, and network communication.

C · Data Structures · Distributed Systems

0 likes · 23 min read

Su San Talks Tech

Jul 3, 2025 · Databases

Mastering MySQL Sharding: Strategies for 1 Billion Orders

This article explores the pain points of a 700‑million‑row MySQL order table, presents vertical and horizontal sharding strategies, introduces gene‑based Snowflake IDs, details routing logic, migration steps, common pitfalls, and shows performance gains after applying the final architecture.

Distributed Systems · Performance Optimization · database scaling

0 likes · 9 min read

Selected Java Interview Questions

Jul 1, 2025 · Backend Development

Why Our Custom Snowflake ID Failed and How to Build Reliable IDs

A recent production incident revealed duplicate order IDs caused by a flawed custom Snowflake generator, prompting a deep dive into the standard algorithm, the mistakes in the bespoke version, and practical recommendations for using proven implementations and proper machine‑ID configuration.

Backend Development · Distributed Systems · ID generation

0 likes · 7 min read
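
For reference, the commonly described Snowflake layout packs a 41‑bit millisecond timestamp, a 10‑bit machine ID, and a 12‑bit sequence into a 64‑bit integer. The Python sketch below follows that layout; it is not the article's code, and `EPOCH_MS`, the class name, and the injectable `clock` parameter are assumptions made for illustration.

```python
import threading
import time

EPOCH_MS = 1_577_836_800_000  # 2020-01-01T00:00:00Z; an arbitrary assumed epoch
MACHINE_BITS = 10
SEQUENCE_BITS = 12
MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1


class SnowflakeGenerator:
    def __init__(self, machine_id, clock=lambda: int(time.time() * 1000)):
        if not 0 <= machine_id < (1 << MACHINE_BITS):
            raise ValueError("machine_id out of range")
        self.machine_id = machine_id
        self.clock = clock
        self.last_ms = -1
        self.sequence = 0
        self.lock = threading.Lock()

    def next_id(self):
        with self.lock:
            now = self.clock()
            if now < self.last_ms:
                # Refuse to issue IDs rather than risk duplicates.
                raise RuntimeError("clock moved backwards")
            if now == self.last_ms:
                self.sequence = (self.sequence + 1) & MAX_SEQUENCE
                if self.sequence == 0:
                    # Sequence exhausted: spin until the next millisecond.
                    while now <= self.last_ms:
                        now = self.clock()
            else:
                self.sequence = 0
            self.last_ms = now
            return (((now - EPOCH_MS) << (MACHINE_BITS + SEQUENCE_BITS))
                    | (self.machine_id << SEQUENCE_BITS)
                    | self.sequence)
```

Each running instance needs a distinct `machine_id`; two generators that share one can emit the same ID within the same millisecond, which is one reason machine‑ID configuration matters.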

Architecture & Thinking

Jun 30, 2025 · Backend Development

Mastering RocketMQ Retry: Producer & Consumer Strategies for Reliable Messaging

This article takes an in‑depth look at Apache RocketMQ's retry mechanisms, detailing producer and consumer retry strategies, flow control handling, dead‑letter queue management, advanced configurations, best practices, and comparisons with Kafka and RabbitMQ, providing practical code examples and monitoring recommendations for building highly reliable distributed systems.

Dead Letter Queue · Distributed Systems · Idempotency

0 likes · 8 min read

Lin is Dream

Jun 27, 2025 · Backend Development

How to Solve Common RocketMQ Issues: Duplicates, Throttling, Retries, and Loss

This article examines frequent RocketMQ problems such as duplicate sending, flow‑control throttling, message retries, duplicate consumption, backlog, and loss, and provides practical configuration tweaks, scaling strategies, batch sending, idempotent handling, and retry mechanisms to ensure reliable message delivery.

Distributed Systems · Java · Message Queue

0 likes · 9 min read

Lin is Dream

Jun 26, 2025 · Backend Development

Unveiling RocketMQ: How Messages Journey Through Storage, Delivery, and Expiration

This article systematically breaks down RocketMQ's core mechanisms—covering message roles, disk storage, push/pull delivery, expiration handling, retry queues, and cluster failover—so developers can understand every stage a message undergoes from creation to cleanup and ensure reliable, high‑performance messaging.

Distributed Systems · Java · Message Queue

0 likes · 13 min read

Lin is Dream

Jun 25, 2025 · Backend Development

12 Essential RocketMQ Best Practices for Reliable Messaging

This article presents a comprehensive set of RocketMQ usage guidelines—including topic and tag conventions, producer and consumer group naming, key handling, logging, retry mechanisms, and cluster deployment recommendations—to help engineers build stable, high‑performance, and observable messaging systems in production environments.

Distributed Systems · Message Queue · RocketMQ

0 likes · 9 min read

TAL Education Technology

Jun 23, 2025 · Operations

How Chaos Engineering Boosts System Resilience: A Practical Guide

This article explains what Chaos Engineering is, why it matters for modern distributed systems, outlines a step‑by‑step approach to designing and running effective chaos experiments, describes platform features, and shares a real‑world case study of a pre‑launch blind test.

Distributed Systems · Reliability · Resilience Testing

0 likes · 9 min read

Lobster Programming

Jun 23, 2025 · Backend Development

How RocketMQ’s CommitLog Powers Million‑Level Concurrency

This article explains how RocketMQ’s CommitLog architecture—sequential writes, mmap zero‑copy, PageCache acceleration, fixed‑size log files, flexible flushing strategies, and efficient ConsumeQueue indexing—enables the system to sustain million‑level QPS with high reliability and low latency.

CommitLog · Distributed Systems · PageCache

0 likes · 6 min read

dbaplus Community

Jun 22, 2025 · Backend Development

Why UUIDv7 Is the New Go-To Primary Key for Distributed Databases

The article explains the drawbacks of traditional random UUIDs as primary keys, introduces the time‑ordered design of UUIDv7, compares it with earlier versions, and provides practical Java code and SQL examples for generating and using UUIDv7 in databases.

Distributed Systems · SQL · UUIDv7

0 likes · 8 min read
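
To make the time‑ordered design concrete, here is a minimal Python sketch of the UUIDv7 bit layout defined in RFC 9562: a 48‑bit Unix millisecond timestamp, a 4‑bit version, and random bits around the 2‑bit variant. The article's own examples are in Java and SQL; the `uuid7` helper and its parameters below are hypothetical.

```python
import os
import time
import uuid


def uuid7(unix_ms=None, random_bytes=None):
    """Build a version 7 UUID (hypothetical helper, for illustration)."""
    if unix_ms is None:
        unix_ms = time.time_ns() // 1_000_000
    if random_bytes is None:
        random_bytes = os.urandom(10)  # 80 bits; 6 are overwritten below
    # Bits 127..80: 48-bit Unix timestamp in milliseconds.
    value = (unix_ms & ((1 << 48) - 1)) << 80
    # Bits 79..0: random data.
    value |= int.from_bytes(random_bytes, "big")
    # Bits 79..76: version field, set to 7.
    value = (value & ~(0xF << 76)) | (0x7 << 76)
    # Bits 63..62: variant field, set to binary 10.
    value = (value & ~(0x3 << 62)) | (0x2 << 62)
    return uuid.UUID(int=value)
```

Because the timestamp occupies the most significant bits, IDs created in later milliseconds sort after earlier ones, both as integers and as strings.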

Xiaokun's Architecture Exploration Notes

Jun 22, 2025 · Backend Development

How Leader‑Based Replication Shapes High‑Availability Storage Systems

This article explains the principles of leader‑based and multi‑leader replication, compares synchronous and asynchronous modes, discusses consistency trade‑offs, conflict resolution strategies, and architectural variations for single‑ and multi‑data‑center deployments.

Consistency · Distributed Systems · data replication

0 likes · 13 min read

Architect's Guide

Jun 22, 2025 · Backend Development

How to Build a Scalable Delayed Queue with Redis and Java

This article explains why traditional polling fails for large‑scale delayed tasks, compares built‑in Java, RocketMQ, and RabbitMQ delay queues, and provides a detailed Redis‑based design with architecture diagrams, message structures, and a 2.0 version that uses real‑time locking for low‑latency delivery.

Distributed Systems · Java · delayed queue

0 likes · 8 min read

Cognitive Technology Team

Jun 21, 2025 · Fundamentals

Understanding Faults, Failures, and Fault Tolerance in Distributed Systems

This tutorial explains the definitions of faults and failures in distributed systems, explores their types and root causes, and presents fault‑tolerance mechanisms such as replication, checkpointing, redundancy, error detection, load balancing, and consensus algorithms to build resilient architectures.

Distributed Systems · consensus algorithms · data replication

0 likes · 10 min read

Mike Chen's Internet Architecture

Jun 20, 2025 · Backend Development

Understanding Dubbo’s Load‑Balancing Strategies: From Random to Consistent Hash

This article introduces Dubbo, Alibaba’s high‑performance Java RPC framework, and provides a detailed examination of its client‑side load‑balancing mechanisms, covering the default Random strategy and the alternatives RoundRobin, LeastActive, and ConsistentHash, along with their principles, advantages, and drawbacks.

Backend Development · Distributed Systems · Dubbo

0 likes · 5 min read

DeWu Technology

Jun 18, 2025 · Backend Development

Solving Distributed Transaction Challenges with a Supply‑Chain Consistency Framework

This article explores a supply‑chain consistency framework that tackles distributed transaction challenges by using eventual consistency, detailing its theoretical models, architecture, core components, async execution, retry mechanisms, and a practical code example for Spring‑Boot applications.

Distributed Systems · Spring Boot · annotation

0 likes · 10 min read

AsiaInfo Technology: New Tech Exploration

Jun 16, 2025 · Artificial Intelligence

How LangGraph Implements Shared Memory for Multi‑Agent Systems: Techniques, Tools, and Future Directions

This article examines the theory and practice of shared memory in multi‑agent systems, tracing its evolution from classic blackboard models to modern solutions like Mem0.ai, Open Memory, and A‑MEM, and provides concrete design patterns, integration strategies, and future research directions for LangGraph users.

AI memory · Distributed Systems · LLM

0 likes · 37 min read

Pan Zhi's Tech Notes

Jun 16, 2025 · Backend Development

How RocketMQ Guarantees No Message Loss, Duplication, or Disorder

This article explains RocketMQ’s architecture, the roles of NameServer, Broker, Producer, Consumer, and how each component ensures reliable message delivery—covering synchronous, asynchronous, and one‑way sending, storage mechanisms, consumer retries, dead‑letter queues, installation steps, and Java client integration with code examples.

Distributed Systems · Installation · Java

0 likes · 20 min read

Java Tech Enthusiast

Jun 15, 2025 · Backend Development

Understanding RocketMQ Name Server: Routing, Registration, and Heartbeat Explained

This article revisits RocketMQ's Name Server, detailing its core components, how it registers brokers, maintains routing information, handles client queries, and uses heartbeat mechanisms to ensure high availability and dynamic scaling in distributed messaging systems.

Backend Development · Distributed Systems · Message Queue

0 likes · 11 min read

Xuanwu Backend Tech Stack

Jun 15, 2025 · Backend Development

Understanding Zookeeper’s One‑Time Watch and Persistent Listener Techniques

This article explains why ZooKeeper's watch mechanism triggers only once, outlines the performance, reliability, and design reasons behind it, describes its asynchronous eventual consistency, and provides Java code examples for basic watches, manual re‑registration, and using the Curator framework for persistent listeners.

Distributed Systems · Java · One-time Trigger

0 likes · 7 min read

macrozheng

Jun 13, 2025 · Backend Development

How to Build a Real‑Time Chat with Spring Boot WebSocket: Step‑by‑Step Guide

This article explains how to integrate WebSocket into a Spring Boot project to create a lightweight instant‑messaging system, covering dependency setup, configuration classes, core server implementation, required modules, common deployment issues, and practical solutions with complete code examples.

Backend Development · Distributed Systems · Instant Messaging

0 likes · 14 min read

AI Large Model Application Practice

Jun 3, 2025 · Backend Development

Scaling Human‑in‑the‑Loop Agents to Distributed Environments with Robust Fault Recovery

This article explains how to extend a single‑process Human‑in‑the‑Loop (HITL) agent to a distributed, multi‑user API service using FastAPI, detailing session management, interrupt handling, client and server fault‑recovery strategies, and providing concrete code snippets and architectural diagrams.

Distributed Systems · Human-in-the-Loop · LangGraph

0 likes · 16 min read

dbaplus Community

May 25, 2025 · Databases

How to Generate Short, Sequential Numeric IDs Without Snowflake Overhead

To replace long UUIDs with short, sequential numeric account IDs, the article explores the limitations of Snowflake’s 64‑bit IDs, evaluates MySQL auto‑increment and REPLACE INTO approaches, identifies deadlock issues, and ultimately proposes a segmented free‑ID table with batch allocation to achieve compact, ordered IDs.

Distributed Systems · ID generation · auto_increment

0 likes · 14 min read

Xiaokun's Architecture Exploration Notes

May 25, 2025 · Fundamentals

How Consensus, CAP, and BASE Shape High‑Availability Architecture

This article explains the role of consensus algorithms in achieving high‑availability through redundancy and automatic failover, clarifies distributed consistency, explores the CAP theorem and its C component, and introduces the BASE theory as a practical complement for eventual consistency in modern distributed systems.

BASE theory · CAP theorem · Consensus

0 likes · 10 min read

Architect

May 21, 2025 · Databases

Designing Short Numeric ID Generation Using MySQL Auto‑Increment and Segment Allocation

The article examines the challenges of generating short, user‑friendly numeric account IDs, evaluates Snowflake and MySQL auto‑increment approaches, discusses deadlock issues with REPLACE INTO, and presents a final segment‑based solution that allocates ID blocks per login server while avoiding waste and concurrency problems.

Database design · Distributed Systems · auto_increment

0 likes · 12 min read

FunTester

May 19, 2025 · Operations

Chaos Engineering Tools, Theory, and Practices

Chaos engineering, a scientific method for improving system resilience, is explored through an overview of leading tools such as Gremlin, ChaosBlade, Chaos Mesh, Chaos Toolkit, and ChaosMeta, alongside core concepts, real-world case studies, common misconceptions, and the practical value of controlled fault injection in distributed systems.

Distributed Systems · Fault Injection · Reliability

0 likes · 12 min read

Xiaokun's Architecture Exploration Notes

May 18, 2025 · Fundamentals

How Distributed Consensus Overcomes the FLP Impossibility Theorem

This article explores how to build fault‑tolerant distributed systems by formalizing consensus, outlines its core properties, explains the FLP impossibility theorem, and shows how algorithms like Raft sidestep its limits through timing constraints and recovery mechanisms.

Consensus · Distributed Systems · FLP theorem

0 likes · 8 min read

FunTester

May 16, 2025 · Operations

Chaos Engineering: Evolution, Workflow, Advantages, and Practice Principles

Chaos engineering is a discipline that deliberately injects faults into distributed systems to test and improve resilience, tracing its evolution from Netflix's Chaos Monkey to modern platforms, outlining its operational workflow, benefits, and core principles for reliable system design.

Distributed Systems · Fault Injection · Operations

0 likes · 9 min read

Top Architecture Tech Stack

May 15, 2025 · Backend Development

Understanding Cookie + Session Mechanism and Distributed Session Sharing Solutions

This article explains the Cookie + Session mechanism for maintaining user state, discusses its limitations such as size, performance and security, examines challenges in distributed environments, and reviews common solutions including session replication, sticky load balancing, centralized storage, and the use of ThreadLocal for small‑scale backend applications.

Cookie · Distributed Systems · Session

0 likes · 17 min read

FunTester

May 15, 2025 · Operations

Uncovering the Eight Hidden Pitfalls That Can Crash Your Distributed System

This article dissects the classic Eight Fallacies of Distributed Computing, explaining each mistaken assumption about network reliability, latency, bandwidth, security, topology, administration, cost, and homogeneity, and provides real‑world case studies and practical recommendations to help engineers design more resilient distributed systems.

Distributed Systems · Fallacies · Latency

0 likes · 16 min read

Infra Learning Club

May 15, 2025 · R&D Management

How This Pioneer Programmer Coded Until 60: The Key Practices Behind His Longevity

The article examines how OceanBase founder Yang Zhenkun kept coding until age 60, outlining five concrete strategies (deep technical focus, embracing change, building soft‑skill influence, maintaining health, and proactive career planning) that enable programmers to sustain relevance and vitality in a fast‑moving industry.

Distributed Systems · Software Engineering · career longevity

0 likes · 7 min read

Alibaba Cloud Infrastructure

May 14, 2025 · Artificial Intelligence

How Mooncake’s KVCache Boosts Large‑Model Inference Efficiency and Cost

Mooncake, an open‑source large‑model inference platform, introduces a KVCache‑centric architecture that dramatically improves throughput, reduces latency and cuts inference costs by up to 20%, while integrating with frameworks like SGLang and vLLM and leveraging Alibaba Cloud’s eRDMA and GPUDirect technologies for scalable, high‑performance deployments.

AI Performance · Alibaba Cloud · Distributed Systems

0 likes · 7 min read

Xiaokun's Architecture Exploration Notes

May 11, 2025 · Fundamentals

How Fencing Tokens Ensure Safety and Liveness in Distributed Lock Services

This article explores how fencing tokens can provide safety and liveness guarantees in distributed lock services, illustrating fault scenarios, token-based conflict resolution, and abstract system models that help engineers prioritize correctness while tolerating temporary unavailability.

Distributed Systems · fault tolerance · fencing tokens

0 likes · 8 min read

Xiaokun's Architecture Exploration Notes

May 11, 2025 · Fundamentals

Why Unreliable Networks Threaten Distributed Systems and How to Mitigate Them

Distributed systems suffer from network unreliability—including packet loss, out‑of‑order delivery, variable latency, and ambiguous node failures—making timeout settings and fault detection challenging, and this article explains these issues, compares synchronous and asynchronous networks, and discusses strategies to balance latency and resource utilization.

Distributed Systems · Network Reliability · asynchronous network

0 likes · 8 min read

Xiaokun's Architecture Exploration Notes

May 11, 2025 · Fundamentals

Why Unreliable Clocks Threaten Distributed Systems—and How to Fix Them

This article examines the unreliability of physical clocks in distributed systems, compares synchronous and asynchronous network timing, explains the roles of wall and monotonic clocks, and explores logical clocks, snapshot isolation, and practical solutions such as Google Spanner's TrueTime to ensure data consistency.

Data Consistency · Distributed Systems · Logical Clock

0 likes · 11 min read

Code Ape Tech Column

May 9, 2025 · Databases

Efficient Strategies for Importing One Billion Records into MySQL

This article explains how to import 1 billion 1 KB log records stored in HDFS or S3 into MySQL by analyzing single‑table limits, using batch inserts, choosing storage engines, sharding, optimizing file‑reading methods, and coordinating distributed tasks with Redis, Redisson, and Zookeeper to ensure ordered, reliable, and high‑throughput data loading.

Batch Insert · Distributed Systems · Kafka

0 likes · 19 min read

Su San Talks Tech

May 7, 2025 · Backend Development

6 Scalable Leaderboard Solutions: From DB Sorting to Real‑Time Stream Processing

This article examines six different leaderboard implementation strategies—from simple database sorting and cache‑plus‑scheduled tasks to Redis sorted sets, sharded Redis clusters, pre‑computed layered caches, and real‑time stream processing with Flink—detailing their suitable scenarios, advantages, disadvantages, and architectural diagrams to help engineers choose the most appropriate solution.

Distributed Systems · Real-Time · caching

0 likes · 7 min read

Lin is Dream

May 5, 2025 · Backend Development

Mastering MDC with Logback: Traceable Logging for Distributed Systems

This article explains how to use SLF4J's MDC with Logback to assign a unique trace ID to each request, propagate it across threads and services, and configure log patterns so that logs become fully traceable for easier debugging in distributed systems.

Distributed Systems · Java · ThreadLocal

0 likes · 7 min read

dbaplus Community

May 5, 2025 · Fundamentals

Why Banks Are Replacing IBM Mainframes with Distributed Systems – A Deep Dive

The article explains how the Agricultural Bank of China successfully shut down its IBM mainframe, detailing the mainframe's high‑performance architecture, redundancy features, software ecosystem, and why its replacement with a distributed micro‑service core using TDSQL marks a significant shift for banking IT infrastructure.

Distributed Systems · IBM · Mainframe

0 likes · 9 min read

Xiaokun's Architecture Exploration Notes

May 4, 2025 · Fundamentals

Why Unreliable Clocks Threaten Distributed Systems—and How to Fix Them

This article examines how unreliable physical clocks—both wall and monotonic—affect distributed systems, compares synchronous and asynchronous network timing, illustrates conflicts caused by timestamp drift, and presents logical clocks and Google’s TrueTime as robust solutions for achieving consistent ordering and data reliability.

Distributed Systems · Logical Clock · TrueTime

0 likes · 11 min read

Architect

May 3, 2025 · Backend Development

Why Rebuild a Job Scheduler? Inside a Lightweight Distributed Timing Framework

This article explains the motivation, design choices, and implementation details of a custom distributed job scheduling framework, covering its architecture, load‑balancing strategy, message‑queue handling, persistence mechanisms, and key code snippets, while comparing it to existing solutions like Quartz, XXL‑Job, and PowerJob.

Distributed Systems · Java · Message Queue

0 likes · 16 min read

macrozheng

Apr 30, 2025 · Fundamentals

Key Questions for a Basic Infrastructure Interview: TCP, Redis, Kafka, CAP & More

This article compiles essential interview questions covering TCP connection termination, multi‑port listening, page load workflow, Redis data structures, Kafka consumer sizing and at‑most‑once semantics, the CAP theorem, Singleton usage, C++ map complexity, and a doubly linked list reversal algorithm, providing concise explanations and code examples.

Algorithms · Backend Development · Distributed Systems

0 likes · 14 min read

21CTO

Apr 29, 2025 · Backend Development

Why Microservices Might Be the Right Architecture for Your Organization

Microservices are independently deployable services modeled around business domains, offering benefits like smaller deployments, reduced risk, faster release cycles, and clear data ownership, while also introducing challenges such as distributed system complexity, operational overhead, and data consistency, requiring careful design of communication and scaling strategies.

Backend Architecture · Deployment · Distributed Systems

0 likes · 12 min read

Xiaolei Talks DB

Apr 28, 2025 · Databases

How China's DBA Landscape Is Evolving with Domestic Databases and AI

The article examines China's rapid shift toward domestic databases across finance, government, and energy sectors, highlighting how DBAs must upgrade from reactive fire‑fighting to strategic architects by mastering distributed systems, AI‑driven automation, cloud‑native tools, and open‑source community collaboration.

AI · DBA · Distributed Systems

0 likes · 12 min read

Lobster Programming

Apr 28, 2025 · Backend Development

How RocketMQ Transactional Messages Ensure Distributed Data Consistency

This article explains RocketMQ's transactional message mechanism, covering half‑message storage, three transaction states, status‑check procedures, key APIs, storage reliability, and the two‑phase commit process that guarantees eventual consistency in distributed systems.

Data Consistency · Distributed Systems · Message Queue

0 likes · 6 min read

IT Services Circle

Apr 28, 2025 · Fundamentals

Agricultural Bank’s Mainframe Shutdown and Migration to a Distributed Core System: Technical Overview and Industry Implications

The article examines the Agricultural Bank of China's successful shutdown of its IBM mainframe, detailing the z14's specifications, redundancy and virtualization features, the shift to a high‑concurrency distributed micro‑service architecture with TDSQL, and the broader impact on banking and IBM’s presence in China.

Banking · Distributed Systems · IBM

0 likes · 9 min read

Architect

Apr 24, 2025 · Backend Development

Beyond the Hype: What Microservices Really Offer (And What They Don’t)

This article critically examines the popular claims surrounding microservices, tracing their historical roots, debunking each touted benefit, exposing distributed‑computing fallacies, and highlighting the real organizational challenges, ultimately concluding that microservices are simply modular components rather than a revolutionary architecture.

Backend Development · Distributed Systems · Industry analysis

0 likes · 15 min read

Tencent Cloud Middleware

Apr 24, 2025 · Backend Development

How TDMQ RocketMQ Implements Distributed Rate Limiting for High‑Throughput Messaging

This article explains TDMQ RocketMQ's distributed rate‑limiting mechanism, covering conversion rules, fast‑fail behavior, token‑based implementation, counting periods, client best practices, elastic TPS options, code examples for different SDK versions, monitoring tips, and answers to common throttling questions.

Backend · Distributed Systems · Message Queue

0 likes · 15 min read

Tencent Cloud Developer

Apr 23, 2025 · Cloud Native

Microservices Architecture: Principles, Modeling, Integration, and Scaling

Microservices are small, autonomous services that replace monolithic codebases by emphasizing loose coupling, high cohesion, bounded contexts, technology-agnostic integration via REST, RPC, or events, disciplined code governance, semantic versioning, local transactions with eventual consistency, and robust scaling patterns such as timeouts, circuit breakers, and auto-scaling, while reflecting organizational structure and avoiding premature complexity.

Distributed Systems · architecture · scaling

0 likes · 19 min read

JD Retail Technology

Apr 22, 2025 · Artificial Intelligence

Generative Large‑Model Architecture for JD Advertising: Practices, Challenges, and Optimization

JD’s advertising platform replaces rule‑based recall with a generative large‑model pipeline that unifies e‑commerce knowledge, multimodal user intent, and semantic IDs across recall, coarse‑ranking, fine‑ranking and creative optimization, while meeting sub‑100 ms latency and sub‑¥1‑per‑million‑token cost through quantization, parallelism, caching, and joint generative‑discriminative inference, delivering double‑digit performance gains and paving the way for domain‑specific foundation models.

Advertising · Distributed Systems · Inference Optimization

0 likes · 20 min read

Xiaokun's Architecture Exploration Notes

Apr 20, 2025 · Fundamentals

Why Unreliable Networks Threaten Distributed Systems—and How to Mitigate Them

The article explains how network failures such as packet loss, reordering, latency, and ambiguous node failures make distributed systems unreliable, compares synchronous and asynchronous networks, and discusses the trade‑off between timeout settings and resource utilization.

Distributed Systems · Latency · Network Reliability

0 likes · 8 min read

Java Backend Full-Stack

Apr 20, 2025 · Interview Experience

Common Interview Questions from Over 10 Companies – What You’ll Face

A friend who landed a 16K job in a first‑tier city shares more than 30 interview questions he encountered across more than ten companies, covering JVM internals, performance tuning, concurrency, design patterns, database locking, distributed transactions, and system reliability.

Design Patterns · Distributed Systems · JVM

0 likes · 4 min read

JD Tech

Apr 17, 2025 · Operations

Chaos Engineering: Principles, Core Steps, Tool Selection, and AI Integration

This article explains chaos engineering—its definition, core principles, experimental workflow, tool selection, AI‑driven enhancements, and practical case studies—providing a comprehensive guide for building resilient distributed systems across backend, cloud‑native, mobile, and AI‑enabled environments.

AI integration · Distributed Systems · Fault Injection

0 likes · 26 min read

Lobster Programming

Apr 17, 2025 · Backend Development

How Local Message Tables Solve Distributed Transaction Challenges

Using a local message table, developers can break down distributed transactions into local database operations and asynchronous MQ messages, ensuring eventual consistency, simplifying implementation, and handling retries, while balancing advantages like simplicity and compatibility against drawbacks such as added maintenance and potential queue dependencies.

Backend Architecture · Distributed Systems · Local Message Table

0 likes · 5 min read

Network Intelligence Research Center (NIRC)

Apr 16, 2025 · Industry Insights

Our EuroSys'25 Experience: Presenting Atlas and Exploring Cutting‑Edge System Research

The article recounts the authors' participation in EuroSys'25 in Rotterdam, detailing the conference schedule, their presentation of the Atlas network verification paper, technical insights into distributed verification, interactions with peers, and memorable social and cultural experiences during the five‑day event.

Atlas · Distributed Systems · EuroSys

0 likes · 7 min read

Cognitive Technology Team

Apr 14, 2025 · Backend Development

Understanding Ordered Messages in RocketMQ: Global and Partitioned Ordering

The article explains how RocketMQ ensures strict message ordering through global FIFO queues and partitioned ordering, covering use cases, key implementation techniques on the producer, broker, and consumer sides, as well as lock mechanisms, retry strategies, and fault‑tolerance design.

Backend · Distributed Systems · Ordered Messages

0 likes · 6 min read

JD Cloud Developers

Apr 14, 2025 · Backend Development

How to Quickly Fix RPC Timeout Data Inconsistency with a Lightweight Mock/Spy Tool

This article explores the challenges of data consistency in RPC timeout scenarios, especially when idempotency fails, and introduces a lightweight mock/spy tool that can dynamically configure mock responses for any method call, helping restore consistency without full transaction support.

Backend · Data Consistency · Distributed Systems

0 likes · 16 min read

Cognitive Technology Team

Apr 13, 2025 · Backend Development

Understanding RocketMQ Master‑Slave Architecture and High‑Availability Mechanisms

This article explains how RocketMQ achieves high availability and data reliability through its master‑slave broker design, covering synchronous and asynchronous replication, flush strategies, transaction messaging, automatic failover with Dledger, and read‑write separation for load balancing in distributed systems.

Distributed Systems · Master‑Slave · RocketMQ

0 likes · 7 min read

FunTester

Apr 12, 2025 · Operations

How to Design Effective Fault‑Testing Cases for Resilient Distributed Systems

This article explains why fault testing is essential for modern distributed and cloud environments, outlines core goals, design principles, common fault categories, practical implementation strategies such as chaos engineering and gray releases, and shows how to analyze results to continuously improve system reliability.

Distributed Systems · chaos engineering · fault testing

0 likes · 18 min read

Java Tech Enthusiast

Apr 11, 2025 · Backend Development

Ensuring Message Processing Once in High-Concurrency Scenarios

The article explains how to guarantee that messages are processed only once in high‑concurrency environments by combining production‑side idempotent publishing, broker‑level deduplication with unique IDs, and consumption‑side business idempotency such as database constraints or distributed locks, while also recommending monitoring, metrics, and reconciliation as safety nets.

Distributed Systems · Idempotency · RocketMQ

0 likes · 6 min read

Architect's Must-Have

Apr 11, 2025 · Fundamentals

Mastering Distributed Transactions: 2PC, TCC, and Message Queue Solutions

This article explains the fundamentals of distributed transactions, covering ACID properties, the CAP theorem, two‑phase commit, TCC compensation, and a message‑queue based eventual consistency approach, while highlighting their advantages, drawbacks, and practical application scenarios.

2PC · CAP theorem · Distributed Systems

0 likes · 17 min read

Java Captain

Apr 10, 2025 · Backend Development

Design and Implementation of Delayed Task Processing for Order Systems

This article explains various approaches to delayed task handling—such as database polling, JDK DelayQueue, Redis expiration listeners, Redisson delay queues, RocketMQ delayed messages, and RabbitMQ dead‑letter queues—evaluating their advantages, drawbacks, and best‑practice recommendations for reliable order‑expiration workflows.

Distributed Systems · Message Queue · delayed tasks

0 likes · 17 min read

Sanyou's Java Diary

Apr 10, 2025 · Backend Development

Why RocketMQ Beats Kafka: Architecture Simplified and Features Amplified

This article explains how RocketMQ, a Chinese‑origin message queue, simplifies Kafka’s architecture while adding powerful features such as tag‑based filtering, transactional messaging, delayed and dead‑letter queues, and a unified commit‑log storage model, making delayed processing and high‑throughput scenarios easier to implement.

Distributed Systems · Kafka · Message Queue

0 likes · 10 min read

Xuanwu Backend Tech Stack

Apr 10, 2025 · Backend Development

Master RabbitMQ: Core Components and Architecture Explained

This article provides a comprehensive overview of RabbitMQ, an open-source AMQP-based message broker, detailing its core components—producers, exchanges, queues, consumers, and broker—along with auxiliary elements like bindings, connections, channels, virtual hosts, and key architectural features such as decoupling, flexible routing, reliability, and scalability.

AMQP · Backend Development · Distributed Systems

0 likes · 7 min read

IT Services Circle

Apr 9, 2025 · Backend Development

Practical Guide to Rate Limiting: Algorithms, Implementation, and Production Cases

This article explains the fundamentals and practical implementations of common rate‑limiting algorithms—including fixed‑window, sliding‑window, leaky‑bucket, and token‑bucket—provides Java and Redis code samples, discusses their advantages, pitfalls, and real‑world production scenarios, and offers performance‑tuning tips.

Distributed Systems · Java · backend algorithms

0 likes · 10 min read

Su San Talks Tech

Apr 8, 2025 · Backend Development

Mastering Rate Limiting: Practical Algorithms and Real‑World Cases

This article explains why rate limiting is essential for high‑traffic services, compares four classic algorithms (fixed‑window, sliding‑window, leaky‑bucket, token‑bucket), provides Java and Redis implementations, shares production case studies, highlights common pitfalls, and offers performance‑tuning tips for robust backend systems.

Backend · Distributed Systems · rate limiting

0 likes · 11 min read

AntData

Apr 3, 2025 · Artificial Intelligence

Ray Flow Insight: Visualizing and Debugging Distributed AI Applications

Ray Flow Insight is an Ant Group open‑source tool that visualizes Ray's distributed programming primitives—Actors, Tasks, and Objects—to turn complex reinforcement‑learning systems from opaque "black boxes" into transparent, debuggable workflows, providing logical, physical, distributed stack, and flame‑graph views for performance analysis and optimization.

AI · Debugging · Distributed Systems

0 likes · 32 min read
