Tagged articles
2122 articles
High Availability Architecture
Aug 28, 2025 · Fundamentals

5 Architecture Elements, 15 Design Principles & 6 Common Pitfalls

This article explains the essential components of software architecture—elements, structure, and connections—while presenting fifteen universal design principles, practical guidelines for monolithic, distributed, and microservice systems, and six common architectural mistakes to avoid, helping teams build scalable, reliable, and maintainable solutions.

Distributed Systems · Microservices · Scalability
0 likes · 21 min read
Tencent Cloud Developer
Aug 27, 2025 · Fundamentals

Mastering Software Architecture: 15 Universal Principles, Common Pitfalls, and Evolution from Monolith to Microservices

This article explains the core concept of software architecture as elements, structure, and connections, distinguishes systems, subsystems, modules, components and frameworks, compares architecture classifications, describes the evolution from monolithic to distributed and microservice designs, presents fifteen practical design principles, and warns about six typical architectural pitfalls.

Distributed Systems · Microservices · Software Architecture
0 likes · 23 min read
Architect's Guide
Aug 25, 2025 · Fundamentals

19 Essential Distributed System Design Patterns You Must Know

This article explores nineteen core design patterns for distributed systems—including Bloom filters, consistent hashing, quorum, leader‑follower, heartbeat, fencing, WAL, segmented logs, high‑water mark, leases, gossip, Phi accrual detection, split‑brain handling, checksums, CAP and PACELC theorems, hinted handoff, read repair, and Merkle trees—explaining their purpose, operation, and typical use cases.

Consistency · Distributed Systems · fault tolerance
0 likes · 14 min read
Tech Freedom Circle
Aug 24, 2025 · Operations

How a Misconfigured Nacos Cluster Cost $170 Million: A Deep P0 Incident Postmortem

A leading financial platform suffered a six‑hour outage and a $170 million loss when its Nacos service‑registry cluster entered a split‑brain state due to a network partition, exposing flaws in its AP‑mode deployment, monitoring gaps, and cascading failures that were later resolved through Raft migration, a multi‑active architecture, and client‑side resilience.

Distributed Systems · Microservices · Nacos
0 likes · 32 min read
Big Data Technology Tribe
Aug 22, 2025 · Backend Development

How StarRocks Keeps Metadata Consistent Across FE Nodes

This article explains the roles of StarRocks FE and BE nodes, details the metadata stored in FE, describes the leader‑follower‑observer architecture, and shows how BDB JE replication, journal logs, and checkpoint mechanisms ensure metadata synchronization and durability even after node failures.

BDB JE · Distributed Systems · Replication
0 likes · 17 min read
Wukong Talks Architecture
Aug 21, 2025 · Operations

Why LinkedIn Dropped Kafka for Northguard – A Deep Dive into Its Architecture

LinkedIn, the creator of Kafka, has largely abandoned Kafka in favor of a new log storage system called Northguard, whose design mirrors Apache Pulsar with features like storage‑compute separation, log striping, and a multi‑layer data model, offering superior scalability, operability, consistency, and durability for massive data streams.

Apache Pulsar · Distributed Systems · LinkedIn
0 likes · 22 min read
Open Source Tech Hub
Aug 21, 2025 · Backend Development

Build a Scalable Distributed Captcha Login with PHP Webman and Redis

This guide explains how to replace traditional session‑based captcha authentication with a Redis‑backed, token‑driven solution using the high‑performance PHP Webman framework and the tinywan/captcha plugin, covering architecture, generation and verification flows, installation steps, and code examples.

Captcha · Distributed Systems · PHP
0 likes · 8 min read
Xiaokun's Architecture Exploration Notes
Aug 17, 2025 · Backend Development

Why Multi-Leader Replication Causes Conflicts and How to Tackle Them

This article examines how single‑value objects encounter data‑value conflicts in multi‑leader replication architectures, explains the role of causal ordering versus concurrent writes, and outlines the need for global clocks, versioning, and broadcast mechanisms to resolve such conflicts.

Distributed Systems · Version Vector · conflict resolution
0 likes · 7 min read
Sohu Tech Products
Aug 13, 2025 · Backend Development

How to Build a Strictly Incremental Distributed ID System with Redis, MySQL, and Nacos

This article examines the challenges of distributed ID generation, compares common solutions like UUID and Snowflake, and presents a custom approach that combines MySQL segment tables, Redis caching, and Nacos switches to achieve high‑performance, strictly incremental IDs with automatic failover.

Backend Development · Distributed Systems · ID generation
0 likes · 11 min read

How Single-Leader Replication Handles Write Conflicts: Strategies and Insights

This article examines write conflicts in single-leader replication, comparing exclusive and shared data, exploring uniqueness constraints, async replication delays, and various conflict‑resolution techniques such as unique indexes, bitmap mapping, LWW ordering, and multi‑version control for collaborative editing.

Consistency · Distributed Systems · Replication
0 likes · 9 min read
Didi Tech
Aug 7, 2025 · Cloud Native

How HUATUO Revolutionizes Cloud‑Native Observability with Zero‑Impact BPF Tracing

HUATUO, Didi's open‑source cloud‑native observability project, leverages BPF‑based low‑overhead kernel tracing, unified metric and event frameworks, automatic flame‑graph generation, and seamless integration with Prometheus, Grafana and Elasticsearch to provide panoramic, zero‑intrusive monitoring and continuous performance profiling for complex production environments.

BPF · Cloud Native · Distributed Systems
0 likes · 11 min read
Zhuanzhuan Tech
Aug 6, 2025 · Backend Development

Mastering Distributed Caching: Easy-Cache’s Multi‑Level Dynamic Upgrade and Consistency

This article introduces Easy-Cache, a Spring‑AOP based caching framework that eliminates repetitive cache code by offering annotation‑driven operations, multi‑level Redis and local caches, dynamic upgrade/downgrade, elastic expiration, and Lua‑script‑backed consistency mechanisms for high‑availability distributed systems.

Distributed Systems · Lua · redis
0 likes · 18 min read
Xiaokun's Architecture Exploration Notes
Aug 3, 2025 · Fundamentals

Understanding Causal Consistency: Order Guarantees, Lamport Timestamps, and Total Order Broadcast

This article explains the challenges of implementing causal consistency, compares it with linear consistency (linearizability) and sequential consistency, describes how order guarantees are enforced in leader‑based replication, introduces Lamport timestamps and total‑order broadcast, and outlines practical approaches for achieving causal consistency in distributed systems.

Consistency · Distributed Systems · Lamport timestamp
0 likes · 14 min read
Xiaokun's Architecture Exploration Notes
Jul 27, 2025 · Fundamentals

Can Multi-Leader and Leaderless Replication Achieve Linear Consistency?

This article examines why multi‑leader and leaderless replication models struggle to provide linear consistency (linearizability), explores write‑conflict handling, quorum‑based NWR mechanisms, sloppy quorum and hinted‑handoff techniques, and summarizes the trade‑offs involved in achieving strong consistency across distributed data centers.

Distributed Systems · Multi-Leader · Replication
0 likes · 15 min read
Architect
Jul 24, 2025 · Backend Development

Mastering RPC: From Basics to Building Your Own Framework

This article explains what RPC is, why RPC frameworks are needed, the underlying principles and technologies such as dynamic proxies, serialization, NIO communication, service registration, governance and routing, and walks through a simple hand‑crafted RPC implementation with a comparison of popular frameworks.

Distributed Systems · RPC
0 likes · 18 min read
Architect's Guide
Jul 24, 2025 · Backend Development

7 Proven Strategies to Prevent Overselling in High‑Concurrency Flash Sales (SpringBoot)

This article explores high‑concurrency flash‑sale scenarios, demonstrates why naïve @Transactional and lock usage can still cause overselling, and presents seven concrete implementations—including improved lock, AOP lock, two pessimistic‑lock variants, optimistic lock, a blocking queue, and a Disruptor queue—complete with SpringBoot code, JMeter testing results, and performance analysis.

Distributed Systems · SpringBoot · concurrency
0 likes · 23 min read
DaTaobao Tech
Jul 23, 2025 · Artificial Intelligence

How Alibaba’s New Distributed Agent Framework Solves 2C AI Challenges

Alibaba introduces the ali‑langengine‑dflow framework, a hybrid distributed‑agent architecture that moves core intelligence to the cloud while keeping execution reachable on heterogeneous client devices, addressing the data‑isolation, latency, and security issues of existing cloud‑VM and local‑agent solutions for consumer‑facing (2C) internet services.

AI · Agent · Distributed Systems
0 likes · 21 min read
Su San Talks Tech
Jul 22, 2025 · Backend Development

10 Common Microservice Pitfalls and How to Avoid Them

This article shares ten frequent microservice problems—from improper service splitting and distributed transaction failures to configuration chaos, logging fragmentation, database sharing, API incompatibility, CI bottlenecks, missing monitoring, and team collaboration issues—offering concrete solutions, best‑practice principles, and code examples to help engineers build robust, maintainable microservice systems.

Backend Architecture · Distributed Systems · Microservices
0 likes · 11 min read

How to Achieve Linear Consistency in Single-Leader Replication: Challenges and Solutions

This article examines eventual and linear consistency (linearizability) in leader‑based replication, explains read‑your‑writes and shared‑data scenarios, discusses replication lag, failover trade‑offs, and multi‑data‑center risks, and shows how consensus systems like ZooKeeper and etcd implement true linear consistency.

Distributed Systems · Replication · databases
0 likes · 13 min read
Kuaishou Tech
Jul 17, 2025 · Artificial Intelligence

How DHPS Boosted Online Inference Throughput by 270% with RDMA

This article details the design and evolution of DHPS, Kuaishou's load‑balanced, RDMA‑based high‑performance service architecture, explaining the network, storage, and traffic‑scheduling innovations that deliver a query‑throughput improvement of over 270%, lower latency, reduced CPU usage, and nearly five‑nines availability for large‑scale AI inference workloads.

Distributed Systems · RDMA · Storage Engine
0 likes · 17 min read
Su San Talks Tech
Jul 17, 2025 · Big Data

How to De‑Duplicate 1 Billion QQ Numbers Using Under 1 GB of Memory

This article explores multiple techniques—including bitmap indexing, Bloom filters, external sorting, Spark, and layered bitmap structures—to efficiently remove duplicate QQ numbers from a dataset of up to one billion entries while keeping memory usage below a gigabyte and maintaining high accuracy.

Bitmap · Distributed Systems · Spark
0 likes · 12 min read
Su San Talks Tech
Jul 13, 2025 · Backend Development

8 Proven Retry Strategies to Prevent Costly Failures in Distributed Systems

Discover why improper retry logic can cause massive financial losses, learn eight practical retry solutions—from simple loops to advanced Resilience4j and distributed lock techniques—and see how to avoid retry storms, ensure idempotency, and protect resources in high‑traffic backend services.

Distributed Systems · Idempotency · Resilience
0 likes · 13 min read
IT Services Circle
Jul 11, 2025 · Backend Development

10 Essential System Design Trade‑offs Every Engineer Should Master

Understanding system design trade‑offs is crucial for building robust software; this article examines ten common compromises—from vertical vs. horizontal scaling and SQL vs. NoSQL to CAP theorem, consistency models, REST vs. GraphQL, stateful vs. stateless architectures, caching strategies, and synchronous vs. asynchronous processing—highlighting their benefits and drawbacks.

Backend Architecture · Distributed Systems · Scalability
0 likes · 10 min read
IT Architects Alliance
Jul 10, 2025 · Cloud Native

Inside Alibaba’s Tech Stack: Cloud‑Native Architecture Behind Billions of Transactions

This article examines Alibaba's extensive cloud‑native technology stack—including distributed computing, storage, middleware, real‑time data processing, AI platforms, performance engineering, and security—revealing how its architects design systems that handle massive transaction volumes during events like Double 11.

Big Data · Distributed Systems · Microservices
0 likes · 12 min read
Big Data Technology Tribe
Jul 9, 2025 · Backend Development

Mastering Idempotency: Design Patterns & Best Practices for Reliable Distributed Systems

This comprehensive guide explains the concept of idempotency, why it is essential in distributed and micro‑service architectures, and provides practical patterns, code examples, and best‑practice recommendations for HTTP, databases, messaging, caching, and service‑mesh implementations.

Backend · Design Patterns · Distributed Systems
0 likes · 21 min read
IT Architects Alliance
Jul 8, 2025 · Cloud Native

Why Do Big‑Tech Architects Earn Six Figures? The Skills That Set Them Apart

The article explores why architects at leading tech firms command six‑figure salaries while those in traditional companies earn far less, highlighting gaps in technical depth, massive data handling, performance optimization, business insight, continuous learning, and the scarcity of true senior architects.

Big Data · Career Development · Distributed Systems
0 likes · 9 min read
Practical DevOps Architecture
Jul 8, 2025 · Big Data

Master High‑Performance E‑Commerce Search with Elasticsearch & SpringBoot

This comprehensive course teaches developers how to design and implement a high‑throughput, scalable search engine for e‑commerce platforms using Elasticsearch and SpringBoot, covering architecture, data modeling, performance tuning, and advanced features such as autocomplete, fuzzy correction, price filtering, and sales reporting.

Distributed Systems · Elasticsearch · SpringBoot
0 likes · 8 min read

Demystifying Consistency Models: From Linear to Eventual in Distributed Systems

This article explores the concept of consistency in distributed systems, breaking down various consistency models—including linear, sequential, causal, and eventual—explaining their definitions, practical implications, and how they guide the design of high‑availability architectures and data replication strategies.

Consistency · Distributed Systems · consistency models
0 likes · 13 min read
IT Architects Alliance
Jul 6, 2025 · Backend Development

Why Microservices Are the Secret to Higher Salaries and Scalable Systems

Microservices have become the standard architecture for large internet companies, offering superior scalability, maintainability, and team autonomy compared to monolithic systems, while demanding a broad tech stack—including service discovery, API gateways, container orchestration, and distributed transaction handling—making expertise in this area highly lucrative.

Backend Architecture · Cloud Native · DevOps
0 likes · 9 min read
Deepin Linux
Jul 4, 2025 · Backend Development

Mastering Protocol Buffers in C++: Installation, Data Types, and Real‑World Use Cases

This comprehensive guide explains what Protocol Buffers are, why they outperform JSON and XML, how to install and configure the library, the supported data types, code generation for multiple languages, practical C++ examples, and typical scenarios such as distributed systems, storage, and network communication.

C · Data Structures · Distributed Systems
0 likes · 23 min read
Su San Talks Tech
Jul 3, 2025 · Databases

Mastering MySQL Sharding: Strategies for 1 Billion Orders

This article explores the pain points of a 700‑million‑row MySQL order table, presents vertical and horizontal sharding strategies, introduces gene‑based Snowflake IDs, details routing logic, migration steps, common pitfalls, and shows performance gains after applying the final architecture.

Distributed Systems · Performance Optimization · database scaling
0 likes · 9 min read
Selected Java Interview Questions
Jul 1, 2025 · Backend Development

Why Our Custom Snowflake ID Failed and How to Build Reliable IDs

A recent production incident revealed duplicate order IDs caused by a flawed custom Snowflake generator, prompting a deep dive into the standard algorithm, the mistakes in the bespoke version, and practical recommendations for using proven implementations and proper machine‑ID configuration.

Backend Development · Distributed Systems · ID generation
0 likes · 7 min read
Architecture & Thinking
Jun 30, 2025 · Backend Development

Mastering RocketMQ Retry: Producer & Consumer Strategies for Reliable Messaging

This article explores Apache RocketMQ's retry mechanisms in depth, detailing producer and consumer retry strategies, flow control handling, dead‑letter queue management, advanced configurations, best practices, and comparisons with Kafka and RabbitMQ, and providing practical code examples and monitoring recommendations for building highly reliable distributed systems.

Dead Letter Queue · Distributed Systems · Idempotency
0 likes · 8 min read
Lin is Dream
Jun 27, 2025 · Backend Development

How to Solve Common RocketMQ Issues: Duplicates, Throttling, Retries, and Loss

This article examines frequent RocketMQ problems such as duplicate sending, flow‑control throttling, message retries, duplicate consumption, backlog, and loss, and provides practical configuration tweaks, scaling strategies, batch sending, idempotent handling, and retry mechanisms to ensure reliable message delivery.

Distributed Systems · Java · Message Queue
0 likes · 9 min read
Lin is Dream
Jun 26, 2025 · Backend Development

Unveiling RocketMQ: How Messages Journey Through Storage, Delivery, and Expiration

This article systematically breaks down RocketMQ's core mechanisms—covering message roles, disk storage, push/pull delivery, expiration handling, retry queues, and cluster failover—so developers can understand every stage a message undergoes from creation to cleanup and ensure reliable, high‑performance messaging.

Distributed Systems · Java · Message Queue
0 likes · 13 min read
Lin is Dream
Jun 25, 2025 · Backend Development

12 Essential RocketMQ Best Practices for Reliable Messaging

This article presents a comprehensive set of RocketMQ usage guidelines—including topic and tag conventions, producer and consumer group naming, key handling, logging, retry mechanisms, and cluster deployment recommendations—to help engineers build stable, high‑performance, and observable messaging systems in production environments.

Distributed Systems · Message Queue · RocketMQ
0 likes · 9 min read
TAL Education Technology
Jun 23, 2025 · Operations

How Chaos Engineering Boosts System Resilience: A Practical Guide

This article explains what Chaos Engineering is, why it matters for modern distributed systems, outlines a step‑by‑step approach to designing and running effective chaos experiments, describes platform features, and shares a real‑world case study of a pre‑launch blind test.

Distributed Systems · Reliability · Resilience Testing
0 likes · 9 min read
Lobster Programming
Jun 23, 2025 · Backend Development

How RocketMQ’s CommitLog Powers Million‑Level Concurrency

This article explains how RocketMQ’s CommitLog architecture—sequential writes, mmap zero‑copy, PageCache acceleration, fixed‑size log files, flexible flushing strategies, and efficient ConsumeQueue indexing—enables the system to sustain million‑level QPS with high reliability and low latency.

CommitLog · Distributed Systems · PageCache
0 likes · 6 min read
dbaplus Community
Jun 22, 2025 · Backend Development

Why UUIDv7 Is the New Go-To Primary Key for Distributed Databases

The article explains the drawbacks of traditional random UUIDs as primary keys, introduces the time‑ordered design of UUIDv7, compares it with earlier versions, and provides practical Java code and SQL examples for generating and using UUIDv7 in databases.

Distributed Systems · SQL · UUIDv7
0 likes · 8 min read
Architect's Guide
Jun 22, 2025 · Backend Development

How to Build a Scalable Delayed Queue with Redis and Java

This article explains why traditional polling fails for large‑scale delayed tasks, compares built‑in Java, RocketMQ, and RabbitMQ delay queues, and provides a detailed Redis‑based design with architecture diagrams, message structures, and a 2.0 version that uses real‑time locking for low‑latency delivery.

Distributed Systems · Java · delayed queue
0 likes · 8 min read
Cognitive Technology Team
Jun 21, 2025 · Fundamentals

Understanding Faults, Failures, and Fault Tolerance in Distributed Systems

This tutorial explains the definitions of faults and failures in distributed systems, explores their types and root causes, and presents fault‑tolerance mechanisms such as replication, checkpointing, redundancy, error detection, load balancing, and consensus algorithms to build resilient architectures.

Distributed Systems · consensus algorithms · data replication
0 likes · 10 min read
Mike Chen's Internet Architecture
Jun 20, 2025 · Backend Development

Understanding Dubbo’s Load‑Balancing Strategies: From Random to Consistent Hash

This article introduces Dubbo, Alibaba’s high‑performance Java RPC framework, and provides a detailed examination of its client‑side load‑balancing mechanisms, covering the default Random strategy and the alternatives RoundRobin, LeastActive, and ConsistentHash, along with their principles, advantages, and drawbacks.

Backend Development · Distributed Systems · Dubbo
0 likes · 5 min read
AsiaInfo Technology: New Tech Exploration
Jun 16, 2025 · Artificial Intelligence

How LangGraph Implements Shared Memory for Multi‑Agent Systems: Techniques, Tools, and Future Directions

This article examines the theory and practice of shared memory in multi‑agent systems, tracing its evolution from classic blackboard models to modern solutions like Mem0.ai, Open Memory, and A‑MEM, and provides concrete design patterns, integration strategies, and future research directions for LangGraph users.

AI memory · Distributed Systems · LLM
0 likes · 37 min read
Pan Zhi's Tech Notes
Jun 16, 2025 · Backend Development

How RocketMQ Guarantees No Message Loss, Duplication, or Disorder

This article explains RocketMQ’s architecture, the roles of NameServer, Broker, Producer, Consumer, and how each component ensures reliable message delivery—covering synchronous, asynchronous, and one‑way sending, storage mechanisms, consumer retries, dead‑letter queues, installation steps, and Java client integration with code examples.

Distributed Systems · Installation · Java
0 likes · 20 min read
Xuanwu Backend Tech Stack
Jun 15, 2025 · Backend Development

Understanding Zookeeper’s One‑Time Watch and Persistent Listener Techniques

This article explains why Zookeeper's watch mechanism triggers only once, outlines the performance, reliability, and design reasons behind it, describes its asynchronous eventual consistency, and provides Java code examples for basic watches, manual re‑registration, and using the Curator framework for persistent listeners.

Distributed Systems · Java · One-time Trigger
0 likes · 7 min read
macrozheng
Jun 13, 2025 · Backend Development

How to Build a Real‑Time Chat with Spring Boot WebSocket: Step‑by‑Step Guide

This article explains how to integrate WebSocket into a Spring Boot project to create a lightweight instant‑messaging system, covering dependency setup, configuration classes, core server implementation, required modules, common deployment issues, and practical solutions with complete code examples.

Backend Development · Distributed Systems · Instant Messaging
0 likes · 14 min read
AI Large Model Application Practice
Jun 3, 2025 · Backend Development

Scaling Human‑in‑the‑Loop Agents to Distributed Environments with Robust Fault Recovery

This article explains how to extend a single‑process Human‑in‑the‑Loop (HITL) agent to a distributed, multi‑user API service using FastAPI, detailing session management, interrupt handling, client and server fault‑recovery strategies, and providing concrete code snippets and architectural diagrams.

Distributed Systems · Human-in-the-Loop · LangGraph
0 likes · 16 min read
dbaplus Community
May 25, 2025 · Databases

How to Generate Short, Sequential Numeric IDs Without Snowflake Overhead

To replace long UUIDs with short, sequential numeric account IDs, the article explores the limitations of Snowflake’s 64‑bit IDs, evaluates MySQL auto‑increment and REPLACE INTO approaches, identifies deadlock issues, and ultimately proposes a segmented free‑ID table with batch allocation to achieve compact, ordered IDs.

Distributed Systems · ID generation · auto_increment
0 likes · 14 min read
Xiaokun's Architecture Exploration Notes
May 25, 2025 · Fundamentals

How Consensus, CAP, and BASE Shape High‑Availability Architecture

This article explains the role of consensus algorithms in achieving high‑availability through redundancy and automatic failover, clarifies distributed consistency, explores the CAP theorem and its C component, and introduces the BASE theory as a practical complement for eventual consistency in modern distributed systems.

BASE theory · CAP theorem · Consensus
0 likes · 10 min read
Architect
May 21, 2025 · Databases

Designing Short Numeric ID Generation Using MySQL Auto‑Increment and Segment Allocation

The article examines the challenges of generating short, user‑friendly numeric account IDs, evaluates Snowflake and MySQL auto‑increment approaches, discusses deadlock issues with REPLACE INTO, and presents a final segment‑based solution that allocates ID blocks per login server while avoiding waste and concurrency problems.

Database design · Distributed Systems · auto_increment
0 likes · 12 min read
FunTester
May 19, 2025 · Operations

Chaos Engineering Tools, Theory, and Practices

Chaos engineering, a scientific method for improving system resilience, is explored through an overview of leading tools such as Gremlin, ChaosBlade, Chaos Mesh, Chaos Toolkit, and ChaosMeta, alongside core concepts, real-world case studies, common misconceptions, and the practical value of controlled fault injection in distributed systems.

Distributed Systems · Fault Injection · Reliability
0 likes · 12 min read
FunTester
May 16, 2025 · Operations

Chaos Engineering: Evolution, Workflow, Advantages, and Practice Principles

Chaos engineering is a discipline that deliberately injects faults into distributed systems to test and improve resilience, tracing its evolution from Netflix's Chaos Monkey to modern platforms, outlining its operational workflow, benefits, and core principles for reliable system design.

Distributed Systems · Fault Injection · Operations
0 likes · 9 min read
Top Architecture Tech Stack
May 15, 2025 · Backend Development

Understanding Cookie + Session Mechanism and Distributed Session Sharing Solutions

This article explains the Cookie + Session mechanism for maintaining user state, discusses its limitations such as size, performance and security, examines challenges in distributed environments, and reviews common solutions including session replication, sticky load balancing, centralized storage, and the use of ThreadLocal for small‑scale backend applications.

Cookie · Distributed Systems · Session
0 likes · 17 min read
FunTester
May 15, 2025 · Operations

Uncovering the Eight Hidden Pitfalls That Can Crash Your Distributed System

This article dissects the classic Eight Fallacies of Distributed Computing, explaining each mistaken assumption about network reliability, latency, bandwidth, security, topology, administration, cost, and homogeneity, and provides real‑world case studies and practical recommendations to help engineers design more resilient distributed systems.

Distributed Systems · Fallacies · Latency
0 likes · 16 min read
Infra Learning Club
May 15, 2025 · R&D Management

How This Pioneer Programmer Coded Until 60: The Key Practices Behind His Longevity

The article examines how OceanBase founder Yang Zhenkun has kept coding until age 60, outlining five concrete strategies (deep technical focus, embracing change, building soft‑skill influence, maintaining health, and proactive career planning) that enable programmers to sustain relevance and vitality in a fast‑moving industry.

Distributed Systems · Software Engineering · career longevity
0 likes · 7 min read
Alibaba Cloud Infrastructure
May 14, 2025 · Artificial Intelligence

How Mooncake’s KVCache Boosts Large‑Model Inference Efficiency and Cost

Mooncake, an open‑source large‑model inference platform, introduces a KVCache‑centric architecture that dramatically improves throughput, reduces latency and cuts inference costs by up to 20%, while integrating with frameworks like SGLang and vLLM and leveraging Alibaba Cloud’s eRDMA and GPUDirect technologies for scalable, high‑performance deployments.

AI Performance · Alibaba Cloud · Distributed Systems
0 likes · 7 min read
Xiaokun's Architecture Exploration Notes
May 11, 2025 · Fundamentals

Why Unreliable Networks Threaten Distributed Systems and How to Mitigate Them

Distributed systems must cope with unreliable networks: packet loss, out‑of‑order delivery, variable latency, and ambiguous node failures make timeout settings and fault detection difficult. This article explains these issues, compares synchronous and asynchronous networks, and discusses strategies to balance latency and resource utilization.

Distributed Systems · Network Reliability · asynchronous network
0 likes · 8 min read
Xiaokun's Architecture Exploration Notes
May 11, 2025 · Fundamentals

Why Unreliable Clocks Threaten Distributed Systems—and How to Fix Them

This article examines the unreliability of physical clocks in distributed systems, compares synchronous and asynchronous network timing, explains the roles of wall and monotonic clocks, and explores logical clocks, snapshot isolation, and practical solutions such as Google Spanner's TrueTime to ensure data consistency.

Data Consistency · Distributed Systems · Logical Clock
0 likes · 11 min read
Code Ape Tech Column
May 9, 2025 · Databases

Efficient Strategies for Importing One Billion Records into MySQL

This article explains how to import 1 billion 1 KB log records stored in HDFS or S3 into MySQL by analyzing single‑table limits, using batch inserts, choosing storage engines, sharding, optimizing file‑reading methods, and coordinating distributed tasks with Redis, Redisson, and ZooKeeper to ensure ordered, reliable, and high‑throughput data loading.

Batch Insert · Distributed Systems · Kafka
0 likes · 19 min read
Su San Talks Tech
May 7, 2025 · Backend Development

6 Scalable Leaderboard Solutions: From DB Sorting to Real‑Time Stream Processing

This article examines six different leaderboard implementation strategies—from simple database sorting and cache‑plus‑scheduled tasks to Redis sorted sets, sharded Redis clusters, pre‑computed layered caches, and real‑time stream processing with Flink—detailing their suitable scenarios, advantages, disadvantages, and architectural diagrams to help engineers choose the most appropriate solution.

Distributed Systems · Real-Time · caching
0 likes · 7 min read
Lin is Dream
May 5, 2025 · Backend Development

Mastering MDC with Logback: Traceable Logging for Distributed Systems

This article explains how to use SLF4J's MDC with Logback to assign a unique trace ID to each request, propagate it across threads and services, and configure log patterns so that logs become fully traceable for easier debugging in distributed systems.

Distributed Systems · Java · ThreadLocal
0 likes · 7 min read
dbaplus Community
May 5, 2025 · Fundamentals

Why Banks Are Replacing IBM Mainframes with Distributed Systems – A Deep Dive

The article explains how the Agricultural Bank of China successfully shut down its IBM mainframe, detailing the mainframe's high‑performance architecture, redundancy features, software ecosystem, and why its replacement with a distributed micro‑service core using TDSQL marks a significant shift for banking IT infrastructure.

Distributed Systems · IBM · Mainframe
0 likes · 9 min read
Xiaokun's Architecture Exploration Notes
May 4, 2025 · Fundamentals

Why Unreliable Clocks Threaten Distributed Systems—and How to Fix Them

This article examines how unreliable physical clocks—both wall and monotonic—affect distributed systems, compares synchronous and asynchronous network timing, illustrates conflicts caused by timestamp drift, and presents logical clocks and Google’s TrueTime as robust solutions for achieving consistent ordering and data reliability.

Distributed Systems · Logical Clock · TrueTime
0 likes · 11 min read
Architect
May 3, 2025 · Backend Development

Why Rebuild a Job Scheduler? Inside a Lightweight Distributed Timing Framework

This article explains the motivation, design choices, and implementation details of a custom distributed job scheduling framework, covering its architecture, load‑balancing strategy, message‑queue handling, persistence mechanisms, and key code snippets, while comparing it to existing solutions like Quartz, XXL‑Job, and PowerJob.

Distributed Systems · Java · Message Queue
0 likes · 16 min read
macrozheng
Apr 30, 2025 · Fundamentals

Key Questions for a Basic Infrastructure Interview: TCP, Redis, Kafka, CAP & More

This article compiles essential interview questions covering TCP connection termination, multi‑port listening, page load workflow, Redis data structures, Kafka consumer sizing and at‑most‑once semantics, the CAP theorem, Singleton usage, C++ map complexity, and a doubly linked list reversal algorithm, providing concise explanations and code examples.

Algorithms · Backend Development · Distributed Systems
0 likes · 14 min read
21CTO
Apr 29, 2025 · Backend Development

Why Microservices Might Be the Right Architecture for Your Organization

Microservices are independently deployable services modeled around business domains, offering benefits like smaller deployments, reduced risk, faster release cycles, and clear data ownership, while also introducing challenges such as distributed system complexity, operational overhead, and data consistency, requiring careful design of communication and scaling strategies.

Backend Architecture · Deployment · Distributed Systems
0 likes · 12 min read
Xiaolei Talks DB
Apr 28, 2025 · Databases

How China's DBA Landscape Is Evolving with Domestic Databases and AI

The article examines China's rapid shift toward domestic databases across finance, government, and energy sectors, highlighting how DBAs must upgrade from reactive fire‑fighting to strategic architects by mastering distributed systems, AI‑driven automation, cloud‑native tools, and open‑source community collaboration.

AI · DBA · Distributed Systems
0 likes · 12 min read
Lobster Programming
Apr 28, 2025 · Backend Development

How RocketMQ Transactional Messages Ensure Distributed Data Consistency

This article explains RocketMQ's transactional message mechanism, covering half‑message storage, three transaction states, status‑check procedures, key APIs, storage reliability, and the two‑phase commit process that guarantees eventual consistency in distributed systems.

Data Consistency · Distributed Systems · Message Queue
0 likes · 6 min read
IT Services Circle
Apr 28, 2025 · Fundamentals

Agricultural Bank’s Mainframe Shutdown and Migration to a Distributed Core System: Technical Overview and Industry Implications

The article examines the Agricultural Bank of China's successful shutdown of its IBM mainframe, detailing the z14's specifications, redundancy and virtualization features, the shift to a high‑concurrency distributed micro‑service architecture with TDSQL, and the broader impact on banking and IBM’s presence in China.

Banking · Distributed Systems · IBM
0 likes · 9 min read
Architect
Apr 24, 2025 · Backend Development

Beyond the Hype: What Microservices Really Offer (And What They Don’t)

This article critically examines the popular claims surrounding microservices, tracing their historical roots, debunking each touted benefit, exposing distributed‑computing fallacies, and highlighting the real organizational challenges, ultimately concluding that microservices are simply modular components rather than a revolutionary architecture.

Backend Development · Distributed Systems · Industry analysis
0 likes · 15 min read
Tencent Cloud Middleware
Apr 24, 2025 · Backend Development

How TDMQ RocketMQ Implements Distributed Rate Limiting for High‑Throughput Messaging

This article explains TDMQ RocketMQ's distributed rate‑limiting mechanism, covering conversion rules, fast‑fail behavior, token‑based implementation, counting periods, client best practices, elastic TPS options, code examples for different SDK versions, monitoring tips, and answers to common throttling questions.

Backend · Distributed Systems · Message Queue
0 likes · 15 min read
Tencent Cloud Developer
Apr 23, 2025 · Cloud Native

Microservices Architecture: Principles, Modeling, Integration, and Scaling

Microservices are small, autonomous services that replace monolithic codebases by emphasizing loose coupling, high cohesion, bounded contexts, technology-agnostic integration via REST, RPC, or events, disciplined code governance, semantic versioning, local transactions with eventual consistency, and robust scaling patterns such as timeouts, circuit breakers, and auto-scaling, while reflecting organizational structure and avoiding premature complexity.

Distributed Systems · architecture · scaling
0 likes · 19 min read
JD Retail Technology
Apr 22, 2025 · Artificial Intelligence

Generative Large‑Model Architecture for JD Advertising: Practices, Challenges, and Optimization

JD’s advertising platform replaces rule‑based recall with a generative large‑model pipeline that unifies e‑commerce knowledge, multimodal user intent, and semantic IDs across recall, coarse‑ranking, fine‑ranking and creative optimization, while meeting sub‑100 ms latency and sub‑¥1‑per‑million‑token cost through quantization, parallelism, caching, and joint generative‑discriminative inference, delivering double‑digit performance gains and paving the way for domain‑specific foundation models.

Advertising · Distributed Systems · Inference Optimization
0 likes · 20 min read
JD Tech
Apr 17, 2025 · Operations

Chaos Engineering: Principles, Core Steps, Tool Selection, and AI Integration

This article explains chaos engineering—its definition, core principles, experimental workflow, tool selection, AI‑driven enhancements, and practical case studies—providing a comprehensive guide for building resilient distributed systems across backend, cloud‑native, mobile, and AI‑enabled environments.

AI integration · Distributed Systems · Fault Injection
0 likes · 26 min read
Lobster Programming
Apr 17, 2025 · Backend Development

How Local Message Tables Solve Distributed Transaction Challenges

Using a local message table, developers can break down distributed transactions into local database operations and asynchronous MQ messages, ensuring eventual consistency, simplifying implementation, and handling retries, while balancing advantages like simplicity and compatibility against drawbacks such as added maintenance and potential queue dependencies.

Backend Architecture · Distributed Systems · Local Message Table
0 likes · 5 min read
Network Intelligence Research Center (NIRC)
Apr 16, 2025 · Industry Insights

Our EuroSys'25 Experience: Presenting Atlas and Exploring Cutting‑Edge System Research

The article recounts the authors' participation in EuroSys'25 in Rotterdam, detailing the conference schedule, their presentation of the Atlas network verification paper, technical insights into distributed verification, interactions with peers, and memorable social and cultural experiences during the five‑day event.

Atlas · Distributed Systems · EuroSys
0 likes · 7 min read
Cognitive Technology Team
Apr 13, 2025 · Backend Development

Understanding RocketMQ Master‑Slave Architecture and High‑Availability Mechanisms

This article explains how RocketMQ achieves high availability and data reliability through its master‑slave broker design, covering synchronous and asynchronous replication, flush strategies, transaction messaging, automatic failover with DLedger, and read‑write separation for load balancing in distributed systems.

Distributed Systems · Master‑Slave · RocketMQ
0 likes · 7 min read
FunTester
Apr 12, 2025 · Operations

How to Design Effective Fault‑Testing Cases for Resilient Distributed Systems

This article explains why fault testing is essential for modern distributed and cloud environments, outlines core goals, design principles, common fault categories, practical implementation strategies such as chaos engineering and gray releases, and shows how to analyze results to continuously improve system reliability.

Distributed Systems · chaos engineering · fault testing
0 likes · 18 min read
Java Tech Enthusiast
Apr 11, 2025 · Backend Development

Ensuring Message Processing Once in High-Concurrency Scenarios

The article explains how to guarantee that messages are processed only once in high‑concurrency environments by combining producer‑side idempotent publishing, broker‑level deduplication with unique IDs, and consumer‑side business idempotency such as database constraints or distributed locks, while also recommending monitoring, metrics, and reconciliation as safety nets.

Distributed Systems · Idempotency · RocketMQ
0 likes · 6 min read
Java Captain
Apr 10, 2025 · Backend Development

Design and Implementation of Delayed Task Processing for Order Systems

This article explains various approaches to delayed task handling—such as database polling, JDK DelayQueue, Redis expiration listeners, Redisson delay queues, RocketMQ delayed messages, and RabbitMQ dead‑letter queues—evaluating their advantages, drawbacks, and best‑practice recommendations for reliable order‑expiration workflows.

Distributed Systems · Message Queue · delayed tasks
0 likes · 17 min read
Sanyou's Java Diary
Apr 10, 2025 · Backend Development

Why RocketMQ Beats Kafka: Architecture Simplified and Features Amplified

This article explains how RocketMQ, a Chinese‑origin message queue, simplifies Kafka’s architecture while adding powerful features such as tag‑based filtering, transactional messaging, delayed and dead‑letter queues, and a unified commit‑log storage model, making delayed processing and high‑throughput scenarios easier to implement.

Distributed Systems · Kafka · Message Queue
0 likes · 10 min read
Xuanwu Backend Tech Stack
Apr 10, 2025 · Backend Development

Master RabbitMQ: Core Components and Architecture Explained

This article provides a comprehensive overview of RabbitMQ, an open-source AMQP-based message broker, detailing its core components—producers, exchanges, queues, consumers, and broker—along with auxiliary elements like bindings, connections, channels, virtual hosts, and key architectural features such as decoupling, flexible routing, reliability, and scalability.

AMQP · Backend Development · Distributed Systems
0 likes · 7 min read
IT Services Circle
Apr 9, 2025 · Backend Development

Practical Guide to Rate Limiting: Algorithms, Implementation, and Production Cases

This article explains the fundamentals and practical implementations of common rate‑limiting algorithms—including fixed‑window, sliding‑window, leaky‑bucket, and token‑bucket—provides Java and Redis code samples, discusses their advantages, pitfalls, and real‑world production scenarios, and offers performance‑tuning tips.

Distributed Systems · Java · backend algorithms
0 likes · 10 min read
Su San Talks Tech
Apr 8, 2025 · Backend Development

Mastering Rate Limiting: Practical Algorithms and Real‑World Cases

This article explains why rate limiting is essential for high‑traffic services, compares four classic algorithms (fixed‑window, sliding‑window, leaky‑bucket, token‑bucket), provides Java and Redis implementations, shares production case studies, highlights common pitfalls, and offers performance‑tuning tips for robust backend systems.

Backend · Distributed Systems · rate limiting
0 likes · 11 min read
AntData
Apr 3, 2025 · Artificial Intelligence

Ray Flow Insight: Visualizing and Debugging Distributed AI Applications

Ray Flow Insight is an Ant Group open‑source tool that visualizes Ray's distributed programming primitives—Actors, Tasks, and Objects—to turn complex reinforcement‑learning systems from opaque "black boxes" into transparent, debuggable workflows, providing logical, physical, distributed stack, and flame‑graph views for performance analysis and optimization.

AI · Debugging · Distributed Systems
0 likes · 32 min read