Tagged articles

distributed systems

2154 articles · Page 1 of 22
FunTester
Jul 1, 2026 · Operations

When One Timeout Triggers a Platform‑Wide Outage

The article explains how unbounded retries, replication fan‑out, and naïve autoscaling can amplify a single timeout into a cascade of failures, and it proposes bounded retry policies, load‑aware scaling, and layered persistence as safeguards for reliable API‑centric systems.

autoscaling · bounded retries · distributed systems
0 likes · 12 min read
ZhiKe AI
Jun 23, 2026 · Backend Development

Duplicate Requests Aren’t Bugs: 5 Idempotency Solutions for Distributed Systems

When network timeouts or retries cause the same payment request to be processed multiple times, duplicate requests become a common failure mode in distributed systems; this article explains five practical idempotency strategies—unique DB indexes, token checks, state machines, Redis SETNX, and downstream dedup tables—and offers guidance on choosing the right approach.

Microservices · Redis · backend
0 likes · 16 min read
Mike Chen's Internet Architecture
Jun 16, 2026 · Operations

How Alibaba’s Two‑Region Three‑Center Design Achieves 99.99% Availability

The article explains Alibaba’s “two‑region three‑center” architecture, detailing how geographically separated primary, backup, and disaster‑recovery data centers work together to provide financial‑grade high availability and protect against single‑site failures or regional catastrophes.

Alibaba · Data Center Architecture · Disaster Recovery
0 likes · 3 min read
ZhiKe AI
Jun 14, 2026 · Fundamentals

Why Consistency Is a Luxury: A Practical Guide to BASE Theory in Distributed Systems

During peak events like Alibaba's Double‑11 and WeChat's red‑packet frenzy, distributed systems must trade strict consistency for availability; this article explains the CAP theorem, introduces the BASE model, compares CP and AP designs, and provides real‑world case studies and selection guidelines.

ACID · BASE theory · CAP theorem
0 likes · 15 min read
ZhiKe AI
Jun 13, 2026 · Fundamentals

Why Banks Let You Wait but Never Miscalculate: A 5‑Minute Guide to the CAP Theorem

Every delay you notice in WeChat messages, flash‑sale pages, or bank transfers stems from the same underlying distributed‑system trade‑off, and this article explains the CAP theorem, its three guarantees, common misconceptions, and how CP versus AP architectures shape real‑world services.

BASE model · CAP theorem · CP vs AP
0 likes · 11 min read
Mike Chen's Internet Architecture
Jun 12, 2026 · Industry Insights

Inside Alibaba’s Same‑City Active‑Active Architecture: A Complete Visual Guide

The article breaks down Alibaba’s same‑city active‑active high‑availability architecture, detailing its four design layers—traffic scheduling, stateless application services, data replication, and operational automation—while illustrating how each component ensures continuous service during data‑center failures.

Active-Active · Alibaba · Data Replication
0 likes · 5 min read
Machine Heart
Jun 11, 2026 · Blockchain

How Agora Uncovered 15 Zero‑Day Deep Bugs in Consensus Protocols with an Industrial‑Grade Multi‑Agent Framework

The paper presents Agora, a hypothesis‑driven multi‑agent system that integrates domain knowledge with large‑model agents to automatically detect deep logic bugs in production‑level consensus protocols, discovering 15 previously unknown vulnerabilities across Raft, EPaxos, HotStuff and BullShark while outperforming GPT‑5.2, Claude 4.5 and other baselines at a fraction of the token cost.

Blockchain Security · Consensus Protocols · Deep Bug Detection
0 likes · 12 min read
Java Architect Handbook
Jun 9, 2026 · Backend Development

What’s the Dubbo Service Call Process? A 10‑Step Deep Dive

The article breaks down a complete Dubbo RPC invocation into ten precise steps—five on the consumer side and five on the provider side—explaining each core component such as Proxy, Filter, Cluster, LoadBalance, Protocol, and Transport, and addresses common interview follow‑up questions about clustering, load balancing, and sync vs async calls.

Dubbo · Java · Microservices
0 likes · 13 min read
ZhiKe AI
Jun 8, 2026 · Backend Development

Microservices Unpacked: Why Independent Deployment Trumps Just Splitting Services

The article clarifies that microservices are defined by business‑capability‑aligned, independently deployable services—not merely smaller services—detailing five core mechanisms, distributed‑system constraints, SOA comparison, and five common misconceptions.

CAP theorem · Domain-Driven Design · Independent Deployment
0 likes · 14 min read
IT Learning Made Simple
May 31, 2026 · Backend Development

What Journey to the West Teaches About Distributed System Architecture

Using the classic tale Journey to the West, the article maps each disciple to a microservice, explains the shift from monolith to microservices, and illustrates service governance, load balancing, service discovery, fault tolerance, and distributed transactions through vivid analogies and concrete examples.

Microservices · Service Governance · distributed systems
0 likes · 7 min read
Alibaba Cloud Native
May 27, 2026 · Artificial Intelligence

Quickly Build Enterprise Self‑Evolving Agents with AgentScope Builder and Harness Framework

This article presents a deep technical walkthrough of AgentScope Builder, showing how the Harness framework enables a single Java agent implementation to run on a personal machine as MinQwenPaw and then scale to a multi‑tenant, distributed enterprise platform with workspace isolation, sandboxing, and pluggable storage backends.

Agent · Cloud Native · Java
0 likes · 23 min read
Xiaohongshu Tech REDtech
May 27, 2026 · Cloud Native

How RedProcess Evolved into DES: Optimizing Xiaohongshu’s Multimedia Task Scheduler

The article details the evolution from the first‑generation RedProcess scheduler to the Distributed Execution Scheduler (DES), explaining how architectural redesigns in storage layering, push‑based dispatch, and systematic disaster‑recovery transformed Xiaohongshu’s video‑cloud task scheduling from merely usable to highly efficient and resilient.

DES · Redis · Task scheduling
0 likes · 15 min read
IT Learning Made Simple
May 25, 2026 · Backend Development

From Straw Hut to Skyscraper: The Evolution of Software Architecture

This article traces the historical evolution of software architecture—from early monolithic programs likened to straw huts, through layered, distributed, and microservice designs, to modern cloud‑native and AI‑driven approaches—explaining why each shift addresses growing complexity and organizational needs.

Microservices · architecture evolution · cloud-native
0 likes · 9 min read
AntTech
May 22, 2026 · Cloud Native

From Computer Use to Datacenter Use: Enabling AI Agents to Drive Data Centers Like Function Calls

The article analyzes how AI agents require datacenter‑scale compute beyond a single virtual machine, explains why existing cloud‑native stacks cannot meet this demand, and details Ant Group's AKernel and openYuanrong solution—including three technical pillars, performance benchmarks, a tiny development team, and a streamlined deployment workflow that turns any developer into a "Build Your Own Cluster" operator.

AI Agents · AKernel · Cloud Native
0 likes · 16 min read
LuTiao Programming
May 21, 2026 · Backend Development

Stop Fighting Microservice Calls—Why Experts Prefer Event‑Driven Architecture for Decoupling Distributed Systems

The article explains how traditional synchronous microservice calls create tight coupling, cascading failures, scaling bottlenecks, and high latency, and demonstrates that adopting an event‑driven architecture with producers, consumers, and a message broker such as Kafka can fully decouple services, improve scalability, and enable patterns like event sourcing and CQRS.

CQRS · Event Sourcing · Event-Driven Architecture
0 likes · 14 min read
Linyb Geek Road
May 15, 2026 · Backend Development

8 Practical API Idempotency Solutions to Eliminate Duplicate Requests (Pitfall Guide)

The article explains the causes of duplicate requests in distributed systems, defines idempotency, and presents eight concrete implementation strategies—including token mechanisms, unique database indexes, optimistic and pessimistic locks, distributed locks, state machines, request serial numbers, and MQ‑based handling—each with code samples, advantages, drawbacks, and usage guidelines.

API design · Spring Boot · database
0 likes · 35 min read
Linyb Geek Road
May 15, 2026 · Backend Development

Idempotency in Practice: Handling the Same Key with Different Parameters

The article explains why simple key‑based idempotency fails when a second request carries different parameters, and demonstrates how to use database row locks, request fingerprinting, state machines, and explicit error handling to guarantee safe, non‑duplicate execution in payment‑critical APIs.

API design · Message Queue · State Machine
0 likes · 13 min read
Linyb Geek Road
May 14, 2026 · Backend Development

How to Build a Reliable 15‑Minute Order Auto‑Cancel in Java: From Naïve @Scheduled to Production‑Ready Redisson

The article walks through the pitfalls of a seemingly simple 15‑minute unpaid‑order cancellation requirement, evaluates five implementation options—from a basic @Scheduled poll to Redis ZSet, DelayQueue, and distributed Redisson solutions—culminating in a production‑grade Redisson scheduler with optimistic‑lock safeguards and detailed best‑practice guidelines.

Java · Order Timeout · Redis
0 likes · 13 min read
dbaplus Community
May 6, 2026 · Backend Development

Why Scheduled Tasks Fail for Million‑Scale Order Cancellation and How Redis Solves It

The article dissects a common interview question about automatically canceling unpaid orders after 30 minutes, explains why naïve cron‑based scans are unsuitable for tens of millions of rows, and presents three progressively robust solutions using Redis expiration, Redis ZSet polling, and message‑queue or time‑wheel architectures.

Delayed Task · Message Queue · Order Cancellation
0 likes · 10 min read
Architect's Guide
May 1, 2026 · Backend Development

Senior Architects Reveal a Comprehensive Learning Roadmap for Aspiring System Designers

The article outlines a step‑by‑step learning system compiled by senior architects, covering skill foundations, source‑code analysis, distributed and microservice architectures, concurrency, performance tuning, essential Java tools, and a hands‑on e‑commerce project to help developers become well‑rounded architects.

Java · Microservices · concurrency
0 likes · 7 min read
Linyb Geek Road
Apr 29, 2026 · Backend Development

How Leading Tech Companies Elegantly Avoid the Delayed Double Delete Pitfall

The article dissects why the delayed double‑delete cache‑consistency pattern breaks under high traffic, illustrates Alibaba’s painful experience, and then details two production‑grade alternatives—lease‑based token control and version‑number comparison—explaining their principles, Redis‑Lua implementations, and trade‑offs.

Cache consistency · Redis Lua · delayed double delete
0 likes · 8 min read
Java Tech Workshop
Apr 28, 2026 · Backend Development

Implementing Dead Letter Queues and Compensation Mechanisms in SpringBoot

This article explains how to use dead‑letter queues (DLQs) to isolate failed messages in distributed SpringBoot applications, compares RabbitMQ and RocketMQ support, and presents a complete compensation framework with design principles, code examples, best‑practice guidelines, and a real‑world case study showing a 96% reduction in dead‑letter traffic.

Compensation · Dead‑Letter Queue · RabbitMQ
0 likes · 23 min read
TonyBai
Apr 26, 2026 · Industry Insights

Martin Kleppmann on the New DDIA: How AI Will Disrupt Distributed Systems

In a deep interview, Martin Kleppmann explains why the upcoming second edition of Designing Data‑Intensive Applications rewrites core assumptions, declares MapReduce dead, predicts AI‑driven formal verification, warns of a talent gap, and champions local‑first software as the next frontier of distributed systems.

AI · Cloud Primitives · DDIA
0 likes · 10 min read
ITPUB
Apr 25, 2026 · Interview Experience

How to Design a Billion‑Scale URL Shortening System for an Interview

This article walks through the complete interview‑style design of a billion‑scale URL shortener, covering requirements, capacity estimation, API definitions, database schema, short‑code generation algorithms, sharding, caching, load balancing, rate limiting, and expiration handling, while illustrating each step with concrete examples and calculations.

API design · Caching · System Design
0 likes · 24 min read
FunTester
Apr 22, 2026 · Operations

Why Do Microservice E2E Tests Fail?

In microservice architectures, end‑to‑end tests often become flaky, slow, and untrustworthy because the assumptions of a stable, deterministic system clash with the reality of distributed, asynchronous services, leading to noisy failures, maintenance overhead, and delayed feedback.

CI/CD · Microservices · Testing Strategy
0 likes · 12 min read
Java Backend Full-Stack
Apr 20, 2026 · Backend Development

What Skills Should a 3‑Year Java Backend Developer Master?

The article outlines a comprehensive skill matrix for a three‑year Java backend engineer, covering core Java and JVM knowledge, mainstream frameworks, storage, messaging, containerization, architecture, engineering practices, soft skills, and emerging trends such as AI integration and reactive programming.

Docker · JVM · Java
0 likes · 9 min read
ITPUB
Apr 17, 2026 · Industry Insights

Why LinkedIn Dumped Kafka for Its Own ‘Northguard’ Streaming Engine

LinkedIn, the original home of Apache Kafka, abandoned the platform for a home‑grown system called Northguard, redesigning log storage, decentralizing metadata, and adding a virtualized Xinfra layer to handle trillions of daily events, while still acknowledging Kafka’s relevance for most companies.

LinkedIn · Northguard · Streaming
0 likes · 7 min read
DataFunSummit
Apr 15, 2026 · Artificial Intelligence

How Relax Powers Scalable Multi‑Modal RL Training with Full Asynchrony

Relax, an open‑source RL training engine built on Megatron‑LM and SGLang, tackles data heterogeneity, system fragility, and role coupling by using a service‑oriented fault‑tolerant architecture, asynchronous pipelines, and multimodal‑native support, achieving up to 76% end‑to‑end speedup over veRL.

AI Infrastructure · Multimodal · RL Training
0 likes · 11 min read
Java Tech Enthusiast
Apr 10, 2026 · Databases

16 Powerful Ways to Leverage Redis in Your Applications

This article presents sixteen practical Redis use cases—from simple caching and distributed sessions to global IDs, rate limiting, bitmaps, shopping carts, timelines, message queues, likes, tags, filtering, follow relationships, and ranking—each illustrated with commands and code snippets for real‑world backend development.

Backend Development · Caching · Data Structures
0 likes · 9 min read
LuTiao Programming
Apr 10, 2026 · Backend Development

Master Payment Gateway Design: Multi‑Channel Aggregation, Smart Routing, and End‑to‑End Merchant Onboarding

The article explains how to build an enterprise‑grade payment gateway that unifies over 50 providers, performs millisecond‑level smart routing, handles failover, dynamic fee calculation, automated merchant onboarding, sharded storage, and comprehensive monitoring to sustain millions of transactions per day.

High concurrency · database sharding · distributed systems
0 likes · 10 min read

How Kafka Powers Scalable E‑commerce Order Processing with Go

This article walks through the challenges of a fast‑growing e‑commerce platform during peak sales, explains why Apache Kafka is the ideal asynchronous messaging backbone, and provides a complete Go implementation—including producers, consumers, best‑practice patterns, and real‑world use cases—to achieve high throughput, fault tolerance, and seamless scalability.

Message Queue · Sarama · distributed systems
0 likes · 14 min read
Ray's Galactic Tech
Mar 31, 2026 · Artificial Intelligence

From Single-Node RAG to Scalable Go AI Services: A Hands‑On Architecture Blueprint

This comprehensive guide walks Go engineers through the evolution from a prototype Retrieval‑Augmented Generation (RAG) service to a production‑grade, distributed AI platform, covering architecture, component boundaries, caching strategies, async indexing, observability, security, and step‑by‑step deployment.

AI Architecture · Backend Development · Go
0 likes · 42 min read
Tech Freedom Circle
Mar 25, 2026 · Backend Development

Cracking Alibaba’s 10M Orders Interview: Architecture Seven‑Suite + Heterogeneous Storage Solution

The article dissects Alibaba’s second‑round interview question on handling 10 million daily order queries, exposing why a single sharding answer fails and presenting a comprehensive architecture‑seven‑suite combined with heterogeneous storage (MySQL, HBase, ClickHouse, ES, Redis, MQ) to achieve high concurrency, low latency, and reliable data consistency.

High concurrency · Microservices · backend-architecture
0 likes · 40 min read
TonyBai
Mar 20, 2026 · Cloud Native

When a Server Silently Crashes, How Long Can Your Cluster Survive? Inside the Heartbeat Failover Mechanism

The article explains how distributed systems detect silently dead nodes using heartbeat mechanisms—both push and pull models—covers trade‑offs between interval and timeout, introduces advanced detectors like Cassandra's Φ, gossip protocols, and quorum rules, and shows real‑world implementations in Kubernetes and etcd.

Cassandra · distributed systems · fault detection
0 likes · 12 min read
dbaplus Community
Mar 17, 2026 · Backend Development

18 Real-World System Case Studies That Reveal 90% of Software Engineering Challenges

This article examines eighteen concrete production systems—from URL shorteners and Amazon S3 to YouTube, Stripe, Slack, and ChatGPT—showing how their design choices illustrate core concepts such as sharding, caching, idempotency, real‑time messaging, and large‑scale engineering, providing a practical roadmap for software engineers.

Case Studies · System Design · architecture
0 likes · 13 min read
Alibaba Cloud Infrastructure
Mar 16, 2026 · Artificial Intelligence

Scaling Agentic Reinforcement Learning with a Decoupled T‑Architecture Using Verl and Argo Workflows

Agentic reinforcement learning is evolving from simple text generation to complex, scalable agents, but large‑scale deployment faces challenges like massive parallel rollout scheduling and reproducible environments; this article presents a decoupled T‑architecture that separates high‑level RL logic (Verl) from execution orchestration (Argo Workflows) to address these issues.

Agentic RL · Argo Workflows · Scalable Reinforcement Learning
0 likes · 10 min read
mikechen
Mar 12, 2026 · Big Data

How Kafka Handles Million‑Message Concurrency: Architecture Deep Dive

This article explains how Kafka’s sequential disk writes, zero‑copy data path, partition‑based parallelism, and configurable broker and partition settings enable linear‑scale throughput that can reach millions of transactions per second in large‑scale streaming systems.

Throughput · Zero‑copy · distributed systems
0 likes · 5 min read
dbaplus Community
Mar 5, 2026 · Backend Development

How to Ensure Message Order in Kafka: From Basics to Advanced Solutions

This article explains the concept of message ordering in distributed systems, details how Kafka stores messages in partitions, compares global and partial ordering, evaluates single‑partition, asynchronous, and multi‑partition solutions—including handling data skew and partition expansion—and provides a practical interview guide.

Message Ordering · backend · distributed systems
0 likes · 22 min read
TonyBai
Feb 23, 2026 · Backend Development

Should Financial Infrastructure Drop Rust for Go? How Pragmatism Wins

The article analyzes a Reddit discussion comparing Go and Rust for high‑performance, low‑latency financial systems, weighing Rust’s safety and performance against Go’s development speed and ecosystem, and concludes that pragmatic Go adoption is the optimal choice for most backend workloads.

Backend Development · Correctness · Financial Infrastructure
0 likes · 12 min read
AI Waka
Feb 22, 2026 · Industry Insights

Why Multi‑Agent AI Fails at Scale and How 12‑Factor Cloud‑Native Principles Save It

The article explains why naïve multi‑agent AI architectures collapse under load due to internal east‑west dependencies, and shows how applying 12‑Factor App and cloud‑native patterns—isolated workers, externalized state, short‑lived sessions, and strict orchestration—enable scalable, fault‑tolerant agentic systems.

12-Factor · Cloud Native · Microservices
0 likes · 17 min read
Coder Trainee
Feb 19, 2026 · Fundamentals

Why Switch from UUID to ULID? Exploring Benefits and Features

The article explains why developers are moving from UUID to ULID, detailing ULID’s 128‑bit compatibility, massive per‑millisecond uniqueness, lexicographic sorting, compact 26‑character Base32 encoding, timestamp integration, strong randomness, and practical use cases such as distributed systems and database sharding.

Base32 · ULID · UUID
0 likes · 3 min read
ITPUB
Feb 11, 2026 · Backend Development

How to Guarantee Zero Message Loss in MQ Systems: A Full‑Lifecycle Design

This guide explains why guaranteeing 100% message reliability in MQ is a critical system‑design interview topic and presents a three‑layer architecture—production, storage, and consumption—detailing ACK settings, local message tables, broker replication, leader election safeguards, manual offset commits, and idempotent processing to prevent any message loss.

Acknowledgment · MQ · Message reliability
0 likes · 11 min read
IT Learning Made Simple
Feb 10, 2026 · Fundamentals

Complete Guide to Acing the System Architecture Designer Exam – From Beginner to Certification

This article offers a comprehensive, step‑by‑step roadmap for candidates aiming to become certified System Architecture Designers in China, covering exam fundamentals, eligibility, detailed syllabus breakdown, study schedules, practical preparation tactics, resource recommendations, and career benefits, helping readers efficiently navigate the entire certification process.

Cloud Native · career development · certification
0 likes · 33 min read
Architecture Digest
Jan 30, 2026 · Backend Development

How Hera Transforms SpringBoot Logging: A Step‑by‑Step Integration Guide

Integrating the Hera log platform into SpringBoot resolves common distributed‑system logging pain points—centralized storage, full‑trace linkages, and cost‑effective retention—by adding a non‑intrusive agent, configuring custom fields, enabling trace IDs, and providing a web console for rapid, multi‑service debugging and analysis.

Hera · Logging · Observability
0 likes · 14 min read
AntTech
Jan 30, 2026 · Databases

Award-Winning Papers Reveal Databases, AI Typography, and Financial Benchmarks

Three award‑winning papers—OceanBase’s unitized database architecture for billion‑scale map services, a video‑diffusion‑based dynamic typography system that animates text semantically, and the FinBench LDBC financial graph benchmark—are examined, highlighting their design, experimental results, and impact on industry applications.

AI · Databases · Graph Benchmark
0 likes · 6 min read
Java Architect Handbook
Jan 28, 2026 · Databases

How to Prevent Redis Split‑Brain Disasters with min‑replicas‑to‑write

This article explains the Redis split‑brain problem that can occur in master‑replica clusters, outlines what interviewers look for in an answer, and details a fix based on the min‑replicas‑to‑write setting (named min‑slaves‑to‑write in older Redis versions), which sacrifices write availability for data consistency, along with best‑practice recommendations and common pitfalls.

Configuration · High Availability · Redis
0 likes · 12 min read
LuTiao Programming
Jan 27, 2026 · Big Data

Why LinkedIn Is Replacing Kafka with Its Own Next‑Gen Streaming System

LinkedIn, facing planetary‑scale data volumes, found Kafka’s architecture hitting fundamental limits and built Northguard—a decentralized, log‑striped streaming platform with Raft‑based metadata and an Xinfra migration layer—to gradually replace Kafka’s core responsibilities while maintaining compatibility.

Data Architecture · LinkedIn · Northguard
0 likes · 8 min read
AI Waka
Jan 26, 2026 · Industry Insights

Why Traditional Software Architecture Fails at Scale and How Message‑Based Design Solves It

The article examines the fifty‑year gap between Alan Kay's biologically‑inspired object model and Roy Fielding's REST constraints, explains why mainstream OOP and microservices fall short, and presents a message‑fabric architecture with bindable components, moderators, and assertion‑driven development that finally delivers scalable, autonomous enterprise systems.

Message-driven · Microservices · assertion-driven development
0 likes · 22 min read
Architect's Guide
Jan 24, 2026 · Fundamentals

Why Our Custom Snowflake ID Generator Failed and How to Fix It

A recent production incident revealed duplicate order IDs caused by a flawed custom Snowflake algorithm; this article reviews the standard Snowflake structure, dissects the custom implementation’s critical mistakes—short timestamp, IP‑based business ID, zeroed worker and data‑center IDs—and offers best‑practice recommendations, including using mature libraries and proper worker‑ID strategies.

ID Generation · Java · Snowflake
0 likes · 7 min read
Architect's Guide
Jan 22, 2026 · Big Data

Unlock Kafka’s Power: Core Concepts, High‑Performance Architecture & Real‑World Scaling Tips

This comprehensive guide explores Kafka’s core value as a message queue, explains producers, consumers, topics, partitions, and replication, dives into cluster architecture, zero‑copy I/O, resource planning for disks, memory, CPU and network, and provides practical configuration, consumer‑group management, and operational tooling tips for building high‑throughput, highly available Kafka deployments.

Message Queue · Performance Tuning · cluster scaling
0 likes · 31 min read
Top Architect
Jan 17, 2026 · Backend Development

Why We Rebuilt a Java Scheduler and How the New Lightweight Framework Works

Faced with limitations of existing tools like Quartz, XXL-Job, and PowerJob, the author explains the motivation for creating a custom scheduling framework, describes its architecture—including gRPC communication, protobuf serialization, a self-implemented name server for load balancing, a simple message queue, and time-wheel scheduling—provides code examples, and shares diagrams of discovery and dispatch processes.

Java · Message Queue · OpenAPI
0 likes · 17 min read
Tech Freedom Circle
Jan 15, 2026 · Backend Development

Kafka Rebalance Storm Crushed 120k QPS in JD Interview – How to Understand and Fix

In a JD senior Java architect interview, a Kafka consumer‑group rebalance storm caused QPS to drop from 120k to zero, triggering massive message loss and latency spikes, and the article walks through the rebalance fundamentals, failure causes, impact analysis, cooperative sticky assignor migration, and comprehensive monitoring and mitigation strategies.

Consumer Group · Monitoring · Rebalance
0 likes · 28 min read
ITFLY8 Architecture Home
Jan 12, 2026 · Databases

Designing Scalable Order Sharding for Millions of Daily Transactions

This article outlines a practical sharding strategy for e‑commerce order systems, estimating future load, detailing user‑centric partitioning, heterogeneous designs for merchants and operators, and migration steps to achieve high‑concurrency writes and massive storage without downtime.

Data Migration · Order Management · distributed systems
0 likes · 4 min read
Tech Freedom Circle
Jan 6, 2026 · Backend Development

Why Choose RocketMQ Over Kafka? The Real Reasons Behind the 90% Mistake

This article dissects a common interview question about Kafka's higher throughput versus RocketMQ's richer features, explains the underlying design philosophies, storage models, I/O paths, scaling limits, real‑world use cases such as transaction, delayed and ordered messages, and provides concrete optimization steps and code samples to help engineers make an informed messaging platform choice.

Java · Message Queue · RocketMQ
0 likes · 42 min read
ITPUB
Jan 3, 2026 · Backend Development

How to Build a Scalable Order Cancellation System: 3 Advanced Delayed‑Task Solutions

This article dissects a common interview question about automatically canceling unpaid orders after 30 minutes, explains why naive cron jobs fail at scale, and presents three robust backend designs—Redis ZSet polling, message‑queue delayed messages, and time‑wheel timers—along with practical code snippets and pitfalls to avoid.

Delayed Tasks · backend-architecture · distributed systems
0 likes · 11 min read
DeWu Technology
Dec 29, 2025 · Backend Development

Unveiling RocketMQ: A Deep Dive into Its Architecture and Performance Secrets

This comprehensive guide explores RocketMQ’s four‑component architecture, storage formats, routing mechanisms, write‑and‑read workflows, high‑availability designs, performance optimizations, and a side‑by‑side comparison with Kafka, providing practical insights for building robust distributed messaging systems.

Message Queue · RocketMQ · distributed systems
0 likes · 28 min read
JavaGuide
Dec 25, 2025 · Interview Experience

How I Secured Offers from Top Tech Companies in 80 Days

The author, a non‑elite undergraduate and a modest 211 master’s graduate, shares a step‑by‑step 80‑day crash‑course that turned zero Java experience into multiple offers from major tech firms, emphasizing fundamental understanding, AI‑assisted learning, and thoughtful project trade‑offs.

AI-assisted learning · Algorithm Preparation · Java
0 likes · 8 min read
Mike Chen's Internet Architecture
Dec 25, 2025 · Backend Development

How to Resolve Kafka Backlog Under High Load: Practical Tips

This article explains why Kafka experiences message backlog in high‑load environments, identifies producer‑consumer speed mismatches, I/O and resource bottlenecks, and offers concrete strategies such as scaling consumers, tuning hardware, and adjusting Kafka configurations to eliminate the backlog.

Backlog · Performance Tuning · distributed systems
0 likes · 4 min read
Code Ape Tech Column
Dec 19, 2025 · Backend Development

Boost SpringBoot Log Management: Step‑by‑Step Integration with Hera

This article explains why traditional SpringBoot logging falls short, introduces the Hera log platform’s three core benefits, outlines a layered integration architecture, and provides a detailed five‑step guide—including Maven dependencies, YAML configuration, custom field providers, log output, traceability, and console usage—plus performance, high‑availability, security tips and common pitfalls.

Hera · Performance Optimization · Tracing
0 likes · 14 min read
Java Architect Handbook
Dec 14, 2025 · Backend Development

Why Our Custom Snowflake ID Failed and How to Build a Reliable One

A recent production incident revealed that a self‑developed Snowflake‑style ID generator caused duplicate order numbers due to a truncated timestamp, unsafe IP‑based business IDs, and unconfigured worker and data‑center IDs, prompting a detailed analysis of the standard algorithm, the flaws in the custom design, and best‑practice recommendations for robust ID generation.

ID Generation · Snowflake · backend
0 likes · 9 min read
Tencent Cloud Middleware
Dec 12, 2025 · Artificial Intelligence

How A2A over MQTT Transforms AI Agent Collaboration

This article explains the challenges of traditional point‑to‑point AI agent communication, introduces the A2A protocol and its limitations, and details how combining A2A with MQTT via Tencent Cloud TDMQ creates a dynamic, loosely‑coupled, and scalable solution with practical SDK examples and real‑world case studies.

A2A protocol · AI Agents · MQTT
0 likes · 16 min read
Mike Chen's Internet Architecture
Dec 9, 2025 · Backend Development

Boost Kafka to Over 1 Million Messages per Second: Metrics and Tuning Tips

This article explains what high concurrency means for Kafka, outlines key performance metrics such as QPS, TPS, throughput and latency, and provides concrete configuration and architectural techniques—including broker optimization, horizontal scaling, network batching, and zero‑copy—to achieve write rates exceeding one million records per second.

High concurrency · Performance Tuning · backend
0 likes · 4 min read
Java Architect Handbook
Dec 9, 2025 · Industry Insights

Why Microservices May Be Overhyped: Tracing Their Real Roots and Myths

The article first lists a series of Java learning projects and community benefits, then critically examines the widely touted advantages of microservices, showing how many of those claims originate from older technologies, debunking common myths, and concluding that microservices are essentially just modular code.

Industry Analysis · Microservices · distributed systems
0 likes · 16 min read
JD Cloud Developers
Dec 8, 2025 · Fundamentals

Why Raft Guarantees Linear Consistency in Unreliable Networks

This article explains how unreliable networks, clock instability, and node failures can cause data inconsistency in distributed clusters, introduces the Raft consensus algorithm, details its roles, election process, log replication, read/write handling, consistency models, and mechanisms to avoid split-brain and livelock.

Consensus · Log Replication · Raft
0 likes · 13 min read
Ctrip Technology
Dec 5, 2025 · Databases

How Ctrip’s DRC Enables High‑Performance Cross‑Region MySQL Replication

This article explains the design and implementation of Ctrip's Data Replication Center (DRC), a MySQL‑based high‑availability system that solves cross‑region data loop, progress tracking, concurrency, DDL handling, and conflict resolution to achieve low‑latency, reliable data replication for global travel services.

Data Replication · GTID · High Availability
0 likes · 21 min read
Code Wrench
Nov 26, 2025 · Backend Development

Unlocking Olric’s High‑Performance Network Protocol and RPC Mechanism

This article dives deep into Olric’s network communication architecture and RPC mechanism, explaining its layered transport design, request/response structures, pipeline and batch processing, client‑to‑cluster interactions, data migration and rebalancing, and provides Go code examples illustrating high‑throughput, safe distributed operations.

Go · Olric · RPC
0 likes · 6 min read
Code Wrench
Nov 24, 2025 · Backend Development

What Makes Olric’s Go Architecture a Masterclass in Distributed KV Design

This article explores Olric, a pure‑Go distributed key‑value engine, detailing its dual embedded/stand‑alone mode, clean three‑layer architecture, core data structures, and engineering choices that illustrate best practices for building high‑performance, maintainable backend systems.

Go · KV store · architecture
0 likes · 10 min read
Xiaokun's Architecture Exploration Notes
Nov 16, 2025 · Backend Development

How to Choose and Implement Architecture Contracts for Distributed Systems

This article explains why architecture‑level contract decisions are needed in distributed systems, compares strict and loose data contracts, illustrates schema‑on‑read/write patterns, and shows how to ensure forward and backward compatibility when evolving protocols such as JSON and Protobuf.

architecture contracts · data modeling · distributed systems
0 likes · 11 min read
Tech Freedom Circle
Nov 16, 2025 · Databases

How Redis Pipeline Can Boost Performance 3‑12× and Impress Interviewers

This article explains Redis Pipeline’s core principle of batching commands to reduce network round‑trips, presents benchmark data showing up to 17‑fold speedups, details real‑world use cases such as cache warm‑up, heartbeat reporting, and high‑traffic events, and provides best‑practice guidelines on batch sizing, error handling, cluster constraints, and comparisons with transactions and Lua scripts.

Batch Processing · Java · Redis
0 likes · 36 min read
Open Source Tech Hub
Nov 13, 2025 · Fundamentals

Why Heartbeat Mechanisms Are Critical for Distributed System Reliability

This article explains how periodic heartbeat messages enable distributed systems to detect node failures, choose appropriate intervals and timeouts, compare push and pull models, employ advanced detection algorithms like phi and gossip, and apply these concepts in real-world platforms such as Kubernetes, Cassandra, and etcd.

Failure Detection · System Monitoring · distributed systems
0 likes · 22 min read
IT Services Circle
Nov 11, 2025 · Backend Development

How to Build a High‑Concurrency, Strong‑Consistency E‑Commerce Order System

An e‑commerce order system acts as the core connector linking users, merchants, payments, logistics and revenue, and this article dissects its three essential flows—forward, reverse and state transitions—while detailing the technical challenges and solutions for order creation, payment, fulfillment, cancellation, after‑sale, architecture, and data handling.

High concurrency · distributed systems · e-commerce
0 likes · 19 min read
NiuNiu MaTe
Nov 5, 2025 · Backend Development

How to Build a High‑Concurrency, Strong‑Consistency E‑Commerce Order System

This article dissects the core processes, functional challenges, and architectural design of a high‑throughput, strongly consistent e‑commerce order system, covering forward and reverse flows, order creation, payment, fulfillment, cancellation, after‑sale handling, and the layered backend architecture that powers it.

High concurrency · Microservices · backend-architecture
0 likes · 21 min read
IT Architects Alliance
Nov 4, 2025 · Backend Development

Mastering Distributed Data Consistency: Strategies, Patterns, and Best Practices

This article explores the challenges of maintaining data consistency in distributed microservice architectures, covering CAP theory, consistency models, replication strategies, transaction patterns like Saga and TCC, tooling choices, monitoring practices, and actionable best‑practice recommendations.

CAP theorem · Data Consistency · Event Sourcing
0 likes · 13 min read
DevOps Coach
Oct 31, 2025 · Backend Development

How Netflix’s Maestro Engine Gained a 100× Speed Boost with a New Actor‑Based Architecture

Netflix’s Maestro workflow orchestrator was redesigned with a lightweight, stateful actor model and Java virtual threads, cutting engine overhead from seconds to milliseconds, delivering a hundred‑fold performance increase while preserving scalability, reliability, and strong execution guarantees for massive data and ML pipelines.

Java virtual threads · Netflix Maestro · Performance Optimization
0 likes · 28 min read
Top Architect
Oct 31, 2025 · Backend Development

Mastering Message Queues: A Deep Dive into RabbitMQ, RocketMQ, and Kafka

This comprehensive guide explains the core components, exchange types, TTL, confirm mechanisms, consumer ACK/NACK, dead‑letter queues, and high‑availability features of RabbitMQ, RocketMQ, and Kafka, while also covering load balancing, ordering, transaction handling, and best practices for reliable message delivery.

Backend Development · Message Queue · RabbitMQ
0 likes · 32 min read
Instant Consumer Technology Team
Oct 29, 2025 · Big Data

Revolutionizing Feature Engineering with Distributed Tech & Configurable Services

Facing PB‑scale user behavior data and millions of feature dimensions, the platform transformed its search, advertising, and recommendation pipelines by adopting a distributed, configurable‑service architecture that delivers high‑throughput streaming, elastic storage, rapid feature iteration, and robust fault‑tolerance for AI‑driven personalization.

Big Data · Data Architecture · Real-time Processing
0 likes · 17 min read
NiuNiu MaTe
Oct 29, 2025 · Backend Development

How to Build a Billion‑User Real‑Time Leaderboard: Architecture, Tools, and Pitfalls

This article walks through the end‑to‑end design of a leaderboard that must serve over 100 million users with 100 k queries per second, covering requirement clarification, real‑time and accuracy challenges, technology selection such as Redis ZSet, multi‑layer architecture, sharding, caching, monitoring, and practical implementation tips to achieve low latency, high consistency, and cost‑effective scalability.

Big Data · Leaderboard · Real-time
0 likes · 19 min read
Architect's Guide
Oct 28, 2025 · Backend Development

How to Prevent API Scraping in High‑Traffic Seckill Systems with Java

During high‑traffic flash‑sale events like Double 11, malicious users can flood seckill APIs, causing service collapse and inventory errors; this article explains the business pain points and presents a multi‑layer anti‑scraping solution—including rate limiting, behavior detection, captchas, request signing, token mechanisms, and asynchronous order processing—with concrete Java implementations.

API Security · Java · Seckill
0 likes · 7 min read
Shepherd Advanced Notes
Oct 24, 2025 · Backend Development

Why Choose Spring Boot + DelayQueue for a Custom Distributed Delayed-Task Queue?

The article systematically analyzes common distributed delayed‑task implementations—Redis ZSet scanning, message‑queue delay features, and Redis key‑expiration listeners—highlighting their pros, cons, and suitable scenarios, then proposes a Spring Boot + DelayQueue component to achieve precise timing, dynamic delays, and robust coordination.

DelayQueue · Delayed Tasks · Redis
0 likes · 11 min read
Huolala Tech
Oct 22, 2025 · Backend Development

Scaling Real‑Time Reconciliation with Dynamic Kafka Consumer Clusters

To ensure fund safety and robust operations, the team built a real‑time reconciliation platform that leverages Kafka, and after encountering scaling bottlenecks with a static consumer model, they implemented a dynamic, partition‑level, weighted load‑balancing consumer cluster that supports automatic scaling and high‑throughput processing.

Dynamic Scaling · Real-time Processing · backend-architecture
0 likes · 15 min read
dbaplus Community
Oct 16, 2025 · Backend Development

How to Build a Billion‑Scale Open Platform: Architecture, Caching, and Resilience

This article presents a step‑by‑step engineering guide for designing, evolving, and operating a high‑traffic open platform, covering three‑layer decoupled architecture, multi‑level caching, asynchronous message queues, distributed transaction models, high‑availability strategies, and phased rollout plans to sustain billions of daily API calls.

Caching · High Availability · High concurrency
0 likes · 20 min read
NiuNiu MaTe
Oct 16, 2025 · Backend Development

Prevent Service Avalanche: Circuit Breaker vs Degradation Strategies Explained

This article explains service avalanche in micro‑service chains, outlines its three failure stages, compares circuit‑breaker and degradation techniques, shows when to apply each, and provides practical guidance on tools like Sentinel and Resilience4j, testing, monitoring, and best‑practice configurations.

Microservices · Sentinel · backend reliability
0 likes · 11 min read