Tagged articles

2122 articles

May 6, 2026 · Backend Development

Why Scheduled Tasks Fail for Million‑Scale Order Cancellation and How Redis Solves It

The article dissects a common interview question about automatically canceling unpaid orders after 30 minutes, explains why naïve cron‑based scans are unsuitable for tens of millions of rows, and presents three progressively robust solutions using Redis expiration, Redis ZSet polling, and message‑queue or time‑wheel architectures.

Delayed Task · Distributed Systems · Message Queue

0 likes · 10 min read

Architect's Guide

May 1, 2026 · Backend Development

Senior Architects Reveal a Comprehensive Learning Roadmap for Aspiring System Designers

The article outlines a step‑by‑step learning system compiled by senior architects, covering skill foundations, source‑code analysis, distributed and microservice architectures, concurrency, performance tuning, essential Java tools, and a hands‑on e‑commerce project to help developers become well‑rounded architects.

Distributed Systems · Java · Microservices

0 likes · 7 min read

ITPUB

Apr 25, 2026 · Interview Experience

How to Design a Billion‑Scale URL Shortening System for an Interview

This article walks through the complete interview‑style design of a billion‑scale URL shortener, covering requirements, capacity estimation, API definitions, database schema, short‑code generation algorithms, sharding, caching, load balancing, rate limiting, and expiration handling, while illustrating each step with concrete examples and calculations.

Distributed Systems · System Design · URL shortener

0 likes · 24 min read

FunTester

Apr 22, 2026 · Operations

Why Do Microservice E2E Tests Fail?

In microservice architectures, end‑to‑end tests often become flaky, slow, and untrustworthy because the assumptions of a stable, deterministic system clash with the reality of distributed, asynchronous services, leading to noisy failures, maintenance overhead, and delayed feedback.

Distributed Systems · Microservices · Testing Strategy

0 likes · 12 min read

Java Backend Full-Stack

Apr 20, 2026 · Backend Development

What Skills Should a 3‑Year Java Backend Developer Master?

The article outlines a comprehensive skill matrix for a three‑year Java backend engineer, covering core Java and JVM knowledge, mainstream frameworks, storage, messaging, containerization, architecture, engineering practices, soft skills, and emerging trends such as AI integration and reactive programming.

Distributed Systems · Docker · JVM

0 likes · 9 min read

ITPUB

Apr 17, 2026 · Industry Insights

Why LinkedIn Dumped Kafka for Its Own ‘Northguard’ Streaming Engine

LinkedIn, the original home of Apache Kafka, abandoned the platform for a home‑grown system called Northguard, redesigning log storage, decentralizing metadata, and adding a virtualized Xinfra layer to handle trillions of daily events, while still acknowledging Kafka’s relevance for most companies.

Distributed Systems · Infrastructure · Kafka

0 likes · 7 min read

DataFunSummit

Apr 15, 2026 · Artificial Intelligence

How Relax Powers Scalable Multi‑Modal RL Training with Full Asynchrony

Relax, an open‑source RL training engine built on Megatron‑LM and SGLang, tackles data heterogeneity, system fragility, and role coupling by using a service‑oriented fault‑tolerant architecture, asynchronous pipelines, and multimodal‑native support, achieving up to 76% end‑to‑end speedup over veRL.

AI Infrastructure · Distributed Systems · RL training

0 likes · 11 min read

Java Tech Enthusiast

Apr 10, 2026 · Databases

16 Powerful Ways to Leverage Redis in Your Applications

This article presents sixteen practical Redis use cases—from simple caching and distributed sessions to global IDs, rate limiting, bitmaps, shopping carts, timelines, message queues, likes, tags, filtering, follow relationships, and ranking—each illustrated with commands and code snippets for real‑world backend development.

Backend Development · Data Structures · Distributed Systems

0 likes · 9 min read

Go Development Architecture Practice

Apr 7, 2026 · Big Data

How Kafka Powers Scalable E‑commerce Order Processing with Go

This article walks through the challenges of a fast‑growing e‑commerce platform during peak sales, explains why Apache Kafka is the ideal asynchronous messaging backbone, and provides a complete Go implementation—including producers, consumers, best‑practice patterns, and real‑world use cases—to achieve high throughput, fault tolerance, and seamless scalability.

Distributed Systems · Message Queue · Sarama

0 likes · 14 min read

Ray's Galactic Tech

Mar 31, 2026 · Artificial Intelligence

From Single-Node RAG to Scalable Go AI Services: A Hands‑On Architecture Blueprint

This comprehensive guide walks Go engineers through the evolution from a prototype Retrieval‑Augmented Generation (RAG) service to a production‑grade, distributed AI platform, covering architecture, component boundaries, caching strategies, async indexing, observability, security, and step‑by‑step deployment.

AI Architecture · Backend Development · Distributed Systems

0 likes · 42 min read

Tech Freedom Circle

Mar 25, 2026 · Backend Development

Cracking Alibaba’s 10M Orders Interview: Architecture Seven‑Suite + Heterogeneous Storage Solution

The article dissects Alibaba’s second‑round interview question on handling 10 million daily order queries, exposing why a single sharding answer fails and presenting a comprehensive architecture‑seven‑suite combined with heterogeneous storage (MySQL, HBase, ClickHouse, ES, Redis, MQ) to achieve high concurrency, low latency, and reliable data consistency.

Backend Architecture · Distributed Systems · Interview Preparation

0 likes · 40 min read

Mike Chen's Internet Architecture

Mar 23, 2026 · Backend Development

Designing a Million‑QPS Multi‑Level Cache Architecture

This article outlines a multi‑level cache system for handling over a million QPS, detailing the architecture from client to database, key components like Caffeine and Redis Cluster, and providing concrete code examples for read‑through and write‑through flows.

Caffeine · Distributed Systems · Redis Cluster

0 likes · 5 min read

dbaplus Community

Mar 17, 2026 · Backend Development

18 Real-World System Case Studies That Reveal 90% of Software Engineering Challenges

This article examines eighteen concrete production systems—from URL shorteners and Amazon S3 to YouTube, Stripe, Slack, and ChatGPT—showing how their design choices illustrate core concepts such as sharding, caching, idempotency, real‑time messaging, and large‑scale engineering, providing a practical roadmap for software engineers.

Case Studies · Distributed Systems · Scalability

0 likes · 13 min read

Alibaba Cloud Infrastructure

Mar 16, 2026 · Artificial Intelligence

Scaling Agentic Reinforcement Learning with a Decoupled T‑Architecture Using Verl and Argo Workflows

Agentic reinforcement learning is evolving from simple text generation to complex, scalable agents, but large‑scale deployment faces challenges like massive parallel rollout scheduling and reproducible environments; this article presents a decoupled T‑architecture that separates high‑level RL logic (Verl) from execution orchestration (Argo Workflows) to address these issues.

Argo Workflows · Distributed Systems · Scalable Reinforcement Learning

0 likes · 10 min read

mikechen

Mar 12, 2026 · Big Data

How Kafka Handles Million‑Message Concurrency: Architecture Deep Dive

This article explains how Kafka’s sequential disk writes, zero‑copy data path, partition‑based parallelism, and configurable broker and partition settings enable linear‑scale throughput that can reach millions of transactions per second in large‑scale streaming systems.

Distributed Systems · Partitioning · Throughput

0 likes · 5 min read

dbaplus Community

Mar 5, 2026 · Backend Development

How to Ensure Message Order in Kafka: From Basics to Advanced Solutions

This article explains the concept of message ordering in distributed systems, details how Kafka stores messages in partitions, compares global and partial ordering, evaluates single‑partition, asynchronous, and multi‑partition solutions—including handling data skew and partition expansion—and provides a practical interview guide.

Backend · Distributed Systems · Kafka

0 likes · 22 min read

Linux Cloud Computing Practice

Mar 5, 2026 · Backend Development

Comprehensive Kafka Study Guide: Core Concepts, Architecture, and Interview Questions

This article compiles essential Kafka fundamentals, architectural details, and a thorough set of interview questions ranging from basic to advanced topics, providing a concise yet complete resource for developers and engineers looking to master this distributed messaging platform.

Backend · Distributed Systems · Kafka

0 likes · 7 min read

Lobster Programming

Mar 2, 2026 · Backend Development

Which Inventory Deduction Strategy Is Best for E‑Commerce? Order, Payment, or Pre‑Deduction Explained

This article examines three e‑commerce inventory deduction methods—order‑time, payment‑time, and pre‑deduction—detailing their mechanisms, advantages, drawbacks, and suitable scenarios to help developers choose the optimal strategy for high‑concurrency sales.

Distributed Systems · deduction · inventory

0 likes · 7 min read

AI Waka

Feb 22, 2026 · Industry Insights

Why Multi‑Agent AI Fails at Scale and How 12‑Factor Cloud‑Native Principles Save It

The article explains why naïve multi‑agent AI architectures collapse under load due to internal east‑west dependencies, and shows how applying 12‑Factor App and cloud‑native patterns—isolated workers, externalized state, short‑lived sessions, and strict orchestration—enable scalable, fault‑tolerant agentic systems.

12-factor · Cloud Native · Distributed Systems

0 likes · 17 min read

Coder Trainee

Feb 19, 2026 · Fundamentals

Why Switch from UUID to ULID? Exploring Benefits and Features

The article explains why developers are moving from UUID to ULID, detailing ULID’s 128‑bit compatibility, massive per‑millisecond uniqueness, lexicographic sorting, compact 26‑character Base32 encoding, timestamp integration, strong randomness, and practical use cases such as distributed systems and database sharding.

Base32 · Distributed Systems · ULID

0 likes · 3 min read

Java Architect Handbook

Feb 12, 2026 · Backend Development

How to Guarantee Zero Message Loss in RocketMQ – Full‑Lifecycle Best Practices

This article breaks down the interview focus points, core answer, deep analysis, code examples, and common pitfalls for ensuring RocketMQ messages never get lost, covering producer, broker, and consumer configurations, trade‑offs, and practical troubleshooting steps.

Backend · Distributed Systems · Interview Preparation

0 likes · 11 min read

Code Wrench

Feb 12, 2026 · Backend Development

Why Bidirectional Streaming in gRPC Is More Than a Pipe – A Deep Dive into grpc-go

This article explores how gRPC bidirectional streaming transforms a simple data pipe into a conversational session by examining the underlying HTTP/2 mechanics, shared state machines, flow‑control strategies, practical patterns, and common pitfalls in grpc-go implementations.

Bidirectional Streaming · Distributed Systems · Flow Control

0 likes · 16 min read

ITPUB

Feb 11, 2026 · Backend Development

How to Guarantee Zero Message Loss in MQ Systems: A Full‑Lifecycle Design

This guide explains why guaranteeing 100% message reliability in MQ is a critical system‑design interview topic and presents a three‑layer architecture—production, storage, and consumption—detailing ACK settings, local message tables, broker replication, leader election safeguards, manual offset commits, and idempotent processing to prevent any message loss.

Acknowledgment · Distributed Systems · Idempotency

0 likes · 11 min read

Architecture Digest

Jan 30, 2026 · Backend Development

How Hera Transforms SpringBoot Logging: A Step‑by‑Step Integration Guide

Integrating the Hera log platform into SpringBoot resolves common distributed‑system logging pain points—centralized storage, full‑trace linkages, and cost‑effective retention—by adding a non‑intrusive agent, configuring custom fields, enabling trace IDs, and providing a web console for rapid, multi‑service debugging and analysis.

Distributed Systems · Hera · Observability

0 likes · 14 min read

AntTech

Jan 30, 2026 · Databases

Award-Winning Papers Reveal Databases, AI Typography, and Financial Benchmarks

Three award‑winning papers—OceanBase’s unitized database architecture for billion‑scale map services, a video‑diffusion‑based dynamic typography system that animates text semantically, and the FinBench LDBC financial graph benchmark—are examined, highlighting their design, experimental results, and impact on industry applications.

AI · Distributed Systems · Graph Benchmark

0 likes · 6 min read

Java Architect Handbook

Jan 28, 2026 · Databases

How to Prevent Redis Split‑Brain Disasters with min‑replicas‑to‑write

This article explains the Redis split‑brain problem that can occur in master‑replica clusters, outlines what interviewers look for in an answer, and provides a detailed solution using the min‑replicas‑to‑write (or min‑slaves‑to‑write) configuration to sacrifice write availability for data consistency, along with best‑practice recommendations and common pitfalls.

Configuration · Distributed Systems · Split-Brain

0 likes · 12 min read

AI Waka

Jan 26, 2026 · Industry Insights

Why Traditional Software Architecture Fails at Scale and How Message‑Based Design Solves It

The article examines the fifty‑year gap between Alan Kay's biologically‑inspired object model and Roy Fielding's REST constraints, explains why mainstream OOP and microservices fall short, and presents a message‑fabric architecture with bindable components, moderators, and assertion‑driven development that finally delivers scalable, autonomous enterprise systems.

Distributed Systems · Microservices · Software Architecture

0 likes · 22 min read

Architect's Guide

Jan 24, 2026 · Fundamentals

Why Our Custom Snowflake ID Generator Failed and How to Fix It

A recent production incident revealed duplicate order IDs caused by a flawed custom Snowflake algorithm; this article reviews the standard Snowflake structure, dissects the custom implementation’s critical mistakes—short timestamp, IP‑based business ID, zeroed worker and data‑center IDs—and offers best‑practice recommendations, including using mature libraries and proper worker‑ID strategies.

Distributed Systems · ID generation · Java

0 likes · 7 min read

Architect's Guide

Jan 22, 2026 · Big Data

Unlock Kafka’s Power: Core Concepts, High‑Performance Architecture & Real‑World Scaling Tips

This comprehensive guide explores Kafka’s core value as a message queue, explains producers, consumers, topics, partitions, and replication, dives into cluster architecture, zero‑copy I/O, resource planning for disks, memory, CPU and network, and provides practical configuration, consumer‑group management, and operational tooling tips for building high‑throughput, highly available Kafka deployments.

Distributed Systems · Kafka · Message Queue

0 likes · 31 min read

Top Architect

Jan 17, 2026 · Backend Development

Why We Rebuilt a Java Scheduler and How the New Lightweight Framework Works

Faced with limitations of existing tools like Quartz, XXL-Job, and PowerJob, the author explains the motivation for creating a custom scheduling framework, describes its architecture—including gRPC communication, protobuf serialization, a self-implemented name server for load balancing, a simple message queue, and time-wheel scheduling—provides code examples, and shares diagrams of discovery and dispatch processes.

Distributed Systems · Java · Message Queue

0 likes · 17 min read

Mingyi World Elasticsearch

Jan 15, 2026 · Big Data

Why Elasticsearch Tokenizers Are on the Soft Exam and How to Master Them

The article breaks down the four Elasticsearch tokenizers tested in the latest Soft Exam, explains their behavior with concrete examples, discusses why search technology is now essential for architects, and predicts future exam trends, offering practical study guidance.

Distributed Systems · Elasticsearch · Exam Preparation

0 likes · 9 min read

Tech Freedom Circle

Jan 15, 2026 · Backend Development

Kafka Rebalance Storm Crushed 120k QPS in JD Interview – How to Understand and Fix

In a JD senior Java architect interview, a Kafka consumer‑group rebalance storm caused QPS to drop from 120k to zero, triggering massive message loss and latency spikes, and the article walks through the rebalance fundamentals, failure causes, impact analysis, cooperative sticky assignor migration, and comprehensive monitoring and mitigation strategies.

Distributed Systems · Kafka · consumer-group

0 likes · 28 min read

ITFLY8 Architecture Home

Jan 12, 2026 · Databases

Designing Scalable Order Sharding for Millions of Daily Transactions

This article outlines a practical sharding strategy for e‑commerce order systems, estimating future load, detailing user‑centric partitioning, heterogeneous designs for merchants and operators, and migration steps to achieve high‑concurrency writes and massive storage without downtime.

Data Migration · Distributed Systems · Order Management

0 likes · 4 min read

dbaplus Community

Jan 7, 2026 · Backend Development

Why Our Custom Snowflake ID Collided and How to Build a Reliable Generator

A recent production incident caused duplicate order IDs due to a flawed custom Snowflake implementation, prompting a deep dive into the standard algorithm, analysis of the mistakes, and a set of best‑practice recommendations for designing robust distributed ID generators.

Design Patterns · Distributed Systems · ID generation

0 likes · 7 min read

Tech Freedom Circle

Jan 6, 2026 · Backend Development

Why Choose RocketMQ Over Kafka? The Real Reasons Behind the 90% Mistake

This article dissects a common interview question about Kafka's higher throughput versus RocketMQ's richer features, explains the underlying design philosophies, storage models, I/O paths, scaling limits, real‑world use cases such as transaction, delayed and ordered messages, and provides concrete optimization steps and code samples to help engineers make an informed messaging platform choice.

Distributed Systems · Java · Kafka

0 likes · 42 min read

ITPUB

Jan 3, 2026 · Backend Development

How to Build a Scalable Order Cancellation System: 3 Advanced Delayed‑Task Solutions

This article dissects a common interview question about automatically canceling unpaid orders after 30 minutes, explains why naive cron jobs fail at scale, and presents three robust backend designs—Redis ZSet polling, message‑queue delayed messages, and time‑wheel timers—along with practical code snippets and pitfalls to avoid.

Backend Architecture · Distributed Systems · Interview Preparation

0 likes · 11 min read

DeWu Technology

Dec 29, 2025 · Backend Development

Unveiling RocketMQ: A Deep Dive into Its Architecture and Performance Secrets

This comprehensive guide explores RocketMQ’s four‑component architecture, storage formats, routing mechanisms, write‑and‑read workflows, high‑availability designs, performance optimizations, and a side‑by‑side comparison with Kafka, providing practical insights for building robust distributed messaging systems.

Distributed Systems · Message Queue · RocketMQ

0 likes · 28 min read

JavaGuide

Dec 25, 2025 · Interview Experience

How I Secured Offers from Top Tech Companies in 80 Days

The author, a non‑elite undergraduate and a modest 211 master’s graduate, shares a step‑by‑step 80‑day crash‑course that turned zero Java experience into multiple offers from major tech firms, emphasizing fundamental understanding, AI‑assisted learning, and thoughtful project trade‑offs.

AI-assisted Learning · Algorithm Preparation · Distributed Systems

0 likes · 8 min read

Architect Chen

Dec 25, 2025 · Information Security

Understanding Single Sign-On (SSO): Architecture, Components, and Workflow

This article explains the fundamentals of Single Sign-On (SSO), detailing its centralized authentication principle, the roles of CAS Server, CAS Client, and browser, and walks through the complete login flow with diagrams and code snippets for distributed systems.

Authentication · CAS · Distributed Systems

0 likes · 4 min read

Mike Chen's Internet Architecture

Dec 25, 2025 · Backend Development

How to Resolve Kafka Backlog Under High Load: Practical Tips

This article explains why Kafka experiences message backlog in high‑load environments, identifies producer‑consumer speed mismatches, I/O and resource bottlenecks, and offers concrete strategies such as scaling consumers, tuning hardware, and adjusting Kafka configurations to eliminate the backlog.

Backlog · Distributed Systems · Kafka

0 likes · 4 min read

Code Ape Tech Column

Dec 19, 2025 · Backend Development

Boost SpringBoot Log Management: Step‑by‑Step Integration with Hera

This article explains why traditional SpringBoot logging falls short, introduces the Hera log platform’s three core benefits, outlines a layered integration architecture, and provides a detailed five‑step guide—including Maven dependencies, YAML configuration, custom field providers, log output, traceability, and console usage—plus performance, high‑availability, security tips and common pitfalls.

Distributed Systems · Hera · Log Management

0 likes · 14 min read

Woodpecker Software Testing

Dec 18, 2025 · Operations

Mastering Distributed Quantum Node Configuration with Goss: The Ultimate Guide

This guide shows how to use the YAML‑based Goss tool to install, configure, and run automated validation, monitoring, and batch testing of distributed quantum nodes, covering templates, output formats, real‑world scenarios, and best‑practice recommendations.

Distributed Systems · Goss · Quantum Internet

0 likes · 5 min read

Java Architect Handbook

Dec 14, 2025 · Backend Development

Why Our Custom Snowflake ID Failed and How to Build a Reliable One

A recent production incident revealed that a self‑developed Snowflake‑style ID generator caused duplicate order numbers due to a truncated timestamp, unsafe IP‑based business IDs, and unconfigured worker and data‑center IDs, prompting a detailed analysis of the standard algorithm, the flaws in the custom design, and best‑practice recommendations for robust ID generation.

Backend · Distributed Systems · ID generation

0 likes · 9 min read

Tencent Cloud Middleware

Dec 12, 2025 · Artificial Intelligence

How A2A over MQTT Transforms AI Agent Collaboration

This article explains the challenges of traditional point‑to‑point AI agent communication, introduces the A2A protocol and its limitations, and details how combining A2A with MQTT via Tencent Cloud TDMQ creates a dynamic, loosely‑coupled, and scalable solution with practical SDK examples and real‑world case studies.

A2A protocol · AI agents · Distributed Systems

0 likes · 16 min read

Mike Chen's Internet Architecture

Dec 9, 2025 · Backend Development

Boost Kafka to Over 1 Million Messages per Second: Metrics and Tuning Tips

This article explains what high concurrency means for Kafka, outlines key performance metrics such as QPS, TPS, throughput and latency, and provides concrete configuration and architectural techniques—including broker optimization, horizontal scaling, network batching, and zero‑copy—to achieve write rates exceeding one million records per second.

Backend · Distributed Systems · Kafka

0 likes · 4 min read

Java Architect Handbook

Dec 9, 2025 · Industry Insights

Why Microservices May Be Overhyped: Tracing Their Real Roots and Myths

The article first lists a series of Java learning projects and community benefits, then critically examines the widely touted advantages of microservices, showing how many of those claims originate from older technologies, debunking common myths, and concluding that microservices are essentially just modular code.

Distributed Systems · Industry analysis · Microservices

0 likes · 16 min read

JD Cloud Developers

Dec 8, 2025 · Fundamentals

Why Raft Guarantees Linear Consistency in Unreliable Networks

This article explains how unreliable networks, clock instability, and node failures can cause data inconsistency in distributed clusters, introduces the Raft consensus algorithm, details its roles, election process, log replication, read/write handling, consistency models, and mechanisms to avoid split-brain and livelock.

Consensus · Consistency · Distributed Systems

0 likes · 13 min read

Ctrip Technology

Dec 5, 2025 · Databases

How Ctrip’s DRC Enables High‑Performance Cross‑Region MySQL Replication

This article explains the design and implementation of Ctrip's Data Replication Center (DRC), a MySQL‑based high‑availability system that solves cross‑region data loop, progress tracking, concurrency, DDL handling, and conflict resolution to achieve low‑latency, reliable data replication for global travel services.

Distributed Systems · GTID · cross-region

0 likes · 21 min read

Architect Chen

Dec 3, 2025 · Big Data

Mastering Kafka High Concurrency: Practical Configurations for Million‑TPS Throughput

This guide explains what constitutes high concurrency for Kafka, presents throughput benchmarks, and provides detailed broker‑level configuration tips—including partition planning, producer batching, storage optimization, and zero‑copy settings—to achieve scalable, low‑latency message processing.

Distributed Systems · Kafka · Throughput

0 likes · 4 min read

Code Wrench

Nov 26, 2025 · Backend Development

Unlocking Olric’s High‑Performance Network Protocol and RPC Mechanism

This article dives deep into Olric’s network communication architecture and RPC mechanism, explaining its layered transport design, request/response structures, pipeline and batch processing, client‑to‑cluster interactions, data migration and rebalancing, and provides Go code examples illustrating high‑throughput, safe distributed operations.

Distributed Systems · Go · Olric

0 likes · 6 min read

Code Wrench

Nov 24, 2025 · Backend Development

What Makes Olric’s Go Architecture a Masterclass in Distributed KV Design

This article explores Olric, a pure‑Go distributed key‑value engine, detailing its dual embedded/stand‑alone mode, clean three‑layer architecture, core data structures, and engineering choices that illustrate best practices for building high‑performance, maintainable backend systems.

Distributed Systems · Go · KV Store

0 likes · 10 min read

JD Retail Technology

Nov 21, 2025 · Databases

Why JED’s Lock Mechanism Caused Data Loss and How Distributed Locks Can Fix It

An in‑depth post‑mortem of a JED database incident reveals how its lock matrix and MVCC isolation caused metric data loss, explains the underlying lock granularity, transaction isolation levels, and MVCC visibility rules, and proposes short‑term distributed‑lock and long‑term read‑calc‑write solutions.

Distributed Systems · MVCC · databases

0 likes · 21 min read

Architect's Guide

Nov 21, 2025 · Backend Development

Mastering Apollo: A Deep Dive into Ctrip’s Open‑Source Distributed Configuration Center

This article walks through the concepts, architecture, and hands‑on steps for using Apollo, Ctrip’s open‑source distributed configuration center, covering project setup, Spring Boot integration, dynamic updates, clustering, namespaces, high‑availability design, and Kubernetes deployment.

Apollo · Configuration Management · Distributed Systems

0 likes · 25 min read

Xiaokun's Architecture Exploration Notes

Nov 16, 2025 · Backend Development

How to Choose and Implement Architecture Contracts for Distributed Systems

This article explains why architecture‑level contract decisions are needed in distributed systems, compares strict and loose data contracts, illustrates schema‑on‑read/write patterns, and shows how to ensure forward and backward compatibility when evolving protocols such as JSON and Protobuf.

Distributed Systems · Protobuf · architecture contracts

0 likes · 11 min read

Tech Freedom Circle

Nov 16, 2025 · Databases

How Redis Pipeline Can Boost Performance 3‑12× and Impress Interviewers

This article explains Redis Pipeline’s core principle of batching commands to reduce network round‑trips, presents benchmark data showing up to 17‑fold speedups, details real‑world use cases such as cache warm‑up, heartbeat reporting, and high‑traffic events, and provides best‑practice guidelines on batch sizing, error handling, cluster constraints, and comparisons with transactions and Lua scripts.

Batch Processing · Benchmark · Distributed Systems

0 likes · 36 min read

Open Source Tech Hub

Nov 13, 2025 · Fundamentals

Why Heartbeat Mechanisms Are Critical for Distributed System Reliability

This article explains how periodic heartbeat messages enable distributed systems to detect node failures, choose appropriate intervals and timeouts, compare push and pull models, employ advanced detection techniques such as the phi accrual failure detector and gossip protocols, and apply these concepts in real-world platforms such as Kubernetes, Cassandra, and etcd.

Distributed Systems · Failure Detection · Gossip Protocol

0 likes · 22 min read

IT Services Circle

Nov 11, 2025 · Backend Development

How to Build a High‑Concurrency, Strong‑Consistency E‑Commerce Order System

An e‑commerce order system acts as the core connector linking users, merchants, payments, logistics and revenue, and this article dissects its three essential flows—forward, reverse and state transitions—while detailing the technical challenges and solutions for order creation, payment, fulfillment, cancellation, after‑sale, architecture, and data handling.

Distributed Systems · e‑commerce · high concurrency

0 likes · 19 min read

Mike Chen's Internet Architecture

Nov 7, 2025 · Backend Development

Understanding Cache Avalanche: Causes and Effective Mitigation Strategies

This article explains what a cache avalanche is, why it occurs in distributed systems, and presents practical mitigation techniques such as randomized expiration, proactive pre‑warming, load protection, and multi‑level caching to prevent system crashes.

Cache · Distributed Systems · performance

0 likes · 4 min read

NiuNiu MaTe

Nov 5, 2025 · Backend Development

How to Build a High‑Concurrency, Strong‑Consistency E‑Commerce Order System

This article dissects the core processes, functional challenges, and architectural design of a high‑throughput, strongly consistent e‑commerce order system, covering forward and reverse flows, order creation, payment, fulfillment, cancellation, after‑sale handling, and the layered backend architecture that powers it.

Backend Architecture · Distributed Systems · Microservices

0 likes · 21 min read

IT Architects Alliance

Nov 4, 2025 · Backend Development

Mastering Distributed Data Consistency: Strategies, Patterns, and Best Practices

This article explores the challenges of maintaining data consistency in distributed microservice architectures, covering CAP theory, consistency models, replication strategies, transaction patterns like Saga and TCC, tooling choices, monitoring practices, and actionable best‑practice recommendations.

CAP theorem · Data Consistency · Distributed Systems

0 likes · 13 min read

DevOps Coach

Oct 31, 2025 · Backend Development

How Netflix’s Maestro Engine Gained a 100× Speed Boost with a New Actor‑Based Architecture

Netflix’s Maestro workflow orchestrator was redesigned with a lightweight, stateful actor model and Java virtual threads, cutting engine overhead from seconds to milliseconds, delivering a hundred‑fold performance increase while preserving scalability, reliability, and strong execution guarantees for massive data and ML pipelines.

Distributed Systems · Java virtual threads · Netflix Maestro

0 likes · 28 min read

Top Architect

Oct 31, 2025 · Backend Development

Mastering Message Queues: A Deep Dive into RabbitMQ, RocketMQ, and Kafka

This comprehensive guide explains the core components, exchange types, TTL, confirm mechanisms, consumer ACK/NACK, dead‑letter queues, and high‑availability features of RabbitMQ, RocketMQ, and Kafka, while also covering load balancing, ordering, transaction handling, and best practices for reliable message delivery.

Backend Development · Distributed Systems · Kafka

0 likes · 32 min read

Instant Consumer Technology Team

Oct 29, 2025 · Big Data

Revolutionizing Feature Engineering with Distributed Tech & Configurable Services

Facing PB‑scale user behavior data and millions of feature dimensions, the platform transformed its search, advertising, and recommendation pipelines by adopting a distributed, configurable‑service architecture that delivers high‑throughput streaming, elastic storage, rapid feature iteration, and robust fault‑tolerance for AI‑driven personalization.

Big Data · Data Architecture · Distributed Systems

0 likes · 17 min read

NiuNiu MaTe

Oct 29, 2025 · Backend Development

How to Build a Billion‑User Real‑Time Leaderboard: Architecture, Tools, and Pitfalls

This article walks through the end‑to‑end design of a leaderboard that must serve over 100 million users with 100 k queries per second, covering requirement clarification, real‑time and accuracy challenges, technology selection such as Redis ZSet, multi‑layer architecture, sharding, caching, monitoring, and practical implementation tips to achieve low latency, high consistency, and cost‑effective scalability.

Big Data · Distributed Systems · Real-Time

0 likes · 19 min read

Radish, Keep Going!

Oct 28, 2025 · Big Data

How Netflix Achieved Petabyte-Scale, Sub-Second Log Queries with ClickHouse

Netflix processes over 5 PB of logs daily, handling millions of events per second, and by layering hot and cold storage, using a custom lexer for fingerprinting, native protocol serialization, and sharded tag maps, they reduced query latency from seconds to sub‑second levels with ClickHouse.

Big Data · ClickHouse · Distributed Systems

0 likes · 8 min read

Architect's Guide

Oct 28, 2025 · Backend Development

How to Prevent API Scraping in High‑Traffic Seckill Systems with Java

During high‑traffic flash‑sale events like Double 11, malicious users can flood seckill APIs, causing service collapse and inventory errors; this article explains the business pain points and presents a multi‑layer anti‑scraping solution—including rate limiting, behavior detection, captchas, request signing, token mechanisms, and asynchronous order processing—with concrete Java implementations.

API Security · Captcha · Distributed Systems

0 likes · 7 min read

Ray's Galactic Tech

Oct 22, 2025 · Backend Development

Why Is Kafka So Fast? Deep Dive into Its Core Design and Performance Philosophy

Kafka achieves its remarkable speed through a combination of sequential disk I/O, zero‑copy networking, OS page‑cache usage, efficient batching, compression, partitioned parallelism, and a minimalist log format, each design choice synergistically boosting throughput while keeping latency low.

Design · Distributed Systems · Kafka

0 likes · 7 min read

Huolala Tech

Oct 22, 2025 · Backend Development

Scaling Real‑Time Reconciliation with Dynamic Kafka Consumer Clusters

To ensure fund safety and robust operations, the team built a real‑time reconciliation platform that leverages Kafka, and after encountering scaling bottlenecks with a static consumer model, they implemented a dynamic, partition‑level, weighted load‑balancing consumer cluster that supports automatic scaling and high‑throughput processing.

Backend Architecture · Distributed Systems · Dynamic Scaling

0 likes · 15 min read

dbaplus Community

Oct 16, 2025 · Backend Development

How to Build a Billion‑Scale Open Platform: Architecture, Caching, and Resilience

This article presents a step‑by‑step engineering guide for designing, evolving, and operating a high‑traffic open platform, covering three‑layer decoupled architecture, multi‑level caching, asynchronous message queues, distributed transaction models, high‑availability strategies, and phased rollout plans to sustain billions of daily API calls.

Distributed Systems · Open Platform · caching

0 likes · 20 min read

NiuNiu MaTe

Oct 16, 2025 · Backend Development

Prevent Service Avalanche: Circuit Breaker vs Degradation Strategies Explained

This article explains service avalanche in micro‑service chains, outlines its three failure stages, compares circuit‑breaker and degradation techniques, shows when to apply each, and provides practical guidance on tools like Sentinel and Resilience4j, testing, monitoring, and best‑practice configurations.

Distributed Systems · Microservices · backend reliability

0 likes · 11 min read

BirdNest Tech Talk

Oct 12, 2025 · Artificial Intelligence

What Happens When a Token Travels Through GPU Villages via RDMA and NVLink?

The article uses a whimsical journey to illustrate how token data is dispatched across GPU clusters—detailing functions like get_dispatch_layout, notify_dispatch, and combine_token, showing RDMA and NVLink pathways, performance experiments, and the final verification of token integrity.

AI · Distributed Systems · GPU

0 likes · 5 min read

IT Architects Alliance

Oct 10, 2025 · Information Security

How to Secure Distributed Permissions: Zero Trust Strategies & Code

This article examines the exponential growth of permission complexity in micro‑service architectures, outlines zero‑trust design principles, and provides concrete Java and YAML implementations for fine‑grained, context‑aware access control, caching, dynamic evaluation, and audit monitoring.

Distributed Systems · Security · Zero Trust

0 likes · 11 min read

DataFunSummit

Oct 5, 2025 · Artificial Intelligence

How Bilibili Uses LLM‑Powered Assistants to Tackle Big‑Data Task Failures

Bilibili’s massive video platform relies on a five‑layer, storage‑compute separated big‑data architecture, handling hundreds of thousands of daily tasks, and now leverages large‑language‑model assistants to automatically diagnose and resolve frequent task failures and performance slowdowns.

AI assistance · Bilibili · Distributed Systems

0 likes · 4 min read

ITPUB

Oct 5, 2025 · Backend Development

How to Clear a 10‑Million‑Message Queue in 5 Hours: A Five‑Step Rescue Plan

When a flash‑sale causes a 10 million‑message backlog and consumers only process 200 messages per second, this guide shows a five‑step, 5‑hour strategy—horizontal scaling, message downgrade, flow control, temporary dump, and parallel blasting—to restore throughput and prevent system collapse.

Distributed Systems · Kafka · Performance Optimization

0 likes · 6 min read

Data Party THU

Sep 30, 2025 · Backend Development

Ray Serve vs Celery: Which Is Best for GPU‑Intensive Parallel Workloads?

This article compares Ray Serve and Celery, explaining their design philosophies, scaling models, GPU‑aware scheduling, operational trade‑offs, and real‑world case studies to help engineers choose the right tool for high‑throughput online inference or large‑scale batch processing.

Distributed Systems · GPU · Model Serving

0 likes · 9 min read

Xiaokun's Architecture Exploration Notes

Sep 27, 2025 · Databases

How Version Vectors Resolve Conflicts in Multi‑Leader and Leaderless Replication

This article explains why version vectors are needed in multi‑leader and leaderless replication, describes their implementation and comparison rules, and presents practical conflict‑resolution strategies—including custom resolvers, last‑write‑wins, read‑repair, and request rejection—supported by Java pseudocode and diagrams.

Distributed Systems · Multi-Leader · Replication

0 likes · 16 min read

Ray's Galactic Tech

Sep 26, 2025 · Backend Development

How to Seamlessly Integrate Dubbo with Spring Boot for Scalable Distributed Services

This guide walks you through the concepts, environment setup, step‑by‑step integration, advanced configurations, monitoring, troubleshooting, and best practices for combining Dubbo's high‑performance RPC framework with Spring Boot to build production‑grade distributed Java services.

Distributed Systems · Dubbo · Java

0 likes · 6 min read

Tech Freedom Circle

Sep 25, 2025 · Operations

RAGFlow Link Tracing: GPS‑Style Observability for LLM‑Powered Applications

The article explains why RAGFlow needs end‑to‑end link tracing, introduces OpenTelemetry’s core concepts, shows how custom tracing utilities are implemented in Python, describes the layered architecture, provides concrete Docker and YAML configurations, and offers best‑practice guidelines for performance monitoring and fault diagnosis.

Distributed Systems · LLM · Observability

0 likes · 24 min read

Tech Freedom Circle

Sep 24, 2025 · Backend Development

Designing a US Presidential Election Voting System: 1M TPS, 10M QPS, Immutable and Non‑Duplicate Votes

This article presents a comprehensive architectural design for a high‑throughput US presidential voting platform that must handle 1 million transactions per second and 10 million queries per second while guaranteeing vote immutability, one‑person‑one‑vote enforcement, real‑time result aggregation, and scalable storage using microservices, Kafka, Redis, Bloom filters, and blockchain anchoring.

Blockchain · Distributed Systems · Idempotency

0 likes · 32 min read

Architecture Digest

Sep 23, 2025 · Backend Development

How to Ensure Zero Message Loss in Kafka: Proven Strategies for High‑Reliability Systems

This article explains Kafka's storage architecture, identifies three major message‑loss scenarios across production, storage, and consumption, and provides practical end‑to‑end configurations, detection methods, and business‑level patterns to achieve near‑zero message loss in high‑concurrency distributed systems.

Data Consistency · Distributed Systems · Kafka

0 likes · 13 min read

Baidu Intelligent Cloud Tech Hub

Sep 22, 2025 · Cloud Computing

How Mantle Breaks the Hierarchical Namespace Bottleneck in Cloud Object Storage

The Mantle system, presented in a SOSP'25 paper by Baidu's storage team and collaborators, delivers a distributed hierarchical namespace for cloud object storage that overcomes traditional scalability and performance limits, enabling massive data lake workloads with dramatically reduced latency and vastly increased throughput.

Distributed Systems · SOSP · cloud storage

0 likes · 8 min read

Architecture Digest

Sep 19, 2025 · Backend Development

Mastering Message Idempotency: From Simple Checks to State‑Machine Solutions

This article explores the challenges of duplicate message consumption in distributed systems, explains why naive de‑duplication fails under high concurrency, and presents four progressively robust idempotency strategies—from database pessimistic locks and local message tables to a state‑machine approach with Redis or MySQL, highlighting their trade‑offs.

Backend Development · Distributed Systems · Idempotency

0 likes · 11 min read

Su San Talks Tech

Sep 18, 2025 · Backend Development

Designing a Million‑QPS Rate Limiter for Backend System Interviews

This article walks through a complete, interview‑ready design of a high‑performance rate‑limiting system that can handle up to one million queries per second, covering requirements, core entities, algorithm choices, distributed state storage with Redis, scalability, high availability, latency optimization, hot‑key mitigation, and dynamic rule configuration.

Backend Architecture · Distributed Systems · System Design

0 likes · 29 min read

FunTester

Sep 16, 2025 · Fundamentals

Why Going Stateless Beats Indexing: The Surprising Power of Grep in AI Coding Assistants

The article explains how Claude Code’s decision to use real‑time grep instead of code indexing reflects a 50‑year‑old Unix philosophy, showing that stateless design improves composability, scalability, predictability, and privacy across AI assistants, serverless platforms, and distributed systems.

AI assistants · Distributed Systems · Serverless

0 likes · 19 min read

Su San Talks Tech

Sep 16, 2025 · Backend Development

Mastering Message Order in Distributed Queues: From Basics to Advanced Strategies

This article explores the fundamentals of message ordering in distributed message queues, explains why ordering is determined by broker arrival, compares global and partial ordering, and presents practical solutions—from single-partition designs to multi-partition hashing, handling data skew, and safe expansion—plus interview tips.

Distributed Systems · Kafka · Partitioning

0 likes · 24 min read

Architect's Journey

Sep 15, 2025 · Backend Development

Token Bucket vs Leaky Bucket: Deep Dive into Core Traffic‑Control Algorithms

This article compares the token‑bucket and leaky‑bucket rate‑limiting algorithms, explaining their core principles, Java implementation details, key advantages and drawbacks, suitable application scenarios, interview‑style Q&A, and advanced hybrid strategies for building robust high‑concurrency systems.

Distributed Systems · Java · Token Bucket

0 likes · 9 min read

Xiaokun's Architecture Exploration Notes

Sep 14, 2025 · Fundamentals

How Lamport Clocks Enable Causal Ordering in Distributed Systems

Lamport Clocks provide a lightweight logical timestamp mechanism that captures the 'happens‑before' relationship between events, enabling causal ordering across distributed replicas. The article covers their use in versioned keys, MVCC storage, and partial ordering, along with their practical applications and inherent limitations in real‑world systems.

Distributed Systems · Lamport Clock · MVCC

0 likes · 16 min read

DataFunTalk

Sep 10, 2025 · Artificial Intelligence

How Ant Group’s Ray‑Powered Ragent Redefines LLM‑Based AI Agents

The article presents Ant Group’s Ray‑based Ragent framework, detailing its background, motivation behind unified AI serving, and the four core modules—Profile, Memory, Planning, and Action—that together enable large‑language‑model agents for financial applications.

AI Framework · Ant Group · Distributed Systems

0 likes · 4 min read

DataFunSummit

Sep 9, 2025 · Artificial Intelligence

How Ant Group’s Ragent Redefines Distributed LLM Agents with Ray

This article introduces Ant Group’s Ragent, a Ray‑based distributed AI agent framework, covering its background, motivation in the large‑model era, and a four‑module design (Profile, Memory, Planning, Action) that enables scalable LLM‑driven agents.

AI Framework · Ant Group · Distributed Systems

0 likes · 4 min read

DataFunSummit

Sep 8, 2025 · Artificial Intelligence

How Ant Group’s Ragent Redefines LLM‑Based AI Agents on Ray

This article introduces Ant Group’s new Ray‑based distributed agent framework Ragent, outlines its background and motivation, and details the four core modules—Profile, Memory, Planning, and Action—that together enable sophisticated LLM‑driven AI agents for large‑scale applications.

AI agents · Ant Group · Distributed Systems

0 likes · 4 min read

Architecture & Thinking

Sep 8, 2025 · Backend Development

Mastering RocketMQ: 7 Core Techniques for Reliable Messaging

This article walks through seven essential RocketMQ concepts—including message ordering, delayed delivery, accumulation handling, transactional guarantees, retry mechanisms, storage strategies, and filtering—providing code examples, configuration tips, and visual diagrams to help developers build robust distributed messaging systems.

Distributed Systems · Java · Message Queue

0 likes · 13 min read

DataFunSummit

Sep 7, 2025 · Artificial Intelligence

Inside Ant Group’s Ragent: Building Scalable AI Agents on Ray

This article introduces Ant Group’s Ragent, a Ray‑based distributed AI‑agent framework, covering its background, motivation, design and implementation, and detailing the four core modules—Profile, Memory, Planning, and Action—that enable large‑language‑model agents at massive scale.

AI agents · Ant Group · Distributed Systems

0 likes · 4 min read

IT Services Circle

Sep 6, 2025 · Backend Development

10 Real‑World Scenarios Where Message Queues Transform Your System

This article explores ten practical use‑cases for message queues—covering system decoupling, asynchronous processing, traffic shaping, data synchronization, log collection, broadcast updates, ordered and delayed messages, retry mechanisms, and transactional messaging—illustrated with Java code examples and architectural diagrams.

Backend Development · Distributed Systems · Java

0 likes · 17 min read

DataFunTalk

Sep 5, 2025 · Artificial Intelligence

Inside Ant Group’s Ragent: Building Scalable AI Agents on Ray

This article introduces Ant Group’s Ray‑based distributed agent framework Ragent, outlines its background, motivation, and design, and details the four essential modules—Profile, Memory, Planning, and Action—that power large‑language‑model agents in large‑scale AI serving.

AI agents · Ant Group · Distributed Systems

0 likes · 5 min read

NiuNiu MaTe

Sep 4, 2025 · Operations

Mastering Multi‑Active Distributed Systems: From Single Server to Global Fault Tolerance

This article walks developers through the evolution of distributed system architectures—from single‑machine deployments to master‑slave, same‑city active‑active, and finally true multi‑active setups—explaining core concepts, replication strategies, conflict resolution, fault detection, switch mechanisms, recovery methods, and interview tips for high‑availability design.

CAP theorem · Distributed Systems · Interview Preparation

0 likes · 26 min read

JD Tech Talk

Sep 4, 2025 · Operations

Avoid Common High‑Availability Pitfalls: Real‑World JD Practices and Solutions

This article analyzes the multi‑dimensional challenges of building high‑availability systems—covering applications, databases, caches, message queues, containers, GC, and more—by sharing real JD engineering scenarios, common failure patterns, and concrete mitigation strategies to help engineers design more resilient services.

Backend · Distributed Systems · fault tolerance

0 likes · 36 min read

JD Retail Technology

Sep 4, 2025 · Operations

Mastering High Availability: Real-World Pitfalls and Solutions from JD's Production Systems

This article walks through the challenges of building high‑availability systems—covering applications, databases, caches, message queues, containers, GC, and more—using JD’s production experiences to highlight common pitfalls, root‑cause analyses, and practical mitigation strategies for engineers seeking resilient architecture.

Cache · Distributed Systems · JDK

0 likes · 37 min read

DataFunSummit

Sep 2, 2025 · Artificial Intelligence

How Ant Group’s Ray‑Powered Ragent Redefines LLM‑Based AI Agents

This article introduces Ant Group’s Ray‑based distributed agent framework Ragent, outlines its background, motivation, and design, and breaks down the four essential modules—Profile, Memory, Planning, and Action—that enable large‑language‑model agents to operate in real‑world scenarios.

Ant Group · Distributed Systems · LLM

0 likes · 5 min read

IT Services Circle

Aug 29, 2025 · Backend Development

Why Smooth Weighted Round Robin Works: The Math Behind Balanced Load Distribution

This article explains the smooth weighted round robin algorithm, contrasts it with the non‑smooth version, walks through step‑by‑step examples for a 5:1:1 server weight scenario, and provides mathematical proofs of both weight correctness and smoothness, including references to the original source.

Distributed Systems · algorithm · load balancing

0 likes · 15 min read

Xiaolei Talks DB

Aug 28, 2025 · Databases

How AI Is Transforming Databases: Highlights from China’s DTCC2025

At DTCC2025 in Beijing, industry leaders showcased AI-driven innovations, vector database advances, RAG techniques, and distributed database performance breakthroughs, illustrating how databases are evolving from passive data stores into intelligent, autonomous systems that boost efficiency, scalability, and business value across sectors.

AI · Distributed Systems · RAG

0 likes · 10 min read
