Tagged articles
2122 articles
Page 1 of 22
dbaplus Community
May 6, 2026 · Backend Development

Why Scheduled Tasks Fail for Million‑Scale Order Cancellation and How Redis Solves It

The article dissects a common interview question about automatically canceling unpaid orders after 30 minutes, explains why naïve cron‑based scans are unsuitable for tens of millions of rows, and presents three progressively robust solutions using Redis expiration, Redis ZSet polling, and message‑queue or time‑wheel architectures.

Delayed Task · Distributed Systems · Message Queue
0 likes · 10 min read
Architect's Guide
May 1, 2026 · Backend Development

Senior Architects Reveal a Comprehensive Learning Roadmap for Aspiring System Designers

The article outlines a step‑by‑step learning system compiled by senior architects, covering skill foundations, source‑code analysis, distributed and microservice architectures, concurrency, performance tuning, essential Java tools, and a hands‑on e‑commerce project to help developers become well‑rounded architects.

Distributed Systems · Java · Microservices
0 likes · 7 min read
ITPUB
Apr 25, 2026 · Interview Experience

How to Design a Billion‑Scale URL Shortening System for an Interview

This article walks through the complete interview‑style design of a billion‑scale URL shortener, covering requirements, capacity estimation, API definitions, database schema, short‑code generation algorithms, sharding, caching, load balancing, rate limiting, and expiration handling, while illustrating each step with concrete examples and calculations.

Distributed Systems · System Design · URL shortener
0 likes · 24 min read
FunTester
Apr 22, 2026 · Operations

Why Do Microservice E2E Tests Fail?

In microservice architectures, end‑to‑end tests often become flaky, slow, and untrustworthy because the assumptions of a stable, deterministic system clash with the reality of distributed, asynchronous services, leading to noisy failures, maintenance overhead, and delayed feedback.

Distributed Systems · Microservices · Testing Strategy
0 likes · 12 min read
Java Backend Full-Stack
Apr 20, 2026 · Backend Development

What Skills Should a 3‑Year Java Backend Developer Master?

The article outlines a comprehensive skill matrix for a three‑year Java backend engineer, covering core Java and JVM knowledge, mainstream frameworks, storage, messaging, containerization, architecture, engineering practices, soft skills, and emerging trends such as AI integration and reactive programming.

Distributed Systems · Docker · JVM
0 likes · 9 min read
ITPUB
Apr 17, 2026 · Industry Insights

Why LinkedIn Dumped Kafka for Its Own ‘Northguard’ Streaming Engine

LinkedIn, the original home of Apache Kafka, abandoned the platform for a home‑grown system called Northguard, redesigning log storage, decentralizing metadata, and adding a virtualized Xinfra layer to handle trillions of daily events, while still acknowledging Kafka’s relevance for most companies.

Distributed Systems · Infrastructure · Kafka
0 likes · 7 min read
DataFunSummit
Apr 15, 2026 · Artificial Intelligence

How Relax Powers Scalable Multi‑Modal RL Training with Full Asynchrony

Relax, an open‑source RL training engine built on Megatron‑LM and SGLang, tackles data heterogeneity, system fragility, and role coupling by using a service‑oriented fault‑tolerant architecture, asynchronous pipelines, and multimodal‑native support, achieving up to 76% end‑to‑end speedup over veRL.

AI Infrastructure · Distributed Systems · RL training
0 likes · 11 min read
Java Tech Enthusiast
Apr 10, 2026 · Databases

16 Powerful Ways to Leverage Redis in Your Applications

This article presents sixteen practical Redis use cases—from simple caching and distributed sessions to global IDs, rate limiting, bitmaps, shopping carts, timelines, message queues, likes, tags, filtering, follow relationships, and ranking—each illustrated with commands and code snippets for real‑world backend development.

Backend Development · Data Structures · Distributed Systems
0 likes · 9 min read

How Kafka Powers Scalable E‑commerce Order Processing with Go

This article walks through the challenges of a fast‑growing e‑commerce platform during peak sales, explains why Apache Kafka is the ideal asynchronous messaging backbone, and provides a complete Go implementation—including producers, consumers, best‑practice patterns, and real‑world use cases—to achieve high throughput, fault tolerance, and seamless scalability.

Distributed Systems · Message Queue · Sarama
0 likes · 14 min read
Ray's Galactic Tech
Mar 31, 2026 · Artificial Intelligence

From Single-Node RAG to Scalable Go AI Services: A Hands‑On Architecture Blueprint

This comprehensive guide walks Go engineers through the evolution from a prototype Retrieval‑Augmented Generation (RAG) service to a production‑grade, distributed AI platform, covering architecture, component boundaries, caching strategies, async indexing, observability, security, and step‑by‑step deployment.

AI Architecture · Backend Development · Distributed Systems
0 likes · 42 min read
Tech Freedom Circle
Mar 25, 2026 · Backend Development

Cracking Alibaba’s 10M Orders Interview: Architecture Seven‑Suite + Heterogeneous Storage Solution

The article dissects Alibaba’s second‑round interview question on handling 10 million daily order queries, exposing why a single sharding answer fails and presenting a comprehensive architecture‑seven‑suite combined with heterogeneous storage (MySQL, HBase, ClickHouse, ES, Redis, MQ) to achieve high concurrency, low latency, and reliable data consistency.

Backend Architecture · Distributed Systems · Interview Preparation
0 likes · 40 min read
dbaplus Community
Mar 17, 2026 · Backend Development

18 Real-World System Case Studies That Reveal 90% of Software Engineering Challenges

This article examines eighteen concrete production systems—from URL shorteners and Amazon S3 to YouTube, Stripe, Slack, and ChatGPT—showing how their design choices illustrate core concepts such as sharding, caching, idempotency, real‑time messaging, and large‑scale engineering, providing a practical roadmap for software engineers.

Case Studies · Distributed Systems · Scalability
0 likes · 13 min read
Alibaba Cloud Infrastructure
Mar 16, 2026 · Artificial Intelligence

Scaling Agentic Reinforcement Learning with a Decoupled T‑Architecture Using Verl and Argo Workflows

Agentic reinforcement learning is evolving from simple text generation to complex, scalable agents, but large‑scale deployment faces challenges like massive parallel rollout scheduling and reproducible environments; this article presents a decoupled T‑architecture that separates high‑level RL logic (Verl) from execution orchestration (Argo Workflows) to address these issues.

Argo Workflows · Distributed Systems · Scalable Reinforcement Learning
0 likes · 10 min read
mikechen
Mar 12, 2026 · Big Data

How Kafka Handles Million‑Message Concurrency: Architecture Deep Dive

This article explains how Kafka’s sequential disk writes, zero‑copy data path, partition‑based parallelism, and configurable broker and partition settings enable linear‑scale throughput that can reach millions of transactions per second in large‑scale streaming systems.

Distributed Systems · Partitioning · Throughput
0 likes · 5 min read
dbaplus Community
Mar 5, 2026 · Backend Development

How to Ensure Message Order in Kafka: From Basics to Advanced Solutions

This article explains the concept of message ordering in distributed systems, details how Kafka stores messages in partitions, compares global and partial ordering, evaluates single‑partition, asynchronous, and multi‑partition solutions—including handling data skew and partition expansion—and provides a practical interview guide.

Backend · Distributed Systems · Kafka
0 likes · 22 min read
AI Waka
Feb 22, 2026 · Industry Insights

Why Multi‑Agent AI Fails at Scale and How 12‑Factor Cloud‑Native Principles Save It

The article explains why naïve multi‑agent AI architectures collapse under load due to internal east‑west dependencies, and shows how applying 12‑Factor App and cloud‑native patterns—isolated workers, externalized state, short‑lived sessions, and strict orchestration—enable scalable, fault‑tolerant agentic systems.

12-factor · Cloud Native · Distributed Systems
0 likes · 17 min read
Coder Trainee
Feb 19, 2026 · Fundamentals

Why Switch from UUID to ULID? Exploring Benefits and Features

The article explains why developers are moving from UUID to ULID, detailing ULID’s 128‑bit compatibility, massive per‑millisecond uniqueness, lexicographic sorting, compact 26‑character Base32 encoding, timestamp integration, strong randomness, and practical use cases such as distributed systems and database sharding.

Base32 · Distributed Systems · ULID
0 likes · 3 min read
Code Wrench
Feb 12, 2026 · Backend Development

Why Bidirectional Streaming in gRPC Is More Than a Pipe – A Deep Dive into grpc-go

This article explores how gRPC bidirectional streaming transforms a simple data pipe into a conversational session by examining the underlying HTTP/2 mechanics, shared state machines, flow‑control strategies, practical patterns, and common pitfalls in grpc-go implementations.

Bidirectional Streaming · Distributed Systems · Flow Control
0 likes · 16 min read
ITPUB
Feb 11, 2026 · Backend Development

How to Guarantee Zero Message Loss in MQ Systems: A Full‑Lifecycle Design

This guide explains why guaranteeing 100% message reliability in MQ is a critical system‑design interview topic and presents a three‑layer architecture—production, storage, and consumption—detailing ACK settings, local message tables, broker replication, leader election safeguards, manual offset commits, and idempotent processing to prevent any message loss.

Acknowledgment · Distributed Systems · Idempotency
0 likes · 11 min read
Architecture Digest
Jan 30, 2026 · Backend Development

How Hera Transforms SpringBoot Logging: A Step‑by‑Step Integration Guide

Integrating the Hera log platform into SpringBoot resolves common distributed‑system logging pain points—centralized storage, full‑trace linkages, and cost‑effective retention—by adding a non‑intrusive agent, configuring custom fields, enabling trace IDs, and providing a web console for rapid, multi‑service debugging and analysis.

Distributed Systems · Hera · Observability
0 likes · 14 min read
AntTech
Jan 30, 2026 · Databases

Award-Winning Papers Reveal Databases, AI Typography, and Financial Benchmarks

Three award‑winning papers—OceanBase’s unitized database architecture for billion‑scale map services, a video‑diffusion‑based dynamic typography system that animates text semantically, and the FinBench LDBC financial graph benchmark—are examined, highlighting their design, experimental results, and impact on industry applications.

AI · Distributed Systems · Graph Benchmark
0 likes · 6 min read
Java Architect Handbook
Jan 28, 2026 · Databases

How to Prevent Redis Split‑Brain Disasters with min‑replicas‑to‑write

This article explains the Redis split‑brain problem that can occur in master‑replica clusters, outlines the interview points interviewers look for, and provides a detailed solution using the min‑replicas‑to‑write (or min‑slaves‑to‑write) configuration to sacrifice write availability for data consistency, along with best‑practice recommendations and common pitfalls.

Configuration · Distributed Systems · Split-Brain
0 likes · 12 min read
AI Waka
Jan 26, 2026 · Industry Insights

Why Traditional Software Architecture Fails at Scale and How Message‑Based Design Solves It

The article examines the fifty‑year gap between Alan Kay's biologically‑inspired object model and Roy Fielding's REST constraints, explains why mainstream OOP and microservices fall short, and presents a message‑fabric architecture with bindable components, moderators, and assertion‑driven development that finally delivers scalable, autonomous enterprise systems.

Distributed Systems · Microservices · Software Architecture
0 likes · 22 min read
Architect's Guide
Jan 24, 2026 · Fundamentals

Why Our Custom Snowflake ID Generator Failed and How to Fix It

A recent production incident revealed duplicate order IDs caused by a flawed custom Snowflake algorithm; this article reviews the standard Snowflake structure, dissects the custom implementation’s critical mistakes—short timestamp, IP‑based business ID, zeroed worker and data‑center IDs—and offers best‑practice recommendations, including using mature libraries and proper worker‑ID strategies.

Distributed Systems · ID generation · Java
0 likes · 7 min read
Architect's Guide
Jan 22, 2026 · Big Data

Unlock Kafka’s Power: Core Concepts, High‑Performance Architecture & Real‑World Scaling Tips

This comprehensive guide explores Kafka’s core value as a message queue, explains producers, consumers, topics, partitions, and replication, dives into cluster architecture, zero‑copy I/O, resource planning for disks, memory, CPU and network, and provides practical configuration, consumer‑group management, and operational tooling tips for building high‑throughput, highly available Kafka deployments.

Distributed Systems · Kafka · Message Queue
0 likes · 31 min read
Top Architect
Jan 17, 2026 · Backend Development

Why We Rebuilt a Java Scheduler and How the New Lightweight Framework Works

Faced with limitations of existing tools like Quartz, XXL-Job, and PowerJob, the author explains the motivation for creating a custom scheduling framework, describes its architecture—including gRPC communication, protobuf serialization, a self-implemented name server for load balancing, a simple message queue, and time-wheel scheduling—provides code examples, and shares diagrams of discovery and dispatch processes.

Distributed Systems · Java · Message Queue
0 likes · 17 min read
Tech Freedom Circle
Jan 15, 2026 · Backend Development

Kafka Rebalance Storm Crushed 120k QPS in JD Interview – How to Understand and Fix

In a JD senior Java architect interview, a Kafka consumer‑group rebalance storm caused QPS to drop from 120k to zero, triggering massive message loss and latency spikes, and the article walks through the rebalance fundamentals, failure causes, impact analysis, cooperative sticky assignor migration, and comprehensive monitoring and mitigation strategies.

Distributed Systems · Kafka · consumer-group
0 likes · 28 min read
ITFLY8 Architecture Home
Jan 12, 2026 · Databases

Designing Scalable Order Sharding for Millions of Daily Transactions

This article outlines a practical sharding strategy for e‑commerce order systems, estimating future load, detailing user‑centric partitioning, heterogeneous designs for merchants and operators, and migration steps to achieve high‑concurrency writes and massive storage without downtime.

Data Migration · Distributed Systems · Order Management
0 likes · 4 min read
dbaplus Community
Jan 7, 2026 · Backend Development

Why Our Custom Snowflake ID Collided and How to Build a Reliable Generator

A recent production incident caused duplicate order IDs due to a flawed custom Snowflake implementation, prompting a deep dive into the standard algorithm, analysis of the mistakes, and a set of best‑practice recommendations for designing robust distributed ID generators.

Design Patterns · Distributed Systems · ID generation
0 likes · 7 min read
Tech Freedom Circle
Jan 6, 2026 · Backend Development

Why Choose RocketMQ Over Kafka? The Real Reasons Behind the 90% Mistake

This article dissects a common interview question about Kafka's higher throughput versus RocketMQ's richer features, explains the underlying design philosophies, storage models, I/O paths, scaling limits, real‑world use cases such as transaction, delayed and ordered messages, and provides concrete optimization steps and code samples to help engineers make an informed messaging platform choice.

Distributed Systems · Java · Kafka
0 likes · 42 min read
ITPUB
Jan 3, 2026 · Backend Development

How to Build a Scalable Order Cancellation System: 3 Advanced Delayed‑Task Solutions

This article dissects a common interview question about automatically canceling unpaid orders after 30 minutes, explains why naive cron jobs fail at scale, and presents three robust backend designs—Redis ZSet polling, message‑queue delayed messages, and time‑wheel timers—along with practical code snippets and pitfalls to avoid.

Backend Architecture · Distributed Systems · Interview Preparation
0 likes · 11 min read
DeWu Technology
Dec 29, 2025 · Backend Development

Unveiling RocketMQ: A Deep Dive into Its Architecture and Performance Secrets

This comprehensive guide explores RocketMQ’s four‑component architecture, storage formats, routing mechanisms, write‑and‑read workflows, high‑availability designs, performance optimizations, and a side‑by‑side comparison with Kafka, providing practical insights for building robust distributed messaging systems.

Distributed Systems · Message Queue · RocketMQ
0 likes · 28 min read
JavaGuide
Dec 25, 2025 · Interview Experience

How I Secured Offers from Top Tech Companies in 80 Days

The author, a non‑elite undergraduate and a modest 211 master’s graduate, shares a step‑by‑step 80‑day crash‑course that turned zero Java experience into multiple offers from major tech firms, emphasizing fundamental understanding, AI‑assisted learning, and thoughtful project trade‑offs.

AI-assisted Learning · Algorithm Preparation · Distributed Systems
0 likes · 8 min read
Architect Chen
Dec 25, 2025 · Information Security

Understanding Single Sign-On (SSO): Architecture, Components, and Workflow

This article explains the fundamentals of Single Sign-On (SSO), detailing its centralized authentication principle, the roles of CAS Server, CAS Client, and browser, and walks through the complete login flow with diagrams and code snippets for distributed systems.

Authentication · CAS · Distributed Systems
0 likes · 4 min read
Mike Chen's Internet Architecture
Dec 25, 2025 · Backend Development

How to Resolve Kafka Backlog Under High Load: Practical Tips

This article explains why Kafka experiences message backlog in high‑load environments, identifies producer‑consumer speed mismatches, I/O and resource bottlenecks, and offers concrete strategies such as scaling consumers, tuning hardware, and adjusting Kafka configurations to eliminate the backlog.

Backlog · Distributed Systems · Kafka
0 likes · 4 min read
Code Ape Tech Column
Dec 19, 2025 · Backend Development

Boost SpringBoot Log Management: Step‑by‑Step Integration with Hera

This article explains why traditional SpringBoot logging falls short, introduces the Hera log platform’s three core benefits, outlines a layered integration architecture, and provides a detailed five‑step guide—including Maven dependencies, YAML configuration, custom field providers, log output, traceability, and console usage—plus performance, high‑availability, security tips and common pitfalls.

Distributed Systems · Hera · Log Management
0 likes · 14 min read
Java Architect Handbook
Dec 14, 2025 · Backend Development

Why Our Custom Snowflake ID Failed and How to Build a Reliable One

A recent production incident revealed that a self‑developed Snowflake‑style ID generator caused duplicate order numbers due to a truncated timestamp, unsafe IP‑based business IDs, and unconfigured worker and data‑center IDs, prompting a detailed analysis of the standard algorithm, the flaws in the custom design, and best‑practice recommendations for robust ID generation.

Backend · Distributed Systems · ID generation
0 likes · 9 min read
Tencent Cloud Middleware
Dec 12, 2025 · Artificial Intelligence

How A2A over MQTT Transforms AI Agent Collaboration

This article explains the challenges of traditional point‑to‑point AI agent communication, introduces the A2A protocol and its limitations, and details how combining A2A with MQTT via Tencent Cloud TDMQ creates a dynamic, loosely‑coupled, and scalable solution with practical SDK examples and real‑world case studies.

A2A protocol · AI agents · Distributed Systems
0 likes · 16 min read
Mike Chen's Internet Architecture
Dec 9, 2025 · Backend Development

Boost Kafka to Over 1 Million Messages per Second: Metrics and Tuning Tips

This article explains what high concurrency means for Kafka, outlines key performance metrics such as QPS, TPS, throughput and latency, and provides concrete configuration and architectural techniques—including broker optimization, horizontal scaling, network batching, and zero‑copy—to achieve write rates exceeding one million records per second.

Backend · Distributed Systems · Kafka
0 likes · 4 min read
Java Architect Handbook
Dec 9, 2025 · Industry Insights

Why Microservices May Be Overhyped: Tracing Their Real Roots and Myths

The article first lists a series of Java learning projects and community benefits, then critically examines the widely touted advantages of microservices, showing how many of those claims originate from older technologies, debunking common myths, and concluding that microservices are essentially just modular code.

Distributed Systems · Industry analysis · Microservices
0 likes · 16 min read
JD Cloud Developers
Dec 8, 2025 · Fundamentals

Why Raft Guarantees Linear Consistency in Unreliable Networks

This article explains how unreliable networks, clock instability, and node failures can cause data inconsistency in distributed clusters, introduces the Raft consensus algorithm, details its roles, election process, log replication, read/write handling, consistency models, and mechanisms to avoid split-brain and livelock.

Consensus · Consistency · Distributed Systems
0 likes · 13 min read
Ctrip Technology
Dec 5, 2025 · Databases

How Ctrip’s DRC Enables High‑Performance Cross‑Region MySQL Replication

This article explains the design and implementation of Ctrip's Data Replication Center (DRC), a MySQL‑based high‑availability system that solves cross‑region data loop, progress tracking, concurrency, DDL handling, and conflict resolution to achieve low‑latency, reliable data replication for global travel services.

Distributed Systems · GTID · cross-region
0 likes · 21 min read
Code Wrench
Nov 26, 2025 · Backend Development

Unlocking Olric’s High‑Performance Network Protocol and RPC Mechanism

This article dives deep into Olric’s network communication architecture and RPC mechanism, explaining its layered transport design, request/response structures, pipeline and batch processing, client‑to‑cluster interactions, data migration and rebalancing, and provides Go code examples illustrating high‑throughput, safe distributed operations.

Distributed Systems · Go · Olric
0 likes · 6 min read
Code Wrench
Nov 24, 2025 · Backend Development

What Makes Olric’s Go Architecture a Masterclass in Distributed KV Design

This article explores Olric, a pure‑Go distributed key‑value engine, detailing its dual embedded/stand‑alone mode, clean three‑layer architecture, core data structures, and engineering choices that illustrate best practices for building high‑performance, maintainable backend systems.

Distributed Systems · Go · KV Store
0 likes · 10 min read
Architect's Guide
Nov 21, 2025 · Backend Development

Mastering Apollo: A Deep Dive into Ctrip’s Open‑Source Distributed Configuration Center

This article walks through the concepts, architecture, and hands‑on steps for using Apollo, Ctrip’s open‑source distributed configuration center, covering project setup, Spring Boot integration, dynamic updates, clustering, namespaces, high‑availability design, and Kubernetes deployment.

Apollo · Configuration Management · Distributed Systems
0 likes · 25 min read
Xiaokun's Architecture Exploration Notes
Nov 16, 2025 · Backend Development

How to Choose and Implement Architecture Contracts for Distributed Systems

This article explains why architecture‑level contract decisions are needed in distributed systems, compares strict and loose data contracts, illustrates schema‑on‑read/write patterns, and shows how to ensure forward and backward compatibility when evolving protocols such as JSON and Protobuf.

Distributed Systems · Protobuf · architecture contracts
0 likes · 11 min read
Tech Freedom Circle
Nov 16, 2025 · Databases

How Redis Pipeline Can Boost Performance 3‑12× and Impress Interviewers

This article explains Redis Pipeline’s core principle of batching commands to reduce network round‑trips, presents benchmark data showing up to 17‑fold speedups, details real‑world use cases such as cache warm‑up, heartbeat reporting, and high‑traffic events, and provides best‑practice guidelines on batch sizing, error handling, cluster constraints, and comparisons with transactions and Lua scripts.

Batch Processing · Benchmark · Distributed Systems
0 likes · 36 min read
Open Source Tech Hub
Nov 13, 2025 · Fundamentals

Why Heartbeat Mechanisms Are Critical for Distributed System Reliability

This article explains how periodic heartbeat messages enable distributed systems to detect node failures, choose appropriate intervals and timeouts, compare push and pull models, employ advanced detection algorithms like phi and gossip, and apply these concepts in real-world platforms such as Kubernetes, Cassandra, and etcd.

Distributed Systems · Failure Detection · Gossip Protocol
0 likes · 22 min read
IT Services Circle
Nov 11, 2025 · Backend Development

How to Build a High‑Concurrency, Strong‑Consistency E‑Commerce Order System

An e‑commerce order system acts as the core connector linking users, merchants, payments, logistics and revenue, and this article dissects its three essential flows—forward, reverse and state transitions—while detailing the technical challenges and solutions for order creation, payment, fulfillment, cancellation, after‑sale, architecture, and data handling.

Distributed Systems · e‑commerce · high concurrency
0 likes · 19 min read
NiuNiu MaTe
Nov 5, 2025 · Backend Development

How to Build a High‑Concurrency, Strong‑Consistency E‑Commerce Order System

This article dissects the core processes, functional challenges, and architectural design of a high‑throughput, strongly consistent e‑commerce order system, covering forward and reverse flows, order creation, payment, fulfillment, cancellation, after‑sale handling, and the layered backend architecture that powers it.

Backend Architecture · Distributed Systems · Microservices
0 likes · 21 min read
IT Architects Alliance
Nov 4, 2025 · Backend Development

Mastering Distributed Data Consistency: Strategies, Patterns, and Best Practices

This article explores the challenges of maintaining data consistency in distributed microservice architectures, covering CAP theory, consistency models, replication strategies, transaction patterns like Saga and TCC, tooling choices, monitoring practices, and actionable best‑practice recommendations.

CAP theorem · Data Consistency · Distributed Systems
0 likes · 13 min read
DevOps Coach
Oct 31, 2025 · Backend Development

How Netflix’s Maestro Engine Gained a 100× Speed Boost with a New Actor‑Based Architecture

Netflix’s Maestro workflow orchestrator was redesigned with a lightweight, stateful actor model and Java virtual threads, cutting engine overhead from seconds to milliseconds, delivering a hundred‑fold performance increase while preserving scalability, reliability, and strong execution guarantees for massive data and ML pipelines.

Distributed Systems · Java virtual threads · Netflix Maestro
0 likes · 28 min read
Top Architect
Oct 31, 2025 · Backend Development

Mastering Message Queues: A Deep Dive into RabbitMQ, RocketMQ, and Kafka

This comprehensive guide explains the core components, exchange types, TTL, confirm mechanisms, consumer ACK/NACK, dead‑letter queues, and high‑availability features of RabbitMQ, RocketMQ, and Kafka, while also covering load balancing, ordering, transaction handling, and best practices for reliable message delivery.

Backend Development · Distributed Systems · Kafka
0 likes · 32 min read
Instant Consumer Technology Team
Oct 29, 2025 · Big Data

Revolutionizing Feature Engineering with Distributed Tech & Configurable Services

Facing PB‑scale user behavior data and millions of feature dimensions, the platform transformed its search, advertising, and recommendation pipelines by adopting a distributed, configurable‑service architecture that delivers high‑throughput streaming, elastic storage, rapid feature iteration, and robust fault‑tolerance for AI‑driven personalization.

Big Data · Data Architecture · Distributed Systems
0 likes · 17 min read
NiuNiu MaTe
Oct 29, 2025 · Backend Development

How to Build a Billion‑User Real‑Time Leaderboard: Architecture, Tools, and Pitfalls

This article walks through the end‑to‑end design of a leaderboard that must serve over 100 million users with 100 k queries per second, covering requirement clarification, real‑time and accuracy challenges, technology selection such as Redis ZSet, multi‑layer architecture, sharding, caching, monitoring, and practical implementation tips to achieve low latency, high consistency, and cost‑effective scalability.

Big Data · Distributed Systems · Real-Time
0 likes · 19 min read
Radish, Keep Going!
Oct 28, 2025 · Big Data

How Netflix Achieved Petabyte-Scale, Sub-Second Log Queries with ClickHouse

Netflix processes over 5 PB of logs daily, handling millions of events per second, and by layering hot and cold storage, using a custom lexer for fingerprinting, native protocol serialization, and sharded tag maps, they reduced query latency from seconds to sub‑second levels with ClickHouse.

Big Data · ClickHouse · Distributed Systems
0 likes · 8 min read
Architect's Guide
Oct 28, 2025 · Backend Development

How to Prevent API Scraping in High‑Traffic Seckill Systems with Java

During high‑traffic flash‑sale events like Double 11, malicious users can flood seckill APIs, causing service collapse and inventory errors; this article explains the business pain points and presents a multi‑layer anti‑scraping solution—including rate limiting, behavior detection, captchas, request signing, token mechanisms, and asynchronous order processing—with concrete Java implementations.

API Security · Captcha · Distributed Systems
0 likes · 7 min read
Huolala Tech
Oct 22, 2025 · Backend Development

Scaling Real‑Time Reconciliation with Dynamic Kafka Consumer Clusters

To ensure fund safety and robust operations, the team built a real‑time reconciliation platform that leverages Kafka, and after encountering scaling bottlenecks with a static consumer model, they implemented a dynamic, partition‑level, weighted load‑balancing consumer cluster that supports automatic scaling and high‑throughput processing.

Backend Architecture · Distributed Systems · Dynamic Scaling
0 likes · 15 min read
dbaplus Community
Oct 16, 2025 · Backend Development

How to Build a Billion‑Scale Open Platform: Architecture, Caching, and Resilience

This article presents a step‑by‑step engineering guide for designing, evolving, and operating a high‑traffic open platform, covering three‑layer decoupled architecture, multi‑level caching, asynchronous message queues, distributed transaction models, high‑availability strategies, and phased rollout plans to sustain billions of daily API calls.

Distributed Systems · Open Platform · caching
0 likes · 20 min read
NiuNiu MaTe
Oct 16, 2025 · Backend Development

Prevent Service Avalanche: Circuit Breaker vs Degradation Strategies Explained

This article explains service avalanche in micro‑service chains, outlines its three failure stages, compares circuit‑breaker and degradation techniques, shows when to apply each, and provides practical guidance on tools like Sentinel and Resilience4j, testing, monitoring, and best‑practice configurations.

Distributed Systems · Microservices · backend reliability
0 likes · 11 min read
BirdNest Tech Talk
Oct 12, 2025 · Artificial Intelligence

What Happens When a Token Travels Through GPU Villages via RDMA and NVLink?

The article uses a whimsical journey to illustrate how token data is dispatched across GPU clusters—detailing functions like get_dispatch_layout, notify_dispatch, and combine_token, showing RDMA and NVLink pathways, performance experiments, and the final verification of token integrity.

AI · Distributed Systems · GPU
0 likes · 5 min read
IT Architects Alliance
Oct 10, 2025 · Information Security

How to Secure Distributed Permissions: Zero Trust Strategies & Code

This article examines the exponential growth of permission complexity in micro‑service architectures, outlines zero‑trust design principles, and provides concrete Java and YAML implementations for fine‑grained, context‑aware access control, caching, dynamic evaluation, and audit monitoring.

Distributed Systems · Security · Zero Trust
0 likes · 11 min read
DataFunSummit
Oct 5, 2025 · Artificial Intelligence

How Bilibili Uses LLM‑Powered Assistants to Tackle Big‑Data Task Failures

Bilibili’s massive video platform relies on a five‑layer, storage‑compute separated big‑data architecture, handling hundreds of thousands of daily tasks, and now leverages large‑language‑model assistants to automatically diagnose and resolve frequent task failures and performance slowdowns.

AI assistance · Bilibili · Distributed Systems
0 likes · 4 min read
ITPUB
Oct 5, 2025 · Backend Development

How to Clear a 10‑Million‑Message Queue in 5 Hours: A Five‑Step Rescue Plan

When a flash‑sale causes a 10 million‑message backlog and consumers only process 200 messages per second, this guide shows a five‑step, 5‑hour strategy—horizontal scaling, message downgrade, flow control, temporary dump, and parallel blasting—to restore throughput and prevent system collapse.

Distributed Systems · Kafka · Performance Optimization
0 likes · 6 min read
Data Party THU
Sep 30, 2025 · Backend Development

Ray Serve vs Celery: Which Is Best for GPU‑Intensive Parallel Workloads?

This article compares Ray Serve and Celery, explaining their design philosophies, scaling models, GPU‑aware scheduling, operational trade‑offs, and real‑world case studies to help engineers choose the right tool for high‑throughput online inference or large‑scale batch processing.

Distributed Systems · GPU · Model Serving
0 likes · 9 min read

How Version Vectors Resolve Conflicts in Multi‑Leader and Leaderless Replication

This article explains why version vectors are needed in multi‑leader and leaderless replication, describes their implementation and comparison rules, and presents practical conflict‑resolution strategies—including custom resolvers, last‑write‑wins, read‑repair, and request rejection—supported by Java pseudocode and diagrams.

Distributed Systems · Multi-Leader · Replication
0 likes · 16 min read
Tech Freedom Circle
Sep 25, 2025 · Operations

RAGFlow Link Tracing: GPS‑Style Observability for LLM‑Powered Applications

The article explains why RAGFlow needs end‑to‑end link tracing, introduces OpenTelemetry’s core concepts, shows how custom tracing utilities are implemented in Python, describes the layered architecture, provides concrete Docker and YAML configurations, and offers best‑practice guidelines for performance monitoring and fault diagnosis.

Distributed Systems · LLM · Observability
0 likes · 24 min read
Tech Freedom Circle
Sep 24, 2025 · Backend Development

Designing a US Presidential Election Voting System: 1M TPS, 10M QPS, Immutable and Non‑Duplicate Votes

This article presents a comprehensive architectural design for a high‑throughput US presidential voting platform that must handle 1 million transactions per second and 10 million queries per second while guaranteeing vote immutability, one‑person‑one‑vote enforcement, real‑time result aggregation, and scalable storage using microservices, Kafka, Redis, Bloom filters, and blockchain anchoring.

Blockchain · Distributed Systems · Idempotency
0 likes · 32 min read
Architecture Digest
Sep 23, 2025 · Backend Development

How to Ensure Zero Message Loss in Kafka: Proven Strategies for High‑Reliability Systems

This article explains Kafka's storage architecture, identifies three major message‑loss scenarios across production, storage, and consumption, and provides practical end‑to‑end configurations, detection methods, and business‑level patterns to achieve near‑zero message loss in high‑concurrency distributed systems.

Data Consistency · Distributed Systems · Kafka
0 likes · 13 min read
Baidu Intelligent Cloud Tech Hub
Sep 22, 2025 · Cloud Computing

How Mantle Breaks the Hierarchical Namespace Bottleneck in Cloud Object Storage

The Mantle system, presented in a SOSP'25 paper by Baidu's storage team and collaborators, delivers a distributed hierarchical namespace for cloud object storage that overcomes traditional scalability and performance limits, enabling massive data lake workloads with dramatically reduced latency and vastly increased throughput.

Distributed Systems · SOSP · cloud storage
0 likes · 8 min read
Architecture Digest
Sep 19, 2025 · Backend Development

Mastering Message Idempotency: From Simple Checks to State‑Machine Solutions

This article explores the challenges of duplicate message consumption in distributed systems, explains why naive de‑duplication fails under high concurrency, and presents four progressively robust idempotency strategies—from database pessimistic locks and local message tables to a state‑machine approach with Redis or MySQL, highlighting their trade‑offs.

Backend Development · Distributed Systems · Idempotency
0 likes · 11 min read
Su San Talks Tech
Sep 18, 2025 · Backend Development

Designing a Million‑QPS Rate Limiter for Backend System Interviews

This article walks through a complete, interview‑ready design of a high‑performance rate‑limiting system that can handle up to one million queries per second, covering requirements, core entities, algorithm choices, distributed state storage with Redis, scalability, high availability, latency optimization, hot‑key mitigation, and dynamic rule configuration.

Backend Architecture · Distributed Systems · System Design
0 likes · 29 min read
FunTester
Sep 16, 2025 · Fundamentals

Why Going Stateless Beats Indexing: The Surprising Power of Grep in AI Coding Assistants

The article explains how Claude Code’s decision to use real‑time grep instead of code indexing reflects a 50‑year‑old Unix philosophy, showing that stateless design improves composability, scalability, predictability, and privacy across AI assistants, serverless platforms, and distributed systems.

AI assistants · Distributed Systems · Serverless
0 likes · 19 min read
Su San Talks Tech
Sep 16, 2025 · Backend Development

Mastering Message Order in Distributed Queues: From Basics to Advanced Strategies

This article explores the fundamentals of message ordering in distributed message queues, explains why ordering is determined by broker arrival, compares global and partial ordering, and presents practical solutions—from single-partition designs to multi-partition hashing, handling data skew, and safe expansion—plus interview tips.

Distributed Systems · Kafka · Partitioning
0 likes · 24 min read
Architect's Journey
Sep 15, 2025 · Backend Development

Token Bucket vs Leaky Bucket: Deep Dive into Core Traffic‑Control Algorithms

This article compares the token‑bucket and leaky‑bucket rate‑limiting algorithms, explaining their core principles, Java implementation details, key advantages and drawbacks, suitable application scenarios, interview‑style Q&A, and advanced hybrid strategies for building robust high‑concurrency systems.

Distributed Systems · Java · Token Bucket
0 likes · 9 min read
Xiaokun's Architecture Exploration Notes
Sep 14, 2025 · Fundamentals

How Lamport Clocks Enable Causal Ordering in Distributed Systems

Lamport Clocks provide a lightweight logical timestamp mechanism that captures the 'happens‑before' relationship between events, enabling causal ordering across distributed replicas, supporting versioned keys, MVCC storage, partial ordering, and highlighting both practical applications and inherent limitations in real‑world systems.

Distributed Systems · Lamport Clock · MVCC
0 likes · 16 min read
DataFunTalk
Sep 10, 2025 · Artificial Intelligence

How Ant Group’s Ray‑Powered Ragent Redefines LLM‑Based AI Agents

The article presents Ant Group’s Ray‑based Ragent framework, detailing its background, motivation behind unified AI serving, and the four core modules—Profile, Memory, Planning, and Action—that together enable large‑language‑model agents for financial applications.

AI Framework · Ant Group · Distributed Systems
0 likes · 4 min read
DataFunSummit
Sep 9, 2025 · Artificial Intelligence

How Ant Group’s Ragent Redefines Distributed LLM Agents with Ray

This article introduces Ant Group’s Ragent, a Ray‑based distributed AI agent framework, covering its background, motivation in the large‑model era, and a four‑module design (Profile, Memory, Planning, Action) that enables scalable LLM‑driven agents.

AI Framework · Ant Group · Distributed Systems
0 likes · 4 min read
DataFunSummit
Sep 8, 2025 · Artificial Intelligence

How Ant Group’s Ragent Redefines LLM‑Based AI Agents on Ray

This article introduces Ant Group’s new Ray‑based distributed agent framework Ragent, outlines its background and motivation, and details the four core modules—Profile, Memory, Planning, and Action—that together enable sophisticated LLM‑driven AI agents for large‑scale applications.

AI agents · Ant Group · Distributed Systems
0 likes · 4 min read
Architecture & Thinking
Sep 8, 2025 · Backend Development

Mastering RocketMQ: 7 Core Techniques for Reliable Messaging

This article walks through seven essential RocketMQ concepts—including message ordering, delayed delivery, accumulation handling, transactional guarantees, retry mechanisms, storage strategies, and filtering—providing code examples, configuration tips, and visual diagrams to help developers build robust distributed messaging systems.

Distributed Systems · Java · Message Queue
0 likes · 13 min read
DataFunSummit
Sep 7, 2025 · Artificial Intelligence

Inside Ant Group’s Ragent: Building Scalable AI Agents on Ray

This article introduces Ant Group’s Ragent, a Ray‑based distributed AI‑agent framework, covering its background, motivation, design and implementation, and detailing the four core modules—Profile, Memory, Planning, and Action—that enable large‑language‑model agents at massive scale.

AI agents · Ant Group · Distributed Systems
0 likes · 4 min read
IT Services Circle
Sep 6, 2025 · Backend Development

10 Real‑World Scenarios Where Message Queues Transform Your System

This article explores ten practical use‑cases for message queues—covering system decoupling, asynchronous processing, traffic shaping, data synchronization, log collection, broadcast updates, ordered and delayed messages, retry mechanisms, and transactional messaging—illustrated with Java code examples and architectural diagrams.

Backend Development · Distributed Systems · Java
0 likes · 17 min read
DataFunTalk
Sep 5, 2025 · Artificial Intelligence

Inside Ant Group’s Ragent: Building Scalable AI Agents on Ray

This article introduces Ant Group’s Ray‑based distributed agent framework Ragent, outlines its background, motivation, and design, and details the four essential modules—Profile, Memory, Planning, and Action—that power large‑language‑model agents in large‑scale AI serving.

AI agents · Ant Group · Distributed Systems
0 likes · 5 min read
NiuNiu MaTe
Sep 4, 2025 · Operations

Mastering Multi‑Active Distributed Systems: From Single Server to Global Fault Tolerance

This article walks developers through the evolution of distributed system architectures—from single‑machine deployments to master‑slave, same‑city active‑active, and finally true multi‑active setups—explaining core concepts, replication strategies, conflict resolution, fault detection, switch mechanisms, recovery methods, and interview tips for high‑availability design.

CAP theorem · Distributed Systems · Interview Preparation
0 likes · 26 min read
JD Tech Talk
Sep 4, 2025 · Operations

Avoid Common High‑Availability Pitfalls: Real‑World JD Practices and Solutions

This article analyzes the multi‑dimensional challenges of building high‑availability systems—covering applications, databases, caches, message queues, containers, GC, and more—by sharing real JD engineering scenarios, common failure patterns, and concrete mitigation strategies to help engineers design more resilient services.

Backend · Distributed Systems · fault tolerance
0 likes · 36 min read
JD Retail Technology
Sep 4, 2025 · Operations

Mastering High Availability: Real-World Pitfalls and Solutions from JD's Production Systems

This article walks through the challenges of building high‑availability systems—covering applications, databases, caches, message queues, containers, GC, and more—using JD’s production experiences to highlight common pitfalls, root‑cause analyses, and practical mitigation strategies for engineers seeking resilient architecture.

Cache · Distributed Systems · JDK
0 likes · 37 min read
DataFunSummit
Sep 2, 2025 · Artificial Intelligence

How Ant Group’s Ray‑Powered Ragent Redefines LLM‑Based AI Agents

This article introduces Ant Group’s Ray‑based distributed agent framework Ragent, outlines its background, motivation, and design, and breaks down the four essential modules—Profile, Memory, Planning, and Action—that enable large‑language‑model agents to operate in real‑world scenarios.

Ant Group · Distributed Systems · LLM
0 likes · 5 min read
IT Services Circle
Aug 29, 2025 · Backend Development

Why Smooth Weighted Round Robin Works: The Math Behind Balanced Load Distribution

This article explains the smooth weighted round robin algorithm, contrasts it with the non‑smooth version, walks through step‑by‑step examples for a 5:1:1 server weight scenario, and provides mathematical proofs of both weight correctness and smoothness, including references to the original source.

Distributed Systems · algorithm · load balancing
0 likes · 15 min read
Xiaolei Talks DB
Aug 28, 2025 · Databases

How AI Is Transforming Databases: Highlights from China’s DTCC2025

At DTCC2025 in Beijing, industry leaders showcased AI-driven innovations, vector database advances, RAG techniques, and distributed database performance breakthroughs, illustrating how databases are evolving from passive data stores into intelligent, autonomous systems that boost efficiency, scalability, and business value across sectors.

AI · Distributed Systems · RAG
0 likes · 10 min read