Tagged articles
2122 articles
Cognitive Technology Team
Apr 3, 2025 · Fundamentals

Understanding CAP Theory and BASE: Data Consistency in Distributed Systems

This article explains the CAP theorem and its practical extension BASE, describing their core concepts, trade‑off combinations, typical components such as ZooKeeper, Eureka, and Nacos, and engineering techniques like asynchronous replication, Saga, and idempotent design for building highly available distributed systems.

Availability · BASE · CAP theorem
5 min read
Mingyi World Elasticsearch
Apr 1, 2025 · Big Data

Elasticsearch Unveiled: Learn Search Engine Basics Through Comics

This visual guide walks readers through Elasticsearch fundamentals, from architecture and indexing to clustering, query DSL, aggregations, and performance tuning, using comic-style illustrations that simplify each concept. It also covers security considerations, multilingual support, and real‑time search capabilities.

Big Data · Distributed Systems · Elasticsearch
2 min read
Raymond Ops
Mar 30, 2025 · Operations

Mastering Elasticsearch Data Sync and Cluster Architecture: 3 Strategies Explained

This article explains three Elasticsearch data‑synchronization methods, compares their pros and cons, and then dives into ES cluster structure, node roles, shard allocation, distributed queries, split‑brain handling, and fault‑tolerance mechanisms, providing a comprehensive guide for developers and ops engineers.

Cluster Management · Distributed Systems · Elasticsearch
9 min read
Ma Wei Says
Mar 30, 2025 · Fundamentals

How Kafka 4.0’s KRaft Replaces ZooKeeper with Raft Consensus

Kafka 4.0 runs exclusively on KRaft, a ZooKeeper‑free metadata layer built on the Raft consensus algorithm. This article details its role transitions, leader election, log replication, controller and broker responsibilities, and fault‑tolerance mechanisms, which enable a more scalable and self‑managed architecture for large‑scale distributed streaming.

Consensus Algorithm · Distributed Systems · KRaft
13 min read
Ma Wei Says
Mar 28, 2025 · Backend Development

Choosing the Right Message Queue: Kafka vs RocketMQ vs RabbitMQ Explained

This article compares Kafka, RocketMQ, and RabbitMQ, detailing their architectures, performance characteristics, strengths, and ideal use‑cases to help engineers select the most suitable message‑queue solution for high‑throughput, fault‑tolerant, and real‑time processing scenarios.

Distributed Systems · Event Streaming · High Throughput
11 min read
Bilibili Tech
Mar 25, 2025 · Cloud Native

Technical Case Study: Accelerating Live‑to‑VOD Conversion for the 2025 Spring Festival Gala

By replacing the legacy FLV recorder with an m3u8‑based service, introducing a short‑segment, state‑machine transcoder, and deploying an event‑driven proxy and instant‑clipping UI, the team cut the 4‑hour‑40‑minute Spring Festival Gala’s live‑to‑VOD processing from 41 minutes to about eight minutes, achieving roughly a five‑fold speedup.

Distributed Systems · Performance Optimization · VOD conversion
21 min read
DataFunSummit
Mar 20, 2025 · Artificial Intelligence

Evolution of AI Training Stability and Baidu Baige’s Full-Stack Solutions for Large-Scale Model Training

The article traces the evolution of AI training stability from early manual operations on small GPU clusters to sophisticated, fault‑tolerant infrastructures for thousand‑card and ten‑thousand‑card clusters, detailing Baidu Baige’s metrics, monitoring, eBPF‑based diagnostics, and checkpoint strategies that reduce invalid training time and accelerate fault recovery.

Distributed Systems · Large-Scale Training · checkpointing
22 min read
Mike Chen's Internet Architecture
Mar 20, 2025 · Backend Development

Comprehensive Guide to Apache Zookeeper: Architecture, Use Cases, and Commands

This article provides an in‑depth overview of Apache ZooKeeper, covering its core concepts, common application scenarios such as pub/sub, configuration management and naming services, detailed architecture including znodes and node types, the watch mechanism, ZAB consensus protocol, and practical usage examples with Maven dependencies, Java client code, and command‑line operations.

Backend Development · Coordination Service · Distributed Systems
9 min read
FunTester
Mar 18, 2025 · Operations

How to Build a Fault‑Isolation Shield for High‑Traffic Distributed Systems

The article explains how to construct a comprehensive fault‑isolation and protection system for modern distributed applications, covering entry‑side rate limiting, exit‑side circuit breaking, internal resource isolation, monitoring, chaos‑engineering validation, and automatic self‑healing mechanisms using tools such as Sentinel, Nginx, Hystrix, SkyWalking, Prometheus and Kubernetes.

Circuit Breaking · Distributed Systems · Microservices
7 min read
Baidu Geek Talk
Mar 17, 2025 · Industry Insights

From Manual Restarts to Automated Fault Tolerance: The Evolution of AI Training Stability

This article traces the decade‑long evolution of AI training stability—from early small‑model manual operations to large‑scale, multi‑thousand‑GPU clusters—detailing metrics like invalid training time, fault‑tolerance architectures, eBPF‑based hidden‑fault detection, BCCL enhancements, multi‑level restart strategies, and trigger‑based checkpointing that together shrink downtime from minutes to seconds.

AI training · Distributed Systems · Infrastructure
22 min read
Dual-Track Product Journal
Mar 14, 2025 · Operations

How Bad Inventory Sync Can Kill Your E‑commerce Business—and 3 Fixes to Save It

This article examines how delayed or inconsistent inventory synchronization leads to costly overselling and deadstock in e‑commerce, presents three destructive synchronization patterns, and offers a step‑by‑step guide—including real‑time messaging, distributed locks, rule‑engine integration, and intelligent alerts—to transform inventory management from a liability into a self‑healing system.

Backend · Distributed Systems · Operations
8 min read
Su San Talks Tech
Mar 14, 2025 · Backend Development

Ensuring Idempotency in Distributed Systems: Patterns, Code, and Best Practices

This article explains the concept of idempotency, outlines scenarios where it is essential, analyzes common causes of idempotency problems, and presents a comprehensive set of solutions—including unique constraints, optimistic and pessimistic locks, distributed locks, token mechanisms, state machines, deduplication tables, and global request IDs—accompanied by practical code examples and database design guidelines.

Backend · Distributed Systems · Idempotency
14 min read
Architect's Guide
Feb 26, 2025 · Backend Development

Why Microservices Are More About Organizational Structure Than Pure Technology

The article critically examines the hype around microservices, arguing that their true value lies in promoting modularity, clear ownership, and organizational clarity rather than solving inherent technical problems, and it traces these ideas back to classic software engineering principles and modern operational challenges.

Distributed Systems · Scalability · Software Architecture
22 min read
Ops Development & AI Practice
Feb 20, 2025 · Backend Development

Mastering Apache RocketMQ: Ports, Commands, and Monitoring Tips

This guide explains the key port configurations of Apache RocketMQ brokers, details essential mqadmin commands for managing topics, checking status, and monitoring consumer progress, and provides practical examples to help administrators efficiently operate and troubleshoot RocketMQ clusters.

CLI · Distributed Systems · Message Queue
7 min read
Radish, Keep Going!
Feb 16, 2025 · Fundamentals

Master Consistent Hashing: Principles, Virtual Nodes, and Go Implementation

Consistent hashing, a cornerstone of distributed systems, balances load, enhances scalability, and minimizes data migration; this article explains its fundamentals, the drawbacks of basic implementations, the role of virtual nodes, and provides a complete Go-language example with code for adding, removing, and locating nodes.

Distributed Systems · consistent hashing · load balancing
11 min read
Top Architect
Feb 12, 2025 · Backend Development

Payment System Architecture Overview and Core Components

This article presents a comprehensive overview of a typical payment system architecture, detailing the division between transaction and payment cores, their interactions, service governance, data consistency, asynchronous processing, and practical production practices for building stable, scalable backend payment services.

Backend · Distributed Systems · architecture
9 min read
Alimama Tech
Feb 12, 2025 · Artificial Intelligence

HighService: A High‑Performance Pythonic AI Service Framework for Model Inference and Global Resource Scheduling

HighService, Alibaba’s Pythonic AI service framework, accelerates large‑model inference and maximizes GPU utilization by separating CPU and GPU processes, offering out‑of‑the‑box quantization, parallelism and caching, and dynamically reallocating idle GPUs across clusters through a master‑worker scheduler to keep online latency low while boosting offline throughput for diffusion and LLM workloads.

AI Service · Distributed Systems · Python
16 min read
Cognitive Technology Team
Feb 2, 2025 · Fundamentals

Common Misconceptions in Distributed System Design and Their Solutions

Designing distributed systems often falls prey to misconceptions such as assuming reliable networks, zero latency, unlimited bandwidth, inherent security, static topology, zero transmission cost, and full autonomy, but applying retries, idempotency, message queues, encryption, dynamic discovery, caching, and time protocols can mitigate these issues.

Consensus · Distributed Systems · Latency
5 min read
Code Mala Tang
Jan 31, 2025 · Backend Development

Mastering Retry Patterns in Node.js: From Basics to Advanced Strategies

This article explores the retry pattern as a resilient design technique for distributed systems, detailing its fundamentals, a simple Node.js implementation, and advanced strategies such as exponential backoff with jitter, circuit breaker integration, comprehensive logging, and best‑practice guidelines for robust error handling.

Distributed Systems · Error Handling · Node.js
11 min read
JD Tech
Jan 23, 2025 · Databases

Comprehensive Migration Plan for MongoDB to Alternative Data Stores

This article presents a complete MongoDB migration solution, detailing the migration rhythm, code refactoring using a decorator pattern, data source replacement with JImKV, MySQL and ES, bulk and incremental data transfer strategies, and deployment safeguards such as monitoring, gray release, and rollback to ensure a seamless cut‑over without service disruption.

Data Migration · Decorator Pattern · Distributed Systems
8 min read
Tencent Cloud Developer
Jan 22, 2025 · Cloud Native

Rate Limiting: Concepts, Algorithms, and Distributed Solutions

Rate limiting protects micro‑service stability by rejecting excess traffic, using algorithms such as fixed‑window, sliding‑window, leaky‑bucket and token‑bucket, and can be deployed locally or distributed via Redis, load‑balancers, or coordination services, each offering different trade‑offs in precision, scalability, and complexity.

Distributed Systems · Golang · Microservices
31 min read
JD Cloud Developers
Jan 22, 2025 · Backend Development

Mastering High-Concurrency Inventory Deduction for Flash Sale Systems

This article explores practical strategies for handling the high‑concurrency inventory deduction problem in flash‑sale scenarios, covering lock‑based approaches, Redis caching, partitioned stock management, asynchronous updates, and distributed scaling techniques to prevent overselling and improve throughput.

Backend Architecture · Distributed Systems · high concurrency
11 min read
Architecture & Thinking
Jan 14, 2025 · Backend Development

Master RocketMQ Basic Messages: Lifecycle, Code Samples & Use Cases

This guide explains Apache RocketMQ’s ordinary message concept, its full lifecycle, how to create topics, Java code for sending and receiving messages, key configuration tips, and real‑world scenarios such as asynchronous decoupling and traffic‑shaping for micro‑service architectures.

Distributed Systems · Java · Message Queue
9 min read
Alibaba Cloud Developer
Jan 10, 2025 · Databases

Boost System Performance: Using Locality Principles to Cut Database Queries

This article explains the locality principle, in both its temporal and spatial forms, and shows how applying these concepts to caching and data access in distributed systems can dramatically reduce database query volume, improve latency, and achieve up to 84% performance gains while managing memory and GC overhead.

Distributed Systems · Performance Optimization · caching
21 min read
Architect
Jan 9, 2025 · Industry Insights

How to Ensure Immediate Reads After Writes in Multi-Active Architectures

This article analyzes the challenge of reading data immediately after it is written in multi‑active disaster‑recovery setups, breaks down solution directions, presents a three‑city five‑center case study, and outlines a four‑step model (distinguish scenarios, mark written data, assess latency, and enable nearby access) to achieve consistent, low‑latency reads.

Backend · Data Consistency · Distributed Systems
15 min read
IT Architects Alliance
Jan 8, 2025 · Backend Development

Mastering High Concurrency in Distributed Systems: Strategies & Real-World Cases

This article explores the challenges of handling massive simultaneous requests in distributed architectures and presents practical solutions such as load balancing, distributed caching, asynchronous processing, and sharding, illustrated with case studies from major e‑commerce and social platforms.

Backend Architecture · Distributed Systems · asynchronous processing
20 min read
vivo Internet Technology
Jan 8, 2025 · Cloud Native

vivo Internet Technology 2024 Year-End Technical Review

In its 2024 year‑end review, vivo Internet Technology published 44 technical articles, delivered 19 presentations on cloud‑native, AI, security and more, joined major open‑source foundations such as the Linux Foundation, CNCF and CCF, and grew a community of over 70,000 technical professionals.

Distributed Systems · Java · Performance Optimization
8 min read
IT Architects Alliance
Jan 7, 2025 · Industry Insights

Why Multi-Active Architecture Matters and How to Build It

The article explains why multi‑active (active‑active) architecture is essential for modern enterprises, outlines its evolution from single‑server setups, details core principles like redundancy and data synchronization, compares common deployment patterns, examines industry use cases, and discusses challenges and mitigation strategies.

Data Consistency · Distributed Systems · cloud computing
21 min read
Su San Talks Tech
Jan 7, 2025 · Databases

7 Common Pitfalls of Database Sharding and How to Solve Them

This article examines seven typical challenges that arise after implementing database sharding—such as global ID conflicts, cross‑shard queries, distributed transactions, shard‑key design, data migration, pagination, and operational overhead—and provides practical solutions and code examples to address each issue.

Distributed Systems · ID generation · database scaling
12 min read
IT Architects Alliance
Jan 6, 2025 · Fundamentals

Mastering the CAP Theorem: Balancing Consistency, Availability, and Partition Tolerance

This in‑depth guide explains the CAP theorem’s three pillars (Consistency, Availability, and Partition Tolerance), illustrates why at most two of them can be guaranteed at the same time, and shows real‑world trade‑offs across e‑commerce, finance, and social platforms, while introducing the complementary BASE model for practical system design.

Availability · BASE model · CAP theorem
15 min read
Architect
Jan 3, 2025 · Operations

Designing Multi‑Active Distributed Systems: Overcoming Write Latency and Data Replication Challenges

This article analyzes the architectural impact of cross‑city multi‑active deployments, focusing on data‑layer design, write latency, sharding strategies, replication topologies, and routing considerations to achieve high availability, performance, and scalability in large‑scale distributed systems.

Distributed Systems · data replication · multi-active architecture
22 min read
Architecture Digest
Jan 3, 2025 · Operations

Designing High‑Availability Architecture with Rate Limiting, Circuit Breaking, and Degradation Strategies

This article explains how to build a highly available distributed e‑commerce system by using load‑balancing, redundant servers, rate‑limiting techniques, circuit‑breaker patterns, and graceful degradation methods, and provides concrete Spring Cloud and Java code examples for each strategy.

Backend · Circuit Breaking · Distributed Systems
18 min read
dbaplus Community
Jan 1, 2025 · Backend Development

Mastering Multi-Active Data Architecture: Reducing Write Latency and Ensuring High Availability

This article examines the challenges of building multi‑active distributed systems, focusing on the data layer’s role in high availability, write‑latency, sharding, isolation, replication strategies, and routing decisions, and provides concrete architectural patterns and practical guidelines for robust backend design.

Distributed Systems · Latency · data replication
23 min read
BirdNest Tech Talk
Dec 29, 2024 · Fundamentals

Unlocking Distributed System Design: 20 Core Patterns Explained

This article distills the key design patterns behind distributed systems—covering replication, partitioning, consensus, and fault‑tolerance—by presenting each pattern’s problem statement, concrete solution, trade‑offs, and technical considerations, all illustrated with real‑world examples from projects like Kafka and Cassandra.

Consensus · Design Patterns · Distributed Systems
18 min read
macrozheng
Dec 28, 2024 · Operations

What Makes China’s 12306 Railway Ticketing System So Resilient?

The article examines China’s 12306 railway ticketing platform, tracing its evolution from early Unix‑based reservation software to a massive, real‑time, three‑tier distributed system that handles billions of requests during peak travel periods, highlighting its architectural challenges, high‑concurrency solutions, and unique national centralization.

China · Distributed Systems · high concurrency
9 min read
Selected Java Interview Questions
Dec 24, 2024 · Backend Development

Design and Implementation of a Custom Distributed Job Scheduling Framework (k‑job)

This article introduces the motivation, architecture, technology choices, and key implementation details of a lightweight, highly extensible distributed job scheduling framework built on gRPC, Protobuf, a custom name‑server, and a bespoke message‑queue, addressing limitations of existing solutions like Quartz, XXL‑Job, and PowerJob.

Distributed Systems · Java · Job Scheduling
14 min read
JavaEdge
Dec 23, 2024 · Backend Development

How Meta Achieves Near‑Perfect Cache Consistency: Lessons from Polaris

This article explains why cache consistency is critical for Meta, how the company measures and monitors consistency, the design of the Polaris system that detects and resolves stale cache entries, and provides a concrete Python‑style example illustrating the challenges and solutions.

Consistency · Distributed Systems · Meta
14 min read
Architecture Digest
Dec 22, 2024 · Backend Development

Technical Overview and History of China’s 12306 Railway Ticketing System

The article provides a detailed, informal yet informative overview of the evolution, architecture, and massive scale challenges of China’s 12306 railway ticketing platform, tracing its roots from early Unix‑based systems to modern distributed backend solutions handling billions of requests during peak travel periods.

Backend Architecture · China · Distributed Systems
9 min read
Xiaohongshu Tech REDtech
Dec 19, 2024 · Databases

Data Consistency Verification Practices and Implementation at Xiaohongshu

Xiaohongshu built a lock‑free, non‑disruptive data‑consistency verification tool that automatically selects optimal methods, handles heterogeneous sources and dynamic changes, performs full and incremental checks via chunked checksums or row‑by‑row comparison, quickly isolates mismatches, and supports automatic remediation, ensuring reliable migrations and sharding.

Data Consistency · Distributed Systems · data validation
16 min read
Java Tech Enthusiast
Dec 17, 2024 · Databases

DBOS – Database‑Oriented Operating System

DBOS, a Database‑Oriented Operating System proposed by Matei Zaharia and Michael Stonebraker, builds the OS atop a distributed, ACID‑compliant database, storing all system and application state in tables, which simplifies scaling, ensures strong consistency, improves debugging, and reduces attack surface for cloud‑native workloads.

DBOS · Distributed Systems · Operating System
8 min read
MaGe Linux Operations
Dec 14, 2024 · Big Data

Master Kafka: From Core Concepts to Real-World Deployment

This comprehensive guide explains Kafka’s architecture, core APIs, topics and partitions, deployment steps, multi‑broker clustering, and practical use cases such as messaging, log aggregation, stream processing, and data import/export with Kafka Connect, providing a hands‑on tutorial for developers and engineers.

Distributed Systems · Installation · Kafka
30 min read
Tencent Cloud Developer
Dec 12, 2024 · Backend Development

Common Rate Limiting Algorithms: Fixed Window, Sliding Window, Sliding Log, Leaky Bucket, and Token Bucket

The article examines five common rate‑limiting algorithms—Fixed Window, Sliding Window, Sliding Log, Leaky Bucket, and Token Bucket—detailing their principles, pros and cons, and providing complete C++ implementations to help developers choose the best approach for controlling traffic bursts and ensuring system stability.

Backend · C++ · Distributed Systems
14 min read
JD Tech Talk
Dec 11, 2024 · Backend Development

Analysis of Message Queue Disorder Issues and Practical Solutions

This article examines the root causes of message queue disorder in distributed systems, illustrates real‑world impacts such as data loss during migration, and presents concrete mitigation strategies including ordered messaging, pre‑processing checks, state‑machine handling, and monitoring to improve system reliability.

Distributed Systems · Message Queue · Reliability
9 min read
FunTester
Dec 5, 2024 · Backend Development

Understanding Aeron: A High‑Performance Messaging Framework and Its Advantages

Aeron is an open‑source, low‑latency, high‑throughput messaging framework that leverages zero‑copy memory, shared‑memory IPC and UDP transport to deliver microsecond‑level latency for finance, gaming, and distributed systems, offering a simple API and powerful performance features.

Aeron · Distributed Systems · High-Performance Messaging
9 min read
Architecture & Thinking
Dec 5, 2024 · Backend Development

Understanding Apache RocketMQ: Domain Model, Communication & Message Patterns

This article explains Apache RocketMQ's core components—including producers, topics, queues, and consumer groups—covers synchronous RPC versus asynchronous messaging, compares point‑to‑point and publish‑subscribe transmission models, and highlights their suitable scenarios and trade‑offs.

Backend Development · Distributed Systems · Message Queue
9 min read
Sanyou's Java Diary
Dec 2, 2024 · Big Data

Understanding Kafka: Core Architecture, Storage, and Reliability Explained

This article provides a comprehensive overview of Kafka, covering its overall structure, key components such as brokers, producers, consumers, topics, partitions, replicas, leader‑follower mechanics, logical and physical storage models, producer and consumer workflows, configuration parameters, partition assignment strategies, rebalancing, log retention and compaction, indexing, zero‑copy transmission, and the reliability concepts that ensure data durability.

Data Streaming · Distributed Systems · Kafka
18 min read
BirdNest Tech Talk
Dec 1, 2024 · Fundamentals

How to Exchange RDMA Connection Parameters: Methods, Pros, and Pitfalls

Establishing an RDMA connection requires exchanging key parameters such as LID, QP number, and memory keys, and this article systematically outlines the essential information, compares six exchange methods—from static configuration to distributed services—and evaluates their advantages, drawbacks, and suitable scenarios.

Distributed Systems · InfiniBand · Networking
7 min read
Architecture and Beyond
Nov 30, 2024 · Artificial Intelligence

Scalable Engineering Architecture for AIGC Products: Principles, Design, and Implementation

This article examines why scalability is a core requirement for AIGC products and presents a comprehensive engineering architecture—including modular design, distributed systems, resource scheduling, queue management, and layered architecture—to achieve high performance, cost efficiency, and long‑term maintainability.

AIGC · Distributed Systems · Scalability
20 min read
Lobster Programming
Nov 28, 2024 · Fundamentals

How Paxos Guarantees Strong Consistency in Distributed Systems

This article explains the Paxos consensus algorithm, detailing its roles (proposer, acceptor, learner), the two-phase prepare and accept process, handling of proposal numbers, and how it ensures strong consistency across distributed nodes through examples and diagrams.

Distributed Systems · Paxos · algorithm
9 min read
Tencent Cloud Developer
Nov 27, 2024 · Databases

Analyzing the Write‑After‑Read Consistency Challenge in Multi‑Active Distributed Architectures

The article examines the write‑after‑read consistency problem in multi‑active cross‑region systems, compares single‑write‑single‑read routing, quorum‑based multi‑write‑multi‑read, and read‑while‑copy methods, explains why primary‑secondary replication is preferred, and proposes a four‑step framework (scenario flagging, data marking, latency evaluation, and nearby asynchronous replication) to meet WAR requirements efficiently.

Consistency · Database Replication · Distributed Systems
12 min read
Architecture & Thinking
Nov 25, 2024 · Backend Development

Mastering RocketMQ: Core Concepts, Comparison, and Java Implementation

This comprehensive guide introduces RocketMQ's architecture, compares it with RabbitMQ and Kafka, outlines typical use cases, explains key concepts such as producers, brokers, consumers, topics, tags, and offsets, and provides complete Java code examples for building producers and consumers.

Backend Development · Distributed Systems · Java
14 min read
Top Architect
Nov 23, 2024 · Backend Development

Integrating Spring Boot with XXL-Job for Distributed Task Scheduling

This article explains how to integrate Spring Boot with the open‑source XXL‑Job distributed task scheduler, covering XXL‑Job fundamentals, configuration of the admin console and executor, Maven dependencies, property settings, code examples, @XxlJob annotation parameters, best practices, and includes additional promotional material.

Backend Development · Distributed Systems · Java
0 likes · 16 min read
DataFunSummit
Nov 22, 2024 · Artificial Intelligence

EasyRec Recommendation Algorithm Training and Inference Optimization

This article presents a comprehensive overview of EasyRec’s recommendation system architecture, detailing training and inference optimizations, embedding parallelism, CPU/GPU placement strategies, online learning pipelines, and network compression techniques that together improve scalability, latency, and cost efficiency.

Distributed Systems · EasyRec · Inference Optimization
0 likes · 15 min read
Zhuanzhuan Tech
Nov 20, 2024 · Backend Development

Design and Implementation of a High‑Performance Message Notification System

This article presents a comprehensive design of a high‑performance, fault‑tolerant message notification system, covering service partitioning, system architecture, idempotent processing, dynamic error detection, thread‑pool management, retry mechanisms, and stability measures such as traffic‑spike handling, resource isolation, third‑party protection, monitoring, and active‑active deployment.

Backend Architecture · Distributed Systems · Java
0 likes · 16 min read
Top Architect
Nov 20, 2024 · Big Data

Understanding Distributed Systems and Kafka: Architecture, Message Ordering, and Java Consumer Practices

This article explains the fundamentals of distributed systems, introduces Apache Kafka's architecture and components, discusses how Kafka ensures ordered message consumption, and provides Java consumer configuration tips to maintain message order, offering practical guidance for backend developers working with streaming data.

Big Data · Distributed Systems · Java
0 likes · 11 min read
JavaEdge
Nov 16, 2024 · Backend Development

How Netflix Built a Low‑Latency Distributed Counter Service at Scale

This article explains Netflix's distributed counter abstraction built on their time‑series service, detailing use cases, API design, counter types, implementation methods, control‑plane configuration, performance results, and future work to achieve near‑real‑time, low‑latency counting at massive scale.

Backend Architecture · Distributed Systems · Low latency
0 likes · 25 min read
Architecture & Thinking
Nov 15, 2024 · Databases

How Baidu’s TDE‑ClickHouse Delivers Sub‑Second Analytics on Billion‑Row Datasets

This article explains how Baidu’s TDE‑ClickHouse, as a core engine of the Turing 3.0 ecosystem, overcomes platform fragmentation, quality issues, and usability challenges through the OneData+ development paradigm, multi‑level aggregation, projection, query‑caching, bulk‑load ingestion, and a cloud‑native architecture to achieve sub‑second query response for massive data volumes.

Big Data · ClickHouse · Cloud Native
0 likes · 22 min read
Volcano Engine Developer Services
Nov 14, 2024 · Cloud Computing

Why Edge Cloud Is the Next Frontier: Trends, Challenges, and Solutions

This article examines the evolution of edge cloud from its early CDN roots to modern edge-native operating systems, outlines the business drivers and technical challenges such as massive node management, lightweight constraints, weak network environments, and multi‑shape compute needs, and presents the architecture, key components, and future directions of edge cloud solutions.

Distributed Systems · Edge Computing · Resource Management
0 likes · 22 min read
Cognitive Technology Team
Nov 14, 2024 · Operations

Designing Self‑Healing Applications for Fault Tolerance in Distributed Systems

To ensure distributed applications can recover automatically from hardware, network, or service failures, this guide outlines three core capabilities—fault detection, graceful handling, and monitoring—plus practical strategies such as asynchronous component separation, retries, circuit breakers, isolation, load shedding, failover, compensation, checkpointing, graceful degradation, rate limiting, leader election, fault injection, chaos engineering, and use of availability zones.

Cloud Native · Distributed Systems · Operations
0 likes · 7 min read
DeWu Technology
Nov 13, 2024 · Backend Development

Evolution of Rainbow Bridge Architecture: Building a Self‑Managed Metadata Center and SDK Enhancements

The new Rainbow Bridge architecture replaces the SLB‑based load‑balancing model with a self‑managed, multi‑AZ metadata center and enhanced SDK that aggregates node health, provides zone‑aware weighted routing, supports rapid failover and manual overrides, and delivers faster recovery and scalable traffic handling.

Distributed Systems · load balancing · metadata
0 likes · 11 min read
Baidu Tech Salon
Nov 8, 2024 · Cloud Computing

Design and Evolution of Baidu Canghai Storage Unified Technology Stack

Baidu Canghai Storage’s unified technology stack—comprising a meta‑aware distributed metadata layer, a hybrid single‑node‑distributed namespace, and an online erasure‑coding data layer—delivers AI‑driven, high‑performance, low‑cost, ZB‑scale cloud storage by modularizing metadata, namespace, and data services for object, file, and block workloads.

Baidu · Distributed Systems · Microservices
0 likes · 16 min read
Huolala Tech
Nov 8, 2024 · Backend Development

How Huolala Built a Scalable Real‑Time Reconciliation Platform for Millions of Daily Transactions

Huolala’s real‑time reconciliation platform tackles massive daily transaction volumes by addressing distributed system consistency, high‑throughput data ingestion, dynamic cluster scaling, and security safeguards, enabling sub‑second settlement verification across hundreds of services.

Backend Architecture · Data Consistency · Distributed Systems
0 likes · 10 min read
58 Tech
Nov 8, 2024 · Operations

Design and Optimization of an App Operation Platform: Ensuring High Availability, Performance, and Scalability

This article details the architecture, challenges, and optimization techniques of an app operation platform, covering its dual-engine design, caching strategies, and high‑availability principles that reduce response time to under 4 ms while supporting massive concurrent traffic.

App Operations · Distributed Systems · Performance Optimization
0 likes · 7 min read
Tencent Cloud Developer
Nov 7, 2024 · Backend Development

Cache Consistency Strategies and Best Practices for the Cache‑Aside Pattern

The article explains cache‑aside consistency challenges and compares four update strategies—DB‑then‑cache, cache‑then‑DB, DB‑then‑delete, and delete‑then‑DB—showing that deleting the cache after a successful DB write offers the smallest inconsistency window, while recommending TTLs, message‑queue invalidation, and multi‑key coordination for robust eventual consistency.

Cache Consistency · Distributed Systems · cache-aside
0 likes · 20 min read
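The delete-after-write strategy named in the summary above can be sketched as follows. This is only an illustration under stated assumptions: a plain dictionary stands in for the database, and `TTLCache` and `CacheAsideStore` are hypothetical helpers rather than code from the article or from any real cache client.

```python
import time


class TTLCache:
    """In-memory stand-in for a cache such as Redis (hypothetical helper)."""

    def __init__(self):
        self._data = {}  # key -> (value, expiry timestamp)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            # Expired entries behave like misses, which bounds staleness.
            del self._data[key]
            return None
        return value

    def set(self, key, value, ttl):
        self._data[key] = (value, time.monotonic() + ttl)

    def delete(self, key):
        self._data.pop(key, None)


class CacheAsideStore:
    """Cache-aside reads plus 'write the DB, then delete the cache' updates."""

    def __init__(self, db, cache, ttl=30.0):
        self.db = db        # plain dict standing in for the database
        self.cache = cache
        self.ttl = ttl

    def read(self, key):
        value = self.cache.get(key)
        if value is not None:
            return value                      # cache hit
        value = self.db.get(key)              # cache miss: load from the DB
        if value is not None:
            self.cache.set(key, value, self.ttl)
        return value

    def write(self, key, value):
        self.db[key] = value                  # 1. commit to the DB first
        self.cache.delete(key)                # 2. then invalidate the cached copy


store = CacheAsideStore({"u1": "alice"}, TTLCache())
store.read("u1")          # populates the cache
store.write("u1", "bob")  # DB updated, cache entry removed
print(store.read("u1"))   # reloads the fresh value from the DB
```

The TTL acts as a safety net: if the delete step is lost, the stale entry still expires on its own.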
Baidu Geek Talk
Nov 6, 2024 · Cloud Computing

Baidu Canghai Storage Unified Technology Base: Architecture and Evolution of Metadata, Namespace, and Data Layers

Baidu’s Canghai Storage unifies metadata, hierarchical namespace, and data layers into a Meta‑Aware, three‑generation architecture that scales to trillions of metadata items and zettabyte‑scale data, using a distributed transactional KV store, single‑machine‑distributed namespace, and online erasure‑coding micro‑services to deliver high performance, low cost, and seamless scalability.

Big Data · Distributed Systems · NewSQL
0 likes · 18 min read
Architect
Nov 3, 2024 · Backend Development

How Ctrip Scaled Its Ticket Booking System for Flash‑Sale Events

This article analyzes the challenges Ctrip faced when handling massive traffic during ticket flash‑sale events and details the architectural upgrades, caching strategies, database optimizations, supplier integration safeguards, and traffic‑control mechanisms that enabled stable, fast, and consistent booking experiences.

Backend · Distributed Systems · System Architecture
0 likes · 18 min read
Architect
Oct 31, 2024 · Cloud Native

Designing a Resilient Stateful Distributed System for Cloud‑Native Environments

This article analyzes the motivations, models, and design considerations for building stateful distributed architectures—covering microservices, service discovery, access‑layer isolation, fault tolerance, scaling, and deployment strategies—to help architects create reliable, low‑latency cloud‑native systems.

Cloud Native · Distributed Systems · Microservices
0 likes · 33 min read
JD Retail Technology
Oct 31, 2024 · Big Data

JDQ Kafka Bandwidth Throttling Architecture and Optimization

This article presents an in‑depth analysis of Kafka's native throttling mechanisms, identifies their limitations in large‑scale e‑commerce scenarios, and introduces JDQ's multi‑dimensional, dynamic throttling architecture that ensures stable throughput and priority‑aware bandwidth management across broker failures and traffic spikes.

Distributed Systems · JDQ · Kafka
0 likes · 17 min read
Tencent Cloud Developer
Oct 31, 2024 · Backend Development

Monolith vs Microservices: Evolution of Architecture and How to Choose

The article traces software architecture from early distributed systems through monoliths, SOA, microservices and serverless, explaining why each paradigm arose, the trade‑offs they entail, and how to decide between monolith and microservices based on team size, expertise, organizational structure, and business needs.

Distributed Systems · Microservices · Software Architecture
0 likes · 25 min read
Architect
Oct 30, 2024 · Backend Development

How to Build Distributed WebSocket Messaging with Spring, Redis, and Kafka

This article explains how to enable cross‑node WebSocket communication in a distributed Spring application by using a message queue (Redis or Kafka) to broadcast messages, tracking user connections with Redis, and providing a complete step‑by‑step implementation with code samples and configuration details.

Distributed Systems · Java · Kafka
0 likes · 20 min read
Baidu Geek Talk
Oct 30, 2024 · Cloud Computing

Baidu Cloud Infrastructure for AI-Native Era

Baidu Intelligent Cloud outlines how its evolving, high-performance infrastructure—featuring rapid 3-minute instance provisioning, over 200 GB bandwidth, elastic computing, specialized storage, and AI-driven MLOps tools—enables AI-native model training and deployment across booming sectors such as automotive and finance, supporting the industry’s shift to AI-centric cloud services.

Case Studies · Distributed Systems · MLOps
0 likes · 9 min read
Tencent Cloud Middleware
Oct 30, 2024 · Backend Development

How Kafka Guarantees High Reliability and Performance – A Deep Technical Dive

This article thoroughly examines Apache Kafka’s architecture, covering its macro components, ack strategies, replication mechanisms, high‑watermark handling, leader election, and performance optimizations such as batch sending, compression, PageCache, zero‑copy, mmap and sendfile, while also explaining common pitfalls like data loss and log corruption.

Distributed Systems · Kafka · Message Queue
0 likes · 31 min read
Tencent Cloud Developer
Oct 22, 2024 · Industry Insights

Designing Stateful Distributed Systems: Core Principles and Architecture Patterns

This article analyzes the motivations, benefits, and challenges of building stateful distributed systems, compares monolithic, SOA, and microservice models, and provides detailed guidance on access layers, service discovery, fault tolerance, scaling, and data storage for cloud‑native architectures.

Cloud Native · Distributed Systems · Microservices
0 likes · 29 min read
JavaEdge
Oct 21, 2024 · Operations

Why Move Beyond Microservices? Unlocking Resilience with Unitized Architecture

This article explores the advantages of unitized architecture over traditional microservices, detailing how its modular design, dedicated routing layer, and tailored observability practices enhance system resilience, fault‑tolerance, and operational insight for large‑scale distributed applications.

Distributed Systems · Resilience · fault tolerance
0 likes · 17 min read
Baidu Geek Talk
Oct 21, 2024 · Databases

TDE-ClickHouse Optimization Practice at Baidu MEG: Query Performance, Data Import, and Distributed Architecture

Baidu MEG’s TDE‑ClickHouse optimization in the Turing 3.0 ecosystem boosts query speed up to 10×, halves latency, enables billion‑row bulk imports in under two hours, and migrates to a cloud‑native, ZooKeeper‑free architecture supporting 350 k CPU cores, 10 PB storage, and sub‑3‑second responses for 150 k daily BI queries.

Baidu MEG · ClickHouse · Cloud Native
0 likes · 19 min read
Architect
Oct 17, 2024 · Operations

Designing Multi‑Active Distributed Systems: Key Factors and Replication Strategies

This article analyzes the architectural challenges of building large‑scale distributed systems with multi‑active (cross‑city) capabilities, focusing on data‑layer design, write latency, replication models, sharding techniques, and routing impacts to guide reliable, high‑performance infrastructure decisions.

Distributed Systems · architecture · data replication
0 likes · 22 min read
Tencent Cloud Developer
Oct 15, 2024 · Industry Insights

Why Write Latency Drives Multi‑Active Distributed Architecture Design

This article analyzes how write latency, write volume, isolation, and data replication strategies influence the design of multi‑active distributed systems, offering practical guidance on sharding, synchronous and asynchronous replication, routing, and architecture selection for high availability and performance across regions.

Distributed Systems · data replication · high availability
0 likes · 23 min read
Qunar Tech Salon
Oct 10, 2024 · Operations

Design and Architecture of a Distributed Task Scheduling System for Database Automation

This document outlines the terminology, background, requirements, task classifications, state model, and detailed architecture—including TaskScheduler, TaskWorker, and TaskConsole components—of a new distributed task scheduling system designed to replace Celery in a database automation platform, with emphasis on scalability, reliability, and extensibility.

Distributed Systems · Locks · Operations
0 likes · 23 min read
MaGe Linux Operations
Oct 7, 2024 · Operations

Why Choose RocketMQ? Features, Comparisons, and Reliability Explained

This article provides a comprehensive overview of RocketMQ, covering its architecture, key features such as high reliability, low latency and high throughput, comparisons with Kafka, RabbitMQ and ActiveMQ, and detailed mechanisms that ensure message durability, performance, and ordered consumption.

Distributed Systems · Low latency · Message Queue
0 likes · 12 min read
Su San Talks Tech
Oct 5, 2024 · Backend Development

Mastering Idempotency: Design Patterns and Code Solutions for Reliable APIs

Idempotency ensures that repeating an API call produces the same result as making it once, with no additional side effects. This guide explains its principles, common scenarios such as payments and messaging, root causes of idempotency failures, and multiple implementation strategies (unique constraints, optimistic and pessimistic locks, distributed locks, token mechanisms, state machines, and deduplication tables) with practical code examples.

Backend · Distributed Systems · Idempotency
0 likes · 14 min read
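Of the strategies listed in the summary above, the deduplication table is the simplest to show. The sketch below assumes a single process, with a dictionary standing in for the deduplication table and a `threading.Lock` standing in for the unique constraint or distributed lock a real deployment would need. `PaymentService`, `pay`, and `request_id` are hypothetical names, not taken from the article.

```python
import threading


class PaymentService:
    """Charges an account at most once per client-supplied request ID."""

    def __init__(self, balance):
        self.balance = balance
        self._results = {}             # request_id -> stored result (dedup table)
        self._lock = threading.Lock()  # stand-in for a unique constraint / distributed lock

    def pay(self, request_id, amount):
        with self._lock:
            # A retry with a known request ID returns the stored result
            # instead of charging the account again.
            if request_id in self._results:
                return self._results[request_id]
            if amount > self.balance:
                raise ValueError("insufficient balance")
            self.balance -= amount
            result = {"request_id": request_id, "charged": amount,
                      "balance": self.balance}
            self._results[request_id] = result
            return result


service = PaymentService(balance=100)
first = service.pay("req-1", 30)
retry = service.pay("req-1", 30)  # e.g. the client timed out and retried
print(first == retry, service.balance)
```

Recording the result and applying the charge under the same lock is what makes the retry safe. In a database-backed version, both writes would typically share one transaction.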
dbaplus Community
Oct 3, 2024 · Operations

How Netflix Uses Chaos Engineering to Build Resilient Distributed Systems

This article explains Netflix's chaos engineering practice, detailing the challenges of microservice reliability, the implementation of the Chaos Monkey tool, the step‑by‑step methodology, guiding principles, and real‑world outcomes that demonstrate improved system availability.

Chaos Monkey · Distributed Systems · Netflix
0 likes · 6 min read
Open Source Tech Hub
Oct 1, 2024 · Backend Development

Build a Distributed Casbin Watcher with Workerman Redis in PHP

This guide explains how to implement a Casbin Watcher for distributed policy synchronization using Workerman's asynchronous Redis client in PHP, covering the underlying principles, required interfaces, code implementation, and a complete usage example with publish‑subscribe messaging.

Casbin · Distributed Systems · PHP
0 likes · 7 min read
IT Services Circle
Sep 27, 2024 · Operations

Analysis of the Shanghai Stock Exchange Outage and System Design Lessons

The article recounts the Shanghai Stock Exchange’s sudden P0 outage that halted trading, analyzes the causes such as massive order volume and system bottlenecks, and discusses how distributed architectures and message‑queue based queuing can mitigate similar high‑concurrency failures.

Distributed Systems · Operations · high concurrency
0 likes · 6 min read
AntData
Sep 26, 2024 · Databases

Apache HoraeDB (CeresDB): An Open‑Source Distributed Time‑Series Database

Apache HoraeDB (CeresDB) is an open‑source, distributed, high‑availability time‑series database developed by Ant Group, supporting multi‑dimensional queries, compatible with Prometheus and OpenTSDB, and offering SQL and OLAP capabilities for use cases such as APM, IoT monitoring, financial analytics, and AI‑infra observability.

Distributed Systems · Observability · SQL
0 likes · 5 min read
Java Architecture Stack
Sep 26, 2024 · Backend Development

Deep Dive into SOFAJRaft: How Java Implements Multi‑Raft Consensus

This article examines the core implementation of SOFAJRaft, a high‑performance Java library based on the Raft consensus algorithm, covering node initialization, leader election, log replication, snapshot handling, fault recovery, and multi‑Raft‑Group support with detailed code examples.

Consensus · Distributed Systems · Java
0 likes · 13 min read
Baidu Tech Salon
Sep 25, 2024 · Backend Development

Innovative Solutions for Reducing Result Inconsistency in Baidu Search System

The paper introduces a production‑grade framework that uses tiny controlled traffic, feature‑flattening experiments, dynamic debugging, and an automated inspection flywheel to measure each component’s contribution to Baidu’s search result diff‑rate, isolate root causes, and dramatically reduce inconsistency without impacting real users.

Debugging · Distributed Systems · data flattening
0 likes · 13 min read
Baidu Geek Talk
Sep 25, 2024 · Industry Insights

How Baidu Eliminated Search Result Inconsistencies with Data‑Flattening Experiments

Baidu tackled the challenge of search result inconsistency by quantifying diff rates, designing a data‑flattening technique, leveraging fake traffic and dynamic debugging, orchestrating large‑scale experiments, and automating inspection, ultimately identifying all contributing features and dramatically reducing result volatility.

Baidu · Distributed Systems · data flattening
0 likes · 15 min read
Architect
Sep 24, 2024 · Industry Insights

How Bilibili Re‑engineered Its Search Indexing Pipeline for Hour‑Level Turnaround

This article details Bilibili's transformation of its search offline indexing architecture—from a manual, low‑throughput MySQL‑centric process to a distributed, KV‑based, protobuf‑driven pipeline that leverages Taishan storage and Spark, cutting build cycles from days to hours while solving performance, consistency, and maintenance challenges.

Big Data · Distributed Systems · Protobuf
0 likes · 24 min read
Su San Talks Tech
Sep 22, 2024 · Backend Development

Mastering Rate Limiting: From Fixed Windows to Redis Distributed Solutions

This article explains why rate limiting is essential for microservice stability, introduces basic concepts like thresholds and rejection strategies, and walks through multiple algorithms—including fixed‑window, sliding‑window, sliding‑log, leaky‑bucket, token‑bucket—and their Java implementations as well as Redis‑based distributed approaches, complete with code samples and performance considerations.

Backend Development · Distributed Systems · Java
0 likes · 25 min read
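As a rough illustration of one algorithm from the summary above, here is a single-process token bucket. It is written in Python rather than the Java the article uses, and the class name, parameters, and injectable `clock` are assumptions made for this sketch. A distributed version would keep the bucket state in a shared store such as Redis.

```python
import time


class TokenBucket:
    """Token-bucket limiter: refills at `rate` tokens/second up to `capacity`."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock              # injectable so tests can control time
        self.tokens = float(capacity)   # start with a full bucket
        self.last = clock()

    def allow(self, cost=1.0):
        now = self.clock()
        elapsed = now - self.last
        self.last = now
        # Refill lazily based on elapsed time, capped at the bucket capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        if self.tokens >= cost:
            self.tokens -= cost
            return True                 # request admitted
        return False                    # threshold exceeded: reject


bucket = TokenBucket(rate=5, capacity=10)
admitted = sum(bucket.allow() for _ in range(20))
print(admitted)  # a burst is admitted up to roughly the bucket capacity
```

Unlike a fixed window, the bucket tolerates short bursts up to `capacity` while holding the long-run average to `rate`.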
Java Tech Enthusiast
Sep 20, 2024 · Backend Development

What Is RPC and Why It Is Not a Protocol

The article clarifies that RPC (Remote Procedure Call) is a mechanism—not a protocol—used to abstract network communication so remote methods can be invoked like local calls, illustrating its design with LPC, dynamic proxies, request handlers, and showing HTTP as one possible implementation.

Backend Development · Distributed Systems · IPC
0 likes · 6 min read
Deepin Linux
Sep 19, 2024 · Backend Development

Comprehensive Guide to gRPC: Concepts, C++ Implementation, and Real‑World Use Cases

This article explains the limitations of traditional RPC, introduces gRPC and Protocol Buffers, details their architecture and performance advantages, provides step‑by‑step C++ server and client code, and discusses practical scenarios such as microservices, real‑time data processing, and a file‑storage service example.

C++ · Distributed Systems · Microservices
0 likes · 29 min read
FunTester
Sep 18, 2024 · Operations

Overview and Practice of Chaos Engineering

This article introduces chaos engineering, the practice of injecting controlled failures to test system resilience, and covers its history, practical benefits, experiment design, and a comparison of popular open‑source and commercial tools for improving reliability in distributed and cloud‑native environments.

Distributed Systems · Reliability
0 likes · 13 min read
macrozheng
Sep 12, 2024 · Backend Development

How to Design Scalable, Unique Order Numbers for High‑Traffic Systems

This article examines common order‑number generation rules and compares four practical solutions—UUID, database auto‑increment, Snowflake algorithm, and Redis INCR—providing code examples and best‑practice recommendations for building globally unique, fast‑producing identifiers in distributed backend systems.

Distributed Systems · order ID · unique identifier
0 likes · 12 min read
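The Snowflake option from the summary above can be sketched as below. The sketch assumes the commonly cited layout of a 41-bit millisecond timestamp, a 10-bit worker ID, and a 12-bit per-millisecond sequence. The custom epoch, the class name, and the injectable `clock_ms` are choices made for this example, not details from the article.

```python
import threading
import time


class SnowflakeGenerator:
    """Snowflake-style IDs: timestamp | worker ID | per-millisecond sequence."""

    EPOCH_MS = 1_704_067_200_000  # 2024-01-01T00:00:00Z, an arbitrary custom epoch
    WORKER_BITS = 10
    SEQUENCE_BITS = 12
    MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1

    def __init__(self, worker_id, clock_ms=None):
        if not 0 <= worker_id < (1 << self.WORKER_BITS):
            raise ValueError("worker_id out of range")
        self.worker_id = worker_id
        self._clock_ms = clock_ms or (lambda: int(time.time() * 1000))
        self._lock = threading.Lock()
        self._last_ms = -1
        self._sequence = 0

    def next_id(self):
        with self._lock:
            now = self._clock_ms()
            if now < self._last_ms:
                # Reusing an earlier timestamp could produce duplicate IDs.
                raise RuntimeError("clock moved backwards")
            if now == self._last_ms:
                self._sequence = (self._sequence + 1) & self.MAX_SEQUENCE
                if self._sequence == 0:
                    # Sequence exhausted for this millisecond: wait for the next one.
                    while now <= self._last_ms:
                        now = self._clock_ms()
            else:
                self._sequence = 0
            self._last_ms = now
            timestamp = now - self.EPOCH_MS
            return ((timestamp << (self.WORKER_BITS + self.SEQUENCE_BITS))
                    | (self.worker_id << self.SEQUENCE_BITS)
                    | self._sequence)


generator = SnowflakeGenerator(worker_id=7)
print(generator.next_id())
```

Each worker needs a distinct `worker_id`, since uniqueness across nodes rests entirely on that field. How IDs are assigned to workers is left out of this sketch.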
Alibaba Cloud Developer
Sep 11, 2024 · Backend Development

How a Two‑Level Cache Boosted High‑Concurrency Container Performance

By redesigning the caching layer with a two‑level architecture combining local and distributed caches, the author dramatically reduced CPU usage, lowered response times, and increased system capacity under high QPS workloads, while evaluating trade‑offs of various cache strategies, pre‑warming, refresh mechanisms, and operational considerations.

Distributed Systems · Performance Optimization · caching
0 likes · 11 min read
MaGe Linux Operations
Sep 10, 2024 · Backend Development

How Kafka Elects Leaders and Distributes Partitions: A Deep Dive

This article explains Kafka's leader election process, partition assignment strategy, distribution policies, file layout, and the evolution of consumer offset storage, providing a comprehensive overview of how Kafka ensures reliable and efficient message handling in a distributed environment.

Distributed Systems · Kafka · Partition Assignment
0 likes · 5 min read