Tagged articles
1275 articles
Su San Talks Tech
May 19, 2026 · Interview Experience

Designing a Hundred‑Billion‑Scale Message Queue: A ByteDance Interview Walkthrough

This article walks through the interview question of designing a message queue that handles billions of messages daily and peaks at millions of QPS, covering traffic calculations, core roles, storage and throughput techniques, scalability, high availability, observability, framework comparisons, a real‑world case study, and key follow‑up interview topics.

High Throughput · Kafka · Message Queue
12 min read

Data Party THU
May 1, 2026 · Artificial Intelligence

Scaling Large-Scale Agent Networks: A Review of Topology, Memory, and Updates

This review examines why some large‑scale multi‑agent systems remain stable while others falter, introducing a three‑dimensional taxonomy—topology, memory scope, and update behavior—to explain scalability limits and highlighting world‑model inconsistency as a deeper bottleneck than communication protocols.

Memory · Scalability · dynamic updates
9 min read

Architecture & Thinking
Apr 30, 2026 · Cloud Native

How RocketMQ 5.0’s New Proxy Layer Enables Compute‑Storage Separation and Cloud‑Native Scaling

RocketMQ 5.0 splits the formerly monolithic Broker by adding a stateless Proxy layer, which decouples compute from storage and addresses scalability, multi‑protocol, and cloud‑native adaptation challenges. The article supports this with detailed architecture comparisons, Java code samples, and two real‑world case studies, in IoT and finance, that show significant performance and cost benefits.

Cloud Native · Compute-Storage Separation · Message Queue
20 min read

AI Waka
Apr 21, 2026 · Artificial Intelligence

Why Massive Prompts Fail and How Skills Transform AI Agents

The article explains how monolithic system prompts become costly, unreliable, and hard to maintain as AI agents grow, and demonstrates a modular Skill‑based architecture that loads knowledge on demand, improves scalability, debugging, and reuse.

AI · Agent · Prompt engineering
13 min read

DevOps Coach
Apr 20, 2026 · Operations

How Netflix Scaled Live Streaming Ops to 400+ Events a Year

This article chronicles Netflix's evolution from a single‑show‑per‑month live stream to a sophisticated, multi‑center operation handling over 400 live events annually, detailing the architectural shifts, role specializations, event‑tiering system, and automation that enabled massive scale and reliability.

Broadcast Engineering · Event Tiering · Live Command Center
21 min read

Data Party THU
Apr 19, 2026 · Artificial Intelligence

Mapping Large-Scale AI Agent Networks: A 3‑Dimensional Classification Framework

The article reviews recent growth in AI agent marketplaces and systems, introduces a three‑dimensional framework—topology, memory scope, and update behavior—to categorize large‑scale multi‑agent networks, and highlights world‑model inconsistency as the core scalability bottleneck.

AI agents · Scalability · classification framework
8 min read

JD Tech
Apr 16, 2026 · Industry Insights

How JD Revolutionized Coupon Search with a Stream‑Batch Unified Architecture

This article analyzes JD's end‑to‑end upgrade of its retail coupon search infrastructure, detailing the business drivers, data‑skew challenges, the shift from dual KV and batch pipelines to a unified stream‑batch model built on Apache Doris, and the resulting performance, resource and stability gains across multiple scenarios.

Apache Doris · Batch Processing · Coupon Search
12 min read

Ray's Galactic Tech
Apr 11, 2026 · Operations

Mastering Production‑Grade Kubernetes: From kubectl Basics to Scalable Cluster Management

This comprehensive guide walks you through turning simple kubectl commands into a robust, production‑ready Kubernetes platform by covering core architecture, scheduling, resource governance, high‑availability design, observability, security, GitOps workflows, and real‑world case studies for large‑scale deployments.

Kubernetes · Observability · Ops
52 min read

Ray's Galactic Tech
Apr 9, 2026 · Backend Development

From Demo to Production: Building a Secure, Scalable Text‑to‑SQL Service with Spring AI Alibaba

This article explains how to turn a simple Text‑to‑SQL demo into a production‑grade service by covering the underlying principles, layered architecture, risk‑control mechanisms, multi‑tenant security, high‑concurrency strategies, caching, observability, and deployment practices using Spring AI Alibaba.

Observability · Scalability · Security
40 min read

How Kafka Powers Scalable E‑commerce Order Processing with Go

This article walks through the challenges of a fast‑growing e‑commerce platform during peak sales, explains why Apache Kafka is the ideal asynchronous messaging backbone, and provides a complete Go implementation—including producers, consumers, best‑practice patterns, and real‑world use cases—to achieve high throughput, fault tolerance, and seamless scalability.

Distributed Systems · Message Queue · Sarama
14 min read

Ray's Galactic Tech
Apr 6, 2026 · Backend Development

Building a Production‑Ready Go RAG System: From Theory to Real‑World Deployment

This comprehensive guide explains why Go is ideal for Retrieval‑Augmented Generation, details the full RAG pipeline, presents production‑grade architecture, design patterns, code snippets, scaling strategies, multi‑tenant isolation, deployment best practices, observability, and common pitfalls for enterprise‑level implementations.

Observability · RAG · Scalability
32 min read

dbaplus Community
Mar 17, 2026 · Backend Development

18 Real-World System Case Studies That Reveal 90% of Software Engineering Challenges

This article examines eighteen concrete production systems—from URL shorteners and Amazon S3 to YouTube, Stripe, Slack, and ChatGPT—showing how their design choices illustrate core concepts such as sharding, caching, idempotency, real‑time messaging, and large‑scale engineering, providing a practical roadmap for software engineers.

Case Studies · Distributed Systems · Scalability
13 min read

Data STUDIO
Mar 3, 2026 · Backend Development

How to Build a Never‑Crashing, Scalable Python Backend

This article walks through practical techniques for designing a highly concurrent Python backend that stays stable under load, covering architecture planning, async programming, load balancing, database scaling, distributed tasks, caching, rate limiting, monitoring, and graceful shutdown.

FastAPI · Python · Scalability
20 min read

Fighter's World
Feb 28, 2026 · Industry Insights

How Giga Builds a Differentiated Edge in the Crowded AI Customer Service Market

Giga, an AI agent startup founded by IIT Kharagpur alumni, pivoted to AI customer service, leveraging a Python-as-Primitive architecture and the Atlas multi‑agent system to automate forward‑deployed engineer (FDE) work, achieve 98% resolution rates, and position itself against competitors through speed, complex‑scenario handling, and a reusable Skills library.

AI Customer Service · AI agents · Atlas
21 min read

Top Architect
Feb 23, 2026 · Backend Development

How Taobao Scaled: 14 Evolution Steps of a Massive Backend Architecture

This article walks through the step‑by‑step evolution of a large‑scale e‑commerce backend—from a single‑server setup to microservices, containerization, and cloud platforms—highlighting the technical challenges, key technologies, and design principles that enable millions of concurrent users.

Backend Architecture · Scalability · cloud computing
24 min read

AI Waka
Feb 22, 2026 · Industry Insights

Why Multi‑Agent AI Fails at Scale and How 12‑Factor Cloud‑Native Principles Save It

The article explains why naïve multi‑agent AI architectures collapse under load due to internal east‑west dependencies, and shows how applying 12‑Factor App and cloud‑native patterns—isolated workers, externalized state, short‑lived sessions, and strict orchestration—enable scalable, fault‑tolerant agentic systems.

12-factor · Cloud Native · Distributed Systems
17 min read

Baidu Geek Talk
Feb 9, 2026 · Databases

How Mantle Redefined Cloud Object Storage Metadata for Billion‑File Scale

This article recounts how Baidu's storage team tackled the performance and scalability limits of traditional object storage by redesigning metadata handling with the Mantle and MantleX architectures, introducing a centralized IndexNode, strong consistency, delta‑record writes, and a seamless single‑node to distributed transition for massive file systems.

Filesystem · Performance Optimization · Scalability
37 min read

dbaplus Community
Feb 3, 2026 · Backend Development

When Microservices Become a Trap: Risks, Costs, and When They Really Pay Off

This article explains why microservices, while attractive for large systems, introduce hidden costs, operational complexity, network latency, data management challenges, and testing difficulties, and provides a decision framework to determine when a monolith‑first approach is more appropriate.

Backend Architecture · Microservices · Scalability
15 min read

Network Intelligence Research Center (NIRC)
Feb 3, 2026 · Artificial Intelligence

INCS: A DRL‑Based Intent‑Driven Network‑Wide Configuration Synthesis Framework

The article presents INCS, a novel framework that combines graph neural networks and deep reinforcement learning to achieve protocol‑agnostic, millisecond‑level, globally optimized network configuration synthesis, addressing scalability, protocol dependence, and lack of optimization in traditional SMT‑based methods, and demonstrates its superior performance on large‑scale topologies.

DDPG · Graph Neural Network · Network Synthesis
8 min read

Huolala Tech
Jan 30, 2026 · Backend Development

How HuoLala Built a Scalable Todo Center to Handle Billions of Requests

To support HuoLala’s massive driver workflow, the team designed a platform‑wide Todo Center that standardizes tasks, optimizes performance, decouples services, and ensures strong and eventual consistency, while employing traffic‑shaping, asynchronous processing, and robust monitoring to sustain billions of daily queries with low latency.

Event-driven · Microservices · Scalability
14 min read

Raymond Ops
Jan 27, 2026 · Databases

Redis Sentinel vs Cluster: Which Architecture Wins for High‑Traffic Deployments?

This comprehensive guide compares Redis Sentinel and Redis Cluster, detailing their design philosophies, configuration examples, performance benchmarks, operational complexity, scalability, high‑availability features, and migration strategies, helping engineers choose the optimal solution for demanding production environments.

Cluster · Scalability · migration
36 min read

ITFLY8 Architecture Home
Jan 13, 2026 · Backend Development

Designing Scalable Comment Systems: From Nested Trees to Flat Floors

This article examines how to design a high‑performance comment system by comparing nested and flat ("floor"‑style) data models, evaluating the adjacency list, path enumeration, and closure table approaches, and outlining asynchronous‑write, cache‑first‑read strategies for millions of users.

Backend · Comment System · Database design
5 min read

DevOps Coach
Jan 10, 2026 · Operations

How to Scale Your Web App from 10K to Millions: 10 Essential Practices

This guide outlines ten practical steps—adding load balancers, horizontal scaling, stateless services, connection pooling, aggressive caching, read replicas, task queues, auto‑scaling, WebSocket gateways, and comprehensive monitoring—to reliably handle sudden traffic spikes and keep your application responsive and cost‑effective.

Auto Scaling · Scalability · caching
9 min read

Java Architect Handbook
Jan 9, 2026 · Databases

What Happens When MySQL AUTO_INCREMENT Runs Out? Prevention and Recovery Strategies

This article examines a common interview topic, the exhaustion of MySQL auto‑increment primary keys: it explains the underlying mechanism, outlines preventive design choices and monitoring, and covers emergency response options, best‑practice recommendations, and common pitfalls for robust database management.
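
As a rough illustration of the monitoring side, the sketch below computes how much of an integer key's range is already consumed. The limits are the standard MySQL integer ranges; the function names, the 80% alert threshold, and the sample counter value are assumptions made for illustration and are not taken from the article.

```python
# Largest value each MySQL integer type can hold, keyed by (type, unsigned).
MAX_VALUE = {
    ("INT", False): 2**31 - 1,      # 2,147,483,647
    ("INT", True): 2**32 - 1,       # 4,294,967,295
    ("BIGINT", False): 2**63 - 1,
    ("BIGINT", True): 2**64 - 1,
}


def auto_increment_usage(next_id: int, column_type: str = "INT", unsigned: bool = False) -> float:
    """Return the fraction of the key space already used (0.0 to 1.0)."""
    return next_id / MAX_VALUE[(column_type.upper(), unsigned)]


def should_alert(next_id: int, column_type: str = "INT", unsigned: bool = False,
                 threshold: float = 0.8) -> bool:
    """True once usage reaches the (hypothetical) alert threshold."""
    return auto_increment_usage(next_id, column_type, unsigned) >= threshold


# A signed INT counter at 1.8 billion is past 80% of its range,
# while the same counter on an unsigned INT column is below half.
print(should_alert(1_800_000_000))
print(should_alert(1_800_000_000, unsigned=True))
```

In practice the `next_id` input would come from the table's current AUTO_INCREMENT counter, and the check would run on a schedule so there is time to migrate to a wider type.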

Database design · Scalability · auto_increment
9 min read

dbaplus Community
Jan 8, 2026 · Backend Development

How Big Platforms Verify Username Availability in Milliseconds

This article walks through the layered architecture that large services like Instagram use to instantly check if a username is taken, starting from simple database queries, adding caching, employing Bloom filters, and finally using Trie structures for fast, memory‑efficient lookups.
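
To make the Bloom filter layer concrete, here is a minimal sketch of the idea. The class name, the bit-array size, the number of hash functions, and the use of salted SHA-256 are all assumptions chosen for illustration; they are not the design of any particular platform.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, size_bits: int = 1 << 20, num_hashes: int = 5):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8 + 1)

    def _positions(self, item: str):
        # Derive num_hashes bit positions by salting one hash with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


taken = BloomFilter()
taken.add("alice")
print(taken.might_contain("alice"))  # a name that was added always reports True
```

A `False` answer means the name was never added, so the slower cache or database lookup can be skipped. A `True` answer still has to be confirmed by the layers behind the filter, because false positives are possible.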

Backend Architecture · Scalability · Trie
10 min read

From Minutes to Milliseconds: Atlas Architecture Solves Verification Bottlenecks

The paper presents Atlas, a native three‑layer distributed verification system that replaces centralized tools with switch, region, and center adapters, achieving sub‑20 ms validation for thousands of nodes and up to 1500× speedup over EPVerifier, while supporting incremental updates and preserving scalability.

Atlas · Scalability · distributed architecture
7 min read

DevOps Coach
Jan 2, 2026 · Interview Experience

Why System Design Interviews Fail: Hidden Trade‑offs and Real‑World Failure Modes

The article reveals how system‑design interview candidates often rely on memorized patterns without understanding underlying trade‑offs, and shows how probing failure scenarios, questioning assumptions, and quantifying metrics can transform interview performance from rote diagrams to rigorous, data‑driven reasoning.

Scalability · System Design · architecture
10 min read

dbaplus Community
Dec 30, 2025 · Backend Development

How to Tackle Massive Message Queue Backlogs in High‑Traffic Scenarios

During peak traffic like Double‑11, a message queue can accumulate millions of messages, and simply adding consumer instances only offers temporary relief; this article explains the partition model limits, how to calculate proper partition numbers, fast remediation tactics, and deep consumer‑side optimizations for robust, scalable processing.
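
The partition limit can be sketched with a commonly used sizing heuristic. This is an illustration under stated assumptions, not necessarily the calculation the article uses: the function names and the throughput figures are hypothetical, and the model assumes Kafka-style consumer groups in which each partition is read by at most one consumer of the group.

```python
import math


def partitions_needed(target_rate: float,
                      produce_rate_per_partition: float,
                      consume_rate_per_partition: float) -> int:
    """Partitions required so that neither producers nor consumers are the bottleneck."""
    return math.ceil(max(target_rate / produce_rate_per_partition,
                         target_rate / consume_rate_per_partition))


def active_consumers(consumers: int, partitions: int) -> int:
    """Consumers beyond the partition count sit idle, so they add no throughput."""
    return min(consumers, partitions)


# Hypothetical numbers: 100,000 msg/s target, 10,000 msg/s per partition
# on the produce side, 2,000 msg/s per partition on the consume side.
print(partitions_needed(100_000, 10_000, 2_000))   # 50
print(active_consumers(80, 50))                    # 50
```

The second function shows why adding consumer instances gives only temporary relief: once the instance count reaches the partition count, extra instances do no work.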

Backlog · Kafka · Message Queue
20 min read

Alibaba Cloud Developer
Dec 29, 2025 · Artificial Intelligence

How Alibaba’s Tair KVCache Manager Revolutionizes Enterprise‑Level LLM Cache Management

This article details the architecture and implementation of Tair KVCache Manager, an enterprise‑grade service that centralises KVCache metadata, decouples inference engines from storage, provides elastic scaling, multi‑tenant isolation, high availability, and performance‑optimised cache management for large‑scale LLM inference workloads.

Cache Management · KVCache · LLM
28 min read

macrozheng
Dec 26, 2025 · Databases

NewSQL vs Middleware Sharding: Which Architecture Really Wins?

This article objectively compares middleware‑based sharding solutions with native NewSQL distributed databases, examining their architectural differences, transaction handling, high‑availability, scaling, SQL support, storage engines, and maturity to help engineers decide which approach best fits their workload.

CAP theorem · NewSQL · Paxos
20 min read

Mike Chen's Internet Architecture
Dec 21, 2025 · Backend Development

How Elasticsearch Scales to Billions of Queries: Sharding, Inverted Index, Distributed Execution, and Replication

Elasticsearch achieves billion‑scale search performance by combining horizontal sharding, immutable inverted‑index segments, a two‑stage distributed query‑then‑fetch model, and multiple replicas with a coordinating node to ensure high concurrency, scalability, and availability.

Distributed Query · Elasticsearch · Replication
4 min read

Woodpecker Software Testing
Dec 18, 2025 · Operations

How Load Testing Protects System Stability in High‑Traffic Internet Services

Load testing, a performance testing technique that simulates massive concurrent users, evaluates throughput, response time, and stability, follows a five‑step workflow—from requirement breakdown to analysis—and helps uncover bottlenecks such as database connection limits or CDN misconfigurations before production launch.

Cloud Native · JMeter · Load Testing
6 min read

Alibaba Cloud Native
Nov 27, 2025 · Artificial Intelligence

How to Build Scalable Multi‑Agent AI Systems with RocketMQ’s Asynchronous Messaging

This article explains the communication challenges of modern multi‑agent AI applications and demonstrates how RocketMQ for AI’s event‑driven, asynchronous messaging architecture can improve scalability, reliability, and cost efficiency through a step‑by‑step weather‑and‑travel planning example.

AI · Cloud Native · Multi-Agent
11 min read

Bilibili Tech
Nov 21, 2025 · Backend Development

How Bilibili Scaled Its Private Messaging System to Handle 10× Traffic

This article analyzes the current bottlenecks of Bilibili's private messaging service, explains the technical challenges of massive data volume and traffic spikes, and presents a comprehensive multi‑layer architecture upgrade—including cache strategies, BFF refactoring, database sharding, and consistency mechanisms—to ensure scalability and reliability.

BFF · Consistency · Messaging
16 min read

dbaplus Community
Nov 16, 2025 · Backend Development

How Shopify Scales 30 TB per Minute with a Monolithic Architecture

Shopify handles over 30 TB of data each minute and millions of requests by using a disciplined, modular monolithic architecture enhanced with hexagonal design, Pods isolation, real‑time data pipelines, and a heavily sharded MySQL deployment, demonstrating that simplicity can scale to internet‑level traffic without microservices.

Hexagonal Architecture · Scalability · databases
10 min read

php Courses
Nov 1, 2025 · Backend Development

Why Laravel Queues Are the Secret to Lightning‑Fast Web Apps

Laravel queues move time‑consuming tasks such as email sending and order processing out of the request cycle and into background workers, which improves performance, scalability, and user experience and keeps applications responsive under load.

Laravel · Scalability · performance
7 min read

NiuNiu MaTe
Oct 29, 2025 · Backend Development

How to Build a Billion‑User Real‑Time Leaderboard: Architecture, Tools, and Pitfalls

This article walks through the end‑to‑end design of a leaderboard that must serve over 100 million users with 100,000 queries per second, covering requirement clarification, real‑time and accuracy challenges, technology selection such as Redis ZSet, multi‑layer architecture, sharding, caching, monitoring, and practical implementation tips to achieve low latency, high consistency, and cost‑effective scalability.

Big Data · Distributed Systems · Real-Time
19 min read

Alibaba Cloud Big Data AI Platform
Oct 22, 2025 · Big Data

Li Auto’s Trillion‑Row Real‑Time Car‑Network Analytics Using Hologres + Flink

Li Auto’s data team tackled the explosion of vehicle‑telemetry data—over a trillion rows and millions of signals per second—by redesigning their data foundation with Alibaba Cloud’s Hologres and Flink, achieving sub‑second latency, elastic scaling, high availability, and significant cost reductions across real‑time and offline workloads.

Car Telemetry · Data Platform · Flink
16 min read

Tencent Cloud Developer
Oct 22, 2025 · Backend Development

How Tencent News Cut PUSH Platform Code by 87% and Boosted Performance 3.5×

The article details how Tencent News' PUSH platform was re‑architected—consolidating modules, unifying the tech stack to Go, building an in‑house message channel, and introducing batch IO and priority scheduling—resulting in a 70% cost cut, 3.5‑fold throughput increase, and dramatically lower latency.

Golang · Microservices · Scalability
20 min read

Alibaba Cloud Observability
Oct 20, 2025 · Cloud Native

How ‘泡姆泡姆’ Leverages Cloud‑Native Architecture for Global Low‑Latency Gaming

The multiplayer party game 泡姆泡姆 combines colorful shooting, match‑3, physics puzzles and arcade mini‑games, and uses a cloud‑native stack on Alibaba Cloud Container Service with OpenKruiseGame, KEDA‑driven auto‑scaling, multi‑region deployment, zero‑downtime updates and a three‑layer observability platform to deliver seamless low‑latency experiences worldwide.

Game Development · Observability · Scalability
10 min read

Ray's Galactic Tech
Oct 14, 2025 · Databases

Master Redis Key Naming: Best Practices for Scalable and Maintainable Data

Effective Redis key naming is essential for building robust, scalable applications; this guide outlines clear conventions—meaningful names, colon-separated namespaces, concise keys, proper ordering, TTL usage—and provides concrete examples across data types, common pitfalls, and a universal key template to improve readability, maintenance, and performance.
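
A colon-separated key template of this kind could be enforced with a small helper. The sketch below is an assumption-laden illustration: the function name, the segment order (application, entity, id, field), and the validation rules are hypothetical and are not the article's template.

```python
def build_key(*parts: object, sep: str = ":") -> str:
    """Join key segments with a separator, rejecting empty or ambiguous segments."""
    segments = [str(part).strip() for part in parts]
    if not segments:
        raise ValueError("at least one segment is required")
    for segment in segments:
        # A segment containing the separator or a space would blur the namespace.
        if not segment or sep in segment or " " in segment:
            raise ValueError(f"invalid key segment: {segment!r}")
    return sep.join(segments)


# Hypothetical layout: <app>:<entity>:<id>:<field>
print(build_key("shop", "user", 42, "profile"))  # shop:user:42:profile
```

Routing every key through one helper keeps prefixes consistent, which makes pattern-based inspection and cleanup of a namespace easier.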

Database design · Key Naming · Scalability
6 min read

Spring Full-Stack Practical Cases
Oct 12, 2025 · Backend Development

Master System Design: 30 Core Concepts Every Backend Engineer Must Know

This article presents a comprehensive guide to essential system‑design concepts—including client‑server architecture, IP addressing, DNS, proxies, latency, HTTP/HTTPS, APIs, REST, GraphQL, databases, scaling, caching, microservices, message queues, rate limiting, API gateways, and more—illustrated with Spring Boot 3 examples and diagrams.

Backend Architecture · Scalability · System Design
30 min read

ITPUB
Oct 5, 2025 · Backend Development

How to Clear a 10‑Million‑Message Queue in 5 Hours: A Five‑Step Rescue Plan

When a flash‑sale causes a 10 million‑message backlog and consumers only process 200 messages per second, this guide shows a five‑step, 5‑hour strategy—horizontal scaling, message downgrade, flow control, temporary dump, and parallel blasting—to restore throughput and prevent system collapse.
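
The figures in this summary can be checked with simple arithmetic. The sketch below uses only the numbers stated above (10 million messages, 200 messages per second, a 5-hour target); the function names are hypothetical, and the default of zero new arrivals during the drain is a simplifying assumption.

```python
def drain_seconds(backlog: int, consume_rate: float, arrival_rate: float = 0.0) -> float:
    """Seconds needed to clear the backlog at the given net consumption rate."""
    net_rate = consume_rate - arrival_rate
    if net_rate <= 0:
        raise ValueError("consumers must outpace producers for the backlog to shrink")
    return backlog / net_rate


def required_rate(backlog: int, deadline_seconds: float, arrival_rate: float = 0.0) -> float:
    """Consumption rate needed to clear the backlog before the deadline."""
    return backlog / deadline_seconds + arrival_rate


hours_at_current_rate = drain_seconds(10_000_000, 200) / 3600   # about 13.9 hours
rate_for_five_hours = required_rate(10_000_000, 5 * 3600)       # about 556 messages/s
print(round(hours_at_current_rate, 1), round(rate_for_five_hours))
```

At 200 messages per second the backlog would take close to 14 hours to clear, so meeting a 5-hour target needs nearly three times the throughput even before new traffic is counted.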

Distributed Systems · Kafka · Performance Optimization
6 min read

IT Services Circle
Oct 1, 2025 · Backend Development

Designing a Fast, Reliable, Cost‑Effective Like System for High‑Traffic Apps

This article breaks down the essential requirements and architecture of a high‑performance like system, covering fast response, data consistency, scalability under traffic spikes, and cost‑efficient resource use, while detailing the technical stack, caching strategies, async persistence, and practical optimizations.

Kafka · Like System · Scalability
17 min read

DevOps Coach
Sep 26, 2025 · Backend Development

30 Essential System Design Concepts Every Engineer Should Master

This comprehensive guide walks readers through the core building blocks of system design—from client‑server architecture, IP addressing, DNS, and proxies to databases, scaling strategies, caching, microservices, and API management—providing practical examples, diagrams, and code snippets to prepare for real‑world projects and technical interviews.

API · Backend · Scalability
31 min read

vivo Internet Technology
Sep 24, 2025 · Backend Development

How Vivo Browser Scaled to Millions: Architecture Upgrade for High‑Performance Coin Incentive System

This article details how Vivo Browser's welfare center was re‑engineered—splitting services, sharding databases, adding arbitration and soft‑transaction mechanisms—to overcome traffic, I/O, and data‑consistency challenges, enabling stable operation at tens of millions of daily active users while reducing storage costs.

Backend Architecture · Data Consistency · Scalability
11 min read

Sanyou's Java Diary
Sep 22, 2025 · Backend Development

How to Build a Scalable Enterprise Unified Message Push System

This article explains why growing enterprises need a unified message‑push platform, outlines the core challenges such as multi‑channel integration, high concurrency, reliability, templating and extensibility, and then walks through a complete architecture design—including access, business, service and storage layers—to achieve a scalable, maintainable solution.

Backend · Enterprise · Message Push
10 min read

IT Architects Alliance
Sep 19, 2025 · Fundamentals

Why Loose Coupling Is the Secret Sauce Behind Scalable Architecture

Loose coupling, a design principle that minimizes inter-component dependencies, enables scalable, testable, and flexible systems by using clear interfaces, event-driven architectures, API gateways, and service meshes, while also presenting trade‑offs such as added complexity, performance overhead, and consistency challenges in distributed environments.

Microservices · Scalability · Software Architecture
12 min read

DevOps Coach
Sep 18, 2025 · Backend Development

From Zero to Confident: Master System Design for Interviews and Real Projects

The author shares a step‑by‑step journey from feeling lost about system design to confidently tackling interview questions and real‑world architectures, outlining a learning roadmap, practical exercises, resource recommendations, and tips for applying and teaching the concepts.

Backend · Scalability · learning
8 min read

IT Services Circle
Sep 15, 2025 · Backend Development

How to Ace a Billion‑Scale URL Shortening System Design Interview

This article walks through the complete design of a high‑performance, highly available URL shortener—covering business value, requirement analysis, capacity estimation, API definitions, database schema, key generation algorithms, sharding, caching, load balancing, and expiration cleanup—so you can impress interviewers with a thorough, scalable solution.

Backend Architecture · Scalability · System Design
25 min read

Mike Chen's Internet Architecture
Sep 11, 2025 · Operations

Mastering Load Balancing: Single, Dual, and Multi‑Layer Architectures Explained

This article explains the fundamentals of load balancing, describing single‑layer, dual‑layer, and multi‑layer architectures, their advantages, disadvantages, and suitable scenarios, helping readers choose the right design based on traffic volume, availability, security, topology, budget, and operational capabilities.

Operations · Scalability · high availability
6 min read

Architect's Guide
Sep 10, 2025 · Databases

Why and How to Implement Database Sharding: Strategies, Middleware, and Best Practices

This article explains why database sharding becomes essential as user growth and data volume surge, describes horizontal and vertical partitioning methods, compares range and hash routing, and reviews popular sharding middleware with their advantages and drawbacks to help you choose the right solution.
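
The difference between hash and range routing can be sketched in a few lines. The function names, the modulo-based hash, and the shard boundaries below are assumptions for illustration; production middleware usually adds features such as virtual buckets or consistent hashing on top.

```python
import bisect


def hash_shard(user_id: int, shard_count: int) -> int:
    """Hash routing: spreads ids evenly, but changing shard_count remaps most keys."""
    return user_id % shard_count


def range_shard(user_id: int, upper_bounds: list[int]) -> int:
    """Range routing: upper_bounds[i] is the exclusive upper bound of shard i (ascending)."""
    index = bisect.bisect_right(upper_bounds, user_id)
    if index >= len(upper_bounds):
        raise ValueError(f"id {user_id} is beyond the last configured range")
    return index


bounds = [1_000_000, 2_000_000, 3_000_000]   # hypothetical shard boundaries
print(hash_shard(1_000_001, 4))              # 1
print(range_shard(2_500_000, bounds))        # 2
```

Range routing makes it easy to add a shard for new ids but tends to concentrate recent, hot data on the last shard; hash routing balances load at the cost of harder resharding.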

Scalability · Vertical Partitioning · database sharding
12 min read

dbaplus Community
Sep 6, 2025 · Fundamentals

30 Must‑Know System Design Concepts to Build Scalable, Reliable Applications

This article walks you through the 30 core system‑design concepts—from client‑server basics, IP, DNS, and load balancing to databases, sharding, caching, CAP theorem, microservices, message queues, rate limiting, API gateways and idempotency—showing how each piece fits together to create high‑performance, fault‑tolerant software.

API · Scalability · System Design
0 likes · 29 min read
Refining Core Development Skills
Sep 3, 2025 · Operations

When Should You Hire a Dedicated Performance Engineering Team?

This article explains why modern enterprises increasingly need specialized performance engineering teams, outlines their ROI through cost savings, latency reduction, scalability, and engineering efficiency, details the engineers' responsibilities, and provides practical hiring guidelines and real‑world case studies.

Cost Optimization · Latency Reduction · Scalability
0 likes · 29 min read
Tencent Cloud Developer
Sep 2, 2025 · Backend Development

How to Build a Scalable Enterprise Unified Message Push System

This article examines the core challenges of multi‑channel integration, high concurrency, reliability, and extensibility, then walks through a full‑link push workflow and presents a four‑layer architecture—including access, business, service, and storage layers—to guide the design of a robust, scalable enterprise message‑push platform.

Backend · Integration · Message Push
0 likes · 10 min read
FunTester
Sep 1, 2025 · Operations

Why Load Testing Is Critical for High‑Traffic Apps and How to Do It Right

This article explains why load testing is essential for modern applications that must serve millions of users, outlines various test types and best‑practice steps, recommends tools and frameworks, and shows how continuous testing integrated into CI/CD pipelines ensures scalability, reliability, and optimal performance under unpredictable traffic spikes.

Load Testing · Performance Monitoring · Scalability
0 likes · 11 min read
High Availability Architecture
Aug 28, 2025 · Fundamentals

5 Architecture Elements, 15 Design Principles & 6 Common Pitfalls

This article explains the essential components of software architecture—elements, structure, and connections—while presenting fifteen universal design principles, practical guidelines for monolithic, distributed, and microservice systems, and six common architectural mistakes to avoid, helping teams build scalable, reliable, and maintainable solutions.

Distributed Systems · Microservices · Scalability
0 likes · 21 min read
DeWu Technology
Aug 27, 2025 · Backend Development

How to Build Scalable Go Systems: Principles, Patterns, and Code Practices

This article explains why scalable systems are essential, outlines core design principles such as the open‑closed and modular approaches, demonstrates Go implementations of strategy, middleware, plugin, and configuration‑driven architectures, and provides validation metrics and an evolution roadmap for building extensible backend services.

Configuration · Design Patterns · Go
0 likes · 20 min read
Tencent Cloud Developer
Aug 26, 2025 · Artificial Intelligence

Building a Scalable, Observable Recommendation Scheduling Engine from Scratch

This article explains how recommendation systems work, distinguishes online services from offline computation, outlines a typical recommendation flow, and presents a three‑stage evolution (1.0, 2.0, 3.0) with design principles for stability, observability, and efficiency, culminating in a DAG‑based orchestration and traceable execution.

AI · Observability · Scalability
0 likes · 13 min read
Wukong Talks Architecture
Aug 21, 2025 · Operations

Why LinkedIn Dropped Kafka for Northguard – A Deep Dive into Its Architecture

LinkedIn, the creator of Kafka, has largely abandoned Kafka in favor of a new log storage system called Northguard, whose design mirrors Apache Pulsar with features like storage‑compute separation, log striping, and a multi‑layer data model, offering superior scalability, operability, consistency, and durability for massive data streams.

Apache Pulsar · Distributed Systems · LinkedIn
0 likes · 22 min read
mikechen
Aug 20, 2025 · Backend Development

9 Proven High‑Performance Optimization Techniques for Large‑Scale Systems

This article presents nine practical strategies—including load balancing, database sharding, read‑write separation, caching, index tuning, CDN usage, asynchronous processing, code refinement, and algorithm selection—to dramatically improve the performance and scalability of large‑scale backend architectures.

Scalability · caching · performance
0 likes · 7 min read
Alibaba Cloud Big Data AI Platform
Aug 5, 2025 · Big Data

How Alibaba Built a World‑Class Big Data Platform Over a Decade

Over ten years, Alibaba’s data engineers transformed a modest Hadoop‑based system into a globally‑scalable, high‑performance big data platform—ODPS/MaxCompute—supporting massive offline and real‑time workloads, pioneering innovations like the 5K cluster expansion, Blink streaming, and the unified ‘Moon’ migration.

Alibaba · Big Data · Data Platform
0 likes · 25 min read
Selected Java Interview Questions
Jul 23, 2025 · Backend Development

Designing an Enterprise‑Level Unified Push Service: Architecture, Channels, and Scalability

This article explains how to build a unified, enterprise‑grade push platform that consolidates email, SMS, chat, DingTalk, WeChat and other social channels, outlines its evolutionary stages from modular modules to a full‑blown service, and details the functional, non‑functional and architectural components required for high‑performance, scalable notification delivery.

Microservices · Push Service · Scalability
0 likes · 15 min read
dbaplus Community
Jul 16, 2025 · Databases

What’s the Optimal Database Connection Count? A Data‑Driven Study

This article examines why many applications set overly large database connection pools, proposes a standard of ten connections per instance based on load‑testing results, and validates the recommendation through single‑interface and link‑level performance experiments that show no throughput degradation at lower connection counts.

Performance Testing · Scalability · Throughput
0 likes · 7 min read
IT Services Circle
Jul 11, 2025 · Backend Development

10 Essential System Design Trade‑offs Every Engineer Should Master

Understanding system design trade‑offs is crucial for building robust software; this article examines ten common compromises—from vertical vs. horizontal scaling and SQL vs. NoSQL to CAP theorem, consistency models, REST vs. GraphQL, stateful vs. stateless architectures, caching strategies, and synchronous vs. asynchronous processing—highlighting their benefits and drawbacks.

Backend Architecture · Distributed Systems · Scalability
0 likes · 10 min read
Instant Consumer Technology Team
Jul 9, 2025 · Cloud Native

Scaling a Financial Accounting System to 100k TPS with Cloud‑Native Microservices

This article examines how a ten‑year‑old financial accounting platform transformed from a monolithic design into a cloud‑native, micro‑service architecture that achieved massive scalability, high availability, and 24‑hour real‑time processing through distributed batch scheduling, elastic scaling, and intelligent fault‑tolerance.

Batch Processing · Scalability · cloud-native
0 likes · 14 min read
IT Architects Alliance
Jul 6, 2025 · Operations

Why 80% of Performance Issues Stem from Architecture – and How to Fix Them

Most performance bottlenecks arise not from code but from architectural flaws, such as overly layered designs, synchronous calls, misconfigured connection pools, cache pitfalls, and inadequate monitoring, and the article outlines these issues and offers best‑practice strategies like async patterns, proper DB design, caching tiers, and progressive refactoring.

Backend · Scalability · System Architecture
0 likes · 11 min read
Kuaishou Frontend Engineering
Jul 3, 2025 · Frontend Development

How Kuaishou’s Tianshou Platform Scales Front‑End Quality for Billions of Users

The article reviews the evolution of Kuaishou's Tianshou front‑end quality assurance platform, its layered architecture, distributed scheduler, quality models, measurement functions, DMAIC process, and lessons learned in scaling to billions of DAU, offering a blueprint for building robust front‑end engineering systems.

Scalability · architecture · dmaic
0 likes · 25 min read
ITFLY8 Architecture Home
Jul 1, 2025 · Backend Development

How Taobao Scaled from LAMP to Cloud: A Deep Dive into Its Architecture Evolution

This article chronicles Taobao's technical evolution—from a LAMP stack through Oracle migration, Java adoption, de‑IOE optimization, self‑built storage and caching systems, service‑oriented design, middleware integration, and finally a cloud‑native architecture—highlighting the challenges and solutions for scalability, performance, and cost reduction.

Scalability · cloud computing · database migration
0 likes · 11 min read
IT Architects Alliance
Jun 27, 2025 · Cloud Computing

What Is Serverless Architecture and Why It’s Transforming Modern Cloud Computing

Serverless architecture shifts server management to cloud providers, offering on‑demand function‑as‑a‑service and backend‑as‑a‑service solutions that enable automatic scaling, cost efficiency, faster development, enhanced security, and versatile use cases across e‑commerce, IoT, mobile apps, and big‑data analytics.

DevOps · Scalability · Serverless
0 likes · 12 min read
Big Data Technology Tribe
Jun 17, 2025 · Backend Development

Master System Design Interviews: Step-by-Step Prep Guide for Engineers

This article outlines a comprehensive, step‑by‑step roadmap for preparing for system design interviews, covering foundational concepts, interview templates, high‑level and detailed design choices, practical resources, mock interview platforms, and company‑specific tailoring to improve candidates' chances of success.

Backend Architecture · Interview Preparation · Scalability
0 likes · 13 min read
Dual-Track Product Journal
Jun 13, 2025 · Backend Development

Layered E‑Commerce Architecture: Blueprint for Scalable Platforms

This article breaks down a mature e‑commerce platform into six layered modules—user reach, business operation, transaction fulfillment, supply chain, infrastructure, and BI—detailing core functions, design considerations, and data‑driven processes to guide scalable system design.

Backend · Scalability · System Architecture
0 likes · 7 min read
Mike Chen's Internet Architecture
Jun 10, 2025 · Operations

Mastering Load Balancing: From Single‑Layer to Billion‑Scale Architectures

This article explains the essential role of load balancing in modern distributed systems and walks through single‑layer, double‑layer, and billion‑scale architectures, highlighting their design principles, benefits, trade‑offs, and typical deployment scenarios for high‑availability and high‑performance applications.

LVS · NGINX · Scalability
0 likes · 6 min read
AI2ML AI to Machine Learning
Jun 6, 2025 · Artificial Intelligence

Tackling the Top Challenges of Retrieval‑Augmented Generation (RAG)

The article enumerates common pitfalls of Retrieval‑Augmented Generation—such as missing content, low‑rank document misses, context limits, format errors, incomplete answers, scalability bottlenecks, complex PDF extraction, data‑quality issues, domain adaptation gaps, hallucinations, and feedback‑loop deficiencies—and offers concrete mitigation strategies ranging from data cleaning and prompt design to hybrid search, hierarchical retrieval, document compression, and automated evaluation.

Data Quality · Hybrid Search · LLM
0 likes · 9 min read
ITFLY8 Architecture Home
May 28, 2025 · Databases

Mastering Database Sharding: Design Strategies and Practical Cases

This article introduces the fundamentals of database sharding, outlines architectural evolution, explains vertical and horizontal splitting dimensions and strategies, discusses middleware choices, and presents real‑world case studies while addressing common challenges such as ID generation, join queries, pagination, and distributed transactions.

Scalability · database sharding · middleware
0 likes · 8 min read
macrozheng
May 27, 2025 · Backend Development

Scaling Username Uniqueness: DB, Redis Cache & Bloom Filter

This article examines three strategies for checking username uniqueness at massive scale—direct database queries, Redis caching, and Bloom filter techniques—detailing their implementations, performance trade‑offs, memory consumption, and suitability for billions of users.

Backend Performance · Scalability · bloom-filter
0 likes · 11 min read
Architects' Tech Alliance
May 23, 2025 · Artificial Intelligence

Why High‑Performance Networks Are Critical for Large‑Scale AI Model Training

The whitepaper explains that AI model training and inference rely on massive data computation, with model sizes reaching billions of parameters, demanding low‑latency, high‑bandwidth, stable, scalable, and manageable networks; it compares RDMA‑based InfiniBand and RoCE solutions and offers design recommendations for future AI compute clusters.

AI · High‑Performance Networking · InfiniBand
0 likes · 10 min read
Code Ape Tech Column
May 22, 2025 · Cloud Native

10 Essential Microservice Best Practices for Scalable, Secure Systems

This article outlines practical microservice best practices—including the Single Responsibility Principle, cross‑functional teams, appropriate tooling, asynchronous communication, DevSecOps, isolated data stores, independent deployment, orchestration, and monitoring—to help developers build maintainable, scalable, and secure cloud‑native applications.

Backend Architecture · DevOps · Microservices
0 likes · 13 min read
Alibaba Cloud Developer
May 15, 2025 · Databases

How PolarDB MySQL Limitless Redefines Cloud‑Native Database Performance

This article examines the architecture and innovations of Alibaba Cloud's PolarDB MySQL Limitless multi‑master cluster, detailing its cloud‑native design, high‑performance horizontal scaling, distributed transaction mechanisms, multi‑node DDL, high‑availability strategies, and record‑breaking TPC‑C benchmark results.

Cloud Native · Distributed Transactions · Scalability
0 likes · 13 min read
Zhuanzhuan Tech
May 15, 2025 · Databases

Dynamic Extension of Fields in Billion‑Row Tables: Challenges and Practical Solutions

This article examines the difficulties of adding new fields to a core billion‑row MySQL table—including locking, page splitting, and index degradation—and presents a configuration‑driven, three‑layer architecture that uses JSON extension fields, extension tables, and Elasticsearch to achieve safe, scalable dynamic schema evolution.

Database design · Dynamic Schema · JSON field
0 likes · 8 min read
macrozheng
May 12, 2025 · Backend Development

Designing a Billion‑User Real‑Time Leaderboard: Redis vs MySQL

This article explores how to build a scalable, high‑performance leaderboard for hundreds of millions of users by comparing traditional database ORDER BY approaches with Redis sorted sets, addressing challenges such as hot keys, memory pressure, persistence risks, and presenting a divide‑and‑conquer implementation strategy.

Scalability · big-data · high concurrency
0 likes · 11 min read
MaGe Linux Operations
May 11, 2025 · Cloud Native

How to Build a High‑Performance, Highly‑Available Cloud‑Native Ingress Gateway

When an Ingress gateway faces traffic exceeding 100,000 QPS, this guide outlines systematic performance optimizations, configuration tweaks, distributed architecture designs, traffic management, monitoring, and disaster‑recovery strategies—including hardware scaling, kernel tuning, DPDK, rate limiting, horizontal scaling, service mesh integration, and CDN offloading—to achieve high concurrency and high availability.

Scalability · cloud-native · high-availability
0 likes · 8 min read
Meituan Technology Team
May 8, 2025 · Artificial Intelligence

Building a Mixed OR+ML Inference Framework with TritonServer: Architecture, Challenges, and Solutions

The article describes how a large‑scale dispatch system was re‑engineered with NVIDIA TritonServer to unify GPU‑accelerated operations‑research kernels and deep‑learning models, detailing a three‑stage architecture (in‑process, cross‑process, cross‑node), the performance, stability and memory challenges addressed, and future plans for heterogeneous GPU scaling.

GPU · Inference · Performance Optimization
0 likes · 11 min read
Architect
May 4, 2025 · Databases

NewSQL vs Middleware Sharding: Which Architecture Truly Wins?

This article objectively compares middleware‑based sharding with NewSQL distributed databases, examining architecture, distributed transactions, performance, high availability, scaling, SQL support, storage engines, and ecosystem maturity to help architects decide which solution fits their specific workload and operational constraints.

CAP theorem · Database Architecture · NewSQL
0 likes · 20 min read
Architecture and Beyond
May 1, 2025 · Industry Insights

How Tag Systems Become the Brain of Digital Content – An Architect’s Guide

This article examines tag systems as the neural network of digital content, comparing them with traditional hierarchies, tracing their evolution, outlining business‑driven design steps, and detailing architectural components, non‑functional requirements, integration patterns, and future AI‑enhanced trends.

AI tagging · Scalability · architecture
0 likes · 24 min read