Tagged articles
2122 articles
Big Data Technology & Architecture
Dec 15, 2019 · Operations

Understanding RocketMQ: Architecture, Key Features, and Best Practices

This article provides a comprehensive overview of RocketMQ, covering its architecture, how it handles ordered and duplicate messages, transaction processing, producer and consumer mechanisms, storage design, subscription models, and practical best‑practice recommendations for building reliable distributed messaging systems.

Distributed Systems · Duplication · Message Queue
23 min read
Mafengwo Technology
Dec 13, 2019 · Backend Development

How Go‑Powered IM Architecture Boosted E‑commerce Messaging Performance

This article explains how a travel e‑commerce platform rebuilt its instant‑messaging service with Go, separating business logic, introducing a dual‑layer distributed architecture, and optimizing performance and reliability to handle massive concurrent connections and improve overall GMV.

Distributed Systems · Go · Instant Messaging
16 min read
Programmer DD
Dec 9, 2019 · Backend Development

How to Ensure Idempotency in Distributed Systems: Strategies and Code Examples

This article explains the importance of idempotent operations in backend systems, defines the concept, and presents practical techniques such as unique indexes, token mechanisms, pessimistic and optimistic locking, distributed locks, state‑machine design, and API patterns, complete with code snippets and diagrams.

Backend · Distributed Systems · Idempotency
10 min read
Alibaba Cloud Developer
Dec 9, 2019 · Backend Development

Mastering Cache Strategies: Avoid Pitfalls and Ensure Data Consistency

This article explains core caching concepts, common pitfalls such as cache penetration, breakdown and avalanche, presents classic update patterns like Cache‑Aside, Write‑Through and Write‑Behind, analyzes consistency challenges, and offers practical guidelines for designing robust multi‑level cache architectures.

Data Consistency · Distributed Systems · Performance Optimization
29 min read
Java Captain
Dec 5, 2019 · Databases

Understanding Redis: From Basic Concepts to Advanced Features and Deployment Strategies

This article provides a comprehensive overview of Redis, explaining its core data structures, caching use cases, persistence mechanisms, high‑availability features like Sentinel and replication, clustering for horizontal scaling, and client‑side capabilities such as transactions, Lua scripting, pipelining, and distributed locks.

Cluster · Distributed Systems · Lua
13 min read
Big Data Technology & Architecture
Dec 4, 2019 · Big Data

Comprehensive Flink Interview Guide: Core Concepts, Advanced Topics, and Source‑Code Insights

This article provides an in‑depth Flink interview guide covering the framework’s core concepts, advanced features such as fault‑tolerance, state management, and checkpointing, as well as detailed explanations of its architecture, APIs, partitioning strategies, and source‑code flow, complete with code examples.

Big Data · Distributed Systems · Flink
29 min read
58 Tech
Dec 4, 2019 · Backend Development

Design and Practice of Billion‑User IM Long‑Connection Service at 58.com

The article summarizes the second 58.com instant‑messaging technology salon, detailing the architecture, high‑performance long‑connection design, multi‑thread socket models, read‑diffusion and write‑diffusion (fan‑out) storage strategies for single‑ and group‑chat, message synchronization mechanisms, and the IM SDK framework for audio‑video communication.

Distributed Systems · IM · Message Storage
20 min read
Meitu Technology
Dec 4, 2019 · Backend Development

Design and Implementation of lmstfy: A Redis‑Based Task Queue Service

lmstfy is a stateless, Redis‑backed task‑queue service from Meitu that provides delayed execution, automatic retries, priority handling, expiration, and a RESTful HTTP API, while supporting horizontal scaling via namespace‑based token routing, rich Prometheus metrics, and future disk‑based storage extensions.

Distributed Systems · Task Queue · backend service
15 min read
WeChat Backend Team
Nov 26, 2019 · Big Data

Plato: Tencent’s Open‑Source Engine Cutting Billion‑Node Graph Jobs to Minutes

Plato, the newly open‑sourced high‑performance graph computing framework from Tencent’s TGraph project, delivers industry‑leading speed and memory efficiency for billion‑node social network graphs, achieving minute‑level processing with as few as ten servers, and supports a wide range of graph algorithms and learning tasks.

Distributed Systems · Open-source · graph computing
8 min read
Alibaba Cloud Developer
Nov 26, 2019 · Operations

Why Caching Is the Secret Weapon for High‑Performance Systems

This article systematically explains cache fundamentals, why caching is essential for performance, where caches can be placed in the architecture, their advantages, when to adopt them, and key design considerations for building reliable, high‑throughput systems.

Distributed Systems · Scalability · backend-development
18 min read
Architecture Digest
Nov 25, 2019 · Big Data

Introduction to Apache Kafka: Core Concepts, Architecture, and APIs

This article provides a comprehensive overview of Apache Kafka, covering its fundamental capabilities, typical use cases, core components, key APIs, and essential concepts such as topics, partitions, segments, brokers, producers, and consumers, illustrated with diagrams.

APIs · Big Data · Distributed Systems
8 min read
dbaplus Community
Nov 20, 2019 · Backend Development

Designing a High‑Concurrency Ticket‑Spiking System for 1M Users and 10K Tickets

This article explains how to architect a high‑concurrency ticket flash‑sale (seckill) system that can handle one million simultaneous users competing for ten thousand tickets, covering load‑balancing strategies, Nginx weighted round‑robin configuration, Go service implementation, Redis‑based inventory management, and performance testing results.

Distributed Systems · Go · NGINX
20 min read
Efficient Ops
Nov 20, 2019 · Databases

Mastering Codis: Seamless Redis Scaling and High‑Availability Strategies

This comprehensive guide details how Codis extends Redis with a proxy‑based architecture to achieve transparent horizontal scaling, smooth data migration, high availability, fault tolerance, and operational best‑practices, while also covering common Redis pitfalls and performance tuning.

Codis · Distributed Systems · redis
26 min read
21CTO
Nov 20, 2019 · Cloud Native

How Alibaba Cloud’s Middleware Evolved for the Cloud‑Native Era

In this presentation, Alibaba Cloud’s chief middleware architect Li Xiaoping outlines the evolution of internet middleware at Alibaba, explains the value and applications of cloud‑native middleware, and shares insights on future trends and challenges in the field.

Alibaba Cloud · Distributed Systems · Technology Evolution
2 min read
dbaplus Community
Nov 17, 2019 · Databases

How Hybrid Logical Clocks Power Distributed Transactions

This article explains why distributed databases need precise clocks, compares central, logical, and hybrid clock designs, and shows how hybrid logical clocks (HLC) together with two‑phase commit and other transaction techniques enable consistent, high‑throughput distributed transactions.

Consistency · Distributed Systems · clocks
18 min read
Ctrip Technology
Nov 14, 2019 · Operations

Chaos Engineering: Principles, Practices, and Lessons from Ctrip

The article explains Chaos Engineering as a discipline for deliberately injecting failures into distributed systems to uncover hidden weaknesses, outlines its five core principles, describes practical implementation steps and real‑world examples from Ctrip, and discusses future directions for reliability engineering.

Distributed Systems · Fault Injection · Operations
9 min read
Tencent Cloud Developer
Nov 14, 2019 · Big Data

Tencent Announces Open‑Source High‑Performance Graph Computing Framework Plato

Tencent has open‑sourced its high‑performance graph computing framework Plato, which can process billion‑node graphs in minutes on as few as ten servers, outpacing Spark GraphX by up to two orders of magnitude, and supports offline computation, representation learning, and integration with Kubernetes/YARN for social, recommendation, and biomedical applications.

Big Data · Distributed Systems · Open-source
7 min read
21CTO
Nov 13, 2019 · Backend Development

Is a Mid‑Platform Just Another Microservice? Unpacking the Real Differences

This article clarifies the distinction between enterprise mid‑platforms and microservices by defining each concept, outlining Alibaba's mid‑platform methodology and technical stack, explaining microservice architecture, and showing how the two complement each other in modern large‑scale systems.

Distributed Systems · Microservices · mid‑platform
10 min read
21CTO
Oct 31, 2019 · Backend Development

Master Distributed Rate Limiting with Token Buckets, Redis, and Code

This article explains why rate limiting is essential for microservice stability, compares leaky‑bucket and token‑bucket algorithms, shows how to implement local and distributed throttling with Java's AtomicLong, Redis, and a control‑server architecture, and points to an open‑source project for practical use.

Distributed Systems · Microservices · Token Bucket
9 min read
Amap Tech
Oct 31, 2019 · Backend Development

Evolution of Amap's Billion-Scale Traffic Access Layer Services

Sun Wei outlines how Amap’s traffic access layer evolved: a fully asynchronous, stream‑based pipeline that handles 600,000‑plus QPS with sub‑2 ms latency, reactive experiments with Vert.x and WebFlux, API aggregation, traffic tagging, and a roadmap toward distributed sidecar or SDK gateways for billion‑scale, low‑latency services.

Asynchronous Architecture · Distributed Systems · Service Mesh
11 min read
dbaplus Community
Oct 30, 2019 · Backend Development

Mastering Cache Layers: From HTTP to Distributed Systems

This article provides a comprehensive guide to caching technologies, covering HTTP caching, CDN caching, load‑balancer caching, in‑process caching, and distributed caching, while explaining strategies, algorithms, and common pitfalls such as cache avalanche, penetration, and breakdown.

Backend · CDN · Distributed Systems
19 min read
High Availability Architecture
Oct 22, 2019 · Backend Development

Ensuring In-Order Delivery of IM Messages: Causes and Solutions

This article analyzes why instant‑messaging (IM) messages can arrive out of order due to time discrepancies, network behavior, and multithreading, and proposes a comprehensive design using global sequence numbers, channel‑aware routing, client‑side caching, and ACK‑based flow control to guarantee ordered delivery.

Backend · Distributed Systems · IM
9 min read
Big Data Technology & Architecture
Oct 21, 2019 · Databases

High‑Availability Practices of Alibaba HBase: Large Clusters, MTTF/MTTR, Disaster Recovery, and Extreme Experience

This article reviews Alibaba HBase's evolution toward high availability, covering large‑cluster architecture, reliability metrics (MTTF/MTTR), disaster‑recovery strategies such as data replication and traffic switching, performance optimizations for extreme latency requirements, and lessons learned for building resilient distributed database services.

Distributed Systems · HBase · Performance Optimization
20 min read
Programmer DD
Oct 19, 2019 · Cloud Computing

What Is Cloud Computing? Core Concepts and Real‑World Applications Explained

This article defines cloud computing as a distributed‑computing model that splits large tasks into many small programs processed across multiple servers, outlines its four key domains—computing, networking, storage, and applications—and highlights UCloud’s recent free‑course event and QR‑code registration for students and professionals.

Cloud Computing · Distributed Systems · UCloud
2 min read
21CTO
Oct 16, 2019 · Backend Development

How Alibaba Built Its Scalable Business Middle Platform: Architecture & Lessons

This article outlines Alibaba's middle‑platform strategy, detailing the layered IT architecture, the evolution from IOE to distributed services and platformization, the practical methodology for building, governing, and evolving a business middle platform, and key takeaways for enterprises seeking large‑scale system governance.

Alibaba · Backend Architecture · Distributed Systems
6 min read
Programmer DD
Sep 29, 2019 · Backend Development

Build a Scalable Instant Messaging Server from Scratch – IM1.0.0 Features Explained

This article walks through constructing a lightweight, feature‑rich instant‑messaging backend (IM1.0.0) that supports one‑to‑one text/file messaging, delivery/read receipts, LDAP login, horizontal scaling via connector and transfer modules, user‑status management with Redis, offline storage using MySQL and message queues, and outlines the overall architecture.

Distributed Systems · Instant Messaging · Message Queue
9 min read
Programmer DD
Sep 29, 2019 · Big Data

Can 1.4 Billion People Share a Single WeChat Group? A Technical Deep‑Dive

This article explores whether it is technically feasible to place all 1.4 billion Chinese users into one WeChat group, analyzing population statistics, message volume, CPU processing limits, network bandwidth, storage requirements, and cost implications with supporting calculations and references.

Big Data · Distributed Systems · Network Bandwidth
12 min read
Architects' Tech Alliance
Sep 27, 2019 · Cloud Native

MinIO Object Storage System: Architecture, Design Principles, Features, and Performance

This article provides a comprehensive technical overview of MinIO, an open‑source, S3‑compatible object storage system, covering its design philosophy, data organization, distributed architecture, erasure‑coding, lock management, lambda notifications, backup strategies, performance optimizations, and a comparative analysis with Ceph, highlighting its suitability for AI, big‑data, and cloud‑native deployments.

Cloud Native · Distributed Systems · Minio
22 min read
DevOps Cloud Academy
Sep 25, 2019 · Cloud Native

Overview of Spring Cloud Alibaba Components for Distributed Application Development

Spring Cloud Alibaba offers a comprehensive suite of open‑source components—including Sentinel, Nacos, RocketMQ, Dubbo, Seata, ACM, OSS, SchedulerX, and SMS—to simplify building cloud‑native, micro‑service based distributed systems with traffic control, service discovery, messaging, and configuration management.

Distributed Systems · Spring Cloud Alibaba · java
6 min read
Alibaba Cloud Developer
Sep 24, 2019 · Big Data

Inside Alibaba’s 10‑Year Search Engine: Architecture, Data Flow, and Indexing

Alibaba’s 10‑year‑old search engine combines data source aggregation, incremental and real‑time indexing, and online services through platforms like Tisplus, Bahamut, Maat, Ha3, Build Service and Drogo, illustrating a comprehensive architecture that powers 1688’s search capabilities across multiple engines and deployment pipelines.

Backend Architecture · Big Data · Distributed Systems
10 min read
ITFLY8 Architecture Home
Sep 23, 2019 · Backend Development

Mastering High-Concurrency Technical Architecture: Core Principles and Solutions

This article explains what technical architecture is, breaks down its three core components—business technical problems, technical solutions, and technical components—and then dives deep into high‑concurrency challenges, systematic thinking methods, and practical approaches such as resource scaling, stateless design, load balancing, caching, and I/O optimization.

Backend Architecture · Distributed Systems · Scalability
12 min read
High Availability Architecture
Sep 20, 2019 · Backend Development

Ensuring Idempotency and Preventing Double Payments in a Distributed Payments System

The article explains how Airbnb’s payment platform uses a generic idempotency library called Orpheus, combined with Java lambda‑driven transaction composition, to guarantee data consistency, avoid double charges, and handle retries in a low‑latency micro‑service architecture.

Database Transactions · Distributed Systems · Idempotency
19 min read
21CTO
Sep 18, 2019 · Backend Development

What I Learned From My Go Backend Engineer Interview at Tencent

After six months of studying MySQL, Redis, and distributed systems, I tackled a Go backend engineer interview at Tencent, detailing the questions on networking, databases, OS concepts, system design, and my reflections on what to improve for future interviews.

Distributed Systems · Networking · System Design
9 min read
Alibaba Cloud Native
Sep 18, 2019 · Cloud Native

Mastering Kubernetes Logging: Overcoming Real‑World Challenges

This article shares Alibaba's extensive experience building a Kubernetes‑based logging system, detailing the evolution from single‑machine to containerized environments, the critical role of observability, and the specific technical challenges such as dynamic log sources, integration complexity, and massive scale handling.

Distributed Systems · Kubernetes · Observability
9 min read
JD Tech Talk
Sep 12, 2019 · Databases

Reflections on ApacheCon 2019 in Las Vegas: ShardingSphere’s First Participation and Community Insights

The article recounts JD Digits architect Zhang Liang’s experience representing the Apache ShardingSphere community at ApacheCon 2019 in Las Vegas, describing the conference atmosphere, community interactions, ShardingSphere’s observability talk and Shark Tank showcase, and the growing Chinese contribution to the Apache ecosystem.

ApacheCon · Distributed Systems · Observability
5 min read
Sohu Tech Products
Sep 11, 2019 · Backend Development

Design and Implementation of a High‑Concurrency Flash‑Sale System for Online Real‑Estate Opening

The article explains how to handle massive simultaneous user requests in a flash‑sale scenario by using rate limiting, caching, asynchronous processing, distributed locks, load balancing, and anti‑cheat mechanisms, illustrated with the Sohu Focus online opening system architecture.

Backend Architecture · Distributed Systems · Kafka
12 min read
Tencent Cloud Developer
Sep 11, 2019 · Big Data

YARN Practice and Technical Evolution at Kuaishou

Jiaoxiao Fang’s talk details Kuaishou’s YARN deployment, covering its architecture, support for offline, real‑time and ML workloads, and recent enhancements such as event‑handling stability, refined preemption, high‑throughput parallel scheduling, shuffle‑caching for small I/O, plus plans for job protection and multi‑cluster resource utilization.

Big Data · Cluster Optimization · Distributed Systems
16 min read
dbaplus Community
Sep 10, 2019 · Big Data

Why Exactly‑Once Processing Is So Hard in Distributed Systems (And How to Tackle It)

This article explores the two toughest problems in distributed stream processing—exactly‑once message handling and ordering—by dissecting the underlying impossibility of perfect failure detectors, the liveness‑vs‑safety trade‑off, zombie processes, and the practical solutions employed by systems such as Flink, Kafka Streams, MillWheel, and Spark.

Consensus · Distributed Systems · Exactly-Once
81 min read
DataFunTalk
Sep 4, 2019 · Backend Development

Apache Dubbo: Evolution, Ecosystem, and Future Roadmap for Microservices and Cloud‑Native Architecture

The article introduces Apache Dubbo, a high‑performance Java RPC framework, outlines its history, current features, multi‑language ecosystem, recent releases, and future plans such as cloud‑native integration, service‑mesh support, reactive programming, and the roadmap toward Dubbo 3.0.

Apache Dubbo · Cloud Native · Distributed Systems
13 min read
Architects' Tech Alliance
Aug 24, 2019 · Big Data

Reimagining Big Data in a Post‑Hadoop World

The article analyzes the decline of Hadoop as the dominant big‑data platform, explains how cloud‑based services are replacing its complex on‑premises architecture, and outlines the lessons and future directions for enterprises navigating a post‑Hadoop landscape.

Big Data · Distributed Systems · Hadoop
12 min read
Amap Tech
Aug 20, 2019 · Operations

Full‑Link Load Testing and Stability Assurance at Gaode: Architecture, Practices, and Future Directions

To guarantee stability for over 100 million daily users, Gaode combines capacity planning, traffic control, disaster recovery, monitoring, and pre‑plan drills with a self‑built full‑link load‑testing platform (TestPG). The platform replays realistic traffic in production‑like environments, isolates test loads, and provides rapid configuration, detailed debugging, automated error capture, and comprehensive reporting. Planned enhancements include integrated topology monitoring, advanced pressure models, and confidence evaluation.

Distributed Systems · Load Testing · capacity planning
20 min read
Architecture Digest
Aug 19, 2019 · Big Data

Elasticsearch Cluster Architecture and Distributed Data System Design

This article explains Elasticsearch's cluster architecture, including nodes, indices, shards, replicas, deployment models, and data layer storage, and compares two types of distributed data system designs—local file‑system based and shared‑storage based—highlighting their advantages and trade‑offs.

Cluster Architecture · Distributed Systems · Elasticsearch
13 min read
Big Data Technology Architecture
Aug 16, 2019 · Big Data

In‑Depth Overview of HBase Architecture

This article provides a comprehensive, illustrated explanation of Apache HBase's architecture, covering its master‑slave components, region management, Zookeeper coordination, data flow for reads and writes, storage structures, compaction processes, fault recovery, and the system's strengths and limitations within the Hadoop ecosystem.

Architecture · Distributed Systems · HBase
21 min read
Alibaba Cloud Developer
Aug 16, 2019 · Backend Development

Mastering System Design: Real-World Lessons from Alibaba’s Architecture Veteran

An experienced Alibaba senior tech expert shares a comprehensive, step‑by‑step guide to system design, covering purpose, measurable goals, core design principles, detailed subsystem planning, and real case studies like HSF, T4, and multi‑site deployment, offering practical insights for architects to avoid common pitfalls.

Distributed Systems · System Design · case study
22 min read
ITFLY8 Architecture Home
Aug 16, 2019 · Operations

Building Scalable Degradation Plans: Lessons from Tong‑Cheng Yilong

At QCon Beijing 2019, senior architect Wang Junxiang shared Tong‑Cheng Yilong’s end‑to‑end degradation‑plan architecture, covering system design, data collection, metric computation, resource recovery, link‑level pre‑plan management, fault diagnosis, strategy extensibility, and high‑availability platform construction, offering practical insights for complex distributed systems.

Distributed Systems · degradation · high availability
4 min read
Architecture Digest
Aug 14, 2019 · Big Data

Kafka Overview: Architecture, Storage Mechanism, Replication, and Consumer/Producer Model

Kafka is a distributed, partitioned, replicated messaging system originally developed by LinkedIn, offering high throughput, low latency, fault tolerance, and scalability; this article explains its core concepts, file storage design, partition replication, leader election, consumer groups, delivery guarantees, and operational considerations for big‑data pipelines.

Big Data · Distributed Systems · Kafka
56 min read
Qunar Tech Salon
Aug 12, 2019 · Backend Development

QMQ: Design, Usage, and Implementation of Qunar's Distributed Message Queue

This article introduces QMQ, Qunar's internal distributed message queue, covering its background, design motivations, core concepts, code examples for producing and consuming both real‑time and delayed messages, transactional messaging support, and the overall architecture of its metaserver, broker, and delay components.

Distributed Systems · Message Queue · QMQ
18 min read
Meituan Technology Team
Aug 8, 2019 · Backend Development

OCTO: Meituan's Distributed Microservice Communication Framework and Governance Platform

OCTO is Meituan's distributed microservice communication and governance platform. It unifies service registration, discovery, load balancing, fault tolerance, gray releases, and call data across thousands of services, and handles billions of calls with components such as OCTO‑RPC, OCTO‑NS, OCTO‑Portal, SGAgent, Oceanus, Watt, and MCC. The article reports a 99.999% success rate and an evolution toward API‑based naming and a service mesh.

Backend · Distributed Systems · Microservices
8 min read
Architecture Digest
Aug 8, 2019 · Big Data

Kafka Practical Guide: Concepts, Architecture, Configuration, Monitoring, and Management

This article provides a comprehensive overview of Kafka, covering its basic concepts, architecture, deployment, configuration, monitoring, producer and consumer settings, offset management, high availability, replication, leader election, and practical tips for deployment, tuning, and troubleshooting in production environments.

Distributed Systems · Kafka · Message Queue
37 min read
Architecture Digest
Aug 6, 2019 · Databases

FeatureKV: A High‑Performance, Scalable Key‑Value Store for Billion‑Scale Read/Write Workloads at WeChat

FeatureKV is a high‑performance, scalable key‑value storage system built on WeChat’s internal file system and metadata service, designed to handle billion‑scale read and write demands, support batch offline writes, provide version management, and achieve low‑latency online reads for services like Look‑at‑Look, ads, and payments.

Distributed Systems · high performance · key-value store
23 min read
Big Data Technology Architecture
Aug 5, 2019 · Big Data

Zookeeper in Distributed Systems: Roles in Kafka, Hadoop, HBase, and Solr

This article explains Zookeeper’s core concepts, its ZAB consensus protocol, and surveys its essential roles in major big‑data components such as Kafka, Hadoop, HBase, and Solr, illustrating how it provides configuration, naming, coordination, leader election, and high‑availability services across distributed architectures.

Distributed Systems · HBase · Hadoop
5 min read
Programmer DD
Aug 4, 2019 · Operations

Simulating CPU and I/O Failures with Bash Scripts for Chaos Engineering

This article demonstrates how to create Bash scripts that fully saturate CPU and I/O resources, explains their role in fault injection within the Simian Army framework, and introduces the broader concepts and benefits of chaos engineering for building resilient distributed systems.

Distributed Systems · Fault Injection · bash scripts
9 min read
NetEase Media Technology Team
Aug 2, 2019 · Backend Development

Delayed Message Queue Implementation: Use Cases, Comparison, and NetEase Open Course Practice

The article explains how delayed message queues replace inefficient scheduled‑task scans in distributed systems, outlines common use cases such as order timeouts and retries, compares RabbitMQ, RocketMQ, Kafka, ActiveMQ and Redis implementations, and details NetEase’s ActiveMQ‑based solution with idempotent processing and traceability.

ActiveMQ · Distributed Systems · Kafka
0 likes · 13 min read
JD Retail Technology
Jul 31, 2019 · Fundamentals

Consistency Levels and Consensus Algorithms: Paxos, ZAB, and Raft

This article explains distributed data consistency concepts, the CAP theorem, various consistency levels, and provides detailed overviews of three major consensus algorithms—Paxos, ZAB, and Raft—including their mechanisms, roles, and practical applications such as in CB‑SQL.

Distributed Systems · Paxos · Raft
0 likes · 18 min read
dbaplus Community
Jul 29, 2019 · Operations

How to Build a Cost‑Effective, Multi‑Layer Monitoring System for Distributed Applications

This article explains why comprehensive, multi‑layer monitoring is essential for distributed systems, outlines environment, program, and business metrics, recommends practical tools such as Zabbix, open‑falcon, Prometheus and Grafana, and provides a step‑by‑step evolution plan and alerting strategy.

Distributed Systems · Metrics · Observability
0 likes · 10 min read
vivo Internet Technology
Jul 24, 2019 · Backend Development

Spring Session Implementation Guide: Session Sharing with Redis in Distributed Systems

Spring Session enables distributed session sharing by storing HTTP session data in Redis, using a filter and listener configuration to replace Tomcat’s in‑memory storage, managing three Redis keys with coordinated expirations, and subscribing to keyspace events for reliable cleanup and cross‑instance access.

Distributed Systems · Session Management · Spring Session
0 likes · 12 min read
Java Backend Technology
Jul 21, 2019 · Backend Development

30 Essential Architecture Principles Every Backend Engineer Should Follow

This article presents thirty practical architecture principles—from keeping designs simple and avoiding unnecessary features to embracing automated testing, ROI, user‑centric decisions, and distributed system fundamentals—offering a comprehensive guide for backend developers to build scalable, maintainable, and user‑friendly software.

Distributed Systems · Software Architecture · backend design
0 likes · 12 min read
dbaplus Community
Jul 15, 2019 · Backend Development

From Single Server to Cloud‑Native: How Taobao Scaled to Millions of Concurrent Users

This article walks through Taobao's architectural evolution, from a single‑server setup to distributed clusters, caching, load balancing, microservices, containerization, and finally cloud platforms, illustrating the technologies and design principles needed to handle anywhere from hundreds to hundreds of millions of concurrent requests.

Architecture · Backend · Distributed Systems
0 likes · 21 min read
Youzan Coder
Jul 12, 2019 · Backend Development

How to Build NSQ Multi‑Data‑Center Deployment with Lookup‑Migrate

This article explains the design and implementation of NSQ dual‑ and multi‑data‑center architectures using a lookup‑migrate proxy, covering deployment scenarios, routing strategies, migration phases, JSON response transformations, and practical lessons learned for reliable message publishing and consumption across data centers.

Backend · Distributed Systems · Message Queue
0 likes · 13 min read
High Availability Architecture
Jul 11, 2019 · Backend Development

Introduction to Reactive Microservices Architecture and Design Principles

This article introduces the fundamentals of microservices and reactive systems, explains how reactive programming integrates with microservice architectures, compares traditional and reactive approaches, outlines core components, design principles, and technology choices such as Spring Cloud, and provides guidance for building scalable, resilient reactive microservices.

Architecture · Distributed Systems · Microservices
0 likes · 42 min read
Big Data Technology & Architecture
Jul 3, 2019 · Backend Development

Deep Dive into Apache RocketMQ: Architecture, Routing, Storage, and High‑Availability Design

This article provides a comprehensive overview of Apache RocketMQ’s core architecture, including topic routing mechanisms, message storage file designs, high‑availability message sending, concurrent pull and consumption processes, HA synchronization, and transaction messaging, while offering practical learning steps and programming techniques for developers.

Distributed Systems · Message Queue · RocketMQ
0 likes · 14 min read
Architect's Tech Stack
Jul 3, 2019 · Backend Development

Implementing a Distributed Rate Limiter with Redis, Spring Boot, and Lua Scripts

This article demonstrates how to build a distributed rate‑limiting component named shield‑ratelimiter using Redis’s INCR and TTL features, Spring‑Boot‑starter integration, Lua scripting for atomic operations, custom annotations, and AspectJ, providing a robust, configurable solution for limiting API calls in Java backend services.

Distributed Systems · Lua · rate limiting
0 likes · 16 min read
Big Data Technology & Architecture
Jul 1, 2019 · Big Data

How to Ensure High Availability of Message Queues (RabbitMQ and Kafka)

This article explains the concept of high availability for message queues, analyzes interview expectations, and details the HA mechanisms of RabbitMQ (including single, normal cluster, and mirrored modes) and Kafka (partition replication and leader election), highlighting their advantages, drawbacks, and practical considerations.

Distributed Systems · Kafka · Message Queue
0 likes · 11 min read
MaGe Linux Operations
Jul 1, 2019 · Backend Development

Designing a Scalable E‑Commerce System with Microservices, DDD, and Distributed Transactions

This article walks through building an e‑commerce platform using microservices, covering module decomposition, domain‑driven design, service splitting, technology stack choices, distributed transaction strategies, circuit‑breaker patterns, centralized configuration, monitoring, and capacity planning to guide developers from concept to deployment.

CAP theorem · DDD · Distributed Systems
0 likes · 27 min read
Big Data Technology & Architecture
Jun 30, 2019 · Big Data

Curated Collection of Big Data, Flink, Hadoop and Real‑Time Computing Articles from the “Big Data Technology and Architecture” Series

This article presents a carefully organized catalogue of over a hundred technical posts covering Flink source‑code analysis, fundamental and advanced big‑data structures, Hadoop ecosystem components, real‑time streaming with Spark and Kafka, as well as system design guidelines and miscellaneous insights, each linked to its original publication for easy reference.

Big Data · Distributed Systems · Flink
0 likes · 6 min read
Architecture Digest
Jun 28, 2019 · Cloud Computing

Evolution of Alibaba's Technical Architecture and Lessons for Enterprise Systems

The article reviews Alibaba's architectural evolution from early PHP and JavaBean systems through EJB, Spring, and service‑oriented transformations using EDAS, DRDS, and ONS, highlighting challenges such as maintenance, data silos, database limits, and the resulting enterprise‑grade cloud‑native solutions.

Alibaba · Distributed Systems · architecture evolution
0 likes · 10 min read
Tencent Cloud Developer
Jun 27, 2019 · Databases

Evolution and Technical Analysis of Tencent Cloud Databases: TDSQL and CynosDB

Tencent Cloud’s database evolution progressed from early open‑source integration to self‑developed, cloud‑native solutions, producing the distributed, high‑availability TDSQL and the compute‑storage‑separated, log‑sinking CynosDB, each featuring advanced scheduling, sharding, cost‑based optimization, and asynchronous log replay to deliver scalable, low‑latency performance for diverse workloads.

Cloud Native · CynosDB · Distributed Systems
0 likes · 17 min read
HomeTech
Jun 27, 2019 · Operations

Design and Implementation of a Distributed Monitoring System at Autohome

The article describes Autohome's evolution from a Zabbix‑based monitoring setup to a custom, distributed monitoring platform, detailing its architectural components, design goals, implementation choices, product features, and future roadmap for fault localization and dynamic alerting.

Alerting · Architecture · Distributed Systems
0 likes · 6 min read
Java Backend Technology
Jun 24, 2019 · Backend Development

What I Learned from 2 Months of Java Backend Interviews: Tips & Insights

Over two months, I interviewed with multiple companies for Java backend roles. This article shares detailed experiences from the technical rounds, covering JVM, concurrency, distributed locks, databases, and system design, and highlights key questions, effective answers, and practical advice for succeeding in similar backend development interviews.

Distributed Systems · JVM · concurrency
0 likes · 17 min read
ITPUB
Jun 22, 2019 · Databases

Master MySQL Replication, Sharding, and Distributed Deployment in 10 Minutes

This article provides a concise, ten‑minute guide to MySQL master‑slave and master‑master replication, data sharding principles and implementations, and various database deployment architectures—including single‑instance, replication‑based scaling, and sharding‑based scaling—while highlighting practical considerations, advantages, and common pitfalls.

Distributed Systems · MySQL · Mycat
0 likes · 15 min read
Alibaba Cloud Developer
Jun 20, 2019 · Operations

How Adaptive Load Balancing Can Tame Double‑11 Traffic Peaks

This article explains the challenges of handling Double‑11 traffic spikes, introduces adaptive load‑balancing concepts, analyzes the 5th Middleware Performance Challenge scenario, and outlines algorithm design considerations and evaluation steps for building a robust, self‑adjusting load‑balancing solution.

Distributed Systems · Performance Testing · adaptive algorithm
0 likes · 10 min read
DataFunTalk
Jun 19, 2019 · Backend Development

Apache Dubbo: High‑Performance Java RPC Framework – History, Ecosystem, and 2019 Roadmap

The article introduces Apache Dubbo, a high‑performance lightweight Java RPC framework, outlines its core capabilities, development history, technical ecosystem, 2019 plans, shares micro‑service implementation experiences, and provides speaker and community information for the DataFun big‑data forum.

Apache Dubbo · Distributed Systems · Java RPC
0 likes · 3 min read
Tencent Cloud Developer
Jun 17, 2019 · Cloud Native

Service Mesh Implementation Challenges and Solutions: Practical Insights from Production Environment

Implementing a service mesh in production faces real‑world hurdles such as significant CPU consumption, 20‑50% performance loss, tangled sidecar responsibilities, missing registration support, and control‑plane bottlenecks, which can be mitigated by a central‑mesh fallback, IPC and lock‑free optimizations, staged sidecar splitting, and unified Pilot‑based service discovery.

Cloud Native · Distributed Systems · Istio
0 likes · 16 min read
DataFunTalk
Jun 17, 2019 · Big Data

Understanding Hadoop’s Core Competitiveness in the Trillion‑Scale Data Era

This article explores Hadoop’s role in the big‑data era, detailing its architecture, core components such as HDFS, YARN, MapReduce, Ozone and Submarine, the challenges of trillion‑scale data, and why its scalability, cost efficiency, and mature ecosystem give it a competitive edge.

Data Lake · Distributed Systems · Hadoop
0 likes · 11 min read
ITFLY8 Architecture Home
Jun 5, 2019 · Fundamentals

Understanding Paxos: A Beginner’s 30‑Minute Guide with Real‑World Analogy

This article explains the Paxos consensus algorithm in plain terms, using a relatable travel‑planning analogy to illustrate how proposers, acceptors, and majority voting achieve fault‑tolerant agreement in distributed systems, and connects the concept to real‑world implementations like Google’s Chubby and ZooKeeper.

Distributed Systems · Paxos · algorithm
0 likes · 13 min read
Java Captain
Jun 2, 2019 · Big Data

Comprehensive Guide to Autumn Recruitment: Strategies, Learning Paths, and Interview Questions for Java and Big Data Positions

This article provides a detailed roadmap for candidates preparing for the autumn recruitment season, covering interview experience sharing, systematic learning routes, project preparation, essential Java and big‑data technologies, core algorithms, and practical interview question collections to help readers avoid common pitfalls and succeed in securing offers.

Algorithms · Autumn Recruitment · Big Data
0 likes · 18 min read
Architect's Tech Stack
May 31, 2019 · Big Data

Kafka Architecture Overview: Producers, Consumers, Partitions, Replication, and Transactions

This article provides a comprehensive overview of Apache Kafka's architecture, covering topics such as producer and consumer workflows, partition and replica management, leader election, offset handling, message delivery semantics, transaction support, and file organization, illustrating how Kafka achieves high performance and scalability.

Consumer · Distributed Systems · Kafka
0 likes · 18 min read
21CTO
May 30, 2019 · Backend Development

Mastering Dubbo: Deep Dive into Java Service Governance

This article explores why Java remains the dominant backend language, introduces the Dubbo framework and its evolution, explains core concepts such as providers, consumers, and registries, and details practical configurations for registry, load balancing, rate limiting, governance, monitoring, and extensions like DubboX and REST support.

Distributed Systems · Dubbo · backend-development
0 likes · 24 min read
Alibaba Cloud Developer
May 28, 2019 · Databases

How Cloud‑Native Databases Are Redefining the Future of Data Management

This article examines the rapid rise of cloud‑native databases, detailing market trends, architectural innovations such as shared storage and shared‑nothing designs, and showcasing Alibaba Cloud's POLARDB, POLARDB‑X, AnalyticDB, and autonomous tuning platform iBTune, while highlighting security and high‑availability advancements.

AnalyticDB · Distributed Systems · Polardb
0 likes · 19 min read