Tagged articles
Selected Java Interview Questions
Jul 15, 2023 · Operations

High‑Availability Architecture for a Large‑Scale Membership System

The article describes how a membership system serving billions of users across multiple platforms achieves high performance and high availability through dual‑center Elasticsearch clusters, traffic‑isolated three‑cluster ES architecture, Redis caching with distributed locks, dual‑center MySQL partitioning, and fine‑grained flow‑control and degradation strategies.

Backend Architecture · Distributed Systems · Elasticsearch
0 likes · 25 min read
NetEase Cloud Music Tech Team
Jul 10, 2023 · Big Data

Design and Implementation of the Log Reporting, Collection, and Distribution Pipeline in NetEase Cloud Music's Corona Front‑end Monitoring System

The article details NetEase Cloud Music’s Corona monitoring pipeline, explaining how SDKs report logs via an HTTP service, how a transmission layer normalizes and stores them, how a Flume‑like collector forwards logs to HBase and Kafka, and how Flink tasks shard and filter streams for various monitoring services while handling traffic spikes and offering an independent Node.js channel for other business units.

Distributed Systems · Flink · Frontend
0 likes · 10 min read
Architects Research Society
Jul 7, 2023 · Operations

Design Patterns and Principles for Building Large‑Scale Systems

This article outlines key design patterns and principles—such as scalability, idempotency, asynchronous processing, health checks, circuit breakers, feature flags, bulkheads, service discovery, retries, metrics, rate limiting, back‑pressure, and canary releases—that enable large‑scale, reliable, and resilient distributed systems.

Distributed Systems · Observability · Reliability
0 likes · 16 min read
政采云技术
Jun 29, 2023 · Backend Development

Understanding RocketMQ: Architecture, Modules, and Deployment Essentials

This article provides a comprehensive overview of RocketMQ, covering its origin, core concepts, component roles, cluster deployment architecture, workflow, feature details, persistence mechanisms, and cleanup policies, offering developers a solid foundation for using this high‑performance messaging middleware.

Apache · Architecture · Distributed Systems
0 likes · 18 min read
Alibaba Cloud Native
Jun 26, 2023 · Cloud Native

How RocketMQ Evolved Its High‑Availability Architecture for Cloud‑Native Deployments

This article examines RocketMQ's high‑availability evolution—from early master‑slave and Raft‑based designs to the v5 DLedger fusion model—detailing replica groups, data sharding, election mechanisms, replication strategies, metric trade‑offs, log‑divergence handling, controller roles, heartbeat optimizations, and comparisons with Kafka and Pulsar, all illustrated with diagrams and code snippets.

Cloud Native · DLedger · Distributed Systems
0 likes · 36 min read
DataFunSummit
Jun 21, 2023 · Databases

Forum on Building Ultra‑Scale Storage Systems: Insights from Baidu, Meituan, Ant Group, Xiaomi and Baidu Cloud

The forum gathers senior experts from Baidu, Meituan, Ant Group, Xiaomi and Baidu Cloud to share practical experiences and future trends on constructing ultra‑large‑scale file, block, KV and NoSQL storage systems, focusing on low‑cost, high‑performance solutions and architectural challenges.

Distributed Systems · KV storage · block storage
0 likes · 8 min read
dbaplus Community
Jun 20, 2023 · Operations

How Agricultural Bank Built a Chaos Engineering Platform for Resilience

The article outlines the Agricultural Bank of China's initiative to adopt chaos engineering, describing the challenges of modern distributed systems, the design and capabilities of their in‑house chaos platform, product research, industry comparisons, practical use cases across development, operations and disaster recovery, and future development directions.

Cloud Native · Distributed Systems · Platform Development
0 likes · 14 min read
Open Source Linux
Jun 20, 2023 · Operations

Mastering Load Balancing: Types, Architectures, and Algorithms Explained

This article explains why high‑performance clusters are complex, introduces the three main load‑balancing categories—DNS, hardware, and software—describes their definitions, advantages, and drawbacks, outlines typical combined architectures, and reviews common load‑balancing algorithms such as round‑robin, weighted round‑robin, least‑load, performance‑optimal, and hash‑based methods.

Algorithms · Distributed Systems · Hardware
0 likes · 6 min read
FunTester
Jun 19, 2023 · Big Data

Kafka Architecture and Core Concepts: Brokers, Producers, Consumers, Topics, Partitions, Replicas, and Reliability

This article provides a comprehensive overview of Kafka's architecture and fundamental concepts, covering its overall structure, key components such as brokers, producers, consumers, topics, partitions, replicas, leader‑follower synchronization, offset handling, message storage at both logical and physical layers, as well as producer and consumer workflows, partition assignment strategies, rebalancing, log management, zero‑copy I/O, and reliability mechanisms.

Distributed Systems · Kafka · Log Management
0 likes · 22 min read
Baidu Geek Talk
Jun 19, 2023 · Operations

How Baidu’s Tianyan Log Service Overcomes ELK’s Scaling and Performance Limits

This article examines the challenges of logging in distributed services, compares the traditional ELK stack with Baidu's Tianyan solution, details Tianyan's architecture (Ingest, Store, Consumer, Elastic Agent, Fleet, APM, Beats, and Disruptor‑based high‑throughput pipelines), and covers resource isolation, dynamic cleanup, and best‑practice recommendations for building a scalable, low‑latency log platform.

Distributed Systems · Elastic Stack · Log Management
0 likes · 26 min read
Open Source Linux
Jun 16, 2023 · Backend Development

How Netflix’s Cloud Gateway Cuts Errors with Adaptive Load Balancing

Netflix’s cloud‑gateway team redesigned its load‑balancing stack—combining client latency, server utilization, and probabilistic choice‑of‑2 algorithms—to dramatically lower error rates, improve request distribution, and enhance fault‑tolerance for millions of requests per second.

Distributed Systems · Netflix · adaptive algorithms
0 likes · 19 min read
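
The choice‑of‑2 selection named in the summary above can be illustrated with a minimal sketch. This is not Netflix's implementation: the function name `pick_server` and the in‑flight request counts are hypothetical, and the real system is described as also weighing client latency and server utilization.

```python
import random


def pick_server(load_by_server, rng=random):
    """Choice-of-2: sample two distinct servers at random and
    return whichever currently reports the lower load."""
    if not load_by_server:
        raise ValueError("no servers available")
    servers = list(load_by_server)
    if len(servers) == 1:
        return servers[0]
    first, second = rng.sample(servers, 2)
    # Ties go to the first sampled server.
    return first if load_by_server[first] <= load_by_server[second] else second


# Hypothetical in-flight request counts per server.
loads = {"a": 12, "b": 3, "c": 7}
chosen = pick_server(loads)
```

Comparing only two random candidates avoids the herd effect of every client picking the single least‑loaded server, while the most loaded server in any sampled pair is never selected.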
ITPUB
Jun 15, 2023 · Databases

How Domestic Databases Are Shaping China’s Financial Digital Transformation

Amid China’s push for digital and domestic technology, the article examines the evolution of native database products, the opportunities and challenges they face—especially in the financial sector—and how policy, cloud‑native architectures, distributed systems, and multi‑cloud demands are driving the next wave of innovation.

China · Digital Transformation · Distributed Systems
0 likes · 10 min read
Sanyou's Java Diary
Jun 12, 2023 · Backend Development

Master RocketMQ 4.9.x Consumption: Architecture, Load Balancing, and Retry Strategies

This article walks through RocketMQ 4.9.x’s consumption architecture, explaining the roles of NameServer, Broker, Producer and Consumer, the publish‑subscribe model, storage structures, load‑balancing algorithms, long‑polling, concurrent and ordered consumption, progress persistence, and the built‑in retry mechanism.

Consumer · Distributed Systems · Message Queue
0 likes · 28 min read
DeWu Technology
Jun 7, 2023 · Backend Development

Ensuring Data Consistency Across Microservices: Strategies and Design Principles

This article examines why data consistency between microservices is critical, defines key terminology, and presents two practical approaches—business‑side final consistency and platform‑side final consistency—detailing their core ideas, design principles, workflow diagrams, and real‑world implementation considerations such as idempotency, storage choices, latency tolerance, state‑machine design, concurrency control, and observability.

Data Consistency · Distributed Systems · Idempotency
0 likes · 17 min read
Code Ape Tech Column
Jun 6, 2023 · Backend Development

Business Compensation Mechanisms: Rollback and Retry Strategies in Distributed Systems

The article explains business compensation mechanisms in distributed microservice architectures, detailing rollback and retry approaches, their implementation patterns and strategies, and practical considerations for achieving eventual consistency while handling failures. It also outlines best practices for idempotency, monitoring, and workflow engine design.

Distributed Systems · Retry · business compensation
0 likes · 14 min read
Top Architect
Jun 5, 2023 · Big Data

Deep Dive into Kafka’s High Reliability and High Performance Mechanisms

This article comprehensively explores Kafka’s core concepts, architecture, and the techniques it employs—such as ack strategies, replica synchronization, high‑watermark, leader‑epoch, zero‑copy, batch sending, compression, and reactor‑based networking—to achieve both strong reliability and high throughput in distributed messaging systems.

Distributed Systems · Kafka · Message Queue
0 likes · 31 min read
Alibaba Cloud Developer
Jun 5, 2023 · Databases

Mastering Cache Consistency: Strategies to Prevent Stale Data in High‑Concurrency Systems

This article examines why cache‑database consistency problems arise under high concurrency, compares common update orders, explains delayed double‑delete and cache‑aside patterns, and presents practical solutions such as retry mechanisms, message queues, and MySQL binlog subscription to keep data synchronized.

Cache Consistency · Distributed Systems · cache-aside
0 likes · 11 min read
Alibaba Cloud Big Data AI Platform
Jun 5, 2023 · Artificial Intelligence

How Alibaba’s DGS Enables Real‑Time GNN Inference on Massive Dynamic Graphs

The Dynamic Graph Sampling (DGS) service, built on GraphLearn, delivers sub‑20 ms latency for real‑time GNN inference on large, constantly evolving graphs by separating storage from computation, using event‑driven pre‑sampling, lazy multi‑hop concatenation, and a publish‑subscribe architecture that scales linearly across distributed workers.

Alibaba Cloud · Distributed Systems · GraphLearn
0 likes · 12 min read
Architects Research Society
Jun 4, 2023 · Big Data

Understanding Transactions in Apache Kafka

This article explains the design, semantics, and practical usage of Apache Kafka's transaction API, covering why transactions are needed for exactly‑once processing, the underlying atomic multi‑partition writes, zombie fencing, consumer guarantees, Java API details, performance considerations, and operational best practices.

Apache Kafka · Distributed Systems · Exactly-Once
0 likes · 19 min read
MaGe Linux Operations
Jun 1, 2023 · Backend Development

How Netflix’s New Load‑Balancing Algorithm Cuts Errors by Orders of Magnitude

Netflix’s cloud‑gateway team redesigned Zuul’s load balancing using a combination of client latency, server utilization, and the choice‑of‑2 and Join‑the‑Shortest‑Queue algorithms, adding server‑reported metrics, adaptive thresholds, and statistical decay. In production, the change dramatically reduced error rates and latency and improved traffic distribution.

Distributed Systems · Netflix · Performance
0 likes · 20 min read
Liangxu Linux
May 28, 2023 · Backend Development

Preventing Redis Cache Penetration, Avalanche, and Thundering Herd

This article explains the causes of Redis cache penetration, avalanche, and thundering herd, and provides practical mitigation strategies such as caching null values, using white‑lists, Bloom filters, pre‑warming hot keys, staggered expirations, multi‑level caching, and lock mechanisms.

Backend · Cache · Distributed Systems
0 likes · 7 min read
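
One of the mitigations listed in the summary above, caching null values to blunt cache penetration, can be sketched as follows. This is an illustrative sketch only: the class `NullCachingLoader`, its parameters, and the in‑process dictionary standing in for Redis are all assumptions, not code from the article.

```python
import time

_MISSING = object()  # sentinel meaning "the database has no row for this key"


class NullCachingLoader:
    """Read-through cache that also remembers misses for a short TTL,
    so repeated lookups of nonexistent keys do not reach the database."""

    def __init__(self, load_from_db, ttl=300.0, null_ttl=30.0, clock=time.monotonic):
        self.load_from_db = load_from_db  # hypothetical callable: key -> value or None
        self.ttl = ttl                    # lifetime of real values, in seconds
        self.null_ttl = null_ttl          # shorter lifetime for cached misses
        self.clock = clock
        self._cache = {}                  # key -> (value or _MISSING, expires_at)

    def get(self, key):
        now = self.clock()
        entry = self._cache.get(key)
        if entry is not None and entry[1] > now:
            value = entry[0]
            return None if value is _MISSING else value
        # Cache miss or expired entry: fall through to the database.
        value = self.load_from_db(key)
        if value is None:
            self._cache[key] = (_MISSING, now + self.null_ttl)
        else:
            self._cache[key] = (value, now + self.ttl)
        return value
```

Keeping the miss TTL short limits how long a key that is created later would appear to be absent.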
Didi Tech
May 26, 2023 · Big Data

Design and Optimization of Didi's Spatial‑Temporal Supply‑Demand System

Didi’s redesigned Spatial‑Temporal Supply‑Demand System replaces a single‑Redis bottleneck with a multi‑cluster routing layer, semantic sharding, multi‑level caching and delayed queues, achieving higher horizontal scalability, fault isolation, ~30 % latency reduction, increased cache hit rates, fewer query nodes, and faster, code‑free feature configuration.

Configuration Management · Distributed Systems · Golang
0 likes · 19 min read
dbaplus Community
May 25, 2023 · Databases

15 Powerful Redis Patterns for Scalable Backend Systems

This article presents fifteen practical Redis usage patterns—including caching, distributed sessions, locks, global IDs, counters, bitmaps, shopping carts, timelines, message queues, lotteries, likes, product tags, filtering, follow relationships, and ranking—illustrating how each can be implemented with commands and code snippets to build efficient, scalable backend services.

Data Structures · Distributed Systems
0 likes · 9 min read
Code Ape Tech Column
May 22, 2023 · Backend Development

Vertical Performance Optimization: Load Balancing Architecture and Practices

This article explores the evolution of load‑balancing architectures from Alibaba’s early systems to modern micro‑service meshes, detailing DNS, hardware, and software solutions, common algorithms, and real‑world case studies such as Double‑11, China Railway 12306, WeChat red packets, and Douyin, highlighting performance, scalability, and reliability considerations.

Distributed Systems · Service Mesh
0 likes · 17 min read
DataFunTalk
May 21, 2023 · Databases

Graph Database Storage Techniques and Practices with Galaxybase

This article introduces RDF and property graph models, explains the core goals of graph database storage, compares mainstream storage solutions such as array, linked‑list and LSM‑Tree approaches, and presents practical deployment experiences of the Galaxybase distributed graph database.

Distributed Systems · Galaxybase · Graph Database
0 likes · 23 min read
Selected Java Interview Questions
May 17, 2023 · Backend Development

Effective Cache Strategies for Large Distributed Systems

This article explains how to design and use various client‑side, CDN, and server‑side caching techniques—including HTTP Cache‑Control, Redis data structures, cache consistency patterns, and mitigation of cache penetration, breakdown, and avalanche—to improve performance and reliability of high‑traffic distributed applications.

CDN · Cache Consistency · Distributed Systems
0 likes · 23 min read
ITPUB
May 10, 2023 · Cloud Native

How Meituan’s MStore Achieves Scalable Storage‑Compute Separation in Cloud‑Native Environments

This article explains how Meituan’s storage team designed the MStore distributed storage platform to separate storage and compute, addressing scaling, cost, and reliability challenges of monolithic architectures, and details its cloud‑native components, data model, performance optimizations, observability, and the derived EBS block‑storage service.

Distributed Systems · MStore · Performance
0 likes · 16 min read
HelloTech
May 8, 2023 · Artificial Intelligence

One‑Stop AI Platform for Cloud, Edge, Mobile, Flink, and Application Intelligence: Architecture, Challenges, and Solutions

The article presents a comprehensive one‑stop AI platform that unifies training, model, feature, and decision services across cloud, edge, mobile, Flink, and application environments, detailing its architecture, the limitations of cloud‑centric inference, the advantages of localized inference, and the challenges and solutions for model and feature localization, SDK design, and future AutoML enhancements.

AI Platform · Distributed Systems · Flink
0 likes · 17 min read
Architects Research Society
May 6, 2023 · Databases

Understanding Eventual Consistency in Apache CouchDB

This article explains how Apache CouchDB achieves eventual consistency through its MVCC architecture, CAP theorem trade‑offs, incremental replication, and document‑level versioning, illustrating concepts such as local consistency, conflict resolution, and practical use‑cases for building scalable distributed systems.

CouchDB · Distributed Systems · MVCC
0 likes · 21 min read
Top Architect
May 3, 2023 · Backend Development

Understanding RPC: Principles, Implementation Details, and Code Walkthrough

This article explains the fundamentals of Remote Procedure Call (RPC), covering its definition, core challenges, service registration and discovery with Zookeeper, client proxy generation, network transmission using Netty, serialization and compression, server-side request handling via reflection or Javassist, and performance comparisons between proxy strategies.

Distributed Systems · Javassist · Netty
0 likes · 24 min read
Top Architect
Apr 30, 2023 · Backend Development

Kafka Core Concepts, Architecture, Performance, and Operational Practices

This article provides a comprehensive overview of Kafka, covering its core value as a message queue, fundamental concepts, cluster architecture, log storage mechanisms, zero‑copy data transfer, high‑throughput and high‑availability design, consumer group behavior, rebalance strategies, and practical operational commands for managing topics, partitions, and offsets.

Backend · Distributed Systems · Kafka
0 likes · 31 min read
AntTech
Apr 28, 2023 · Information Security

Threshold Proxy Re‑Encryption (TPRE) with National Cryptographic Algorithms for Secure Data Sharing

The article explains how cryptographic access control, especially a hybrid‑encrypted Threshold Proxy Re‑Encryption scheme built on national SM2/SM3/SM4 algorithms, offers high‑strength, decentralized, and efficient data authorization and sharing, addressing the limitations of traditional role‑based models.

Distributed Systems · access control · cryptography
0 likes · 5 min read
Architects Research Society
Apr 25, 2023 · Big Data

Understanding Transactions in Apache Kafka: Semantics, API, and Practical Guidance

This article explains the purpose, semantics, and design of Apache Kafka's transaction API, describes how exactly‑once processing is achieved in stream‑processing applications, outlines the Java client usage, and discusses the internal components, performance considerations, and best‑practice tips for developers.

Distributed Systems · Exactly-Once · Kafka
0 likes · 16 min read
Architecture Digest
Apr 23, 2023 · Backend Development

Kafka Core Concepts, Architecture, Performance Optimizations, and Production Deployment Guide

This article provides a comprehensive technical overview of Kafka, covering its core message‑queue value, architecture components such as producers, consumers, topics, partitions and replication, high‑performance mechanisms like zero‑copy and OS cache, resource planning for disks, memory, CPU and network, operational tools and commands, consumer‑group management, rebalance strategies, and internal scheduling mechanisms such as the time‑wheel.

Backend Architecture · Distributed Systems · Kafka
0 likes · 30 min read
Big Data Technology Architecture
Apr 22, 2023 · Big Data

Deep Dive into Kafka’s High Reliability and High Performance Mechanisms

This article comprehensively explores Kafka’s core architecture, explaining how asynchronous decoupling and traffic shaping are achieved, detailing the roles of producers, brokers, consumers, and ZooKeeper, and analyzing the reliability and performance techniques such as ACK policies, replication, idempotent and transactional producers, page‑cache flushing, zero‑copy, compression, batching, and load‑balancing strategies.

Distributed Systems · Message Queue · Reliability
0 likes · 31 min read
JD Retail Technology
Apr 19, 2023 · Databases

Understanding Distributed Data Consistency: CAP, BASE, and Transaction Solutions

This article explains why achieving data consistency in modern distributed systems is challenging, reviews the ACID properties of local databases, discusses the CAP theorem and BASE theory, examines event ordering mechanisms, and compares practical solutions such as two‑phase commit, XA, local message tables, and MQ‑based transaction models.

BASE theorem · CAP theorem · Data Consistency
0 likes · 19 min read
Huolala Tech
Apr 17, 2023 · Big Data

How HuoLala Accelerated Ad‑hoc Queries with a Hybrid Offline Engine

This article describes how HuoLala identified slow ad‑hoc query performance in its Hive‑on‑Tez stack, surveyed comparable industry solutions, and built a multi‑engine hybrid offline service that dramatically reduces query latency. It also outlines the service's architecture, key design decisions, production impact, and future roadmap.

Big Data · Distributed Systems · SQL Routing
0 likes · 12 min read
Sanyou's Java Diary
Apr 13, 2023 · Backend Development

Master Dubbo: High‑Performance Java RPC Framework Explained

This comprehensive guide introduces Dubbo, a high‑performance Java RPC framework, covering its core concepts, architecture, configuration methods, load‑balancing and fault‑tolerance strategies, underlying communication mechanisms, and extension points, helping developers build robust distributed applications.

Distributed Systems · Dubbo · Java RPC
0 likes · 24 min read
ITPUB
Apr 13, 2023 · Fundamentals

Mastering Distributed Transactions: From CAP to BASE and Practical Solutions

This article explains distributed transactions, the reasons they arise, the CAP and BASE theories that guide consistency trade‑offs, and outlines strong, eventual, and weak consistency solutions along with popular frameworks for implementing them in modern distributed systems.

BASE theory · CAP theory · Distributed Systems
0 likes · 11 min read
政采云技术
Apr 13, 2023 · Backend Development

Understanding the AKF Scale Cube: X, Y, Z Axes for System Scalability and Their Application to Kafka and Redis

The article explains the AKF Scale Cube model—horizontal replication (X axis), functional decomposition (Y axis), and data/service partitioning (Z axis)—and demonstrates how these three scaling dimensions can be applied to backend systems such as Kafka and Redis to achieve high availability, performance, and fault isolation.

AKF Scale Cube · Distributed Systems · Kafka
0 likes · 19 min read
IT Architects Alliance
Apr 13, 2023 · Backend Development

WebSocket Load Balancing Across Microservices Using a Single Annotation

This article explains how to solve the WebSocket message‑delivery problem in a micro‑service environment by introducing a lightweight library that uses a custom annotation to automatically forward messages between service instances, with detailed design, configuration, and code examples.

Backend · Distributed Systems · Microservices
0 likes · 12 min read
Java Architect Essentials
Apr 12, 2023 · Operations

High‑Availability Architecture for a Billion‑Scale Membership System

This article details the design and implementation of a high‑availability, billion‑scale membership system, covering Elasticsearch dual‑center clusters, traffic‑isolated architectures, deep ES optimizations, Redis caching strategies, MySQL migration with dual‑center partitioning, abnormal member relationship handling, and future fine‑grained flow‑control and degradation plans.

Distributed Systems · Elasticsearch · Flow Control
0 likes · 20 min read
HomeTech
Apr 5, 2023 · Backend Development

Design and Implementation of a Real‑Time Cache Update System Based on Kafka and Distributed Cache

This article presents a comprehensive design and implementation of a real‑time cache update system that leverages Kafka‑driven database change streams, a centralized cache scheduling center, executor registration, broadcast and fail‑over scheduling, and a lightweight SDK to achieve millisecond‑level cache consistency for C‑end services.

Backend · Cache · Distributed Systems
0 likes · 10 min read
Alibaba Cloud Developer
Mar 31, 2023 · Backend Development

How Alibaba’s Custom Three‑Layer Distribution Boosts Scheduled Task Efficiency

This article walks through Alibaba's evolution from single‑machine scheduled jobs to a customized three‑layer distributed task framework, detailing classifications, Spring scheduling examples, batch processing integration, cluster distribution mechanics, and optimization techniques that maximize resource utilization and achieve smooth, balanced task execution.

Antscheduler · Distributed Systems · spring
0 likes · 16 min read
Open Source Linux
Mar 31, 2023 · Fundamentals

What Is a Network Operating System? Concepts, Functions, and Key Examples

This article explains the concept of network operating systems, traces their historical development, outlines core functions such as resource sharing, communication, security, and management, describes typical architectures, and introduces major examples like UNIX, Linux, NetWare, and Windows Server.

Distributed Systems · NOS · Network Operating System
0 likes · 8 min read
Code Ape Tech Column
Mar 30, 2023 · Backend Development

How to Ensure No Message Loss in MQ Systems – Interview Guide and Practical Solutions

This article explains the common interview question of guaranteeing 100% message reliability in MQ middleware such as Kafka or RabbitMQ, outlines the three lifecycle stages of a message, and discusses detection mechanisms, ID generation, idempotent consumption, and message backlog handling, providing concrete design patterns and practical examples.

Distributed Systems · Idempotency · Kafka
0 likes · 12 min read
Volcano Engine Developer Services
Mar 29, 2023 · Backend Development

How ByteHouse Achieves High‑Availability Real‑Time Data Ingestion with HaKafka

ByteHouse evolved its real‑time import pipeline from a community ClickHouse architecture to a custom HaKafka engine and a cloud‑native design, addressing node failures, read‑write conflicts, scaling costs, and latency by introducing two‑level concurrency, memory tables, exactly‑once semantics, and robust fault‑tolerance.

Distributed Systems · Kafka · Real-time Ingestion
0 likes · 15 min read
MaGe Linux Operations
Mar 27, 2023 · Backend Development

Mastering Rate Limiting: Concepts, Algorithms, and Real-World Implementations

This article explains the fundamental concepts of rate limiting, including time and resource dimensions, various rule types such as QPS, connection count, bandwidth, black/white lists, and distributed considerations, then details common algorithms like token bucket, leaky bucket, sliding window, and practical implementations using Nginx, Guava, Redis, and Sentinel.

Backend · Distributed Systems · Token Bucket
0 likes · 16 min read
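
The token bucket algorithm named in the summary above can be sketched in a few lines. This is a single‑process illustration under assumed names (`TokenBucket`, `rate`, `capacity`); it is not the Guava, Redis, or Sentinel implementation the article discusses, and it is not thread‑safe.

```python
import time


class TokenBucket:
    """Token bucket limiter: tokens refill at `rate` per second up to
    `capacity`; a request is allowed only if enough tokens remain."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate              # tokens added per second
        self.capacity = capacity      # maximum burst size
        self.clock = clock            # injectable clock, useful for tests
        self.tokens = float(capacity) # start full so an initial burst is allowed
        self.last = clock()

    def allow(self, cost=1.0):
        now = self.clock()
        # Refill in proportion to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

Unlike a leaky bucket, which smooths output to a constant rate, this design permits bursts of up to `capacity` requests while still bounding the long‑run average at `rate`.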
Top Architect
Mar 27, 2023 · Big Data

Kafka Architecture, Performance Optimization, and Production Deployment Guide

This article provides a comprehensive overview of Kafka’s core concepts, high‑performance design, cluster planning, resource evaluation, deployment steps, producer and consumer configurations, fault‑tolerance mechanisms, and operational tools, offering practical guidance for building and managing a high‑throughput Kafka production environment.

Cluster Deployment · Consumer · Distributed Systems
0 likes · 31 min read
dbaplus Community
Mar 22, 2023 · Databases

Scaling an Airline Ticket Order Database: From Monolith to 64‑Shard Sharding

The article details how a rapidly growing airline ticket order system was re‑architected by identifying performance bottlenecks, applying vertical and horizontal sharding, optimizing cache layers, implementing dual‑write mechanisms, and planning a phased migration to achieve ten‑fold QPS growth while reducing resource usage and operational risk.

Cache Optimization · Distributed Systems · Dual Write
0 likes · 38 min read
Architect
Mar 21, 2023 · Operations

Log Management, Observability, and APM Practices in Distributed Systems

This article explains what logs are, when to record them, their value in large‑scale architectures, and how to build effective logging, metrics, and tracing platforms using tools such as ELK, Prometheus, and SkyWalking, while also presenting good and bad logging practices and sample batch‑log retrieval code.

APM · Distributed Systems · ELK
0 likes · 20 min read
Volcano Engine Developer Services
Mar 16, 2023 · Databases

How ByteDance’s Abase Achieves Extreme High Availability in KV Storage

This article explains the evolution, architecture, and high‑availability solutions of ByteDance’s Abase KV storage system, detailing its multi‑write design, leader‑less approach, multi‑region deployment, consistency mechanisms, performance optimizations, and real‑world metrics that support billions of requests per second.

ByteDance · Distributed Systems · KV storage
0 likes · 20 min read
DataFunSummit
Mar 15, 2023 · Databases

Abase: ByteDance’s Large‑Scale Online KV Storage System – Architecture, High Availability, and Key Technologies

This article introduces Abase, ByteDance’s massive online KV storage system, detailing its evolution from a single‑cluster KV service to a multi‑region, multi‑tenant platform, and explains the high‑availability challenges and the leaderless multi‑write architecture, hybrid logical clocks, quorum settings, and performance optimizations that enable hundred‑billion QPS and sub‑10 ms latency.

ABase · ByteDance · Database Architecture
0 likes · 19 min read
Architects Research Society
Mar 15, 2023 · Big Data

Understanding Transactions in Apache Kafka: Semantics, API, and Practical Considerations

This article explains why exactly‑once semantics are needed for stream‑processing applications, describes Kafka's transactional model and semantics, details the Java transaction API and its usage, and discusses the internal components, performance trade‑offs, and practical guidelines for building reliable Kafka‑based pipelines.

Distributed Systems · Exactly-Once · Kafka
0 likes · 17 min read
ITPUB
Mar 14, 2023 · Fundamentals

Master Distributed Systems: CAP, BASE, Locks, Transactions, Paxos & Raft

This comprehensive guide explores core distributed system concepts—including the CAP theorem and its trade‑offs, BASE consistency, various distributed lock strategies, multiple transaction patterns such as 2PC, 3PC, TCC and Seata, as well as consensus algorithms Paxos and Raft, while also covering idempotency and rate‑limiting techniques.

CAP theorem · Distributed Systems · Distributed Transactions
0 likes · 29 min read
Programmer DD
Mar 14, 2023 · Backend Development

Why This Spring ‘Full‑Stack’ Book Is a Must‑Read for Java Developers

The article reviews a comprehensive Spring framework book that covers the entire Spring family—from core concepts and data access to web development and cloud‑native microservices—using a practical, localized approach and a large milk‑tea shop case study to guide both beginners and experienced Java developers.

Distributed Systems · Spring Framework · backend-development
0 likes · 10 min read
Bilibili Tech
Mar 14, 2023 · Big Data

Bilibili HDFS Erasure Coding Strategy and Implementation

Bilibili reduced petabyte‑scale storage costs by back‑porting erasure‑coding patches to its HDFS 2.8.4 cluster, deploying a parallel EC‑enabled cluster, adding a data‑proxy service, intelligent routing and block‑checking, and automating cold‑data migration, while noting write overhead and planning native acceleration.

Big Data · Data Reliability · Distributed Systems
0 likes · 14 min read
FunTester
Mar 13, 2023 · Operations

How Chaos Engineering Can Strengthen System Reliability: A Practical Guide

This article explains the origins and principles of chaos engineering, illustrates how fault‑injection scenarios expose system weaknesses, outlines step‑by‑step implementation—from tool selection and metric definition to execution and post‑mortem—and highlights its role in achieving high‑availability service level agreements.

DevOps · Distributed Systems · Fault Injection
0 likes · 10 min read
JavaEdge
Mar 8, 2023 · Backend Development

Choosing CP vs AP for Service Discovery: When to Use Zookeeper or a Message Bus

This article explains the importance of service discovery in high‑availability systems, compares DNS, VIP, Zookeeper‑based CP solutions and message‑bus‑based AP approaches, outlines their registration and subscription workflows, highlights scalability and consistency trade‑offs, and provides practical guidance for designing robust registration centers.

AP · Distributed Systems · Message Bus
0 likes · 14 min read
Top Architect
Mar 8, 2023 · Backend Development

Implementing Rate Limiting with Redis: setnx, ZSet Sliding Window, and Token‑Bucket Approaches

This article explains three Redis‑based rate‑limiting techniques—using setnx for simple counters, leveraging ZSet for a sliding‑window algorithm, and applying a token‑bucket pattern with List—provides Java code examples for each method, discusses their advantages and drawbacks, and shows how to integrate them into backend services.

Distributed Systems · backend-development · java
0 likes · 7 min read
MaGe Linux Operations
Feb 25, 2023 · Backend Development

Mastering Rate Limiting: Strategies, Algorithms, and Real‑World Implementations

This article explains how rate limiting protects system availability by controlling traffic flow, introduces common patterns such as circuit breaking, service degradation, delay and privilege handling, compares cache, degradation, and rate limiting, and details popular algorithms and practical code implementations for both single‑node and distributed environments.

Distributed Systems · Guava · Token Bucket
0 likes · 13 min read
JD Tech
Feb 23, 2023 · Backend Development

Comprehensive Guide to Scheduling Tasks: Algorithms, Java Implementations, and Distributed Solutions

This article provides an in‑depth overview of scheduled task processing, covering common business scenarios, fundamental principles, single‑machine algorithms such as min‑heap and time‑wheel, Java utilities like Timer, DelayQueue, ScheduledExecutorService, Spring Task, Quartz, and distributed approaches using Redis, Elastic‑Job, and XXL‑Job.

Distributed Systems · Time Wheel · cron
0 likes · 22 min read
Alibaba Cloud Native
Feb 23, 2023 · Cloud Native

How OpenYurt Enables Large‑Scale Edge Computing for Longyuan Power

This article explains how OpenYurt, an unobtrusive cloud‑native edge platform, integrates with the CNStack technology hub to deliver high‑availability, offline‑autonomous, and programmable edge services for Longyuan Power’s massive multi‑province server fleet.

CNStack · Cloud Native · Distributed Systems
0 likes · 10 min read
MaGe Linux Operations
Feb 19, 2023 · Backend Development

Mastering API Protection: Rate Limiting, Caching, and Degradation for E‑Commerce Spikes

When a product suddenly surges in demand, this guide explains how to safeguard e‑commerce APIs using rate‑limiting algorithms (leaky bucket, token bucket, sliding window), Nginx and Java semaphore controls, distributed throttling with message queues, service degradation strategies, and caching techniques to maintain stability.

Distributed Systems · e‑commerce · rate limiting
0 likes · 11 min read
ITPUB
Feb 13, 2023 · Fundamentals

How a Bat-Borne Virus Explains the Gossip Protocol in Distributed Systems

Using a fictional coronavirus carried by a bat, the article illustrates the Gossip protocol’s mechanisms—direct mail, anti-entropy, and epidemic spread—to explain how distributed systems achieve eventual consistency, highlighting advantages, drawbacks, and practical considerations for storage components like Cassandra.

Anti-entropy · Distributed Systems · Gossip Protocol
0 likes · 10 min read
Bilibili Tech
Feb 7, 2023 · Cloud Native

Bilibili Configuration Center (Config & Paladin): Architecture, Features, and Performance

Bilibili’s Config Center evolved from the 2017 Config v1 monolith—offering unified UI, MySQL storage, and long‑polling—to the Raft‑based Paladin v2, which adds lifecycle management, tenant isolation, incremental publishing, high‑throughput caching, multi‑active deployment, validation and rich tooling, handling hundreds of thousands of configs and tens of thousands of concurrent clients with sub‑50 ms push latency while planning deeper K8s integration.

Distributed Systems · Microservices · Paladin
0 likes · 15 min read
IT Architects Alliance
Feb 6, 2023 · Cloud Native

What Is Kubernetes and Why Is It Hard to Get Started?

This article introduces Kubernetes, a container‑based distributed cluster management system that originated at Google, explaining its architecture and core components (Master, Nodes, Pods, Services, and etcd) and detailing how communication, scheduling, storage, external access, scaling, and controller coordination work together.

Cloud Native · Distributed Systems · Kubernetes
0 likes · 8 min read
Code Ape Tech Column
Feb 6, 2023 · Backend Development

Understanding the Basic Structure and Technical Stack of RPC Architecture

This article explains the fundamental components of RPC architecture, details the client‑server roles, communication protocols, serialization methods, transport protocols, and synchronous versus asynchronous invocation patterns, providing a comprehensive guide for building a custom RPC framework.

Distributed Systems · RPC · Transport Protocol
0 likes · 12 min read
21CTO
Feb 5, 2023 · Backend Development

How Meituan Scaled Its Code Hosting Platform to Millions of Repositories

This article details Meituan's three‑stage evolution of its self‑developed Code platform—from a single‑machine service to a multi‑machine read‑write‑separated system and finally to a distributed, sharded architecture—highlighting the scalability and high‑availability challenges faced and the engineering solutions implemented.

Backend Architecture · Distributed Systems · Scalability
0 likes · 24 min read
Alibaba Cloud Developer
Feb 3, 2023 · Cloud Computing

Rethinking Cloud Computing: How Alibaba’s CIPU Redefines Compute Power

This article revisits cloud computing by tracing the evolution of compute power, exploring Alibaba Cloud’s infrastructure breakthroughs such as the CIPU processor and its core platforms, and analyzing how these advances reshape elastic, big‑data, high‑performance, and AI workloads while highlighting trust, cost, and self‑service challenges.

Alibaba Cloud · CIPU · Cloud Computing
0 likes · 32 min read
Java Architect Essentials
Feb 2, 2023 · Backend Development

Comparison of Distributed Task Scheduling Frameworks: Elastic‑Job vs X‑Job and Other Solutions

This article examines common business scenarios that require timed execution, introduces single‑machine and distributed scheduling frameworks such as Timer, ScheduledExecutorService, Spring, Quartz, TBSchedule, Elastic‑Job, Saturn and XXL‑Job, and provides a detailed feature‑by‑feature comparison to help choose the most suitable solution.

Backend · Comparison · Distributed Systems
0 likes · 14 min read
Meituan Technology Team
Feb 2, 2023 · R&D Management

Design and Evolution of Meituan's Distributed Code Hosting Platform

Meituan’s home‑grown Code platform evolved from a single‑server Git service to a distributed, sharded system with multi‑active replication, using Go‑based HTTP/SSH proxies, gRPC communication, and version‑based routing to achieve horizontal scalability, high availability, and millions of daily Git operations.

Distributed Systems · Git · Meituan
0 likes · 22 min read
JD Tech
Feb 2, 2023 · Fundamentals

Understanding the Byzantine Generals Problem and the Raft Consensus Algorithm

This article explains the Byzantine Generals problem, its fault‑tolerance limits, and how the Raft consensus algorithm solves a simplified version of the problem through leader election, log replication, and safety mechanisms, while also comparing Raft with Paxos, ZAB, and PBFT and providing Go code examples.

Byzantine Generals · Consensus Algorithm · Distributed Systems
0 likes · 20 min read
vivo Internet Technology
Jan 30, 2023 · Backend Development

Dubbo ZooKeeper Registry Implementation Principle Analysis

The article dissects Dubbo’s ZooKeeperRegistry by tracing its inheritance from AbstractRegistry through FailbackRegistry to CacheableFailbackRegistry, detailing local memory‑disk caching, retry logic via a timing wheel, URL‑push optimizations, and the ZooKeeper‑based ephemeral node and watcher mechanisms that enable dynamic service discovery, while also covering core ZooKeeper concepts.

Distributed Systems · Dubbo · RPC Framework
0 likes · 20 min read
Architect's Guide
Jan 28, 2023 · Backend Development

Implementing a Simple Java RPC Framework with Zookeeper, Netty, and Javassist

This article walks through the design and implementation of a lightweight Java RPC framework, covering core concepts such as service registration and discovery with Zookeeper, network communication via Netty, serialization, compression, dynamic proxy generation using Javassist, and performance comparisons between reflection and bytecode‑generated proxies.

Distributed Systems · Javassist · Netty
0 likes · 23 min read
Top Architect
Jan 19, 2023 · Backend Development

Implementing a Simple Java RPC Framework: Architecture, Service Registration, Proxy Generation, and Network Transport

This article explains the principles and implementation of a lightweight Java RPC framework, covering service registration with Zookeeper, client-side dynamic proxies, serialization, compression, Netty-based network transport, and both reflection and Javassist proxy generation, with extensive code examples and performance comparison.

Distributed Systems · Javassist · Netty
0 likes · 25 min read
DataFunTalk
Jan 19, 2023 · Big Data

Tencent Alluxio: Accelerating the Next Generation of Big Data and AI

This article presents a comprehensive overview of Tencent's Alluxio project, covering the evolution of big‑data architecture, recent Alluxio research progress, typical deployment cases, and future work, while highlighting performance improvements, integration with cloud and AI workloads, and community contributions.

AI · Alluxio · Big Data
0 likes · 21 min read
Architect
Jan 18, 2023 · Databases

Design and Architecture of Bilibili's High‑Performance KV Storage System

This article presents the background, overall architecture, partitioning strategies, Raft‑based replication, binlog support, multi‑active deployment, bulk‑load mechanisms, storage‑engine optimizations, load‑balancing policies, and failure‑detection and recovery techniques of a high‑reliability, high‑throughput key‑value store used at Bilibili.

Distributed Systems · KV storage · Partitioning
0 likes · 22 min read
DeWu Technology
Jan 16, 2023 · Cloud Native

Nacos Service Registration and Discovery: Principles and Implementation

The article explains Nacos’s open‑source service registry and discovery mechanisms, detailing client auto‑configuration, registration and health‑check workflows, server‑side instance handling, asynchronous copy‑on‑write processing, heartbeat cleanup, and cluster synchronization, while comparing its AP/CP capabilities to Zookeeper and Eureka.

Distributed Systems · Microservices · Nacos
0 likes · 55 min read
MaGe Linux Operations
Jan 13, 2023 · Fundamentals

Why ULID Beats UUID: A Deep Dive into Unique, Sortable IDs

This article explains what ULID is, why it often outperforms UUID by combining timestamp and randomness for collision‑free, lexicographically sortable identifiers, details its specification, binary layout, encoding, and shows practical Python usage and common application scenarios.

Distributed Systems · ULID · unique identifier
0 likes · 8 min read
Architect
Jan 12, 2023 · Operations

Critical Path Analysis for Latency Optimization in Large Distributed Systems

This article explains common latency analysis techniques, details the principles and implementation of critical path tracing, and demonstrates its practical application in Baidu App's recommendation service to efficiently identify and reduce performance bottlenecks in complex distributed architectures.

Distributed Systems · critical-path · latency analysis
0 likes · 14 min read
Programmer DD
Jan 11, 2023 · Databases

Redis Deep Dive: Pipelines, Pub/Sub, Persistence, Locks & Cluster

This comprehensive guide explores Redis fundamentals and advanced features, covering pipelines for reduced RTT, publish/subscribe messaging, key expiration strategies, transaction behavior, persistence mechanisms (RDB, AOF, hybrid), distributed locking techniques, Sentinel high availability, and Cluster sharding, with practical code examples and diagrams.

Distributed Systems · database · redis
0 likes · 47 min read
Mike Chen's Internet Architecture
Jan 10, 2023 · Backend Development

Key Differences Between RPC and Message Queues (MQ) in Distributed Systems

This article explains the core distinctions between Remote Procedure Call (RPC) and Message Queue (MQ) technologies, covering their architectures, communication patterns, functional features, and performance considerations, and outlines typical use cases such as synchronous calls, decoupling, traffic shaping, and asynchronous processing in distributed systems.

Distributed Systems · Message Queue · Microservices
0 likes · 6 min read
Architect
Jan 8, 2023 · Backend Development

Rethinking Microservices: From Hype to Core Architectural Principles

This article critically examines the microservices movement, tracing its historical roots, debunking common hype, and arguing that the true value lies in modular design, clear team ownership, and disciplined architectural practices rather than merely scaling distributed systems.

Distributed Systems · Microservices · Software Architecture
0 likes · 23 min read
Tencent Cloud Developer
Jan 5, 2023 · Cloud Native

QQ Music High-Availability Architecture Overview

QQ Music achieves high availability by layering redundant multi‑datacenter architecture, proactive chaos‑engineering toolchains, and comprehensive observability—including metrics, logging, tracing and profiling—while employing service grading, adaptive retry windows and EMA‑based dynamic timeouts to gracefully handle faults across its massive micro‑service ecosystem.

Distributed Systems · Microservices · Observability
0 likes · 24 min read
DataFunTalk
Jan 3, 2023 · Big Data

Tencent Unified Big Data Scheduling Platform – Architecture, Design, and Operations

The article presents an in‑depth overview of Tencent's self‑developed Unified Scheduling Platform, detailing its system architecture, design challenges, performance optimizations, resource‑fair scheduling mechanisms, operational metrics, future roadmap, and a Q&A session that together illustrate how the platform enables massive offline data processing at scale.

Big Data · Distributed Systems · Performance Optimization
0 likes · 18 min read
Top Architect
Jan 2, 2023 · Big Data

Optimizing Kafka at Meituan: Challenges and Solutions for a Large‑Scale Data Platform

This article details Meituan's use of Kafka as a unified data cache and distribution layer, outlines the challenges of massive scale and latency, and presents comprehensive optimizations across application, system, and cluster management layers, including disk balancing, migration acceleration, fetcher isolation, and full‑link monitoring.

Big Data · Distributed Systems · Kafka
0 likes · 22 min read