Tagged articles
232 articles
Machine Heart
Mar 31, 2026 · Artificial Intelligence

ProMoE: Explicit Routing Breaks the Scaling Bottleneck of Diffusion‑Transformer MoE (ICLR 2026)

ProMoE introduces a two‑step routing MoE framework with explicit semantic guidance that tackles the high spatial redundancy and functional heterogeneity of visual tokens, enabling diffusion transformers to scale efficiently and outperform dense models and prior MoE approaches across generation, convergence, and scaling benchmarks.

Diffusion Transformer · Explicit Routing · Mixture of Experts
9 min read
MaGe Linux Operations
Mar 30, 2026 · Cloud Native

How to Scale Prometheus to Thousands of Nodes with Thanos: A Deep Dive

This article examines the storage, query performance, high‑availability, and high‑cardinality challenges of running Prometheus on a thousand‑node Kubernetes cluster and presents a complete, step‑by‑step Thanos‑based architecture, capacity‑planning models, configuration examples, and operational best practices for reliable horizontal scaling.

Kubernetes · Observability · Prometheus
34 min read
JD Retail Technology
Mar 25, 2026 · Databases

How JD.com Scaled POP Order Elasticsearch to Handle Billions of Orders

This article analyzes the challenges of JD.com's POP order Elasticsearch storage—including data skew, oversized shards, frequent updates, and high maintenance costs—and details the multi‑layered architectural redesign that introduced tenant isolation, dual‑hash routing, differentiated shard strategies, and a dual‑active physical foundation to achieve high performance, scalability, and availability.

Data Partitioning · Elasticsearch · Order Management
16 min read
Architect Chen
Mar 24, 2026 · Databases

How High Can Redis Really Scale? Real-World QPS Limits Explained

This article breaks down Redis performance limits, showing that a single node can handle roughly 100‑200k simple GET/SET QPS and up to 500‑700k with multithreaded I/O, while sharded clusters can theoretically reach millions of QPS, though practical factors affect the actual throughput.

Cluster · QPS · database
6 min read
PMTalk Product Manager Community
Mar 18, 2026 · Product Management

When Your Team Is All Agents: How Product Management Must Evolve

The article analyses why using instant‑messaging groups to orchestrate multiple AI agents cannot scale to dozens or hundreds of agents, proposes a four‑layer ICSE architecture, compares three agent‑to‑agent communication models, and outlines the new governance, design, and roadmap responsibilities that product managers will need to master.

AI agents · ICSE architecture · governance
14 min read
MaGe Linux Operations
Feb 27, 2026 · Artificial Intelligence

How to Deploy Scalable LLM Inference with vLLM on Kubernetes and GPU Scheduling

This guide explains how to deploy vLLM for large‑language‑model serving on Kubernetes, covering GPU resource management, tensor‑parallel configuration, continuous batching, quantization choices, autoscaling with HPA and KEDA, multi‑model routing, and best‑practice recommendations for performance, cost control, and high availability.

GPU · Kubernetes · LLM inference
48 min read
Architect Chen
Feb 13, 2026 · Databases

Boost MySQL Performance: Proven Tuning, Indexing, and Scaling Strategies

This guide presents practical MySQL optimization techniques—including SQL and index refinement, InnoDB and connection parameter tuning, cache layer integration, and architectural scaling with read‑write splitting and sharding—to dramatically increase query throughput and reduce latency.

Index Optimization · InnoDB · mysql
6 min read
ITPUB
Jan 31, 2026 · Databases

How OpenAI Scaled PostgreSQL to Support 800 Million Users and Millions of QPS

OpenAI’s engineering team expanded a single‑primary PostgreSQL cluster with nearly 50 read‑only replicas, migrated write‑heavy workloads to Azure Cosmos DB, and applied extensive optimizations to reliably serve the global traffic of ChatGPT and the OpenAI API for 800 million users at multi‑million queries per second.

Azure · PostgreSQL · Read Replicas
24 min read
Radish, Keep Going!
Jan 23, 2026 · Databases

How OpenAI Really Scaled PostgreSQL for Hundreds of Millions of Users

The article debunks OpenAI's sensational claim of handling 800 million ChatGPT users with a single PostgreSQL instance, revealing a pragmatic hybrid architecture that combines many read replicas, Azure Cosmos DB for write‑heavy workloads, and top‑tier hardware, while highlighting cost and complexity considerations.

Azure CosmosDB · Database Architecture · PostgreSQL
6 min read
DevOps Coach
Jan 20, 2026 · Cloud Native

How to Scale Kubernetes to Hundreds of Clusters: A Practical Enterprise Guide

This article walks you through the complete journey from a single Kubernetes cluster to a production‑grade, multi‑cluster platform, covering managed services, capacity planning, GitOps pipelines, networking, observability, cost optimisation, upgrade strategies, and the people and processes needed for sustainable large‑scale operations.

Cloud Native · Cost Management · Infrastructure
27 min read
Tencent Cloud Developer
Dec 30, 2025 · Backend Development

Mastering Microservices: Design Principles, Service Modeling, Integration, and Scaling Strategies

This comprehensive guide explains microservice fundamentals, when to adopt them, key design principles, service modeling techniques, integration patterns, versioning, data handling, monolith decomposition, Conway's law, scaling tactics, and the situations where microservices may not be the right choice, providing actionable insights for building resilient backend systems.

Backend · Integration · Microservices
23 min read
Alibaba Cloud Observability
Dec 29, 2025 · Cloud Native

How Alibaba Cloud Log Service Supercharges Dify’s Scaling and Cuts DB Costs

This article examines Dify’s production‑scale bottlenecks caused by heavy PostgreSQL logging, explains why a cloud‑native log service (SLS) better matches the append‑only, high‑throughput nature of workflow logs, and provides a step‑by‑step migration guide that dramatically reduces database pressure and storage cost while unlocking advanced analytics.

Alibaba Cloud Log Service · Cloud Native · Dify
17 min read
Full-Stack DevOps & Kubernetes
Dec 17, 2025 · Databases

10 Essential Steps to Optimize Your Database for High‑Performance E‑Commerce

This article shares practical, step‑by‑step guidance from a 15‑year e‑commerce veteran on why, when, and how to optimize databases—including segregation, archiving, query tuning, replication lag detection, parameter tweaks, partitioning, ProxySQL, caching, vertical scaling, and monitoring—to achieve faster, more reliable services.

Database Optimization · mysql · performance tuning
10 min read
Ray's Galactic Tech
Dec 16, 2025 · Operations

How to Eliminate Kafka Consumer Lag: 4 Proven Strategies and Advanced Tips

This guide explains why Kafka consumer lag occurs, presents four classic solutions—including horizontal scaling, performance tuning, multi‑group consumption, and offset reset—plus advanced practices like dead‑letter queues, partition design, rebalance mitigation, and monitoring to help engineers quickly diagnose and resolve backlog issues.

Consumer Lag · best-practices · scaling
8 min read
AntTech
Oct 9, 2025 · Artificial Intelligence

Ling-1T: The Trillion‑Parameter AI Model Redefining Efficient Reasoning

Ling-1T, a trillion‑parameter flagship non‑thinking model, combines 50 billion active parameters per token, 128 K context, Evo‑CoT reasoning, and FP8 mixed‑precision training to achieve state‑of‑the‑art performance on complex reasoning, code generation, and multimodal tasks while outlining its architecture, benchmarks, limitations, and future roadmap.

AI · Benchmark · FP8
11 min read
DevOps Coach
Oct 5, 2025 · Cloud Native

How Medium Scales Microservices with Kubernetes: Architecture, Tools, and Tuning

Medium explains why it chose Kubernetes for microservice management, describes its multi‑cluster deployment across four availability zones, details configuration tooling with Terraform, and shares scaling optimizations using a cluster over‑provisioner and pod preemption to achieve smoother node utilization.

Cloud Native · Cluster Overprovisioner · Kubernetes
7 min read
MaGe Linux Operations
Aug 19, 2025 · Big Data

Master Kafka High Availability: Replica Sync & Disaster Recovery Strategies

This article provides a comprehensive guide to building enterprise‑grade, highly available Kafka clusters, covering architecture design, hardware planning, production‑level broker configurations, ISR management, monitoring, fault‑tolerance procedures, rolling upgrades, capacity planning, and automation scripts for seamless operations.

Kafka · Operations · disaster-recovery
16 min read
Data Party THU
Aug 19, 2025 · Artificial Intelligence

Why RL Fine‑Tuning Fails to Extend LLM Reasoning Limits: Entropy Collapse Explained

This article examines how reinforcement learning fine‑tuning influences large language model reasoning, revealing that RL primarily amplifies pre‑trained capabilities, suffers from entropy collapse, and fails to push the model’s reasoning boundary, supported by extensive experiments on scaling laws, entropy analysis, and mitigation techniques.

LLM · RL · RLVR
24 min read
Ops Community
Jul 24, 2025 · Operations

How a Small E‑commerce Site Scaled to 10 Million Daily Visits: Real‑World Architecture Lessons

This article details a small‑to‑mid‑size e‑commerce platform’s journey from a few thousand daily page views to ten million, covering business challenges, three architecture evolution stages, key technical solutions, performance optimizations, cost‑control strategies, and practical automation tips.

Operations · Performance Optimization · monitoring
14 min read
dbaplus Community
Jun 26, 2025 · Operations

How AI Can Transform Kubernetes Operations: 10 Smart Use Cases

This article explores ten practical AI‑driven scenarios for Kubernetes operations—including intelligent monitoring, automated scaling, log analysis, fault repair, resource optimization, CI/CD automation, security checks, knowledge‑base assistance, capacity planning, and an ops assistant—detailing methods, tools, and implementation tips.

AI Ops · Automation · Kubernetes
12 min read
IT Services Circle
Jun 21, 2025 · Backend Development

How Instagram Scaled to 14 Million Users: Inside Its Backend Architecture

This article recounts a 2009 photo‑sharing startup idea, then dives into Instagram’s backend design principles, cloud infrastructure, request flow, data storage, sharding, caching, background jobs, and monitoring, illustrating how disciplined engineering enabled rapid scaling to millions of users.

Backend · cloud · databases
9 min read
macrozheng
Apr 29, 2025 · Backend Development

How to Tame a 100× Traffic Surge: Practical Strategies for Backend Engineers

This guide walks backend developers through a step‑by‑step approach to handle sudden 100‑fold traffic spikes, covering emergency response, traffic analysis, robust system design, scaling techniques, circuit breaking, message queuing, and stress testing to keep services resilient and performant.

Backend Performance · Circuit Breaking · rate limiting
12 min read
IT Services Circle
Apr 23, 2025 · Backend Development

Handling Sudden Traffic Spikes in Backend Systems

The article outlines a comprehensive approach for backend engineers to manage a sudden 100‑fold increase in traffic, covering emergency response, traffic analysis, robust system design, rate limiting, circuit breaking, scaling, sharding, pooling, caching, asynchronous processing, and stress testing to ensure system stability and performance.

Circuit Breaking · Load Testing · asynchronous processing
13 min read
Tencent Cloud Developer
Apr 23, 2025 · Cloud Native

Microservices Architecture: Principles, Modeling, Integration, and Scaling

Microservices are small, autonomous services that replace monolithic codebases by emphasizing loose coupling, high cohesion, bounded contexts, technology-agnostic integration via REST, RPC, or events, disciplined code governance, semantic versioning, local transactions with eventual consistency, and robust scaling patterns such as timeouts, circuit breakers, and auto-scaling, while reflecting organizational structure and avoiding premature complexity.

Distributed Systems · architecture · scaling
19 min read
ITPUB
Apr 13, 2025 · Operations

How Cursor Scaled Its AI Code Editor: Lessons from Indexing to Object Storage

Cursor, the AI‑powered code editor, grew to handle billions of document queries and over a hundred‑million model calls daily, prompting a multi‑stage infrastructure overhaul that moved from a failing YugaByte setup to PostgreSQL RDS, then to object‑storage‑backed databases, while tackling indexing, inference scaling, and cold‑start challenges.

AI · Inference · Infrastructure
11 min read
Raymond Ops
Dec 19, 2024 · Operations

How to Auto‑Scale Non‑CPU Apps with cAdvisor Network Metrics in Kubernetes

This guide explains how to use cAdvisor‑provided container network traffic counters as custom metrics for Kubernetes HPA, covering metric collection, Prometheus‑adapter configuration, verification, and a complete HPA testing workflow for elastic scaling of non‑CPU‑intensive workloads.

HPA · Kubernetes · Prometheus
7 min read
Goodme Frontend Team
Nov 18, 2024 · Frontend Development

Add Rotation and Scaling to Video Previews with React and Vime

This article explains how to implement video rotation, fullscreen handling, and proportional scaling in a React application using the Vime library and CSS transforms, covering container setup, control customization, and code examples for a seamless user experience.

CSS transform · frontend · rotation
10 min read
NewBeeNLP
Oct 16, 2024 · Artificial Intelligence

Unlocking Long-Sequence LLMs: Position Embeddings, Scaling, and Efficient Attention

This article reviews recent advances in training and inference for long‑sequence large language models, comparing ALiBi and RoPE position embeddings, exploring RoPE scaling techniques, analyzing attention optimizations, and outlining practical data, evaluation, and system frameworks for scalable LLM deployment.

Flash Attention · LLM · RoPE
14 min read
macrozheng
Aug 2, 2024 · Backend Development

How to Quickly Resolve Massive Kafka Message Backlog in Production

This guide explains why Kafka message backlogs occur, how to diagnose bugs, optimize consumer logic, and use temporary topics for emergency scaling, while emphasizing monitoring, alerts, and proper offset handling to keep your streaming system healthy.

Backlog · Consumer · Java
5 min read
DevOps Cloud Academy
May 31, 2024 · Cloud Native

Optimizing RabbitMQ Performance on Kubernetes

This guide explains how to deploy RabbitMQ on Kubernetes and improve its performance through Helm installation, resource tuning, monitoring, scaling, security hardening, and advanced configuration techniques, providing practical code examples for each step.

Kubernetes · Performance Optimization · RabbitMQ
9 min read
MaGe Linux Operations
May 25, 2024 · Databases

Redis Cluster Mastery: Step‑by‑Step Setup, Scaling, and Management Guide

This tutorial explains how Redis Cluster automatically shards data across multiple nodes, covering required TCP ports, hash‑slot sharding, master‑slave replication, consistency trade‑offs, essential configuration parameters, and step‑by‑step commands for creating, expanding, resharding, and managing a production‑grade Redis cluster.

Cluster · Configuration · database
18 min read
JavaEdge
May 18, 2024 · Cloud Native

Why We Abandoned Microservices: Lessons from Scaling a High‑Throughput Event Pipeline

The article recounts how a fast‑growing event‑processing platform initially embraced microservices, then faced queue bottlenecks, test‑suite overload, and operational complexity, leading the team to consolidate over 140 services into a single, shared‑queue architecture, and shares the practical insights and trade‑offs learned from this transition.

Microservices · Service Architecture · backend design
12 min read
DevOps Cloud Academy
May 6, 2024 · Cloud Native

How to Deploy a Highly Available Application on Kubernetes

This article explains key Kubernetes configurations—such as pod replicas, pod anti‑affinity, deployment strategies, graceful termination, probes, resource allocation, scaling, and disruption budgets—to achieve high availability and zero‑downtime deployments for containerized applications in production.

Cloud Native · Kubernetes · Probes
20 min read
Full-Stack Internet Architecture
Apr 25, 2024 · Databases

Redis Cluster: Architecture, Setup, Testing, and High Availability

This article explains Redis Cluster's sharding architecture, demonstrates how to configure multiple Redis nodes on different ports, shows commands for creating and testing the cluster, and illustrates failover behavior, highlighting its scalability and high‑availability advantages over Sentinel mode for large‑scale data workloads.

Cluster · database · redis
11 min read
ITPUB
Mar 27, 2024 · Backend Development

How Instagram Scaled to 14 Million Users with Just Three Engineers

This article details how Instagram grew from zero to 14 million users in just over a year using three engineers by applying three core principles and a reliable AWS‑based tech stack covering frontend, load balancing, backend, PostgreSQL sharding, S3 storage, Redis caching, asynchronous task queues, and comprehensive monitoring.

AWS · Backend · PostgreSQL
9 min read
Ops Development Stories
Mar 18, 2024 · Cloud Native

13 Essential Kubernetes Tips to Boost Scalability, Security, and Management

Discover 13 practical Kubernetes techniques—including PreStop hooks, automatic secret rotation, ephemeral containers, custom metric autoscaling, init containers, node affinity, taints and tolerations, pod priority, ConfigMaps, debugging tools, resource requests, CRDs, and API automation—to enhance application reliability, scalability, and security in cloud‑native environments.

Kubernetes · Security · pod management
21 min read
DevOps Operations Practice
Mar 14, 2024 · Operations

Resolving Frequent Crashes of a Single-Node Prometheus Deployment: Analysis and Solutions

This article analyzes why a single Prometheus instance repeatedly runs out of memory and crashes, explains the underlying storage mechanisms, and presents practical solutions such as metric reduction, retention tuning, federation architecture, and remote storage integration to improve stability and scalability.

Federation · Prometheus · monitoring
6 min read
Qunar Tech Salon
Feb 20, 2024 · Databases

Qunar.com Redis Automation Operations System: Architecture, Deployment, Migration, Scaling, and Inspection

This article details Qunar.com's Redis automation operations system, covering background challenges, the high‑availability cluster architecture, resource management, automated deployment, various migration strategies, scaling mechanisms with RedisGate, inspection processes, and future AI‑driven enhancements.

AI · Automation · Database operations
14 min read
ITPUB
Feb 13, 2024 · Databases

Achieve Seamless Second‑Level Database Scaling for High‑Throughput Microservices

This guide explains how to design a high‑concurrency, high‑throughput internet architecture that ensures database high availability with double‑master sync and virtual IPs, and how to horizontally shard and smoothly expand the cluster in seconds using configuration changes, reloads, and cleanup steps.

Microservices · databases · high availability
8 min read
Programmer DD
Dec 14, 2023 · Databases

How GitHub Upgraded Its 1200‑Node MySQL Cluster to 8.0 Without Downtime

GitHub detailed its year‑long, multi‑team effort to seamlessly upgrade over 1,200 MySQL servers—supporting more than 300 TB of data and 5.5 million queries per second—from 5.7 to 8.0, outlining the infrastructure, tools, and step‑by‑step migration strategy used to maintain service reliability.

GitHub · database migration · mysql
5 min read
Selected Java Interview Questions
Nov 26, 2023 · Databases

Understanding and Solving Hot Key Issues in Redis

Hot keys in Redis (keys accessed at very high frequency) can overload the cache and downstream databases, causing crashes; this article explains what hot keys are, why they arise, their risks, how to detect them, and practical mitigation strategies such as scaling clusters, using secondary caches, monitoring commands, and traffic analysis.

Cache · Database Performance · Hot Key
6 min read
Su San Talks Tech
Nov 20, 2023 · Databases

Mastering Redis Cluster: Architecture, Sharding, and Scaling Explained

This article explains Redis Cluster’s decentralized architecture, slot‑based sharding, node communication, data migration, and client redirection mechanisms, showing how to scale Redis horizontally while maintaining high availability and fault‑tolerance for large‑scale applications.

Cluster · scaling
13 min read
Laravel Tech Community
Oct 26, 2023 · Cloud Native

How Kuaishou Scales Live E‑commerce Flash Sales with an Elastic Container Cloud and Hybrid Cloud Architecture

To handle billions of daily users and massive flash‑sale spikes in its live‑ecommerce streams, Kuaishou built a large‑scale elastic container cloud, integrated with Alibaba Cloud in a hybrid‑cloud setup, employing load balancing, caching, message queues, rate‑limiting, and intelligent resource scheduling to achieve million‑request‑per‑second throughput and high availability.

Kuaishou · Live E‑commerce · elastic container cloud
8 min read
dbaplus Community
Oct 25, 2023 · Databases

ByConity vs ClickHouse: Deep Dive into Architecture, Features, and Performance

This article compares ByConity and ClickHouse from a usage perspective, detailing their architectural differences, core components, basic operations such as table creation, data import and query, distributed transaction support, special table engines, scaling strategies, and deployment requirements.

ByConity · ClickHouse · Distributed Transactions
26 min read
Continuous Delivery 2.0
Sep 21, 2023 · Operations

Scaling DevOps in Large Organizations: Normalization, Standardization, and Platformization

The article outlines how organizations with more than a hundred engineers must go beyond merely copying DevOps practices by adopting three progressive steps (normalization, standardization, and platformization) to achieve measurable, scalable efficiency, and concludes with a promotional notice for a Python‑based continuous deployment training course.

Operations · Platformization · Software Engineering
8 min read
Senior Tony
Sep 12, 2023 · Backend Development

What Really Powers High‑Concurrency Systems? Practical Solutions Explained

This article breaks down real‑world high‑concurrency strategies—horizontal scaling, caching, Elasticsearch, sharding, message‑queue smoothing, and cellization—explaining when each applies, their trade‑offs, and practical tips for building scalable, reliable backend services.

Backend · Message Queue · System Design
9 min read
21CTO
Aug 26, 2023 · R&D Management

From Developer to CTO: Building Tech Ops and Scalable Teams

This guide shares a developer’s personal journey to becoming a CTO, covering the shift from coding to technical operations, building scalable team structures, adopting agile practices, and managing growth in a software startup.

CTO · R&D management · Team Structure
21 min read
Code Ape Tech Column
Aug 15, 2023 · Operations

High‑Availability Architecture for a Billion‑Scale Membership System: Dual‑Center ES, Redis, and MySQL Solutions

This article details the design and implementation of a highly available, high‑performance membership system serving over a billion users, covering dual‑center Elasticsearch clusters, traffic‑isolated three‑cluster ES architecture, Redis dual‑center caching, MySQL partitioned clusters, migration strategies, and refined flow‑control and degradation mechanisms.

Distributed Systems · Elasticsearch · high availability
20 min read
Architect
Aug 10, 2023 · Operations

Capacity Management: Goals, Stages, Optimization Techniques, and Scaling Practices

The article explains how capacity management balances cost control and service quality through defined goals, three development stages, detailed resource optimization methods, stress‑testing metrics and standards, and automated scaling to achieve significant cost reductions while maintaining system stability.

Operations · Performance Testing · Resource Optimization
10 min read
Meituan Technology Team
Aug 3, 2023 · Frontend Development

Rome: Enhancing Front‑end Development Collaboration and Efficiency at Meituan

The article details Meituan’s Rome front‑end framework, covering its business and technical background, the engineering ecosystem and evolution path, large‑scale upgrades, IDE‑based development assistance, efficiency and quality improvements, metric collection, real‑world adoption across 1,400+ projects, and future trends such as deeper dev‑chain integration and AI‑assisted coding.

Build Optimization · Framework · IDE
29 min read
Top Architect
Jul 6, 2023 · Databases

Understanding HikariCP Connection Pool Sizing: Principles, Experiments, and Practical Guidelines

This article translates and expands on HikariCP's pool‑sizing guidance, explaining why smaller database connection pools often yield better performance, presenting real‑world benchmark data for various pool sizes, and offering a simple formula to calculate an optimal pool size based on CPU cores and effective disks.

Connection Pool · HikariCP · PostgreSQL
10 min read
Architecture & Thinking
Jun 9, 2023 · Backend Development

Why Do Message Queues Get Backlogged and How to Fix It Fast?

This article examines why message queues become backlogged—covering producer overload, broker persistence failures, and consumer bottlenecks—and outlines a step‑by‑step scaling and remediation strategy to restore smooth processing, including temporary queue expansion, load‑balanced forwarding, and post‑recovery cleanup.

Backlog · Operations · scaling
6 min read
Full-Stack DevOps & Kubernetes
Apr 11, 2023 · Cloud Native

Master Kubernetes Basics: Deploy, Scale, and Update Apps with Simple Commands

This article introduces Kubernetes as an open‑source container orchestration platform, explains its core objects like Pods, Services, ReplicaSets, and Deployments, clarifies its relationship with Docker, and provides a step‑by‑step example covering deployment, exposure, scaling, rolling updates, and rollback using kubectl commands.

Deployment · DevOps · Kubernetes
5 min read
21CTO
Feb 10, 2023 · Cloud Native

Why Kubernetes Is So Hard to Master: A Beginner’s Q&A Walkthrough

This article introduces Kubernetes fundamentals through a series of questions and answers, covering its architecture, node communication, pod scheduling, data storage, external access, scaling mechanisms, and component coordination, all illustrated with clear diagrams.

Cluster Management · Containers · Kubernetes
0 likes · 9 min read
Architects Research Society
Feb 9, 2023 · Fundamentals

Agile Architecture Strategies for Scaling Agile Development

This article explains how architecture remains a vital part of agile software development, covering agile‑first approaches, lifecycle‑wide modeling, ownership roles, scaling strategies, demand‑driven design, multi‑view modeling, and practical tips for communicating and evolving architecture without over‑building.

Modeling · agile · scaling
0 likes · 40 min read
Zhuanzhuan Tech
Feb 8, 2023 · Operations

Capacity Management: Goals, Practices, and Optimization at Zhuanzhuan

This article outlines Zhuanzhuan’s capacity management approach, covering its objectives, development stages, water‑level metrics, resource optimization techniques, cluster capacity assessment, stress‑test indicators and standards, as well as scaling strategies, demonstrating how systematic capacity management reduces costs while ensuring service stability.

Cost Optimization · Performance Monitoring · Resource Optimization
0 likes · 12 min read
Tencent Tech
Jan 16, 2023 · Operations

How a Mini-Game Scaled to 100M DAU: Architecture, Ops, and Security Lessons

This article examines how the viral mini‑game "Sheep..." overcame its initial 5,000‑QPS bottleneck and scaled to over 100 million daily active users by redesigning its architecture, implementing cloud‑native auto‑scaling, enhancing operational monitoring with CLS, and fortifying security with WAF.

Security · cloud-native · game-development
0 likes · 11 min read
MaGe Linux Operations
Jan 10, 2023 · Cloud Native

When Microservices Backfire: Lessons from Scaling a Data Service Platform

This case study examines S Company's transition to a microservice architecture for its data‑service platform, highlighting initial gains in visibility and deployment cost, the subsequent explosion of complexity, and the eventual rollback to a monolith with insights on trade‑offs, scaling, and operational overhead.

Operational Challenges · architecture · monolith migration
0 likes · 12 min read
MaGe Linux Operations
Dec 21, 2022 · Operations

Mastering Elasticsearch Nodes: Types, Roles, and Scaling Strategies

This guide explains the different Elasticsearch node types, their default roles, how to configure master‑eligible, data, ingest, and coordinating‑only nodes, and provides best‑practice recommendations for planning and scaling large clusters to ensure stability and performance.

Cluster Configuration · Coordinating Node · Data Node
0 likes · 12 min read
Top Architect
Dec 16, 2022 · Databases

Comprehensive Guide to Database Horizontal Scaling, Sharding, and High Availability with MariaDB and Keepalived

This article presents a detailed analysis and step‑by‑step implementation of horizontal database scaling, including sharding strategies, shutdown and stop‑write plans, log‑based migration, dual‑write approaches, and a smooth 2N expansion method, while also covering MariaDB master‑master configuration, dynamic data source addition, and Keepalived high‑availability setup.

MariaDB · high-availability · scaling
0 likes · 37 min read
vivo Internet Technology
Nov 16, 2022 · Operations

Understanding and Mitigating Bigkey Issues in Redis Operations

Bigkeys—Redis values over 1 MB or structures with more than 2,000 elements—cause memory imbalance, command blocking, network overload, and migration failures, so DBAs must detect them using built‑in commands or RDB analysis, split or partition oversized keys, and tune migration settings to preserve performance and availability.

BigKey · Database operations · performance
0 likes · 14 min read
dbaplus Community
Sep 24, 2022 · Backend Development

Beyond Adding Servers: Mastering the AKF Scale Cube for Efficient Microservice Scaling

When service load spikes, instead of merely adding machines, this article explains how the AKF Scale Cube model—covering X‑axis horizontal scaling, Y‑axis functional or business splitting, and Z‑axis data partitioning—offers elegant, fine‑grained strategies to boost microservice performance and reliability.

AKF Scale Cube · Data Partitioning · Microservices
0 likes · 10 min read
DevOps
Sep 12, 2022 · Cloud Native

How Slack Designs, Operates, and Scales Its Remote Development Environments

The article explains Slack's cloud‑native development environment—a full, isolated copy of the Slack system running on AWS EC2—detailing why remote environments are used, how they are managed with custom tooling, and how dynamic provisioning enables massive scaling while controlling costs.

Cloud Native · Development Environment · Remote Development
0 likes · 9 min read
DevOps
Sep 2, 2022 · Operations

Seven Lessons Learned When Growing Your Configuration Management

Scaling a configuration management team from a small startup to a large enterprise reveals seven key lessons about managing tool costs, customization control, infrastructure scalability, build environment governance, early adoption of third‑party solutions, ensuring traceability with many developers, and continuously evaluating tool costs versus market alternatives.

Build Automation · Configuration Management · DevOps
0 likes · 10 min read
Zhuanzhuan Tech
Jul 20, 2022 · Backend Development

Design and Evolution of the Price‑Increase Coupon Service for a C2B Recycling Platform

This article details the design, evolution, and scaling strategies of a price‑increase coupon system for a C2B digital product recycling platform, covering its initial experimental phase, platformization, Sharding‑JDBC implementation, intelligent coupon recommendation, Elasticsearch integration, and operational optimizations for high‑throughput stability.

Backend · Coupon · Microservices
0 likes · 11 min read
DevOps
Jul 18, 2022 · R&D Management

Practical Strategies for Scaling Lean‑Agile Transformation in Large Development Teams

The article examines the challenges of moving large, multi‑team software organizations from waterfall to lean‑agile practices, offering concrete tactics for product planning, cross‑team coordination, integration, testing, and release, and concludes with a note on an upcoming DevOps hackathon.

scaling · team collaboration
0 likes · 10 min read
Cloud Native Technology Community
Jul 12, 2022 · Cloud Native

How Tencent Cut Kubernetes CPU Costs by 70%: A Full‑Scale Cloud‑Native Optimization Journey

This article presents a comprehensive, data‑driven case study of how Tencent’s internal Kubernetes/TKE platform reduced monthly CPU usage by up to 70% and memory usage by 50% through systematic cost data collection, VPA/HPA enhancements, custom scheduling, node‑level over‑commit, and safe node decommissioning, while maintaining zero‑incident reliability.

Cloud Native · Cost Optimization · HPA
0 likes · 28 min read
G7 EasyFlow Tech Circle
May 20, 2022 · Backend Development

Securing Public‑Facing Kafka: Authentication, Configuration, and Scaling Strategies

This article shares G7 Tech’s practical experience of exposing Kafka to the public internet, covering encryption, AAA, three authentication schemes, listener configuration, scaling for massive topics with Kubernetes, storage optimization, and integration with the gmq management platform and Kafka‑REST.

Authentication · Kafka · Kubernetes
0 likes · 10 min read
Architecture Digest
May 19, 2022 · Operations

Designing High‑Availability Stateless Services: Redundancy, Load Balancing, Scaling, and Monitoring

The article explains how to build highly available stateless services by using redundant deployment, vertical and horizontal scaling, appropriate load‑balancing algorithms, monitoring, and automated recovery, and also discusses high‑concurrency identification, CDN/OSS usage, and practical recommendations for cloud‑native environments.

Vertical Scaling · high availability · horizontal scaling
0 likes · 11 min read
Cloud Native Technology Community
May 10, 2022 · Cloud Native

How PayPal Scaled Kubernetes to 4,100 Nodes and 200k Pods

PayPal’s engineering team detailed their journey of scaling Kubernetes from a few hundred nodes to over 4,100 nodes and 200,000 Pods, describing cluster topology, workload generation, API server bottlenecks, controller manager and scheduler tuning, extensive etcd optimizations, and the resulting performance gains that met Kubernetes SLOs.

Cloud Native · Kubernetes · PayPal
0 likes · 13 min read
HomeTech
Apr 27, 2022 · Big Data

AutoStream Real‑Time Computing Platform: Architecture, Resource Management, Scaling, Lakehouse Integration, and PyFlink Practices

This article details the evolution of Autohome's AutoStream platform from Storm to Flink‑based versions, covering real‑time application scenarios, strict budget‑controlled resource management, automatic scaling, lake‑house architecture with Iceberg, PyFlink integration, and future plans for resource optimisation and batch‑stream unification.

AutoStream · Flink · Lakehouse
0 likes · 15 min read
IT Architects Alliance
Apr 27, 2022 · Operations

High‑Availability Architecture for a Billion‑Scale Membership System: ES Dual‑Center, Redis Caching, MySQL Migration, and Flow‑Control Strategies

This article details how a membership system serving billions of users achieves high performance and high availability through a dual‑center Elasticsearch cluster, traffic‑isolated ES clusters, Redis cache with distributed locks, MySQL dual‑center partitioning, and fine‑grained flow‑control and degradation mechanisms, all while ensuring zero‑downtime migrations and consistent data.

Flow Control · distributed-systems · high-availability
0 likes · 20 min read
Top Architect
Apr 3, 2022 · Databases

Designing Data Architecture for Microservices: Database Choices, Decoupling, and Scaling

This article explains how to design data architecture for microservice systems, covering the advantages of microservices, decoupling principles, lightweight APIs, DevOps integration, database per service versus shared databases, polyglot persistence, and why MongoDB is a suitable choice for scalable, dynamic, and sharded data storage.

Database design · MongoDB · architecture
0 likes · 17 min read
Open Source Linux
Mar 17, 2022 · Cloud Native

How PayPal Scaled Kubernetes to 4,000 Nodes and 200,000 Pods

PayPal’s engineering team detailed their journey of scaling Kubernetes from a few hundred nodes to over 4,000 nodes and 200,000 pods, describing the cluster topology, workload generation, bottlenecks in the API server, controller manager, scheduler, and etcd, and the optimizations that enabled stable performance at massive scale.

Cloud Native · Kubernetes · PayPal
0 likes · 12 min read
21CTO
Mar 13, 2022 · Backend Development

How Meituan Built a Fault‑Tolerant Instant Logistics Platform at Scale

Meituan’s instant logistics platform evolved from vertical services to a micro‑service, distributed architecture that handles massive order‑rider matching, ultra‑low latency, and high availability, leveraging AI for pricing, ETA, scheduling, and employing robust scaling, consistency, and disaster‑recovery techniques.

AI · Distributed Systems · Logistics
0 likes · 10 min read
Architecture Digest
Jan 13, 2022 · Backend Development

Scaling RabbitMQ to Million‑Message Throughput: Architecture, Sharding, Federation, and High‑Availability Practices

This article explains how to horizontally scale RabbitMQ clusters to handle millions of messages per second by leveraging cluster modes, mirror queues, sharding plugins, consistent‑hash exchanges, federation, and high‑availability configurations, while also covering practical scenarios such as retries, delayed tasks, and Spring AMQP integration.

Federation · Message Queue · RabbitMQ
0 likes · 22 min read
21CTO
Dec 24, 2021 · Operations

Why Xi'an’s One‑Code Pass Crashed: Analyzing System Overload and Scaling Fixes

On December 20 the Xi'an health‑code app "One‑Code Pass" suffered a massive outage as a sudden traffic surge overwhelmed its query‑heavy backend, exposing network bottlenecks and a lack of scaling mechanisms, prompting a detailed technical analysis and proposed architectural remedies.

rate limiting · scaling · system overload
0 likes · 9 min read
Top Architect
Dec 22, 2021 · Operations

Load Balancing: Principles, Types, and Algorithms

This article explains the fundamentals of load balancing, covering its purpose, vertical and horizontal scaling, various classifications such as DNS, IP, link‑layer and hybrid methods, common algorithms like round‑robin and weighted, as well as hardware solutions, providing a comprehensive guide for building scalable, high‑availability systems.

Algorithms · Distributed Systems · high availability
0 likes · 13 min read
Alibaba Cloud Native
Dec 6, 2021 · Cloud Native

How Alibaba Cloud’s ECS‑Based FaaS Achieves High‑Density, Low‑Latency Serverless Scaling

This article explains the design of an ECS‑based Function‑as‑a‑Service platform, covering multi‑tenant deployment, rapid horizontal scaling, resource‑utilization optimization, avalanche‑prevention strategies, and high‑density deployment techniques that together enable fast, cost‑effective cloud‑native serverless workloads.

Cloud Native · ECS · Serverless
0 likes · 12 min read