Tagged articles
2179 articles
High Availability Architecture
Jul 15, 2016 · Backend Development

High‑Availability Architecture for Weibo Paid Reading and Tipping Services

The article describes the high‑availability, high‑concurrency backend architecture of Weibo's paid reading and tipping platform, covering layered design, database sharding, asynchronous processing, monitoring, idempotency, distributed transaction handling, and security measures for a large‑scale internet‑finance system.

Backend · Idempotency · Security
0 likes · 8 min read

Designing a Business‑Oriented High Availability Architecture for a Game Access System

The article presents a business‑centric high‑availability solution for a large‑scale game access platform, detailing measurable goals, a three‑dimensional architecture that includes client‑side retry, HTTP‑DNS, functional separation, multi‑region active‑active deployment, and automated, visual monitoring to achieve rapid problem detection, recovery, and minimal outage frequency.

Distributed Systems · business continuity · fault tolerance
0 likes · 23 min read
DevOps
Jun 20, 2016 · Operations

A Comprehensive Overview of Popular DevOps Tools for IT Operations

This article surveys widely used DevOps tools and highlights their features, typical use cases, and important considerations: the monitoring solutions Microsoft SCOM, Vistara, SolarWinds, Nimsoft, and ServiceNow; the automation platforms Chef and Puppet; the container platform Docker; the orchestration systems Apache Mesos and Kubernetes; and the performance monitoring tools New Relic and Graphite/Grafana.

Automation · DevOps · IT Operations
0 likes · 10 min read
MaGe Linux Operations
Jun 15, 2016 · Operations

Build Python Scripts for Real-Time Linux Server Monitoring

This article explains how to create Python scripts that monitor Linux server CPU, load, memory, and network usage by reading data from the /proc virtual filesystem, providing step‑by‑step code examples and illustrating each script’s output with screenshots.

Linux · Python · monitoring
0 likes · 11 min read
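
The article's own scripts are not reproduced in this listing. As a rough illustration of the approach the summary describes, the sketch below parses text in the format of /proc/loadavg and /proc/meminfo. The function names are hypothetical, and the memory calculation assumes a kernel that reports the MemAvailable field.

```python
"""Sketch: parse /proc-style text for load average and memory usage."""
from __future__ import annotations

from pathlib import Path


def parse_loadavg(text: str) -> tuple[float, float, float]:
    """Return the 1-, 5-, and 15-minute load averages from /proc/loadavg text."""
    one, five, fifteen = text.split()[:3]
    return float(one), float(five), float(fifteen)


def parse_meminfo(text: str) -> dict[str, int]:
    """Map each /proc/meminfo field name to its numeric value (kB for sizes)."""
    info: dict[str, int] = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields:
            info[key.strip()] = int(fields[0])
    return info


def memory_used_percent(info: dict[str, int]) -> float:
    """Share of memory not reported as available (assumes MemAvailable exists)."""
    total = info["MemTotal"]
    return 100.0 * (total - info["MemAvailable"]) / total


# Read the live files only on systems where /proc is actually mounted.
if __name__ == "__main__" and Path("/proc/loadavg").exists():
    load = parse_loadavg(Path("/proc/loadavg").read_text())
    mem = parse_meminfo(Path("/proc/meminfo").read_text())
    print(f"load average (1m, 5m, 15m): {load}")
    print(f"memory used: {memory_used_percent(mem):.1f}%")
```

Keeping the parsers separate from the file reads means they can be checked against sample text on any platform; a real monitor would call them in a loop at a fixed interval.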
Java High-Performance Architecture
Jun 14, 2016 · Backend Development

How Hotjar Scaled to 500M Daily Requests: 8 Lessons for Rapid Backend Growth

This article chronicles Hotjar's evolution from a simple two‑server setup to a robust, eight‑server architecture handling hundreds of millions of daily requests, sharing eight practical lessons on scaling, CDN usage, language choice, data storage, monitoring, and cost‑effective optimizations for fast‑growing web services.

CDN · Performance Optimization · architecture
0 likes · 7 min read
Liulishuo Tech Team
May 27, 2016 · Mobile Development

Evolution of the Android Architecture of the English Fluency App

This article details the step‑by‑step evolution of the English Fluency Android app’s architecture, covering its early broadcast‑based design, the adoption of a plugin‑based modular core, multi‑process integration, auxiliary systems such as asynchronous loading, event bus, monitoring, and support components for file storage, DNS protection, image loading, and downloading.

Android · Mobile Development · architecture
0 likes · 13 min read
MaGe Linux Operations
May 23, 2016 · Operations

Top 20 Linux Monitoring Tools Every Sysadmin Should Know

This guide surveys more than twenty essential Linux monitoring utilities—covering system, network, log, and infrastructure tools such as top, htop, ntopng, Nagios, and Zabbix—to help administrators efficiently diagnose performance issues and maintain reliable services.

Linux · monitoring · performance
0 likes · 9 min read
Architecture Digest
May 22, 2016 · Big Data

Design and Architecture of Youzan Unified Log Platform

The article details the design, components, and operational challenges of Youzan's unified log platform, describing its multi‑layer architecture, ingestion methods using rsyslog/logstash and Flume‑NG, Kafka‑based log center, processing pipelines with Storm/Spark, and storage in HDFS and Elasticsearch.

Distributed Systems · Flume · Kafka
0 likes · 10 min read
Efficient Ops
May 16, 2016 · Cloud Native

How JD Scaled to 100,000 Docker Containers: Lessons in Cloud‑Native Operations

This article details JD.com's journey from physical servers to a massive Docker‑based cloud‑native platform, covering challenges, architecture, elastic scheduling, monitoring, and resource‑driven operations that support tens of thousands of containers across multiple data centers.

Docker · Resource Management · elastic scheduling
0 likes · 26 min read
21CTO
May 14, 2016 · Backend Development

How We Scaled a Billion‑User System: From Monolith to Microservices

This article recounts how a rapidly growing online platform transformed a tightly coupled, fragile architecture into a scalable, high‑availability system by applying dynamic/static separation, read‑write splitting, caching, load‑balancing, intelligent monitoring, and finally migrating to a micro‑service architecture.

Backend Architecture · Cloud Native · Microservices
0 likes · 11 min read
MaGe Linux Operations
May 10, 2016 · Operations

10 Essential Practices to Prevent Operational Failures in Database Management

This article outlines ten practical guidelines for operations engineers—ranging from mandatory rollback testing and cautious handling of destructive commands to robust backup verification, vigilant monitoring, and disciplined handover procedures—to dramatically reduce system outages and improve overall reliability.

Automation · Backup · Operations
0 likes · 18 min read
Efficient Ops
May 7, 2016 · Operations

400+ Free DevOps Tools & Resources Every Sysadmin Should Know

This article compiles a curated list of over 400 free DevOps and system administration resources—including CI/CD services, monitoring tools, crash handling platforms, IaaS, PaaS, and DBaaS solutions—to help engineers streamline workflows and improve operational efficiency.

DevOps · IaaS · PaaS
0 likes · 7 min read
Baidu Intelligent Testing
Apr 28, 2016 · Operations

Testing and Evaluation Practices for Baidu Doctor Platform

This article details Baidu Doctor’s comprehensive testing and monitoring strategies, covering user experience data analysis, source data trust, online monitoring systems, log‑based automated checks, retrieval backend testing, evaluation metrics, Badcase mining, and user search habit analysis to ensure high‑quality medical O2O services.

User experience · data analysis · medical platform
0 likes · 14 min read
Architecture Digest
Apr 21, 2016 · Backend Development

Evolution and Refactoring of Autohome Mobile Backend Architecture

The article chronicles Autohome's mobile backend transformation from a monolithic ALL‑IN‑ONE design to a modular, high‑availability microservice architecture, detailing the challenges of traffic surge, resource coupling, and rapid releases, and describing the adopted solutions such as service decomposition, stateless design, Java migration, RPC framework, asynchronous components, and comprehensive monitoring and tracing.

Microservices · Mobile · Scalability
0 likes · 11 min read
Big Data and Microservices
Apr 18, 2016 · Operations

Designing a Unified IT Operations Monitoring Indicator System for Banks

The article presents a comprehensive, business‑oriented IT operations monitoring framework for banks, detailing its lifecycle relevance, regulatory drivers, hierarchical AHP‑based design, indicator categories, weighting methods, SMART evaluation, and practical implementation steps to enhance risk control and service quality.

AHP · IT Operations · ITIL
0 likes · 12 min read
Big Data and Microservices
Apr 1, 2016 · Operations

How to Build a Business‑Transaction‑Centric IT Operations Monitoring System

This article outlines a comprehensive approach for designing an IT operations monitoring platform that focuses on real‑time business transaction metrics, automatic topology discovery, event‑transaction correlation, deep component diagnostics, and unified data processing to improve availability, performance, and fault‑resolution speed in large‑scale data centers.

Automation · Business Transaction · Fault Diagnosis
0 likes · 15 min read
Efficient Ops
Mar 31, 2016 · Operations

Rethinking CMDB: Building Scalable, Automated Configuration Management for Modern Ops

This talk explores the challenges of building and maintaining a CMDB, proposes a goal‑driven, industry‑referenced modeling approach, and outlines practical steps such as tagging, relationship mapping, dynamic attributes, automation, and visualization to create a service‑oriented, scalable configuration management database.

CMDB · Modeling · monitoring
0 likes · 11 min read
21CTO
Mar 22, 2016 · Operations

Build a Scalable Unified Monitoring & Alert Platform with Ganglia & Centreon

This article explains how to design and implement a unified operations monitoring and alerting platform by combining Ganglia for data collection with Centreon for alerting, covering architecture layers, module functions, integration steps, and practical Q&A for large‑scale deployments.

Alerting · Automation · Centreon
0 likes · 20 min read
Big Data and Microservices
Mar 19, 2016 · Operations

Essential Linux Commands for Comprehensive System Inspection

This guide compiles essential Linux commands for inspecting system details, resources, disks, networks, processes, users, services, and installed programs, providing concise descriptions that help administrators quickly gather kernel, hardware, memory, storage, and runtime information.

Linux · Shell · System Administration
0 likes · 6 min read
21CTO
Mar 17, 2016 · Operations

How Vipshop’s Three‑Tier Monitoring System Keeps Services Running Smoothly

This article explains Vipshop’s multi‑layer monitoring architecture, detailing system‑level metrics, application‑level tracing with the Mercury platform, and business‑level KPI dashboards, while describing the data pipelines that collect, process, and alert on distributed logs to ensure reliable operations.

Distributed Systems · Operations · Vipshop
0 likes · 4 min read
Java High-Performance Architecture
Mar 16, 2016 · Operations

How Vipshop’s Three‑Tier Monitoring System Keeps Services Running Smoothly

Vipshop’s three‑tier monitoring system—covering system, application (Mercury), and business layers—collects and analyzes logs from distributed components, providing real‑time metrics, slow‑call detection, error tracing, and configurable alerts to help engineers quickly pinpoint and resolve performance issues.

APM · Alerting · Distributed Systems
0 likes · 4 min read
Java High-Performance Architecture
Mar 15, 2016 · Operations

Building a 3-Dimensional Automated Visual Monitoring System for High-Availability

The article describes a three-dimensional, automated, visual monitoring approach for high-availability systems, detailing a five-layer monitoring model, automated log collection using Logstash-Redis-Elasticsearch, and visualization techniques that together reduce fault-locating time and improve operational efficiency.

Automation · Operations · System Design
0 likes · 5 min read
Qunar Tech Salon
Mar 9, 2016 · Backend Development

Design and Lessons Learned from Meizu's Real-Time Message Push System

The article details Meizu's large‑scale real‑time push architecture, covering system scale, four‑layer design, power‑consumption optimizations, network reliability challenges, massive connection handling, load‑balancing strategies, strict monitoring, and gray‑release practices to ensure high performance and stability.

Performance Optimization · backend scalability · gray release
0 likes · 12 min read
21CTO
Mar 7, 2016 · Backend Development

How Meizu Scaled Real‑Time Push to 600 K Messages/min: Architecture, Pitfalls & Solutions

This article details Meizu's real‑time push system handling 25 million online users and 50 billion daily PVs, describing its four‑layer backend architecture, challenges such as power consumption, mobile network instability, massive connections, and the monitoring and gray‑release strategies used to ensure reliability and performance.

Backend Architecture · Large Scale Messaging · gray release
0 likes · 12 min read
Architect
Mar 5, 2016 · Backend Development

Design and Lessons from Meizu Real-Time Push Architecture

The article recounts Meizu architect Yu Xiaobo's presentation on the company's real‑time push system, describing its massive scale, four‑layer backend architecture, challenges such as power consumption, mobile network instability, massive connections, and the monitoring and gray‑release strategies employed to ensure reliability.

gray release · high concurrency · load balancing
0 likes · 12 min read
Architecture Digest
Mar 5, 2016 · Operations

Dianping Operations Architecture Overview and Best Practices

This article presents a comprehensive overview of Dianping's operations architecture, detailing team organization, multi‑data‑center infrastructure, monitoring layers, automation tools, configuration management systems, incident analysis, lessons learned, and future directions such as Docker and PaaS adoption.

Automation · DevOps · Docker
0 likes · 16 min read
dbaplus Community
Mar 3, 2016 · Operations

Why Every Developer Must Master Core Ops Skills

The article explains why developers need to understand operations—covering resource usage, fault handling, platform basics, and essential ops tools—so they can write maintainable code, avoid common pitfalls, and collaborate effectively with ops teams for reliable, high‑performance services.

Operations · Software Engineering · coding standards
0 likes · 14 min read
Efficient Ops
Mar 2, 2016 · Databases

How DBMP Automates MySQL Management and Cuts DBA Workload

This article explains why the DBMP platform was created to automate MySQL operations, describes its architecture and key features such as host management, instance groups, backup, slow‑query handling, and scheduled tasks, and outlines future optimization directions and common technical Q&A.

Backup · database automation · failover
0 likes · 14 min read
Architecture Digest
Mar 2, 2016 · Operations

Scaling Service Architecture and Operations: Lessons from ChuYe's Engineering Practices

The article recounts ChuYe's evolution from a monolithic setup to a clustered micro‑service architecture, detailing the challenges of debugging, deployment, and monitoring, and describing the solutions implemented—including service clustering, automated deployment platforms, Docker usage, and comprehensive logging and audit systems—to improve agility and operational efficiency.

Deployment Automation · Microservices · Service Architecture
0 likes · 9 min read
Architecture Digest
Mar 1, 2016 · Backend Development

Design and Challenges of Meizu Real-Time Message Push System

The article details Meizu's large‑scale real‑time push architecture, covering system scale, four‑layer design, mobile power‑saving optimizations, network instability handling, massive connection techniques, load‑balancing strategies, comprehensive monitoring, and gray‑release deployment practices.

gray release · high concurrency · load balancing
0 likes · 11 min read
Architecture Digest
Feb 13, 2016 · Backend Development

Evolution of Xiaomi Web Architecture: From Monolith to Scalable Microservices and Cloud‑Native Solutions

The article chronicles Xiaomi Web's architectural journey from a simple three‑engineer monolith in 2011 through systematic service decomposition, asynchronous messaging, database sharding with Cobar, cloud‑native scaling, advanced caching, virtual inventory allocation, and sophisticated monitoring, illustrating practical lessons for building high‑performance e‑commerce platforms.

Microservices · System Architecture · cloud
0 likes · 12 min read
Qunar Tech Salon
Feb 3, 2016 · Backend Development

The Value, Modes, and Practices of Performance Optimization

This article explains the benefits and drawbacks of performance optimization, distinguishes between single‑application and structural optimization approaches, outlines common steps, tools, and techniques for each, and presents case studies illustrating architectural evolution for improved scalability and stability.

Scalability · architecture · caching
0 likes · 7 min read
21CTO
Jan 28, 2016 · Operations

How to Build High‑Availability Systems: Lessons from a Transaction Platform Evolution

This article shares practical insights on achieving high availability by understanding goals, decomposing requirements, designing resilient architectures, ensuring operability, testing rigorously, and reducing release risk, illustrated through the multi‑stage evolution of a transaction system.

Microservices · Operations · Scalability
0 likes · 14 min read
Efficient Ops
Jan 26, 2016 · Operations

How Real-Time Log Analytics Transforms IT Operations

This article explains IT Operation Analytics (ITOA), its data sources, use cases, evolution of log management, and how a real‑time log search platform can improve monitoring, security, and business analysis for large‑scale IT environments.

Log Analytics · Security · monitoring
0 likes · 13 min read
Efficient Ops
Jan 17, 2016 · Operations

From Telecom to Startup: A Veteran Ops Engineer Shares Career Lessons

Veteran operations engineer Wang Jinyin recounts his journey from telecom system development to leading ops teams at Tencent, YY, and UC, then founding Youwei, offering practical insights on standardization, automation, DevOps integration, and team building for modern IT operations.

DevOps · career · monitoring
0 likes · 17 min read
Architect
Jan 15, 2016 · Backend Development

WeChat Architecture: Strategies for Massive Scale, Agile Development, and Reliability

The article summarizes Tencent's WeChat technical director Zhou Hao's presentation on how the massive messaging platform achieves rapid growth, high availability, and agile development through a three‑pronged strategy of precise product design, flexible project management, and robust backend technologies such as modular system decomposition, extensible protocols, gray‑release deployment, and comprehensive monitoring.

Agile Development · System Architecture · WeChat
0 likes · 17 min read
21CTO
Jan 9, 2016 · Big Data

How We Scaled Real‑Time Log Analysis to 2 TB Daily with ELK

This article shares the author's practical experience building a real‑time log analysis platform at Sina, covering service scope, ELK architecture, performance optimizations, usability improvements, new features, common pitfalls, and a concise Q&A for engineers handling massive log streams.

ELK · Elasticsearch · Kafka
0 likes · 12 min read
21CTO
Jan 8, 2016 · Backend Development

How Didi Scaled Ride‑Hailing: LBS, Long‑Connection, and Real‑Time Data Solutions

Facing explosive traffic growth in 2014, Didi’s ride‑hailing platform tackled critical challenges by redesigning its LBS architecture, replacing unstable long‑connection services with an AIO‑based framework, partitioning databases, adopting Dubbo and RocketMQ for distributed processing, and building a real‑time monitoring and data center using Storm, HBase, and custom SQL‑to‑HBase translation.

Real-time Processing · Ride Hailing · database sharding
0 likes · 14 min read
ITPUB
Jan 4, 2016 · Backend Development

Designing a Scalable 100k-Server Monitoring System: Architecture and Lessons Learned

The article outlines the architecture, design principles, challenges, and performance optimizations of a large‑scale server monitoring system built for handling hundreds of gigabytes of data per day with high availability, low latency alerts, and multi‑platform support.

C programming · monitoring · real-time alerts
0 likes · 11 min read
ITPUB
Dec 11, 2015 · Backend Development

Inside Meizu’s Real‑Time Push System: Architecture, Challenges & Solutions

This article presents a detailed walkthrough of Meizu’s real‑time push platform, covering its four‑layer architecture, high‑concurrency design, micro‑service RPC framework, power‑saving strategies, duplicate‑message handling, DNS reliability, load‑balancing tactics, monitoring setup, and gray‑release deployment.

Backend · Microservices · Real-Time
0 likes · 11 min read
dbaplus Community
Dec 8, 2015 · Backend Development

How to Build a High‑Availability SaaS Customer Service Platform from Scratch

This article shares practical insights on rapidly creating a SaaS customer service platform, designing high‑availability architecture, and boosting overall system performance through load balancing, database replication, distributed caching, CDN acceleration, front‑end SPA frameworks, advanced search, and comprehensive monitoring.

CDN · SaaS · architecture
0 likes · 12 min read
Qunar Tech Salon
Dec 7, 2015 · Operations

Four Powerful System Monitoring Tools: htop, iotop, apachetop, and Glances

The article presents four command‑line monitoring utilities—htop, iotop, apachetop, and Glances—explaining their features, typical use cases, installation commands, and visual examples to help Linux users gain real‑time insight into processes, I/O, web traffic, and overall system health.

Glances · Linux · apachetop
0 likes · 4 min read
Java High-Performance Architecture
Dec 1, 2015 · Operations

What Is Nagios? Key Features, Components, and Limitations Explained

Nagios is an enterprise‑grade, open‑source monitoring framework that tracks server, service, and network metrics such as CPU usage, memory, disk space, and network throughput, alerts via email or SMS on anomalies, and consists of a core, plugins, and extensions, though it lacks built‑in reporting and has configuration limitations.

IT infrastructure · Nagios · Operations
0 likes · 3 min read

LinkedIn’s Kafka at Scale: Architecture, Optimizations, and Operational Practices

The article details how LinkedIn has scaled Kafka from handling billions to trillions of messages daily, describing quota enforcement, a ZooKeeper‑free consumer, reliability enhancements, security plans, monitoring frameworks, fault‑injection testing, cluster balancing, and integration with other internal data systems.

Big Data · Kafka · LinkedIn
0 likes · 12 min read
21CTO
Nov 27, 2015 · Backend Development

How Baiba’s Backend Powers 90% Mobile Commerce: Architecture Deep Dive

This article details Baiba's evolution from a simple flash‑sale site to a mobile‑centric e‑commerce platform, describing its backend flow through CDN, caching layers, PHP‑FPM, memcached, Redis, MySQL, search engines, monitoring tools, deployment pipelines, and future plans for service‑orientation and hybrid apps.

Backend · Deployment · PHP
0 likes · 15 min read
21CTO
Nov 12, 2015 · Operations

Scaling Suning’s E‑Commerce for Double‑11: System Splitting and Resilience

Suning’s technical team shares how they prepared for Double‑11 by splitting monolithic services into focused modules, building a robust foundational platform with cloud, middleware, and monitoring tools, refining R&D processes, and implementing comprehensive load‑testing, optimization, and emergency response plans to ensure system stability under massive traffic.

Load Testing · emergency response · e‑commerce architecture
0 likes · 13 min read
21CTO
Nov 12, 2015 · Cloud Computing

Scaling Mogujie's Private Cloud for 11.11: Architecture, Stability & Ops Insights

This article details how Mogujie's private cloud platform, built on OpenStack, Docker, and KVM, was engineered and optimized to handle the massive traffic of the 11.11 shopping festival, covering architectural choices, stability measures, monitoring, disaster recovery, performance tuning, and integration with existing operations systems.

Docker · KVM · OpenStack
0 likes · 10 min read
21CTO
Nov 12, 2015 · Operations

How Vipshop Scales Flash Sales: Architecture Strategies for High‑Concurrency E‑Commerce

This article explains how Vipshop’s flash‑sale platform handles massive traffic spikes by redesigning system modules, adopting service‑oriented architecture, implementing async processing, multi‑stage caching, database optimizations, and comprehensive monitoring to ensure stability and scalability.

Backend · Service Architecture · caching
0 likes · 16 min read

Designing a Business‑Oriented High‑Availability Architecture for Game Access Systems

The article presents a comprehensive, business‑centric high‑availability architecture for a game access platform, detailing measurable goals, a three‑layered design, client‑side retry with HTTP‑DNS, functional separation and degradation, multi‑region active‑active deployment, and automated, visual monitoring to achieve rapid issue detection, recovery, and minimal downtime.

Distributed Systems · business reliability · fault tolerance
0 likes · 23 min read
MaGe Linux Operations
Oct 21, 2015 · Operations

How JobCenter Transforms Distributed Task Scheduling in E‑Commerce

JobCenter is a distributed task coordination platform that replaces crontab with a unified scheduling, monitoring, and alerting system, enabling e‑commerce teams to manage thousands of web‑service‑based jobs, ensure reliable execution, and gain clear visibility into task performance.

Automation · Distributed Systems · Operations
0 likes · 7 min read
Efficient Ops
Oct 13, 2015 · Operations

Boosting IT Operations Performance: Lean Metrics, CI/CD, and Smart Automation

The article explores how focusing on IT performance through lean principles, precise throughput and latency metrics, continuous integration, trust between development and operations, visualization, and end‑to‑end monitoring can transform operations teams into high‑speed, value‑driven service providers.

IT performance · Lean Operations · continuous integration
0 likes · 11 min read
21CTO
Sep 28, 2015 · Operations

Mastering Log Management: 16 Rules to Boost System Reliability

This article presents a comprehensive set of logging best‑practice rules—from defining log levels and classifications to using RequestIDs, monitoring alerts, and managing log size—aimed at improving system reliability, troubleshooting speed, and operational efficiency.

Debugging · Log Management · Operations
0 likes · 23 min read
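
The sixteen rules themselves are not listed in this summary. As one possible illustration of the RequestID practice it mentions, the sketch below stamps every log line with the ID of the request being handled, using Python's standard logging and contextvars modules. The logger name, log format, and function names are hypothetical choices, not taken from the article.

```python
import contextvars
import io
import logging
import uuid

# Holds the ID of the request currently being processed; "-" outside any request.
request_id_var = contextvars.ContextVar("request_id", default="-")


class RequestIdFilter(logging.Filter):
    """Attach the current request ID to each record so the formatter can use it."""

    def filter(self, record):
        record.request_id = request_id_var.get()
        return True


def build_logger(stream):
    """Create a logger that writes 'LEVEL [request-id] message' lines to stream."""
    logger = logging.getLogger("app.requests")  # hypothetical logger name
    logger.handlers.clear()
    logger.setLevel(logging.INFO)
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.addFilter(RequestIdFilter())
    handler.setFormatter(
        logging.Formatter("%(levelname)s [%(request_id)s] %(message)s")
    )
    logger.addHandler(handler)
    return logger


def handle_request(logger, request_id=None):
    """Simulate one request: every line logged inside carries the same ID."""
    rid = request_id or uuid.uuid4().hex
    token = request_id_var.set(rid)
    try:
        logger.info("request started")
        logger.warning("upstream call was slow")
    finally:
        request_id_var.reset(token)  # restore the previous value afterwards
    return rid


if __name__ == "__main__":
    buffer = io.StringIO()
    handle_request(build_logger(buffer), "req-42")
    print(buffer.getvalue(), end="")
```

With the ID in every line, all entries for one request can be pulled out of a shared log with a single search, which is the troubleshooting benefit the summary points to.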
21CTO
Sep 27, 2015 · Big Data

How Weidian Built a Scalable Big Data Platform for Mobile Commerce

This article outlines the design and implementation of Weidian’s end‑to‑end big data processing platform, covering dataset definition, data collection via Flume‑based DataAgent, transmission through Databus, storage options such as HDFS, Kafka and Elasticsearch, and the monitoring and resource‑integration strategies that support massive mobile commerce logs.

Elasticsearch · Flume · Hadoop
0 likes · 18 min read
Efficient Ops
Sep 13, 2015 · Operations

How Tencent’s BlueKing Platform Automates Ops: Key Takeaways from the Efficient Operations Talk

This article summarizes a detailed Q&A from the Efficient Operations talk, covering BlueKing’s integration with databases, agent resource management, alarm de‑duplication, automation workflows, development language choices, data handling, and the platform’s suitability for various enterprise environments.

BlueKing · Database operations · DevOps
0 likes · 13 min read
High Availability Architecture
Aug 16, 2015 · Backend Development

High‑Availability Architecture and Scaling Experience of Snowball During Stock Market Turbulence

This article shares Snowball's (Xueqiu) high‑availability architecture, performance optimizations, and scaling strategies—including hybrid cloud migration, service decomposition, in‑memory caching, and metric‑driven monitoring—implemented to handle massive traffic spikes and operational challenges during a volatile stock market period.

Backend · architecture · high-availability
0 likes · 21 min read
Qunar Tech Salon
Aug 13, 2015 · Operations

How to Determine Whether a Server Is Still in Use Before Decommissioning

This article outlines a systematic approach for ops teams to assess whether a low‑utilization server is still needed by checking user logins, running services, network connections, cron jobs, and file storage—including specific commands for MySQL and PostgreSQL data inspection—to avoid accidental data loss during reclamation.

Server · cron · decommission
0 likes · 6 min read

A Curated List of Essential Linux Command-Line Tools

This article presents a comprehensive collection of useful Linux command-line utilities—including dstat, screen, tmux, multitail, rsync, and many others—explaining their purposes, typical use cases, and where to obtain them, helping system administrators and developers improve productivity and monitoring.

Backup · Linux · System Administration
0 likes · 12 min read

Key Insights on Microservices Adoption and DevOps Practices from QCon London

At QCon London, GDS architect Michael Brunton‑Spall explained how DevOps principles underpin successful microservice operations, covering service identification, initial creation, ownership, essential tooling, monitoring depth, failure handling, and practical deployment practices for scaling from a single service to a full ecosystem.

DevOps · Microservices · Operations
0 likes · 7 min read
MaGe Linux Operations
Jun 16, 2015 · Operations

Inside Dianping’s Ops: Building Scalable Monitoring, Automation, and Self‑Service Platforms

This article details how Dianping’s sub‑40‑person operations team structures its groups, designs a dual‑datacenter architecture, and creates comprehensive monitoring, automation, configuration, and analysis systems—including Zabbix, Cat, workflow, Button, and a custom radar platform—to achieve high‑availability, self‑service, and continuous improvement.

Automation · DevOps · Infrastructure
0 likes · 18 min read

How to Achieve Efficient Operations Management

This article outlines the concept of efficient operations, analyzes why it is difficult to achieve, and presents practical strategies—including clear responsibilities, technical specialization, management professionalism, and good customer interaction—to improve operational efficiency in technology teams.

Management · Operations · efficiency
0 likes · 14 min read
Art of Distributed System Architecture Design
Apr 8, 2015 · Cloud Computing

Practices in Building Distributed Technologies for Large‑Scale Cloud Computing Platforms

The article summarizes Dr. Zhang Wensong’s 2014 ArchSummit keynote on the challenges, architectural design, storage strategies, performance optimizations, monitoring, and future directions of Alibaba Cloud’s large‑scale distributed cloud computing platform, covering ECS, SLB, RDS, OCS and full‑link analytics.

ECS · Performance Optimization · RDS
0 likes · 17 min read
MaGe Linux Operations
Apr 8, 2015 · Operations

Four Essential Strategies to Elevate Data Center Operations

This article outlines four key practices—comprehensive engineering documentation, robust business backup, continuous online monitoring, and regular periodic inspections—that together ensure optimal performance, reliability, and long‑term benefits for data center operations.

Backup · Data center · Documentation
0 likes · 7 min read
Art of Distributed System Architecture Design
Mar 6, 2015 · Backend Development

Designing Scalable Stateless Architecture: Sessions, Caching, Sharding & Monitoring

The article explains how to achieve horizontal scalability by making applications stateless, using client‑side cookies for session data, applying various caching layers, splitting services and databases with sharding, adopting asynchronous messaging, storing unstructured data, and integrating monitoring with alerting.

caching · monitoring · sharding
0 likes · 9 min read
Art of Distributed System Architecture Design
Mar 5, 2015 · Mobile Development

Evolution Stages and Architecture of Mobile Taobao: API Gateway, Bundles, WebApp, and Support Systems

The article outlines Mobile Taobao's four development stages, the introduction and scaling of an API gateway, the bundle‑based mobile architecture with WebApp and PackageApp components, and the comprehensive R&D, testing, operations, and release support mechanisms that enable large‑scale, resilient mobile commerce.

Deployment · api-gateway · bundle
0 likes · 14 min read
Qunar Tech Salon
Feb 5, 2015 · Backend Development

WeChat Architecture: Scaling to Hundreds of Millions of Users with Agile Development and Robust Operations

The talk reveals how WeChat achieved rapid growth to over 100 million users by combining precise product timing, an aggressive agile mindset, and a resilient technical backbone built on modular large‑system design, extensible protocols, gray‑release deployment, comprehensive monitoring, and fault‑tolerant disaster‑recovery strategies.

Agile Development · Backend · WeChat
0 likes · 17 min read
MaGe Linux Operations
Aug 5, 2014 · Operations

Essential Linux Server Troubleshooting Checklist: 13 Practical Steps

When a Linux server experiences a failure, this guide walks you through a comprehensive 13‑step checklist—covering problem context, user activity, process inspection, network services, resource usage, hardware, I/O performance, logs, and scheduled tasks—to help you quickly pinpoint and resolve the root cause.

CLI · Linux · monitoring
0 likes · 10 min read
MaGe Linux Operations
Jul 27, 2014 · Operations

Essential Tools Every Linux Sysadmin Must Master

This guide outlines the ten crucial Linux operations tools—from system basics and networking services to shell scripting, text processing, databases, firewalls, monitoring, clustering, and backup—providing a comprehensive roadmap for aspiring sysadmins to become proficient in just a few months.

Linux · Shell scripting · monitoring
0 likes · 7 min read