Tagged articles

Operations

3329 articles · Page 25 of 34

Jul 22, 2019 · Operations

Understanding Forward and Reverse Proxies: Concepts, Differences, and Nginx Configuration

This article explains the fundamentals of forward and reverse proxies, compares their characteristics and differences, and provides practical Nginx configuration examples for implementing reverse proxy, load balancing, and cross‑origin handling in web applications.

Nginx · Operations · Reverse Proxy

0 likes · 10 min read

DevOps

Jul 22, 2019 · Operations

DevOps Team Topologies: Anti‑Types, Types, and Choosing the Right Structure

This article explains the various DevOps team topologies—including anti‑patterns A‑G and nine positive types—detailing their characteristics, applicability, and potential effectiveness so organizations can select the most suitable structure for their value‑stream delivery goals.

Anti-Pattern · Operations · Team Topology

0 likes · 14 min read

360 Zhihui Cloud Developer

Jul 18, 2019 · Operations

Why Bosun Beats Alertmanager and Kapacitor for Container Alerting

This article compares three container alerting frameworks—Alertmanager, Kapacitor, and Bosun—explains why Bosun was chosen for its flexible HTTP API rule deployment and low learning curve, and provides step‑by‑step configuration, rule definition, notification, and templating examples for integrating Bosun with Prometheus.

Alerting · Bosun · Operations

0 likes · 9 min read

Efficient Ops

Jul 14, 2019 · Operations

How 5G Is Driving DevOps Adoption in Telecom: Guangdong Mobile’s Cloud‑Native Journey

At the 2019 DevOps International Summit in Beijing, Guangdong Mobile and Huawei unveiled their first carrier‑level DevOps deployment for 5G network elements, detailing the challenges of traditional rollout, the shift to x86‑based cloud infrastructure, and a step‑by‑step pipeline that cut deployment time by over 90%.

5G · Automation · Operations

0 likes · 18 min read

MaGe Linux Operations

Jul 12, 2019 · Operations

Essential Linux Commands Every Engineer Should Master

This guide compiles the most indispensable Linux commands—from directory and file manipulation, navigation, and text processing to compression, daily system administration, status monitoring, networking, and database access—providing concise examples and practical tips for both beginners and seasoned users.

Linux · Operations · command-line

0 likes · 14 min read

FunTester

Jul 12, 2019 · Operations

Installing and Localizing Netdata: A Real‑Time Linux Performance Monitoring Tool

This guide explains how to install Netdata, a fast web‑based Linux performance monitor, and apply a Chinese localization by using a forked repository, running the provided installer script, and configuring the service to view detailed system metrics through a clean UI.

Operations

0 likes · 6 min read

Efficient Ops

Jul 11, 2019 · Operations

Unlocking China’s First DevOps Capability Maturity Model: Certification Guide

This article explains China’s inaugural DevOps Capability Maturity Model, its continuous delivery requirements, the assessment process, participating enterprises, and how organizations can apply for certification in the upcoming evaluation round.

Capability Maturity Model · Operations · assessment

0 likes · 6 min read

Ctrip Technology

Jul 11, 2019 · Cloud Native

Ctrip’s Continuous Delivery Practices and Unified Build Platform with Jenkins on Kubernetes

This article describes Ctrip’s large‑scale continuous delivery system, its benefits for efficiency, quality, reliability and team collaboration, the evolution of its deployment models, the design of a unified Jenkins‑based build platform, and practical experiences running Jenkins on Kubernetes with elastic scheduling and workspace management.

CI/CD · Jenkins · Kubernetes

0 likes · 19 min read

360 Tech Engineering

Jul 8, 2019 · Operations

Common ETCD Issues and Recovery Procedures

This guide explains ETCD’s high‑availability architecture and provides detailed step‑by‑step recovery procedures for single‑node failures, majority‑node outages, and database‑space‑exceeded errors, including status checks, member removal and addition, snapshot restoration, compaction, defragmentation, and alarm clearing.

Etcd · Operations · Recovery

0 likes · 7 min read

Mafengwo Technology

Jul 4, 2019 · Backend Development

Scaling MaFengWo’s Payment Center: From 1.0 to 2.0 Architecture & Key Lessons

This article details how MaFengWo’s payment center evolved from a simple 1.0 implementation to a robust 2.0 architecture, covering core capabilities, modular design, monitoring, and the operational lessons learned for building a high‑availability, scalable payment platform.

Operations · System Monitoring · backend design

0 likes · 16 min read

Qunar Tech Salon

Jul 3, 2019 · Operations

Analysis of International Roaming Network Issues and the QunarNDT Diagnostic Tool

This article presents a case study of an overseas user experiencing video upload failures due to international roaming, explains how Qunar's network detection tool (QunarNDT) was used to diagnose the issue, outlines the findings about domestic routing and timeout, and offers recommendations for improving user experience.

CDN · Operations · Qunar

0 likes · 7 min read

Architects' Tech Alliance

Jun 30, 2019 · Operations

How DNS and GSLB Enable Multi-Active Data Center Load Balancing

This article explains DNS fundamentals, the step‑by‑step resolution process, TTL caching, and how DNS‑based Global Server Load Balancing (GSLB) can direct traffic to the nearest active data‑center, providing a practical guide for building multi‑active, high‑availability infrastructures.

DNS · Data Center · GSLB

0 likes · 10 min read

MaGe Linux Operations

Jun 30, 2019 · Operations

Mastering Load Balancing: LVS, Nginx, and HAProxy Explained

This article introduces server clustering and load‑balancing concepts, compares popular software such as LVS, Nginx, and HAProxy, explains their architectures, NAT and DR modes, and outlines each solution's strengths and weaknesses for building high‑performance web services.

HAProxy · LVS · Operations

0 likes · 14 min read

DevOps Cloud Academy

Jun 29, 2019 · Operations

Prometheus Overview: Architecture, Metrics, Data Collection, and Storage

This article provides a comprehensive overview of Prometheus, an open‑source monitoring and alerting system, covering its origins, key features, architecture, core components, metric types, data collection methods, service discovery, storage options, and query capabilities.

Alertmanager · Grafana · Operations

0 likes · 9 min read

360 Tech Engineering

Jun 28, 2019 · Operations

A Comprehensive Guide to Puppet: Architecture, Installation, and Resource Management

This article provides an in‑depth overview of Puppet, covering its background, C/S architecture, installation steps for master, CA and agent nodes, detailed configuration options, resource definitions, and common package and file resources, illustrating how to automate large‑scale server cluster management.

Operations · configuration management

0 likes · 13 min read

21CTO

Jun 27, 2019 · Operations

From Hundreds to Thousands: Scaling Operations and Building a Custom Monitoring System

This article recounts AdMaster's five‑year journey from a few dozen servers to thousands, detailing the evolution of their monitoring infrastructure, the challenges faced at each scale stage, and the design of a self‑built, distributed monitoring platform that delivers real‑time alerts, visualized data, and business‑level insights.

Operations · infrastructure · scaling

0 likes · 14 min read

ITPUB

Jun 26, 2019 · Operations

How to Prevent Catastrophic rm -rf Mistakes in Linux Shell Scripts

This article explains common scenarios where empty variables, spaces, special characters, or failed directory changes cause accidental deletions in Linux, and provides practical shell techniques—such as quoting, parameter expansion, set -u, and logical checks—to safeguard against disastrous rm -rf commands.

Linux · Operations · Safety

0 likes · 8 min read

MaGe Linux Operations

Jun 26, 2019 · Operations

Scaling Ops: From Hundreds to Thousands of Servers – Lessons from AdMaster

This article shares AdMaster's five‑year operations journey, detailing how the team scaled monitoring from under 200 machines to over a thousand, the evolution of their monitoring stack, the design of a custom distributed system, and practical Q&A insights for large‑scale infrastructure management.

Operations · scaling

0 likes · 15 min read

Efficient Ops

Jun 23, 2019 · Operations

How to Diagnose and Fix Java Application Slowdowns: CPU, GC, and Thread Issues

This guide explains how to identify and resolve common Java production problems such as sudden CPU spikes, excessive Full GC, thread blocking, waiting states, and deadlocks by using tools like top, jstack, jstat, and memory‑dump analysis to pinpoint the root cause and apply appropriate fixes.

GC · Operations · Thread Dump

0 likes · 18 min read

DevOps Cloud Academy

Jun 20, 2019 · Operations

Step-by-Step Installation and Configuration of Node Exporter, Alertmanager, Prometheus, and Grafana for Monitoring and Alerting

This guide walks through downloading, extracting, and setting up Node Exporter, Alertmanager, Prometheus, and Grafana on a Linux server, configuring their systemd services, customizing alert rules, and verifying the monitoring and alerting pipeline with screenshots of each verification step.

Alertmanager · Grafana · Operations

0 likes · 7 min read

ITPUB

Jun 20, 2019 · Operations

Essential Ops Lessons: Avoid Disasters with Backups, Permissions, and Monitoring

This article shares hard‑earned operational guidelines for Linux servers, covering safe testing, cautious use of rm ‑rf, the importance of backups, strict access control, SSH hardening, firewall rules, intrusion detection, systematic monitoring, performance tuning, and maintaining a calm mindset to prevent costly incidents.

Operations · Server Administration · monitoring

0 likes · 12 min read

Efficient Ops

Jun 11, 2019 · Operations

What Powers WeChat’s Billion‑User Scale? Inside Its DevOps Journey

WeChat, China’s top social app with over a billion users, has applied DevOps practices to dramatically improve development efficiency and code quality and to accelerate the feedback cycle from requirements to delivery, while confronting real‑world challenges in tooling, processes, reliability, and automation costs.

Large Scale · Operations · WeChat

0 likes · 3 min read

DevOps Cloud Academy

Jun 9, 2019 · Operations

Prometheus Metric Definitions, Types, and Data Samples

This article explains Prometheus metric naming conventions, label usage, metric types such as Counter, Gauge, Summary, and Histogram, and describes the structure of data samples, providing examples and best‑practice guidelines for defining and classifying metrics in monitoring systems.

Observability · Operations · Prometheus

0 likes · 5 min read

Programmer DD

Jun 7, 2019 · Operations

Why Most Alerts Fail and How to Build Actionable Monitoring

This article explains the fundamental flaws of typical alert systems, distinguishes between business rule and reliability monitoring, outlines essential metrics and strategies for effective alerts, and presents simple yet powerful anomaly‑detection algorithms to ensure alerts are actionable and reduce noise.

Alerting · Anomaly Detection · Operations

0 likes · 21 min read

Big Data Technology & Architecture

Jun 3, 2019 · Big Data

Design and Implementation of Alibaba Cloud's 10PB+ Daily Log Service

This article presents an in‑depth interview with Alibaba Cloud senior expert Sun Tingtao, detailing the architecture, core features, design challenges, and operational strategies of the Alibaba Cloud Log Service that handles over 10 PB of daily log data for massive, diverse production workloads.

Alibaba Cloud · Big Data · Indexing

0 likes · 12 min read

MaGe Linux Operations

Jun 3, 2019 · Operations

How to Safely Prevent Accidental rm -rf Deletions in Linux Shell

This article explains common scenarios that lead to accidental directory or file deletions in Linux shell scripts—such as empty variables, spaces in paths, special characters, and failed cd commands—and provides practical Bash techniques like variable expansion checks, quoting, set -u, logical short‑circuiting, and safer prompts to avoid catastrophic rm -rf mistakes.

Linux · Operations · Safety

0 likes · 8 min read

Mafengwo Technology

May 31, 2019 · Operations

How We Built a Scalable Monitoring & Alert System for Large‑Scale Transportation Services

This article explains how the team designed and implemented a unified monitoring and alert platform for a multi‑service transportation business, covering architecture, data collection, storage, rule engine, alert delivery, troubleshooting aids, encountered pitfalls, and future enhancements.

Alerting · Elasticsearch · Kafka

0 likes · 13 min read

Efficient Ops

May 30, 2019 · Operations

Enterprise‑Scale DevOps Secrets from China’s Top Banks Revealed

The 2019 Enterprise‑Level DevOps Empowerment Forum in Chengdu gathered experts from major Chinese banks and telecoms to share practical experiences, including China Merchants Bank’s K8s‑based pipeline, measurement challenges, and collaborative Q&A, illustrating how organizations can accelerate DevOps adoption and improve delivery efficiency.

Continuous Integration · Enterprise · Kubernetes

0 likes · 9 min read

MaGe Linux Operations

May 29, 2019 · Operations

Essential Linux Ops Tools: Install & Use Nethogs, IOZone, IOTop, and More

This guide introduces a collection of practical Linux operations tools—including Nethogs, IOZone, IOTop, IPtraf, IFTop, HTop, NMON, MultiTail, Fail2ban, Tmux, Agedu, NMap and Httperf—providing concise installation commands, basic usage examples, and key options to help system administrators monitor performance, security and resources efficiently.

Linux · Operations · Performance

0 likes · 11 min read

21CTO

May 24, 2019 · Operations

How Meituan’s R&D Team Cut Tens of Millions in Resource Costs: A Practical Guide

This article details Meituan's R&D team's systematic PDCA‑based approach to resource cost optimization, covering methodology definition, planning, execution, checking, and iterative improvement across infrastructure, big‑data, and shared services, ultimately saving tens of millions of yuan.

Big Data · Cost Optimization · Operations

0 likes · 22 min read

Beike Product & Technology

May 23, 2019 · Backend Development

Investigation of Nginx 502 Errors Caused by PHP‑FPM Warning Triggering a FastCGI Buffer Defect

This article analyses why seemingly normal PHP‑FPM requests can cause Nginx to return 502 errors, revealing a FastCGI fastcgi_buffer_size bug triggered by warning output, describing the reproduction steps, detailed packet analysis, the underlying protocol mechanics, and practical recommendations for developers and operators.

502 error · Buffer · Nginx

0 likes · 17 min read

Efficient Ops

May 21, 2019 · Operations

Essential Linux Ops Tools: Nethogs, IOZone, IOTop, and More

This guide introduces a dozen practical Linux operation tools—including Nethogs, IOZone, IOTop, IPtraf, IFTop, Fail2ban, Tmux, and others—providing concise descriptions, download links, and ready‑to‑run installation commands to help system administrators boost monitoring, performance testing, and security on their servers.

Linux · Operations · Tools

0 likes · 12 min read

MaGe Linux Operations

May 19, 2019 · Operations

Mastering Modern Operations: Trends, Skill Maps, and Big Data Monitoring Strategies

This article explores the evolution of operations roles, presents detailed skill maps for system, web, big‑data, and container operations, explains essential log types, and outlines ELK‑based architectures and big‑data‑driven monitoring practices for building a robust, future‑proof operations platform.

ELK · Operations · container

0 likes · 15 min read

Efficient Ops

May 16, 2019 · Operations

How Alibaba’s AI‑Powered Data Centers Achieve Scalable, Reliable Operations

This article examines Alibaba Cloud’s intelligent data center ecosystem, covering market share, global distribution, operational challenges, AIOps evolution, multi‑layered infrastructure platforms, demand forecasting, fault prediction, and future smart‑automation prospects for large‑scale cloud operations.

AIOps · Alibaba Cloud · Operations

0 likes · 13 min read

Liangxu Linux

May 15, 2019 · Operations

Essential Backup Tools for Developers: Git, Rsync, Dropbox, and Time Machine

This guide reviews four practical backup solutions—Git for versioned file control, Rsync for command‑line incremental syncing, Dropbox for cloud‑based GUI storage, and macOS Time Machine for full system snapshots—explaining their key features, typical use cases, and basic setup steps.

Git · Operations · backup

0 likes · 6 min read

Efficient Ops

May 14, 2019 · Operations

How to Master Multi‑Cloud Operations: Lessons from a Gaming Company’s Hybrid Architecture

This talk shares a senior director’s experience building a hybrid multi‑cloud infrastructure for a game company, covering stability, efficiency, cost challenges, design‑for‑failure principles, standardization, resource automation, and the cultural and organizational factors that affect successful cloud operations.

Cost Optimization · Hybrid Cloud · Multi-Cloud

0 likes · 20 min read

Qunar Tech Salon

May 14, 2019 · Operations

Understanding Linux Cgroups for Container Resource Management

This article explains the fundamentals of Linux control groups (cgroups), their components and relationships, and provides step‑by‑step guidance on creating hierarchies, mounting, configuring subsystems, and applying cgroup limits to Docker and Kubernetes containers.

Operations · cgroups · container

0 likes · 9 min read

360 Quality & Efficiency

May 9, 2019 · Operations

Using Paramiko to Establish SSH Connections and Switch to Root Privileges

This tutorial explains how to install the Python Paramiko library, create an SSH connection to a remote server using password or key authentication, and then elevate to root privileges with sudo, while highlighting additional capabilities of the module.

Operations · Paramiko · Python

0 likes · 3 min read

Architects' Tech Alliance

May 8, 2019 · Operations

How to Choose the Right Server Rack: Key Factors and Best Practices

This guide explains how to select and grade server racks, outlines essential criteria such as load capacity, ventilation, power distribution, and cable management, and compares three cable‑routing techniques to help data‑center operators make reliable, future‑proof decisions.

Hardware Selection · Operations · cable management

0 likes · 12 min read

Efficient Ops

May 6, 2019 · Operations

How Live Streaming Ops Ensure Real-Time Reliability at Scale

Zhang Guanshi, the operations director at Huya Live, shares how his team designs a hybrid‑cloud architecture, implements a six‑pillar reliability framework, and leverages real‑time monitoring, AIOps, and rapid‑recovery tools to maintain stable, low‑latency live video streams for millions of viewers.

Live Streaming · Operations · Reliability Engineering

0 likes · 22 min read

Efficient Ops

May 5, 2019 · Operations

How Qunar Uses AI-Driven Fault Prediction to Boost System Reliability

This article outlines Qunar's operational strategy for reducing failures and extending uptime through precise fault detection, rapid recovery, and AI-powered predictive health management, detailing the evolution of their OPS processes, practical implementations, and future challenges in applying PHM to internet services.

AIOps · Operations · PHM

0 likes · 18 min read

iQIYI Technical Product Team

Apr 26, 2019 · Operations

Design and Implementation of iQIYI CDN Inspection System

iQIYI built a three‑component CDN Inspection System that automatically generates tasks, centrally processes and analyzes results, and runs edge measurements to monitor millions of hybrid CDN servers in real time. It detects configuration errors, file mismatches, and traffic anomalies, enabling proactive remediation and 100% local coverage.

CDN · Cloud Computing · Operations

0 likes · 11 min read

DevOps

Apr 24, 2019 · Operations

2019 Accelerate State of DevOps Survey: Participation Guide, Insights, and Interview with Nicole Forsgren

This article introduces the 2019 Accelerate State of DevOps survey, explains how to join the questionnaire, provides background on previous reports, shares a detailed interview with Nicole Forsgren about research design and key findings such as architecture, cloud adoption, and outsourcing, and encourages community participation.

Accelerate · Cloud · Nicole Forsgren

0 likes · 35 min read

dbaplus Community

Apr 24, 2019 · Operations

Choosing and Tuning Open‑Source Monitoring Stacks for Large‑Scale Operations

This article reviews common open‑source monitoring tools, shares the evolution of China Unicom's big‑data platform monitoring, and provides practical guidance on selecting collectors, databases, and visualization components, with detailed configurations for Prometheus, Alertmanager, Grafana, and automated recovery techniques.

Alertmanager · Grafana · InfluxDB

0 likes · 19 min read

Efficient Ops

Apr 24, 2019 · Operations

Why Every Ops Change Should Be Treated Like a Project

This article shares practical lessons from a real‑world ops incident, emphasizing the need for clear change background, optimal timing, project‑style management, and strict process adherence to reduce risk and improve production reliability.

Best Practices · Change Management · Operations

0 likes · 9 min read

Didi Tech

Apr 23, 2019 · Industry Insights

What the First Global DevOps Standard Means for Didi and the Industry

The article explains the launch of the world’s first DevOps capability maturity model, the collaborative effort behind it, Didi’s role as a standards workgroup member, and how its OE (OneExperience) platform embodies the new guidelines to streamline the entire software delivery lifecycle.

Capability Maturity Model · Didi · Industry Standard

0 likes · 5 min read

21CTO

Apr 19, 2019 · Operations

From Junior to Senior Ops Engineer: Master the Skills to Level Up

This guide walks you through the entire career ladder of a senior operations engineer, covering essential Linux, networking, monitoring, container, automation, and security skills, while offering practical advice on job roles, learning paths, and professional growth.

Operations · containerization · devops

0 likes · 13 min read

ITPUB

Apr 19, 2019 · Operations

How to Level Up from Junior to Senior DevOps Engineer: A Complete Roadmap

This guide outlines the career stages, skill sets, and practical tasks for DevOps engineers—from entry‑level troubleshooting to senior‑level architecture, automation, and performance optimization—providing concrete learning paths, tools, and personal development advice to help engineers advance their operations careers.

Automation · Career · Linux

0 likes · 12 min read

Efficient Ops

Apr 18, 2019 · Operations

Choosing the Right Monitoring Stack: From Nagios to Prometheus & Grafana

This article reviews common open‑source monitoring combinations, compares their strengths and weaknesses, and shares practical guidance on selecting collectors, storage back‑ends, and visualization tools such as Telegraf, InfluxDB, Prometheus, Grafana, and Alertmanager for large‑scale data platform operations.

Grafana · InfluxDB · Operations

0 likes · 12 min read

58UXD

Apr 18, 2019 · Operations

How Winning Design Strategies Boosted Spring Festival Campaign Traffic

This article dissects 58.com's 2019 Spring Festival travel rush (Chunyun) campaign, revealing how a win‑win design mindset, data‑driven insights, and integrated business collaboration transformed user experience, increased traffic, and delivered measurable results across multiple channels and game‑based interactions.

Design thinking · Operations · data analysis

0 likes · 11 min read

Architecture Digest

Apr 18, 2019 · Databases

MySQL High Performance Optimization Guidelines and Best Practices

This article presents a comprehensive set of MySQL high‑performance optimization guidelines, covering naming conventions, table design, data types, index strategies, SQL coding standards, replication, backup, and operational best practices to improve efficiency, reliability, and scalability of database systems.

Database Design · Indexing · MySQL

0 likes · 19 min read

Efficient Ops

Apr 17, 2019 · Fundamentals

Mastering Scalable Web Architecture: From Front‑End to Data Center

An in‑depth guide walks through the essential layers of modern website architecture—including front‑end optimization, application frameworks, service distribution, storage solutions, backend processing, monitoring, security, and data‑center design—offering practical strategies for building high‑performance, scalable web systems.

Frontend · Operations · security

0 likes · 11 min read

ITPUB

Apr 15, 2019 · Operations

Essential Practices to Prevent Operational Failures and Boost System Availability

This guide outlines six practical strategies—rollback testing, cautious destructive actions, clear command prompts, verified backups, careful handovers, and proactive monitoring—to help operations teams minimize outages and maintain high system availability.

Change Management · Incident Prevention · Operations

0 likes · 6 min read

MaGe Linux Operations

Apr 14, 2019 · Operations

Mastering Load Balancing: When to Choose LVS, Nginx, or HAProxy

This article explains how modern internet systems use server clusters and load balancers, compares the three most popular software solutions—LVS, Nginx, and HAProxy—covers their architectures, NAT and DR modes, advantages, disadvantages, and provides guidance on selecting the right tool for different scale scenarios.

HAProxy · LVS · Nginx

0 likes · 13 min read

NetEase Game Operations Platform

Apr 13, 2019 · Operations

Automating Service Discovery and Load Balancing with Consul, HAProxy, and Docker in a Microservices Architecture

This article explains how to transform a traditional monolithic deployment into a fully automated microservices environment by containerizing services, using Consul for dynamic service discovery and configuration, and configuring HAProxy with DNS resolvers to achieve seamless load balancing and zero‑downtime updates.

Automation · Consul · Docker

0 likes · 12 min read

DevOps Cloud Academy

Apr 9, 2019 · Operations

Chapter 3: Managing Jenkins (Projects, Views, Plugins)

This guide explains Jenkins project management, covering naming conventions, creating new projects, configuring build history, parameterized builds, triggers, Jenkinsfile setup, as well as building, viewing logs, and debugging pipelines with illustrative screenshots.

CI/CD · Jenkins · Operations

0 likes · 2 min read

Java High-Performance Architecture

Apr 9, 2019 · Operations

Mastering Load Balancing: Types, Algorithms, and Best Practices

This article outlines the three main load‑balancing methods—DNS, hardware, and software—detailing their advantages and drawbacks, then explains common algorithms such as round‑robin, weighted round‑robin, least‑connections, performance‑based, and hash, and provides guidance on combining them for optimal architecture.

Network Architecture · Operations · algorithms

0 likes · 5 min read

360 Quality & Efficiency

Apr 4, 2019 · Operations

Understanding System Load Average and CPU Usage in Linux

This article explains the meaning of the Linux uptime/top output, defines system load average as the average number of runnable and uninterruptible processes, distinguishes it from CPU utilization, and provides guidance on interpreting load values for single‑core and multi‑core systems.

CPU usage · Linux · Operations

0 likes · 8 min read

58 Tech

Apr 4, 2019 · Operations

Redesign of the Signal System for Task Scheduling and Dependency Management

This article explains the shortcomings of the legacy signal design in a scheduling platform, outlines four major dependency problems, and presents a newly engineered signal system with modular functions, instance ID generation, competitive priority rules, and state management to reliably support complex cross‑period and parallel job dependencies.

Operations · Task scheduling · priority handling

0 likes · 9 min read

DevOps

Apr 3, 2019 · Operations

DevOps Transformation: Stories of Role Integration and Work Consolidation

The article examines real‑world DevOps transformation cases, illustrating how shifting operations staff into development teams can create both integration challenges and opportunities, and proposes a framework for distinguishing repeatable versus unique work to guide effective consolidation, standardization, and automation in software delivery.

Operations · Team Integration · Work Consolidation

0 likes · 10 min read

Efficient Ops

Apr 1, 2019 · Operations

Beyond Linux: Mastering Modern Operations – From Deployment to Cloud

This article explores the full spectrum of modern operations, covering environment deployment, troubleshooting, backup, high availability, monitoring, security, automation, virtualization, and cloud services, while highlighting essential tools and best practices for both Linux and Windows environments.

Automation · Cloud · Deployment

0 likes · 8 min read

Efficient Ops

Mar 31, 2019 · Operations

How to Design Actionable Alerts and Effective Monitoring Strategies

This article explains why most alerts are poorly designed, defines actionable alerts, outlines monitoring objectives, discusses metric selection, and presents simple yet powerful algorithms for anomaly detection to improve system reliability and operational efficiency.

Anomaly Detection · Observability · Operations

0 likes · 21 min read

Programmer DD

Mar 31, 2019 · Cloud Computing

10 Hard‑Earned AWS Lessons That Shape Modern Cloud Architecture

Reflecting on a decade of AWS, this article shares ten hard‑earned lessons—from building evolvable systems and anticipating failures to prioritizing security, automation, and open platforms—that guide the design, operation, and scaling of cloud services for today’s enterprises.

AWS · Automation · Cloud Computing

0 likes · 13 min read

NetEase Game Operations Platform

Mar 30, 2019 · Operations

Understanding and Applying udev Rules for Linux Device Management

This article explains the evolution of Linux device management, introduces udev’s architecture and key features, details rule file syntax and operators, and provides practical examples—including disabling VXLAN offload, setting I/O scheduler, and enabling SR‑IOV—complete with command‑line snippets.

Automation · Linux · Operations

0 likes · 9 min read

Efficient Ops

Mar 28, 2019 · Information Security

How Leading Tech Companies Audit and Control Ops Permissions

This article explains how large enterprises such as BAT and banks implement strict auditing and supervision of operational privileges, using personal accounts, command logging, OSSEC monitoring, firewall limits, and cross‑team oversight to enforce the principle of least privilege.

Compliance · Operations · Privilege Management

0 likes · 6 min read

Ctrip Technology

Mar 28, 2019 · Operations

Comprehensive Guide to Enterprise WiFi Planning, Deployment, and Operations – Practices from Ctrip

This article presents a detailed, practice‑driven guide for enterprise WiFi, covering network planning, full‑coverage design, channel optimization, security, KPI‑based monitoring, probe‑based measurement, troubleshooting techniques, and real‑world case studies from Ctrip, highlighting how systematic operations can ensure high‑quality wireless service.

Case Study · Enterprise · Operations

0 likes · 16 min read

DevOps Cloud Academy

Mar 27, 2019 · Operations

Chapter 2 – Installing Jenkins

This guide details the prerequisites, multiple deployment methods (WAR, macOS, Windows, Linux), and post‑installation configuration steps for Jenkins, including unlocking the instance, installing plugins, creating an admin user, setting an update site, and configuring a slave node.

CI/CD · Installation · Jenkins

0 likes · 5 min read

Full-Stack Internet Architecture

Mar 25, 2019 · Operations

Useful Linux Command‑Line Tips to Boost Productivity

This article presents a collection of practical Linux command‑line shortcuts and techniques—including cursor navigation, history execution, disk and memory inspection, process management, multi‑command chaining, and file handling—that can significantly improve efficiency for developers and system administrators.

Operations · Shell · bash

0 likes · 12 min read

58 Tech

Mar 25, 2019 · Artificial Intelligence

Machine Learning‑Based Threshold‑Free Monitoring for Business Metrics

This article describes a monitoring system that leverages machine learning to perform threshold‑free, real‑time anomaly detection on macro business indicators such as network traffic and access volume, detailing its architecture, sample labeling, model training, and multi‑level alarm strategies.

AI · Anomaly Detection · Machine Learning

0 likes · 7 min read

58 Tech

Mar 25, 2019 · Operations

Alarm Convergence, Merging, and Self‑Healing in the 58 Monitoring Platform

The article describes how the 58 monitoring platform reduces alarm storms through alarm convergence, intelligent merging using Gini‑based decision trees, and automated self‑healing, thereby improving alert quality, cutting noise by about 70%, and helping engineers resolve incidents faster.

Operations · alarm convergence · alert merging

0 likes · 9 min read

Efficient Ops

Mar 24, 2019 · Databases

DBA Nightmares: Real‑World Incident Stories and Hard‑Earned Lessons

A collection of vivid DBA anecdotes reveals common pitfalls—from missed alerts and accidental production restarts to unsafe terminal habits—and distills practical safeguards that any database operator can adopt to avoid costly mishaps.

Best Practices · DBA · Databases

0 likes · 5 min read

Tencent Cloud Developer

Mar 19, 2019 · Cloud Computing

Why Cloud Computing Is the Future Path for Operations Professionals

Ops engineers who embrace the cloud, leveraging serverless, Kubernetes, AI, edge, and elastic resources, gain cost‑efficient scalability and avoid on‑premise limitations. Doing so also opens career paths such as cloud reliability engineer, solution architect, integration specialist, or technical operations manager, keeping them relevant in a cloud‑first future.

Operations · career development · solution architecture

0 likes · 6 min read

JD Tech

Mar 19, 2019 · R&D Management

Challenges and Proper Practices for Measuring Software R&D Efficiency

The article examines the difficulties of quantifying software development efficiency, critiques common metric approaches, and proposes a principled framework that emphasizes global, outcome‑oriented indicators across delivery efficiency, quality, and capability to guide systematic R&D performance improvement.

Operations · R&D efficiency · delivery capability

0 likes · 9 min read

Continuous Delivery 2.0

Mar 19, 2019 · Operations

Key Metrics for Agile Teams: From Lead Time to Security Indicators

This article explains how software teams can select, combine, and interpret nine essential metrics—including lead time, cycle time, team velocity, defect rates, MTBF, MTTR, and security incident counts—to drive continuous improvement, align with business goals, and ultimately achieve successful outcomes.

Agile · Lead Time · Operations

0 likes · 12 min read

Alibaba Cloud Developer

Mar 18, 2019 · Operations

Alibaba Hema’s 7‑Layer Funnel & 23 Tactics for Ultra‑Fast Delivery Stability

The article outlines the end‑to‑end stability strategy of Alibaba's Hema delivery platform, detailing a 7‑layer funnel review process, three core norms (development, architecture, stability), and 23 practical tactics, including core/non‑core isolation, proactive monitoring, fault prevention, rapid recovery, and service‑level controls, that keep 30‑minute deliveries reliable despite complex logistics and external disruptions.

Operations · Stability · architecture

0 likes · 13 min read

21CTO

Mar 17, 2019 · Operations

Mastering Forward and Reverse Proxies: When and How to Configure Nginx & Apache

This article explains the concepts of forward and reverse proxy servers, why organizations use them, and provides step‑by‑step configuration examples for both Nginx and Apache to help engineers enforce network policies and secure service access.

Nginx · Operations · apache

0 likes · 5 min read

Architects' Tech Alliance

Mar 17, 2019 · Operations

The Push Toward 400G Data Center Networking: Technologies, Market Drivers, and Future Outlook

This article examines how data center operators and the supply chain are advancing toward 400 Gbps Ethernet, detailing the technical innovations, market forces, and future challenges that shape high‑speed networking, optical modules, and ASIC development for ultra‑large scale data centers.

400G · ASIC · Data Center

0 likes · 12 min read

Efficient Ops

Mar 14, 2019 · Operations

9 Essential Logging Best Practices to Boost System Performance

This article presents nine practical logging best‑practice recommendations—from understanding human and machine audiences and standardizing log formats to leveraging metrics, proper alerting, severity levels, contextual information, and advanced framework features—helping operations teams improve system performance and troubleshooting efficiency.

Best Practices · Logging · Observability

0 likes · 11 min read

Efficient Ops

Mar 14, 2019 · Operations

Why IT Operations Must Evolve: From Cost Center to Strategic Asset

The article examines how rapid cloud adoption, AI‑ops, and DevOps blur traditional IT operations roles, arguing that ops must shift from a low‑value cost center to a profit‑generating, efficiency‑driving function through mindset change, institutional innovation, expanded responsibilities, modern tools, and continuous skill upgrades.

Cloud Computing · IT Operations · Operations

0 likes · 15 min read

JD Tech

Mar 14, 2019 · Operations

Understanding Server Clustering and Load Balancing: LVS, Nginx, and HAProxy

This article explains server clustering and load‑balancing concepts, detailing the architecture and operation of LVS, Nginx, and HAProxy, and compares their advantages, disadvantages, and typical deployment scenarios; it also discusses NAT and DR modes, load‑balancer placement, and best‑practice recommendations for different traffic volumes.

HAProxy · LVS · Network Architecture

0 likes · 12 min read

JD Tech

Mar 13, 2019 · Operations

Evolution of JD Digital Technology’s Host Monitoring System “DiTing”: From V1 to V3

The article chronicles the design, evolution, and lessons learned of JD Digital Technology’s self‑built host monitoring platform “DiTing”, detailing its initial requirements, V1 architecture, subsequent V2 and V3 redesigns, encountered challenges, and future directions toward intelligent operations.

Big Data · Operations · monitoring

0 likes · 12 min read

58 Tech

Mar 12, 2019 · Operations

Overview of the Octopus Automation Platform Architecture and Core Modules

The article introduces Octopus, the core automation service of 58 Group, detailing its overall architecture, the Octopus Agent lifecycle, communication mechanisms, management center capabilities, and key functional modules such as server information collection, command execution, deployment, permission control, and file transfer.

API · Agent · Automation

0 likes · 11 min read

21CTO

Mar 11, 2019 · Operations

Why the US Navy’s Aegis UI Chooses 2D Over 3D – Lessons for High‑Stakes Interface Design

The article dissects the Aegis combat system’s dual‑screen UI, explaining why a simple 2D top‑down map paired with a side view outperforms flashy 3D graphics, how multi‑target blocks replace tables for faster decision‑making, and how human‑factor testing, eye‑tracking and standardized symbols dramatically improve combat efficiency.

Aegis · Operations · human factors

0 likes · 14 min read

JD Tech Talk

Mar 11, 2019 · Operations

Evolution of JD Digital Technology’s Host Monitoring System “Diting”: Architecture from V1 to V3

The article chronicles the design, implementation, and iterative evolution of JD Digital Technology’s in‑house host monitoring platform Diting, detailing its V1, V2, and V3 architectures, the challenges encountered at each stage, and future directions toward intelligent, automated operations.

Alerting · Big Data · Operations

0 likes · 14 min read

Efficient Ops

Mar 10, 2019 · Operations

Why Operations Won’t Die: A Veteran’s Perspective

A seasoned operations professional argues that despite sensational claims, the ops function remains essential—driven by its core responsibilities of quality, cost, efficiency, and security, evolving with cloud computing, DevOps, and emerging IoT demands.

Cloud Computing · IT infrastructure · Operations

0 likes · 11 min read

MaGe Linux Operations

Mar 8, 2019 · Operations

Mastering High‑Availability Clusters: Resources, Constraints, and Failure Handling

This article explains the principles and components of high‑availability (HA) clusters, covering active/standby nodes, resource stickiness and constraints, heartbeat and quorum mechanisms, split‑brain avoidance, failure detection methods, and the minimal setup required for a reliable web‑service HA deployment.

Clustering · High Availability · Operations

0 likes · 14 min read

DevOps

Mar 7, 2019 · Operations

The Illusion of Tool‑Stacked DevOps and the Need for a True DevOps Culture

This article examines how DevOps has been reduced to a collection of automation tools, critiques the resulting "same‑bed‑different‑dreams" separation of development and operations, and outlines the cultural principles—shared responsibility, trust, autonomy, built‑in quality, feedback, and automation—necessary for a genuine DevOps transformation.

Automation · Culture · Operations

0 likes · 12 min read

Efficient Ops

Mar 7, 2019 · Operations

Why Operations Won’t Die: The Real Role of Ops in the Cloud Era

The article argues that operations will not disappear, explaining its essential functions—quality, cost, efficiency, and security—how cloud computing reshapes the role, the evolution toward DevOps, and why both cloud outages and industry trends actually underscore ops’ enduring importance.

Automation · Cloud Computing · Operations

0 likes · 11 min read

MaGe Linux Operations

Mar 7, 2019 · Operations

What a Mistake Taught Me About Operations: Lessons from My First Day

A personal account of switching jobs and the first day on an operations team, detailing a critical menu‑click error, the rapid response, and the deeper reflections on risk awareness, teamwork, and professional growth in high‑stakes production environments.

Incident Management · Operations · career transition

0 likes · 7 min read

Efficient Ops

Mar 6, 2019 · Databases

How NetEase Built an Automated DBA Platform with AIOps for Massive Scale

This article details NetEase's journey in designing and implementing a large‑scale database automation platform, covering its requirements, tool‑based operations, architecture, AIOps integration, and the practical lessons learned for managing thousands of database clusters efficiently.

AIOps · Database Automation · Operations

0 likes · 20 min read

MaGe Linux Operations

Mar 6, 2019 · Operations

Master Essential Linux Shell Scripts for System Monitoring and Automation

This guide presents practical Bash scripting techniques—including precautions, random string generation, color output functions, bulk user creation, package checks, service status verification, host liveness testing, resource monitoring, disk usage audits, and website availability checks—to help you automate Linux system administration tasks effectively.

Automation · Operations · Shell Scripting

0 likes · 5 min read

Efficient Ops

Mar 5, 2019 · Operations

Should Ops Professionals Pivot? Navigating Career Paths in the Age of Cloud and DevOps

This article reflects on the misconceptions of IT operations roles, examines why many view ops as a low‑value job, and offers a practical roadmap—automation, DevOps, cloud services, and Python—to help ops engineers future‑proof their careers amid evolving industry trends.

Automation · Career · Cloud Computing

0 likes · 9 min read

DevOps

Mar 3, 2019 · Operations

The Evolution of DevOps: From Early Computing to Agile Software Development

This article traces the historical development of DevOps from the early days of self‑developed and self‑maintained computer programs, through the rise of professional developers and operations engineers, to the modern agile era where development and operations must collaborate to meet rapid market changes.

IT Operations · Operations · devops

0 likes · 13 min read

DevOps

Feb 27, 2019 · Operations

A Historical Overview of DevOps: From a Belgian Consultant to a Global Movement

This article traces the evolution of DevOps from Patrick Debois' 2007 frustrations as a Belgian IT consultant through key conferences, blogs, and publications that shaped the DevOps movement, highlighting its roots in Agile practices and the convergence of development and operations.

Operations · continuous delivery · devops

0 likes · 9 min read

Efficient Ops

Feb 27, 2019 · Operations

Master Linux System Monitoring: Essential Commands & Smem Tips

This guide walks you through Linux command‑line shortcuts, the five key system‑operation metrics, and powerful tools like smem, ps, and sort to efficiently monitor CPU, memory, processes, disks, and network while also handling zombie processes.

Linux · Operations · System Monitoring

0 likes · 8 min read

DevOps

Feb 26, 2019 · Operations

Planning a DevOps Infrastructure for Traditional Enterprises: Capabilities and Tool Mapping

This article analyzes the essential capabilities required for building a DevOps infrastructure in traditional enterprises across foundation, development, testing, operations, and project management, mapping each capability to representative tools and offering guidance on flexible, evolving architecture design.

Operations · devops · infrastructure

0 likes · 12 min read

AntTech

Feb 22, 2019 · Operations

Technical Risk Prevention Platform: Building Fault Immunity for Financial Transaction Systems

The article outlines Ant Financial's technical risk prevention platform, describing the challenges of financial‑grade distributed architectures, the multi‑layer risk assurance system, the TRaaS platform's risk baseline, handling, and change‑control mechanisms, and how these practices empower partners to achieve high‑availability and secure financial services.

Operations · Platform Engineering · financial technology

0 likes · 13 min read

360 Tech Engineering

Feb 20, 2019 · Databases

Pika Best Practices: 30 Tips for Optimizing the RocksDB‑Based Redis‑Compatible Storage

This article presents thirty practical recommendations for deploying, configuring, and maintaining Pika—a high‑capacity, RocksDB‑backed Redis‑compatible storage system—covering version selection, thread settings, hardware choices, key design, memory management, replication, backup, compaction, security, and monitoring to achieve reliable and high‑performance operation.

Database Tuning · Operations · Pika

0 likes · 16 min read

Efficient Ops

Feb 19, 2019 · Operations

Turning Middleware Pain into Power: Practical Ops Strategies for Financial Systems

This talk reveals why middleware operations in financial institutions feel especially painful, examines the specific cost, autonomy, and reliability challenges, and outlines a step‑by‑step evolution toward tool‑driven platforms, hybrid‑cloud deployment, and AIOps that reduce manual toil and improve system resilience.

AIOps · Cloud · Middleware

0 likes · 20 min read

Qunar Tech Salon

Feb 19, 2019 · Operations

Forbidden City Night Festival Ticketing Chaos and How to Recover a Crashed Website

The article recounts the Forbidden City’s first night‑time Lantern Festival event, the overwhelming demand that caused the museum’s ticketing website to crash, and includes an interview with a senior operations engineer who explains the causes of such overloads and outlines rapid mitigation and scaling strategies.

Operations · scaling · system reliability

0 likes · 6 min read
