Tagged articles
3281 articles
Page 22 of 33
Yanxuan Tech Team
Jun 15, 2020 · Operations

How Yanxuan Built a Scalable Third‑Party Warehouse & Delivery System

This article explains how Yanxuan created a fully third‑party warehousing and distribution platform that supports domestic and international e‑commerce scenarios, outlines its core business, data and branding capabilities, tackles key operational challenges, and details the system’s evolution from initial setup to automated operations.

Logistics · Operations · delivery
0 likes · 14 min read
Liangxu Linux
Jun 13, 2020 · Operations

Mastering Monitoring: From Basics to Advanced Zabbix Practices

This comprehensive guide explains why monitoring is essential for operations, outlines monitoring goals and methods, reviews a wide range of open‑source tools, details a Zabbix‑based workflow, enumerates key metrics across hardware, system, application, network, security and business layers, and offers practical alerting and interview tips.

Alerting · Operations · Zabbix
0 likes · 21 min read
JD Retail Technology
Jun 11, 2020 · Operations

How JD Health Engineered System Stability for the 618 Mega‑Sale

Facing unprecedented traffic during the 2020 618 shopping festival, JD Health’s product R&D team implemented comprehensive rehearsals, stress testing, architecture reviews, dual‑channel risk controls, and 24‑hour monitoring to ensure system stability and rapid response for its health‑care e‑commerce platforms.

618 promotion · JD Health · Operations
0 likes · 5 min read
TAL Education Technology
Jun 11, 2020 · Big Data

Data Quality Monitoring: Standards, Practices, and Technical Solutions

This article outlines the importance of data quality in the big‑data era, defines evaluation criteria such as integrity, accuracy, consistency and timeliness, describes daily monitoring and reconciliation processes, and proposes technical solutions and challenges for building a comprehensive data‑quality monitoring platform.

Data Governance · Data Quality · Operations
0 likes · 7 min read
JD Retail Technology
Jun 10, 2020 · Operations

Logistics R&D Preparation for the 618 Promotion: System Readiness, Stress Testing, and Real‑Time Monitoring

The logistics R&D team spent 62 days preparing for the 618 promotion by analyzing core processes, applying stress tests, implementing fault‑tolerant architectures, planning capacity, and deploying real‑time monitoring tools to ensure system stability and performance under peak traffic.

Operations · Performance Testing · System Design
0 likes · 7 min read
Full-Stack Internet Architecture
Jun 10, 2020 · Operations

Understanding the “Magic Number” in Sales Operations: Common Pitfalls and Correct Approaches

Sales teams often chase arbitrary “magic numbers” like 100 calls or 100 minutes, mistaking correlation for causation; this article explains the original meaning of magic numbers in user retention analysis, highlights statistical errors, and outlines a data‑driven four‑step process to identify true performance drivers and actionable strategies.

Data-driven · Operations · business metrics
0 likes · 9 min read
FunTester
Jun 9, 2020 · Operations

How to Quickly and Accurately Locate Online Bugs: A Practical Guide for Testers

This article outlines a systematic, step‑by‑step approach for testers to gather evidence, reproduce, and diagnose production bugs on mobile and web platforms, emphasizing thorough data collection, environment checks, and collaborative debugging to improve response speed and reliability.

Operations · bug tracking · mobile testing
0 likes · 6 min read
Liangxu Linux
Jun 7, 2020 · Operations

How to Diagnose Linux Server Performance in the First 60 Seconds

When you log into a Linux server for performance troubleshooting, Netflix’s engineering team shows that running ten standard command‑line tools within the first minute gives a comprehensive view of system load, resource saturation, errors, and bottlenecks, enabling rapid root‑cause analysis.

Operations · Performance Monitoring · command-line
0 likes · 21 min read
DevOps Cloud Academy
Jun 7, 2020 · Operations

Enabling SSL for Jenkins with a Self‑Signed Certificate

This guide walks through generating a self‑signed SSL certificate using OpenSSL, converting it to PKCS12 and JKS formats, placing the keystore on the Jenkins server, updating Jenkins configuration for HTTPS, and testing the secure connection.

DevOps · Jenkins · Operations
0 likes · 5 min read
dbaplus Community
Jun 6, 2020 · Operations

How to Seamlessly Migrate Elasticsearch from Cloud to On‑Premises Without Downtime

This article walks through a practical, step‑by‑step migration of an Elasticsearch cluster from a public‑cloud environment to a self‑hosted data‑center, covering strategy, configuration changes, node role separation, manual data transfer, and post‑migration re‑enabling of automatic balancing to ensure a smooth, low‑impact transition.

Cluster Migration · Elasticsearch · Operations
0 likes · 16 min read
Beike Product & Technology
Jun 5, 2020 · Cloud Native

Fundamentals of Microservice Architecture: Service Splitting, Registration, Load Balancing, Rate Limiting, and Circuit Breaking

This article provides a comprehensive introduction to microservice architecture, covering service decomposition, registration and discovery methods, client‑driven load balancing, rate‑limiting and circuit‑breaking strategies, and the design of a self‑built application delivery platform for cloud‑native environments.

Circuit Breaking · Cloud Native · Operations
0 likes · 23 min read
Efficient Ops
Jun 4, 2020 · Operations

2020 Ops Insights: Salaries, Cloud Security Rankings, and Market Trends

The article compiles 2020 industry data, revealing programmer salary averages, Alibaba Cloud's second‑place global security rating, DB‑Engines database popularity, IDC's cloud services market growth, Baidu's accelerated cloud center construction, a dip in global Ethernet switch revenue, and China Mobile's massive data‑center investment.

Information Security · Operations · cloud computing
0 likes · 8 min read
Meituan Technology Team
Jun 4, 2020 · Databases

Meituan MySQL Database Inspection System Architecture and Design

Meituan’s MySQL database inspection system uses a three‑layer architecture—execution agents managed by Crane, a metadata‑rich inspection database, and an integrated application UI—to run 64 automated checks, resolve over 8,000 hazards with sub‑four‑day remediation, and continuously improve automation and analytics.

Database Inspection · Operations · System Architecture
0 likes · 11 min read
Ctrip Technology
Jun 4, 2020 · Operations

Efficient Online Performance Testing Using a Mirror Cluster Self‑Service Platform

This article describes how Ctrip’s senior testing manager designed a self‑service online performance testing solution based on a Mirror cluster, detailing its architecture, implementation steps, safety measures, result aggregation, current limitations, and overall impact on testing efficiency and reliability.

Operations · mirror cluster · online load testing
0 likes · 9 min read
DevOps
Jun 3, 2020 · Operations

DevOps Guiding Principles Framework and the Three‑Step Implementation Method

This article explains the core DevOps philosophy—including Lean, Agile, CI, CD and Continuous Delivery—describes its five‑point framework of culture, automation, lean‑agile core, measurement and sharing, and details a three‑step implementation method of fast flow, fast feedback, and continuous learning with practical practices and examples.

Continuous Delivery · DevOps · Lean
0 likes · 16 min read
Yanxuan Tech Team
Jun 1, 2020 · Operations

Why Simulation Is Essential for E‑Commerce Supply Chain Optimization

Simulation enables low‑cost, time‑agnostic testing of e‑commerce supply‑chain strategies, covering procurement, inter‑warehouse allocation, and order dispatch. By combining precedent, experimentation, and modeling, it offers a more flexible and comprehensive alternative to A/B testing for evaluating policies, algorithms, and configurations.

Operations · e‑commerce · optimization
0 likes · 15 min read
Zhongtong Tech
Jun 1, 2020 · Backend Development

How ZTO Express Built ZMS: A Scalable Cloud‑Native Message Middleware Platform

ZMS is ZTO Express's cloud‑native message middleware platform built on RocketMQ and Kafka that automates deployment, provides a unified SDK, supports multi‑datacenter operation, and offers comprehensive monitoring, enabling seamless scaling and fault‑tolerant messaging for billions of daily events.

Message Middleware · Operations · RocketMQ
0 likes · 8 min read
Architecture Digest
May 22, 2020 · Operations

A Step‑by‑Step Debugging Journey of Data Drop After a Feature Release

The article recounts a detailed troubleshooting process—including data verification, code review, DBA assistance, local debugging, environment comparison, logging, packet capture, service restarts, async‑to‑sync changes, load testing, and Kafka partition tuning—that ultimately identified a Kafka partition bottleneck as the cause of a sudden data‑volume decline after a new feature went live.

Operations · async‑sync · debugging
0 likes · 8 min read
Efficient Ops
May 21, 2020 · Operations

What’s Shaping Ops Today? Women, Security Breaches, Cloud Deals & Tool Updates

This roundup covers why ops teams should hire more women, the low female representation in Chinese IT operations, a high‑scoring Python exam, legal cases involving source‑code theft and a data breach, a major cloud procurement in Zhejiang, the releases of Grafana 7.0 and Redis 6.0.3, and Zoom’s service change in China.

News · Operations · cloud
0 likes · 9 min read
Efficient Ops
May 17, 2020 · Operations

How EMonitor Outperforms CAT: Deep Dive into Modern Monitoring Architecture

EMonitor, Ele.me’s unified monitoring platform, extends CAT’s concepts with real‑time 10‑second aggregation, richer metric types, advanced dashboards, and seamless integration across IaaS, PaaS, and application layers, illustrating the evolution from log‑based monitoring to a comprehensive, proactive observability system.

CAT · EMonitor · Operations
0 likes · 15 min read
Zhuanzhuan QA
May 13, 2020 · Operations

QA Transformation: Applying HTTP DIFF and Visual UI Automation to Operational and Order‑Related Requirements

This article describes how the QA team at Zhuanzhuan Youpin shifted from traditional functional testing to an assisted model by introducing HTTP DIFF for short‑flow operational features and visual UI automation for dynamic pages, as well as data‑construction and online order inspection techniques for complex order‑related scenarios.

HTTP DIFF · Operations · QA
0 likes · 7 min read
Programmer DD
May 12, 2020 · Operations

Boost RabbitMQ Reliability: Proven Strategies for Producers, Consumers, and Ops

This comprehensive guide explains how to enhance RabbitMQ reliability by covering confirmation mechanisms, producer and consumer configurations, queue mirroring, alerting, monitoring metrics, and health‑check commands, providing actionable steps for developers and operations teams to ensure stable message delivery.

Message Queue · Operations · RabbitMQ
0 likes · 22 min read
FunTester
May 8, 2020 · Operations

How to Use Arthas monitor to Track Java Method Performance and Latency

This article explains how the open‑source Java diagnostic tool Arthas can monitor method execution with the monitor command, describes each monitoring metric, shows how to configure the sampling interval, and demonstrates the impact on response time using a concrete code example.

Arthas · Backend · Method Profiling
0 likes · 4 min read
DevOps Cloud Academy
May 5, 2020 · Operations

GitLab CI/CD Practices: From Traditional Release Model to Continuous Integration, Delivery, and Tool Comparison

This article explains the drawbacks of the traditional software release process, introduces continuous integration, delivery, and deployment concepts, compares GitLab CI/CD with Jenkins, and outlines the architecture, configuration files, and advantages of using GitLab’s built‑in CI/CD platform.

Continuous Delivery · DevOps · GitLab CI
0 likes · 12 min read
ITPUB
May 3, 2020 · Operations

Mastering IT Monitoring: Goals, Methods, Tools, and Best Practices

This comprehensive guide explains why monitoring is essential for reliable operations, outlines clear monitoring objectives, walks through practical monitoring methods, compares popular open‑source tools, details a Zabbix‑based workflow, and lists key hardware, system, application, network, security, API, performance, and business metrics to track.

IT infrastructure · Operations · Zabbix
0 likes · 19 min read
Laravel Tech Community
May 2, 2020 · Operations

Comprehensive MySQL and Linux Operations Interview Guide

This guide compiles essential MySQL security steps, master‑slave replication principles, backup scripts, Linux boot overview, common port services, virus mitigation, monitoring tools, nginx optimization, InnoDB lock troubleshooting, replication lag reduction, high‑availability components, data migration utilities, and automation configuration management techniques for operations engineers.

Linux · Operations · automation
0 likes · 13 min read
Qunhe Technology Quality Tech
Apr 29, 2020 · Operations

How Our Team Built a Stable SIT Environment: Lessons in Test Environment Governance

This article documents the step‑by‑step practices of a six‑person test‑environment availability team that unified middleware, streamlined deployment pipelines, piloted business usage, introduced monitoring and recovery mechanisms, and created a comprehensive SIT environment handbook to improve integration testing stability and operational efficiency.

Deployment · Operations · SIT
0 likes · 19 min read
FunTester
Apr 29, 2020 · Operations

Interface Performance Testing Resources

This page compiles a comprehensive collection of Chinese-language articles and tutorials on interface performance testing, covering tools such as netdata and timewatch, load‑testing strategies for various APIs, JVM heap dump extraction, asynchronous request measurement, and best practices for reducing test errors.

Backend · Operations · interface testing
0 likes · 8 min read
DevOps Cloud Academy
Apr 28, 2020 · Operations

Understanding CI/CD: Traditional Release Model, Continuous Integration, Delivery, and Deployment with GitLab and Jenkins

This article explains the shortcomings of the traditional software release process, introduces continuous integration, delivery, and deployment concepts, compares GitLab CI and Jenkins features, and outlines the advantages, architecture, and practical usage of CI/CD pipelines for DevOps teams.

Continuous Delivery · DevOps · GitLab
0 likes · 12 min read
Qunhe Technology Quality Tech
Apr 22, 2020 · Product Management

How a Smart Ticketing System Transformed User Feedback for a Cloud Design Platform

This article details the design, architecture, and operational results of a listening platform and smart ticketing system that automatically captures key user feedback, streamlines processing, and improves response times, showcasing a data‑driven approach to product management and operations.

Operations · process automation · product-management
0 likes · 11 min read
FunTester
Apr 20, 2020 · Operations

Quick‑Start Guide to Arthas: Debugging Java Applications in Minutes

Learn how to install and launch Alibaba’s open‑source Arthas tool, explore its dashboard, run essential commands like thread and watch, and see a practical Java demo, all in a concise step‑by‑step tutorial that gets you debugging Java processes fast.

Arthas · Operations · Tutorial
0 likes · 3 min read
FunTester
Apr 19, 2020 · Operations

How to Load Test Phone Number Binding with Dynamic UID‑Based Numbers

This article walks through the challenges of load‑testing a phone‑binding feature that swaps between two number prefixes while preserving the original UID‑derived number, detailing validation rules, a configurable solution, test design, and the full Groovy‑based load‑test script.

API automation · Groovy · Load Testing
0 likes · 7 min read
MaGe Linux Operations
Apr 17, 2020 · Operations

Mastering System Monitoring: Goals, Methods, Tools, and Best Practices

This comprehensive guide explains why monitoring is vital for operations, outlines monitoring objectives, methods, core processes, and a detailed overview of open‑source and commercial tools—including Zabbix, Open‑Falcon, and MRTG—while covering metrics, alert handling, and interview preparation for effective system monitoring.

Operations · Zabbix · system metrics
0 likes · 19 min read
Open Source Linux
Apr 16, 2020 · Operations

Essential Linux Server Hardening: 10 Steps to Optimize After Installation

This guide walks you through ten practical steps—including switching to local yum mirrors, installing key packages, disabling SELinux and the firewall, trimming startup services, tightening SSH settings, syncing time, raising file descriptor limits, and disabling ping—to boost the performance and security of a freshly installed Linux server.

Linux · Operations · security
0 likes · 7 min read
dbaplus Community
Apr 15, 2020 · Operations

How to Diagnose and Fix a Dual‑Leader ZooKeeper Cluster

This article walks through a real‑world ZooKeeper incident where a five‑node cluster showed two leaders, explains the election rules, analyzes log and configuration mismatches, assesses business impact, and provides a step‑by‑step recovery plan to restore normal service without data loss.

Cluster · Operations · ZooKeeper
0 likes · 10 min read
FunTester
Apr 14, 2020 · Operations

Spot Performance Problems Without Writing a Single Line of Code

Experienced developers can often identify performance bottlenecks simply by reviewing code implementations, configuration settings such as timeouts, intervals, database and Redis parameters, as well as service monitoring data, container and JVM configurations, allowing them to avoid unnecessary test scripts and code changes.

Configuration · DevOps · Operations
0 likes · 2 min read
Liangxu Linux
Apr 13, 2020 · Operations

How to Prevent Accidental Deletion in Linux Shell Scripts

This article explains common Linux shell pitfalls—empty variables, spaces in paths, special characters, and failed cd commands—that can cause accidental file deletion, and provides concrete code examples and best‑practice solutions to avoid such disasters.

Operations · Safety
0 likes · 5 min read
Continuous Delivery 2.0
Apr 13, 2020 · Operations

Facebook Configuration Management: Practices, Statistics, and Cultural Insights

This article summarizes Facebook's holistic configuration management practices, presenting cultural influences, storage growth, size distribution, update frequency, change magnitude, and author collaboration statistics, while linking to a series of translated articles that explore tools such as Configerator, GateKeeper, and MobileConfig.

Configuration Management · Operations · Tooling
0 likes · 10 min read
Efficient Ops
Apr 12, 2020 · Operations

Master Incident Management: Definitions, Processes, and Best Practices

This guide explains fault management fundamentals—from ITIL‑based definitions and why it matters, to fault level classification, monitoring, emergency response, recovery, post‑mortem analysis, continuous improvement, and practical advice for practitioners—providing a comprehensive, actionable framework for reliable operations.

Continuous Improvement · ITIL · Operations
0 likes · 11 min read
DevOps
Apr 10, 2020 · Operations

Spotify’s Scaled Agile Framework: Organizational Structure and Practices

The article examines Spotify’s scaled agile model, detailing its organizational units—Squads, Tribes, Chapters, and Guilds—along with their characteristics, governance, dependency management, and comparison to other large‑scale agile frameworks such as SAFe, LeSS, and Scrum@Scale.

DevOps · Operations · Scaled Agile
0 likes · 18 min read
DevOps Cloud Academy
Apr 9, 2020 · Operations

Why DevOps Is Essential for Modern IT Operations

The article explains how traditional IT silos hinder rapid incident response, outlines common symptoms of poorly managed applications, and argues that adopting DevOps—supported by cloud‑native infrastructure, automation, and shared responsibility—delivers higher transparency, employee autonomy, operational quality, and customer satisfaction.

DevOps · IT Culture · Operations
0 likes · 7 min read
High Availability Architecture
Apr 8, 2020 · Operations

Slack's Deployment Process: Balancing Speed and Reliability

This article explains how Slack’s engineering team designs a multi‑stage deployment pipeline—including release branches, staging, dogfood, canary, and percentage rollouts—while emphasizing rapid iteration, visibility, and reliability through fast and atomic deployment mechanisms.

Deployment · Operations · Reliability
0 likes · 8 min read
Architects Research Society
Apr 5, 2020 · Cloud Native

Lessons from Google, eBay, and Amazon on Large‑Scale Multi‑Language Microservice Architecture

The article examines how Google, eBay, Twitter and Amazon evolved their massive systems into multi‑language microservice ecosystems, highlighting the organic growth of services, incentive‑driven design, standards emergence, service ownership, operational practices, and anti‑patterns for building and scaling cloud‑native architectures.

Microservices · Operations · Scalability
0 likes · 20 min read
Java Captain
Apr 1, 2020 · Operations

Comprehensive Guide to Online Environment Deployment and Operations Practices

This article provides a thorough overview of planning, provisioning, and managing online production environments—including user sizing, bandwidth estimation, database design, OS versus container deployment, middleware selection, security, monitoring, SSH shortcuts, file transfer tools, automation scripts, Docker setup, and log viewing techniques—aimed at giving developers a complete operational perspective.

Deployment · Docker · Operations
0 likes · 16 min read
360 Quality & Efficiency
Mar 31, 2020 · Operations

Using Supervisor for Process Management on Linux: Installation, Configuration, and Practical Example

This article explains why nohup cannot monitor scripts, introduces Supervisor as a Python‑based process monitor, shows how to install it on CentOS, Ubuntu, and via pip, details the supervisord.conf and program .ini configurations, demonstrates a sample Python script, and outlines common commands for managing and restarting services.

Configuration · Linux · Operations
0 likes · 6 min read
JD Retail Technology
Mar 31, 2020 · Operations

How Sigma’s Event Management Evolved from Zero to Maturity: Standards, Processes, and Platform Insights

The article outlines the Sigma Quality Management Platform’s event management journey across three development stages—establishing basic standards, expanding processes and channels, and achieving mature, integrated governance—while highlighting current challenges, continuous standard refinement, efficiency gains, and practical implementation details.

Event Management · Operations · Platform Development
0 likes · 11 min read
DevOps
Mar 30, 2020 · Operations

Efficient Value Stream in the Construction of Huoshenshan and Leishenshan Hospitals: A DevOps Case Study

This article presents a detailed DevOps case study of the Huoshenshan and Leishenshan hospital construction, outlining a ten‑day timeline of parallel and serial value‑streams, highlighting extreme efficiency, short lead times, and a high proportion of value‑added activities across infrastructure, power, communications, medical systems, and IT equipment.

DevOps · Hospital Construction · Operations
0 likes · 6 min read
DevOps Engineer
Mar 29, 2020 · Operations

Top 14 CI/CD Tools and Their Key Features

This article presents a comprehensive overview of the 14 most popular CI/CD tools, describing their main functionalities, licensing models, and official websites to help teams choose the most suitable solution for fast and reliable software delivery.

Continuous Delivery · DevOps · Operations
0 likes · 20 min read
Programmer DD
Mar 27, 2020 · Operations

How to Choose Reliable Software Outsourcing Platforms: Global and Domestic Options

This guide reviews the most trustworthy international and Chinese software outsourcing platforms, outlines the essential skills for foreign contracts, highlights key features of each service, and offers practical advice on managing client expectations, expanding your freelance channels, and building a personal brand for long‑term success.

Operations · domestic platforms · freelance platforms
0 likes · 12 min read
Continuous Delivery 2.0
Mar 26, 2020 · Operations

Facebook Configuration Management: Challenges, Design, and Large‑Scale Distribution

The article examines Facebook’s massive, real‑time configuration management system, describing its rapid change frequency, the engineering challenges of configuration sprawl, authoring, validation, dependency handling, and the scalable, reliable distribution mechanisms that keep billions of devices and servers consistently updated.

Configuration Management · Deployment · Operations
0 likes · 10 min read
Dual-Track Product Journal
Mar 25, 2020 · Product Management

Mastering E‑Commerce Category Design: From Backend Foundations to Frontend Mapping

This article explains how well‑designed product categories serve as the backbone of an e‑commerce platform, covering the concepts of backend and frontend categories, the construction of category trees, and various mapping strategies that help both users find items quickly and operators manage large inventories efficiently.

Backend · Mapping · Operations
0 likes · 8 min read
DevOps
Mar 25, 2020 · Operations

DevOps Case Study: ‘Small Team, Big Backend’ Organizational Structure in the Rapid Construction of Huoshenshan and Leishenshan Hospitals

This article reviews the rapid ten‑day construction of Huoshenshan and Leishenshan hospitals from a DevOps perspective, highlighting how a ‘small team, big backend’ organizational model—mirroring agile and networked structures—enabled efficient coordination across multiple industries and swift project delivery.

DevOps · Operations · case study
0 likes · 6 min read
Liangxu Linux
Mar 24, 2020 · Operations

What Do Linux Professionals Actually Do? Exploring Ops and Development Careers

This article breaks down the diverse career paths within Linux, detailing the core responsibilities of operations roles—ensuring stable services and data security—and the various development tracks, from application and embedded programming to low‑level kernel and driver engineering.

Operations · career · software-engineering
0 likes · 10 min read
21CTO
Mar 24, 2020 · Operations

Mastering System Resilience: Rate Limiting, Circuit Breaking, and Degradation

To keep systems highly available under sudden traffic spikes, developers employ three core strategies: rate limiting, circuit breaking, and service degradation. These respectively control request flow, isolate failures, and gracefully reduce functionality to maintain stability; the article explains each with practical examples and algorithmic approaches.

Circuit Breaking · Operations · rate limiting
0 likes · 5 min read
Efficient Ops
Mar 22, 2020 · Operations

Why Nightingale Is Shaping the Future of Enterprise Monitoring

Nightingale, an open‑source enterprise monitoring platform from Didi, combines cloud‑native design, high availability, flexible plugins, and a powerful object‑tree navigation to meet the monitoring needs of both small clusters and massive deployments, while extending and improving upon Open‑Falcon.

Alerting · Operations · architecture
0 likes · 10 min read
Architects' Tech Alliance
Mar 22, 2020 · Operations

How to Migrate Legacy Mainframe Workloads to x86: A Step‑by‑Step Guide

This article outlines a comprehensive methodology for migrating small‑mainframe platforms—including hardware assessment, solution design, implementation steps, risk evaluation, and three common data‑migration techniques—so that businesses can safely transition workloads to modern x86 servers while preserving data integrity and service continuity.

Data Migration · LVM · Operations
0 likes · 12 min read
Efficient Ops
Mar 20, 2020 · Operations

How Zhejiang Mobile Revamped IT Operations with AIOpsDev and SRE

Zhejiang Mobile’s IT Operations team announced a strategic shift from reactive ticket‑driven maintenance to a proactive, AI‑powered AIOpsDev model, establishing new departments, adopting SRE practices, and leveraging cloud‑native technologies to dramatically improve efficiency, reliability, and digital transformation.

DevOps · ITIL · Operations
0 likes · 7 min read
MaGe Linux Operations
Mar 19, 2020 · Operations

Mastering Game Operations: From RAID Configurations to Load Balancer Choices

This article explains the fundamentals of operations and game operations, outlines server management strategies for hundreds of machines, compares RAID levels, evaluates load balancers (LVS, Nginx, HAProxy), discusses proxy servers (Squid, Varnish, Nginx), and clarifies middleware, JDK, Tomcat ports, and CDN concepts.

Operations · RAID · game operations
0 likes · 8 min read
Open Source Linux
Mar 19, 2020 · Operations

Why Is My Server CPU at 99%? Pinpoint Java Thread Bottlenecks Fast

After an alert showed a data platform server’s CPU usage soaring to 98.94%, this article walks through a systematic investigation: spotting the high‑load process with top, tracing the offending Java thread with pwdx and jstack, and optimizing the time‑conversion utility that caused the overload.

CPU · Operations · debugging
0 likes · 7 min read
Open Source Linux
Mar 17, 2020 · Operations

Ultimate Linux Command Cheat Sheet for System Administration

This guide presents a comprehensive cheat sheet of essential Linux commands, covering online queries, file and directory management, content viewing, compression, information display, file searching, user and permission handling, network operations, disk and filesystem tasks, system monitoring, shutdown/reboot procedures, and process management.

Linux · Operations · command-line
0 likes · 3 min read
DevOps
Mar 16, 2020 · Operations

JD.com DevOps Case Study: Agile Transformation, Continuous Delivery, and Organizational Practices

This case study examines JD.com’s evolution into a technology‑driven enterprise, detailing its corporate culture, the “ABCDE” technology strategy, the implementation of DevOps and agile practices through the CALMS framework, and how unified continuous‑delivery platforms and operational metrics have driven growth, efficiency, and pandemic response.

Big Data · Continuous Delivery · DevOps
0 likes · 16 min read
FunTester
Mar 14, 2020 · Operations

Why Load Testing Is Essential for Every CI Pipeline

Load testing, which simulates thousands of real users, is crucial for uncovering performance bottlenecks that functional tests miss, and integrating automated load tests into every CI cycle helps prevent crashes, protect revenue, and ensure reliable software delivery.

Jenkins · Load Testing · Operations
0 likes · 5 min read
Efficient Ops
Mar 11, 2020 · Operations

How to Elevate Your Monitoring System: Proven Practices from Top DevOps Models

This article explains why modern services depend on highly available, scalable monitoring, outlines a systematic way to assess and improve monitoring capabilities using open‑source tools and the DevOps Capability Maturity Model, and details concrete improvement points across data collection, management, and application.

DevOps · Operations · observability
0 likes · 9 min read
Efficient Ops
Mar 10, 2020 · Operations

How to Build Anti‑Fragile Operations in the Cloud Era

This article explains the anti‑fragility concept, illustrates how cloud‑based systems become increasingly vulnerable to unexpected events, and offers practical strategies—including risk reduction, choice diversification, proactive experimentation, and biologically inspired resilience—to transform operations and turn shocks into opportunities.

DevOps · Operations · Resilience
0 likes · 19 min read
Youku Technology
Mar 10, 2020 · Operations

Big Drama Quality Assurance Process at Alibaba Entertainment

Alibaba Entertainment’s Big Drama Assurance framework applies automated and manual quality checks across production, operations, rights, playback, online monitoring, and emergency response, using a unified platform that detects and resolves issues before and after launch to protect revenue, uphold paid‑member rights, and ensure a seamless viewing experience.

Alibaba · Operations · content assurance
0 likes · 7 min read
Efficient Ops
Mar 8, 2020 · Operations

How We Scaled a Live‑Streaming Platform from 10K to 1M Concurrent Users in 3 Days

This article recounts how a pandemic‑era live‑streaming service scaled from ten thousand to one million concurrent viewers within three days, covering the pre‑deployment assessment, container‑based scaling, monitoring, emergency response plans, and post‑launch optimizations.

Cloud Native · Operations · live streaming
0 likes · 11 min read
Big Data Technology Architecture
Mar 7, 2020 · Operations

How to Perform a Graceful Shutdown of an Elasticsearch Node

This article outlines a step‑by‑step procedure for safely taking an Elasticsearch node offline—checking master‑eligible settings, adjusting minimum_master_nodes, excluding the node from routing, waiting for shard relocation, stopping the service, and restoring the cluster routing—ensuring no data loss or service interruption.

Cluster Management · DevOps · Elasticsearch
0 likes · 6 min read
Continuous Delivery 2.0
Mar 6, 2020 · Operations

Google Incident Postmortem Checklist

The article presents a detailed Google‑derived postmortem checklist covering event data collection, root‑cause analysis, lessons learned, actionable improvement items, and review procedures to ensure systematic, blameless incident handling.

Operations · Root Cause Analysis · action items
0 likes · 5 min read
Liangxu Linux
Mar 5, 2020 · Operations

Essential Linux Commands Every Engineer Should Master

This guide compiles the most essential Linux commands for directory handling, file manipulation, text processing, compression, system monitoring, networking, and routine administration, providing concise examples and practical tips to help beginners and seasoned users alike navigate and manage Unix-like environments efficiently.

Operations · Unix · command-line
0 likes · 13 min read