Tagged articles
Yanxuan Tech Team
Dec 14, 2020 · Operations

Mastering Stability Governance: Practical Strategies for Reliable Supply‑Chain Systems

This article examines the critical role of stability governance in evolving systems, outlines a three‑stage framework—usability, monitoring alerts, and online emergency—illustrated with a case study of an electronic waybill service, and shares concrete strategies for prevention, detection, response, and post‑mortem to achieve predictable, observable, and fast‑acting reliability.

Operations · governance · incident response
11 min read
NetEase Yanxuan Technology Product Team
Dec 11, 2020 · Operations

How to Build Effective Stability Governance for E‑commerce Logistics Services

This article analyzes the concept of stability governance, outlines its five fault‑management sub‑domains (prevention, perception, reach, mitigation, and post‑mortem), examines the pain points of an electronic waybill service, and presents a three‑phase strategy backed by concrete implementation steps in availability, monitoring, and online emergency handling.

Logistics · Operations · incident response
12 min read
Continuous Delivery 2.0
Dec 11, 2020 · Operations

Interview with Lori Lamkin on Microsoft’s DevOps Journey and Practices

In this interview, Microsoft Visual Studio Cloud Services Program Management Director Lori Lamkin shares the evolution of their DevOps journey, covering rhythm changes, team autonomy, continuous delivery, trunk‑based development, dogfooding, frequent deployments, testing, security, telemetry, metrics, and the cultural shift toward operational responsibility.

Metrics · Microsoft · Operations
13 min read
JD Cloud Developers
Dec 10, 2020 · Big Data

Designing Impactful Big‑Screen Data Visualizations: Principles and Real‑World Examples

This article explains how large‑screen data visualizations turn raw numbers into intuitive graphics, outlines key design principles such as focusing on an overview first, limiting metrics to 8‑12, balancing ratio and numeric indicators, and using maps for regional insights, and showcases JD Cloud’s 11.11 monitoring dashboards as practical examples.

Operations · cloud computing · dashboard design
7 min read
Qunar Tech Salon
Dec 10, 2020 · Operations

Improving International Hotel After‑Sales Service: Metrics, Optimization Strategies, and Risk Prediction with LightGBM

The article analyzes the after‑sales process of international hotel bookings, defines key metrics such as defect rate and SPO, describes operational improvements, and presents a LightGBM‑based risk‑prediction model to reduce on‑site defects and enhance overall service efficiency.

Hotel Industry · LightGBM · Operations
14 min read
Liangxu Linux
Dec 9, 2020 · Operations

Essential Safety Checklist for Critical Linux Commands

A practical guide warns developers and operators to stay vigilant when executing risky Linux commands, offering step‑by‑step precautions, backup strategies, and safe aliases to prevent accidental data loss or system damage.

Linux · Operations · command safety
8 min read
Top Architect
Dec 9, 2020 · Operations

Designing High Availability for Redis Using Sentinel

This article explains how Redis Sentinel provides high‑availability for Redis clusters by monitoring masters and slaves, automatically failing over to a new master, and offering three methods for receiving failover notifications, while recommending an indirect‑service approach for scalable integration.

Configuration · Operations · failover
7 min read
Alibaba Cloud Developer
Dec 8, 2020 · Operations

From Ops Engineer to Cloud Leader: 10 Years of Growth at Alibaba

This article chronicles a senior Alibaba technologist’s decade‑long journey through operations, monitoring, resource management, and product development, sharing practical insights on system automation, team leadership, career promotion, and the mindset needed to evolve from a junior engineer to a cloud‑native solutions architect.

Career Development · Operations · automation
21 min read
Efficient Ops
Dec 8, 2020 · Cloud Native

How Cloud‑Native Transforms Game Operations: Insights from Tencent’s DataMore Platform

This article details how Tencent's IEG Value‑Added Services team migrated a massive game data‑marketing platform to a cloud‑native architecture, outlining the business scenario, the cloud‑native developer platform, operational transformation challenges, technical practices such as asset management, orchestration, dynamic scheduling, monitoring, tracing, chaos engineering, CI/CD, and the resulting cost, stability, efficiency, and business empowerment benefits.

Operations · automation · cloud-native
31 min read
Architects' Tech Alliance
Dec 6, 2020 · Operations

Understanding Data Centers: Architecture, Reliability, and Emerging Technologies

This article explains what a data center is, its core components of compute, storage, and networking, the operational and architectural considerations for reliability and security, and reviews industry standards and emerging technologies such as edge computing, cloud integration, SDN, HCI, containers, NVMe, and GPU acceleration.

Edge Computing · GPU · Infrastructure
12 min read
Practical DevOps Architecture
Dec 5, 2020 · Cloud Native

Common Docker CLI Commands Overview

This article provides a concise reference of essential Docker command‑line operations, including searching, pulling, listing, running, inspecting, managing containers and images, and additional utilities, helping developers, testers, and operations engineers efficiently work with Docker without a graphical interface.

CLI · Containers · DevOps
6 min read
ITPUB
Dec 3, 2020 · Operations

Essential Linux Command-Line Tricks for System Administration

This guide compiles practical Linux shell commands and scripts for tasks such as locating and moving files, batch extraction, text manipulation with sed, directory checks, disk‑space monitoring with email alerts, log analysis, firewall rules, SNMP queries, and more, helping sysadmins automate routine operations efficiently.

Operations · Scripting · Shell
8 min read
Programmer DD
Dec 3, 2020 · Operations

Mastering Prometheus in Kubernetes: Practical Tips, Exporter Guide, and Common Pitfalls

This article shares practical experiences with Prometheus in Kubernetes, covering core principles, limitations, common exporters, metric selection, capacity planning, high‑availability strategies, query optimization, and integration with Grafana, offering actionable guidance for building reliable, scalable monitoring solutions.

Exporters · Grafana · Kubernetes
31 min read
Full-Stack DevOps & Kubernetes
Dec 2, 2020 · Operations

Essential Linux Commands Every Sysadmin Should Master

This guide compiles the most frequently used Linux commands for file navigation, content viewing, searching, permission handling, text processing, compression, system shutdown, and process management, providing clear syntax examples and practical tips to boost operational efficiency for system administrators.

Linux · Operations · Shell
16 min read
NetEase Yanxuan Technology Product Team
Dec 2, 2020 · Industry Insights

Why Service Governance Is Critical for Large‑Scale Systems and How to Build It

Managing hundreds or thousands of tightly coupled services inevitably creates diverse operational challenges, so effective service governance—covering definition, lifecycle, versioning, registration, monitoring, ownership, testing, and security—is essential, and can be realized through a unified DevOps‑driven platform.

CMDB · Microservices · Operations
12 min read
Efficient Ops
Dec 1, 2020 · Operations

Zero‑Downtime Ops: Inside Tencent’s Panshi High‑Availability Platform

At the 2020 GOPS Global Operations Conference, Tencent’s senior operations engineer Xie Hailin detailed the design and implementation of the Panshi platform—a comprehensive, high‑availability solution that unifies change management, fault handling, continuous operation, and disaster recovery to ensure uninterrupted payment services for billions of daily transactions.

Operations · aiops · change management
24 min read
Efficient Ops
Nov 30, 2020 · Operations

Essential Linux Command Cheat Sheet: Find, Move, Sed, Monitoring & More

This guide presents a collection of essential Linux command-line techniques, covering file searching and moving, batch unzipping, powerful sed one‑liners, directory checks, disk usage monitoring with email alerts, log analysis, SNMP queries, firewall rules, and various scripting shortcuts for efficient system administration.

Operations · Scripting
8 min read
Top Architect
Nov 28, 2020 · Operations

Comprehensive Nginx Guide: Architecture, Reverse Proxy, Load Balancing, Static/Dynamic Separation, and High Availability

This article provides a detailed overview of Nginx, covering its high‑performance architecture, reverse‑proxy concepts, load‑balancing strategies, static‑dynamic separation techniques, common command‑line operations, configuration file structure, practical reverse‑proxy and load‑balancing examples, and a high‑availability solution using Keepalived.

Operations · Server Configuration · high availability
10 min read
Ops Development Stories
Nov 27, 2020 · Operations

How to Monitor Redis with Zabbix Agent2: A Complete Guide

This article explains how to use Zabbix Agent2 to monitor Redis, covering the plugin's architecture, configuration priority, methods for retrieving INFO, CONFIG, health status, and slow‑query logs, as well as practical steps to set up the Redis template in Zabbix.

Agent2 · DevOps · Operations
9 min read
Java Backend Technology
Nov 26, 2020 · Operations

Microservices vs Monoliths: Which Wins the Operational Battle?

This article compares microservices and monolithic architectures across eight operational dimensions—network latency, complexity, reliability, resource usage, scaling precision, throughput, deployment time, and communication—showing where each approach excels and concluding which wins overall.

Microservices · Operations · Scalability
12 min read
Top Architect
Nov 23, 2020 · Operations

Comprehensive Architecture Skill Maps and DevOps Tool Classification

This article compiles extensive architecture skill maps, categorizes DevOps tools across development, deployment, and maintenance phases, and discusses related technologies such as cloud computing, big data, and security, providing a detailed reference for architects and engineers seeking a holistic view of modern software delivery ecosystems.

DevOps · Operations · SkillMap
10 min read
Taobao Frontend Technology
Nov 23, 2020 · Operations

Achieving 1‑5‑10 Front‑End Monitoring with JSTracker for Double‑11

This article explains how the JSTracker platform was used to build a comprehensive end‑to‑end front‑end monitoring and data analysis solution that meets the 1‑5‑10 safety production goal—detecting issues within one minute, locating them in five, and fixing them in ten—by improving coverage, subscription, metrics, and gray‑release monitoring for Alibaba’s Double‑11 promotion.

Operations · gray release · incident response
15 min read
Architect's Tech Stack
Nov 23, 2020 · Backend Development

Graceful Shutdown in Java: Using Shutdown Hooks and ThreadPool Management

This article explains the concept of graceful shutdown for Java services, demonstrates how to register shutdown hooks with Runtime.getRuntime().addShutdownHook, provides a complete code example using thread pools, shows the execution results, and discusses best practices for safely terminating applications.

Graceful Shutdown · Operations · ThreadPool
8 min read
Architects Research Society
Nov 22, 2020 · Operations

UiPath Server Platform Architecture and Deployment Process

UiPath’s server platform is organized into three logical layers—Presentation, Web Service, and Persistence—each providing specific components such as REST APIs, web applications, Elasticsearch, and SQL Server, and follows a VCS‑managed deployment workflow that moves projects from development through QA to production.

Operations · RPA · Server
4 min read
DeWu Technology
Nov 19, 2020 · Operations

HBase Operations and Use Cases for High‑Concurrency E‑commerce

In this talk, Yun Jin explains how HBase’s petabyte‑scale, horizontally‑scalable architecture—built on Hadoop, HMaster, RegionServers, and Zookeeper—enables e‑commerce platforms to handle extreme promotion‑day traffic by supporting high‑throughput reads/writes, time‑series monitoring, massive order storage, and robust HA, while covering essential table operations, monitoring, and troubleshooting techniques.

Big Data · HBase · Operations
6 min read
Xianyu Technology
Nov 19, 2020 · Operations

Rapid and Safe Migration of a Centralized Microservice Platform to Department‑Built Infrastructure

The team migrated a large, multi‑service microservice publishing platform—including Xianyu, Taobao, Alipay, and Tmall—from a centralized environment to a department‑built infrastructure in ten working days by cloning the repo, updating configurations, separating databases, rigorously verifying functionality across dev, pre‑release, and production, and ensuring isolation and monitoring for stability.

Data Migration · Deployment · Operations
7 min read
JD Retail Technology
Nov 18, 2020 · Industry Insights

How JD.com Used AI and Operations Science to Power 11.11 Supply‑Chain Success

JD.com's intelligent supply‑chain team combined AI‑driven forecasting, S&OP planning, real‑time inventory response, smart fulfillment, anti‑arbitrage detection, price governance, and precise C2M delivery to dramatically cut costs, improve inventory turnover, and deliver a seamless 11.11 shopping experience.

Artificial Intelligence · Logistics · Operations
18 min read
Taobao Frontend Technology
Nov 14, 2020 · Operations

How Alibaba Scales Double 11: Inside the Tech Behind 500‑Billion‑Yuan Sales

Alibaba's Double 11 set a new record with 498.2 billion yuan in sales and a peak of 583,000 orders per second, prompting the Taobao tech team to confront unprecedented concurrency, live‑streaming spikes, and evolving consumer behavior through advanced cloud‑native architectures and real‑time content integration.

Cloud Native · Operations · Scalability
14 min read
iQIYI Technical Product Team
Nov 13, 2020 · Operations

Building and Optimizing a Consul‑Based Service Registry for iQIYI's Microservice Platform

iQIYI’s Consul‑based service registry, tightly integrated with its QAE container platform and API gateway, suffered a multi‑DC outage caused by network jitter and a metrics‑library lock‑contention bug, which was resolved by upgrading Go, go‑metrics, and Raft, adding extensive monitoring, redundant DC registration, and dedicated per‑gateway Consul clusters to ensure continued stability and scalability.

Consul · Microservices · Operations
17 min read
Practical DevOps Architecture
Nov 13, 2020 · Operations

Ansible Core Concepts and Basic Command Examples

This article introduces essential Ansible terminology such as control node, managed nodes, inventory, host files, modules, tasks, playbooks, and roles, and demonstrates basic commands for user creation, directory management, file deletion, and package handling on Linux hosts.

Ansible · Configuration Management · DevOps
5 min read
dbaplus Community
Nov 10, 2020 · Operations

Essential Elasticsearch Tuning Tips for Performance and Stability

This guide consolidates practical Elasticsearch tuning techniques—from configuration file settings and system‑level adjustments to usage‑level optimizations—covering memory locking, discovery, fault detection, queue sizing, JVM heap, file descriptors, translog handling, bulk indexing, shard management, and best practices to achieve a stable, high‑performance cluster.

Cluster · Operations · Tuning
18 min read
JD Cloud Developers
Nov 10, 2020 · Cloud Computing

How JD Cloud Powers the 11.11 Mega Sale: Scaling, High Availability, and Monitoring Strategies

This article reveals how JD's Zhilian Cloud prepares for the massive 11.11 shopping festival by rapidly mobilizing teams, defining protection scopes, estimating resources, implementing high‑availability across regions and AZs, applying business degradation and elastic scaling, and establishing comprehensive monitoring and rehearsal practices to ensure a smooth, resilient promotion.

Operations · cloud computing · monitoring
13 min read
DevOps
Nov 9, 2020 · Operations

Understanding Process‑Oriented Organizational Construction: Business Flow, Process, IT, Data, Quality, and Operations

The article explains how a company can achieve a process‑oriented organization by defining business flow, aligning processes, leveraging IT to solidify data handling, integrating quality standards, and establishing continuous operations, emphasizing the need for clear concepts and roles across the enterprise.

IT enablement · Operations · business flow
23 min read
Efficient Ops
Nov 8, 2020 · Operations

6 Essential kubectl Tricks to Master Kubernetes Troubleshooting

This guide presents six practical kubectl commands—including get, events, logs, yaml output, scaling, and port‑forwarding—along with detailed usage tips to help you quickly diagnose and resolve common issues in Kubernetes deployments.

DevOps · Kubernetes · Operations
6 min read
Practical DevOps Architecture
Nov 8, 2020 · Operations

How to Configure BIOS and Create RAID 5 Volumes

This guide walks through entering the BIOS, selecting system settings, configuring the storage controller, creating RAID 0/1/5 volumes with specific capacities, and verifying the status of each logical volume using step‑by‑step screenshots.

BIOS · Operations · RAID
3 min read
Java Architect Essentials
Nov 8, 2020 · Operations

What Happens If You Destroy All of Alipay’s Storage Servers? A Deep Dive into Data Center Architecture and Disaster Recovery

The article explores the consequences of destroying Alipay’s storage servers, detailing typical financial data center architectures, backup strategies, power redundancy, fire suppression systems, and the practical challenges of crippling such facilities, while highlighting regulatory and physical security measures.

Backup · Fire Suppression · Operations
8 min read
Dual-Track Product Journal
Nov 3, 2020 · Operations

WMS vs Inventory Management: Key Differences and Benefits

This article explains the relationship between Warehouse Management Systems (WMS) and Inventory Management Systems, clarifies their definitions and distinctions, outlines how company size influences system architecture, and describes the layered inventory model (sales, scheduling, physical layers) along with its operational advantages.

Operations · Product Architecture · Supply Chain
10 min read
FunTester
Oct 30, 2020 · Operations

Mastering Mobile DevOps: A Complete Guide to CI/CD, Testing, and Release

This article explains how organizations can adopt Mobile DevOps by integrating continuous integration, automated testing on real devices, systematic build, packaging, release, configuration, and monitoring steps to achieve faster, higher‑quality mobile app delivery within the SDLC.

Automated Testing · Mobile DevOps · Operations
7 min read
Top Architect
Oct 28, 2020 · Operations

Top Open-Source API Management Tools and Platforms

This article presents a curated list of leading open‑source API management solutions, describing their key features such as rate limiting, authentication, analytics, developer portals, and deployment options to help developers and operations teams choose the most suitable tool for their API lifecycle needs.

API Management · Operations · api-gateway
11 min read
Full-Stack Internet Architecture
Oct 27, 2020 · Cloud Native

Common Kubernetes and Docker Commands

This article provides a concise reference of frequently used Kubernetes (kubectl) and Docker command‑line instructions, covering cluster inspection, pod and service queries, resource creation, deletion, as well as container inspection, logging, and interactive shell access.

CLI · Cloud Native · Containers
5 min read
DevOps Coach
Oct 26, 2020 · Operations

Mastering Visual Management in DevOps: Key Practices and Common Pitfalls

This article explains Google’s DevOps solution framework, focusing on the measurement pillar by detailing how to implement visual management boards, avoid typical mistakes, improve their effectiveness, and measure their impact against team goals, while referencing the DORA study that underpins the approach.

DevOps · Metrics · Operations
11 min read
Architects' Tech Alliance
Oct 25, 2020 · Operations

Understanding Data Backup Techniques: File‑Level, Block‑Level, Remote Copy, Snapshots and Volume Clone

This article explains the fundamentals and classifications of data backup technologies—including file‑level and block‑level protection, remote file copy, remote volume imaging, snapshot mechanisms, CoFW vs RoFW strategies, and volume clone methods—while also covering backup destinations, paths, and common backup strategies.

Backup · Data Protection · Operations
20 min read
Liangxu Linux
Oct 24, 2020 · Operations

Master Linux Cron: From Basics to Advanced Scheduling

This guide explains the Linux cron daemon, how to control the crond service, configure system and user crontabs, manage permissions, create custom cron scripts, use the crontab command syntax, and provides numerous practical scheduling examples.

Operations · Scheduling · System Administration
11 min read
dbaplus Community
Oct 22, 2020 · Operations

Choosing the Right Open‑Source Monitoring System: Zabbix, Open‑Falcon, and Prometheus Compared

This article systematically explains monitoring fundamentals, the seven core functions of a monitoring system, proper usage practices, common monitoring objects and metrics, the basic data flow, and provides detailed comparisons of three popular open‑source solutions—Zabbix, Open‑Falcon, and Prometheus—to guide informed selection decisions.

Open-Falcon · Operations · System Design
20 min read
Java Backend Technology
Oct 22, 2020 · Information Security

What Caused the Massive P1 Outage? A Real‑World Security Scanning Bug Uncovered

A sudden P1 incident reset all user passwords. After a thorough investigation, the team found that a security‑scanning tool’s weak‑password check had made repeated login attempts, triggering a bug that caused the outage. The case highlights the critical need for proper incident response and security engineering.

Information Security · Operations · P1 incident
7 min read
Efficient Ops
Oct 21, 2020 · Operations

Mastering Sampler: Real-Time Shell Command Monitoring, Visualization, and Alerts

Sampler is a lightweight tool that lets you execute shell commands, visualize their output, and set up alerts using simple YAML configurations, enabling real‑time monitoring of databases, message queues, deployment scripts, and remote servers without requiring a full‑blown monitoring stack.

Operations · YAML configuration · sampler
14 min read
Efficient Ops
Oct 20, 2020 · Operations

Why Do TIME_WAIT Connections Surge in High‑Concurrency Scenarios and How to Fix Them

During high‑concurrency traffic, servers can accumulate large numbers of TCP connections in the TIME_WAIT state, which can exhaust local ports and cause “address already in use” errors; this article explains the phenomenon, its underlying TCP mechanics, and practical configuration and kernel tweaks to mitigate the issue.

Linux · Networking · Operations
9 min read
DevOps
Oct 20, 2020 · Cloud Computing

Chaos Monkey and the Simian Army: Building Resilient Cloud Systems

The article explains how Netflix uses Chaos Monkey and a suite of related tools, collectively called the Simian Army, to deliberately inject failures into their cloud infrastructure, continuously test fault‑tolerance, and ensure high availability and reliability for their streaming service.

Netflix · Operations · Simian Army
7 min read
DevOps Coach
Oct 15, 2020 · Operations

Explore Jenkinsclient: A Powerful Cross‑Platform CLI for Jenkins

Jenkinsclient is an open‑source, Python‑based, cross‑platform command‑line client that offers Docker‑style commands to manage multiple Jenkins instances, covering configuration, nodes, plugins, credentials, jobs, queues, executors, and builds, with simple installation via pip.

CLI · DevOps · Jenkins
5 min read
Liangxu Linux
Oct 15, 2020 · Operations

Top 16 Essential Tools Every Network Engineer Should Master

A comprehensive guide lists sixteen indispensable network troubleshooting utilities—from classic commands like Ping and Traceroute to advanced platforms such as Nmap, Wireshark, and OpenVAS—explaining their core functions, typical use cases, and how they help engineers quickly pinpoint and resolve connectivity issues.

Operations · Wireshark · network troubleshooting
9 min read
Meituan Technology Team
Oct 15, 2020 · Artificial Intelligence

AIOps at Meituan: Architecture and Practice of Time‑Series Anomaly Detection (Part 1)

Meituan’s AIOps initiative replaces manual rule‑based monitoring with the Horae platform, which automatically classifies time‑series metrics, applies CNN and XGBoost models to detect periodic anomalies, achieves over 90% precision in production, and paves the way for broader metric types, forecasting, and advanced fault‑localization.

Horae · Meituan · Operations
33 min read
ITPUB
Oct 15, 2020 · Operations

How a Huawei Maintenance Engineer Turned Painful On‑Call Duty into Efficient Knowledge Management

A Huawei maintenance engineer shares a decade‑long journey of turning 24/7 on‑call pain into systematic knowledge management, building comprehensive fault‑handling documentation, automating tools, and guiding the team’s evolution toward SRE practices that dramatically reduce manual effort and improve reliability.

Documentation · Huawei · Operations
14 min read
DevOps
Oct 15, 2020 · Operations

Agile and DevOps: Friends or Foes? Understanding Their Relationship and Practices

This article explores the nuanced relationship between Agile and DevOps, clarifying common misconceptions, detailing how Scrum and continuous delivery intersect, and presenting the three‑layer DevOps model that helps teams integrate cultural, technical, and delivery practices for better collaboration and value delivery.

Continuous Delivery · DevOps · Operations
11 min read
IT Architects Alliance
Oct 13, 2020 · Cloud Native

Designing Fault‑Tolerant Microservices Architecture

Microservice architectures increase system complexity and failure rates, so this article explains key reliability patterns—such as graceful degradation, change management, health checks, self‑healing, fallback caches, retry logic, rate limiting, circuit breakers, and testing—to help engineers design resilient, high‑availability services.

Cloud Native · Microservices · Operations
0 likes · 23 min read
DevOps Cloud Academy
Oct 13, 2020 · Operations

DevOps Fundamentals: Reducing Batch Size and Eliminating Constraints

This article explains DevOps by describing how to create balanced workflows, reduce batch sizes to speed feedback, adopt trunk‑based development with continuous integration and delivery, and continuously identify and remove constraints such as long‑lived feature branches and slow environment provisioning.

Batch Size · Constraints · DevOps
0 likes · 7 min read
Top Architect
Oct 12, 2020 · Backend Development

Nginx Overview: Architecture, Reverse Proxy, Load Balancing, Static/Dynamic Separation, and High Availability

This article provides a comprehensive guide to Nginx, covering its high‑performance architecture, reverse‑proxy and load‑balancing concepts, static‑dynamic separation, common commands, configuration file structure, practical deployment examples, and high‑availability setup using Keepalived.

Operations · high availability · load balancing
0 likes · 11 min read
Alibaba Cloud Developer
Oct 11, 2020 · Operations

How Alibaba’s SLS Powers a Unified Observability Platform for Massive Data

Alibaba Cloud’s Log Service (SLS) has evolved into a unified observability middle‑platform that handles tens of petabytes daily, offering integrated storage, processing, and AI‑driven analysis for logs, metrics, and traces, while addressing challenges of data ingestion, performance, and scalability across diverse Ops scenarios.

Big Data · Log Analytics · Operations
0 likes · 16 min read
MaGe Linux Operations
Oct 11, 2020 · Operations

How to Install and Use Bpytop: A Fast, Visual Terminal Resource Monitor

This guide explains why terminal enthusiasts need system resource monitoring, introduces the efficient visual tool Bpytop, and provides step‑by‑step instructions for preparing prerequisites, installing via source or package managers, running, customizing, and locating its configuration file.

Installation · Linux · Operations
0 likes · 5 min read
HaoDF Tech Team
Oct 9, 2020 · Operations

Automated Deployment Solution for HaoDF WeChat Mini Programs

This article describes how HaoDF built an automated, visual CI/CD pipeline for its WeChat mini programs, replacing manual testing and release steps with a platform that handles environment configuration, QR‑code generation, code merging, and deployment while improving efficiency, reducing errors, and supporting future scaling.

DevOps · Operations · WeChat Mini Program
0 likes · 9 min read
ITPUB
Oct 9, 2020 · Operations

How to Streamline Call Center Incident Management: Practical Steps and Best Practices

This guide walks through a real‑world call‑center slowdown incident, outlines common fault‑handling techniques, proposes monitoring enhancements, details a comprehensive emergency‑response plan, and introduces intelligent event‑processing concepts to help operations teams resolve outages faster and more reliably.

Operations · automation · call center
0 likes · 15 min read
DevOps Coach
Oct 9, 2020 · Operations

How to Master Database Change Management for Zero‑Downtime Deployments

This article explains the four capability categories in Google’s DevOps solution and dives into DORA‑backed best practices for database change management, including communication, migration scripts, tooling, zero‑downtime strategies, common pitfalls, and key metrics, to help teams deliver changes safely and quickly.

DevOps · DoRA · Operations
0 likes · 13 min read
Liangxu Linux
Oct 8, 2020 · Operations

Master tmux: Keep Long‑Running Scripts Alive on Remote Servers

This guide explains how to use tmux—a terminal multiplexer—to create, detach, reattach, and manage sessions, windows, and panes on Linux servers, ensuring scripts continue running even when SSH connections drop or terminals close.

Linux · Operations · Session Management
0 likes · 15 min read
ITFLY8 Architecture Home
Oct 3, 2020 · Databases

Why Is Redis Slowing Down? Common Causes and How to Diagnose Them

This article explains the typical reasons for Redis latency spikes—including complex commands, large keys, expiration bursts, memory limits, fork overhead, AOF settings, swap usage, and network saturation—and provides practical steps and commands to identify and mitigate each issue.

Latency · Operations · database
0 likes · 18 min read
DevOps Coach
Oct 1, 2020 · Operations

Mastering Deployment Automation: Google’s DevOps Best Practices

This guide explains Google’s DevOps solution built on DORA research, outlines the four DevOps capability categories, and provides detailed steps, best practices, common pitfalls, improvement methods, and measurement techniques for implementing reliable, automated software deployments.

Deployment Automation · DevOps · Operations
0 likes · 11 min read
Open Source Linux
Sep 30, 2020 · Operations

Mastering Nginx: Reverse Proxy, Load Balancing, and High Availability Guide

This comprehensive guide explains Nginx's core concepts—including reverse proxy, load balancing, static‑dynamic separation, common commands, configuration blocks, and high‑availability setup with Keepalived—providing practical examples, diagrams, and code snippets for reliable server deployment.

Nginx · Operations · Server Configuration
0 likes · 11 min read
JavaEdge
Sep 27, 2020 · Operations

Mastering Blue‑Green, Canary, and Dark Launch Deployments: A Practical Guide

This article explains three key deployment strategies—Blue‑Green, Canary (gray release), and Dark Launch (feature toggles)—detailing their concepts, step‑by‑step traffic switching processes, rollback mechanisms, database considerations, and practical usage scenarios for reliable production releases.

Blue‑Green deployment · Dark Launch · Deployment Strategies
0 likes · 10 min read
Tencent Cloud Developer
Sep 27, 2020 · Operations

Elasticsearch Cluster Capacity Planning, Index Configuration, and Performance Optimization

This guide outlines practical capacity‑planning, index‑design, and write‑performance tuning for Tencent Cloud Elasticsearch clusters, covering compute and storage sizing, optimal shard counts, rollover strategies, bulk API settings, health monitoring, and common troubleshooting steps to ensure stable, high‑throughput search services.

Cluster Planning · Elasticsearch · Operations
0 likes · 19 min read
MaGe Linux Operations
Sep 25, 2020 · Operations

Discover Spug: A Lightweight, Agentless Automation Platform for Small Teams

Spug is an open‑source, agentless operations automation platform designed for small‑to‑medium enterprises, offering host management, batch command execution, online terminals, file transfer, application deployment, task scheduling, configuration, monitoring and alerting, with easy Docker installation and a rich web UI.

Deployment · Docker · Operations
0 likes · 6 min read
DevOps Cloud Academy
Sep 25, 2020 · Operations

Understanding DevOps, SecOps, and DevSecOps: Definitions, Benefits, and Choosing the Right Approach

This guide explains the concepts of DevOps, SecOps, and DevSecOps, outlines their respective benefits, and helps organizations decide which security‑focused operational model best fits their needs by comparing their focus on integration, automation, and collaboration across development, operations, and security teams.

Collaboration · DevOps · DevSecOps
0 likes · 6 min read
Alibaba Cloud Native
Sep 24, 2020 · Cloud Native

Tackling Ultra‑Large‑Scale Service Mesh Deployment: Lessons from Alibaba

This article details Alibaba's practical experience deploying Service Mesh at massive scale, covering architectural evolution, key challenges, traffic interception, hot‑upgrade mechanisms, performance optimizations, and operational tooling that together enable reliable, low‑overhead service communication in a cloud‑native environment.

Cloud Native · Envoy · Istio
0 likes · 22 min read
Programmer DD
Sep 24, 2020 · Operations

Why 58% of IT Professionals Say Windows 10 Updates Are Useless

A recent Computerworld survey reveals that a majority of IT staff find Windows 10's twice‑yearly updates either useless or of little value, with many preferring older Windows versions and criticizing forced update policies.

Operations · Patch management · Windows
0 likes · 3 min read
JD.com Experience Design Center
Sep 23, 2020 · Operations

Boost B2B Operations Efficiency with Template‑Based Design

Business‑facing (B2B) operational activities often involve frequent, short‑term, high‑pressure tasks that drain design resources. This article explains how generic design templates and collaborative online tools can streamline these demands, freeing up manpower and improving overall operational efficiency.

B2B · Operations · Resource Management
0 likes · 2 min read