Tagged articles

Operations

3329 articles · Page 21 of 34

Nov 6, 2020 · Operations

How to Identify, Clean, and Limit Docker Container Logs to Prevent Disk Space Exhaustion

This guide explains how to locate Docker container log files on Linux, provides shell scripts to list and truncate oversized logs, and shows how to configure per‑container and global log size limits using docker‑compose and the Docker daemon to keep host disk usage under control.

Docker · Linux · Operations

0 likes · 4 min read

High Availability Architecture

Nov 6, 2020 · Operations

My Philosophy on Alerting: Principles for Effective Monitoring and Incident Management

This article translates and expands on the author’s seven years of experience with monitoring and alerting, presenting symptom‑based principles, practical guidelines for rule design, incident handling, and operational processes to create a robust, low‑noise alerting system.

Observability · Operations · monitoring

0 likes · 16 min read

Dual-Track Product Journal

Nov 3, 2020 · Operations

WMS vs Inventory Management: Key Differences and Benefits

This article explains the relationship between Warehouse Management Systems (WMS) and Inventory Management Systems, clarifies their definitions and distinctions, outlines how company size influences system architecture, and describes the layered inventory model (sales, scheduling, physical layers) along with its operational advantages.

Operations · Product Architecture · WMS

0 likes · 10 min read

Laravel Tech Community

Nov 1, 2020 · Operations

NGINX 1.19.4 Mainline Release: New SSL Directives and Configuration Examples

The NGINX 1.19.4 mainline release adds new SSL directives such as ssl_conf_command and ssl_reject_handshake, introduces a proxy_smtp_auth mail proxy feature, and provides configuration examples showing how to reject SSL handshakes for all server names except a specified one.

Operations · SSL · Web Server

0 likes · 2 min read

FunTester

Oct 30, 2020 · Operations

Mastering Mobile DevOps: A Complete Guide to CI/CD, Testing, and Release

This article explains how organizations can adopt Mobile DevOps by integrating continuous integration, automated testing on real devices, systematic build, packaging, release, configuration, and monitoring steps to achieve faster, higher‑quality mobile app delivery within the SDLC.

CI/CD · Continuous Integration · Mobile DevOps

0 likes · 7 min read

Top Architect

Oct 28, 2020 · Operations

Top Open-Source API Management Tools and Platforms

This article presents a curated list of leading open‑source API management solutions, describing their key features such as rate limiting, authentication, analytics, developer portals, and deployment options to help developers and operations teams choose the most suitable tool for their API lifecycle needs.

API Gateway · API Management · Operations

0 likes · 11 min read

IT Architects Alliance

Oct 27, 2020 · Information Security

Understanding File‑Level and Block‑Level Backup, Snapshots, and Clone Technologies

This article explains the principles and differences of file‑level and block‑level backup, remote file copy, remote volume imaging, snapshot mechanisms (including CoFW and RoFW), clone techniques, and various backup destinations, paths, and strategies used to ensure data reliability and redundancy.

Data Protection · Operations · Snapshot

0 likes · 20 min read

MaGe Linux Operations

Oct 27, 2020 · Operations

Build a Highly Available RabbitMQ Cluster with Docker, HAProxy, and Keepalived

This guide walks through creating a resilient RabbitMQ cluster using two disk nodes and one RAM node, Docker and docker‑compose for deployment, HAProxy for load balancing with VIP failover, and Keepalived for master‑backup high availability, including configuration, scripts, and testing steps.

HA · HAProxy · Keepalived

0 likes · 17 min read

Full-Stack Internet Architecture

Oct 27, 2020 · Cloud Native

Common Kubernetes and Docker Commands

This article provides a concise reference of frequently used Kubernetes (kubectl) and Docker command‑line instructions, covering cluster inspection, pod and service queries, resource creation and deletion, as well as container inspection, logging, and interactive shell access.

CLI · Containers · Docker

0 likes · 5 min read

DevOps Coach

Oct 26, 2020 · Operations

Mastering Visual Management in DevOps: Key Practices and Common Pitfalls

This article explains Google’s DevOps solution framework, focusing on the measurement pillar by detailing how to implement visual management boards, avoid typical mistakes, improve their effectiveness, and measure their impact against team goals, while referencing the DORA study that underpins the approach.

Operations · Visual Management · devops

0 likes · 11 min read

360 Tech Engineering

Oct 26, 2020 · Operations

Troubleshooting rsyslog-Induced Python RPC Service Startup Failure

This article details the investigation and resolution of a Python RPC service startup failure caused by rsyslog issues, explains the daemonize logic with code snippets, and concludes with a lottery winner announcement.

Daemonize · Operations · Python RPC

0 likes · 4 min read

Architects' Tech Alliance

Oct 25, 2020 · Operations

Understanding Data Backup Techniques: File‑Level, Block‑Level, Remote Copy, Snapshots and Volume Clone

This article explains the fundamentals and classifications of data backup technologies—including file‑level and block‑level protection, remote file copy, remote volume imaging, snapshot mechanisms, CoFW vs RoFW strategies, and volume clone methods—while also covering backup destinations, paths, and common backup strategies.

Data Protection · Operations · Snapshot

0 likes · 20 min read

Liangxu Linux

Oct 24, 2020 · Operations

Master Linux Cron: From Basics to Advanced Scheduling

This guide explains the Linux cron daemon, how to control the crond service, configure system and user crontabs, manage permissions, create custom cron scripts, use the crontab command syntax, and provides numerous practical scheduling examples.

Operations · Scheduling · cron

0 likes · 11 min read

dbaplus Community

Oct 22, 2020 · Operations

Choosing the Right Open‑Source Monitoring System: Zabbix, Open‑Falcon, and Prometheus Compared

This article systematically explains monitoring fundamentals, the seven core functions of a monitoring system, proper usage practices, common monitoring objects and metrics, the basic data flow, and provides detailed comparisons of three popular open‑source solutions—Zabbix, Open‑Falcon, and Prometheus—to guide informed selection decisions.

Open-Falcon · Operations · System Design

0 likes · 20 min read

Java Backend Technology

Oct 22, 2020 · Information Security

What Caused the Massive P1 Outage? A Real‑World Security Scanning Bug Uncovered

A sudden P1 incident reset all user passwords. After a thorough investigation, the team discovered that a security‑scanning tool’s weak‑password check had repeatedly submitted login attempts, triggering a bug that caused the outage. The case highlights the critical need for proper incident response and security engineering.

Information Security · Operations · P1 incident

0 likes · 7 min read

Efficient Ops

Oct 21, 2020 · Operations

Mastering Sampler: Real-Time Shell Command Monitoring, Visualization, and Alerts

Sampler is a lightweight tool that lets you execute shell commands, visualize their output, and set up alerts using simple YAML configurations, enabling real‑time monitoring of databases, message queues, deployment scripts, and remote servers without requiring a full‑blown monitoring stack.

Operations · YAML configuration · sampler

0 likes · 14 min read

IT Architects Alliance

Oct 20, 2020 · Industry Insights

From Capability Open Platform to Middle Platform: Architecture, Benefits, and Ecosystem

This article analyzes the architecture of capability open platforms, explains how they evolve into capability middle platforms that aggregate and manage API services, and explores their role in building open, collaborative ecosystems across e‑commerce, travel, and service domains.

API integration · Capability Platform · Ecosystem Architecture

0 likes · 18 min read

Efficient Ops

Oct 20, 2020 · Operations

Why Do TIME_WAIT Connections Surge in High‑Concurrency Scenarios and How to Fix Them

During high‑concurrency traffic, servers can accumulate large numbers of TCP connections in the TIME_WAIT state, which can exhaust local ports and cause “address already in use” errors; this article explains the phenomenon, its underlying TCP mechanics, and practical configuration and kernel tweaks to mitigate the issue.

Linux · Operations · Server

0 likes · 9 min read

DevOps

Oct 20, 2020 · Cloud Computing

Chaos Monkey and the Simian Army: Building Resilient Cloud Systems

The article explains how Netflix uses Chaos Monkey and a suite of related tools, collectively called the Simian Army, to deliberately inject failures into their cloud infrastructure, continuously test fault‑tolerance, and ensure high availability and reliability for their streaming service.

Netflix · Operations · Simian Army

0 likes · 7 min read

MaGe Linux Operations

Oct 18, 2020 · Operations

How to Deploy the Nightingale Ops Platform with Docker: Step‑by‑Step Guide

This tutorial walks you through installing Didi's open‑source Nightingale operations platform using Docker, covering code retrieval, Docker‑Compose setup, node configuration, systemd service creation, and accessing the web UI to manage resources, jobs, and monitoring dashboards.

Deployment · Operations · open-source

0 likes · 5 min read

DevOps Coach

Oct 15, 2020 · Operations

Explore Jenkinsclient: A Powerful Cross‑Platform CLI for Jenkins

Jenkinsclient is an open‑source, Python‑based, cross‑platform command‑line client that offers Docker‑style commands to manage multiple Jenkins instances, covering configuration, nodes, plugins, credentials, jobs, queues, executors, and builds, with simple installation via pip.

CLI · Jenkins · Operations

0 likes · 5 min read

Liangxu Linux

Oct 15, 2020 · Operations

Top 16 Essential Tools Every Network Engineer Should Master

A comprehensive guide lists sixteen indispensable network troubleshooting utilities—from classic commands like Ping and Traceroute to advanced platforms such as Nmap, Wireshark, and OpenVAS—explaining their core functions, typical use cases, and how they help engineers quickly pinpoint and resolve connectivity issues.

Network Tools · Nmap · Operations

0 likes · 9 min read

Meituan Technology Team

Oct 15, 2020 · Artificial Intelligence

AIOps at Meituan: Architecture and Practice of Time‑Series Anomaly Detection (Part 1)

Meituan’s AIOps initiative replaces manual rule‑based monitoring with the Horae platform, which automatically classifies time‑series metrics, applies CNN and XGBoost models to detect periodic anomalies, achieves over 90% precision in production, and paves the way for broader metric types, forecasting, and advanced fault‑localization.

AIOps · Horae · Machine Learning

0 likes · 33 min read

ITPUB

Oct 15, 2020 · Operations

How a Huawei Maintenance Engineer Turned Painful On‑Call Duty into Efficient Knowledge Management

A Huawei maintenance engineer shares a decade‑long journey of turning 24/7 on‑call pain into systematic knowledge management, building comprehensive fault‑handling documentation, automating tools, and guiding the team’s evolution toward SRE practices that dramatically reduce manual effort and improve reliability.

Automation · Huawei · Knowledge Management

0 likes · 14 min read

DevOps

Oct 15, 2020 · Operations

Agile and DevOps: Friends or Foes? Understanding Their Relationship and Practices

This article explores the nuanced relationship between Agile and DevOps, clarifying common misconceptions, detailing how Scrum and continuous delivery intersect, and presenting the three‑layer DevOps model that helps teams integrate cultural, technical, and delivery practices for better collaboration and value delivery.

Operations · Scrumban · continuous delivery

0 likes · 11 min read

NetEase Yanxuan Technology Product Team

Oct 14, 2020 · Operations

How NetEase Yanxuan’s Supply Chain Simulation System Boosts Decision Accuracy by 97%

NetEase Yanxuan built a cloud‑native supply‑chain simulation platform that standardizes data flows, achieves 97% forecasting accuracy, processes billions of orders daily, and reduces decision cycles, enabling over thirty major business choices with cost savings of tens of millions.

Operations · Simulation · cloud-native

0 likes · 4 min read

IT Architects Alliance

Oct 13, 2020 · Cloud Native

Designing Fault‑Tolerant Microservices Architecture

Microservice architectures increase system complexity and failure rates, so this article explains key reliability patterns—such as graceful degradation, change management, health checks, self‑healing, fallback caches, retry logic, rate limiting, circuit breakers, and testing—to help engineers design resilient, high‑availability services.

Operations · circuit breaker · cloud-native

0 likes · 23 min read

DevOps Cloud Academy

Oct 13, 2020 · Operations

DevOps Fundamentals: Reducing Batch Size and Eliminating Constraints

This article explains DevOps by describing how to create balanced workflows, reduce batch sizes to speed feedback, adopt trunk‑based development with continuous integration and delivery, and continuously identify and remove constraints such as long‑lived feature branches and slow environment provisioning.

Batch Size · Constraints · Continuous Integration

0 likes · 7 min read

Top Architect

Oct 12, 2020 · Backend Development

Nginx Overview: Architecture, Reverse Proxy, Load Balancing, Static/Dynamic Separation, and High Availability

This article provides a comprehensive guide to Nginx, covering its high‑performance architecture, reverse‑proxy and load‑balancing concepts, static‑dynamic separation, common commands, configuration file structure, practical deployment examples, and high‑availability setup using Keepalived.

High Availability · Operations · Reverse Proxy

0 likes · 11 min read

Alibaba Cloud Developer

Oct 11, 2020 · Operations

How Alibaba’s SLS Powers a Unified Observability Platform for Massive Data

Alibaba Cloud’s Log Service (SLS) has evolved into a unified observability middle‑platform that handles tens of petabytes daily, offering integrated storage, processing, and AI‑driven analysis for logs, metrics, and traces, while addressing challenges of data ingestion, performance, and scalability across diverse Ops scenarios.

AIOps · Big Data · Log Analytics

0 likes · 16 min read

MaGe Linux Operations

Oct 11, 2020 · Operations

How to Install and Use Bpytop: A Fast, Visual Terminal Resource Monitor

This guide explains why terminal enthusiasts need system resource monitoring, introduces the efficient visual tool Bpytop, and provides step‑by‑step instructions for preparing prerequisites, installing via source or package managers, running, customizing, and locating its configuration file.

Installation · Linux · Operations

0 likes · 5 min read

dbaplus Community

Oct 9, 2020 · Databases

Accelerating MySQL Full & Incremental Recovery: Practical Steps and Optimizations

This article outlines the challenges of MySQL backup restoration, categorizes common data loss scenarios, and provides detailed step‑by‑step procedures for full backup recovery, incremental binlog recovery, and optimized workflows using parallel replication and binlog‑server tricks to reduce downtime.

Binlog · MySQL · Operations

0 likes · 9 min read

HaoDF Tech Team

Oct 9, 2020 · Operations

Automated Deployment Solution for HaoDF WeChat Mini Programs

This article describes how HaoDF built an automated, visual CI/CD pipeline for its WeChat mini programs, replacing manual testing and release steps with a platform that handles environment configuration, QR‑code generation, code merging, and deployment while improving efficiency, reducing errors, and supporting future scaling.

CI/CD · Operations · WeChat Mini Program

0 likes · 9 min read

ITPUB

Oct 9, 2020 · Operations

How to Streamline Call Center Incident Management: Practical Steps and Best Practices

This guide walks through a real‑world call‑center slowdown incident, outlines common fault‑handling techniques, proposes monitoring enhancements, details a comprehensive emergency‑response plan, and introduces intelligent event‑processing concepts to help operations teams resolve outages faster and more reliably.

Automation · Incident Management · Operations

0 likes · 15 min read

DevOps Coach

Oct 9, 2020 · Operations

How to Master Database Change Management for Zero‑Downtime Deployments

This article explains the four capability categories of Google’s DevOps framework and dives into DORA‑backed best practices for database change management, including communication, migration scripts, tooling, zero‑downtime strategies, common pitfalls, and key metrics, to help teams deliver changes safely and quickly.

DORA · Operations · Zero Downtime

0 likes · 13 min read

Full-Stack Internet Architecture

Oct 8, 2020 · Databases

Setting Up Redis Sentinel for High Availability: Configuration and Failover Guide

This guide explains how to configure Redis Sentinel to monitor master‑slave instances, automatically promote a slave on master failure, and verify the high‑availability setup with detailed configuration files, startup commands, status checks, and failover testing steps.

High Availability · Operations · Redis

0 likes · 10 min read

Liangxu Linux

Oct 8, 2020 · Operations

Master tmux: Keep Long‑Running Scripts Alive on Remote Servers

This guide explains how to use tmux—a terminal multiplexer—to create, detach, reattach, and manage sessions, windows, and panes on Linux servers, ensuring scripts continue running even when SSH connections drop or terminals close.

Linux · Operations · remote scripting

0 likes · 15 min read

ITFLY8 Architecture Home

Oct 3, 2020 · Databases

Why Is Redis Slowing Down? Common Causes and How to Diagnose Them

This article explains the typical reasons for Redis latency spikes—including complex commands, large keys, expiration bursts, memory limits, fork overhead, AOF settings, swap usage, and network saturation—and provides practical steps and commands to identify and mitigate each issue.

Latency · Operations · Performance Tuning

0 likes · 18 min read

DevOps Coach

Oct 1, 2020 · Operations

Mastering Deployment Automation: Google’s DevOps Best Practices

This guide explains Google’s DevOps solution built on DORA research, outlines the four DevOps capability categories, and provides detailed steps, best practices, common pitfalls, improvement methods, and measurement techniques for implementing reliable, automated software deployments.

Best Practices · CI/CD · Deployment Automation

0 likes · 11 min read

Open Source Linux

Sep 30, 2020 · Operations

Mastering Nginx: Reverse Proxy, Load Balancing, and High Availability Guide

This comprehensive guide explains Nginx's core concepts—including reverse proxy, load balancing, static‑dynamic separation, common commands, configuration blocks, and high‑availability setup with Keepalived—providing practical examples, diagrams, and code snippets for reliable server deployment.

High Availability · Nginx · Operations

0 likes · 11 min read

Java Architect Essentials

Sep 28, 2020 · Operations

How to Build a Scalable Log Monitoring System for Hundreds of Microservices

In large‑scale microservice environments, centralized log collection, filtering, and visualization using Filebeat, Elastic APM, Kafka Streams, Grafana and Prometheus can turn scattered logs into actionable operational data while controlling resource costs.

Grafana · Log Monitoring · Operations

0 likes · 9 min read

JavaEdge

Sep 27, 2020 · Operations

Mastering Blue‑Green, Canary, and Dark Launch Deployments: A Practical Guide

This article explains three key deployment strategies—Blue‑Green, Canary (gray release), and Dark Launch (feature toggles)—detailing their concepts, step‑by‑step traffic switching processes, rollback mechanisms, database considerations, and practical usage scenarios for reliable production releases.

Blue-Green Deployment · CI/CD · Canary Deployment

0 likes · 10 min read

Tencent Cloud Developer

Sep 27, 2020 · Operations

Elasticsearch Cluster Capacity Planning, Index Configuration, and Performance Optimization

This guide outlines practical capacity‑planning, index‑design, and write‑performance tuning for Tencent Cloud Elasticsearch clusters, covering compute and storage sizing, optimal shard counts, rollover strategies, bulk API settings, health monitoring, and common troubleshooting steps to ensure stable, high‑throughput search services.

Cluster Planning · Elasticsearch · Operations

0 likes · 19 min read

MaGe Linux Operations

Sep 25, 2020 · Operations

Discover Spug: A Lightweight, Agentless Automation Platform for Small Teams

Spug is an open‑source, agent‑less automation operations platform designed for small‑to‑medium enterprises, offering host management, batch command execution, online terminals, file transfer, application deployment, task scheduling, configuration, monitoring and alerting, with easy Docker installation and a rich web UI.

Deployment · Docker · Operations

0 likes · 6 min read

DevOps Cloud Academy

Sep 25, 2020 · Operations

Understanding DevOps, SecOps, and DevSecOps: Definitions, Benefits, and Choosing the Right Approach

This guide explains the concepts of DevOps, SecOps, and DevSecOps, outlines their respective benefits, and helps organizations decide which security‑focused operational model best fits their needs by comparing their focus on integration, automation, and collaboration across development, operations, and security teams.

Automation · Collaboration · DevSecOps

0 likes · 6 min read

Alibaba Cloud Native

Sep 24, 2020 · Cloud Native

Tackling Ultra‑Large‑Scale Service Mesh Deployment: Lessons from Alibaba

This article details Alibaba's practical experience deploying Service Mesh at massive scale, covering architectural evolution, key challenges, traffic interception, hot‑upgrade mechanisms, performance optimizations, and operational tooling that together enable reliable, low‑overhead service communication in a cloud‑native environment.

Envoy · Istio · Large Scale

0 likes · 22 min read

Programmer DD

Sep 24, 2020 · Operations

Why 58% of IT Professionals Say Windows 10 Updates Are Useless

A recent Computerworld survey reveals that a majority of IT staff find Windows 10's twice‑yearly updates either useless or of little value, with many preferring older Windows versions and criticizing forced update policies.

Operations · Patch Management · Windows

0 likes · 3 min read

JD.com Experience Design Center

Sep 23, 2020 · Operations

Boost B2B Operations Efficiency with Template‑Based Design

B2B operational activities often involve frequent, short‑term, high‑pressure tasks that drain design resources, so this article explains how generic design templates and collaborative online tools can streamline these demands, freeing up manpower and improving overall operational efficiency.

B2B · Operations · Resource Management

0 likes · 2 min read

Laravel Tech Community

Sep 22, 2020 · Databases

Common Redis Latency Issues and How to Diagnose Them

This article explains why Redis latency can suddenly increase—covering high‑complexity commands, large keys, concentrated expirations, memory limits, fork overhead, CPU binding, AOF settings, swap usage, and network saturation—and provides practical diagnostic steps and mitigation techniques.

Latency · Operations · Optimization

0 likes · 17 min read

58UXD

Sep 22, 2020 · Operations

How Flexible Staffing and Digital Transformation Can Revive Post‑Pandemic SMEs

The article explores how small and medium‑sized enterprises can recover from pandemic setbacks by adopting flexible employment models, leveraging digital tools for management and customer insight, and shifting to stronger online promotion while controlling costs and improving resilience.

Operations · SME · business strategy

0 likes · 9 min read

Alibaba Cloud Native

Sep 21, 2020 · Operations

Why Chaos Engineering Is Essential for Cloud‑Native High Availability

This article explains the need for chaos engineering in modern distributed and cloud‑native systems, outlines the challenges faced by architects, developers, testers and product teams, and provides step‑by‑step guidance on using ChaosBlade and Alibaba's AHAS platform for effective fault‑injection experiments.

High Availability · Operations · chaos engineering

0 likes · 9 min read

High Availability Architecture

Sep 21, 2020 · Operations

Full‑Link Load Testing Practices for iQIYI Payment System

This article describes the iQIYI payment team’s approach to full‑link load testing, covering background challenges, systematic problem exploration, preparation of test environments, traffic modeling, execution safeguards, practical results, and future plans to improve capacity verification and system reliability.

Operations · capacity planning · full-link testing

0 likes · 10 min read

DevOps Cloud Academy

Sep 20, 2020 · Operations

Essential Capabilities for High‑Performance Software Delivery (Based on Accelerate)

The article outlines twenty‑four key capabilities across continuous delivery, architecture, product and process, lean management, and culture that research shows drive superior software delivery performance and organizational outcomes.

Culture · Lean Management · Operations

0 likes · 10 min read

MaGe Linux Operations

Sep 18, 2020 · Operations

Essential Linux Operations Metrics for Effective Monitoring

This guide enumerates the key Linux system metrics that Open‑Falcon agents collect every minute, covering CPU, memory, disk, I/O, network, kernel parameters, RAID, SMART, NTP, and process information, to enable comprehensive operations monitoring and timely issue detection.

Open-Falcon · Operations · System Performance

0 likes · 12 min read

Zhuanzhuan QA (转转QA)

Sep 18, 2020 · Operations

Testing Environment Troubleshooting: Characteristics, Common Issues, and Practical Solutions

This article examines the complexities of testing environments, outlines typical causes of failures such as resource constraints, external dependencies, and service bugs, and provides systematic troubleshooting methods, useful tools, and real‑world case studies to improve reliability and efficiency.

Operations · Testing · Troubleshooting

0 likes · 11 min read

IT Architects Alliance

Sep 14, 2020 · Operations

Implementation of Service Chain Monitoring and End-to-End Process Monitoring

This article explains how to design and implement service‑chain (APM) monitoring and end‑to‑end process monitoring in distributed systems, covering concepts such as spans and traces, TRACE_ID generation, logging practices, visualisation techniques, and a practical expense‑report use case with code examples.

APM · Distributed Tracing · Operations

0 likes · 15 min read

dbaplus Community

Sep 14, 2020 · Operations

How iQIYI Scaled Real‑Time Log Monitoring for 100M+ Users with Spark, Flink and Druid

Facing a surge to over 100 million members, iQIYI rebuilt its monitoring stack by ingesting four log types, adopting Spark Streaming, Flink and Druid for real‑time analysis, and optimizing resource usage, which cut incident resolution time by more than 80% while supporting billion‑level data volumes.

Druid · Flink · Kafka

0 likes · 12 min read

Efficient Ops

Sep 13, 2020 · Operations

Master Nginx: Reverse Proxy, Load Balancing, and High‑Availability Essentials

This guide explains Nginx’s core concepts—including reverse proxy, load balancing, static‑dynamic separation, common commands, configuration blocks, and high‑availability setup with Keepalived—providing step‑by‑step examples and practical diagrams for reliable web service deployment.

High Availability · Keepalived · Operations

0 likes · 11 min read

TAL Education Technology

Sep 10, 2020 · Cloud Native

Accelerating Project Deployment with a Container Platform and Domain Convergence

This article describes how the infrastructure team reduced new project deployment time to under an hour by combining a container platform with domain convergence, detailing the processes, automation pipelines, Kubernetes-based deployment, autoscaling, logging, and security considerations for efficient, cloud‑native operations.

Deployment Automation · Kubernetes · Operations

0 likes · 17 min read

Efficient Ops

Sep 9, 2020 · Operations

Mastering Incident Management: Core Principles and Practical Methods

This guide outlines essential incident management principles—prioritizing business restoration and timely escalation—followed by detailed methodologies such as restart, isolation, and degradation, and explains role responsibilities, user impact handling, and post‑incident summarization for continuous improvement.

Incident Management · Operations · fault handling

0 likes · 10 min read

IT Architects Alliance

Sep 8, 2020 · Operations

How to Diagnose Linux Server Performance Issues in the First 60 Seconds

This guide walks you through ten essential Linux command‑line tools—such as uptime, vmstat, iostat, and top—showing how Netflix’s performance engineers use them to quickly assess system load, resource saturation, and errors within the first minute of investigation.

Linux · Operations · Troubleshooting

0 likes · 19 min read

Efficient Ops

Sep 8, 2020 · Operations

From Firefighting to Arson: Mastering Ops Availability in Three Stages

The article outlines a three‑stage ops maturity model—firefighting, fire prevention, and arson—explains how proactive fault‑injection drills, continuous availability improvements, and aligning technical metrics with business value can transform operations from reactive responders into strategic value creators.

Fault Injection · Incident Management · Operations

0 likes · 8 min read

58UXD

Sep 7, 2020 · Operations

Designing a High‑Impact Brand‑Driven Operation Campaign on a Tight Timeline

This article details how, despite limited resources and time, a product team designed and executed the “Part‑time Gold Rush” operation—defining goals, targeting young users, building a memorable brand, applying 5W1H strategy, leveraging AARRR growth tactics, and achieving revenue and traffic targets.

AARRR · Case Study · Operations

0 likes · 9 min read

dbaplus Community

Sep 6, 2020 · Operations

Building a High‑Performance Monitoring Alert System with Akka, Dubbo, and Ignite

The article outlines G Bank’s transition from a single‑threaded commercial monitoring solution to a self‑developed alert system built on open‑source components, which uses Akka for parallel collection, Apache Dubbo for distributed processing, and Apache Ignite for in‑memory storage, achieving capacity for millions of alerts, sub‑100 ms latency, and linear scalability.

Akka · Apache Dubbo · Apache Ignite

0 likes · 17 min read

Efficient Ops

Sep 3, 2020 · Operations

What Recent Cloud and Data Center Incidents Reveal About Industry Risks?

A roundup of recent tech news covering a Cisco sabotage case, a London data‑center fire, Linux's 29th anniversary, Gartner's China ICT trends, major cloud investments, Windows 95 milestones, Didi's GPU server launch, Hainan's DNS project, Dell’Oro's market report, executive share reductions, and an upcoming global operations conference.

Cloud Computing · Data Center · GPU

0 likes · 10 min read

Efficient Ops

Sep 2, 2020 · Operations

Why Consistent Shell Script Standards Matter: A Practical Guide

This guide explains the importance of shell script coding standards, outlines core principles such as correctness, readability, maintainability, and consistency, and provides detailed recommendations on file naming, encoding, line length, indentation, comments, testing, and safe use of commands to improve script quality and reduce maintenance costs.

Operations · bash · coding standards

0 likes · 26 min read

MaGe Linux Operations

Aug 30, 2020 · Operations

How to Seamlessly Upgrade Nginx from 1.16 to 1.18 with Zero Downtime

This guide walks through verifying the existing Nginx 1.16.1 process, compiling and configuring Nginx 1.18.0 with identical options, performing a zero‑downtime binary replacement, and handling rollback procedures using signals and process management commands on a Linux server.

Linux · Nginx · Operations

0 likes · 14 min read

Architecture Digest

Aug 30, 2020 · Cloud Native

Migrating Docker Images, Containers, and Volumes: Practical Techniques

This article explains how to migrate Docker images, containers, and data volumes using save/load, export/import, and backup/restore commands, offering practical steps for offline environments, complex production services, and volume handling while highlighting the limitations of conventional approaches.

Container Migration · Operations · Volume Backup

0 likes · 7 min read

Tencent Cloud Developer

Aug 28, 2020 · Databases

Automating Data Balancing for ClickHouse Clusters on Tencent Cloud

Tencent Cloud’s managed ClickHouse service now includes an automated data‑balancing feature that, after user authorization and bandwidth configuration, creates migration plans to redistribute tables across new or decommissioned nodes, eliminating manual rebalancing, reducing operational overhead, and ensuring balanced storage during elastic scaling.

ClickHouse · Operations · data balancing

0 likes · 8 min read

FunTester

Aug 27, 2020 · Industry Insights

Is Fiddler Everywhere Worth It? A Critical Review of Features, Pricing, and Roadmap

This article critically examines Fiddler Everywhere, comparing its free and Pro editions, highlighting missing capabilities, analyzing the product roadmap and update cadence, and concluding why the tool is unlikely to become popular despite its attractive pricing.

Fiddler Everywhere · Operations · debugging

0 likes · 6 min read

Laravel Tech Community

Aug 25, 2020 · Operations

NetBox 2.9.1 Release Highlights and New Features

NetBox 2.9.1, an IP address and data center infrastructure management tool built on Django and PostgreSQL, introduces several enhancements including SLAAC address status, nested LAG support, version details on error pages, and a backward‑compatible remote authentication backend parameter.

DCIM · Django · IPAM

0 likes · 2 min read

Efficient Ops

Aug 25, 2020 · Operations

How to Build an Enterprise‑Grade Observability System and Master Incident Response

This article explains how enterprises adopting SRE can design a comprehensive observability platform—covering metrics, logs, and tracing—while also detailing effective incident response, post‑mortem practices, testing, capacity planning, automation tool development, and user‑experience focus to improve overall operational reliability.

Observability · Operations · SRE

0 likes · 17 min read

DevOps Cloud Academy

Aug 25, 2020 · Operations

A Simple Four‑Step Process for Prioritizing DevOps Work

This article outlines a practical four‑step process—Define, Scope, Experiment, Analyze—to help DevOps engineers prioritize automation tasks, assess pain points, and align improvements with business value, offering actionable guidance for effective pipeline and workflow optimization.

Automation · Operations · devops

0 likes · 6 min read

Ops Development Stories

Aug 25, 2020 · Operations

ESrally Guide: Install, Configure, and Benchmark Elasticsearch Performance

ESrally is the official Elasticsearch benchmarking tool; this guide walks through its installation prerequisites, step‑by‑step setup of Python, JDK, and Git, configuration of tracks, cars, pipelines, and challenges, and demonstrates real‑world performance comparisons across Elasticsearch versions and hardware platforms.

Benchmarking · ESrally · Elasticsearch

0 likes · 16 min read

DevOps

Aug 25, 2020 · Operations

IDCF Phase 5 DevOps Case Study: Traditional Banking Practice and Lessons Learned

This article details a month‑long DevOps case study conducted by the IDCF team on traditional banking, describing the four guiding principles, the six‑stage workflow from team formation to retrospection, the research findings across major Chinese banks, and the resulting best‑case award and future digital‑transformation discussions.

Case Study · FinTech · Operations

0 likes · 7 min read

Aikesheng Open Source Community

Aug 24, 2020 · Operations

Prometheus Data Query Basics and Practical Usage Guide

This article introduces Prometheus' query language PromQL, explains instant and range vector selectors, label matching, offset handling, storage design, common functions and aggregation operators, and provides practical advice for efficient querying and avoiding performance issues.

Operations · PromQL · Prometheus

0 likes · 13 min read

DevOps Cloud Academy

Aug 22, 2020 · Operations

Common Mistakes in DevOps Implementation and How to Avoid Them

The article outlines ten frequent pitfalls that organizations encounter when adopting DevOps—such as out‑of‑order delivery, misunderstandings of DevOps roles, lack of flexibility, speed over quality, isolated teams, unautomated databases, insufficient incident handling, limited expertise, security neglect, and team fatigue—and provides practical guidance to prevent these errors for more successful DevOps outcomes.

Automation · CI/CD · Operations

0 likes · 11 min read

DevOps Cloud Academy

Aug 20, 2020 · Operations

How DevOps Can Reduce Technical Debt During Cloud Migration

This article explains what technical debt is, why it accumulates in both development and operations, and outlines four DevOps‑driven strategies—including building cross‑functional teams, automation, containerization, and API‑centric design—to identify, track, and repay technical debt while improving cloud migration outcomes.

Automation · CI/CD · Containers

0 likes · 10 min read

Efficient Ops

Aug 19, 2020 · Operations

How End-State‑Oriented Monitoring Transforms Operations and AIOps

This article explains the concept of end‑state‑oriented monitoring, its significance for modern operations, the shortcomings of existing solutions, and a layered design approach that leverages real‑time data, service catalogs, and AI to achieve secure, stable, efficient, and low‑cost operations.

AIOps · Operations · devops

0 likes · 13 min read

Senior Brother's Insights

Aug 19, 2020 · Operations

Essential Ops Lessons: Avoid Disasters with Backups, Monitoring, and Secure Practices

This guide shares hard‑earned lessons from real‑world server administration, emphasizing careful testing, confirming commands before execution, limiting simultaneous operators, always backing up configurations, protecting data, tightening SSH and firewall security, implementing comprehensive monitoring, and applying disciplined performance‑tuning practices to maintain stable, reliable services.

Operations · Performance Tuning · backup

0 likes · 12 min read

dbaplus Community

Aug 17, 2020 · Operations

Master Server Troubleshooting: Diagnose, Optimize, and Keep Your Backend Stable

This article shares practical experience on backend troubleshooting, outlining common failure types, a step‑by‑step diagnosis workflow, essential tools, and systematic optimization techniques for performance, stability and maintainability, helping engineers quickly stop losses, pinpoint root causes, and implement robust fixes.

Operations · backend · maintainability

0 likes · 21 min read

Open Source Linux

Aug 17, 2020 · Operations

Step-by-Step Guide to Install and Configure Zabbix on CentOS 7

This tutorial walks you through installing Zabbix on CentOS 7, covering prerequisite disabling of SELinux and firewalls, adding repositories, installing server, web, and database components, configuring files, securing MariaDB, starting services, and completing the web‑based setup with language customization.

CentOS · Installation · Linux

0 likes · 7 min read

FunTester

Aug 15, 2020 · Operations

Why Quality Management Is Critical for Project Success

This article explains the importance of quality management in projects, outlines its two main dimensions—process quality and product quality—details the multiple benefits of systematic quality control, and provides an eight‑step framework for creating an effective quality management plan.

Operations · Project Management · QA

0 likes · 5 min read

Alibaba Cloud Native

Aug 15, 2020 · Operations

Master Alibaba Cloud ROS Templates: A Step‑by‑Step Guide to Automate Resource Orchestration

This guide walks you through the challenges of rapid system provisioning, explains the benefits of Alibaba Cloud Resource Orchestration Service (ROS), and provides detailed, IDE‑integrated steps for creating, managing, and deploying ROS templates and resource stacks efficiently.

Alibaba Cloud · Automation · IDE

0 likes · 11 min read

Zhongtong Tech

Aug 14, 2020 · Operations

How Intelligent Routing Is Revolutionizing Logistics Operations with AI and Big Data

At the 9th China Logistics Technology Summit, ZTO Express’s Le Aihua explained how AI‑driven intelligent routing and big‑data analytics are reshaping logistics networks, boosting efficiency, cutting costs, and positioning logistics as a critical infrastructure in the post‑pandemic era.

AI · Intelligent Routing · Operations

0 likes · 4 min read

Yanxuan Tech Team

Aug 14, 2020 · Operations

How to Build a Robust Event‑Tracking Management Platform for Scalable Data Quality

This article explains how a comprehensive event‑tracking management platform can streamline definition, offline and online assurance, automate testing, and monitor data quality across multiple client platforms, improving collaboration and reducing errors in fast‑growing business environments.

Operations · Platform · Testing

0 likes · 13 min read

IT Architects Alliance

Aug 13, 2020 · Operations

How Dada Scaled Its Log System to 130 Billion Daily Entries with Kubernetes and Storm

This article details how Dada built a Kubernetes‑mixed log platform that handles over 130 billion logs per day, stores more than 14 TB daily, and maintains a 300 TB total volume by automating collection with Filebeat, parsing with Storm, and optimizing Elasticsearch with hot‑cold nodes.

Elasticsearch · Kubernetes · Operations

0 likes · 12 min read

DevOps Cloud Academy

Aug 13, 2020 · Operations

Integrating DevOps Toolchains for Enterprise‑Scale End‑to‑End Communication and Collaboration

The article explains how integrating DevOps toolchains can achieve enterprise‑scale end‑to‑end communication and collaboration without forcing teams to change their workflows, discusses common bottlenecks, presents unified versus loosely‑coupled integration approaches, and offers practical recommendations for building an inclusive, interconnected DevOps ecosystem.

Collaboration · Enterprise · Operations

0 likes · 10 min read

DevOps Cloud Academy

Aug 12, 2020 · Operations

10 International Companies That Successfully Transformed to DevOps in 2020

This article reviews ten well‑known enterprises—including Adidas, Capital One, Verizon, Disney, and Starbucks—that have undertaken large‑scale DevOps and cloud‑native transformations, detailing the challenges they faced, the cultural and technical changes implemented, and the measurable business benefits achieved.

Case Study · Operations · cloud-native

0 likes · 13 min read

Efficient Ops

Aug 11, 2020 · Operations

How Multi‑Cloud Disaster Recovery Boosts Site Availability: Lessons from Real‑World DR Drills

This article shares a detailed case study of building multi‑cloud site disaster‑recovery and fault‑drill practices at Kaixin Network, covering high‑availability concepts, architectural redesign, pain points, automated one‑click switching, and future self‑healing with chaos engineering to improve reliability.

Disaster Recovery · High Availability · Multi-Cloud

0 likes · 15 min read

Full-Stack Internet Architecture

Aug 11, 2020 · Operations

Diagnosing a Slow Production Server with Linux Commands (top, vmstat, free, df, iostat, sar)

This guide explains how to analyze a sluggish production server by examining overall system metrics, CPU usage, memory, swap, disk space, and I/O using common Linux commands such as top, vmstat, free, df, iostat, and sar, with practical interpretation tips for each output field.

Linux · Operations · Server Diagnostics

0 likes · 8 min read

Java Architect Essentials

Aug 11, 2020 · Operations

Four Essential Linux Monitoring Tools for Operations Engineers

This article introduces four widely used Linux monitoring tools—iotop, htop, IPTraf, and Monit—explaining their features, usage scenarios, and how they help operations engineers diagnose performance issues without a GUI, including real‑time I/O tracking, visual CPU/memory graphs, network traffic analysis, and flexible alerting.

Htop · IPTraf · Linux

0 likes · 7 min read

IT Architects Alliance

Aug 6, 2020 · Operations

Eight Essential Steps for Successful Disaster Recovery Drills

This guide outlines eight practical steps—including defining scope, forming a planning team, setting clear objectives, designing realistic scenarios, creating evaluation checklists, assigning roles, conducting pre‑drill briefings, and performing post‑drill reviews—to help organizations execute effective, repeatable disaster recovery exercises that strengthen business continuity.

Best Practices · Disaster Recovery · Operations

0 likes · 9 min read

Java Backend Technology

Aug 4, 2020 · Operations

Which Tech Companies Actually Follow a 9‑5‑5 Work‑Life Balance?

This article explains the 955 work‑life‑balance concept, contrasts it with the 996 model, and provides a curated list of tech companies that are generally considered to follow a 9‑5‑5 schedule, while noting regional and departmental variations.

955 schedule · Operations · Tech Companies

0 likes · 5 min read

Cloud Native Technology Community

Jul 31, 2020 · Industry Insights

Why DevOps Is Hard and How the Agile Flow Model Can Help Teams

This article analyzes why DevOps adoption often fails, explores the Agile Flow Model with its four fluency intervals, outlines the necessary organizational investments and challenges for each stage, and presents real‑world DevOps toolchain case studies to guide effective implementation.

Agile · Operations · Organizational Change

0 likes · 17 min read

转转QA

Jul 31, 2020 · Operations

Design and Implementation of a Real-Time Log Collection and Query System for Distributed Deployment

The article describes the challenges of troubleshooting distributed deployments across many machines and presents a solution built on the ELK stack that centralizes logs from Java and Go services, enabling near‑real‑time search, visualization, and faster issue resolution.

Operations · distributed systems · log collection

0 likes · 5 min read

StarRing Big Data Open Lab

Jul 28, 2020 · Operations

How DevOps and SRE Transform Modern Software Delivery and Operations

This article explains the evolution from traditional C/S to B/S architectures, compares DevOps and SRE principles, discusses their roles in the container and cloud eras, and showcases StarRing's TDC platform that integrates automated pipelines, monitoring, and deployment for efficient software delivery.

Cloud Computing · Continuous Integration · Operations

0 likes · 14 min read

Xianyu Technology

Jul 28, 2020 · Operations

ShenTan: Automated Fault Localization System for Online Services

ShenTan is an automated fault‑localization platform for online services that quickly (under five seconds) pinpoints server‑side issues with developer‑level accuracy by aggregating real‑time metrics, applying a decision‑tree model enriched by expert knowledge and dynamic thresholds, and presenting results through an integrated alert and visualization system, while planning broader endpoint coverage and multi‑tenant support.

Automation · Big Data · Decision Tree

0 likes · 12 min read

High Availability Architecture

Jul 28, 2020 · Operations

Tech Migrations the Spotify Way: A Three‑Step Strategy to Reduce Fragmentation

Spotify shares a three‑step, product‑focused approach—prioritization, productized migration, and automation—to streamline large‑scale technical upgrades, avoid fragmented legacy systems, and keep engineering teams focused on core business value.

Automation · Operations · Product Management

0 likes · 9 min read

Laravel Tech Community

Jul 27, 2020 · Operations

Detailed Nginx Configuration Parameters and Multi‑Server Load Balancing Guide

This article provides a comprehensive walkthrough of Nginx configuration directives, explains key parameters such as worker processes, logging, gzip, and demonstrates how to set up multi‑node load balancing with various upstream algorithms using Docker containers and host file adjustments.

Docker · Nginx · Operations

0 likes · 12 min read
