Tagged articles
3281 articles
Efficient Ops
Nov 18, 2021 · Operations

Latest DevOps Maturity Assessment Results Reveal Top Companies and New 2+ Level Standards

The 2021 GOPS Global Operations Conference in Shanghai announced the latest DevOps capability maturity assessment results, detailing the enterprises that achieved continuous delivery level 3 and technical operation level 2+, explaining the new 2+ grading, and outlining the DevOps maturity model and its industry adoption.

Capability Maturity · Continuous Delivery · DevOps
0 likes · 6 min read
vivo Internet Technology
Nov 17, 2021 · Operations

Design and Architecture of a Unified Alert Convergence System for Monitoring

The paper presents a unified alert convergence system that centralizes metric calculation, detection, and alarm handling across monitoring subsystems. It combines convergence, claiming, silencing, and escalation mechanisms with a Redis‑based delayed queue, integrates via Kafka or REST, and aims to reduce alarm fatigue, improve MTTA/MTTR, and lay the groundwork for AIOps.

MTTA · MTTR · Operations
0 likes · 18 min read
58UXD
Nov 17, 2021 · Operations

How 58 Daojia Scaled Service Center Design Across Hundreds of Stores

This article details the design principles, brand‑value strategies, quality control, and cost‑saving measures used to launch the first 58 Daojia premium service center and expand the concept to nearly a hundred physical stores nationwide.

Operations · Project Management · Service Center
0 likes · 9 min read
Open Source Linux
Nov 16, 2021 · Databases

How to Stress Test Redis with redis-benchmark: A Quick Guide

This guide explains how to use Redis's built-in redis-benchmark tool to simulate concurrent client load, interpret key performance metrics such as request latency and throughput, and monitor server resource usage, helping operators prevent cache-related failures like penetration and avalanche after deployment.

Operations · Performance Testing · benchmark
0 likes · 3 min read
DevOps
Nov 16, 2021 · Operations

Key Strategies and Recommendations for Successful Enterprise Digital Transformation

The article outlines how enterprises can assess digital transformation outcomes, formulate effective strategies, build large‑scale capabilities, foster agile culture, and continuously monitor progress, drawing on McKinsey research and real‑world examples to guide traditional firms toward sustainable digital growth.

Big Data · Digital Transformation · Operations
0 likes · 17 min read
58UXD
Nov 15, 2021 · Operations

How Strategic Visual Design Boosts E‑commerce Campaign Performance

This article examines how thoughtfully crafted main visuals influence user engagement and sales in e‑commerce campaigns, presenting four case studies from the “Super Welfare Day” series that illustrate design background, strategy, visual style, implementation, and measurable results such as an 85.2% GMV lift.

Design Thinking · Operations · campaign strategy
0 likes · 8 min read
Open Source Linux
Nov 14, 2021 · Operations

How to Quickly Identify Disk Space Hogs on Linux Servers

Learn step-by-step Linux techniques—including df, du, find, and lsof commands—to pinpoint large directories or files, filter results, handle hidden space consumption, and adjust reserved filesystem space, ensuring you can efficiently resolve unexpected disk usage issues on your servers.

Linux · Operations · df
0 likes · 4 min read
IT Architects Alliance
Nov 11, 2021 · Operations

Design and Implementation of a TB‑Scale Log Monitoring System Using the ELK Stack

This article explains how to build a terabyte‑level log monitoring platform for micro‑service environments by unifying log collection with FileBeat, enriching observability through Elastic APM, processing streams via Kafka Streams, and visualizing metrics with Grafana and Kibana, while addressing cost‑effective filtering and retention strategies.

ELK Stack · Grafana · Log Monitoring
0 likes · 8 min read
DevOps
Nov 8, 2021 · Operations

Digital Transformation vs. IT Transformation: Key Differences and How They Should Interact

The article explains that digital transformation is a customer‑driven, end‑to‑end business overhaul distinct from IT transformation, which focuses on technology, highlighting three major differences, the risks of conflating the two, and why digital transformation should ultimately drive IT transformation for lasting competitive advantage.

Business strategy · Digital Transformation · IT transformation
0 likes · 9 min read
Liangxu Linux
Nov 7, 2021 · Operations

How to Quickly Identify Disk Space Hogs on Linux Servers

This guide explains how to diagnose unexpected disk usage on Linux by using df, du, find, and lsof commands, demonstrates efficient ways to locate large directories or deleted files, and shows how to adjust reserved space with tune2fs to reclaim lost storage.

Linux · Operations · disk space
0 likes · 5 min read
ITFLY8 Architecture Home
Nov 4, 2021 · Operations

Mastering Service Degradation: Strategies to Keep Systems Available

Service degradation involves strategically reducing or disabling non‑essential features during traffic spikes or failures to maintain core functionality, covering concepts like SLA levels, fallback data, rate‑limiting, timeout handling, circuit breaking, and front‑end and back‑end downgrade techniques for high‑availability systems.

Operations · SLA · fallback data
0 likes · 14 min read
Alibaba Cloud Native
Nov 2, 2021 · Cloud Native

How to Spot Load‑Balancing, Scheduling, and Hotspot Issues with Kubernetes Monitoring

This article explains how to use Kubernetes monitoring features such as service details, topology maps, and pod metrics to quickly identify load‑balancing imbalances, cluster scheduling bottlenecks, and resource hotspot problems, providing practical steps and visual examples for improving system reliability and performance.

Cloud Native · Kubernetes · Operations
0 likes · 10 min read
Efficient Ops
Nov 1, 2021 · Operations

How AIOps Is Empowering Enterprise Digital Transformation

The article explains how AIOps, built on DevOps principles and leveraging AI and big‑data analytics, helps enterprises overcome governance challenges, improve operational efficiency, and accelerate digital transformation, highlighting standards, real‑world evaluations, and key benefits such as real‑time analysis and noise reduction.

DevOps · Digital Transformation · IT Governance
0 likes · 7 min read
ITFLY8 Architecture Home
Nov 1, 2021 · Operations

Mastering Service Degradation: Strategies to Keep Your System Available Under Load

Service degradation, a crucial reliability technique, involves selectively disabling non-essential features, applying rate limiting, timeout handling, fallback data, and tiered switches across front‑end, back‑end, and infrastructure layers to maintain core functionality during traffic spikes or component failures, ensuring high availability and meeting SLA targets.

Fallback · Operations · Reliability
0 likes · 13 min read
IT Architects Alliance
Oct 31, 2021 · Operations

How to Build a Highly Available Redis Service with Sentinel – A Practical Guide

This article explains why Redis needs high availability, defines common failure scenarios, compares several HA architectures—including single‑instance, master‑slave with one or multiple Sentinel processes, and VIP‑based solutions—and provides step‑by‑step guidance for deploying a robust Redis Sentinel cluster.

Backend · Operations · architecture
0 likes · 13 min read
YunZhu Net Technology Team
Oct 29, 2021 · Operations

Digital Transformation and the Role of Time Tracking in Management Decision‑Making

The article explains how digital transformation requires lean management and data‑driven decision making, using time‑tracking tools like TAPD to capture work hours, analyze costs, and link talent utilization to project outcomes for improved operational efficiency and transparent business guidance.

Digital Transformation · Lean Management · Operations
0 likes · 3 min read
58UXD
Oct 29, 2021 · Operations

How the CST Model Boosts User Conversion: A Design Case Study

This article examines how applying the CST design model, user segmentation, and psychological principles such as mental accounting and social proof can significantly improve conversion rates for a savings membership product.

AB testing · CST model · Operations
0 likes · 7 min read
Huolala Tech
Oct 29, 2021 · Operations

How Huolala Guarantees Cloud‑Native Stability at Scale

In this detailed account of Huolala's 2021 Cloud Operations Best Practices talk, the company shares its multi‑cloud architecture, service‑oriented governance, capacity‑testing, monitoring, and risk‑prediction techniques that together ensure high‑availability and efficient scaling for its diverse logistics services.

Operations · capacity testing · monitoring
0 likes · 17 min read
Alibaba Terminal Technology
Oct 25, 2021 · Cloud Native

How Alibaba’s AServer Gateway Evolved to a Cloud‑Native Architecture

Alibaba’s AServer access gateway, which handles billions of users and millions of QPS, moved from a monolithic Tengine‑based system to a cloud‑native, containerized architecture built on Kubernetes, Pilot, and Envoy, reducing operational complexity and improving dynamic routing, traffic isolation, and scalability for massive e‑commerce traffic.

Cloud Native · Kubernetes · Operations
0 likes · 17 min read
Efficient Ops
Oct 22, 2021 · Operations

How Zhengzhou Bank Boosted Delivery Speed 2.3× with DevOps Standard Assessment

Zhengzhou Bank’s new retail loan system passed the third‑level DevOps continuous‑delivery assessment, leading to a 2.31‑fold increase in annual delivery demand, an 18‑day reduction in cycle time, and a shift to automated, seconds‑level environment delivery, illustrating the transformative power of standardized DevOps practices.

Banking · Continuous Delivery · DevOps
0 likes · 12 min read
Efficient Ops
Oct 22, 2021 · Operations

What Do 42 Companies Reveal About DevOps Maturity in China?

The DevOps International Summit in Beijing announced that 42 enterprises covering 108 projects achieved level‑3 maturity in the CAICT DevOps Capability Model, highlighting the impact of standardized tools and processes on software delivery efficiency across finance, telecom and other sectors.

China · DevOps · Maturity Model
0 likes · 6 min read
ByteFE
Oct 20, 2021 · Operations

Troubleshooting DNS Resolution Failure of goofy.app in Singapore Office Due to DNSSEC Misconfiguration

After users in Singapore reported that the internal domain goofy.app would not resolve, a systematic investigation traced the failures to a DNSSEC misconfiguration, specifically an incorrect DS record, which broke resolution everywhere except on DNS servers in China, where DNSSEC validation was disabled. Removing the faulty key resolved the issue.

DNSSEC · Domain Resolution · Operations
0 likes · 8 min read
Baidu Geek Talk
Oct 20, 2021 · Operations

Practical Strategies for Building High‑Availability Systems

This article presents a comprehensive, step‑by‑step guide on improving system reliability through early fault detection, scope reduction, frequency reduction, and rapid incident handling, using real‑world practices from Baidu's commercial hosting platform.

Log Standardization · Operations · capacity planning
0 likes · 20 min read
Zhongtong Tech
Oct 19, 2021 · Operations

Transforming Load Testing at ZTO: From Offline Pitfalls to Safe Full‑Chain Online Testing

This article details ZTO's evolution from traditional offline and online load‑testing approaches—highlighting their shortcomings—to a comprehensive full‑chain performance testing solution that uses JavaAgent probes, shadow resources, and a structured deployment and verification process to ensure safe, accurate production testing.

Load Testing · Operations · Performance Testing
0 likes · 17 min read
360 Tech Engineering
Oct 15, 2021 · Operations

Log Collection Architecture Using Filebeat, Logstash, and Kafka

This article describes a lightweight, resource‑efficient log collection solution that combines Filebeat agents, optional Logstash aggregation, and Kafka transport, detailing configuration choices, meta‑persistence, back‑pressure mechanisms, monitoring setup, and deployment architecture for reliable at‑least‑once delivery.

Filebeat · Logstash · Operations
0 likes · 14 min read
DataFunTalk
Oct 15, 2021 · Artificial Intelligence

Risk Control and Operations for Existing Credit Customers: Models, Strategies, and Practices

This article examines how financial institutions can manage risk and improve operations for existing loan customers by analyzing client flow, regulatory impacts, accelerated deterioration, and layered segmentation, and by applying advanced models such as rule‑based alerts, B‑card scoring, LSTM, and survival analysis to enable timely risk detection and targeted cross‑selling.

Customer Segmentation · Operations · financial modeling
0 likes · 20 min read
IT Architects Alliance
Oct 14, 2021 · Operations

How to Build a TB‑Scale Log Monitoring System with ELK Stack

This article explains how to design and implement a TB‑level log monitoring platform for micro‑service environments using ELK Stack, Filebeat, Elastic APM, Kafka Streams, Prometheus, and Grafana, covering data collection, filtering, storage, and visualization while addressing cost and resource constraints.

ELK · Filebeat · Grafana
0 likes · 9 min read
DevOps
Oct 12, 2021 · Operations

Gray Release (Canary Deployment): Concepts, Benefits, and Implementation Guide

This article explains what gray release (canary deployment) is, why it is needed to reduce risk and improve product quality, and provides a step‑by‑step guide covering strategy, user targeting, data feedback, rollback, deployment architectures, and version management for modern software operations.

Operations · Version Control · canary deployment
0 likes · 13 min read
Open Source Linux
Oct 11, 2021 · Operations

10 Essential Ops Principles Every Engineer Should Follow

This article shares ten practical operations guidelines—from avoiding duplicated work and embracing mistakes to emphasizing monitoring, backup roles, clear division of labor, and continuous improvement—aimed at boosting reliability, efficiency, and team cohesion for both engineers and managers.

Operations · Reliability · best practices
0 likes · 10 min read
Open Source Linux
Oct 10, 2021 · Operations

Essential Linux Command-Line Tools to Boost Your Productivity

This article presents a curated list of powerful Linux command-line utilities—ranging from fast file searchers and interactive Git viewers to system monitors and multi‑threaded downloaders—each explained with concise descriptions and usage examples to help developers and sysadmins work more efficiently.

Operations · Sysadmin · command-line
0 likes · 5 min read
HaoDF Tech Team
Oct 8, 2021 · Operations

Understanding SRE: Foundations, Metrics, and Tackling Technical Debt

This article introduces the fundamentals of Site Reliability Engineering (SRE), explains how to measure service stability with metrics like MTTR, MTBF, and availability, outlines the SRE workflow from prevention to post‑mortem, and discusses how to identify and reduce technical debt to improve system health.

Operations · Reliability · SRE
0 likes · 18 min read
dbaplus Community
Oct 7, 2021 · Databases

How to Measure and Eliminate Slow SQL in Large‑Scale MySQL Deployments

This article explains what MySQL slow queries are, why they cause system failures, proposes multi‑dimensional metrics to assess their severity, outlines concrete guidelines and change standards, and shares real‑world optimization cases and daily operational practices for eliminating slow SQL.

Database Performance · Metrics · Operations
0 likes · 13 min read
Top Architect
Oct 7, 2021 · Backend Development

Essential Linux Commands and Java Debugging Tools for Backend Engineers

This article compiles a practical set of Linux command examples and Java debugging utilities—including tail, grep, awk, find, tsar, btrace, Greys, Arthas, JProfiler, and various JVM tools—to help backend developers quickly diagnose and resolve performance and stability issues in production environments.

Linux · Operations · debugging
0 likes · 13 min read
IT Architects Alliance
Oct 1, 2021 · Operations

Understanding Service Degradation: Definitions, Levels, and Mitigation Strategies

The article explains service degradation concepts, defines SLA levels and the meaning of six nines, and details various degradation techniques such as fallback data, rate‑limiting, timeout, fault handling, read/write strategies, frontend safeguards, and the use of switches and pre‑embedding to maintain system availability during traffic spikes or failures.

Fallback · Operations · SLA
0 likes · 12 min read
Continuous Delivery 2.0
Sep 30, 2021 · Operations

Key Findings from the 2021 DORA DevOps Report: SRE Practices, Documentation, Security, and Culture

The 2021 DORA DevOps Report reveals that elite teams outperform low‑performing teams by adopting SRE principles, high‑quality documentation, integrated security, modern technical practices such as loose coupling, continuous testing, CI/CD, and a performance‑driven culture that fosters belonging and inclusion.

Culture · Operations · SRE
0 likes · 19 min read
Liangxu Linux
Sep 28, 2021 · Operations

Top Linux CLI Tools for Real‑Time Network Bandwidth Monitoring

This article surveys a collection of Linux command‑line utilities that can monitor overall, per‑interface, per‑socket, and per‑process network bandwidth, explaining how each tool works, what data it reports, and how to install it on major distributions.

Linux · Network Monitoring · Operations
0 likes · 10 min read
Open Source Linux
Sep 27, 2021 · Operations

Step-by-Step Guide to Installing Zabbix 5 on CentOS 7

This article provides a comprehensive, hands‑on tutorial for installing and configuring Zabbix 5 on CentOS 7, covering system overview, key terminology, disabling SELinux and firewalls, setting up repositories, installing server, agent, frontend, MariaDB, database initialization, configuration tweaks, and final web‑UI setup.

CentOS · Installation · Operations
0 likes · 9 min read
Programmer DD
Sep 27, 2021 · Operations

How a Rural County Built China’s Dominant Copy‑Printing Empire

This article traces the emergence and evolution of Newhua County’s copy‑printing industry—from 1960s typewriter repairs to a nationwide network of repair shops, second‑hand markets, and equipment manufacturing—highlighting its social roots, ladder‑style development, research methods, key findings, and lasting impact on China’s office‑equipment sector.

China · Newhua · Operations
0 likes · 25 min read
Efficient Ops
Sep 23, 2021 · Operations

Why Did Our New Deployment Crash? Uncovering Metaspace‑Induced Full‑GC

The article recounts a staged rollout of the Maybach service on elastic cloud, details the timeline of successful and failing deployments, analyzes JVM metrics revealing excessive Metaspace usage that triggered continuous full garbage collections, and explains how this caused system‑wide timeouts and a half‑hour outage.

Full GC · JVM · Metaspace
0 likes · 10 min read
Efficient Ops
Sep 23, 2021 · Operations

How Leading Chinese Insurers Achieved DevOps Maturity: Case Studies and Insights

This article examines how three major Chinese insurance firms applied the CAICT DevOps Capability Maturity Model to improve IT efficiency, integrate teams, and accelerate continuous delivery, highlighting architectural innovations, cloud adoption, and measurable performance gains across distributed core systems, e‑commerce platforms, and agricultural claims solutions.

Continuous Delivery · DevOps · Insurance
0 likes · 9 min read
Liangxu Linux
Sep 22, 2021 · Cloud Native

Master Dockerfile: Complete Guide to All Instructions and Best Practices

This article provides a comprehensive, step‑by‑step explanation of every Dockerfile instruction—including variables, FROM, RUN, CMD, LABEL, EXPOSE, ENV, ARG, ADD, COPY, ENTRYPOINT, VOLUME, STOPSIGNAL, HEALTHCHECK, SHELL, WORKDIR, and USER—along with syntax details, usage tips, and practical code examples for building efficient container images.

Container · Docker · Dockerfile
0 likes · 12 min read
Efficient Ops
Sep 22, 2021 · Operations

Master Advanced kubectl Tricks: Debug, Filter, and Automate Kubernetes Pods

This article shares a collection of powerful kubectl commands and techniques—including API debugging, status‑based pod filtering and deletion, node‑specific pod listing, pod distribution statistics, and proxy usage—to help Kubernetes operators work more efficiently and avoid manual API scripting.

CLI · DevOps · Operations
0 likes · 7 min read
DevOps Cloud Academy
Sep 21, 2021 · Operations

Practical Elasticsearch Operations and Performance Tuning Guide

This article extends previous Elasticsearch cheat sheets with practical commands and step‑by‑step instructions for shard allocation, replica adjustment, cluster settings, slow‑log configuration, mapping routing, force merge, bulk writes, refresh intervals, translog durability, heap sizing, disk‑space monitoring, and troubleshooting strategies.

Cluster Management · Elasticsearch · Operations
0 likes · 7 min read
Efficient Ops
Sep 16, 2021 · Operations

How Chinese Banks Are Accelerating Digital Transformation with DevOps Maturity

This article reviews the China Academy of Information and Communications Technology's DevOps Capability Maturity Model, shows how major state‑owned banks have participated in 39 assessments, and presents detailed case studies illustrating each bank's DevOps adoption, challenges, and outcomes.

Banking · Capability Maturity Model · DevOps
0 likes · 11 min read
Efficient Ops
Sep 15, 2021 · Operations

How China’s Telecom Giants Accelerate Efficiency with the DevOps Maturity Model

This article details how leading Chinese telecom operators have adopted the CAICT‑led DevOps Capability Maturity Model, evaluating 17 projects across multiple companies to improve IT efficiency, integrate resources, and support business systems, showcasing concrete performance gains and best‑practice insights.

Continuous Delivery · DevOps · Maturity Model
0 likes · 15 min read
Java Architect Essentials
Sep 14, 2021 · Operations

Graceful Service Startup and Shutdown for Microservices with Spring Boot and Docker

This article explains how to implement graceful shutdown and startup for microservices using JVM shutdown hooks, Spring Boot's built‑in mechanisms, Docker stop signals, and external containers like Jetty, providing code examples and best‑practice recommendations for ensuring services deregister, reject traffic, and start only after health checks succeed.

Docker · GracefulShutdown · Microservices
0 likes · 10 min read
Efficient Ops
Sep 14, 2021 · Operations

How China’s Leading Banks Achieve DevOps Maturity: Real‑World Case Studies

This article examines how major Chinese state‑owned banks applied the CAICT DevOps Capability Maturity Model to improve IT efficiency, integrate resources, and support business systems, detailing assessment numbers, project implementations, challenges, and outcomes across continuous delivery, security, and toolchain standards.

Banking · Continuous Delivery · DevOps
0 likes · 14 min read
Architect's Alchemy Furnace
Sep 11, 2021 · Operations

Mastering Arthas: A Practical Guide to Java Runtime Debugging and Monitoring

This article introduces Arthas, a Java online diagnostic tool, explains its instrumentation‑based runtime principle, guides installation on various platforms, and provides a comprehensive command reference—including basic, system, class, and enhancement commands—for effective debugging, monitoring, and performance analysis of Java applications.

Arthas · Instrumentation · Operations
0 likes · 10 min read
Alibaba Terminal Technology
Sep 10, 2021 · Mobile Development

How Taobao Overhauled Mobile Diagnostics to Achieve 5‑15‑60 SLA

Taobao redesigned its mobile client’s diagnostics and logging architecture—introducing scenario‑based monitoring, standardized log protocols, snapshot collection, and change‑tracking SDKs—to meet a 5‑minute response, 15‑minute identification, and 60‑minute recovery goal, dramatically improving issue detection, analysis, and resolution efficiency.

Operations · client-side · log system
0 likes · 17 min read
Efficient Ops
Sep 9, 2021 · Operations

How a Chinese Consumer Finance Firm Boosted Efficiency with DevOps – Level‑3 Assessment

In a detailed interview, Henan Zhongyuan Consumer Finance explains how its new generation consumer loan system achieved the industry‑first Level‑3 DevOps continuous delivery assessment, highlighting the standards, tools, performance metrics, challenges overcome, and future plans that together illustrate the transformative power of standardized DevOps practices.

Continuous Delivery · DevOps · Operations
0 likes · 12 min read
Efficient Ops
Sep 9, 2021 · Operations

How CITIC Securities Boosted Efficiency with DevOps: A Deep Dive into Their Level‑3 Assessment

CITIC Securities’ CIO Xiao Gang discusses how their outsourced service platform achieved Level‑3 DevOps continuous delivery assessment, detailing the motivations, implementation challenges, measurable improvements, and future plans, while highlighting the broader significance of the national DevOps maturity model for the financial sector.

Continuous Delivery · DevOps · Digital Transformation
0 likes · 11 min read
Efficient Ops
Sep 9, 2021 · Operations

How Haitong Securities Boosted Efficiency with DevOps Standard Evaluation

The interview reveals how Haitong Securities leveraged the national DevOps standard assessment to transform its software development, achieving level‑3 continuous delivery maturity, accelerating release cycles, improving quality, and outlining future DevSecOps and industry‑specific standardization plans.

Continuous Delivery · DevOps · Operations
0 likes · 11 min read
Efficient Ops
Sep 9, 2021 · Operations

How China Construction Bank’s FinTech Arm Earned Top Marks in the National DevOps Standard

The article details how JiAnXin FinTech’s YaoGuang Agile Development Platform achieved an excellent rating in China’s first national DevOps standard evaluation, sharing interview insights on platform architecture, the importance of end‑to‑end toolchains, future DevOps trends, and the tangible benefits realized after the assessment.

Continuous Delivery · DevOps · FinTech
0 likes · 12 min read
Open Source Linux
Sep 4, 2021 · Operations

How to Use nologin to Block User Logins on Linux

This guide explains how the Linux nologin command can politely deny user logins, log attempts, and provides multiple methods—including command-line usage, password locking, and /etc/passwd modifications—to restrict login access for specific or all users during system maintenance.

Linux · Operations · System Administration
0 likes · 3 min read
HelloTech
Sep 2, 2021 · Operations

How Production Full‑Link Load Testing Guarantees High Availability at Scale

The article explains why large‑scale services must conduct production full‑link load testing, describes its evolution from ad‑hoc trials to standardized monthly practices, and details the technical and procedural steps—including traffic modeling, JMeter usage, middleware tagging, and responsibility mapping—that ensure reliable capacity planning and risk mitigation.

Microservices · Operations · capacity planning
0 likes · 13 min read
Liangxu Linux
Aug 29, 2021 · Operations

Boosting a Python Service to 50k QPS: My Step‑by‑Step Performance Tuning

Through a detailed case study, the author documents the process of optimizing a Python‑based web module—identifying bottlenecks, redesigning architecture with Redis queues, tuning MySQL, adjusting Linux TCP settings, and iteratively load‑testing until achieving 50,000 QPS with sub‑70 ms latency and zero errors.

Backend · Operations · Python
0 likes · 9 min read
JD Retail Technology
Aug 24, 2021 · Operations

Key Metrics and Process for Lean Value Stream Analysis

The article explains how lean value‑stream analysis uses meaningful metrics such as lead time, process time and percent complete & accurate, outlines a step‑by‑step workflow for mapping and evaluating value streams, and demonstrates the approach with a department‑level case study and radar‑chart analysis.

Lean · Operations · Value Stream
0 likes · 6 min read
Efficient Ops
Aug 23, 2021 · Operations

Master HAProxy: Build High‑Performance L7/L4 Load Balancers & HA Clusters

This guide introduces HAProxy, an open‑source L4/L7 load balancer, and walks through its core features, performance and stability characteristics, step‑by‑step installation on CentOS 7, configuration of both L7 and L4 balancing, monitoring, and setting up high‑availability with Keepalived.

HAProxy · Linux · Operations
0 likes · 27 min read
IT Architects Alliance
Aug 21, 2021 · Operations

Mastering Nginx: From Basics to Advanced Load Balancing and Rate Limiting

This article explains what Nginx is, why it’s chosen for high‑performance reverse proxy and load balancing, walks through its event‑driven architecture, core configuration directives, virtual host setups, location regex rules, static‑dynamic separation, rate‑limiting techniques, load‑balancing algorithms, high‑availability settings and practical code examples.

Configuration · Nginx · Operations
0 likes · 19 min read
58UXD
Aug 20, 2021 · Operations

How the Ganjian Salary Wish Festival Boosted User Engagement

This article analyzes the Ganjian Salary Wish Festival as a case study of operational marketing, exploring industry insights, audience targeting, brand messaging, benefit‑driven conversion, interactive game design, and data results to reveal how such activities can sustainably retain users beyond simple incentives.

Marketing · Operations · case study
0 likes · 5 min read
Architects' Tech Alliance
Aug 16, 2021 · Operations

The Evolution, Types, and Pitfalls of Enterprise Mid‑Platform Architecture

This article traces the history of the Chinese "mid‑platform" concept, outlines how major tech firms implement various middle‑platform strategies, distinguishes front‑end, back‑end, and middle layers, categorizes platform types, and highlights common pitfalls and organizational challenges in building such platforms.

Operations · business architecture · enterprise architecture
0 likes · 12 min read
Efficient Ops
Aug 11, 2021 · Operations

Scaling Kubernetes Clusters: Node Quotas, Kernel Tweaks & Etcd Tips

This guide outlines how to prepare large‑scale Kubernetes clusters on public clouds by increasing node quotas, adjusting kernel parameters, configuring high‑availability etcd with the etcd‑operator, tuning kube‑apiserver settings, and applying pod‑level best practices for resource limits and affinity.

Kernel Tuning · Operations · cluster scaling
0 likes · 8 min read
DevOps
Aug 11, 2021 · Operations

Introduction to Chaos Engineering – Part 2: Four Steps for Disrupting Complex Systems

This article explains that chaos engineering is not a magic cure but a disciplined practice for testing distributed systems by designing and running controlled experiments, outlining four essential steps—observability, defining steady state, hypothesizing events, and executing experiments—to gain confidence in system resilience.

Operations · chaos engineering · experimentation
0 likes · 11 min read
DevOps
Aug 9, 2021 · Operations

Microsoft Digital: Internal IT Transformation and Operational Excellence

Microsoft Digital describes how Microsoft’s internal IT organization, renamed from CSEO to Microsoft Digital, drove a comprehensive digital transformation by migrating operations to Azure, adopting cloud‑centric architecture, implementing DevOps, enhancing security, data, and AI capabilities, and aligning vision‑driven priorities to boost productivity, customer focus, and business outcomes.

Data Analytics · Digital Transformation · Information Security
0 likes · 20 min read
Alibaba Cloud Native
Aug 6, 2021 · Operations

Scaling Chaos Engineering at Qunar: Lessons from Thousands of Microservices

Qunar shares how it built a large‑scale chaos engineering platform for thousands of microservices, detailing tool selection, architecture, evolution stages, fault‑injection scenarios, strong/weak dependency automation, open‑source contributions, and future plans for automated random drills.

Cloud Native · Fault Injection · Operations
0 likes · 9 min read
Wukong Talks Architecture
Aug 6, 2021 · Databases

Redis Operational Best Practices and Guidelines

This guide presents a comprehensive set of mandatory, reference, and recommended Redis usage standards—including command restrictions, key naming, data sizing, persistence configurations, monitoring, and deployment strategies—to improve performance, reliability, and operational efficiency for production environments.

Operations · Persistence · best practices
0 likes · 9 min read
Efficient Ops
Aug 2, 2021 · Operations

How Alibaba Scales Massive Big Data Engines with an SRE Framework

This article describes Alibaba’s comprehensive SRE system for managing ultra‑large‑scale big data engines, detailing stability metrics, resource cost management, and intelligent operation productization, and introduces speaker Fu Tianyuan, a senior operations expert leading the MaxCompute and DataWorks SRE team.

Alibaba · Big Data · Operations
0 likes · 3 min read
ByteDance SE Lab
Jul 30, 2021 · Operations

Inside Salesforce’s Global Outage: What Went Wrong and How to Prevent It

The article examines Salesforce’s five‑hour global outage caused by a shortcut DNS deployment and the subsequent recovery challenges, then explores a viral experiment where twenty smartphones generated artificial traffic congestion, illustrating how real‑time data feeds and operational safeguards can prevent large‑scale service disruptions.

Big Data · Operations · SaaS
0 likes · 7 min read
DevOps Cloud Academy
Jul 29, 2021 · Operations

Ensuring the CI/CD Pipeline Is the Sole Path to Production

The article emphasizes that a CI/CD pipeline must be the exclusive route for deploying immutable artifacts to production, warning against direct local deployments, highlighting risks of lost traceability, and urging strict network-level controls to ensure only the pipeline can release code.

DevOps · Operations · Pipeline
0 likes · 4 min read
ITFLY8 Architecture Home
Jul 29, 2021 · Mobile Development

How Mobile API Gateways Transform App Development and Scale High‑Traffic Services

Mobile API gateways act as protocol adapters between networks, centralizing services for mobile apps; the article explains their role at Alibaba, the evolution of R&D efficiency through unified programming models and SDKs, large‑scale platform development, high‑availability strategies, and the EMAS top‑level model for mobile development.

EMAS · Mobile Development · Operations
0 likes · 9 min read
Full-Stack Internet Architecture
Jul 28, 2021 · Operations

Common Open‑Source Tools for MySQL Operations and Maintenance

This article introduces a curated list of open‑source MySQL operational tools—including online DDL changers, backup and restore utilities, load‑testing frameworks, flashback solutions, slow‑query analyzers, replication consistency checkers, audit platforms, and graphical clients—explaining their principles, usage scenarios, and visual references.

Backup · Operations · Replication
0 likes · 8 min read
DevOps
Jul 28, 2021 · Operations

Improving System Availability: Stages, Influencing Factors, and Practical Measures

This article explains system availability, outlines three stages of incident handling, identifies key factors that degrade availability such as human error, avalanche effects, untested releases and infrastructure failures, and proposes technical and team‑oriented practices to enhance reliability and achieve higher "nines" of uptime.

Operations · Reliability · incident management
0 likes · 11 min read