Tagged articles
3281 articles
Efficient Ops
Mar 3, 2022 · Operations

How China’s Telecom Giants Accelerate IT Efficiency with DevOps Maturity Models

This article reviews how leading Chinese telecom operators adopted the CAICT‑led DevOps Capability Maturity Model, detailing 17 evaluated projects across companies, the improvements achieved in continuous delivery, technical operations, and tooling, and the broader impact on IT efficiency and digital transformation.

Continuous Delivery · DevOps · IT efficiency
0 likes · 15 min read
Youzan Coder
Mar 3, 2022 · Operations

How Standard Deviation Uncovers Hidden Bottlenecks in Software R&D Throughput

The article introduces a new R&D efficiency metric—throughput standard deviation—explains its statistical basis, shows how it was derived from annual reports, illustrates its application across multiple teams, and discusses practical insights and limitations for software development operations.

Metrics · Operations · R&D efficiency
0 likes · 7 min read
Efficient Ops
Mar 2, 2022 · Operations

How Chinese Banks Accelerate IT Efficiency with DevOps Maturity Models

This article reviews how major Chinese joint‑stock banks have adopted the CAICT‑led DevOps Capability Maturity Model, detailing assessment numbers, case studies of each bank's DevOps implementation, and the model’s standards and industry impact.

Banking · DevOps · Digital Transformation
0 likes · 16 min read
Efficient Ops
Mar 2, 2022 · Operations

How Chinese Banks Accelerate Digital Transformation with DevOps Maturity Models

Amid a nationwide digital transformation push, Chinese banks such as Ningbo, Zhengzhou, Baixin, and others have leveraged the China Academy of Information and Communications Technology's DevOps Capability Maturity Model to assess and improve their IT efficiency, team integration, and continuous delivery practices, providing valuable industry insights.

Banking · DevOps · Digital Transformation
0 likes · 15 min read
DevOps Cloud Academy
Mar 2, 2022 · Operations

Key DevOps Metrics for Effective Software Delivery

This article explains the most important DevOps metrics—such as deployment frequency, lead time, automated test pass rate, change failure rate, MTTR, and others—and how tracking them helps teams improve software delivery speed, quality, and operational efficiency.

DevOps · Metrics · Operations
0 likes · 10 min read
Efficient Ops
Mar 1, 2022 · Operations

How Chinese Banks Boost IT Efficiency with the DevOps Maturity Model

This article outlines how major Chinese banks have adopted the CAICT‑led DevOps Capability Maturity Model, presenting assessment counts across state‑owned, joint‑stock, and city commercial banks, summarizing the model’s standards, evaluation domains, and providing contact details for further inquiries.

Banking · DevOps · Digital Transformation
0 likes · 6 min read
Efficient Ops
Mar 1, 2022 · Operations

How China’s Leading Banks Master DevOps: Insights from the CAICT Maturity Model

This article reviews how major Chinese state‑owned banks applied the China Academy of Information and Communications Technology’s DevOps Capability Maturity Model, detailing assessment numbers, case studies of e‑life, AI advisory, mobile banking, and cloud‑native platforms, and highlighting the operational and security benefits achieved.

Banking · Continuous Delivery · DevOps
0 likes · 17 min read
FunTester
Feb 27, 2022 · Operations

Performance Testing Articles Collection (Chinese Resources)

This collection compiles dozens of Chinese articles on performance testing, covering tools, frameworks, case studies, and techniques such as Netdata monitoring, load generators, concurrency utilities, distributed testing, QPS modeling, and comparisons of JMeter, k6, Gatling, and FunTester.

Benchmarking · Load Testing · Operations
0 likes · 8 min read
Ops Development Stories
Feb 25, 2022 · Operations

Recovering a Ceph 16 Cluster After System Disk Failure

This guide walks through the step‑by‑step process of restoring a Ceph 16 cluster when a node's system disk fails, covering host removal, node re‑initialization, Docker and Cephadm installation, host addition, labeling, OSD recreation, and final verification.

Ceph · Cluster Recovery · Operations
0 likes · 7 min read
IT Architects Alliance
Feb 23, 2022 · Operations

A Historical Overview of DevOps and Its Related Practices

This article traces the evolution of DevOps from its roots in Toyota’s Production System and early manufacturing practices through the emergence of Kanban, Waterfall, Scrum, Agile, Lean, and modern extensions like ChatOps, GitOps, FinOps and AIOps, highlighting key milestones and concepts.

Kanban · Lean · Operations
0 likes · 10 min read
ITFLY8 Architecture Home
Feb 23, 2022 · Operations

Essential Project Management Charts Every Manager Should Use

This guide introduces the most effective project management charts—including Gantt, burn‑down, WBS, HOQ, RACI, matrix, PERT, mind‑map, decision‑tree, and status tables—explaining their purpose, key components, and how to create them with common tools like Excel, Visio, and Xmind.

Burn-down Chart · Gantt Chart · Operations
0 likes · 6 min read
HomeTech
Feb 23, 2022 · Operations

Construction and Future Planning of the Quality Assurance Technical System at Home

Facing rapid business growth and evolving mobile, AI, and automotive trends, the Home quality assurance team outlines its testing cloud platform’s three‑layer architecture, current capabilities such as performance, automation, and code scanning, the challenges it confronts, and its roadmap for expanding PaaS and mobile testing.

Automation testing · Operations · Testing Platform
0 likes · 11 min read
Architecture Digest
Feb 19, 2022 · Operations

Guide to Setting Up and Using the JVM Monitoring Tool with Spring Boot

This article provides a step‑by‑step tutorial for installing, configuring, and running a JVM monitoring solution that integrates with Spring Boot, covering repository cloning, server configuration, Maven installation, application property setup, and accessing the monitor server UI.

Git · JVM Monitoring · Operations
0 likes · 4 min read
Ctrip Technology
Feb 17, 2022 · Operations

Evolution and Architecture of the Hickwall Enterprise Monitoring Platform

The article details the background, challenges, multi‑year evolution, current architecture, and future roadmap of Hickwall, Ctrip's enterprise‑grade monitoring and observability platform, covering metrics, logs, traces, high‑cardinality handling, cloud‑native integration, alert governance, and storage engine migrations.

Alerting · Operations · TSDB
0 likes · 15 min read
IT Architects Alliance
Feb 15, 2022 · Operations

What Real-World Performance Tuning Taught Us About Legacy Web Apps

After a traffic surge exposed severe latency in a 15-year-old multi-service web platform, we used monitoring to discover a DB-connection leak caused by a liveness probe, corrected it, and distilled four practical lessons on latency metrics, tooling, legacy maintenance, and code vigilance.

APM · Load Testing · Operations
0 likes · 9 min read
dbaplus Community
Feb 14, 2022 · Operations

Building a Robust Monitoring System for Securities Firms with Open‑Source Tools

This article explains why securities firms must adopt comprehensive, centralized monitoring, outlines regulatory and SLA drivers, identifies common monitoring shortcomings, and provides a step‑by‑step guide using open‑source solutions like Zabbix and Grafana to design, implement, evaluate, and continuously improve monitoring management.

Grafana · IT infrastructure · Operations
0 likes · 33 min read
IT Services Circle
Feb 12, 2022 · Operations

Elon Musk Unveils Latest Starship Updates and Ambitious Mars Plans

Elon Musk presented new Starship performance data, outlined a goal of up to 50 launches this year and three daily launches in the future, described the spacecraft’s dimensions, propulsion, heat shield and orbital refueling technology, and reiterated his long‑term vision of making humanity a multiplanet species by colonising Mars.

Aerospace · Mars · Operations
0 likes · 9 min read
Alibaba Terminal Technology
Feb 11, 2022 · Operations

How to Execute a Multi‑Phase IPv6 Migration for Large‑Scale Services

This guide outlines a comprehensive, three‑stage IPv6 migration roadmap—including network upgrades, DNS/HTTPDNS redesign, security hardening, cloud and CDN adaptation, and mobile/app adjustments—to achieve full IPv6‑only support across infrastructure, services, and end‑users while ensuring seamless performance and security.

IPv6 · Mobile · Network Migration
0 likes · 22 min read
Efficient Ops
Feb 10, 2022 · Operations

Why Did a Metaspace Misconfiguration Crash Our Elastic Cloud Service?

A production incident on an elastic‑cloud deployment revealed that setting the JVM Metaspace limit to 64 MiB, while the application required around 76 MiB, triggered continuous Full GC, causing stop‑the‑world pauses, service‑wide time‑outs, and a costly rollback.

Elastic Cloud · JVM · Metaspace
0 likes · 9 min read
Efficient Ops
Feb 7, 2022 · Operations

How Xinwang Bank Overcame DevOps Hurdles to Pass a Level‑3 Continuous Delivery Assessment

In 2021, Xinwang Bank’s digital-native team tackled tight deadlines, tool migrations, personnel shifts, and intense debates to successfully achieve a Level‑3 DevOps continuous‑delivery assessment for its distributed consumer‑credit core system, demonstrating how coordinated effort and containerization can boost operational excellence.

Banking Technology · Continuous Delivery · DevOps
0 likes · 9 min read
21CTO
Feb 7, 2022 · Operations

Why Every Line of Code Matters: Boosting Performance by 3000% with a Simple DB Fix

This article shares hard‑won lessons from optimizing fifteen high‑load web applications, highlighting how a tiny DB‑connection leak in a pod probe caused severe slowdown and how fixing it, along with proper load testing, monitoring, and investment in tools and people, can dramatically improve system performance.

APM · Load Testing · Operations
0 likes · 9 min read
Java Backend Technology
Feb 7, 2022 · Operations

Why Did the Internet Crash in 2021? 10 Major Outage Lessons

The article reviews ten significant 2021 internet outages—both domestic and international—analyzing their root causes, from server room power failures to configuration bugs, and highlights the operational lessons engineers can learn to improve system resilience.

Operations · Outage · case study
0 likes · 17 min read
dbaplus Community
Jan 29, 2022 · Operations

Accelerating Call Center Incident Recovery: Practical Fault Handling and Monitoring Strategies

This article walks through a real call‑center outage scenario, outlines step‑by‑step fault identification, emergency recovery actions, monitoring enhancements, concise emergency‑plan design, and introduces intelligent, automated event handling to help operations teams resolve incidents faster and more reliably.

Operations · call center · emergency plan
0 likes · 14 min read
Architect
Jan 25, 2022 · Databases

Designing a High‑Availability Redis Service with Sentinel

This article explains why Redis needs high availability, defines failure scenarios, compares several HA architectures—including single‑instance, master‑slave with one or multiple Sentinel processes, and a three‑node solution with a virtual IP—and provides practical guidance for building a reliable Redis service.

Operations · high availability · redis
0 likes · 12 min read
Qunar Tech Salon
Jan 24, 2022 · Databases

Qunar 2021 Technical Salon – Infrastructure Articles Collection (Databases, Operations, Components)

This article compiles the 2021 Qunar Technical Salon infrastructure series, presenting original technical writings on databases, operational practices, and core components, each linked to detailed posts that share real‑world experiences, design guidelines, and performance insights for engineers and practitioners.

DevOps · Infrastructure · Operations
0 likes · 7 min read
Top Architect
Jan 21, 2022 · Operations

Clearing and Reconciliation System in Payment Platforms: Architecture, Processes, and Data Handling

The article provides a comprehensive overview of payment‑system clearing and reconciliation, detailing fund inflow/outflow matching, reconciliation center functions, various one‑to‑one, many‑to‑many and one‑to‑many matching rules, data import, error handling, and auxiliary modules such as balance entry and abnormal data recovery.

Financial · Operations · Reconciliation
0 likes · 29 min read
Ops Development Stories
Jan 21, 2022 · Operations

How to Combine ELK and Zabbix for Real‑Time Log Alerting

This guide explains how to integrate ELK's Logstash with Zabbix using the logstash‑output‑zabbix plugin, covering installation, configuration of Logstash pipelines, Zabbix template and trigger setup, and testing the end‑to‑end alerting workflow.

Alerting · ELK · Log Monitoring
0 likes · 17 min read
58UXD
Jan 20, 2022 · Operations

How to Redesign Offline Recruitment Stores: Key Moments and Service Strategies

This article analyzes the challenges of offline recruitment stores, explores why job seekers drop out or feel dissatisfied, presents user research from Chongqing, identifies critical service moments, and proposes concrete design strategies to improve the hiring experience and operational efficiency.

Operations · User Research · offline recruitment
0 likes · 12 min read
IT Xianyu
Jan 14, 2022 · Operations

Redis Monitoring, Data Migration, and Cluster Management Tools Overview

This article introduces essential Redis operational tools, covering the INFO command for monitoring, Prometheus‑based redis‑exporter visualization, the Redis‑shake data migration utility, Redis‑full‑check consistency verification, and the CacheCloud platform for comprehensive cluster management.

CacheCloud · Data Migration · Operations
0 likes · 10 min read
Programmer DD
Jan 11, 2022 · Operations

Building a TB‑Scale Log Monitoring System with ELK Stack and Kafka Streams

This article explains how to design and implement a terabyte‑level log monitoring platform using ELK Stack, Filebeat, Elastic APM, Kafka Streams, Prometheus, and Grafana, covering data collection, filtering, visualization, and resource‑efficient processing for large‑scale microservice environments.

ELK · Grafana · Log Monitoring
0 likes · 9 min read
Architecture Digest
Jan 10, 2022 · Operations

Comprehensive Guide to Deploying Filebeat and Graylog for Centralized Log Collection

This article explains how to use Filebeat and Graylog together for centralized log collection, covering Filebeat’s role, configuration files, input modules, Graylog’s architecture, pipeline rules, and step‑by‑step deployment using Docker and docker‑compose, providing practical commands and examples for operational environments.

Docker · Elasticsearch · Filebeat
0 likes · 14 min read
Open Source Linux
Jan 10, 2022 · Operations

Where Does Linux Store Its System Logs? A Complete /var/log Guide

This article enumerates the most common Linux system log files under /var/log, explains the purpose of each log—including messages, dmesg, auth, boot, daemon, and service-specific logs—and lists key subdirectories for web, mail, audit, and other services.

Linux · Log Files · Operations
0 likes · 5 min read
58UXD
Jan 7, 2022 · Operations

How 3D World‑Building Transforms Home‑Service Operations at 58 Daojia

By analyzing user demographics, business model, and marketing needs, the 58 Daojia case study shows how constructing a cohesive 3D‑driven worldview—through gene‑family characters, mood‑board visual systems, and story‑centric design—enhances operational campaigns, boosts brand immersion, and streamlines visual asset reuse.

3D design · Design · O2O
0 likes · 7 min read
Practical DevOps Architecture
Jan 5, 2022 · Operations

Deploying Prometheus and Node Exporter on a Linux Host

This guide walks through installing Prometheus and Node Exporter on a Linux server, copying binaries to system paths, configuring Prometheus with scrape jobs for the local node and remote hosts, and running the exporters with specific collector options for system metrics.

Operations · Prometheus · monitoring
0 likes · 4 min read
Tencent Qidian Tech Team
Jan 5, 2022 · Operations

Building a Real‑Time Log Tracker for Phone SDKs Using Cloud‑Native Design

This article describes the design and implementation of a comprehensive log tracking system for a phone SDK, covering client‑side logging, colored classification, plugin mechanisms, cloud‑native architecture, serverless functions, Elasticsearch storage, and real‑time visual debugging to enable rapid issue identification and resolution.

Cloud Native · Operations · Serverless
0 likes · 18 min read
Efficient Ops
Jan 4, 2022 · Operations

How China’s Leading Enterprises Are Shaping the New DevOps Efficiency Measurement Standard

The article explains how a collaborative effort by CAICT and more than 30 leading tech and financial companies created the most comprehensive DevOps efficiency measurement model, detailing its components, evaluation results, and upcoming assessment enrollment for enterprises seeking to boost software development performance.

DevOps · Digital Transformation · Operations
0 likes · 6 min read
Architect's Tech Stack
Jan 3, 2022 · Operations

Overview of Redis Monitoring, Data Migration, and Cluster Management Tools

This article introduces essential Redis operational tools, covering real‑time monitoring with the INFO command and exporters, data migration using Redis‑shake, consistency checking via Redis‑full‑check, and cluster management through CacheCloud, while highlighting key INFO sections such as stats, commandstats, cpu, and memory.

CacheCloud · Operations · Prometheus
0 likes · 10 min read
Top Architect
Jan 3, 2022 · Operations

Gray Release (Canary Deployment) Design and Implementation Guide

This article explains the concept of gray release, outlines a simple architecture with essential components, describes common traffic-splitting strategies, shows how to implement control in Nginx and service layers, and discusses complex scenarios such as multi‑service and data‑centric deployments.

A/B testing · Backend Architecture · Deployment Strategy
0 likes · 7 min read
Continuous Delivery 2.0
Dec 31, 2021 · Operations

Curated Reading List on DevOps, Software Delivery Performance, and Engineering Productivity

This article presents a concise collection of ten Chinese-language resources that summarize the 2021 DORA DevOps report, the importance of consistency in R&D, fundamental efficiency principles, Microsoft’s testing shift, Google’s release and productivity metrics, and SRE health measurements, offering valuable insights for modern software engineering teams.

Engineering Productivity · Operations · SRE
0 likes · 5 min read
Alibaba Cloud Native
Dec 30, 2021 · Operations

How to Implement Chaos Engineering for Cloud‑Native Applications: A Step‑by‑Step Guide

This article explains how cloud‑native teams can adopt chaos engineering—defining its concepts, outlining its unique characteristics, and detailing a four‑stage implementation process from manual drills to unannounced production‑level drills, with practical steps, environment setups, and real‑world results.

Cloud Native · Fault Injection · Kubernetes
0 likes · 14 min read
HomeTech
Dec 30, 2021 · Operations

Open-falcon in Automotive Home: Application, Architecture, and Customizations

This article describes how the open‑falcon monitoring system is applied and customized at Automotive Home, covering its architecture, component roles, a comparison with other open‑source solutions, and the enhancements made for service‑tree based dynamic monitoring, alerting, self‑healing, and high‑availability deployment.

Open-Falcon · Operations · monitoring
0 likes · 11 min read
Open Source Linux
Dec 30, 2021 · Operations

Master Network Troubleshooting: Proven Strategies to Resolve Common Issues

This comprehensive guide presents a step‑by‑step approach for diagnosing and fixing everyday network problems, covering fault scope identification, link and configuration checks, common diagnostic methods, detailed case studies, and essential command‑line tools for IT professionals.

IT support · Operations · diagnostic steps
0 likes · 7 min read
DataFunSummit
Dec 29, 2021 · Operations

How to Build an Operations Monitoring Platform with Spring Boot Admin

This article explains what Spring Boot Admin is, walks through creating a server and client to monitor Spring Boot applications, shows how to configure ports, enable the admin UI, and set up email and custom alert notifications for operational health monitoring.

Operations · Spring Boot · java
0 likes · 12 min read
Efficient Ops
Dec 28, 2021 · Operations

How China Post Savings Bank Achieved Top‑Tier DevOps Maturity: A Success Story

China Post Savings Bank’s three core systems passed the Level 3 DevOps Continuous Delivery assessment, showcasing leading domestic capabilities, while senior leaders discuss the bank’s DevOps evolution, measurable improvements, future DevSecOps plans, and the broader industry standards driving these results.

Banking · Continuous Delivery · DevOps
0 likes · 13 min read
DevOps
Dec 28, 2021 · Operations

The Pros and Cons of Work‑Hour Reporting for Knowledge Workers

This article examines the concept of work‑hour reporting, exploring its definitions, purposes, benefits such as productivity tracking and profit maximisation, and drawbacks including mistrust, administrative overhead, and misalignment with modern knowledge‑work practices, while also discussing agile approaches to time management.

Operations · productivity · time tracking
0 likes · 10 min read
dbaplus Community
Dec 27, 2021 · Operations

How to Trace Server Latency and Build a Comprehensive Performance Toolkit

This guide explains how to trace transaction latency in multi‑vendor server environments, outlines the key monitoring metrics across CPU, network, disk and processes, compares coarse‑ and fine‑grained sampling, and proposes a unified, AI‑enhanced toolkit for diagnosing hardware and software performance bottlenecks.

AI analysis · Operations · hardware diagnostics
0 likes · 13 min read
Efficient Ops
Dec 27, 2021 · Operations

How Ping An Bank’s Starlink Platform Earned Industry‑Leading DevOps Efficiency Rating

Ping An Bank’s Starlink DevOps platform was awarded the "industry promotion level" in the first batch evaluation of the China Academy of Information and Communications Technology’s DevOps General Efficiency Measurement Model, highlighting its leading domestic performance and the bank’s commitment to digital governance and fine‑grained R&D efficiency management.

DevOps · Digital Governance · Operations
0 likes · 12 min read
DevOps
Dec 27, 2021 · Operations

2021 China Chaos Engineering Survey Report: Findings and Recommendations

Based on 1,016 valid questionnaire responses and 17 enterprise interviews, the 2021 China Chaos Engineering Survey Report reveals low software system stability, limited adoption of chaos engineering, its positive impact on availability, and provides data‑driven recommendations for improving stability through mature tools, metrics, and cultural shifts.

Cloud Native · Operations · chaos engineering
0 likes · 15 min read
Efficient Ops
Dec 26, 2021 · Operations

How Zhengzhou Bank Achieved Advanced DevSecOps Maturity: Insights and Lessons

The article reports on Zhengzhou Bank's successful DevSecOps assessment at the 2021 GOLF+ IT New Governance Forum, detailing the bank's interview on implementation practices, cultural, process and technical measures, and the broader significance of the national DevOps maturity model for digital governance.

DevSecOps · Digital Governance · Information Security
0 likes · 12 min read
IT Architects Alliance
Dec 26, 2021 · Operations

What Is DevOps? Origins, Principles, and Practical Implementation Guide

This article explains DevOps by tracing its 2008 origins, summarizing evolving wiki definitions, outlining the business drivers behind its popularity, detailing its three core principles—flow, feedback, and continuous learning—and providing concrete technical practices, organizational patterns, and key takeaways for effective adoption.

Continuous Delivery · Culture · DevOps
0 likes · 22 min read
Efficient Ops
Dec 25, 2021 · Operations

How Anxin Securities Achieved DevOps Maturity: Insights from the 2021 GOLF+ IT Governance Forum

The article reports on Anxin Securities' successful Level‑2 DevOps technology‑operation assessment announced at the 2021 GOLF+ IT Governance Forum, featuring interview highlights from the CIO and operations head, details of the evaluated Financial Store System, and broader industry statistics on DevOps maturity in the securities sector.

DevOps · Digital Transformation · Financial Services
0 likes · 11 min read
DeWu Technology
Dec 24, 2021 · Operations

How to Quickly Attribute Live‑Streaming Alert Issues in a Kubernetes Environment

This article walks through a real‑world live‑streaming service alert where response time and goroutine spikes were traced through Grafana metrics, MySQL/Redis performance, routing logic, and Istio sidecar load, ultimately revealing a mis‑reported Istio metric and a resource‑allocation fix to prevent future jitter.

Istio · Kubernetes · Operations
0 likes · 11 min read
Efficient Ops
Dec 24, 2021 · Operations

How Baidu’s iReport Leads the New Era of DevOps Efficiency Measurement

The China Academy of Information and Communications Technology unveiled its DevOps efficiency measurement model, with Baidu’s iReport platform becoming the first to achieve industry‑promotion level certification, and detailed the model’s modules, maturity levels, and practical insights for improving software development performance.

DevOps · Digital Transformation · Operations
0 likes · 10 min read
Alibaba Cloud Native
Dec 22, 2021 · Operations

How Alibaba’s ASI Powers Massive Serverless Kubernetes at Scale

This article details Alibaba's Serverless Infrastructure (ASI) built on ACK, explaining its large‑scale Kubernetes architecture, fully managed operations, change‑risk controls, gray‑release pipelines, web‑shell access, taskflow orchestration, node lifecycle management, elasticity, risk mitigation, probing, and self‑healing capabilities that enable reliable cloud‑native services.

Cloud Native · Infrastructure · Kubernetes
0 likes · 32 min read
Efficient Ops
Dec 20, 2021 · Cloud Native

How to Build a Scalable Kubernetes Logging System with S6 and Filebeat

This article explains Docker and Kubernetes logging challenges, compares logging drivers, and presents a unified, node‑agent based logging architecture using S6‑based containers, Filebeat, logrotate, Kafka, and Elasticsearch to achieve reliable, auto‑rotating log collection in production environments.

Docker · Operations · S6
0 likes · 8 min read
Zhongtong Tech
Dec 17, 2021 · Operations

How Digitalization Is Revolutionizing China's Logistics Industry

At the WISE2021 China Digital Innovation Summit, Zhongtong Express CTO Zhu Jingxi detailed the company's digital transformation journey, highlighting the impact of electronic waybills, data-driven operations, AI routing, and privacy security on reshaping the logistics supply chain and boosting efficiency.

@Data · AI · Logistics
0 likes · 11 min read
dbaplus Community
Dec 16, 2021 · Operations

How Ops Leaders Can Transform Teams for the Cloud‑Native Era

In this expert round‑table, senior SRE and DB leaders discuss how operations teams must revamp their management philosophy, processes, knowledge systems, and collaboration models to thrive in the cloud‑native landscape, adopting OKRs, DevOps, AIOps, and proactive "shift‑left" practices.

DevOps · Operations · knowledge management
0 likes · 18 min read
Efficient Ops
Dec 13, 2021 · Operations

Why Every Ops Team Needs a Kubernetes Standards Playbook

This article shares practical standards for Kubernetes operations—from infrastructure choices and application packaging to CI/CD tooling—helping teams reduce complexity, improve reliability, and foster continuous learning and sharing in fast‑moving cloud environments.

DevOps · Infrastructure · Operations
0 likes · 13 min read
Top Architect
Dec 12, 2021 · Operations

Blue‑Green, Rolling, and Canary Deployment Strategies Explained

This article introduces three common release strategies—blue‑green deployment, rolling deployment, and canary (gray) deployment—explaining their workflows, advantages, drawbacks, and practical considerations for safely updating production systems during iterative project releases.

Blue-Green · Canary · Deployment
0 likes · 10 min read
Programmer DD
Dec 12, 2021 · Operations

How Netflix’s Telltale Transforms Monitoring for 100+ Services

This article explains Netflix’s home‑grown monitoring system Telltale, detailing its design, multi‑dimensional health‑assessment model, intelligent alerting, integration with Slack, deployment monitoring, and continuous optimization that together keep over a hundred production applications running smoothly.

Alerting · Microservices · Netflix
0 likes · 13 min read
Architects Research Society
Dec 10, 2021 · Backend Development

Principled GraphQL: Ten Principles for Building, Maintaining, and Operating Data Graphs

This article presents ten GraphQL principles—grouped into integrity, agility, and operations—that guide the design, evolution, and safe production deployment of a unified data graph, emphasizing a single schema, collaborative implementation, versioned registries, performance monitoring, and robust access and demand controls.

Backend · Data Graph · GraphQL
0 likes · 19 min read
Top Architect
Dec 10, 2021 · Operations

Comprehensive Guide to Load Balancing: Principles, Types, Algorithms, and Hardware

This article explains the fundamentals of load balancing, covering why it is needed for high‑traffic services, the difference between vertical and horizontal scaling, various load‑balancing techniques (DNS, HTTP, IP, link‑layer, hybrid), common algorithms, and the trade‑offs of software versus hardware solutions.

Distributed Systems · Networking · Operations
0 likes · 13 min read
Dada Group Technology
Dec 10, 2021 · Operations

Design and Practice of the Freight Business Check System (BCS)

The article introduces the freight BCS system, explains its business background, describes multiple validation modes for data consistency and business logic correctness, compares implementation approaches, and outlines the architecture, task flow, and future enhancements to improve system reliability and operational monitoring.

Backend · Data Consistency · Operations
0 likes · 10 min read
Cloud Native Technology Community
Dec 8, 2021 · Cloud Native

Step-by-Step Guide to Build a Distributed Rook/Ceph Storage Cluster on Kubernetes

This tutorial walks you through preparing three identical VMs, installing required packages, configuring Rook and Ceph versions, deploying the storage cluster on a Kubernetes 1.20 environment, exposing the Ceph dashboard, and cleaning up the installation, complete with command examples and troubleshooting tips.

Ceph · Cloud Native Storage · Deployment
0 likes · 14 min read
Java Architect Essentials
Dec 6, 2021 · Databases

Facebook’s MySQL 5.6‑to‑8.0 Migration: Challenges, Process, and Lessons Learned

The article details Facebook’s multi‑year effort to migrate its heavily customized MySQL 5.6 deployment—including the MyRocks storage engine—to MySQL 8.0, describing the technical challenges, patch‑porting strategy, replication changes, automated verification, and application validation performed during the upgrade.

Facebook · MyRocks · Operations
0 likes · 17 min read
Open Source Linux
Dec 5, 2021 · Operations

Choosing the Right Backup: Normal, Copy, Differential, Incremental

The article explains four primary backup methods—Normal (full), Copy, Differential, and Incremental—detailing their processes, advantages, and drawbacks, and helps readers decide which strategy best balances storage space, recovery speed, and data protection needs.

Backup · Data Protection · Incremental Backup
0 likes · 4 min read
Open Source Linux
Dec 5, 2021 · Operations

Essential Skill Maps Every DevOps Engineer Should Master

This article compiles a series of visual skill maps covering DevOps, cloud computing, big data, security, architecture, and development practices, offering engineers a comprehensive roadmap to build and expand their technical knowledge across multiple domains.

Big Data · DevOps · Operations
0 likes · 3 min read
Efficient Ops
Dec 5, 2021 · Operations

Mastering ITIL Event Management: Strategies for Efficient IT Operations

This article explores the fundamentals of ITIL-based event management, detailing its relationship with ITSM, the challenges of unmanaged services, key processes, priority definitions, and three management models—centralized, self‑managed, and collaborative—to help organizations improve service stability and response efficiency.

ITIL · ITSM · Incident Prioritization
0 likes · 14 min read
IT Architects Alliance
Dec 1, 2021 · Operations

What Does an SRE Actually Do? A Deep Dive into Roles and Practices

This article explains the origins of Site Reliability Engineering, breaks down its three main layers (Infrastructure, Platform, and Business SRE), covers day‑1 and day‑2 deployment, on‑call processes, SLI/SLO design, post‑mortems, capacity planning, and user support, and offers practical advice for aspiring SREs.

Infrastructure · Oncall · Operations
0 likes · 24 min read
iQIYI Technical Product Team
Nov 26, 2021 · Industry Insights

How iQIYI Built an Unmanned Fault‑Handling System for 99% Reliability

This article details iQIYI's unmanned monitoring platform, covering its design goals, overall architecture, core modules such as real‑time data collection, decision engine, and event‑processing engine, as well as the machine‑learning model used for production‑time prediction and the system's operational results and future roadmap.

Operations · System Architecture · fault automation
0 likes · 13 min read
dbaplus Community
Nov 25, 2021 · Operations

How Unified Alert Convergence Can Transform Monitoring Systems

This article explains the background and challenges of legacy monitoring systems, defines key concepts such as exceptions, problems, alerts and recoveries, introduces critical metrics like MTTA and MTTR, and details the design, architecture, and core implementation of a unified alert convergence service using Redis delay queues.

MTTA · MTTR · Operations
0 likes · 19 min read
转转QA
Nov 25, 2021 · Operations

Full‑Chain Production Environment Load Testing for Double 11 Promotion: Process, Findings, and Lessons

This article details the end‑to‑end preparation, execution, reporting, and retrospective of a large‑scale production‑environment load test for the Double 11 shopping festival, covering data preparation, QPS target calculation, multi‑scenario testing, issue analysis, and continuous improvement practices.

Double11 · Load Testing · Operations
0 likes · 8 min read
Qingyun Technology Community
Nov 24, 2021 · Operations

How eBPF Toolchains Simplify Kernel Tracing from BCC to BPFtrace

This article walks through the high‑level components of eBPF programs—backend, loader, frontend, and data structures—showing how the original sock_example.c is split into separate files, how LLVM compiles restricted C to ELF, and how projects like BCC, BPFtrace, and IOVisor automate development, deployment, and cloud‑native observability while highlighting their trade‑offs for embedded environments.

BCC · Cloud Native · Linux
0 likes · 15 min read
Architecture Digest
Nov 23, 2021 · Operations

A Historical Overview of DevOps and Its Evolution

This article traces the evolution of DevOps from its roots in the Toyota Production System and Kanban through Waterfall, Scrum, Agile, Lean, and modern extensions like ChatOps, GitOps, FinOps, and AIOps, highlighting key milestones and their impact on software delivery practices.

DevOps · Kanban · Operations
0 likes · 9 min read
DevOps
Nov 23, 2021 · Operations

Zero‑Downtime Application Deployment: Strategies, Maturity Levels, and Required Technical Components

The article explains why traditional three‑step application releases cause service interruptions, introduces three maturity levels for zero‑downtime deployment, compares blue‑green, rolling, and canary release models, and provides concrete technical components, load‑balancer architectures, and Spring‑Boot/Eureka shutdown procedures to achieve uninterrupted service.

Operations · Zero Downtime · load balancing
0 likes · 22 min read
IT Architects Alliance
Nov 20, 2021 · Operations

Analysis and Optimization of Business System Performance

This article outlines a comprehensive approach to diagnosing and optimizing performance problems in production business systems, covering analysis processes, hardware, OS, database, middleware, JVM tuning, code inefficiencies, and monitoring techniques to identify root causes and improve system reliability.

Database Tuning · Operations · System optimization
0 likes · 16 min read