Tagged articles
3281 articles
NetEase Smart Enterprise Tech+
May 10, 2023 · Operations

How to Streamline RTC Audio Issue Troubleshooting: Frameworks, Tools, and Automation

This article explores the challenges of real‑time communication audio problems, outlines their common manifestations and characteristics, and presents a comprehensive troubleshooting framework with standardized processes, automation tools, and perception models to improve efficiency and service quality.

Operations · RTC · audio troubleshooting
0 likes · 12 min read
Liangxu Linux
May 7, 2023 · Cloud Native

Unlock Hidden kubectl Tricks: Advanced Commands for Kubernetes Mastery

This article presents a collection of advanced kubectl techniques—including API inspection, status‑based pod filtering and deletion, node‑specific pod listing, distribution counting, and proxy usage—to help experienced Kubernetes users solve ad‑hoc tasks more efficiently.

CLI · Kubernetes · Operations
0 likes · 7 min read
Programmer DD
May 5, 2023 · Operations

Boost Development Efficiency with GitLab CI/CD: A Hands‑On Guide

This article explains why efficiency matters in software delivery, introduces CI/CD concepts and tools like Jenkins and GitLab, details installing GitLab Runner, walks through pipeline configuration with key YAML keywords, and emphasizes that mastering DevOps principles and tools dramatically improves development productivity.

Continuous Delivery · DevOps · GitLab
0 likes · 10 min read
Java Architect Essentials
May 4, 2023 · Operations

Easy-Jenkins: A One‑Click Deployment Tool for Vue Front‑Ends and Java JAR Back‑Ends

The article introduces Easy‑Jenkins, a lightweight one‑click deployment tool that supports Vue and JAR projects, explains its pipeline architecture, shows step‑by‑step installation, configuration, branch management, and deployment operations, and provides practical screenshots and command examples for developers.

Deployment · Operations · Vue
0 likes · 7 min read
MaGe Linux Operations
May 1, 2023 · Cloud Native

Unlock Hidden kubectl Tricks: Boost Your Kubernetes Workflow

This article shares a collection of practical kubectl commands and tips—including API debugging, pod filtering and deletion, node‑wise pod statistics, and proxy usage—to help Kubernetes users work more efficiently and avoid writing custom client code.

Kubernetes · Operations · Tips
0 likes · 8 min read
DataFunTalk
Apr 29, 2023 · Operations

WeChat NLP Algorithm Microservice Governance: Challenges and Solutions

This article examines the governance of WeChat NLP algorithm microservices, outlining the management, performance, and scheduling challenges they pose, and presents solutions including automated CI/CD pipelines, task‑aware auto‑scaling, DAG‑based service composition, custom Python interpreter PyInter, and an improved Joint‑Idle‑Queue load‑balancing algorithm.

AI · Microservices · NLP
0 likes · 13 min read
Efficient Ops
Apr 26, 2023 · Operations

Building a Chaos Engineering Platform for Financial Services: Key Lessons

This talk outlines the challenges of maintaining system stability in fast‑moving, cloud‑native financial services, describes a risk‑identification model, high‑fidelity fault simulation, and a comprehensive stability engineering platform, and shares future plans for automated, data‑driven risk mitigation.

Financial Services · Operations · SRE
0 likes · 15 min read
Data Thinking Notes
Apr 25, 2023 · Operations

Why Data Quality Matters: A Practical Guide to Governance and Seven‑Dimensional Evaluation

This article explains why data quality is critical for businesses, outlines common data quality problems, their root causes, and presents a comprehensive governance framework—including monitoring rules, alerting, full‑link monitoring, and a seven‑dimensional evaluation model—to ensure high‑quality data delivery.

Big Data · Data Governance · Data Quality
0 likes · 12 min read
Efficient Ops
Apr 19, 2023 · Operations

How BizDevOps Drives Value Delivery in Cloud-Adapted Banking

The presentation outlines the evolution of lean management, the characteristics and expectations of the cloud era, and the practical implementation of BizDevOps at China Merchants Bank, detailing the 1‑3‑5 framework, goals, capabilities, key practices, and the bank's cloud adaptation strategy.

BizDevOps · DigitalTransformation · Operations
0 likes · 16 min read
MaGe Linux Operations
Apr 16, 2023 · Operations

How Netflix’s Telltale Transforms Application Monitoring and Alerting

The article details Netflix’s self‑built Telltale monitoring system, explaining how it consolidates data sources, reduces alert fatigue, provides intelligent alerts, and continuously optimizes application health assessment for over 100 production services, ultimately improving operational efficiency and reliability.

Alerting · Netflix · Operations
0 likes · 11 min read
Ops Development Stories
Apr 13, 2023 · Operations

How to Deploy N9e: A Step‑by‑Step Guide to Unified Observability

This article walks through the challenges of observability for small‑to‑medium companies and provides a detailed, hands‑on guide to installing, configuring, and using the N9e monitoring platform—including architecture options, component setup, and adding data sources—so readers can achieve integrated alerting, metrics, logs, and tracing in a single pane.

N9e · Operations · monitoring
0 likes · 13 min read
Ops Development Stories
Apr 12, 2023 · Operations

Essential System Performance Metrics Every Ops Engineer Should Track

This article explains how to categorize and deeply understand key system performance metrics—including infrastructure, application, user experience, and business indicators—so engineers can monitor stability, efficiency, and business impact under high load and concurrency.

Infrastructure · Operations · User experience
0 likes · 10 min read
dbaplus Community
Apr 10, 2023 · Operations

Can Ops Roles Disappear? Exploring Self‑Service Platforms, COE Experts, and SaaS in Modern Monitoring

The article examines whether traditional operations positions can become obsolete by analyzing a self‑service platform + COE + Business Partner model, detailing essential monitoring tools, the role of COE specialists, SaaS alternatives, and practical career pathways for newcomers, mid‑level, and senior engineers.

COE · Operations · SaaS
0 likes · 8 min read
Continuous Delivery 2.0
Apr 10, 2023 · Operations

Five Best Practices for Applying DevOps in Real Projects

This article outlines five practical DevOps best practices (test automation, deployment automation, trunk‑based development, shift‑left security, and loosely coupled architecture), explaining their importance, implementation tips, and the benefits they bring to continuous delivery and high‑quality software production.

Operations · automation · software-engineering
0 likes · 7 min read
Python Programming Learning Circle
Apr 8, 2023 · Operations

Using Python for Operations Automation: Remote Execution, Log Parsing, Monitoring, Deployment, and Backup

The article explains how operations engineers can leverage Python scripts and popular libraries such as paramiko, regex, psutil, fabric, and shutil to automate common tasks like remote command execution, log analysis, system monitoring with alerts, batch software deployment, and file backup and recovery, providing code examples for each scenario.

DevOps · Operations · Python
0 likes · 9 min read
Efficient Ops
Apr 8, 2023 · Operations

South Grid’s CloudYan Platform Wins Top DevOps Maturity Rating – Lessons Learned

At the 20th GOPS Global Operations Conference in Shenzhen, the China Academy of Information and Communications Technology (CAICT) announced that South Grid’s Digital Platform Technology (Guangdong) Co., Ltd. achieved excellent ratings for its CloudYan Platform DevOps subsystem, demonstrating how standardized DevOps pipelines and toolchains can dramatically improve software delivery quality, speed, and safety.

Continuous Delivery · DevOps · Digital Transformation
0 likes · 12 min read
Efficient Ops
Apr 8, 2023 · Information Security

How China Postal Savings Bank Reached Advanced DevSecOps Maturity – Lessons and Practices

The article details China Postal Savings Bank's successful DevSecOps assessment at the 2023 GOPS Global Operations Conference, sharing the bank's project background, interview insights on culture, processes, and tooling, and outlining the benefits and future plans of adopting standardized DevSecOps practices.

Banking · DevSecOps · Information Security
0 likes · 17 min read
Efficient Ops
Apr 8, 2023 · Operations

How Guotai Junan Achieved Industry‑Leading DevOps Maturity at GOPS 2023

The article reports on Guotai Junan's successful completion of the CAICT DevOps technical‑operation 2+ assessment at the 20th GOPS Global Operations Conference, detailing the standards, project implementations, interview insights, industry statistics, and the broader DevOps maturity model.

CaseStudy · DevOps · DigitalTransformation
0 likes · 16 min read
Efficient Ops
Apr 7, 2023 · Operations

Guotai Junan’s Journey to Leading DevOps 2+ Certification – A Case Study

At the 20th GOPS Global Operations Conference in Shenzhen, Guotai Junan’s data center team detailed how their “Central Operations” and “Junhong Junrong” trading projects earned the China Academy of Information and Communications Technology (CAICT) DevOps Technical Operations 2+ level assessment, showcasing the company’s leading digital transformation and smart operations practices.

DevOps · Digital Transformation · Financial Services
0 likes · 17 min read
Efficient Ops
Apr 7, 2023 · Operations

How South Grid’s Cloud Yan Platform Secured Top DevOps Maturity Scores

The article details South Grid’s successful DevOps maturity assessment at the 20th GOPS Global Operations Conference, highlighting the Cloud Yan platform’s excellent ratings in build‑and‑integration and pipeline modules, and shares insights from a Q&A on the impact of standardized DevOps practices.

DevOps · Digital Transformation · Maturity Assessment
0 likes · 12 min read
Efficient Ops
Apr 7, 2023 · Operations

What Do China’s Latest DevOps & AIOps Maturity Assessments Reveal About Enterprise Success?

The China Academy of Information and Communications Technology (CAICT) announced the newest evaluation results for its DevOps and AIOps capability maturity models, showing that standardization and tool empowerment have helped over 75 leading enterprises across banking, securities, telecom, and internet sectors improve quality, efficiency, and market competitiveness.

DevOps · Enterprise · Maturity Model
0 likes · 8 min read
Architecture Digest
Apr 4, 2023 · Operations

Understanding Logs, Their Value, and Practices for Observability and Operations

This article explains what logs are, when to record them, their importance in troubleshooting, performance optimization, security monitoring, and business decisions, and describes how centralized logging, metrics, tracing, and tools like ELK, Prometheus, and OpenTracing enable effective observability in modern distributed systems.

APM · Operations · tracing
0 likes · 19 min read
Architecture Digest
Apr 3, 2023 · Operations

Design and Implementation of a Multi‑Layer Load‑Balancing Platform (VGW)

This article explains the need for reliable load balancing in large‑scale services, analyzes the problems of request distribution and fault isolation, and details the design of a three‑layer and four‑layer load‑balancing architecture—including DNS, Nginx, LVS, FULLNAT, and VGW—along with health‑check, redundancy, and performance optimization techniques.

DPDK · FullNAT · Operations
0 likes · 21 min read
Zhuanzhuan Tech
Mar 29, 2023 · Operations

Design and Implementation of a Warehouse Control System (WCS) for Automated Warehouse Operations

The article details the evolution from a basic inventory system to a full‑featured WMS, introduces a dedicated Warehouse Control System (WCS) architecture, explains the use of HTTP, SSE, WebSocket and TCP protocols for hardware integration, and demonstrates how various automated devices empower inbound, outbound and auxiliary warehouse processes, ultimately improving operational efficiency.

Device Integration · Operations · System Architecture
0 likes · 9 min read
Efficient Ops
Mar 28, 2023 · Operations

Why SRE Matters: Bridging Product Development and Reliability Engineering

This article explains the role of Site Reliability Engineering (SRE), its responsibilities, how it complements product development, the software lifecycle perspective, and practical approaches to ensure system stability through controllability, observability, and best‑practice implementation.

Operations · SRE · observability
0 likes · 14 min read
NetEase Yanxuan Technology Product Team
Mar 27, 2023 · Industry Insights

How to Build a Scalable E‑Commerce Supply System: Lessons from Industry Leaders

This article examines the challenges of rapid‑growth e‑commerce supply chains, compares global and domestic supply‑chain software, outlines core SCM concepts, and proposes a framework of design principles, value metrics, and ROI calculations for constructing a flexible, high‑performance supply system.

Industry Analysis · Operations · Supply Chain
0 likes · 11 min read
MaGe Linux Operations
Mar 24, 2023 · Operations

How to Reduce False Alarms in Distributed Systems with Interval Detection

This article explains the challenges of monitoring highly distributed applications, why static alert thresholds often fail, and how interval detection using algorithms like Local Outlier Factor can improve alert accuracy while reducing noise across tools such as Grafana, Zabbix, and Open‑Falcon.

Alerting · Operations · interval detection
0 likes · 16 min read
MaGe Linux Operations
Mar 24, 2023 · Operations

Why Most Monitoring Strategies Fail and How the CAR Framework Fixes Them

This article explains why typical monitoring approaches miss the mark, outlines four root causes of persistent incidents, and introduces the CAR framework—Customer, Application, Resource—to build user‑centric observability that reduces noise, restores trust, and improves reliability.

CAR framework · Operations · incident management
0 likes · 11 min read
Efficient Ops
Mar 23, 2023 · Operations

How ICBC Transformed Banking with DevOps: A Deep Dive into Operations Excellence

This article examines Industrial and Commercial Bank of China's four‑year DevOps journey, detailing its top‑level design, toolchain integration, end‑to‑end pipelines, team benchmarking, data‑driven management, and coach development, and shows how these practices boosted delivery speed, reduced defects, and supported digital transformation in banking.

Banking · Continuous Delivery · DevOps
0 likes · 14 min read
Volcano Engine Developer Services
Mar 22, 2023 · Fundamentals

How ByteDance Scales Data Governance: Challenges, Distributed Solutions, and Best Practices

This article examines ByteDance's data governance journey, outlining business, organizational, and cultural challenges, the six-stage evolution framework, real‑world case studies, and the shift from centralized to distributed autonomous governance to improve quality, security, cost, and team efficiency.

Big Data · Data Governance · Data Quality
0 likes · 18 min read
dbaplus Community
Mar 20, 2023 · Operations

How Xianyu’s Messaging Team Built a Zero‑Incident System with Gray Releases, Monitoring, and Automated Regression

The article details how Xianyu’s messaging team systematically improved system stability by classifying risks, implementing gray‑release traffic, establishing dedicated monitoring and alerting dashboards, integrating automated regression into CI/CD, and managing strong‑weak dependencies, ultimately reducing online incidents to near zero.

Operations · automated regression · dependency management
0 likes · 10 min read
Liangxu Linux
Mar 19, 2023 · Operations

Master Log Analysis: Fast Linux Commands to Pinpoint Errors

This guide shows programmers how to quickly locate errors in massive server logs using essential Linux commands such as tail, cat, grep, sed, and pagination tools, providing step‑by‑step examples and tips for efficient debugging.

Linux · Operations · Shell Commands
0 likes · 12 min read
Sohu Tech Products
Mar 16, 2023 · Operations

Spug: Lightweight Agentless Automation Platform with Docker Deployment Guide

Spug is a lightweight, agentless automation platform for small and medium enterprises that integrates host management, batch execution, online terminal, deployment, scheduling, configuration, monitoring and alerting, and the article provides step‑by‑step Docker and docker‑compose installation instructions to set up the system.

DevOps · Docker · Operations
0 likes · 4 min read
Efficient Ops
Mar 15, 2023 · Operations

How Human‑Machine Collaboration Is Redefining Operations with AIOps

The article explores how AIOps, a human‑machine collaborative approach powered by data, algorithms, and contextual knowledge, transforms modern operations by enabling real‑time insight, predictive decision‑making, automated execution, and continuous feedback, especially in complex, security‑sensitive environments like finance.

@Data · Operations · aiops
0 likes · 11 min read
Baidu Tech Salon
Mar 15, 2023 · Industry Insights

How Baidu Feed Scales Millions of Users with Serverless: A Multi‑Dimensional Elasticity Blueprint

This article details Baidu Feed's serverless transformation, describing how multi‑dimensional service profiling (elasticity, traffic, capacity) and three elastic strategies—predictive, load‑feedback, and timed—enable automatic scaling that reduces resource waste while maintaining 24/7 stability for billions of users.

Baidu Feed · Cloud Native · Operations
0 likes · 19 min read
DeWu Technology
Mar 15, 2023 · Operations

Blue-Green Deployment: Process, Traffic Scheduling, and Component Support

The article explains blue‑green deployment as a release strategy that improves large‑scale microservice rollouts by extracting traffic from a blue cluster, incrementally shifting it to a green environment, using global and local traffic scheduling, central metadata, compatible components, and careful considerations such as idempotent consumption and version compatibility.

Blue‑Green deployment · Continuous Delivery · Operations
0 likes · 12 min read
JD Cloud Developers
Mar 15, 2023 · Operations

Designing Seamless Offline Delivery for Private Cloud Environments

This article outlines a general, process‑focused approach to offline delivery in private or dedicated cloud environments, covering the need for internal mirrors, plug‑in architecture, dependency awareness, full automation, and best‑practice process design to reduce SRE effort and ensure consistent production.

Kubernetes · Operations · automation
0 likes · 5 min read
NetEase Smart Enterprise Tech+
Mar 15, 2023 · Operations

How Yidun Automates Performance Testing to Overcome Real‑World Pain Points

This article explains performance testing fundamentals, why it matters, the specific challenges Yidun faced such as complex execution, human‑dependent monitoring, data isolation, and cost loss, and describes their automated, gradient‑based testing platform with quantified monitoring and future visualisation plans.

Data Isolation · Operations · Performance Testing
0 likes · 8 min read
IT Architects Alliance
Mar 14, 2023 · Operations

Key Practices for Achieving High Availability in Internet Services

The article outlines essential high‑availability techniques for internet‑scale systems, covering availability metrics, microservice modularization, database redundancy, load balancing, rate limiting, circuit breaking, isolation, retry strategies, rollback plans, stress testing, monitoring, and on‑call procedures.

Operations · System Design · high availability
0 likes · 10 min read
dbaplus Community
Mar 13, 2023 · Cloud Native

From Bare Metal to Cloud‑Native: How Zhuanzhuan Reinvented Log Collection

This article traces Zhuanzhuan's evolution of log collection, from a bare‑metal Scribe + Flume pipeline, through a container‑aware log‑pilot solution, to a cloud‑native Filebeat and fb‑advisor architecture, detailing the motivations, technical designs, performance gains, and trade‑offs of each stage.

Container · Filebeat · Operations
0 likes · 12 min read
FunTester
Mar 13, 2023 · Operations

How Chaos Engineering Can Strengthen System Reliability: A Practical Guide

This article explains the origins and principles of chaos engineering, illustrates how fault‑injection scenarios expose system weaknesses, outlines step‑by‑step implementation—from tool selection and metric definition to execution and post‑mortem—and highlights its role in achieving high‑availability service level agreements.

DevOps · Distributed Systems · Fault Injection
0 likes · 10 min read
MaGe Linux Operations
Mar 10, 2023 · Operations

249 Ready-to-Use Shell Scripts to Boost Your Linux Ops Skills

Discover a curated collection of 249 practical shell script examples, complete with clear documentation and usage guidelines, designed to help Linux operations engineers improve efficiency, master scripting conventions, and quickly solve common admin tasks, all available for free download via the provided QR code.

Bash · Operations · Shell
0 likes · 7 min read
Alimama Tech
Mar 8, 2023 · Industry Insights

How Alibaba’s Dynamic Compute Transforms Ad Engine Efficiency

This article details Alimama’s dynamic compute system, covering its architecture, offline and online tidal‑compute mechanisms, city‑level mutual backup, RT control, large‑scale promotion handling, metric integration, and recent infrastructure upgrades, and showcases concrete performance gains and future challenges in green, intelligent ad‑engine resource management.

Alibaba · Operations · ad engine
0 likes · 16 min read
Python Programming Learning Circle
Mar 6, 2023 · Operations

Intelligent Operations: AI‑Driven Anomaly Detection, Alarm Compression, and Log Analysis Techniques

This article presents an AI‑enhanced operations framework that combines metric anomaly detection, alarm compression, log anomaly detection, and intelligent analysis using machine learning methods such as DBSCAN clustering, SARIMAX modeling, Apriori association rules, and LSTM‑based log parsing to improve fault detection and reduce operational costs.

Operations · aiops · anomaly detection
0 likes · 15 min read
Efficient Ops
Mar 1, 2023 · Operations

How China Galaxy Securities Achieved Leading‑Edge DevOps Maturity with CMDB Platform

China Galaxy Securities’ CMDB platform recently earned an excellent rating in the China Academy of Information and Communications Technology’s DevOps system and tool standards, showcasing how standardized, tool‑enabled DevOps practices can boost efficiency, safety, and digital transformation for large financial enterprises.

CMDB · DevOps · Digital Transformation
0 likes · 11 min read
NetEase Smart Enterprise Tech+
Mar 1, 2023 · Operations

Stability Quality Assurance: Definitions, Metrics, and Implementation Guide

This article explains the origins and meaning of software stability and stability testing, outlines key standards such as GB/T 16260 and industry definitions, and presents a comprehensive framework for stability quality assurance covering system elements, external disturbances, baseline setting, robust design, monitoring, and rapid incident response.

Operations · SRE · quality assurance
0 likes · 17 min read
Efficient Ops
Feb 28, 2023 · Operations

How a Chinese Bank’s Wealth Management System Mastered DevOps Level‑3 Continuous Delivery

The Agricultural Bank of China's Wealth Management Customer Share Management System passed the CAICT DevOps Level‑3 Continuous Delivery assessment, showcasing a comprehensive DevOps transformation that improved code quality, automated testing, and deployment efficiency while delivering measurable performance gains across the organization.

Banking · Continuous Delivery · DevOps
0 likes · 10 min read
DeWu Technology
Feb 27, 2023 · Operations

Message Push Monitoring and SLA Practices

The team implemented SLA‑based, node‑level monitoring for mobile push messages—splitting the workflow, measuring latency, blocking volume, and success rates, isolating metrics with Spring AOP, and tracking third‑party vendors—resulting in clear latency standards, doubled peak throughput, faster issue resolution, and improved overall reliability.

Message Push · Operations · SLA
0 likes · 11 min read
Continuous Delivery 2.0
Feb 27, 2023 · Operations

What Is Infrastructure as Code (IaC) and Its Benefits and Drawbacks

Infrastructure as Code (IaC) is a DevOps practice that defines, creates, and manages infrastructure through machine‑readable code, offering reproducibility, efficiency, collaboration, cost savings, and flexibility, while also presenting challenges such as a steep learning curve, dependency management, potential code errors, drift, and initial costs.

Infrastructure as Code · Operations · automation
0 likes · 5 min read
Su San Talks Tech
Feb 24, 2023 · Backend Development

Why We’re Dropping RabbitMQ for Kafka: A Complete Migration Blueprint

Facing chaotic usage, maintenance challenges, partition tolerance issues, and performance bottlenecks with RabbitMQ, our middleware team decided to fully migrate to Kafka, outlining reasons, comparative models, migration strategies, and verification steps to ensure a smooth, high‑availability, high‑performance transition.

Backend · Kafka · Message Queue
0 likes · 13 min read
ITPUB
Feb 23, 2023 · Operations

Why Did Microservices Drop After Zookeeper Restart? Session Mechanics & Fixes

A mistaken ZooKeeper restart caused a 30‑minute outage of all microservices; this article analyzes the ZooKeeper session mechanism, explains why ephemeral nodes were not recreated, and presents two concrete solutions and best‑practice recommendations to prevent similar failures.

Microservices · Operations · RPC
0 likes · 11 min read
dbaplus Community
Feb 21, 2023 · Operations

How Standardized Application Monitoring Boosts Operational Efficiency

This article reviews G Bank's multi‑year journey to standardize application monitoring, detailing the methodology, models, metrics, automation mechanisms, and quantitative evaluation that together improve visibility, early fault detection, and overall operations management for both traditional and distributed systems.

Metrics · Operations · aiops
0 likes · 18 min read
Zhuanzhuan Tech
Feb 21, 2023 · Databases

Fast and Stable MySQL Data Center Migration: Choosing and Implementing the Optimal Strategy

This article details the background, migration plan selection, and step‑by‑step procedures—including pre‑building cascades, service pause, automated batch operations, cluster tiering, pre‑ and post‑checks, and gray‑scale validation—to achieve a fast, stable MySQL data‑center migration for a large‑scale production environment.

Operations · automation · cloud
0 likes · 11 min read
21CTO
Feb 16, 2023 · Operations

Which Log Management Tool Is Right for You? A Comprehensive Comparison of 9 Solutions

This article provides a detailed comparison of nine popular log management tools—including Filebeat, Graylog, LogDNA, ELK, Grafana Loki, Datadog, Logstash, Fluentd, and Splunk—covering their main features, pricing, advantages, and disadvantages to guide readers in selecting the most suitable solution for their needs.

ELK · Log Management · Operations
0 likes · 16 min read
Code Ape Tech Column
Feb 16, 2023 · Databases

Understanding and Solving BigKey and HotKey Issues in Redis Clusters

BigKey and HotKey are common Redis cluster problems that can degrade performance, cause timeouts, network congestion, and even system-wide failures; this article explains their definitions, impacts, detection methods, and practical mitigation strategies—including key splitting, local caching, and migration optimizations—based on real-world production cases.

BigKey · HotKey · Operations
0 likes · 22 min read
Efficient Ops
Feb 15, 2023 · Operations

How Agricultural Bank of China’s ARROW Platform Mastered DevOps Continuous Delivery

The article details how Agricultural Bank of China’s ARROW platform achieved third‑level DevOps continuous delivery certification, outlining its end‑to‑end pipeline, quality gates, and metric‑driven improvements, and showing how these practices boost code quality and delivery speed and support the bank’s digital transformation.

Arrow · Continuous Delivery · DevOps
0 likes · 8 min read
Zhuanzhuan Tech
Feb 15, 2023 · Operations

Automating TiDB Operations at Zhuanzhuan: From Manual Management to Platform‑Based Automation

This article details Zhuanzhuan's journey of automating TiDB operations, covering the initial operational pain points, the implementation of metadata and resource management, comprehensive upgrades, alarm redesign, and the development of a work‑order‑driven platform that streamlines node management, scaling, decommissioning, and monitoring tasks while significantly reducing manual effort and costs.

Database Management · Operations · TiDB
0 likes · 18 min read
Hulu Beijing
Feb 14, 2023 · Operations

How Hulu Scaled Its Live Streaming for the Super Bowl: Inside the War Room

This article details how Hulu's Beijing engineering teams prepared, scaled, and operated the live streaming infrastructure for the 2023 Super Bowl, handling a 20% traffic surge with advanced load‑testing, auto‑scaling, and coordinated on‑call support to ensure a flawless broadcast.

Hulu · Operations · Super Bowl
0 likes · 3 min read
Code Ape Tech Column
Feb 14, 2023 · Backend Development

High‑Availability Architecture for a Billion‑Scale Membership System: Elasticsearch Dual‑Center Cluster, Redis Caching, and MySQL Migration

This article describes how a membership platform serving over a billion users achieves high performance and fault tolerance through a dual‑center Elasticsearch cluster, a traffic‑isolated three‑cluster ES design, Redis multi‑center caching, and a seamless migration from SQL Server to a partitioned MySQL architecture, and details its operational safeguards and fine‑grained flow‑control strategies.

Elasticsearch · Operations · Scalability
0 likes · 23 min read
DataFunSummit
Feb 8, 2023 · Product Management

Content‑Driven Data Product Management: Challenges, Governance Frameworks, and Implementation Strategies

This article shares practical insights from a data product expert on the problems faced by content‑oriented data products, outlines a comprehensive governance methodology—including DAMA, Huawei, and Alibaba frameworks—and demonstrates how to operationalize these ideas through concrete examples such as event‑tracking and metric governance.

Big Data · Data Governance · Data Product Management
0 likes · 16 min read
JD Cloud Developers
Feb 8, 2023 · Operations

Boosting Log Anomaly Detection with NLP and Deep Learning

This article presents a log anomaly detection approach that leverages NLP techniques such as Part‑of‑Speech tagging and Named Entity Recognition combined with deep neural networks, detailing a six‑step model, experimental validation on three datasets, and superior performance compared with existing DeepLog and LogClass methods.

DNN · Deep Learning · NER
0 likes · 13 min read
DataFunSummit
Feb 7, 2023 · Operations

Understanding RPA: Concepts, Core Modules, Element Analyzer, and Development Stages

This article provides a comprehensive overview of Robotic Process Automation (RPA), covering its definition, integration with AI (IPA), common AI techniques, value propositions, evolution from RPA 1.0 to 4.0, core platform and control‑center modules, element analyzer fundamentals, automation technology classifications, and a brief Q&A session.

AI · Operations · RPA
0 likes · 16 min read
dbaplus Community
Feb 6, 2023 · Operations

How Vivo Built a Scalable, Cloud‑Native Monitoring Platform for Millions of Services

This article outlines Vivo's multi‑year journey of designing, evolving, and operating a cloud‑native, AIOps‑enabled monitoring platform that supports tens of thousands of hosts, databases, containers, and services, detailing its architecture, challenges, and future directions for observability and reliability.

Operations · System Architecture · aiops
0 likes · 18 min read
Efficient Ops
Feb 6, 2023 · Operations

Agricultural Bank of China's DevOps Journey: Building an Integrated Development System

Facing rapid digital transformation demands, Agricultural Bank of China launched a comprehensive DevOps initiative in 2019, establishing an integrated development lifecycle that combines CMMI, TMMi, ITIL, and automated pipelines across five key streams—process, tools, data, standards, and culture—to boost delivery speed, quality, and operational efficiency.

Banking Technology · DevOps · Digital Transformation
0 likes · 14 min read
Efficient Ops
Feb 5, 2023 · Operations

How China’s Telecom Giants Accelerate IT Efficiency with DevOps Maturity Models

Amid digital transformation, six leading Chinese telecom operators adopted the CAICT‑led DevOps Capability Maturity Model, completing 31 assessments that showcase improved IT efficiency, integrated team resources, and accelerated business support across continuous delivery, technical operation, security, and system tooling.

Capability Maturity Model · DevOps · Digital Transformation
0 likes · 14 min read
Efficient Ops
Feb 2, 2023 · R&D Management

How China’s Leading Banks Boost IT Efficiency with DevOps Maturity Models

This article reviews how six major state‑owned Chinese banks and their subsidiaries applied the DevOps Capability Maturity Model led by the China Academy of Information and Communications Technology (CAICT), detailing assessment numbers, project case studies, implementation challenges, and measurable improvements in continuous delivery, cloud architecture, security, and overall IT performance.

BankingIT · CloudComputing · ContinuousDelivery
0 likes · 20 min read
ITPUB
Feb 2, 2023 · Operations

Why 80% of Digital Transformations Fail and How to Ensure Success

This article explains why digital transformation is now a must for enterprises, outlines its core purpose of boosting efficiency and revenue, describes the three progressive stages—digitization, data-driven, and intelligent automation—and highlights the strategic, organizational, cultural, and technological factors that determine success.

AI · Data-driven · Digital Transformation
0 likes · 11 min read
Efficient Ops
Jan 29, 2023 · Operations

How Linux Kernel Handles TCP Connections: Deep Dive into sock_common and Lookup

This article explores Linux kernel TCP connection handling by examining socket data structures, port range and file descriptor tuning, core functions like tcp_v4_rcv, and lookup mechanisms, while offering practical tips to boost client-side concurrent connections beyond traditional limits.

Linux kernel · Networking · Operations
0 likes · 9 min read
Alibaba Cloud Native
Jan 19, 2023 · Cloud Native

How Java Evolved for Cloud‑Native Operations: Key Features from JDK 9‑19

Since JDK 9, Java has accelerated its release cadence and added a suite of cloud‑native capabilities—such as container‑aware metrics, single‑file execution, refined JVM options, fast‑fail memory controls, class‑data sharing, compact strings, active‑processor detection, and Unix‑domain sockets—to better serve modern containerized workloads.

Cloud Native · Container · JDK
0 likes · 17 min read
Efficient Ops
Jan 18, 2023 · Operations

How Zhongyuan Bank Accelerated Digital Transformation with DevOps: A Case Study

This article details Zhongyuan Bank's award-winning DevOps implementation and digital transformation journey, highlighting its rapid delivery improvements, security enhancements, pandemic response initiatives, and numerous industry recognitions that showcase the bank's operational excellence.

Banking · DevOps · Digital Transformation
0 likes · 8 min read
DevOps
Jan 18, 2023 · Operations

Qualitative Analysis as a Metric for Software Quality Measurement

The article explains how qualitative analysis serves as a measurable metric throughout the software lifecycle, outlines five key qualitative methods—interviews, root‑cause analysis, maturity assessment, reviews, and post‑mortems—and demonstrates their practical application for continuous quality improvement.

Maturity Assessment · Operations · Root Cause Analysis
0 likes · 8 min read
MaGe Linux Operations
Jan 15, 2023 · Operations

How to Slim Down Your Application Logs by Up to 80%

This article explains why oversized logs hurt system performance, then presents a step‑by‑step methodology—including printing only necessary logs, merging duplicate entries, and simplifying payloads—illustrated with real Java code and a concrete case study that reduces daily log volume from 5 GB to under 1 GB.

Operations · debug · java
0 likes · 8 min read
ITPUB
Jan 12, 2023 · Operations

How to Build a Truly High‑Availability System: 6 Essential Design Layers

This article breaks down the essential design and operational considerations for achieving high availability across six layers—development standards, application services, storage, product strategy, operations deployment, and incident response—providing concrete practices, metrics, and safeguards to reach four‑nines (99.99%) uptime.

Operations · System Design · capacity planning
0 likes · 25 min read