Tagged articles

Operations

3329 articles · Page 17 of 34

Dec 30, 2021 · Operations

Master Network Troubleshooting: Proven Strategies to Resolve Common Issues

This comprehensive guide presents a step‑by‑step approach for diagnosing and fixing everyday network problems, covering fault scope identification, link and configuration checks, common diagnostic methods, detailed case studies, and essential command‑line tools for IT professionals.

IT support · Operations · diagnostic steps

0 likes · 7 min read

IT Architects Alliance

Dec 30, 2021 · Operations

Payment Operations Platform: Role, Architecture, Business Logic, and Design Principles

This article explains the purpose, evolution, business logic, architectural design, and key design principles of a payment operations platform, detailing its user groups, system architecture, interaction model, permission management, and security considerations for internal staff in payment companies.

Operations · Platform · architecture

0 likes · 8 min read

DataFunSummit

Dec 29, 2021 · Operations

How to Build an Operations Monitoring Platform with Spring Boot Admin

This article explains what Spring Boot Admin is, walks through creating a server and client to monitor Spring Boot applications, shows how to configure ports, enable the admin UI, and set up email and custom alert notifications for operational health monitoring.

Java · Operations · Spring Boot

0 likes · 12 min read

Efficient Ops

Dec 28, 2021 · Operations

How China Post Savings Bank Achieved Top‑Tier DevOps Maturity: A Success Story

China Post Savings Bank’s three core systems passed the Level 3 DevOps Continuous Delivery assessment, showcasing leading domestic capabilities, while senior leaders discuss the bank’s DevOps evolution, measurable improvements, future DevSecOps plans, and the broader industry standards driving these results.

Case Study · Operations · Software engineering

0 likes · 13 min read

DevOps

Dec 28, 2021 · Operations

The Pros and Cons of Work‑Hour Reporting for Knowledge Workers

This article examines the concept of work‑hour reporting, exploring its definitions, purposes, benefits such as productivity tracking and profit maximisation, and drawbacks including mistrust, administrative overhead, and misalignment with modern knowledge‑work practices, while also discussing agile approaches to time management.

Operations · productivity · time tracking

0 likes · 10 min read

dbaplus Community

Dec 27, 2021 · Operations

How to Trace Server Latency and Build a Comprehensive Performance Toolkit

This guide explains how to trace transaction latency in multi‑vendor server environments, outlines the key monitoring metrics across CPU, network, disk and processes, compares coarse‑ and fine‑grained sampling, and proposes a unified, AI‑enhanced toolkit for diagnosing hardware and software performance bottlenecks.

AI analysis · Operations · hardware diagnostics

0 likes · 13 min read

Efficient Ops

Dec 27, 2021 · Operations

How Ping An Bank’s Starlink Platform Earned Industry‑Leading DevOps Efficiency Rating

Ping An Bank’s Starlink DevOps platform was awarded the "industry promotion level" in the first batch evaluation of the China Academy of Information and Communications Technology’s DevOps General Efficiency Measurement Model, highlighting its leading domestic performance and the bank’s commitment to digital governance and fine‑grained R&D efficiency management.

Digital Governance · Operations · devops

0 likes · 12 min read

DevOps

Dec 27, 2021 · Operations

2021 China Chaos Engineering Survey Report: Findings and Recommendations

Based on 1,016 valid questionnaire responses and 17 enterprise interviews, the 2021 China Chaos Engineering Survey Report reveals low software system stability, limited adoption of chaos engineering, its positive impact on availability, and provides data‑driven recommendations for improving stability through mature tools, metrics, and cultural shifts.

Operations · chaos engineering · cloud-native

0 likes · 15 min read

Efficient Ops

Dec 26, 2021 · Operations

How Zhengzhou Bank Achieved Advanced DevSecOps Maturity: Insights and Lessons

The article reports on Zhengzhou Bank's successful DevSecOps assessment at the 2021 GOLF+ IT New Governance Forum, detailing the bank's interview on implementation practices, cultural, process and technical measures, and the broader significance of the national DevOps maturity model for digital governance.

DevSecOps · Digital Governance · Maturity Assessment

0 likes · 12 min read

IT Architects Alliance

Dec 26, 2021 · Operations

What Is DevOps? Origins, Principles, and Practical Implementation Guide

This article explains DevOps by tracing its 2008 origins, summarizing evolving wiki definitions, outlining the business drivers behind its popularity, detailing its three core principles—flow, feedback, and continuous learning—and providing concrete technical practices, organizational patterns, and key takeaways for effective adoption.

CI/CD · Culture · Operations

0 likes · 22 min read

Efficient Ops

Dec 25, 2021 · Operations

How Anxin Securities Achieved DevOps Maturity: Insights from the 2021 GOLF+ IT Governance Forum

The article reports on Anxin Securities' successful Level‑2 DevOps technology‑operation assessment announced at the 2021 GOLF+ IT Governance Forum, featuring interview highlights from the CIO and operations head, details of the evaluated Financial Store System, and broader industry statistics on DevOps maturity in the securities sector.

Operations · Technology Governance · devops

0 likes · 11 min read

DeWu Technology

Dec 24, 2021 · Operations

How to Quickly Attribute Live‑Streaming Alert Issues in a Kubernetes Environment

This article walks through a real‑world live‑streaming service alert where response time and goroutine spikes were traced through Grafana metrics, MySQL/Redis performance, routing logic, and Istio sidecar load, ultimately revealing a mis‑reported Istio metric and a resource‑allocation fix to prevent future jitter.

Istio · Kubernetes · Live Streaming

0 likes · 11 min read

Efficient Ops

Dec 24, 2021 · Operations

How Baidu’s iReport Leads the New Era of DevOps Efficiency Measurement

The China Academy of Information and Communications Technology unveiled its DevOps efficiency measurement model, with Baidu’s iReport platform becoming the first to achieve industry‑promotion level certification, and detailed the model’s modules, maturity levels, and practical insights for improving software development performance.

Operations · Software engineering · devops

0 likes · 10 min read

Alibaba Cloud Native

Dec 22, 2021 · Operations

How Alibaba’s ASI Powers Massive Serverless Kubernetes at Scale

This article details Alibaba's Serverless Infrastructure (ASI) built on ACK, explaining its large‑scale Kubernetes architecture, fully managed operations, change‑risk controls, gray‑release pipelines, web‑shell access, taskflow orchestration, node lifecycle management, elasticity, risk mitigation, probing, and self‑healing capabilities that enable reliable cloud‑native services.

Kubernetes · Operations · SRE

0 likes · 32 min read

Efficient Ops

Dec 20, 2021 · Cloud Native

How to Build a Scalable Kubernetes Logging System with S6 and Filebeat

This article explains Docker and Kubernetes logging challenges, compares logging drivers, and presents a unified, node‑agent based logging architecture using S6‑based containers, Filebeat, logrotate, Kafka, and Elasticsearch to achieve reliable, auto‑rotating log collection in production environments.

Docker · Logging · Operations

0 likes · 8 min read

Zhongtong Tech

Dec 17, 2021 · Operations

How Digitalization Is Revolutionizing China's Logistics Industry

At the WISE2021 China Digital Innovation Summit, Zhongtong Express CTO Zhu Jingxi detailed the company's digital transformation journey, highlighting the impact of electronic waybills, data-driven operations, AI routing, and privacy security on reshaping the logistics supply chain and boosting efficiency.

@Data · AI · Operations

0 likes · 11 min read

Java High-Performance Architecture

Dec 17, 2021 · Operations

Explore FinalShell: Free All-in-One SSH, Remote Desktop & Server Management Tool

This article introduces FinalShell, a free, cross‑platform server management suite that combines SSH, remote desktop acceleration, network monitoring, file transfer, and customizable themes, offering a powerful alternative to XShell for developers and operations engineers.

Cross-Platform · FinalShell · Network Monitoring

0 likes · 5 min read

dbaplus Community

Dec 16, 2021 · Operations

How Ops Leaders Can Transform Teams for the Cloud‑Native Era

In this expert round‑table, senior SRE and DB leaders discuss how operations teams must revamp their management philosophy, processes, knowledge systems, and collaboration models, adopting OKRs, DevOps, AIOps, and proactive shift‑left practices, to thrive in the cloud‑native landscape.

Knowledge Management · Operations · devops

0 likes · 18 min read

Wukong Talks Architecture

Dec 15, 2021 · Operations

Understanding Service Avalanche and Circuit Breaker Mechanisms through the Red Cliffs Battle Analogy

This article uses the historic Battle of Red Cliffs as an analogy to explain service avalanche, its causes in micro‑service architectures, and how circuit‑breaker, rate‑limiting, and isolation techniques can prevent cascading failures in modern distributed systems.

Operations · Service Avalanche · circuit breaker

0 likes · 14 min read

Efficient Ops

Dec 13, 2021 · Operations

Why Every Ops Team Needs a Kubernetes Standards Playbook

This article shares practical standards for Kubernetes operations—from infrastructure choices and application packaging to CI/CD tooling—helping teams reduce complexity, improve reliability, and foster continuous learning and sharing in fast‑moving cloud environments.

CI/CD · Operations · Standardization

0 likes · 13 min read

Open Source Linux

Dec 12, 2021 · Operations

How to Check and Increase Linux Open File Limits (ulimit, sysctl)

This guide explains how to view and modify Linux's open file descriptor limits using commands like ulimit, sysctl, and by editing system configuration files, covering both system-wide and per‑user settings for improved application performance.

Linux · Operations · file descriptor

0 likes · 5 min read

Top Architect

Dec 12, 2021 · Operations

Blue‑Green, Rolling, and Canary Deployment Strategies Explained

This article introduces three common release strategies—blue‑green deployment, rolling deployment, and canary (gray) deployment—explaining their workflows, advantages, drawbacks, and practical considerations for safely updating production systems during iterative project releases.

Blue-Green · Deployment · Operations

0 likes · 10 min read

Programmer DD

Dec 12, 2021 · Operations

How Netflix’s Telltale Transforms Monitoring for 100+ Services

This article explains Netflix’s home‑grown monitoring system Telltale, detailing its design, multi‑dimensional health‑assessment model, intelligent alerting, integration with Slack, deployment monitoring, and continuous optimization that together keep over a hundred production applications running smoothly.

Alerting · Netflix · Operations

0 likes · 13 min read

Architects Research Society

Dec 10, 2021 · Backend Development

Principled GraphQL: Ten Principles for Building, Maintaining, and Operating Data Graphs

This article presents ten GraphQL principles—grouped into integrity, agility, and operations—that guide the design, evolution, and safe production deployment of a unified data graph, emphasizing a single schema, collaborative implementation, versioned registries, performance monitoring, and robust access and demand controls.

Best Practices · Data Graph · GraphQL

0 likes · 19 min read

Top Architect

Dec 10, 2021 · Operations

Comprehensive Guide to Load Balancing: Principles, Types, Algorithms, and Hardware

This article explains the fundamentals of load balancing, covering why it is needed for high‑traffic services, the difference between vertical and horizontal scaling, various load‑balancing techniques (DNS, HTTP, IP, link‑layer, hybrid), common algorithms, and the trade‑offs of software versus hardware solutions.

High Availability · Operations · distributed systems

0 likes · 13 min read

Dada Group Technology

Dec 10, 2021 · Operations

Design and Practice of the Freight Business Check System (BCS)

The article introduces the freight BCS system, explains its business background, describes multiple validation modes for data consistency and business logic correctness, compares implementation approaches, and outlines the architecture, task flow, and future enhancements to improve system reliability and operational monitoring.

Data Consistency · Operations · System Design

0 likes · 10 min read

Cloud Native Technology Community

Dec 8, 2021 · Cloud Native

Step-by-Step Guide to Build a Distributed Rook/Ceph Storage Cluster on Kubernetes

This tutorial walks you through preparing three identical VMs, installing required packages, configuring Rook and Ceph versions, deploying the storage cluster on a Kubernetes 1.20 environment, exposing the Ceph dashboard, and cleaning up the installation, complete with command examples and troubleshooting tips.

Ceph · Deployment · Distributed storage

0 likes · 14 min read

DevOps

Dec 8, 2021 · Operations

Understanding Digital Transformation Strategy: Definition, Implementation Steps, and Success Guarantees

The article explains what a digital transformation strategy is, how it strategically leverages IT and data, outlines practical steps for implementation, and describes organizational, talent, technology, and governance measures needed to ensure successful enterprise digital transformation.

@Data · Enterprise · IT

0 likes · 10 min read

Java Architect Essentials

Dec 6, 2021 · Databases

Facebook’s MySQL 5.6‑to‑8.0 Migration: Challenges, Process, and Lessons Learned

The article details Facebook’s multi‑year effort to migrate its heavily customized MySQL 5.6 deployment—including the MyRocks storage engine—to MySQL 8.0, describing the technical challenges, patch‑porting strategy, replication changes, automated verification, and application validation performed during the upgrade.

Facebook · Migration · MyRocks

0 likes · 17 min read

Open Source Linux

Dec 5, 2021 · Operations

Choosing the Right Backup: Normal, Copy, Differential, Incremental

The article explains four primary backup methods—Normal (full), Copy, Differential, and Incremental—detailing their processes, advantages, and drawbacks, and helps readers decide which strategy best balances storage space, recovery speed, and data protection needs.

Data Protection · Incremental Backup · Operations

0 likes · 4 min read

Open Source Linux

Dec 5, 2021 · Operations

Essential Skill Maps Every DevOps Engineer Should Master

This article compiles a series of visual skill maps covering DevOps, cloud computing, big data, security, architecture, and development practices, offering engineers a comprehensive roadmap to build and expand their technical knowledge across multiple domains.

Big Data · Cloud Computing · Operations

0 likes · 3 min read

Efficient Ops

Dec 5, 2021 · Operations

Mastering ITIL Event Management: Strategies for Efficient IT Operations

This article explores the fundamentals of ITIL-based event management, detailing its relationship with ITSM, the challenges of unmanaged services, key processes, priority definitions, and three management models—centralized, self‑managed, and collaborative—to help organizations improve service stability and response efficiency.

ITIL · ITSM · Incident Management

0 likes · 14 min read

IT Architects Alliance

Dec 4, 2021 · Operations

Understanding Blue‑Green, Rolling, Canary (Gray) Release and A/B Testing Deployment Strategies

This article explains common deployment strategies—including blue‑green, rolling, canary (gray) releases, and A/B testing—detailing their principles, advantages, drawbacks, and practical considerations for safely delivering new versions in production environments.

A/B testing · Blue-Green · Deployment

0 likes · 9 min read

Aikesheng Open Source Community

Dec 3, 2021 · Operations

Monitoring DBLE with Zabbix: Environment Setup, Scripts, and Template Configuration

This guide explains how to set up a monitoring environment for the DBLE distributed middleware using Zabbix, covering host and software configuration, MySQL master‑slave deployment, DBLE installation, Zabbix script creation, and template configuration with detailed code examples.

DBLE · Operations · Zabbix

0 likes · 8 min read

IT Architects Alliance

Dec 1, 2021 · Operations

What Does an SRE Actually Do? A Deep Dive into Roles and Practices

This article explains the origins of Site Reliability Engineering, breaks down its three main layers (Infrastructure, Platform, and Business SRE), covers day‑1 and day‑2 deployment, on‑call processes, SLI/SLO design, post‑mortems, capacity planning, and user support, and offers practical advice for aspiring SREs.

Oncall · Operations · SLI

0 likes · 24 min read

Open Source Linux

Nov 30, 2021 · Operations

Master Dockerfile Optimization: Reduce Image Size and Boost Build Efficiency

This guide explains how to format and optimize Dockerfile instructions, choose minimal base images, avoid root users, manage processes, clean caches, and apply best‑practice tips such as proper ENTRYPOINT scripts and labeling to create smaller, more secure container images.

Operations · best-practices · container

0 likes · 10 min read

Programmer DD

Nov 28, 2021 · Product Management

Why Chinese Cloud Companies Struggle to Match US Counterparts: A Deep Dive into 996 vs 955

The article analyses why Chinese B2B cloud firms, despite long working hours and abundant talent, lag behind US providers, attributing the gap to product‑driven versus sales‑driven strategies, cultural differences, incentive structures, and low willingness to pay for high‑quality solutions.

B2B · Cloud Computing · Industry Analysis

0 likes · 9 min read

iQIYI Technical Product Team

Nov 26, 2021 · Industry Insights

How iQIYI Built an Unmanned Fault‑Handling System for 99% Reliability

This article details iQIYI's unmanned monitoring platform, covering its design goals, overall architecture, core modules such as real‑time data collection, decision engine, and event‑processing engine, as well as the machine‑learning model used for production‑time prediction and the system's operational results and future roadmap.

Machine Learning · Operations · fault automation

0 likes · 13 min read

dbaplus Community

Nov 25, 2021 · Operations

How Unified Alert Convergence Can Transform Monitoring Systems

This article explains the background and challenges of legacy monitoring systems, defines key concepts such as exceptions, problems, alerts and recoveries, introduces critical metrics like MTTA and MTTR, and details the design, architecture, and core implementation of a unified alert convergence service using Redis delay queues.

MTTA · MTTR · Operations

0 likes · 19 min read

转转QA

Nov 25, 2021 · Operations

Full‑Chain Production Environment Load Testing for Double 11 Promotion: Process, Findings, and Lessons

This article details the end‑to‑end preparation, execution, reporting, and retrospective of a large‑scale production‑environment load test for the Double 11 shopping festival, covering data preparation, QPS target calculation, multi‑scenario testing, issue analysis, and continuous improvement practices.

Double11 · Operations · Performance

0 likes · 8 min read

Qingyun Technology Community

Nov 24, 2021 · Operations

How eBPF Toolchains Simplify Kernel Tracing from BCC to BPFtrace

This article walks through the high‑level components of eBPF programs—backend, loader, frontend, and data structures—showing how the original sock_example.c is split into separate files, how LLVM compiles restricted C to ELF, and how projects like BCC, BPFtrace, and IOVisor automate development, deployment, and cloud‑native observability while highlighting their trade‑offs for embedded environments.

BCC · Linux · Operations

0 likes · 15 min read

Architecture Digest

Nov 23, 2021 · Operations

A Historical Overview of DevOps and Its Evolution

This article traces the evolution of DevOps from its roots in Toyota Production System and Kanban through Waterfall, Scrum, Agile, Lean, and modern extensions like ChatOps, GitOps, FinOps and AIOps, highlighting key milestones and their impact on software delivery practices.

Agile · Kanban · Operations

0 likes · 9 min read

DevOps

Nov 23, 2021 · Operations

Zero‑Downtime Application Deployment: Strategies, Maturity Levels, and Required Technical Components

The article explains why traditional three‑step application releases cause service interruptions, introduces three maturity levels for zero‑downtime deployment, compares blue‑green, rolling, and canary release models, and provides concrete technical components, load‑balancer architectures, and Spring‑Boot/Eureka shutdown procedures to achieve uninterrupted service.

Operations · Release · Zero Downtime

0 likes · 22 min read

HelloTech

Nov 22, 2021 · Product Management

How Haro’s Smart Bike Scheduling Boosts Asset Utilization – Key Takeaways from the HiPM Summit

At the HiPM Product Innovation Summit, Haro’s senior product expert shared a five‑point framework and a three‑stage evolution of intelligent bike dispatch, revealing how data‑driven, human‑machine collaboration can transform chaotic bike distribution into efficient, asset‑maximizing operations.

AI · Operations · Product Management

0 likes · 5 min read

Practical DevOps Architecture

Nov 21, 2021 · Operations

Practical Server Hardware, Linux Kernel, and MySQL Optimization Tips

This article provides practical guidance on server hardware selection, Linux kernel parameter tuning, and MySQL configuration adjustments to improve overall system performance, reliability, and efficiency for production environments.

Linux · MySQL · Operations

0 likes · 4 min read

IT Architects Alliance

Nov 20, 2021 · Operations

Analysis and Optimization of Business System Performance

This article outlines a comprehensive approach to diagnosing and optimizing performance problems in production business systems, covering analysis processes, hardware, OS, database, middleware, JVM tuning, code inefficiencies, and monitoring techniques to identify root causes and improve system reliability.

Database Tuning · JVM Tuning · Operations

0 likes · 16 min read

Efficient Ops

Nov 19, 2021 · Operations

How Shanghai Pudong Development Bank Achieved Top‑Tier DevOps Maturity Across 8 Projects

Shanghai Pudong Development Bank’s eight systems passed the third‑level DevOps continuous‑delivery assessment, showcasing how standardized processes, tool empowerment, and a unified maturity model can dramatically boost development efficiency, quality, and competitive advantage in the banking sector.

Case Study · Maturity Assessment · Operations

0 likes · 13 min read

DevOps Cloud Academy

Nov 19, 2021 · Operations

Guide to Using Grafana Stat Panel for Monitoring: Text and Background Modes, Configuration Steps

This tutorial explains how to create and configure Grafana Stat panels—including text and background modes, threshold‑based coloring, unit settings, and Markdown/HTML text panels—to visualize metrics such as node uptime, CPU cores, and total memory on a dashboard.

Grafana · Operations · Stat panel

0 likes · 8 min read

Efficient Ops

Nov 18, 2021 · Operations

Latest DevOps Maturity Assessment Results Reveal Top Companies and New 2+ Level Standards

The 2021 GOPS Global Operations Conference in Shanghai announced the latest DevOps capability maturity assessment results, detailing the enterprises that achieved continuous delivery level 3 and technical operation level 2+, explaining the new 2+ grading, and outlining the DevOps maturity model and its industry adoption.

Capability Maturity · Operations · continuous delivery

0 likes · 6 min read

vivo Internet Technology

Nov 17, 2021 · Operations

Design and Architecture of a Unified Alert Convergence System for Monitoring

The paper presents a unified alert convergence system that centralizes metric calculation, detection, and alarm handling across monitoring subsystems, employing mechanisms such as convergence, claiming, silencing, escalation, and a Redis‑based delayed queue integrated via Kafka or REST to reduce alarm fatigue, improve MTTA/MTTR, and enable future AI‑driven AIOps.

MTTA · MTTR · Operations

0 likes · 18 min read

Cloud Native Technology Community

Nov 17, 2021 · Operations

Deploy a Single‑Node Ceph Cluster with Docker in Minutes

This step‑by‑step guide shows how to initialize a CentOS 7 host, disable firewalls, configure time sync, prepare storage disks, pull the official Ceph Docker image, and launch MON, OSD, MGR, RGW, and MDS containers to build a functional single‑node Ceph cluster with a dashboard.

Ceph · Deployment · Docker

0 likes · 15 min read

58UXD

Nov 17, 2021 · Operations

How 58 Daojia Scaled Service Center Design Across Hundreds of Stores

This article details the design principles, brand‑value strategies, quality control, and cost‑saving measures used to launch the first 58 Daojia premium service center and expand the concept to nearly a hundred physical stores nationwide.

Operations · Project Management · Service Center

0 likes · 9 min read

Open Source Linux

Nov 16, 2021 · Databases

How to Stress Test Redis with redis-benchmark: A Quick Guide

This guide explains how to use Redis's built-in redis-benchmark tool to simulate concurrent client load, interpret key performance metrics such as request latency and throughput, and monitor server resource usage, helping operators prevent cache-related failures like penetration and avalanche after deployment.

Benchmark · Operations · Redis

0 likes · 3 min read

DevOps

Nov 16, 2021 · Operations

Key Strategies and Recommendations for Successful Enterprise Digital Transformation

The article outlines how enterprises can assess digital transformation outcomes, formulate effective strategies, build large‑scale capabilities, foster agile culture, and continuously monitor progress, drawing on McKinsey research and real‑world examples to guide traditional firms toward sustainable digital growth.

Big Data · Enterprise Strategy · Operations

0 likes · 17 min read

58UXD

Nov 15, 2021 · Operations

How Strategic Visual Design Boosts E‑commerce Campaign Performance

This article examines how thoughtfully crafted main visuals influence user engagement and sales in e‑commerce campaigns, presenting four case studies from the “Super Welfare Day” series that illustrate design background, strategy, visual style, implementation, and measurable results such as an 85.2% GMV lift.

Case Study · Design thinking · Operations

0 likes · 8 min read

DevOps Cloud Academy

Nov 15, 2021 · Operations

Creating and Transforming Grafana Table Panels for Server Resource Monitoring

This guide demonstrates how to create a Grafana Table panel to monitor server resources, add multiple queries, merge them using the Transform feature, customize fields and units, and organize rows for a comprehensive dashboard view.

Grafana · Operations · Table Panel

0 likes · 7 min read

Open Source Linux

Nov 14, 2021 · Operations

How to Quickly Identify Disk Space Hogs on Linux Servers

Learn step-by-step Linux techniques—including df, du, find, and lsof commands—to pinpoint large directories or files, filter results, handle hidden space consumption, and adjust reserved filesystem space, ensuring you can efficiently resolve unexpected disk usage issues on your servers.

Linux · Operations · df

0 likes · 4 min read

Aikesheng Open Source Community

Nov 12, 2021 · Operations

Monitoring TiDB with Zabbix Server 5.4 – Step‑by‑Step Guide

This article explains how to use Zabbix Server 5.4 to monitor TiDB clusters by configuring HTTP agents, converting Prometheus metrics to JSON, creating custom macros, linking TiDB templates, and verifying data collection, while noting version and OS requirements.

Operations · Prometheus · TiDB

0 likes · 5 min read

Full-Stack Internet Architecture

Nov 11, 2021 · Databases

Understanding Redis Sentinel: Architecture, Configuration, and Automatic Failover

This article explains Redis Sentinel’s role in high‑availability deployments, covering its architecture, monitoring and notification mechanisms, automatic failover process, configuration steps for master‑slave and sentinel nodes, and practical guidelines for building a reliable Redis cluster.

High Availability · Operations · Redis

0 likes · 21 min read

IT Architects Alliance

Nov 11, 2021 · Operations

Design and Implementation of a TB‑Scale Log Monitoring System Using the ELK Stack

This article explains how to build a terabyte‑level log monitoring platform for micro‑service environments by unifying log collection with FileBeat, enriching observability through Elastic APM, processing streams via Kafka Streams, and visualizing metrics with Grafana and Kibana, while addressing cost‑effective filtering and retention strategies.

ELK Stack · Grafana · Log Monitoring

0 likes · 8 min read

Open Source Linux

Nov 10, 2021 · Operations

How to Fix Kubernetes Memory Leaks and Expired Certificates: Step‑by‑Step Guide

This article explains common Kubernetes issues such as node memory leaks and certificate expiration, provides diagnostic commands, and offers detailed solutions including disabling kmem accounting, recompiling runc and kubelet, and extending certificate validity to ten years.

Operations · certificate-renewal · k8s troubleshooting

0 likes · 12 min read

Liangxu Linux

Nov 9, 2021 · Operations

Essential Ops Practices: Prevent Disasters with Backups, Security, and Tuning

This guide shares hard‑learned Linux operations lessons—testing before changes, rigorous backups, SSH hardening, firewall rules, monitoring, and systematic performance tuning—to help engineers avoid costly mistakes and keep services stable and secure.

Operations · Performance Tuning · backup

0 likes · 11 min read

Alibaba Cloud Native

Nov 9, 2021 · Cloud Computing

How Nanguo Film Migrated 30+ Services to Alibaba Cloud Serverless in Just 7 Days

In a seven‑day sprint, Nanguo Film transformed its entire streaming platform by moving over 30 systems to Alibaba Cloud's Serverless Application Engine, cutting operational effort by 70%, reducing costs by more than 40%, and achieving ten‑fold faster scaling while maintaining zero downtime.

Alibaba Cloud · CI/CD · Cloud Migration

0 likes · 15 min read

DevOps

Nov 8, 2021 · Operations

Digital Transformation vs. IT Transformation: Key Differences and How They Should Interact

The article explains that digital transformation is a customer‑driven, end‑to‑end business overhaul distinct from IT transformation, which focuses on technology, highlighting three major differences, the risks of conflating the two, and why digital transformation should ultimately drive IT transformation for lasting competitive advantage.

IT transformation · Operations · business strategy

0 likes · 9 min read

Liangxu Linux

Nov 7, 2021 · Operations

How to Quickly Identify Disk Space Hogs on Linux Servers

This guide explains how to diagnose unexpected disk usage on Linux by using df, du, find, and lsof commands, demonstrates efficient ways to locate large directories or deleted files, and shows how to adjust reserved space with tune2fs to reclaim lost storage.

Linux · Operations · disk space

0 likes · 5 min read

IT Architects Alliance

Nov 6, 2021 · Operations

Blue‑Green Deployment, Rolling Release, Canary Release, and A/B Testing: Key Strategies for Application Rollout

The article explains four major software release strategies—blue‑green deployment, rolling release, canary (gray) release, and A/B testing—detailing their principles, advantages, drawbacks, and practical considerations for safely rolling out new versions in production environments.

A/B testing · Blue-Green · Deployment

0 likes · 9 min read

ITFLY8 Architecture Home

Nov 4, 2021 · Operations

Mastering Service Degradation: Strategies to Keep Systems Available

Service degradation involves strategically reducing or disabling non‑essential features during traffic spikes or failures to maintain core functionality, covering concepts like SLA levels, fallback data, rate‑limiting, timeout handling, circuit breaking, and front‑end and back‑end downgrade techniques for high‑availability systems.

Operations · SLA · fallback data

0 likes · 14 min read

Alibaba Cloud Native

Nov 2, 2021 · Cloud Native

How to Spot Load‑Balancing, Scheduling, and Hotspot Issues with Kubernetes Monitoring

This article explains how to use Kubernetes monitoring features such as service details, topology maps, and pod metrics to quickly identify load‑balancing imbalances, cluster scheduling bottlenecks, and resource hotspot problems, providing practical steps and visual examples for improving system reliability and performance.

Kubernetes · Operations · Resource Hotspots

0 likes · 10 min read

Efficient Ops

Nov 1, 2021 · Operations

How AIOps Is Empowering Enterprise Digital Transformation

The article explains how AIOps, built on DevOps principles and leveraging AI and big‑data analytics, helps enterprises overcome governance challenges, improve operational efficiency, and accelerate digital transformation, highlighting standards, real‑world evaluations, and key benefits such as real‑time analysis and noise reduction.

AIOps · IT Governance · Operations

0 likes · 7 min read

ITFLY8 Architecture Home

Nov 1, 2021 · Operations

Mastering Service Degradation: Strategies to Keep Your System Available Under Load

Service degradation, a crucial reliability technique, involves selectively disabling non-essential features, applying rate limiting, timeout handling, fallback data, and tiered switches across front‑end, back‑end, and infrastructure layers to maintain core functionality during traffic spikes or component failures, ensuring high availability and meeting SLA targets.

Fallback · Operations · Reliability

0 likes · 13 min read

IT Architects Alliance

Oct 31, 2021 · Operations

How to Build a Highly Available Redis Service with Sentinel – A Practical Guide

This article explains why Redis needs high availability, defines common failure scenarios, compares several HA architectures—including single‑instance, master‑slave with one or multiple Sentinel processes, and VIP‑based solutions—and provides step‑by‑step guidance for deploying a robust Redis Sentinel cluster.

High Availability · Operations · Redis

0 likes · 13 min read

YunZhu Net Technology Team

Oct 29, 2021 · Operations

Digital Transformation and the Role of Time Tracking in Management Decision‑Making

The article explains how digital transformation requires lean management and data‑driven decision making, using time‑tracking tools like TAPD to capture work hours, analyze costs, and link talent utilization to project outcomes for improved operational efficiency and transparent business guidance.

Lean Management · Operations · cost analysis

0 likes · 3 min read

58UXD

Oct 29, 2021 · Operations

How the CST Model Boosts User Conversion: A Design Case Study

This article examines how applying the CST design model, user segmentation, and psychological principles such as mental accounting and social proof can significantly improve conversion rates for a savings membership product.

AB testing · CST model · Operations

0 likes · 7 min read

Huolala Tech

Oct 29, 2021 · Operations

How Huolala Guarantees Cloud‑Native Stability at Scale

In this detailed account of Huolala's 2021 Cloud Operations Best Practices talk, the company shares the multi‑cloud architecture, service‑oriented governance, capacity testing, monitoring, and risk‑prediction techniques that together ensure high availability and efficient scaling for its diverse logistics services.

Multi-Cloud · Operations · Service Governance

0 likes · 17 min read

Tencent IMWeb Frontend Team

Oct 25, 2021 · Backend Development

Mastering Node.js Backend Logging: Design, Tools, and Full‑Trace Strategies

This article shares a comprehensive guide to building robust logging systems for Node.js backend services, covering log types, storage options, performance considerations, full‑trace design, custom field schemas, integration with cloud log platforms, and practical troubleshooting examples.

Backend Development · Logging · Node.js

0 likes · 15 min read

Top Architect

Oct 25, 2021 · Operations

Capacity Design and Performance Evaluation: Estimating QPS, Concurrency, and System Scaling

The article explains how to assess system capacity by analyzing daily traffic, calculating average and peak QPS, applying the 80/20 rule, conducting stress tests, and adjusting resources, illustrated with a sports event example and a book reservation system case study.

Concurrency · Operations · QPS

0 likes · 11 min read

Alibaba Terminal Technology

Oct 25, 2021 · Cloud Native

How Alibaba’s AServer Gateway Evolved to a Cloud‑Native Architecture

Alibaba’s AServer access gateway, which serves billions of users at millions of QPS, moved from a monolithic Tengine‑based system to a cloud‑native, containerized architecture built on Kubernetes, Pilot, and Envoy, reducing operational complexity and improving dynamic routing, traffic isolation, and scalability for massive e‑commerce traffic.

Kubernetes · Operations · Service Mesh

0 likes · 17 min read

Efficient Ops

Oct 22, 2021 · Operations

How Zhengzhou Bank Boosted Delivery Speed 2.3× with DevOps Standard Assessment

Zhengzhou Bank’s new retail loan system passed the Level 3 DevOps continuous‑delivery assessment; the bank reports a 2.31‑fold increase in requirements delivered per year, an 18‑day reduction in cycle time, and a shift to automated, seconds‑level environment delivery, illustrating the impact of standardized DevOps practices.

Case Study · Operations · Software engineering

0 likes · 12 min read

Efficient Ops

Oct 22, 2021 · Operations

Key Takeaways from China’s DevOps Summit: Enterprise Maturity & AIOps Standards

The 2021 DevOps International Summit in Beijing announced the latest DevOps capability maturity assessments for dozens of enterprises, introduced the new AIOps maturity model, and highlighted the global significance of China’s DevOps standards across finance, telecom, and other industries.

AIOps · Cloud Computing · Maturity Model

0 likes · 7 min read

Efficient Ops

Oct 22, 2021 · Operations

What Do 42 Companies Reveal About DevOps Maturity in China?

The DevOps International Summit in Beijing announced that 42 enterprises covering 108 projects achieved level‑3 maturity in the CAICT DevOps Capability Model, highlighting the impact of standardized tools and processes on software delivery efficiency across finance, telecom and other sectors.

China · Maturity Model · Operations

0 likes · 6 min read

Open Source Linux

Oct 20, 2021 · Operations

Master Linux File Renaming: Sed, Substring Tricks & Extension Swaps

This guide demonstrates how to batch‑replace text with sed, rename files using Bash substring expansion, and change file extensions efficiently on a Linux system, providing clear command examples and practical one‑liners for everyday operations.

File Renaming · Operations · bash

0 likes · 8 min read

ByteFE

Oct 20, 2021 · Operations

Troubleshooting DNS Resolution Failure of goofy.app in Singapore Office Due to DNSSEC Misconfiguration

After users in Singapore reported that they could not resolve the internal domain goofy.app, a systematic investigation traced the failures to a DNSSEC misconfiguration, specifically an incorrect DS record. Validating resolvers worldwide rejected the domain, while DNS servers in China still resolved it because DNSSEC validation was disabled there; removing the faulty key resolved the issue.

DNSSEC · Domain Resolution · Operations

0 likes · 8 min read

Baidu Geek Talk

Oct 20, 2021 · Operations

Practical Strategies for Building High‑Availability Systems

This article presents a comprehensive, step‑by‑step guide on improving system reliability through early fault detection, scope reduction, frequency reduction, and rapid incident handling, using real‑world practices from Baidu's commercial hosting platform.

High Availability · Log Standardization · Operations

0 likes · 20 min read

Open Source Linux

Oct 19, 2021 · Operations

Master Dockerfile: Essential Commands and Best Practices Explained

This guide walks through every Dockerfile instruction—from variables and FROM to ENTRYPOINT and HEALTHCHECK—explaining syntax, usage tips, and common pitfalls, so you can build efficient, reproducible container images with confidence.

Docker · Image Building · Operations

0 likes · 13 min read

Open Source Linux

Oct 19, 2021 · Operations

Essential Ops Practices: Prevent Disasters with Backups, Security, and Monitoring

This guide shares practical Linux operations lessons—ranging from cautious command use, rigorous backup habits, and secure SSH configurations to comprehensive monitoring and performance tuning—to help teams avoid costly mistakes and maintain stable, reliable services.

Operations · Performance Tuning · backup

0 likes · 12 min read

Zhongtong Tech

Oct 19, 2021 · Operations

Transforming Load Testing at ZTO: From Offline Pitfalls to Safe Full‑Chain Online Testing

This article details ZTO's evolution from traditional offline and online load‑testing approaches—highlighting their shortcomings—to a comprehensive full‑chain performance testing solution that uses JavaAgent probes, shadow resources, and a structured deployment and verification process to ensure safe, accurate production testing.

Operations · full-chain testing · load testing

0 likes · 17 min read

Practical DevOps Architecture

Oct 17, 2021 · Cloud Native

Viewing and Scaling Pods in Kubernetes with kubectl

This guide demonstrates how to list current pod instances, increase the replica count of a deployment, and verify the updated pod status in a Kubernetes cluster using kubectl commands.

Operations · Pod Scaling · kubectl

0 likes · 3 min read

360 Tech Engineering

Oct 15, 2021 · Operations

Log Collection Architecture Using Filebeat, Logstash, and Kafka

This article describes a lightweight, resource‑efficient log collection solution that combines Filebeat agents, optional Logstash aggregation, and Kafka transport, detailing configuration choices, meta‑persistence, back‑pressure mechanisms, monitoring setup, and deployment architecture for reliable at‑least‑once delivery.

Logstash · Operations · backpressure

0 likes · 14 min read

DataFunTalk

Oct 15, 2021 · Artificial Intelligence

Risk Control and Operations for Existing Credit Customers: Models, Strategies, and Practices

This article examines how financial institutions can manage risk and improve operations for existing loan customers by analyzing client flow, regulatory impacts, accelerated deterioration, and layered segmentation, and by applying advanced models such as rule‑based alerts, B‑card scoring, LSTM, and survival analysis to enable timely risk detection and targeted cross‑selling.

Customer Segmentation · Machine Learning · Operations

0 likes · 20 min read

IT Architects Alliance

Oct 14, 2021 · Operations

How to Build a TB‑Scale Log Monitoring System with ELK Stack

This article explains how to design and implement a TB‑level log monitoring platform for micro‑service environments using ELK Stack, Filebeat, Elastic APM, Kafka Streams, Prometheus, and Grafana, covering data collection, filtering, storage, and visualization while addressing cost and resource constraints.

ELK · Grafana · Log Monitoring

0 likes · 9 min read

DevOps

Oct 12, 2021 · Operations

Gray Release (Canary Deployment): Concepts, Benefits, and Implementation Guide

This article explains what gray release (canary deployment) is, why it is needed to reduce risk and improve product quality, and provides a step‑by‑step guide covering strategy, user targeting, data feedback, rollback, deployment architectures, and version management for modern software operations.

Canary Deployment · Operations · Version Control

0 likes · 13 min read

Open Source Linux

Oct 11, 2021 · Operations

10 Essential Ops Principles Every Engineer Should Follow

This article shares ten practical operations guidelines—from avoiding duplicated work and embracing mistakes to emphasizing monitoring, backup roles, clear division of labor, and continuous improvement—aimed at boosting reliability, efficiency, and team cohesion for both engineers and managers.

Best Practices · Operations · Reliability

0 likes · 10 min read

Open Source Linux

Oct 10, 2021 · Operations

Essential Linux Command-Line Tools to Boost Your Productivity

This article presents a curated list of powerful Linux command-line utilities—ranging from fast file searchers and interactive Git viewers to system monitors and multi‑threaded downloaders—each explained with concise descriptions and usage examples to help developers and sysadmins work more efficiently.

Operations · Tools · command-line

0 likes · 5 min read

HaoDF Tech Team

Oct 8, 2021 · Operations

Understanding SRE: Foundations, Metrics, and Tackling Technical Debt

This article introduces the fundamentals of Site Reliability Engineering (SRE), explains how to measure service stability with metrics like MTTR, MTBF, and availability, outlines the SRE workflow from prevention to post‑mortem, and discusses how to identify and reduce technical debt to improve system health.

Operations · Reliability · SRE

0 likes · 18 min read

dbaplus Community

Oct 7, 2021 · Databases

How to Measure and Eliminate Slow SQL in Large‑Scale MySQL Deployments

This article explains what MySQL slow queries are, why they cause system failures, proposes multi‑dimensional metrics to assess their severity, outlines concrete guidelines and change standards, and shares real‑world optimization cases and daily operational practices for eliminating slow SQL.

Database Performance · MySQL · Operations

0 likes · 13 min read

Top Architect

Oct 7, 2021 · Backend Development

Essential Linux Commands and Java Debugging Tools for Backend Engineers

This article compiles a practical set of Linux command examples and Java debugging utilities—including tail, grep, awk, find, tsar, btrace, Greys, Arthas, JProfiler, and various JVM tools—to help backend developers quickly diagnose and resolve performance and stability issues in production environments.

Java · Linux · Operations

0 likes · 13 min read

MaGe Linux Operations

Oct 6, 2021 · Operations

How to Accelerate Call Center Incident Resolution with Smart Monitoring and Automation

This article outlines a comprehensive approach to handling call‑center incidents, covering common troubleshooting steps, proactive monitoring enhancements, well‑structured emergency plans, and intelligent event‑driven automation to reduce downtime and improve operational efficiency.

Incident Management · Operations · automation

0 likes · 12 min read

IT Architects Alliance

Oct 1, 2021 · Operations

Understanding Service Degradation: Definitions, Levels, and Mitigation Strategies

The article explains service degradation concepts, defines SLA levels and the meaning of six nines, and details various degradation techniques such as fallback data, rate‑limiting, timeout, fault handling, read/write strategies, frontend safeguards, and the use of switches and pre‑embedding to maintain system availability during traffic spikes or failures.

Fallback · Operations · SLA

0 likes · 12 min read

Continuous Delivery 2.0

Sep 30, 2021 · Operations

Key Findings from the 2021 DORA DevOps Report: SRE Practices, Documentation, Security, and Culture

The 2021 DORA DevOps Report reveals that elite teams outperform low‑performing teams by adopting SRE principles, high‑quality documentation, integrated security, modern technical practices such as loose coupling, continuous testing, CI/CD, and a performance‑driven culture that fosters belonging and inclusion.

Culture · Operations · SRE

0 likes · 19 min read

Liangxu Linux

Sep 28, 2021 · Operations

Top Linux CLI Tools for Real‑Time Network Bandwidth Monitoring

This article surveys a collection of Linux command‑line utilities that can monitor overall, per‑interface, per‑socket, and per‑process network bandwidth, explaining how each tool works, what data it reports, and how to install it on major distributions.

Command-line Tools · Linux · Network Monitoring

0 likes · 10 min read
