DevOps Operations Practice
Author

We share professional insights on cloud-native, DevOps & operations, Kubernetes, observability & monitoring, and Linux systems.

162 articles · 0 likes · 566 views · 0 comments
Recent Articles

Latest from DevOps Operations Practice

Dec 17, 2024 · Backend Development

From CPU Alert to Resolution: A Step‑by‑Step Backend Performance Debugging Guide

This article recounts a midnight CPU alert and walks through systematic backend troubleshooting: initial system checks, JVM profiling, algorithm refactoring, database indexing, Docker‑based isolation, and proactive monitoring. It shows how to restore service performance and prevent future outages.

Docker · JVM · Java
0 likes · 7 min read
Dec 16, 2024 · Cloud Native

Analysis of OpenAI's December 2024 Outage: Kubernetes Control Plane Overload and Mitigation

The December 11, 2024 OpenAI outage was caused by a misconfigured monitoring service that overloaded the Kubernetes control plane, leading to a four‑hour service disruption. It was resolved through cluster scaling, API blocking, and resource expansion, and it highlights critical infrastructure risks for large‑scale cloud‑native operations.

Kubernetes · OpenAI · Outage
0 likes · 7 min read
Oct 31, 2024 · Operations

Bilibili Data Center Migration: Planning, Execution, and Lessons Learned

This article details Bilibili’s 18‑month, multi‑region data‑center migration, covering background, project challenges, comprehensive planning, execution steps, risk management, automation, and post‑migration benefits, offering practical insights for large‑scale infrastructure relocation and operational optimization.

Bilibili · Data Center Migration · infrastructure operations
0 likes · 21 min read
Oct 22, 2024 · Cloud Native

How to Find the IP Address of a Docker Container

This guide explains how to quickly retrieve the IP address of a running Docker container using simple commands such as `docker ps`, `docker inspect`, and a formatted inspect query, with step‑by‑step instructions and example output for easy debugging and network configuration.

Docker · IP address · Inspect
0 likes · 3 min read
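As a rough illustration of the formatted inspect query this guide summarizes, the sketch below pulls the per-network IP addresses out of `docker inspect` JSON. The helper names (`container_ips`, `inspect_container`) and the sample payload are hypothetical and not taken from the article; the only thing assumed from Docker itself is that inspect output is a JSON array whose entries carry `NetworkSettings.Networks.<network>.IPAddress`.

```python
import json
import subprocess
from typing import Dict


def container_ips(inspect_json: str) -> Dict[str, str]:
    """Map network name -> IP address from `docker inspect` JSON output.

    Only the first object in the array is read, which matches inspecting
    a single container.
    """
    data = json.loads(inspect_json)
    networks = data[0]["NetworkSettings"]["Networks"]
    # Skip networks that have no address assigned (empty string).
    return {name: cfg["IPAddress"] for name, cfg in networks.items() if cfg.get("IPAddress")}


def inspect_container(name_or_id: str) -> str:
    """Run `docker inspect <name_or_id>` and return its raw JSON text.

    Needs a local Docker CLI; it is defined here but not called below.
    """
    result = subprocess.run(
        ["docker", "inspect", name_or_id],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


# Trimmed, made-up sample in the shape of real `docker inspect` output.
SAMPLE_INSPECT = """
[{"Name": "/web",
  "NetworkSettings": {"Networks": {"bridge": {"IPAddress": "172.17.0.2"}}}}]
"""

print(container_ips(SAMPLE_INSPECT))  # {'bridge': '172.17.0.2'}
```

On a host with Docker installed, `container_ips(inspect_container("web"))` would do the same for a live container, where `web` is a placeholder name.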
Oct 10, 2024 · Operations

Seven Key Truths About Operations: Downtime, Automation, Prevention, Technology as a Tool, DevOps, Communication, and Security

Effective operations management acknowledges inevitable downtime, emphasizes automation, prioritizes proactive prevention, treats technology as a means rather than an end, integrates closely with development through DevOps, relies on strong communication, and continuously addresses pervasive security challenges to minimize business impact.

Monitoring · automation · downtime
0 likes · 5 min read
Oct 5, 2024 · Operations

Practical Linux Commands for File Classification, Process Monitoring, and Network Analysis

This article demonstrates how to use xargs with find and tar for file handling, ps for identifying high‑memory and high‑CPU processes, and netstat combined with awk and sort to inspect TCP connection states and the top requesting IP addresses, providing essential command‑line techniques for system administrators.

System Administration · netstat · ps
0 likes · 4 min read
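The netstat analysis summarized above (counting TCP connection states and ranking remote IP addresses) can be sketched in Python instead of the awk and sort pipeline the article uses. This is a minimal sketch rather than the article's commands: the function names and the sample text are hypothetical, and the parser assumes the usual Linux `netstat -ant` column layout, with the foreign address in the fifth column and the state in the sixth.

```python
from collections import Counter
from typing import List, Tuple


def _tcp_rows(netstat_output: str) -> List[List[str]]:
    """Return the whitespace-split fields of each TCP row, skipping headers."""
    rows = []
    for line in netstat_output.splitlines():
        fields = line.split()
        if len(fields) >= 6 and fields[0].startswith("tcp"):
            rows.append(fields)
    return rows


def tcp_state_counts(netstat_output: str) -> Counter:
    """Count connections per TCP state (ESTABLISHED, TIME_WAIT, ...)."""
    return Counter(fields[5] for fields in _tcp_rows(netstat_output))


def top_remote_ips(netstat_output: str, limit: int = 10) -> List[Tuple[str, int]]:
    """Rank remote IPs by connection count, ignoring listening sockets."""
    ips = Counter(
        fields[4].rsplit(":", 1)[0]  # drop the port, keep the address
        for fields in _tcp_rows(netstat_output)
        if fields[5] != "LISTEN"
    )
    return ips.most_common(limit)


# Made-up sample in the shape of `netstat -ant` output (documentation IPs).
SAMPLE_NETSTAT = """\
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN
tcp        0      0 10.0.0.5:80             203.0.113.7:51234       ESTABLISHED
tcp        0      0 10.0.0.5:80             203.0.113.7:51240       ESTABLISHED
tcp        0      0 10.0.0.5:80             198.51.100.9:40022      TIME_WAIT
"""

print(tcp_state_counts(SAMPLE_NETSTAT)["ESTABLISHED"])  # 2
print(top_remote_ips(SAMPLE_NETSTAT, limit=1))  # [('203.0.113.7', 2)]
```

Parsing captured text keeps the example runnable without netstat installed; feeding it live data would mean passing in the output of `netstat -ant` collected on the host.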