Tagged articles
3281 articles
Page 3 of 33
DevOps Coach
Oct 2, 2025 · Interview Experience

Top 10 SRE Interview Questions & Answers to Ace Your Next Interview

This article compiles ten essential Site Reliability Engineering interview questions covering incident command systems, shell types, browser request flow, SSH, error budgets, toil reduction, Linux boot process, QUIC benefits, UDP VPN usage, and common enterprise network protocols, providing concise answers to help you prepare effectively.

DevOps · Operations · Reliability
0 likes · 10 min read
Raymond Ops
Oct 2, 2025 · Operations

Step‑by‑Step Guide: Deploy Zabbix 7.0 on Ubuntu 22.04 LTS

This tutorial walks you through preparing the environment, adding the Zabbix repository, installing required packages, configuring MySQL, setting up Zabbix server and agent, customizing the web interface, and adding monitored hosts on Ubuntu 22.04 LTS, complete with commands and screenshots.

Operations · Ubuntu · Zabbix
0 likes · 7 min read
Ops Community
Oct 2, 2025 · Operations

How to Fix Nginx 502 Bad Gateway Errors: A 90% Success Checklist

This article provides a comprehensive, step‑by‑step checklist for diagnosing and resolving Nginx 502 Bad Gateway errors, covering backend service verification, configuration checks, log analysis, resource monitoring, network troubleshooting, special scenarios, and long‑term preventive measures.

502 · Backend · Bad Gateway
0 likes · 25 min read
DevOps Coach
Oct 1, 2025 · Operations

10 Hard‑Earned Infrastructure Lessons Every Engineer Should Know

Drawing from real incidents like SQLite crashes, missing logs, unthrottled APIs, slow container startups, queue bottlenecks, network partitions, unreliable clocks, and weak alerts, this article shares ten concrete infrastructure lessons with code examples, performance data, and practical recommendations to avoid costly pitfalls.

DevOps · Go · Infrastructure
0 likes · 8 min read
MaGe Linux Operations
Oct 1, 2025 · Operations

How a Single rm -rf Command Almost Wiped My Data—and the Backup Plan That Saved It

A disastrous rm -rf command erased 2.3 TB of production MySQL data, but a meticulously designed multi‑layer backup strategy—including logical, physical, real‑time, and cloud backups—enabled a 99.4% data recovery within 72 hours, highlighting essential lessons and best‑practice guidelines for reliable data protection.

Backup · Data Protection · Operations
0 likes · 36 min read
MaGe Linux Operations
Oct 1, 2025 · Operations

How Automated Ops Cut Service Restarts by 80% and Save Hours Daily

Discover a comprehensive automated operations framework that eliminates manual service restarts, reduces repetitive tasks by 80%, accelerates fault recovery from minutes to seconds, and boosts reliability through health checks, Kubernetes self‑healing, Systemd scripts, monitoring, and scalable deployment strategies.

Automation · Operations · monitoring
0 likes · 37 min read
DevOps Cloud Academy
Sep 28, 2025 · Operations

Mastering LLMOps: Essential Practices for Managing Large Language Models

This article outlines the lifecycle of large language models and presents LLMOps best practices—including data management, model development, deployment, monitoring, prompt engineering, and security—to help engineers build, scale, and maintain production-ready LLM applications.

LLMOps · Operations · artificial intelligence
0 likes · 19 min read
Architecture Breakthrough
Sep 28, 2025 · Operations

How to Build an Organizational High‑Availability Mechanism for Banking IT Production Issues

This article outlines a comprehensive, step‑by‑step framework for establishing a high‑availability system in large‑scale banking IT, covering goal definition, logical architecture, service classification, key activity identification, capability upgrades, monitoring, emergency‑response asset creation, technical debt tracking, and periodic post‑mortem redesign.

Operations · Process Design · Technical Debt
0 likes · 10 min read
Raymond Ops
Sep 27, 2025 · Operations

Unlock Linux File Permissions: A Complete Guide to Managing Access Rights

This article explains Linux file permission concepts, numeric and symbolic representations, how to combine and modify permissions with chmod (including recursive changes), ownership categories, testing permission effects on files and directories, and default permission settings derived from umask.

Operations · Permissions · Unix
0 likes · 6 min read
Ray's Galactic Tech
Sep 26, 2025 · Operations

Master Spring Boot Admin: Real‑Time Monitoring for Microservices

Spring Boot Admin is an open‑source tool that provides real‑time health checks, JVM metrics, log management, environment inspection, JMX control, and customizable alerts for Spring Boot applications, and this guide explains its core features, architecture, quick setup, advanced security, notification, Actuator integration, and production best practices.

Admin · Java · Microservices
0 likes · 7 min read
ITPUB
Sep 26, 2025 · Operations

50 Must‑Know Linux Configuration Files for Sysadmins

This guide enumerates fifty essential Linux configuration files across categories such as user and permission management, network settings, startup services, logging, shell environment, cron scheduling, and system information, explaining each file’s path and primary purpose for system administrators.

Configuration · Operations · System Administration
0 likes · 10 min read
Ops Community
Sep 25, 2025 · Operations

How to Master Linux Operations: A Step‑by‑Step Roadmap from Junior to Senior

The article outlines the challenging early duties of operations engineers, emphasizes a positive learning mindset, and presents a three‑stage Linux operations career roadmap—from junior to senior—illustrated with diagrams, guiding readers toward the skills needed for a cloud‑focused engineering path.

Learning Path · Operations · career roadmap
0 likes · 3 min read
MaGe Linux Operations
Sep 24, 2025 · Operations

How I Pinpointed the Real Culprit of a 100% CPU Spike in Production in Just 3 Minutes

When a production server hit 100% CPU at 3 AM, the author used a three‑minute, step‑by‑step method: quickly identifying the offending process, drilling into its threads, and pinpointing the problematic code. The article shares useful shell commands, common pitfalls, and advanced safeguards such as cgroup limits and eBPF tracing.

CPU troubleshooting · Linux performance · Operations
0 likes · 9 min read
Open Source Linux
Sep 23, 2025 · Operations

20 Essential Linux Commands Every Ops Engineer Must Master

This guide walks you through twenty indispensable Linux commands—covering system monitoring, performance analysis, process management, networking, disk handling, and tuning—explaining their basic and advanced usages, real‑world pitfalls, and how they stay relevant in the cloud‑native era.

Networking · Operations · Shell
0 likes · 12 min read
Ops Community
Sep 22, 2025 · Operations

20 Essential Linux Commands Every Ops Engineer Must Master

This guide presents twenty indispensable Linux commands—covering system monitoring, performance analysis, process management, networking, disk handling, and system tuning—along with practical examples, tips, and common pitfalls, empowering operations engineers to quickly diagnose and resolve production issues in modern cloud‑native environments.

DevOps · Operations · command-line
0 likes · 14 min read
Liangxu Linux
Sep 21, 2025 · Operations

10 Essential Linux Performance Monitoring Commands Every Sysadmin Should Know

Master Linux system performance by learning ten powerful monitoring commands—top, vmstat, lsof, iotop, iostat, htop, netstat, iftop, tcpdump, and nethogs—each illustrated with usage examples and output, enabling quick diagnosis of CPU, memory, disk, and network issues.

Linux tools · Operations · Performance Monitoring
0 likes · 10 min read
Raymond Ops
Sep 20, 2025 · Operations

Quick Guide: Install Portainer to Manage Docker & Kubernetes

This guide walks you through installing the open‑source Portainer UI, checking system and Docker versions, running the Portainer server and agent containers, accessing the web interface via HTTP/HTTPS, and completing the initial admin setup, providing a streamlined way to manage Docker and Kubernetes resources without command‑line complexity.

Container Management · DevOps · Docker
0 likes · 5 min read
Liangxu Linux
Sep 20, 2025 · Operations

Master Linux Filesystem Hierarchy: Complete Guide for Sysadmins

This comprehensive guide explains the Linux Filesystem Hierarchy Standard (FHS), detailing each top‑level directory, its purpose, typical contents, common commands, and best‑practice administration techniques, helping system administrators and DevOps engineers understand, manage, and optimize the directory structure for security and performance.

Directory Structure · FHS · Filesystem
0 likes · 23 min read
Linux Cloud Computing Practice
Sep 19, 2025 · Operations

How to Become a Senior Linux Operations Engineer: A Step‑by‑Step Learning Roadmap

From the gritty early tasks of fixing hardware to a clear three‑stage career roadmap, this guide outlines a comprehensive learning path for Linux operations engineers, highlighting essential skills and resources to help you progress from junior to senior roles in cloud‑focused system administration.

Operations · career path · cloud computing
0 likes · 3 min read
Liangxu Linux
Sep 16, 2025 · Operations

Boost Linux Network Performance: Practical TCP/IP Stack Tuning Guide

This article presents a comprehensive, step‑by‑step guide for Linux network performance optimization, covering real‑world issues, TCP and IP stack parameter tweaks, queue and interrupt tuning, high‑concurrency scenarios, monitoring scripts, a detailed e‑commerce case study, best‑practice recommendations, and common pitfalls.

Operations · Performance Optimization · TCP Tuning
0 likes · 13 min read
DevOps Coach
Sep 16, 2025 · Operations

How to Ace DevOps Interviews: Real Skills Over Certifications

The article examines the current chaos in DevOps hiring, exposing how certifications often mask a lack of practical ability, and offers concrete, experience‑based strategies—such as mastering fundamentals, troubleshooting, log analysis, honest self‑assessment, and building a solid portfolio—to succeed in DevOps interviews and improve hiring processes.

DevOps · Operations · Skills
0 likes · 10 min read
MaGe Linux Operations
Sep 15, 2025 · Operations

Master Nginx Troubleshooting: From 502 Errors to Performance Optimization

This article walks you through ten real-world Nginx failure cases—covering 502 errors, SSL expiration, high concurrency bottlenecks, cache misconfigurations, log rotation issues, load‑balancing mistakes, security gaps, reverse‑proxy quirks, URL rewrite conflicts, and monitoring—while teaching a systematic diagnostic methodology for ops engineers.

502 error · DevOps · Operations
0 likes · 27 min read
Architect's Tech Stack
Sep 15, 2025 · Operations

Deploy and Explore the Jianmu No‑Code CI/CD Platform

This guide introduces the open‑source Jianmu CI/CD tool, explains its no‑code/low‑code approach, provides step‑by‑step deployment instructions via Docker‑Compose or Kubernetes, and walks you through creating and running a sample workflow, with links to online demos and resources.

Deployment · Operations · low-code
0 likes · 6 min read
Liangxu Linux
Sep 14, 2025 · Operations

21 Essential Linux Commands Every Sysadmin Should Master

Master the most useful Linux commands for everyday system administration, covering file navigation, permission handling, process control, text manipulation, compression, and system shutdown, with clear examples and syntax to boost efficiency and confidence in handling common operational tasks.

Operations · Shell · command-line
0 likes · 7 min read
Raymond Ops
Sep 12, 2025 · Operations

Mastering vsftpd: Essential Configuration Settings for Secure FTP

This guide walks through 21 essential vsftpd configuration options—including command port changes, active/passive mode ports, anonymous login and upload settings, user mapping, chroot restrictions, logging, banner messages, PAM authentication, connection limits, timeouts, transfer rates, and text mode—providing example commands and troubleshooting tips for Linux FTP servers.

FTP · Operations · linux
0 likes · 10 min read
Liangxu Linux
Sep 11, 2025 · Operations

20 Essential Linux Commands Every Sysadmin Should Master

Mastering these 20 high‑frequency Linux commands—from navigating directories and managing files to monitoring processes—empowers system administrators to automate routine tasks, troubleshoot efficiently, and boost productivity across file management, process control, system monitoring, and remote operations.

Operations · Shell · Sysadmin
0 likes · 7 min read
Ops Community
Sep 11, 2025 · Operations

Mastering LVS+Keepalived: Build a Production-Ready High-Availability Load Balancer

This comprehensive guide walks you through the principles, architecture, deployment steps, performance tuning, monitoring, and advanced techniques for building a robust, production‑grade high‑availability load‑balancing solution using LVS and Keepalived, suitable for both beginners and seasoned engineers.

LVS · Operations · keepalived
0 likes · 23 min read
Mike Chen's Internet Architecture
Sep 11, 2025 · Operations

Mastering Load Balancing: Single, Dual, and Multi‑Layer Architectures Explained

This article explains the fundamentals of load balancing, describing single‑layer, dual‑layer, and multi‑layer architectures, their advantages, disadvantages, and suitable scenarios, helping readers choose the right design based on traffic volume, availability, security, topology, budget, and operational capabilities.

Operations · Scalability · high availability
0 likes · 6 min read
Architect
Sep 10, 2025 · Operations

Building System Stability: A Backend Engineer’s Guide to Risk Management

This article explores system stability from a backend perspective, defining its academic and engineering meanings, quantifying metrics like SLA, MTBF and MTTR, analyzing why stability matters, outlining the challenges faced, and presenting practical steps—including resource consensus, goal setting, awareness cultivation, production standards, monitoring, emergency response, and regular inspections—to effectively build and maintain stable systems.

Operations · monitoring · risk management
0 likes · 25 min read
MaGe Linux Operations
Sep 10, 2025 · Operations

149 Practical Shell Script Examples Every Ops Engineer Should Know

This article presents a curated selection of 149 shell script cases—including tasks such as locating zombie processes, removing empty files, summing numbers, formatting dates, checking file existence, sorting integers, retrieving MAC addresses, and determining leap years—to help operations engineers automate routine work and boost productivity.

Automation · Bash · Operations
0 likes · 8 min read
MaGe Linux Operations
Sep 9, 2025 · Operations

Master Nginx Rate Limiting & Anti‑Crawler Techniques: A Complete Ops Engineer Guide

This guide walks operations engineers through the principles and practical configurations of Nginx rate limiting and anti‑crawler protection, covering token‑bucket and leaky‑bucket algorithms, IP and URI based limits, geo‑based controls, advanced User‑Agent filtering, JavaScript challenges, monitoring, performance tuning, and troubleshooting.

DevOps · Operations · anti‑crawler
0 likes · 19 min read
Ops Community
Sep 9, 2025 · Operations

Master Linux Filesystem: Complete Guide to Directory Structure for Sysadmins

This comprehensive tutorial walks you through the Linux filesystem hierarchy, explaining the purpose of each core directory, best practices for management, real-world examples, and advanced operational tips such as permission hardening, monitoring, backup strategies, and performance optimization for reliable system administration.

Directory Structure · Filesystem · Operations
0 likes · 29 min read
Raymond Ops
Sep 8, 2025 · Operations

All-in-One Linux Init Scripts for Rocky, AlmaLinux, CentOS, Ubuntu, Debian & More

This article introduces a comprehensive collection of shell scripts that automate system initialization across many Linux distributions—Rocky, AlmaLinux, CentOS, Ubuntu, Debian, openEuler, AnolisOS, OpenCloudOS, openSUSE, Kylin Server and UOS Server—covering network setup, hostname, repository configuration, firewall, SELinux, swap, timezone, kernel tuning, SSH, user environment and more, with detailed version changelogs and usage instructions.

Operations · System Initialization · linux
0 likes · 19 min read
Raymond Ops
Sep 3, 2025 · Operations

Boost Linux Server Performance: Essential Kernel and Sysctl Tweaks

Learn how to optimize Linux server performance by permanently disabling SELinux, setting runlevel 3, increasing file descriptor limits, fine-tuning kernel network parameters via /etc/sysctl.conf, configuring firewall settings, and handling common issues such as too many open files and connection timeouts.

Kernel Parameters · Operations · linux
0 likes · 7 min read
MaGe Linux Operations
Sep 3, 2025 · Operations

Master Crontab: From Basics to Advanced Automation for Ops Engineers

This comprehensive guide walks operations engineers through the fundamentals of crontab, its core mechanics, time‑expression syntax, best‑practice configurations, real‑world scenarios, debugging techniques, performance tips, enterprise‑scale management, and when to consider more advanced scheduling alternatives.

Automation · DevOps · Operations
0 likes · 17 min read
Efficient Ops
Sep 2, 2025 · Operations

Essential Linux & Java Debugging Tools Every Engineer Should Know

This guide compiles a comprehensive set of Linux commands and Java diagnostic utilities—including tail, grep, awk, find, tsar, btrace, Greys, jps, jstack, jmap, and more—providing practical examples and code snippets to help engineers quickly troubleshoot and monitor system and JVM issues.

Debugging · Operations · Shell
0 likes · 17 min read
Old Zhao – Management Systems Only
Sep 1, 2025 · Operations

Master Supplier Selection: The 3‑Steady & 3‑Look Framework for Reliable Procurement

This guide explains how to choose reliable suppliers by applying the "Three Steady, Three Look" principles—stabilizing quality, delivery, and cost while assessing supplier qualifications, capabilities, and services—through a five‑step process that reduces risk and boosts operational efficiency.

Operations · cost control · procurement
0 likes · 7 min read
Raymond Ops
Aug 29, 2025 · Operations

Mastering Nginx Gzip: When, How, and Common Pitfalls

Compressing HTTP responses with Nginx gzip improves user experience by reducing load times and cuts bandwidth costs, while proper configuration—such as setting compression level, buffer sizes, MIME types, and handling static compression—avoids common mistakes that can render gzip ineffective in production environments.

Backend · Gzip · Nginx
0 likes · 6 min read
Java Tech Enthusiast
Aug 28, 2025 · Operations

How to Make Your Ops Work Visible and Worth Paying For

This article explains why operations teams must showcase their work to clients, offering practical ways to turn routine reports and meetings into compelling evidence of value, so that stability is recognized as a result of visible effort rather than taken for granted.

IT Management · Operations · Reporting
0 likes · 6 min read
Nightwalker Tech
Aug 28, 2025 · Operations

How to Diagnose and Fix E‑commerce Order Failures with Observability, APM, and Distributed Tracing

This article explains the hierarchical relationship between APM, distributed tracing, and observability, walks through a real Double‑11 e‑commerce incident, and demonstrates how a well‑designed observability stack can pinpoint the root cause, apply emergency fixes, and restore system performance within minutes.

APM · Distributed Tracing · Fault Diagnosis
0 likes · 16 min read
Old Zhao – Management Systems Only
Aug 25, 2025 · Operations

Why Overstocks Happen and How to Master Lean, Precise Procurement

The article explains how excessive inventory ties up cash flow, identifies root causes such as inflated safety stock, poor supplier analysis, and inadequate material classification, and offers a data‑driven, step‑by‑step framework—including classification, demand forecasting, supplier management, and EOQ—to achieve lean, precise purchasing.

Demand Forecasting · EOQ · Operations
0 likes · 7 min read
Ops Development & AI Practice
Aug 24, 2025 · Operations

How to Unlock Parallel Job Execution in GitLab Runner

This guide explains why parallel task handling matters for CI/CD efficiency, details the core 'concurrent' setting in GitLab Runner's config.toml, shows step‑by‑step configuration across platforms, and demonstrates how to combine it with the .gitlab-ci.yml 'parallel' keyword for fine‑grained job scheduling.

GitLab Runner · Operations · Pipeline
0 likes · 7 min read
Tech Freedom Circle
Aug 24, 2025 · Operations

How a Misconfigured Nacos Cluster Cost $170 Million: A Deep P0 Incident Postmortem

A leading financial platform suffered a six‑hour outage and $170 million loss when its Nacos service‑registry cluster entered a split‑brain state due to network partition, exposing flaws in AP‑mode deployment, monitoring gaps, and cascading failures that were later resolved through Raft migration, multi‑active architecture, and client‑side resilience.

Distributed Systems · Microservices · Nacos
0 likes · 32 min read
Ops Community
Aug 23, 2025 · Information Security

Top 10 Linux Security Threats in 2025 Every Ops Engineer Must Know

This 2025 Linux security threat report breaks down the ten most critical risks—ranging from supply‑chain poisoning to AI‑driven APT attacks—offering real‑world case studies and actionable, step‑by‑step mitigation strategies for Linux operations teams.

Container Security · Linux security · Operations
0 likes · 14 min read
Raymond Ops
Aug 22, 2025 · Operations

Mastering ELK Stack: From Installation to Advanced Sharding Strategies

This guide introduces the ELK stack fundamentals, explains Elasticsearch, Logstash, and Kibana roles, walks through environment preparation, installation, configuration, head plugin setup, shard and replica concepts, scaling recommendations, and provides scripts for monitoring cluster health, offering a comprehensive hands‑on reference for log analytics operations.

ELK · Elasticsearch · Kibana
0 likes · 16 min read
Old Zhao – Management Systems Only
Aug 21, 2025 · Operations

10 Common Procurement Mistakes That Sabotage Your Negotiations (And How to Fix Them)

This article reveals the ten most frequent low‑level errors procurement professionals make during supplier negotiations—such as revealing budgets early, over‑promising volume, ignoring delivery terms, and neglecting data—while offering concrete, example‑driven tactics to avoid each pitfall and secure better price, lead‑time, and service outcomes.

Cost Management · Operations · negotiation
0 likes · 14 min read
Linux Cloud Computing Practice
Aug 21, 2025 · Operations

Kubernetes Troubleshooting Handbook: Diagnose Pods, Nodes & Clusters Fast

This handbook provides Kubernetes operators with a comprehensive, step‑by‑step troubleshooting framework covering common Pod issues, Node problems, and cluster‑wide failures, offering practical commands, diagnostic tips, and explanations of error states to quickly identify and resolve stability challenges in K8s environments.

Cluster · Kubernetes · Operations
0 likes · 9 min read
Old Zhao – Management Systems Only
Aug 20, 2025 · Operations

How to Build a Real‑Time Procurement Order Tracking System in Just 2 Hours

This guide walks you through the common pain points of fragmented procurement processes and shows, step by step, how to design, implement, and roll out a low‑code order‑tracking system that provides transparent status, automatic exception alerts, inventory‑order synchronization, and real‑time dashboards for all stakeholders.

Operations · low-code · order tracking
0 likes · 10 min read
MaGe Linux Operations
Aug 19, 2025 · Big Data

Master Kafka High Availability: Replica Sync & Disaster Recovery Strategies

This article provides a comprehensive guide to building enterprise‑grade, highly available Kafka clusters, covering architecture design, hardware planning, production‑level broker configurations, ISR management, monitoring, fault‑tolerance procedures, rolling upgrades, capacity planning, and automation scripts for seamless operations.

Kafka · Operations · disaster-recovery
0 likes · 16 min read
Old Zhao – Management Systems Only
Aug 19, 2025 · Operations

How to Accurately Calculate Procurement Costs and Protect Your Profit Margins

This article reveals why many businesses underestimate procurement expenses, breaks down the five hidden cost components, provides step‑by‑step formulas with real‑world examples, and offers practical strategies to optimize purchasing, logistics, loss control, capital use, and system automation for stable margins.

Operations · Profit margin · Supply Chain
0 likes · 10 min read
Cognitive Technology Team
Aug 19, 2025 · Operations

How Bilibili Scaled Server Fault Management with Automated Detection and Repair

This article details Bilibili's evolving server fault management architecture, covering fault classification, the shortcomings of manual processes, and the design of an automated detection and repair system that combines in‑band and out‑of‑band data collection, rule‑based alerts, and end‑to‑end repair automation.

Operations · in‑band collection · monitoring
0 likes · 18 min read
Raymond Ops
Aug 15, 2025 · Operations

Mastering ELK: Step-by-Step Guide to Deploying a Full-Scale Log Analysis System

This article provides a comprehensive walkthrough of the ELK stack—Elasticsearch, Logstash, and Kibana—detailing its architecture, core concepts, and step-by-step deployment on a multi-node environment, including configuration, service setup, plugin installation, and troubleshooting tips for effective log analysis.

ELK · Elasticsearch · Kibana
0 likes · 16 min read
Open Source Linux
Aug 15, 2025 · Operations

Master Nginx Load Balancing: Algorithms, Reverse Proxy & Config Examples

This guide explains how Nginx functions as a load balancer and reverse proxy, covering its event‑driven architecture, worker processes, and core mechanisms, and details common balancing algorithms such as round‑robin, least connections, IP hash, weighted round‑robin and weighted least connections with full configuration examples and monitoring commands.

Nginx · Operations · load balancing
0 likes · 10 min read
Raymond Ops
Aug 12, 2025 · Operations

How to Install, Configure, and Operate GitLab CE on CentOS

This guide walks through installing GitLab CE, configuring its URL and email settings, managing services, applying Chinese localization, creating projects via HTTP and SSH, adding remote repositories, and handling users and groups, providing complete hands‑on instructions for DevOps teams.

GitLab · Operations · Version Control
0 likes · 9 min read
MaGe Linux Operations
Aug 12, 2025 · Cloud Native

Master kubectl: 15 Essential Tips to Supercharge Your Kubernetes Workflow

This guide presents fifteen practical kubectl techniques—from resource abbreviations and context switching to advanced JSONPath queries and custom output formats—empowering operators to manage Kubernetes clusters more efficiently, troubleshoot issues faster, and automate routine tasks with confidence.

Kubernetes · Operations · Tips
0 likes · 12 min read
DevOps Operations Practice
Aug 11, 2025 · Operations

Zen Master’s Secrets to the Ultimate State of Operations

Through a series of dialogues with a Zen master, the article humorously explores the highest level of operations—automation that runs itself, balanced alerting, cloud migration, reliable backups, high‑availability, stability through chaos engineering, and the ultimate goal of making systems operate without human intervention.

Automation · Backup · Operations
0 likes · 5 min read
Raymond Ops
Aug 11, 2025 · Operations

Mastering Redis Sentinel: Automatic Failover and High Availability Explained

This article provides a comprehensive guide to Redis Sentinel, covering its purpose, architecture, monitoring functions, discovery mechanisms, failover process, leader election, configuration options, and practical commands for achieving reliable high‑availability in Redis deployments.

Operations · failover · high availability
0 likes · 17 min read
MaGe Linux Operations
Aug 9, 2025 · Operations

Boost Linux Network Bandwidth & Slash Latency with Proven Tuning Techniques

This article explains how operations engineers can dramatically improve Linux network performance by understanding key metrics and applying practical tuning methods—such as adjusting TCP windows, enabling TCP Fast Open, switching to BBR, optimizing kernel parameters, using high‑performance NICs, zero‑copy transfers, load balancing, and monitoring tools—to increase bandwidth and reduce latency for high‑concurrency and real‑time applications.

Latency · Operations · Tuning
0 likes · 11 min read
Alibaba Cloud Big Data AI Platform
Aug 6, 2025 · Operations

How Alibaba Cloud’s Serverless Elasticsearch Powers Data‑Driven Operations

Alibaba Cloud’s Serverless Elasticsearch service, combined with the SREWorks data‑driven operations platform, offers a cloud‑native, real‑time search and analytics engine that integrates metric and log collection, cost management, and health monitoring to enhance scalability, performance, and operational efficiency for enterprise applications.

Cloud Native · DataOps · Elasticsearch
0 likes · 11 min read
Alibaba Cloud Big Data AI Platform
Aug 5, 2025 · Operations

Inside Alibaba’s Tesla: Data‑Driven Ops for 100k+ Big Data Nodes

The article details how Alibaba’s Tesla SRE platform supports the massive offline and real‑time big‑data ecosystems through a layered, data‑driven operations framework—DataOps—integrating unified portals, configuration, job, workflow, and analytics platforms, enabling automated monitoring, intelligent decision‑making, and self‑healing capabilities across 100,000+ nodes.

Big Data · DataOps · Operations
0 likes · 20 min read
Alibaba Cloud Big Data AI Platform
Aug 5, 2025 · Operations

How Alibaba’s Open‑Source SREWorks Transforms Cloud‑Native Data Operations

Alibaba's SREWorks platform, now open‑source, combines cloud‑native architecture, DataOps and AIOps to address the growing complexity of big‑data and AI operations, offering a layered SaaS/PaaS/IaaS solution that streamlines delivery, monitoring, management, control, operation, and service for modern enterprises.

Cloud Native · DataOps · Operations
0 likes · 10 min read
Alibaba Cloud Big Data AI Platform
Aug 4, 2025 · Operations

From Scripts to AIOps: How Alibaba’s Ops Evolved and What Skills You Need Today

Tracing Alibaba’s journey from manual, script‑based operations through tool‑centric and platform‑driven DevOps to the data‑focused DataOps era and emerging AIOps, the article outlines the shifting responsibilities, architectural challenges, and the multidisciplinary skill set required for modern operations engineers.

DataOps · Operations · Skill development
0 likes · 8 min read
MaGe Linux Operations
Aug 4, 2025 · Operations

10 Real‑World TCPDump Cases That Uncover Hidden Network Problems

This article walks senior operations engineers through ten authentic production‑level TCPDump case studies, teaching core command options, packet‑analysis heuristics, and a systematic four‑step troubleshooting framework that turns network mysteries into clear, actionable solutions.

Operations · linux · network troubleshooting
0 likes · 18 min read
Java Baker
Aug 4, 2025 · Operations

How to Build Real‑Time and Offline Data Reconciliation for System Consistency

This article explains why cross‑system data inconsistencies occur, defines key reconciliation metrics—completeness, timeliness, and automatic repair—and provides step‑by‑step designs for both real‑time (seconds‑to‑minutes) and offline (hour‑to‑day) reconciliation, including message‑driven triggers, batch processing, and SQL examples for detecting and fixing mismatches.

Data Reconciliation · Offline · Operations
0 likes · 8 min read
Liangxu Linux
Jul 30, 2025 · Operations

8 Essential Network Packet Capture Tools for Faster Debugging and Security

This guide reviews eight network packet‑capture utilities, from lightweight command‑line tools like tcpdump and TShark to visual HTTP debuggers such as Charles, mitmproxy, and Fiddler, detailing their core strengths, typical use cases, command examples, and how to choose the right tool for operations or security scenarios.

Operations · Packet Capture · mitmproxy
0 likes · 12 min read
Tencent Architect
Jul 28, 2025 · Operations

How TencentOS NBS Solves Network Latency Mysteries: Real‑Time Trace Without Disruption

Network latency spikes often leave developers guessing whether the culprit lies in user‑space, the kernel stack, or the physical link; this article introduces TencentOS’s NBS (Net Blackboard System), a low‑overhead, zero‑disruption solution that pinpoints delay sources, supports continuous deployment, and outperforms traditional tools like tcpdump and bpftrace.

NBS · Network Latency · Operations
0 likes · 14 min read
Architecture Breakthrough
Jul 28, 2025 · Operations

Turn Point Fixes into Systemic Solutions: A Practical Optimization Framework

Effective technical optimization requires moving from isolated point fixes to a comprehensive, measurable framework that quantifies goals, assesses gaps, plans capacity, monitors key services and links, and establishes clear compensation and incident‑handling procedures, forming a complete, closed‑loop solution.

Operations · capacity planning · incident handling
0 likes · 8 min read
MaGe Linux Operations
Jul 25, 2025 · Operations

5 Game‑Changing One‑Liner Shell Commands Every Ops Engineer Must Know

This article shares five battle‑tested one‑line Shell commands that instantly diagnose server health, analyze logs, rank process resources, troubleshoot network connections, and clean disk space, plus practical tips and mindset advice to help operations engineers solve critical incidents faster and more reliably.

One-liner · Operations · Shell
0 likes · 10 min read