Tag

operations


Linux Ops Smart Journey
Jun 13, 2025 · Operations

Master ServiceMonitor: Build Reliable Prometheus Monitoring for Kubernetes

This article dives deep into ServiceMonitor, comparing it with traditional Prometheus configurations, detailing its core fields, and providing hands‑on examples for Harbor and GitLab metrics, enabling you to create stable, flexible, and maintainable monitoring setups for Kubernetes services.

Cloud Native · Kubernetes · Prometheus
0 likes · 5 min read
TAL Education Technology
Jun 13, 2025 · Operations

How Large Language Models Are Revolutionizing Fault Localization

This article explores how the rapid rise of large language models and techniques like Retrieval‑Augmented Generation, Chain‑of‑Thought prompting, and multi‑agent architectures can dramatically improve the speed, accuracy, and automation of fault localization in modern operations environments.

CoT · RAG · agent architecture
0 likes · 14 min read
Old Zhao – Management Systems Only
Jun 12, 2025 · Operations

6 Proven Strategies to Master Procurement Cost Control

This article breaks down procurement cost control into six practical methods—centralized buying, design‑stage BOM optimization, advance planning, supplier tiering, cross‑functional quality collaboration, and digital workflow automation—showing how system thinking and total cost of ownership analysis can dramatically reduce expenses.

cost control · digital procurement · operations
0 likes · 9 min read
iQIYI Technical Product Team
Jun 12, 2025 · Operations

How iQIYI’s “Qijing” Platform Revolutionizes Testing Across Devices and Teams

This article explores iQIYI’s comprehensive testing ecosystem, detailing industry trends, the platform’s multi‑terminal challenges, fragmented legacy solutions, and the unified, cloud‑native “Qijing” environment that streamlines test access, zero‑trust security, and real‑world validation for rapid product delivery.

Cloud Native · iQIYI · operations
0 likes · 20 min read
DevOps Operations Practice
Jun 11, 2025 · Operations

Ops vs DevOps vs SRE: Which Role Matches Your Career Goals?

This article compares traditional Operations (Ops), DevOps, and Site Reliability Engineering (SRE) by outlining their definitions, core responsibilities, typical technology stacks, and career considerations, helping readers understand the distinct philosophies and choose the path that best fits their interests and market demand.

DevOps · SRE · Technology Stack
0 likes · 6 min read
Old Zhao – Management Systems Only
Jun 11, 2025 · Operations

Master the 7‑Step Procurement Process to Cut Costs and Mitigate Risks

This guide breaks down the complete seven‑step procurement workflow—from demand initiation to archiving—highlighting common pitfalls, risk‑control measures, and cost‑saving tactics so leaders can finally understand where money is spent, how risks arise, and what concrete actions ensure a streamlined, accountable purchasing operation.

cost control · operations · process optimization
0 likes · 8 min read
Efficient Ops
Jun 10, 2025 · Operations

What Caused the June 6, 2025 Alibaba Cloud DNS Outage and How to Mitigate It?

On June 6, 2025, Alibaba Cloud experienced a widespread DNS resolution failure affecting OSS, CDN, container image services, and more, which was later linked to a Shadowserver sinkhole. The article outlines the incident timeline, root‑cause analysis, and practical mitigation steps for operators.

Alibaba Cloud · DNS outage · Shadowserver
0 likes · 4 min read
Architecture Digest
Jun 10, 2025 · Operations

How Much Bandwidth Does Douyin (TikTok) Really Have? Inside Its Massive Data Centers

This article explains how Douyin, TikTok, Baidu, Alibaba Cloud and Tencent operate self‑built data centers with terabit‑level outbound bandwidth, details ByteDance's server count growth from tens of thousands to hundreds of thousands, and describes the CDN technologies that enable billions of users to stream smoothly.

Bandwidth · CDN · Data Center
0 likes · 8 min read
Efficient Ops
Jun 9, 2025 · Operations

How OnCall Platforms Transform Incident Management and Reduce Manual Overhead

This article explains the purpose and key features of OnCall platforms, compares popular solutions like PagerDuty, Opsgenie, Grafana OnCall and Alibaba Cloud ARMS, clarifies webhooks with a simple analogy, and summarizes how centralized on‑call management boosts operational efficiency while minimizing manual intervention.

Oncall · incident response · monitoring
0 likes · 5 min read
Old Zhao – Management Systems Only
Jun 9, 2025 · Operations

How to End Equipment‑Failure Blame‑Games: A Practical Operations Blueprint

This article explains why equipment failures often lead to blame‑shifting between production and maintenance teams, defines clear responsibilities, and outlines a four‑step system—including daily inspections, maintenance scheduling, fault handling, and performance metrics—to achieve coordinated, data‑driven equipment management.

equipment management · maintenance · operations
0 likes · 10 min read
DevOps Operations Practice
Jun 7, 2025 · Operations

How Ops Professionals Can Reach a 300k Annual Salary: Real‑World Tips

This article compiles practical advice from experienced operations engineers on the challenges and strategies for achieving a 300,000 CNY yearly salary, covering skill development, career moves, company size, automation, and the evolving role of SRE/DevOps.

DevOps · SRE · career
0 likes · 6 min read
Efficient Ops
Jun 4, 2025 · Operations

Streamline Nginx Management with Nginx UI: Features, Installation & AI Agent Integration

This article introduces Nginx UI, a graphical tool that simplifies Nginx configuration and monitoring, outlines its core features—including AI Agent support—provides pre‑installation notes, and offers step‑by‑step installation guides for Systemd, Docker, and quick‑install scripts, concluding with its operational benefits.

Docker · Nginx · UI
0 likes · 5 min read
Old Zhao – Management Systems Only
Jun 4, 2025 · Operations

Why Warehouses Overflow Yet Stockouts Occur? Root Causes & Solutions

The article explains why warehouses can be overfilled while customers still face stockouts. It analyzes false and structural overstock, flawed demand planning, and weak supply chain execution, then offers practical steps to resolve the paradox: data‑driven forecasting, ABC inventory classification, transparent collaboration, fast‑response mechanisms, and accountability.

Warehouse Optimization · demand planning · inventory management
0 likes · 11 min read
Old Zhao – Management Systems Only
Jun 3, 2025 · Operations

How to Turn Procurement into a Profit‑Driving Powerhouse

This article reveals why procurement is often underestimated, outlines the three essential capabilities—planning, collaboration, and cost control—and provides a step‑by‑step framework for creating effective purchase plans, aligning with production, sales, and R&D, and mastering total‑cost management to boost company profitability.

Cost Management · operations · planning
0 likes · 8 min read
Xiaokun's Architecture Exploration Notes
Jun 1, 2025 · Operations

Understanding SLA, SLO, and SLI: Key Metrics for High‑Availability Systems

This article explains the differences between SLA, SLO, and SLI, shows how to express user expectations as concrete service level agreements, and introduces essential high‑availability metrics such as availability percentages, MTBF, MTTR, RPO, RTO, WRT, and MTD for reliable system design.

High Availability · SLA · SLI
0 likes · 9 min read
Dual-Track Product Journal
May 30, 2025 · Operations

How to Design Offline Inventory Counting: Avoid Data Loss and Conflict

This article explains how to build a robust offline inventory counting system that prevents data loss, resolves synchronization conflicts, and ensures seamless operation even when network connectivity is interrupted, offering practical design patterns and pitfall‑avoidance tips for warehouse teams.

Conflict Detection · Inventory · Synchronization
0 likes · 6 min read
Old Zhao – Management Systems Only
May 29, 2025 · Operations

Master Supplier Performance Evaluation: A Complete SRM Guide

This comprehensive guide explains what supplier performance evaluation is, why it matters, and provides a step‑by‑step "3+1" framework—including metric definition, scoring methods, result grading, and system integration—to help organizations build a data‑driven, actionable SRM process that improves supply chain reliability and reduces costs.

SRM · operations · performance evaluation
0 likes · 8 min read