Tagged articles

Operations

3329 articles · Page 7 of 34

Dec 16, 2024 · Operations

How Qunar Built a 5‑Million‑Metric Radar System to Cut Ticket Failures by 87%

This article details the design, implementation, and results of Qunar's intelligent ticket‑monitoring Radar system, covering the business need, architecture, anomaly‑detection algorithms, test‑set construction, parameter tuning, the 87% detection accuracy achieved, and future plans for large‑model integration.

Anomaly Detection · Machine Learning · Operations

0 likes · 17 min read

Chen Tian Universe

Dec 13, 2024 · Fundamentals

Why Mastering Accounting Architecture Is the Key to Seamless Payment Systems

This comprehensive guide explains how robust accounting design—covering principles, account subsystems, hot‑account handling, merging strategies, reverse‑deduction models, sub‑account structures, day‑cut mechanisms, marketing‑related accounting, and settlement processes—forms the backbone of modern payment and clearing systems, helping product and operations teams build reliable financial infrastructure.

Operations · accounting · financial architecture

0 likes · 91 min read

JD Cloud Developers

Dec 10, 2024 · Operations

How We Boosted Inventory Platform Stability 24× with Smart Traffic Splitting and Redis Caching

This article examines the stability challenges of an e‑commerce inventory platform—including workflow complexity, database hotspots, and high‑frequency calculations—and details comprehensive solutions such as traffic splitting, gray releases, Redis caching, data consistency mechanisms, rate limiting, and monitoring enhancements that together improved throughput by 24× and reduced latency dramatically.

Inventory · Operations · Redis

0 likes · 14 min read

Efficient Ops

Dec 8, 2024 · Operations

Diagnosing High Load with Low CPU on Linux: Commands and Tips

This guide explains how to analyze and troubleshoot situations where a Linux system shows high load averages despite low CPU usage, covering common load analysis methods, key commands like top, vmstat, iostat, and practical solutions for I/O bottlenecks and stuck processes.

CPU · Linux · Load

0 likes · 11 min read

Test Development Learning Exchange

Dec 6, 2024 · Operations

Common Docker Commands Reference

This article provides a comprehensive reference of essential Docker commands, covering basic container operations, image management, volume handling, network configuration, and data management, with brief Chinese descriptions and example usages for each command.

CLI · Docker · Operations

0 likes · 6 min read

Chen Tian Universe

Dec 5, 2024 · Operations

Mastering the Four-Stage Reconciliation Model for Large Payment Institutions

This article explains how major payment institutions ensure the accuracy of tens of millions of daily transactions and billions of dollars by using a four‑segment data model, three verification groups, error classification, and extensible data coding to achieve reliable settlement and accounting.

Operations · Reconciliation · accounting

0 likes · 6 min read

Efficient Ops

Dec 4, 2024 · Operations

Mastering Gray Releases: Safe Deployment, Validation, and Rollback Strategies

This guide explains how to design and execute gray releases with patience, detailed planning, monitoring, and effective rollback techniques to minimize risk and ensure system stability during high‑risk deployment phases.

Deployment · Operations · Rollback

0 likes · 13 min read

Efficient Ops

Dec 2, 2024 · Operations

How AI‑Driven Parameter Governance Transforms DevOps Efficiency

This article explains how AI‑powered parameter governance, integrated with DevOps and AIOps practices, tackles the explosion of configuration parameters in large‑scale financial systems, streamlines design, auditing, detection, and deployment, and ultimately boosts operational efficiency and risk control.

AIOps · Artificial Intelligence · Automation

0 likes · 8 min read

Efficient Ops

Dec 1, 2024 · Operations

How to Evaluate and Mature Your Enterprise DevOps Platform in 2024

This article outlines the current state of enterprise DevOps in China, explains regulatory emphasis on integrated R&D‑operations platforms, describes a five‑level maturity model, and provides detailed guidelines for assessing and improving organizational DevOps platforms using a structured tool‑module framework.

Maturity Model · Operations · Platform Assessment

0 likes · 8 min read

Efficient Ops

Dec 1, 2024 · Operations

How I Rescued a Production MySQL Database After a Fatal rm -rf Accident

After a junior engineer mistakenly ran an unguarded rm -rf command that wiped an entire production server—including MySQL and Tomcat—I documented the step‑by‑step recovery using ext3grep, extundelete, and MySQL binlog, highlighting the lessons learned for future operations.

Data Recovery · Linux · MySQL

0 likes · 9 min read

macrozheng

Nov 29, 2024 · Operations

Visual Server Monitoring Made Easy with Sampler: Install & Configure

This article introduces the Sampler visual monitoring tool, explains how to install it on Linux, and provides step‑by‑step YAML configuration examples for tracking CPU, memory, Docker containers, network activity, and system time, enabling quick, intuitive server status checks.

Linux · Operations · Server monitoring

0 likes · 8 min read

Top Architect

Nov 27, 2024 · Backend Development

Payment Business Architecture: Process Decomposition, Sequence Design, and Structural Modeling

The article outlines the payment business background, breaks down the payment process into modular components, presents sequence diagrams and structural designs, summarizes key technical considerations, and additionally promotes a ChatGPT community and related services.

Operations · Transaction Management · backend-architecture

0 likes · 11 min read

DevOps Cloud Academy

Nov 22, 2024 · Operations

12 Essential Bash Scripts for DevOps Automation

This article presents twelve practical Bash scripts that automate common DevOps tasks such as system updates, disk monitoring, backups, log rotation, SSH key setup, MySQL dumping, Docker cleanup, Kubernetes pod checks, SSL certificate monitoring, Git pulling, user management, and service health verification.

Linux · Operations · bash

0 likes · 11 min read

FunTester

Nov 22, 2024 · Operations

Why Java Is the Ultimate Backbone for Performance Testing

The author recounts a four‑year journey from UI automation to Java‑based performance testing, illustrating how mastering Java’s concurrency utilities and Groovy scripting can replace traditional tools like JMeter, enabling flexible, high‑throughput test scenarios and deeper control over test case design.

Groovy · JMeter · Java

0 likes · 8 min read

Linux Ops Smart Journey

Nov 21, 2024 · Operations

How to Build a Real-Time Redis Monitoring Dashboard with Grafana and Prometheus

Learn step‑by‑step how to deploy redis‑exporter, configure Prometheus to scrape Redis metrics, and create a comprehensive Grafana dashboard, enabling you to instantly visualize Redis performance, detect issues early, and maintain high availability in fast‑paced internet environments.

Grafana · Operations · Prometheus

0 likes · 5 min read

Ops Development Stories

Nov 19, 2024 · Operations

How to Install and Explore Nightingale v7.7: New Features, Upgrade Guide, and Hands‑On Demo

This article introduces Nightingale monitoring's final v7.7 release, outlines its new features and major v7 changes, provides step‑by‑step upgrade instructions, and walks through a Docker‑based installation, data‑source integration, dashboard import, and alert‑rule configuration with DingTalk notifications.

Alert Rules · Docker · Nightingale

0 likes · 10 min read

Huolala Tech

Nov 14, 2024 · Operations

How Huolala Scaled Kafka: From Integrated Design to Cloud‑Native Elastic Architecture

This article chronicles the evolution of Huolala’s Kafka infrastructure—from an integrated compute‑storage design to a separated compute‑storage model with multi‑tenant deployment, and finally to a cloud‑native elastic architecture—detailing the challenges of capacity awareness, alarm configuration, and cost‑effective performance optimization.

Kafka · Multi‑tenant · Operations

0 likes · 9 min read

Cognitive Technology Team

Nov 14, 2024 · Operations

Designing Self‑Healing Applications for Fault Tolerance in Distributed Systems

To ensure distributed applications can recover automatically from hardware, network, or service failures, this guide outlines three core capabilities—fault detection, graceful handling, and monitoring—plus practical strategies such as asynchronous component separation, retries, circuit breakers, isolation, load shedding, failover, compensation, checkpointing, graceful degradation, rate limiting, leader election, fault injection, chaos engineering, and use of availability zones.

Operations · cloud-native · distributed systems

0 likes · 7 min read

Efficient Ops

Nov 13, 2024 · Operations

How China’s Auto Giants Are Driving Global DevOps Standardization

The article outlines China’s 2024‑2027 IT standards action plan, CAICT’s synchronized DevOps assessments, and detailed case studies of FAW‑Volkswagen and Changan achieving international and domestic DevOps certifications, highlighting measurable improvements in automation, delivery speed, and platform capabilities across the automotive sector.

Automotive · Operations · Standardization

0 likes · 12 min read

Liangxu Linux

Nov 12, 2024 · Operations

How to Access Firewalled Servers Using Reverse SSH Tunnels

Reverse SSH lets you reach machines behind restrictive firewalls by using the ssh -R option to open a tunnel from the remote server back to your local host. The article includes step‑by‑step commands, configuration tips, and a persistent machine setup for reliable access.

Operations · Remote access · SSH tunneling

0 likes · 6 min read

Efficient Ops

Nov 11, 2024 · Operations

How China’s Leading Banks Are Driving Global DevOps Standardization

The article details China’s 2024‑2027 Information Standard Construction Action Plan, the launch of synchronized ITU DevOps and domestic DevOps assessments, and showcases dozens of banking projects—from agile development to continuous delivery, security, and BizDevOps—that have achieved certification, illustrating the nation’s push for international standardization and operational excellence in the financial sector.

Operations · Standardization · banking

0 likes · 29 min read

DevOps

Nov 10, 2024 · Product Management

Product Operations vs. Product Management: Differences, Roles, and Collaboration

This article explains the distinct responsibilities and mindsets of product operations and product management, outlines their daily tasks, career paths, workflow differences, and how the two functions can cooperate to maximize product value and business outcomes.

Career Path · Operations · Product Development

0 likes · 17 min read

Liangxu Linux

Nov 10, 2024 · Operations

50 Essential Ops Troubleshooting & Fix Techniques Every Sysadmin Should Know

This guide compiles fifty practical troubleshooting and remediation techniques covering system, network, application, database, and security layers, enabling operations engineers to quickly diagnose common failures such as high load, service crashes, permission errors, and performance bottlenecks, and apply concrete fixes to maintain stable, secure services.

Network · Operations · security

0 likes · 16 min read

Architects' Tech Alliance

Nov 9, 2024 · Industry Insights

EOR vs TOR Data Center Networks: Choosing the Right Architecture and Switches

This article compares EOR and TOR data‑center network architectures, explains the characteristics of their respective switches, outlines the advantages and disadvantages of each design, and provides practical guidance for selecting TOR switches and adapting to emerging spine‑leaf topologies.

Data Center · EOR · Network Architecture

0 likes · 8 min read

Ops Development Stories

Nov 8, 2024 · Operations

Building a Simple Cloud‑Native Alert Platform: Features, Architecture & Roadmap

This article describes the design and implementation of a lightweight cloud‑native alert platform, outlining its core features, future enhancements, system architecture, and demo screenshots, offering practical insights for SREs and operations teams handling growing monitoring workloads.

Alert Management · Operations · incident response

0 likes · 6 min read

Architect

Nov 7, 2024 · Operations

Full-Link Multi-Version Deployment: Architecture, Techniques, and Future Outlook

This article explains the concept of full-link multi-version deployment in microservice architectures, describes the challenges of traditional test environments, and details the technical solutions—including traffic coloring, isolation, label propagation, environment management, and monitoring—implemented through a flexible CI/CD pipeline.

CI/CD · Multi-Version Deployment · Operations

0 likes · 16 min read

FunTester

Nov 7, 2024 · Operations

Mastering Software Risk Management: Proven Strategies to Prevent Project Failures

Effective software risk management—by identifying technical and business risks, integrating quality assurance, using structured processes, and leveraging risk‑management tools—helps avoid financial loss, project delays, and reputational damage while ensuring project success and operational stability.

Operations · Project Management · quality assurance

0 likes · 11 min read

Model Perspective

Nov 6, 2024 · Operations

Unlock Hidden Losses: How the Funnel Model Optimizes Your Process

The Funnel Model breaks down any process into sequential stages, measures entry and exit numbers at each step, calculates stage and overall conversion rates, and reveals where the greatest losses occur, enabling data‑driven optimization for e‑commerce, management, and other applications.

Operations · conversion rate · funnel model

0 likes · 5 min read

Linux Cloud Computing Practice

Nov 5, 2024 · Operations

10 Essential Linux Ops Tools Every Engineer Should Master

This article introduces ten indispensable Linux operations tools—Shell scripting, Git, Ansible, Prometheus, Grafana, Docker, Kubernetes, Nginx, ELK Stack, and Zabbix—detailing their functions, typical use cases, advantages, and practical examples to help engineers automate and monitor infrastructure efficiently.

Operations · configuration management · devops

0 likes · 9 min read

MaGe Linux Operations

Nov 4, 2024 · Cloud Native

Essential kubectl Commands for Viewing, Managing, and Debugging Kubernetes

This guide walks you through essential kubectl commands for checking cluster status, inspecting resources, retrieving detailed object information, monitoring logs, managing configurations, labeling, and performing create, update, and delete operations, empowering you to efficiently view, troubleshoot, and control Kubernetes workloads.

Kubernetes · Operations · cloud-native

0 likes · 13 min read

MaGe Linux Operations

Nov 3, 2024 · Operations

Diagnosing Deployment Failures: Using Linux Disk Commands to Find Full Partitions

This guide walks through a real‑world deployment issue where one node failed to start due to a full disk, showing how to verify overall and per‑directory disk usage with df and du commands, interpret their output, and locate large files or directories.

Linux · Operations · df

0 likes · 3 min read

Test Development Learning Exchange

Oct 31, 2024 · Operations

Using top and htop for Real‑Time System Resource Monitoring and Performance Analysis

This guide explains how to use the Linux utilities top and htop to monitor CPU, memory, disk I/O and network usage in real time, record performance data, analyze bottlenecks, and apply advanced techniques such as per‑process tracking, logging, chart generation and optimization recommendations.

Htop · Linux · Operations

0 likes · 9 min read

Aikesheng Open Source Community

Oct 29, 2024 · Operations

Resolving OAT Precheck ulimit Errors by Enabling PAM in SSH Configuration

This article explains why OAT's precheck fails due to mismatched ulimit values when SSH does not load PAM limits, and provides a step‑by‑step solution to enable PAM in sshd_config so the expected limits are applied correctly.

Linux · OAT · Operations

0 likes · 10 min read

DevOps Engineer

Oct 29, 2024 · Operations

A Day in the Life of a DevOps Engineer

The article walks through a DevOps engineer’s typical workday, from morning Slack checks and task planning, through code repository maintenance, build and release duties, coffee breaks, lunch with teammates, focused afternoon development, and evening family time, highlighting both technical and personal aspects.

Automation · CI/CD · Operations

0 likes · 4 min read

Efficient Ops

Oct 28, 2024 · Operations

Master Linux Command Line: Essential Tips and Tricks for System Operations

The article covers Linux commands, shortcuts, file and directory management, permissions, users, searching, software repositories, manual pages, advanced topics like redirection, pipelines, processes, daemons, compression, compilation, networking, backup, and system control, providing practical examples and code snippets.

Linux · Operations · system-administration

0 likes · 50 min read

360 Zhihui Cloud Developer

Oct 28, 2024 · Operations

How Zero‑Intrusion eBPF Transforms TCP Network Monitoring and Troubleshooting

This article explains how zero‑intrusion eBPF technology enables detailed, non‑disruptive TCP network monitoring, covering data collection interfaces, aggregation methods, implementation steps, usage limitations, and practical installation and visualization guidance for improving network performance and fault analysis.

Linux kernel · Network Monitoring · Observability

0 likes · 9 min read

Efficient Ops

Oct 27, 2024 · Operations

How China’s Aviation Leaders Earn International DevOps Certification and Boost Efficiency

The article outlines China’s 2024‑2027 Information Standard Action Plan, CAICT’s synchronized DevOps assessments, and how major aviation firms like Southern Airlines and China Aviation Information Network achieved international and domestic DevOps, AIOps, and BizDevOps certifications, delivering measurable improvements in build success rates, deployment speed, and operational automation.

BizDevOps · Cloud Computing · International Standards

0 likes · 11 min read

Efficient Ops

Oct 27, 2024 · Operations

How Ningbo Bank Achieved Industry‑Leading BizDevOps Maturity: Key Insights and Lessons

Ningbo Bank’s "Fengjinhao" platform passed both the international IG1374 BizDevOps standard and China’s domestic BizDevOps benchmark, showcasing how integrated business‑development‑operations practices accelerate digital transformation, improve performance metrics, and set a leading example for the banking sector.

BizDevOps · DigitalTransformation · Operations

0 likes · 14 min read

Efficient Ops

Oct 27, 2024 · Operations

How China Aviation’s DevOps Assessment Boosted Delivery Efficiency and Set a New Industry Benchmark

The article details China Aviation Information Network's successful dual certification in ITU DevOps international and domestic standards, highlighting the evaluation process, measurable improvements in pipeline alerts and deployment success, and expert insights on the future of DevOps in the aviation sector.

Aviation IT · China · Operations

0 likes · 12 min read

MaGe Linux Operations

Oct 26, 2024 · Operations

Mastering Tomcat and Apache Log Formats: Patterns, Parameters, and Browser Differences

This guide explains Tomcat and Apache httpd log configurations, showing default and recommended patterns, detailed parameter meanings, sample log entries, and how different browsers appear in the logs, providing a comprehensive reference for developers and operators managing server logging.

HTTP Logs · Operations · Server Configuration

0 likes · 8 min read

Efficient Ops

Oct 24, 2024 · Operations

How Migu’s AI‑Powered Observability Boosts Cloud Gaming Operations

During the 24th GOPS Global Operations Conference, Migu Interactive Entertainment’s Vice President Su Yi discussed how their AI‑driven AIOps observability framework, validated by ITU standards, enhances cloud gaming platform stability, accelerates issue detection, and supports China Mobile’s 5G‑based digital transformation.

AI · AIOps · Observability

0 likes · 19 min read

macrozheng

Oct 24, 2024 · Backend Development

Simplify Nginx Management: A Hands‑On Guide to Using Nginx UI with Docker

This tutorial introduces Nginx UI, a visual management tool for Nginx, explains how to install it via Docker, and demonstrates its core features—including dashboard monitoring, static and dynamic proxy configuration, and SSL management—through a step‑by‑step deployment of a SpringBoot‑Vue e‑commerce project.

Nginx · Operations · proxy

0 likes · 9 min read

Efficient Ops

Oct 23, 2024 · Operations

How Zhongdian Hongxin Secured Dual DevOps Certification and Boosted Delivery Efficiency

The article details the dual ITU DevOps international and domestic standard assessment run by the China Academy of Information and Communications Technology (CAICT), Zhongdian Hongxin's successful certification, the practical benefits of end‑to‑end toolchains, and insights from company leaders on future DevOps development.

Continuous Integration · Operations · Standard Assessment

0 likes · 12 min read

Software Development Quality

Oct 23, 2024 · R&D Management

Essential R&D Performance Metrics: Measure Business Value, Delivery Speed, Quality and Operations

This article presents a comprehensive set of R&D performance indicators—including business value, delivery speed, engineering quality, and operational reliability—detailing each metric's definition, calculation method, and practical notes to help teams monitor and improve their development efficiency.

Agile · Operations · R&D metrics

0 likes · 9 min read

Efficient Ops

Oct 22, 2024 · Operations

How New BizDevOps Standards Are Shaping China’s Digital Transformation

This article reviews the latest progress of DevOps standards in China, introduces the newly released BizDevOps framework, details the content of the standard system, highlights emerging XOps hotspots, and explains how these initiatives support enterprise digital transformation and operational efficiency.

BizDevOps · Operations · Platform Engineering

0 likes · 18 min read

Linux Ops Smart Journey

Oct 22, 2024 · Operations

Why Ansible Is the Key to Automating Hundreds of Servers Efficiently

This article introduces Ansible, explaining its core principles, main components, common use cases, and step‑by‑step installation and verification procedures, helping readers understand how to automate large‑scale server configurations and improve operational efficiency.

Operations · infrastructure as code

0 likes · 5 min read

DataFunSummit

Oct 22, 2024 · Big Data

From Self‑Built BI to Volcano Engine: Challenges, Selection, Operations, and Future Outlook

The article recounts Firefly Thinking's early BI system limitations, the decision‑making process that led to adopting Volcano Engine, subsequent operational strategies to unlock tool potential, and a forward‑looking vision of data analysis in the large‑model era.

AI · Analytics · BI

0 likes · 18 min read

Linux Cloud Computing Practice

Oct 22, 2024 · Operations

Simplify Multi‑Server Linux Management with a Ready‑Made Batch Script

This article introduces a ready‑to‑use Linux batch‑operation script that enables non‑expert administrators to update, configure, and manage multiple Ubuntu 22.04 servers simultaneously—covering functions such as updating the script, creating SSL certificates, generating SSH keys, bulk password changes, and deploying or removing ALEO services—while also offering a free, comprehensive Linux command and shell‑script tutorial.

Batch Operations · Operations · Server Management

0 likes · 5 min read

Efficient Ops

Oct 21, 2024 · Operations

Essential Prometheus Best Practices: Avoid Common Pitfalls and Boost Reliability

This article shares practical Prometheus best‑practice tips—from understanding its accuracy‑reliability trade‑offs and self‑monitoring, to avoiding NFS storage, managing high‑cardinality metrics, handling rate() and recording‑rule pitfalls, and fine‑tuning alerting—so you can run a stable, low‑cost monitoring stack.

Alerting · Observability · Operations

0 likes · 10 min read

JD Cloud Developers

Oct 21, 2024 · Operations

How Test Teams Can Build Observability Beyond Traditional Monitoring

This article examines how quality assurance engineers can adopt observability principles—distinct from conventional monitoring—to enhance system health detection, root‑cause analysis, and proactive risk mitigation across resources, services, business functions, data, and logs.

Observability · Operations · monitoring

0 likes · 17 min read

Efficient Ops

Oct 20, 2024 · Operations

Key Takeaways from the 24th GOPS Global Operations Conference – Shanghai

The article recaps the two‑day 24th GOPS Global Operations Conference in Shanghai, highlighting opening remarks, major speaker sessions on DevOps, BizDevOps, AIOps, large‑model applications, industry case studies, and provides links to presentation materials.

AIOps · Big Data · Cloud Computing

0 likes · 10 min read

Efficient Ops

Oct 19, 2024 · Operations

China Southern Airlines Wins Dual DevOps Certifications: BizDevOps & Continuous Delivery Excellence

China Southern Airlines achieved leading BizDevOps and continuous delivery capabilities by passing both international ITU DevOps and domestic standards across three key projects, highlighting the strategic impact of DevOps standardization on business value, technology integration, and digital transformation within the airline industry.

BizDevOps · Case Study · Operations

0 likes · 17 min read

Efficient Ops

Oct 18, 2024 · Operations

How Changan Auto Achieved Dual ITU DevOps Certification and Boosted Efficiency

This article details China’s 2024‑2027 ITU‑DevOps standard action plan, CAICT’s dual international and domestic DevOps assessments, Changan Auto’s successful Gaia platform certification, and insights from senior executives on implementation challenges, benefits, and future DevOps trends.

Operations · Standard Assessment · automotive industry

0 likes · 21 min read

Efficient Ops

Oct 18, 2024 · Operations

How FAW‑Volkswagen Achieved Top‑Tier DevOps Standards in China and Internationally

FAW‑Volkswagen’s R&D Efficiency Platform and Integrated Operations Platform passed both ITU DevOps international and domestic standards, showcasing a comprehensive case study of digital transformation, platform engineering, and standards‑driven DevOps adoption in the automotive industry.

Operations · Platform Engineering · Standard Assessment

0 likes · 14 min read

DevOps Engineer

Oct 18, 2024 · Operations

Comprehensive DevOps Interview Questions from a Swedish Company

This article presents a comprehensive list of 17 in‑depth DevOps interview questions asked by a Swedish company, covering Linux boot processes, Kubernetes internals, Git workflows, Jenkins pipelines, networking, monitoring, databases, Docker, and soft‑skill topics to help candidates prepare effectively.

CI/CD · Kubernetes · Linux

0 likes · 3 min read

JD Tech Talk

Oct 17, 2024 · Operations

Comprehensive Guide to Change Management: Compatibility Design, Release Planning, Gray Deployment, Data Migration, Rollback, and Configuration Control

This article presents a detailed overview of change management practices, covering compatibility design across hardware, base software, and applications, release strategies, gray‑deployment techniques, data migration analysis, rollback planning, configuration change control, and verification procedures to ensure system stability and reliability.

Change Management · Gray Deployment · Operations

0 likes · 26 min read

JD Cloud Developers

Oct 17, 2024 · Operations

Master Change Management: Compatibility, Gray Release & Rollback Strategies

This guide outlines comprehensive change‑management practices—including compatibility design across hardware, base and application software, structured release planning, gray‑release techniques, data‑migration safeguards, rollback mechanisms, and configuration control—to ensure system stability and reliability during updates.

Change Management · Deployment · Operations

0 likes · 25 min read

Alibaba Cloud Developer

Oct 15, 2024 · Databases

Why Did Redis Crash at 100% Memory? Uncovering Buffer Overflows and Best Practices

A detailed post‑mortem of a Redis outage shows how a traffic surge filled bandwidth, caused massive input and output buffers to consume almost all memory, and led to timeouts, while offering step‑by‑step analysis, memory diagnostics, and practical recommendations to prevent similar buffer‑overflow failures.

Best Practices · Operations · Redis

0 likes · 22 min read

MaGe Linux Operations

Oct 15, 2024 · Operations

Master Linux Process Management: From Basics to Powerful Commands

This guide explains what a program and a process are, describes process creation, lifecycle, and identifiers, and provides detailed usage of essential Linux commands such as ps, top, pgrep, pstree, lsof, vmstat, iostat, iftop, dstat, as well as foreground/background control and scheduling with at and crontab.

Commands · Linux · Operations

0 likes · 10 min read

MaGe Linux Operations

Oct 12, 2024 · Operations

Step‑by‑Step Guide to Deploying Zabbix Distributed Monitoring on Linux

This article walks operations engineers through the concepts, components, and detailed installation steps for setting up a Zabbix distributed monitoring system on Linux, including server and agent configuration, web front‑end deployment, custom items, triggers, and email alerts.

Distributed Monitoring · Linux · Operations

0 likes · 10 min read

Practical DevOps Architecture

Oct 11, 2024 · Operations

Troubleshooting Disk Space Not Released After Deleting Files on Linux

This article explains why disk space may not be freed after deleting large log files on a Linux server, describes the underlying file system mechanisms, and provides a step‑by‑step troubleshooting guide using /tmp cleanup and the lsof command to identify and kill lingering processes.

Operations · Troubleshooting · disk space

0 likes · 7 min read

DevOps Operations Practice

Oct 10, 2024 · Operations

Seven Key Truths About Operations: Downtime, Automation, Prevention, Technology as a Tool, DevOps, Communication, and Security

Effective operations management acknowledges inevitable downtime, emphasizes automation, prioritizes proactive prevention, treats technology as a means rather than an end, integrates closely with development through DevOps, relies on strong communication, and continuously addresses pervasive security challenges to minimize business impact.

Automation · Operations · downtime

0 likes · 5 min read

Qunar Tech Salon

Oct 10, 2024 · Operations

Design and Architecture of a Distributed Task Scheduling System for Database Automation

This document outlines the terminology, background, requirements, task classifications, state model, and detailed architecture—including TaskScheduler, TaskWorker, and TaskConsole components—of a new distributed task scheduling system designed to replace Celery in a database automation platform, with emphasis on scalability, reliability, and extensibility.

Locks · Operations · Task scheduling

0 likes · 23 min read

Architecture Digest

Oct 9, 2024 · Operations

Longest‑Running Computer Systems: Real‑World Server Uptime Stories

This article compiles real-world anecdotes from Zhihu users describing computers and servers that have run continuously for years or even decades, highlighting examples such as a 14‑year Red Hat Linux machine, a 20‑year base‑station, long‑standing DOS and Sun systems, and space probes that have operated for nearly half a century.

Operations · hardware longevity · long-running systems

0 likes · 7 min read

Selected Java Interview Questions

Oct 7, 2024 · Operations

Top 10 Tools Frequently Used by Operations Engineers: Features, Use Cases, and Practical Examples

This article introduces ten essential tools for operations engineers—Shell scripts, Git, Ansible, Prometheus, Grafana, Docker, Kubernetes, Nginx, ELK Stack, and Zabbix—detailing each tool's functionality, typical scenarios, advantages, and real‑world examples with code snippets for practical automation and monitoring.

Automation · Operations · devops tools

0 likes · 8 min read

ITPUB

Oct 6, 2024 · Operations

Mastering Prometheus Metrics: Practical Best‑Practice Guide for Effective Monitoring

This guide explains how to design and implement Prometheus metrics for application monitoring, covering the selection of monitoring targets, the four golden metrics, system‑specific metric groups, vector and label choices, naming conventions, histogram bucket design, and useful Grafana visualization tips.

Grafana · Operations · Prometheus

0 likes · 9 min read

MaGe Linux Operations

Oct 3, 2024 · Operations

Master Nginx Log Formatting: Customize Access Logs for Precise Debugging

This guide explains how to use Nginx's HttpLogModule to configure access_log and log_format directives, customize log parameters, enable log caching, and set per‑location logs with buffer and flush options for efficient troubleshooting and performance monitoring.

Nginx · Operations · Server Configuration

0 likes · 6 min read

dbaplus Community

Oct 3, 2024 · Operations

How Netflix Uses Chaos Engineering to Build Resilient Distributed Systems

This article explains Netflix's chaos engineering practice, detailing the challenges of microservice reliability, the implementation of the Chaos Monkey tool, the step‑by‑step methodology, guiding principles, and real‑world outcomes that demonstrate improved system availability.

Chaos Monkey · Netflix · Operations

0 likes · 6 min read

Liangxu Linux

Oct 2, 2024 · Operations

10 Essential Ops Engineer Tools Every Sysadmin Should Master

A comprehensive guide lists ten indispensable tools for operations engineers, detailing each tool's functionality, ideal use cases, advantages, and real‑world examples, plus practical code snippets for automation, monitoring, container orchestration, and log analysis.

Automation · Operations · devops

0 likes · 7 min read

Liangxu Linux

Oct 1, 2024 · Operations

10 Proven Practices to Prevent System Failures for Ops Teams

This guide outlines ten practical strategies—including rollback testing, safe handling of destructive commands, prompt customization, robust backup and verification, production environment discipline, thorough handover, proactive monitoring, cautious auto‑failover, meticulous execution, and simplicity—to help operations engineers dramatically reduce system outages and improve reliability.

Best Practices · Operations · backup

0 likes · 17 min read

DevOps Engineer

Oct 1, 2024 · Operations

What a Chief DevOps Engineer Does: Responsibilities, Required Skills, and Business Benefits

The article explains the role of a chief DevOps engineer, outlining core duties such as infrastructure design, automation, and cultural leadership, the essential technical and soft‑skill requirements, and the advantages this position brings to an organization’s efficiency, reliability, and collaboration.

Automation · Chief Engineer · Leadership

0 likes · 6 min read

Architect

Sep 30, 2024 · Operations

Automated Resource Balancing and Migration for Redis Clusters

The article describes how an automated resource‑balancing system continuously monitors Redis host memory usage, selects optimal nodes, safely migrates them through a multi‑step process (adding slaves, verifying replication, promoting masters, deleting old nodes), and provides task management and notification features to maintain high availability and reduce manual DBA effort.

Automation · Cluster Migration · Operations

0 likes · 13 min read

Liangxu Linux

Sep 29, 2024 · Operations

Essential Automation Scripts for Operations: Baselines, Checks, and Repository Structure

This guide presents a comprehensive collection of automation operation scripts—including baseline health checks, business inspections, organized directory structures, naming conventions, and download links—designed to streamline system, network, database, and cloud infrastructure management.

Ansible · Automation · Operations

0 likes · 6 min read

Efficient Ops

Sep 29, 2024 · Operations

Essential Linux Ops Tools Every Sysadmin Must Master

This guide outlines the ten core tool categories—from Linux basics and networking services to scripting, firewalls, monitoring, clustering, and backup—that a Linux operations engineer should master to become an effective sysadmin.

Linux · Operations · database

0 likes · 6 min read

Python Programming Learning Circle

Sep 29, 2024 · Operations

Docker Image Registry Access Restored in China: Ping Tests and Observations

After a nationwide outage of Docker image registries since June 6, recent ping tests show that get.docker.com and download.docker.com are now reachable in China, offering developers restored access without VPNs, though the article also includes a promotional Python course invitation.

China · Docker · Image Registry

0 likes · 3 min read

IT Architects Alliance

Sep 28, 2024 · Operations

How DevOps Transforms IT: Core Principles, Practices, and Real-World Success

This article explores the DevOps mindset, its core principles such as collaboration, automation, continuous improvement, and customer focus, outlines essential practices like CI/CD, IaC, monitoring, microservices, and provides a step‑by‑step adoption roadmap illustrated with a detailed case study and future trends.

Automation · CI/CD · Operations

0 likes · 11 min read

Python Programming Learning Circle

Sep 28, 2024 · Operations

Essential Skills for Becoming a Successful DevOps Engineer

The article outlines the key competencies a DevOps engineer must master—including programming, Linux system knowledge, configuration management, infrastructure-as-code, CI/CD tools, networking and security, monitoring, and cloud services—to guide readers on building a comprehensive skill set for effective DevOps practice.

Linux · Operations · devops

0 likes · 5 min read

IT Services Circle

Sep 27, 2024 · Operations

Analysis of the Shanghai Stock Exchange Outage and System Design Lessons

The article recounts the Shanghai Stock Exchange’s sudden P0 outage that halted trading, analyzes the causes such as massive order volume and system bottlenecks, and discusses how distributed architectures and message‑queue based queuing can mitigate similar high‑concurrency failures.

Operations · distributed systems · high concurrency

0 likes · 6 min read

Zhuanzhuan Tech

Sep 26, 2024 · Artificial Intelligence

Pricing Strategy and Model Evolution for Second‑Hand Phone Auctions in Zhuanzhuan's B2B Marketplace

This article examines the characteristics of Zhuanzhuan's B2B auction scenario, defines core pricing metrics, presents a step‑by‑step methodology for determining optimal starting prices, reviews early practices and their shortcomings, and details the current modular machine‑learning model architecture that improves transaction rates and reduces price premiums for second‑hand smartphones.

Machine Learning · Operations · Price Optimization

0 likes · 29 min read

Open Source Linux

Sep 26, 2024 · Operations

50 Essential Ops Troubleshooting & Fix Techniques for Rapid Issue Resolution

This guide compiles 50 practical troubleshooting and remediation techniques covering system, network, application, database, and security layers, enabling operations engineers to quickly diagnose failures, apply targeted fixes, and maintain stable, secure infrastructure.

Network · Operations · Troubleshooting

0 likes · 17 min read

Efficient Ops

Sep 24, 2024 · Operations

Master Linux Performance in 60 Seconds: 10 Essential Commands

When a Linux server shows performance issues, the first minute is critical; this guide walks you through ten standard commands built on uptime, dmesg, vmstat, mpstat, pidstat, iostat, free, sar, and top, explaining what each metric means and how to interpret the output for quick troubleshooting.

Linux · Operations · Performance

0 likes · 19 min read

Top Architect

Sep 23, 2024 · Backend Development

Understanding Nginx Architecture, Process Model, FastCGI Integration, and Performance Optimization

This article provides a comprehensive overview of Nginx's high‑performance architecture, including its core, basic, and third‑party modules, master‑worker process model, asynchronous non‑blocking I/O mechanisms, FastCGI and PHP‑FPM integration, and practical configuration and tuning tips for optimal server operation.

Nginx · Operations · Optimization

0 likes · 46 min read

FunTester

Sep 20, 2024 · Operations

Chaos Engineering vs Fault Testing: Methods, Challenges, and Future Trends

This article compares chaos engineering and fault testing, outlines fault injection techniques, implementation layers, testing strategies, challenges, and future trends such as automation, AI-driven diagnostics, and cloud‑native integration, providing a comprehensive guide for improving system resilience and reliability.

Operations · chaos engineering · cloud-native

0 likes · 17 min read

MaGe Linux Operations

Sep 19, 2024 · Operations

Configure Keepalived Dual‑Host Mode for Automatic VIP Failover

This guide demonstrates how to set up a two‑node Keepalived configuration in dual‑host mode, creating two VRRP instances that act as master and backup to enable seamless VIP address migration between the servers.

High Availability · Keepalived · Linux networking

0 likes · 7 min read

Liangxu Linux

Sep 17, 2024 · Operations

Top 10 Essential Ops Tools Every Engineer Should Master

This article presents ten indispensable tools for operations engineers—detailing each tool’s functionality, ideal use cases, advantages, and real‑world examples, from shell scripting and Git to Ansible, Prometheus, Grafana, Docker, Kubernetes, Nginx, the ELK stack, and Zabbix, helping professionals streamline automation, monitoring, and deployment tasks.

Automation · Operations · configuration management

0 likes · 8 min read

Java High-Performance Architecture

Sep 17, 2024 · Operations

What Happens When a Data Center Fire Disrupts Global Cloud Services?

A lithium‑battery fire at Alibaba Cloud's Singapore C zone data centre on September 10 triggered a multi‑day outage that crippled major cloud products, affected dozens of tech companies, and highlighted the challenges of extinguishing battery fires in critical infrastructure.

Data Center · Operations · cloud outage

0 likes · 9 min read

Architects' Tech Alliance

Sep 12, 2024 · Industry Insights

Managing and Optimizing Large‑Scale AI Compute Clusters: Practical Insights

This article examines the key pain points of massive AI compute clusters—including heterogeneous hardware compatibility, efficient scheduling, training and inference acceleration, and fault‑tolerant operations—while presenting practical management and performance‑tuning strategies, a cloud‑native AI platform implementation, and future directions for the ecosystem.

AI computing · Operations · Performance Tuning

0 likes · 7 min read

Linux Cloud Computing Practice

Sep 11, 2024 · Operations

Essential Linux Commands Every Sysadmin Should Master

This guide compiles the most frequently used Linux commands—covering help utilities, file and directory manipulation, content processing, compression, system information, networking, disk management, permissions, user administration, and process control—to provide a comprehensive reference for effective system operation and troubleshooting.

Operations · Shell · Unix

0 likes · 14 min read

DevOps Engineer

Sep 11, 2024 · Operations

Will DevOps Disappear? How AI Impacts the Role of DevOps Engineers

While AI can automate many routine DevOps tasks such as scripting, CI/CD pipeline creation, and infrastructure design, it cannot replace the contextual understanding, critical thinking, experience, and judgment of senior DevOps engineers, who will evolve into architects and innovators rather than being rendered obsolete.

Artificial Intelligence · Automation · Operations

0 likes · 4 min read

FunTester

Sep 11, 2024 · Operations

Pinterest Performance Plan: Real‑User Monitoring, Regression Detection, and Alerting

Pinterest’s performance program details how the team defines custom Pinner Wait Time metrics, uses real‑user monitoring and fine‑grained alerts to detect regressions quickly, and follows structured root‑cause analysis and ownership processes to prevent performance degradation across web surfaces.

Operations · Regression · monitoring

0 likes · 18 min read

DevOps Operations Practice

Sep 8, 2024 · Operations

Which Types of Companies Pay Well for Operations Engineers

The article explains that technology‑driven firms, financial institutions, large multinational corporations, and innovative startups are the main types of companies that tend to offer high salaries to operations engineers because of their critical reliance on stable and secure IT infrastructure.

Career · IT infrastructure · Operations

0 likes · 6 min read

NetEase LeiHuo Testing Center

Sep 6, 2024 · Operations

From Log Beginner to Pro: A QA’s Journey in Game Log Management and Monitoring

This article chronicles the author’s progression from a novice to a proficient log analyst in game development, explaining what logs are, how to collect and classify them, establishing standards and workflows, and detailing the implementation of log monitoring and QA processes for reliable game operations.

Game Development · Log Monitoring · Operations

0 likes · 20 min read

Linux Cloud Computing Practice

Sep 4, 2024 · Operations

How to Diagnose and Fix Linux CPU 100% Issues with a Handy Shell Script

This article walks you through a systematic approach to identify the root cause of a Linux server's CPU hitting 100%, from pinpointing the high‑load process with top, tracing the responsible business code, to using a custom shell script that streamlines thread analysis and resolves the overload.

CPU · Operations · Performance

0 likes · 12 min read

FunTester

Sep 4, 2024 · Operations

Reflections on Technical Growth: Foundations, Output, and Continuous Learning

The article shares a software engineer’s personal journey, emphasizing the importance of solid fundamentals, proactive output, curiosity‑driven problem solving, documentation, and process optimization to build lasting technical competence and reduce tacit knowledge throughout a career.

Operations · career development · documentation

0 likes · 13 min read

Efficient Ops

Sep 2, 2024 · Operations

How China’s Auto Industry Is Leading the Way in DevOps Standardization

The article details China’s 2024‑2027 Information Standardization Action Plan, the CAICT’s DevOps assessment framework, and showcases how automotive firms like FAW‑Volkswagen and Chang'an have achieved top‑tier continuous delivery and system‑tool standards, highlighting key metrics and the role of international ITU standards.

IT Governance · Operations · Standardization

0 likes · 9 min read

Volcano Engine Developer Services

Sep 2, 2024 · Operations

How ByteDance Scales Disaster Recovery: From Single Data Center to Multi‑Region Active‑Active

This article details ByteDance's disaster‑recovery evolution, from a single‑data‑center deployment to same‑city multi‑data‑center setups and finally to active‑active multi‑region architectures, explaining the challenges, specific failure scenarios, and the strategic practices used to ensure continuous service during outages.

Disaster Recovery · High Availability · Operations

0 likes · 15 min read

Liangxu Linux

Aug 31, 2024 · Operations

How to Quickly Identify and Free Hidden Disk Space on Linux Servers

This guide explains practical commands like du, find, and lsof to locate large directories or deleted files consuming disk space, and shows how to adjust reserved filesystem space with tune2fs to reclaim lost capacity.

Linux · Operations · disk space

0 likes · 5 min read

Efficient Ops

Aug 29, 2024 · Operations

Which Operations Roles Match Your MBTI Personality? Find the Perfect Fit

This article explains the MBTI framework, outlines its four dimensions, and matches each of the 16 personality types to specific operations positions, highlighting the strengths and ideal job responsibilities for each type.

Career Guidance · Job Fit · MBTI

0 likes · 5 min read

Top Architect

Aug 29, 2024 · Operations

Setting Up Nginx Log Monitoring with Loki, Promtail, and Grafana

This article walks through a complete, step‑by‑step solution for collecting Nginx access logs, converting them to JSON, shipping them with Promtail to Loki, and visualizing the data in Grafana, including Docker deployment, dashboard import, and world‑map plugin installation.

Grafana · Logging · Operations

0 likes · 10 min read
