Tag

fault management


Efficient Ops
Jul 7, 2024 · Operations

Boost Business Continuity and IT System Stability: Practical Strategies

This article explains business continuity concepts, outlines the risks to IT system stability, and provides actionable steps (expanding monitoring coverage, improving fault detection, enhancing architecture resilience, and strengthening emergency coordination) to keep systems running despite inevitable failures.

Business Continuity · IT Operations · disaster recovery
7 min read
Code Ape Tech Column
Jul 26, 2023 · Operations

Service Governance: Monitoring, Fault Management, Release and Capacity Planning

This article explains how to achieve 24/7 service availability through comprehensive monitoring, fault handling, release management, and capacity planning, covering alarm types, batch processing, traffic and resource metrics, fault causes and mitigation, deployment strategies, scaling commands, and service degradation techniques.

Capacity Planning · fault management · monitoring
20 min read
NetEase Game Operations Platform
Apr 23, 2022 · Artificial Intelligence

Design and Implementation of an AI‑Driven Intelligent Operations Platform for Game Services

The article presents a comprehensive overview of an AI‑ops platform for game operations, covering its background, roadmap, team structure, business scenarios, anomaly‑detection techniques, platform architecture, detection workflow, model deployment, and intelligent fault‑management strategies.

AIOps · Anomaly Detection · Intelligent Operations
20 min read
Efficient Ops
Dec 25, 2021 · Artificial Intelligence

How Zhejiang Mobile’s AIOps Achieved National‑Level Excellence in Fault Management

The article explains AIOps fundamentals, details Zhejiang Mobile’s successful assessment in the national AIOps capability maturity model, shares insights from an interview with the company’s network‑management deputy director, and outlines future plans and industry recommendations for AI‑driven IT operations.

AIOps · Capability Maturity Model · IT Operations
9 min read
Architecture Digest
Sep 17, 2017 · R&D Management

Comprehensive R&D Management Practices: Task Management, Documentation, Code Collaboration, QA, Deployment, and Fault Handling

This article presents a detailed, experience‑driven guide to building an efficient R&D management system covering the product lifecycle, task management, documentation, code collaboration, quality assurance, automated deployment, fault management, instant communication, and techniques for continuous technical improvement.

Code Collaboration · Documentation · R&D management
23 min read
Efficient Ops
Aug 16, 2017 · Operations

How Qunar Built an Automated Hardware Operations Platform to Boost Efficiency

This article details Qunar's end‑to‑end hardware automation system, covering background challenges, lifecycle management, automated testing, data collection, fault detection, and visualized monitoring, and explains how the integrated platform reduces manual effort, improves reliability, and cuts operational costs.

CMDB · fault management · hardware automation
22 min read