Tagged articles
53 articles
AI Software Product Manager
Apr 6, 2026 · R&D Management

How OpenSpec Enables Spec‑Driven Development with AI Collaboration

OpenSpec is a CLI tool that introduces a spec‑driven workflow between developers and AI assistants, outlining clear "what" and "how" stages, change proposal management, integration with Cursor for automated documentation, and a complete cycle of proposing, reviewing, implementing, and archiving changes.

AI Collaboration · CLI · OpenSpec
0 likes · 12 min read
dbaplus Community
Dec 18, 2025 · Operations

How Bilibili’s ChangePilot Platform Reduces Production Risk with Structured Change Management

This article explains Bilibili’s approach to change management, defining change concepts, outlining a technical framework, detailing control levels, and describing the ChangePilot platform’s architecture, integration, and future directions to improve stability in large-scale cloud‑native environments.

Cloud Native · Production Stability · change management
0 likes · 29 min read
AI Info Trend
Dec 5, 2025 · Industry Insights

How CEOs Can Turn Generative AI Into a Superpower: A 5‑Step Framework

The McKinsey report outlines a five‑step change‑management framework that helps CEOs define a North‑Star vision, build data trust, redesign workflows, create hybrid AI‑human organizations, and empower employees to become AI ambassadors, turning generative AI into a strategic competitive advantage.

AI strategy · Enterprise AI · Leadership
0 likes · 11 min read
AI Info Trend
Sep 8, 2025 · Industry Insights

How CEOs Can Unlock Gen AI Value: A Five‑Step Change Management Playbook

The McKinsey report outlines a five‑step framework for CEOs to turn generative AI pilots into real business value by setting a clear North Star, establishing data trust, redesigning workflows, blending autonomous and augmented teams, and empowering employees as change agents.

AI adoption · Gen AI · Leadership
0 likes · 9 min read
JD Tech Talk
Oct 17, 2024 · Operations

Comprehensive Guide to Change Management: Compatibility Design, Release Planning, Gray Deployment, Data Migration, Rollback, and Configuration Control

This article presents a detailed overview of change management practices, covering compatibility design across hardware, base software, and applications, release strategies, gray‑deployment techniques, data migration analysis, rollback planning, configuration change control, and verification procedures to ensure system stability and reliability.

Compatibility · Gray Deployment · Operations
0 likes · 26 min read
JD Cloud Developers
Oct 17, 2024 · Operations

Master Change Management: Compatibility, Gray Release & Rollback Strategies

This guide outlines comprehensive change‑management practices—including compatibility design across hardware, base and application software, structured release planning, gray‑release techniques, data‑migration safeguards, rollback mechanisms, and configuration control—to ensure system stability and reliability during updates.

Deployment · Operations · change management
0 likes · 25 min read
Bilibili Tech
Aug 9, 2024 · Operations

Design and Implementation of Bilibili's Change Control Platform

Bilibili’s Change Prevention Platform consolidates data from over 60 systems to proactively detect and block more than 100 risky changes daily, reducing change‑related incidents by applying a four‑pillar framework of technical support, landing, cross‑domain enablement, and cultural safeguards, while evolving toward AI‑driven, end‑to‑end change defense.

Bilibili · DevOps · Reliability
0 likes · 20 min read
Architecture and Beyond
Jul 21, 2024 · Operations

Mastering Backend Stability: 7 Essential Practices for High Availability

This comprehensive guide outlines the seven key pillars—operations, high‑availability architecture, capacity governance, change management, risk governance, fault management, and chaos engineering—that together form a systematic approach to building and maintaining a reliable, 24‑hour backend system.

Operations · backend stability · capacity planning
0 likes · 40 min read
DevOps Cloud Academy
May 30, 2024 · Operations

Case Study: Overcoming Resistance in a Large Manufacturing Company's IT Department During DevOps Transformation

This case study describes how a large manufacturing company's IT department, led by Michael, overcame strong internal resistance from senior staff to transition from a traditional waterfall development model to an agile and DevOps approach through personalized communication, stakeholder engagement, and transparent implementation planning.

DevOps · IT transformation · agile
0 likes · 8 min read
Cognitive Technology Team
Apr 15, 2024 · Operations

Tencent Cloud Service Outage on April 8: Root Cause, Impact, and Improvement Measures

On April 8, Tencent Cloud experienced a major service outage caused by a cloud API failure that prevented console login and disrupted several public cloud services for 87 minutes, prompting a detailed post‑mortem that outlines the root cause, impact, and a series of operational and change‑management improvements.

Operations · Tencent Cloud · change management
0 likes · 4 min read
Bilibili Tech
Jan 5, 2024 · Cloud Native

ChangePilot: Bilibili’s Unified Change Management Platform and Practices

ChangePilot is Bilibili’s unified change‑management platform that standardizes change definition, lifecycle, and risk governance through a platform‑scenario model and five control levels (G0‑G4), offering built‑in checks, searchable records, subscription alerts, intelligent correlation, and emergency channels to boost production stability while maintaining operational efficiency.

SRE · change management · risk control
0 likes · 29 min read
AntTech
Dec 18, 2023 · Cloud Native

AlterShield Open‑Source Change Risk Control Platform: Architecture, Features, and Future Roadmap

AlterShield is an open‑source change‑risk prevention solution originally built by Ant Group that provides lifecycle‑aware change defense, cloud‑native operator integration, KDE‑based anomaly detection, and extensible plug‑in frameworks, with detailed module descriptions, recent v1.0 releases, and a roadmap for advanced monitoring and noise‑reduction capabilities.

Cloud Native · Kubernetes · SRE
0 likes · 13 min read
dbaplus Community
Aug 13, 2023 · Operations

Mastering SRE: Key Questions on Monitoring, Capacity, and Change Management

This article provides a comprehensive SRE guide covering senior role definitions, monitoring objectives and implementation, core metric selection, link and event monitoring, capacity planning and mitigation strategies, a real‑world health‑code outage case, and change‑management best practices to improve reliability and efficiency.

SRE · capacity · change management
0 likes · 9 min read
Architecture and Beyond
Jul 22, 2023 · Operations

Mastering Production Change Management: Prevent Outages with Proven Processes

This article analyzes high‑profile service outages, defines the production environment and its components, categorizes five types of production changes, and presents a comprehensive change‑management framework—including organizational roles, step‑by‑step procedures, and best‑practice tips—to help teams reduce risk and maintain system stability.

DevOps · Operations · change management
0 likes · 15 min read
AntTech
Jul 20, 2023 · Operations

AlterShield: An Open‑Source Change Management Platform for Risk Control and Observability

AlterShield is an open‑source, end‑to‑end change‑control platform that systematizes change perception, risk analysis, and defense across distributed cloud‑native environments, enabling SRE teams to mitigate stability risks through standardized protocols, incremental rollout, and automated observability checks.

Cloud Native · SRE · change management
0 likes · 24 min read
DevOps
Jun 9, 2023 · R&D Management

Preparing for Organizational Change: Building Urgency, Leadership, Team Participation, Goals, Research, and Action Plans

The article explains how to prepare for successful organizational change by creating urgency and recognition, establishing a change leadership team, guiding team participation, defining clear goals, conducting research interviews, and developing detailed action plans, all supported by practical examples and visual illustrations.

Leadership · R&D · change management
0 likes · 11 min read
Efficient Ops
Jun 7, 2023 · Artificial Intelligence

How Guangdong Mobile Scaled AIOps: From Manual Ops to Intelligent Automation

This article details the evolution of Guangdong Mobile's IT systems and operations, explains its four‑domain architecture, chronicles the AIOps adoption timeline, showcases intelligent anomaly detection, change assessment, fault diagnosis, and operation robots, and shares practical promotion methods and a future outlook for AI‑driven IT operations.

Automation · Fault Diagnosis · IT Operations
0 likes · 19 min read
DeWu Technology
Oct 17, 2022 · Operations

High Availability: Principles and Practices for System Stability

High availability—measured in nines of uptime—requires partitioning systems, decoupling components, choosing robust technologies, deploying redundant instances with automatic failover, capacity planning, rapid scaling, traffic shaping, resource isolation, global protection, observability, and disciplined change management to achieve stable, resilient services.

Observability · capacity planning · change management
0 likes · 10 min read
Top Architect
Sep 4, 2022 · Backend Development

Designing Fault‑Tolerant Microservices Architecture

The article explains how to build highly available microservice systems by isolating failures, applying graceful degradation, change‑management, health checks, self‑healing, fallback caches, circuit breakers, retry policies, rate limiting and testing strategies, while acknowledging the cost and operational complexity involved.

Retry · change management · circuit breaker
0 likes · 16 min read
Architects Research Society
May 22, 2022 · Operations

Designing Resilient Microservices: Fault‑Tolerance Patterns and Practices

This article explains how to build highly available microservice systems by defining clear service boundaries, employing graceful degradation, change‑management strategies, health checks, self‑healing, cache failover, retry logic, rate limiting, bulkheads, circuit breakers, and testing techniques to mitigate failures in distributed environments.

Cloud Native · change management · circuit breaker
0 likes · 15 min read
Architects Research Society
Aug 17, 2021 · Fundamentals

The Critical Role of Enterprise Architecture in Successful Business Transformations

The article explains how enterprise architecture, when focused on agile change processes and integrated with strategy, risk, compliance, and portfolio management, becomes a vital knowledge hub that enables organizations to accelerate digital transformation, reduce costs, and improve customer satisfaction.

Business strategy · agile · change management
0 likes · 9 min read
DevOps
Aug 4, 2021 · R&D Management

Five Key Lessons for Successful Digital Transformation

The article analyzes why many digital transformation initiatives fail and presents five practical lessons to help leaders drive effective change: aligning business strategy, leveraging internal capabilities, designing customer experience from the outside in, addressing employee concerns, and adopting a Silicon Valley‑style entrepreneurial culture.

Business strategy · Digital Transformation · Leadership
0 likes · 10 min read
DevOps
Apr 16, 2021 · R&D Management

Why 80% of Digital Transformations Fail and How to Build a Successful Digital Culture

A McKinsey report reveals that only 20% of digital transformation initiatives succeed, largely because organizational culture, not technology, is the decisive factor. It outlines five practical steps (hiring digital‑savvy leaders, upskilling staff, redesigning work mechanisms, modernising tools, and storytelling) to create an agile, adaptive culture that drives successful transformation.

Digital Transformation · Leadership · McKinsey
0 likes · 12 min read
Tencent Cloud Developer
Dec 25, 2020 · Operations

Tencent Cloud Network Operations Platform: Architecture, Chaos Engineering, Change Health Check, and Monitoring

Tencent Cloud’s network operations platform combines a layered underlay‑overlay architecture, rapid fault detection within seconds and recovery in minutes, chaos‑engineering experiments, rigorous change health checks, high‑frequency multi‑path monitoring, and plans for predictive self‑healing to ensure reliable service across millions of servers.

Network Monitoring · Tencent Cloud · change management
0 likes · 14 min read
DevOps
Dec 22, 2020 · R&D Management

Key Practices for Taking the First Step in Digital Transformation

The article outlines practical guidance for organizations embarking on digital transformation, emphasizing a focused goal, managing internal expectations, leveraging appropriate external resources, and executing short‑term pilot projects of 3‑6 months to build confidence and ensure sustainable success.

Business Insight · Consulting · Digital Transformation
0 likes · 10 min read
Efficient Ops
Dec 1, 2020 · Operations

Zero‑Downtime Ops: Inside Tencent’s Panshi High‑Availability Platform

At the 2020 GOPS Global Operations Conference, Tencent’s senior operations engineer Xie Hailin detailed the design and implementation of the Panshi platform—a comprehensive, high‑availability solution that unifies change management, fault handling, continuous operation, and disaster recovery to ensure uninterrupted payment services for billions of daily transactions.

Operations · aiops · change management
0 likes · 24 min read
FunTester
Nov 9, 2019 · R&D Management

How to Champion Software Quality: A Practical Guide to Driving Change

This article outlines a step‑by‑step approach for identifying change opportunities, persuading leaders to prioritize software quality, and sustaining effective transformation through audience insight, problem framing, tone crafting, and collaborative ownership, helping teams accelerate market delivery.

Leadership · R&D · Software quality
0 likes · 7 min read
DevOpsClub
Aug 25, 2019 · Operations

How to Reach Elite DevOps Efficiency: Insights from the 2019 State of DevOps Report

The 2019 State of DevOps report reveals how organizations can benchmark their software delivery performance, adopt two key efficiency models, and implement lightweight change‑management practices to move from low‑efficiency to elite‑efficiency status, backed by data‑driven insights and actionable steps.

Continuous Delivery · change management · performance metrics
0 likes · 12 min read
Efficient Ops
Apr 24, 2019 · Operations

Why Every Ops Change Should Be Treated Like a Project

This article shares practical lessons from a real‑world ops incident, emphasizing the need for clear change background, optimal timing, project‑style management, and strict process adherence to reduce risk and improve production reliability.

DevOps · Operations · best practices
0 likes · 9 min read
ITPUB
Apr 15, 2019 · Operations

Essential Practices to Prevent Operational Failures and Boost System Availability

This guide outlines six practical strategies—rollback testing, cautious destructive actions, clear command prompts, verified backups, careful handovers, and proactive monitoring—to help operations teams minimize outages and maintain high system availability.

Availability · Operations · backup verification
0 likes · 6 min read
Programmer DD
Mar 12, 2019 · R&D Management

Why Change Requests Outperform Pull/Merge Requests in Modern DevOps

This article compares Alibaba’s change request workflow with traditional Pull/Merge Request models, outlining their similarities, key differences, benefits such as flexibility and faster releases, as well as associated risks, and explains how the method is implemented and supported by tools like Cloud Eff.

DevOps · change management · continuous integration
0 likes · 15 min read
Alibaba Cloud Developer
Jan 11, 2019 · Operations

How Alibaba’s Real‑Time CFD Sandbox Revolutionizes Data Center Change Management

Alibaba and Nanyang Technological University have built a high‑precision, real‑time CFD change‑sandbox system that integrates with DCIM to simulate and validate data‑center HVAC modifications, enabling automated, accurate impact assessment, reducing fault risk, and supporting design optimization and operational automation.

CFD · Data center · Real-time Simulation
0 likes · 11 min read
Hujiang Technology
Jun 1, 2018 · R&D Management

Recap of Hujiang PMO Salon: Insights on Project Management Practices and Organizational PMO

The article recounts the Hujiang PMO salon held on May 27, 2018, detailing the schedule, keynote presentations on PMO operations, organizational project management, a complex cross‑departmental learning system case study, and concluding remarks on continuous improvement and transformation in project management.

Leadership · PMO · Project Management
0 likes · 7 min read
dbaplus Community
May 1, 2018 · Operations

How to Achieve Zero‑Fault Database Operations: Real‑World Cases and Management Practices

This article shares practical experiences from a DBAplus Guangzhou tech salon, detailing three real Oracle database incident cases, the root‑cause analyses, and a three‑step framework for rapid resolution, prevention, and team management to maintain zero‑fault operations across thousands of database instances.

Database operations · Oracle · Team Practices
0 likes · 22 min read
DevOps
Apr 4, 2017 · Operations

10 DevOps Best Practices for Accelerating App Development and Delivery

This article outlines ten practical DevOps best‑practice steps—including breaking IT silos, aligning performance metrics, achieving real‑time project visibility, automating across the stack, choosing compatible toolchains, starting with small wins, keeping users central, managing change collaboratively, embracing continuous deployment, and building an internal service‑focused culture—to help organizations deliver applications faster and more reliably.

App Development · Automation · Continuous Delivery
0 likes · 6 min read
Efficient Ops
Dec 10, 2016 · Operations

What DevOps Lessons Does “The Phoenix Project” Reveal for Modern IT Operations?

After reading the novel‑style account of the Phoenix Project, the author reflects on the book’s DevOps insights—highlighting chronic IT operations challenges, the power of visualizing changes with Kanban, addressing resource constraints, navigating security audits, and pursuing automation through a three‑step cultural transformation.

DevOps · IT Operations · Kanban
0 likes · 10 min read
Efficient Ops
May 11, 2016 · Operations

How to Build an Automated Operations Platform: Insights from Tencent's Experience

This article shares Peng Lihang's practical insights on operations automation, covering the essential trio of configuration, state, and change management, the evolution of ops practices, platform design principles, and concrete steps for building scalable, business‑driven ops platforms.

Automation · Configuration Management · Operations
0 likes · 24 min read
Big Data and Microservices
Apr 20, 2016 · Operations

How to Build an Effective IT Operations Service System: Principles, Architecture & Best Practices

This article outlines the fundamental principles, overall architecture, scope, and detailed components of an IT operations service system, covering policies, processes, organizational structure, platform tools, and management workflows such as incident, problem, change, and configuration management.

Configuration Management · IT Operations · Platform Integration
0 likes · 19 min read
dbaplus Community
Apr 11, 2016 · Operations

Can External Quality Acceptance Drive DevOps Monitoring and Eliminate Technical Debt?

This article explains how focusing on non‑functional quality during external acceptance testing can drive DevOps teams to improve system monitorability, reduce technical debt, and establish concrete change‑control, acceptance, and performance verification processes for both operational and business‑level observability.

DevOps · Observability · Technical Debt
0 likes · 15 min read