Tagged articles
34 articles
DevOps Coach
Apr 20, 2026 · Industry Insights

Why Senior Developers Still Matter When AI Does the Coding

The article argues that despite junior developers completing tasks quickly with AI assistants, senior engineers add lasting value through rigorous testing, system reliability, deep architectural insight, and mentorship, illustrating the complementary roles of experience and generative AI in modern software teams.

AI coding tools · Software Engineering · senior developers
0 likes · 13 min read
FunTester
Apr 19, 2026 · Artificial Intelligence

How AI Can Reduce Deployment Failures by Up to 50% and Boost Team Efficiency

This article analyzes why software deployment failures pose systemic risks, enumerates the most common root causes, and explains how AI‑driven automation—covering intelligent version control, automatic rollback, test optimization, dependency management, database migration, observability, security checks, self‑documenting pipelines, backup verification, and predictive scaling—can transform DevOps from reactive firefighting to proactive, self‑healing delivery.

AI · Deployment Automation · DevOps
0 likes · 15 min read
AgentGuide
Mar 24, 2026 · Artificial Intelligence

What I Learned Moving from Backend Engineering to AI Agent Development

The author, a former backend engineer turned AI Agent developer, explains how LLM uncertainty, context engineering, shifting code responsibilities, workflow standards, new failure modes, and the ReAct paradigm shape modern Agent development, and outlines tasks best suited—or unsuited—for LLMs.

AI Agent · Context Engineering · LLM
0 likes · 6 min read
Java Web Project
Mar 10, 2026 · Industry Insights

Why AI‑Generated Code Still Needs a Post‑Processing Engineer

The article analyzes how large‑model code generators can quickly produce 80‑point prototypes but still require skilled engineers to fix missing logic, boundary cases, security flaws, and performance issues, turning shaky AI output into reliable, production‑ready software.

AI code generation · Autonomous Agents · industry insight
0 likes · 9 min read
FunTester
Oct 31, 2025 · Fundamentals

Master Defensive Programming: Turn Failures into Manageable Events

This article explains why defensive programming is essential, outlines its core principles, presents common failure scenarios and practical guidelines, and shows how testing and observability can turn inevitable errors into controlled, recoverable events that keep systems stable and maintainable.

Error Handling · Observability · defensive programming
0 likes · 9 min read
AntTech
Jun 23, 2025 · Artificial Intelligence

Can AI Auditors Ensure Reliable Software? Highlights from EXPRESS 2025 at ISSTA

The EXPRESS 2025 workshop at ISSTA in Norway will showcase AI‑driven code auditing, present cutting‑edge research on trustworthy software systems, and invite researchers and practitioners to discuss transparency, reliability, and security challenges in modern software engineering.

AI auditing · ISSTA 2025 · LLM
0 likes · 5 min read
DeWu Technology
Mar 17, 2025 · Operations

Stability and Its Significance: Challenges and Practices for Building System Reliability

Building system stability requires quantifying risk through formulas, confronting challenges like low short‑term value and resource competition, and implementing a consensus‑driven framework that sets clear goals, cultivates awareness, enforces safety standards, ensures emergency response, conducts routine inspections, and applies sound architecture governance to continuously reduce inherent and change‑related risks.

process improvement · risk management · software reliability
0 likes · 25 min read
JD Cloud Developers
Oct 21, 2024 · Operations

How Test Teams Can Build Observability Beyond Traditional Monitoring

This article examines how quality assurance engineers can adopt observability principles—distinct from conventional monitoring—to enhance system health detection, root‑cause analysis, and proactive risk mitigation across resources, services, business functions, data, and logs.

Observability · Operations · monitoring
0 likes · 17 min read
FunTester
Sep 19, 2024 · Fundamentals

Software Antifragility: Rethinking Error Handling and Reliability

This paper introduces the concept of software antifragility, drawing on Taleb’s theory to argue that embracing errors through fault tolerance, automatic runtime repair, and fault injection can transform software systems into self‑improving, more robust entities, and discusses implications for development processes and product reliability.

antifragility · chaos engineering · fault tolerance
0 likes · 13 min read
Software Development Quality
Aug 12, 2024 · Information Security

How to Detect and Prevent Financial Losses in Banking Systems

This guide explains what capital loss means, outlines common financial loss scenarios, details a comprehensive testing methodology, presents real-world banking and insurance loss cases, and offers practical prevention measures to safeguard financial operations.

Fraud Prevention · banking systems · financial loss
0 likes · 9 min read
Ele.me Technology
May 28, 2024 · Operations

Automated Mock for E2E Testing: Design and Implementation of Unmanned MOCK

Unmanned MOCK automatically generates intelligent, context‑aware mock responses for downstream services in end‑to‑end tests by collecting sub‑call data, extracting knowledge, and applying dynamic rules, so failures in downstream systems are isolated, raising test success rates toward near‑100 % without manual mock configuration.

Automated Testing · e2e · service isolation
0 likes · 12 min read
Efficient Ops
Mar 25, 2024 · Operations

How CAICT’s SRE Standards Strengthen System Reliability and Continuity

This article outlines the rising frequency of system outages, explains the key characteristics and challenges of modern large‑scale distributed systems, introduces China’s CAICT SRE framework and its two‑part reliability model, showcases a successful SRE case, and announces the 2024 SRE maturity assessment program.

Digital Governance · SRE · software reliability
0 likes · 12 min read
Tencent Cloud Developer
Jan 10, 2024 · Operations

The Challenges of Building Continuously Available Systems: Entropy, Murphy's Law, and the 'Divine Doctor Paradox'

Building continuously available systems in 2023 is hampered by entropy‑driven technical debt and Murphy’s Law failures, and the “Divine Doctor Paradox” shows that successful availability work goes unnoticed while blame follows any outage, making cultural commitment—not just technology—the essential solution.

Murphy's Law · SRE · Technical Debt
0 likes · 14 min read
Bilibili Tech
Jan 5, 2024 · Cloud Native

ChangePilot: Bilibili’s Unified Change Management Platform and Practices

ChangePilot is Bilibili’s unified change‑management platform that standardizes change definition, lifecycle, and risk governance through a platform‑scenario model and five control levels (G0‑G4), offering built‑in checks, searchable records, subscription alerts, intelligent correlation, and emergency channels to boost production stability while maintaining operational efficiency.

SRE · change management · risk control
0 likes · 29 min read
FunTester
Oct 12, 2023 · Interview Experience

Master Performance Testing: Key Interview Questions & 12306 Crash Lessons

This article compiles essential performance testing interview questions, outlines a complete testing process with metrics and types, analyzes the 12306 ticketing system crash causes—including overload, bugs, security and network issues—and offers practical mitigation strategies for engineers.

12306 crash · Load Testing · Performance Testing
0 likes · 8 min read
DevOps Coach
Sep 21, 2023 · Operations

Why Observability Engineering Is Essential for Modern Software Systems

The article examines the concept of observability engineering, highlighting its importance for complex distributed systems, the cultural shift toward DevOps collaboration, key principles from the book “Observability Engineering,” and practical guidance for developers, SREs, managers, and executives to improve reliability, performance, and security.

Distributed Systems · software reliability
0 likes · 14 min read
FunTester
Aug 11, 2023 · Operations

Essential Performance Testing Best Practices Every Engineer Should Follow

Performance testing is crucial for ensuring software reliability, and this guide outlines essential best practices—including setting clear goals, selecting appropriate tools, crafting maintainable scripts, using realistic data, running long‑duration loads, and scheduling regular tests—to help engineers achieve stable, high‑performing applications.

Load Testing · Operations · Performance Testing
0 likes · 8 min read
JD Tech
Jun 7, 2023 · Operations

Practical Guide to Achieving High Availability in Software Delivery

This article explains the concept of high availability, outlines the challenges of collaborative delivery, architectural design, coding practices, secure release, and deployment operations, and provides concrete steps, process standards, emergency plans, and self‑check tools to ensure reliable, fault‑tolerant software systems.

Collaboration · Deployment · architecture
0 likes · 13 min read
JD Retail Technology
Mar 16, 2023 · Operations

Ensuring High Availability in Software: Collaboration, Architecture, Implementation, and Operational Practices

This article explains the concept of high availability, outlines the challenges of achieving it in complex software delivery chains, and provides practical guidance on improving collaboration efficiency, establishing process standards, designing robust architecture, implementing disciplined coding, executing safe releases, and maintaining operational safeguards.

Collaboration · Deployment · architecture
0 likes · 11 min read
NetEase Smart Enterprise Tech+
Mar 1, 2023 · Operations

Stability Quality Assurance: Definitions, Metrics, and Implementation Guide

This article explains the origins and meaning of software stability and stability testing, outlines key standards such as GB/T 16260 and industry definitions, and presents a comprehensive framework for stability quality assurance covering system elements, external disturbances, baseline setting, robust design, monitoring, and rapid incident response.

Operations · SRE · quality assurance
0 likes · 17 min read
dbaplus Community
May 11, 2022 · Backend Development

Mastering Failure‑Oriented Design: Mindset, Process, and Distributed Locks

This article explores the philosophy and practical techniques of failure‑oriented design, covering why anticipating failures is crucial for developers, the organizational and process changes needed, core design principles, and concrete implementations such as multi‑level Redis distributed locks with code examples.

Backend Engineering · Operations · distributed-lock
0 likes · 23 min read
DeWu Technology
Feb 28, 2022 · Operations

DeWu Tech Salon – Quality Assurance Sessions Summary

The DeWu Tech Salon, co‑hosted by DeWu App Quality Platform and TesterHome, brought senior engineers from Alibaba Cloud, ByteDance, Lagou and DeWu together to share practical QA insights on end‑side monitoring, traffic replay, full‑link stress testing, and industry‑scale chaos engineering, while announcing a PPT collection, a testing‑expert recruitment drive, and a preview of the next wireless‑technology salon.

Performance Monitoring · chaos engineering · software reliability
0 likes · 6 min read
DevOps
May 10, 2021 · Backend Development

Automated Unit Test Generation for Exception Recall in C/C++ Services

This article presents a white‑box, unit‑test‑driven approach for automatically generating C/C++ test cases that detect and recall runtime stability issues, detailing problem analysis, solution design, code‑analysis, test‑data generation, code generation, failure analysis, and deployment results across large‑scale backend modules.

C · Test Generation · fuzzing
0 likes · 19 min read
Alibaba Cloud Developer
Apr 1, 2021 · Cloud Native

From Google to Ant: How He Zhengyu Built Ant’s Trusted Native Cloud Platform

This interview chronicles He Zhengyu’s journey from a prodigious student to a Google engineer and Ant Group leader, highlighting his role in shaping the Trusted Native initiative that combines cloud‑native, secure containers, confidential computing, and open‑source contributions to boost reliability and security for large‑scale financial services.

career advice · open source · software reliability
0 likes · 15 min read
DevOps Cloud Academy
Aug 27, 2020 · Cloud Native

Step-by-Step Guide to Building More Reliable Software with Kubernetes and DevOps

This article presents a practical, multi‑stage approach for improving software reliability in Kubernetes‑based microservice environments, covering static analysis, testing pyramids, CI/CD observability, performance testing, deployment strategies, and feedback loops to help engineering teams deliver faster, higher‑quality releases.

Cloud Native · DevOps · ci/cd
0 likes · 11 min read
21CTO
Jun 18, 2019 · Operations

Why Embracing Failure Accelerates Growth: Lessons from Intuit and PayPal

The article explains how organizations can achieve rapid growth by openly acknowledging failures, creating lightweight post‑mortem processes, and continuously learning from mistakes, illustrated through Intuit’s SaaS transition, PayPal’s rollback challenges, and practical rules for QA and architecture.

QA · SaaS · architecture
0 likes · 31 min read
360 Tech Engineering
Jul 11, 2018 · Fundamentals

Static Program Analysis, Gödel’s Incompleteness, and the Halting Problem: Foundations of Software Reliability

This article explains how redundancy and voting schemes improve system reliability, introduces Gödel’s incompleteness and consistency concepts, describes the undecidable halting problem, and outlines static program analysis techniques—including data‑flow, inter‑procedural, pointer analysis, and constraint solving—while discussing practical heuristic rules and tools.

Gödel · Software Engineering · decision problems
0 likes · 8 min read
UCloud Tech
Mar 23, 2018 · Operations

How UCloud’s Application Hot‑Patch Framework Enables Zero‑Downtime Fixes

This article explains the design, components, and implementation of UCloud's application hot‑patch framework, covering its motivation, safety checks, multi‑thread support, and how the Creator, Loader, and Core Runtime work together to apply, manage, and roll back patches without restarting services.

ELF · Linux · UCloud
0 likes · 13 min read
Art of Distributed System Architecture Design
May 22, 2015 · Industry Insights

How Facebook Cuts Power Use with Cold Storage: Inside Their Low‑Energy Data Center Design

This article examines Facebook's cold storage system, detailing how the company redesigned hardware and software to slash power consumption, improve reliability with Reed‑Solomon coding, mitigate bit‑rot, and balance loads while supporting massive photo archives in energy‑constrained data centers.

Data center · Facebook · Reed-Solomon
0 likes · 8 min read