Tag: reliability testing


FunTester
Mar 12, 2025 · Operations

Fault Injection Testing: Concepts, Scenarios, Process, and Best Practices

Fault injection testing deliberately introduces failures into a system to assess its resilience. It helps identify weak points, improve retry and timeout mechanisms, and ensure robust operation across the software, protocol, and infrastructure layers. The article gives practical guidance on processes, tools, and Kubernetes-specific practices.

Chaos Engineering · Fault Injection · Kubernetes
8 min read
Architects Research Society
Oct 5, 2023 · Fundamentals

Understanding Stability and Reliability Testing in Software Development

This article explains the definitions, objectives, importance, and types of stability and reliability testing in software development, highlighting how these tests improve system availability, reduce failure risk, and guide corrective actions to lower maintenance costs.

MTBF · MTTR · quality assurance
14 min read
Bilibili Tech
Nov 18, 2022 · Operations

Chaos Engineering and Fault Injection System Design: Principles, Implementation, and Practice

This article describes a chaos engineering and fault injection system that combines steady-state hypotheses, controlled blast-radius experiments, and a lightweight interceptor layer built on gRPC and protobuf to inject and report faults in microservice architectures. Automated, real-time experimentation and analysis enable continuous testing, faster MTTR reduction, and more resilient services.

Chaos Engineering · Fault Injection · Go
15 min read
Architects Research Society
Sep 14, 2022 · Fundamentals

Understanding Stability and Reliability Testing in Software Development

This article explains the definitions, objectives, importance, and various types of stability and reliability testing, including stress, recovery, failover, and stability tests. It highlights how these practices reduce system failures, improve MTBF and MTTR, and support informed decision-making for software quality assurance.

MTBF · MTTR · performance testing
11 min read
Architects Research Society
Sep 11, 2022 · Cloud Native

Chaos Mesh: A Cloud‑Native Chaos Engineering Platform for Kubernetes

Chaos Mesh, a CNCF‑hosted cloud‑native chaos engineering platform, orchestrates fault injection experiments in Kubernetes through components like the Chaos Operator and Dashboard, supporting various CRD types such as DNSChaos, PodChaos, and NetworkChaos to simulate failures ranging from pod kills to network partitions.

Chaos Engineering · Chaos Mesh · Fault Injection
7 min read
Bilibili Tech
May 13, 2022 · Cloud Native

Chaos Engineering Practices for Bilibili Distributed KV Storage

Peng Liangyou describes how Bilibili's large-scale distributed KV storage adopts Netflix-style chaos engineering: defining steady-state hypotheses, replicating production environments, injecting CPU, memory, network, and replica faults via automated "monkey" experiments, and monitoring latency and durability with Prometheus/Grafana. Over 1.5 years, the practice prevented critical incidents, cut testing costs, and enabled incremental, standards-based reliability improvements.

Bilibili · Chaos Engineering · Distributed Storage
15 min read