Tagged articles
66 articles
ByteDance Data Platform
Feb 2, 2026 · Big Data

How StreamShield Powers Production‑Grade Resilience for Apache Flink at Massive Scale

ByteDance’s StreamShield delivers a three‑layer resiliency framework—engine self‑healing, hybrid replication at the cluster level, and chaos‑tested releases—that enables over 70,000 concurrent Flink jobs on 11 million CPU cores to meet strict SLAs with second‑level startup and robust fault tolerance.

Apache Flink · ByteDance · Real‑Time Computing
0 likes · 6 min read
Ray's Galactic Tech
Jan 27, 2026 · Backend Development

Resilient Go Microservices: Rate Limiting, Circuit Breaking & K8s Architecture

This guide walks you through implementing a complete stability engineering system for Go microservices—covering token‑bucket rate limiting, concurrency and Redis‑based throttling, circuit breakers with slow‑request detection, graceful degradation strategies, Kubernetes‑aware deployment, monitoring, dynamic configuration, and load‑testing to set safe thresholds.

Resilience · circuit breaker · rate limiting
0 likes · 10 min read
dbaplus Community
Dec 21, 2025 · Operations

5 Must‑Have Soft Skills for Ops Engineers to Future‑Proof Their Careers

In a rapidly changing tech landscape where Kubernetes and AI dominate, seasoned ops professionals share five core soft‑skill abilities—communication, problem solving, ownership, resilience, and continuous learning—that amplify technical expertise and drive promotions, salary growth, and long‑term career value.

Career Development · Resilience · communication
0 likes · 11 min read
21CTO
Dec 3, 2025 · Operations

What My Biggest Developer Mistakes Taught Me About Operations and Resilience

A software engineer recounts three major mistakes—from accidentally deleting thousands of F5 URLs to leaking code externally and being laid off during COVID—highlighting how operational oversights, poor process controls, and personal resilience shape professional growth and underscore the value of empathy and systematic safeguards.

Infrastructure · Resilience · failure
0 likes · 14 min read
21CTO
Nov 18, 2025 · Operations

What Cloudflare’s Latest Outage Reveals About Cloud Dependency Risks

A massive Cloudflare outage on November 18, 2025 crippled DNS and CDN services, causing widespread failures for platforms like ChatGPT and Discord. The article analyzes the incident, reviews past failures, and offers four practical resilience strategies to mitigate over‑reliance on a single cloud provider.

CDN · Cloudflare · DNS
0 likes · 7 min read
dbaplus Community
Nov 8, 2025 · Cloud Native

Why the 2025 AWS Outage Shows Kubernetes Is the Key to True Multi‑Cloud Resilience

The 2025 AWS us‑east‑1 outage exposed the fragility of single‑cloud architectures and demonstrates how Kubernetes can provide a cloud‑native abstraction that enables true multi‑cloud portability, faster CI/CD pipelines, and resilient, cost‑effective infrastructure for modern software development.

Cloud Native · Infrastructure as Code · Resilience
0 likes · 10 min read
Cognitive Technology Team
Oct 12, 2025 · Backend Development

Resilient Microservices: Practical Patterns to Keep Your Services Alive

Learn how to tame chaotic microservices with practical resilience patterns—circuit breakers, bulkheads, smart retries, timeouts with fallbacks, and event‑driven messaging—plus tool recommendations and observability tips that ensure your system stays responsive even when individual services fail.

Observability · Resilience · Retry
0 likes · 9 min read
Programmer DD
Oct 10, 2025 · Artificial Intelligence

How to Build a Resilient Multi‑LLM Chatbot with Spring AI

This tutorial demonstrates how to integrate multiple large language models from different providers into a Spring Boot application using Spring AI, configure primary, secondary, and tertiary models, and implement a fallback mechanism with Spring Retry to ensure high availability of the chatbot.

Java · LLM · Resilience
0 likes · 12 min read
IT Architects Alliance
Oct 2, 2025 · Cloud Native

Mastering Cloud‑Native Architecture: 6 Core Principles Every Engineer Should Know

This article outlines six fundamental cloud‑native architecture principles—immutable infrastructure, service mesh, observability, declarative APIs, resilient design, and shift‑left security—explaining their purpose, key practices, code examples, and how they interrelate to build scalable, reliable, and secure distributed systems.

Cloud Native · Declarative API · Observability
0 likes · 11 min read
Java Architecture Diary
Jul 28, 2025 · Backend Development

How Spring Framework 7.0 Simplifies Retry and Concurrency with Built‑in Resilience

Spring Framework 7.0 introduces built‑in resilience annotations @Retryable and @ConcurrencyLimit, eliminating the need for external spring‑retry dependencies and enabling declarative retry, exponential backoff, and concurrency throttling—including reactive support—so developers can write cleaner, more robust Java backend services.

ConcurrencyLimit · Java · Resilience
0 likes · 7 min read
Su San Talks Tech
Jul 13, 2025 · Backend Development

8 Proven Retry Strategies to Prevent Costly Failures in Distributed Systems

Discover why improper retry logic can cause massive financial losses, learn eight practical retry solutions—from simple loops to advanced Resilience4j and distributed lock techniques—and see how to avoid retry storms, ensure idempotency, and protect resources in high‑traffic backend services.

Distributed Systems · Idempotency · Resilience
0 likes · 13 min read
Java One
Jul 12, 2025 · Backend Development

Mastering Alibaba Sentinel: Flow Control, Circuit Breaking, and Hotspot Rules in Production

This guide walks through Alibaba Sentinel's core protection strategies—flow‑control rules (including QPS and concurrency limits, modes, and effects), circuit‑breaker mechanisms (principles and three strategies), and hotspot parameter limiting—providing detailed configuration steps, code samples, and visual illustrations for real‑world microservice environments.

Alibaba Sentinel · Flow Control · Hotspot Limiting
0 likes · 18 min read
Ops Development & AI Practice
Jul 3, 2025 · Operations

Why Event-Driven Architecture Is the Secret Sauce for Resilient Ops

The article explains how Event‑Driven Architecture (EDA) transforms traditional request‑response systems into decoupled, asynchronous pipelines that boost system resilience, scalability, observability, and agility, and it demonstrates a practical AWS EventBridge image‑processing workflow.

AWS EventBridge · EDA · Event-Driven Architecture
0 likes · 10 min read
DaTaobao Tech
Apr 28, 2025 · Frontend Development

Front‑End Architecture and Performance Optimization for a Large‑Scale Chinese New Year Interactive Activity

The article details a large‑scale Chinese New Year interactive activity’s front‑end architecture, describing a layered system for business logic, data abstraction, and animation engines, unified data handling, dynamic animation rendering with downgrade paths, high‑concurrency QPS reduction, resilience measures, and extensive performance and workflow optimizations.

Data Management · Resilience · animation
0 likes · 15 min read
Su San Talks Tech
Apr 27, 2025 · Backend Development

Mastering Microservices: Advantages, Challenges, and Essential Design Patterns

This article explains what microservices are, outlines their key advantages such as scalability and resilience, details the inherent challenges like complexity and security, and introduces essential design patterns—including Database‑Per‑Service, API Gateway, BFF, CQRS, Event Sourcing, Saga, Sidecar, Circuit Breaker, ACL, and Aggregator—to help architects build robust, maintainable systems.

Backend Architecture · Cloud Native · Microservices
0 likes · 23 min read
FunTester
Mar 31, 2025 · Operations

Performance Testing and Fault Testing: Complementary Pillars for System Stability

The article explains how performance testing measures system efficiency under load while fault testing validates resilience under abnormal conditions, highlighting their shared goals, differences, overlapping toolchains, and how their combined use drives architecture optimization and improves service level agreements in modern complex software systems.

Fault Injection · Load Testing · Operations
0 likes · 14 min read
FunTester
Mar 25, 2025 · Operations

Integrating Chaos Engineering into Service Dependency Governance for Resilient Cloud‑Native Systems

This article explores how to embed chaos engineering practices into service dependency governance, detailing dynamic validation versus static analysis, fault injection techniques, multi‑point failure simulations, and data‑driven optimizations to build robust, self‑healing microservice architectures in cloud‑native environments.

Cloud Native · Microservices · Operations
0 likes · 18 min read
FunTester
Mar 7, 2025 · Operations

Fault Testing: Proactive Resilience Engineering for Distributed Systems

Fault testing, akin to a shield, deliberately injects failures into distributed and cloud‑native systems to expose weak points, verify recovery mechanisms, and improve overall reliability, ensuring business continuity even under unexpected disruptions.

Operations · Resilience · chaos engineering
0 likes · 11 min read
Architect
Jan 25, 2025 · Backend Development

HTTP Retry Strategies in Offline Store Systems: Simple Loop, Apache HttpClient, and MQ‑Based Asynchronous Retries

This article explores practical HTTP retry solutions for offline store applications, covering a basic loop retry, the built‑in retry mechanism of Apache HttpClient with custom handlers, and an asynchronous retry approach using message queues to achieve higher reliability and eventual consistency.

Apache HttpClient · Backend · HTTP
0 likes · 12 min read
JavaEdge
Oct 21, 2024 · Operations

Why Move Beyond Microservices? Unlocking Resilience with Unitized Architecture

This article explores the advantages of unitized architecture over traditional microservices, detailing how its modular design, dedicated routing layer, and tailored observability practices enhance system resilience, fault‑tolerance, and operational insight for large‑scale distributed applications.

Distributed Systems · Resilience · fault tolerance
0 likes · 17 min read
dbaplus Community
Oct 3, 2024 · Operations

How Netflix Uses Chaos Engineering to Build Resilient Distributed Systems

This article explains Netflix's chaos engineering practice, detailing the challenges of microservice reliability, the implementation of the Chaos Monkey tool, the step‑by‑step methodology, guiding principles, and real‑world outcomes that demonstrate improved system availability.

Chaos Monkey · Distributed Systems · Netflix
0 likes · 6 min read
JavaEdge
Aug 13, 2024 · Backend Development

How to Use Circuit Breakers to Decouple Event Retrieval in Microservices

This article explains why tightly coupled request/response communication can overload downstream services, introduces the circuit‑breaker pattern (including its three‑state state machine), and shows step‑by‑step how to integrate a circuit breaker into event‑driven microservices to pause event retrieval, handle state transitions, and avoid dead‑letter queues.

Resilience · circuit breaker
0 likes · 9 min read
Architect
Dec 22, 2023 · Operations

How Tencent Search Built a Multi‑Layered Stability Architecture to Slash MTTD and MTTR

The article details Tencent Search’s end‑to‑end stability engineering practice, covering a ten‑step architecture that combines redundancy, proactive detection, rapid emergency response, automated cut‑over, defensive caching, and continuous drills, and shows how these measures collectively reduced mean‑time‑to‑detect and mean‑time‑to‑recover by an order of magnitude while keeping service availability high.

Observability · Resilience · architecture
0 likes · 32 min read
政采云技术
Nov 29, 2023 · Frontend Development

API Failure Resilience Using CDN and IndexedDB Caching

The article presents a comprehensive strategy for handling API outages by storing data locally with IndexedDB, synchronizing updates through a CDN, and implementing Axios interceptors and Node‑based scheduled jobs to ensure seamless user experience without white‑screen failures.

API · CDN · IndexedDB
0 likes · 12 min read
Architects Research Society
Oct 3, 2023 · Cloud Native

Chaos Engineering: Concepts, History, Benefits, Challenges, and Getting Started

Chaos engineering is a disciplined approach to testing distributed systems by intentionally injecting failures to verify resilience, covering its definition, origins at Netflix, operational workflow, benefits, challenges, and practical steps for organizations to adopt resilient cloud‑native applications.

Observability · Resilience · chaos engineering
0 likes · 18 min read
MaGe Linux Operations
Jun 24, 2023 · Backend Development

12 Essential Microservice Patterns to Boost Scalability and Resilience

This article explains why microservice architecture matters and walks software engineers through twelve core design patterns—such as API Gateway, Service Discovery, Circuit Breaker, and Strangler—that together improve system scalability, fault‑tolerance, performance, and maintainability.

Microservices · Resilience · Scalability
0 likes · 17 min read
ByteDance SYS Tech
Feb 28, 2023 · Cloud Native

How ByteDance’s ARES Boosts Cloud‑Native Resilience with Chaos Engineering

This article explains ByteDance’s end‑to‑end chaos engineering practice for cloud‑native environments, covering its background, principles, comparison with traditional testing, the evolution of its internal platforms, and a detailed look at the Application Resilience Enhancement Service (ARES) and its core features.

Fault Injection · Kubernetes · Microservices
0 likes · 17 min read
Architects Research Society
Oct 10, 2022 · R&D Management

Future‑Ready CIO Leadership: Insights from Three Executives

The article explores how business‑driven CIOs are updating their leadership playbooks for the future of work, emphasizing adaptability, resilience, proactive problem‑solving, and a people‑first culture, based on interviews with CIOs from GEHA Health, Panera Bread, and Novant Health.

Adaptability · CIO · DigitalTransformation
0 likes · 10 min read
IT Architects Alliance
Jun 20, 2022 · Cloud Native

Building Resilient Microservices: Fault Tolerance, Graceful Degradation, and Reliability Patterns

This article explains how microservice architectures can achieve high availability by using fault‑tolerant designs such as graceful degradation, health checks, failover caching, circuit breakers, bulkheads, rate limiting, and systematic change‑management practices to mitigate network, hardware, and application errors.

Microservices · Resilience · circuit breaker
0 likes · 13 min read
Laiye Technology Team
Jun 10, 2022 · Backend Development

Understanding System Failures and Principles for Resilient Architecture

The article analyzes why modern software systems repeatedly collapse—due to growing business complexity, unpredictable changes, and architectural decay—and proposes principles such as decentralization, integration, and diversity, along with practical strategies like service mesh and eBPF, to design more sustainable, observable, and self‑evolving architectures.

Distributed Systems · Microservices · Resilience
0 likes · 12 min read
IT Architects Alliance
May 28, 2022 · Operations

Why Circuit Breaking and Degradation Are Essential for High‑Availability Microservices

The article explains how microservice architectures can suffer from cascading failures, why circuit breaking and degradation are critical for protecting service availability, compares popular libraries such as Sentinel, Hystrix and Resilience4j, and dives deep into Sentinel's degradation implementation, rule definition, data collection, verification, and execution flow.

Circuit Breaking · Microservices · Resilience
0 likes · 12 min read
DevOps
May 18, 2022 · Operations

Understanding and Preventing Cascading Failures in Distributed Systems

The article explains how cascading failures arise from positive feedback loops in distributed systems, illustrates real‑world incidents such as the 2015 DynamoDB outage, outlines anti‑patterns like unlimited retries and unchecked load, and presents practical mitigation techniques including load‑shedding, circuit breakers, exponential back‑off, and controlled replication to improve system resilience.

Distributed Systems · Resilience · SRE
0 likes · 19 min read
Su San Talks Tech
Mar 14, 2022 · Backend Development

Master OpenFeign: From Basics to Advanced Timeout, Logging, and Resilience

This tutorial walks you through OpenFeign in Spring Cloud, explaining its purpose, differences from Feign, setup steps, various parameter passing methods, timeout handling, logging enhancement, HTTP client replacement, GZIP compression, and circuit‑breaker integration with Sentinel, all illustrated with code snippets and diagrams.

Java · Microservices · OpenFeign
0 likes · 19 min read
IT Architects Alliance
Mar 13, 2022 · Operations

30 Essential Architecture Patterns for Scalable, Resilient Systems

This article presents a comprehensive catalog of thirty architectural patterns, grouped into management, monitoring, performance, data management, design, messaging, resilience, and security categories, explaining their purpose, typical use cases, benefits, and implementation considerations to help engineers build robust, high‑performance distributed applications.

Architecture Patterns · Operations · Resilience
0 likes · 32 min read
IT Architects Alliance
Mar 10, 2022 · Backend Development

Building Resilient Microservices: Patterns and Practices for High Availability

This article explains the risks of microservice architectures and presents a collection of reliability patterns—including graceful degradation, change management, health checks, self‑healing, failover caching, retries, rate limiting, bulkheads, and circuit breakers—to help engineers design and operate highly available backend services.

Backend · Microservices · Resilience
0 likes · 17 min read
Java High-Performance Architecture
Jul 1, 2021 · R&D Management

How a Self‑Funded Small Team Built a $1M ARR Cross‑Platform Email Client

This article recounts how Missive’s four‑person, self‑funded team overcame technical and market challenges to create a cloud‑based, cross‑platform email client that reached $1 million ARR, highlighting funding strategy, team roles, architecture decisions, customer acquisition, and the importance of resilience.

Product Development · Resilience · email client
0 likes · 10 min read
Top Architect
May 24, 2021 · Backend Development

Understanding Hystrix: Service Isolation, Circuit Breaking, and Monitoring in Spring Cloud

This article explains why Hystrix is needed for fault tolerance in distributed systems, describes its key features such as circuit breaking, thread and semaphore isolation, fallback mechanisms, request collapsing, and monitoring, and provides step‑by‑step configuration examples and code snippets for integrating Hystrix into Spring Cloud microservices.

Hystrix · Resilience · circuit-breaker
0 likes · 18 min read
Yang Money Pot Technology Team
May 18, 2021 · Backend Development

Understanding Hystrix: Resilience Patterns, Execution Flow, and Custom Extensions

This article explains how Hystrix implements resiliency patterns such as bulkhead, circuit breaker, retry, and degradation for microservice calls, details its execution workflow, core components, dynamic configuration, isolation strategies, metrics collection, and practical usage, and discusses future alternatives and extensions.

Backend · CircuitBreaker · DistributedSystems
0 likes · 33 min read
Architects Research Society
Apr 30, 2021 · Operations

Health Management and Diagnostics in Microservices

The article explains how microservices can achieve resilience through health reporting, diagnostics, standardized logging, health‑check implementations, and orchestrator coordination to detect failures, restart services, handle upgrades, and recover from partial cloud‑based failures.

Observability · Orchestration · Resilience
0 likes · 9 min read
Wukong Talks Architecture
Oct 28, 2020 · Operations

From the Battle of Red Cliffs to Service Avalanche: Understanding Circuit Breaker and Resilience in Microservices

This article uses the historic Battle of Red Cliffs as an analogy to explain service avalanche in micro‑service architectures, analyzes its causes, presents real‑world scenarios, and details circuit‑breaker concepts, algorithms, recovery strategies, and practical mitigation techniques.

Resilience · Service Avalanche · circuit breaker
0 likes · 10 min read
Architects' Tech Alliance
Oct 12, 2020 · Operations

Designing Resilient Microservices: Patterns for Fault Tolerance and Failure Management

This article examines the inherent risks of microservice architectures and presents practical patterns—such as graceful degradation, change management, health checks, self‑healing, fallback caching, retries, rate limiting, bulkheads, and circuit breakers—to build highly available, fault‑tolerant services.

Microservices · Resilience · bulkhead
0 likes · 15 min read
Meituan Technology Team
Sep 30, 2020 · Information Security

Security Control Algorithms for Cyber‑Physical Systems

Professor Mo Yilin explained that securing cyber‑physical systems—such as autonomous vehicles and smart grids—requires a multi‑layered approach combining control‑theoretic redundancy, active watermark‑based intrusion detection, resilient estimation, and data‑driven design to maintain safe operation despite networked attacks and replay threats, ensuring reliability of critical infrastructure.

Resilience · Security · control algorithms
0 likes · 25 min read
Java Architect Essentials
Aug 26, 2020 · Backend Development

A Comprehensive Guide to Evolving a Monolithic Online Store into a Robust Microservice Architecture

This article walks through the transformation of a simple online supermarket from a monolithic design to a fully fledged microservice system, explaining the motivations, architectural changes, component selection, common pitfalls, and best‑practice solutions such as service decomposition, database sharding, monitoring, tracing, service mesh, resilience patterns, and testing strategies.

Microservices · Resilience · architecture
0 likes · 22 min read
Efficient Ops
Mar 10, 2020 · Operations

How to Build Anti‑Fragile Operations in the Cloud Era

This article explains the anti‑fragility concept, illustrates how cloud‑based systems become increasingly vulnerable to unexpected events, and offers practical strategies—including risk reduction, choice diversification, proactive experimentation, and biologically inspired resilience—to transform operations and turn shocks into opportunities.

DevOps · Operations · Resilience
0 likes · 19 min read
JD Retail Technology
Mar 5, 2020 · Backend Development

Technical Implementation and Resilience Practices of JD.com PC Homepage

This article details the architectural redesign, fault‑tolerance mechanisms, performance optimizations, and monitoring strategies employed in JD.com’s PC homepage, illustrating how backend technologies such as OpenResty, Lua, Redis, and NGINX are orchestrated to achieve high availability and sub‑30 ms page loads.

Backend Development · Lua · OpenResty
0 likes · 12 min read
Wukong Talks Architecture
Apr 27, 2019 · Backend Development

Implementing a Circuit Breaker Mechanism for Backend API Calls

This article explains a practical circuit‑breaker design for backend services, detailing detection logic, algorithm thresholds, time‑window statistics, recovery duration, manual overrides, a global switch, and how to monitor the breaker’s current state using Redis.

API · Resilience · circuit breaker
0 likes · 6 min read
Wukong Talks Architecture
Apr 24, 2019 · Backend Development

Circuit Breaker Mechanism: Detection, Algorithm, Time Window, Duration, Manual Trigger, Global Switch, and Monitoring

This article explains a project's circuit breaker implementation, covering detection steps, the algorithm based on request count and failure rate, time‑window statistics, recovery duration, manual activation, a global enable switch, and how to monitor its current state.

Resilience · circuit breaker · failure rate
0 likes · 5 min read
Alibaba Cloud Developer
Mar 28, 2019 · Operations

How ChaosBlade Empowers You to Build Resilient Cloud‑Native Systems

ChaosBlade is an open‑source chaos engineering tool from Alibaba that lets you repeatedly inject failures into distributed systems, helping you measure fault tolerance, validate orchestration, test platform robustness, verify monitoring alerts, and improve emergency response capabilities for more reliable cloud‑native applications.

DevOps · Distributed Systems · Resilience
0 likes · 9 min read
High Availability Architecture
Sep 12, 2018 · Backend Development

Circuit Breaker and Retry Mechanisms in Microservices with Hystrix‑Go

This article explains the principles and operation of circuit breakers and retry mechanisms in microservice architectures, describes their three states, key configuration parameters, demonstrates a Hystrix‑Go implementation, and discusses back‑off strategies and the combined use of both techniques for resilient backend services.

Microservices · Resilience · circuit breaker
0 likes · 7 min read
DevOps
May 7, 2018 · Cloud Computing

Netflix’s Journey: From DVD Rental to Cloud‑Native Chaos Engineering on AWS

This article chronicles Netflix’s evolution from a DVD‑rental startup to a cloud‑native streaming giant, highlighting its partnership with AWS, the development of chaos‑engineering tools like Chaos Monkey and the Simian Army, and the open‑source technologies that underpin its resilient, scalable architecture.

AWS · Netflix · Resilience
0 likes · 14 min read
DevOpsClub
May 1, 2018 · Cloud Computing

How Netflix Uses Chaos Monkey and AWS to Build Resilient Cloud Services

The article traces Netflix’s evolution from DVD rentals to a cloud‑native streaming giant, explains how it leverages AWS for massive scale, and details its chaos‑engineering tools—Chaos Monkey, Simian Army, and related monkeys—that continuously test and improve system resilience.

AWS · DevOps · Netflix
0 likes · 13 min read
dbaplus Community
Oct 23, 2017 · Databases

How eBay Builds Resilient Multi‑Data‑Center Applications with MongoDB

The article explains eBay's use of MongoDB to create highly available, fault‑tolerant multi‑data‑center architectures, detailing design patterns, replica set configurations, read/write strategies, and recent MongoDB features that enable scalable, mission‑critical applications.

Database design · MongoDB · Multi-Data Center
0 likes · 8 min read