Tagged articles

fault tolerance

317 articles · Page 1 of 4
FunTester
Jul 1, 2026 · Operations

When One Timeout Triggers a Platform‑Wide Outage

The article explains how unbounded retries, replication fan‑out, and naïve autoscaling can amplify a single timeout into a cascade of failures, and it proposes bounded retry policies, load‑aware scaling, and layered persistence as safeguards for reliable API‑centric systems.

autoscaling · bounded retries · distributed systems
12 min read
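The bounded-retry idea above can be illustrated with a retry budget. The Python below is a minimal sketch under stated assumptions, not the article's implementation: `RetryBudget`, `call_with_bounded_retries`, and all default values are hypothetical. Each call gets at most `max_attempts` tries, and across all calls retries are capped at a fraction of recent first attempts, so a burst of timeouts cannot multiply traffic against a dependency that is already struggling.

```python
import time
from collections import deque


class RetryBudget:
    """Hypothetical shared budget: retries may not exceed `min_retries` plus
    `ratio` x first attempts seen in the last `window` seconds."""

    def __init__(self, ratio=0.1, window=10.0, min_retries=3):
        self.ratio = ratio
        self.window = window
        self.min_retries = min_retries  # floor so low-traffic callers can still retry
        self.requests = deque()  # timestamps of first attempts
        self.retries = deque()   # timestamps of retries that were allowed

    def _trim(self, now):
        # Forget events that have aged out of the sliding window.
        for events in (self.requests, self.retries):
            while events and now - events[0] > self.window:
                events.popleft()

    def record_request(self, now):
        self._trim(now)
        self.requests.append(now)

    def try_spend(self, now):
        self._trim(now)
        if len(self.retries) < self.min_retries + self.ratio * len(self.requests):
            self.retries.append(now)
            return True
        return False


def call_with_bounded_retries(op, budget, max_attempts=3, clock=time.monotonic):
    """Call op(), retrying TimeoutError only while both limits allow it.
    A real client would also sleep with backoff between attempts."""
    budget.record_request(clock())
    for attempt in range(1, max_attempts + 1):
        try:
            return op()
        except TimeoutError:
            if attempt == max_attempts or not budget.try_spend(clock()):
                raise  # give up: surface the timeout instead of amplifying load
```

Bounding retries this way trades a few extra user-visible errors during an incident for keeping retry traffic from becoming the incident.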
Lobster Programming
Jun 1, 2026 · Backend Development

How ZooKeeper Implements Distributed Locks: Mechanism and Pitfalls

The article explains ZooKeeper's herd effect, how ephemeral sequential nodes and chain watching (each client watching only its predecessor) reduce notification storms, how client failures are handled, and why most projects use Curator to simplify fault‑tolerant distributed lock implementations.

Chain Watching · Curator · Distributed Lock
5 min read
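The chain-watching rule can be shown without a live ensemble. This Python sketch covers only the decision each client makes after listing the lock node's children; creating the ephemeral sequential node and registering the watch would be done through a real client such as Curator (Java) or kazoo (Python). The function names and node names are hypothetical.

```python
def sequence_of(node):
    # ZooKeeper appends a 10-digit, zero-padded counter to sequential znodes,
    # so sort on that suffix rather than the full name (prefixes may differ).
    return int(node[-10:])


def lock_decision(children, mine):
    """Return (holds_lock, predecessor_to_watch) for this client's znode.

    `children` are the names under the lock path (e.g. from getChildren),
    `mine` is the ephemeral sequential node this client created.
    """
    ordered = sorted(children, key=sequence_of)
    idx = ordered.index(mine)
    if idx == 0:
        return True, None  # lowest sequence number owns the lock
    # Watch only the node just ahead of us: when it is deleted (release or
    # session expiry), only this one client is notified, not the whole herd.
    return False, ordered[idx - 1]
```

When the watched node disappears, the client should list the children again and re-run the decision rather than assume it now holds the lock, because the predecessor may have vanished through a session expiry while an earlier node still exists.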
IT Learning Made Simple
May 31, 2026 · Backend Development

What Journey to the West Teaches About Distributed System Architecture

Using the classic tale Journey to the West, the article maps each disciple to a microservice, explains the shift from monolith to microservices, and illustrates service governance, load balancing, service discovery, fault tolerance, and distributed transactions through vivid analogies and concrete examples.

Microservices · Service Governance · distributed systems
7 min read
Java Tech Workshop
May 31, 2026 · Backend Development

Spring Boot Service Circuit Breaking and Degradation with Sentinel: A Practical Guide

This article explains how microservice architectures suffer from cascading failures and demonstrates how to use Sentinel for rate limiting, circuit breaking, and degradation—including architecture, configuration, code examples, and best‑practice tips—to achieve high‑availability Spring Boot services.

Sentinel · Spring Boot · circuit breaking
16 min read
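The closed, open, and half-open states behind circuit breaking can be sketched generically. The Python below is a simplified consecutive-failure breaker for illustration only; it is not Sentinel's API (Sentinel is a Java library driven by its own degrade rules, which can also trip on slow-call or error ratios), and the class name and thresholds are hypothetical. It is single-threaded: a concurrent version would need a lock and would admit only one probe while half-open.

```python
import time


class CircuitOpenError(RuntimeError):
    """Raised instead of calling the dependency while the breaker is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold=5, open_seconds=10.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.open_seconds = open_seconds
        self.clock = clock
        self.state = "closed"
        self.failures = 0  # consecutive failures
        self.opened_at = 0.0

    def call(self, fn, fallback=None):
        if self.state == "open":
            if self.clock() - self.opened_at >= self.open_seconds:
                self.state = "half_open"  # cool-down over: let a probe through
            elif fallback is not None:
                return fallback()  # degrade without touching the dependency
            else:
                raise CircuitOpenError("circuit open")
        try:
            result = fn()
        except Exception:
            self._on_failure()
            if fallback is not None:
                return fallback()
            raise
        self._on_success()
        return result

    def _on_failure(self):
        self.failures += 1
        # A failed probe, or too many consecutive failures, (re)opens the circuit.
        if self.state == "half_open" or self.failures >= self.failure_threshold:
            self.state = "open"
            self.opened_at = self.clock()

    def _on_success(self):
        self.state = "closed"
        self.failures = 0
```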
FunTester
May 21, 2026 · Artificial Intelligence

How Anthropic Solves Agent Forgetfulness with Event Persistence

The article explains why in‑memory state is unreliable for long‑running or parallel agents, defines event persistence, shows how persisted event records enable checkpoint‑restart, observability, and experience extraction, and outlines practical guidelines for what to record.

AI · Agent · Observability
10 min read
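Checkpoint-restart from persisted events can be sketched as an append-only JSON Lines file: each event is made durable before the agent acts on it, and a restarted process rebuilds its state by replaying the log. This Python sketch is illustrative only; the `EventLog` class and the `type`/`step`/`result` fields are hypothetical and do not describe Anthropic's actual record format.

```python
import json
import os


class EventLog:
    """Append-only JSON Lines log of agent events (hypothetical schema)."""

    def __init__(self, path):
        self.path = path

    def append(self, event):
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps(event) + "\n")
            f.flush()
            os.fsync(f.fileno())  # persist before the agent acts on this event

    def replay(self):
        if not os.path.exists(self.path):
            return
        with open(self.path, encoding="utf-8") as f:
            for line in f:
                if not line.strip():
                    continue
                try:
                    yield json.loads(line)
                except json.JSONDecodeError:
                    break  # torn final record from a crash mid-write; stop here


def rebuild_state(log):
    """Fold persisted events back into working state after a restart."""
    state = {"completed": {}, "last_step": -1}
    for event in log.replay():
        if event["type"] == "step_completed":
            state["completed"][event["step"]] = event["result"]
            state["last_step"] = max(state["last_step"], event["step"])
    return state
```

The same log doubles as an audit trail for observability, since every state change is recorded in order.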
Coder Trainee
May 18, 2026 · Cloud Native

Spring Cloud Microservices Tutorial – Sentinel for Fault Tolerance and Rate Limiting

This article walks through adding Alibaba Sentinel to a Spring Cloud microservice suite to protect against service outages, traffic spikes, and slow calls by configuring rate limiting, circuit breaking, and fallback mechanisms across user, order, and gateway services, with a full Docker Compose setup and testing steps.

Feign · Microservices · Sentinel
14 min read
Architect's Guide
May 13, 2026 · Big Data

Next‑Gen Visual Drag‑Drop Data Flow Platform: Features, Architecture, and Performance

The article introduces a visual drag‑and‑drop data flow platform that unifies stream and batch processing, offers version control, automatic fault tolerance, configurable data permissions, comprehensive monitoring, data alignment, and query templates, and presents single‑instance performance benchmarks of over 30k and 60k ops/s.

Data Alignment · Data Flow · Drag-and-Drop
7 min read
IT Services Circle
Apr 30, 2026 · Backend Development

How a Single Front‑end Change Dragged Four Backend Teams – The BFF Solution

A tiny UI tweak that required a meeting with four backend groups exposed the pain of calling many micro‑services from the front‑end, and the article shows how introducing a Backend‑For‑Frontend (BFF) layer can aggregate, transform, and simplify those calls while improving reliability and performance.

API aggregation · BFF · Backend For Frontend
21 min read
Golang Shines
Apr 28, 2026 · Backend Development

Essential Go Packages for Production Environments

This article compiles a curated list of production‑ready Go packages covering testing, logging, error handling, caching, databases, HTTP routing, HTTP clients, fault tolerance, Kafka, and various utility libraries, describing their key features with concrete code examples and explaining why they are preferred in real‑world services.

Caching · Go · HTTP
15 min read
Coder Trainee
Apr 27, 2026 · Cloud Native

Spring Cloud Microservices Practice #6: Sentinel for Service Fault Tolerance and Rate Limiting

This article explains why service fault tolerance is essential in micro‑service architectures, compares Sentinel with Hystrix and Resilience4j, and provides step‑by‑step guidance on integrating Sentinel for circuit breaking, QPS and concurrency limiting, hot‑parameter control, system protection, and dynamic rule management with Nacos.

Microservices · Sentinel · circuit breaking
14 min read
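QPS limiting is commonly built on a sliding time window. This Python sketch is a generic sliding-window limiter with hypothetical names, shown only to illustrate the idea; it is not Sentinel's API, and Sentinel's own statistics structures are not reproduced here.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Admit at most `max_requests` within any trailing `window` seconds."""

    def __init__(self, max_requests, window=1.0, clock=time.monotonic):
        self.max_requests = max_requests
        self.window = window
        self.clock = clock
        self.admitted = deque()  # timestamps of admitted requests

    def try_acquire(self):
        now = self.clock()
        # Evict timestamps that have slid out of the window.
        while self.admitted and now - self.admitted[0] >= self.window:
            self.admitted.popleft()
        if len(self.admitted) < self.max_requests:
            self.admitted.append(now)
            return True
        return False  # caller should reject or degrade (e.g. HTTP 429)
```

Keeping one timestamp per admitted request is exact but uses memory proportional to the limit; bucketed counters trade some precision for constant memory.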
Architecture and Beyond
Apr 25, 2026 · Artificial Intelligence

Practical Insights on Recent AI Engineering Deployments

The article examines how large language models function as probabilistic components within deterministic software, discusses fault‑tolerance limits for viable AI use cases, and offers detailed engineering guidance on RAG pipelines, tool‑calling determinism, agent fragility, testing, monitoring, and privacy‑conscious deployment in finance.

AI Engineering · LLM · RAG
14 min read
AI Tech Publishing
Apr 21, 2026 · Artificial Intelligence

Why Your AI Agent Stays a Toy: Six Production‑Readiness Gaps and How to Bridge Them

Moving an AI agent from a controlled demo to an unattended production environment introduces six critical gaps—fault handling, state persistence, observability, credential security, cost control, and human supervision—each requiring specific infrastructure, practices, and a comprehensive readiness checklist to avoid costly failures.

AI Agents · Observability · cost management
15 min read
JD Tech
Apr 15, 2026 · Artificial Intelligence

How OpenClaw Powers Multi‑Channel AI Agents with Skills and Sub‑Agents

The article provides an in‑depth analysis of OpenClaw’s architecture, explaining why it was created, its layered design, the core ReAct loop, the Skill system, sub‑agent creation and management, fault‑tolerance mechanisms, tool policies, and how it extends the pi‑mono engine to support robust, multi‑channel AI agents.

AI Agents · OpenClaw · ReAct loop
20 min read
AI Insight Log
Apr 8, 2026 · Artificial Intelligence

Anthropic Blocks Third‑Party Agents, Then Launches Claude Managed Agents to Disrupt the Startup Scene

Anthropic’s Claude Managed Agents is a hosted platform that offers sandboxed execution, long‑running sessions, multi‑agent coordination, MCP integration, and immutable session persistence within a fault‑tolerant design, delivering up to a 90% latency reduction; early adopters such as Notion, Rakuten, Asana, and Sentry showcase real‑world production use.

AI agent orchestration · Anthropic · Claude Managed Agents
7 min read
AI Architecture Hub
Feb 25, 2026 · Artificial Intelligence

How OpenClaw Turns AI Agents into Production‑Ready Infrastructure

This article analyzes OpenClaw’s engineering‑focused architecture, detailing its three‑layer component boundaries, gateway‑centric session management, concurrency controls, self‑healing fault recovery, context handling, multi‑agent routing, and practical deployment scenarios for building stable, auditable AI agent systems.

AI Agents · OpenClaw · fault tolerance
20 min read
Amap Tech
Feb 3, 2026 · Artificial Intelligence

Building a Scalable AI Agent Smart Task Framework for Offline & Event‑Driven Use

As LLM adoption moved into a harder, production‑focused phase, developers realized that agents must go beyond passive Q&A to support asynchronous, long‑running, and subscribable tasks; this article details the design, architecture, and engineering challenges of the “Xiao Gao Teacher AI Agent” smart‑task system, from event‑driven logic to fault‑tolerant deployment.

AI Agent · Event-Driven Architecture · LLM
19 min read
Architect's Journey
Dec 3, 2025 · Cloud Native

Microservice Governance Guide: From Stable Operations to Maximum Efficiency

This comprehensive guide breaks down microservice governance into four pillars—node management, load balancing, routing, and fault tolerance—providing concrete configurations, algorithm choices, and service‑mesh recommendations to achieve 99.99% availability, cut wasted resources by over 30%, and halve iteration cycles.

Governance · Microservices · Routing
16 min read
Architect's Journey
Dec 1, 2025 · Backend Development

Designing Three‑High Systems: Practical Performance Tuning and Fault‑Tolerant Architecture

The article breaks down the design logic and implementation steps for high‑performance, high‑concurrency, and high‑availability systems, covering bottleneck identification, read/write optimization, three‑dimensional scaling, and concrete fault‑tolerance strategies to build resilient, scalable services.

High Availability · High concurrency · fault tolerance
15 min read
JD Tech
Sep 26, 2025 · Operations

Avoiding High‑Availability Pitfalls: Real‑World JD Lessons and Solutions

This article examines common high‑availability challenges across applications, databases, caches, message queues, containers, and GC, presenting real JD engineering cases, root‑cause analyses, and practical mitigation strategies to help engineers design more resilient systems.

High Availability · Message Queue · Redis
37 min read
Ops Community
Sep 17, 2025 · Operations

Mastering System Fault Tolerance: From Theory to Production‑Ready High‑Availability

This comprehensive guide explores the philosophy, core patterns, and practical techniques for designing fault‑tolerant, highly available systems, covering circuit breakers, retries, rate limiting, monitoring, cloud‑native deployment, and real‑world case studies to help engineers build resilient production architectures.

Cloud Native · High Availability · circuit breaker
24 min read
Efficient Ops
Sep 9, 2025 · Fundamentals

Inside 3FS: How Distributed File Systems Hide Complexity and Scale

3FS is an open‑source distributed file system that abstracts multiple machines into a single namespace, offering massive scalability, fault tolerance, and high throughput through components like Meta, Mgmtd, Storage, and Client, and leveraging the CRAQ protocol for strong consistency and efficient reads and writes.

3FS · CRAQ · Distributed File System
12 min read
NiuNiu MaTe
Sep 4, 2025 · Operations

Mastering Multi‑Active Distributed Systems: From Single Server to Global Fault Tolerance

This article walks developers through the evolution of distributed system architectures—from single‑machine deployments to master‑slave, same‑city active‑active, and finally true multi‑active setups—explaining core concepts, replication strategies, conflict resolution, fault detection, switch mechanisms, recovery methods, and interview tips for high‑availability design.

CAP theorem · Data Replication · distributed systems
26 min read
JD Tech Talk
Sep 4, 2025 · Operations

Avoid Common High‑Availability Pitfalls: Real‑World JD Practices and Solutions

This article analyzes the multi‑dimensional challenges of building high‑availability systems—covering applications, databases, caches, message queues, containers, GC, and more—by sharing real JD engineering scenarios, common failure patterns, and concrete mitigation strategies to help engineers design more resilient services.

High Availability · backend · distributed systems
36 min read
JD Cloud Developers
Sep 4, 2025 · Operations

Mastering High‑Availability: JD Real‑World Pitfalls & Fixes for Apps, DBs, Cache & MQ

This article shares JD's practical high‑availability architecture lessons, detailing common pitfalls across applications, databases, caches, RPC frameworks, containers, data centers, GC, and message queues, and provides concrete troubleshooting steps and optimization techniques to help engineers design more resilient, fault‑tolerant systems.

High Availability · System Design · backend
36 min read
JD Retail Technology
Sep 4, 2025 · Operations

Mastering High Availability: Real-World Pitfalls and Solutions from JD's Production Systems

This article walks through the challenges of building high‑availability systems—covering applications, databases, caches, message queues, containers, GC, and more—using JD’s production experiences to highlight common pitfalls, root‑cause analyses, and practical mitigation strategies for engineers seeking resilient architecture.

Cache · High Availability · JDK
37 min read
Architect's Guide
Aug 25, 2025 · Fundamentals

19 Essential Distributed System Design Patterns You Must Know

This article explores nineteen core design patterns for distributed systems—including Bloom filters, consistent hashing, quorum, leader‑follower, heartbeat, fencing, WAL, segmented logs, high‑water mark, leases, gossip, Phi accrual detection, split‑brain handling, checksums, CAP and PACELC theorems, hinted handoff, read repair, and Merkle trees—explaining their purpose, operation, and typical use cases.

consistency · distributed systems · fault tolerance
14 min read
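Consistent hashing, one of the listed patterns, fits in a few lines. This Python sketch (hypothetical node names; 100 virtual nodes per server is an arbitrary choice) places servers on a hash ring and assigns each key to the first server clockwise from it, so adding a server moves only the keys that fall into its new arcs.

```python
import bisect
import hashlib


def _point(value):
    # Stable 64-bit ring position (the built-in hash() is salted per process).
    return int.from_bytes(hashlib.blake2b(value.encode("utf-8"), digest_size=8).digest(), "big")


class HashRing:
    """Consistent-hash ring with `vnodes` virtual nodes per physical node."""

    def __init__(self, nodes=(), vnodes=100):
        self.vnodes = vnodes
        self._points = []   # sorted ring positions
        self._owners = {}   # ring position -> node name
        for node in nodes:
            self.add(node)

    def add(self, node):
        for i in range(self.vnodes):
            point = _point(f"{node}#{i}")
            self._owners[point] = node
            bisect.insort(self._points, point)

    def remove(self, node):
        self._points = [p for p in self._points if self._owners[p] != node]
        self._owners = {p: n for p, n in self._owners.items() if n != node}

    def lookup(self, key):
        if not self._points:
            raise LookupError("ring is empty")
        # First ring position clockwise from the key's hash, wrapping at the end.
        idx = bisect.bisect(self._points, _point(key)) % len(self._points)
        return self._owners[self._points[idx]]
```

Virtual nodes smooth out the load: with a single point per server, arc lengths (and therefore key counts) would vary widely.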
Tech Freedom Circle
Aug 20, 2025 · Backend Development

P0 Eureka Service Discovery Collapse Cost a Top E‑commerce $120M During Double‑11

During the Double‑11 shopping festival, a leading e‑commerce platform suffered a P0 outage when its Eureka service‑discovery cluster overloaded, triggering a full‑chain failure that lasted 2 hours 42 minutes and caused losses exceeding 1.2 billion yuan; the article dissects the timeline, root causes, capacity mis‑planning, monitoring gaps, and remediation strategies.

Java · Microservices · Monitoring
34 min read
Tech Freedom Circle
Jul 27, 2025 · Interview Experience

Designing a Payment Middle Platform from Scratch – Core Challenges (Interview Answer)

This article provides a comprehensive guide to designing a payment middle platform from zero, covering its definition, classic middle‑platform types, core architecture, functional modules, fault‑tolerance, security measures, distributed‑transaction strategies, and detailed Java pseudocode, offering interview‑ready insights for architects.

Microservices · architecture · distributed transaction
39 min read
JakartaEE China Community
Jul 15, 2025 · Cloud Native

Choosing a Technology Stack for Cloud‑Native Microservices: MicroProfile vs Spring

This article explains why cloud‑native microservices are beneficial, defines their key characteristics, and provides a detailed, side‑by‑side comparison of MicroProfile and Spring frameworks—including REST APIs, dependency injection, configuration, fault tolerance, security, health checks, metrics, and tracing—along with concrete code examples and starter resources.

Cloud Native · Configuration · MicroProfile
27 min read
Big Data Technology Tribe
Jul 8, 2025 · Operations

Mastering Retry Strategies: Why Exponential Backoff Is Essential for Reliable Systems

This article explains the purpose of retry mechanisms, why exponential backoff is crucial for handling transient failures, compares common backoff strategies, details key parameters such as base delay, max delay, multiplier and jitter, and provides a Java example that demonstrates their practical effects.

Java · distributed systems · exponential backoff
6 min read
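The parameters the article lists (base delay, max delay, multiplier, jitter) combine as in this Python sketch. The article's own example is in Java; the "full jitter" variant used here (a uniform random delay between zero and the capped exponential value) is one common choice, not necessarily the one the article recommends, and the defaults are hypothetical.

```python
import random


def backoff_delays(base=0.1, multiplier=2.0, max_delay=5.0, attempts=6,
                   jitter=True, rng=random.random):
    """Yield the sleep time (seconds) before each retry attempt."""
    for attempt in range(attempts):
        # Exponential growth, capped so late retries don't wait unboundedly.
        capped = min(max_delay, base * multiplier ** attempt)
        # Full jitter spreads retries from many clients over [0, capped).
        yield capped * rng() if jitter else capped
```

A caller would loop over `backoff_delays()`, sleeping for each delay after a failed attempt and giving up when the generator is exhausted. Without jitter, clients that failed together retry together; the randomization breaks up those synchronized waves.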
IT Architects Alliance
Jul 7, 2025 · Backend Development

Avoid the 5 Fatal Architecture Mistakes That Cost Millions

This article analyzes five common architectural design errors (chasing cutting‑edge technology for its own sake, single points of failure, mishandling data consistency, fragmented performance tuning, and neglecting security), illustrating their costly impact with real‑world cases and offering practical principles to prevent them.

Microservices · fault tolerance · performance
13 min read
Cognitive Technology Team
Jun 21, 2025 · Fundamentals

Understanding Faults, Failures, and Fault Tolerance in Distributed Systems

This tutorial explains the definitions of faults and failures in distributed systems, explores their types and root causes, and presents fault‑tolerance mechanisms such as replication, checkpointing, redundancy, error detection, load balancing, and consensus algorithms to build resilient architectures.

Data Replication · consensus algorithms · distributed systems
10 min read
Linux Kernel Journey
Jun 16, 2025 · Cloud Computing

How Tencent’s TGW Achieves Seamless Fast Migration and Self‑Healing Fault Recovery

The paper presents Tencent’s TGW cloud gateway architecture, highlighting a 2.9× forwarding performance boost, lossless state migration within 4 seconds, sub‑minute fault detection, multi‑level fault‑tolerance mechanisms, and operational best practices that enable 100% availability for massive online services.

Cloud Gateway · DPDK · State Migration
16 min read
Tencent Cloud Developer
May 20, 2025 · Cloud Computing

Efficient and Resilient Cloud Gateway at Scale: Architecture, Key Technologies, and Operational Practices of Tencent TGW

The article presents a comprehensive analysis of Tencent's TGW cloud gateway, detailing its modular architecture, high‑performance forwarding plane, lossless state migration, rapid fault recovery, multi‑level redundancy, operational best practices, and security mechanisms that enable ultra‑low latency and high availability for large‑scale internet services.

Cloud Gateway · State Migration · fault tolerance
13 min read
Tencent Technical Engineering
May 19, 2025 · Cloud Native

How Tencent’s TGW Delivers 3× Faster Throughput and Near‑Zero Downtime at Scale

The USENIX‑selected paper on Tencent’s TGW cloud gateway reveals how a modular, multi‑layer architecture achieves up to 2.9‑fold throughput gains, seconds‑level elastic scaling, loss‑less hot migration, and sub‑second fault recovery, offering a blueprint for resilient large‑scale cloud networking.

Cloud Gateway · High Availability · Network Architecture
16 min read
Xiaokun's Architecture Exploration Notes
May 11, 2025 · Fundamentals

Why Unreliable Networks Threaten Distributed Systems and How to Mitigate Them

Distributed systems suffer from network unreliability, including packet loss, out‑of‑order delivery, variable latency, and ambiguous node failures, which makes choosing timeouts and detecting faults difficult. This article explains these issues, compares synchronous and asynchronous networks, and discusses strategies for balancing latency against resource utilization.

Network Reliability · asynchronous network · distributed systems
8 min read
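One way to soften the fixed-timeout dilemma described above is an adaptive failure detector. The Python below is a simplified φ accrual failure detector, a swapped-in illustration that the article is not confirmed to use; it assumes heartbeat inter-arrival times are roughly normally distributed, and the class name and defaults are hypothetical. Instead of a binary up/down verdict it reports a suspicion level φ, and the caller picks a threshold (for example, 8).

```python
import math
from collections import deque


class PhiAccrualDetector:
    def __init__(self, window=100, min_std=0.05):
        self.intervals = deque(maxlen=window)  # recent heartbeat gaps (seconds)
        self.min_std = min_std  # floor so perfectly regular heartbeats don't give std = 0
        self.last = None

    def heartbeat(self, now):
        if self.last is not None:
            self.intervals.append(now - self.last)
        self.last = now

    def phi(self, now):
        if self.last is None or not self.intervals:
            return 0.0
        mean = sum(self.intervals) / len(self.intervals)
        var = sum((x - mean) ** 2 for x in self.intervals) / len(self.intervals)
        std = max(math.sqrt(var), self.min_std)
        elapsed = now - self.last
        # Probability that the next heartbeat arrives even later than `elapsed`,
        # under a normal model of inter-arrival times.
        p_later = 0.5 * math.erfc((elapsed - mean) / (std * math.sqrt(2)))
        return -math.log10(max(p_later, 1e-300))  # clamp avoids log10(0)
```

With heartbeats every second, φ stays low while a heartbeat is only slightly late and climbs steeply once the gap is several standard deviations past the mean; the threshold sets the trade-off between detection speed and false suspicions.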
Cognitive Technology Team
Apr 8, 2025 · Backend Development

Design and Implementation of RocketMQ NameServer: Core Functions, Architecture, and Optimization Strategies

The article explains RocketMQ NameServer's lightweight, stateless design, its core routing and metadata management functions, AP‑oriented architecture, fault‑tolerant mechanisms, scalability features, and practical optimization techniques for high availability and low operational cost.

Distributed Messaging · NameServer · RocketMQ
6 min read
DataFunSummit
Mar 20, 2025 · Artificial Intelligence

Evolution of AI Training Stability and Baidu Baige’s Full-Stack Solutions for Large-Scale Model Training

The article traces the evolution of AI training stability from early manual operations on small GPU clusters to sophisticated, fault‑tolerant infrastructures for thousand‑card and ten‑thousand‑card models, detailing Baidu Baige’s metrics, monitoring, eBPF‑based diagnostics, and checkpoint strategies that reduce invalid training time and accelerate fault recovery.

Large‑Scale Training · checkpointing · distributed systems
22 min read
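Whatever triggers a checkpoint, saving it safely follows a general pattern that this Python sketch illustrates: write to a temporary file, flush it to disk, then rename it over the previous checkpoint so a crash mid-save never leaves a torn file, and resume from the last complete checkpoint on restart. The file layout and the `step`/`state` fields are hypothetical; Baidu Baige's actual checkpoint mechanism is not reproduced here, and real training jobs would save model and optimizer tensors through their framework's own APIs.

```python
import json
import os
import tempfile


def save_checkpoint(path, step, state):
    directory = os.path.dirname(os.path.abspath(path))
    # Temp file in the same directory so the final rename stays on one filesystem.
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump({"step": step, "state": state}, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, path)  # atomic rename on POSIX filesystems
    except BaseException:
        if os.path.exists(tmp):
            os.remove(tmp)
        raise


def load_checkpoint(path):
    """Return (step, state) to resume from, or (0, {}) for a fresh start."""
    if not os.path.exists(path):
        return 0, {}
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    return data["step"], data["state"]
```

The training time lost to a failure is then bounded by the work done since the last completed checkpoint, which is why checkpoint frequency and save speed both matter.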
Baidu Geek Talk
Mar 17, 2025 · Industry Insights

From Manual Restarts to Automated Fault Tolerance: The Evolution of AI Training Stability

This article traces the decade‑long evolution of AI training stability—from early small‑model manual operations to large‑scale, multi‑thousand‑GPU clusters—detailing metrics like invalid training time, fault‑tolerance architectures, eBPF‑based hidden‑fault detection, BCCL enhancements, multi‑level restart strategies, and trigger‑based checkpointing that together shrink downtime from minutes to seconds.

AI training · distributed systems · eBPF
22 min read
Baidu Intelligent Cloud Tech Hub
Mar 10, 2025 · Artificial Intelligence

How Baidu Baige Achieves Near‑Zero Downtime in Massive AI Model Training

The article examines how Baidu Baige evolved AI training stability from manual operations to precise engineering, detailing metrics, fault‑perception techniques, eBPF‑based diagnostics, multi‑level restart strategies, and trigger‑based checkpointing that together achieve sub‑minute recovery and 99.5% effective training time on massive GPU clusters.

AI training · Large-Scale Clusters · checkpointing
25 min read
FunTester
Mar 2, 2025 · Operations

Common Fault Propagation Patterns and Prevention Strategies in Distributed Systems

The article examines typical fault propagation scenarios such as avalanche effects, cascading failures, resource exhaustion, data pollution, and dependency cycles in distributed systems, and outlines proactive measures like rate limiting, circuit breaking, isolation, monitoring, and chaos engineering to prevent small issues from escalating into large-scale outages.

Monitoring · chaos engineering · circuit breaker
11 min read
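Isolation, one of the measures listed, is often implemented as a bulkhead: each downstream dependency gets its own small concurrency limit, so a slow dependency can tie up only its own slots rather than every worker. A minimal Python sketch with hypothetical names, using a non-blocking semaphore acquire so excess calls fail fast instead of queueing:

```python
import threading


class BulkheadFullError(RuntimeError):
    pass


class Bulkhead:
    def __init__(self, name, max_concurrent):
        self.name = name
        self._slots = threading.BoundedSemaphore(max_concurrent)

    def call(self, fn, *args, **kwargs):
        # Fail fast when every slot is busy instead of waiting (and piling up).
        if not self._slots.acquire(blocking=False):
            raise BulkheadFullError(f"bulkhead '{self.name}' is full")
        try:
            return fn(*args, **kwargs)
        finally:
            self._slots.release()
```

Each dependency would get its own instance, for example one for inventory and one for payments, so exhausting one does not block calls to the other.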
IT Services Circle
Feb 9, 2025 · Big Data

Understanding HDFS: Architecture, Data Blocks, Fault Tolerance, and High Availability

This article explains how HDFS, the Hadoop Distributed File System, splits large files into blocks, replicates them for fault tolerance, organizes the cluster into NameNode and DataNode components, and provides high‑availability and scalability mechanisms such as standby NameNode and federation, enabling reliable big‑data storage and access.

Big Data · DataNode · Distributed File System
11 min read
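The block and replica arithmetic can be checked with a short Python sketch. It assumes the common Hadoop defaults of a 128 MiB block size (`dfs.blocksize`) and a replication factor of 3 (`dfs.replication`); clusters and individual files can override both. The function name is hypothetical.

```python
MIB = 1024 * 1024


def hdfs_footprint(file_bytes, block_size=128 * MIB, replication=3):
    """Return (block_count, raw_bytes_stored_across_datanodes) for one file."""
    blocks = -(-file_bytes // block_size)  # ceiling division without floats
    # The last block is not padded to block_size, so raw usage is
    # simply the file size times the replication factor.
    raw_bytes = file_bytes * replication
    return blocks, raw_bytes


blocks, raw = hdfs_footprint(300 * MIB)
print(f"{blocks} blocks, {raw // MIB} MiB raw")  # prints "3 blocks, 900 MiB raw"
```

A 300 MiB file therefore becomes blocks of 128, 128, and 44 MiB, each stored on three DataNodes, so losing any single DataNode leaves at least two copies of every block.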
Architect
Jan 23, 2025 · Operations

Designing High‑Availability Systems: Architecture, Capacity Planning, and Fault‑Tolerance Guide

This article presents a comprehensive guide to building high‑availability systems, covering availability metrics, fault prevention, detection and recovery, capacity evaluation, layered architecture design, service tiering, resilience mechanisms, and operational best practices for reliable service delivery.

High Availability · Operations · capacity planning
34 min read
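Availability targets translate directly into a downtime budget. This Python sketch converts a target such as 99.99% into allowed downtime per period, assuming a 365-day year (a common simplification); the function name is hypothetical.

```python
def downtime_budget_minutes(availability_pct, days=365.0):
    """Allowed downtime, in minutes, for the given availability over `days`."""
    total_minutes = days * 24 * 60
    return total_minutes * (1 - availability_pct / 100)


for target in (99.0, 99.9, 99.99, 99.999):
    print(f"{target}%: {downtime_budget_minutes(target):.2f} min/year, "
          f"{downtime_budget_minutes(target, days=30):.2f} min/30 days")
```

For example, 99.99% allows about 52.56 minutes of downtime per year, or about 4.32 minutes per 30 days, which is why each extra "nine" usually requires automated rather than manual recovery.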
MaGe Linux Operations
Jan 17, 2025 · Databases

Understanding Redis Cluster: Architecture, Data Distribution, and Fault Tolerance

Redis Cluster is a scalable, fault‑tolerant way to distribute Redis across nodes. The article explains why it is needed, its architecture, virtual‑slot partitioning, data distribution methods, limitations, smart‑client optimization, and automatic failover, while highlighting key operational considerations for high‑performance deployments.

Redis · Virtual Slots · cluster
11 min read
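Virtual-slot partitioning is compact enough to reproduce. Redis Cluster maps each key to one of 16384 slots as CRC16(key) mod 16384, using the CRC16-XMODEM variant; if the key contains a non-empty `{...}` hash tag, only the tag is hashed, which lets related keys share a slot. The Python below is an illustrative re-implementation of that rule with hypothetical function names; `CLUSTER KEYSLOT <key>` on a real server remains the authoritative answer.

```python
def crc16_xmodem(data):
    """CRC16 with polynomial 0x1021 and initial value 0 (XMODEM)."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def key_slot(key):
    raw = key.encode("utf-8")
    start = raw.find(b"{")
    if start != -1:
        end = raw.find(b"}", start + 1)
        if end > start + 1:  # non-empty tag: hash only its contents
            raw = raw[start + 1:end]
    return crc16_xmodem(raw) % 16384
```

`key_slot("foo")` evaluates to 12182, and `{user1000}.following` and `{user1000}.followers` land in the same slot because only `user1000` is hashed, which is what allows multi-key commands on them in a cluster.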
IT Architects Alliance
Jan 14, 2025 · Backend Development

Microservice Architecture: Common Problems and Solutions

Microservice architecture, once a buzzword, breaks monolithic applications into independent services, but introduces challenges such as service governance, communication, gateway management, fault tolerance, and tracing; the article outlines these issues and presents practical solutions like Consul/Eureka, REST/RPC, API gateways, Hystrix, and tracing tools.

API Gateway · Distributed Tracing · Service Governance
11 min read
High Availability Architecture
Jan 13, 2025 · Operations

Comprehensive Guide to High‑Availability System Architecture and Practices

This article provides a systematic overview of high‑availability system design, covering availability metrics, fault prevention, detection, recovery, capacity planning, service tiering, data layer resilience, monitoring, and the responsibilities of architects, SREs, and developers to ensure reliable, scalable services.

capacity planning · fault tolerance · system architecture
30 min read
Tencent Cloud Developer
Jan 7, 2025 · Operations

Designing High‑Availability Systems: Principles, Architecture, and Operations

This comprehensive guide explains how to design, build, and operate high‑availability systems by covering availability metrics, fault‑tolerance strategies, capacity planning, code and data layer architecture, automated testing, monitoring, and clear role responsibilities to ensure services stay reliable and resilient under load.

Cloud Native · High Availability · SRE
32 min read
IT Architects Alliance
Jan 6, 2025 · Big Data

How Distributed Architecture Tames Massive Data: Strategies, Benefits, and Real‑World Cases

In an era of exploding data volumes, distributed architecture offers unparalleled scalability, fault tolerance, and parallel performance through sharding, replication, batch and stream processing, with real‑world examples from e‑commerce and social media giants illustrating its practical impact.

Big Data · data sharding · distributed architecture
12 min read
IT Architects Alliance
Jan 6, 2025 · Operations

Ensuring High Reliability in Distributed Systems: Redundancy, Fault Detection, Replication, and Resilience Strategies

The article explores how distributed systems achieve high reliability through redundant design, precise fault detection and recovery, data replication and synchronization, coordinated fault tolerance and load balancing, distributed transaction handling, comprehensive monitoring, elastic scaling, security safeguards, and robust disaster‑recovery planning.

Monitoring · Reliability · fault tolerance
18 min read
BirdNest Tech Talk
Dec 29, 2024 · Fundamentals

Unlocking Distributed System Design: 20 Core Patterns Explained

This article distills the key design patterns behind distributed systems—covering replication, partitioning, consensus, and fault‑tolerance—by presenting each pattern’s problem statement, concrete solution, trade‑offs, and technical considerations, all illustrated with real‑world examples from projects like Kafka and Cassandra.

Consensus · design patterns · distributed systems
18 min read
DevOps Cloud Academy
Dec 2, 2024 · Artificial Intelligence

Key Kubernetes Features that Benefit AI Inference Workloads

This article explains how Kubernetes’ native scalability, resource optimization, performance tuning, portability, and fault‑tolerance features align with the demands of AI inference, helping organizations run large ML models efficiently, cost‑effectively, and reliably across diverse environments.

AI inference · Portability · fault tolerance
15 min read
Sanyou's Java Diary
Nov 25, 2024 · Cloud Native

Designing Resilient Stateful Distributed Systems: From Theory to Microservice Architecture

This article explores the fundamentals of distributed systems, compares stateful and stateless services, examines monolithic, SOA, and microservice models, and provides practical guidance on access layers, fault tolerance, service discovery, scaling, and data storage for building robust cloud‑native architectures.

Cloud Native · Microservices · fault tolerance
29 min read
Zhuanzhuan Tech
Nov 20, 2024 · Backend Development

Design and Implementation of a High‑Performance Message Notification System

This article presents a comprehensive design of a high‑performance, fault‑tolerant message notification system, covering service partitioning, system architecture, idempotent processing, dynamic error detection, thread‑pool management, retry mechanisms, and stability measures such as traffic‑spike handling, resource isolation, third‑party protection, monitoring, and active‑active deployment.

Java · Message Notification · backend-architecture
16 min read
Tencent Cloud Developer
Oct 22, 2024 · Industry Insights

Designing Stateful Distributed Systems: Core Principles and Architecture Patterns

This article analyzes the motivations, benefits, and challenges of building stateful distributed systems, compares monolithic, SOA, and microservice models, and provides detailed guidance on access layers, service discovery, fault tolerance, scaling, and data storage for cloud‑native architectures.

Cloud Native · Microservices · distributed systems
29 min read
JavaEdge
Oct 21, 2024 · Operations

Why Move Beyond Microservices? Unlocking Resilience with Unitized Architecture

This article explores the advantages of unitized architecture over traditional microservices, detailing how its modular design, dedicated routing layer, and tailored observability practices enhance system resilience, fault‑tolerance, and operational insight for large‑scale distributed applications.

Resilience · distributed systems · fault tolerance
17 min read
Baidu Geek Talk
Oct 9, 2024 · Artificial Intelligence

How Baidu’s Baige 4.0 Architecture Redefines AI Compute Efficiency

This article analyzes Baidu's Baige 4.0 AI infrastructure, detailing its four‑layer architecture, XMAN 5.0 hardware, HPN network, BCCL communication library, and AIAK inference upgrades, and explains how these innovations address large‑model training and inference challenges while boosting performance, utilization, and cost efficiency.

AI Infrastructure · GPU Acceleration · High-performance computing
16 min read
IT Services Circle
Oct 4, 2024 · Databases

Understanding Redis Split‑Brain: Causes, Data Loss, and Prevention Strategies

This article explains Redis split‑brain behavior, describing its definition, causes such as network failures and Sentinel elections, the resulting data loss during master‑slave switches, and practical prevention measures including quorum configuration, timeout tuning, network monitoring, proxy layers, and the min‑slaves‑to‑write and min‑slaves‑max‑lag settings.

High Availability · Master‑Slave · Sentinel
7 min read
FunTester
Sep 19, 2024 · Fundamentals

Software Antifragility: Rethinking Error Handling and Reliability

This paper introduces the concept of software antifragility, drawing on Taleb’s theory to argue that embracing errors through fault tolerance, automatic runtime repair, and fault injection can transform software systems into self‑improving, more robust entities, and discusses implications for development processes and product reliability.

Antifragility · chaos engineering · fault tolerance
13 min read
Top Architect
Aug 15, 2024 · Backend Development

Handling Interface‑Level Failures: Degradation, Circuit Breaking, Rate Limiting, and Queuing

The article explains how interface‑level faults—where the system stays up but business performance degrades—can be mitigated through four core techniques (degradation, circuit breaking, rate limiting, and queuing), detailing their principles, implementation patterns, and practical trade‑offs for backend services.

backend · circuit breaker · degradation
20 min read
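Rate limiting, one of the four techniques summarized above, is commonly implemented as a token bucket. The sketch below is an illustration, not code from the article: `TokenBucket` and `try_acquire` are hypothetical names, the limiter is single-threaded, and the clock is injectable so its behavior can be checked deterministically.

```python
import time
from typing import Callable


class TokenBucket:
    """Minimal token-bucket rate limiter (hypothetical helper, not thread-safe).

    Tokens refill continuously at `rate` per second up to `capacity`;
    each request spends `cost` tokens or is rejected.
    """

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity          # start full so short bursts are allowed
        self.clock = clock
        self.last = clock()

    def try_acquire(self, cost: float = 1.0) -> bool:
        now = self.clock()
        # Refill for the time elapsed since the last call, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False                    # caller should reject, queue, or degrade
```

A `False` result is the point where the other techniques in the summary, degradation or queuing, would take over instead of forwarding the request.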
dbaplus Community
Aug 13, 2024 · Artificial Intelligence

Why Kubernetes Is the Ideal Platform for AI Inference: 5 Key Benefits

The article argues that Kubernetes fits AI inference demands well, citing built‑in scalability, resource and performance optimization, seamless portability across clouds, and robust fault tolerance, which together make it a cost‑effective, highly available foundation for deploying large‑scale machine‑learning models.

AI inference · fault tolerance · kubernetes
10 min read
MaGe Linux Operations
Aug 9, 2024 · Operations

Mastering Elasticsearch Data Sync and Cluster Architecture: Strategies & Best Practices

This article explains how to keep MySQL and Elasticsearch data in sync using synchronous calls, asynchronous notifications, or binlog listeners, and dives deep into Elasticsearch cluster design, node roles, distributed storage, query phases, split‑brain handling, and fault‑tolerance mechanisms.

Cluster Architecture · Data synchronization · Distributed Query
8 min read
Top Architecture Tech Stack
Jul 16, 2024 · Cloud Native

Designing Fault‑Tolerant Microservices Architecture: Patterns and Practices

The article explains how to build reliable microservices by isolating failures, applying graceful degradation, change‑management, health checks, self‑healing, fallback caching, retry strategies, rate limiting, fast‑fail principles, circuit breakers, and failure‑testing to ensure high availability in distributed cloud‑native systems.

Cloud Native · Microservices · Operations
14 min read
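The retry strategies in the summary above are safest when they are bounded. As a rough sketch (not taken from the article), the hypothetical helper `retry_with_backoff` caps the number of attempts, grows the delay exponentially up to a ceiling, and adds "full jitter" so that many clients do not retry in lockstep. Which exceptions count as transient is an assumption the caller has to decide; the defaults here are only illustrative.

```python
import random
import time
from typing import Callable, Optional, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    fn: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 0.1,
    max_delay: float = 2.0,
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
    rng: Optional[random.Random] = None,
) -> T:
    """Call `fn`, retrying transient failures with capped, jittered backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    if rng is None:
        rng = random.Random()
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts:
                raise  # retry budget exhausted: surface the error to the caller
            # Exponential backoff capped at max_delay, with full jitter.
            ceiling = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(rng.uniform(0.0, ceiling))
    raise AssertionError("unreachable")
```

Exceptions outside `retry_on` propagate immediately, and the attempt cap keeps one slow dependency from having its load multiplied by its own callers.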
Su San Talks Tech
Jul 6, 2024 · Backend Development

Mastering High Availability: 10 Essential Design Techniques for Scalable Systems

This article explains ten core techniques—system splitting, decoupling, asynchrony, retry, compensation, backup, multi‑active strategies, isolation, rate limiting, circuit breaking, and degradation—that together enable robust, high‑availability architectures for modern backend services.

High Availability · System Design · distributed systems
12 min read
Ctrip Technology
Jun 20, 2024 · Backend Development

Design and Architecture of Ctrip Service Registration Center

The article explains Ctrip's service registration center architecture, including its two‑layer Data and Session design, multi‑sharding, fault‑tolerance mechanisms, Redis‑based cluster discovery, design trade‑offs such as proxy versus Smart SDK, hashing strategy, and operational considerations for burst traffic and future scaling.

Redis discovery · Service Registry · distributed systems
16 min read
Mike Chen's Internet Architecture
May 31, 2024 · Backend Development

Mastering Microservice Splitting: 6 Essential Design Principles

This article outlines six fundamental microservice splitting principles (single responsibility, appropriate granularity, interface segregation, minimizing impact on the existing product during the split, scalability, and fault tolerance) to help architects design maintainable, decoupled, and resilient services.

Microservices · fault tolerance · interface segregation
5 min read
Alibaba Cloud Big Data AI Platform
May 24, 2024 · Artificial Intelligence

How DeepRec Extension Boosts Distributed Sparse Model Training with Elasticity and Fault Tolerance

DeepRec Extension enhances large‑scale sparse model training by adding automatic elastic training, resource‑aware scheduling, real‑time monitoring, and efficient fault‑tolerance mechanisms, enabling lower cost, higher throughput, and more reliable distributed training for AI workloads.

AI Infrastructure · DeepRec · elastic training
13 min read
Tongcheng Travel Technology Center
Apr 17, 2024 · Backend Development

In-Depth Analysis of Apache RocketMQ Architecture, Operation Principles, and High‑Throughput Mechanisms

This article provides a comprehensive overview of Apache RocketMQ, detailing its core components, producer and consumer workflows, storage strategies, master‑slave synchronization, Raft‑based majority writes and leader election, and best‑practice recommendations for high‑throughput, fault‑tolerant messaging systems.

Backend Development · Message Queue · Raft
22 min read
Architects' Tech Alliance
Apr 6, 2024 · Artificial Intelligence

How ByteDance Scaled LLM Training to Over 10,000 GPUs: Inside the MegaScale System

The article analyzes ByteDance and Peking University's MegaScale system that enables efficient, stable training of large language models on clusters exceeding ten thousand GPUs, detailing algorithmic tweaks, 3D parallel communication overlap, operator optimizations, data‑pipeline improvements, network tuning, and fault‑tolerance mechanisms that together achieve a 55.2% MFU on a 175B model.

GPU clusters · LLM training · MegaScale
15 min read
Architect
Apr 4, 2024 · Backend Development

Mastering High Availability: 9 Essential Design Techniques for Scalable Systems

The article walks through nine practical techniques—system splitting, decoupling, asynchronous processing, retry, compensation, backup, multi‑active deployment, rate limiting, circuit breaking, and degradation—explaining why each is needed, how they are implemented in real‑world microservice architectures, and what trade‑offs to consider.

High Availability · Microservices · System Design
13 min read
Architecture & Thinking
Mar 5, 2024 · Databases

How Database Middleware Solves High‑Traffic Challenges: Connection Pools, Sharding, and More

This article examines how database middleware tackles the demanding needs of large‑scale internet services by providing centralized connection‑pool management, transparent read‑write splitting, diverse load‑balancing algorithms, sharding support, automatic failover, security controls, comprehensive monitoring, and flexible backup‑recovery mechanisms.

Connection Pool · Monitoring · Sharding
9 min read
Linux Cloud Computing Practice
Mar 4, 2024 · Operations

Building a High‑Performance, Highly Available Membership System with ES, Redis & MySQL

This article details how a massive, multi‑platform membership service is kept fast and reliable: a multi‑center architecture using Elasticsearch for unified member data, Redis caching, and MySQL partitioning, along with traffic isolation, fault‑tolerant syncing, and fine‑grained flow‑control and degradation strategies.

Redis · fault tolerance · mysql
23 min read
Architect's Guide
Mar 2, 2024 · Fundamentals

RabbitMQ vs Kafka: Core Differences and When to Use Each

This article compares RabbitMQ and Apache Kafka across architecture, message ordering, routing, timing, retention, fault handling, scalability, and consumer complexity, and provides guidance on which platform suits specific use‑cases such as flexible routing, strict ordering, long‑term retention, or high throughput.

Message Ordering · Message Queue · RabbitMQ
19 min read
Architecture & Thinking
Dec 25, 2023 · Databases

How to Detect, Analyze, and Prevent Redis Hot Keys to Avoid Outages

This article explains what Redis hot keys are, the scenarios that generate them, their risks, and provides practical monitoring methods and mitigation strategies—including cache pre‑warming, distributed caching, rate limiting, and secondary caches—to keep production systems stable.

Hot Key · Monitoring · fault tolerance
11 min read
ITPUB
Dec 5, 2023 · Cloud Native

Prevent Massive K8s Outages: Scale, Redundancy, and Embrace Restarts

The article analyzes the November 27 Didi outage caused by an aggressive Kubernetes upgrade, then presents four engineering principles—controlling cluster size, eliminating single points of failure, treating restarts as normal, and decoupling data and control planes—to build more resilient cloud‑native systems.

Cloud Native · cluster upgrade · fault tolerance
13 min read
Spring Full-Stack Practical Cases
Dec 1, 2023 · Backend Development

Resilience4j Essentials: Circuit Breaker, TimeLimiter, Bulkhead & RateLimiter

This article introduces Resilience4j, a lightweight fault‑tolerance library for Spring Boot, explaining its core decorators—CircuitBreaker, TimeLimiter, Bulkhead, and RateLimiter—along with configuration examples, annotation usage, fallback handling, and practical test code to improve system stability and resilience.

Java · Resilience4j · Spring Boot
16 min read
Open Source Linux
Nov 23, 2023 · Operations

Mastering RAID Fault Tolerance: Consistency, Hot Spare, Rebuild & More

This article explains RAID fault‑tolerance mechanisms, including the redundancy provided by RAID 1, 5, 6, 10, 50, and 60, and covers consistency checks, hot spares and emergency backup, data reconstruction, read/write policies, power‑loss protection, striping, mirroring, foreign configurations, energy saving, and JBOD, providing a comprehensive guide for storage administrators.

Data Protection · RAID · Storage Management
15 min read
Open Source Linux
Nov 21, 2023 · Fundamentals

Understanding RAID Levels: Choose the Right Storage Solution for Performance and Reliability

RAID combines multiple physical disks into virtual drives, offering various levels—RAID 0, 1, 1ADM, 5, 6, 10, 10ADM, 1E, 50, and 60—each balancing performance, fault tolerance, and capacity, with detailed processing flows, storage calculations, and best‑practice recommendations for optimal deployment.

RAID · data redundancy · fault tolerance
20 min read
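The storage calculations mentioned in this summary reduce to simple arithmetic for the standard levels. The function below is a simplified sketch assuming identical disks, no hot spares, and no controller metadata overhead; vendor‑specific levels such as 1ADM, 10ADM, and 1E are deliberately left out, and `usable_capacity` is a hypothetical name rather than anything from the article.

```python
def _require(ok: bool, level: str, disks: int) -> None:
    if not ok:
        raise ValueError(f"RAID {level} cannot be built from {disks} disks in this layout")


def usable_capacity(level: str, disks: int, disk_size: float, spans: int = 2) -> float:
    """Usable capacity of `disks` identical drives of `disk_size` each.

    `spans` is the number of RAID 5/6 sub-arrays striped together in RAID 50/60.
    Simplified model: ignores metadata, hot spares and vendor-specific levels.
    """
    if level == "0":                     # striping only, no redundancy
        _require(disks >= 1, level, disks)
        return disks * disk_size
    if level == "1":                     # every member holds a full copy
        _require(disks >= 2, level, disks)
        return disk_size
    if level == "5":                     # one disk's worth of distributed parity
        _require(disks >= 3, level, disks)
        return (disks - 1) * disk_size
    if level == "6":                     # two disks' worth of distributed parity
        _require(disks >= 4, level, disks)
        return (disks - 2) * disk_size
    if level == "10":                    # stripe across mirrored pairs
        _require(disks >= 4 and disks % 2 == 0, level, disks)
        return (disks // 2) * disk_size
    per_span = disks // spans if spans > 0 else 0
    if level == "50":                    # one parity disk per RAID 5 span
        _require(spans >= 2 and disks % spans == 0 and per_span >= 3, level, disks)
        return (disks - spans) * disk_size
    if level == "60":                    # two parity disks per RAID 6 span
        _require(spans >= 2 and disks % spans == 0 and per_span >= 4, level, disks)
        return (disks - 2 * spans) * disk_size
    raise ValueError(f"unsupported RAID level: {level}")
```

For example, four 4 TB disks give `usable_capacity("5", 4, 4.0) == 12.0` (TB) in RAID 5, but 8.0 TB in RAID 10, which trades capacity for faster rebuilds and better write performance.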
Sanyou's Java Diary
Nov 20, 2023 · Operations

Mastering High Availability: 10 Essential Design Techniques for Scalable Systems

This article outlines ten practical techniques—including system splitting, decoupling, asynchronous processing, retry strategies, compensation, backup, multi‑active deployment, isolation, rate limiting, circuit breaking, and degradation—to help engineers design highly available, resilient architectures for large‑scale internet applications.

Microservices · System Design · fault tolerance
14 min read
Architects' Tech Alliance
Nov 5, 2023 · Fundamentals

Understanding RAID Fault Tolerance, Consistency Checks, Hot Spare, Rebuild, and Data Protection Features

This article explains RAID fault‑tolerance mechanisms, consistency verification, hot‑spare and emergency backup, rebuild processes, virtual‑disk read/write policies, power‑loss protection, disk striping, mirroring, foreign configurations, power‑saving and pass‑through features, providing a comprehensive overview of modern storage system capabilities.

RAID · disk striping · fault tolerance
16 min read
Alibaba Cloud Native
Oct 13, 2023 · Cloud Native

Why Microservice Governance Matters and How OpenSergo Tackles Its Challenges

The article explains the stability challenges of modern microservice architectures, outlines the three governance domains (development/testing, change, runtime), and introduces OpenSergo’s open, cloud‑native specifications, control‑plane, and data‑plane solutions for traffic routing, gray‑release, and fault‑tolerance.

OpenSergo · fault tolerance · gray-release
18 min read
dbaplus Community
Oct 7, 2023 · Operations

How to Build a Truly High‑Availability System: 6 Essential Design Layers

This article breaks high‑availability system design into six critical layers (architecture, development standards, application services, storage, product safeguards, and operations) and offers concrete practices such as capacity planning, fault‑tolerant patterns, monitoring, and incident‑response strategies for reaching four‑nines (99.99%) availability.

Operations · System Design · capacity planning
26 min read
MaGe Linux Operations
Aug 29, 2023 · Operations

How to Effectively Monitor and Recover a Kafka Cluster

This guide explains essential Kafka monitoring techniques, third‑party tools, custom scripts, key metrics, and practical strategies for high availability, fault detection, rapid recovery, and ongoing testing to keep Kafka clusters stable and performant.

Operations · distributed-systems · fault tolerance
7 min read
JD Retail Technology
Aug 14, 2023 · Backend Development

Implementing a Lightweight Distributed Scheduling Solution to Replace TBSchedule

To improve stability and reduce costs during high‑traffic events, the team replaced the ZooKeeper‑dependent TBSchedule framework with a lightweight, Redis‑based distributed scheduler that decentralizes task execution, uses thread pools instead of timers, and supports dynamic scaling and seamless degradation for reliable order processing.

Distributed Scheduling · Microservices · Redis
4 min read
JD Cloud Developers
Aug 9, 2023 · Backend Development

Mastering Hystrix: Implementing Circuit Breakers in Spring Cloud Microservices

This article explains why circuit breakers are essential in microservice architectures, introduces Netflix's Hystrix library, details its design principles, shows step‑by‑step demos for Ribbon and Feign integration, and covers dashboards, Turbine, isolation strategies, request merging, caching, and related Spring Boot SPI mechanisms.

Hystrix · Java · Microservices
29 min read
Architect
Aug 4, 2023 · Fundamentals

What Exactly Is Software Architecture? A Deep Dive into Systems, Modules, and Design Principles

The article systematically defines software architecture, distinguishes systems, subsystems, modules, and components, compares frameworks with architectures, explores TOGAF and RUP classifications, traces the evolution from monoliths to micro‑services, and presents concrete design principles and common pitfalls for building scalable, maintainable systems.

Microservices · System Design · TOGAF
25 min read
Architects Research Society
Jul 13, 2023 · Operations

Five Patterns to Make Your Microservice Fault‑Tolerant

This article explains essential fault‑tolerance patterns for microservices—including timeouts, retries, circuit breakers, distributed deadlines, and rate limiting—detailing their basic forms, drawbacks, and practical implementation strategies to improve reliability and prevent cascading failures.

Microservices · circuit breaker · fault tolerance
12 min read
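The circuit breaker among the five patterns summarized above can be sketched as a small state machine. The class below is a minimal, single-threaded illustration under stated assumptions (a consecutive-failure threshold, a fixed cool-down, and one trial call in the half-open state); `CircuitBreaker` and `CircuitOpenError` are hypothetical names, not the API of Hystrix, Resilience4j, or Sentinel.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class CircuitOpenError(RuntimeError):
    """Raised when a call is rejected because the breaker is open."""


class CircuitBreaker:
    """Minimal consecutive-failure circuit breaker (hypothetical, not thread-safe)."""

    def __init__(self, failure_threshold: int = 5, reset_timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.state = "closed"
        self.failures = 0
        self.opened_at = 0.0

    def call(self, fn: Callable[[], T]) -> T:
        if self.state == "open":
            if self.clock() - self.opened_at < self.reset_timeout:
                # Fail fast instead of piling more load onto a sick dependency.
                raise CircuitOpenError("circuit open; failing fast")
            self.state = "half_open"  # cool-down elapsed: let one trial call through
        try:
            result = fn()
        except Exception:
            self._record_failure()
            raise
        self._record_success()
        return result

    def _record_failure(self) -> None:
        self.failures += 1
        # A failed trial call, or too many consecutive failures, (re)opens the circuit.
        if self.state == "half_open" or self.failures >= self.failure_threshold:
            self.state = "open"
            self.opened_at = self.clock()

    def _record_success(self) -> None:
        self.state = "closed"
        self.failures = 0
```

Callers would typically catch `CircuitOpenError` to serve a fallback (a cached or degraded response) and pair the breaker with a per-call timeout, so that slow calls surface as exceptions the breaker can count.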