Tagged articles

15 articles

Apr 8, 2026 · Artificial Intelligence

How to Engineer Reliable Long‑Running AI Coding Tasks: Harnessing Agents for Scale

This article analyzes the challenges of using AI coding agents for large‑scale, long‑running tasks such as bulk file migration or code review, and presents a systematic engineering approach—including task decomposition, parallel execution, persistent progress files, resumable workflows, and multi‑level retry strategies—backed by concrete script examples and real‑world case studies.

AI agents · Meta Skill · Parallel Execution

0 likes · 31 min read

Ops Community

Sep 17, 2025 · Operations

Mastering System Fault Tolerance: From Theory to Production‑Ready High‑Availability

This comprehensive guide explores the philosophy, core patterns, and practical techniques for designing fault‑tolerant, highly available systems, covering circuit breakers, retries, rate limiting, monitoring, cloud‑native deployment, and real‑world case studies to help engineers build resilient production architectures.

Cloud Native · circuit breaker · fault tolerance

0 likes · 24 min read

Architect

Apr 25, 2024 · Backend Development

Design and Implementation of an Elasticsearch Data Synchronization Service (ECP)

This article describes the challenges of synchronizing billions of order records to Elasticsearch and presents the design, architecture, and key technical details of a generic data‑sync service (ECP) that supports multiple data sources, dynamic rate limiting, retry strategies, SPI‑based extensibility, environment isolation, health‑check, fault recovery, smooth migration, and elegant logging.

Backend Development · Dynamic Rate Limiting · Elasticsearch

0 likes · 22 min read

Zhuanzhuan Tech

Apr 3, 2024 · Backend Development

Design and Implementation of an Elasticsearch Data Synchronization Service (ECP) for Large‑Scale Order Data

This article describes the challenges and technical solutions for synchronizing billions of order records from a relational database to Elasticsearch, including multi‑source data reading, dynamic rate limiting, retry strategies, SPI‑based service integration, environment isolation, health‑checking, smooth migration, and structured logging, all implemented in a backend service called ECP.

Java · SPI · backend service

0 likes · 21 min read

Selected Java Interview Questions

Feb 2, 2024 · Backend Development

Comprehensive Guide to API Request Retry Mechanisms and Spring Boot Implementation

This article examines why API requests fail, explains the importance of retry mechanisms, compares linear, exponential and randomized back‑off strategies, discusses maximum attempt considerations and idempotency, and provides a detailed Spring Boot implementation using Spring Retry along with alternative approaches.

API Retry · Backend · Idempotency

0 likes · 21 min read

Sanyou's Java Diary

Dec 28, 2023 · Operations

Mastering High Availability: Traffic Governance, Circuit Breakers, Isolation, Retries, Timeouts and Rate Limiting

This article explains how to achieve the three‑high goals of high performance, high availability and easy scalability in microservice systems by using traffic governance techniques such as circuit breaking, various isolation strategies, retry mechanisms, timeout controls, degradation tactics and rate‑limiting, illustrated with practical examples and diagrams.

Microservices · Timeout · circuit breaker

0 likes · 32 min read

Laravel Tech Community

Nov 27, 2022 · Blockchain

Magician-Web3 Update: Load Balancing, Retry Strategy, and Detailed Adjustments

The Magician-Web3 toolkit now supports load‑balanced RPC endpoints, a configurable retry strategy for skipped blocks, and several fine‑tuned adjustments such as a reduced minimum scan period and simplified chain detection, with code examples illustrating the new features.

Blockchain · Java · SDK

0 likes · 5 min read

IT Architects Alliance

May 31, 2022 · Backend Development

Optimizing High‑Concurrency Services: Practical Strategies for QPS Over 200K

This article presents practical techniques for optimizing high‑concurrency online services—such as avoiding relational databases, employing multi‑level caching, leveraging multithreading, implementing circuit‑breaker patterns, reducing I/O, managing retries, handling edge cases, and logging efficiently—to maintain sub‑300 ms response times under massive load.

IO optimization · caching · circuit breaker

0 likes · 10 min read

Architecture Digest

May 17, 2022 · Backend Development

Optimizing High‑Concurrency Services: Practical Strategies for QPS Over 200k

This article outlines practical techniques for handling online services with QPS exceeding 200,000, including avoiding relational databases, employing multi‑level caching, leveraging multithreading, implementing degradation and circuit‑breaker patterns, optimizing I/O, using controlled retries, handling edge cases, and logging efficiently.

IO optimization · caching · circuit breaker

0 likes · 9 min read

Java Interview Crash Guide

May 11, 2021 · Backend Development

Which Retry Strategy Guarantees Reliable Message Delivery in Microservices?

The article examines five different retry mechanisms for ensuring reliable message delivery between payment and billing microservices, evaluates their pros and cons, and ultimately recommends the third solution as a cost‑effective, highly reliable approach achieving 99.99% consistency.

Message Queue · Microservices · Reliability

0 likes · 6 min read

NiuNiu MaTe

Mar 31, 2021 · Backend Development

How to Ensure Reliable Service‑to‑Service Messaging: 5 Proven Retry Strategies

This article explores why reliable inter‑service communication is essential in microservice architectures, illustrates common pitfalls with real‑world examples, and presents five practical retry and persistence solutions—including fast retry, in‑memory queues, persistent queues, retry services, and pre‑notification—to improve message delivery reliability.

Distributed Systems · Message Queue · Microservices

0 likes · 11 min read

dbaplus Community

Mar 25, 2021 · Operations

Mastering High‑Quality Service Architecture: Load Balancing, Rate Limiting, Retries & Timeouts

This article distills insights from Bilibili's technical director on building high‑service‑quality architectures, covering systematic load‑balancing strategies, sophisticated rate‑limiting mechanisms, robust retry policies, precise timeout controls, and comprehensive approaches to prevent cascading failures in large‑scale systems.

Backend Architecture · SRE · load balancing

0 likes · 14 min read

iQIYI Technical Product Team

Oct 30, 2020 · Mobile Development

Design and Optimization of iQIYI Mobile APM Network Monitoring System

The iQIYI mobile APM system provides real‑time, user‑level network monitoring with classified error detection, cloud‑controlled SDK sampling, second‑level backend storage, and web dashboards, while employing DNS three‑layer caching, weak‑network grading, gateway multiplexing, super‑pipeline proxies and layered retry strategies, reducing Android error rates from 5.3 % to 0.48 % and iOS from 4.63 % to 0.35 %.

APM · DNS · Mobile Development

0 likes · 11 min read

ITPUB

Jul 28, 2020 · Databases

Why MySQL Binlog Can Cause Order Fulfillment Delays and How to Fix It

This article explains MySQL Binlog’s role in event‑driven order processing, analyzes hidden pitfalls such as premature Binlog reads and two‑phase commit issues, and offers practical solutions like retry strategies and direct Binlog consumption to ensure data consistency.

Database Logs · Event-Driven Architecture · mysql

0 likes · 14 min read

Tencent Cloud Developer

Apr 22, 2020 · Cloud Native

Designing High‑Quality Service Architecture Under Traffic Peaks: Load Balancing, Rate Limiting, Retries, Timeouts, and Failure Mitigation

Drawing on Google SRE principles, Bilibili’s technical director outlines a systematic, cloud‑native framework for high‑quality service architecture during traffic peaks, covering frontend and internal load balancing, distributed rate limiting, controlled retries, fail‑fast timeouts, and comprehensive failure‑mitigation strategies.

SRE · cloud-native · load balancing

0 likes · 13 min read
