How to Build Highly Available Backend APIs: 10 Essential Design Principles

This article explains why high availability is crucial for backend services and outlines ten practical design principles—including dependency control, avoiding single points, load balancing, isolation, rate limiting, circuit breaking, async processing, degradation, gray release, and chaos engineering—to help developers create resilient APIs.

Java High-Performance Architecture

Jan 24, 2023

Preface

As a backend developer, creating service interfaces is routine, whether they serve front‑end HTTP requests or other services via RPC. Although the code may look simple, ensuring high availability is far from easy. This article discusses the key considerations for building highly available APIs and welcomes constructive feedback.

What Is High Availability?

In simple terms, high availability means a system’s ability to handle and mitigate risks.

Why Pursue High Availability?

Development errors can cause online incidents.

System operation depends on CPU, memory, disk, network, etc., any of which may fail.

Failures in core flows such as user registration hurt the user experience.

Big‑sale events (e.g., Double‑11, 618) can overload order services, hurting GMV.

Other unknown factors.

Therefore, we must design for high availability to cope with these uncontrollable factors.

Key Factors of High Availability

The essence of high availability is the system’s capacity to confront and avoid risks. From this perspective, four critical factors shape a high‑availability interface design: Dependence, Probability, Time, and Scope.

Minimize dependent resources.

Keep risk probability low.

Limit the impact scope.

Shorten the impact duration.

Design Principles for Highly Available Interfaces

Based on the above factors, consider the following practical guidelines.

1. Control Dependencies

Reduce dependencies whenever possible and avoid strong coupling.

Less Dependency

For example, if an interface handles only ten requests per minute, a direct MySQL query is sufficient; adding Redis would waste resources and add complexity without any real benefit.

Weak Dependency

When a user‑registration service strongly depends on a coupon‑issuing service, a failure in the latter makes registration unavailable. Using asynchronous processing creates a weak dependency, so coupon service outages do not block registration.
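The weak dependency described above can be sketched in Java as follows. This is a minimal illustration, not the article's implementation: RegistrationService, the couponIssuer callback, and the in-memory user set are all hypothetical names, and a real service would persist the user in a database and usually hand the coupon step to a message queue.

```java
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executor;
import java.util.function.Consumer;

/** Hypothetical registration service that treats coupon issuing as a weak dependency. */
final class RegistrationService {
    private final Set<String> registeredUsers = ConcurrentHashMap.newKeySet();
    private final Consumer<String> couponIssuer; // assumed client; may throw when the coupon service is down
    private final Executor executor;

    RegistrationService(Consumer<String> couponIssuer, Executor executor) {
        this.couponIssuer = couponIssuer;
        this.executor = executor;
    }

    /**
     * Registers the user first, then issues the coupon asynchronously.
     * The returned future reports whether the coupon was issued; it never fails.
     */
    CompletableFuture<Boolean> register(String userId) {
        registeredUsers.add(userId); // core step, stands in for a database write
        return CompletableFuture
                .runAsync(() -> couponIssuer.accept(userId), executor)
                .handle((ignored, error) -> error == null); // swallow coupon failures
    }

    boolean isRegistered(String userId) {
        return registeredUsers.contains(userId);
    }
}
```

Because the coupon call runs on a separate executor and its failure is converted into a plain boolean, a coupon outage can be logged or retried later without ever failing the registration request.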

2. Avoid Single Points of Failure

Mitigate single‑point failures through redundancy and backup.

Deploy applications across multiple data centers and machines so that if one server fails, others continue serving.

Retain the previous version after each release to enable quick rollback.

Ensure at least two people understand each business interface for rapid incident response.

Use master‑slave setups for databases and caches like MySQL or Redis.
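On the calling side, redundancy only helps if the client can switch to another copy. The sketch below shows one simple way to do that, assuming each replica is represented by a Supplier; FailoverCaller and callWithFailover are hypothetical names, and real database or cache drivers usually provide their own failover mechanisms.

```java
import java.util.List;
import java.util.function.Supplier;

/** Hypothetical helper that tries redundant replicas in order. */
final class FailoverCaller {
    private FailoverCaller() { }

    /** Returns the first successful result; throws only if every replica fails. */
    static <T> T callWithFailover(List<Supplier<T>> replicas) {
        RuntimeException last = null;
        for (Supplier<T> replica : replicas) {
            try {
                return replica.get();
            } catch (RuntimeException e) {
                last = e; // remember the failure and try the next replica
            }
        }
        throw new IllegalStateException("all replicas failed", last);
    }
}
```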

3. Load Balancing

Distribute risk by spreading traffic across multiple nodes.

For instance, Nginx or JSF load balancers disperse requests to avoid bottlenecks on a single server.

When caching with JIMDB, hotspot keys can overload a shard, causing high CPU usage and timeouts. Interface design should consider data‑store balance and monitor hotspots for dynamic rebalancing.
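As a rough illustration of how traffic can be spread evenly, here is a minimal round-robin selector. It is a sketch of the general technique, not how Nginx or JSF implement it; RoundRobinBalancer and the node names are assumptions made for the example.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical round-robin selector over a fixed list of nodes. */
final class RoundRobinBalancer {
    private final List<String> nodes;
    private final AtomicInteger counter = new AtomicInteger();

    RoundRobinBalancer(List<String> nodes) {
        if (nodes.isEmpty()) {
            throw new IllegalArgumentException("need at least one node");
        }
        this.nodes = List.copyOf(nodes);
    }

    /** Returns the next node in rotation; safe to call from multiple threads. */
    String next() {
        // floorMod keeps the index non-negative even after the counter overflows.
        int index = Math.floorMod(counter.getAndIncrement(), nodes.size());
        return nodes.get(index);
    }
}
```

Round robin balances request counts, not load: it does nothing for hotspot keys, which still need monitoring and rebalancing at the data-store level.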

4. Resource Isolation

Isolate resources to contain failures.

Physical separation of service deployments prevents a single‑machine or single‑room failure from affecting the whole system.

Sharding databases and tables means that a crash of one database server affects only the data on that shard instead of bringing down the entire service.
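To make the sharding point concrete, the sketch below routes each key to one of a fixed number of shards, so a failed shard affects only the keys mapped to it. ShardRouter and the hash-modulo rule are assumptions for illustration; production systems often use consistent hashing or a lookup table so that shards can be added without remapping most keys.

```java
/** Hypothetical router that maps a key to one of N shards. */
final class ShardRouter {
    private final int shardCount;

    ShardRouter(int shardCount) {
        if (shardCount <= 0) {
            throw new IllegalArgumentException("shardCount must be positive");
        }
        this.shardCount = shardCount;
    }

    /** Returns a stable shard index in the range [0, shardCount). */
    int shardFor(String key) {
        // floorMod avoids negative indexes when hashCode() is negative.
        return Math.floorMod(key.hashCode(), shardCount);
    }
}
```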

5. Rate Limiting

Rate limiting protects both the service itself and its downstream dependencies.

The current JSF platform already provides flow‑control capabilities, and custom limits can be added as needed.
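Where a custom limit is needed, a token bucket is one common approach. The following is a minimal sketch of that technique and does not reflect how JSF implements flow control; TokenBucket, its parameters, and the injectable nanosecond clock (for example System::nanoTime) are assumptions made for the example.

```java
import java.util.function.LongSupplier;

/** Hypothetical token-bucket rate limiter with an injectable nanosecond clock. */
final class TokenBucket {
    private final long capacity;
    private final double refillPerSecond;
    private final LongSupplier nanoClock;
    private double tokens;
    private long lastRefill;

    TokenBucket(long capacity, double refillPerSecond, LongSupplier nanoClock) {
        this.capacity = capacity;
        this.refillPerSecond = refillPerSecond;
        this.nanoClock = nanoClock;
        this.tokens = capacity; // start full so short bursts are allowed
        this.lastRefill = nanoClock.getAsLong();
    }

    /** Takes one token if available; returns false when the caller should be rejected. */
    synchronized boolean tryAcquire() {
        long now = nanoClock.getAsLong();
        double refilled = (now - lastRefill) * refillPerSecond / 1_000_000_000.0;
        tokens = Math.min(capacity, tokens + refilled);
        lastRefill = now;
        if (tokens >= 1.0) {
            tokens -= 1.0;
            return true;
        }
        return false;
    }
}
```

A rejected request can be answered with an error or a cached response; either way the service and its downstream dependencies see at most the configured rate plus one burst of size capacity.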

6. Service Circuit Breaking

Circuit breaking isolates failing downstream services to prevent cascading failures.

When service A calls B, C, and D, a failure in any of them can degrade A. Tools such as Hystrix or DUCC can downgrade these strong dependencies to weak ones.
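The core idea of a circuit breaker can be sketched in a few lines. This is a simplified illustration, not the Hystrix or DUCC API: CircuitBreaker, its threshold and open-duration parameters, and the injectable millisecond clock (for example System::currentTimeMillis) are all assumptions.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/** Hypothetical circuit breaker: opens after N consecutive failures, retries after a cool-down. */
final class CircuitBreaker {
    private final int failureThreshold;
    private final long openMillis;
    private final LongSupplier clock;
    private int consecutiveFailures;
    private long openedAt = -1; // -1 means the circuit is closed

    CircuitBreaker(int failureThreshold, long openMillis, LongSupplier clock) {
        this.failureThreshold = failureThreshold;
        this.openMillis = openMillis;
        this.clock = clock;
    }

    /** synchronized for simplicity; this serializes calls, which a real breaker would avoid. */
    synchronized <T> T call(Supplier<T> downstream, Supplier<T> fallback) {
        if (openedAt >= 0 && clock.getAsLong() - openedAt < openMillis) {
            return fallback.get(); // open: skip the downstream call entirely
        }
        try {
            T result = downstream.get(); // closed, or a trial call after the cool-down
            consecutiveFailures = 0;
            openedAt = -1;
            return result;
        } catch (RuntimeException e) {
            consecutiveFailures++;
            if (consecutiveFailures >= failureThreshold) {
                openedAt = clock.getAsLong(); // open (or re-open) the circuit
            }
            return fallback.get();
        }
    }
}
```

While the circuit is open, service A answers from the fallback without touching the failing service, which both protects A's threads and gives the downstream service time to recover.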

7. Asynchronous Processing

Convert synchronous operations to asynchronous ones.

During high‑traffic promotions, user reward requests can be queued via MQ and processed later, reducing load and limiting incident impact.
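The queue-then-process pattern can be illustrated with an in-memory queue. This sketch uses a bounded BlockingQueue purely as a stand-in for a real MQ; RewardQueue, submit, and drain are hypothetical names, and an actual message queue would add persistence, acknowledgements, and retries.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/** Hypothetical in-memory stand-in for an MQ that buffers reward requests. */
final class RewardQueue {
    private final BlockingQueue<String> queue;

    RewardQueue(int capacity) {
        this.queue = new LinkedBlockingQueue<>(capacity);
    }

    /** Request path: enqueue and return immediately; false means the queue is full. */
    boolean submit(String userId) {
        return queue.offer(userId);
    }

    /** Worker path: take up to maxBatch queued requests for later processing. */
    List<String> drain(int maxBatch) {
        List<String> batch = new ArrayList<>();
        queue.drainTo(batch, maxBatch);
        // A real worker would grant the reward for each entry here.
        return batch;
    }
}
```

The request path only pays for an enqueue, so a traffic spike lengthens the backlog instead of overloading the reward service.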

8. Degradation Plans

Degradation is a post‑incident mitigation that narrows the impact scope.

Critical interfaces should have well‑defined fallback strategies, ensuring non‑core functions can be disabled while core services remain operational.
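One simple way to make a non-core function switchable is a degradation flag with a predefined fallback. The sketch below is an assumption-laden illustration: DegradableFeature is a hypothetical class, and in practice the flag would usually be driven by a configuration center rather than set in process.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Supplier;

/** Hypothetical wrapper that serves a static fallback when a feature is degraded. */
final class DegradableFeature<T> {
    private final AtomicBoolean degraded = new AtomicBoolean(false);
    private final Supplier<T> normal;
    private final T fallback;

    DegradableFeature(Supplier<T> normal, T fallback) {
        this.normal = normal;
        this.fallback = fallback;
    }

    /** Flipped by operators (or a config listener) during an incident. */
    void setDegraded(boolean on) {
        degraded.set(on);
    }

    /** Skips the normal path entirely while the switch is on. */
    T get() {
        return degraded.get() ? fallback : normal.get();
    }
}
```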

9. Gray Release

Gradual rollout limits risk exposure.

Deploy a new service to a subset of users, collect feedback on performance and stability, then expand or roll back based on results.
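Selecting the subset of users is often done by hashing a stable identifier into buckets. The following sketch assumes a percentage-based rollout keyed on user ID; GrayRelease and useNewVersion are hypothetical names, and real gray-release platforms typically also support whitelists, regions, or other routing rules.

```java
/** Hypothetical percentage-based gray-release decision keyed on user ID. */
final class GrayRelease {
    private final int rolloutPercent;

    GrayRelease(int rolloutPercent) {
        if (rolloutPercent < 0 || rolloutPercent > 100) {
            throw new IllegalArgumentException("rolloutPercent must be between 0 and 100");
        }
        this.rolloutPercent = rolloutPercent;
    }

    /** The same user always lands in the same bucket, so their experience stays consistent. */
    boolean useNewVersion(String userId) {
        int bucket = Math.floorMod(userId.hashCode(), 100); // bucket in [0, 100)
        return bucket < rolloutPercent;
    }
}
```

Raising rolloutPercent expands the rollout, and setting it to 0 acts as an immediate rollback for routing purposes.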

10. Chaos Engineering

Proactively inject failures to uncover hidden issues.

Complex systems with many dependencies can exhibit butterfly effects, where a small fault cascades into a large outage. Platforms such as the Tai Shan chaos-engineering tool can simulate failures so that teams can prepare response plans and keep risk within controllable bounds.
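At the code level, failure injection can be as simple as wrapping a dependency call so that it fails at a configured rate. The sketch below illustrates that idea only and is unrelated to how the Tai Shan platform works; FaultInjector, its failure rate, and the seeded Random are assumptions, and such a wrapper should be enabled only in controlled experiments.

```java
import java.util.Random;
import java.util.function.Supplier;

/** Hypothetical fault injector that makes a wrapped call fail at a configured rate. */
final class FaultInjector {
    private final double failureRate; // 0.0 = never fail, 1.0 = always fail
    private final Random random;

    FaultInjector(double failureRate, Random random) {
        this.failureRate = failureRate;
        this.random = random;
    }

    /** Returns a supplier that sometimes throws instead of calling the target. */
    <T> Supplier<T> wrap(Supplier<T> target) {
        return () -> {
            // nextDouble() is in [0.0, 1.0), so a rate of 1.0 always injects a fault.
            if (random.nextDouble() < failureRate) {
                throw new IllegalStateException("injected fault");
            }
            return target.get();
        };
    }
}
```

Running callers against the wrapped dependency shows whether their timeouts, fallbacks, and circuit breakers behave as planned before a real failure tests them.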
