
How to Prevent Service Failures: Suspect Third‑Party, Guard Users, and Perfect Your Own Service

The article shares practical strategies for preventing service failures by doubting third‑party services, protecting against misuse by consumers, and improving one’s own code and architecture, covering fallback plans, timeout settings, retry policies, API design, traffic control, and resource limits.

Qunar Tech Salon

Dec 1, 2016


For every programmer, failures hang overhead like the sword of Damocles, and avoiding them is a never-ending effort. Drawing on two years of backend development experience, the author describes how to prevent failures from three angles: third-party services, the users who consume your service, and your own implementation.

1 Suspect Third‑Party

Assume every external service is unreliable and take the following three measures.

1.1 Fallback and degradation plans – Define business-level degradation strategies in advance: for example, serve cached hot items when the user-profile service is down, or keep a full-index snapshot to roll back to if upstream data gets polluted.
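A minimal sketch of the hot-item fallback described above. `ProfileClient`, `RecommendationService`, and the hot-item snapshot are hypothetical names invented for illustration; the point is only that the failure path returns a degraded but useful result instead of an error.

```java
import java.util.List;

// Hypothetical client for a remote user-profile service (illustrative only).
interface ProfileClient {
    List<String> recommendationsFor(long userId) throws Exception;
}

// Wraps the remote call with a business-level fallback: if the profile
// service fails, serve a precomputed list of hot items instead of an error.
class RecommendationService {
    private final ProfileClient client;
    // Assumed to be refreshed periodically by some offline job.
    private volatile List<String> hotItemsSnapshot;

    RecommendationService(ProfileClient client, List<String> initialHotItems) {
        this.client = client;
        this.hotItemsSnapshot = List.copyOf(initialHotItems);
    }

    void refreshHotItems(List<String> items) {
        hotItemsSnapshot = List.copyOf(items);
    }

    List<String> recommend(long userId) {
        try {
            return client.recommendationsFor(userId);
        } catch (Exception e) {
            // Degrade: personalised results are unavailable, so return the
            // non-personalised hot-item list (a real service would also log/alert).
            return hotItemsSnapshot;
        }
    }
}
```

Keeping the snapshot in memory means the fallback itself has no remote dependency, so it still works while the upstream service is down.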

1.2 Fast-fail principle – Set strict timeouts (e.g., 200 ms) on every remote call. Without them, a slow third party ties up threads waiting for responses until the thread pool is exhausted and the whole service stops responding.
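A sketch of the fast-fail idea using only the JDK: the remote call runs on a fixed-size pool and the caller waits at most a fixed budget (200 ms in the usage below, matching the text) before giving up with a fallback value. `FastFailCaller` is a hypothetical helper. Where the client library offers its own timeouts, those are usually preferable; the JDK's `java.net.http` client, for instance, has `connectTimeout` on `HttpClient.Builder` and `timeout` on `HttpRequest.Builder`.

```java
import java.util.concurrent.*;

// Runs remote calls on a bounded pool and stops waiting after timeoutMillis,
// so a slow dependency cannot block the caller indefinitely.
class FastFailCaller {
    private final ExecutorService pool;
    private final long timeoutMillis;

    FastFailCaller(int threads, long timeoutMillis) {
        this.pool = Executors.newFixedThreadPool(threads);
        this.timeoutMillis = timeoutMillis;
    }

    <T> T callWithTimeout(Callable<T> remoteCall, T fallback) {
        Future<T> future = pool.submit(remoteCall);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // Interrupt the slow call so its pool thread can be reused.
            future.cancel(true);
            return fallback;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return fallback;
        } catch (ExecutionException e) {
            // The remote call itself threw an exception.
            return fallback;
        }
    }

    void shutdown() {
        pool.shutdownNow();
    }
}
```

Usage: `new FastFailCaller(8, 200).callWithTimeout(() -> client.fetch(id), defaultValue)`, where `client.fetch` stands for whatever remote call is being protected.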

1.3 Careful retry mechanisms – Decide case by case whether a retry is appropriate. Blind retries multiply the load on a dependency that is already struggling and can turn a partial outage into a full one.
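One way to make retries "careful", sketched under the assumption that the wrapped operation is idempotent: cap the number of attempts, back off exponentially between them, and retry only errors the caller classifies as transient. `CautiousRetry` is a hypothetical helper, not a library API; production policies often add random jitter to the delay as well.

```java
import java.util.concurrent.Callable;
import java.util.function.Predicate;

// Bounded retry with exponential backoff. Only use it for idempotent
// operations, and only for errors that isRetryable marks as transient.
final class CautiousRetry {
    private CautiousRetry() {}

    static <T> T call(Callable<T> op, int maxAttempts, long baseDelayMillis,
                      Predicate<Exception> isRetryable) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        for (int attempt = 1; ; attempt++) {
            try {
                return op.call();
            } catch (Exception e) {
                // Non-retryable errors (e.g. bad input) and the final failure
                // are surfaced to the caller immediately.
                if (!isRetryable.test(e) || attempt >= maxAttempts) {
                    throw e;
                }
                // Wait base, 2x base, 4x base ... so retries do not hammer
                // a dependency that is already overloaded.
                Thread.sleep(baseDelayMillis << (attempt - 1));
            }
        }
    }
}
```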

2 Guard Users

Assume every consumer may misuse your service, and design APIs that stay robust when they do.

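2.1 Good API design – Follow principles such as minimal exposure, not forcing callers to make several calls to complete one task, limiting request sizes, using parameter objects, and surfacing real errors instead of hiding them. The signatures below contrast an unbounded batch interface with one whose name makes the length limit explicit, followed by a catch-all handler that hides the real error from the caller, which is the pattern to avoid:

```java
// Unbounded: a caller can pass an arbitrarily large idList.
List<Integer> getDataList(List<Integer> idList);

// The name makes the request-size limit explicit to callers.
List<Integer> getDataListWithLimitLength(List<Integer> idList);

// Anti-pattern: swallowing every exception hides the real error,
// so callers cannot tell "no data" apart from "the call failed".
public List<Integer> test() {
    try {
        // ...
    } catch (Exception e) {
        return Collections.emptyList();
    }
}
```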
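The principles above can be combined in a small sketch: a parameter object that enforces a request-size cap at construction time, and a service method that lets lookup failures propagate instead of returning an empty list. `DataQuery`, `DataService`, `MAX_IDS = 100`, and the `lookup` stand-in are all hypothetical; the real cap depends on the service.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Parameter object: optional fields can be added later without changing
// the method signature, and validation lives in one place.
final class DataQuery {
    static final int MAX_IDS = 100; // assumed per-request cap

    final List<Integer> ids;

    DataQuery(List<Integer> ids) {
        Objects.requireNonNull(ids, "ids");
        if (ids.size() > MAX_IDS) {
            // Reject oversized requests loudly instead of silently truncating.
            throw new IllegalArgumentException(
                "at most " + MAX_IDS + " ids per request, got " + ids.size());
        }
        this.ids = List.copyOf(ids);
    }
}

class DataService {
    // Failures propagate as exceptions, so callers can distinguish
    // "no data" from "the lookup broke".
    List<Integer> getDataList(DataQuery query) {
        List<Integer> result = new ArrayList<>(query.ids.size()); // pre-sized
        for (int id : query.ids) {
            result.add(lookup(id));
        }
        return result;
    }

    // Hypothetical stand-in for the real storage lookup.
    private int lookup(int id) {
        if (id < 0) {
            throw new IllegalStateException("no record for id " + id);
        }
        return id * 10;
    }
}
```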

2.2 Traffic control – Allocate a traffic quota to each calling service, apply rate limiting and circuit breaking, and reject or divert requests beyond the quota so that excess load cannot bring the system down.
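For the rate-limiting part, a token bucket is a common choice; below is a minimal, hypothetical sketch (`TokenBucket` is not a library class). Guava's `RateLimiter` offers a production-grade alternative. A per-caller quota can be modelled by keeping one bucket per calling service.

```java
// Token bucket: holds up to `capacity` tokens, refilled continuously at
// `permitsPerSecond`. Each request takes one token; with none left, the
// request is rejected immediately so the caller can fail fast or divert it.
final class TokenBucket {
    private final long capacity;
    private final double refillPerNano;
    private double tokens;
    private long lastRefill;

    TokenBucket(long capacity, double permitsPerSecond) {
        this.capacity = capacity;
        this.refillPerNano = permitsPerSecond / 1_000_000_000.0;
        this.tokens = capacity; // start full to absorb an initial burst
        this.lastRefill = System.nanoTime();
    }

    synchronized boolean tryAcquire() {
        long now = System.nanoTime();
        // Add the tokens earned since the last call, never exceeding capacity.
        tokens = Math.min(capacity, tokens + (now - lastRefill) * refillPerNano);
        lastRefill = now;
        if (tokens >= 1.0) {
            tokens -= 1.0;
            return true;
        }
        return false;
    }
}
```

The capacity controls how large a burst is tolerated, while the refill rate sets the sustained throughput each caller is allowed.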

3 Perfect Your Own Service

Apply solid engineering principles in architecture and code.

3.1 Single-responsibility principle – Keep each service and class focused on one concern: separate reads from writes, isolate business domains from one another, and avoid monolithic designs.

3.2 Resource control

3.2.1 CPU – Optimize algorithms, minimize lock contention, avoid infinite loops, use thread pools instead of creating threads ad hoc, and tune JVM parameters.
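A sketch of what "prefer thread pools" can look like with the JDK's `ThreadPoolExecutor`: a fixed number of threads plus a bounded work queue, so a burst of requests cannot spawn unbounded threads or pile up unbounded pending work. `BoundedPools` is a hypothetical helper, and the abort-on-overflow policy is one choice among several (`CallerRunsPolicy` is another built-in option).

```java
import java.util.concurrent.*;

// Factory for a fixed-size pool with a bounded backlog.
final class BoundedPools {
    private BoundedPools() {}

    static ThreadPoolExecutor create(int threads, int queueCapacity) {
        return new ThreadPoolExecutor(
            threads, threads,                        // core == max: fixed size
            0L, TimeUnit.MILLISECONDS,               // no idle timeout needed
            new ArrayBlockingQueue<>(queueCapacity), // bounded pending tasks
            new ThreadPoolExecutor.AbortPolicy());   // throw RejectedExecutionException when full
    }
}
```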

3.2.2 Memory – Set JVM heap limits, pre-size collections, use object pools, cap queue sizes, move large data sets to a distributed cache, compress cached data, and know the memory footprint of the third-party libraries you depend on.
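As one illustration of capping memory, here is a hypothetical size-bounded in-process LRU cache built on `LinkedHashMap` in access order, pre-sized so it never rehashes before reaching its cap. It is not thread-safe as written; wrapping it with `Collections.synchronizedMap` or using a dedicated caching library would be needed under concurrency.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Evicts the least recently accessed entry once maxEntries is exceeded,
// so the cache cannot grow without bound.
final class BoundedLruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    BoundedLruCache(int maxEntries) {
        // Pre-size for the cap at the default 0.75 load factor;
        // `true` selects access order, which gives LRU eviction.
        super((int) Math.ceil(maxEntries / 0.75) + 1, 0.75f, true);
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries;
    }
}
```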

3.2.3 Network – Reduce call frequency by batching requests, limit response payload size, and return only the fields callers actually need.
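Batching can be sketched as follows: N single-key remote calls become ceil(N / maxBatch) batch calls. `Batcher` is hypothetical, and `batchFetch` stands in for an assumed "get many" endpoint on the remote side; capping `maxBatch` also keeps each response payload bounded.

```java
import java.util.*;
import java.util.function.Function;

// Splits keys into chunks of at most maxBatch and makes one round trip
// per chunk instead of one per key.
final class Batcher {
    private Batcher() {}

    static <K, V> Map<K, V> fetchAll(List<K> keys, int maxBatch,
                                     Function<List<K>, Map<K, V>> batchFetch) {
        if (maxBatch < 1) {
            throw new IllegalArgumentException("maxBatch must be >= 1");
        }
        Map<K, V> result = new HashMap<>();
        for (int from = 0; from < keys.size(); from += maxBatch) {
            int to = Math.min(from + maxBatch, keys.size());
            result.putAll(batchFetch.apply(keys.subList(from, to)));
        }
        return result;
    }
}
```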

3.2.4 Disk – Control log volume, monitor log size, rotate and purge old logs regularly, and consider shipping logs to remote storage.
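Log rotation can be illustrated with the JDK's built-in `java.util.logging.FileHandler`, whose `(pattern, limit, count, append)` constructor keeps at most `count` files of about `limit` bytes each and overwrites the oldest. Many Java services use Logback or Log4j rolling appenders instead; `RotatingLogs` below is a hypothetical helper, and the sizes in the test are arbitrary.

```java
import java.io.IOException;
import java.util.logging.*;

// Bounds on-disk log usage to roughly limitBytes * count.
final class RotatingLogs {
    private RotatingLogs() {}

    static Logger create(String name, String pattern, int limitBytes, int count)
            throws IOException {
        // %g in the pattern is replaced by the rotation generation number.
        FileHandler handler = new FileHandler(pattern, limitBytes, count, true);
        handler.setFormatter(new SimpleFormatter());
        Logger logger = Logger.getLogger(name);
        logger.setUseParentHandlers(false); // avoid duplicating output to the console
        logger.addHandler(handler);
        return logger;
    }
}
```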

3.3 Avoid single points of failure – Deploy services across multiple zones, scale horizontally, and shard or tier data stores so that losing any one node does not take the service down.
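A client-side sketch of the failover half of this advice: requests are spread round-robin across replicas (for example, one per zone), and a failing replica is skipped in favour of the next. `FailoverClient` is hypothetical, and each replica is modelled as a plain `Function` standing in for a real remote endpoint; sharding and data tiering are not covered here.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Round-robin across replicas with failover: the call fails only if
// every replica fails.
final class FailoverClient<R> {
    private final List<Function<String, R>> replicas;
    private final AtomicInteger next = new AtomicInteger();

    FailoverClient(List<Function<String, R>> replicas) {
        if (replicas.isEmpty()) {
            throw new IllegalArgumentException("need at least one replica");
        }
        this.replicas = List.copyOf(replicas);
    }

    R call(String request) {
        int n = replicas.size();
        // Rotate the starting replica to spread load evenly.
        int start = Math.floorMod(next.getAndIncrement(), n);
        RuntimeException last = null;
        for (int i = 0; i < n; i++) {
            try {
                return replicas.get((start + i) % n).apply(request);
            } catch (RuntimeException e) {
                last = e; // this replica failed; try the next one
            }
        }
        throw new IllegalStateException("all " + n + " replicas failed", last);
    }
}
```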

4 Conclusion

In summary, "Suspect third parties, guard against users, and perfect your own service" is a concise mantra for building fault-tolerant services.


Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.

operations reliability fault-tolerance resource-management API-design

Written by

Qunar Tech Salon

Qunar Tech Salon is a learning and exchange platform for Qunar engineers and industry peers. We share cutting-edge technology trends and topics, providing a free platform for mid-to-senior technical professionals to exchange and learn.
