How to Prevent Service Failures: Suspect Third‑Party, Guard Users, Master Your Own Code
An experienced backend engineer shares practical strategies to prevent service failures, covering third‑party distrust, user‑side safeguards, robust API design, traffic limiting, resource management, and architectural best practices such as single‑responsibility and avoiding single points of failure.
For every programmer, service failures hang overhead like a sword; avoiding them is a constant pursuit.
1 Suspect Third‑Party
Assume that every third‑party service is unreliable and act accordingly: prepare fallback plans, set timeouts, and retry with care.
1.1 Provide fallback and degradation plans
If a third‑party service goes down, the business should not collapse with it. Two examples: when the user center fails, serve recommendations from a cached hot‑product list; when synchronizing data updates, use both push (messages) and pull (HTTP), with a periodic pull as the backup.
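A minimal sketch of the cached hot‑product fallback, assuming the personalized lookup is passed in as a `Supplier`. The class name `RecommendationFallback` and its methods are hypothetical and only illustrate the idea.

```java
import java.util.List;
import java.util.function.Supplier;

/** Serves a cached hot-product list when the personalized call fails (hypothetical names). */
final class RecommendationFallback {
    // Assumed to be refreshed periodically by a background job that is not shown here.
    private volatile List<String> cachedHotProducts;

    RecommendationFallback(List<String> initialHotProducts) {
        this.cachedHotProducts = List.copyOf(initialHotProducts);
    }

    void refreshHotProducts(List<String> latest) {
        this.cachedHotProducts = List.copyOf(latest);
    }

    /** `personalized` stands in for the remote call that depends on the user center. */
    List<String> recommend(Supplier<List<String>> personalized) {
        try {
            List<String> result = personalized.get();
            if (result != null && !result.isEmpty()) {
                return result;
            }
        } catch (RuntimeException e) {
            // Degrade instead of failing the request; real code should log and count this.
        }
        return cachedHotProducts;
    }
}
```

The caller still gets a usable list when the dependency is down; the only loss is personalization.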
1.2 Apply fast‑failure principle with timeouts
Set short timeout thresholds (e.g., 200 ms) for third‑party calls to prevent slow responses from exhausting thread pools and degrading the whole service.
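One way to enforce such a timeout is to run the call on a separate executor and wait only for a bounded time. The sketch below assumes the remote call is wrapped in a `Callable`; `FastFailCaller` is a hypothetical helper, not an API from the original article.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Waits at most timeoutMillis for a third-party call, then returns a fallback (hypothetical helper). */
final class FastFailCaller {
    private final ExecutorService executor;
    private final long timeoutMillis;

    FastFailCaller(ExecutorService executor, long timeoutMillis) {
        this.executor = executor;
        this.timeoutMillis = timeoutMillis;
    }

    <T> T call(Callable<T> remoteCall, T fallback) {
        Future<T> future = executor.submit(remoteCall);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // Interrupt the worker so a slow call does not keep occupying a pool thread.
            future.cancel(true);
            return fallback;
        } catch (ExecutionException e) {
            return fallback; // the call itself threw
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            future.cancel(true);
            return fallback;
        }
    }
}
```

Interruption only helps if the underlying call responds to it, so a client‑level connect and read timeout is still worth configuring where the client library offers one.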
1.3 Choose retry strategies wisely
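Retry only when the error is transient, such as a timeout or a dropped connection. Do not retry business‑logic errors, which will simply fail again, and do not retry when the extra requests would pile more load onto a service that is already struggling.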
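A minimal sketch of a selective retry with exponential backoff. It assumes transient errors are signalled by a dedicated exception type; `TransientFailureException` and `Retrier` are hypothetical names, and jitter is left out for brevity.

```java
import java.util.function.Supplier;

/** Hypothetical marker for errors worth retrying, such as timeouts or dropped connections. */
class TransientFailureException extends RuntimeException {
    TransientFailureException(String message) {
        super(message);
    }
}

final class Retrier {
    private Retrier() {}

    /** Retries only TransientFailureException; every other exception propagates at once. */
    static <T> T callWithRetry(Supplier<T> call, int maxAttempts, long baseDelayMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        long delay = baseDelayMillis;
        for (int attempt = 1; ; attempt++) {
            try {
                return call.get();
            } catch (TransientFailureException e) {
                if (attempt >= maxAttempts) {
                    throw e; // attempts exhausted: surface the last transient error
                }
                try {
                    Thread.sleep(delay);
                } catch (InterruptedException interrupted) {
                    Thread.currentThread().interrupt();
                    throw e;
                }
                delay *= 2; // back off so retries do not hammer the remote service
            }
        }
    }
}
```

Capping `maxAttempts` and doubling the delay keeps the extra load bounded, which addresses the second half of the advice above.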
2 Guard the Users
Assume that all consumers of the API are unreliable and design robust interfaces.
2.1 Design good APIs
Expose only necessary methods, avoid forcing users to make many calls, limit request sizes, provide clear parameter contracts, and return meaningful exceptions.
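As an illustration of these points, the sketch below exposes one batch method, caps the request size, checks its parameter contract up front, and fails with messages that say what was wrong. `ProductQueryApi`, the limit of 100, and the stubbed lookup are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

/** Hypothetical batch lookup that enforces its contract at the boundary. */
final class ProductQueryApi {
    static final int MAX_BATCH_SIZE = 100; // assumed limit, chosen for illustration

    private ProductQueryApi() {}

    /** Returns one name per id, in the same order as the input. */
    static List<String> findNames(List<Long> ids) {
        Objects.requireNonNull(ids, "ids must not be null");
        if (ids.size() > MAX_BATCH_SIZE) {
            throw new IllegalArgumentException(
                "ids has " + ids.size() + " entries; the limit is " + MAX_BATCH_SIZE);
        }
        List<String> names = new ArrayList<>(ids.size());
        for (Long id : ids) {
            if (id == null || id <= 0) {
                throw new IllegalArgumentException("invalid product id: " + id);
            }
            names.add("product-" + id); // stand-in for the real lookup
        }
        return names;
    }
}
```

A single batch call saves the caller from looping over a per‑item endpoint, and the size cap keeps one careless request from monopolizing the service.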
2.2 Traffic control per service
Implement rate limiting or circuit‑breaker mechanisms to protect services from sudden traffic spikes, similar to a fuse in an electrical circuit.
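Rate limiting can be sketched with a token bucket, which is one common technique and not necessarily the one the author used. The class `TokenBucketLimiter` is hypothetical, and the clock is injected so the behavior can be checked without waiting in real time.

```java
import java.util.function.LongSupplier;

/** Token-bucket limiter: allows bursts up to capacity, then a steady refill rate (hypothetical). */
final class TokenBucketLimiter {
    private final long capacity;
    private final double refillPerSecond;
    private final LongSupplier nanoClock;
    private double tokens;
    private long lastRefillNanos;

    TokenBucketLimiter(long capacity, double refillPerSecond, LongSupplier nanoClock) {
        this.capacity = capacity;
        this.refillPerSecond = refillPerSecond;
        this.nanoClock = nanoClock;
        this.tokens = capacity; // start full so an initial burst is allowed
        this.lastRefillNanos = nanoClock.getAsLong();
    }

    /** Returns true if the request may proceed, false if it should be rejected. */
    synchronized boolean tryAcquire() {
        long now = nanoClock.getAsLong();
        double refill = (now - lastRefillNanos) * refillPerSecond / 1_000_000_000.0;
        tokens = Math.min(capacity, tokens + refill);
        lastRefillNanos = now;
        if (tokens >= 1.0) {
            tokens -= 1.0;
            return true;
        }
        return false;
    }
}
```

In production the clock would typically be `System::nanoTime`, and a rejected request would get a fast error response instead of queuing up behind the spike.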
3 Master Your Own Service
Apply solid engineering principles throughout the development lifecycle.
3.1 Single‑Responsibility Principle
Define clear service boundaries, separate read/write, isolate functionalities into independent services, and keep classes and methods focused on a single task.
3.2 Resource control
Manage CPU, memory, network, and disk usage through algorithm optimization, judicious lock usage, thread pools, JVM tuning, proper collection sizing, object pools, and log management.
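For the thread‑pool part of this list, a bounded pool with a bounded queue keeps a backlog from consuming unbounded memory. The factory below is a sketch with hypothetical names; the sizes are for the caller to choose.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

/** Factory for pools whose thread count and queue length are both capped (hypothetical). */
final class BoundedPools {
    private BoundedPools() {}

    static ThreadPoolExecutor newBoundedPool(int threads, int queueCapacity) {
        return new ThreadPoolExecutor(
            threads,                                   // core pool size
            threads,                                   // maximum pool size: the pool never grows
            0L, TimeUnit.MILLISECONDS,                 // keep-alive is irrelevant when core == max
            new ArrayBlockingQueue<>(queueCapacity),   // bounded backlog
            new ThreadPoolExecutor.AbortPolicy());     // reject with an exception when saturated
    }
}
```

With `AbortPolicy`, a saturated pool throws `RejectedExecutionException`, so overload shows up as an explicit, fast failure instead of a slowly growing queue.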
3.3 Avoid single points of failure
Deploy services across multiple machines or data centers, enable horizontal scaling, and use sharding or clustering for stateful components.
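On the calling side, multiple replicas only help if the client can move on when one of them is down. The sketch below tries each replica in order; `FailoverClient` and the replica names are hypothetical, and real deployments would usually add health checks and load balancing on top.

```java
import java.util.List;
import java.util.function.Function;

/** Tries each replica in turn and returns the first successful response (hypothetical). */
final class FailoverClient {
    private final List<String> replicas;

    FailoverClient(List<String> replicas) {
        if (replicas.isEmpty()) {
            throw new IllegalArgumentException("at least one replica is required");
        }
        this.replicas = List.copyOf(replicas);
    }

    /** `request` stands in for the actual call against one replica address. */
    <R> R call(Function<String, R> request) {
        RuntimeException lastFailure = null;
        for (String replica : replicas) {
            try {
                return request.apply(replica);
            } catch (RuntimeException e) {
                lastFailure = e; // remember the failure and try the next replica
            }
        }
        throw new IllegalStateException("all " + replicas.size() + " replicas failed", lastFailure);
    }
}
```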
4 Conclusion
To avoid failures, remember the three‑step mantra: “Suspect third‑party, guard the users, master your own.”
public List<Integer> test() {
    try {
        // ...
    } catch (Exception e) {
        // Swallows the exception: the caller cannot tell a failure from an empty result.
        return Collections.emptyList();
    }
}
21CTO
21CTO (21CTO.com) offers developers community, training, and services, making it your go‑to learning and service platform.
