Why Trust Less? Defensive Strategies for High‑Performance, High‑Availability Systems

The article explores how adopting a "don't trust" mindset—through rigorous input validation, defensive coding, thorough testing, gradual rollouts, and comprehensive monitoring—helps build resilient, high‑performance systems and avoid common pitfalls in development and operations.

ITFLY8 Architecture Home

Sep 30, 2017

Introduction

Trust is essential between people, but in programming less trust is often better. The author reflects on why the "don't trust" principle becomes crucial in high-performance, high-availability systems, and shares pitfalls both experienced first-hand and observed in others.

1. The Programming World Is Full of Traps

Programming is both easy and hard: basic constructs are simple, but building performant, reliable code requires extensive knowledge. Like a Minesweeper game, the codebase contains many hidden pitfalls that can cause a sudden "Game Over".

Just as Minesweeper players mark dangerous squares, developers must proactively identify and guard against risky inputs and outputs.

2. Distrust of Input

(1) Null-pointer checks – Check every pointer for NULL before using it, and set a pointer to NULL after freeing it so that a stale pointer fails the same check. A real case: a registration system crashed when a log statement on a timeout path used an unchecked string pointer.

(2) Length checks – When copying strings or buffers (e.g., memcpy, strcpy), always check the source length against the destination size first, and reject or truncate anything that does not fit. A real case: an OAuth service core dumped after receiving an oversized request payload.

(3) Content checks – Validate the content of incoming data, not just its presence and size; unvalidated content opens the door to SQL injection and XSS attacks.
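
The three checks can be sketched in Python with the standard-library sqlite3 module. The function name register_user, the users table, and the 64-character limit are assumptions made for illustration; they are not details of the incidents described above.

```python
import sqlite3

MAX_NAME_LEN = 64  # assumed limit, chosen only for this example


def register_user(conn, name):
    """Validate an untrusted name, then store it with a bound parameter."""
    # (1) Null check: reject a missing value before anything touches it.
    if name is None:
        raise ValueError("name is required")
    # (2) Length check: refuse oversized input instead of copying it blindly.
    if not 0 < len(name) <= MAX_NAME_LEN:
        raise ValueError("name length out of range")
    # (3) Content check: the ? placeholder keeps the value out of the SQL
    # text, so quotes inside it cannot change the statement.
    conn.execute("INSERT INTO users (name) VALUES (?)", (name,))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
register_user(conn, "alice")
register_user(conn, "x'); DROP TABLE users; --")  # stored as plain text
rows = conn.execute("SELECT name FROM users").fetchall()
```

Rejecting bad input at the boundary keeps every later step free to assume the value is present and bounded, and the bound parameter means the hostile-looking second name is stored as data rather than executed.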

3. Distrust of Output (Changes)

Code changes show up in a system's output, and that output can be too complex to verify by eye (e.g., encrypted data). Each modification should therefore be treated as potentially unsafe:

Adopt "untrusted coding" – verify the impact of even tiny changes.

Test thoroughly before release.

Use gray-release (canary) strategies that expose a change first to a subset of machines, IPs, or users, or to a fixed proportion of traffic, to limit the blast radius.

Monitor comprehensively after deployment (request volume, success/failure rates, key metrics).
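
A proportion-based gray release can be sketched as a stable hash bucket per user. This is one common way to do it, not necessarily the one the author used; the function name in_gray_release and the user-N identifiers are hypothetical.

```python
import hashlib


def in_gray_release(user_id, percent):
    """Return True if this user falls inside the rollout percentage."""
    # A stable hash puts each user in a fixed bucket from 0 to 99, so the
    # same user sees the same version on every request.
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100
    return bucket < percent


users = [f"user-{i}" for i in range(1000)]
exposed_at_5 = {u for u in users if in_gray_release(u, 5)}
exposed_at_20 = {u for u in users if in_gray_release(u, 20)}
```

Because the bucket depends only on the user ID, raising the percentage only adds users: everyone exposed at 5% is still exposed at 20%, which keeps monitoring comparisons meaningful as the rollout widens.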

Case: a change to an OAuth system introduced an uninitialized variable that altered a packet header; the problem escaped both testing and monitoring and caused failures in downstream systems.

4. The Service World Is Full of Uncertainties

Systems have upstream and downstream dependencies; any node can fail unexpectedly, so every link must be defended.

4.1 Distrust of the Service Itself

Key measures:

Service monitoring (request count, success rate, key nodes).

Automated testing to simulate real scenarios.

Process auto‑restart to mitigate core dumps.
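
As an illustration of the monitoring measure, the sketch below tracks the success rate over the most recent requests and flags when it drops below a threshold. The class name SuccessRateMonitor, the window size, and the 0.9 threshold are assumptions for the example only.

```python
from collections import deque


class SuccessRateMonitor:
    """Track the success rate over the most recent requests."""

    def __init__(self, window=100, alert_below=0.95):
        # True for a successful request, False for a failed one; the deque
        # drops the oldest entry once the window is full.
        self.results = deque(maxlen=window)
        self.alert_below = alert_below

    def record(self, ok):
        self.results.append(bool(ok))

    def success_rate(self):
        if not self.results:
            return 1.0  # no traffic yet, nothing to alert on
        return sum(self.results) / len(self.results)

    def should_alert(self):
        return self.success_rate() < self.alert_below


monitor = SuccessRateMonitor(window=10, alert_below=0.9)
for ok in [True] * 8 + [False] * 2:
    monitor.record(ok)
alerting = monitor.should_alert()
```

A bounded window means old successes cannot mask a fresh run of failures, at the cost of some noise when traffic is low.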

4.2 Distrust of Dependent Systems

Apply flexible availability strategies:

Non-critical paths: cap the number of retries, and skip the call altogether when the timeout ratio is high.

Critical paths: provide degraded service (e.g., algorithm‑only tickets when storage is unavailable).
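
Bounded retries followed by a degraded answer can be sketched as below. The names call_with_fallback and flaky_storage_lookup are hypothetical, and the fallback value stands in for whatever degraded service the real system would offer.

```python
def call_with_fallback(primary, fallback, max_attempts=3):
    """Try primary a bounded number of times, then serve a degraded result."""
    for _ in range(max_attempts):
        try:
            return primary()
        except TimeoutError:
            continue  # bounded retry: never loop forever on a sick dependency
    return fallback()


attempts = []


def flaky_storage_lookup():
    # Hypothetical dependency that always times out in this demo.
    attempts.append(1)
    raise TimeoutError("storage unavailable")


result = call_with_fallback(flaky_storage_lookup, lambda: "degraded-result")
```

Only timeouts are retried here; any other exception propagates, on the assumption that retrying a logic error would just repeat it.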

4.3 Distrust of Requests

Source distrust – enforce permission controls (IP, module, whitelist, login state) and security audits.

Volume distrust – handle traffic spikes with rate limiting and overload protection, discarding excess requests to avoid avalanche failures.
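
Rate limiting with discard can be illustrated by a token bucket, one widely used technique that the article does not name specifically. The class name TokenBucket and its parameters are assumptions; the clock is injectable so the demo is deterministic.

```python
import time


class TokenBucket:
    """Allow about `rate` requests per second with bursts up to `capacity`."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = float(capacity)
        self.last = clock()

    def allow(self):
        now = self.clock()
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # over the limit: discard instead of queueing


fake_now = [0.0]  # controllable clock for the demo
bucket = TokenBucket(rate=1, capacity=2, clock=lambda: fake_now[0])
burst = [bucket.allow() for _ in range(3)]
fake_now[0] = 1.0
after_refill = bucket.allow()
```

Discarding the excess request immediately, rather than queueing it, is what prevents a spike from turning into a backlog that takes the whole service down.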

5. Operations Are Unpredictable

5.1 Distrust of Machines

Mitigate single‑point failures with disaster‑recovery deployment (multiple machines) and heartbeat detection for automatic failover.
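
Heartbeat-based failover can be sketched as a tracker that routes to the first node heard from recently. The class name HeartbeatTracker, the node names, and the 3-second timeout are assumptions; timestamps are passed in explicitly to keep the example deterministic.

```python
class HeartbeatTracker:
    """Pick the first node whose last heartbeat is recent enough."""

    def __init__(self, nodes, timeout):
        self.nodes = list(nodes)  # ordered by preference
        self.timeout = timeout    # seconds of silence before a node is down
        self.last_seen = {}

    def beat(self, node, now):
        self.last_seen[node] = now

    def active(self, now):
        for node in self.nodes:
            seen = self.last_seen.get(node)
            if seen is not None and now - seen <= self.timeout:
                return node
        return None  # no healthy node: the caller must degrade or alert


tracker = HeartbeatTracker(["primary", "standby"], timeout=3)
tracker.beat("primary", now=0)
tracker.beat("standby", now=0)
before = tracker.active(now=2)
tracker.beat("standby", now=4)  # primary has gone silent
after = tracker.active(now=5)
```

A node that never reported is treated the same as one that stopped reporting, which matches the "don't trust" stance: health has to be demonstrated, not assumed.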

5.2 Distrust of Data Centers

Use multi‑region or multi‑IDC deployment and maintain capacity redundancy (e.g., double the capacity for login services) to survive whole‑site outages.
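
The arithmetic behind capacity redundancy can be worked through as follows, assuming load is spread evenly and the goal is to survive the loss of exactly one site. The function name per_site_capacity and the peak figure of 10,000 are hypothetical.

```python
import math


def per_site_capacity(peak_load, sites):
    """Capacity each site needs so the rest can absorb one full site outage."""
    if sites < 2:
        raise ValueError("need at least two sites to survive losing one")
    # With one site down, the remaining (sites - 1) must carry the whole peak.
    return math.ceil(peak_load / (sites - 1))


two_sites_total = per_site_capacity(10_000, sites=2) * 2
three_sites_total = per_site_capacity(10_000, sites=3) * 3
```

Under these assumptions two sites need 20,000 in total, double the peak, while three sites need 15,000, which is why adding sites lowers the cost of the same redundancy.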

5.3 Distrust of Power

Persist data to local disk and back it up remotely so that it survives a power loss.

5.4 Distrust of Network

Employ proximity routing (e.g., CMLB) and network probing to select optimal paths; automatically take unhealthy nodes out of rotation and keep routing to healthy ones.

5.5 Distrust of Humans

Record every operation, require reviews, verify effects in production, keep rollback plans, automate deployments, and perform consistency checks across machines and configurations.
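
A consistency check across machines can be sketched by hashing each machine's configuration and reporting the ones that differ from the majority. The function name find_config_drift, the host names, and the config contents are hypothetical.

```python
import hashlib
from collections import Counter


def find_config_drift(configs):
    """Return the hosts whose config differs from the most common version."""
    if not configs:
        return []
    digests = {host: hashlib.sha256(text.encode("utf-8")).hexdigest()
               for host, text in configs.items()}
    # Treat the most common digest as the expected configuration.
    majority, _ = Counter(digests.values()).most_common(1)[0]
    return sorted(host for host, d in digests.items() if d != majority)


drifted = find_config_drift({
    "host-a": "timeout=3\nretries=2\n",
    "host-b": "timeout=3\nretries=2\n",
    "host-c": "timeout=30\nretries=2\n",
})
```

Using the majority as the reference is a simplification; where a reviewed, known-good configuration exists, comparing against that is the safer choice.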

Note: The listed distrust strategies often need to be combined with other measures.

Conclusion

In the programming world, adhering to the "don't trust" principle and setting defenses everywhere is essential for building robust systems.
