Why Trust Less? Defensive Strategies for High‑Performance, High‑Availability Systems
The article explores how adopting a "don't trust" mindset—through rigorous input validation, defensive coding, thorough testing, gradual rollouts, and comprehensive monitoring—helps build resilient, high‑performance systems and avoid common pitfalls in development and operations.
Introduction
Trust is essential between people, but in programming, trusting less is often safer. The author reflects on why the "don't trust" principle becomes crucial in high‑performance, high‑availability systems, and shares pitfalls both experienced first‑hand and observed.
1. The Programming World Is Full of Traps
Programming is both easy and hard: basic constructs are simple, but building performant, reliable code requires extensive knowledge. Like a Minesweeper game, the codebase contains many hidden pitfalls that can cause a sudden "Game Over".
Just as Minesweeper players mark dangerous squares, developers must proactively identify and guard against risky inputs and outputs.
2. Distrust of Input
(1) Null‑pointer checks – Check every pointer for NULL before dereferencing it, and set a pointer to NULL right after freeing it so that accidental reuse is caught by the same check. A real case: a registration system crashed when a log statement on a timeout path dereferenced an unchecked string pointer.
(2) Length checks – When copying strings or buffers (e.g., memcpy, strcpy), always compare the source length with the destination capacity first, and reject or truncate anything that does not fit. A real case: an OAuth service core‑dumped after receiving an oversized request payload.
(3) Content checks – Failing to validate data content can lead to SQL injection or XSS attacks.
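The three checks above can be sketched in C as follows. This is a minimal illustration, not code from the systems described: the names `FREE_AND_NULL`, `log_user`, `copy_payload`, and `is_valid_username`, the 32-character limit, and the allowed character set are all assumptions chosen for the example.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* (1) Free a pointer and reset it, so a later NULL check catches reuse. */
#define FREE_AND_NULL(p) do { free(p); (p) = NULL; } while (0)

/* (1) Format a log line without handing a NULL pointer to %s.
 * Returns snprintf's result (the length the full message needs),
 * or -1 if the output buffer is unusable. */
int log_user(char *out, size_t out_size, const char *user)
{
    if (out == NULL || out_size == 0) {
        return -1;
    }
    return snprintf(out, out_size, "user=%s", user != NULL ? user : "(null)");
}

/* (2) Copy a payload only if it fits; oversized input is rejected
 * instead of overflowing dst. Returns 0 on success, -1 on rejection. */
int copy_payload(unsigned char *dst, size_t dst_size,
                 const unsigned char *src, size_t src_len)
{
    if (dst == NULL || src == NULL || src_len > dst_size) {
        return -1;
    }
    memcpy(dst, src, src_len);
    return 0;
}

/* (3) Whitelist check (hypothetical policy): 1 to 32 characters,
 * each an ASCII letter, digit, or underscore. Returns 1 if valid. */
int is_valid_username(const char *s)
{
    size_t i;
    if (s == NULL || s[0] == '\0') {
        return 0;
    }
    for (i = 0; s[i] != '\0'; i++) {
        char c = s[i];
        int ok = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
                 (c >= '0' && c <= '9') || c == '_';
        if (!ok || i >= 32) {
            return 0;
        }
    }
    return 1;
}
```

A whitelist like this narrows what reaches the rest of the system, but it complements rather than replaces parameterized SQL queries and output escaping, which are the usual direct defenses against SQL injection and XSS.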
3. Distrust of Output (Changes)
The effect of a code change usually shows up in the output, and that output can be hard to inspect by eye (e.g., encrypted data). Each modification should therefore be treated as potentially unsafe:
Adopt "untrusted coding" – verify the impact of even tiny changes.
Test thoroughly before release.
Use gray‑release strategies (machine, IP, user, or proportion based) to limit blast radius.
Monitor comprehensively after deployment (request volume, success/failure rates, key metrics).
Case: a change to an OAuth system left a variable uninitialized, which altered a packet header; the issue escaped both testing and monitoring and caused downstream failures.
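One common way to implement a proportion-based gray release is to hash a stable identifier into 100 buckets and enable the change only for the lowest buckets. The sketch below assumes a string user ID and uses the FNV-1a hash for bucketing; `in_gray_release` and its `percent` parameter are hypothetical names, not taken from the system described.

```c
#include <stddef.h>
#include <stdint.h>

/* 32-bit FNV-1a hash of a NUL-terminated string. */
uint32_t fnv1a32(const char *s)
{
    uint32_t h = 2166136261u;          /* FNV offset basis */
    for (; *s != '\0'; s++) {
        h ^= (unsigned char)*s;
        h *= 16777619u;                /* FNV prime */
    }
    return h;
}

/* Returns 1 if user_id falls into the first `percent` of 100 buckets,
 * i.e. the user should receive the new behavior. */
int in_gray_release(const char *user_id, unsigned percent)
{
    if (user_id == NULL) {
        return 0;                      /* unknown users stay on the old path */
    }
    if (percent >= 100) {
        return 1;
    }
    return fnv1a32(user_id) % 100u < percent;
}
```

Because the bucket depends only on the ID, a user who is included at 10% is still included at 50%, so widening the rollout only ever adds users.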
4. The Service World Is Full of Uncertainties
Systems have upstream and downstream dependencies; any node can fail unexpectedly, so every link must be defended.
4.1 Distrust of the Service Itself
Key measures:
Service monitoring (request count, success rate, key nodes).
Automated testing to simulate real scenarios.
Process auto‑restart to mitigate core dumps.
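Process auto-restart can be sketched with a small POSIX supervisor that runs the service in a child process and starts it again when it dies. This is an assumption-laden illustration: `supervise`, `healthy_worker`, and `crashing_worker` are hypothetical names, and a production supervisor would also add backoff between restarts and raise an alert.

```c
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Runs worker() in a child process and restarts it after an abnormal exit
 * (killed by a signal, or non-zero exit status).
 * Returns the number of restarts performed once the worker exits cleanly,
 * or -1 on a system error or after max_restarts restarts have failed. */
int supervise(int (*worker)(void), int max_restarts)
{
    int restarts = 0;

    if (worker == NULL) {
        return -1;
    }
    for (;;) {
        int status = 0;
        pid_t pid = fork();

        if (pid < 0) {
            return -1;
        }
        if (pid == 0) {
            _exit(worker());           /* child: run the service */
        }
        /* Parent: wait for the child, retrying if interrupted by a signal. */
        while (waitpid(pid, &status, 0) < 0) {
            if (errno != EINTR) {
                return -1;
            }
        }
        if (WIFEXITED(status) && WEXITSTATUS(status) == 0) {
            return restarts;           /* clean shutdown, stop supervising */
        }
        if (restarts >= max_restarts) {
            return -1;                 /* give up instead of looping forever */
        }
        restarts++;
    }
}

/* Demo workers, for illustration only. */
int healthy_worker(void)  { return 0; }
int crashing_worker(void) { raise(SIGKILL); return 1; }
```

The restart cap matters: a service that crashes on every start would otherwise spin in a tight loop and hide the underlying bug.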
4.2 Distrust of Dependent Systems
Apply flexible availability strategies:
Non‑critical paths: limited retries or skip logic when timeout ratio is high.
Critical paths: provide degraded service (e.g., algorithm‑only tickets when storage is unavailable).
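The "limited retries or skip" idea can be sketched as a guard that counts timeouts for one dependency and stops calling it once the timeout ratio crosses a threshold. All names here (`dep_guard`, `guarded_call`, `guard_reset`, `flaky_op`) and the threshold fields are hypothetical; the sketch also assumes the caller resets the counters periodically, for example from a timer, so a recovered dependency gets traffic again.

```c
#include <stddef.h>

typedef struct {
    unsigned calls;            /* attempts recorded in the current window */
    unsigned timeouts;         /* attempts that timed out */
    unsigned min_calls;        /* do not judge the ratio below this count */
    unsigned max_timeout_pct;  /* skip once timeouts exceed this percent */
} dep_guard;

enum { CALL_OK = 0, CALL_FAILED = -1, CALL_SKIPPED = -2 };

/* Returns 1 while the observed timeout ratio is acceptable. */
static int guard_allows(const dep_guard *g)
{
    if (g->calls < g->min_calls) {
        return 1;              /* too few samples to decide */
    }
    return g->timeouts * 100u <= g->max_timeout_pct * g->calls;
}

/* Calls op(ctx) at most max_attempts times. op returns 0 on success and
 * non-zero on timeout. Stops early with CALL_SKIPPED when the guard trips. */
int guarded_call(dep_guard *g, int (*op)(void *), void *ctx,
                 unsigned max_attempts)
{
    unsigned i;

    if (g == NULL || op == NULL) {
        return CALL_FAILED;
    }
    for (i = 0; i < max_attempts; i++) {
        if (!guard_allows(g)) {
            return CALL_SKIPPED;
        }
        g->calls++;
        if (op(ctx) == 0) {
            return CALL_OK;
        }
        g->timeouts++;
    }
    return CALL_FAILED;
}

/* Starts a new measurement window. */
void guard_reset(dep_guard *g)
{
    g->calls = 0;
    g->timeouts = 0;
}

/* Demo dependency: times out until the counter in ctx reaches zero. */
int flaky_op(void *ctx)
{
    int *remaining = (int *)ctx;
    if (*remaining > 0) {
        (*remaining)--;
        return 1;
    }
    return 0;
}
```

On a non-critical path the caller can treat `CALL_SKIPPED` or `CALL_FAILED` as "continue without this result"; on a critical path the same return codes are the signal to switch to the degraded mode.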
4.3 Distrust of Requests
Source distrust – enforce permission controls (IP, module, whitelist, login state) and security audits.
Volume distrust – handle traffic spikes with rate limiting and overload protection, discarding excess requests to avoid avalanche failures.
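Rate limiting is often implemented with a token bucket: requests spend tokens, tokens refill at a fixed rate, and requests that find the bucket empty are discarded. The sketch below is one possible form, with hypothetical names (`token_bucket`, `tb_init`, `tb_allow`). It assumes the caller passes in a monotonic timestamp in milliseconds and uses integer arithmetic in thousandths of a token to avoid floating-point drift.

```c
typedef struct {
    long long capacity_milli;  /* burst size, in 1/1000 of a token */
    long long tokens_milli;    /* tokens currently available */
    long long rate_per_sec;    /* refill rate: tokens/s == milli-tokens/ms */
    long long last_ms;         /* time of the last refill */
} token_bucket;

/* Starts with a full bucket that allows `burst` back-to-back requests. */
void tb_init(token_bucket *b, long long burst, long long rate_per_sec,
             long long now_ms)
{
    b->capacity_milli = burst * 1000;
    b->tokens_milli = b->capacity_milli;
    b->rate_per_sec = rate_per_sec;
    b->last_ms = now_ms;
}

/* Returns 1 if the request may proceed, 0 if it should be discarded. */
int tb_allow(token_bucket *b, long long now_ms)
{
    long long elapsed = now_ms - b->last_ms;

    if (elapsed > 0) {
        if (b->rate_per_sec > 0 &&
            elapsed > b->capacity_milli / b->rate_per_sec) {
            /* Idle long enough to refill completely; setting the value
             * directly also avoids overflow in the multiplication below. */
            b->tokens_milli = b->capacity_milli;
        } else {
            b->tokens_milli += elapsed * b->rate_per_sec;
            if (b->tokens_milli > b->capacity_milli) {
                b->tokens_milli = b->capacity_milli;
            }
        }
        b->last_ms = now_ms;
    }
    if (b->tokens_milli >= 1000) {
        b->tokens_milli -= 1000;   /* spend one token */
        return 1;
    }
    return 0;
}
```

Rejecting the excess request immediately is what keeps an overloaded service answering the requests it can handle, rather than queueing everything until all of them time out.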
5. Operations Are Unpredictable
5.1 Distrust of Machines
Mitigate single‑point failures with disaster‑recovery deployment (multiple machines) and heartbeat detection for automatic failover.
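Heartbeat-based failover reduces to a simple rule: route to the most preferred machine whose last heartbeat is recent enough. A minimal sketch, assuming each machine's last heartbeat time is recorded in milliseconds and the list is ordered with the primary first; `node` and `pick_active` are hypothetical names.

```c
#include <stddef.h>

typedef struct {
    const char *name;
    long long last_heartbeat_ms;   /* when this node last reported in */
} node;

/* Returns the index of the first node whose heartbeat is at most
 * timeout_ms old, or -1 if every node looks dead. */
int pick_active(const node *nodes, size_t count,
                long long now_ms, long long timeout_ms)
{
    size_t i;

    if (nodes == NULL) {
        return -1;
    }
    for (i = 0; i < count; i++) {
        if (now_ms - nodes[i].last_heartbeat_ms <= timeout_ms) {
            return (int)i;
        }
    }
    return -1;
}
```

The timeout is a trade-off: a short value fails over quickly but may react to a brief network hiccup, while a long value keeps sending traffic to a dead machine for longer.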
5.2 Distrust of Data Centers
Use multi‑region or multi‑IDC deployment and maintain capacity redundancy (e.g., double the capacity for login services) to survive whole‑site outages.
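The "double the capacity" figure follows from simple arithmetic: if some sites can fail, the surviving sites must carry the whole peak load between them. The helper below is a hypothetical illustration of that calculation, assuming load is spread evenly across the surviving sites.

```c
/* Capacity each site must provision so that the remaining sites can still
 * carry peak_load after failed_sites of them go down. Rounds up.
 * Returns -1 if the inputs leave no surviving site or are negative. */
long long per_site_capacity(long long peak_load, int sites, int failed_sites)
{
    int survivors = sites - failed_sites;

    if (peak_load < 0 || failed_sites < 0 || survivors <= 0) {
        return -1;
    }
    return (peak_load + survivors - 1) / survivors;
}
```

With two sites and one tolerated outage, each site must handle the full peak, so total provisioned capacity is twice the peak. With three sites and one tolerated outage, each site needs half the peak, for a total of 1.5 times the peak.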
5.3 Distrust of Power
Persist data to local disk and back it up remotely so that it survives power loss.
5.4 Distrust of Network
Employ proximity routing (e.g., CMLB) and network probing to select optimal paths; automatically take unhealthy nodes out of rotation and keep routing to healthy ones.
5.5 Distrust of Humans
Record every operation, require reviews, verify effects in production, keep rollback plans, automate deployments, and perform consistency checks across machines and configurations.
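A consistency check across machines can be as simple as comparing a checksum of each machine's deployed configuration with the checksum of the intended release. The sketch below assumes each host reports a 64-bit FNV-1a checksum of its config file; `fnv1a64` and `count_config_drift` are hypothetical names, and how the checksums are collected is out of scope.

```c
#include <stddef.h>
#include <stdint.h>

/* 64-bit FNV-1a hash of a NUL-terminated string. */
uint64_t fnv1a64(const char *s)
{
    uint64_t h = 14695981039346656037ULL;   /* FNV offset basis */
    for (; *s != '\0'; s++) {
        h ^= (unsigned char)*s;
        h *= 1099511628211ULL;              /* FNV prime */
    }
    return h;
}

/* Counts hosts whose reported checksum differs from the expected one.
 * If first_bad is not NULL, it receives the index of the first
 * mismatching host, or -1 when all hosts agree. */
size_t count_config_drift(uint64_t expected, const uint64_t *reported,
                          size_t hosts, long *first_bad)
{
    size_t i;
    size_t drift = 0;

    if (first_bad != NULL) {
        *first_bad = -1;
    }
    if (reported == NULL) {
        return 0;
    }
    for (i = 0; i < hosts; i++) {
        if (reported[i] != expected) {
            if (drift == 0 && first_bad != NULL) {
                *first_bad = (long)i;
            }
            drift++;
        }
    }
    return drift;
}
```

Running such a check after every deployment turns "verify effects in production" into something automatic: a non-zero drift count means at least one machine did not receive the intended configuration.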
Note: The listed distrust strategies often need to be combined with other measures.
Conclusion
In the programming world, adhering to the "don't trust" principle and setting defenses everywhere is essential for building robust systems.
ITFLY8 Architecture Home
