Designing Scalable and Highly Available Systems on Azure: Patterns, Anti‑Patterns, and Practical Guidance
This article examines key considerations for building highly scalable and available systems on Azure, outlining four architectural dimensions—scalability, availability, manageability, and feasibility—while discussing patterns, anti‑patterns, measurable resources, queue‑based load balancing, authentication services, and common pitfalls such as configuration errors and SQL injection.
When designing a system for high scalability and availability, the most consequential decisions are architectural. Drawing on its experience with Azure customers, Microsoft gave a talk on the patterns and anti‑patterns it has observed and how they affect four architectural aspects: scalability, availability, manageability, and feasibility.
Scalability
Scalability comes from two angles: resources and density. Resources refer to adding more hardware (e.g., an extra web server behind a load balancer) while density is the efficiency of using existing capacity; traditional performance tuning can greatly increase density.
Lighting Money on Fire
A recurring theme of the talk was “lighting money on fire,” meaning doing work that is known to be inefficient for no good reason, such as using NAT instead of a proper load balancer or choosing XML as an internal data‑exchange format.
Measurable Resources
Some resources must be carefully monitored; database connections are one such measurable resource, and overusing them reduces density. In Azure SQL Standard, each database allows only 180 connections, while the default ADO.NET maximum pool size is 100. Two web servers that leak connections can therefore hold up to 200 connections between them and exceed the limit. Other measurable resources include authentication servers and third‑party services, often called “invisible resources” because developers tend to overlook them.
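The connection arithmetic above can be checked with a few lines of Python. This is only a sketch: the function names and the idea of reserving some connections for other clients are assumptions for illustration, while the 180 and 100 figures are the ones quoted in the article.

```python
def worst_case_connections(servers: int, pool_size: int) -> int:
    """Connections held if every server fills its pool, for example through leaks."""
    return servers * pool_size


def safe_pool_size(db_limit: int, servers: int, reserved: int = 0) -> int:
    """Largest per-server pool size that keeps the total within db_limit.

    `reserved` (hypothetical) sets aside connections for admin tools or jobs.
    """
    if servers <= 0:
        raise ValueError("servers must be positive")
    return max((db_limit - reserved) // servers, 0)


DB_LIMIT = 180      # per-database limit quoted in the article
DEFAULT_POOL = 100  # default ADO.NET maximum pool size quoted in the article

print(worst_case_connections(2, DEFAULT_POOL))           # 200, above the limit
print(safe_pool_size(DB_LIMIT, servers=2, reserved=20))  # 80
```

Capping each pool this way does not fix a leak, but it keeps one leaking server from starving the others.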
Load Balancing via Queues
Peak write loads can be mitigated by using a queue to trade latency for availability. New data is placed in a queue monitored by a background process, smoothing the load on the database and allowing the database to be continuously utilized rather than alternating between busy and idle periods.
Queues also enable batch processing, which is far faster than inserting records one‑by‑one, and they add a decoupling point: if the background process or database fails, the front‑end can still accept new data.
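A minimal sketch of this queue-plus-batching idea follows, using Python's in-process queue and SQLite as stand-ins for a durable queue service and the real database. The table name, column names, and batch size are assumptions made for the example.

```python
import queue
import sqlite3


def drain_batch(q: queue.Queue, max_batch: int) -> list:
    """Take up to max_batch items from the queue without blocking."""
    batch = []
    while len(batch) < max_batch:
        try:
            batch.append(q.get_nowait())
        except queue.Empty:
            break
    return batch


def write_batch(conn: sqlite3.Connection, batch: list) -> int:
    """Insert the whole batch in one transaction; returns rows written."""
    if not batch:
        return 0
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT INTO events (user_id, payload) VALUES (?, ?)", batch
        )
    return len(batch)


def run_worker_once(q: queue.Queue, conn: sqlite3.Connection,
                    max_batch: int = 100) -> int:
    """One pass of the background process: drain a batch, then write it.

    With a durable queue, messages would be deleted only after the write
    succeeds; this in-memory stand-in drops them if the write fails.
    """
    return write_batch(conn, drain_batch(q, max_batch))


# The front end only enqueues; the worker writes at its own pace.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user_id INTEGER, payload TEXT)")
q = queue.Queue()
for i in range(250):
    q.put((i, f"event-{i}"))
while not q.empty():
    run_worker_once(q, conn, max_batch=100)
```

The 250 queued records reach the database in three batched transactions instead of 250 single inserts.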
Improving Queue Availability
If too many messages arrive simultaneously, a secondary queue can absorb the overflow. Applications should be designed to support multiple queues even if initially only one is deployed. When a message exceeds the size the queue can handle, it can be stored in blob storage and the queue message replaced with a pointer to that blob.
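The blob-pointer technique might look like the sketch below. The list and dict are in-memory stand-ins for a real queue and blob container, and the size limit, message envelope, and function names are all assumptions for illustration rather than any Azure SDK API.

```python
import json
import uuid

MAX_MESSAGE_BYTES = 64 * 1024  # assumed queue message limit, for illustration


def enqueue(queue_items: list, blob_store: dict, payload: bytes,
            max_bytes: int = MAX_MESSAGE_BYTES) -> None:
    """Queue small payloads inline; park large ones in blob storage."""
    if len(payload) <= max_bytes:
        # Assumes text payloads. The limit is checked against the raw payload;
        # a real implementation would measure the encoded message instead.
        message = {"kind": "inline", "body": payload.decode("utf-8")}
    else:
        blob_id = str(uuid.uuid4())
        blob_store[blob_id] = payload
        message = {"kind": "blob_ref", "blob_id": blob_id}
    queue_items.append(json.dumps(message))


def dequeue(queue_items: list, blob_store: dict) -> bytes:
    """Return the original payload, following the blob pointer if present."""
    message = json.loads(queue_items.pop(0))
    if message["kind"] == "inline":
        return message["body"].encode("utf-8")
    return blob_store.pop(message["blob_id"])
```

The consumer never needs to know which path a message took, so the size limit can change without touching downstream code.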
Web‑Server Availability
To keep web servers available, all downstream calls must be asynchronous and bounded by timeouts and concurrency limits. Ignoring concurrency can cause failures, as illustrated by a two‑hour outage of Visual Studio Online caused by an overload of an external authentication server.
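One way to bound downstream calls is sketched here with Python's asyncio, combining a semaphore for the concurrency limit with a per-call timeout. The helper name `call_bounded` and the limits used in the demo are assumptions for illustration.

```python
import asyncio


async def call_bounded(semaphore: asyncio.Semaphore, make_call,
                       timeout_seconds: float):
    """Run one downstream call under a concurrency limit and a timeout.

    `make_call` is a zero-argument function returning a coroutine. The timeout
    covers only the call itself, not the wait for a free semaphore slot.
    """
    async with semaphore:
        return await asyncio.wait_for(make_call(), timeout=timeout_seconds)


async def demo() -> list:
    semaphore = asyncio.Semaphore(2)  # at most two calls in flight

    async def fast():
        await asyncio.sleep(0.01)
        return "ok"

    async def slow():  # simulates an overloaded dependency
        await asyncio.sleep(1)
        return "late"

    # return_exceptions=True keeps one timeout from cancelling the other call.
    return await asyncio.gather(
        call_bounded(semaphore, fast, 0.1),
        call_bounded(semaphore, slow, 0.1),
        return_exceptions=True,
    )


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

The slow call fails with a timeout after 0.1 seconds and releases its slot, so a struggling dependency cannot tie up every request on the web server.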
Authentication Service
When an authentication server fails, it should be replaceable by another stable service; Microsoft therefore strongly recommends using federated authentication servers.
Recording Erroneous Data
Developers often validate data but are unsure what to do on failure. Instead of discarding data, the original payload should be logged so developers can diagnose why a request was erroneous. Most bad requests stem from version mismatches between client and server.
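As a rough sketch of logging the original payload on validation failure: the `order_id` field, the logger name, and the `client_version` argument are hypothetical, chosen only to make the example concrete.

```python
import json
import logging

logger = logging.getLogger("ingest")


def parse_order(raw: bytes, client_version: str) -> dict:
    """Validate a request body and log the raw payload if it is rejected."""
    try:
        order = json.loads(raw)
        if not isinstance(order, dict) or "order_id" not in order:
            raise ValueError("missing order_id")
        return order
    except ValueError as exc:  # also covers malformed JSON and bad UTF-8
        # Keep the bytes exactly as received, plus the client version,
        # since version mismatches are the usual cause of bad requests.
        logger.warning(
            "rejected payload: reason=%s client_version=%s raw=%r",
            exc, client_version, raw,
        )
        raise
```

The exception still propagates to the caller; the log entry simply ensures the evidence is not lost along the way.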
Anti‑Pattern: Configuration
Hard‑coded connection strings and configuration data still turn up in reviews of customer code; as soon as the configuration must change to point at different hardware, this becomes a real problem.
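A simple alternative to hard-coding is to read the value from the environment at startup, as in this sketch. The variable name `APP_DB_CONNECTION_STRING` is an assumption for illustration; a platform configuration service would serve the same purpose.

```python
import os


def get_connection_string(env=None) -> str:
    """Read the connection string from the environment instead of the code.

    `env` defaults to os.environ; passing a dict makes the function testable.
    """
    env = os.environ if env is None else env
    value = env.get("APP_DB_CONNECTION_STRING")
    if not value:
        # Fail fast at startup rather than silently using a stale default.
        raise RuntimeError("APP_DB_CONNECTION_STRING is not set")
    return value
```

Pointing the application at different hardware then becomes a deployment setting rather than a code change.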
Anti‑Pattern: Assuming Database Reliability
Modern developers often assume databases are always reachable and rarely code for failures; when they do, they frequently mishandle exceptions, leading to data loss.
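Coding for database failure can start with something as small as the retry helper sketched below. `TransientDatabaseError` is a hypothetical exception standing in for whatever transient errors a real driver raises, and the attempt count and delays are illustrative defaults.

```python
import time


class TransientDatabaseError(Exception):
    """Hypothetical marker for failures worth retrying (timeouts, failovers)."""


def with_retries(operation, attempts: int = 3, base_delay: float = 0.1,
                 sleep=time.sleep):
    """Call operation(), retrying transient failures with exponential backoff.

    The final failure is re-raised rather than swallowed, so the caller can
    still preserve the data (for example, by queueing it) instead of losing it.
    """
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except TransientDatabaseError:
            if attempt == attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))
```

Only the marked transient error is retried; any other exception propagates immediately, which avoids masking real bugs.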
Anti‑Pattern: SQL Injection
SQL injection remains common; even simple web requests are often enough to reveal obvious injection vulnerabilities.
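The standard defense is a parameterized query, sketched here with Python's built-in sqlite3 module. The `users` table and the `find_user` helper are assumptions for illustration.

```python
import sqlite3


def find_user(conn: sqlite3.Connection, name: str) -> list:
    """Look up users by name with a parameterized query.

    The driver passes `name` as data, so quotes inside it cannot change
    the structure of the SQL statement.
    """
    cursor = conn.execute("SELECT id, name FROM users WHERE name = ?", (name,))
    return cursor.fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])

print(find_user(conn, "alice"))        # [(1, 'alice')]
print(find_user(conn, "' OR '1'='1"))  # [] because the input is treated as a literal name
```

Had the query been built by string concatenation, the second input would have matched every row.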
Anti‑Pattern: Logging to the Wrong Resource
Logging infrastructure should be isolated from the application stack. Writing logs to the same database as product data means a database loss also loses logs.
Anti‑Pattern: Re‑throwing Exceptions
Two mistakes are common. The first is writing throw ex; instead of throw;: throw ex; resets the stack trace to the point of the re‑throw, whereas throw; preserves the original one. The second is re‑throwing exceptions when no higher‑level handler exists, which crashes the entire application in .NET 2.0 and later.
