
Why High Performance Makes Software Architecture So Complex—and How to Tame It

The article analyzes how the relentless pursuit of high performance drives both single‑machine and cluster‑level architectural complexity, explaining the evolution from batch processing to multi‑core CPUs, the trade‑offs of processes, threads, SMP/NUMA/MPP, and the challenges of task allocation and decomposition in large‑scale systems.

IT Architects Alliance

Nov 29, 2020


In this piece we explore how the quest for ever‑higher performance is a major source of software system complexity, covering both the internal complexity of a single machine and the added complexity of multi‑machine clusters.

Single‑Machine Complexity

Most single-machine complexity is concentrated in the operating system, which has evolved to exploit hardware advances, above all faster CPUs. Early computers had no OS; batch processing was introduced to remove the delays of manual input. A batch system, however, ran only one task at a time, so the CPU sat idle whenever that task waited on I/O.

Processes were introduced to represent independent tasks, each with its own memory space, which the OS schedules by time-slicing. A single CPU still executed only one task at any instant, but it switched between tasks quickly enough that users perceived them as running in parallel. Inter-process communication mechanisms (pipes, message queues, semaphores, shared memory) were added so that processes could coordinate.
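As a rough illustration of pipe-based inter-process communication, the sketch below starts a child process and exchanges data with it over its standard input and output pipes. The function name `ask_child` and the upper-casing child program are hypothetical, chosen only for this example.

```python
import subprocess
import sys

# Hypothetical child program: reads everything from its stdin pipe and
# writes the upper-cased text to its stdout pipe.
CHILD_SOURCE = "import sys; sys.stdout.write(sys.stdin.read().upper())"


def ask_child(text: str) -> str:
    """Send text to a separate process through a pipe and read its reply."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD_SOURCE],
        input=text,            # written to the child's stdin pipe
        capture_output=True,   # the child's stdout/stderr are read via pipes
        text=True,
        check=True,            # raise if the child exits with an error
    )
    return result.stdout
```

Calling `ask_child("hello")` runs the child in its own address space; the only data the two processes share is what travels through the pipes.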

Threads later emerged as lightweight sub-tasks within a process, sharing the same address space. Because threads share memory, mutual-exclusion locks are needed to keep concurrent access to shared data correct. On a single CPU, however, both processes and threads are still time-shared; true parallelism requires multiple CPUs.
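A minimal sketch of the mutual-exclusion idea, assuming several threads that all update one shared counter. The function name `count_with_lock` is hypothetical.

```python
import threading


def count_with_lock(num_threads: int, increments_per_thread: int) -> int:
    """Increment one shared counter from several threads, guarded by a mutex."""
    counter = 0
    lock = threading.Lock()

    def worker() -> None:
        nonlocal counter
        for _ in range(increments_per_thread):
            # The read-modify-write below is not atomic; the lock ensures
            # that only one thread performs it at a time.
            with lock:
                counter += 1

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter
```

With the lock in place, `count_with_lock(4, 10000)` returns 40000 on every run. In standard CPython builds the global interpreter lock also serializes bytecode execution, so this sketch demonstrates correctness under shared memory rather than a speed-up.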

Three main multi-CPU architectures provide genuine parallel execution: SMP (Symmetric Multi-Processor), NUMA (Non-Uniform Memory Access), and MPP (Massively Parallel Processing). SMP is the most common today, underpinning modern multi-core processors.

Cluster‑Level Complexity

Business growth often outpaces hardware improvements, forcing large‑scale services (e.g., Alipay’s 120 k transactions/sec, WeChat’s 760 k red‑packet events/sec) to adopt clusters of thousands of machines.

1. Task Allocation

When scaling from one server to many, a task allocator (hardware load balancer, software like LVS/Nginx/HAProxy, or a custom solution) must be introduced. This adds complexity in choosing the right device, configuring connections, and implementing allocation algorithms (round‑robin, weighted, load‑aware, etc.).
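The allocation algorithms named above can be sketched in a few lines. This is an illustrative model only, not how LVS, Nginx, or HAProxy implement them; the function names and the server names used with them are hypothetical, and the weighted variant is the naive form that simply repeats each server according to its weight.

```python
import itertools
from typing import Dict, Iterator, List


def round_robin(servers: List[str]) -> Iterator[str]:
    """Cycle through the servers in order, one request each."""
    return itertools.cycle(servers)


def weighted_round_robin(weights: Dict[str, int]) -> Iterator[str]:
    """Give each server a share of requests proportional to its weight."""
    # Naive expansion: a server with weight 2 appears twice per cycle.
    expanded = [name for name, weight in weights.items() for _ in range(weight)]
    return itertools.cycle(expanded)


def least_loaded(active_connections: Dict[str, int]) -> str:
    """Load-aware choice: pick the server with the fewest active connections."""
    return min(active_connections, key=active_connections.get)
```

Round-robin needs no state beyond a position in the list, while the load-aware variant requires the allocator to track per-server connection counts, which is part of the added complexity.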

As traffic grows, the allocator itself can become a bottleneck, requiring multiple allocators and a many‑to‑many network topology, further increasing state management and fault‑handling challenges.
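A simplified sketch of the many-to-many arrangement, assuming that any allocator can forward to any server and that a set of node names marked as down stands in for real health checking. The `route` function and its parameters are hypothetical.

```python
from typing import List, Set, Tuple


def route(request_id: int,
          allocators: List[str],
          servers: List[str],
          down: Set[str]) -> Tuple[str, str]:
    """Return the (allocator, server) pair that handles a request.

    Any allocator may forward to any server (many-to-many). Nodes listed
    in `down` are skipped, a stand-in for real health checking.
    """
    live_allocators = [a for a in allocators if a not in down]
    live_servers = [s for s in servers if s not in down]
    if not live_allocators or not live_servers:
        raise RuntimeError("no healthy allocator or server available")
    # Spread requests across allocators first, then across servers.
    allocator = live_allocators[request_id % len(live_allocators)]
    server = live_servers[(request_id // len(live_allocators)) % len(live_servers)]
    return allocator, server
```

Even in this toy form, the routing decision depends on the health of two tiers at once, which hints at why state management and fault handling grow harder as allocators multiply.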

2. Task Decomposition

Beyond adding machines, breaking a monolithic service into smaller subsystems (e.g., WeChat’s separate modules for registration, messaging, LBS, etc.) allows targeted scaling. Simpler subsystems are easier to optimise, and bottlenecks can be addressed without touching the entire codebase.

However, over‑decomposition hurts performance: each additional subsystem introduces extra network calls. A simple model shows that splitting a service into 100 parts can raise the per‑request latency from 51 ms to 149 ms due to the cumulative network round‑trips.
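The 51 ms and 149 ms figures are consistent with a simple linear model. The parameters below are assumptions, not values stated in this text: 50 ms of business processing per request, 1 ms per network hop between subsystems, and subsystems called in a chain so that n subsystems need n - 1 hops. Under these assumptions, 51 ms corresponds to a split into two subsystems and 149 ms to a split into one hundred.

```python
def request_latency_ms(subsystems: int,
                       processing_ms: float = 50.0,
                       hop_ms: float = 1.0) -> float:
    """Estimate per-request latency for a chain of subsystems.

    Assumed model: fixed processing time plus one network hop for every
    boundary between adjacent subsystems (subsystems - 1 hops in total).
    """
    if subsystems < 1:
        raise ValueError("at least one subsystem is required")
    return processing_ms + (subsystems - 1) * hop_ms
```

In this model the processing time stays constant while the network term grows linearly with the number of subsystems, so beyond some granularity the hops dominate the total.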

The key is to find a balanced granularity—enough to isolate hot spots, but not so fine‑grained that inter‑service communication dominates.

Conclusion

High‑performance requirements increase architectural complexity on two fronts: inside a single machine (processes, threads, multi‑CPU designs) and across clusters (task allocation, task decomposition). Effective architecture design must weigh these trade‑offs and choose appropriate solutions for the specific business context.


