Logical Coupling, Service Layer Design, and Distributed System Architecture for Large-Scale Web Applications
The article examines the inevitability of service coupling in large‑scale web applications and proposes a two‑dimensional architecture that separates business and logic layers, uses internal data stores, introduces a naming‑and‑location service, selects appropriate transport and RPC protocols, and automates operations with health checks, load balancing, and failover to achieve continuous reliability.
This article discusses the practical challenges of service coupling in large‑scale web systems and proposes architectural strategies to manage them.
It begins by revisiting the business‑level coupling problem introduced in earlier stages, illustrating how seemingly independent services such as blog and image still rely on each other through URLs, and how user‑authentication services introduce tighter coupling that cannot be solved by simple HTTP calls.
The author argues that while design patterns and frameworks (MVC, Struts, Spring, Hibernate) improve code reuse, coupling is inevitable in real deployments. Services are classified into two groups:
Product‑related services (e.g., blog, news) that usually communicate via lightweight HTTP interfaces.
Infrastructure services (e.g., account management, messaging) that must provide secure, stable, and generic APIs and often evolve from a business service to a shared foundation.
To reduce coupling, the system is divided vertically (by business) and horizontally (by logic). The original article illustrates this two‑dimensional partition with a diagram.
Data‑access and storage layers are kept internal and not exposed externally, while the logic and view layers serve as the primary entry points, supporting both HTTP and non‑HTTP protocols.
In the Data Storage Optimization stage, the author reviews relational databases (MySQL, PostgreSQL, Oracle, etc.) and NoSQL solutions (Cassandra, MongoDB, Redis), emphasizing that most operations reduce to CRUD (select, insert, delete, update). Optimizations are described for workloads with skewed read/write patterns, such as micro‑blogging platforms that prioritize insert and query performance.
The article then examines disk I/O characteristics, contrasting write‑ordered storage (high write efficiency, lower read efficiency) with query‑ordered storage (high read efficiency, lower write efficiency), and suggests hybrid approaches to balance the two.
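The trade-off above can be sketched in a few lines of Java. This is only an in-memory illustration under stated assumptions, not the article's storage engine: the class `HybridStore`, its `flushThreshold` parameter, and the use of a list and a `TreeMap` as stand-ins for on-disk structures are all hypothetical. Writes are appended to a write-ordered buffer, and the buffer is periodically merged into a query-ordered (sorted) index.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical hybrid store: a write-ordered buffer in front of a query-ordered index. */
final class HybridStore {
    // Write-ordered part: appends are cheap, lookups must scan.
    private final List<Map.Entry<String, String>> buffer = new ArrayList<>();
    // Query-ordered part: kept sorted by key, so lookups are O(log n).
    private final TreeMap<String, String> sorted = new TreeMap<>();
    private final int flushThreshold;

    HybridStore(int flushThreshold) {
        this.flushThreshold = flushThreshold;
    }

    void put(String key, String value) {
        buffer.add(Map.entry(key, value)); // sequential append, no reordering
        if (buffer.size() >= flushThreshold) {
            flush();
        }
    }

    String get(String key) {
        // Newest writes live in the buffer; scan backwards so the latest value wins.
        for (int i = buffer.size() - 1; i >= 0; i--) {
            if (buffer.get(i).getKey().equals(key)) {
                return buffer.get(i).getValue();
            }
        }
        return sorted.get(key);
    }

    /** Merge buffered writes into the sorted index in arrival order. */
    void flush() {
        for (Map.Entry<String, String> e : buffer) {
            sorted.put(e.getKey(), e.getValue());
        }
        buffer.clear();
    }

    int bufferedWrites() {
        return buffer.size();
    }
}
```

A small threshold keeps reads fast because the buffer scan stays short. A large threshold favors write bursts at the cost of slower lookups for recently written keys.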
Next, the Naming & Location Service (NLS) is introduced as a meta‑server that registers service endpoints (IP, port, protocol) and provides health checking, load balancing, and discovery. The article shows several ways to obtain a service instance, ranging from direct instantiation to lookup through the naming service:
    // Direct instantiation: the caller is bound to a concrete class.
    Resource r1 = new ImageResource();
    // Factory lookup by name.
    Resource r2 = ResourceFactory.get("Image");
    // Container-managed instance.
    Resource r3 = (Resource) Container.getInstance("ImageResource");
    // Lookup through the naming service.
    Service s = (Service) NamingService.getService("Image");
Implementation considerations include single‑node vs. multi‑node deployment, availability, consistency, and generic key‑value storage for metadata.
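To make the registry idea concrete, here is a minimal single-node sketch of a naming service in Java. It is an assumption-laden illustration, not the article's implementation: `Endpoint`, `SimpleNamingService`, `register`, and `lookup` are hypothetical names, health is a plain flag that some external checker would set, and load balancing is simple round-robin over healthy endpoints.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.atomic.AtomicInteger;

/** One registered endpoint: (host, port, protocol) plus a health flag. */
final class Endpoint {
    final String host;
    final int port;
    final String protocol;
    volatile boolean healthy = true; // assumed to be updated by a separate health checker

    Endpoint(String host, int port, String protocol) {
        this.host = host;
        this.port = port;
        this.protocol = protocol;
    }

    @Override
    public String toString() {
        return protocol + "://" + host + ":" + port;
    }
}

/** Hypothetical in-memory registry: service name -> endpoints, round-robin lookup. */
final class SimpleNamingService {
    private final Map<String, List<Endpoint>> registry = new ConcurrentHashMap<>();
    private final AtomicInteger counter = new AtomicInteger();

    void register(String service, Endpoint endpoint) {
        registry.computeIfAbsent(service, k -> new CopyOnWriteArrayList<>()).add(endpoint);
    }

    /** Returns the next healthy endpoint for the service, skipping unhealthy ones. */
    Endpoint lookup(String service) {
        List<Endpoint> healthy = new ArrayList<>();
        for (Endpoint e : registry.getOrDefault(service, List.of())) {
            if (e.healthy) {
                healthy.add(e);
            }
        }
        if (healthy.isEmpty()) {
            throw new IllegalStateException("no healthy endpoint for " + service);
        }
        int i = Math.floorMod(counter.getAndIncrement(), healthy.size());
        return healthy.get(i);
    }
}
```

A multi-node deployment would additionally need to replicate the registry and decide how consistent lookups must be, which is where the availability and consistency considerations come in.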
The article proceeds to discuss Transport Protocols, APIs, and Remote Procedure Calls (RPC). It compares binary protocols (e.g., protobuf, Java serialization) with text‑based protocols (XML, JSON), and outlines how to combine existing protocols such as HTTP+JSON for extensibility.
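The difference between binary and text encodings can be seen by serializing the same small record both ways. The sketch below is illustrative only: `UserCodec` and its fields are hypothetical, the binary form uses the JDK's `DataOutputStream` rather than protobuf, and the JSON is assembled by hand with escaping for quotes and backslashes only, which a real JSON library would handle fully.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

/** Hypothetical codec that encodes an (id, name) record in binary and in JSON. */
final class UserCodec {
    /** Binary form: 4-byte int followed by a length-prefixed string. */
    static byte[] toBinary(int id, String name) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeInt(id);   // 4 bytes, big-endian
            out.writeUTF(name); // 2-byte length prefix + modified UTF-8 bytes
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    /** Reads the name back; the reader must know the field order in advance. */
    static String nameFromBinary(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            in.readInt(); // skip the id field
            return in.readUTF();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Text form: self-describing field names, larger on the wire. */
    static byte[] toJson(int id, String name) {
        // Minimal escaping only; control characters are not handled in this sketch.
        String escaped = name.replace("\\", "\\\\").replace("\"", "\\\"");
        String json = "{\"id\":" + id + ",\"name\":\"" + escaped + "\"}";
        return json.getBytes(StandardCharsets.UTF_8);
    }
}
```

For the record `(42, "alice")`, the binary form takes 11 bytes and the JSON form 24. The JSON carries its field names with it, which is what makes it easier to inspect and extend.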
Interface description languages (IDLs) are presented as a way to generate documentation and client/server stubs, illustrated with diagrams of stub/skeleton generation.
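The stub/skeleton split can be approximated in plain Java without a code generator. In the sketch below, which is an assumption rather than the article's tooling, a hand-written interface stands in for the IDL output, `java.lang.reflect.Proxy` plays the client stub, and a reflective dispatcher plays the server skeleton. `ImageService`, `Transport`, `Skeleton`, and `Stubs` are hypothetical names, and the "wire" is an in-process method call instead of a network hop.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;

/** Service contract; in a real system this would be generated from an IDL file. */
interface ImageService {
    String urlFor(String imageId);
}

/** Server-side implementation of the contract. */
final class ImageServiceImpl implements ImageService {
    public String urlFor(String imageId) {
        return "http://img.example.invalid/" + imageId; // placeholder host
    }
}

/** Hypothetical wire: carries a method name and its arguments. */
interface Transport {
    Object send(String methodName, Object[] args) throws Exception;
}

/** Skeleton: unpacks a request and dispatches it to the real implementation. */
final class Skeleton implements Transport {
    private final Object target;

    Skeleton(Object target) {
        this.target = target;
    }

    public Object send(String methodName, Object[] args) throws Exception {
        // Matches on name and arity only; overloads are not distinguished in this sketch.
        for (Method m : target.getClass().getMethods()) {
            if (m.getName().equals(methodName) && m.getParameterCount() == args.length) {
                return m.invoke(target, args);
            }
        }
        throw new NoSuchMethodException(methodName);
    }
}

/** Stub factory: returns an object that looks local but forwards every call. */
final class Stubs {
    static <T> T create(Class<T> iface, Transport transport) {
        Object proxy = Proxy.newProxyInstance(
                iface.getClassLoader(),
                new Class<?>[] {iface},
                (self, method, args) ->
                        transport.send(method.getName(), args == null ? new Object[0] : args));
        return iface.cast(proxy);
    }
}
```

A caller would write `ImageService images = Stubs.create(ImageService.class, new Skeleton(new ImageServiceImpl()));` and then call `images.urlFor("42")` as if the service were local. Replacing `Skeleton` with a network-backed `Transport` would not change the calling code.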
Finally, the Operations Design for Distributed Systems is covered. The author stresses that manual management does not scale to thousands of machines and proposes an automated management model consisting of:
Access layer (Web/Private Protocol servers) registering with NLS.
Logic layer (containerized processing units) also registering with NLS and handling stateless HTTP requests.
Storage layer with heterogeneous engines (relational, NoSQL, KV) hidden behind unified interfaces.
Redundancy, health checks, load balancing, and automatic failover are emphasized to achieve 7×24 reliability.
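Automatic failover across redundant instances can be sketched as a loop that tries each replica in turn. This is a simplified illustration under assumptions: `FailoverClient` and `callWithFailover` are hypothetical names, replicas are identified by plain strings, and any `RuntimeException` is treated as a failed replica. A production client would also bound retries and report failures back to the health checker.

```java
import java.util.List;
import java.util.function.Function;

/** Hypothetical client helper that fails over to the next replica on error. */
final class FailoverClient {
    static <R> R callWithFailover(List<String> replicas, Function<String, R> call) {
        RuntimeException last = null;
        for (String replica : replicas) {
            try {
                return call.apply(replica); // first successful replica wins
            } catch (RuntimeException e) {
                last = e; // remember the failure and move on to the next replica
            }
        }
        throw new IllegalStateException("all replicas failed", last);
    }
}
```

Because the logic layer handles stateless requests, retrying the same request against another instance is safe in this model. Stateful operations would need idempotency guarantees before being retried this way.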
Overall, the article provides a comprehensive guide to designing, optimizing, and operating large‑scale backend systems.
Baidu Tech Salon
Baidu Tech Salon, organized by Baidu's Technology Management Department, is a monthly offline event that shares cutting‑edge tech trends from Baidu and the industry, providing a free platform for mid‑to‑senior engineers to exchange ideas.