Methodology for Internet Architecture Technical Review and Capacity/Performance Evaluation
This article presents a methodology for reviewing internet‑scale system architectures. It focuses on non‑functional quality attributes such as performance, availability, scalability, security, and maintainability, and provides review guidelines, metrics tables, and a classic capacity and performance planning case study.
1 Background
In the IT industry, fundamental technical skills are akin to the "inner kung fu" of Shaolin, while frameworks represent the "sword techniques". Enterprise‑level development emphasizes complex business logic and high reusability, whereas internet development focuses on decomposing responsibilities and optimizing non‑functional qualities like high availability, performance, scalability, security, stability, and maintainability.
This article offers a basic methodology for internet‑oriented technical reviews, helping developers and architects evaluate how well a system meets functional and non‑functional requirements.
2 Goals
2.1 Overview of Non‑Functional Quality Requirements
Reference technical review indicators to ensure system architecture satisfies user and system non‑functional demands.
Core Non‑Functional Qualities:
| Core Quality | Description |
| --- | --- |
| High Performance | High efficiency and cost‑effectiveness |
| Availability | Continuous availability, reduced downtime, error recovery, reliability |
| Scalability | Vertical and horizontal scaling |
| Extensibility | Pluggable design, component reuse |
| Security | Data security, encryption, circuit breaking, attack resistance |
Other Non‑Functional Qualities:
| Other Quality | Description |
| --- | --- |
| Observability | Fast detection, location, and resolution of problems |
| Testability | Canary releases, previews, mocks, decomposition |
| Robustness | Fault tolerance, recoverability |
| Maintainability | Easy to maintain, monitor, operate, and expand |
| Reusability | Portability, decoupling |
| Usability | Operability |
2.2 Specific Indicators for Non‑Functional Requirements
The indicators are divided into four parts: application servers, databases, caches, and message queues.
2.2.1 Application Server
The application server is the entry point; its traffic determines the load on databases, caches, and queues. Key metrics include peak requests per second and response time.
Consider the following metrics:
| # | Deployment Structure | Capacity & Performance | Other |
| --- | --- | --- | --- |
| 1 | Load‑balancing strategy | Daily request volume | Whether requests contain large objects |
| 2 | High‑availability strategy | Peak per‑interface traffic | GC collector selection and configuration |
| 3 | I/O model (NIO/BIO) | Average response time | |
| 4 | Thread‑pool model | Maximum response time | |
| 5 | Thread‑pool size | Concurrent users | |
| 6 | Mixed business deployment | Request size | |
| 7 | | Network card I/O traffic | |
| 8 | | Disk I/O load | |
| 9 | | Memory usage | |
| 10 | | CPU usage | |
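To make the capacity column concrete, the peak request rate and thread‑pool size can be estimated with a short script. This is a minimal sketch: the 80/20 traffic split, the 100 million daily requests, and the 50 ms response time are illustrative assumptions, not figures from this article.

```python
import math

def peak_qps(daily_requests: int, peak_ratio: float = 0.8,
             peak_hours: float = 4.8) -> float:
    """Estimate peak QPS assuming `peak_ratio` of the daily traffic
    arrives within `peak_hours` (the common 80/20 heuristic)."""
    return daily_requests * peak_ratio / (peak_hours * 3600)

def thread_pool_size(qps: float, avg_rt_ms: float) -> int:
    """Little's law: in-flight requests = arrival rate x response time."""
    return math.ceil(qps * avg_rt_ms / 1000)

qps = peak_qps(100_000_000)        # hypothetical 100 M requests/day
print(round(qps))                  # ≈ 4630 requests/s at peak
print(thread_pool_size(qps, 50))   # ≈ 232 in-flight requests at 50 ms RT
```

Little's law gives a lower bound on worker threads; real pools are usually padded for GC pauses and slow downstream calls.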
2.2.2 Database
Based on application traffic, calculate required QPS, TPS, and daily data volume to size the database.
Consider the following metrics:
| # | Deployment Structure | Capacity & Performance | Other |
| --- | --- | --- | --- |
| 1 | Replication model | Current data volume | Whether queries use indexes |
| 2 | Failover strategy | Daily data growth (estimated) | Presence of large‑data queries |
| 3 | Disaster‑recovery strategy | Read peak per second | Multi‑table joins and index usage |
| 4 | Archiving strategy | Write peak per second | Pessimistic vs. optimistic locking, row‑level locks |
| 5 | Read‑write separation | Transaction volume | Transaction consistency level |
| 6 | Sharding strategy | | JDBC datasource type, connection count |
| 7 | Cache static/semi‑static data | | Enable JDBC diagnostic logging |
| 8 | Cache penetration protection | | Stored‑procedure usage |
| 9 | Cache invalidation & warm‑up | | Sharding strategy for partitioned tables |
| 10 | | | Implementation method for horizontal sharding (client, proxy, NoSQL) |
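The database sizing arithmetic can be sketched as follows, using the reference standards given later in this article (700 TPS per write port, 50 million rows per table, 5× redundancy). The input figures in the example call are illustrative, and rounding the table count to a power of two is a common convention for stable hash routing, not a rule stated in the article.

```python
import math

def shard_tables(total_rows: int, redundancy: int = 5,
                 rows_per_table: int = 50_000_000) -> int:
    """Shard-table count, rounded up to the next power of two so
    hash-based routing stays stable when scaling out."""
    needed = math.ceil(total_rows * redundancy / rows_per_table)
    return 1 << (needed - 1).bit_length()

def write_ports(peak_tps: float, redundancy: int = 5,
                tps_per_port: int = 700) -> int:
    """MySQL master instances ('ports') needed for the write peak."""
    return math.ceil(peak_tps * redundancy / tps_per_port)

print(shard_tables(3_500_000_000))   # 512 tables for 3.5 B base rows
print(write_ports(400))              # 3 write ports for a 400 TPS peak
```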
2.2.3 Cache
Evaluate cache size and access peaks based on hot data proportion.
Consider the following metrics:
| # | Deployment Structure | Capacity & Performance | Other |
| --- | --- | --- | --- |
| 1 | Replication model | Cache size | Cold‑hot data ratio |
| 2 | Failover | Number of cached items | Possibility of cache penetration |
| 3 | Persistence strategy | Expiration time | Presence of large objects |
| 4 | Eviction strategy | Data structure | Use of cache for distributed locks |
| 5 | Thread model | Read peak per second | Support for cache scripting |
| 6 | Warm‑up method | Write peak per second | Avoidance of race conditions |
| 7 | Sharding hash strategy | | Cache sharding method (client, proxy, cluster) |
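Cache sizing follows the same pattern. This sketch uses the 32 GB‑per‑Redis‑port figure from the reference standards later in the article; the 20% hot‑data ratio, the item count, and the 1 KB item size in the example are assumptions for illustration.

```python
import math

def redis_ports(total_items: int, avg_item_bytes: int,
                hot_ratio: float = 0.2, redundancy: int = 5,
                gb_per_port: int = 32) -> int:
    """Redis instances needed to hold the hot subset of the data set,
    assuming only `hot_ratio` of the items are worth caching."""
    hot_bytes = total_items * hot_ratio * avg_item_bytes * redundancy
    return max(1, math.ceil(hot_bytes / (gb_per_port * 2**30)))

# hypothetical: 1 billion items of ~1 KB each, 20% of them hot
print(redis_ports(1_000_000_000, 1024))   # 30 ports
```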
2.2.4 Message Queue
Calculate required message‑queue capacity and throughput based on application traffic.
Consider the following metrics:
| # | Deployment Structure | Capacity & Performance | Other |
| --- | --- | --- | --- |
| 1 | Replication model | Daily data increment | Consumer thread‑pool model |
| 2 | Failover | Message expiration | Sharding strategy |
| 3 | Persistence strategy | Read peak per second | Reliable delivery |
| 4 | | Write peak per second | |
| 5 | | Message size | |
| 6 | | Average latency | |
| 7 | | Maximum latency | |
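Message‑queue node counts can be estimated the same way. This sketch sizes by the write peak, using the Kafka throughput figures from the reference standards later in the article (5 000 write TPS and 30 000 read QPS per node); the example peak is an assumed input.

```python
import math

def kafka_nodes(write_peak_tps: float, redundancy: int = 5,
                write_tps_per_node: int = 5_000) -> int:
    """Kafka nodes sized by the write peak; with these standards writes
    (5 000 TPS/node) saturate before reads (30 000 QPS/node) do."""
    return max(1, math.ceil(write_peak_tps * redundancy / write_tps_per_node))

print(kafka_nodes(1_000))   # 1 node covers a 1 000 TPS write peak
print(kafka_nodes(2_000))   # 2 nodes for a 2 000 TPS write peak
```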
3 Technical Review Outline
The outline helps architects organize thoughts and produce an implementable design.
3.1 Current Situation
Business Background
Project name
Business description
Technical Background
Architecture description
Current system capacity (average calls)
Current peak calls
3.2 Requirements
Business Requirements
Items to be refactored
New functional requirements
Performance Requirements
Estimated average system load
Estimated peak load
Other non‑functional qualities (e.g., security, scalability)
3.3 Solution Description
Solution 1
The solution must consider all metrics from the technical review checklist to satisfy non‑functional quality demands.
Overview – one‑sentence highlight (e.g., dual‑write, master‑slave, sharding, scaling, archiving)
Detailed description – include diagrams if needed (middleware architecture, logical architecture, data architecture, fault handling, disaster recovery, gray‑release)
Performance evaluation – baseline data and resource estimation
Pros and cons – quantified advantages and disadvantages
Solution 2
Similar structure, tailored to alternative trade‑offs.
3.4 Solution Comparison
Compare alternatives and justify the chosen one.
3.5 Risk Assessment
Identify risks and propose mitigation or rollback strategies.
3.6 Workload Estimation
Detail tasks for development, testing, and deployment; present a simple task‑plan table.
4 Classic Capacity & Performance Case Study
4.1 Background
The logistics system has two priority requirements: maintaining members' frequently used addresses, and asynchronously generating logistics orders while polling third‑party systems for status updates.
4.2 Target Data Volume
Use a leading e‑commerce platform as the reference: 200 million members (growing by 50 000/day) and 14 million orders/day during promotions.
4.3 Evaluation Standards
General Standards
Capacity calculated with 5× redundancy.
Address data retained for 30 years; logistics orders for 3 years.
Third‑party query interface: 5 000 QPS.
MySQL
Read: 1 000 QPS per port.
Write: 700 TPS per port.
Single table capacity: 50 million rows.
Redis
Read: 40 000 QPS per port.
Write: 40 000 TPS per port.
Memory per port: 32 GB.
Kafka
Read: 30 000 QPS per node.
Write: 5 000 TPS per node.
Application Server
Peak request rate: 5 000 QPS.
4.4 Solution
Solution 1 – Maximum Performance
Designed for peak traffic of a top‑tier e‑commerce site.
Requirement 1 – Member Frequent Addresses
Read QPS: (14 M × 0.5) / 2 h ≈ 1 000/s; with 5× redundancy → 5 000 QPS, requiring 5 read ports.
Write TPS: (14 M × 0.2 + 50 000) / 2 h ≈ 400/s; with 5× redundancy → 2 000 TPS, requiring 3 write ports.
Data volume: (200 M + 50 000 × 365 × 30) × 5 ≈ 3.5 billion rows; with 5× redundancy → 17.5 billion rows, fitting into 350 tables (rounded up to 512).
Design result: 4 ports × 32 databases × 4 tables per DB, one master with 8 slaves.
Requirement 2 – Logistics Orders & Records
Read QPS ≈ 250/s (address lookup) → 2 500 QPS with redundancy → 3 read ports.
Write TPS ≈ 1 000/s (order creation) + 1 200/s (record insertion) = 2 200/s; with 5× redundancy → 11 000 TPS, requiring 16 write ports.
Data volume: ≈ 46 billion rows (orders + records) → 230 billion with redundancy, sharded into 4 096 tables.
Design result: 16 ports × 32 databases × 8 tables per DB, one master with 16 slaves.
Message queue: a single Kafka node plus one processing machine suffices; it can be scaled horizontally if needed.
Application servers: 2–3 nodes to handle the combined read/write peaks.
Solution 2 – Minimal Resources
Assumes current traffic is low: a single database instance with one port can handle the load, while the design retains sharding and scaling hooks for future growth.
Design results:
Member addresses: 1 port × 32 databases × 16 tables, one master with 1 slave.
Logistics orders/records: 1 port × 128 databases × 32 tables, one master with 1 slave.
4.5 Summary
The minimal‑resource solution is preferred: current traffic is modest, it saves cost, and it keeps sharding hooks for future scaling while allowing cache and message‑queue components to be activated later.
5 Performance Evaluation Reference Standards
Values are based on typical x86 PCs; adjust according to actual hardware.
Capacity calculated with 5× redundancy.
Sharded tables typically store 30 years of data.
Third‑party query interface: 5 000 QPS.
Average DB row size ≈ 1 KB.
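The arithmetic for Requirement 1 can be reproduced with a short script. Two inputs are this author's reading of the figures, not explicit in the text: the two‑hour peak window and the factor of roughly five stored addresses per member; without intermediate rounding the table count comes out near 374 rather than 350, and is padded up to 512 either way.

```python
import math

REDUNDANCY = 5
READ_QPS_PER_PORT = 1_000      # MySQL read capacity per port
WRITE_TPS_PER_PORT = 700       # MySQL write capacity per port
ROWS_PER_TABLE = 50_000_000

orders_per_day = 14_000_000
member_growth_per_day = 50_000
peak_window_s = 2 * 3600       # assumed: daily peak compressed into ~2 h

# Requirement 1 – member frequent addresses
read_qps = orders_per_day * 0.5 / peak_window_s                  # ≈ 972/s
write_tps = (orders_per_day * 0.2 + member_growth_per_day) / peak_window_s

read_ports = math.ceil(read_qps * REDUNDANCY / READ_QPS_PER_PORT)     # 5
write_ports = math.ceil(write_tps * REDUNDANCY / WRITE_TPS_PER_PORT)  # 3

# assumed: ~5 addresses per member, 30 years of member growth
rows = (200_000_000 + member_growth_per_day * 365 * 30) * 5
tables = math.ceil(rows * REDUNDANCY / ROWS_PER_TABLE)  # 374, padded to 512

print(read_ports, write_ports, tables)
```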
6 Conclusion
The article outlines a methodology for assessing the non‑functional qualities of internet‑scale systems, provides a detailed review checklist, and walks through a classic capacity and performance case study to help architects design, evaluate, and scale high‑concurrency systems. All data are based on the author's experience on a specific platform and serve as a methodological reference rather than a one‑size‑fits‑all solution.
Architecture Digest
Focusing on Java backend development, covering application architecture from top-tier internet companies (high availability, high performance, high stability), big data, machine learning, Java architecture, and other popular fields.