From Personal Homepage to Billion‑Page Site: Lessons in Scalable Web Architecture
This article shares a 14‑year journey of building and evolving website architectures, from a simple personal homepage to billion‑page platforms, highlighting essential accumulation, knowledge structuring, design principles, infrastructure, software engineering practices, and the nuanced differences across business systems.
1: Accumulation Is Essential
Architects are not made in a day. In 1999 I built a personal homepage using Dreamweaver, table layouts, a DB connection and a few lines of PHP, then uploaded it via FTP.
In 2000 the simple page could no longer satisfy my curiosity, so I started learning networking fundamentals (the OSI 7‑layer model) and configuring services on Linux, AIX and FreeBSD for school use, including RealServer streaming, FTP, a Battle.net gateway, Apache, DNS and Qmail.
When the site reached about 100k page views (PV), I began scaling with MySQL master‑slave replication, read/write separation and index optimization. This two‑tier architecture carried through many later sites, and the JSP versus PHP debates had little impact on my approach.
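As a rough illustration of read/write separation, the sketch below sends writes to the master and rotates reads across replicas. The class name `ReadWriteRouter` and the host strings are hypothetical, and the rule "anything that is not a SELECT is a write" is a simplifying assumption, not a description of the original site's code.

```python
import itertools


class ReadWriteRouter:
    """Route writes to the master and spread reads across replicas."""

    def __init__(self, master, replicas):
        self.master = master
        # Fall back to the master when no replica is configured.
        self._replicas = itertools.cycle(replicas or [master])

    def route(self, sql):
        # Assumption: only plain SELECT statements count as reads.
        parts = sql.split(None, 1)
        verb = parts[0].upper() if parts else ""
        if verb == "SELECT":
            return next(self._replicas)  # round-robin over replicas
        return self.master


# Hypothetical host names, for illustration only.
router = ReadWriteRouter("master:3306", ["replica1:3306", "replica2:3306"])
```

Because replication is asynchronous, a read issued right after a write may not see that write on a replica; a real router would typically need a way to pin such reads to the master.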
In 2005 I joined a project using Mule ESB, realizing the value of middleware for isolating technical responsibilities.
In 2006 I joined Alibaba Software, working on an external‑trade ERP system, gaining deep understanding of MVC three‑tier architecture and its benefits for scalability and team expansion.
From 2007 onward I applied three‑tier concepts and service‑center layers, gradually moving from monolithic to distributed designs as traffic grew toward tens of millions of daily visits.
In 2008 the Alibaba Open Platform prompted me to design a service‑center (UDB) to handle millions of online users, emphasizing modularity and scalability.
In 2009 I joined the newly formed Aliexpress department, experiencing the evolution from small applications to billion‑PV transaction systems.
2: Knowledge Structure
Website architects come from diverse backgrounds—computer science, art, biology, physics, even police work—but all need solid programming fundamentals: algorithms, design patterns, multithreading, remote calls, and data source handling.
Deep understanding of networking, HTTP protocol, session and cookie management, and DNS behavior is crucial for performance and reliability.
Data format choices (plain text, JSON, XML) affect serialization overhead and QPS.
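To make the overhead difference concrete, the following sketch encodes one small record in the three formats and compares byte counts. The record fields and the pipe‑delimited plain‑text layout are assumptions chosen for illustration; actual sizes depend on the data and the encoder settings.

```python
import json
import xml.etree.ElementTree as ET


def encode_all(record):
    """Return the same flat record as plain-text, JSON and XML bytes."""
    # Plain text: values only, pipe-delimited (field order is implicit).
    text = "|".join(str(v) for v in record.values()).encode("utf-8")
    # JSON: compact separators, no extra whitespace.
    as_json = json.dumps(record, separators=(",", ":")).encode("utf-8")
    # XML: one child element per field, no XML declaration.
    root = ET.Element("record")
    for key, value in record.items():
        ET.SubElement(root, key).text = str(value)
    as_xml = ET.tostring(root, encoding="unicode").encode("utf-8")
    return {"text": text, "json": as_json, "xml": as_xml}


record = {"id": 1001, "name": "widget", "price": 9.5}  # hypothetical record
sizes = {fmt: len(data) for fmt, data in encode_all(record).items()}
```

For this record the plain‑text form is the smallest and the XML form the largest, because JSON repeats each field name once and XML repeats it twice. Smaller payloads are not automatically better: the plain‑text form is only readable by a consumer that already knows the field order.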
The ability to estimate QPS, I/O, CPU load and DB connection counts for capacity planning is essential.
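A back‑of‑the‑envelope version of that arithmetic might look like the sketch below. The function names, the "80% of traffic in 20% of the day" peak assumption and the 30% headroom are all illustrative defaults rather than figures from the article. The connection estimate uses Little's law (in‑flight requests = arrival rate × average latency).

```python
import math


def peak_qps(daily_pv, peak_share=0.8, peak_window_fraction=0.2):
    """Estimate peak QPS, assuming peak_share of traffic falls in
    peak_window_fraction of the day (an assumed 80/20 rule by default)."""
    peak_seconds = 86_400 * peak_window_fraction
    return daily_pv * peak_share / peak_seconds


def servers_needed(qps, per_server_qps, headroom=0.3):
    """Servers required when each runs below (1 - headroom) of its
    measured capacity."""
    return math.ceil(qps / (per_server_qps * (1 - headroom)))


def concurrent_connections(qps, avg_latency_seconds):
    """Little's law: in-flight requests = arrival rate x average latency."""
    return math.ceil(qps * avg_latency_seconds)
```

Under these assumptions, 100 million PV per day works out to roughly 4,630 requests per second at peak. With servers measured at 500 QPS each, that suggests about 14 machines, and a 50 ms average query time implies around 232 queries in flight at once.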
JVM memory management and GC tuning are vital for large‑scale systems.
Database theory (indexes, storage structures) and avoiding direct DB access from front‑end are fundamental for billion‑PV architectures.
Continuous learning of evolving middleware (e.g., Dubbo, HSF, Redis) and adapting to new frameworks is required.
Understanding business logic and acting as both technical and business architect is key for transaction‑heavy sites.
3: Design Philosophy
Architects aim for at least one‑year foresight, designing for horizontal scalability. Common tactics include:
1> Asynchronous over Synchronous: Move calls that do not need an immediate result to asynchronous paths (AJAX, message queues) to reduce server response time (RT) and improve decoupling.
2> Centralized to Distributed: Start with monolithic services, then split into dedicated services as traffic grows, isolating high‑load components.
3> Layered Architecture: Evolve from two‑tier (app + DB) to three‑tier (app, service‑center, persistence), loosely analogous to the SaaS, PaaS, IaaS layering.
4> Functional Decomposition: Break down modules based on traffic, latency, and business change frequency.
5> Service‑Centerization: Encapsulate business logic into stateless services, manage granularity, availability, and routing.
6> Node Monitoring: Implement comprehensive monitoring (snapshot, baseline, key‑node metrics) to detect performance and availability issues.
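The first tactic in the list, asynchronous over synchronous, can be sketched with an in‑process queue: the request handler records the essential work and returns at once, while a background worker handles the slow follow‑up. The names `place_order` and `worker`, and the "send email" step, are hypothetical. A production system would more likely use a message broker than `queue.Queue`.

```python
import queue
import threading

tasks = queue.Queue()
processed = []


def worker():
    # Drain tasks in the background until a None sentinel arrives.
    while True:
        order_id = tasks.get()
        if order_id is None:
            tasks.task_done()
            break
        processed.append(f"email sent for order {order_id}")
        tasks.task_done()


def place_order(order_id):
    """Handle the request: accept the order, defer the slow follow-up."""
    tasks.put(order_id)  # queued, not awaited
    return {"order_id": order_id, "status": "accepted"}


thread = threading.Thread(target=worker, daemon=True)
thread.start()
response = place_order(42)
tasks.join()  # demo only: wait until the worker has drained the queue
```

The response time seen by the caller no longer includes the follow‑up work. The caller also no longer learns whether that work succeeded, so failures need their own retry or alerting path.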
4: Basic Infrastructure
Large‑scale sites invest heavily in infrastructure: middleware placement, capacity planning, disaster recovery, multi‑region latency considerations, and performance testing.
Monitoring must balance overhead; techniques include log replay, baseline analysis, and message‑driven metrics.
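One simple form of baseline analysis is to compare the current reading with the mean and spread of recent history. The sketch below assumes a fixed window of past samples and a three‑standard‑deviation threshold. Both are illustrative choices, and the function name is hypothetical.

```python
import statistics


def deviates_from_baseline(history, current, threshold=3.0):
    """Return True if current lies more than threshold standard
    deviations away from the mean of history."""
    mean = statistics.mean(history)
    stdev = statistics.pstdev(history)
    if stdev == 0:
        # A perfectly flat baseline: any change counts as a deviation.
        return current != mean
    return abs(current - mean) / stdev > threshold
```

For example, with response times of `[100, 102, 98, 101, 99]` ms as the baseline, a reading of 101 ms would not be flagged, while 150 ms would. The check itself is cheap, which matters when monitoring overhead has to stay low.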
Deployment strategies for 99.99% availability involve symmetric multi‑region setups, data partitioning by buyer, seller, or IP, and asynchronous synchronization.
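Partitioning by buyer or seller usually comes down to a stable mapping from a key to a partition. A minimal sketch, assuming hash‑modulo placement and a hypothetical `shard_for` helper (the article does not say which scheme was actually used):

```python
import hashlib


def shard_for(key, shard_count):
    """Map a partition key (for example a buyer ID) to a stable shard index."""
    # A cryptographic hash is used only for its even, stable distribution;
    # Python's built-in hash() is randomized per process for strings.
    digest = hashlib.md5(str(key).encode("utf-8")).hexdigest()
    return int(digest, 16) % shard_count
```

The same key always lands on the same shard, so all of one buyer's data can be served from one place. The trade‑off of plain modulo placement is that changing `shard_count` remaps most keys, which is why schemes such as consistent hashing or lookup tables are often preferred when partitions are expected to grow.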
5: Software Engineering
Beyond code, architects establish release processes, gray‑scale AB testing, coding standards, team mentorship, and testing environments.
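A gray‑scale release is often implemented as a percentage rollout: each user is placed in one of 100 buckets, and the feature is enabled for the buckets below the current percentage. The sketch below is one such scheme under stated assumptions. The function name, the feature label and the use of SHA‑256 for bucketing are illustrative and not taken from the article.

```python
import hashlib


def in_gray_release(user_id, feature, rollout_percent):
    """Return True if user_id falls inside the rollout for feature."""
    # Hash feature and user together so that different features
    # select different subsets of users.
    digest = hashlib.sha256(f"{feature}:{user_id}".encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) % 100  # stable bucket in [0, 100)
    return bucket < rollout_percent
```

Because the bucket is derived from the user ID rather than chosen at random per request, a user who sees the new version keeps seeing it, and raising `rollout_percent` only ever adds users.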
Architect roles vary (PM‑type, consulting, business‑focused, algorithmic, performance‑tuning, middleware, infrastructure) depending on site maturity and needs.
6: Differences Across Business System Architectures
Each business domain (e‑commerce, logistics, payment, etc.) has unique architectural challenges; copying another site’s setup is ineffective. Principles and lessons are transferable, but solutions must fit specific contexts.
7: The Vitality of Small Trends
Open platforms, social networking, group buying, mobile tablets, visual recommendation flows, cross‑border e‑commerce, IoT, and long‑tail dynamics each influence architectural decisions over time.
8: Last but Not Least
Building a site mirrors building a person: a solid skeleton (architecture) plus a good mindset (openness, attitude) are essential. Convincing stakeholders, proactive problem solving, and collaborative effort define a competent architect.
21CTO
21CTO (21CTO.com) offers developers community, training, and services, making it your go‑to learning and service platform.
