How We Scaled the Duosuo English App: Architecture Lessons from Day One to Four Months
This article details the technical background and evolution of the Duosuo English learning app, covering initial architecture, bandwidth estimation, risk control, database sharding, code refactoring, and operational lessons learned over four months of scaling.
APP Technical Background
Duosuo English is a free English learning app that combines AI interaction with real‑time foreign‑teacher speaking practice, supporting offline and online modes. It offers rich content, automatic speech recognition with scoring, and a complete listening‑speaking‑reading‑writing environment.
Initial Architecture Design
2‑1 Bandwidth and Traffic Estimation
Make a rough estimate of user volume from market feedback and investment.
Use a Poisson distribution to estimate average concurrency and peak values.
Apply the 80/20 rule (80% of traffic arrives in 20% of the time) to calculate data traffic at the peak.
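The estimation steps above can be sketched as follows. This is a minimal illustration, not the team's actual model: the function names, the 99.9% quantile, and the sample figures are assumptions, and the 2/8 (80/20) rule is read here as "80% of daily requests arrive in 20% of the day".

```python
import math


def peak_concurrency(mean_concurrent: float, quantile: float = 0.999) -> int:
    """Smallest k such that P(X <= k) >= quantile for X ~ Poisson(mean_concurrent).

    Suitable for moderate means only: exp(-mean) underflows for very large values.
    """
    term = math.exp(-mean_concurrent)  # P(X = 0)
    total = term
    k = 0
    while total < quantile:
        k += 1
        term *= mean_concurrent / k  # P(X = k) from P(X = k - 1)
        total += term
    return k


def peak_request_rate(daily_requests: int) -> float:
    """Requests per second at peak, assuming 80% of requests fall in 20% of the day."""
    return daily_requests * 0.8 / (86_400 * 0.2)


def bandwidth_mbps(requests_per_second: float, avg_response_kb: float) -> float:
    """Outbound bandwidth in megabits per second (1 KB = 8 kilobits)."""
    return requests_per_second * avg_response_kb * 8 / 1000


# Hypothetical inputs: 8.64 million requests per day, 10 KB average response.
rate = peak_request_rate(8_640_000)
print(round(rate), round(bandwidth_mbps(rate, 10)))
```

With these assumed inputs the peak rate is about 400 requests per second, which needs roughly 32 Mbps of outbound bandwidth before any safety margin.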
2‑2 Risk Control
Encrypt data transmission.
Circuit‑break and redirect malicious requests that repeatedly hit a single interface.
Real‑time monitoring and alerts for slow requests.
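The circuit-breaking idea can be illustrated with a small per-client, per-interface guard. This is a sketch under stated assumptions: the class name `InterfaceGuard`, the thresholds, and the in-memory storage are all hypothetical, and a production system would more likely keep these counters in a shared store.

```python
import time
from collections import defaultdict, deque


class InterfaceGuard:
    """Blocks a client on one interface after too many hits in a sliding window."""

    def __init__(self, max_requests=20, window_seconds=1.0,
                 block_seconds=60.0, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.block_seconds = block_seconds
        self.clock = clock  # injectable so the behaviour can be tested
        self.hits = defaultdict(deque)  # (client, interface) -> hit timestamps
        self.blocked_until = {}         # (client, interface) -> unblock time

    def allow(self, client_id: str, interface: str) -> bool:
        now = self.clock()
        key = (client_id, interface)
        if self.blocked_until.get(key, 0.0) > now:
            return False  # breaker is open: reject or redirect the request
        hits = self.hits[key]
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()  # drop hits that left the window
        if len(hits) >= self.max_requests:
            self.blocked_until[key] = now + self.block_seconds
            hits.clear()
            return False
        hits.append(now)
        return True
```

Because the key includes the interface, a client that abuses one endpoint is cut off there without losing access to the rest of the app.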
2‑3 Server Architecture without Single Points of Failure
Database: master‑slave with multiple read replicas, vertical split, horizontal scaling.
Web tier: LVS load‑balances across multiple web front‑ends per business line, enabling horizontal scaling.
Storage: primary‑secondary setup.
2‑4 Business‑Level Sharding
Keep the major blocks of business data isolated and loosely coupled.
Because early growth is hard to predict, coarse‑grained business isolation is enough at first.
2‑5 Session Storage
Early sessions stored as files with multi‑level hash directories.
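A multi-level hash directory layout keeps any single directory from holding too many session files. The sketch below assumes two levels of two hex characters each, an MD5 digest, and a hypothetical root path and `sess_` prefix; the source does not state the actual scheme.

```python
import hashlib
from pathlib import PurePosixPath


def session_path(session_id: str, root: str = "/var/lib/sessions",
                 levels: int = 2, width: int = 2) -> PurePosixPath:
    """Map a session ID to a file path under `levels` hashed subdirectories."""
    digest = hashlib.md5(session_id.encode("utf-8")).hexdigest()
    # Take consecutive slices of the digest as directory names.
    parts = [digest[i * width:(i + 1) * width] for i in range(levels)]
    return PurePosixPath(root, *parts, f"sess_{session_id}")


print(session_path("abc"))
```

With two levels of two hex characters, files spread over 256 × 256 directories.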
2‑6 Log Storage
Logs were stored on a dedicated log server and analyzed with AWK, which made troubleshooting inconvenient.
Architecture Evolution After Four Months
3‑1 Rapid Data Growth and Table Partitioning
Market response led to fast data growth and single‑table bottlenecks.
Implemented UID sharding and date‑based table splitting, with periodic archiving.
Partitioning reduces pressure but introduces multi‑table query challenges; DB middleware may help.
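The two splitting strategies, and the multi-table query problem they create, can be sketched with simple table-name routing. The table names, the shard count of 16, and the monthly granularity are assumptions for illustration only.

```python
from datetime import date


def uid_table(base: str, uid: int, shards: int = 16) -> str:
    """Route a user to one of `shards` tables by UID modulo."""
    return f"{base}_{uid % shards:02d}"


def monthly_table(base: str, day: date) -> str:
    """Route a record to the table for its calendar month."""
    return f"{base}_{day:%Y%m}"


def tables_for_range(base: str, start: date, end: date) -> list[str]:
    """List every monthly table a date-range query has to touch."""
    names = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        names.append(f"{base}_{year:04d}{month:02d}")
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return names


print(uid_table("user_score", 1000003))
print(tables_for_range("study_log", date(2014, 11, 20), date(2015, 2, 1)))
```

A range query spanning four months now has to read four tables and merge the results, which is the kind of work a database middleware layer could take over.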
3‑2 Multi‑Business Sharding Optimization
Version iterations increase business complexity, requiring clear technical and data separation.
Strong coupling between services hampers maintenance and scaling.
Business isolation starts with data‑layer splitting, producing modules like quests, tasks, user‑center.
3‑3 Code Architecture Refactoring
Initial code mixed all logic in one block, making it unreadable. A four‑layer design was introduced:
C layer: parameter filtering and request forwarding.
M layer: single‑interface business logic.
T layer: common parts for specific business interactions.
D layer: database interaction.
The result is high cohesion and low coupling.
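A toy daily check-in endpoint shows how the four layers might divide responsibility. Every name here (`UserDao`, `PointsTool`, `CheckinModel`, `checkin_controller`) is hypothetical, and an in-memory dictionary stands in for the database.

```python
class UserDao:
    """D layer: the only code that touches storage."""

    def __init__(self):
        self._points = {}

    def get_points(self, uid: int) -> int:
        return self._points.get(uid, 0)

    def set_points(self, uid: int, points: int) -> None:
        self._points[uid] = points


class PointsTool:
    """T layer: logic shared by several interfaces of one business area."""

    def __init__(self, dao: UserDao):
        self.dao = dao

    def award(self, uid: int, amount: int) -> int:
        total = self.dao.get_points(uid) + amount
        self.dao.set_points(uid, total)
        return total


class CheckinModel:
    """M layer: business logic for a single interface."""

    REWARD = 5

    def __init__(self, points: PointsTool):
        self.points = points

    def run(self, uid: int) -> dict:
        return {"uid": uid, "points": self.points.award(uid, self.REWARD)}


def checkin_controller(params: dict, model: CheckinModel) -> dict:
    """C layer: filter parameters, then forward the request."""
    raw = str(params.get("uid", ""))
    if not raw.isdecimal():
        return {"error": "invalid uid"}
    return model.run(int(raw))
```

Each layer depends only on the one below it, so a change to storage stays inside the D layer and a change to reward rules stays inside the M layer.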
Current Service Architecture
The system remains layered with independent services for isolation.
Search service for fast data aggregation (e.g., real‑time PK selection).
Log service: collection (Fluentd), aggregation (Mongo + Elasticsearch), visualization (Kibana).
Multi‑level cache for hot data: Redis, MongoDB, Elasticsearch.
Queue: Redis‑queue; Cloud storage: Upyun.
Push notifications, SMS, virtual currency store, and chat are handled through third‑party integrations.
Lessons Learned and Pitfalls
5‑1 Peak‑Shaving
Traffic spikes cluster around specific triggers such as push notifications; we mitigate them by diverting traffic.
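The source does not describe how the traffic is diverted. One common way to flatten a push-driven spike, shown here purely as an assumed illustration, is to send the notification in staggered batches so that users open the app over a longer period. The function name and parameters are hypothetical.

```python
def stagger_batches(user_ids: list, batch_size: int, interval_seconds: int) -> list:
    """Return (delay_seconds, batch) pairs that spread a push over time."""
    schedule = []
    for start in range(0, len(user_ids), batch_size):
        delay = (start // batch_size) * interval_seconds
        schedule.append((delay, user_ids[start:start + batch_size]))
    return schedule


# Five users, two per batch, one batch every 30 seconds.
print(stagger_batches([1, 2, 3, 4, 5], batch_size=2, interval_seconds=30))
```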
5‑2 Offline‑Online Mode Issues
Duplicate uploads and submissions are common; timestamp isolation helps but is not optimal.
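One possible reading of timestamp isolation is that each offline record carries the time it was created on the device, and the server ignores any record whose user, exercise, and client timestamp it has already stored. The sketch below follows that assumed reading; the class and field names are hypothetical.

```python
class SubmissionStore:
    """Accepts each (uid, exercise_id, client_ts) combination only once."""

    def __init__(self):
        self._seen = {}  # (uid, exercise_id, client_ts) -> score

    def submit(self, uid: int, exercise_id: int, client_ts: int, score: int) -> bool:
        key = (uid, exercise_id, client_ts)
        if key in self._seen:
            return False  # duplicate upload of a record already stored
        self._seen[key] = score
        return True


store = SubmissionStore()
print(store.submit(1, 10, 1425888000, 90))  # first upload is stored
print(store.submit(1, 10, 1425888000, 90))  # retry after reconnect is ignored
```

A weakness of this scheme is that device clocks can be wrong and two genuine attempts can share a timestamp, which may be part of why the approach is described as not optimal.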
5‑3 Optimizing Single‑Table Capacity (up to 50 million rows)
Appropriate indexing and suitable index types.
Design data types with minimal size.
Avoid storing large fields.
Further table optimizations are possible.
5‑4 Data Archiving
Currently manual; automation is planned.
5‑5 Session Storage Improvements
Sessions were switched from file‑system storage to a token‑based approach to address storage and read‑efficiency issues.
These experiences are shared for discussion with peers.
21CTO
21CTO (21CTO.com) offers developers community, training, and services, making it your go‑to learning and service platform.
