Why a Million‑Line Monorepo Works: Lessons from Alibaba’s Quick BI
This article shares how Alibaba’s Quick BI team successfully manages a monorepo with over a million lines of TypeScript, achieving fast cold‑starts, efficient code reviews, and scalable architecture through strict standards, automated tooling, and data‑driven process improvements.
In recent years Quick BI, a product of Alibaba's data middle‑platform, has grown rapidly and became the only BI product from China listed in Gartner's Magic Quadrant for two consecutive years. Its single‑repository source code exceeds one million lines: 820,000 lines of TypeScript and 180,000 lines of Sass/Less/CSS (excluding generated code).
Key metrics:
Code: 820k TypeScript, 180k styles.
Collaboration: 12,111 code reviews, 53,026 commits.
Despite the size of the codebase, the team kept a monorepo (single repository). It did not split the code into many repos or adopt micro‑frontend or Serverless approaches. Startup took only a few seconds at first and grew to 5–10 minutes as the code expanded. Engineering optimizations have since brought it back down to about 5 seconds.
Why Monorepo?
The team found that a large codebase can be an asset when it is backed by a simple architecture, clear standards, close collaboration, and efficient execution. Problems that engineering tooling can solve should not be pushed onto development conventions, and vice versa.
Core Monorepo Questions
1. Does a single repository become too large?
Repository size is roughly source size + .git size + resource files. Assume 100 characters per line at about one byte per character. On that basis, 1,000,000 lines come to roughly 100 MB of source code; in practice, the source is about 85 MB.
The .git directory stores commit history efficiently: 10,000 commits add only 1–3 MB. Resource files are what inflate a repository. After cleaning them out (e.g., with BFG Repo‑Cleaner), a 22 GB repository was reduced to 200 MB.
Thus a million‑line codebase typically occupies 200–400 MB, and ten million lines would be around 2–4 GB, comparable to a large node_modules folder.
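The sizing argument above is plain arithmetic, sketched below. The sketch assumes about one byte per character and 1 MB = 1,000,000 bytes. It also assumes the total footprint scales linearly with line count, as the text implies. The function names are illustrative only.

```typescript
// Back-of-the-envelope repository sizing, following the numbers in the text.
// Assumptions: ~1 byte per character (mostly ASCII source), 1 MB = 1,000,000 bytes.
function estimateSourceMB(lines: number, charsPerLine = 100): number {
  return (lines * charsPerLine) / 1_000_000;
}

// Linear extrapolation of the total footprint (source + .git + resources):
// if one million lines occupy [low, high] MB, N lines occupy N / 1e6 times that.
function extrapolateTotalMB(
  lines: number,
  perMillionLines: [number, number] = [200, 400],
): [number, number] {
  const factor = lines / 1_000_000;
  return [perMillionLines[0] * factor, perMillionLines[1] * factor];
}

console.log(estimateSourceMB(1_000_000)); // 100 (MB of source)
console.log(extrapolateTotalMB(10_000_000)); // [ 2000, 4000 ] MB, i.e. 2–4 GB
```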
2. Is startup slow?
Three tactics were applied:
Split the application into multiple entry points, loading only one at a time.
Refine inter‑package dependencies and maximize lazy loading and tree‑shaking.
Replace Webpack with Vite.
After switching to Vite, cold‑start time dropped from 2–5 minutes to under 5 seconds, and hot‑compile time fell from 5 seconds to about 1 second (often <500 ms on Apple M1).
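Lazy loading, the second tactic in the list, defers the cost of a heavy module until it is first needed. The sketch below is a generic helper, not Quick BI's code. It calls a loader on first use and caches the resulting promise, so concurrent callers share one load. In an application the loader would typically be a dynamic `import()` of a hypothetical module path.

```typescript
// Minimal lazy-loading helper: the loader runs on first use only, and the
// resulting promise is cached so concurrent callers share a single load.
function lazy<T>(loader: () => Promise<T>): () => Promise<T> {
  let pending: Promise<T> | undefined;
  return () => {
    if (!pending) {
      pending = loader().catch((err) => {
        pending = undefined; // allow a retry if the load failed
        throw err;
      });
    }
    return pending;
  };
}

// Typical use (hypothetical module path):
//   const loadPivotTable = lazy(() => import("./pivot-table"));
// Both Vite (via Rollup) and Webpack emit dynamically imported modules as
// separate chunks, so the code is only downloaded when loadPivotTable() runs.
```

Resetting the cache on failure is a design choice. A transient network error then does not permanently break the feature for the rest of the session.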
3. How to handle code reuse?
The team does not pursue DRY at all costs; maintainability comes first. Reusable modules are published as separate npm packages (e.g., @alife/bi-designer). Consumers import only what they need, and tree‑shaking removes the unused code.
Current Development Experience
Cold start ~5 seconds, hot compile ~1 second.
A change, even one that spans several packages, is made in one place and shipped with a single deployment.
New developers can set up the environment in ~10 minutes.
Version alignment issues are eliminated.
Engineering (tooling) upgrades are done once for the whole repository, using the Lerna‑based Pri Monorepo solution.
Problems Still to Solve
Beyond simply merging code, challenges remain in collaboration, technical solutions, and stability (preventing a single commit from breaking the whole product).
1. Package Dependency Management
Packages are arranged with a left‑to‑right, single‑direction dependency graph to avoid cycles. Automated checks enforce this rule.
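The automated check can be sketched as follows. The sketch assumes each internal package has a fixed position in a declared order and may depend only on packages to its right. Under that rule every edge points rightward, so the graph cannot contain a cycle. The package names and the `findViolations` function are hypothetical; Quick BI's actual check is not described in the source.

```typescript
// Internal dependency graph: package name -> internal packages it imports.
type DepGraph = Record<string, string[]>;

// Assumed convention: a package may depend only on packages that appear
// to its right in `order`. Any edge pointing left (or to itself) could
// close a cycle, so it is reported.
function findViolations(order: string[], deps: DepGraph): string[] {
  const position = new Map(order.map((name, i) => [name, i] as const));
  const violations: string[] = [];
  for (const [pkg, targets] of Object.entries(deps)) {
    const from = position.get(pkg);
    if (from === undefined) {
      violations.push(`${pkg}: not in the declared order`);
      continue;
    }
    for (const target of targets) {
      const to = position.get(target);
      if (to === undefined) {
        violations.push(`${pkg} -> ${target}: target not in the declared order`);
      } else if (to <= from) {
        violations.push(`${pkg} -> ${target}: points leftward (possible cycle)`);
      }
    }
  }
  return violations;
}

// Hypothetical layering, from application code down to shared utilities.
const order = ["app", "designer", "components", "utils"];
console.log(findViolations(order, { utils: ["components"] }));
```

In CI, a non-empty result would fail the build. The graph itself could be built from each package's `package.json` dependencies, limited to internal packages.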
New open‑source npm dependencies are added only after a three‑person review, because of concerns about their long‑term maintenance.
2. Code Review Culture
The team enforces 100% code review and encourages small, frequent merge requests and clear, human‑readable code. Reviews fall into three categories:
Online MR review (1‑to‑1).
Thematic review (3‑5 participants).
Pre‑release collective review (all).
Best practices include reviewing promptly, writing code that reads plainly, standardizing directory structures, and avoiding flashy techniques that are hard to maintain.
3. Engineering Automation
Tools such as ESLint, TypeScript type checking, and Prettier enforce syntax and style rules. Webpack is used for production builds, while Vite provides fast development feedback.
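A pre-merge gate that combines these tools might look like the sketch below. The commands `tsc --noEmit`, `eslint .` and `prettier --check .` are standard invocations of those CLIs. The `Check` list, the `runChecks` function and the choice to run every check before reporting are assumptions for illustration, not Quick BI's actual scripts. The runner is injectable so the logic can be exercised without the tools installed.

```typescript
import { spawnSync } from "node:child_process";

// Hypothetical pre-merge checks using standard CLI invocations.
interface Check {
  name: string;
  cmd: string;
  args: string[];
}

const CHECKS: Check[] = [
  { name: "types", cmd: "npx", args: ["tsc", "--noEmit"] },
  { name: "lint", cmd: "npx", args: ["eslint", "."] },
  { name: "format", cmd: "npx", args: ["prettier", "--check", "."] },
];

// A runner returns the exit code of a command.
type Runner = (cmd: string, args: string[]) => number;

// Default runner: spawn the command, stream its output, return its exit code.
// A null status (e.g. killed by a signal) is treated as failure.
// (On Windows, spawning "npx" may additionally need { shell: true }.)
const spawnRunner: Runner = (cmd, args) =>
  spawnSync(cmd, args, { stdio: "inherit" }).status ?? 1;

// Runs every check, without stopping at the first failure, and returns
// the names of the checks that failed.
function runChecks(checks: Check[] = CHECKS, run: Runner = spawnRunner): string[] {
  return checks.filter((c) => run(c.cmd, c.args) !== 0).map((c) => c.name);
}

// Example wiring for a CI script:
//   const failed = runChecks();
//   if (failed.length > 0) { console.error("Failed:", failed.join(", ")); process.exit(1); }
```

Running all checks before failing shows the author every problem in one pass instead of one per push.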
4. Performance Optimization
Three focus areas:
Resource loading: fine‑grained tree‑shaking and lazy loading of heavy components.
View rendering: minimize re‑renders, use virtual scrolling for tables.
Data fetching: local caching and PWA techniques for mobile.
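Virtual scrolling keeps table rendering cheap: only the rows in or near the viewport are mounted. The rest are represented by empty space. The sketch below shows the core calculation and assumes a fixed row height. The `visibleRange` name and the `overscan` parameter, which renders a few extra rows so fast scrolling does not flash blank areas, are illustrative choices. They are not taken from Quick BI.

```typescript
interface VisibleRange {
  start: number; // first row index to render (inclusive)
  end: number; // last row index to render (exclusive)
  offsetY: number; // vertical offset (px) at which the first rendered row is drawn
}

// Computes which rows to render for a fixed-row-height virtual list.
function visibleRange(
  scrollTop: number,
  viewportHeight: number,
  rowHeight: number,
  rowCount: number,
  overscan = 5,
): VisibleRange {
  const first = Math.floor(Math.max(0, scrollTop) / rowHeight);
  const visibleCount = Math.ceil(viewportHeight / rowHeight);
  const start = Math.max(0, first - overscan);
  const end = Math.min(rowCount, first + visibleCount + overscan);
  return { start, end, offsetY: start * rowHeight };
}

// A 10,000-row table with 40 px rows in a 400 px viewport, scrolled to 4,000 px,
// renders only rows 95..114 instead of all 10,000.
console.log(visibleRange(4000, 400, 40, 10_000));
```

The container keeps its full height (`rowCount * rowHeight`) so the scrollbar stays accurate. Only the small window of rows is rendered, translated down by `offsetY`.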
Performance monitoring tools alert developers when package size grows.
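Such an alert can be reduced to comparing current asset sizes against a stored baseline. The sketch below is a hypothetical example of that idea. The baseline format, the 5% default tolerance and the `sizeAlerts` name are all assumptions. The source does not say how Quick BI's monitoring tools are configured.

```typescript
// One flagged asset: sizes in bytes, growth as a fraction of the baseline.
interface SizeAlert {
  asset: string;
  before: number;
  after: number;
  growth: number;
}

// Flags assets whose size grew by more than `maxGrowth` relative to the
// baseline. Assets without a baseline entry are skipped (nothing to compare).
function sizeAlerts(
  baseline: Record<string, number>,
  current: Record<string, number>,
  maxGrowth = 0.05,
): SizeAlert[] {
  const alerts: SizeAlert[] = [];
  for (const [asset, after] of Object.entries(current)) {
    const before = baseline[asset];
    if (before === undefined || before === 0) continue;
    const growth = (after - before) / before;
    if (growth > maxGrowth) alerts.push({ asset, before, after, growth });
  }
  return alerts;
}

// Illustrative numbers: main.js grew 10%, vendor.js 2.5%; only main.js is flagged.
console.log(sizeAlerts({ "main.js": 1000, "vendor.js": 2000 }, { "main.js": 1100, "vendor.js": 2050 }));
```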
5. Data‑Driven Architecture Optimization
Metrics on startup time and development‑environment configuration are collected to identify bottlenecks. For example, the need to unify Node.js versions across the team only became visible once this data was reported.
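To show how reported data can surface a problem such as mismatched Node.js versions, the sketch groups environment reports by Node.js major version. It then computes the median cold‑start time for each group. The `EnvReport` shape and the function names are hypothetical. The source does not describe what Quick BI actually collects.

```typescript
// Hypothetical report a developer's tooling might upload after each cold start.
interface EnvReport {
  user: string;
  nodeVersion: string; // e.g. "v16.14.0", as returned by process.version
  coldStartMs: number;
}

interface Summary {
  count: number;
  medianColdStartMs: number;
}

function median(values: number[]): number {
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Groups reports by Node.js major version ("v16.14.0" -> 16) and summarizes each group.
function summarizeByNodeMajor(reports: EnvReport[]): Map<number, Summary> {
  const groups = new Map<number, number[]>();
  for (const r of reports) {
    const major = Number.parseInt(r.nodeVersion.replace(/^v/, ""), 10);
    const times = groups.get(major) ?? [];
    times.push(r.coldStartMs);
    groups.set(major, times);
  }
  const summary = new Map<number, Summary>();
  groups.forEach((times, major) => {
    summary.set(major, { count: times.length, medianColdStartMs: median(times) });
  });
  return summary;
}
```

More than one key in the result shows the team is not on a single Node.js version. A noticeably worse median for one group points to where to look first.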
Summary and Outlook
A million‑line codebase is nothing to fear; with the right processes it stays agile. Quick BI is now heading toward ten million lines, with the goal of becoming a world‑class BI platform. Future work includes deeper data‑analysis integration, cross‑device support, and further architectural refinements such as introducing Redux Toolkit for data flow.