How Meituan Scaled Its Code Hosting Platform to Millions of Repositories

This article details Meituan's three‑stage evolution of its self‑developed Code platform—from a single‑machine service to a multi‑machine read‑write‑separated system and finally to a distributed, sharded architecture—highlighting the scalability and high‑availability challenges faced and the engineering solutions implemented.

21CTO

Feb 5, 2023

1. Introduction

Code is Meituan's self‑developed code‑hosting platform, providing Git version control, branch management, and code review, integrated with many R&D workflow tools. After nearly three years it hosts tens of thousands of repositories and handles tens of millions of Git requests daily.

2. Evolution of Meituan’s Code‑Hosting Platform

2.1 Development History

The platform evolved through three stages: single‑machine deployment, multi‑machine deployment, and a self‑developed distributed system.

Stage 1: Single‑Machine Deployment

Initially the service ran on a single web server. Git operations were performed directly on disk, making high‑IO storage critical.

Stage 2: Multi‑Machine Deployment

Growth in users and CI pipelines exposed two bottlenecks:

Storage : SSD usage reached 80%.

Load : CPU and I/O utilization exceeded 95% during peaks.

To relieve them, a read‑write‑separation architecture was introduced that keeps data consistent from the user’s perspective:

Writes occur only on the primary node.

A read on a replica triggers lazy synchronization; if synchronization fails, the read falls back to the primary.

In an emergency, a primary‑fallback mode disables the replicas so that the primary serves all requests.

Requests are split by protocol (HTTP or SSH) and routed to primary or secondary nodes using a round‑robin algorithm, doubling overall read throughput. Agents on secondary nodes trigger fetch operations to keep data consistent.
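The routing rules of this stage can be illustrated with a minimal sketch. It is not Meituan's implementation: the `ReadWriteRouter` class, the `sync_replica` hook, and the node names are hypothetical, and the example assumes one primary and two replicas.

```python
import itertools


class ReadWriteRouter:
    """Sends writes to the primary and spreads reads round-robin over all nodes."""

    def __init__(self, primary, replicas, sync_replica):
        self.primary = primary
        self.replicas = list(replicas)
        # sync_replica(node, repo) -> bool is a hypothetical hook that asks the
        # agent on `node` to fetch `repo` from the primary.
        self.sync_replica = sync_replica
        self.primary_only = False  # emergency primary-fallback switch
        self._cycle = itertools.cycle([primary, *self.replicas])

    def route_write(self, repo):
        return self.primary

    def route_read(self, repo):
        if self.primary_only:
            return self.primary
        node = next(self._cycle)
        if node == self.primary:
            return node
        # Lazy synchronization: bring the replica up to date before serving.
        if self.sync_replica(node, repo):
            return node
        return self.primary  # synchronization failed, fall back to the primary


def flaky_sync(node, repo):
    """Simulated agent: synchronization always fails on replica-2."""
    return node != "replica-2"


router = ReadWriteRouter("primary", ["replica-1", "replica-2"], flaky_sync)
reads = [router.route_read("group/app") for _ in range(3)]
print(reads)  # ['primary', 'replica-1', 'primary']
```

The third read lands on `replica-2`, whose synchronization fails, so it is served by the primary instead. Setting `router.primary_only = True` models the emergency mode in which replicas are bypassed entirely.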

Stage 3: Self‑Developed Distributed Platform

Even with read‑write separation, repository data kept growing and availability issues appeared (service downtime for upgrades, cold‑backup data‑loss risk). A distributed architecture based on application‑layer sharding was adopted, providing high availability (three‑replica active‑active) and horizontal scalability.

High Availability : Three‑replica active‑active, no downtime for upgrades.

Horizontal Scaling : Capacity expands by adding shard clusters.

Underlying storage built on GitLab open‑source components.

All services communicate via gRPC for efficient binary transfer.

Routing module isolates logical and physical layers.

Active‑active replication boosts read/write throughput.

Optimizations for hot‑spot migration and cross‑shard data sharing.

3. Architecture Evolution: Implementation and Challenges

The platform achieved two goals: high scalability and high availability.

3.1 Scalability Goal

3.1.1 Technical Challenges

Scale : Must support millions of requests with low latency.

Compatibility : Preserve existing APIs and workflows while transitioning storage back‑ends.

3.1.2 Solution Selection

The team compared shared‑storage and application‑layer sharding approaches and chose sharding for its better performance and maturity.

3.1.3 Design

Key modules:

Proxy Module

SSH Proxy : Handles Git‑SSH requests, performs key‑based authentication, traffic control, long‑connection timeout, and forwards to gRPC.

HTTP Proxy : Handles Git‑HTTP/Web requests, routes based on shard mapping, supports gray‑release traffic.

Routing Module

Implements shard mapping and decides read/write routing, with an average response time under 15 ms.

Maps each repository ID to a shard.

Uses goroutines to fetch node sync status concurrently, with a timeout.

Caches mappings in a KV store.

Application Module

Provides Git‑related business logic (code review, etc.).

Listens to repository and branch change events for downstream integration.

3.1.4 Solution Approach

Scalability : Strict quorum routing (N=3, R=W=2, so R+W>N and every read quorum overlaps the most recent write quorum), achieving 99.999% availability; inconsistencies found during reads are fixed by read‑repair.

Compatibility : Ensure zero‑impact migration, keep APIs unchanged, provide visual one‑click migration, proxy abstraction, and shared data for keys.
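The quorum rule with N=3 and R=W=2 can be made concrete with a small sketch. This is a simplified model, not the platform's code: the `Replica` class, the integer version counter, and the ref values are hypothetical, and a write that misses its quorum is not rolled back here.

```python
N, R, W = 3, 2, 2  # replicas, read quorum, write quorum (R + W > N)


class Replica:
    def __init__(self, name):
        self.name = name
        self.version = 0
        self.refs = {}
        self.up = True


def quorum_write(replicas, version, refs):
    """Apply the write to reachable replicas; succeed only with W acks."""
    acks = 0
    for rep in replicas:
        if rep.up:
            rep.version, rep.refs = version, dict(refs)
            acks += 1
    return acks >= W


def quorum_read(replicas):
    """Read from R reachable replicas, return the newest, repair stale ones."""
    reachable = [rep for rep in replicas if rep.up][:R]
    if len(reachable) < R:
        raise RuntimeError("read quorum not reached")
    newest = max(reachable, key=lambda rep: rep.version)
    for rep in reachable:
        if rep.version < newest.version:  # read-repair
            rep.version, rep.refs = newest.version, dict(newest.refs)
    return newest.refs


replicas = [Replica("git-1"), Replica("git-2"), Replica("git-3")]
replicas[2].up = False  # one node down: the write still reaches W=2 nodes
ok = quorum_write(replicas, 1, {"refs/heads/main": "abc123"})
replicas[2].up, replicas[0].up = True, False  # stale node returns, another fails
refs = quorum_read(replicas)
print(ok, refs, replicas[2].version)  # True {'refs/heads/main': 'abc123'} 1
```

The read quorum here consists of one up-to-date node and one stale node. Since R+W>N, at least one node in any read quorum has seen the latest successful write, so the read returns the new refs and repairs the stale replica on the way.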

3.2 Availability Goal

3.2.1 Technical Challenges

Data safety and correct read visibility.

Prevent single‑point data loss.

Ensure users read correct code.

3.2.2 Solution Selection

Multi‑active replication per repository (three replica nodes) was adopted to improve write throughput and avoid a single‑master bottleneck.

3.2.3 Design

Storage module consists of:

Git Server : Based on GitLab, provides gRPC API.

Replication Manager : Handles read/write/initialization replication logic.

Code Core : Core Git service.

Git Core : Integrated open‑source Git components.

Git Command Factory : Controls process count and parameters.

Git Cluster (shard) comprises three Git Server nodes replicating via gRPC.

3.2.4 Solution Approach

Data Consistency : Refs must match across nodes; multi‑active replication with “single‑write” per repository and data‑safety lock.

Cross‑Data‑Center Backup : Nodes span at least two data centers; failover within 30 minutes.

Hot‑Backup : Immediate replication of writes, zero‑delay consistency.

The write phase takes a repository‑level lock and triggers object synchronization through Git hooks; the read phase selects the nodes that hold the latest version, or synchronizes a node first if needed.

Optimizations reduced redundant sync tasks by 50 %.
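The write and read phases described above might look roughly like the sketch below. Everything in it is an illustrative assumption: the `Node` class, the per-repository version counter, and the `sync` function, which stands in for the hook-triggered object synchronization between nodes.

```python
import threading

_guard = threading.Lock()
_repo_locks = {}


def repo_lock(repo):
    """Return the lock for one repository, creating it on first use."""
    with _guard:
        return _repo_locks.setdefault(repo, threading.Lock())


class Node:
    def __init__(self, name):
        self.name = name
        self.reachable = True
        self.versions = {}  # repository -> version counter


def sync(repo, source, target):
    """Stand-in for the hook-triggered object sync between two nodes."""
    if not target.reachable:
        return False
    target.versions[repo] = source.versions[repo]
    return True


def write(repo, target, peers):
    """Serialize writes per repository, then replicate to the peers."""
    with repo_lock(repo):
        target.versions[repo] = target.versions.get(repo, 0) + 1
        return [peer.name for peer in peers if sync(repo, target, peer)]


def nodes_for_read(repo, nodes):
    """Names of the nodes that hold the latest version of the repository."""
    latest = max(node.versions.get(repo, 0) for node in nodes)
    return [n.name for n in nodes if n.versions.get(repo, 0) == latest]


n1, n2, n3 = Node("git-1"), Node("git-2"), Node("git-3")
n3.reachable = False
synced = write("group/app", n1, [n2, n3])
before = nodes_for_read("group/app", [n1, n2, n3])
n3.reachable = True
sync("group/app", n1, n3)  # catch the stale node up before it serves reads
after = nodes_for_read("group/app", [n1, n2, n3])
print(synced, before, after)
```

The lock is scoped to a single repository, so writes to different repositories on the same shard do not block each other. A node that missed a write is excluded from reads until it has been synchronized.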

Data Inspection Service

Provides transparent, reliable, maintainable health checks on refs and version data, supporting point, full, and scheduled inspections.
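A point inspection of one repository could be sketched as a comparison of ref digests across replicas. The function names, the SHA-256 digest, and the majority rule are assumptions chosen for illustration; the source does not describe how the inspection service compares refs.

```python
import hashlib


def refs_checksum(refs):
    """Stable digest of a {ref name: commit id} mapping."""
    digest = hashlib.sha256()
    for name in sorted(refs):
        digest.update(f"{name} {refs[name]}\n".encode())
    return digest.hexdigest()


def inspect_repo(replica_refs):
    """Return the replicas whose refs differ from the most common digest."""
    sums = {node: refs_checksum(refs) for node, refs in replica_refs.items()}
    counts = {}
    for checksum in sums.values():
        counts[checksum] = counts.get(checksum, 0) + 1
    majority = max(counts, key=counts.get)
    return sorted(node for node, s in sums.items() if s != majority)


report = inspect_repo({
    "git-1": {"refs/heads/main": "abc123", "refs/heads/dev": "def456"},
    "git-2": {"refs/heads/dev": "def456", "refs/heads/main": "abc123"},
    "git-3": {"refs/heads/main": "abc123", "refs/heads/dev": "000000"},
})
print(report)  # ['git-3']
```

A full or scheduled inspection would run the same comparison over every repository in a shard. If all three replicas disagree there is no real majority, so a production check would need an explicit rule for that case.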

4. Conclusion

The article systematically describes Meituan’s challenges in scalability and availability, the architectural choices made, and the practical experiences gained.

5. Future Outlook

Automated operations for anomaly detection and self‑healing.

Share best practices for code management.

Strengthen code security with scanning, auto‑fix, and alerts.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
