How Alibaba Scaled GitLab to Support Millions of Users with Sharding and High‑Availability
This article details Alibaba Group's journey of transforming its GitLab deployment from a single‑node setup to a distributed, sharded architecture that handles tens of millions of daily requests, achieves near‑perfect reliability, and incorporates performance, monitoring, and disaster‑recovery innovations.
Background
Alibaba Group's GitLab deployment, based on Community Edition 8.3, serves tens of thousands of developers, hosts hundreds of thousands of projects, and processes tens of millions of requests per day, far beyond what the original single‑node product was designed to handle.
Challenges
Rapid growth pushed CPU utilization above 95% and drove up error rates. Storage also became a bottleneck: GitLab stores repositories directly on the local file system, and its core components (libgit2, git, grit) likewise operate on local disks, so the stock product could not be scaled horizontally.
Solution Overview
The team adopted a distributed, sharding‑based architecture, routing repositories to different machines based on their namespace_path/repo_path identifiers. This enables horizontal scaling and balances load across multiple nodes.
Key Components
Sharding‑Proxy‑Api: Maintains the mapping between repositories and target machines; acts as the “brain” of sharding.
Proxy: Receives all requests, queries Sharding‑Proxy‑Api, and forwards each request to the appropriate backend.
Git Cluster: Consists of groups of three nodes (master, mirror, backup). The master handles writes, the mirror handles reads, and the backup serves as a hot standby.
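The request path through these components can be sketched roughly as follows. This is a minimal illustration, not Alibaba's implementation: the `SHARD_MAP` table, the host names, and the `repo_key` and `route` helpers are all hypothetical, and the sketch assumes a single‑level namespace. The only names taken from Git itself are the `git-receive-pack` (push) and `git-upload-pack` (fetch) service names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ShardGroup:
    """One three-node group: master for writes, mirror for reads, backup as standby."""
    master: str
    mirror: str
    backup: str


# Hypothetical stand-in for the mapping that Sharding-Proxy-Api maintains.
SHARD_MAP = {
    "team-a/service": ShardGroup("git-01-master", "git-01-mirror", "git-01-backup"),
    "team-b/website": ShardGroup("git-02-master", "git-02-mirror", "git-02-backup"),
}

# A push is served by git-receive-pack; everything else is treated as a read.
WRITE_SERVICES = {"git-receive-pack"}


def repo_key(request_path: str) -> str:
    """Extract 'namespace_path/repo_path' from a path such as '/team-a/service.git/info/refs'."""
    parts = [p for p in request_path.split("/") if p]
    if len(parts) < 2:
        raise ValueError(f"not a repository path: {request_path!r}")
    namespace, repo = parts[0], parts[1]
    if repo.endswith(".git"):
        repo = repo[: -len(".git")]
    return f"{namespace}/{repo}"


def route(request_path: str, service: str) -> str:
    """Return the backend host that should serve this request."""
    group = SHARD_MAP[repo_key(request_path)]
    return group.master if service in WRITE_SERVICES else group.mirror
```

Keeping the lookup table outside the proxy, as the article describes, means a repository can be moved to another group by changing one mapping entry rather than redeploying the proxy.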
Ensuring Correctness
The master synchronizes changes to the mirror and backup after each write, which prevents data loss. Because only the master accepts writes, the design avoids a dual‑master setup that could cause conflicts.
Sharding Accuracy and Load Balancing
Sharding‑Proxy‑Api, built with the Martini framework in Go, updates repository metadata in real time (≈5 ms latency). Weighted sharding based on storage size and request volume ensures balanced resource usage across nodes.
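One way to picture weighted sharding is a placement score that blends storage usage and request volume. The sketch below is an assumption about how such a score could look, not the formula Alibaba used: the `Node` fields, the equal default weights, and the `pick_node` helper are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical per-node statistics used for placement decisions."""
    name: str
    storage_used_gb: float
    storage_capacity_gb: float
    requests_per_min: float
    request_capacity_per_min: float


def load_score(node: Node, storage_weight: float = 0.5, request_weight: float = 0.5) -> float:
    """Weighted blend of storage utilization and request utilization (0 = idle)."""
    storage_ratio = node.storage_used_gb / node.storage_capacity_gb
    request_ratio = node.requests_per_min / node.request_capacity_per_min
    return storage_weight * storage_ratio + request_weight * request_ratio


def pick_node(nodes, storage_weight: float = 0.5, request_weight: float = 0.5) -> Node:
    """Place a new repository on the node with the lowest weighted load."""
    return min(nodes, key=lambda n: load_score(n, storage_weight, request_weight))


nodes = [
    Node("git-01", 800, 1000, 100, 1000),  # nearly full disk, light traffic
    Node("git-02", 300, 1000, 700, 1000),  # plenty of disk, heavy traffic
    Node("git-03", 500, 1000, 200, 1000),  # moderate on both axes
]
chosen = pick_node(nodes)
```

With equal weights the moderate node `git-03` is selected, because neither a full disk nor a busy node looks attractive. Shifting the weights changes the trade-off between storage and traffic.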
Cross‑Shard Operations
For operations that span multiple shards (e.g., project transfer, fork, cross‑project merge requests), the code was modified to fetch required data from other nodes via SSH or HTTP, with a future goal of full RPC‑based communication.
Performance Optimizations
Reimplemented the SSH transport layer in Go, reducing server load and eliminating bugs.
Optimized high‑traffic endpoints (authentication, SSH key lookup) using MD5 hashing and indexing, with plans to rewrite them in Go or Java.
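The article does not say exactly what was hashed, so the following is only one plausible reading: index each SSH public key by the MD5 fingerprint of its decoded key blob, so that a lookup is a single indexed equality match instead of a comparison against long key strings. The `KeyIndex` class and its methods are hypothetical, and an in‑memory dictionary stands in for an indexed database column.

```python
import base64
import hashlib


def md5_fingerprint(public_key_line: str) -> str:
    """Colon-separated MD5 fingerprint of an OpenSSH public key line ('type base64 comment')."""
    blob_b64 = public_key_line.split()[1]
    digest = hashlib.md5(base64.b64decode(blob_b64)).hexdigest()
    return ":".join(digest[i:i + 2] for i in range(0, len(digest), 2))


class KeyIndex:
    """Hypothetical fingerprint -> user index; a dict plays the role of a DB index."""

    def __init__(self):
        self._by_fingerprint = {}

    def add(self, user_id: int, public_key_line: str) -> str:
        """Register a key and return the fingerprint it was stored under."""
        fingerprint = md5_fingerprint(public_key_line)
        self._by_fingerprint[fingerprint] = user_id
        return fingerprint

    def lookup(self, public_key_line: str):
        """Return the owning user id, or None if the key is unknown."""
        return self._by_fingerprint.get(md5_fingerprint(public_key_line))
```

The fingerprint has a fixed, short length regardless of key size, which is what makes it cheap to index. MD5 serves here only as a lookup key, not as a security boundary.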
Data Safety
Each shard uses a master‑mirror‑backup trio, providing one‑primary‑multiple‑backup redundancy. Cross‑region backup and failover were validated through a simulated data‑center outage, achieving sub‑minute alerting and five‑minute traffic switchover.
Monitoring and Reliability
Comprehensive monitoring tracks CPU, memory, network, message queues, database connections, and consistency between Sharding‑Proxy‑Api and GitLab. Automated log analysis identifies 5xx errors, and an automatic failover mechanism swaps roles between master and backup within five minutes of a node failure.
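The master/backup role swap can be illustrated with a small sketch. It assumes a health check is supplied by the monitoring system. The `ShardGroup` type, the `fail_over` function, and the `is_healthy` callback are hypothetical names, and the real mechanism would also need to update the sharding mapping and resynchronize the demoted node.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class ShardGroup:
    """One three-node group as described in the article."""
    master: str
    mirror: str
    backup: str


def fail_over(group: ShardGroup, is_healthy: Callable[[str], bool]) -> ShardGroup:
    """Swap master and backup when the master is down; otherwise return the group unchanged."""
    if is_healthy(group.master):
        return group
    if not is_healthy(group.backup):
        # No safe promotion target: leave the decision to an operator.
        raise RuntimeError(f"master {group.master} and backup {group.backup} are both down")
    # The failed master becomes the backup once it has been repaired and resynced.
    return ShardGroup(master=group.backup, mirror=group.mirror, backup=group.master)
```

Returning a new group rather than mutating the old one keeps the swap atomic from the caller's point of view: the proxy sees either the old assignment or the new one, never a half‑updated state.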
Unit‑Cell Deployment
The architecture adopts a cell‑based model, where each cell (shard) is a self‑contained unit handling all operations for its data subset, simplifying cross‑region deployments and future cloud‑native extensions.
Future Work
Address occasional cache‑induced memory pressure by tuning kernel parameters and exploring kernel upgrades.
Automate deployment, scaling, and upgrades to reduce manual effort.
Complete the RPC replacement to fully separate web‑service load from Git operations.
Conclusion
After a year of scaling, Alibaba's GitLab saw request volume increase fourfold, project count rise by 130%, and user count grow by 56%, while the system call success rate improved from 99.5% to over 99.99%, demonstrating the effectiveness of the distributed, sharded architecture.
Alibaba Cloud Developer
Alibaba's official tech channel, featuring all of its technology innovations.
