GitHub’s Journey from Monolith to Microservices: Practices and Lessons

This article describes GitHub's transition from a 12-year-old Ruby on Rails monolith to a microservice architecture. It covers the growth challenges behind the move, modular design, data splitting, core service extraction, operational changes, and strategies for building resilient, asynchronous systems.

Architecture Digest

Aug 6, 2021

1. Journey Begins

GitHub was founded in 2008 as a Ruby on Rails monolith. It has since grown to host over 50 million developers and 100 million repositories, and it handles a billion API calls per day.

2. Rapid Internal Growth

In the past 18 months the company more than doubled its engineering staff, acquired companies such as Semmle, npm, Dependabot, and Pull Panda, and now operates with a highly distributed workforce across six continents.

3. Monolith vs. Microservices

The monolith keeps configuration and deployment simple, but its size creates coordination overhead and slows iteration. Microservices promise smaller codebases, clear API contracts, independent scaling, and faster feature delivery. These benefits matter more as GitHub's engineering organization grows.

4. Pragmatic Empowerment

The migration aims to empower developers rather than to replace existing workflows. GitHub adopts a hybrid monolith and microservice environment and keeps modernizing the existing stack (e.g., upgrading to Ruby 2.7).

5. Modularity as the Foundation

The first step is to modularize the monolith by splitting code and data along feature boundaries, so that each service owns its data and other code reaches that data only through well-defined APIs.

6. Data Splitting

Functional domains are identified, grouped into "schema domains," and recorded in a YAML file. Each domain receives a partition key (e.g., repository ID) so that its data can be sharded across servers and clusters. Static analysis checks that every schema change is reflected in the YAML file, and queries that span domains are rewritten into multiple queries that each stay within one domain.
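
The checks described above can be sketched roughly as follows. This is a minimal illustration, not GitHub's tooling: the domain names, table names, and function names are hypothetical, and the dictionary stands in for the parsed YAML file. Queries are represented by the list of tables they touch rather than by parsed SQL.

```python
# Hypothetical domain map, standing in for the parsed YAML file.
SCHEMA_DOMAINS = {
    "repositories": {
        "partition_key": "repository_id",
        "tables": ["repositories", "issues"],
    },
    "users": {
        "partition_key": "user_id",
        "tables": ["users", "emails"],
    },
}


def table_index(domains):
    """Map each table name to the single domain that owns it."""
    index = {}
    for domain, spec in domains.items():
        for table in spec["tables"]:
            if table in index:
                raise ValueError(
                    f"{table} is claimed by both {index[table]} and {domain}"
                )
            index[table] = domain
    return index


def unassigned_tables(schema_tables, domains):
    """Return tables in the schema that no domain lists (map out of sync)."""
    index = table_index(domains)
    return sorted(t for t in schema_tables if t not in index)


def domains_touched(query_tables, domains):
    """Return the set of domains a query touches, given its table names.

    Raises KeyError for a table that no domain owns.
    """
    index = table_index(domains)
    return {index[t] for t in query_tables}


def is_cross_domain(query_tables, domains):
    """True if the query touches more than one domain and needs splitting."""
    return len(domains_touched(query_tables, domains)) > 1
```

A query flagged by `is_cross_domain` would then be replaced by one query per domain, with the results combined in application code.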

7. Extracting Core Services and Shared Resources

Extraction starts with core services such as authentication and authorization. They move outside the monolith, and the dependency always points from the monolith to the new service, never the other way around. Tools such as Scientist, GitHub's open-source Ruby library for experiments, help run the old and new code paths side by side during rollout.
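
The side-by-side idea can be sketched as below. This is not the Scientist API (Scientist is a Ruby library); it is a simplified Python illustration of the same pattern, and `run_experiment` and its parameters are hypothetical names. The caller always receives the old code path's result, while the new path is run and compared in the background of the same call.

```python
import logging

log = logging.getLogger("experiment")


def run_experiment(name, control, candidate, mismatches=None):
    """Run the old path (control) and the new path (candidate) side by side.

    The control's result is always returned. A differing or failing
    candidate is only recorded, so the new code cannot break callers.
    """
    control_result = control()  # exceptions here propagate as before
    try:
        candidate_result = candidate()
    except Exception as exc:  # candidate failures must not reach the caller
        observed = repr(exc)
    else:
        if candidate_result == control_result:
            return control_result
        observed = candidate_result

    log.warning(
        "experiment %s mismatch: control=%r candidate=%r",
        name, control_result, observed,
    )
    if mismatches is not None:
        mismatches.append((name, control_result, observed))
    return control_result
```

Once the recorded mismatches stay empty under production traffic, the candidate can be promoted to become the only code path.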

8. AuthN/AuthZ Extraction

Authentication was rewritten as an external service that the monolith calls via Twirp (a gRPC-style RPC framework). The dependency still points from inside the monolith outward: the monolith calls the new service, and the service does not call back into the monolith.
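
To make the RPC call concrete, here is a small sketch of how a client might build a Twirp request using JSON encoding. It assumes Twirp's default route layout, `POST /twirp/<package>.<Service>/<Method>`; the host, package, service, and method names are hypothetical, and nothing here reflects GitHub's actual service definitions. The request is only constructed, not sent.

```python
import json
import urllib.request


def build_twirp_request(base_url, package, service, method, payload):
    """Build (but do not send) a JSON-encoded Twirp request.

    Assumes the default "/twirp" path prefix; deployments may configure
    a different one.
    """
    url = f"{base_url.rstrip('/')}/twirp/{package}.{service}/{method}"
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical service and method names, for illustration only.
request = build_twirp_request(
    "http://authn.internal.example",
    package="authn.v1",
    service="Authenticator",
    method="VerifyToken",
    payload={"token": "example-token"},
)
```

Sending the request would use `urllib.request.urlopen(request, timeout=...)`, with an explicit timeout so that a slow dependency cannot stall the monolith.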

9. Operational Changes

Monitoring, CI/CD, and containerization were adapted for a microservice world. Metrics shifted from function calls to network and contract metrics. Pipelines became more automated and language-agnostic. A self-service runtime platform (Kubernetes templates, Ingress, Splunk logging) was built to reduce the operational burden on teams.

10. Product/Business-Value-Driven Migration

New features should be built as independent microservices, starting with low-coupling, high-value components (e.g., webhooks, syntax highlighting). Coupling analysis guides which features to split first, while avoiding over-fragmentation that adds unnecessary complexity.
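
One simple way to picture coupling analysis is to count the references that cross feature boundaries and rank features from least to most coupled. The sketch below is an assumption about how such a ranking could be computed, not the method GitHub used; the feature names and the dependency list are made up, and business value would have to be weighed separately.

```python
from collections import Counter


def coupling_scores(dependencies):
    """Count cross-feature references per feature.

    `dependencies` is an iterable of (from_feature, to_feature) pairs.
    References within a single feature are ignored.
    """
    scores = Counter()
    for source, target in dependencies:
        if source == target:
            continue
        scores[source] += 1
        scores[target] += 1
    return scores


def extraction_order(features, dependencies):
    """Rank features from least to most coupled (ties broken by name)."""
    scores = coupling_scores(dependencies)
    return sorted(features, key=lambda feature: (scores[feature], feature))


# Hypothetical reference data, for illustration only.
DEPENDENCIES = [
    ("webhooks", "repositories"),
    ("issues", "repositories"),
    ("issues", "users"),
    ("pull_requests", "issues"),
    ("pull_requests", "repositories"),
    ("issues", "issues"),
]
ORDER = extraction_order(
    ["webhooks", "issues", "pull_requests", "repositories"], DEPENDENCIES
)
```

With this data, `webhooks` has a single cross-feature reference and ranks first, which matches the intuition that loosely coupled features are the cheapest to extract.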

11. Asynchronous and Resilient Design

Synchronous RPC (via Twirp) works for core services, but as the number of services grows, an event-driven asynchronous pipeline reduces latency and tight coupling. Because network calls can fail, standard resilience patterns are applied: retries with exponential back-off, circuit breakers, timeouts, and graceful degradation.
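
Three of these patterns (retries with exponential back-off, a circuit breaker, and graceful degradation through a fallback value) can be sketched together as below. This is a simplified, single-threaded illustration with hypothetical names and default values, not a production implementation; timeouts are assumed to be enforced inside the wrapped call itself.

```python
import random
import time


class CircuitOpenError(RuntimeError):
    """Raised when the breaker is open and calls fail fast."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_after=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after  # cool-down in seconds
        self.clock = clock
        self.failures = 0
        self.opened_at = None

    def call(self, func):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise CircuitOpenError("circuit open; failing fast")
            # Cool-down elapsed: let one trial call through (half-open).
        try:
            result = func()
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # open (or re-open) the circuit
            raise
        self.failures = 0
        self.opened_at = None  # success closes the circuit
        return result


def backoff_delays(attempts, base=0.1, cap=5.0):
    """Delays between attempts: base * 2**n, capped at `cap`."""
    return [min(cap, base * 2 ** n) for n in range(attempts - 1)]


def call_with_retries(func, attempts=4, base=0.1, cap=5.0,
                      sleep=time.sleep, fallback=None):
    """Retry `func` with jittered exponential back-off.

    Returns `fallback` (graceful degradation) when every attempt fails
    or when the circuit breaker is open.
    """
    delays = backoff_delays(attempts, base, cap)
    for attempt in range(attempts):
        try:
            return func()
        except CircuitOpenError:
            break  # retrying an open circuit only adds load
        except Exception:
            if attempt == attempts - 1:
                break
            sleep(delays[attempt] * random.uniform(0.5, 1.0))
    return fallback
```

A caller would typically wrap the remote call in both layers, for example `call_with_retries(lambda: breaker.call(fetch), fallback=cached_value)`, where `fetch` and `cached_value` are placeholders for the real call and a degraded response.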

12. Conclusion

GitHub made its transition to microservices manageable by understanding the reasons for the migration, modularizing code and data, extracting core services, adjusting operations, and focusing on product value. Together these steps set the stage for a resilient, scalable architecture.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.