Why Segment Ditched Microservices for a Monolith, and What We Learned
Segment’s engineering team recounts its evolution from a simple monolith to a sprawling microservice ecosystem and back again, detailing queue bottlenecks, repo fragmentation, shared‑library chaos, and how consolidating everything into a single codebase restored performance, scalability, and developer productivity.
Background
Segment processes hundreds of thousands of events per second, forwarding each event to one or more "destinations" such as Google Analytics, Optimizely, or custom webhooks. Initially the system was built around a single message queue that accepted JSON events from web or mobile clients, such as the following:
{
  "type": "identify",
  "traits": {
    "name": "Alex Noonan",
    "email": "[email protected]",
    "company": "Segment",
    "title": "Software Engineer"
  },
  "userId": "97980cfea0067"
}
Each event was consumed from the queue and dispatched to the appropriate destination APIs.
Microservices Journey
In 2015 Segment adopted microservices to gain modularity, independent deployment, and team autonomy. The architecture grew to over 140 services, each handling a specific destination. While this brought early benefits, the complexity soon caused development speed to drop, defect rates to rise, and operational overhead to increase.
Queue Bottleneck Problem
Failed requests were retried, and retries accumulated in the same queue as new events. When a destination became slow or unavailable, its retries blocked the queue head, causing a cascade of latency across all destinations.
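To make the blocking effect concrete, here is a minimal simulation sketch. It is not Segment's code: the function name `simulateSharedQueue`, the tick-based clock, and the `timeoutTicks` and `maxAttempts` parameters are all assumptions made for illustration. It models one worker draining one shared queue, where a request to an unhealthy destination costs a full timeout and the retry re-enters the same queue.

```javascript
// Hypothetical model: one worker, one shared FIFO queue, retries go back into the same queue.
function simulateSharedQueue(events, { isHealthy, timeoutTicks, maxAttempts }) {
  const queue = events.map((event) => ({ ...event, attempt: 1 }));
  const deliveredAt = {}; // event id -> clock tick at which it was delivered
  let clock = 0;

  while (queue.length > 0) {
    const job = queue.shift();
    if (isHealthy(job.destination)) {
      clock += 1; // a healthy request costs one tick
      deliveredAt[job.id] = clock;
    } else {
      clock += timeoutTicks; // a failing request burns the whole timeout
      if (job.attempt < maxAttempts) {
        queue.push({ ...job, attempt: job.attempt + 1 }); // retry shares the queue
      }
    }
  }
  return deliveredAt;
}

const events = [
  { id: 'a', destination: 'slow' },
  { id: 'b', destination: 'fast' },
  { id: 'c', destination: 'slow' },
  { id: 'd', destination: 'fast' },
];
const options = { timeoutTicks: 10, maxAttempts: 3 };

const allHealthy = simulateSharedQueue(events, { ...options, isHealthy: () => true });
const oneDown = simulateSharedQueue(events, { ...options, isHealthy: (d) => d !== 'slow' });

console.log(allHealthy.d, oneDown.d); // delivery tick for event "d" in each scenario
```

In this toy model the events bound for the healthy destination are delivered later as soon as the other destination starts timing out, even though nothing is wrong with their own destination.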
Per‑Destination Queues
To isolate failures, the team introduced a router process that duplicated incoming events and fed each destination its own dedicated queue. This prevented a single slow destination from blocking the entire system.
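The fan-out idea can be sketched in a few lines. This is an illustrative in-memory model, not the actual router: the `Router` class, its `route` and `drain` methods, and the destination names are hypothetical, and a real system would use durable queues rather than arrays.

```javascript
// Hypothetical in-memory router: every destination gets its own queue.
class Router {
  constructor(destinationNames) {
    this.queues = new Map(destinationNames.map((name) => [name, []]));
  }

  // Duplicate the incoming event into each destination's queue (shallow copy).
  route(event) {
    for (const queue of this.queues.values()) {
      queue.push({ ...event });
    }
  }

  // Deliver events for one destination until `send` reports a failure.
  // A failed event stays at the head of this destination's queue only.
  drain(name, send) {
    const queue = this.queues.get(name);
    const delivered = [];
    while (queue.length > 0) {
      if (!send(queue[0])) break;
      delivered.push(queue.shift());
    }
    return delivered;
  }
}

const router = new Router(['google-analytics', 'webhook']);
router.route({ type: 'identify', userId: '97980cfea0067' });
router.route({ type: 'track', userId: '97980cfea0067' });

const webhookDelivered = router.drain('webhook', () => false); // destination is down
const analyticsDelivered = router.drain('google-analytics', () => true); // unaffected
console.log(webhookDelivered.length, analyticsDelivered.length);
```

The backlog for the failing destination stays in its own queue, so the healthy destination drains normally. The trade-off is that every event is stored once per destination.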
Repo Fragmentation and Shared Library Issues
Each destination lived in its own repository. As the number of destinations grew, maintaining dozens of repos became painful. A shared library was created to handle common transformations (e.g., name extraction) and HTTP handling, but any change to the shared code required testing and deploying every destination, creating a massive coordination burden.
const traits = {};
traits.dob = segmentEvent.birthday;

Identify.prototype.name = function() {
  var name = this.proxy('traits.name');
  if (typeof name === 'string') {
    return trim(name);
  }
  var firstName = this.firstName();
  var lastName = this.lastName();
  if (firstName && lastName) {
    return trim(firstName + ' ' + lastName);
  }
};
Consolidating to a Single Monolith Repo
The team merged all destination code into one repository, standardising dependencies to a single version and eliminating the need to track individual library versions. This dramatically reduced code‑base complexity, simplified testing, and allowed a single deployment to update every destination in seconds.
Testing Improvements with Traffic Recorder
HTTP‑dependent tests were flaky and slow. The team built a "Traffic Recorder" based on yakbak that records real HTTP responses on the first test run and replays them on subsequent runs, removing external network dependence. This cut the total test suite time from hours to milliseconds for all 140+ destinations.
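The record-and-replay idea can be illustrated with a small sketch. It does not use yakbak's API and is not the Traffic Recorder itself: `createTrafficRecorder`, `transport`, and `liveCallCount` are hypothetical names, the tape lives in memory only (a real recorder would persist tapes between runs), and the URL is a placeholder.

```javascript
// Hypothetical record/replay wrapper around any request function.
function createTrafficRecorder(transport) {
  const tape = new Map(); // request key -> recorded response
  let liveCalls = 0;

  function request(method, url, body = null) {
    const key = JSON.stringify([method, url, body]);
    if (!tape.has(key)) {
      // First time this request is seen: hit the real transport and record the result.
      liveCalls += 1;
      tape.set(key, transport(method, url, body));
    }
    // Every later identical request is replayed from the tape.
    return tape.get(key);
  }

  return { request, liveCallCount: () => liveCalls };
}

// Stand-in for a real HTTP client, so the example runs without a network.
const fakeTransport = (method, url) => ({ status: 200, body: `${method} ${url}` });

const recorder = createTrafficRecorder(fakeTransport);
const payload = { userId: '97980cfea0067' };
const first = recorder.request('POST', 'https://example.com/track', payload);
const second = recorder.request('POST', 'https://example.com/track', payload);

console.log(recorder.liveCallCount(), first === second);
```

Because replayed responses never touch the network, tests built this way are not affected by a destination's endpoint being slow or unavailable, at the cost of tapes going stale when the remote API changes.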
Benefits of the Monolith
Developer productivity surged: a single engineer could redeploy the entire system in under a minute.
Performance improvements were measurable; 46 optimisations were made in six months after the merge.
Operational scaling became simpler because a single service pool could absorb traffic spikes.
Drawbacks and Open Challenges
Failure isolation is harder; a bug in one destination can crash the whole service.
Memory cache efficiency drops because many low‑traffic destinations now share a large pool of processes.
Updating the shared library can still break multiple destinations, requiring careful coordination.
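One common way to soften the failure-isolation drawback inside a single process is to wrap each destination call so that an exception is recorded instead of propagating. The sketch below is an assumption about how that could look, not something described in the source: `dispatchToAll` and the handler names are hypothetical, and it only guards against synchronous exceptions, not memory leaks, crashes in native code, or unhandled asynchronous errors.

```javascript
// Hypothetical guard: a throwing destination handler is reported, not propagated.
function dispatchToAll(event, handlers) {
  const results = {};
  for (const [name, handler] of Object.entries(handlers)) {
    try {
      results[name] = { ok: true, value: handler(event) };
    } catch (error) {
      results[name] = { ok: false, error: error.message };
    }
  }
  return results;
}

const handlers = {
  healthy: (event) => `sent ${event.type}`,
  buggy: () => {
    throw new Error('unexpected payload');
  },
};

const outcome = dispatchToAll({ type: 'identify' }, handlers);
console.log(outcome.healthy.ok, outcome.buggy.ok);
```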
Conclusion
Segment’s experience shows that while microservices can solve early scaling problems, they may introduce overwhelming operational and maintenance costs at scale. A carefully engineered monolith, backed by robust testing and isolation mechanisms, can restore efficiency and reliability when the ecosystem becomes too fragmented.
ITPUB
