What Digg’s Failed Revamp Reveals About Building Scalable Development Processes
The article examines Digg’s former software development workflow, detailing its team structure, Git‑Gerrit code review, Jenkins testing, Puppet deployments, evolving practices, and how Conway’s Law shaped its architecture, offering lessons for modern engineering teams.
Team Structure
Initially, Digg’s organization was traditional, with separate product, operations, and engineering teams. The product team had 4 members, operations 8, QA 6, and engineering about 20, split into four horizontal teams: front‑end, API, platform, and infrastructure.
Later, after staff turnover, the structure simplified: operations shrank to 3 people and engineering to 7, with a separate ads engineering team of 4.
Development Practices
They used Git for source control and Gerrit for code review. All code required peer review and had to pass unit tests.
When a patch was submitted to Gerrit, Jenkins automatically ran the unit tests and posted the results back to Gerrit, so reviewers never had to spend time on patches whose tests failed.
Approved patches were merged into Git’s master branch; Puppet automatically deployed them to an alpha environment, and after successful integration tests they were merged into the production branch.
Production deployment was initially automated via Jenkins, but after encountering problems they reverted to manual deployment using Jenkins together with Puppet.
Will reflected that persisting with continuous deployment would have forced better testing and made it hard for bad code to slip through, but at the time the effort didn’t seem worth it.
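The submit gate described above (patch submitted, automatic CI vote, reviewer approval, then merge) can be sketched as a toy model in Python. The Verified/Code‑Review labels and the +1/+2 thresholds mirror Gerrit's common defaults; Digg's exact configuration is an assumption here.

```python
from dataclasses import dataclass

@dataclass
class Patch:
    """A pending change in a Gerrit-like review queue (toy model)."""
    change_id: str
    verified: int = 0      # CI vote: +1 pass, -1 fail, 0 pending
    code_review: int = 0   # reviewer vote, -2 .. +2

def run_ci(patch: Patch, tests_passed: bool) -> None:
    """Jenkins-style hook: record the unit-test result as a Verified vote."""
    patch.verified = 1 if tests_passed else -1

def submittable(patch: Patch) -> bool:
    """Submit rule: tests must pass AND a reviewer must approve."""
    return patch.verified == 1 and patch.code_review == 2
```

The key property is that neither a green CI run nor a reviewer approval alone is sufficient; a patch only merges when both gates agree.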
Will considered this review, testing, and deployment system the most important and successful part of Digg’s process: it let a 40‑person team produce high‑quality, consistent code, and it kept working without adding friction once the team shrank to 14.
Emerging New Practices
Initially they had very high unit‑test coverage, but as the team shrank, testing declined. They found that unit tests failed to catch many of their real problems, such as failures under high load or front‑end rendering bugs.
They used Thrift to define the interfaces between the front‑end and platform teams, and existing interfaces were rarely changed; new features were exposed through new interfaces instead, which let front‑end and back‑end services deploy independently and safely within minutes.
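As a hypothetical illustration of that convention, a Thrift IDL evolves by appending methods rather than altering existing signatures; the type and service names below are invented for this sketch, not taken from Digg:

```thrift
struct Story {
  1: i64 id,
  2: string title,
}

struct StoryWithComments {
  1: Story story,
  2: list<string> comments,
}

service StoryService {
  // Original method: left untouched once front-end clients depend on it.
  Story getStory(1: i64 id),

  // New feature shipped as a new method rather than a changed signature,
  // so front-end and platform servers can deploy independently.
  StoryWithComments getStoryWithComments(1: i64 id),
}
```

Because old clients keep calling the old method and new clients opt in to the new one, either side can deploy first without breaking the other.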
Front‑end changes were rolled out to a subset of users and disabled if problems arose; this served both as A/B testing and as version management.
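A percentage rollout of that kind is commonly implemented by hashing users into stable buckets. This is a minimal sketch of the idea; the flag name and the MD5 bucketing scheme are assumptions, not Digg's actual code.

```python
import hashlib

# Feature flags: feature name -> percent of users who see it.
FLAGS = {"new_story_page": 10}

def in_rollout(user_id: str, feature: str, percent: int) -> bool:
    """Deterministically place a user in one of 100 buckets; the same
    user always lands in the same bucket, so their experience is stable."""
    digest = hashlib.md5(f"{feature}:{user_id}".encode()).hexdigest()
    return int(digest, 16) % 100 < percent

def show_new_story_page(user_id: str) -> bool:
    # Dropping the flag to 0 disables the feature for everyone at once,
    # which doubles as the "turn it off if problems arise" switch.
    return in_rollout(user_id, "new_story_page", FLAGS["new_story_page"])
```

Hashing on `feature:user_id` (rather than user ID alone) also keeps experiment populations independent across features.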
Backend changes that could affect performance or load were deployed without downtime.
Architecture Under Conway’s Law
Conway’s Law states that a system’s design mirrors the organization’s communication structure.
Will observed that the v4 architecture directly reflected their organization: an API team with an API server, a front‑end team with a front‑end server, a platform team with a back‑end server, an ads team with an ads server, infrastructure managing numerous data stores, and an analytics team using a Hadoop cluster.
The front‑end used stateless PHP, HTML, and JavaScript; state and storage were handled by back‑end servers; message queues processed long‑running and non‑transactional tasks.
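The queueing pattern described above can be sketched in a few lines: a stateless front‑end handler answers immediately and pushes long‑running, non‑transactional work onto a queue for back‑end workers. The task names and payloads here are invented examples; the article does not specify Digg's actual queueing system.

```python
import queue

# Shared work queue of (task_name, payload) tuples. In production this
# would be a networked message queue, not an in-process one.
task_queue: "queue.Queue[tuple[str, dict]]" = queue.Queue()

def handle_request(user_id: str) -> str:
    """Front-end handler: stays stateless and fast by deferring slow work."""
    task_queue.put(("send_digest_email", {"user": user_id}))
    return "ok"

def drain_one() -> tuple[str, dict]:
    """Back-end worker step: pull one task off the queue and process it."""
    name, payload = task_queue.get()
    # ... perform the long-running task here ...
    task_queue.task_done()
    return name, payload
```

The request returns before the work happens, which is exactly why this pattern suits non‑transactional tasks: the caller never waits on them.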
They also ran Tornado, Apache + mod_wsgi + Pylons, and gevent servers, with Jenkins and Puppet handling deployments.
Will’s verdict was that the process and structure were reasonable and effective, though given a fresh start, and especially without the rapid staff reductions and accumulated technical debt, they would have done some things differently.
In today’s fast‑moving startup landscape, adopting mature development processes early can prevent long‑term “genetic” problems from being baked into an organization; Digg’s practices offer valuable reference points.
21CTO