Inside Medium’s Scalable Architecture: How Their Backend Powers 25M Readers

This article details Medium’s evolution from its initial AWS‑based backend stack to a sophisticated, service‑oriented architecture that leverages Node.js, Go, DynamoDB, Redis, Aurora, Neo4j, Spark, and CI/CD pipelines to support millions of users.

Java High-Performance Architecture

May 21, 2016

Initial Technical Stack

Medium’s website was first deployed on Amazon EC2 using Node.js, with DynamoDB as the primary database. One dedicated server handled image processing with GraphicsMagick, and another ran the background task queues. Static assets were stored in S3 and served through the CloudFront CDN. Nginx acted as the reverse proxy, while monitoring and alerting relied on Datadog and PagerDuty.

Current Technical Stack

The platform now runs in an Amazon VPC and is managed with Ansible. Nginx and HAProxy handle reverse proxying and load balancing. Logs are aggregated through the ELK stack (Elasticsearch, Logstash, Kibana). The backend follows a service-oriented architecture made up of dozens of services, most of them written in Node.js, with some auxiliary services written in Go because it is easier to compile and deploy.

Databases

DynamoDB remains the main database, with a Redis cluster in front of it as a cache to alleviate hot-key issues. Amazon Aurora is used where more flexible queries are needed. Neo4j stores graph data about entities (people, articles, tags) and the relationships between them, which enables graph-based analysis such as article recommendation.
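To make the hot-key point concrete, here is a minimal cache-aside sketch. It is not Medium's code: the class name `ReadThroughCache`, the in-memory dict standing in for Redis, and the `posts` dict standing in for DynamoDB are all assumptions made so the example runs on its own.

```python
from typing import Callable, Dict, Optional


class ReadThroughCache:
    """Cache-aside reads: check the cache first, fall back to the primary store."""

    def __init__(self, load_from_store: Callable[[str], Optional[str]]) -> None:
        self._load_from_store = load_from_store  # hypothetical stand-in for a DynamoDB read
        self._cache: Dict[str, str] = {}         # hypothetical stand-in for a Redis cluster
        self.store_reads = 0                     # how many reads reached the primary store

    def get(self, key: str) -> Optional[str]:
        if key in self._cache:
            return self._cache[key]              # cache hit: the store is not touched
        self.store_reads += 1
        value = self._load_from_store(key)
        if value is not None:
            self._cache[key] = value             # populate the cache for later readers
        return value

    def invalidate(self, key: str) -> None:
        """Drop a cached entry, for example after the underlying item changes."""
        self._cache.pop(key, None)


# Hypothetical data: one popular post that is read many times.
posts = {"post:1": "Hello, world"}
cache = ReadThroughCache(posts.get)
for _ in range(1000):
    cache.get("post:1")
print(cache.store_reads)  # 1
```

A thousand reads of the same key reach the primary store only once, which is the effect a cache in front of a hot key is meant to have. A real setup would also need expiry and a strategy for concurrent misses, which this sketch leaves out.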

Data Platform

As data volume grew, the need for a robust analytics framework became clear. Amazon Redshift serves as the data warehouse, ingesting large volumes of user, article, and log data. A custom job system manages the data pipelines; it asserts that each job's dependencies are satisfied and keeps data producers separate from data consumers. Apache Spark is increasingly used for more flexible processing, and Protocol Buffers provide a common schema definition across services, mobile apps, web APIs, and the warehouse.
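The article does not describe how the custom job system works internally. As a rough illustration of dependency assertions between producers and consumers, the sketch below orders jobs by the datasets they produce and require, and refuses to run a job whose inputs are missing. The `Job` and `run_pipeline` names and the example datasets are hypothetical.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from typing import Dict, List, Set


@dataclass
class Job:
    name: str
    requires: List[str] = field(default_factory=list)  # datasets this job consumes
    produces: List[str] = field(default_factory=list)  # datasets this job creates


def run_pipeline(jobs: List[Job]) -> List[str]:
    """Run jobs in dependency order and return the order in which they ran."""
    by_name: Dict[str, Job] = {job.name: job for job in jobs}
    producer_of = {ds: job.name for job in jobs for ds in job.produces}
    # Map each job to the jobs that produce its inputs.
    graph = {
        job.name: {producer_of[ds] for ds in job.requires if ds in producer_of}
        for job in jobs
    }
    available: Set[str] = set()
    order: List[str] = []
    for name in TopologicalSorter(graph).static_order():
        job = by_name[name]
        missing = set(job.requires) - available
        if missing:  # the dependency assertion
            raise RuntimeError(f"{name} is missing inputs: {sorted(missing)}")
        # A real job would do its work here; this sketch only records the result.
        available.update(job.produces)
        order.append(name)
    return order


# Hypothetical pipeline, deliberately listed out of order.
order = run_pipeline([
    Job("report", requires=["counts"]),
    Job("daily_counts", requires=["events"], produces=["counts"]),
    Job("load_events", produces=["events"]),
])
print(order)  # ['load_events', 'daily_counts', 'report']
```

Because consumers name datasets rather than the jobs that produce them, a producer can be replaced without touching its consumers, which is one plausible reading of "separating producers and consumers".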

Compilation, Testing, Deployment

Medium practices continuous integration and delivery, driven by Jenkins. Builds originally used Make and later moved to Pants. Tests include unit tests and HTTP-level functional tests, all of which must pass before a change is merged; ClusterRunner speeds up test runs by executing them in parallel. Changes are deployed quickly to a staging environment and, once they pass testing, move to production through a blue-green deployment: the release first goes to canary instances, is promoted after verification, and can be rolled back via DNS if needed.
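The deployment flow described above can be sketched as a small state machine. This is an illustration under stated assumptions, not Medium's tooling: the `BlueGreenDeployer` class, the two pool names, and the `canary_ok` callback are hypothetical, and switching the `live` pointer stands in for the DNS change.

```python
from typing import Callable, Dict


class BlueGreenDeployer:
    """Two identical pools; only one receives live traffic at a time."""

    def __init__(self, current_version: str) -> None:
        self.pools: Dict[str, str] = {"blue": current_version, "green": current_version}
        self.live = "blue"  # which pool traffic points at (stand-in for a DNS record)

    @property
    def idle(self) -> str:
        return "green" if self.live == "blue" else "blue"

    def live_version(self) -> str:
        return self.pools[self.live]

    def deploy(self, version: str, canary_ok: Callable[[str], bool]) -> bool:
        """Install a version on the idle pool and promote it only if canaries pass."""
        target = self.idle
        previous = self.pools[target]
        self.pools[target] = version
        if not canary_ok(version):
            self.pools[target] = previous  # keep the idle pool on a known-good version
            return False
        self.live = target                 # promotion: switch traffic to the new pool
        return True

    def rollback(self) -> None:
        """Point traffic back at the other pool, which still runs the prior version."""
        self.live = self.idle


deployer = BlueGreenDeployer("v1")
deployer.deploy("v2", canary_ok=lambda version: True)
print(deployer.live_version())  # v2
deployer.rollback()
print(deployer.live_version())  # v1
```

The old pool is left untouched during promotion, so rolling back is only a pointer switch and does not require a redeploy.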
