How Flickr Scaled Its Backend: From LAMP to Microservices and Tripod
This article details Flickr's evolution from a simple LAMP stack to a complex micro‑service architecture, describing its database scaling strategies, caching layers, storage solutions, and the modern Tripod platform that powers its image and video services today.
Flickr, acquired by Yahoo! in 2005, originally ran on a classic LAMP platform with Apache, PHP and MySQL on a single server, later separating MySQL into a dual‑server setup.
Initial LAMP Architecture
Rapid user growth stressed MySQL, creating a performance bottleneck that prompted scaling efforts.
Database scaling follows two paths: Scale‑Up (adding CPU, memory) and Scale‑Out (adding more database servers). Flickr employed both, using a master‑slave MySQL cluster for reads and a dual‑master (Dual Tree) design to avoid single points of failure.
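The read/write split behind a master‑slave cluster can be sketched in a few lines. This is a minimal illustration, not Flickr's actual routing code: the `ReplicatedRouter` class, the host names, and the rule "anything starting with SELECT is a read" are all assumptions made for the example.

```python
import itertools


class ReplicatedRouter:
    """Send writes to the master and spread reads across replicas."""

    def __init__(self, master, replicas):
        self.master = master
        # Fall back to the master when no replicas are configured.
        self._read_cycle = itertools.cycle(replicas or [master])

    def route(self, sql):
        # Simplification: only statements starting with SELECT count as reads.
        is_read = sql.lstrip().upper().startswith("SELECT")
        return next(self._read_cycle) if is_read else self.master


# Hypothetical host names, for illustration only.
router = ReplicatedRouter("db-master", ["db-replica-1", "db-replica-2"])
read_target = router.route("SELECT * FROM photos WHERE id = 1")
write_target = router.route("UPDATE photos SET title = 'x' WHERE id = 1")
```

Adding replicas raises read capacity only; every write still lands on the master, which is one reason a second master is needed to remove the single point of failure.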
Key Technologies
PHP – Runs on Red Hat Linux with Apache; about 60 000 lines of PHP code, stateless design without sessions for easy scaling.
MySQL Cluster – Master‑slave structure with round‑robin master selection to eliminate single‑point failures.
Sharding & Caching – Memcached provides an intermediate cache layer; Squid serves as a reverse proxy for HTML and images.
Template Engine – Smarty, a semi‑compiled PHP template language, enables static‑like page generation.
Perl – Used for system management tasks such as log analysis.
PEAR – PHP libraries for XML and email parsing.
Image Processing – ImageMagick, later replaced by GraphicsMagick, which improved image handling speed by 15%.
Java Services – Provide node‑level functionality; deployment managed by SystemImager.
Monitoring & Deployment – Ganglia for capacity visualization, Subcon for configuration management, CVSup for file distribution, and load balancers like Wackamole and ServerIron.
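An intermediate cache layer such as the Memcached tier listed above is commonly used in a cache‑aside pattern: read from the cache first, fall back to the database on a miss, then populate the cache. The sketch below assumes that pattern; the source does not say exactly how Flickr wired it up. A plain dict stands in for Memcached, and `CacheAside` and `load_from_db` are hypothetical names.

```python
class CacheAside:
    """Cache-aside reads: check the cache, fall back to the database."""

    def __init__(self, cache, load_from_db):
        self.cache = cache                # a dict stands in for Memcached here
        self.load_from_db = load_from_db  # hypothetical database loader
        self.db_hits = 0

    def get(self, key):
        if key in self.cache:
            return self.cache[key]        # cache hit: no database work
        self.db_hits += 1                 # cache miss: load and populate
        value = self.load_from_db(key)
        self.cache[key] = value
        return value

    def invalidate(self, key):
        # Call after a write so the next read reloads fresh data.
        self.cache.pop(key, None)


store = CacheAside({}, lambda key: f"row-for-{key}")
first = store.get("photo:1")    # miss, goes to the database
second = store.get("photo:1")   # hit, served from the cache
```

Because the PHP tier is stateless, any web server can serve any request, so the cache is the only shared state that sits in front of MySQL.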
Flickr’s infrastructure includes roughly 166 database servers, 244 web servers (excluding Squid), and 14 Memcached servers, all automated via SystemImager/Configurator.
Operational principles focus on machines that build, monitor, and repair themselves, and on reducing MTTR (mean time to repair) through streamlined processes, using tools such as Opscode, Puppet, and Subcon.
Tripod New Architecture
Tripod offers three core services:
Pixel Service: Handles upload, storage, resizing, and transcoding of photos and videos, supporting over 500 uploads per second across multiple data centers and CDN nodes.
Enrichment Service: Applies image‑recognition algorithms (from IQ Engines and LookFlow) to generate rich metadata, including location, objects, OCR text, aesthetic scores, and NSFW detection.
Aggregation Service: Enables applications to query media by arbitrary criteria, leveraging Yahoo’s Vespa search engine to index enriched metadata.
All services expose unified APIs built with Spring MVC, Spring Data, Spring Boot, Spring Security, and OAuth 2.0. The APIs are documented via Swagger, with generated SDKs for iOS, Android, and JavaScript.
Tripod introduces multi‑tenant concepts: applications, buckets, and API keys. Buckets act as logical containers with configurable settings (compression, TTL, etc.) and access control enforced by API keys and OAuth tokens.
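The relationship between applications, buckets, and API keys can be modeled roughly as follows. This is a sketch of the concept only: the field names (`ttl_seconds`, `compression`, `allowed_keys`) and the `BucketRegistry` class are assumptions for illustration, not Tripod's real data model or API, and OAuth token checks are left out.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Bucket:
    """A logical container owned by one application (hypothetical fields)."""
    app: str
    name: str
    ttl_seconds: Optional[int] = None   # None means objects never expire
    compression: bool = False
    allowed_keys: set = field(default_factory=set)


class BucketRegistry:
    def __init__(self):
        self._buckets = {}

    def create(self, bucket):
        # Buckets are namespaced by application, so names may repeat across apps.
        self._buckets[(bucket.app, bucket.name)] = bucket

    def authorize(self, app, name, api_key):
        bucket = self._buckets.get((app, name))
        if bucket is None or api_key not in bucket.allowed_keys:
            raise PermissionError(f"access denied to {app}/{name}")
        return bucket


registry = BucketRegistry()
registry.create(Bucket("photo-app", "thumbnails", ttl_seconds=3600,
                       compression=True, allowed_keys={"key-123"}))
bucket = registry.authorize("photo-app", "thumbnails", "key-123")
```

Keying buckets by application keeps tenants isolated: a valid key for one application's bucket grants nothing on a same‑named bucket owned by another.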
Data persistence moves much of Flickr’s data to Yahoo!’s distributed NoSQL stores, with Redis Cluster as a caching layer and Vespa powering aggregation.
Enrichment processing relies on Storm and HBase for real‑time visual algorithms, while batch analytics use Pig, Oozie, and Hive on Yahoo’s big‑data platform.
The micro‑service transition is driven by a Pulsar event bus sending Avro messages, enabling rapid development without breaking compatibility.
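Avro helps services evolve independently because a reader resolves incoming data against its own schema: fields the reader does not know are ignored, and fields missing from the data take the reader's declared default. The sketch below imitates that rule with plain dicts rather than the real Avro library; the event fields and the `resolve` helper are hypothetical.

```python
_REQUIRED = object()  # marker for fields that have no default

# Hypothetical reader schema: "is_video" was added later with a default,
# so events from older producers that lack it still resolve.
READER_SCHEMA_V2 = {
    "photo_id": _REQUIRED,
    "owner": _REQUIRED,
    "is_video": False,
}


def resolve(record, reader_schema):
    """Project a decoded record onto the reader's schema."""
    resolved = {}
    for name, default in reader_schema.items():
        if name in record:
            resolved[name] = record[name]
        elif default is _REQUIRED:
            raise ValueError(f"missing required field: {name}")
        else:
            resolved[name] = default   # fill in the reader's default
    return resolved                    # unknown writer fields are dropped


old_event = {"photo_id": 1, "owner": "alice"}
new_event = {"photo_id": 2, "owner": "bob", "is_video": True, "codec": "h264"}
resolved_old = resolve(old_event, READER_SCHEMA_V2)
resolved_new = resolve(new_event, READER_SCHEMA_V2)
```

Under this rule a producer can add a field without coordinating a simultaneous deploy of every consumer, provided new fields carry defaults.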
By 2017, Tripod was slated to power over 50% of Flickr’s backend, supporting billions of users across Yahoo applications.
Source: 21CTO Community – https://yahooeng.tumblr.com/post/157200523046/introducing-tripod-flickrs-backend-refactored