
Evolution of Ctrip Image Service Architecture: From Simple NFS to a Scalable Go‑Based System

This article details the three‑stage evolution of Ctrip's image service architecture—from an early NFS‑backed design with Squid caching, through a Varnish‑and‑Lua powered middle stage, to the current Go‑based multi‑process system using FastDFS—highlighting the challenges, solutions, and performance outcomes.

Ctrip Technology

As Ctrip's business has grown rapidly, user experience has received more attention, and the performance and reliability of media delivery, images in particular, have become a critical concern. This article outlines the evolution of Ctrip's image service architecture across three major phases.

1. Initial Stage: The early architecture stored images on NFS and relied heavily on Squid for caching. Development effort was low, but the design had three weaknesses: storage consumption was high because every resized variant was kept as a separate copy, cache hit rates declined, and NFS became an I/O bottleneck. Together these led to frequent alerts and instability.

2. Development Stage: To address these shortcomings, Varnish replaced Squid for caching and reverse-proxy duties, and Lua scripts embedded in Nginx performed image processing on the fly, which removed the need to store many resized variants. FastDFS replaced NFS for storage, providing a simple, controllable solution that handled billions of read/write operations daily. Lua offered high-performance coroutines, but its limited ecosystem made the service hard to extend, and monitoring integration required custom modules.

3. Current Stage: The service migrated to Go and adopted a multi-process, single-coroutine model. Image manipulation is performed by GraphicsMagick/ImageMagick, called through cgo. A master process spawns as many worker processes as there are CPUs, and each worker handles image tasks through a buffer of configurable size. Because each worker avoids multithreaded calls into GraphicsMagick, the design sidesteps that library's thread-safety issues. Requests are distributed by consistent hashing on the URL, so the same image is routed to the same backend and fewer cached entries are invalidated. Nginx continues to serve as the load balancer, with advanced scripting.
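
The URL-based consistent hashing mentioned in the current stage can be sketched in Go as follows. This is a minimal illustration, not code from Ctrip's service: the Ring type and its methods, the backend names, the example URL, the CRC32 hash, and the virtual-node count are all assumptions chosen for the example.

```go
package main

import (
	"fmt"
	"hash/crc32"
	"sort"
	"strconv"
)

// Ring is a minimal consistent-hash ring (hypothetical type, for illustration only).
type Ring struct {
	replicas int               // virtual nodes per backend
	keys     []uint32          // sorted hashes of all virtual nodes
	owners   map[uint32]string // virtual-node hash -> backend name
}

// NewRing builds a ring with the given number of virtual nodes per backend.
func NewRing(replicas int, backends ...string) *Ring {
	r := &Ring{replicas: replicas, owners: make(map[uint32]string)}
	for _, b := range backends {
		r.Add(b)
	}
	return r
}

// Add places the backend's virtual nodes on the ring.
func (r *Ring) Add(backend string) {
	for i := 0; i < r.replicas; i++ {
		h := crc32.ChecksumIEEE([]byte(backend + "#" + strconv.Itoa(i)))
		if _, taken := r.owners[h]; taken {
			continue // hash collision: the first owner keeps the slot
		}
		r.owners[h] = backend
		r.keys = append(r.keys, h)
	}
	sort.Slice(r.keys, func(i, j int) bool { return r.keys[i] < r.keys[j] })
}

// Remove deletes every virtual node owned by the backend.
func (r *Ring) Remove(backend string) {
	kept := r.keys[:0]
	for _, h := range r.keys {
		if r.owners[h] == backend {
			delete(r.owners, h)
			continue
		}
		kept = append(kept, h)
	}
	r.keys = kept
}

// Pick returns the backend for a URL: the owner of the first virtual node
// at or after the URL's hash, wrapping around at the end of the ring.
// It returns "" when the ring is empty.
func (r *Ring) Pick(url string) string {
	if len(r.keys) == 0 {
		return ""
	}
	h := crc32.ChecksumIEEE([]byte(url))
	i := sort.Search(len(r.keys), func(i int) bool { return r.keys[i] >= h })
	if i == len(r.keys) {
		i = 0
	}
	return r.owners[r.keys[i]]
}

func main() {
	ring := NewRing(100, "img-a", "img-b", "img-c") // hypothetical backend names
	url := "/images/hotel/12345_R_300_200.jpg"      // hypothetical URL pattern
	fmt.Println("before:", ring.Pick(url))
	ring.Remove("img-b")
	fmt.Println("after removing img-b:", ring.Pick(url))
}
```

When a backend is removed, only the URLs that were mapped to it move to another backend; every other URL keeps its backend, so the caches in front of the remaining backends stay warm.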

Overall, the current architecture supports hundreds of millions of image requests per day, keeps average processing latency under 200 ms, and achieves a failure rate below 0.01%. The system has been stable since deployment, and occasional worker crashes are handled automatically. A subset of the code has been open-sourced on GitHub (https://github.com/ctripcorp/nephele), and the project welcomes contributions.
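
One way a master process might handle worker crashes automatically is to restart a worker whenever it exits with an error. The sketch below assumes that behavior; the source does not describe the actual mechanism. Supervise, Worker, ErrCrash, and the restart limit are hypothetical names. The worker is passed in as a function so that the example runs without spawning real processes; in a real master that function would launch a child process and block until it exits, for example through os/exec.

```go
package main

import (
	"errors"
	"fmt"
	"runtime"
	"sync"
)

// ErrCrash marks a simulated worker crash (hypothetical, for the example).
var ErrCrash = errors.New("simulated worker crash")

// Worker stands in for "start worker process id and block until it exits".
// A nil error means a clean exit; any other error is treated as a crash.
type Worker func(id int) error

// Supervise starts n workers and restarts each one after a crash, at most
// maxRestarts times per worker. The goroutines here only wait on workers;
// they do no image processing themselves. It returns the total number of
// restarts once every worker has exited for good.
func Supervise(n, maxRestarts int, run Worker) int {
	var (
		wg       sync.WaitGroup
		mu       sync.Mutex
		restarts int
	)
	for id := 0; id < n; id++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			for attempt := 0; ; attempt++ {
				err := run(id)
				if err == nil || attempt == maxRestarts {
					return // clean exit, or restart budget used up
				}
				mu.Lock()
				restarts++
				mu.Unlock()
			}
		}(id)
	}
	wg.Wait()
	return restarts
}

func main() {
	var mu sync.Mutex
	crashed := map[int]bool{}
	// Fake worker: worker 0 crashes on its first run; every other run exits cleanly.
	fake := func(id int) error {
		mu.Lock()
		defer mu.Unlock()
		if id == 0 && !crashed[id] {
			crashed[id] = true
			return ErrCrash
		}
		return nil
	}
	// One worker per CPU, matching the model described in the article.
	fmt.Println("restarts:", Supervise(runtime.NumCPU(), 3, fake))
}
```

Capping restarts per worker is a design choice made for this sketch: it keeps a worker that fails immediately on every start from looping forever.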
