
FastDFS Overview: Principles, Architecture, Upload/Download Process, Synchronization, and Storage Management

FastDFS is a lightweight, open‑source distributed file system written in C that uses a three‑component architecture—client, tracker server for load‑balancing and discovery, and storage servers with push‑based binlog replication—to handle high‑concurrency upload/download of small to medium files, support group‑wide synchronization, optional trunk storage, Nginx anti‑leech integration, and extensible deduplication via FastDHT.

vivo Internet Technology

FastDFS is an open‑source, lightweight distributed file system implemented in C. It runs on Unix‑like systems such as Linux, FreeBSD, and AIX, and is designed for high‑concurrency access to large volumes of small files (4 KB to 500 MB), making it well suited to image, video, and document storage and other file‑based online services.

Architecture

FastDFS consists of three components:

Client

Tracker Server

Storage Server

Tracker Server

The Tracker Server performs scheduling and load balancing. Its main responsibilities are:

Service registration – StorageServers register themselves and periodically report status (disk space, sync state, upload/download counts).

Service discovery – Clients query the Tracker to obtain the connection information of an available StorageServer.

Load balancing – Allocation strategies for groups, storage servers and storage paths (e.g., round‑robin, specified group, max‑free‑space, IP‑based sorting, priority‑based sorting, etc.).
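Two of the allocation strategies above can be sketched in a few lines of Python. The group table and free‑space figures are made up for illustration; FastDFS implements these strategies inside the tracker in C.

```python
import itertools

# Illustrative group table; free-space figures (in MB) are invented.
groups = {"group1": 120_000, "group2": 340_000, "group3": 80_000}

_rr = itertools.cycle(sorted(groups))

def select_round_robin() -> str:
    """Round-robin strategy: cycle through groups in a fixed order."""
    return next(_rr)

def select_max_free_space() -> str:
    """Max-free-space strategy: pick the group with the most room left."""
    return max(groups, key=groups.get)
```

In practice the tracker applies the same idea at three levels: choosing a group, choosing a storage server within the group, and choosing a storage path on that server.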

Storage Server

Storage servers provide capacity and backup services. They are organized into groups; each group contains multiple storage nodes that replicate each other’s data. The group’s effective capacity is limited by the smallest storage node in the group.

Data synchronization between storage nodes is performed via push‑based binlog replication. The source server reads its binlog file, parses the operations, and sends them to the target server.

Upload Process

1. The client selects any Tracker Server (trackers are peers).

2. The Tracker allocates a group, a Storage Server, and a storage path.

3. The Storage Server generates a file_id (Base64‑encoded) that contains the source storage IP, creation timestamp, file size, CRC32 checksum, and a random number. The file is stored in a two‑level directory structure (256 × 256 sub‑directories) based on two hash values derived from the file_id.

4. The final storage path looks like:

group1/M00/00/89/eQ6h3FKJf_PRl8p4AUz4wO8tqaA688.apk

In this path, group1 is the group name, M00 is a virtual disk mapped to a configured store path on the storage server, 00/89 are the two‑level sub‑directories, and the final component is the generated file name.
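The two‑level directory placement in step 3 can be illustrated with a small sketch. The CRC32 hash used here is a stand‑in: FastDFS derives the two bucket indices with its own internal hash of the file_id, so only the 256 × 256 bucket structure is faithful, not the exact values.

```python
import zlib

def subdirs_for(file_id: str) -> str:
    """Illustrative placement: derive two 0-255 directory indices from a
    hash of the file_id (not FastDFS's exact internal hash function)."""
    h = zlib.crc32(file_id.encode())
    d1 = (h >> 8) & 0xFF   # first-level directory, 00..FF
    d2 = h & 0xFF          # second-level directory, 00..FF
    return f"{d1:02X}/{d2:02X}"
```

Because the indices come from a hash of the file_id, files spread roughly evenly over the 65,536 sub‑directories, keeping per‑directory entry counts small.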

Download Process

1. The client sends a download request to a Tracker. The Tracker parses the file_id to obtain the group and other metadata, then returns an appropriate Storage Server.

2. The client connects to the chosen Storage Server, validates the file’s existence, and receives the file data.

Because the file_id embeds the source storage IP, if the file is not present on the contacted node (e.g., due to asynchronous replication), FastDFS can redirect or proxy the request to the original storage node.
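The metadata embedded in the file_id can be illustrated with a packing/unpacking sketch. The field order and widths below are assumptions chosen for the example; the real file_id uses FastDFS's own binary layout and Base64 variant.

```python
import base64
import socket
import struct

def pack_file_meta(src_ip: str, ts: int, size: int, crc32: int) -> str:
    """Pack the fields the article says a file_id embeds (source IP,
    timestamp, size, CRC32). Layout here is illustrative only."""
    raw = socket.inet_aton(src_ip) + struct.pack(">IQI", ts, size, crc32)
    return base64.urlsafe_b64encode(raw).decode().rstrip("=")

def unpack_file_meta(encoded: str):
    """Reverse of pack_file_meta: recover the embedded fields."""
    raw = base64.urlsafe_b64decode(encoded + "=" * (-len(encoded) % 4))
    ip = socket.inet_ntoa(raw[:4])
    ts, size, crc = struct.unpack(">IQI", raw[4:])
    return ip, ts, size, crc
```

A tracker or Nginx module can decode such an id without any lookup table, which is what makes the redirect/proxy behavior possible.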

FastDFS Nginx Module

The Nginx module adds several features:

Anti‑leech token check – dynamic token generation and validation. Example configuration:

http.default_content_type = application/octet-stream
http.mime_types_filename = mime.types
http.anti_steal.check_token = true
http.anti_steal.token_ttl = 900
http.anti_steal.secret_key = xxx
http.anti_steal.token_check_fail = /etc/fdfs/anti-steal.jpg

Token generation algorithm: md5(fileid_without_group + secret_key + ts), where ts is a timestamp that must be within the TTL.

When a request includes a valid token, the URL may look like:

http://localhost/G1/M00/00/01/wKgBD01c15nvKU1cAABAOeCdFS466570.jpg?token=b32cd06a53dea4376e43d71cc882f9cb&ts=1297930137
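The token check can be sketched with the standard library. The concatenation order (file id, then secret key, then timestamp) follows the formula above; verify it against your FastDFS build before relying on it.

```python
import hashlib
import time

def make_token(fileid_without_group: str, secret_key: str, ts: int) -> str:
    """md5(fileid_without_group + secret_key + ts), per the article's formula."""
    payload = f"{fileid_without_group}{secret_key}{ts}"
    return hashlib.md5(payload.encode()).hexdigest()

def check_token(token: str, fileid_without_group: str, secret_key: str,
                ts: int, ttl: int = 900) -> bool:
    """Accept only a fresh timestamp (within the TTL) and a matching token."""
    fresh = abs(time.time() - ts) <= ttl
    return fresh and token == make_token(fileid_without_group, secret_key, ts)
```

On failure, the module serves the configured token_check_fail page instead of the file.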

Other module capabilities:

File metadata extraction – retrieve source storage IP, file path, name, size, etc., from the file_id.

File access routing – based on the embedded source IP, the module can redirect (302) or proxy the request to the original storage node.

Redirect mode example configuration: response_mode = redirect (returns 302 to http://source_storage_ip:port/file_path?redirect=1).

Proxy mode example configuration: response_mode = proxy (uses source storage as the proxy host).

Synchronization Mechanism

Synchronization occurs only within a group’s storage servers. When a new storage node joins, existing nodes push all existing data (both source and backup) to the newcomer.

FastDFS uses asynchronous binlog replication. The binlog records only the operation type and file name, not the file content, e.g.:

timestamp | operation type | file name
1490251373 C M02/52/CB/CtAqWVjTbm2AIqTkAAACd_nIZ7M797.jpg

Operation types include:

C – source create, c – replica create

A – source append, a – replica append

D – source delete, d – replica delete

The synchronization workflow involves the new storage reporting its status to the Tracker, the Tracker coordinating sync requests, and the source storage sending binlog entries to the newcomer until it becomes ACTIVE.
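A replication worker's first step is parsing binlog records like the one above. A minimal sketch of that parsing, using the timestamp/op/filename layout and the uppercase-source convention described in this section:

```python
from typing import NamedTuple

class BinlogEntry(NamedTuple):
    timestamp: int
    op: str           # C/A/D (source) or c/a/d (replica)
    from_source: bool # uppercase op means the operation originated here
    filename: str

def parse_binlog_line(line: str) -> BinlogEntry:
    """Parse one 'timestamp op filename' binlog record."""
    ts, op, name = line.split()
    return BinlogEntry(int(ts), op, op.isupper(), name)
```

The source/replica distinction matters because only source operations are pushed onward; replaying replica entries would replicate the same change twice.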

Storage Modes

Two storage modes are supported:

Default mode – each file_id maps to a single physical file on disk.

Merge (trunk) mode – multiple file_ids are stored inside a large trunk file. Each file_id carries an additional 16 bytes of trunk information (offset, size, etc.). Trunk files are managed by a Trunk Server that allocates space using a balanced free‑space tree.

Metadata stored for each file in a trunk includes file size, modification time, CRC32, extension name, allocated size, trunk ID, offset, and actual size.
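The Trunk Server's allocation can be sketched with a best‑fit search over a sorted free list, standing in for the balanced free‑space tree. The class and trunk size below are illustrative, not FastDFS internals.

```python
import bisect

class TrunkAllocator:
    """Best-fit allocation from a sorted free list of (size, offset)
    blocks - a simplified stand-in for the trunk free-space tree."""
    def __init__(self, trunk_size: int):
        self.free = [(trunk_size, 0)]  # kept sorted by block size

    def alloc(self, size: int):
        """Return an offset for `size` bytes, or None if no block fits."""
        i = bisect.bisect_left(self.free, (size, -1))
        if i == len(self.free):
            return None
        blk_size, offset = self.free.pop(i)
        if blk_size > size:  # return the unused remainder to the pool
            bisect.insort(self.free, (blk_size - size, offset + size))
        return offset
```

Storing many small files inside one trunk file this way avoids one inode per file, which is the main point of merge mode.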

File Deduplication

FastDFS itself does not provide deduplication. By integrating FastDHT (a distributed hash table based on Berkeley DB with binlog replication), files can be deduplicated: the content hash is computed, and subsequent uploads of identical content return a soft link to the original file.

Note: for duplicate content, FastDFS returns soft links pointing at the first‑stored file; the original file is deleted only once all of its links have been removed.
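The dedup scheme above can be sketched as a content‑hash table with link counting. The class, SHA‑1 choice, and in‑memory dict are assumptions for illustration; the real integration stores hash‑to‑file_id mappings in FastDHT.

```python
import hashlib

class DedupStore:
    """Sketch of FastDHT-style dedup: key = content hash; a repeat
    upload returns the original file_id (the 'soft link'); the original
    is deleted once its link count drops to zero."""
    def __init__(self):
        self.by_hash = {}  # content hash -> (file_id, link_count)

    def upload(self, content: bytes, new_file_id: str) -> str:
        key = hashlib.sha1(content).hexdigest()
        if key in self.by_hash:
            fid, n = self.by_hash[key]
            self.by_hash[key] = (fid, n + 1)
            return fid  # soft link to the already-stored file
        self.by_hash[key] = (new_file_id, 1)
        return new_file_id

    def delete(self, content: bytes) -> None:
        key = hashlib.sha1(content).hexdigest()
        fid, n = self.by_hash[key]
        if n == 1:
            del self.by_hash[key]  # last link removed: drop the original
        else:
            self.by_hash[key] = (fid, n - 1)
```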

Conclusion

FastDFS is an application‑level distributed file system for managing uploads, images, and other file assets. It provides mechanisms for load balancing, high‑concurrency access, synchronization, optional file merging, and can be extended with Nginx and FastDHT for anti‑leech and deduplication capabilities.
