Understanding FastDFS: A Lightweight Distributed File System
This article explains the motivations for using a distributed file system, then introduces the architecture and core concepts of FastDFS (tracker, storage, client, and group), its upload and download mechanisms, synchronization management, and the design of its file identifiers, giving developers a comprehensive overview.
In the previous article "A FastDFS Concurrency Issue Investigation Experience", the author described a production concurrency problem; this piece aims to give a complete introduction to FastDFS for readers unfamiliar with the software.
Why Use a Distributed File System?
Initially, projects often store static files directly in a project directory (e.g., resources\static\file or resources\static\img), which is simple but couples files to application code, leaves storage disorganized, and causes resource contention under high traffic.
Introducing an independent file server separates files from the application server, allowing load balancing, easier scaling, disaster recovery, and caching strategies.
A distributed file system further solves single‑point‑of‑failure and storage‑capacity limits by providing high availability, elastic scaling, and data redundancy across multiple nodes.
FastDFS
FastDFS is an open‑source, lightweight distributed file system designed for storing large volumes of small to medium files (4 KB – 500 MB). It offers high performance, scalability, and APIs for C, Java, and PHP.
Key Concepts
FastDFS consists of three roles:
Tracker server: a lightweight coordinator that maintains in‑memory metadata about groups and storage servers, performs load balancing, and directs client requests.
Storage server: stores files and their metadata; organized into groups (or volumes), where each group contains multiple storage nodes with replicated data.
Client: uses proprietary APIs (upload, download, delete, etc.) over TCP/IP to interact with trackers and storage nodes.
Additional concepts include the group (a collection of storage servers that replicate each other's files) and metadata (key‑value attributes attached to a file, such as width=1024, height=768).
Upload Mechanism
The client first contacts a tracker to obtain a storage server's IP and port, then uploads the file to that storage server. The storage server writes the file to disk, generates a file ID, and returns the group name and file path (which together form the file ID) to the client.
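The two-step flow above can be sketched as follows. This is a simplified simulation, not the real FastDFS client API: the function names, the returned fields, and the example addresses are all illustrative assumptions.

```python
# Hypothetical sketch of the FastDFS upload flow (illustrative names and
# addresses; the real client APIs are in C, Java, and PHP).

def query_tracker_for_storage(tracker_addrs):
    """Step 1: ask a tracker which storage server should take the upload.
    A real tracker replies with the storage server's IP, port, and the
    index of the store path to use."""
    return {"ip": "192.168.0.11", "port": 23000, "store_path_index": 0}

def upload_to_storage(storage, content: bytes, ext: str) -> str:
    """Step 2: send the file to the chosen storage server.
    The storage server, not the client, generates the file ID."""
    return "group1/M00/3F/52/example_generated_name" + ext

storage = query_tracker_for_storage(["192.168.0.10:22122"])
file_id = upload_to_storage(storage, b"hello", ".jpg")
print(file_id)
```

The key design point preserved here is that the client never chooses the file name: the storage server generates it and hands back a file ID the client must store (typically in a database) to retrieve the file later.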
Selection rules:
Tracker selection: round‑robin, specified group, or load‑balance based on free space.
Storage server selection within a group: round‑robin, IP order, or priority order.
Storage path selection: round‑robin among configured directories or the one with most free space.
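The selection policies above can be sketched in a few lines. The data structures and field names below are assumptions for illustration, not FastDFS internals:

```python
# Illustrative sketch of group selection policies: specified group,
# load balance by free space, or round robin (assumed data model).

def pick_group(groups, policy="max_free_space", specified=None):
    if policy == "specified":
        # Client configuration pins uploads to one named group.
        return next(g for g in groups if g["name"] == specified)
    if policy == "max_free_space":
        # Load balance: choose the group with the most free space.
        return max(groups, key=lambda g: g["free_mb"])
    # Round robin would keep a cursor across calls; shown here as first item.
    return groups[0]

groups = [
    {"name": "group1", "free_mb": 1200},
    {"name": "group2", "free_mb": 3400},
]
print(pick_group(groups)["name"])  # group2 (most free space)
```

The same pattern applies one level down when picking a storage server inside the group and a store path on that server.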
After choosing a storage path, the storage server creates a two‑level 256×256 subdirectory hierarchy and stores the file using a hashed file ID. The final file name combines group, storage path, subdirectories, file ID, and the original file extension.
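One simple way to map a generated file name onto such a 256×256 directory tree is to hash the name and use the first two bytes as the two directory levels. This is an illustrative hash, not the exact function FastDFS uses:

```python
import hashlib

def data_path(file_name: str) -> str:
    """Place a file into a two-level 256x256 directory tree.
    (Illustrative MD5-based mapping; FastDFS derives the path from
    its own internal ID fields.)"""
    digest = hashlib.md5(file_name.encode()).digest()
    level1, level2 = digest[0], digest[1]  # each value is in 0..255
    return f"{level1:02X}/{level2:02X}/{file_name}"

print(data_path("wKgBWGB0_example.jpg"))
```

Spreading files across 65,536 leaf directories keeps any single directory from accumulating so many entries that filesystem operations slow down.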
Download Mechanism
The client requests the tracker for the storage server’s address using the file name. The tracker parses the file name to determine the group and selects a suitable storage server based on synchronization status, preferring the original storage node or a node that has completed replication.
Synchronization Time Management
Each storage server periodically reports its latest synchronization timestamp to the tracker. The tracker uses these timestamps to decide which storage node can safely serve read requests, ensuring that the requested file has been fully replicated.
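The read-routing rule implied here can be sketched as: a replica may serve a file only if its last-reported sync timestamp is at or past the file's creation time, while the node that originally received the upload can always serve it. The model below is a simplified assumption about the tracker's bookkeeping:

```python
# Sketch of sync-aware read routing (assumed simplified model of the
# tracker's per-server sync timestamps).

def readable_servers(servers, file_create_ts, source_ip):
    """Return the IPs of storage servers safe to read the file from."""
    ok = []
    for s in servers:
        if s["ip"] == source_ip:
            # The original writer always has the file.
            ok.append(s["ip"])
        elif s["synced_until"] >= file_create_ts:
            # Replication has caught up past the file's creation time.
            ok.append(s["ip"])
    return ok

servers = [
    {"ip": "10.0.0.1", "synced_until": 1700000100},
    {"ip": "10.0.0.2", "synced_until": 1700000050},
]
# File created at 1700000080 on 10.0.0.2: both nodes qualify --
# 10.0.0.1 has synced past the create time, 10.0.0.2 is the source.
print(readable_servers(servers, 1700000080, "10.0.0.2"))
```

This is why the tracker prefers the original storage node or a node whose reported sync timestamp proves replication has completed.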
File ID (FID) Design
A FastDFS file ID encodes the group name, virtual disk path, two‑level data directories, and a generated file name that includes the source storage IP, creation timestamp, file size, a random number, and the file extension, enabling rapid location of the file on the storage server.
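Splitting a file ID back into its path components is straightforward; decoding the embedded fields (source IP, timestamp, size, random number) from the generated name is omitted here, and the sample ID below is fabricated for illustration:

```python
# Hedged sketch: splitting a FastDFS file ID into its path components.
# The trailing name also encodes source IP, creation timestamp, file
# size, and a random number; decoding those fields is not shown.

def parse_fid(fid: str) -> dict:
    group, rest = fid.split("/", 1)          # group name, then the path
    store_path, dir1, dir2, name = rest.split("/")
    return {
        "group": group,            # e.g. group1
        "store_path": store_path,  # virtual disk path, e.g. M00
        "dir1": dir1,              # first-level data directory
        "dir2": dir2,              # second-level data directory
        "file_name": name,         # generated name + original extension
    }

fid = "group1/M00/02/44/wKgBWGB0AbCdEfGh.jpg"
print(parse_fid(fid)["group"])  # group1
```

Because every component of the location is encoded in the ID itself, a storage server can resolve a download request to a file on disk without any database lookup.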
For deployment details, refer to the author's blog post on building a FastDFS cluster.