
Overview of MFS Distributed File System Architecture Similar to GoogleFS

The article explains the MFS distributed file system, detailing its four components—Master, Metalogger, Chunkserver, and Client—along with hardware recommendations, metadata handling, replication strategies, and FUSE‑based client mounting, providing a comprehensive guide to building a GoogleFS‑like storage cluster.

DevOps Cloud Academy

MFS (MooseFS) is a distributed file system similar in design to GoogleFS. A storage cluster consists of one Master server and multiple Chunkservers.

The MFS system comprises four parts: Master, Metalogger, Chunkserver, and Client.

Master

The Master is the brain of MFS. It records management information for every file, such as size, storage location, and replication count, much as InnoDB keeps its metadata in the shared tablespace. This information is persisted in metadata.mfs; once the Master loads it into memory, the file is renamed to metadata.mfs.back. While running, the Master periodically writes the updated in-memory metadata back to metadata.mfs.back so that a recent copy always exists on disk.

Hardware recommendation: enough memory to hold the whole of metadata.mfs (its size depends on how much data the Chunkservers store), ECC memory for error checking, redundant batteries, and a RAID1, RAID5, or RAID10 disk configuration for high availability.

Metalogger

The Metalogger provides backup for MFS, analogous to MySQL's master‑slave structure. It periodically downloads the Master’s metadata, changelog, and session files to a local directory, appending the suffix "_ml" to the filenames.

Hardware recommendation: same configuration as the Master, since the Metalogger serves as a standby Master and can be promoted if the primary Master fails.

Chunkserver

Chunkservers store the actual data. Files are split into chunks with a maximum size of 64 MiB. A file smaller than 64 MiB occupies a single chunk sized to the file, while a larger file is divided into multiple chunks of up to 64 MiB each. Each chunk can have multiple copies (replicas). The “goal” parameter defines the desired number of replicas: a goal of 1 means a single copy stored on a random Chunkserver, while a goal greater than 1 places each copy on a different Chunkserver. The goal should not exceed the number of Chunkservers; otherwise the excess replicas cannot be placed.
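The chunking and goal rules above can be sketched in a few lines of Python. This is only an illustration of the arithmetic described in the text, not MFS code: the function names `split_into_chunks` and `placeable_replicas` are hypothetical, and on-disk details such as chunk headers are ignored.

```python
from __future__ import annotations

CHUNK_SIZE = 64 * 1024 * 1024  # 64 MiB maximum chunk size, as stated in the text


def split_into_chunks(file_size: int) -> list[int]:
    """Return the size in bytes of each chunk for a file of file_size bytes."""
    if file_size < 0:
        raise ValueError("file_size must be non-negative")
    full_chunks, remainder = divmod(file_size, CHUNK_SIZE)
    sizes = [CHUNK_SIZE] * full_chunks
    if remainder:
        # The last (or only) chunk holds whatever is left over.
        sizes.append(remainder)
    return sizes


def placeable_replicas(goal: int, chunkserver_count: int) -> int:
    """Replicas that can actually be placed, one per distinct Chunkserver."""
    if goal < 1 or chunkserver_count < 0:
        raise ValueError("goal must be >= 1 and chunkserver_count >= 0")
    return min(goal, chunkserver_count)


if __name__ == "__main__":
    mib = 1024 * 1024
    print([size // mib for size in split_into_chunks(150 * mib)])
    print(placeable_replicas(goal=3, chunkserver_count=2))
```

With these assumptions, a 150 MiB file yields two full 64 MiB chunks plus one 22 MiB chunk, and a goal of 3 on a two-Chunkserver cluster can only be satisfied with 2 replicas.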

Chunkservers should keep at least 1 GiB of free space (as noted in the Reference Guide). In practice, writes start to fail once disk usage reaches about 95 %; on one test setup this happened with roughly 1.9 GiB still free.
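A monitoring script could watch for both limits mentioned above. The sketch below is an assumption about how one might check them from the outside, not a description of how MFS enforces them; the names `is_writable` and `check_path` and the two constants are hypothetical and simply mirror the figures in the text.

```python
import shutil

GIB = 1024 ** 3
MIN_FREE_BYTES = 1 * GIB   # free-space floor quoted from the Reference Guide
MAX_USED_FRACTION = 0.95   # usage level at which writes were observed to fail


def is_writable(total: int, free: int) -> bool:
    """Return True if a disk with these byte counts is inside both limits."""
    if total <= 0:
        return False
    used_fraction = (total - free) / total
    return free >= MIN_FREE_BYTES and used_fraction < MAX_USED_FRACTION


def check_path(path: str) -> bool:
    """Apply the same check to the file system that contains path."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free
    return is_writable(usage.total, usage.free)


if __name__ == "__main__":
    # "/" is only a placeholder; point this at a Chunkserver data directory.
    print(check_path("/"))
```

Keeping the threshold logic in `is_writable` separate from the `shutil.disk_usage` call makes it easy to adjust the limits once the real behaviour of a given cluster is known.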

Client

The client accesses the file system through the FUSE kernel module. After establishing a connection with the Master, the client mounts the storage shared by the Chunkservers at a local mount point and performs read and write operations through it. Because FUSE is a loadable module, it has to be loaded again after each reboot with modprobe fuse, unless the system is configured to load it at boot.
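Before mounting, it can help to confirm that FUSE is actually available on the client. The following is a minimal sketch, assuming a Linux host where loaded modules are listed in /proc/modules and the FUSE device node is /dev/fuse; the helper names `module_listed` and `fuse_available` are hypothetical. Note that when FUSE is compiled into the kernel it does not appear in /proc/modules, which is why the device node is checked as well.

```python
import os


def module_listed(proc_modules_text: str, name: str = "fuse") -> bool:
    """Return True if name is the first field of any line in /proc/modules text."""
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if fields and fields[0] == name:
            return True
    return False


def fuse_available() -> bool:
    """Best-effort check: FUSE device node present, or module listed as loaded."""
    if os.path.exists("/dev/fuse"):
        return True
    try:
        with open("/proc/modules", encoding="ascii", errors="replace") as handle:
            return module_listed(handle.read())
    except OSError:
        # /proc/modules is missing or unreadable (for example on non-Linux hosts).
        return False


if __name__ == "__main__":
    if fuse_available():
        print("FUSE looks available")
    else:
        print("FUSE not found; try: modprobe fuse")
```

If the check fails, running modprobe fuse as described above is the next step.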
