
Build a Scalable Distributed Storage System with MogileFS and Nginx

This guide walks through the concepts of distributed file systems, introduces MogileFS, and provides step‑by‑step instructions—including environment setup, MariaDB and MogileFS configuration, Nginx compilation with the MogileFS module, and testing—to create a scalable small‑file storage solution.

MaGe Linux Operations

Introduction

As society becomes more information-driven, more and more data is digitized, and in the era of big data its volume grows explosively. Traditional storage struggles at this scale: a single server can only be scaled up so far, and switch capacity and file‑system limits become bottlenecks. Distributed storage mitigates these issues; popular distributed file systems include GFS, HDFS, GlusterFS, MooseFS, Lustre, TFS, MogileFS, and FastDFS. This article shows how to deploy MogileFS behind an Nginx reverse proxy.

Distributed File System

A distributed file system combines the concepts of distribution and file management. From the client side it offers a standard file‑system API for creating, moving, deleting, and reading files. Internally, data and directory structures are stored across a cluster of machines and accessed over the network rather than on local disks.

MogileFS

MogileFS is an open‑source distributed file system used by many companies (e.g., Yupoo, Digg, Tudou, Douban, 1hao, Dianping, Sogou, Anjuke). Its components are:

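Server side: mogilefsd (the tracker) accepts client requests on port 7001 and keeps the global metadata (domains, hosts, devices, and file locations) in a database. mogstored (the storage node daemon) stores the file data and serves it over HTTP on port 7500.

Utils: command-line management tools such as mogadm, mogupload, and moglistkeys.

Client API: Perl and PHP modules (e.g., MogileFS.pm) for building client programs.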
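Clients and utilities talk to mogilefsd over a line-oriented text protocol on port 7001. As far as we understand the protocol used by the Perl client, each request is a command name followed by URL-encoded arguments, and each reply line starts with `OK` or `ERR`. The sketch below covers only message encoding and parsing and performs no network I/O. The helper names (`encode_request`, `parse_response`, `paths_from_reply`) are hypothetical.

```python
from __future__ import annotations

from urllib.parse import parse_qsl, unquote_plus, urlencode


def encode_request(command: str, **args: str) -> bytes:
    """Build one tracker request line, e.g. b'get_paths domain=images&key=test.jpg\\r\\n'."""
    return f"{command} {urlencode(args)}\r\n".encode("ascii")


def parse_response(line: bytes) -> dict[str, str]:
    """Parse a tracker reply line into a dict; raise on ERR replies."""
    text = line.decode("ascii").rstrip("\r\n")
    status, _, rest = text.partition(" ")
    if status == "OK":
        # An optional numeric field may precede the URL-encoded arguments
        # ("OK 0 key=value"), so keep only the last space-separated token.
        payload = rest.split(" ")[-1] if rest else ""
        return dict(parse_qsl(payload))
    if status == "ERR":
        code, _, message = rest.partition(" ")
        raise RuntimeError(f"tracker error {code}: {unquote_plus(message)}")
    raise ValueError(f"unexpected tracker reply: {text!r}")


def paths_from_reply(fields: dict[str, str]) -> list[str]:
    """Return the storage URLs from a get_paths reply (paths=N, path1..pathN)."""
    count = int(fields.get("paths", "0"))
    return [fields[f"path{i}"] for i in range(1, count + 1)]
```

In practice these lines would be exchanged over a TCP connection to a tracker such as 172.16.10.123:7001. The returned URLs point at mogstored on port 7500.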

Implementation Process

Ideal Architecture

Architecture diagram

Due to limited resources, the example uses a single Nginx node and a single MariaDB instance.

Experimental Topology

Topology diagram

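System environment: CentOS 6.6. The commands below use MariaDB at 172.16.10.211, node1 at 172.16.10.123, and node2 at 172.16.10.124; each node runs both mogilefsd and mogstored.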

Workflow

① The client sends a request to Nginx.
② Nginx forwards the request to a MogileFS tracker.
③ The tracker looks up the file's storage locations in the backend database and returns them to Nginx as URLs.
④ Nginx fetches the data from one of those storage nodes and returns it to the client.
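The request flow above can be modeled in a few lines. This is a hypothetical sketch, not the nginx_mogilefs_module's code. `lookup_paths` stands in for the tracker query (steps ② and ③) and `fetch` for the HTTP request to a storage node (step ④). Trying the returned URLs in order shows how keeping copies on both nodes lets a read survive the loss of one node.

```python
from __future__ import annotations

from typing import Callable, Sequence


def serve(domain: str, key: str,
          lookup_paths: Callable[[str, str], Sequence[str]],
          fetch: Callable[[str], bytes]) -> bytes:
    """Return the file body from the first storage node that answers."""
    paths = lookup_paths(domain, key)          # steps 2-3: tracker returns URLs
    if not paths:
        raise FileNotFoundError(f"{domain}/{key}: tracker returned no paths")
    last_error: Exception | None = None
    for url in paths:                          # step 4: try each replica in order
        try:
            return fetch(url)
        except OSError as exc:                 # node unreachable: try the next copy
            last_error = exc
    raise OSError(f"all {len(paths)} replicas failed") from last_error
```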

MariaDB Configuration

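On the MariaDB server (172.16.10.211), grant privileges. mogdbsetup uses the root account to create the mogilefs database, and the trackers connect as moguser: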

grant all on *.* to 'root'@'172.16.%.%' identified by 'scholar';
grant all on mogilefs.* to 'moguser'@'172.16.%.%' identified by 'mogpass';
flush privileges;

MogileFS Configuration

Install required packages

# cd mogilefs/
# ls

Install RPMs:

# yum install MogileFS-* Perlbal-1.78-1.el6.noarch.rpm perl-* perl-IO-AIO -y

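Tracker configuration (/etc/mogilefs/mogilefsd.conf)

daemonize = 1
pidfile = /var/run/mogilefsd/mogilefsd.pid
# Database holding the tracker metadata (initialized by mogdbsetup below)
db_dsn = DBI:mysql:mogilefs:host=172.16.10.211
db_user = moguser
db_pass = mogpass
# Address and port on which the tracker accepts client requests
listen = 0.0.0.0:7001
conf_port = 7001
# Number of worker processes per job type: query = client requests,
# delete = purging deleted files, replicate = copying files to more devices,
# reaper = handling files left on dead devices
query_jobs = 10
delete_jobs = 1
replicate_jobs = 5
reaper_jobs = 1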
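Before starting the tracker, it can help to sanity-check this file. The sketch assumes the simple `key = value` syntax shown above, with `#` comment lines. It also assumes the DSN follows the `DBI:mysql:<database>:host=<address>` form used here, optionally with `;`-separated extras. `parse_conf` and `parse_dsn` are hypothetical helpers, not part of MogileFS.

```python
from __future__ import annotations


def parse_conf(text: str) -> dict[str, str]:
    """Parse 'key = value' lines, skipping blank lines and # comments."""
    conf: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")  # split at the first '=' only
        if not sep:
            raise ValueError(f"malformed line: {raw!r}")
        conf[key.strip()] = value.strip()
    return conf


def parse_dsn(dsn: str) -> dict[str, str]:
    """Split DBI:mysql:<database>:host=<h>[;port=<p>] into its parts."""
    scheme, driver, rest = dsn.split(":", 2)
    if scheme.upper() != "DBI" or driver != "mysql":
        raise ValueError(f"not a DBI:mysql DSN: {dsn!r}")
    database, _, params = rest.partition(":")
    info = {"database": database}
    for item in filter(None, params.split(";")):
        name, _, value = item.partition("=")
        info[name] = value
    return info
```

The parsed host should match the MariaDB address (172.16.10.211), and the database name should match the `--dbname` passed to mogdbsetup.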

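Storage node configuration (/etc/mogilefs/mogstored.conf). mogstored serves file data over HTTP on httplisten, exposes a management port on mgmtlisten, and keeps each device's files under docroot in a dev<ID> subdirectory: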

maxconns = 10000
httplisten = 0.0.0.0:7500
mgmtlisten = 0.0.0.0:7501
docroot = /mogdata

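Synchronize the configuration files to the second node:

# scp /etc/mogilefs/* node2:/etc/mogilefs/

Create the device directories under docroot and set ownership. Each directory name must match the device ID registered later with mogadm, so node1 gets dev1 and node2 gets dev2:

# mkdir -p /mogdata/dev1                   # on node1
# mkdir -p /mogdata/dev2                   # on node2
# chown -R mogilefs:mogilefs /mogdata/     # on both nodes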

Initialize the database:

# mogdbsetup --dbhost=172.16.10.211 --dbrootuser=root --dbrootpass=scholar \
    --dbuser=moguser --dbpass=mogpass --dbname=mogilefs --yes

Start services on both nodes:

# service mogilefsd start
# service mogstored start

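Check that each node now listens on port 7001 (tracker) and on ports 7500 and 7501 (storage daemon), for example with ss -tnl. The original screenshots are omitted.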

Add nodes and devices

# mogadm --trackers=172.16.10.123:7001 host add node1 --ip=172.16.10.123 --status=alive
# mogadm --trackers=172.16.10.123:7001 host add node2 --ip=172.16.10.124 --status=alive
# mogadm --trackers=172.16.10.123:7001 device add node1 1
# mogadm --trackers=172.16.10.123:7001 device add node2 2
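The device IDs passed to `device add` must match the dev<ID> directories under docroot, and each ID is unique across the cluster. The hypothetical helper below derives the directories each host needs from the host-to-device assignment. It is only a planning aid, not a MogileFS tool.

```python
from __future__ import annotations


def expected_device_dirs(docroot: str, devices: dict[int, str]) -> dict[str, list[str]]:
    """Return, per host, the dev<ID> directories mogstored needs under docroot.

    `devices` maps each device ID (as passed to `mogadm device add`) to its
    host name; using a dict keeps device IDs unique across the cluster.
    """
    dirs: dict[str, list[str]] = {}
    for devid, host in sorted(devices.items()):
        dirs.setdefault(host, []).append(f"{docroot.rstrip('/')}/dev{devid}")
    return dirs


# Assignment from the mogadm commands above: device 1 on node1, device 2 on node2.
print(expected_device_dirs("/mogdata", {1: "node1", 2: "node2"}))
# {'node1': ['/mogdata/dev1'], 'node2': ['/mogdata/dev2']}
```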

Create domains:

# mogadm --trackers=172.16.10.123:7001 domain add files
# mogadm --trackers=172.16.10.123:7001 domain add html
# mogadm --trackers=172.16.10.123:7001 domain add images
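Domains are namespaces for keys, so the same key can exist independently in html and images. The number of copies a file should have is set by the class it is stored in (its mindevcount). With a mindevcount of 2 and two hosts, each file should end up on both node1 and node2.

The check below is a hypothetical illustration, not MogileFS code. It assumes replicas should land on distinct hosts, which as we understand it is what the default MultipleHosts replication policy aims for.

```python
from __future__ import annotations


def replication_status(replicas: list[tuple[int, str]], mindevcount: int) -> str:
    """Classify a file's copies against its class's mindevcount.

    `replicas` holds one (device_id, host) pair per stored copy. Copies are
    counted once per host, so two devices on the same host count as one.
    """
    hosts = {host for _, host in replicas}
    if not hosts:
        return "lost"
    if len(hosts) >= mindevcount:
        return "ok"
    return "under-replicated"
```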

Upload test files:

# mogupload --trackers=172.16.10.123:7001 --domain=html --key='fstab.html' --file='/etc/fstab'
# mogupload --trackers=172.16.10.123:7001 --domain=images --key='test.jpg' --file='/root/test.jpg'
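mogupload hides a three-step exchange, as we understand the tracker protocol:

1. The client sends `create_open` and receives a file ID (fid), a device ID, and a storage URL.
2. It uploads the bytes to that URL with an HTTP PUT.
3. It sends `create_close` so the tracker records the key; replicate workers then copy the file to further devices.

The sketch injects the transport as callables; `tracker` and `http_put` are hypothetical stand-ins. It also omits the class argument and multi-destination options.

```python
from __future__ import annotations

from typing import Callable


def upload(domain: str, key: str, data: bytes,
           tracker: Callable[[str, dict], dict],
           http_put: Callable[[str, bytes], None]) -> str:
    """Store `data` under domain/key: create_open, PUT, create_close."""
    opened = tracker("create_open", {"domain": domain, "key": key})
    path = opened["path"]                  # URL on a mogstored node (port 7500)
    http_put(path, data)                   # write the bytes to the storage node
    tracker("create_close", {              # tracker now maps key -> fid
        "domain": domain, "key": key,
        "fid": opened["fid"], "devid": opened["devid"],
        "path": path, "size": str(len(data)),
    })
    return opened["fid"]
```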

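Verify the uploads with moglistkeys, which lists the keys stored in a domain. Each file is stored on the storage nodes under a name derived from its file ID (fid), and it can be retrieved directly from a storage node over HTTP at that path.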
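Our understanding of the storage-node layout is as follows: the fid is zero-padded to 10 digits, and its leading digits form directory levels under docroot/dev<ID>. Under that assumption, the URL for a file can be computed as below. For example, fid 2 on device 1 would live at /mogdata/dev1/0/000/000/0000000002.fid on node1. Treat this as a sketch and compare it with the paths the tracker reports for your own files.

```python
def fid_url(host: str, port: int, devid: int, fid: int) -> str:
    """Storage URL for a file ID: dev<ID>/<b>/<mmm>/<ttt>/<10-digit fid>.fid."""
    nfid = f"{fid:010d}"                          # zero-pad the fid to 10 digits
    b, mmm, ttt = nfid[0], nfid[1:4], nfid[4:7]   # directory levels from leading digits
    return f"http://{host}:{port}/dev{devid}/{b}/{mmm}/{ttt}/{nfid}.fid"
```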

Nginx Configuration

Compile Nginx with the MogileFS module:

# yum groupinstall "Development Tools" "Server Platform Development" -y
# yum install openssl-devel pcre-devel -y
# groupadd -r nginx
# useradd -r -g nginx nginx
# tar xf nginx_mogilefs_module-1.0.4.tar.gz
# tar xf nginx-1.6.3.tar.gz
# cd nginx-1.6.3
# ./configure \
    --prefix=/usr/local/nginx \
    --sbin-path=/usr/sbin/nginx \
    --conf-path=/etc/nginx/nginx.conf \
    --error-log-path=/var/log/nginx/error.log \
    --http-log-path=/var/log/nginx/access.log \
    --pid-path=/var/run/nginx/nginx.pid \
    --lock-path=/var/lock/nginx.lock \
    --user=nginx \
    --group=nginx \
    --with-http_ssl_module \
    --with-http_flv_module \
    --with-http_stub_status_module \
    --with-http_gzip_static_module \
    --http-client-body-temp-path=/usr/local/nginx/client/ \
    --http-proxy-temp-path=/usr/local/nginx/proxy/ \
    --http-fastcgi-temp-path=/usr/local/nginx/fcgi/ \
    --http-uwsgi-temp-path=/usr/local/nginx/uwsgi \
    --http-scgi-temp-path=/usr/local/nginx/scgi \
    --with-pcre \
    --with-debug \
    --add-module=../nginx_mogilefs_module-1.0.4
# make && make install

Install an init script for Nginx, make it executable, and register it to start at boot (chkconfig). Then configure Nginx to proxy requests to MogileFS (the configuration details are omitted in this summary).

Check the configuration syntax (nginx -t), start the service, and verify that the files uploaded earlier can be fetched through Nginx. The original article's screenshots of the test results are not reproduced here.
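An upload can also be checked end to end by comparing what Nginx returns with the original file. This is a minimal sketch that compares SHA-256 digests; `same_content` and `http_get` are hypothetical helpers. The URL to pass depends on the Nginx location configuration, which is omitted here. A call might look like `same_content("http://<nginx-host>/<path-to-test.jpg>", "/root/test.jpg")`, where both placeholders are hypothetical.

```python
from __future__ import annotations

import hashlib
from pathlib import Path
from typing import Callable
from urllib.request import urlopen


def http_get(url: str, timeout: float = 5.0) -> bytes:
    """Fetch a URL and return the response body."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read()


def same_content(url: str, local_file: str,
                 fetch: Callable[[str], bytes] = http_get) -> bool:
    """True if the body served at `url` matches `local_file` byte for byte."""
    remote = hashlib.sha256(fetch(url)).hexdigest()
    local = hashlib.sha256(Path(local_file).read_bytes()).hexdigest()
    return remote == local
```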

Conclusion

The experiment shows that, with Nginx as a reverse proxy, a MogileFS‑based distributed file system can be deployed to store large numbers of small files. Because this setup uses a single Nginx node and a single MariaDB instance, further work could add high availability for those components.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Written by

MaGe Linux Operations

Founded in 2009, MaGe Education is a top Chinese high‑end IT training brand. Its graduates earn 12K+ RMB salaries, and the school has trained tens of thousands of students. It offers high‑pay courses in Linux cloud operations, Python full‑stack, automation, data analysis, AI, and Go high‑concurrency architecture. Thanks to quality courses and a solid reputation, it has talent partnerships with numerous internet firms.
