Build a Scalable Distributed Storage System with MogileFS and Nginx
This guide covers the concepts behind distributed file systems, introduces MogileFS, and gives step-by-step instructions (environment setup, MariaDB and MogileFS configuration, compiling Nginx with the MogileFS module, and testing) for building a scalable small-file storage system.
Introduction
As more data is digitized, volumes grow faster than a single storage server can absorb. Traditional storage struggles at this scale because of limits on vertical scaling, switch capacity, and file-system constraints. Distributed storage mitigates these limits by spreading data across many machines; popular distributed file systems include GFS, HDFS, GlusterFS, MooseFS, Lustre, TFS, MogileFS, and FastDFS. This article deploys MogileFS behind an Nginx reverse proxy.
Distributed File System
A distributed file system combines the concepts of distribution and file management. From the client side it offers a standard file‑system API for creating, moving, deleting, and reading files. Internally, data and directory structures are stored across a cluster of machines and accessed over the network rather than on local disks.
MogileFS
MogileFS is an open‑source distributed file system used by many companies (e.g., Yupoo, Digg, Tudou, Douban, 1hao, Dianping, Sogou, Anjuke). Its components are:
Server side: mogilefsd (the tracker) keeps global metadata in a database and, in this setup, listens on port 7001. mogstored (the storage node) stores and serves the files themselves over HTTP on port 7500, with a management port on 7501.
Utils: management tools such as mogadm.
Client API: Perl and PHP modules (e.g., MogileFS.pm) for building client programs.
Implementation Process
Ideal Architecture
Due to limited resources, the example uses a single Nginx node and a single MariaDB instance.
Experimental Topology
# System environment: CentOS 6.6
Workflow
① Client sends request to server.
② Nginx forwards the request to a MogileFS tracker.
③ Tracker queries the backend database for storage location and returns it to Nginx.
④ Nginx retrieves the actual data from the selected storage node and returns it to the client.
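Step ③ is the point where the tracker hands back one or more storage URLs. As a rough sketch, assuming the tracker's line-based text protocol answers a `get_paths` request with a line shaped like `OK paths=N&path1=...&path2=...`, the shell function below pulls the candidate URLs out of such a reply. The function name `parse_tracker_paths` and the sample reply (including its `.fid` paths) are hypothetical; they are not output captured from this setup.

```shell
#!/bin/sh
# Sketch: extract storage URLs from a tracker reply line.
# Assumed reply shape: "OK paths=2&path1=<url>&path2=<url>"
parse_tracker_paths() {
    reply=$1
    case $reply in
        OK\ *) ;;   # success replies are assumed to start with "OK "
        *) echo "tracker error: $reply" >&2; return 1 ;;
    esac
    # Drop the "OK " prefix, split the fields on "&", keep only the
    # pathN=... fields, and strip the "pathN=" key so the URL remains.
    printf '%s\n' "${reply#OK }" | tr '&' '\n' | sed -n 's/^path[0-9][0-9]*=//p'
}

# Hypothetical reply for domain=html, key=fstab.html
sample='OK paths=2&path1=http://172.16.10.123:7500/dev1/0/000/000/0000000002.fid&path2=http://172.16.10.124:7500/dev2/0/000/000/0000000002.fid'
parse_tracker_paths "$sample"
```

Each printed URL points at a mogstored instance on port 7500, which is what Nginx fetches in step ④.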
MariaDB Configuration
Grant privileges:
grant all on *.* to 'root'@'172.16.%.%' identified by 'scholar';
grant all on mogilefs.* to 'moguser'@'172.16.%.%' identified by 'mogpass';
flush privileges;
MogileFS Configuration
Install required packages
# cd mogilefs/
# ls
Install RPMs:
# yum install MogileFS-* Perlbal-1.78-1.el6.noarch.rpm perl-* perl-IO-AIO -y
Tracker configuration (/etc/mogilefs/mogilefsd.conf)
daemonize = 1
pidfile = /var/run/mogilefsd/mogilefsd.pid
db_dsn = DBI:mysql:mogilefs:host=172.16.10.211
db_user = moguser
db_pass = mogpass
listen = 0.0.0.0:7001
conf_port = 7001
query_jobs = 10
delete_jobs = 1
replicate_jobs = 5
reaper_jobs = 1
Storage node configuration (/etc/mogilefs/mogstored.conf)
maxconns = 10000
httplisten = 0.0.0.0:7500
mgmtlisten = 0.0.0.0:7501
docroot = /mogdata
Synchronize configuration files to the second node:
# scp /etc/mogilefs/* node2:/etc/mogilefs/
Create mount points and set ownership:
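# mkdir -p /mogdata/dev1
# chown -R mogilefs:mogilefs /mogdata/
On node2, create /mogdata/dev2 instead: mogstored expects a directory named dev<ID> under docroot, and node2 is given device ID 2 in the `device add` step.
Initialize the database: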
# mogdbsetup --dbhost=172.16.10.211 --dbrootuser=root --dbrootpass=scholar \
--dbuser=moguser --dbpass=mogpass --dbname=mogilefs --yes
Start services on both nodes:
# service mogilefsd start
# service mogstored start
Check listening ports (images omitted for brevity).
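Since the screenshots are not reproduced here, the check can be sketched in shell. The snippet below assumes the ports from the configuration files above (7001 for the tracker, 7500 and 7501 for mogstored) and scans the output of `ss -tnl` or `netstat -tnl`, whichever is installed. The helper name `check_ports` is made up for this illustration.

```shell
#!/bin/sh
# Sketch: confirm that the tracker (7001) and mogstored (7500, 7501)
# ports from the configs above show up in a TCP listener listing.
check_ports() {
    text=$1; shift
    missing=0
    for port in "$@"; do
        # Match ":<port>" followed by whitespace, as printed in the
        # local-address column of `ss -tnl` and `netstat -tnl`.
        if printf '%s\n' "$text" | grep -Eq ":${port}[[:space:]]"; then
            echo "port $port: listening"
        else
            echo "port $port: NOT listening"
            missing=1
        fi
    done
    return $missing
}

# Prefer ss, fall back to netstat; use an empty listing if neither exists.
if command -v ss >/dev/null 2>&1; then
    listing=$(ss -tnl)
elif command -v netstat >/dev/null 2>&1; then
    listing=$(netstat -tnl)
else
    listing=''
fi
check_ports "$listing" 7001 7500 7501 || echo "some MogileFS ports are not open on this host"
```

Run it on each node after starting the services; a missing port usually points to a service that failed to start or a different listen address in the config.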
Add nodes and devices
# mogadm --trackers=172.16.10.123:7001 host add node1 --ip=172.16.10.123 --status=alive
# mogadm --trackers=172.16.10.123:7001 host add node2 --ip=172.16.10.124 --status=alive
# mogadm --trackers=172.16.10.123:7001 device add node1 1
# mogadm --trackers=172.16.10.123:7001 device add node2 2
Create domains:
# mogadm --trackers=172.16.10.123:7001 domain add files
# mogadm --trackers=172.16.10.123:7001 domain add html
# mogadm --trackers=172.16.10.123:7001 domain add images
Upload test files:
# mogupload --trackers=172.16.10.123:7001 --domain=html --key='fstab.html' --file='/etc/fstab'
# mogupload --trackers=172.16.10.123:7001 --domain=images --key='test.jpg' --file='/root/test.jpg'
Verify uploads with moglistkeys and retrieve data via the file ID.
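The verification step might look like the sketch below. It assumes the `moglistkeys` and `mogfileinfo` tools from the MogileFS utilities package are installed, and it reuses the tracker address, domains, and keys from the upload commands. The `run` wrapper and the `RUN` variable are illustrative helpers, not part of MogileFS; by default the script only prints the commands.

```shell
#!/bin/sh
# Sketch: list keys and show replica locations for the uploaded files.
TRACKER=172.16.10.123:7001

# Print the command; execute it only when RUN=1 and the tool is installed.
run() {
    echo "+ $*"
    if [ "${RUN:-0}" = 1 ] && command -v "$1" >/dev/null 2>&1; then
        "$@"
    fi
}

run moglistkeys --trackers="$TRACKER" --domain=html
run mogfileinfo --trackers="$TRACKER" --domain=html --key=fstab.html
run mogfileinfo --trackers="$TRACKER" --domain=images --key=test.jpg
```

Setting `RUN=1` in the environment executes the commands instead of only printing them. The storage URLs reported for a key can then be fetched directly from the storage node on port 7500 to confirm the file content.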
Nginx Configuration
Compile Nginx with the MogileFS module:
# yum groupinstall "Development Tools" "Server Platform Development" -y
# yum install openssl-devel pcre-devel -y
# groupadd -r nginx
# useradd -r -g nginx nginx
# tar xf nginx_mogilefs_module-1.0.4.tar.gz
# tar xf nginx-1.6.3.tar.gz
# cd nginx-1.6.3
# ./configure \
--prefix=/usr/local/nginx \
--sbin-path=/usr/sbin/nginx \
--conf-path=/etc/nginx/nginx.conf \
--error-log-path=/var/log/nginx/error.log \
--http-log-path=/var/log/nginx/access.log \
--pid-path=/var/run/nginx/nginx.pid \
--lock-path=/var/lock/nginx.lock \
--user=nginx \
--group=nginx \
--with-http_ssl_module \
--with-http_flv_module \
--with-http_stub_status_module \
--with-http_gzip_static_module \
--http-client-body-temp-path=/usr/local/nginx/client/ \
--http-proxy-temp-path=/usr/local/nginx/proxy/ \
--http-fastcgi-temp-path=/usr/local/nginx/fcgi/ \
--http-uwsgi-temp-path=/usr/local/nginx/uwsgi \
--http-scgi-temp-path=/usr/local/nginx/scgi \
--with-pcre \
--with-debug \
--add-module=../nginx_mogilefs_module-1.0.4
# make && make install
Provide the init script, make it executable, add to startup, and configure Nginx to proxy requests to MogileFS (configuration details omitted for brevity).
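The article's own proxy configuration is not reproduced, so the following is only a sketch of what such a configuration could look like, not the author's file. It assumes the directives documented by nginx_mogilefs_module (`mogilefs_tracker`, `mogilefs_domain`, `mogilefs_methods`, `mogilefs_noverify`, `mogilefs_pass`), the two trackers on port 7001 from this guide, and that the module takes the part of the URI after the location prefix as the key. The upstream name `mogtrackers` and the `/images/` location are invented for illustration. The script writes the fragment to a scratch file for review rather than touching /etc/nginx/nginx.conf.

```shell
#!/bin/sh
# Sketch: write a candidate MogileFS proxy fragment to a scratch file.
# The fragment is meant to be placed inside the http { } context.
conf=$(mktemp)
cat > "$conf" <<'EOF'
upstream mogtrackers {
    server 172.16.10.123:7001;
    server 172.16.10.124:7001;
}

server {
    listen 80;

    # Assumed mapping: /images/test.jpg -> domain "images", key "test.jpg"
    location /images/ {
        mogilefs_tracker mogtrackers;
        mogilefs_domain images;
        mogilefs_methods GET;
        mogilefs_noverify on;

        # Fetch the file from the storage URL the tracker returned.
        mogilefs_pass {
            proxy_pass $mogilefs_path;
            proxy_hide_header Content-Type;
            proxy_buffering off;
        }
    }
}
EOF
echo "wrote $conf"
```

A similar location block per domain (for example one for `html`) would follow the same pattern. Listing both trackers in the upstream lets Nginx keep resolving keys if one tracker is down.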
Test syntax, start the service, and verify access through the configured domain. Images in the original article illustrate the test results.
Conclusion
The experiment demonstrates that, by using Nginx as a reverse proxy, a MogileFS‑based distributed file system can be deployed for massive small‑file storage. Further work may include high‑availability setups for Nginx and MariaDB nodes.
MaGe Linux Operations
