
How to Build a Scalable Distributed File System with MinIO

This guide explains the fundamentals of distributed file systems, compares them with traditional storage, introduces MinIO’s architecture and features, and provides step‑by‑step instructions for deploying a multi‑node MinIO cluster with Nginx load balancing on Linux.

ITPUB

As file data grows, traditional static‑file storage on a single server (via Tomcat or Nginx) can no longer meet system demands, prompting the need for a distributed file system to manage data across multiple nodes.

What Is a Distributed File System?

A Distributed File System (DFS) abstracts physical storage resources that may not be directly attached to a local node, presenting them as a hierarchical, tree‑like structure accessible over a network, making remote file access simple for users.

Advantages of Distributed File Systems

Scalable: can grow to hundreds or thousands of nodes with near‑linear performance gains.

High availability: ensures both system uptime and data consistency.

Low cost: automatic fault tolerance and load balancing allow use of inexpensive servers.

Elastic storage: resources can be added or removed without interrupting service.

Typical Use Cases

E‑commerce sites – massive product images.

Video platforms – large video and image files.

Cloud‑drive applications – general file storage.

Social networks – huge volumes of video and images.

Distributed vs. Traditional File Systems

Traditional storage centralizes all data on a single server, creating performance bottlenecks and reliability concerns. A DFS spreads files across multiple servers, improving reliability, availability, and access efficiency while avoiding single‑point failures.

MinIO Overview

MinIO is a high‑performance object storage system released under the GNU AGPL‑v3 license and compatible with the Amazon S3 API. It is commonly used for machine‑learning, analytics, and application data workloads.

Official documentation: https://docs.min.io/ – Chinese docs: http://docs.minio.org.cn/docs/ – GitHub repository: https://github.com/minio/minio

Key Features

Data protection – uses erasure coding to survive multiple node failures.

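High availability – with the N/2 parity split described under Erasure Coding, a distributed MinIO cluster keeps serving reads while at least half of the drives are online; creating new objects needs more than half (N/2 + 1).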

Strong consistency – read‑after‑write consistency is guaranteed in both distributed and standalone modes.

Advantages

Simple deployment – a single binary (minio) runs on any platform.

Massive storage – supports zone expansion and objects up to 5 TB.

Low redundancy, high tolerance – default redundancy factor of 2, and data can be recovered even if up to half the drives fail.

Excellent read/write performance – MinIO's published benchmarks report up to 183 GB/s read and 171 GB/s write on standard hardware.

Core Concepts

S3 – Amazon Simple Storage Service; MinIO implements its API, so existing S3 clients and SDKs can talk to a MinIO cluster.

Object – the basic stored entity (file, byte stream, etc.).

Bucket – logical container for objects, isolated from other buckets.

Drive – physical disk where MinIO stores object data.

Set – a collection of drives; MinIO automatically creates sets based on cluster size.

Erasure Coding

Erasure Coding (EC) splits data into fragments and adds redundant parity fragments, so the original data can be rebuilt from any sufficiently large subset of those fragments. MinIO uses Reed‑Solomon codes. In the layout described here, each object is divided into N/2 data blocks and N/2 parity blocks across N drives, and the data can be recovered as long as the number of failed drives does not exceed the number of parity blocks.
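The arithmetic behind that split can be sketched in a few lines of shell. This is only an illustration of the N/2 data plus N/2 parity layout described above, not MinIO's own logic; the function name, the output format, and the 100 GB drive size are assumptions.

```shell
#!/usr/bin/env bash
# Sketch: capacity and fault tolerance for an N/2 data + N/2 parity split.
# ec_summary and its output format are illustrative, not part of MinIO.
ec_summary() {
    local drives=$1 drive_size_gb=$2
    # An even drive count is assumed so that N/2 is a whole number.
    if (( drives < 2 || drives % 2 != 0 )); then
        echo "drive count must be an even number >= 2" >&2
        return 1
    fi
    local data=$(( drives / 2 ))
    local parity=$(( drives - data ))
    # Only the data blocks hold usable capacity; parity blocks are overhead.
    local usable_gb=$(( data * drive_size_gb ))
    echo "data=${data} parity=${parity} tolerated_failures=${parity} usable_gb=${usable_gb}"
}

# The four-drive layout used later in this guide, assuming 100 GB per drive.
ec_summary 4 100
```

For four drives this prints data=2 parity=2 tolerated_failures=2 usable_gb=200: half of the raw capacity is usable, and up to two drives can be lost without losing data.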

Installation and Deployment

Deployment Modes

Single‑node, single‑disk (development/testing, single‑point risk).

Single‑node, multi‑disk (requires at least four disks, provides data safety).

Multi‑node, multi‑disk (distributed) – the recommended mode, offering strong redundancy via Reed‑Solomon erasure coding.

Environment Preparation

Two CentOS 7.5 virtual machines, each with two extra disks. Nginx is used for load balancing.

Node configuration:

minio node1 – 192.168.78.101 – /mnt/disk1, /mnt/disk2

minio node2 – 192.168.78.102 – /mnt/disk1, /mnt/disk2

nginx – 192.168.78.101 – /usr/local/nginx

Step‑by‑Step Setup

Create directories:

mkdir -p /home/minio/{run,conf} && mkdir -p /etc/minio

Download MinIO binary:

cd /home/minio/run
wget https://dl.min.io/server/minio/release/linux-amd64/minio
chmod +x minio

Write a startup script (/home/minio/run/minio-run.sh):

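#!/bin/bash
# Root credentials; these must be identical on every node.
# MINIO_ROOT_USER / MINIO_ROOT_PASSWORD replace the deprecated
# MINIO_ACCESS_KEY / MINIO_SECRET_KEY variables.
export MINIO_ROOT_USER=admin
export MINIO_ROOT_PASSWORD=12345678
# Bind to port 9000 on all interfaces so the same script runs unchanged on both nodes.
# The same drive list is passed on every node.
/home/minio/run/minio server --config-dir /home/minio/conf \
--address ":9000" --console-address ":50000" \
http://192.168.78.102/mnt/disk1 http://192.168.78.102/mnt/disk2 \
http://192.168.78.101/mnt/disk1 http://192.168.78.101/mnt/disk2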

Execute the script on both nodes to start the distributed cluster:

sh /home/minio/run/minio-run.sh
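Once both nodes are running, one way to confirm that they are up is to probe MinIO's unauthenticated health endpoints with curl. The sketch below assumes the API port 9000 from the startup script; the function names are illustrative.

```shell
#!/usr/bin/env bash
# Sketch: probe MinIO's unauthenticated health endpoints on each node.
# PORT matches the --address port from the startup script; the function
# names are illustrative.
PORT=9000

# Build the health URL for a node; kind is "live" (the process is up) or
# "cluster" (the cluster has enough drives online to accept writes).
health_url() {
    local host=$1 kind=$2
    echo "http://${host}:${PORT}/minio/health/${kind}"
}

# Report whether one node answers its liveness probe within five seconds.
check_node() {
    local host=$1
    if curl -sf --max-time 5 -o /dev/null "$(health_url "$host" live)"; then
        echo "${host}: live"
    else
        echo "${host}: not responding"
    fi
}

# Check every node given on the command line; with no arguments, do nothing.
for node in "$@"; do
    check_node "$node"
done
```

Saved under a hypothetical name such as check-minio.sh, it could be run as: bash check-minio.sh 192.168.78.101 192.168.78.102. The cluster endpoint built by health_url can be probed the same way to check write quorum.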

Load Balancing with Nginx

Sample Nginx configuration for balancing MinIO API and console traffic:

upstream minio_server {
    server 192.168.78.101:9000;
    server 192.168.78.102:9000;
}
upstream minio_console {
    server 192.168.78.101:50000;
    server 192.168.78.102:50000;
}
server {
    listen 9001;
    location / {
        proxy_pass http://minio_server;
        # additional proxy settings omitted for brevity
    }
}
server {
    listen 50001;
    location / {
        proxy_pass http://minio_console;
    }
}
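The sample configuration leaves its additional proxy settings out. As a sketch only, the script below prints directives that are commonly placed inside each location / block when Nginx fronts MinIO. They are assumptions based on standard Nginx directives, not the settings the original omitted, and the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: print proxy directives for a "location /" block in front of MinIO.
# These are commonly used Nginx directives, NOT the settings omitted from
# the sample configuration; proxy_settings is a hypothetical helper.
proxy_settings() {
    # Pass the original host and client address through to MinIO.
    cat <<'EOF'
proxy_set_header Host $http_host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;
EOF
    # The web console uses WebSockets, which need HTTP/1.1 and Upgrade headers.
    if [[ ${1:-} == websocket ]]; then
        cat <<'EOF'
proxy_http_version 1.1;
proxy_set_header Upgrade $http_upgrade;
proxy_set_header Connection "upgrade";
EOF
    fi
}

echo "# for the minio_server location:"
proxy_settings
echo "# for the minio_console location:"
proxy_settings websocket
```

Large uploads may also need client_max_body_size raised in the API server block, since Nginx limits request bodies to 1 MB by default.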

After reloading Nginx, the cluster can be accessed via the load‑balanced endpoints: the S3 API at http://192.168.78.101:9001/ and the web console at http://192.168.78.101:50001/.
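A quick end-to-end check is to write and read back one object through the Nginx API listener with the MinIO client (mc). The sketch below assumes mc is installed and that port 9001 is the API listener from the configuration above; the alias, bucket, and object names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: round-trip one object through the Nginx endpoint with mc.
# ENDPOINT is the Nginx listener for the S3 API; ALIAS, BUCKET, and the
# object name are hypothetical.
ENDPOINT="http://192.168.78.101:9001"
ALIAS="lbminio"
BUCKET="smoke-test"

smoke_test() {
    local access_key=$1 secret_key=$2 tmp status
    tmp=$(mktemp)
    echo "hello from the load balancer" > "$tmp"
    # Register the endpoint, create the bucket, upload, then read back.
    mc alias set "$ALIAS" "$ENDPOINT" "$access_key" "$secret_key" &&
    mc mb --ignore-existing "$ALIAS/$BUCKET" &&
    mc cp "$tmp" "$ALIAS/$BUCKET/hello.txt" &&
    mc cat "$ALIAS/$BUCKET/hello.txt"
    status=$?
    rm -f "$tmp"
    return "$status"
}

# Runs only when the two credentials are passed on the command line.
if (( $# == 2 )); then
    smoke_test "$1" "$2"
fi
```

Called with the credentials from the startup script, a successful run ends by printing the uploaded text, which shows that both writes and reads pass through the load balancer.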

Conclusion

The article introduced the concept of distributed file systems, highlighted their benefits over traditional storage, and provided a comprehensive tutorial for deploying a high‑performance MinIO object‑storage cluster with erasure‑coding protection and Nginx load balancing.
