Overview of IBM GPFS Architecture, Components, and Building‑Block Design
This article provides a comprehensive technical overview of IBM GPFS (General Parallel File System), detailing its core components, cluster management roles, networking models, and best‑practice building‑block configurations for high‑performance computing environments.
IBM GPFS (General Parallel File System) is a distributed, shared, parallel cluster file system that allows many nodes to access one or more file systems simultaneously. It runs on AIX, Linux, and Windows, supports IBM Power as well as Intel/AMD x86 architectures, and is widely used in HPC clusters worldwide.
Each GPFS node consists of three components: the GPFS management command set, the GPFS kernel extension, and the multithreaded GPFS daemon.
The GPFS management command set is a collection of scripts used to control GPFS operations and configuration. By default, GPFS commands can be executed on any node in the cluster, and the cluster automatically redirects requests to the appropriate node. Synchronization can be defined across all nodes or limited to a subnet.
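A few representative commands from the management set, runnable from any node in the cluster; these are standard GPFS administration commands, shown here only as an illustration:

```shell
# Inspect a running GPFS cluster with the management command set.
mmlscluster       # show cluster name, member nodes, and their roles
mmgetstate -a     # show the GPFS daemon state on all nodes
mmlsconfig        # list cluster-wide configuration parameters
```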
The GPFS kernel extension provides an interface between the operating system’s vnode/VFS layer and GPFS, registering GPFS as a native file system so that OS file requests are routed to GPFS.
The GPFS daemon handles all I/O and buffer management, including read‑ahead for sequential reads and write‑behind for asynchronous writes. I/O operations are protected by token management to ensure data consistency across nodes. Daemons on different nodes cooperate to reconfigure, repair, and update metadata in parallel.
GPFS NSD (Network Shared Disk) components expose storage to applications in the cluster. An NSD may be physically attached to all nodes or accessed via an NSD server that presents a virtual connection. Up to eight NSD servers can be assigned to a single NSD; if one server fails, the next in the list takes over.
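As a sketch, an NSD definition in stanza format can list its server failover chain; the device, NSD, and node names below are hypothetical:

```shell
# Hypothetical stanza file defining one NSD with a three-server failover
# chain; GPFS tries the servers in the listed order if one fails.
cat > nsd.stanza <<'EOF'
%nsd:
  device=/dev/sdb
  nsd=nsd001
  servers=nsdsrv01,nsdsrv02,nsdsrv03
  usage=dataAndMetadata
  failureGroup=1
EOF
mmcrnsd -F nsd.stanza   # register the NSD with the cluster
```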
When a GPFS file system is mounted, the daemon discovers which NSDs are reachable either physically or virtually. The default discovery order prefers local block devices (SAN, SCSI, IDE) before NSD servers.
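This preference between local paths and NSD servers can be influenced at mount time with the useNSDserver mount option; the option values follow IBM's documented set, and the file system name here is hypothetical:

```shell
# Use only local block-device paths; never fall back to an NSD server.
mmmount gpfs1 -o useNSDserver=never
# Fall back to an NSD server only when no local path is found.
mmmount gpfs1 -o useNSDserver=asneeded
```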
In GPFS terminology, a Node is an independent OS instance, an NSD is a storage device visible to the GPFS cluster, an NSD server provides I/O services for a specific NSD, and an Application Node runs applications that mount the file system.
GPFS supports configurations in which some nodes connect directly to the disks while others access those disks through the directly connected nodes, a common low‑cost, high‑performance topology in large HPC clusters. Nodes that serve disks to others act as NSD servers, and the nodes that consume those disks are GPFS clients. An NSD server can also be used as a client, but a node holding only a client license cannot act as a server.
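A minimal cluster‑creation sketch illustrating the server/client license split; all node and cluster names are hypothetical:

```shell
# Node file for mmcrcluster: designations mark quorum and manager roles;
# the default designation is nonquorum-client.
cat > nodes.list <<'EOF'
nsdsrv01:quorum-manager
nsdsrv02:quorum-manager
client01
EOF
mmcrcluster -N nodes.list -C hpc_cluster -r /usr/bin/ssh -R /usr/bin/scp
# Assign licenses: only server-licensed nodes may act as NSD servers.
mmchlicense server --accept -N nsdsrv01,nsdsrv02
mmchlicense client --accept -N client01
```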
Each GPFS cluster elects a cluster management server from the quorum nodes; this server monitors disk health, detects node failures, performs recovery, and manages communication with remote clusters. Additionally, each file system has a file system management server selected by the cluster manager, responsible for configuration, disk space allocation, token management, quota enforcement, and high‑availability coordination.
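Both manager roles can be inspected, and the file system manager moved, with standard commands; the file system and node names below are hypothetical:

```shell
mmlsmgr                  # show the file system manager for each file system
mmlsmgr -c               # show the current cluster manager node
mmchmgr gpfs1 nsdsrv02   # move a file system's manager to another node
```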
File system configuration: add disks, change disk accessibility, repair, mount/unmount.
Disk space allocation: assign disk segments to specific nodes.
Token management: coordinate read/write permissions on shared disks.
Quota management: automatically enforce quotas when enabled.
Configuration management: primary/secondary configuration server nodes provide failover.
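For example, quota enforcement can be enabled at file system creation and inspected afterwards; the device, stanza file, and user names are hypothetical:

```shell
mmcrfs gpfs1 -F nsd.stanza -Q yes   # create the file system with quotas enabled
mmedquota -u alice                  # edit per-user limits interactively
mmrepquota -u gpfs1                 # report per-user quota usage
```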
To guarantee metadata consistency during write operations, GPFS designates a metanode per file that aggregates metadata updates from all nodes. The metanode is selected dynamically when a file is opened and relinquished when the file is closed; the node that has held the file open the longest typically serves as the metanode.
GPFS offers several networking models, each suited to different deployment scales and application requirements:
Storage Area Network (SAN) Model: compute nodes mount storage directly and act simultaneously as compute node, NSD server, and NSD client. The front end uses Gigabit Ethernet; the back end uses Fibre Channel or InfiniBand. Ideal for small clusters.
Network Shared Disk (NSD) Server Model: compute nodes run GPFS as NSD clients; dedicated NSD servers handle I/O. Nodes connect to the NSD servers via 10 GbE or InfiniBand; the back end uses FC or InfiniBand. Suited to large‑scale clusters.
Shared‑Nothing Cluster (SNC) Model: similar to the NSD Server Model, but NSD storage is local to each server (no striping across servers). Uses the FPO (File Placement Optimizer) layout, with a 10 GbE/InfiniBand front end and an FC/InfiniBand back end. Fits Hadoop/MapReduce workloads.
Remote Cluster Mount Model: GPFS clusters share data across sites; remote clusters mount resources as if they were local. Front end 10 GbE/InfiniBand, back end FC/InfiniBand. Works for intra‑datacenter or WAN deployments.
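The remote‑mount workflow follows the standard mmauth/mmremotecluster/mmremotefs commands; a sketch with hypothetical cluster, node, key, and file system names:

```shell
# On the cluster that owns the file system:
mmauth genkey new                           # generate the cluster's key pair
mmauth add accessCluster -k access_key.pub  # authorize the remote cluster
mmauth grant accessCluster -f gpfs1         # grant access to the file system
# On the cluster that wants to mount it:
mmremotecluster add homeCluster -n node1,node2 -k home_key.pub
mmremotefs add rgpfs1 -f gpfs1 -C homeCluster -T /remote/gpfs1
mmmount rgpfs1
```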
Hybrid Model: mixes multiple models within a single cluster (e.g., some nodes use SAN access while others use NSD servers). The mix affects only how storage is accessed, not application behavior.
Designing GPFS Building Blocks involves creating modular server‑storage units that can be stacked to achieve near‑linear performance and capacity scaling. Two primary types exist: Capacity Building Blocks (maximizing storage) and Performance Building Blocks (maximizing I/O throughput). Configurable options include network topology (default InfiniBand, optional 10 GE), disk type and enclosure, number of quorum nodes, and replication factor.
Typical network design defaults to InfiniBand for high‑bandwidth, low‑latency workloads, with optional 10 GbE for less demanding scenarios. Storage options cover disk type, chassis size (2.5‑inch 2U or 3.5‑inch 4U enclosures), and redundancy settings (two metadata copies by default, a configurable quorum node count, and disk arbitration).
Building‑block expansion follows two scenarios: initial deployment of multiple identical blocks (to avoid bottlenecks from GPFS’s wide‑striping) and incremental scaling of an existing GPFS cluster by adding new blocks as online NSDs without interrupting services.
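The incremental‑scaling scenario can be sketched with standard commands: register the new block's disks as NSDs, add them to the live file system, and rebalance; the stanza file and device names are hypothetical:

```shell
mmcrnsd -F newblock.stanza           # register the new building block's NSDs
mmadddisk gpfs1 -F newblock.stanza   # add them to the mounted file system online
mmrestripefs gpfs1 -b                # rebalance existing data across all disks
```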
For a deeper dive, refer to the original article titled “High‑Performance Computing (HPC) Technologies, Solutions, and Industry Overview.”
Architects' Tech Alliance
Sharing project experience and insights into cutting‑edge architectures, with a focus on cloud computing, microservices, big data, hyper‑convergence, storage, data protection, artificial intelligence, and industry practices and solutions.