
File System Concepts and Linux Virtual File System (VFS) Overview

This article explains the purpose and functions of file systems, describes logical and physical file structures, introduces Linux's virtual file system architecture and its core data structures such as superblocks, inodes, dentries and file objects, and details the path‑lookup process used by the kernel when opening files.

政采云技术

File System Concepts

Introduction to File Systems

Because a computer processes a huge amount of information, it cannot keep everything in memory; therefore data are stored on external storage as files. In a multi‑user system the file system provides a unified management entity that allocates space, prevents unauthorized access, and enables sharing.

Functions of a File System

A file system is the set of software, data structures and the files it manages that implements unified file management in an operating system.

As the operating system's unified information manager, a file system should provide five functions:

Unified management of storage space (allocation and reclamation).

Determine where and how file information is stored.

Map names to external‑storage addresses (named access).

Support control operations (create, delete, open, close) and access operations (read, write, modify, copy, dump).

Enable sharing, confidentiality and protection of file information.

File Structure

File structure refers to the organization form of a file and is usually divided into logical structure and physical structure.

The logical structure is the organization visible to the user; it determines how the user stores, retrieves and processes information.

The physical structure is the internal organization on the storage device; it determines the mapping from logical block numbers to physical block numbers.

Access methods are also related to the physical structure; different file systems correspond to different physical structures.

Linux File System Hierarchy

What Is a Virtual File System

To support many different concrete file systems, the kernel provides a Virtual File System (VFS) layer that hides the implementation details and offers a uniform interface to user programs. For example, Linux can "mount" a DOS partition and let applications access it with the same system calls as an ext2 file system.

Usually VFS is divided into three layers, as shown in the diagram:

First layer – the file‑system interface layer, e.g., the system‑call interfaces open, write and close.

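Second layer – the VFS interface layer. It has two interfaces: one facing user space and one facing the concrete file‑system implementations. The VFS forwards each operation to the appropriate file‑system function through the operation tables attached to its objects (super_operations, inode_operations, dentry_operations and file_operations).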

Third layer – the concrete file‑system layer, which implements actual file‑system structures (e.g., network file systems such as NFS).

VFS Data Structures

Superblock object – represents a mounted file system and stores its control information.

Inode object – represents a file and stores generic file metadata.

Dentry object – represents a component of a pathname and links a name to an inode.

File object – represents an opened file for a process and holds the current file position.

Each primary object contains an operation object that describes the methods the kernel can invoke on it.

super_operations – methods applicable to a superblock.

inode_operations – methods applicable to an inode.

dentry_operations – methods applicable to a dentry.

file_operations – methods applicable to an opened file.

Superblock Object

Describes a specific file system (e.g., ext2) and is stored on a dedicated disk sector.

When a file system is mounted, the kernel allocates a VFS superblock and fills it with the concrete file system's superblock data.

The VFS superblock exists only in memory and is removed when the concrete file system is unmounted.

struct super_block {
    struct list_head        s_list;          // link all superblocks
    dev_t           s_dev;          // device identifier
    unsigned long   s_blocksize;    // block size in bytes
    unsigned char   s_blocksize_bits; // block size as power of two
    unsigned char   s_dirt;         // dirty flag
    loff_t          s_maxbytes;     // maximum file size
    struct file_system_type *s_type; // pointer to file_system_type
    const struct super_operations *s_op; // superblock operations
    unsigned long   s_flags;        // flag bits
    unsigned long   s_magic;        // magic number
    struct dentry  *s_root;        // root dentry of the mount
    struct rw_semaphore s_umount; // unmount semaphore
    struct mutex    s_lock;        // superblock lock
    int             s_count;       // reference count
    struct list_head s_inodes;     // all inodes list
    struct list_head s_files;      // all open files list
    struct list_head s_dentry_lru; // unused dentry LRU
    int             s_nr_dentry_unused; // number of dentry on LRU
    /* … */
    struct list_head s_instances; // instances of this file system type
    char s_id[32];                // textual name
    void *s_fs_info;               // private data of the concrete file system
};

struct super_operations {
    struct inode *(*alloc_inode)(struct super_block *sb); // allocate an inode
    void (*destroy_inode)(struct inode *);               // destroy an inode
    void (*dirty_inode)(struct inode *);                 // mark inode dirty
    int (*write_inode)(struct inode *, int);            // write inode to disk
    void (*drop_inode)(struct inode *);                 // logically drop inode
    void (*delete_inode)(struct inode *);               // physically free inode
    void (*put_super)(struct super_block *);            // release superblock
    void (*write_super)(struct super_block *);
    int (*sync_fs)(struct super_block *sb, int wait);
    int (*freeze_fs)(struct super_block *);
    int (*unfreeze_fs)(struct super_block *);
    int (*statfs)(struct dentry *, struct kstatfs *);
    int (*remount_fs)(struct super_block *, int *, char *);
    void (*clear_inode)(struct inode *);
    void (*umount_begin)(struct super_block *);
    int (*show_options)(struct seq_file *, struct vfsmount *);
    int (*show_stats)(struct seq_file *, struct vfsmount *);
    int (*bdev_try_to_free_page)(struct super_block *, struct page *, gfp_t);
};

Inode Object

The inode contains all information the kernel needs to operate on a file; the name can change, but the inode uniquely identifies the file.

An inode may represent regular files, devices, pipes, etc., and therefore contains special fields for those types.

When a file is first accessed, the kernel builds an inode in memory, partially from data stored on disk and partially from dynamically generated data.

struct inode {
    struct hlist_node       i_hash;
    struct list_head        i_list;      // backing dev IO list
    struct list_head        i_sb_list;
    struct list_head        i_dentry;
    unsigned long           i_ino;      // inode number
    atomic_t                i_count;    // reference count
    umode_t                 i_mode;     // file type and permissions
    /* … */
    const struct inode_operations *i_op;   // inode operations
    const struct file_operations  *i_fop;  // file operations
    struct super_block *i_sb;              // pointer to superblock
    struct file_lock *i_flock;            // file lock list
    struct address_space *i_mapping;      // shared address space
    struct address_space i_data;          // device address space
    struct list_head i_devices;           // device list
    union {
        struct pipe_inode_info *i_pipe;   // pipe info
        struct block_device *i_bdev;      // block device driver
        struct cdev *i_cdev;              // character device driver
    };
    /* … */
};

struct inode_operations {
    int (*create)(struct inode *, struct dentry *, int, struct nameidata *);
    struct dentry *(*lookup)(struct inode *, struct dentry *, struct nameidata *);
    int (*link)(struct dentry *, struct inode *, struct dentry *);
    int (*unlink)(struct inode *, struct dentry *);
    int (*symlink)(struct inode *, struct dentry *, const char *);
    int (*mkdir)(struct inode *, struct dentry *, int);
    int (*rmdir)(struct inode *, struct dentry *);
    int (*mknod)(struct inode *, struct dentry *, int, dev_t);
    /* … */
};

Directory Entry (Dentry) Object

VFS treats each directory as a file; for a path like /tmp/test, both tmp and test are files, with tmp being a directory file.

Every file has an associated inode and a dentry that links the name to that inode.

Dentries accelerate lookup and improve file‑system performance.

A dentry stores logical attributes and exists only in memory; an inode stores physical attributes that have a disk representation.

struct dentry {
    atomic_t        d_count;        // reference count
    unsigned int    d_flags;        // status flags
    spinlock_t      d_lock;         // lock for the dentry
    int             d_mounted;      // is a mount point
    struct inode    *d_inode;       // inode this dentry refers to
    struct hlist_node d_hash;       // hash table entry
    struct dentry   *d_parent;      // parent dentry
    struct qstr     d_name;         // name for quick lookup
    struct list_head d_lru;         // LRU list for unused dentries
    union {
        struct list_head d_child; // entry in the parent's list of children
        struct rcu_head   d_rcu;
    } d_u;
    struct list_head d_subdirs;     // list of this dentry's children
    struct list_head d_alias;       // inode alias list
    unsigned long   d_time;        // revalidation time
    const struct dentry_operations *d_op; // dentry operations
    struct super_block *d_sb;       // superblock of the file system
    void            *d_fsdata;    // file‑system specific data
    unsigned char   d_iname[DNAME_INLINE_LEN_MIN]; // inline storage for short names
};

struct dentry_operations {
    int (*d_revalidate)(struct dentry *, struct nameidata *);
    int (*d_hash)(struct dentry *, struct qstr *);
    int (*d_compare)(struct dentry *, struct qstr *, struct qstr *);
    int (*d_delete)(struct dentry *);
    void (*d_release)(struct dentry *);
    void (*d_iput)(struct dentry *, struct inode *);
};

File Object

A file object has no on‑disk image; it is created when a file is opened.

The main information is the file pointer, i.e., the current position within the file.

The file structure also leads to the inode of the opened file (through f_path.dentry->d_inode) and participates in the system‑wide open‑file table.

struct file {
    union {
        struct list_head fu_list;   // list of file objects
        struct rcu_head  fu_rcuhead;
    } f_u;
    struct path      f_path;
    const struct file_operations *f_op; // file operations
    spinlock_t       f_lock;              // lock for flags, etc.
    atomic_long_t    f_count;             // reference count
    unsigned int     f_flags;
    fmode_t          f_mode;              // access mode
    loff_t           f_pos;               // current file position
    struct fown_struct f_owner;
    const struct cred *f_cred;
    struct file_ra_state f_ra;
    u64              f_version;
    void            *private_data;
    struct address_space *f_mapping;
};

struct file_operations {
    struct module *owner;
    loff_t (*llseek)(struct file *, loff_t, int);
    ssize_t (*read)(struct file *, char __user *, size_t, loff_t *);
    ssize_t (*write)(struct file *, const char __user *, size_t, loff_t *);
    ssize_t (*aio_read)(struct kiocb *, const struct iovec *, unsigned long, loff_t);
    ssize_t (*aio_write)(struct kiocb *, const struct iovec *, unsigned long, loff_t);
    int (*ioctl)(struct inode *, struct file *, unsigned int, unsigned long);
    int (*flush)(struct file *, fl_owner_t id);
    int (*release)(struct inode *, struct file *);
    int (*fsync)(struct file *, struct dentry *, int datasync);
    /* … */
};

File‑System‑Related Structures

Based on the physical medium and its organization, different file‑system types are described by a file_system_type structure. Every Linux‑supported file system has exactly one such structure, regardless of how many instances are mounted.

struct file_system_type {
    const char *name;               // file‑system name
    struct subsystem subsys;        // sysfs subsystem object
    int fs_flags;                  // file‑system type flags
    /* Called when the file system is mounted */
    struct super_block *(*get_sb)(struct file_system_type *, int, const char *, void *);
    void (*kill_sb)(struct super_block *); // called when unmounting
    struct module *owner;           // owning module
    struct file_system_type *next;  // next file‑system type in list
    struct list_head fs_supers;    // list of superblocks of this type
};

When a file system is actually mounted, a vfsmount structure is created to represent the mount point.

struct vfsmount {
    struct list_head mnt_hash;          // hash table
    struct vfsmount *mnt_parent;         // parent mount
    struct dentry *mnt_mountpoint;      // dentry of the mount point
    struct dentry *mnt_root;             // root dentry of this file system
    struct super_block *mnt_sb;          // superblock of this file system
    struct list_head mnt_mounts;         // list of child mounts
    struct list_head mnt_child;          // entry in the parent's mnt_mounts list
    atomic_t mnt_count;                 // usage count
    int mnt_flags;                      // mount flags
    char *mnt_devname;                   // device name
    struct list_head mnt_list;           // list of all mounts
    struct list_head mnt_fslink;        // per‑file‑system list
    struct namespace *mnt_namespace;      // associated namespace
};

Process‑Related Structures

files_struct – records the file descriptors opened by a process (user‑open file table).

fs_struct – records the current working directory and the root directory of a process.

struct task_struct {
    /* … */
    struct fs_struct   *fs;    // file‑system information
    struct files_struct *files; // currently opened files
    /* … */
};

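A file descriptor fd is an index into the array of file pointers reached through files_struct (fdt->fd, which initially points at the embedded fd_array). By convention, the first three entries (0, 1, 2) are standard input, standard output and standard error.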

struct files_struct {
    atomic_t count;               // number of processes sharing this table
    struct fdtable *fdt;         // pointer to the fd table
    struct fdtable fdtab;        // actual fd table
    spinlock_t file_lock;        // protects the structure
    int next_fd;                 // next free descriptor
    struct embedded_fd_set close_on_exec_init; // fds to close on exec
    struct embedded_fd_set open_fds_init;      // initially open fds
    struct file *fd_array[NR_OPEN_DEFAULT];   // array of file pointers
};

struct fdtable {
    unsigned int max_fds;        // maximum number of fds for the process
    struct file **fd;           // pointer to array of file pointers
    fd_set *close_on_exec;      // fds to close on exec
    fd_set *open_fds;           // currently open fds
    struct fdtable *next;       // next fd table (for expansion)
};

Each process also has a fs_struct that stores the root and current working directory dentry objects and the corresponding mount points.

struct fs_struct {
    atomic_t count;          // usage count
    rwlock_t lock;           // protects the structure
    int umask;               // default permission mask
    struct dentry *root;     // root directory dentry
    struct dentry *pwd;      // current working directory dentry
    struct dentry *altroot; // alternative root (used for emulation environments)
    struct vfsmount *rootmnt; // mount of the root directory
    struct vfsmount *pwdmnt;  // mount of the current directory
    struct vfsmount *altrootmnt; // mount of the alternative root
};

Linux File System Logical Structure

Disk and File System

Path Lookup

Basic Explanation

Opening a file in the kernel is conceptually simple: the kernel walks the user‑provided pathname component by component; if the file exists, a file structure is created, linked to the process’s files array, and the array index is returned as the user‑space file descriptor.

Path lookup parses a pathname step by step and includes the following aspects:

Determine the starting point (e.g., current->fs->pwd or current->fs->root).

Check whether the process has permission to access the inode associated with the current dentry.

Search the next component; this may be a child entry or the parent entry "..".

Handle mount points, crossing file‑system boundaries when necessary.

Handle symbolic links by following them to their target.

Create missing components when a new file is being created.

Item 1 happens once, at the start of the walk; items 2‑6 are applied as needed to each component while the walk proceeds.

The open() system call is implemented by sys_open(), which calls do_sys_open(); that function in turn invokes do_filp_open(), which performs most of the open logic, including the path lookup.

sys_open() Sequence Diagram

First, get_unused_fd_flags() obtains an available file descriptor (the index in the process’s open‑file list).

Then, do_filp_open() opens the file and returns a file object representing the opened file.

Finally, fd_install() links the descriptor with the file object; subsequent reads/writes use the descriptor.

path_init()

Sets the starting point for pathname search, mainly by initializing the nameidata variable nd. If the LOOKUP_ROOT flag is set, the function is called by open_by_handle_at() with a user‑specified root (a special case not covered here).

If the pathname begins with ‘/’, it is absolute and nd is set to the root via set_root. Otherwise it is relative.

If dfd equals AT_FDCWD, the relative pathname starts from the current working directory (pwd).

If dfd is not AT_FDCWD , the caller supplied a directory file descriptor; the kernel obtains the corresponding file and uses its f_path as the starting point.

In all cases the nd structure’s last_type field is initialized to LAST_ROOT . The inode field of nd is later filled with path.dentry->d_inode .

link_path_walk

link_path_walk(const char *name, struct nameidata *nd) walks the pathname component by component. Before the loop it strips redundant leading ‘/’ characters from an absolute path.

Inside the loop the kernel:

Determines the next path component and its hash value.

Lets the file system recompute the hash through d_op->d_hash when the parent dentry provides that method.

Classifies the component as LAST_DOT ("."), LAST_DOTDOT (".."), or LAST_NORM (normal).

Collapses multiple consecutive ‘/’ characters.

Calls walk_component() to update nd and the path variable next. If the component is a symbolic link, only next is updated.

If a symbolic link is encountered, nested_symlink() processes it and updates nd .

Repeats until the entire pathname is consumed; the final component is later verified by do_last().

Summary

The Virtual File System (VFS) is an abstraction layer in the Linux kernel that lets different concrete file systems coexist and makes operations across them possible. It exposes a uniform set of operations (open, read, write, close and so on) built on four core objects: the superblock, inode, dentry and file. Type‑specific work happens only once control passes to a concrete file system, which is what makes the Unix/Linux “everything is a file” philosophy practical.

Written by

政采云技术

ZCY Technology Team (Zero), based in Hangzhou, is a growth-oriented team passionate about technology and craftsmanship. With around 500 members, we are building comprehensive engineering, project management, and talent development systems. We are committed to innovation and creating a cloud service ecosystem for government and enterprise procurement. We look forward to your joining us.
