File System Concepts and Linux Virtual File System (VFS) Overview
This article explains the purpose and functions of file systems, describes logical and physical file structures, introduces Linux's virtual file system architecture and its core data structures such as superblocks, inodes, dentries and file objects, and details the path‑lookup process used by the kernel when opening files.
File System Concepts
Introduction to File Systems
Because a computer processes a huge amount of information, it cannot keep everything in memory; therefore data are stored on external storage as files. In a multi‑user system the file system provides a unified management entity that allocates space, prevents unauthorized access, and enables sharing.
Functions of a File System
A file system is the part of an operating system that implements unified file management: the management software, the data structures it maintains, and the files it manages.
As the system's central manager of stored information, a file system should provide five functions:
Unified management of storage space (allocation and reclamation).
Determine where and how file information is stored.
Map names to external‑storage addresses (named access).
Support control operations (create, delete, open, close) and access operations (read, write, modify, copy, dump).
Enable sharing, confidentiality and protection of file information.
File Structure
File structure refers to how a file is organized; it is usually divided into logical structure and physical structure.
The logical structure is the organization visible to the user; it determines how the user stores, retrieves and processes information.
The physical structure is the internal organization on the storage device; it determines the mapping from logical block numbers to physical block numbers.
The available access methods also depend on the physical structure, and different file systems use different physical structures.
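The mapping from logical to physical block numbers can be illustrated with a minimal sketch. It assumes a hypothetical indexed layout in which each file carries a small table of physical block numbers; the names toy_indexed_file and toy_map_offset, the 4096-byte block size and the 8-block limit are all illustrative assumptions, not taken from any real file system.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_BLOCK_SIZE 4096u   /* assumed block size, illustration only */
#define TOY_MAX_BLOCKS 8u      /* assumed per-file limit of this toy layout */

/* Hypothetical indexed layout: entry i holds the physical block
 * that stores logical block i of the file. */
struct toy_indexed_file {
    size_t   size;                        /* file length in bytes */
    unsigned block_map[TOY_MAX_BLOCKS];   /* logical -> physical block number */
};

/* Translate a byte offset into (physical block, offset inside that block).
 * Returns 0 on success, -1 if the offset lies outside the file. */
int toy_map_offset(const struct toy_indexed_file *f, size_t offset,
                   unsigned *phys_block, size_t *in_block)
{
    assert(f != NULL && phys_block != NULL && in_block != NULL);
    if (offset >= f->size)
        return -1;
    size_t logical = offset / TOY_BLOCK_SIZE;   /* logical block number */
    if (logical >= TOY_MAX_BLOCKS)
        return -1;
    *phys_block = f->block_map[logical];
    *in_block = offset % TOY_BLOCK_SIZE;
    return 0;
}
```

With a table of {57, 12, 903}, byte offset 5000 falls in logical block 1, so the sketch reports physical block 12. A contiguous layout would replace the table lookup with a simple addition to a start block.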
Linux File System Hierarchy
What Is a Virtual File System
To support many different concrete file systems, the kernel provides a Virtual File System (VFS) layer that hides the implementation details and offers a uniform interface to user programs. For example, Linux can "mount" a DOS partition and let applications access it with the same system calls as an ext2 file system.
Usually VFS is divided into three layers, as shown in the diagram:
First layer – the file‑system interface layer, e.g., the system‑call interfaces open, write and close.
Second layer – the VFS interface layer. It has two interfaces: one facing user space and one facing the specific file‑system implementation. The VFS forwards each operation to the matching function of the concrete file system through tables of function pointers such as file_operations.
Third layer – the concrete file‑system layer, which implements actual file‑system structures (e.g., network file systems such as NFS).
VFS Data Structures
Superblock object – represents a mounted file system and stores its control information.
Inode object – represents a file and stores generic file metadata.
Dentry object – represents a component of a pathname and links a name to an inode.
File object – represents an opened file for a process and holds the current file position.
Each primary object contains an operation object that describes the methods the kernel can invoke on it.
super_operations – methods applicable to a superblock.
inode_operations – methods applicable to an inode.
dentry_operations – methods applicable to a dentry.
file_operations – methods applicable to an opened file.
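The operation-object pattern can be sketched in user-space C. This is not kernel code: toy_file, toy_file_ops, memfs_read and toy_vfs_read are hypothetical names for a cut-down model in which a generic layer calls whatever read method the concrete file system installed.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct toy_file;   /* forward declaration */

/* Hypothetical, cut-down analogue of file_operations. */
struct toy_file_ops {
    long (*read)(struct toy_file *f, char *buf, size_t len);
};

struct toy_file {
    const struct toy_file_ops *ops;  /* installed by the concrete file system */
    const char *data;                /* backing bytes for this toy example */
    size_t size;
    size_t pos;                      /* current position, like f_pos */
};

/* "Concrete file system" method: copy bytes out of an in-memory buffer. */
long memfs_read(struct toy_file *f, char *buf, size_t len)
{
    size_t left = f->size - f->pos;
    if (len > left)
        len = left;
    memcpy(buf, f->data + f->pos, len);
    f->pos += len;
    return (long)len;
}

const struct toy_file_ops memfs_ops = { .read = memfs_read };

/* Generic layer: knows nothing about memfs, it only dispatches through ops. */
long toy_vfs_read(struct toy_file *f, char *buf, size_t len)
{
    assert(f != NULL && buf != NULL);
    if (f->ops == NULL || f->ops->read == NULL)
        return -1;   /* operation not provided by this file system */
    return f->ops->read(f, buf, len);
}
```

Supporting another file system in this model means supplying a different toy_file_ops table; toy_vfs_read stays unchanged, which is the point of the indirection.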
Superblock Object
Describes a specific file system (e.g., ext2) and is stored on a dedicated disk sector.
When the kernel mounts a file system, it allocates a VFS superblock and fills it with the concrete file system's superblock data.
The VFS superblock exists only in memory and is removed when the concrete file system is unmounted.
struct super_block {
struct list_head s_list; // link all superblocks
dev_t s_dev; // device identifier
unsigned long s_blocksize; // block size in bytes
unsigned char s_blocksize_bits; // block size as power of two
unsigned char s_dirt; // dirty flag
loff_t s_maxbytes; // maximum file size
struct file_system_type *s_type; // pointer to file_system_type
const struct super_operations *s_op; // superblock operations
unsigned long s_flags; // flag bits
unsigned long s_magic; // magic number
struct dentry *s_root; // root dentry of the mount
struct rw_semaphore s_umount; // unmount semaphore
struct mutex s_lock; // superblock lock
int s_count; // reference count
struct list_head s_inodes; // all inodes list
struct list_head s_files; // all open files list
struct list_head s_dentry_lru; // unused dentry LRU
int s_nr_dentry_unused; // number of dentry on LRU
/* … */
struct list_head s_instances; // instances of this file system type
char s_id[32]; // textual name
void *s_fs_info; // private data of the concrete file system
};
struct super_operations {
struct inode *(*alloc_inode)(struct super_block *sb); // allocate an inode
void (*destroy_inode)(struct inode *); // destroy an inode
void (*dirty_inode)(struct inode *); // mark inode dirty
int (*write_inode)(struct inode *, int); // write inode to disk
void (*drop_inode)(struct inode *); // logically drop inode
void (*delete_inode)(struct inode *); // physically free inode
void (*put_super)(struct super_block *); // release superblock
void (*write_super)(struct super_block *);
int (*sync_fs)(struct super_block *sb, int wait);
int (*freeze_fs)(struct super_block *);
int (*unfreeze_fs)(struct super_block *);
int (*statfs)(struct dentry *, struct kstatfs *);
int (*remount_fs)(struct super_block *, int *, char *);
void (*clear_inode)(struct inode *);
void (*umount_begin)(struct super_block *);
int (*show_options)(struct seq_file *, struct vfsmount *);
int (*show_stats)(struct seq_file *, struct vfsmount *);
int (*bdev_try_to_free_page)(struct super_block *, struct page *, gfp_t);
};
Inode Object
The inode contains all information the kernel needs to operate on a file; the name can change, but the inode uniquely identifies the file.
An inode may represent regular files, devices, pipes, etc., and therefore contains special fields for those types.
When a file is first accessed, the kernel builds an inode in memory, partially from data stored on disk and partially from dynamically generated data.
struct inode {
struct hlist_node i_hash;
struct list_head i_list; // backing dev IO list
struct list_head i_sb_list;
struct list_head i_dentry;
unsigned long i_ino; // inode number
atomic_t i_count; // reference count
umode_t i_mode; // file type and permissions
/* … */
const struct inode_operations *i_op; // inode operations
const struct file_operations *i_fop; // file operations
struct super_block *i_sb; // pointer to superblock
struct file_lock *i_flock; // file lock list
struct address_space *i_mapping; // shared address space
struct address_space i_data; // device address space
struct list_head i_devices; // device list
union {
struct pipe_inode_info *i_pipe; // pipe info
struct block_device *i_bdev; // block device driver
struct cdev *i_cdev; // character device driver
};
/* … */
};
struct inode_operations {
int (*create)(struct inode *, struct dentry *, int, struct nameidata *);
struct dentry *(*lookup)(struct inode *, struct dentry *, struct nameidata *);
int (*link)(struct dentry *, struct inode *, struct dentry *);
int (*unlink)(struct inode *, struct dentry *);
int (*symlink)(struct inode *, struct dentry *, const char *);
int (*mkdir)(struct inode *, struct dentry *, int);
int (*rmdir)(struct inode *, struct dentry *);
int (*mknod)(struct inode *, struct dentry *, int, dev_t);
/* … */
};
Directory Entry (Dentry) Object
VFS treats each directory as a file; for a path like /tmp/test, both tmp and test are files, with tmp being a directory file.
Every file has an associated inode and a dentry that links the name to that inode.
Dentries accelerate lookup and improve file‑system performance.
Dentry stores logical attributes; the inode stores physical attributes that have a disk representation.
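How a dentry cache speeds up lookup can be shown with a small user-space model. It is a sketch under stated assumptions: toy_dentry, toy_dcache, toy_d_add and toy_d_lookup are hypothetical names, the hash function is arbitrary, and an inode number stands in for the d_inode pointer. The essential idea is that the key is the pair (parent dentry, component name).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_DCACHE_BUCKETS 16u

/* Hypothetical, cut-down dentry: a name under a parent, bound to an inode. */
struct toy_dentry {
    const struct toy_dentry *parent;  /* NULL for the root */
    const char *name;                 /* one pathname component */
    unsigned long ino;                /* stands in for the d_inode pointer */
    struct toy_dentry *hash_next;     /* chain inside one bucket */
};

struct toy_dcache {
    struct toy_dentry *bucket[TOY_DCACHE_BUCKETS];
};

/* Mix the parent pointer and the component name into a bucket index. */
unsigned toy_dhash(const struct toy_dentry *parent, const char *name)
{
    uintptr_t h = (uintptr_t)parent >> 4;
    while (*name)
        h = h * 31u + (unsigned char)*name++;
    return (unsigned)(h % TOY_DCACHE_BUCKETS);
}

/* Insert a dentry at the head of its bucket chain. */
void toy_d_add(struct toy_dcache *c, struct toy_dentry *d)
{
    unsigned b = toy_dhash(d->parent, d->name);
    d->hash_next = c->bucket[b];
    c->bucket[b] = d;
}

/* Return the cached dentry for (parent, name), or NULL on a cache miss.
 * On a miss, a real kernel would ask the file system to search the directory. */
struct toy_dentry *toy_d_lookup(const struct toy_dcache *c,
                                const struct toy_dentry *parent,
                                const char *name)
{
    assert(c != NULL && name != NULL);
    struct toy_dentry *d = c->bucket[toy_dhash(parent, name)];
    for (; d != NULL; d = d->hash_next)
        if (d->parent == parent && strcmp(d->name, name) == 0)
            return d;
    return NULL;
}
```

For /tmp/test, one lookup with the root as parent finds tmp, and a second with tmp as parent finds test; neither step has to read a directory from disk while the entries stay cached.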
struct dentry {
atomic_t d_count; // reference count
unsigned int d_flags; // status flags
spinlock_t d_lock; // lock for the dentry
int d_mounted; // is a mount point
struct inode *d_inode; // inode this dentry refers to
struct hlist_node d_hash; // hash table entry
struct dentry *d_parent; // parent dentry
struct qstr d_name; // name for quick lookup
struct list_head d_lru; // LRU list for unused dentries
union {
struct list_head d_child; // children list
struct rcu_head d_rcu;
} d_u;
struct list_head d_subdirs; // sub‑directory list
struct list_head d_alias; // inode alias list
unsigned long d_time; // revalidation time
const struct dentry_operations *d_op; // dentry operations
struct super_block *d_sb; // superblock of the file system
void *d_fsdata; // file‑system specific data
unsigned char d_iname[DNAME_INLINE_LEN_MIN]; // inline storage for short names
};
struct dentry_operations {
int (*d_revalidate)(struct dentry *, struct nameidata *);
int (*d_hash)(struct dentry *, struct qstr *);
int (*d_compare)(struct dentry *, struct qstr *, struct qstr *);
int (*d_delete)(struct dentry *);
void (*d_release)(struct dentry *);
void (*d_iput)(struct dentry *, struct inode *);
};
File Object
A file object has no on‑disk image; it is created when a file is opened.
The main information is the file pointer, i.e., the current position within the file.
The file structure also holds a pointer to the inode of the opened file and participates in the system‑wide open‑file table.
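That the file position belongs to the file object, not to the inode or the descriptor, can be observed from user space with standard POSIX calls. The sketch below assumes a writable /tmp; toy_offsets_demo is a hypothetical helper name. It opens one file twice (two file objects) and duplicates one descriptor (two descriptors, one file object).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Read 4 bytes through the first descriptor, then report the position seen
 * through each of the three descriptors.
 * Returns 0 on success, -1 if any system call fails. */
int toy_offsets_demo(long *pos_first, long *pos_second_open, long *pos_dup)
{
    char path[] = "/tmp/vfs_demo_XXXXXX";   /* assumed writable temp location */
    char buf[4];
    int rc = -1;

    assert(pos_first != NULL && pos_second_open != NULL && pos_dup != NULL);

    int fd1 = mkstemp(path);                /* first open: file object A */
    if (fd1 < 0)
        return -1;
    int fd2 = open(path, O_RDONLY);         /* second open: file object B */
    int fd3 = dup(fd1);                     /* new descriptor, same object A */

    if (fd2 >= 0 && fd3 >= 0 &&
        write(fd1, "abcdefgh", 8) == 8 &&
        lseek(fd1, 0, SEEK_SET) == 0 &&
        read(fd1, buf, sizeof buf) == (ssize_t)sizeof buf) {
        *pos_first       = (long)lseek(fd1, 0, SEEK_CUR);
        *pos_second_open = (long)lseek(fd2, 0, SEEK_CUR);
        *pos_dup         = (long)lseek(fd3, 0, SEEK_CUR);
        rc = 0;
    }

    if (fd3 >= 0)
        close(fd3);
    if (fd2 >= 0)
        close(fd2);
    close(fd1);
    unlink(path);
    return rc;
}
```

After the read, the first descriptor and its duplicate both report position 4 because they share one file object, while the independently opened descriptor still reports 0.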
struct file {
union {
struct list_head fu_list; // list of file objects
struct rcu_head fu_rcuhead;
} f_u;
struct path f_path;
const struct file_operations *f_op; // file operations
spinlock_t f_lock; // lock for flags, etc.
atomic_long_t f_count; // reference count
unsigned int f_flags;
fmode_t f_mode; // access mode
loff_t f_pos; // current file position
struct fown_struct f_owner;
const struct cred *f_cred;
struct file_ra_state f_ra;
u64 f_version;
void *private_data;
struct address_space *f_mapping;
};
struct file_operations {
struct module *owner;
loff_t (*llseek)(struct file *, loff_t, int);
ssize_t (*read)(struct file *, char __user *, size_t, loff_t *);
ssize_t (*write)(struct file *, const char __user *, size_t, loff_t *);
ssize_t (*aio_read)(struct kiocb *, const struct iovec *, unsigned long, loff_t);
ssize_t (*aio_write)(struct kiocb *, const struct iovec *, unsigned long, loff_t);
int (*ioctl)(struct inode *, struct file *, unsigned int, unsigned long);
int (*flush)(struct file *, fl_owner_t id);
int (*release)(struct inode *, struct file *);
int (*fsync)(struct file *, struct dentry *, int datasync);
/* … */
};
File‑Related Structures
Based on the physical medium and its organization, different file‑system types are described by a file_system_type structure. Every Linux‑supported file system has exactly one such structure, regardless of how many instances are mounted.
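The "one structure per type, however many mounts" rule can be modelled as a name-keyed registry. The following is a user-space sketch with hypothetical names (toy_fs_type, toy_register_fs, toy_find_fs); it only mirrors the idea of a linked list of types in which each name may appear once, and it is not the kernel's registration code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, cut-down analogue of file_system_type. */
struct toy_fs_type {
    const char *name;
    struct toy_fs_type *next;   /* singly linked list of registered types */
};

struct toy_fs_type *toy_fs_list = NULL;

/* Register a type; refuse a second registration under the same name.
 * Returns 0 on success, -1 if the name is already taken. */
int toy_register_fs(struct toy_fs_type *fs)
{
    struct toy_fs_type **p;

    assert(fs != NULL && fs->name != NULL);
    for (p = &toy_fs_list; *p != NULL; p = &(*p)->next)
        if (strcmp((*p)->name, fs->name) == 0)
            return -1;
    fs->next = NULL;
    *p = fs;                    /* append at the tail of the list */
    return 0;
}

/* Find a registered type by name, as a mount request would need to. */
struct toy_fs_type *toy_find_fs(const char *name)
{
    assert(name != NULL);
    for (struct toy_fs_type *fs = toy_fs_list; fs != NULL; fs = fs->next)
        if (strcmp(fs->name, name) == 0)
            return fs;
    return NULL;
}
```

Mounting the same type twice would call toy_find_fs twice and get the same structure back; the per-mount state lives elsewhere, in the superblock and mount structures.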
struct file_system_type {
const char *name; // file‑system name
struct subsystem subsys; // sysfs subsystem object
int fs_flags; // file‑system type flags
/* Called when the file system is mounted */
struct super_block *(*get_sb)(struct file_system_type *, int, const char *, void *);
void (*kill_sb)(struct super_block *); // called when unmounting
struct module *owner; // owning module
struct file_system_type *next; // next file‑system type in list
struct list_head fs_supers; // list of superblocks of this type
};
When a file system is actually mounted, a vfsmount structure is created to represent the mount point.
struct vfsmount {
struct list_head mnt_hash; // hash table
struct vfsmount *mnt_parent; // parent mount
struct dentry *mnt_mountpoint; // dentry of the mount point
struct dentry *mnt_root; // root dentry of this file system
struct super_block *mnt_sb; // superblock of this file system
struct list_head mnt_mounts; // list of child mounts
struct list_head mnt_child; // entry in the parent's mnt_mounts list
atomic_t mnt_count; // usage count
int mnt_flags; // mount flags
char *mnt_devname; // device name
struct list_head mnt_list; // list of all mounts
struct list_head mnt_fslink; // per‑file‑system list
struct namespace *mnt_namespace; // associated namespace
};
Process‑Related Structures
files_struct – records the file descriptors opened by a process (user‑open file table).
fs_struct – records the current working directory and the root directory of a process.
struct task_struct {
/* … */
struct fs_struct *fs; // file‑system information
struct files_struct *files; // currently opened files
/* … */
};
A file descriptor fd is an index into the fd_array of the process's files_struct. By convention, the first three entries (0, 1, 2) are standard input, standard output and standard error.
struct files_struct {
atomic_t count; // number of processes sharing this table
struct fdtable *fdt; // pointer to the fd table
struct fdtable fdtab; // actual fd table
spinlock_t file_lock; // protects the structure
int next_fd; // next free descriptor
struct embedded_fd_set close_on_exec_init; // fds to close on exec
struct embedded_fd_set open_fds_init; // initially open fds
struct file *fd_array[NR_OPEN_DEFAULT]; // array of file pointers
};
struct fdtable {
unsigned int max_fds; // maximum number of fds for the process
struct file **fd; // pointer to array of file pointers
fd_set *close_on_exec; // fds to close on exec
fd_set *open_fds; // currently open fds
struct fdtable *next; // next fd table (for expansion)
};
Each process also has a fs_struct that stores the root and current working directory dentry objects and the corresponding mount points.
struct fs_struct {
atomic_t count; // usage count
rwlock_t lock; // protects the structure
int umask; // default permission mask
struct dentry *root; // root directory dentry
struct dentry *pwd; // current working directory dentry
struct dentry *altroot; // alternative root (chroot)
struct vfsmount *rootmnt; // mount of the root directory
struct vfsmount *pwdmnt; // mount of the current directory
struct vfsmount *altrootmnt; // mount of the alternative root
};
Linux File System Logical Structure
Disk and File System
Path Lookup
Basic Explanation
Opening a file in the kernel is conceptually simple: the kernel walks the user‑provided pathname component by component; if the file exists, a file structure is created, linked to the process’s files array, and the array index is returned as the user‑space file descriptor.
Path lookup parses a pathname step by step and includes the following aspects:
Determine the starting point (e.g., current->fs->pwd or current->fs->root).
Check whether the process has permission to access the inode associated with the current dentry.
Search the next component; this may be a child entry or the parent entry "..".
Handle mount points, crossing file‑system boundaries when necessary.
Handle symbolic links by following them to their target.
Create missing components when a new file is being created.
Item 1 is the primary step; items 2‑6 are checks performed for each component during the walk.
The open() system call is implemented by do_sys_open(), which ultimately invokes do_filp_open(); that function performs most of the open logic, including the path lookup.
sys_open() Sequence Diagram
First, get_unused_fd_flags() obtains an available file descriptor (the index in the process’s open‑file list).
Then, do_filp_open() opens the file and returns a file object representing the opened file.
Finally, fd_install() links the descriptor with the file object; subsequent reads/writes use the descriptor.
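The reserve-then-publish order of these steps can be modelled in a few lines. This is a user-space sketch, not the kernel implementation: toy_files, toy_get_unused_fd, toy_fd_install and toy_close are hypothetical names, the table holds only 8 slots, and a byte array replaces the kernel's open-descriptor bitmap.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_FDS 8

struct toy_open_file { int unused; };      /* stands in for struct file */

struct toy_files {
    struct toy_open_file *fd_array[TOY_MAX_FDS];  /* NULL = no file installed */
    unsigned char         in_use[TOY_MAX_FDS];    /* 1 = descriptor reserved */
};

/* Step 1: reserve the lowest free descriptor; -1 if the table is full. */
int toy_get_unused_fd(struct toy_files *t)
{
    assert(t != NULL);
    for (int fd = 0; fd < TOY_MAX_FDS; fd++) {
        if (!t->in_use[fd]) {
            t->in_use[fd] = 1;
            return fd;
        }
    }
    return -1;
}

/* Step 3: publish the file object in the slot reserved in step 1. */
void toy_fd_install(struct toy_files *t, int fd, struct toy_open_file *f)
{
    assert(t != NULL && fd >= 0 && fd < TOY_MAX_FDS && t->in_use[fd]);
    t->fd_array[fd] = f;
}

/* Undo both steps, as closing the descriptor would. */
void toy_close(struct toy_files *t, int fd)
{
    assert(t != NULL && fd >= 0 && fd < TOY_MAX_FDS);
    t->fd_array[fd] = NULL;
    t->in_use[fd] = 0;
}
```

Step 2, opening the file and producing the file object, sits between the two calls. Reserving the number first means that a failed open only has to release the slot, and no other thread can ever see a half-initialized entry.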
path_init()
Sets the starting point for pathname search, mainly by initializing the nameidata variable nd. If the LOOKUP_ROOT flag is set, the function is called by open_by_handle_at() with a user‑specified root (a special case not covered here).
If the pathname begins with ‘/’, it is absolute and nd is set to the root via set_root. Otherwise it is relative.
If dfd equals AT_FDCWD, the relative pathname starts from the current working directory (pwd).
If dfd is not AT_FDCWD, the caller supplied a directory file descriptor; the kernel obtains the corresponding file and uses its f_path as the starting point.
In all cases the nd structure’s last_type field is initialized to LAST_ROOT. The inode field of nd is later filled with path.dentry->d_inode.
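The different starting points are visible from user space through the POSIX openat() call. The sketch below assumes a writable /tmp; toy_same_file_two_ways is a hypothetical helper name. It reaches one file through an absolute pathname, where the walk starts at the root, and through a relative pathname plus a directory descriptor, where the walk starts at that directory, then compares the resulting inodes.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Returns 1 if both lookups reach the same inode, 0 if they do not,
 * and -1 if any system call fails. */
int toy_same_file_two_ways(void)
{
    char dir[] = "/tmp/vfs_walk_XXXXXX";   /* assumed writable temp location */
    char abs_path[64];
    struct stat st_abs, st_rel;
    int rc = -1;

    if (mkdtemp(dir) == NULL)
        return -1;
    int n = snprintf(abs_path, sizeof abs_path, "%s/leaf", dir);
    assert(n > 0 && (size_t)n < sizeof abs_path);

    int dfd = open(dir, O_RDONLY | O_DIRECTORY);
    int fd_new = (dfd >= 0) ? openat(dfd, "leaf", O_CREAT | O_WRONLY, 0600) : -1;
    /* Absolute pathname: the descriptor argument is ignored, walk starts at the root. */
    int fd_abs = (fd_new >= 0) ? openat(AT_FDCWD, abs_path, O_RDONLY) : -1;
    /* Relative pathname with a directory descriptor: walk starts at dfd. */
    int fd_rel = (fd_new >= 0) ? openat(dfd, "leaf", O_RDONLY) : -1;

    if (fd_abs >= 0 && fd_rel >= 0 &&
        fstat(fd_abs, &st_abs) == 0 && fstat(fd_rel, &st_rel) == 0)
        rc = (st_abs.st_dev == st_rel.st_dev && st_abs.st_ino == st_rel.st_ino);

    if (fd_rel >= 0)
        close(fd_rel);
    if (fd_abs >= 0)
        close(fd_abs);
    if (fd_new >= 0)
        close(fd_new);
    if (dfd >= 0)
        close(dfd);
    unlink(abs_path);
    rmdir(dir);
    return rc;
}
```

Passing AT_FDCWD with a relative name would exercise the third case, starting from the current working directory.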
link_path_walk
link_path_walk(const char *name, struct nameidata *nd) walks the pathname component by component. Before the loop it strips redundant leading ‘/’ characters from an absolute path.
Inside the loop the kernel:
Determines the next path component and its hash value.
Updates the hash if needed.
Classifies the component as LAST_DOT ("."), LAST_DOTDOT (".."), or LAST_NORM (normal).
Collapses multiple consecutive ‘/’ characters.
Calls walk_component() to update nd and the next path. If the component is a symbolic link, only next is updated.
If a symbolic link is encountered, nested_symlink() processes it and updates nd.
Repeats until the entire pathname is consumed; the final component is later verified by do_last().
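The string-handling part of this loop (stripping leading slashes, collapsing repeated slashes, and classifying each component) can be sketched in user space. The names toy_split_path, toy_component and the TOY_LAST_* constants are hypothetical; the sketch leaves out hashing, permission checks, mount points and symbolic links.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum toy_last_type { TOY_LAST_NORM, TOY_LAST_DOT, TOY_LAST_DOTDOT };

struct toy_component {
    const char *name;          /* points into the original pathname */
    size_t len;                /* component length, no terminator */
    enum toy_last_type type;
};

/* Split a pathname into components. Returns the number of components
 * stored in out, or -1 if the pathname has more than max components. */
int toy_split_path(const char *path, struct toy_component *out, int max)
{
    int n = 0;

    assert(path != NULL && out != NULL && max >= 0);

    while (*path == '/')               /* strip leading slashes */
        path++;

    while (*path != '\0') {
        size_t len = strcspn(path, "/");   /* length up to the next '/' */
        if (n == max)
            return -1;
        out[n].name = path;
        out[n].len = len;
        if (len == 1 && path[0] == '.')
            out[n].type = TOY_LAST_DOT;
        else if (len == 2 && path[0] == '.' && path[1] == '.')
            out[n].type = TOY_LAST_DOTDOT;
        else
            out[n].type = TOY_LAST_NORM;
        n++;
        path += len;
        while (*path == '/')           /* collapse consecutive slashes */
            path++;
    }
    return n;
}
```

For the input "//tmp/./a//../test" the sketch yields five components: tmp, ".", a, ".." and test. A pathname consisting only of slashes yields none, which corresponds to the case where last_type keeps its initial LAST_ROOT value.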
Summary
The Virtual File System (VFS) is an abstraction layer in the Linux kernel that lets different concrete file systems coexist and makes operations across file systems possible. It exposes a uniform set of operations (open, read, write, close, etc.) built on four core data structures: superblock, inode, dentry and file. The kernel performs type‑specific actions only once control is passed to a concrete file system, and this is what makes the Unix/Linux “everything is a file” philosophy workable.
政采云技术