How the Linux Kernel Determines File Access: Inside permission() and vfs_permission()
This article dissects the Linux kernel's permission-checking flow: how path_walk invokes permission(), how vfs_permission() evaluates inode mode bits against the requested access mask and applies capability overrides, and how prepare_binprm() sets up credentials and capabilities for executing a binary.
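In the 2.4-series kernel source, the path_walk function (through link_path_walk) calls permission(inode, MAY_EXEC) on each directory it passes through. This checks that the current process has search (execute) permission on that directory before the next path component is looked up.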
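To make the per-directory check concrete, here is a user-space sketch (not kernel code) that imitates it: it calls access(2) with X_OK on every directory prefix of an absolute path and stops at the first failure. The helper name check_search_along_path() and the 4096-byte buffer limit are hypothetical. One caveat: access() checks against the real UID/GID, whereas the kernel's path walk uses the filesystem UID (fsuid), so the two can disagree inside set-UID programs.

```c
#include <errno.h>
#include <string.h>
#include <unistd.h>

#define DEMO_PATH_BUF 4096  /* hypothetical limit for this sketch */

/*
 * Hypothetical helper: require search (X_OK) permission on every directory
 * prefix of `path`, the way the path walk checks MAY_EXEC on each directory
 * before looking up the next component. The final component is not checked,
 * because it is the lookup target rather than a directory being traversed.
 * Returns 0 on success or a negative errno from the first failing prefix.
 */
int check_search_along_path(const char *path)
{
    char prefix[DEMO_PATH_BUF];
    size_t len = strlen(path);

    if (len == 0 || len >= sizeof(prefix))
        return -EINVAL;

    for (size_t i = 0; i < len; i++) {
        if (path[i] != '/')
            continue;
        /* "/" for the root itself, otherwise the directory before this slash. */
        size_t n = (i == 0) ? 1 : i;
        memcpy(prefix, path, n);
        prefix[n] = '\0';
        /* access() uses the real UID/GID, unlike the kernel's fsuid-based check. */
        if (access(prefix, X_OK) != 0)
            return -errno;
    }
    return 0;
}
```

For example, check_search_along_path("/usr/bin/ls") checks "/", "/usr" and "/usr/bin" but not "ls" itself, mirroring the rule that only the directories being traversed need search permission.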
The permission() implementation:

```c
int permission(struct inode *inode, int mask) {
    if (inode->i_op && inode->i_op->permission) {
        int retval;
        lock_kernel();                   // filesystem hook runs under the big kernel lock
        retval = inode->i_op->permission(inode, mask);
        unlock_kernel();
        return retval;
    }
    return vfs_permission(inode, mask);  // no hook: generic mode-bit check
}
```

If the inode's i_op table provides no permission callback, as is the case for ext2_file_inode_operations, ext2_dir_inode_operations, ext2_fast_symlink_inode_operations, and page_symlink_inode_operations, permission() falls back to the generic vfs_permission().
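The dispatch itself is a plain function-pointer check. The user-space sketch below mirrors its shape with hypothetical demo_inode and demo_inode_ops types; demo_generic_permission() simply returns a canned result instead of examining mode bits, and the big kernel lock is left out.

```c
#include <errno.h>
#include <stddef.h>

#define MAY_EXEC  1
#define MAY_WRITE 2
#define MAY_READ  4

struct demo_inode;

/* Hypothetical, cut-down stand-in for struct inode_operations. */
struct demo_inode_ops {
    /* Optional per-filesystem hook; NULL means "use the generic check". */
    int (*permission)(struct demo_inode *inode, int mask);
};

/* Hypothetical, cut-down stand-in for struct inode. */
struct demo_inode {
    const struct demo_inode_ops *i_op;
    int generic_result;  /* what the generic check reports, for the demo */
};

/* Stand-in for vfs_permission(); a real one would inspect the mode bits. */
static int demo_generic_permission(struct demo_inode *inode, int mask)
{
    (void)mask;
    return inode->generic_result;
}

/* Same dispatch shape as permission(): hook first, generic fallback second. */
int demo_permission(struct demo_inode *inode, int mask)
{
    if (inode->i_op && inode->i_op->permission)
        return inode->i_op->permission(inode, mask);  /* 2.4 holds lock_kernel() here */
    return demo_generic_permission(inode, mask);
}

/* Example hook for a hypothetical filesystem that refuses every write. */
int demo_deny_writes(struct demo_inode *inode, int mask)
{
    (void)inode;
    return (mask & MAY_WRITE) ? -EACCES : 0;
}
```

When a hook is present, permission() never calls vfs_permission() itself, so a filesystem that also wants the generic mode-bit rules has to invoke them from its own callback.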
The vfs_permission() implementation:

```c
int vfs_permission(struct inode *inode, int mask) {
    int mode = inode->i_mode;

    // Nobody gets write access to a read-only filesystem.
    if ((mask & S_IWOTH) && IS_RDONLY(inode) &&
        (S_ISREG(mode) || S_ISDIR(mode) || S_ISLNK(mode)))
        return -EROFS;

    // Nobody gets write access to an immutable file.
    if ((mask & S_IWOTH) && IS_IMMUTABLE(inode))
        return -EACCES;

    // Select the owner or group triplet; otherwise the "others" bits are used.
    if (current->fsuid == inode->i_uid)
        mode >>= 6;  // owner bits
    else if (in_group_p(inode->i_gid))
        mode >>= 3;  // group bits

    // Grant if every requested bit is set, or if CAP_DAC_OVERRIDE bypasses the check.
    if (((mode & mask & S_IRWXO) == mask) || capable(CAP_DAC_OVERRIDE))
        return 0;

    // Read and search access: CAP_DAC_READ_SEARCH covers plain reads,
    // and read/search (but not write) on directories.
    if ((mask == S_IROTH) ||
        (S_ISDIR(inode->i_mode) && !(mask & ~(S_IROTH | S_IXOTH)))) {
        if (capable(CAP_DAC_READ_SEARCH))
            return 0;
    }
    return -EACCES;
}
```

The mask values are defined as:
```c
#define MAY_EXEC  1
#define MAY_WRITE 2
#define MAY_READ  4
```

File mode bits (inode->i_mode) encode permissions for the owner (user), the group, and others:
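```c
#define S_IRWXU 00700
#define S_IRUSR 00400
#define S_IWUSR 00200
#define S_IXUSR 00100
#define S_IRWXG 00070
#define S_IRGRP 00040
#define S_IWGRP 00020
#define S_IXGRP 00010
#define S_IRWXO 00007
#define S_IROTH 00004
#define S_IWOTH 00002
#define S_IXOTH 00001
```

The MAY_* values are identical to S_IXOTH, S_IWOTH, and S_IROTH. That is why vfs_permission() can test the request with S_IWOTH and S_IROTH, and why shifting i_mode right by 6 (owner) or 3 (group) moves the selected triplet into the low three bits, where (mode & mask & S_IRWXO) == mask holds exactly when every requested bit is granted. For example, for a file with mode 0640 the owner gets 0640 >> 6 = 06 (rw-), enough for MAY_READ | MAY_WRITE, while a group member gets (0640 >> 3) & 07 = 04 (r--), enough only for MAY_READ.

Three more bits carry special attributes (set-user-ID, set-group-ID, and the sticky bit):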
```c
#define S_ISUID 0004000
#define S_ISGID 0002000
#define S_ISVTX 0001000
```

The four bits above these, selected by the S_IFMT mask, encode the file type:
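```c
#define S_IFMT   00170000
#define S_IFSOCK 0140000
#define S_IFLNK  0120000
#define S_IFREG  0100000
#define S_IFBLK  0060000
#define S_IFDIR  0040000
#define S_IFCHR  0020000
#define S_IFIFO  0010000
```

The capability checks go through the inline function capable(), which tests the requested bit in current->cap_effective and, if it is set, marks the process with PF_SUPERPRIV to record that superuser privileges were used: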
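```c
static inline int capable(int cap) {
    if (cap_raised(current->cap_effective, cap)) {
        current->flags |= PF_SUPERPRIV;  // record that privileges were used
        return 1;
    }
    return 0;
}

#define cap_raised(c, flag) (cap_t(c) & CAP_TO_MASK(flag))
#define cap_t(x) (x)
#define CAP_TO_MASK(x) (1 << (x))
```

During execve(), prepare_binprm() computes the effective UID/GID and the capability sets for the new program from the inode's mode and its set-user-ID/set-group-ID bits, then reads the first BINPRM_BUF_SIZE bytes of the file into bprm->buf so that the binary-format handlers can recognize it: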
```c
int prepare_binprm(struct linux_binprm *bprm) {
    struct inode *inode = bprm->file->f_dentry->d_inode;
    int mode = inode->i_mode;

    if (!(mode & 0111))
        return -EACCES;  // nobody has execute permission
    if (bprm->file->f_op == NULL)
        return -EACCES;

    // Start from the caller's effective IDs.
    bprm->e_uid = current->euid;
    bprm->e_gid = current->egid;

    if (!IS_NOSUID(inode)) {  // set-ID bits are ignored on nosuid mounts
        if (mode & S_ISUID)
            bprm->e_uid = inode->i_uid;
        // Set-group-ID without group execute marks mandatory locking,
        // not a set-group-ID executable.
        if ((mode & (S_ISGID | S_IXGRP)) == (S_ISGID | S_IXGRP))
            bprm->e_gid = inode->i_gid;
    }

    cap_clear(bprm->cap_inheritable);
    cap_clear(bprm->cap_permitted);
    cap_clear(bprm->cap_effective);

    // Root compatibility: full inheritable and permitted sets when the new
    // effective UID or the caller's real UID is 0; full effective set only
    // when the new effective UID is 0.
    if (!issecure(SECURE_NOROOT)) {
        if (bprm->e_uid == 0 || current->uid == 0) {
            cap_set_full(bprm->cap_inheritable);
            cap_set_full(bprm->cap_permitted);
        }
        if (bprm->e_uid == 0)
            cap_set_full(bprm->cap_effective);
    }

    // Read the file header for the binary-format handlers.
    memset(bprm->buf, 0, BINPRM_BUF_SIZE);
    return kernel_read(bprm->file, 0, bprm->buf, BINPRM_BUF_SIZE);
}
```

The accompanying diagram visualizes how the permission bits map to user, group, and others, and how the special flag bits are used.
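The set-ID and capability rules in prepare_binprm() can be modeled in user space. In this sketch, demo_exec_in, demo_exec_out, and demo_prepare_ids() are hypothetical names; the fields stand in for the inode, current, IS_NOSUID() and issecure(SECURE_NOROOT), and each capability set is reduced to a "cleared" or "full" flag because that is the only choice this code path makes.

```c
#include <errno.h>
#include <stdbool.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical inputs: the executable's inode plus the caller's credentials. */
struct demo_exec_in {
    mode_t mode;           /* inode->i_mode */
    uid_t  file_uid;       /* inode->i_uid */
    gid_t  file_gid;       /* inode->i_gid */
    uid_t  ruid;           /* current->uid (real UID) */
    uid_t  euid;           /* current->euid */
    gid_t  egid;           /* current->egid */
    bool   nosuid;         /* IS_NOSUID(inode) */
    bool   secure_noroot;  /* issecure(SECURE_NOROOT) */
};

/* Hypothetical outputs: new IDs and a cleared/full flag per capability decision. */
struct demo_exec_out {
    uid_t e_uid;
    gid_t e_gid;
    bool  full_inh_perm;   /* cap_inheritable and cap_permitted set to full */
    bool  full_effective;  /* cap_effective set to full */
};

/* Models the set-ID and capability part of prepare_binprm(); returns 0 or -EACCES. */
int demo_prepare_ids(const struct demo_exec_in *in, struct demo_exec_out *out)
{
    if (!(in->mode & 0111))  /* no execute bit for anyone */
        return -EACCES;

    out->e_uid = in->euid;   /* default: keep the caller's effective IDs */
    out->e_gid = in->egid;

    if (!in->nosuid) {
        if (in->mode & S_ISUID)
            out->e_uid = in->file_uid;
        /* setgid without group execute means mandatory locking, not setgid exec */
        if ((in->mode & (S_ISGID | S_IXGRP)) == (S_ISGID | S_IXGRP))
            out->e_gid = in->file_gid;
    }

    out->full_inh_perm = false;  /* cap_clear() on all three sets */
    out->full_effective = false;
    if (!in->secure_noroot) {
        if (out->e_uid == 0 || in->ruid == 0)
            out->full_inh_perm = true;
        if (out->e_uid == 0)
            out->full_effective = true;
    }
    return 0;
}
```

For instance, a mode 04755 file owned by root yields e_uid 0 with full effective capabilities, while mode 02745 (set-group-ID without group execute) leaves e_gid unchanged, since the kernel treats that combination as a mandatory-locking marker.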
Another illustration demonstrates the relationship between capability checks and permission evaluation.
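To see how the capability overrides interact with the mode-bit check, here is a self-contained user-space model of vfs_permission() and capable(). The demo_cred and demo_file types and the demo_* functions are hypothetical; group membership is reduced to a single fsgid comparison instead of in_group_p(), and the read-only-filesystem and immutable conditions are plain flags. The capability numbers 1 and 2 match CAP_DAC_OVERRIDE and CAP_DAC_READ_SEARCH in <linux/capability.h>.

```c
#define _XOPEN_SOURCE 700  /* expose S_IFREG/S_IFDIR in <sys/stat.h> */
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <sys/stat.h>
#include <sys/types.h>

#define MAY_EXEC  1
#define MAY_WRITE 2
#define MAY_READ  4

/* Capability numbers as defined in <linux/capability.h>. */
#define CAP_DAC_OVERRIDE    1
#define CAP_DAC_READ_SEARCH 2
#define CAP_TO_MASK(x)      (1u << (x))

/* Hypothetical stand-in for the credential fields vfs_permission() reads. */
struct demo_cred {
    uid_t fsuid;
    gid_t fsgid;             /* single group; the kernel uses in_group_p() */
    uint32_t cap_effective;  /* bitmask of CAP_TO_MASK() values */
};

/* Hypothetical stand-in for the inode fields and flags that matter here. */
struct demo_file {
    mode_t i_mode;
    uid_t i_uid;
    gid_t i_gid;
    bool rdonly_fs;  /* IS_RDONLY() */
    bool immutable;  /* IS_IMMUTABLE() */
};

/* Like capable(), minus the PF_SUPERPRIV bookkeeping. */
static bool demo_capable(const struct demo_cred *cred, int cap)
{
    return (cred->cap_effective & CAP_TO_MASK(cap)) != 0;
}

int demo_vfs_permission(const struct demo_file *f, const struct demo_cred *cred, int mask)
{
    mode_t mode = f->i_mode;

    if ((mask & MAY_WRITE) && f->rdonly_fs &&
        (S_ISREG(mode) || S_ISDIR(mode) || S_ISLNK(mode)))
        return -EROFS;
    if ((mask & MAY_WRITE) && f->immutable)
        return -EACCES;

    if (cred->fsuid == f->i_uid)
        mode >>= 6;  /* owner triplet */
    else if (cred->fsgid == f->i_gid)
        mode >>= 3;  /* group triplet */

    /* MAY_* equal the S_I*OTH bits, so the low triplet compares directly. */
    if ((mode & (mode_t)mask & S_IRWXO) == (mode_t)mask ||
        demo_capable(cred, CAP_DAC_OVERRIDE))
        return 0;

    /* CAP_DAC_READ_SEARCH: plain reads, or read/search on directories. */
    if (mask == MAY_READ ||
        (S_ISDIR(f->i_mode) && !(mask & ~(MAY_READ | MAY_EXEC)))) {
        if (demo_capable(cred, CAP_DAC_READ_SEARCH))
            return 0;
    }
    return -EACCES;
}
```

With a 0640 regular file, the owner may read and write, a group member may only read, and an outsider is refused. A process holding only CAP_DAC_READ_SEARCH may read but not write, and CAP_DAC_OVERRIDE grants the write, unless the filesystem is read-only, in which case -EROFS is returned before any capability is consulted.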
MaGe Linux Operations
