
Understanding Linux Interprocess Shared Memory: memfd_create, mmap, and Unix Domain Socket Transfer

This article explains how Linux processes can share large data efficiently by creating a memory file with memfd_create, mapping it with mmap, and transferring the file descriptor over a Unix Domain Socket, while detailing the kernel mechanisms that enable true cross‑process memory sharing.


Linux isolates each process's virtual address space, but when large data needs to be exchanged, using network I/O incurs costly memory copies. Shared memory avoids this by allowing processes to map the same physical pages.

1. Using shared memory

The sender creates a memory file with memfd_create, maps it with mmap (using MAP_SHARED), writes data, and then sends the file descriptor to the receiver via a Unix Domain Socket.

int main(int argc, char **argv) {
    // Create an in-memory file
    fd = memfd_create("Server memfd", ...);

    // Map the memory file with a MAP_SHARED mapping
    shm = mmap(NULL, shm_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);

    // Write data into the shared memory
    sprintf(shm, "This content lives in shared memory; both the sender and the receiver can reach it through their own fd");

    // Send the shared-memory file descriptor to the receiver
    // (control-message buffer setup elided)
    struct msghdr msgh;
    *((int *) CMSG_DATA(CMSG_FIRSTHDR(&msgh))) = fd;
    sendmsg(conn, &msgh, 0);
    ......
}

The receiver connects to the sender, receives the file descriptor, maps the same memory, and reads the data.

int main(int argc, char **argv) {
    // Connect to the sender over a Unix Domain Socket
    connect(conn, (struct sockaddr *)&address, sizeof(struct sockaddr_un));

    // Pull out the memory-file descriptor the sender passed over the connection
    int size = recvmsg(conn, &msgh, 0);
    fd = *((int *) CMSG_DATA(cmsgh));

    // Read the content of the shared file
    shm = mmap(NULL, shm_size, PROT_READ, MAP_PRIVATE, fd, 0);
    printf("Content of the shared memory: %s\n", shm);
    ......
}

2. Internal workings of the shared‑memory file

memfd_create is a system call that obtains an unused file descriptor and creates an in‑memory file via shmem_file_setup:

// file:mm/memfd.c
SYSCALL_DEFINE2(memfd_create,
  const char __user *, uname,
  unsigned int, flags)
{
    // Allocate an unused file descriptor
    fd = get_unused_fd_flags((flags & MFD_CLOEXEC) ? O_CLOEXEC : 0);

    // Create a shared-memory file
    file = shmem_file_setup(name, 0, VM_NORESERVE);

    fd_install(fd, file);
    return fd;
}

The helper shmem_file_setup eventually calls __shmem_file_setup, which allocates an inode and creates a pseudo file that lives only in memory.

// file:mm/shmem.c
static struct file *__shmem_file_setup(struct vfsmount *mnt, const char *name, ...)
{
    // Allocate an inode
    inode = shmem_get_inode(mnt->mnt_sb, NULL, S_IFREG | S_IRWXUGO, 0, flags);
    inode->i_flags |= i_flags;
    inode->i_size = size;

    // Create a pseudo file
    res = alloc_file_pseudo(inode, mnt, name, O_RDWR, &shmem_file_operations);
    return res;
}

3. mmap and the VM_SHARED flag

The mmap system call ultimately reaches do_mmap, where the MAP_SHARED flag adds VM_SHARED to the VMA flags, allowing the same physical pages to be mapped into multiple processes.

// file:mm/mmap.c
unsigned long do_mmap(struct file *file, unsigned long addr,
    unsigned long len, unsigned long prot,
    unsigned long flags, vm_flags_t vm_flags,
    unsigned long pgoff, unsigned long *populate,
    struct list_head *uf)
{
    // If MAP_SHARED is set, mark the requested virtual memory with VM_SHARED
    switch (flags & MAP_TYPE) {
        case MAP_SHARED:
        case MAP_SHARED_VALIDATE:
            vm_flags |= VM_SHARED | VM_MAYSHARE;
            ...
    }
    addr = mmap_region(file, addr, len, vm_flags, pgoff, uf);
    ...
}

The mmap_region function creates a VMA, sets its start/end, flags, and links it into the process's memory map.

// file:mm/mmap.c
unsigned long mmap_region(struct file *file, ...)
{
    struct mm_struct *mm = current->mm;
    // Allocate a virtual memory area (vma)
    vma = vm_area_alloc(mm);
    vma->vm_start = addr;
    vma->vm_end = addr + len;
    vma->vm_flags = vm_flags;
    vma->vm_page_prot = vm_get_page_prot(vm_flags);
    vma->vm_pgoff = pgoff;
    // Link it into the process's VMA list
    vma_link(mm, vma, prev, rb_link, rb_parent);
}

Because the VMA carries VM_SHARED, page faults for this region can resolve to the same physical page in different processes, achieving true sharing.

4. Sending the file descriptor

The sender uses sendmsg to embed the fd in a control message; the kernel path traverses several layers before reaching unix_stream_sendmsg, which packages the fd into an scm_cookie and places the resulting skb into the receiver's queue.

// file:net/socket.c
SYSCALL_DEFINE3(sendmsg, int, fd, struct user_msghdr __user *, msg, unsigned int, flags)
{
    return __sys_sendmsg(fd, msg, flags, true);
}
// file:net/unix/af_unix.c
static int unix_stream_sendmsg(struct socket *sock, struct msghdr *msg, ...)
{
    struct scm_cookie scm;
    scm_send(sock, msg, &scm, false);
    while (sent < len) {
        skb = sock_alloc_send_pskb(sk, size - data_len, data_len,
                msg->msg_flags & MSG_DONTWAIT, &err, get_order(UNIX_SKB_FRAGS_SZ));
        err = unix_scm_to_skb(&scm, skb, !fds_sent);
        err = skb_copy_datagram_from_iter(skb, 0, &msg->msg_iter, size);
        skb_queue_tail(&other->sk_receive_queue, skb);
        other->sk_data_ready(other);
        sent += size;
        ...
    }
}

During scm_send, the kernel resolves each fd to its underlying struct file and stores the pointer in the control message.

// file:net/core/scm.c
static int scm_fp_copy(struct cmsghdr *cmsg, struct scm_fp_list **fplp)
{
    for (i = 0; i < num; i++) {
        int fd = fdp[i];
        struct file *file;
        if (fd < 0 || !(file = fget_raw(fd)))
            return -EBADF;
        *fpp++ = file;
        fpl->count++;
    }
}

5. Receiving the file descriptor

The receiver calls recvmsg, which eventually invokes unix_stream_read_generic to dequeue the skb and then scm_recv to extract the embedded fd.

// file:net/unix/af_unix.c
static int unix_stream_read_generic(struct unix_stream_read_state *state, bool freezable)
{
    do {
        last = skb = skb_peek(&sk->sk_receive_queue);
        ...
    } while (...);
    if (state->msg)
        scm_recv(sock, state->msg, &scm, flags);
    return copied ? : err;
}
// file:net/core/scm.c
void scm_detach_fds(struct msghdr *msg, struct scm_cookie *scm)
{
    for (i = 0; i < fdmax; i++) {
        err = receive_fd_user(scm->fp->fp[i], cmsg_data + i, o_flags);
        if (err < 0)
            break;
    }
    ...
}

After extracting the struct file pointer, the kernel creates a new fd in the receiving process with fd_install , allowing the process to mmap the same memory region.

// file:fs/file.c
int __receive_fd(struct file *file, int __user *ufd, unsigned int o_flags)
{
    // Allocate a new file descriptor
    new_fd = get_unused_fd_flags(o_flags);
    ...
    // Associate it with the file
    fd_install(new_fd, get_file(file));
    return new_fd;
}

6. Summary

The complete workflow consists of three steps: creating a memory file with memfd_create, mapping it with mmap using MAP_SHARED, and transferring the file descriptor over a Unix Domain Socket via sendmsg/recvmsg. The kernel achieves sharing by keeping a single struct file object and marking each VMA with VM_SHARED, which maps the same physical pages into every participating process.

Tags: kernel, Linux, mmap, shared memory, IPC, Unix Domain Socket, memfd_create
Written by Refining Core Development Skills

Fei has over 10 years of development experience at Tencent and Sogou. Through this account, he shares his deep insights on performance.