Why ls Slows Down on Directories with Millions of Files and How to Speed It Up Using getdents

The article explains that the ls command becomes slow on directories containing millions of files because it repeatedly calls the getdents system call through readdir, and demonstrates how directly using getdents with a larger buffer can reduce the listing time from several seconds to under one second.

Tencent Database Technology

When a directory holds only a few entries, ls finishes instantly. On a directory containing one million small files, however, time ls -l | wc -l takes about 5.8 seconds, which prompted an investigation into the cause of the slowdown.

Running ls under strace shows that it invokes the getdents system call over and over. The GNU coreutils source shows that ls opens a directory with opendir, then calls readdir in a loop to read each dirent entry. readdir is backed by a buffer that glibc allocates during opendir via __alloc_dir. The default allocation is 32 KB (4*BUFSIZ, where BUFSIZ is 8192). Whenever the buffer is exhausted, readdir refills it by calling getdents, which returns at most one buffer's worth of data (32768 bytes) per call.

static void
print_dir (const char *name, const char *realname)
{
    register DIR *dirp;
    register struct dirent *next;
    ...
    dirp = opendir(name);
    while ((next = readdir(dirp)) != NULL) {
        ...
    }
}

The struct dirent contains fields such as d_ino, d_off, d_reclen, d_type, and d_name. Because the buffer is fixed at 32 KB, each getdents call can return only a bounded number of entries, so the number of calls, and with it the running time of ls, grows linearly with the file count.
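To see roughly how many calls the 32 KB buffer implies, the arithmetic can be sketched in C. This is an estimate under assumptions that do not come from the measurements above: the record layout of the getdents64 variant (a 19-byte header, then the name and its terminating NUL, padded to a multiple of 8 bytes), 10-character file names, and a kernel that fills the buffer on every call. The function names are hypothetical.

```c
#include <stddef.h>

/* Assumed record layout: 19-byte header (d_ino 8, d_off 8, d_reclen 2,
 * d_type 1), then the name and its terminating NUL, padded to a multiple of 8. */
size_t record_size(size_t name_len) {
    size_t raw = 19 + name_len + 1;
    return (raw + 7) / 8 * 8;
}

/* Number of data-returning getdents calls needed for `entries` records whose
 * names are `name_len` bytes long, assuming every call fills the buffer.
 * The final call that returns 0 to signal end-of-directory is not counted. */
size_t calls_needed(size_t entries, size_t name_len, size_t buf_size) {
    size_t per_call = buf_size / record_size(name_len);
    if (per_call == 0)
        return 0;  /* buffer too small to hold even one record */
    return (entries + per_call - 1) / per_call;  /* round up */
}
```

Under these assumptions, calls_needed(1000000, 10, 32768) evaluates to 977, while a 5 MB buffer (5242880 bytes) brings the figure down to 7.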

Because glibc offers no way to change this buffer size, the fix is to bypass readdir and invoke getdents directly, which lets the programmer choose a larger buffer. The following example, listdir.c, sets a 5 MB buffer and uses the syscall(SYS_getdents, ...) interface to list directory entries.

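#define _GNU_SOURCE
#include <dirent.h>       /* DT_* constants */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/syscall.h>
#include <unistd.h>

#define BUF_SIZE (1024*1024*5)

/* Record layout filled in by getdents; glibc does not declare it. */
struct linux_dirent {
    unsigned long  d_ino;     /* inode number */
    unsigned long  d_off;     /* offset to the next record */
    unsigned short d_reclen;  /* length of this record */
    char           d_name[];  /* null-terminated file name */
};

int main(int argc, char *argv[]) {
    static char buf[BUF_SIZE];  /* static, so the 5 MB buffer is not on the stack */
    long nread, bpos;
    struct linux_dirent *d;
    char d_type;
    int fd = open(argc>1?argv[1]:".", O_RDONLY|O_DIRECTORY);
    if (fd == -1) {
        perror("open");
        return EXIT_FAILURE;
    }
    while ((nread = syscall(SYS_getdents, fd, buf, BUF_SIZE)) > 0) {
        for (bpos = 0; bpos < nread; ) {
            d = (struct linux_dirent *)(buf + bpos);
            /* The file type is stored in the last byte of each record. */
            d_type = *(buf + bpos + d->d_reclen - 1);
            printf("%8lu %s %4d %lld %s\n", d->d_ino,
                   (d_type==DT_REG)?"regular":(d_type==DT_DIR)?"directory": "???",
                   d->d_reclen, (long long)d->d_off, d->d_name);
            bpos += d->d_reclen;
        }
    }
    if (nread == -1) {
        perror("getdents");
        return EXIT_FAILURE;
    }
    return 0;
}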
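The same technique can be sketched with the newer getdents64 call, which stores d_type as an explicit field and is available on architectures where the legacy getdents call is not. The version below is a minimal sketch, not the program measured in this article: the helper name count_entries and the choice to take the buffer size as a parameter are assumptions made for illustration. It allocates the buffer on the heap so that its size can be varied when experimenting.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Record layout used by getdents64; declared here because the libc
 * headers do not export it. */
struct linux_dirent64 {
    uint64_t       d_ino;     /* inode number */
    int64_t        d_off;     /* offset to the next record */
    unsigned short d_reclen;  /* length of this record */
    unsigned char  d_type;    /* file type (DT_* constants) */
    char           d_name[];  /* null-terminated file name */
};

/* Hypothetical helper: counts the entries in `path`, including "." and "..",
 * reading them through a heap buffer of `buf_size` bytes.
 * Returns the count, or -1 on error. */
long count_entries(const char *path, size_t buf_size) {
    int fd = open(path, O_RDONLY | O_DIRECTORY);
    if (fd == -1)
        return -1;

    char *buf = malloc(buf_size);
    if (buf == NULL) {
        close(fd);
        return -1;
    }

    long count = 0, nread;
    while ((nread = syscall(SYS_getdents64, fd, buf, buf_size)) > 0) {
        /* Walk the variable-length records packed into the buffer. */
        for (long bpos = 0; bpos < nread; ) {
            struct linux_dirent64 *d = (struct linux_dirent64 *)(buf + bpos);
            count++;
            bpos += d->d_reclen;
        }
    }

    free(buf);
    close(fd);
    return nread == -1 ? -1 : count;
}
```

Calling count_entries(dir, 32768) and count_entries(dir, 5 * 1024 * 1024) on the same directory from a small main function is one way to compare buffer sizes while the rest of the work stays identical.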

Compiling and running this program on the same one‑million‑file directory reduces the time to list all entries from 5.8 seconds to about 0.75 seconds, demonstrating a significant performance gain.

The same principle applies to other commands that rely on readdir, such as rm -r. When dealing with directories containing massive numbers of files, calling getdents directly with a suitably sized buffer can provide noticeable speed improvements.
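As an illustration of how this could carry over to deletion, the sketch below removes the non-directory entries of a single directory by reading names in large batches with getdents64 and unlinking each one with unlinkat. It is a simplified assumption of what such a tool might look like, not how rm is implemented: the helper name remove_files is hypothetical, subdirectories are skipped rather than recursed into, and failed unlinks are silently ignored.

```c
#define _GNU_SOURCE
#include <dirent.h>       /* DT_DIR */
#include <fcntl.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Record layout used by getdents64; declared here because the libc
 * headers do not export it. */
struct linux_dirent64 {
    uint64_t       d_ino;
    int64_t        d_off;
    unsigned short d_reclen;
    unsigned char  d_type;
    char           d_name[];
};

/* Hypothetical helper: unlinks every non-directory entry directly inside
 * `path`, reading names through a heap buffer of `buf_size` bytes.
 * Returns the number of entries removed, or -1 on error. */
long remove_files(const char *path, size_t buf_size) {
    int fd = open(path, O_RDONLY | O_DIRECTORY);
    if (fd == -1)
        return -1;

    char *buf = malloc(buf_size);
    if (buf == NULL) {
        close(fd);
        return -1;
    }

    long removed = 0, nread;
    while ((nread = syscall(SYS_getdents64, fd, buf, buf_size)) > 0) {
        for (long bpos = 0; bpos < nread; ) {
            struct linux_dirent64 *d = (struct linux_dirent64 *)(buf + bpos);
            bpos += d->d_reclen;
            if (strcmp(d->d_name, ".") == 0 || strcmp(d->d_name, "..") == 0)
                continue;
            if (d->d_type == DT_DIR)
                continue;  /* this sketch does not recurse */
            /* Unlink relative to the open directory, avoiding path building. */
            if (unlinkat(fd, d->d_name, 0) == 0)
                removed++;
        }
    }

    free(buf);
    close(fd);
    return nread == -1 ? -1 : removed;
}
```

A real replacement for rm -r would also need to descend into subdirectories, report errors, and decide how to treat entries that appear while the scan is running.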
