How DataLoader Solves the GraphQL N+1 Problem: Deep Dive into Batch & Cache Mechanics
This article explains the GraphQL N+1 performance issue, shows how DataLoader batches and caches database calls to eliminate redundant queries, and walks through its core TypeScript implementation, covering batch scheduling, cache handling, and the load methods, with practical examples and integration tips for real-world GraphQL servers.
Preface
Before reading this article, make sure you have a basic understanding of GraphQL, such as schema syntax, resolvers, and object types. If you are not familiar with the GraphQL N+1 problem, the introductory section will explain its cause before presenting the solution and a walkthrough of DataLoader source code.
As an alternative to RESTful APIs, GraphQL offers strong typing, versionless schema evolution, on-demand field selection, and concise nested queries, but it also has drawbacks like the N+1 problem and the need for dedicated caching strategies (e.g., Apollo Client's normalized cache). This article focuses solely on the N+1 issue.
How the N+1 Problem Arises
Assume we need to fetch all users and their pets. With REST we would first request the full user list, then request each pet by its ID:
GET /users
[
{
"id": 1,
"name": "aaa",
"petsId": [1, 2]
},
{
"id": 2,
"name": "bbb",
"petsId": [2, 3, 4]
}
]
GET /pet/:id
[
{
"id": 1,
"kind": "Cat"
},
{
"id": 2,
"kind": "Dog"
}
// ...
]
Don't ask why a pet ID appears under multiple users; pets might be shared in the future.
If there are N users, we need N+1 API calls and N+1 database I/O operations.
If the /users endpoint does not include the petsId field, we would need an additional N calls to /user/:id , resulting in 2N+1 requests.
In GraphQL the query looks like this:
query {
fetchAllUsers {
id
name
pets {
id
kind
age
isMale
}
}
}
It appears that only one request is made, but the database I/O still occurs N+1 times because each user's pets resolver runs separately.
For 100 users, the fetchAllUsers resolver returns an array of 100 users, then each user’s pets resolver is invoked, resulting in 100 additional calls – the classic GraphQL N+1 problem.
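The pattern can be reproduced with a small sketch. The mock tables and the query counter below are hypothetical stand-ins for a real database, not part of the article's repository:

```typescript
type Pet = { id: number; kind: string };
type User = { id: number; name: string; petsId: number[] };

const userTable: User[] = [
  { id: 1, name: "aaa", petsId: [1, 2] },
  { id: 2, name: "bbb", petsId: [2, 3] },
];
const petTable: Pet[] = [
  { id: 1, kind: "Cat" },
  { id: 2, kind: "Dog" },
  { id: 3, kind: "Bird" },
];

let queryCount = 0;

// One query for the user list...
function fetchAllUsers(): User[] {
  queryCount++;
  return userTable;
}

// ...and one more query per user, issued by that user's `pets` resolver.
function fetchPetsByIds(ids: number[]): Pet[] {
  queryCount++;
  return petTable.filter((p) => ids.includes(p.id));
}

// Simulates GraphQL running the `pets` resolver once per parent user.
const allUsers = fetchAllUsers();
const resolved = allUsers.map((u) => ({ ...u, pets: fetchPetsByIds(u.petsId) }));

console.log(queryCount); // 1 (users) + 2 (one per user) = 3
```

With N users this grows linearly: N+1 queries for a single logical request.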
GraphQL excels at exactly this kind of nested data retrieval: a query like "all users' high-intimacy male friends' likes in the last half-year" is a single nested query in GraphQL but extremely cumbersome with REST.
Using a Backend‑For‑Frontend (BFF) layer merely moves the data‑shaping logic elsewhere; GraphQL can serve as the BFF, especially when multiple front‑ends consume the same data.
Practical Example
All code is hosted on GitHub in the DataLoader-Source-Explore repository. The example reproduces the N+1 problem and shows how to use DataLoader. The server is built with Apollo‑Server and local mock data. For a full‑featured server, consider TypeGraphQL and the GraphQL‑Explorer‑Server project.
Note: Apollo Server v3 introduces breaking changes; do not upgrade when following the example.
The GraphQL schema for User and Pet is:
type Query {
fetchAllUsers: [User]
fetchUserByName(name: String!): User
}
type User {
id: Int!
name: String!
partner: User
pets: [Pet]
}
type Pet {
id: Int!
kind: String!
age: Int!
isMale: Boolean!
}
DataLoader Source Code Walkthrough
DataLoader is widely used in production GraphQL servers. Its core features are Batch and Cache . The constructor signature is:
class DataLoader<K, V, C = K> {
constructor(batchLoadFn: BatchLoadFn<K, V>, options?: Options<K, V, C>) {
// ...
}
// ...
}
batchLoadFn is a function that receives an array of keys and returns a Promise of a matching array of values. DataLoader collects keys during a single tick, then calls this function once with all of them.
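The batchLoadFn contract is strict: the returned array must have the same length as the keys array, with values[i] corresponding to keys[i]. A sketch, using a hypothetical in-memory pet table in place of a real database:

```typescript
type BatchLoadFn<K, V> = (keys: ReadonlyArray<K>) => Promise<ReadonlyArray<V>>;

type Pet = { id: number; kind: string };
const petTable: Pet[] = [
  { id: 1, kind: "Cat" },
  { id: 2, kind: "Dog" },
];

// Hypothetical batch function: one bulk lookup (think WHERE id IN (...))
// instead of one query per id.
const batchPets: BatchLoadFn<number, Pet | Error> = async (ids) => {
  const rows = petTable.filter((p) => ids.includes(p.id));
  // Re-order rows to match the incoming keys; a missing row becomes an
  // Error value, which DataLoader will turn into a rejection for that key.
  return ids.map(
    (id) => rows.find((p) => p.id === id) ?? new Error(`Pet ${id} not found`)
  );
};

batchPets([2, 1, 99]).then((values) => console.log(values));
```

The re-ordering step is the part most often forgotten in practice: a SQL `IN` query returns rows in arbitrary order, so the batch function must map them back onto the key order itself.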
The internal properties are:
class DataLoader {
_batchLoadFn: BatchLoadFn<K, V>;
_batchScheduleFn: (fn: () => void) => void;
_maxBatchSize: number;
_cacheKeyFn: (key: K) => C;
_cacheMap: CacheMap<C, Promise<V>> | null;
_batch: Batch<K, V> | null;
}
Key points:
_batchLoadFn stores the user‑provided batch function.
_batchScheduleFn decides when a batch is dispatched. In Node it uses process.nextTick; in browsers it falls back to setImmediate or setTimeout.
_maxBatchSize defaults to Infinity; setting batch: false forces a batch size of 1.
_cacheKeyFn converts a key to a cache identifier (defaults to identity).
_cacheMap holds cached promises keyed by cache key; by default it is a native Map, and passing cache: false sets it to null.
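The scheduling behavior these properties enable can be illustrated with a stripped-down loader. The TinyLoader below is a hypothetical teaching sketch, not the real implementation: it batches on process.nextTick but omits the cache, error handling, and maxBatchSize:

```typescript
class TinyLoader<K, V> {
  private keys: K[] = [];
  private callbacks: Array<(v: V) => void> = [];
  private scheduled = false;

  constructor(private batchLoadFn: (keys: K[]) => Promise<V[]>) {}

  load(key: K): Promise<V> {
    this.keys.push(key);
    // Schedule a single dispatch for all keys collected this tick.
    if (!this.scheduled) {
      this.scheduled = true;
      process.nextTick(() => this.dispatch());
    }
    return new Promise((resolve) => this.callbacks.push(resolve));
  }

  private async dispatch() {
    const keys = this.keys;
    const callbacks = this.callbacks;
    this.keys = [];
    this.callbacks = [];
    this.scheduled = false;
    const values = await this.batchLoadFn(keys);
    values.forEach((v, i) => callbacks[i](v));
  }
}

let batchCalls = 0;
const loader = new TinyLoader<number, string>(async (ids) => {
  batchCalls++;
  return ids.map((id) => `pet-${id}`);
});

// Both loads happen in the same tick, so batchLoadFn runs exactly once.
Promise.all([loader.load(1), loader.load(2)]).then((vals) => {
  console.log(vals, batchCalls); // ["pet-1", "pet-2"], 1
});
```

The real DataLoader adds on top of this exactly what the properties above describe: a pluggable scheduler, a size limit that splits oversized batches, and the promise cache that dedupes repeated keys.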
The batch object looks like:
type Batch<K, V> = {
hasDispatched: boolean;
keys: Array<K>;
callbacks: Array<{ resolve: (value: V) => void; reject: (error: Error) => void }>;
cacheHits?: Array<() => void>;
};
When load(key) is called, DataLoader first checks the cache. If a promise for that key already exists, it registers a cacheHit callback that resolves with the cached promise; otherwise it pushes the key into the current batch together with the new promise's resolve/reject callbacks.
load(key: K): Promise<V> {
let batch = getCurrentBatch(this);
let cacheMap = this._cacheMap;
let cacheKey = this._cacheKeyFn(key);
if (cacheMap) {
let cachedPromise = cacheMap.get(cacheKey);
if (cachedPromise) {
let cacheHits = batch.cacheHits || (batch.cacheHits = []);
return new Promise(resolve => {
cacheHits.push(() => resolve(cachedPromise as V | PromiseLike<V>));
});
}
}
batch.keys.push(key);
const promise = new Promise<V>((resolve, reject) => {
batch.callbacks.push({ resolve, reject });
});
if (cacheMap) {
cacheMap.set(cacheKey, promise);
}
return promise;
}
Dispatching a batch is performed by dispatchBatch after the scheduled tick:
function dispatchBatch<K, V>(loader: DataLoader<K, V, any>, batch: Batch<K, V>) {
batch.hasDispatched = true;
if (batch.keys.length === 0) { resolveCacheHits(batch); return; }
let batchPromise = loader._batchLoadFn(batch.keys);
batchPromise
.then(values => {
resolveCacheHits(batch);
for (let i = 0; i < batch.callbacks.length; i++) {
let value = values[i];
if (value instanceof Error) {
batch.callbacks[i].reject(value);
} else {
batch.callbacks[i].resolve(value);
}
}
})
.catch(error => { failedDispatch(loader, batch, error); });
}
function resolveCacheHits(batch: Batch<any, any>) {
if (batch.cacheHits) {
for (let i = 0; i < batch.cacheHits.length; i++) {
batch.cacheHits[i]();
}
}
}
The loadMany(keys) method simply calls load for each key and aggregates the results with Promise.all, catching each rejection and returning the error as a value so that a single failed key does not reject the whole result.
loadMany(keys: ReadonlyArray<K>): Promise<Array<V | Error>> {
const loadPromises: Promise<any>[] = [];
for (let i = 0; i < keys.length; i++) {
loadPromises.push(this.load(keys[i]).catch(error => error));
}
return Promise.all(loadPromises);
}
Additional DataLoader methods include clear(key), clearAll(), and prime(key, value) for manual cache manipulation.
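The semantics of these methods follow directly from the cache being a Map of key to Promise. A sketch of the same semantics against a plain Map; the prime/clear functions here are illustrative stand-ins, not the library API itself:

```typescript
// Same shape as DataLoader's _cacheMap: key -> Promise<value>.
const cache = new Map<number, Promise<string>>();

// prime(key, value): seed the cache so a later load() never hits the database.
// Like DataLoader's prime, it does NOT overwrite an existing entry.
function prime(key: number, value: string) {
  if (!cache.has(key)) cache.set(key, Promise.resolve(value));
}

// clear(key): drop one entry, e.g. after a mutation invalidates it.
function clear(key: number) {
  cache.delete(key);
}

// clearAll(): drop everything, e.g. at the end of a request.
function clearAll() {
  cache.clear();
}

prime(1, "Cat");
prime(1, "Dog"); // ignored: an entry for key 1 already exists
clear(2);        // clearing a missing key is a no-op
console.log(cache.size); // 1
```

Because the cache is per-loader-instance, the usual pattern is to create a fresh DataLoader per request, which makes clearAll rarely necessary in practice.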
Integration Options
NestJS‑DataLoader : Provide a batch function and inject the loader via a parameter decorator.
TypeGraphQL‑DataLoader : Automatically derives batch functions from TypeORM relations, but also allows manual registration.
Hasura / PostGraphile : Offer GraphQL‑as‑a‑service, eliminating the need to write custom resolvers or DataLoader code.
Outlook
The repository also contains a stripped‑down 100‑line version of DataLoader that always uses process.nextTick for scheduling and creates a batch as soon as the first key is added.
Prisma 2 includes its own DataLoader implementation (see runtime/DataLoader.ts), reflecting the close relationship between Prisma and GraphQL.
Conclusion
The essential idea is to enqueue a post-promise job (enqueuePostPromiseJob) that collects single-key requests made during one tick and dispatches them as a single bulk database call, dramatically reducing N+1 queries. Note that DataLoader only pays off when enough keys are requested together; for small datasets the batching overhead may outweigh the benefit.
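The Node-side scheduling can be sketched as follows, paraphrased from DataLoader's default scheduler (browser environments fall back to setImmediate or setTimeout). The point of the Promise.resolve().then + process.nextTick combination is to defer the dispatch past all currently pending promise callbacks, so every load() chained off the same resolved promise still joins the batch:

```typescript
let resolvedPromise: Promise<void> | null = null;

function enqueuePostPromiseJob(fn: () => void): void {
  if (!resolvedPromise) {
    resolvedPromise = Promise.resolve();
  }
  // .then() defers past the pending promise callbacks in the microtask queue;
  // process.nextTick then runs fn before any timers or I/O.
  resolvedPromise.then(() => process.nextTick(fn));
}

const order: string[] = [];
// A pending .then() standing in for a resolver that calls load().
Promise.resolve().then(() => order.push("a .then() that calls load()"));
enqueuePostPromiseJob(() => {
  order.push("batch dispatch");
  console.log(order); // the .then() entry first, then "batch dispatch"
});
```

A plain process.nextTick alone would not be enough: it can fire before promise callbacks queued later in the same tick, splitting keys across batches.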
Code examples in this article use mock data and do not perform real database I/O.
Taobao Frontend Technology
