
Implementing a Debugging and Diagnostic Platform for Node.js Processes and Threads

This article explains how to build a non‑intrusive debugging and diagnostic platform for Node.js, covering process and thread inspection using the V8 Inspector API, dynamic control via an SDK, multi‑process and multi‑thread handling with Agent processes, and practical usage steps.

ByteDance Web Infra

1. Background

With the rapid growth of front-end development, Node.js is increasingly used in business scenarios, which makes service stability a critical concern. Traditional server architectures rely on multiple processes or threads and isolate tasks so that a failure in one does not affect the others. Node.js, however, runs JavaScript on a single main thread: one blocking operation, such as an infinite loop, can halt the entire service. Convenient debugging and diagnostic tools are therefore essential for locating and fixing problems quickly and for identifying performance bottlenecks.

2. Goal

Building on the debugging and diagnostic capabilities that Node.js itself provides, we aim to offer a platform that users adopt simply by including an SDK. They can then debug and diagnose service processes and threads through the platform.

3. Implementation

Both multi‑process and multi‑thread debugging and diagnostics are currently supported. The following sections describe the principles and concrete implementations for each.

3.1 Single Process

3.1.1 Debugging and Diagnosis Basics

In Node.js, process data can be collected via the inspector module:

const inspector = require('inspector');
const session = new inspector.Session();
session.connect();
// Send command
session.post('Profiler.enable', () => {});

By creating a session that communicates with the V8 Inspector, we can capture heap snapshots, CPU profiles, and similar data. This capability can be wrapped into an SDK, for example behind an HTTP endpoint:

const http = require('http');
const inspector = require('inspector');
const fs = require('fs');
const session = new inspector.Session();
session.connect();
function getCpuprofile(req, res) {
  // Enable and start CPU profiling
  session.post('Profiler.enable', () => {
    session.post('Profiler.start', () => {
      setTimeout(() => {
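        session.post('Profiler.stop', (err, result) => {
          // result is undefined when err is set, so it must not be destructured in the signature
          if (!err && result && result.profile) {
            fs.writeFileSync('./profile.cpuprofile', JSON.stringify(result.profile));
          }
          res.end('ok');
        });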
      }, 3000);
    });
  });
}
http.createServer((req, res) => {
  if (req.url == '/debug/getCpuprofile') {
    getCpuprofile(req, res);
  } else {
    res.end('ok');
  }
}).listen(80);
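The nested callbacks above can be flattened with a small promise wrapper. The following is a minimal sketch, not the platform's actual SDK: `postAsync` and `collectCpuProfile` are hypothetical names, and the session is passed in as a parameter so the helpers work with any object that exposes `post(method, params, callback)`.

```javascript
// Hypothetical helper: wrap session.post in a Promise.
function postAsync(session, method, params = {}) {
  return new Promise((resolve, reject) => {
    session.post(method, params, (err, result) => {
      if (err) reject(err);
      else resolve(result);
    });
  });
}

// Hypothetical helper: profile for durationMs, then resolve with the CPU profile.
async function collectCpuProfile(session, durationMs) {
  await postAsync(session, 'Profiler.enable');
  await postAsync(session, 'Profiler.start');
  await new Promise((resolve) => setTimeout(resolve, durationMs));
  const { profile } = await postAsync(session, 'Profiler.stop');
  await postAsync(session, 'Profiler.disable');
  return profile;
}
```

With a connected `inspector.Session`, `collectCpuProfile(session, 3000)` would resolve with the same profile object that the callback version writes to disk, and any protocol error surfaces as a rejected promise instead of being silently ignored.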

To debug a process, a different API is required. The following opens an Inspector WebSocket server that Chrome DevTools can connect to:

const inspector = require('inspector');
inspector.open();
console.log(inspector.url());

We then expose dynamic control via an API, allowing the front‑end to request opening or closing the Inspector port without exposing the raw port directly.

const inspector = require('inspector');
const http = require('http');
let isOpened = false;
function getHTML() {
  return `<html>
    <meta charset="utf-8" />
    <body>
      Copy this URL into a new tab to start debugging: devtools://devtools/bundled/js_app.html?experiments=true&v8only=true&ws=${inspector.url().replace('ws://', '')}
    </body>
  </html>`;
}
http.createServer((req, res) => {
  if (req.url == '/debug/open') {
    if (!isOpened) {
      isOpened = true;
      inspector.open();
    }
    const html = getHTML();
    res.end(html);
  } else if (req.url == '/debug/close') {
    if (isOpened) {
      inspector.close();
      isOpened = false;
    }
    res.end('ok');
  } else {
    res.end('ok');
  }
}).listen(80);

The platform provides an API‑driven way to dynamically control process debugging and diagnostics, allowing further extensions such as uploading collected data to the cloud or returning a custom URL to the front‑end.
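As an illustration of returning a custom URL, the sketch below rewrites the Inspector's WebSocket URL so that the front-end connects through a public proxy address rather than the raw Inspector port. The function name `toDevtoolsUrl` and the `publicHost` parameter are assumptions made for this example; the DevTools URL format is the one used in the HTML above.

```javascript
// Hypothetical helper: keep the Inspector's session path, but point the
// DevTools front-end at a public (proxy) host instead of the local port.
function toDevtoolsUrl(inspectorWsUrl, publicHost) {
  const { pathname } = new URL(inspectorWsUrl); // e.g. "/<session-id>"
  return 'devtools://devtools/bundled/js_app.html' +
    `?experiments=true&v8only=true&ws=${publicHost}${pathname}`;
}
```

A handler could call `toDevtoolsUrl(inspector.url(), 'debug.example.com:8080')` after `inspector.open()`; the host shown here is a placeholder.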

3.1.2 Concrete Implementation

The architecture adopts a plugin‑based design: the core framework handles request routing, while specific plugins implement data collection and debugging logic.
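A plugin-based core of this kind could look roughly like the sketch below. The class name `DebugCore`, the plugin shape `{ path, handle }`, and the route paths are all hypothetical; the article does not describe the platform's real interfaces.

```javascript
// Hypothetical core: routes a request path to the plugin registered for it.
class DebugCore {
  constructor() {
    this.plugins = new Map();
  }

  // A plugin is assumed to be an object of the form { path, handle(params) }.
  register(plugin) {
    if (this.plugins.has(plugin.path)) {
      throw new Error(`duplicate plugin path: ${plugin.path}`);
    }
    this.plugins.set(plugin.path, plugin);
    return this;
  }

  // The core only does routing; the plugin implements the actual operation.
  dispatch(path, params) {
    const plugin = this.plugins.get(path);
    if (!plugin) return { status: 404, body: 'unknown operation' };
    return { status: 200, body: plugin.handle(params) };
  }
}

const core = new DebugCore()
  .register({
    path: '/debug/cpuprofile',
    handle: ({ duration }) => `profiling for ${duration} ms`,
  })
  .register({
    path: '/debug/inspector/open',
    handle: () => 'inspector opened',
  });
```

Keeping routing in the core means a new kind of data collection only requires registering one more plugin.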

Data collection follows the same pattern as the earlier example. Debugging is more complex because the Inspector port should not be directly exposed to the front‑end. Instead, an API informs the front‑end whether the port was opened successfully, and the platform proxies WebSocket requests through an external port.

Basic proxy implementation:

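const { connect } = require('net'); // assuming connect is net.connect
// This runs inside the proxy's HTTP 'upgrade' handler: req is the client's
// upgrade request and socket is the client's TCP socket.
// WebSocketServerAddress is the Inspector's listening address, and
// buildHeaders (not shown) serializes the request headers back to text.
const client = connect(WebSocketServerAddress);
client.on('connect', () => {
  // Replay the HTTP upgrade request to the Inspector WebSocket server
  client.write(`GET ${req.path} HTTP/1.1\r\n` + buildHeaders(req.headers) + '\r\n');
  // From here on, relay bytes in both directions
  socket.pipe(client);
  client.pipe(socket);
});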
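The snippet above depends on a `buildHeaders` helper and on the `req` and `socket` objects of an HTTP `upgrade` event, none of which are shown. The sketch below fills those in under stated assumptions: `buildHeaders` is one possible implementation, `proxyUpgrade` and its `target` parameter (`{ host, port }`) are hypothetical, and `req.url` is used because that is where Node's built-in `http` server exposes the request path.

```javascript
const net = require('net');

// One possible buildHeaders: serialize a header object into "Name: value\r\n" lines.
function buildHeaders(headers) {
  return Object.entries(headers)
    .map(([name, value]) => {
      const text = Array.isArray(value) ? value.join(', ') : value;
      return `${name}: ${text}\r\n`;
    })
    .join('');
}

// Hypothetical upgrade handler body: forwards a WebSocket upgrade to the
// Inspector server at target = { host, port } and relays bytes both ways.
function proxyUpgrade(req, socket, head, target) {
  const upstream = net.connect(target.port, target.host);
  upstream.on('connect', () => {
    upstream.write(`GET ${req.url} HTTP/1.1\r\n${buildHeaders(req.headers)}\r\n`);
    if (head && head.length) upstream.write(head); // bytes read past the request
    socket.pipe(upstream);
    upstream.pipe(socket);
  });
  // If either side fails, tear down both so no socket is left dangling.
  const teardown = () => {
    socket.destroy();
    upstream.destroy();
  };
  upstream.on('error', teardown);
  socket.on('error', teardown);
  return upstream;
}
```

It would be wired up with something like `server.on('upgrade', (req, socket, head) => proxyUpgrade(req, socket, head, { host: '127.0.0.1', port: 9229 }))`, where the port is whatever the Inspector actually opened.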

3.2 Multi‑Process

Node.js services often spawn multiple processes to utilize multiple CPU cores, making multi‑process debugging essential. The single‑process solution cannot be directly extended.

3.2.1 Limitations of the Single‑Process Scheme

When multiple processes share a single external port (e.g., via the Cluster module), a request may be routed to a process other than the intended one, for example one that has not opened the Inspector port, which causes inconsistencies. With processes created through child_process.fork, each process trying to listen on the same port leads to port conflicts.

3.2.2 Agent Process

Introducing an Agent process solves the problem. The Agent collects information from the worker processes (PID, listening address) and handles all debugging and diagnostic requests, forwarding each one to the appropriate worker based on the request parameters.

Agent workflow:

The Agent starts a server.

Each worker registers its PID and listening address with the Agent.

Clients request the PID list from the Agent and select a target process.

The Agent forwards the client request to the chosen worker.

The worker processes the request and returns the result to the Agent, which then replies to the client.

3.2.3 Creating the Agent Process

Because frameworks may not expose Agent creation, each worker can spawn its own Agent. Multiple Agents compete for a single port; the one that successfully binds continues running while the others exit, leaving a single Agent.
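The Agent workflow and the port competition described above can be sketched as follows. This is an illustration only: `WorkerRegistry`, `tryBecomeAgent`, and the callback names are hypothetical, and the real Agent's protocol for registration and forwarding is not specified in this article.

```javascript
const net = require('net');

// Hypothetical registry kept by the Agent: PID -> worker listening address.
class WorkerRegistry {
  constructor() {
    this.workers = new Map();
  }
  register(pid, address) {
    this.workers.set(pid, address);
  }
  unregister(pid) {
    this.workers.delete(pid);
  }
  // What a client receives when it asks for the PID list.
  listPids() {
    return [...this.workers.keys()];
  }
  // Resolve the worker that a client request should be forwarded to.
  resolve(pid) {
    const address = this.workers.get(pid);
    if (!address) throw new Error(`unknown worker pid: ${pid}`);
    return address;
  }
}

// Every would-be Agent calls this. Only the process that binds the port is
// elected; for the others, listen fails (typically with EADDRINUSE).
function tryBecomeAgent(port, onElected, onLost) {
  const server = net.createServer();
  server.once('error', (err) => onLost(err));
  server.listen(port, '127.0.0.1', () => onElected(server));
  return server;
}
```

An Agent spawned by each worker could pass `() => process.exit(0)` as `onLost`, so that exactly one Agent remains once the race for the port is decided.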

3.3 Multi‑Thread

Thread debugging is similar to process debugging, with some differences.

3.3.1 Debugging and Diagnosis Basics

Thread data can be collected via the worker_threads and inspector modules:

const { Worker, workerData } = require('worker_threads');
const { Session } = require('inspector');
const session = new Session();
session.connect();
let id = 1;
function post(sessionId, method, params, callback) {
  session.post('NodeWorker.sendMessageToWorker', {
    sessionId,
    message: JSON.stringify({ id: id++, method, params })
  }, callback);
}
session.on('NodeWorker.attachedToWorker', (data) => {
  post(data.params.sessionId, 'Profiler.enable');
  post(data.params.sessionId, 'Profiler.start');
  setTimeout(() => {
    post(data.params.sessionId, 'Profiler.stop');
  }, 10000);
});
const worker = new Worker('./httpServer.js', { workerData: { port: 80 } });
worker.on('online', () => {
  session.post('NodeWorker.enable', { waitForDebuggerOnStart: false }, (err) => {
    console.log(err, 'NodeWorker.enable');
  });
});
setInterval(() => {}, 100000);
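The example above sends `Profiler.stop` to the worker but does not show how the result comes back. Responses from a worker are delivered to the parent session as `NodeWorker.receivedMessageFromWorker` events, whose `message` field is a JSON string carrying the same `id` as the request. The sketch below pairs requests with responses; the class name `WorkerChannel` is hypothetical, and the session is injected so the logic can be exercised without a real worker.

```javascript
// Hypothetical helper: pairs requests sent through NodeWorker.sendMessageToWorker
// with responses delivered by NodeWorker.receivedMessageFromWorker.
class WorkerChannel {
  constructor(session, sessionId) {
    this.session = session;
    this.sessionId = sessionId;
    this.nextId = 1;
    this.pending = new Map(); // request id -> callback

    session.on('NodeWorker.receivedMessageFromWorker', ({ params }) => {
      if (params.sessionId !== this.sessionId) return; // another worker
      const message = JSON.parse(params.message);
      const callback = this.pending.get(message.id);
      if (!callback) return; // protocol event or unknown id
      this.pending.delete(message.id);
      callback(message.error || null, message.result);
    });
  }

  post(method, params, callback) {
    const id = this.nextId++;
    this.pending.set(id, callback);
    this.session.post('NodeWorker.sendMessageToWorker', {
      sessionId: this.sessionId,
      message: JSON.stringify({ id, method, params }),
    });
  }
}
```

Inside the `NodeWorker.attachedToWorker` handler, `new WorkerChannel(session, data.params.sessionId)` could replace the bare `post` helper, with the callback of `channel.post('Profiler.stop', {}, ...)` receiving the worker's CPU profile.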

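Once NodeWorker is enabled and the worker is attached, the Inspector port inside the worker can be opened with Runtime.evaluate, which executes code in the worker:

// session must target the worker here. With the main-thread session above,
// send this through post(sessionId, 'Runtime.evaluate', ...) instead.
session.post('Runtime.evaluate', {
  includeCommandLineAPI: true,
  // JSON.stringify keeps host a quoted string literal inside the expression
  expression: `const inspector = process.binding('inspector');
    inspector.open(${port}, ${JSON.stringify(host)});
    inspector.url();`
}, (err, result) => {
  // handle result
});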

Behavior differs across Node.js versions when a session already exists. The sequence below shows that case: a session is connected before inspector.open() is called. When this fails, a C++ binding can be used to force the port open.

const inspector = require('inspector');
const session = new inspector.Session();
session.connect();
inspector.open();

When inspector.open() fails in this situation on newer Node.js versions, the platform falls back to the low-level binding so that the port can still be opened dynamically.

4. Usage

The platform currently supports debugging and data collection for multiple processes and threads. Collectable data includes CPU profiles, heap snapshots, heap profiles, and memory statistics (RSS, external memory, ArrayBuffers, etc.). Users load the SDK in their business code, deploy the service, and then interact with the platform UI as follows:

Select whether to work with a process or a thread.

Enter the Agent address and choose the operation type; for data collection, also specify the duration.

Retrieve the list of processes and pick the target; hovering shows process details such as file path.

If operating on a thread, after selecting a process, retrieve its thread list and choose the desired thread.

Click execute to obtain the collected data or a URL for live debugging.
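The article does not say how the memory statistics are gathered. One standard source for the figures mentioned above is `process.memoryUsage()`, shown here as a minimal sketch; the function name `readMemoryStats` and the megabyte rounding are choices made for this example.

```javascript
// Snapshot of the current process's memory figures, in megabytes.
function readMemoryStats() {
  const { rss, heapTotal, heapUsed, external, arrayBuffers } = process.memoryUsage();
  const toMB = (bytes) => Math.round((bytes / 1024 / 1024) * 100) / 100;
  return {
    rss: toMB(rss),                   // resident set size of the whole process
    heapTotal: toMB(heapTotal),       // memory reserved for the V8 heap
    heapUsed: toMB(heapUsed),         // V8 heap memory currently in use
    external: toMB(external),         // memory of C++ objects bound to JS objects
    arrayBuffers: toMB(arrayBuffers), // ArrayBuffers and SharedArrayBuffers
  };
}
```

Called inside a worker thread, `process.memoryUsage()` still reports `rss` for the whole process, so per-thread views should rely on the heap fields instead.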

5. Summary

Implementing process and thread debugging and diagnostics in Node.js is complex. Understanding the underlying mechanisms and usage patterns is essential for applying the solution in real business scenarios. While powerful, such operations can be risky, so security considerations are crucial. Debugging, like security, may not be used daily but becomes invaluable when problems arise.

Further reading:

Deep Dive into Node.js Inspector: https://mp.weixin.qq.com/s/GLIlhURSrCYQ-8Bqg7i1kA

Node.js Worker Thread Debugging and Diagnosis Guide: https://zhuanlan.zhihu.com/p/402855448
