How to Debug Node.js in Production: Performance, Crashes, and Memory Leaks
This guide explains practical techniques for diagnosing Node.js production issues, covering request‑latency analysis, CPU profiling with perf and FlameGraph, crash investigation via Core Dumps and mdb_v8, and memory‑leak detection using gcore and mdb_v8 diff tools.
Measuring Request Latency
As user traffic grows, Node request handling can slow down. Each request passes through the network, Node middleware, and the target API, so pinpointing the bottleneck requires timing data for each stage. With Express you can enable the response-time middleware and forward metrics to StatsD; alternatively, the Restify framework integrates DTrace to record per‑stage timings.
Restify also offers request‑rate throttling, built‑in error types, and Bunyan‑based logging.
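As a minimal sketch of the Express approach, assuming the response-time and node-statsd npm packages (the metric name api.response_time is illustrative):
// Record per-request latency and forward it to StatsD.
// Assumes: npm install express response-time node-statsd
var express = require("express");
var responseTime = require("response-time");
var StatsD = require("node-statsd");
var app = express();
var statsd = new StatsD(); // defaults to localhost:8125
// response-time calls back with the elapsed time in milliseconds
// once the response headers are written.
app.use(responseTime(function (req, res, time) {
  statsd.timing("api.response_time", time); // metric name is illustrative
}));
app.get("/", function (req, res) {
  res.send("ok");
});
app.listen(3000);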
Identifying CPU‑Intensive Functions
Use Linux perf to profile Node processes and generate flame graphs that visualize stack‑time distribution.
Ensure you are running Node ≥ 5.0, then start the app with the --perf-basic-prof-only-functions flag:
node --perf-basic-prof-only-functions app.js &
Install perf and record 30 seconds of stack samples from the running Node process:
sudo yum install perf
sudo perf record -F 99 -p `pgrep -n node` -g -- sleep 30
sudo perf script > nodestacks
Clone Brendan Gregg’s FlameGraph repository and generate an SVG flame graph:
git clone --depth 1 http://github.com/brendangregg/FlameGraph
cd FlameGraph
./stackcollapse-perf.pl < ../nodestacks | ./flamegraph.pl --colors js > ../node-flamegraph.svg
The resulting flame graph shows each function as a horizontal block; wider blocks indicate more CPU time.
Each rectangle represents a called function: the x‑axis encodes CPU time, the y‑axis shows stack depth, and the colors are arbitrary.
By locating the widest rectangles you can identify the functions that dominate CPU usage and investigate their source code.
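For instance, a synchronous hot spot like the contrived function below (not from the original talk) would appear as a wide rectangle labeled computeChecksum, telling you exactly where to focus:
// Contrived CPU hog: repeatedly hashing a large buffer synchronously
// blocks the event loop, so this frame dominates the flame graph.
var crypto = require("crypto");
function computeChecksum(payload) {
  var hash = crypto.createHash("sha256");
  for (var i = 0; i < 5000; i++) { // artificially repeat the work
    hash.update(payload);
  }
  return hash.digest("hex");
}
computeChecksum(Buffer.alloc(1024 * 1024)); // ~5 GB hashed in one synchronous call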
Collecting Crash Information (Core Dumps)
Run Node with --abort-on-uncaught-exception so that an unhandled error triggers a Core Dump, which captures the full process state.
// throw.js: increment a counter on each turn of the event loop and
// deliberately throw once it reaches 1000.
var obj = {
  myproperty: "Hello World",
  count: 0,
};
function increment() {
  obj.count++;
  if (obj.count === 1000) throw new Error("sad trombone");
  setImmediate(increment);
}
setImmediate(increment);
Execute the script:
node --abort-on-uncaught-exception throw.js
Package the generated core file together with the Node binary, transfer them to a Solaris (or OmniOS) VM, and load them with mdb and the mdb_v8 module:
wget https://us-east.manta.joyent.com/Joyent_Dev/public/mdb_v8/v1.1.2/mdb_v8_amd64.so
mdb ./node ./core
::load ./mdb_v8_amd64.so
Use ::jsstack -v to view the final JavaScript stack, then locate the offending function’s address with ::jsfunctions -n increment, and inspect its closure variables via ::jsclosure and ::findjsobjects. The analysis reveals that the count variable reached 1000, causing the crash.
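For orientation, a hypothetical mdb session might chain these commands, where FUNCADDR is the address printed by ::jsfunctions and OBJADDR is the address of obj shown in the ::jsclosure output (both are placeholders):
::jsstack -v
::jsfunctions -n increment
FUNCADDR::jsclosure
OBJADDR::jsprint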
Tracking Memory Leaks
When the process does not crash, generate Core Dumps with gcore at intervals and compare the JavaScript object graphs.
Run a script that intentionally leaks memory (creates a large array and retains a closure referencing previous data).
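A minimal sketch of such a script (the file name leak.js, allocation size, and interval are arbitrary):
// leak.js: deliberately leaks memory. Every second it allocates a large
// array and creates a closure that also captures the previous closure,
// forming an ever-growing chain the garbage collector can never reclaim.
var previous = null;
function leak() {
  var big = new Array(1000000).fill("x"); // large allocation each tick
  var older = previous;
  previous = function () {
    // Referencing both keeps the whole chain of arrays reachable.
    return [big, older];
  };
  setTimeout(leak, 1000);
}
leak();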
Periodically dump the process:
# Replace PID with the actual process ID
gcore -o leak_1 PID
gcore -o leak_2 PID
Extract JavaScript objects from each dump using dumpjsobjects:
./dumpjsobjects ./leak_1.PID ./mdb_v8_amd64.so obj_id_1 obj_content_1
./dumpjsobjects ./leak_2.PID ./mdb_v8_amd64.so obj_id_2 obj_content_2
Compare the two object sets with mdbv8diff (installed from the Joyent repo):
git clone https://github.com/joyent/mdb_v8.git
cd mdb_v8/tools/mdbv8diff
npm install
./mdbv8diff /path/to/obj_content_1 /path/to/obj_content_2
The diff highlights objects that persisted between snapshots.
Inspect the leaked object’s address (e.g., 135f38df83d9) with ::jsprint and list its instances using ::findjsobjects.
The output shows that the object containing the large string remains in memory, confirming the leak.
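Put together, the inspection inside mdb might look like this, reusing the example address above (yours will differ):
mdb ./node ./leak_2.PID
::load ./mdb_v8_amd64.so
135f38df83d9::jsprint
135f38df83d9::findjsobjects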
Conclusion
The article, based on talks by Netflix engineer Yunong Xiao, demonstrates a workflow for production‑grade Node.js debugging: measuring request latency, profiling CPU usage with FlameGraph, capturing crashes via Core Dumps, and detecting memory leaks with gcore and mdb_v8 tools. Complex production problems may require additional instrumentation, but the presented techniques provide a solid foundation.