How to Slash Node.js Serverless Startup Time Below 100 ms
This article examines why Node.js serverless functions often exceed the desired 100 ms cold‑start target, analyzes profiling data to pinpoint costly file I/O and compilation steps, and presents practical optimizations such as module bundling, V8 code‑caching, and custom require handling to dramatically reduce startup latency.
When deploying traditional Node.js applications, developers rarely worry about how long the process takes to start; a typical deployment, with its many interactions with corporate systems, finishes in about five minutes, which seems acceptable.
However, in the serverless era, the Node.js runtime is a cornerstone of modern development, promising elasticity, efficiency, and cost-effectiveness. If a Node.js FaaS function still needs minute-level startup, it cannot respond quickly to requests, especially under burst traffic, and those advantages evaporate.
All platforms offering Node.js FaaS strive to shorten cold‑ and warm‑start times. Beyond infrastructure optimizations, the Node.js function itself must also contribute to this effort.
The total time from request receipt to a ready container should be under 500 ms, with the function runtime portion targeted at 100 ms, roughly the threshold at which humans perceive a delay.
How Fast Is Node.js?
We often assume Node.js is fast; a simple console.log seems instantaneous. How fast is it really?
<code>// console.js
console.log(process.uptime() * 1000);
</code>On Node.js LTS v10.16.0 on a personal workstation:
<code>node console.js
// average time: 86 ms
time node console.js
// node console.js 0.08s user 0.03s system 92% cpu 0.114 total
</code>Thus, with a 100 ms budget, little time remains for additional code loading. In a typical function platform container:
<code>node console.js
// average time: 170 ms
real 0m0.177s
user 0m0.051s
sys 0m0.009s
</code>Introducing a module, e.g., serverless-runtime:
<code>// require.js
console.time('load');
require('serverless-runtime');
console.timeEnd('load');
</code>Local environment:
<code>node require.js
// average time: 329 ms
</code>Server environment:
<code>node require.js
// average time: 1433 ms
</code>Loading the function runtime therefore takes about 1.7 seconds, far exceeding the 100 ms goal.
Why Is It Slow?
Profiling with Node.js's built-in --prof tool reveals the bottlenecks.
<code>node --prof require.js
node --prof-process isolate-xxx-v8.log > result
</code> <code>[Summary]:
ticks total nonlib name
60 13.7% 13.8% JavaScript
371 84.7% 85.5% C++
10 2.3% 2.3% GC
4 0.9% Shared libraries
3 0.7% Unaccounted
[C++]:
ticks total nonlib name
198 45.2% 45.6% node::contextify::ContextifyScript::New(...)
13 3.0% 3.0% node::fs::InternalModuleStat(...)
8 1.8% 1.8% node::Buffer::StringSlice(...)
5 1.1% 1.2% node::GetBinding(...)
4 0.9% 0.9% __memmove_ssse3_back
4 0.9% 0.9% __GI_mprotect
3 0.7% 0.7% v8::internal::StringTable::LookupStringIfExists_NoAllocate(...)
3 0.7% 0.7% v8::internal::Scavenger::ScavengeObject(...)
3 0.7% 0.7% node::fs::Open(...)
</code>Running the same profiling on the runtime startup shows a similar pattern, with the majority of time spent in C++ functions such as Open, ContextifyContext, and CompileFunction, all triggered during require operations. Therefore, optimizing require is the first target.
How to Speed It Up
The two main contributors to startup latency are file I/O and code compilation.
File I/O
File I/O occurs during module lookup and module content reading. Module lookup involves probing many candidate directories and files, causing numerous open and stat calls, especially with complex dependency trees. Reading large module files also adds overhead.
Flattening dependencies into a single bundle (as front-end code does) can reduce this I/O. Using the community tool ncc to bundle serverless-runtime yields:
<code>ncc build node_modules/serverless-runtime/src/index.ts
node require.js
// average load time: 934 ms
</code>This improves speed by about 34 %.
However, bundling everything can increase the bundle size, leading to slower loads for large modules, as shown in the following test:
<code>import * as _ from 'lodash';
import * as Sequelize from 'sequelize';
import * as Pandorajs from 'pandora';
console.log('lodash: ', _);
console.log('Sequelize: ', Sequelize);
console.log('Pandorajs: ', Pandorajs);
</code>Benchmark results (chart omitted here): the bundled version actually took longer, because the single file grew large. This highlights the need for tree-shaking when using ncc.
Code Compilation
Beyond I/O, compiling JavaScript to V8 bytecode is costly. Since most modules are static, caching the compiled code can avoid repeated compilation. Node.js has exposed V8's code cache through the cachedData option of vm.Script since v5.7.0, allowing compiled scripts to be reused across runs.
<code>// Use v8-compile-cache to generate cache locally, then deploy
node require.js
// average time: 868 ms
</code>This yields roughly a 40 % speed boost, though file‑lookup I/O remains.
Advanced Idea
By modifying the require function to load modules directly from a pre-generated cache, we could eliminate the lookup phase entirely, completing all module loads with a single I/O operation. This approach may affect remote debugging and source indexing and requires further investigation.
Upcoming Plans
We intend to apply the proven optimizations (ncc bundling, code caching, and the custom require hack) in production, balancing speed gains with maintainability.
We will also review the overall function runtime design and business logic to eliminate unnecessary delays.
Node.js 12 introduces default code‑caching for internal modules, reducing startup to around 120 ms in server environments, which we plan to adopt.
Future Considerations
V8's snapshot feature can further accelerate startup, as used in NW.js and Electron. Since v12.6, Node.js takes a snapshot of its own core before user code loads, yielding a modest 10-15 % improvement, but snapshotting user code is not yet supported. Packaging the function runtime into a snapshot is a potential avenue, though results are still uncertain.
For Java functions, GraalVM can achieve sub‑10 ms cold starts at the cost of some language features. Compiling the entire runtime to LLVM IR and then to native code is another research direction, albeit a challenging one.
Conclusion
Achieving a 100 ms function runtime startup is an exciting and demanding goal; contributions and ideas from the community are welcome.
Taobao Frontend Technology
The frontend landscape is constantly evolving, with rapid innovation across familiar languages, and our understanding of the frontend is continually refreshed. Join us at Taobao, a vibrant, all-encompassing platform, to explore its limitless potential.