How Much Memory Do Async and Threaded Programs Really Use? A Multi‑Language Benchmark
This article benchmarks memory consumption of concurrent programs across Rust, Go, Java, C#, Node.js, Python, Elixir and others, revealing surprising differences in how threads, async runtimes, and virtual threads scale from a single task up to one million concurrent tasks.
This benchmark compares memory consumption of concurrent tasks across several languages: Rust (threads, Tokio async, async‑std), Go (goroutines), Java (threads and JDK 21 virtual threads), C#/.NET (Task.Run), Node.js (Promise.all), Python (asyncio), and Elixir (Task.async).
Benchmark Design
Each program launches N concurrent tasks that sleep for 10 seconds, then exits. The task count is supplied via a command‑line argument. Source code is available at https://github.com/pkolaczk/async-runtimes-benchmarks.
Rust
Three variants:
Native threads using std::thread::spawn and join.
Tokio async tasks created with task::spawn and awaited.
Async‑std version (similar to Tokio).
use std::thread;
use std::time::Duration;

let mut handles = Vec::new();
for _ in 0..num_threads {
    let handle = thread::spawn(|| {
        thread::sleep(Duration::from_secs(10));
    });
    handles.push(handle);
}
for handle in handles {
    handle.join().unwrap();
}

Go
Goroutines synchronized with a sync.WaitGroup:
var wg sync.WaitGroup
for i := 0; i < numRoutines; i++ {
    wg.Add(1)
    go func() {
        defer wg.Done()
        time.Sleep(10 * time.Second)
    }()
}
wg.Wait()

Java
Two versions:
Classic Thread objects stored in a List<Thread>, started and joined.
JDK 21 virtual threads (a preview feature in the early-access build tested here) created with Thread.startVirtualThread, running the same task logic.
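A minimal sketch of the virtual-thread variant. The class name, the `runTasks` helper, and the argument handling are illustrative additions, not taken from the benchmark repository; the classic-thread version is identical except that it constructs and starts platform `Thread` objects instead:

```java
import java.util.ArrayList;
import java.util.List;

public class VirtualThreads {
    // Launch n virtual threads that each sleep for sleepMillis, then join them all.
    // Returns the number of threads joined.
    static int runTasks(int n, long sleepMillis) throws InterruptedException {
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            // Thread.startVirtualThread creates and starts a virtual thread (JDK 21)
            threads.add(Thread.startVirtualThread(() -> {
                try {
                    Thread.sleep(sleepMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }));
        }
        for (Thread t : threads) {
            t.join();
        }
        return threads.size();
    }

    public static void main(String[] args) throws InterruptedException {
        int numTasks = Integer.parseInt(args[0]);
        runTasks(numTasks, 10_000); // the benchmark sleeps for 10 seconds
    }
}
```

Virtual threads are scheduled onto a small pool of carrier threads, so a sleeping virtual thread consumes no OS thread while parked, which is what makes the high-count results below possible.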
C# (.NET)
Tasks are created with Task.Run and awaited via Task.WhenAll. A variant that omits Task.Run and awaits the delay directly avoids the thread-pool dispatch overhead.
Node.js
Creates a delay promise with util.promisify(setTimeout), pushes numTasks promises into an array, and awaits Promise.all.
const util = require('util');

const delay = util.promisify(setTimeout);
const tasks = [];
for (let i = 0; i < numTasks; i++) {
    tasks.push(delay(10000));
}
await Promise.all(tasks);

Python
Async function sleeps for 10 seconds; tasks are created with asyncio.create_task and gathered with asyncio.gather.
import asyncio

async def perform_task():
    await asyncio.sleep(10)

tasks = []
for _ in range(num_tasks):
    tasks.append(asyncio.create_task(perform_task()))
await asyncio.gather(*tasks)

Elixir
Spawns num_tasks processes with Task.async, each sleeping via :timer.sleep(10000), then waits with Task.await_many.
tasks = for _ <- 1..num_tasks do
    Task.async(fn -> :timer.sleep(10000) end)
end
Task.await_many(tasks, :infinity)

Test Environment
CPU: Intel Xeon E3-1505M v6 @ 3.00 GHz
OS: Ubuntu 22.04 LTS, Linux 5.15.0‑72‑generic
Versions: Rust 1.69, Go 1.18.1, OpenJDK 21‑ea, .NET 6.0.116, Node v12.22.9, Python 3.10.6, Elixir 1.12.2 (Erlang/OTP 24)
All programs were built in release mode where applicable.
Results
Minimal Memory (1 task)
Native compiled binaries (Rust, Go) use very little memory. Managed runtimes (Java, .NET, Node, Python, Elixir) consume more, with a roughly order‑of‑magnitude gap.
10 000 Tasks
Rust threads remain lightweight, using less memory than many managed runtimes need when idle. Go's goroutine memory is about 50 % higher than that of Rust threads. Java platform threads consume roughly 250 MB, the worst in this range. .NET shows only a modest increase, possibly due to pre-allocation.
100 000 Tasks
Thread‑based benchmarks cannot run at this scale on the test machine. Go falls behind Rust, Java, and C# by more than 12×, contradicting the belief that Go is extremely lightweight. Rust’s Tokio runtime still leads.
1 000 000 Tasks
Elixir hits a system limit unless the Erlang VM is started with increased process limits (e.g., erl +P 4000000). C# outperforms most languages, ranking second only to Rust/Tokio. Go’s memory usage explodes, becoming the worst performer.
Conclusions
Massive concurrency can consume substantial memory even when tasks are idle. Natively compiled languages with low-overhead threads (Rust, Go) excel at small to moderate scales, but Go's per-goroutine overhead (each goroutine starts with a 2 KiB stack that grows as needed) becomes a liability at very high counts. Managed runtimes with higher initial overhead (Java, .NET) handle large task counts more gracefully once that fixed cost is amortized. Language-specific tuning, such as adjusting the .NET GC heap, Java GC flags, or Erlang process limits, can significantly affect memory usage.
Future work will examine task start‑up latency and inter‑task communication throughput.