How Much Memory Do Async and Threaded Programs Really Use? A Multi‑Language Benchmark

This article benchmarks memory consumption of concurrent programs across Rust, Go, Java, C#, Node.js, Python, Elixir and others, revealing surprising differences in how threads, async runtimes, and virtual threads scale from a single task up to one million concurrent tasks.

Python Programming Learning Circle
Python Programming Learning Circle
Python Programming Learning Circle
How Much Memory Do Async and Threaded Programs Really Use? A Multi‑Language Benchmark

This benchmark compares memory consumption of concurrent tasks across several languages: Rust (threads, Tokio async, async‑std), Go (goroutines), Java (threads and JDK 21 virtual threads), C#/.NET (Task.Run), Node.js (Promise.all), Python (asyncio), and Elixir (Task.async).

Benchmark Design

Each program launches N concurrent tasks that sleep for 10 seconds, then exits. The task count is supplied via a command‑line argument. Source code is available at https://github.com/pkolaczk/async-runtimes-benchmarks.

Rust

Three variants:

Native threads using std::thread::spawn and join.

Tokio async tasks created with task::spawn and awaited.

Async‑std version (similar to Tokio).

let mut handles = Vec::new();
for _ in 0..num_threads {
    let handle = thread::spawn(|| {
        thread::sleep(Duration::from_secs(10));
    });
    handles.push(handle);
}
for handle in handles {
    handle.join().unwrap();
}

Go

Goroutines synchronized with a sync.WaitGroup:

var wg sync.WaitGroup
for i := 0; i < numRoutines; i++ {
    wg.Add(1)
    go func() {
        defer wg.Done()
        time.Sleep(10 * time.Second)
    }()
}
wg.Wait()

Java

Two versions:

Classic Thread objects stored in a List<Thread>, started and joined.

JDK 21 preview virtual threads created with Thread.startVirtualThread using the same task logic.

C# (.NET)

Tasks are created with Task.Run (optional) and awaited via Task.WhenAll. Omitting Task.Run reduces overhead.

Node.js

Creates a delay promise with util.promisify(setTimeout), pushes numTasks promises into an array, and awaits Promise.all.

const delay = util.promisify(setTimeout);
const tasks = [];
for (let i = 0; i < numTasks; i++) {
    tasks.push(delay(10000));
}
await Promise.all(tasks);

Python

Async function sleeps for 10 seconds; tasks are created with asyncio.create_task and gathered with asyncio.gather.

async def perform_task():
    await asyncio.sleep(10)

tasks = []
for _ in range(num_tasks):
    tasks.append(asyncio.create_task(perform_task()))
await asyncio.gather(*tasks)

Elixir

Spawns num_tasks processes with Task.async, each sleeping via :timer.sleep(10000), then waits with Task.await_many.

tasks = for _ <- 1..num_tasks do
    Task.async(fn -> :timer.sleep(10000) end)
end
Task.await_many(tasks, :infinity)

Test Environment

CPU: Intel Xeon E3-1505M v6 @ 3.00 GHz

OS: Ubuntu 22.04 LTS, Linux 5.15.0‑72‑generic

Versions: Rust 1.69, Go 1.18.1, OpenJDK 21‑ea, .NET 6.0.116, Node v12.22.9, Python 3.10.6, Elixir 1.12.2 (Erlang/OTP 24)

All programs were built in release mode where applicable.

Results

Minimal Memory (1 task)

Native compiled binaries (Rust, Go) use very little memory. Managed runtimes (Java, .NET, Node, Python, Elixir) consume more, with a roughly order‑of‑magnitude gap.

10 000 Tasks

Rust threads remain lightweight, below many runtimes' idle memory. Go’s goroutine memory is about 50 % higher than Rust threads. Java threads consume ~250 MB, the worst in this range. .NET shows modest increase, possibly due to pre‑allocation.

100 000 Tasks

Thread‑based benchmarks cannot run at this scale on the test machine. Go falls behind Rust, Java, and C# by more than 12×, contradicting the belief that Go is extremely lightweight. Rust’s Tokio runtime still leads.

1 000 000 Tasks

Elixir hits a system limit unless the Erlang VM is started with increased process limits (e.g., erl +P 4000000). C# outperforms most languages, ranking second only to Rust/Tokio. Go’s memory usage explodes, becoming the worst performer.

Conclusions

Massive concurrency can consume substantial memory even when tasks are idle. Native compiled languages with low‑overhead threads (Rust, Go) excel at small to moderate scales, but Go’s fixed 2 KiB stack per goroutine becomes a liability at very high counts. Managed runtimes with higher initial overhead (Java, .NET) handle larger task counts more gracefully once the overhead is amortized. Language‑specific tuning—such as adjusting .NET GC heap, Java GC flags, or Erlang process limits—can significantly affect memory usage.

Future work will examine task start‑up latency and inter‑task communication throughput.

JavaPythonRustGoNode.jsC++AsyncElixirmemory benchmarkruntime comparison
Python Programming Learning Circle
Written by

Python Programming Learning Circle

A global community of Chinese Python developers offering technical articles, columns, original video tutorials, and problem sets. Topics include web full‑stack development, web scraping, data analysis, natural language processing, image processing, machine learning, automated testing, DevOps automation, and big data.

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.