Databases 6 min read

Generating One Billion SQLite Rows in Under a Minute: Python, PyPy, and Rust Performance Comparison

A programmer needed to create a billion‑row SQLite test database within a minute, found a naïve Python script unbearably slow, applied batch inserts and SQLite PRAGMA tweaks, then compared CPython, PyPy, and Rust implementations, ultimately achieving sub‑minute runtimes with Rust and highlighting best‑practice optimizations.

Python Programming Learning Circle
Python Programming Learning Circle
Python Programming Learning Circle
Generating One Billion SQLite Rows in Under a Minute: Python, PyPy, and Rust Performance Comparison

A developer attempted to generate a 1 billion‑row SQLite database on a 2019 MacBook Pro (2.4 GHz quad‑core i5) within one minute; the initial Python script using a simple for‑loop took about 15 minutes for 10 million rows and was deemed far too slow.

Switching to batch inserts of 100 k rows reduced the runtime to roughly 10 minutes, revealing that SQLite’s per‑transaction overhead was the main bottleneck.

Further SQLite optimizations were applied: disabling journal_mode, turning off synchronous, increasing cache_size, setting locking_mode=EXCLUSIVE, and using temp_store=MEMORY. The author warned that these settings are unsuitable for production environments.

Running the batch version under PyPy (which claims a 4× speedup over CPython) required only 2.5 minutes, about 3.5× faster than the CPython run.

A Rust rewrite was then explored. The naïve Rust loop still took about 3 minutes, but using prepared statements and inserting in batches of 50 rows cut the time to 34.3 seconds; a multithreaded version with one consumer thread and four producer threads achieved the best result of 32.37 seconds.

Testing with an in‑memory SQLite database ( :memory:) showed the Rust version completing in 29 seconds, indicating that disk I/O contributed only ~2 seconds to the overall runtime and that most time is spent generating and inserting data.

Benchmark results were summarized in a table: Rust – 33 seconds, PyPy – 126 seconds, CPython – 210 seconds.

The final recommendations emphasized using SQLite PRAGMA statements, prepared statements, and batch inserts; acknowledging that PyPy can be significantly faster than CPython, while async approaches do not always yield speed gains.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Sign in to view source
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactadmin@besthub.devand we will review it promptly.

performancePythondatabaseRustSQLiteBenchmarkingPyPy
Python Programming Learning Circle
Written by

Python Programming Learning Circle

A global community of Chinese Python developers offering technical articles, columns, original video tutorials, and problem sets. Topics include web full‑stack development, web scraping, data analysis, natural language processing, image processing, machine learning, automated testing, DevOps automation, and big data.

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.