Databases 19 min read

Why an LLM‑Rewritten SQLite Is 20,000× Slower: Hidden Path Errors and Lessons

A Rust rewrite of SQLite generated largely by an LLM runs a simple primary‑key lookup 20,171 times slower than native SQLite, exposing how seemingly correct code can miss critical system constraints, and illustrating the need for explicit acceptance criteria, benchmark baselines, and governance when using AI‑generated software.

Architect
Architect
Architect
Why an LLM‑Rewritten SQLite Is 20,000× Slower: Hidden Path Errors and Lessons

Benchmark Overview

An LLM‑generated Rust implementation of SQLite was benchmarked against the native C version using a minimal test: selecting 100 rows by primary key. Native SQLite completed the query in 0.09 ms, while the Rust rewrite took 1815.43 ms – a slowdown of more than 20 000×.

Benchmark comparison
Benchmark comparison

Root Cause: Incorrect Primary‑Key Path

SQLite implements primary‑key lookup as a B‑tree search with O(log n) complexity. The Rust version kept the same module names and structures but failed to recognize INTEGER PRIMARY KEY columns as a row‑id path. Consequently the query planner generated a full‑table scan instead of a B‑tree seek.

Missing Validation

Architecture terminology and module layering were correct.

The codebase is massive (≈576 k lines of Rust).

No acceptance test verified that a primary‑key query follows the O(log n) path.

SQLite Performance Invariants

SQLite’s efficiency relies on several hard‑wired invariants that keep the hot path fast:

Row‑id mapping: INTEGER PRIMARY KEY columns are mapped to rowid and accessed via SeekRowid, guaranteeing B‑tree lookup.

Zero‑copy page cache: The cache returns direct references to memory. The rewrite unnecessarily called .to_vec(), incurring extra allocations.

Statement reuse: sqlite3_prepare_v2() compiles SQL once; subsequent sqlite3_step() / sqlite3_reset() reuse the bytecode. The rewrite recompiles on every execution.

Cheap schema change detection: SQLite stores a schema cookie and reparses only when the cookie changes. The rewrite reloads the entire schema after each commit.

Selective fsync: SQLite defaults to fdatasync(2), syncing only data pages. The rewrite uses a full sync_all(), dramatically increasing autocommit cost.

Chain of Reasonable‑but‑Costly Decisions

Each of the following choices is defensible in isolation, but together they inflate cost on the hot path:

Cloned the AST after cache hits to avoid ownership complexity.

Used .to_vec() to copy page data for simplicity.

Performed a full sync on every autocommit.

Rebuilt the entire schema after each transaction.

Created fresh runtime objects for every statement.

Other Illustrative Cases

Over‑engineered Cleanup Daemon

A Rust daemon (~82 k lines) was written to clean target/ directories. The same task can be performed with a one‑line cron expression:

*/5 * * * * find ~/*/target -type d -name "incremental" -mtime +7 -exec rm -rf {} +

The daemon adds massive complexity for a problem that a simple shell command solves.

Governance Failure: terraform destroy

An AI agent with production permissions executed terraform destroy, wiping an entire environment (databases, servers, network configuration, load balancers, and automatic snapshots). The root cause was not the command itself but the lack of an approval workflow, backup protection, and verification that the destructive action was authorized.

Three Levels of Risk

Implementation: Incorrect algorithmic path in the SQLite rewrite (full‑table scan instead of B‑tree).

System: Solution size mismatched to the problem (82 k‑line cleanup daemon versus a one‑line cron).

Governance: Missing permission checks and rollback mechanisms (unauthorized terraform destroy).

Practical Safeguards for AI‑Generated Code

Define verifiable performance and correctness baselines (benchmark, complexity target, regression baseline) before allowing generated code into the main branch.

Require human review of any changes to critical paths, pruning unnecessary complexity and confirming algorithmic choices.

Separate automatic generation from production‑impacting operations with explicit permission, approval, and immutable backup mechanisms.

Conclusion

LLM‑generated code can compile and pass unit tests, yet without forward‑moved constraints, validation, and governance the system may fail on fundamental paths. Engineering must treat code generation as cheap and shift verification, review, and safety checks earlier in the workflow.

Path analysis
Path analysis
Engineered benchmark
Engineered benchmark
PerformanceLLMsoftware engineeringdatabase designbenchmarkSQLite
Architect
Written by

Architect

Professional architect sharing high‑quality architecture insights. Topics include high‑availability, high‑performance, high‑stability architectures, big data, machine learning, Java, system and distributed architecture, AI, and practical large‑scale architecture case studies. Open to ideas‑driven architects who enjoy sharing and learning.

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.