SQLite Reimplementation Shows 20,000x Slowdown: When AI-Generated Code Looks Right But Fails Hard

In a striking demonstration of AI’s current limitations in software development, an AI-generated Rust reimplementation of SQLite ran 20,171 times slower than the original SQLite on a basic database operation.

The culprit isn’t a missing semicolon or a syntax error: the code compiles, passes all of its tests, and even claims to support MVCC concurrent writers and file compatibility. The issue lies in fundamental architectural decisions that functional tests do not catch; it takes benchmarking to expose them.

The Benchmark That Exposes the Problem

The test case was deceptively simple: performing a primary key lookup on 100 rows. SQLite completed this in 0.09 milliseconds. The AI-generated Rust version took 1,815.43 milliseconds—a performance gap so massive it reveals a critical flaw in how large language models generate code.
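The slowdown figure follows directly from the two timings. The sketch below recomputes it and shows one way a comparable measurement could be taken against stock SQLite through Python's built-in sqlite3 module. The table contents, the loop of one lookup per row, and the function name time_pk_lookups are assumptions made for illustration; this is not the researcher's harness, and absolute timings will vary by machine.

```python
import sqlite3
import time

# Timings reported in the article, in milliseconds.
SQLITE_MS = 0.09
REWRITE_MS = 1815.43
slowdown = REWRITE_MS / SQLITE_MS  # roughly 20,171x


def time_pk_lookups(n_rows=100):
    """Time one primary key lookup per row on an in-memory table (illustrative setup)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE test (id INTEGER PRIMARY KEY, name TEXT, value REAL)")
    conn.executemany(
        "INSERT INTO test (id, name, value) VALUES (?, ?, ?)",
        [(i, f"name{i}", i * 1.5) for i in range(1, n_rows + 1)],
    )
    start = time.perf_counter()
    found = 0
    for i in range(1, n_rows + 1):
        row = conn.execute("SELECT name FROM test WHERE id = ?", (i,)).fetchone()
        found += row is not None  # count the lookups that returned a row
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    conn.close()
    return found, elapsed_ms


if __name__ == "__main__":
    found, elapsed_ms = time_pk_lookups()
    print(f"reported slowdown: {slowdown:,.0f}x")
    print(f"{found} lookups in {elapsed_ms:.3f} ms")
```

Running the same harness against both implementations, rather than only checking that each returns the right rows, is what makes a gap of this size visible.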

“It’s not a misplaced comma!” explains the researcher who discovered this. “The rewrite is 20,171 times slower on one of the most basic database operations.”

The Root Cause: Missing the Critical Details

The performance disaster stems from a single oversight in the query planner. When SQLite encounters a table declaration like:

    CREATE TABLE test (id INTEGER PRIMARY KEY, name TEXT, value REAL);

The id column becomes an alias for the internal rowid—the B-tree key itself. This allows queries like WHERE id = 5 to resolve to a direct B-tree search, scaling at O(log n) rather than O(n).
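This behavior can be observed in stock SQLite with EXPLAIN QUERY PLAN. The following is a minimal sketch using Python's sqlite3 module; the row data and the helper name plan are assumptions for illustration, and the exact wording of the plan text differs between SQLite versions, so the code only prints it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (id INTEGER PRIMARY KEY, name TEXT, value REAL)")
# No id is supplied, so SQLite assigns rowids 1..100 and id mirrors them.
conn.executemany(
    "INSERT INTO test (name, value) VALUES (?, ?)",
    [(f"name{i}", float(i)) for i in range(100)],
)


def plan(sql, params=()):
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[3] for row in rows]


# id is an alias for rowid, so comparing the two yields 1 (true).
same = conn.execute("SELECT rowid = id FROM test WHERE id = 5").fetchone()[0]

pk_plan = plan("SELECT * FROM test WHERE id = ?", (5,))
name_plan = plan("SELECT * FROM test WHERE name = ?", ("name5",))

print("rowid equals id:", same)
print("lookup by id:   ", pk_plan)    # expected to be a SEARCH on the primary key
print("lookup by name: ", name_plan)  # expected to be a SCAN, since name has no index
```

The lookup by id is reported as a search on the integer primary key, while the lookup on the unindexed name column is reported as a scan of the whole table.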

The Rust reimplementation has a proper B-tree implementation that works correctly when called. However, the query planner never invokes it for named columns. Instead, it falls back to a full table scan for these queries, paying O(n) per lookup where a direct key search would cost O(log n).
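The failure mode can be sketched with a toy planner. Everything here is hypothetical: the classes Column and ToyTable, the flag is_rowid_alias, and the honor_alias switch are invented for illustration, and a sorted list with binary search stands in for a real B-tree. It is not the code of SQLite or of the Rust rewrite.

```python
from bisect import bisect_left
from dataclasses import dataclass


@dataclass
class Column:
    name: str
    is_rowid_alias: bool = False  # hypothetical flag, set for INTEGER PRIMARY KEY


class ToyTable:
    """Rows kept sorted by rowid; a sorted list stands in for the B-tree."""

    def __init__(self, columns, rows):
        self.columns = {c.name: (i, c) for i, c in enumerate(columns)}
        self.rows = sorted(rows, key=lambda r: r[0])  # each row is a tuple, rowid first
        self.rowids = [r[0] for r in self.rows]

    def lookup(self, column, value, honor_alias=True):
        """Return (plan, rows_fetched, matches) for WHERE column = value."""
        index, col = self.columns[column]
        if honor_alias and col.is_rowid_alias:
            # Key search: binary search over the sorted keys, O(log n).
            pos = bisect_left(self.rowids, value)
            hit = pos < len(self.rowids) and self.rowids[pos] == value
            return "SEARCH", (1 if hit else 0), ([self.rows[pos]] if hit else [])
        # Fallback: full table scan that touches every row, O(n).
        matches = [r for r in self.rows if r[index] == value]
        return "SCAN", len(self.rows), matches


table = ToyTable(
    [Column("id", is_rowid_alias=True), Column("name")],
    [(i, f"name{i}") for i in range(1, 101)],
)
good = table.lookup("id", 5)                     # planner honors the alias
bad = table.lookup("id", 5, honor_alias=False)   # planner ignores it
print(good[0], good[1], bad[0], bad[1])
```

Both calls return the same row, which is why a test that only compares query results passes in either case. Only the plan and the number of rows touched differ.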

The Danger of “Vibe Coding”

This case exemplifies what’s being called “vibe coding”—a term coined by Andrej Karpathy describing a coding style where developers “fully give in to the vibes, embrace exponentials, and forget that the code even exists.”

The problem isn’t that the AI-generated code is visibly broken; it’s that it is plausible but incorrect. It matches the expected structure and passes all of its tests, yet fails under real-world conditions.

Why This Matters for Developers

The SQLite reimplementation contains 576,000 lines of Rust code across 625 files, 3.7x more code than SQLite itself. Yet it still misses the critical is_ipk check that selects the correct search operation.

This highlights a crucial point: competence in software development isn’t about writing the most lines of code. It’s about knowing which details matter and which can be safely ignored.

The Broader Pattern

This isn’t an isolated incident. The same developer created an 82,000-line Rust “cleanup daemon” to solve a simple problem that a one-line cron job could handle. The pattern repeats: AI generates sophisticated solutions to problems that already have simple answers.

What This Means for AI in Development

The research suggests that LLMs work best when developers define clear acceptance criteria before any code is generated. Without specific, measurable conditions, you’re not programming—you’re generating tokens and hoping they produce the right result.
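One way to make such criteria concrete is to write them as an executable check before any implementation exists. The sketch below is an assumption about what that could look like for a key lookup: the functions linear_lookup, binary_lookup, and meets_criteria, and the budget of ceil(log2(n)) + 1 comparisons, are all illustrative choices, not something taken from the research.

```python
import math


def linear_lookup(keys, target):
    """Scan every key in order; returns (found, comparisons)."""
    steps = 0
    for key in keys:
        steps += 1
        if key == target:
            return True, steps
    return False, steps


def binary_lookup(keys, target):
    """Binary search over sorted keys; returns (found, comparisons)."""
    lo, hi, steps = 0, len(keys), 0
    while lo < hi:
        mid = (lo + hi) // 2
        steps += 1
        if keys[mid] == target:
            return True, steps
        if keys[mid] < target:
            lo = mid + 1
        else:
            hi = mid
    return False, steps


def meets_criteria(lookup, n=10_000):
    """Criterion 1: present keys are found. Criterion 2: a logarithmic comparison budget."""
    keys = list(range(n))
    budget = math.ceil(math.log2(n)) + 1
    for target in (0, n // 2, n - 1):
        found, steps = lookup(keys, target)
        if not found or steps > budget:
            return False
    return True


print("linear:", meets_criteria(linear_lookup))
print("binary:", meets_criteria(binary_lookup))
```

Both implementations find every key, so a correctness-only test accepts both. The comparison budget is the measurable condition that separates them.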

As one expert puts it: “The code is not yours until you understand it well enough to break it.”

The Path Forward

If you’re using AI tools for coding in 2026 (and most developers are), the question isn’t whether the output compiles. It’s whether you could find the bug yourself. Prompting with “find all bugs and fix them” won’t work for semantic issues like choosing the wrong algorithm or syscall.

The solution? Define what correct means, then measure. Don’t trust vibes—trust benchmarks.
