Human‑Crafted XOR Trick Beats LLMs in Detecting Redis Vector Set Bugs
The author recounts fixing a complex Redis Vector Sets bug, explores how human creativity outperforms LLMs in devising efficient data‑consistency checks, and shares experimental ideas—including XOR accumulators and MurmurHash—to detect non‑mutual links in large HNSW graphs.
Original link: antirez.com/user/antirez
The original author, antirez, the creator of Redis, uses a story to show that human creativity still outperforms large language models (LLMs).
He explains that he is not anti‑AI; he frequently uses LLMs for code review, idea generation, and exploring solutions, but emphasizes that current AI is far from human intelligence.
A Real Case from Redis Vector Sets
While fixing a complex bug in Redis's Vector Sets module, he introduced a new data-safety mechanism: even if the checksum passes, the system rejects loading when structural inconsistencies are detected. The mechanism is disabled by default but provides an extra safety net.
To speed up saving and loading the HNSW graph structure in RDB files, he serialized only the connection graph instead of individual vector elements, storing each node’s neighbor list as integer IDs and reconstructing pointers on load.
However, this approach opens the door to a classic failure chain:
1. If the serialized data is corrupted, a link may exist from A to B but not back from B to A.
2. When B is deleted, the reference from A to B is not cleared, because deletion walks the (missing) reverse link.
3. A later scan of the graph then follows A's stale pointer to the freed B, a use-after-free.
Human vs. LLM
To detect non‑mutual links after loading, the naïve exhaustive check (enumerating every neighbor for every node and layer) is O(N²) and would double loading time for a 20‑million‑vector graph.
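The naïve check can be sketched on a toy representation (an assumption for illustration: `graph[node][layer]` is a list of neighbor IDs). The inner membership test over each neighbor's own list is what makes the exhaustive scan quadratic:

```python
# Minimal sketch of the naive reciprocity check on a toy graph,
# where graph[node][layer] is a list of neighbor IDs.
def find_non_reciprocal(graph):
    bad = []
    for a, layers in graph.items():
        for layer, neighbors in layers.items():
            for b in neighbors:
                # Linear scan of b's neighbor list: this inner lookup
                # is what blows up the cost on a 20M-vector graph.
                if a not in graph.get(b, {}).get(layer, []):
                    bad.append((a, b, layer))
    return bad

good = {1: {0: [2]}, 2: {0: [1]}}
broken = {1: {0: [2]}, 2: {0: []}}  # link 1->2 has no 2->1 back-link
```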
He asked Gemini 2.5 Pro for an efficient solution. The model suggested sorting neighbor lists and using binary search, which the author already knew. When pressed for alternatives, Gemini had none.
The author proposed recording each link as a key A:B:X (X is the layer) while ensuring A > B to avoid duplicates, storing them in a hash table, and deleting the entry when the reverse link is found. A non‑empty hash table after scanning indicates non‑mutual links.
Gemini pointed out the overhead of constructing keys with snprintf() and hashing. The author countered that a fixed‑size key can be built with three memcpy() calls, eliminating the need for snprintf().
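The fixed-size binary key the author describes — the Python analogue of the three `memcpy()` calls — can be packed in one line with `struct`:

```python
import struct

# 20-byte fixed key: two 8-byte node IDs plus a 4-byte layer, the
# Python analogue of building the key with three memcpy() calls in C.
def make_key(a, b, layer):
    hi, lo = max(a, b), min(a, b)   # normalize so the larger ID comes first
    return struct.pack("<QQI", hi, lo, layer)
```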
He then suggested an XOR accumulator: for each link A:B:X (8 + 8 + 4 bytes), XOR the three values into a fixed‑size accumulator. Mutual links cancel out, leaving a non‑zero accumulator if any link is missing. He warned about possible collisions (false negatives) and noted that pointer patterns could be predictable.
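A minimal sketch of the XOR accumulator, packing the three raw values into one integer for brevity: a link seen in both directions contributes the same value twice and cancels, so any surviving non-zero bits signal a missing reverse link. As the author notes, raw values make collisions plausible — two different missing links could also cancel each other out:

```python
# XOR-accumulator sketch: fold every directed link into one register.
# Mutual pairs contribute identical values twice and cancel to zero.
def xor_check(links):
    acc = 0
    for a, b, layer in links:
        hi, lo = max(a, b), min(a, b)            # same key for both directions
        acc ^= (hi << 96) | (lo << 32) | layer   # pack raw values together
    return acc  # zero => all links (probably) mutual
```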
To reduce collision risk, he proposed using a high‑quality hash function such as MurmurHash‑128 with a random seed from /dev/urandom. Each key S:A:B:X (S is the seed) is hashed, and the 128‑bit result is XORed into a 128‑bit register. A non‑zero register after processing indicates non‑mutual links, with a vanishingly small false‑negative rate.
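The seeded variant can be sketched as follows. MurmurHash-128 is not in the Python standard library, so a keyed `blake2b` digest is used here purely as a stand-in for a seeded 128-bit hash; the structure of the check is the same:

```python
import os
import struct
from hashlib import blake2b  # keyed stand-in for seeded MurmurHash-128

SEED = os.urandom(16)        # random per-process seed, as from /dev/urandom

def hashed_xor_check(links):
    acc = 0
    for a, b, layer in links:
        hi, lo = max(a, b), min(a, b)
        key = struct.pack("<QQI", hi, lo, layer)  # fixed-size binary key
        digest = blake2b(key, digest_size=16, key=SEED).digest()
        acc ^= int.from_bytes(digest, "little")   # 128-bit XOR register
    return acc  # zero => links mutual, except with ~2**-128 probability
```

Seeding the hash per process means an attacker (or an unlucky pointer pattern) cannot pre-compute colliding links, which is the author's point about predictable bit patterns.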
Gemini approved this approach, noting its practicality for a best‑effort feature that is disabled by default but valuable for performance‑critical systems.
Conclusion: Why Humans Still Lead
The author concludes that human brains retain a decisive creative advantage, allowing unconventional yet effective solutions that LLMs struggle to generate.
He also acknowledges Gemini’s role as a valuable “intelligent rubber duck,” helping accelerate the brainstorming process.
For high‑performance system developers, LLMs are powerful assistants, but they complement rather than replace human ingenuity.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.