How to Perform Fuzzy Search on Encrypted Data
This article examines techniques for running fuzzy queries against encrypted fields. It compares naive in‑memory methods, conventional token‑based approaches that can use database indexes, and advanced cryptographic schemes, and recommends practical solutions for real‑world applications.
In the previous article we discussed data security and the difficulty of fuzzy searching encrypted data; this article explores implementation ideas for fuzzy queries on encrypted fields.
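Sensitive data such as passwords, phone numbers, addresses, and credit‑card information is stored in protected form, either reversibly (encryption) or irreversibly (typically hashing, as with passwords). Exact‑match queries remain straightforward as long as the transform is deterministic, because the search term can be transformed the same way and compared for equality; fuzzy search requires special handling.
The author groups the approaches into three categories: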
“Silly” methods: loading all data into memory for decryption and matching, or maintaining a plaintext tag table; these are only feasible for very small datasets and consume excessive memory.
Conventional methods: calling the database's decryption function in the WHERE clause, or splitting the plaintext into fixed‑length tokens, encrypting each token, and storing the results in an auxiliary column; a search term is then tokenized and encrypted the same way and matched with a LIKE query against that column.
Advanced (“god‑level”) methods: designing new algorithms, for example order‑preserving encryption, Bloom‑filter based schemes, or other research‑grade techniques, that aim to support matching on ciphertext without revealing the plaintext.
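The token‑based conventional method in the list above can be sketched roughly as follows. This is a minimal illustration, not the article's implementation: the names `tokens`, `protect`, `build_token_column`, and `matches` are hypothetical, the key is a placeholder, and a truncated HMAC‑SHA256 stands in for per‑token encryption because the Python standard library ships no block cipher. Any deterministic keyed transform plays the same role for matching purposes.

```python
import base64
import hashlib
import hmac

SECRET_KEY = b"demo-key-not-for-production"  # hypothetical placeholder key
TOKEN_LEN = 4  # sliding-window size, in characters


def tokens(text, size=TOKEN_LEN):
    """Return every contiguous substring of `size` characters."""
    return [text[i:i + size] for i in range(len(text) - size + 1)]


def protect(token):
    """Deterministic keyed transform standing in for per-token encryption.

    Assumption: HMAC-SHA256 truncated to 9 bytes, so that Base64 yields
    exactly 12 characters with no '=' padding.
    """
    digest = hmac.new(SECRET_KEY, token.encode("utf-8"), hashlib.sha256).digest()
    return base64.b64encode(digest[:9]).decode("ascii")


def build_token_column(plaintext):
    """Value for the auxiliary column: protected tokens, concatenated in order."""
    return "".join(protect(t) for t in tokens(plaintext))


def matches(token_column, term):
    """Mimic `token_column LIKE '%<protected term>%'` with a substring test."""
    if len(term) < TOKEN_LEN:
        raise ValueError(f"search term must be at least {TOKEN_LEN} characters")
    needle = "".join(protect(t) for t in tokens(term))
    return needle in token_column


column = build_token_column("13800138000")
print(matches(column, "0013"))     # True: "0013" is a substring of the number
print(matches(column, "8001380"))  # True: longer terms map to consecutive tokens
print(matches(column, "1234"))     # False
```

Because consecutive tokens of a search term are also consecutive tokens of any value that contains it, a term longer than the window still reduces to a single substring test. Terms shorter than the window cannot be searched at all, which is the trade‑off this scheme accepts.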
The examples use DES encryption: the 11‑byte plaintext 13800138000 becomes the ciphertext HE9T75xNx6c5yLmS5l4r6Q==, which takes 24 bytes to store, more than double the original 11.
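The 24‑versus‑11 figure can be reproduced with a little arithmetic. The sketch below assumes PKCS#5‑style padding and Base64 output; neither is stated in the article, but both are consistent with the shape of the ciphertext. The helper names `padded_len` and `base64_len` are illustrative.

```python
import base64

DES_BLOCK_BYTES = 8  # DES operates on 8-byte blocks


def padded_len(n, block=DES_BLOCK_BYTES):
    """Length after PKCS#5-style padding, which always adds 1..block bytes."""
    return (n // block + 1) * block


def base64_len(n):
    """Length of the padded Base64 encoding of n raw bytes."""
    return 4 * ((n + 2) // 3)


plaintext = "13800138000"
ciphertext_b64 = "HE9T75xNx6c5yLmS5l4r6Q=="

raw = base64.b64decode(ciphertext_b64)
print(len(plaintext))                        # 11 plaintext bytes
print(padded_len(len(plaintext)))            # 16 bytes after padding (two DES blocks)
print(len(raw))                              # 16 raw ciphertext bytes
print(base64_len(len(raw)), len(ciphertext_b64))  # 24 24
```

So the overhead comes from two sources: padding up to the next block boundary (11 to 16 bytes) and the Base64 text encoding (16 to 24 characters).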
The conventional token‑based method is recommended as a balanced solution: it incurs additional storage for encrypted tokens but can leverage database indexes for efficient fuzzy search, especially when the search token length is at least four English characters or two Chinese characters.
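To get a feel for the extra storage, the size of the token column can be estimated up front. The sketch below is a rough calculation under stated assumptions, none of which come from the article: one token per sliding‑window position, an 8‑byte block cipher such as DES with PKCS#5‑style padding, and Base64‑encoded output. The function names are hypothetical.

```python
def token_count(n_chars, token_len=4):
    """Number of sliding-window tokens in a value of n_chars characters."""
    return max(n_chars - token_len + 1, 0)


def token_column_chars(n_chars, token_len=4, bytes_per_char=1, block=8):
    """Estimated Base64 characters needed to store all encrypted tokens."""
    plain_bytes = token_len * bytes_per_char
    cipher_bytes = (plain_bytes // block + 1) * block  # padded to block size
    b64_chars = 4 * ((cipher_bytes + 2) // 3)          # Base64 expansion
    return token_count(n_chars, token_len) * b64_chars


# 11-digit phone number, 4-character tokens, 1 byte per character:
print(token_count(11))         # 8 tokens
print(token_column_chars(11))  # 96 characters, versus 24 for the single ciphertext

# 10 Chinese characters, 2-character tokens, 3 bytes per character in UTF-8:
print(token_column_chars(10, token_len=2, bytes_per_char=3))  # 108
```

Under these assumptions the token column grows roughly linearly with the value length, at one encrypted token per character position, so the scheme suits short fields such as phone numbers or names better than long free text.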
Finally, the article concludes that “silly” approaches should be avoided, conventional tokenization is the most practical for most scenarios, and advanced cryptographic schemes are worth exploring when high security and performance are both critical.
Top Architect