
How to Perform Fuzzy Searches on Encrypted Data: Methods, Pros & Cons

This article examines why encrypted data hinders fuzzy queries and compares three categories of solutions—naïve, conventional, and advanced—detailing their implementation steps, performance trade‑offs, storage costs, and practical suitability for real‑world systems.

dbaplus Community

When sensitive fields such as phone numbers or bank cards are stored encrypted, traditional fuzzy search becomes difficult; this article categorises and analyses three families of techniques for enabling fuzzy queries on reversible‑encrypted data.

1. Naïve Approaches

Naïve 1: Load the entire dataset into memory, decrypt it, and perform fuzzy matching in application code.

Naïve 2: Create a plaintext “tag” table that maps each ciphertext to its clear‑text value, then query the tag table with fuzzy conditions.

Naïve 1 works only for very small tables (hundreds to a few thousand rows). Every row must be fetched and decrypted for each search, and ciphertext is larger than the plaintext it protects (e.g., an 11‑byte phone number encrypted with DES occupies 24 bytes), so memory use grows quickly with table size and can cause out‑of‑memory failures.
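The 24‑byte figure can be reproduced with a little arithmetic. The sketch below assumes the ciphertext is padded PKCS#5‑style to DES's 8‑byte block and then stored as Base64 text; the article does not state the padding or encoding, so both are assumptions, and the function names are illustrative.

```python
import base64
import math

DES_BLOCK = 8  # DES block size in bytes


def padded_len(plain_len: int, block: int = DES_BLOCK) -> int:
    """Length after PKCS#5-style padding, which always adds 1..block bytes."""
    return (plain_len // block + 1) * block


def base64_len(raw_len: int) -> int:
    """Length of the padded Base64 encoding of raw_len bytes."""
    return 4 * math.ceil(raw_len / 3)


def stored_len(plain_len: int) -> int:
    """Bytes stored for one field: block padding, then Base64 (assumed)."""
    return base64_len(padded_len(plain_len))


phone_len = 11                        # an 11-digit phone number
cipher_len = padded_len(phone_len)    # 16 bytes of raw ciphertext
text_len = stored_len(phone_len)      # 24 Base64 characters
ratio = round(text_len / phone_len, 2)

# Cross-check the Base64 formula against the standard library.
check_len = len(base64.b64encode(bytes(cipher_len)))
```

Under these assumptions an 11‑byte value becomes 16 bytes of ciphertext and 24 characters of stored text, roughly 2.18 times the original size.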

Naïve 2 defeats the purpose of encryption by storing a clear‑text lookup table, exposing the data and adding unnecessary maintenance overhead, so it is strongly discouraged.

2. Conventional Approaches

These methods are widely adopted and balance security with query friendliness.

Conventional 1: Implement the same encryption/decryption algorithm as the application inside the database and change the fuzzy condition to decode(key) LIKE '%partial%'.

Conventional 2: Tokenise the plaintext, encrypt each token, store the encrypted tokens in an auxiliary column, and query that column with key LIKE '%partial%', where partial is the search term encrypted the same way.

Conventional 1 is easy to adopt and requires only minor changes to existing queries, but it cannot leverage indexes and may suffer from algorithm mismatches between the application and the database.
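A minimal sketch of Conventional 1, using Python's built‑in sqlite3 module so that it runs without a database server. The XOR‑plus‑Base64 "cipher", the key, the users table, and the toy_decrypt function name are all hypothetical stand‑ins chosen for illustration; a real deployment would register or reuse the application's actual algorithm in its own database.

```python
import base64
import sqlite3

KEY = b"demo-key"  # hypothetical key; the toy cipher below is NOT secure


def _xor(data: bytes) -> bytes:
    """XOR every byte with the repeating key (its own inverse)."""
    return bytes(b ^ KEY[i % len(KEY)] for i, b in enumerate(data))


def toy_encrypt(plain: str) -> str:
    """Stand-in for the application's reversible encryption."""
    return base64.b64encode(_xor(plain.encode("utf-8"))).decode("ascii")


def toy_decrypt(cipher: str) -> str:
    """The same algorithm, exposed to the database as a SQL function."""
    return _xor(base64.b64decode(cipher)).decode("utf-8")


def search(conn: sqlite3.Connection, partial: str) -> list[int]:
    """Fuzzy search: decrypt every row inside the database, then LIKE."""
    rows = conn.execute(
        "SELECT id FROM users WHERE toy_decrypt(phone) LIKE ? ORDER BY id",
        (f"%{partial}%",),
    )
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
conn.create_function("toy_decrypt", 1, toy_decrypt)
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, phone TEXT)")
conn.executemany(
    "INSERT INTO users (id, phone) VALUES (?, ?)",
    [(1, toy_encrypt("13812345678")), (2, toy_encrypt("13987654321"))],
)

hits = search(conn, "1234")  # only row 1 contains "1234"
```

Because the function is applied to the column, the database has to decrypt every row before it can evaluate the LIKE condition, which is why this approach cannot use an index on the encrypted column.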

Conventional 2 splits the plaintext of a field into overlapping fixed‑length tokens (e.g., four English characters or two Chinese characters), sliding one character at a time, and encrypts each token. For example, the string ningyu1 yields the tokens ning, ingy, ngyu, gyu1. The encrypted tokens are stored in an auxiliary column, and a query encrypts the search term the same way and matches it with LIKE '%partial%'. The cost is storage: every token is encrypted separately, and encrypted data expands (DES expands 11 bytes to 24 bytes, a 2.18× increase). In return, the approach allows index utilisation and gives acceptable performance for moderate data volumes.

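The token length matters. A search term must contain at least one full token, so terms shorter than the token length cannot be matched. Choosing tokens shorter than four English characters or two Chinese characters produces more tokens per value, which raises storage costs, and reduces security because short tokens have few possible values and are easier to guess.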
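The token scheme can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the implementation used by any of the platforms listed below: the key is hypothetical, and a truncated HMAC stands in for the deterministic per‑token encryption. HMAC is one‑way, which is acceptable here only because the token column is matched and never decrypted; the field itself would still be stored separately under reversible encryption.

```python
import hashlib
import hmac

TOKEN_LEN = 4                          # four English characters, as in the text
KEY = b"demo-key-not-for-production"   # hypothetical key


def tokens(text: str, n: int = TOKEN_LEN) -> list[str]:
    """Sliding-window tokens, e.g. 'ningyu1' -> ning, ingy, ngyu, gyu1."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def seal(token: str) -> str:
    """Deterministic keyed stand-in for encrypting one token (assumption)."""
    digest = hmac.new(KEY, token.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def token_column(text: str) -> str:
    """Value of the auxiliary column: sealed tokens concatenated in order."""
    return "".join(seal(t) for t in tokens(text))


def matches(column_value: str, term: str) -> bool:
    """Equivalent of: column LIKE '%' || token_column(term) || '%'."""
    if len(term) < TOKEN_LEN:
        raise ValueError("search term is shorter than one token")
    return token_column(term) in column_value


stored = token_column("ningyu1")
found = matches(stored, "ingyu")     # spans the tokens ingy and ngyu
missing = matches(stored, "abcd")    # not a substring of ningyu1
```

Because the tokens are stored in order, a search term longer than one token produces a run of sealed tokens that appears contiguously in the stored column, so a single substring match covers it. A plain substring test can in principle match at a position that is not a token boundary, so an application would normally decrypt the candidate rows and confirm the match.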

Reference implementations from major e‑commerce platforms illustrate this technique:

Taobao: https://open.taobao.com/docV3.htm?docId=106213&docType=1

Alibaba: https://jaq-doc.alibaba.com/docs/doc.htm?treeId=1&articleId=106213&docType=1

Pinduoduo: https://open.pinduoduo.com/application/document/browse?idStr=3407B605226E77F2

JD.com: https://jos.jd.com/commondoc?listId=345

3. Advanced (Algorithmic) Approaches

These solutions require deep cryptographic research and often involve designing new algorithms that preserve order and limit ciphertext growth while supporting fuzzy matching.

Design a reversible encryption scheme where ciphertext retains the same ordering as plaintext, enabling direct fuzzy matching on encrypted values.

Relevant research and blog posts include:

Database character fuzzy‑match encryption methods: https://www.jiamisoft.com/blog/6542-zifushujumohupipeijiamifangfa.html

Hill cipher and FMES fuzzy encryption: (see discussion in the linked article)

Bloom‑filter‑based encrypted fuzzy search: http://kzyjc.cnjournals.com/html/2019/1/20190112.htm

Fast‑query encrypted databases: https://www.jiamisoft.com/blog/5961-kuaisuchaxunshujukujiami.html

Lucene‑based encrypted fuzzy search: https://www.cnblogs.com/arthurqin/p/6307153.html

Verifiable fuzzy search in cloud storage: http://jeit.ie.ac.cn/fileDZYXXXB/journal/article/dzyxxxb/2017/7/PDF/160971.pdf

Conclusion

Naïve methods are only viable for tiny datasets and should be avoided in production. Conventional approaches—especially the token‑based method (Conventional 2)—offer a practical balance of security, implementation effort, and query performance for most applications. When high security and performance are critical and expertise is available, advanced algorithmic solutions can be explored.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
