How to Perform Fuzzy Searches on Encrypted Data: Methods, Pros & Cons
This article examines why encrypted data hinders fuzzy queries and compares three categories of solutions—naïve, conventional, and advanced—detailing their implementation steps, performance trade‑offs, storage costs, and practical suitability for real‑world systems.
When sensitive fields such as phone numbers or bank cards are stored encrypted, traditional fuzzy search becomes difficult; this article categorises and analyses three families of techniques for enabling fuzzy queries on reversible‑encrypted data.
1. Naïve Approaches
Naïve 1: Load the entire dataset into memory, decrypt it, and perform fuzzy matching in application code.
Naïve 2: Create a plaintext “tag” table that maps each ciphertext to its clear‑text value, then query the tag table with fuzzy conditions.
Naïve 1 works only for very small tables (hundreds to a few thousand rows). Beyond that, memory consumption grows quickly, in part because encrypted fields are larger than their plaintext (e.g., a DES‑encrypted phone number occupies 24 bytes), and the process can run out of memory.
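Naïve 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: toy_encrypt and toy_decrypt are hypothetical stand-ins for the application's real reversible cipher (they are not secure), and the key and sample phone numbers are made up.

```python
import hashlib

KEY = b"demo-key"  # hypothetical key, for illustration only


def _keystream(length: int) -> bytes:
    """Derive a repeatable keystream from KEY (toy construction, NOT secure)."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(KEY + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:length]


def toy_encrypt(plaintext: str) -> str:
    """Stand-in for the real cipher: XOR with the keystream, hex-encoded."""
    data = plaintext.encode("utf-8")
    return bytes(a ^ b for a, b in zip(data, _keystream(len(data)))).hex()


def toy_decrypt(ciphertext: str) -> str:
    """Inverse of toy_encrypt."""
    data = bytes.fromhex(ciphertext)
    return bytes(a ^ b for a, b in zip(data, _keystream(len(data)))).decode("utf-8")


def naive_fuzzy_search(encrypted_rows, partial):
    """Decrypt every row in memory and keep those whose plaintext contains `partial`."""
    matches = []
    for row_id, ciphertext in encrypted_rows:
        plaintext = toy_decrypt(ciphertext)
        if partial in plaintext:
            matches.append((row_id, plaintext))
    return matches


# Made-up sample data: (id, encrypted phone number)
rows = [
    (1, toy_encrypt("13800138000")),
    (2, toy_encrypt("13912345678")),
    (3, toy_encrypt("15000138111")),
]
result = naive_fuzzy_search(rows, "0138")
```

Every query touches and decrypts every row, which is why the cost scales with table size rather than with the number of matches.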
Naïve 2 defeats the purpose of encryption by storing a clear‑text lookup table, exposing the data and adding unnecessary maintenance overhead, so it is strongly discouraged.
2. Conventional Approaches
These methods are widely adopted and balance security with query friendliness.
Conventional 1: Implement the same encryption/decryption algorithm that the application uses inside the database, and rewrite the fuzzy condition as decode(key) LIKE '%partial%'.
Conventional 2: Tokenise the plaintext into fixed‑length segments, encrypt each token, store the encrypted tokens in an auxiliary column, and query with key LIKE '%partial%', where partial is the encrypted search term.
Conventional 1 is easy to adopt and requires only minor changes to existing queries, but it cannot leverage indexes and may suffer from algorithm mismatches between the application and the database.
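A minimal sketch of Conventional 1, using SQLite only because it ships with Python. The table users, the column phone_enc, the key, and the toy cipher are all hypothetical; a production database would expose its own decryption function rather than a Python callback, and the toy cipher here is not secure.

```python
import hashlib
import sqlite3

KEY = b"demo-key"  # hypothetical key, for illustration only


def _keystream(length: int) -> bytes:
    """Derive a repeatable keystream from KEY (toy construction, NOT secure)."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(KEY + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:length]


def toy_encrypt(plaintext: str) -> str:
    data = plaintext.encode("utf-8")
    return bytes(a ^ b for a, b in zip(data, _keystream(len(data)))).hex()


def toy_decrypt(ciphertext: str) -> str:
    data = bytes.fromhex(ciphertext)
    return bytes(a ^ b for a, b in zip(data, _keystream(len(data)))).decode("utf-8")


conn = sqlite3.connect(":memory:")
# Register the decryption routine inside the database under the name decode().
conn.create_function("decode", 1, toy_decrypt)
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, phone_enc TEXT)")
conn.executemany(
    "INSERT INTO users (id, phone_enc) VALUES (?, ?)",
    [
        (1, toy_encrypt("13800138000")),
        (2, toy_encrypt("13912345678")),
        (3, toy_encrypt("15000138111")),
    ],
)


def fuzzy_search(partial: str):
    """Run decode(column) LIKE '%partial%'; the database decrypts each row it scans."""
    cur = conn.execute(
        "SELECT id FROM users WHERE decode(phone_enc) LIKE ? ORDER BY id",
        (f"%{partial}%",),
    )
    return [row[0] for row in cur]
```

Because decode() has to run on every row before the comparison, the query is a full scan, which matches the index limitation described above.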
Conventional 2 splits the plaintext field into overlapping fixed‑length segments (e.g., four English characters or two Chinese characters) and encrypts each segment. For example, the string ningyu1 becomes the tokens ning, ingy, ngyu, gyu1. At query time the search term is encrypted the same way and matched against the stored tokens with LIKE '%partial%'. This approach incurs storage overhead because encrypted data expands (DES expands 11 bytes to 24 bytes, a 2.18× increase), but it allows index utilisation and acceptable performance for moderate data volumes.
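The token scheme can be sketched as follows. Several details are assumptions made for the example rather than part of the method as described: the toy cipher (a hypothetical, insecure stand-in for DES or AES), the comma delimiter between stored tokens, and the table and column names.

```python
import hashlib
import sqlite3

KEY = b"demo-key"   # hypothetical key, for illustration only
TOKEN_SIZE = 4      # four English characters, as in the example above


def toy_encrypt(plaintext: str) -> str:
    """Deterministic toy cipher (NOT secure): XOR with a key-derived stream, hex-encoded."""
    data = plaintext.encode("utf-8")
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(KEY + counter.to_bytes(4, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream)).hex()


def ngrams(text: str, size: int = TOKEN_SIZE):
    """Sliding-window split: 'ningyu1' -> ning, ingy, ngyu, gyu1."""
    return [text[i:i + size] for i in range(len(text) - size + 1)]


def token_column(text: str) -> str:
    """Encrypt each token and join with commas, with a comma at both ends."""
    return "," + ",".join(toy_encrypt(t) for t in ngrams(text)) + ","


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, name_enc TEXT, name_tokens TEXT)"
)
for row_id, name in [(1, "ningyu1"), (2, "yuning2"), (3, "beijing")]:
    conn.execute(
        "INSERT INTO users VALUES (?, ?, ?)",
        (row_id, toy_encrypt(name), token_column(name)),
    )


def fuzzy_search(partial: str):
    """Encrypt the search term's tokens and match them against the auxiliary column."""
    if len(partial) < TOKEN_SIZE:
        raise ValueError("search term must be at least TOKEN_SIZE characters long")
    pattern = "%" + token_column(partial) + "%"
    cur = conn.execute(
        "SELECT id FROM users WHERE name_tokens LIKE ? ORDER BY id", (pattern,)
    )
    return [row[0] for row in cur]
```

The delimiter keeps a match aligned to token boundaries, and a search term longer than one token still matches because its consecutive tokens are stored consecutively. The plaintext never reaches the database; only the encrypted search tokens do.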
When the token length is too short (fewer than four English characters or two Chinese characters), each field produces more tokens, which raises storage costs, and short tokens have fewer possible values, which reduces security. A search term must therefore be at least one token long.
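The storage figures can be checked with a short calculation. The sketch assumes DES with PKCS#5 padding, Base64-encoded output, and one byte per character; the function names are made up for the example.

```python
import math

DES_BLOCK_BYTES = 8  # DES block size


def des_base64_len(plain_bytes: int) -> int:
    """Length of the Base64 text for a DES ciphertext with PKCS#5 padding."""
    # PKCS#5 always adds 1..8 padding bytes, rounding up to the next full block.
    padded = (plain_bytes // DES_BLOCK_BYTES + 1) * DES_BLOCK_BYTES
    # Base64 emits 4 characters for every 3 input bytes, rounded up.
    return 4 * math.ceil(padded / 3)


def token_storage(field_len: int, token_size: int):
    """Return (token count, total stored characters) for a sliding-window split."""
    count = max(field_len - token_size + 1, 0)
    return count, count * des_base64_len(token_size)


whole_field = des_base64_len(11)       # an 11-byte phone number
ratio = round(whole_field / 11, 2)     # expansion factor for the whole field

for size in (4, 3, 2):
    print(size, token_storage(11, size))
```

Under these assumptions an 11-byte field encrypts to 24 characters (2.18×). Split into tokens, the same field needs 8 tokens and 96 characters at token length 4, and 10 tokens and 120 characters at token length 2.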
Reference implementations from major e‑commerce platforms illustrate this technique:
Taobao: https://open.taobao.com/docV3.htm?docId=106213&docType=1
Alibaba: https://jaq-doc.alibaba.com/docs/doc.htm?treeId=1&articleId=106213&docType=1
Pinduoduo: https://open.pinduoduo.com/application/document/browse?idStr=3407B605226E77F2
JD.com: https://jos.jd.com/commondoc?listId=345
3. Advanced (Algorithmic) Approaches
These solutions require deep cryptographic research and often involve designing new algorithms that preserve order and limit ciphertext growth while supporting fuzzy matching.
Design a reversible encryption scheme where ciphertext retains the same ordering as plaintext, enabling direct fuzzy matching on encrypted values.
Relevant research and blog posts include:
Database character fuzzy‑match encryption methods: https://www.jiamisoft.com/blog/6542-zifushujumohupipeijiamifangfa.html
Hill cipher and FMES fuzzy encryption: (see discussion in the linked article)
Bloom‑filter‑based encrypted fuzzy search: http://kzyjc.cnjournals.com/html/2019/1/20190112.htm
Fast‑query encrypted databases: https://www.jiamisoft.com/blog/5961-kuaisuchaxunshujukujiami.html
Lucene‑based encrypted fuzzy search: https://www.cnblogs.com/arthurqin/p/6307153.html
Verifiable fuzzy search in cloud storage: http://jeit.ie.ac.cn/fileDZYXXXB/journal/article/dzyxxxb/2017/7/PDF/160971.pdf
Conclusion
Naïve methods are only viable for tiny datasets and should be avoided in production. Conventional approaches—especially the token‑based method (Conventional 2)—offer a practical balance of security, implementation effort, and query performance for most applications. When high security and performance are critical and expertise is available, advanced algorithmic solutions can be explored.
dbaplus Community
