How to Perform Fuzzy Queries on Encrypted Data: Methods, Pros and Cons

This article examines the challenges of fuzzy searching encrypted data and presents three categories of solutions—silly, conventional, and advanced—detailing their implementation ideas, performance trade‑offs, storage costs, and practical recommendations for secure yet searchable data.

Architect's Guide

In the previous article we discussed data security and how to prevent data leaks. This follow-up focuses on running fuzzy queries against encrypted data and explores practical ways to implement them.

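To protect sensitive information such as phone numbers, addresses, and bank card numbers, developers often encrypt these fields. Passwords only ever need to be verified, so they are usually stored with an irreversible, deliberately slow hash function. Other fields must use reversible encryption because the application needs the original values back, and those are exactly the fields users also expect to look up and fuzzy-search.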
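The contrast between passwords and other sensitive fields can be made concrete with a short sketch. A password only has to be verified, never read back, so a salted slow hash is enough. The example below uses PBKDF2-HMAC-SHA256 from Python's standard library; the iteration count and salt size are illustrative assumptions, not recommendations from this article.

```python
import hashlib
import hmac
import os
from typing import Tuple

ITERATIONS = 200_000  # illustrative work factor (assumption); tune per current guidance


def hash_password(password: str) -> Tuple[bytes, bytes]:
    """Return (salt, digest). The digest cannot be turned back into the password."""
    salt = os.urandom(16)  # random per-password salt
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password: str, salt: bytes, digest: bytes) -> bool:
    """Recompute the slow hash with the stored salt and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return hmac.compare_digest(candidate, digest)


salt, digest = hash_password("s3cret")
print(verify_password("s3cret", salt, digest))  # True
print(verify_password("wrong", salt, digest))   # False
```

Nothing here can recover the original password, which is exactly why the same technique cannot be used for a phone number that the application must display or search.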

How to Perform Fuzzy Queries on Encrypted Data

The approaches can be divided into three categories:

Silly approaches – naive implementations without proper design.

Conventional approaches – widely used methods that balance performance and security.

Advanced approaches – algorithm‑level designs that aim for optimal security and efficiency.

Silly Approaches

Load all data into memory, decrypt it, and perform fuzzy matching in the application.

Create a plaintext mapping (tag) table for the ciphertext and query the tag table for fuzzy matches.

Silly Approach 1

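Decrypting the entire dataset in memory works only for very small tables; for larger volumes it quickly leads to out-of-memory failures. Encryption also inflates every value: the phone number 13800138000 (11 bytes), encrypted with DES and Base64-encoded, becomes HE9T75xNx6c5yLmS5l4r6Q==, which occupies 24 bytes. Across a full table this adds up to hundreds of megabytes or several gigabytes of ciphertext, all of which must be loaded and decrypted to answer a query, which can easily exhaust application memory.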
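Silly Approach 1 boils down to the pattern below. It is a sketch only: `decrypt` stands for whatever reversible cipher the application uses, and the demo substitutes Base64 purely so the example runs with the standard library. Base64 is an encoding, not encryption, and the function names are hypothetical.

```python
import base64
from typing import Callable, Iterable, List, Tuple


def fuzzy_match_in_memory(
    rows: Iterable[Tuple[int, str]],
    decrypt: Callable[[str], str],
    fragment: str,
) -> List[int]:
    """Silly approach 1: decrypt every stored value, then substring-match in the app."""
    # Materialise the whole table in decrypted form; this list grows with the
    # table and is what eventually exhausts the heap.
    decrypted = [(row_id, decrypt(ciphertext)) for row_id, ciphertext in rows]
    return [row_id for row_id, plaintext in decrypted if fragment in plaintext]


# Demo stand-ins only: Base64 is NOT encryption.
def demo_encrypt(plaintext: str) -> str:
    return base64.b64encode(plaintext.encode("utf-8")).decode("ascii")


def demo_decrypt(ciphertext: str) -> str:
    return base64.b64decode(ciphertext).decode("utf-8")


rows = [(1, demo_encrypt("13800138000")), (2, demo_encrypt("13912345678"))]
print(fuzzy_match_in_memory(rows, demo_decrypt, "0013"))  # [1]
```

Every query pays for decrypting every row, and memory use scales with the whole table rather than with the number of matches.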

Silly Approach 2

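Maintaining a separate plaintext mapping table defeats the purpose of encryption: the sensitive values are stored in clear text anyway, so a leak of that table exposes them, and keeping it in sync with the ciphertext adds unnecessary complexity.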

Conventional Approaches

Two common methods are widely adopted:

Implement encryption/decryption functions in the database and use them in fuzzy queries, e.g., decode(key) like '%partial%'.

Tokenize the plaintext, encrypt each token, store the encrypted tokens in an auxiliary column, and query that column with key like '%partial%', where partial is the encrypted form of the search fragment.

Conventional Approach 1

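Using a database function to decrypt each value before applying a LIKE condition is easy to implement, but the function has to run on every row, so the database cannot use an index and falls back to a full scan. The decryption key must also be available to the database. The approach is acceptable when security requirements are moderate and query performance is not critical.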
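The following sketch imitates Conventional Approach 1 with SQLite from Python's standard library: a user-defined function named `decode` is registered on the connection and called inside the `LIKE` condition. As before, Base64 is only a stand-in for real decryption (in MySQL, for example, the built-in `AES_DECRYPT()` could play that role). The query plan reports a scan rather than an index search, because the function has to be evaluated for every row.

```python
import base64
import sqlite3


def demo_decode(ciphertext: str) -> str:
    # Stand-in for a real decryption function; Base64 is NOT encryption.
    return base64.b64decode(ciphertext).decode("utf-8")


def demo_encode(plaintext: str) -> str:
    return base64.b64encode(plaintext.encode("utf-8")).decode("ascii")


conn = sqlite3.connect(":memory:")
# Register decode() so SQL can call it, mirroring a DB-side decryption function.
conn.create_function("decode", 1, demo_decode)
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, phone TEXT)")
conn.execute("CREATE INDEX idx_users_phone ON users (phone)")
conn.executemany(
    "INSERT INTO users (id, phone) VALUES (?, ?)",
    [(1, demo_encode("13800138000")), (2, demo_encode("13912345678"))],
)

query = "SELECT id FROM users WHERE decode(phone) LIKE ?"
print([r[0] for r in conn.execute(query, ("%0013%",))])  # [1]

# The index on phone cannot be used to seek: decode() wraps the column,
# so the plan shows a SCAN over every row.
for row in conn.execute("EXPLAIN QUERY PLAN " + query, ("%0013%",)):
    print(row[-1])
```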

Conventional Approach 2

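This method slides a fixed-length window over the plaintext (e.g., four English characters or two Chinese characters), encrypts each resulting token, and stores the concatenated results in an extra column. A query encrypts the search fragment the same way and matches it with key like '%partial%'. Storage grows because ciphertext is longer than plaintext (DES with Base64 turns an 11-byte phone number into 24 bytes, a 2.18× increase) and because every value yields many overlapping tokens. In return, the database compares stored strings directly instead of decrypting every row, which makes it a reasonable trade-off. Note, however, that a pattern with a leading % still cannot seek an ordinary B-tree index.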
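The size figures above follow from simple arithmetic, assuming DES with PKCS#5 padding and no stored IV (as in ECB mode) plus standard Base64 output. Under those assumptions the 24-byte string in the earlier example is the Base64 text of a 16-byte DES ciphertext. The last lines estimate the extra search column for one phone number under a further assumption: each 4-character window is encrypted and Base64-encoded separately.

```python
import math


def padded_cipher_len(plaintext_len: int, block: int = 8) -> int:
    """Ciphertext bytes for a block cipher with PKCS#5/#7 padding (e.g. DES)."""
    # PKCS padding always adds 1..block bytes, so round up past the last full block.
    return (plaintext_len // block + 1) * block


def base64_len(n_bytes: int) -> int:
    """Characters produced by standard Base64, including '=' padding."""
    return 4 * math.ceil(n_bytes / 3)


raw = padded_cipher_len(11)  # 11-digit phone number -> 16 bytes of DES output
text = base64_len(raw)       # -> 24 Base64 characters
print(raw, text, round(text / 11, 2))  # 16 24 2.18

# Assumed token column: sliding 4-char windows, each encrypted on its own.
tokens = 11 - 4 + 1
token_col = tokens * base64_len(padded_cipher_len(4))
print(tokens, token_col)  # 8 96
```

Under these assumptions the search column alone is almost nine times the size of the 11-byte plaintext, which is the storage cost the token scheme trades for searchability.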

Typical implementations require a fuzzy search fragment of at least four English characters or two Chinese characters; shorter windows would multiply the number of tokens per value and drive storage costs even higher.
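A minimal sketch of the token scheme follows. Two substitutions are assumptions made to keep it within the standard library: the windows are plain 4-character slices (ASCII only, rather than the 4-English/2-Chinese rule), and each token is protected with a truncated HMAC-SHA256 instead of DES or AES. A keyed deterministic transform is all the search column needs, since the reversible ciphertext lives in a separate column (`phone_enc`, a placeholder here). The names `windows`, `protect_token`, `search_value` and `fuzzy_find` are hypothetical.

```python
import hashlib
import hmac
import sqlite3
from typing import List

WINDOW = 4      # assumed window: 4 characters (ASCII-only simplification)
TOKEN_HEX = 16  # hex characters kept per token (truncated 64-bit HMAC)


def windows(text: str, n: int = WINDOW) -> List[str]:
    """All overlapping n-char substrings: 'ningyu1' -> ning, ingy, ngyu, gyu1."""
    if len(text) < n:
        raise ValueError(f"fuzzy fragment must be at least {n} characters")
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def protect_token(key: bytes, token: str) -> str:
    """Deterministic keyed transform of one token (HMAC stand-in for DES/AES)."""
    return hmac.new(key, token.encode("utf-8"), hashlib.sha256).hexdigest()[:TOKEN_HEX]


def search_value(key: bytes, text: str) -> str:
    """Concatenated protected tokens; used for the stored column and for queries."""
    return "".join(protect_token(key, t) for t in windows(text))


KEY = b"demo-key-do-not-use"  # hypothetical key; load from a key manager in practice

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, phone_enc TEXT, phone_search TEXT)")
for row_id, phone in [(1, "13800138000"), (2, "13912345678")]:
    # phone_enc would hold the reversible ciphertext; a placeholder keeps the demo short.
    conn.execute(
        "INSERT INTO users VALUES (?, ?, ?)",
        (row_id, "<ciphertext>", search_value(KEY, phone)),
    )


def fuzzy_find(fragment: str) -> List[int]:
    # Consecutive windows of the fragment are consecutive windows of the value,
    # so their concatenated tokens appear contiguously in phone_search.
    pattern = "%" + search_value(KEY, fragment) + "%"
    sql = "SELECT id FROM users WHERE phone_search LIKE ?"
    return [r[0] for r in conn.execute(sql, (pattern,))]


print(fuzzy_find("0013"))   # [1]
print(fuzzy_find("2345"))   # [2]
print(fuzzy_find("00138"))  # [1]
```

Because identical windows always produce identical tokens, the search column reveals repetition patterns within and across values. That determinism is what lets `LIKE` match, and it is the security price of this approach.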

Several e‑commerce platforms use similar schemes:

Taobao encrypted field search: https://open.taobao.com/docV3.htm?docId=106213&docType=1

Alibaba encrypted field search: https://jaq-doc.alibaba.com/docs/doc.htm?treeId=1&articleId=106213&docType=1

Pinduoduo encrypted field search: https://open.pinduoduo.com/application/document/browse?idStr=3407B605226E77F2

JD encrypted field search: https://jos.jd.com/commondoc?listId=345

These schemes are essentially the same, and their documentation appears to copy one another's format.

The method is relatively simple to implement and, despite the extra storage, keeps queries as plain string comparisons without per-row decryption, making it a recommended compromise.

Advanced Approaches

These solutions involve deeper algorithmic research, sometimes designing new encryption schemes that preserve order or enable fuzzy matching without excessive ciphertext growth. Examples include Hill cipher‑based methods, FMES, Bloom‑filter‑enhanced searchable encryption, and other academic proposals.
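To give a flavor of the Bloom-filter direction, here is a toy sketch, not the scheme from the cited paper: each value's bigrams are mapped through a keyed hash into a Bloom filter, and a fragment matches when all of its bit positions are set. The filter size `M`, hash count `K`, the bigram choice and all function names are assumptions. The check ignores bigram order and can return false positives, which the application would weed out after decryption; the server only stores the bit set, while the key stays with whoever builds the query filters.

```python
import hashlib
import hmac
from typing import List

M = 256  # filter size in bits (assumed)
K = 3    # bit positions per bigram (assumed)


def bigrams(text: str) -> List[str]:
    return [text[i:i + 2] for i in range(len(text) - 1)]


def positions(key: bytes, gram: str) -> List[int]:
    """K keyed bit positions for one bigram, sliced from one HMAC-SHA256 digest."""
    digest = hmac.new(key, gram.encode("utf-8"), hashlib.sha256).digest()
    return [int.from_bytes(digest[4 * i:4 * i + 4], "big") % M for i in range(K)]


def build_filter(key: bytes, text: str) -> int:
    """Bloom filter (an int used as a bit set) over the keyed bigrams of a value."""
    bits = 0
    for gram in bigrams(text):
        for p in positions(key, gram):
            bits |= 1 << p
    return bits


def may_contain(key: bytes, stored: int, fragment: str) -> bool:
    """True if every bigram of the fragment may be present (false positives possible)."""
    query = build_filter(key, fragment)
    return (stored & query) == query


KEY = b"demo-key-do-not-use"  # hypothetical key
stored = build_filter(KEY, "13800138000")
print(may_contain(KEY, stored, "0013"))  # True
print(may_contain(KEY, stored, "abcdef"))
```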

Database character fuzzy match encryption: https://www.jiamisoft.com/blog/6542-zifushujumohupipeijiamifangfa.html

Bloom‑filter based encrypted fuzzy search research: http://kzyjc.cnjournals.com/html/2019/1/20190112.htm

Fast searchable encrypted databases: https://www.jiamisoft.com/blog/5961-kuaisuchaxunshujukujiami.html

Lucene‑based cloud search on ciphertext: https://www.cnblogs.com/arthurqin/p/6307153.html

Verified fuzzy searchable encryption in cloud storage: http://jeit.ie.ac.cn/fileDZYXXXB/journal/article/dzyxxxb/2017/7/PDF/160971.pdf

These approaches often require specialized expertise and may not be ready for direct production use, but they illustrate the frontier of searchable encryption.

Conclusion

We have reviewed the main strategies for searching encrypted data. Silly approaches are discouraged; conventional approaches, especially the token-based method, offer a practical balance of security, cost, and performance. Advanced algorithmic solutions are worth exploring when you have dedicated security engineers.

Overall, weighing return on investment against implementation effort, Conventional Approach 2 is highly recommended.

Author: Ningyu‑Yun
Source: ningyu1.github.io/20201230/encrypted-data-fuzzy-query.html
Copyright: Content is shared for learning purposes; original author retains rights.