Information Security 13 min read

How to Perform Fuzzy Search on Encrypted Data

This article examines the challenges of fuzzy searching encrypted data and compares three implementation approaches—naïve, conventional, and advanced—detailing their principles, performance implications, storage costs, and security trade‑offs, ultimately recommending the conventional token‑based method for most practical applications.

IT Architects Alliance

Aug 4, 2024

How to Perform Fuzzy Search on Encrypted Data

We know that encrypted data is not friendly to fuzzy queries; this article explores ideas for implementing fuzzy search on encrypted data.

For data security, important fields such as passwords, phone numbers, addresses, bank cards are often stored encrypted. Passwords typically use irreversible slow hash algorithms to resist brute‑force attacks.

When retrieving data we can match ciphertext directly, but for fields like phone numbers we need to view the original value and support fuzzy search, so we examine methods for reversible encrypted data.

Many online posts discuss “fuzzy search on encrypted data”, some of which are unreliable; we categorize three types of solutions—naïve, conventional, and advanced—and discuss their pros and cons.

How to Perform Fuzzy Search on Encrypted Data

Naïve Approaches

Load all data into memory, decrypt it, and perform fuzzy matching in the application.

Create a plaintext mapping table (often called a tag table) and query the tags to locate the ciphertext.

Naïve One

For small data sets this can work, but memory consumption grows quickly. For example, the phone number 13800138000 encrypted with DES becomes HE9T75xNx6c5yLmS5l4r6Q==, which occupies 24 bytes.

Count

Bytes

1,000,000

24,000,000

22.89

10,000,000

240,000,000

228.89

100,000,000

2,400,000,000

2,288.89

With hundreds of megabytes to several gigabytes of memory required, this approach easily leads to out‑of‑memory errors for large data sets.

Naïve Two

Creating a plaintext mapping table defeats the purpose of encryption and introduces security and maintenance problems, so this method is strongly discouraged.

Conventional Approaches

Implement encryption/decryption functions in the database and use a query such as decode(key) like '%partial%'.

Tokenize the ciphertext, encrypt each token, store the encrypted tokens in an extra column, and query with key like '%partial%'.

Conventional One

Embedding the same encryption algorithm in the database and modifying fuzzy‑search conditions to decrypt before matching has low implementation cost, but it cannot leverage indexes and may suffer from algorithm mismatches between the application and the database.

If performance requirements are modest and security needs are average, standard algorithms like AES or DES are acceptable.

Conventional Two

This method splits a field into fixed‑length groups (e.g., four English characters or two Chinese characters), encrypts each group, and stores the encrypted groups in an auxiliary column. Queries use key like "%partial%" to match the encrypted tokens.

ningyu1 can be tokenized as ning , ingy , ngyu , gyu1 , etc.

When searching for a substring such as “ingy”, the encrypted token is matched with key like "%partial%". Encryption typically expands data size; using DES, the 11‑byte plaintext 13800138000 becomes a 24‑byte ciphertext, a 2.18× increase.

The token length must be at least four English characters (or two Chinese characters); shorter tokens increase storage overhead and reduce security.

Several e‑commerce platforms (Taobao, Alibaba, Pinduoduo, JD) use similar schemes for encrypted fields while supporting fuzzy search.

Taobao encrypted field search: https://open.taobao.com/docV3.htm?docId=106213&docType=1

Alibaba encrypted field search: https://jaq-doc.alibaba.com/docs/doc.htm?treeId=1&articleId=106213&docType=1

Pinduoduo encrypted field search: https://open.pinduoduo.com/application/document/browse?idStr=3407B605226E77F2

JD encrypted field search: https://jos.jd.com/commondoc?listId=345

This approach balances implementation complexity, storage cost, and the ability to use database indexes, making it a practical choice.

Advanced Approaches

These solutions involve deeper algorithmic research, such as designing new encryption schemes that preserve order or using specialized structures like Bloom filters. References include Hill cipher‑based fuzzy encryption (FMES), BloomFilter‑enhanced encrypted search, and encrypted fuzzy search mechanisms based on Lucene.

Database fuzzy matching encryption methods: https://www.jiamisoft.com/blog/6542-zifushujumohupipeijiamifangfa.html

Hill cipher and FMES: see above link.

BloomFilter‑based encrypted fuzzy search: http://kzyjc.cnjournals.com/html/2019/1/20190112.htm

Fast query encrypted databases: https://www.jiamisoft.com/blog/5961-kuaisuchaxunshujukujiami.html

Lucene‑based cloud search on ciphertext: https://www.cnblogs.com/arthurqin/p/6307153.html

Verifiable fuzzy query encryption in cloud storage: http://jeit.ie.ac.cn/fileDZYXXXB/journal/article/dzyxxxb/2017/7/PDF/160971.pdf

The Lucene‑based idea is similar to Conventional Two, but stores encrypted tokens in a search engine rather than a relational database.

Summary

We have reviewed naïve, conventional, and advanced methods for fuzzy searching encrypted data. Naïve methods are discouraged; conventional methods, especially the token‑based approach, are recommended for most scenarios due to their reasonable trade‑off between cost, performance, and security.

Overall, the second conventional method offers the best balance of implementation effort, storage overhead, and query efficiency.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

algorithm database Fuzzy Search Encryption information security

Written by

IT Architects Alliance

Discussion and exchange on system, internet, large‑scale distributed, high‑availability, and high‑performance architectures, as well as big data, machine learning, AI, and architecture adjustments with internet technologies. Includes real‑world large‑scale architecture case studies. Open to architects who have ideas and enjoy sharing.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.