How to Perform Fuzzy Searches on Encrypted Data: Methods, Pros & Cons
This article examines why encrypted data hinders fuzzy queries, categorizes three implementation strategies—from naïve to conventional to advanced—explains their mechanisms, evaluates performance and security trade‑offs, and provides practical references for building searchable encrypted fields.
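Encrypted data does not lend itself to fuzzy search: a query such as LIKE '%1380%' is evaluated against the stored ciphertext, not the plaintext, so it no longer finds substring matches. This article discusses the problem of fuzzy querying encrypted data and presents implementation ideas for developers.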
In practice, sensitive fields such as passwords, phone numbers, addresses, and bank card numbers are protected before they are stored. Passwords are stored with irreversible slow hash algorithms, while fields that must be displayed and searched, such as phone numbers, need reversible encryption.
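To make the distinction concrete, here is a minimal Java sketch using only the JDK. It assumes PBKDF2-HMAC-SHA256 as the slow hash and AES-GCM as the reversible cipher; the source does not name specific algorithms, and the class name FieldProtection and the iteration count are illustrative choices, not recommendations from the original article.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.PBEKeySpec;

// Hypothetical helper: one-way hashing for passwords, reversible encryption for display fields.
final class FieldProtection {
    private static final SecureRandom RANDOM = new SecureRandom();
    private static final int IV_BYTES = 12;
    private static final int TAG_BITS = 128;
    private static final int PBKDF2_ITERATIONS = 600_000; // illustrative; tune for your hardware

    // One-way: PBKDF2-HMAC-SHA256 with a per-user salt; deliberately slow, never decrypted.
    static byte[] hashPassword(char[] password, byte[] salt) {
        PBEKeySpec spec = new PBEKeySpec(password, salt, PBKDF2_ITERATIONS, 256);
        try {
            return SecretKeyFactory.getInstance("PBKDF2WithHmacSHA256").generateSecret(spec).getEncoded();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        } finally {
            spec.clearPassword();
        }
    }

    static SecretKey newAesKey() {
        try {
            KeyGenerator gen = KeyGenerator.getInstance("AES");
            gen.init(256);
            return gen.generateKey();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Reversible: AES-GCM with a random IV, returned as IV || ciphertext+tag.
    // The random IV means equal plaintexts give different ciphertexts, so SQL LIKE
    // on this column cannot find substrings of the plaintext.
    static byte[] encrypt(SecretKey key, String plaintext) {
        try {
            byte[] iv = new byte[IV_BYTES];
            RANDOM.nextBytes(iv);
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
            byte[] ct = cipher.doFinal(plaintext.getBytes(StandardCharsets.UTF_8));
            byte[] out = Arrays.copyOf(iv, IV_BYTES + ct.length);
            System.arraycopy(ct, 0, out, IV_BYTES, ct.length);
            return out;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    static String decrypt(SecretKey key, byte[] ivAndCiphertext) {
        try {
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, ivAndCiphertext, 0, IV_BYTES));
            byte[] pt = cipher.doFinal(ivAndCiphertext, IV_BYTES, ivAndCiphertext.length - IV_BYTES);
            return new String(pt, StandardCharsets.UTF_8);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The randomized ciphertext is good for confidentiality but is exactly what makes fuzzy search on the encrypted column hard.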
Many online posts propose solutions, some of which are unreliable. Below we classify the approaches into three categories and analyze their advantages and disadvantages.
Naïve Approaches
Load all data into memory, decrypt it, and perform fuzzy matching in the application.
Create a plaintext mapping table (tag table) for ciphertext and query the tag table.
Naïve Approach 1
Loading all data into memory works only for small datasets. For example, the phone number 13800138000 encrypted with DES becomes HE9T75xNx6c5yLmS5l4r6Q==, a 24-character Base64 string. With millions of records, every query must fetch and decrypt every row, so memory and CPU cost grow with the table, and the application quickly runs into out-of-memory errors or unacceptable latency.
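A sketch of Naïve Approach 1 in Java, assuming a hypothetical EncryptedRow record and a caller-supplied decrypt function (any real cipher can be plugged in). It shows why the approach does not scale: every row must be loaded and decrypted for every query.

```java
import java.util.List;
import java.util.function.UnaryOperator;
import java.util.stream.Collectors;

final class InMemoryFuzzySearch {
    // Hypothetical row shape: primary key plus the encrypted phone column as stored.
    record EncryptedRow(long id, String encryptedPhone) {}

    // Naïve Approach 1: every query touches and decrypts all n rows, so cost is O(n)
    // in both memory (the full list must be loaded first) and CPU (n decryptions).
    static List<Long> search(List<EncryptedRow> allRows, UnaryOperator<String> decrypt, String fragment) {
        return allRows.stream()
                .filter(row -> decrypt.apply(row.encryptedPhone()).contains(fragment))
                .map(EncryptedRow::id)
                .collect(Collectors.toList());
    }
}
```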
Naïve Approach 2
Maintaining a plaintext mapping table defeats the purpose of encryption, exposing the data and adding unnecessary complexity. This method is insecure and not recommended.
Conventional Approaches
Implement the encryption/decryption functions in the database and rewrite fuzzy queries to decrypt the column first, e.g. decode(column, key) LIKE '%partial%'.
Split the plaintext into fixed-length tokens, encrypt each token, store the encrypted tokens in an auxiliary column, and query that column with LIKE '%encrypted partial%'.
Conventional Approach 1
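Implement the same encryption algorithm in the database (or use a built-in function such as MySQL's AES_DECRYPT), decrypt each value during the query, and then apply fuzzy matching. This is easy to implement, but because LIKE is applied to the result of a function call, the database cannot use an index on the column and must decrypt every row, which is slow on large tables.
If query performance is not critical and security requirements are moderate, this approach with a standard algorithm such as AES is acceptable. DES also appears in many examples, but its 56-bit key is too short to be considered secure today.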
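The following sketch shows what Conventional Approach 1 might look like from a Java service using plain JDBC. It assumes MySQL, its built-in AES_DECRYPT and FROM_BASE64 functions, and a hypothetical table customer(id, phone_enc) whose phone_enc column holds TO_BASE64(AES_ENCRYPT(phone, key)); these names are illustrative and not from the source.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

final class DbSideDecryptSearch {
    // Hypothetical schema: customer(id, phone_enc). AES_DECRYPT must use the same key and
    // block_encryption_mode (MySQL default: aes-128-ecb) that produced phone_enc.
    // The LIKE runs on a computed expression, so no index on phone_enc can be used.
    static final String SQL =
            "SELECT id FROM customer "
          + "WHERE CAST(AES_DECRYPT(FROM_BASE64(phone_enc), ?) AS CHAR) LIKE CONCAT('%', ?, '%')";

    static List<Long> search(Connection conn, String key, String fragment) throws SQLException {
        List<Long> ids = new ArrayList<>();
        try (PreparedStatement ps = conn.prepareStatement(SQL)) {
            ps.setString(1, key);      // the key is sent to the database with every query
            ps.setString(2, fragment); // % and _ inside the fragment are not escaped here
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    ids.add(rs.getLong("id"));
                }
            }
        }
        return ids;
    }
}
```

Because the predicate wraps phone_enc in a function, MySQL has to evaluate it row by row; EXPLAIN would typically show a full table scan.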
Conventional Approach 2
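Split the plaintext into fixed-length tokens with a sliding window (e.g., 4 English characters or 2 Chinese characters per token). For the string ningyu1, the 4-character tokens are ning, ingy, ngyu, and gyu1. Encrypt each token with a deterministic algorithm (the same token must always produce the same ciphertext), concatenate the results into an extra column, and at query time tokenize and encrypt the search term the same way and match it with LIKE '%<encrypted term>%'.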
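A minimal Java sketch of the token scheme, using only the JDK. Assumptions not stated in the source: tokens are encrypted with AES in ECB mode (chosen because it is deterministic, which the LIKE match requires), ciphertexts are Base64-encoded, and the window is a fixed 4 characters for all text, so the 2-character window for Chinese is not implemented. The class TokenizedSearchColumn and its methods are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.util.Base64;
import javax.crypto.Cipher;
import javax.crypto.spec.SecretKeySpec;

final class TokenizedSearchColumn {
    static final int WINDOW = 4; // 4 characters per token, as in the ningyu1 example

    private final SecretKeySpec key;

    TokenizedSearchColumn(byte[] aesKey) { // 16, 24 or 32 bytes
        this.key = new SecretKeySpec(aesKey, "AES");
    }

    // Deterministic encryption (AES/ECB): the same token always yields the same ciphertext,
    // which is what lets LIKE match. It also reveals which rows share a token.
    private String encryptToken(String token) {
        try {
            Cipher cipher = Cipher.getInstance("AES/ECB/PKCS5Padding");
            cipher.init(Cipher.ENCRYPT_MODE, key);
            byte[] ct = cipher.doFinal(token.getBytes(StandardCharsets.UTF_8));
            return Base64.getEncoder().encodeToString(ct); // 24 chars for tokens up to 15 bytes
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Slides a WINDOW-character window over the text and concatenates the encrypted tokens.
    // Used both for the stored auxiliary column and for the search term.
    String encode(String text) {
        if (text.length() < WINDOW) {
            throw new IllegalArgumentException("need at least " + WINDOW + " characters");
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i + WINDOW <= text.length(); i++) {
            sb.append(encryptToken(text.substring(i, i + WINDOW)));
        }
        return sb.toString();
    }

    // In SQL: WHERE phone_search LIKE CONCAT('%', ?, '%') with encode(term) bound.
    // Standard Base64 never emits % or _, so the pattern needs no escaping.
    boolean matches(String storedColumn, String term) {
        return storedColumn.contains(encode(term));
    }
}
```

The stored value is encode(plaintext), kept next to the normally encrypted field. Each 16-byte AES block Base64-encodes to 22 characters followed by ==, so the padding acts as a separator and a match can only start on a token boundary.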
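Encryption increases data length: DES with PKCS5 padding turns the 11-byte phone number into 16 bytes of ciphertext, or 24 characters after Base64 encoding, about 2.18× the original. The search term must also be at least one token long (4 English characters or 2 Chinese characters). Supporting shorter terms would require shorter tokens, which multiplies the number of tokens per value (storage blow-up) and shrinks the space of possible token plaintexts, making each encrypted token easier to guess.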
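The length arithmetic can be checked with a short Java sketch. It assumes PKCS5 padding (which matches the 24-character example ciphertext) and ASCII input at one byte per character; the helper names are hypothetical. tokenColumnLength estimates the auxiliary column of Conventional Approach 2 when each token is encrypted and Base64-encoded on its own.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.util.Base64;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;

final class CiphertextSize {
    // PKCS5 padding always adds 1..blockSize bytes, so n bytes grow to the next
    // multiple of blockSize strictly greater than n.
    static int paddedLength(int plaintextBytes, int blockSize) {
        return (plaintextBytes / blockSize + 1) * blockSize;
    }

    // Base64 emits 4 characters for every started group of 3 bytes.
    static int base64Length(int bytes) {
        return (bytes + 2) / 3 * 4;
    }

    // Token column: (n - w + 1) tokens, each encrypted and Base64-encoded separately.
    // Assumes one byte per character (ASCII).
    static int tokenColumnLength(int plaintextChars, int window, int blockSize) {
        int tokens = plaintextChars - window + 1;
        return tokens * base64Length(paddedLength(window, blockSize));
    }

    // Cross-check the formula against a real DES/ECB/PKCS5Padding encryption.
    static int actualDesBase64Length(String plaintext) {
        try {
            Cipher cipher = Cipher.getInstance("DES/ECB/PKCS5Padding");
            cipher.init(Cipher.ENCRYPT_MODE, KeyGenerator.getInstance("DES").generateKey());
            byte[] ct = cipher.doFinal(plaintext.getBytes(StandardCharsets.UTF_8));
            return Base64.getEncoder().encodeToString(ct).length();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

For the 11-digit phone number with 4-character tokens, tokenColumnLength(11, 4, 8) gives 96 characters under DES and tokenColumnLength(11, 4, 16) gives 192 under AES, compared with 24 characters for the whole-value DES ciphertext.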
Major e‑commerce platforms (Taobao, Alibaba, Pinduoduo, JD) adopt similar schemes for searchable encrypted fields.
P.S. Most published solutions are essentially the same and often copied from one another.
This approach balances implementation simplicity with acceptable performance, though it incurs extra storage for the auxiliary column.
Advanced Approaches
These methods attack the problem at the algorithm level, sometimes designing new encryption schemes that, for example, preserve order or support fuzzy matching directly without excessive ciphertext growth. They are research-level work and typically require expertise in cryptography.
Relevant literature includes:
A Bloom‑filter‑based improved encrypted text fuzzy search mechanism.
Techniques for fast searchable encryption in databases.
Lucene‑based cloud search over encrypted data.
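The cited papers define their own constructions, which are not reproduced here. As a generic illustration only, the sketch below shows the basic idea behind Bloom-filter-based encrypted search: each record stores a Bloom filter of its 4-grams, and the bit positions come from a keyed hash (HMAC-SHA256), so they cannot be recomputed without the key. The class name KeyedBloomIndex, the filter size, and the number of hash positions are assumptions chosen for the example.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.util.BitSet;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

final class KeyedBloomIndex {
    private static final int WINDOW = 4; // n-gram length, same as the token scheme
    private final int bits;              // m: filter size in bits
    private final int hashes;            // k: bit positions per n-gram (at most 8 from SHA-256)
    private final SecretKeySpec key;

    KeyedBloomIndex(byte[] hmacKey, int bits, int hashes) {
        if (hashes < 1 || hashes > 8) throw new IllegalArgumentException("hashes must be 1..8");
        this.key = new SecretKeySpec(hmacKey, "HmacSHA256");
        this.bits = bits;
        this.hashes = hashes;
    }

    // Derives k bit positions from one HMAC digest, 4 digest bytes per position.
    private int[] positions(String gram) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(key);
            ByteBuffer digest = ByteBuffer.wrap(mac.doFinal(gram.getBytes(StandardCharsets.UTF_8)));
            int[] pos = new int[hashes];
            for (int i = 0; i < hashes; i++) {
                pos[i] = Math.floorMod(digest.getInt(), bits);
            }
            return pos;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Builds the per-record filter that would be stored next to the ciphertext.
    BitSet index(String plaintext) {
        BitSet filter = new BitSet(bits);
        for (int i = 0; i + WINDOW <= plaintext.length(); i++) {
            for (int p : positions(plaintext.substring(i, i + WINDOW))) filter.set(p);
        }
        return filter;
    }

    // True if every 4-gram of the term may be present. Order is not checked and
    // false positives are possible, so candidates must be decrypted and verified.
    boolean mightContain(BitSet filter, String term) {
        if (term.length() < WINDOW) throw new IllegalArgumentException("term too short");
        for (int i = 0; i + WINDOW <= term.length(); i++) {
            for (int p : positions(term.substring(i, i + WINDOW))) {
                if (!filter.get(p)) return false;
            }
        }
        return true;
    }
}
```

The payoff is a fixed-size index per record instead of one ciphertext per token; the cost is a probabilistic answer that needs a verification step.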
Some approaches store encrypted tokens in search engines like Elasticsearch instead of relational databases.
Conclusion
We have reviewed three categories of solutions for fuzzy searching encrypted data. Naïve methods are discouraged; conventional methods—especially the token‑based approach—offer a practical trade‑off between security, storage, and performance. Advanced algorithmic solutions are promising but require specialized expertise.
Overall, the token-based conventional approach (Conventional Approach 2) is recommended for most scenarios.
Java Backend Technology
Focus on Java-related technologies: SSM, Spring ecosystem, microservices, MySQL, MyCat, clustering, distributed systems, middleware, Linux, networking, multithreading. Occasionally cover DevOps tools like Jenkins, Nexus, Docker, and ELK. Also share technical insights from time to time, committed to Java full-stack development!
