
Techniques for Fuzzy Query on Encrypted Data

This article examines the challenges of performing fuzzy searches on encrypted data and compares three categories of solutions—naïve, conventional, and advanced—detailing their implementation methods, performance trade‑offs, storage costs, and security implications for real‑world applications.

Top Architect

Encrypted data does not lend itself to fuzzy queries, so this article explores the ideas behind fuzzy search on encrypted data and the ways to implement it.

For data security, sensitive fields such as passwords, phone numbers, addresses, and bank card numbers are often stored encrypted. Passwords are typically stored with irreversible hash algorithms, whereas fields such as phone numbers, which must be decrypted and may need fuzzy search, require reversible encryption.

How to Perform Fuzzy Query on Encrypted Data

The methods can be divided into three categories:

Naïve approaches ("silly" methods)

Conventional approaches (balanced performance and storage)

Advanced approaches (algorithm‑level solutions)

Silly Approaches

Load all data into memory, decrypt it, and perform fuzzy matching in the application.

Create a plaintext mapping table ("tag" table) for ciphertext and query the tag table.

Silly Example 1

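Loading all data into memory works only for small datasets; for large volumes it leads to out-of-memory errors. Example: encrypting 13800138000 with DES yields a 24-byte ciphertext (consistent with a 16-byte padded DES output encoded as Base64), so the ciphertext alone takes about 22.9 MiB for 1 million records, 228.9 MiB for 10 million records, and 2.2 GiB for 100 million records, before counting the decrypted copies held during matching.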
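The arithmetic behind these figures can be reproduced with a short sketch. It assumes PKCS#5-style padding and Base64 encoding of the ciphertext, neither of which the article states explicitly; the function names are illustrative.

```python
import base64
import math

DES_BLOCK = 8  # DES block size in bytes

def ciphertext_len(plain_len: int) -> int:
    """Base64 length of a DES ciphertext, assuming PKCS#5-style padding."""
    padded = (plain_len // DES_BLOCK + 1) * DES_BLOCK  # padding adds 1..8 bytes
    return 4 * math.ceil(padded / 3)                   # Base64: 4 chars per 3 bytes

def total_mib(records: int, bytes_per_record: int) -> float:
    """Raw size of the ciphertext column in MiB, ignoring per-object overhead."""
    return records * bytes_per_record / 2**20

size = ciphertext_len(len("13800138000"))              # 11-byte plaintext
assert size == len(base64.b64encode(bytes(16)))        # sanity check against Base64

for n in (1_000_000, 10_000_000, 100_000_000):
    print(f"{n:>11,} records: {total_mib(n, size):8.1f} MiB")
```

Real memory use would be noticeably higher, since each value also carries language-level object overhead and a decrypted copy.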

Silly Example 2

Maintaining a plaintext mapping table defeats the purpose of encryption and is insecure.

Conventional Approaches

Two common methods are:

Implement encryption/decryption functions in the database and modify fuzzy conditions to decode(key) like '%partial%'.

Tokenize the plaintext, encrypt each token, store the encrypted tokens in an extra column, and query with key like '%encrypt(partial)%', where the search term is encrypted the same way before matching.

Conventional Example 1

Use database functions to decrypt before fuzzy matching. This is easy to implement but cannot use indexes and may have algorithm compatibility issues.

Typical algorithms include AES and DES. If the company has its own algorithm, extra effort is required.

DES example: plaintext length 11 bytes, ciphertext length 24 bytes (2.18× growth)
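A minimal sketch of this pattern, using SQLite from Python because it allows registering a user-defined function. The table, column, and key are hypothetical, and the XOR "cipher" is only a runnable placeholder for AES or DES; it is not secure.

```python
import base64
import sqlite3
from itertools import cycle

KEY = b"demo-key"  # hypothetical key; the XOR cipher below is a placeholder, NOT secure

def toy_encrypt(plain: str) -> str:
    raw = bytes(b ^ k for b, k in zip(plain.encode(), cycle(KEY)))
    return base64.b64encode(raw).decode()

def toy_decrypt(cipher: str) -> str:
    raw = base64.b64decode(cipher)
    return bytes(b ^ k for b, k in zip(raw, cycle(KEY))).decode()

conn = sqlite3.connect(":memory:")
conn.create_function("decode", 1, toy_decrypt)  # expose decryption to SQL
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, phone_enc TEXT)")
conn.executemany(
    "INSERT INTO users (phone_enc) VALUES (?)",
    [(toy_encrypt(p),) for p in ("13800138000", "13912345678", "15000138001")],
)

def fuzzy_search(partial: str) -> list[str]:
    # decode() runs on every row, so no index on phone_enc can help here.
    rows = conn.execute(
        "SELECT decode(phone_enc) FROM users "
        "WHERE decode(phone_enc) LIKE ? ORDER BY id",
        (f"%{partial}%",),
    )
    return [r[0] for r in rows]

print(fuzzy_search("0138"))
```

Because the function is applied to every row, the query is a full table scan, which is the index limitation described above.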

Conventional Example 2

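Split the plaintext into overlapping fixed-length groups (e.g., 4 English characters or 2 Chinese characters per group), encrypt each group, and store the encrypted tokens in an extra column. To search for a token such as ingy, encrypt the search term and query with key like '%encrypt(partial)%'. This increases storage, since every token is encrypted separately (for reference, DES turns an 11-byte plaintext into a 24-byte ciphertext), but allows index usage.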

Typical query length requirements: at least 4 English characters or 2 Chinese characters.
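The token approach can be sketched as follows. This is an illustration under stated assumptions, not the article's implementation: a truncated HMAC-SHA256 stands in for deterministic token encryption (swap in AES or DES in practice), SQLite stands in for the database, and the table, column, and key are hypothetical. A real table would also keep the reversibly encrypted phone number in its own column.

```python
import hashlib
import hmac
import sqlite3

KEY = b"demo-key"   # hypothetical key
WINDOW = 4          # token length, matching the 4-character minimum above

def seal(token: str) -> str:
    """Deterministic keyed stand-in for encrypting one token (HMAC, not DES/AES)."""
    return hmac.new(KEY, token.encode(), hashlib.sha256).hexdigest()[:16]

def tokens(text: str) -> list[str]:
    """Every sliding window of WINDOW characters: 'abcde' -> 'abcd', 'bcde'."""
    return [text[i:i + WINDOW] for i in range(len(text) - WINDOW + 1)]

def token_column(text: str) -> str:
    # Commas delimit tokens so LIKE cannot match across a token boundary.
    return "," + ",".join(seal(t) for t in tokens(text)) + ","

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, phone_tokens TEXT)")
phones = ["13800138000", "13912345678", "15000138001"]
conn.executemany(
    "INSERT INTO users (id, phone_tokens) VALUES (?, ?)",
    [(i, token_column(p)) for i, p in enumerate(phones, start=1)],
)

def fuzzy_search(partial: str) -> list[int]:
    if len(partial) < WINDOW:
        raise ValueError(f"search term needs at least {WINDOW} characters")
    # Longer terms are split into windows; every window must be present.
    params = [f"%,{seal(t)},%" for t in tokens(partial)]
    clauses = " AND ".join("phone_tokens LIKE ?" for _ in params)
    rows = conn.execute(f"SELECT id FROM users WHERE {clauses} ORDER BY id", params)
    return [r[0] for r in rows]

print(fuzzy_search("0138"), fuzzy_search("12345"))
```

For terms longer than one window, the AND of all windows can admit rows where the windows occur but are not adjacent, so the application may need to decrypt and re-check the candidates.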

Advanced Approaches

These involve designing new algorithms or using specialized techniques such as Bloom filters, FMES, or Lucene‑based encrypted search. They aim to keep ciphertext length growth low while supporting fuzzy matching, but require deep algorithmic research.

Algorithm‑level design for fuzzy search on ciphertext.

Research papers: Bloom-filter-based encrypted fuzzy search, FMES, Hill cipher handling.

Lucene‑based cloud search on encrypted data (DB vs. ES).
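To give a feel for the Bloom-filter direction, the sketch below shows one generic way to index keyed n-grams in a Bloom filter so that substring candidates can be tested without storing plaintext. It is not the scheme from any of the papers mentioned above; the key, filter size, hash count, and n-gram length are all illustrative assumptions.

```python
import hashlib
import hmac

KEY = b"demo-key"   # hypothetical key
BITS = 256          # filter size in bits (illustrative)
HASHES = 3          # bit positions set per n-gram
N = 2               # n-gram length

def ngrams(text: str) -> set[str]:
    return {text[i:i + N] for i in range(len(text) - N + 1)}

def positions(gram: str) -> list[int]:
    """Derive HASHES keyed bit positions for one n-gram."""
    out = []
    for i in range(HASHES):
        digest = hmac.new(KEY, f"{i}:{gram}".encode(), hashlib.sha256).digest()
        out.append(int.from_bytes(digest[:4], "big") % BITS)
    return out

def build_filter(text: str) -> int:
    """Return the Bloom filter for a value as an integer bitmask."""
    mask = 0
    for gram in ngrams(text):
        for pos in positions(gram):
            mask |= 1 << pos
    return mask

def may_contain(mask: int, partial: str) -> bool:
    """True if every n-gram of the query may be present (false positives possible)."""
    query = build_filter(partial)
    return (mask & query) == query

stored = build_filter("13800138000")   # stored alongside the ciphertext
print(may_contain(stored, "0138"))
```

A filter never produces false negatives, so a true substring always passes; false positives are possible, and candidates would still need to be decrypted and verified.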

Summary

Naïve methods are not recommended; conventional methods, especially the token‑based approach, offer a good balance of implementation cost, performance, and security. Advanced methods are suitable when specialized algorithm expertise is available.

Overall, the second conventional method is highly recommended for most scenarios.

Discussion and other viewpoints are welcome. For questions, feel free to contact the author.
