How to Perform Fuzzy Searches on Encrypted Data: Methods, Pros & Cons

This article explores the challenges of fuzzy searching encrypted data, categorizes three implementation approaches—naïve, conventional, and advanced—examines their trade‑offs, provides practical examples and performance calculations, and offers recommendations for secure and efficient query solutions.

Architecture Digest
Architecture Digest
Architecture Digest
How to Perform Fuzzy Searches on Encrypted Data: Methods, Pros & Cons

How to Perform Fuzzy Searches on Encrypted Data

We know that encrypted data is not friendly to fuzzy queries. This article discusses the problem and presents three categories of solutions.

Three categories of approaches

Naïve approach (no deep thinking, just implement the feature)

Conventional approach (consider performance, may use extra storage)

Advanced approach (algorithm‑level design)

Naïve approach

Method 1: Load all data into memory, decrypt, then perform fuzzy matching

This works for small data sets but quickly runs out of memory for larger volumes. Example: a phone number 13800138000 encrypted with DES becomes HE9T75xNx6c5yLmS5l4r6Q==, which occupies 24 bytes. A table of memory consumption shows that 1 million records require ~22.9 MB, 10 million require ~228.9 MB, and 100 million require ~2.3 GB, leading to out‑of‑memory failures.

Method 2: Create a plaintext mapping (tag) table and query the tag

This defeats the purpose of encryption because the plaintext mapping must be stored, compromising security and adding unnecessary complexity.

Conventional approach

Option 1: Use database encryption functions and modify the fuzzy condition

Example query: decode(key) like '%partial%'. This is easy to implement but cannot use indexes and may not guarantee algorithm consistency across platforms.

Option 2: Tokenize the ciphertext, encrypt each token, store in an auxiliary column, and query with key like '%partial%'

Data is split into fixed‑length groups (e.g., 4 English characters or 2 Chinese characters). For the string ningyu1, groups are ning, ingy, ngyu, gyu1. Searching for the token ingy uses a LIKE query on the encrypted tokens.

Encryption increases storage size; using DES, an 11‑byte plaintext becomes a 24‑byte ciphertext (2.18× growth). The method works when the fuzzy term is at least 4 characters (or 2 Chinese characters); shorter terms are not recommended.

Advanced approach

These solutions involve algorithmic research, such as designing new reversible encryption schemes, Bloom‑filter‑based fuzzy search, or leveraging specialized cryptographic methods. References include academic papers on Hill cipher, FMES, and Bloom‑filter improvements.

Conclusion

Naïve methods are discouraged. The conventional option 2 offers a good balance of security, performance, and implementation cost, while the advanced approach is suitable for teams with strong algorithm expertise.

Performancedatabasesecurityfuzzy-searchencryption
Architecture Digest
Written by

Architecture Digest

Focusing on Java backend development, covering application architecture from top-tier internet companies (high availability, high performance, high stability), big data, machine learning, Java architecture, and other popular fields.

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.