How to Perform Fuzzy Queries on Encrypted Data
This article reviews why reversible encryption is needed for certain sensitive fields, classifies three categories of fuzzy‑search‑on‑encrypted‑data techniques—naïve, conventional, and advanced—and evaluates their implementation steps, performance trade‑offs, and practical recommendations.
How to Perform Fuzzy Queries on Encrypted Data
During development we often encrypt important data such as passwords, phone numbers, addresses, bank card numbers, and verification codes. Passwords are stored with irreversible slow‑hash algorithms to resist brute‑force attacks, while other fields (e.g., phone numbers) require reversible encryption and support for fuzzy search.
When retrieving data we can match ciphertext directly for exact look‑ups, but phone numbers need to be decrypted and also support partial matching. This article examines practical ways to enable fuzzy queries on reversible‑encrypted data.
Classification of Approaches
Naïve approaches (quick but insecure)
Conventional approaches (balanced performance and security)
Advanced approaches (algorithm‑level designs)
Naïve Approaches
Load all encrypted records into memory, decrypt them, and perform fuzzy matching in application code.
Create a plaintext “tag” table that maps ciphertext to its clear‑text value, then query the tag table for fuzzy matches.
Example: the phone number 13800138000 encrypted with DES becomes HE9T75xNx6c5yLmS5l4r6Q== , which occupies 24 bytes. Loading millions of such records can quickly exhaust memory (hundreds of megabytes to several gigabytes).
Conventional Approaches
Two widely used methods are described.
Implement encryption/decryption functions inside the database and modify the fuzzy‑search condition to decrypt on‑the‑fly, e.g., decode(key) LIKE '%partial%' .
Tokenize the plaintext, encrypt each token, store the encrypted tokens in an auxiliary column, and query with key LIKE '%partial%' . This allows the use of indexes on the auxiliary column.
Method 1 is easy to adopt but cannot leverage indexes and may suffer from algorithm mismatches between application and database. Method 2 adds storage overhead (encrypted tokens increase data size) but enables index‑based search and is recommended when query performance matters.
For example, using DES the original 11‑byte string 13800138000 expands to a 24‑byte ciphertext, a 2.18× increase. Choosing an efficient algorithm can significantly reduce storage cost.
Advanced Approaches
These involve designing new cryptographic schemes that preserve order or enable direct fuzzy matching on ciphertext. Techniques mentioned include Hill cipher‑based methods, FMES, Bloom‑filter‑enhanced encryption, and Lucene‑style tokenization stored in search engines like Elasticsearch.
References for deeper study:
Database character fuzzy‑match encryption methods
Bloom‑filter based encrypted fuzzy search research
Lucene‑based cloud search on encrypted data
Conclusion
The article discourages naïve methods and recommends the second conventional approach (token‑based encrypted columns) as a practical compromise between security, storage cost, and query performance. When specialized algorithm expertise is available, advanced algorithmic solutions can be explored for even better results.
Architect's Guide
Dedicated to sharing programmer-architect skills—Java backend, system, microservice, and distributed architectures—to help you become a senior architect.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.