Information Security 11 min read

How to Perform Fuzzy Queries on Encrypted Data

This article reviews why reversible encryption is needed for certain sensitive fields, classifies three categories of fuzzy‑search‑on‑encrypted‑data techniques—naïve, conventional, and advanced—and evaluates their implementation steps, performance trade‑offs, and practical recommendations.

Architect's Guide

Nov 30, 2022

How to Perform Fuzzy Queries on Encrypted Data

During development we often encrypt important data such as passwords, phone numbers, addresses, bank card numbers, and verification codes. Passwords are stored with irreversible slow‑hash algorithms to resist brute‑force attacks, while other fields (e.g., phone numbers) require reversible encryption and support for fuzzy search.

When retrieving data we can match ciphertext directly for exact look‑ups, but phone numbers need to be decrypted and also support partial matching. This article examines practical ways to enable fuzzy queries on reversible‑encrypted data.

Classification of Approaches

Naïve approaches (quick but insecure)

Conventional approaches (balanced performance and security)

Advanced approaches (algorithm‑level designs)

Naïve Approaches

Load all encrypted records into memory, decrypt them, and perform fuzzy matching in application code.

Create a plaintext “tag” table that maps ciphertext to its clear‑text value, then query the tag table for fuzzy matches.

Example: the phone number 13800138000 encrypted with DES becomes HE9T75xNx6c5yLmS5l4r6Q==, which occupies 24 bytes. Loading millions of such records can quickly exhaust memory (hundreds of megabytes to several gigabytes).

Conventional Approaches

Two widely used methods are described.

Implement encryption/decryption functions inside the database and modify the fuzzy‑search condition to decrypt on‑the‑fly, e.g., decode(key) LIKE '%partial%'.

Tokenize the plaintext, encrypt each token, store the encrypted tokens in an auxiliary column, and query with key LIKE '%partial%'. This allows the use of indexes on the auxiliary column.

Method 1 is easy to adopt but cannot leverage indexes and may suffer from algorithm mismatches between application and database. Method 2 adds storage overhead (encrypted tokens increase data size) but enables index‑based search and is recommended when query performance matters.

For example, using DES the original 11‑byte string 13800138000 expands to a 24‑byte ciphertext, a 2.18× increase. Choosing an efficient algorithm can significantly reduce storage cost.

Advanced Approaches

These involve designing new cryptographic schemes that preserve order or enable direct fuzzy matching on ciphertext. Techniques mentioned include Hill cipher‑based methods, FMES, Bloom‑filter‑enhanced encryption, and Lucene‑style tokenization stored in search engines like Elasticsearch.

References for deeper study:

Database character fuzzy‑match encryption methods

Bloom‑filter based encrypted fuzzy search research

Lucene‑based cloud search on encrypted data

Conclusion

The article discourages naïve methods and recommends the second conventional approach (token‑based encrypted columns) as a practical compromise between security, storage cost, and query performance. When specialized algorithm expertise is available, advanced algorithmic solutions can be explored for even better results.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

algorithm Database Fuzzy Search Encryption Data Security AES

Written by

Architect's Guide

Dedicated to sharing programmer-architect skills—Java backend, system, microservice, and distributed architectures—to help you become a senior architect.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.