How to Perform Fuzzy Searches on Encrypted Data: Methods, Pros & Cons

This article examines three categories of techniques—naïve, conventional, and advanced—for enabling fuzzy queries on encrypted data, comparing their implementation steps, performance impact, storage overhead, and security trade‑offs, and provides practical examples and reference links for further study.

Architect

How to Perform Fuzzy Searches on Encrypted Data

Encrypting fields such as passwords, phone numbers, addresses, or bank details protects sensitive information but makes fuzzy searching difficult. This article groups the possible solutions into three categories (naïve, conventional, and advanced), explains how each is implemented, evaluates their advantages and disadvantages, and offers practical guidance.

Naïve Approaches

Load all encrypted records into memory, decrypt them, and perform fuzzy matching in application code.

Create a plaintext "tag" table that maps encrypted values to their clear‑text equivalents, then query the tag table with fuzzy conditions.

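These methods are only viable for very small datasets (hundreds or thousands of rows). For example, encrypting the 11-byte phone number 13800138000 with DES and Base64-encoding the result yields a 24-byte ciphertext. One million such rows occupy roughly 23 MB for the ciphertext alone and ten million rows roughly 229 MB, before counting the decrypted copies and per-object overhead the application must also hold. Loading and decrypting the whole table for every query therefore scales poorly and risks out-of-memory failures as the data grows.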
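The size figures above can be checked with a few lines of arithmetic. The sketch below assumes DES with PKCS#5 padding and Base64 encoding of the ciphertext, which is the combination that turns 11 bytes into 24; the function names are illustrative only.

```python
DES_BLOCK = 8  # DES block size in bytes


def padded_len(n: int, block: int = DES_BLOCK) -> int:
    """Length after PKCS#5 padding, which always adds 1..block bytes."""
    return (n // block + 1) * block


def base64_len(n: int) -> int:
    """Length of the Base64 text that encodes n raw bytes."""
    return 4 * ((n + 2) // 3)


def stored_len(plaintext: str) -> int:
    """Bytes stored per value: pad to the block size, then Base64-encode."""
    return base64_len(padded_len(len(plaintext.encode("utf-8"))))


per_row = stored_len("13800138000")
print(per_row)  # 24

for rows in (1_000_000, 10_000_000):
    mib = per_row * rows / 2**20
    print(f"{rows:>10,} rows: {mib:.1f} MiB of ciphertext")
# 1,000,000 rows: 22.9 MiB, 10,000,000 rows: 228.9 MiB
```

These numbers cover only the stored ciphertext. An application that decrypts every row into language-level string objects will use noticeably more memory than this lower bound.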

Conventional Approaches

Implement encryption/decryption functions inside the database and modify fuzzy conditions to decrypt on the fly, e.g., decode(key) LIKE '%partial%'.

Tokenise the plaintext, encrypt each token, store the encrypted tokens in an auxiliary column, and query that column with the encrypted search term, e.g., key LIKE '%<encrypted partial>%'.
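The first option, decrypting inside the query, can be sketched with Python's built-in sqlite3 module, which lets the application register a function that SQL can call. Everything here is an assumption made for illustration: the table and column names are hypothetical, and toy_encrypt/toy_decrypt are a deliberately insecure XOR stand-in so the example runs without third-party packages. In a real deployment the database's own decryption function (for instance MySQL's AES_DECRYPT) would take their place.

```python
import base64
import sqlite3

KEY = b"demo-key"  # hypothetical key, for illustration only


def _xor(data: bytes) -> bytes:
    return bytes(b ^ KEY[i % len(KEY)] for i, b in enumerate(data))


def toy_encrypt(plaintext: str) -> str:
    """Toy reversible transform (repeating-key XOR + Base64). NOT secure."""
    return base64.b64encode(_xor(plaintext.encode("utf-8"))).decode("ascii")


def toy_decrypt(ciphertext: str) -> str:
    """Inverse of toy_encrypt; stands in for the database's decrypt function."""
    return _xor(base64.b64decode(ciphertext)).decode("utf-8")


conn = sqlite3.connect(":memory:")
conn.create_function("decrypt", 1, toy_decrypt)  # expose it to SQL
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, phone_enc TEXT)")
for phone in ("13800138000", "13912345678", "15000138001"):
    conn.execute("INSERT INTO users (phone_enc) VALUES (?)", (toy_encrypt(phone),))


def search_phone(partial: str) -> list:
    """Fuzzy search: decrypt() runs on every row, so no index can help."""
    rows = conn.execute(
        "SELECT decrypt(phone_enc) FROM users "
        "WHERE decrypt(phone_enc) LIKE ? ORDER BY id",
        (f"%{partial}%",),
    )
    return [row[0] for row in rows]


print(search_phone("0138"))  # ['13800138000', '15000138001']
```

The query is easy to write, but the database has to call decrypt on every row before it can evaluate the LIKE condition, which is where the performance cost comes from.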

The first method is simple and cheap to implement, but the decryption function must run on every row and the query cannot use an index, so performance suffers as the table grows. It only works when the encryption algorithm used by the application is also available in the database (e.g., AES, DES). The second method stores encrypted token strings. A typical tokenisation slides a fixed-length window over the field (e.g., four English characters or two Chinese characters), so the string ningyu1 produces the groups ning, ingy, ngyu, gyu1. A query encrypts the search term the same way and matches rows whose auxiliary column contains that ciphertext, which means the search term must be at least one window long (four characters in this example). This approach increases storage (ciphertext is longer than plaintext, e.g., DES with Base64 encoding expands 11 bytes to 24 bytes, a 2.18× increase) but allows index usage and gives acceptable performance for moderate data volumes.
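A minimal sketch of the token-based method follows, under several assumptions. The standard library has no block cipher, so a truncated HMAC-SHA256 stands in for deterministic encryption of each token (the auxiliary column only needs to be matched, never decrypted). The key, table, and column names are hypothetical. Tokens are comma-delimited so a LIKE pattern cannot straddle two neighbouring tokens.

```python
import hashlib
import hmac
import sqlite3

KEY = b"demo-key"  # hypothetical secret, for illustration only
WINDOW = 4         # token length used in the article's example


def tokenise(text: str, size: int = WINDOW) -> list:
    """Sliding windows: 'ningyu1' -> ning, ingy, ngyu, gyu1."""
    return [text[i:i + size] for i in range(len(text) - size + 1)]


def seal(token: str) -> str:
    """Deterministic keyed transform of one token (stand-in for a cipher)."""
    return hmac.new(KEY, token.encode("utf-8"), hashlib.sha256).hexdigest()[:16]


def token_column(text: str) -> str:
    """Value for the auxiliary column: ',tok1,tok2,...,'."""
    return "," + ",".join(seal(t) for t in tokenise(text)) + ","


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name_tokens TEXT)")
for name in ("ningyu1", "zhangsan", "yuning9"):
    conn.execute("INSERT INTO users (name_tokens) VALUES (?)", (token_column(name),))


def search(partial: str) -> list:
    """Return ids of rows whose token column contains every token of partial."""
    tokens = tokenise(partial)
    if not tokens:
        raise ValueError(f"search term must have at least {WINDOW} characters")
    where = " AND ".join("name_tokens LIKE ?" for _ in tokens)
    params = [f"%,{seal(t)},%" for t in tokens]
    rows = conn.execute(f"SELECT id FROM users WHERE {where} ORDER BY id", params)
    return [row[0] for row in rows]


print(tokenise("ningyu1"))  # ['ning', 'ingy', 'ngyu', 'gyu1']
print(search("ning"))       # ids of 'ningyu1' and 'yuning9'
```

For search terms longer than one window, this sketch requires every token of the term to be present. That can admit rows where the tokens occur but not contiguously, so candidates should be confirmed after decrypting the real field in the application.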

Advanced (Algorithmic) Approaches

Design or adopt specialised algorithms that enable fuzzy matching on ciphertext without excessive length growth, such as Bloom‑filter‑based schemes, FMES, or Hill‑cipher variations.

Store encrypted tokens in a search engine like Elasticsearch or a custom index that preserves order and supports fuzzy queries.

These solutions require deep cryptographic expertise and often involve creating new algorithms that keep ciphertext length close to the plaintext length while preserving order for fuzzy matching. References include research papers on Bloom‑filter‑enhanced encrypted search, Hill‑cipher fuzzy matching, and cloud‑storage‑compatible verifiable fuzzy query schemes.
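To give a feel for the Bloom-filter family mentioned above, here is a generic sketch. It is not FMES or any specific published scheme. Each record is reduced to a fixed-size bit set built from keyed hashes of its character n-grams, and a query matches when all of its own bits are set in the record's filter. The key, filter size, hash count, and n-gram length are all arbitrary assumptions chosen for illustration.

```python
import hashlib
import hmac

KEY = b"demo-key"  # hypothetical secret, for illustration only
BITS = 256         # filter size in bits (fixed, whatever the plaintext length)
HASHES = 3         # bit positions set per n-gram
NGRAM = 2          # characters per n-gram


def ngrams(text: str, n: int = NGRAM) -> set:
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def positions(gram: str) -> list:
    """Bit positions for one n-gram, derived from a keyed hash."""
    out = []
    for i in range(HASHES):
        digest = hmac.new(KEY, f"{i}:{gram}".encode("utf-8"), hashlib.sha256).digest()
        out.append(int.from_bytes(digest[:4], "big") % BITS)
    return out


def build_filter(text: str) -> int:
    """Bloom filter for text, held as an integer bit set."""
    bits = 0
    for gram in ngrams(text):
        for pos in positions(gram):
            bits |= 1 << pos
    return bits


def may_contain(record_filter: int, partial: str) -> bool:
    """True if the record may contain partial; false positives are possible."""
    query = build_filter(partial)
    if query == 0:
        raise ValueError(f"search term must have at least {NGRAM} characters")
    return (record_filter & query) == query


stored = build_filter("13800138000")  # what would be stored next to the ciphertext
print(may_contain(stored, "0138"))    # True: a real substring always matches
```

The stored filter stays at 32 bytes no matter how long the plaintext is, which is the property these schemes aim for. The price is false positives and some leakage about which n-grams records share, so matches still need to be confirmed after decryption, and a production design needs proper cryptographic review.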

Conclusion

Naïve methods are discouraged except for tiny datasets. Conventional token‑based approaches offer a balanced trade‑off between security, storage cost, and query performance and are recommended for most practical scenarios. When a team has strong cryptographic capabilities, advanced algorithmic solutions can provide superior performance without sacrificing security, though they demand significant development effort.
