How Data Masking Protects Your Users: Techniques & Best Practices
Data masking, also known as data desensitization, transforms sensitive information such as phone numbers and ID numbers to prevent privacy breaches while preserving data utility for testing, analysis, and production environments. It comes in static and dynamic forms and relies on techniques including truncation, randomization, replacement, encryption, and averaging.
After receiving strange phone calls that exposed personal data, the author reflects on how personal information is often leaked by insiders, emphasizing the need for developers to prevent such breaches.
One effective measure is data masking (data desensitization), which modifies or replaces sensitive fields before they are used in insecure environments.
What Is Data Masking
Data masking, also called data de-identification, applies predefined rules to transform sensitive data such as phone numbers, bank card numbers, or ID numbers so that the original values cannot be recovered in untrusted contexts.
Government, healthcare, finance, and telecom sectors were early adopters because the impact of leaks is severe. In everyday e-commerce, platforms like Taobao hide parts of personal data with asterisks (*) to protect privacy.
Static Data Masking
Static Data Masking (SDM) extracts production data, masks it, and then distributes the sanitized copy to testing, development, training, or analytics environments. This isolates sensitive information while still supporting business needs.
During masking, fields such as name, phone number, ID number, and bank card number are processed with techniques like replacement, nullification, scrambling, reordering, or symmetric encryption.
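The extract, mask, and distribute flow can be sketched as follows. This is a minimal illustration, not a production pipeline: the field names, the `RULES` table, and the `build_masked_copy` helper are all assumptions made for the example.

```python
import copy

# Hypothetical per-field masking rules; the field names are illustrative.
RULES = {
    "name": lambda v: "Test User",        # replacement with a fixed dummy value
    "phone": lambda v: "13651300000",     # replacement with a fixed dummy value
    "id_number": lambda v: "*" * len(v),  # hide every character behind a symbol
}

def build_masked_copy(records, rules=RULES):
    """Return a sanitized copy of the records; the input is left untouched."""
    masked = copy.deepcopy(records)
    for row in masked:
        for field, rule in rules.items():
            if field in row:
                row[field] = rule(row[field])
    return masked
```

Only the returned copy would be shipped to test or analytics environments; fields without a rule (for example a city column) pass through unchanged.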
Dynamic Data Masking
Dynamic Data Masking (DDM) works in production, masking data in real time based on the requester’s role or permission level, ensuring that different users see appropriately masked values.
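As a rough sketch of role-based masking at read time, the snippet below picks a masking policy from the requester's role. The role names, the `POLICIES` table, and the `read_phone` function are hypothetical; real DDM is usually enforced in the database or a proxy layer rather than in application code like this.

```python
def mask_phone(phone):
    """Keep the first 3 and last 4 characters, hide the rest."""
    if len(phone) <= 7:
        return "*" * len(phone)
    return phone[:3] + "*" * (len(phone) - 7) + phone[-4:]

# Hypothetical role-to-policy mapping.
POLICIES = {
    "admin": lambda v: v,   # full access
    "support": mask_phone,  # partial view
}

def read_phone(phone, role):
    """Return the phone number as the given role is allowed to see it."""
    # Unknown roles get the most restrictive view.
    policy = POLICIES.get(role, lambda v: "*" * len(v))
    return policy(phone)
```

The stored value never changes; only the view returned to each caller differs.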
Note: While removing sensitive content, the masked data must retain its original characteristics, business rules, and relationships so that development, testing, and analytics remain unaffected.
Data Masking Techniques
1. Nullification
Nullification hides sensitive values by truncating them, encrypting them, or covering them with placeholder symbols (e.g., *). This simple method obscures the real value but may also hide the data format from users.
For example, the ID number "220724******3523" has its middle digits masked.
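A minimal sketch of this kind of middle-digit masking, assuming we keep a fixed number of leading and trailing characters. The function name `mask_middle` and its defaults are illustrative choices, not a standard API.

```python
def mask_middle(value, keep_start=6, keep_end=4, symbol="*"):
    """Replace everything between the kept prefix and suffix with a symbol."""
    if len(value) <= keep_start + keep_end:
        # Too short to keep anything safely: hide the whole value.
        return symbol * len(value)
    hidden = len(value) - keep_start - keep_end
    # Slice from an explicit index so that keep_end=0 works as expected.
    return value[:keep_start] + symbol * hidden + value[len(value) - keep_end:]
```

For a made-up 18-digit value such as "220724199001013523", this returns "220724********3523".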
2. Random Values
Randomization substitutes each character with a random letter or digit, preserving the original format while making the data appear realistic.
Fields like name and idnumber can be randomized, though name randomization requires a surname dictionary.
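One possible way to randomize character by character while keeping the format is shown below. The helper name `randomize_preserving_format` is an assumption, and the sketch only handles ASCII letters and digits; dictionary-based name generation is not covered here.

```python
import random
import string

def randomize_preserving_format(value, rng=None):
    """Swap each digit for a random digit and each ASCII letter for a random
    letter of the same case; separators and other characters are kept."""
    rng = rng or random.Random()
    out = []
    for ch in value:
        if ch in string.digits:
            out.append(rng.choice(string.digits))
        elif ch in string.ascii_uppercase:
            out.append(rng.choice(string.ascii_uppercase))
        elif ch in string.ascii_lowercase:
            out.append(rng.choice(string.ascii_lowercase))
        else:
            out.append(ch)
    return "".join(out)
```

Passing a seeded `random.Random` makes the output repeatable, which can be handy in tests. A random character may occasionally equal the original one, so this alone does not guarantee every position changes.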
3. Data Replacement
Replacement swaps the original value with a predefined dummy value, such as setting every phone number to "13651300000".
4. Symmetric Encryption
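Symmetric encryption encrypts the data with a key; decryption with the same key restores the original value, so key management is critical. Keeping the ciphertext in the original format generally requires a format-preserving encryption scheme, since general-purpose ciphers such as AES produce binary output.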
5. Averaging
For numeric fields, the average value is computed and each masked value is generated around that average, preserving the total sum while obscuring individual entries.
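Random values drawn around the mean do not keep the total by themselves, so the sketch below re-centers the noise so that it sums to zero. This is one simple way to get the sum-preserving behavior described above; the `mask_by_averaging` name and the `spread` parameter are assumptions for illustration.

```python
import random

def mask_by_averaging(values, spread=0.1, rng=None):
    """Return values scattered around the mean whose sum matches the input sum
    (up to floating-point error)."""
    rng = rng or random.Random()
    n = len(values)
    if n == 0:
        return []
    mean = sum(values) / n
    limit = spread * abs(mean)
    noise = [rng.uniform(-limit, limit) for _ in range(n)]
    # Subtract the average noise so the noise terms sum to zero.
    correction = sum(noise) / n
    return [mean + d - correction for d in noise]
```

Note that every masked value ends up close to the mean, so the distribution of individual entries is deliberately lost.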
6. Offset & Rounding
This method adds a random offset and rounds numbers, keeping values realistic while altering dates or timestamps (e.g., changing 2020-12-08 15:12:25 to 2018-01-02 15:00:00).
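For timestamps, a sketch might shift by a whole number of days and then drop minutes and seconds. The function `offset_and_round`, the `max_days` bound, and the choice to round down to the hour are assumptions; the explicit `offset` argument exists only to make the example repeatable.

```python
import random
from datetime import datetime, timedelta

def offset_and_round(ts, max_days=1095, rng=None, offset=None):
    """Shift a timestamp by a random number of days, then round down to the hour."""
    rng = rng or random.Random()
    if offset is None:
        offset = timedelta(days=rng.randint(-max_days, max_days))
    shifted = ts + offset
    return shifted.replace(minute=0, second=0, microsecond=0)
```

With an offset of -1071 days, `datetime(2020, 12, 8, 15, 12, 25)` becomes `datetime(2018, 1, 2, 15, 0, 0)`, matching the example above.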
Conclusion
Whether using static or dynamic masking, the goal is to prevent internal misuse of private data and stop unmasked information from leaking out of an organization, a fundamental responsibility for any developer.
macrozheng
Dedicated to Java tech sharing and dissecting top open-source projects. Topics include Spring Boot, Spring Cloud, Docker, Kubernetes and more. Author’s GitHub project “mall” has 50K+ stars.