6 Practical Data Masking Techniques to Secure Sensitive Information

This article presents six widely used data masking solutions, ranging from simple regex string replacement to K‑anonymity. For each one it covers the principle, a sample implementation, pros and cons, performance impact, and suitable scenarios, to help developers protect sensitive data in production and test environments.

Su San Talks Tech

Introduction

A new colleague once synchronized production data containing phone numbers and ID numbers directly to the test environment, which led to criticism from management and highlighted the critical importance of data masking.

Solution 1: String Replacement (Bronze)

Technical principle: Use regular expressions to replace parts of sensitive fields.

Typical code implementation:

public class StringMasker {
    // Mobile masking: 13812345678 → 138****5678
    public static String maskMobile(String mobile) {
        if (mobile == null) {
            return null; // nothing to mask
        }
        return mobile.replaceAll("(\\d{3})\\d{4}(\\d{4})", "$1****$2");
    }

    // ID card masking: 110101199003077777 → 1101****7777
    // (the ten middle characters collapse into four asterisks)
    public static String maskIdCard(String idCard) {
        if (idCard != null && idCard.length() == 18) {
            return idCard.replaceAll("(\\d{4})\\d{10}(\\w{4})", "$1****$2");
        }
        return idCard; // handle 15‑digit old IDs elsewhere
    }
}

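Advantages: simple to implement; runs in time linear in the input length (O(n)).

Disadvantages: irreversible, so the original value cannot be recovered when it is needed; every country's number format needs its own pattern; and the hidden part can sometimes be guessed, since four masked phone digits leave only 10,000 candidates.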
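One way to sidestep per-format regexes is to mask by position instead. The sketch below is an illustration, not part of the original article: `MaskHelper` and `maskMiddle` are hypothetical names, and `String.repeat` assumes Java 11 or later. It keeps a fixed number of leading and trailing characters and preserves the original length.

```java
final class MaskHelper {
    private MaskHelper() {
    }

    /**
     * Keeps the first keepStart and last keepEnd characters and replaces
     * everything in between with '*'. A value too short to leave anything
     * hidden is masked completely instead of being returned unchanged.
     */
    static String maskMiddle(String value, int keepStart, int keepEnd) {
        if (value == null) {
            return null;
        }
        if (keepStart < 0 || keepEnd < 0) {
            throw new IllegalArgumentException("keepStart and keepEnd must be non-negative");
        }
        int length = value.length();
        if (keepStart + keepEnd >= length) {
            return "*".repeat(length);
        }
        return value.substring(0, keepStart)
                + "*".repeat(length - keepStart - keepEnd)
                + value.substring(length - keepEnd);
    }
}
```

Calling `MaskHelper.maskMiddle("13812345678", 3, 4)` yields `138****5678`. Unlike the regex version, the number of asterisks always equals the number of hidden characters.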

Solution 2: Encryption Algorithm (Silver)

Algorithm selection (type – algorithm – characteristics – typical use):

Symmetric encryption – AES – fast encryption/decryption, but the shared key must be managed carefully – suitable for payment information storage.

Asymmetric encryption – RSA – much slower, but no shared secret has to be distributed – suitable for key exchange.

National standard – SM4 – symmetric block cipher defined by Chinese national standards – suitable for government/financial systems.

Full implementation example:

import javax.crypto.Cipher;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import java.nio.charset.StandardCharsets;
import java.security.SecureRandom;
import java.util.Base64;

public class AESEncryptor {
    private static final String ALGORITHM = "AES/GCM/NoPadding";
    private static final int TAG_LENGTH = 128; // authentication tag length in bits

    public static String encrypt(String plaintext, SecretKey key) throws Exception {
        byte[] iv = new byte[12]; // GCM recommends a 12‑byte IV, fresh for every message
        SecureRandom random = new SecureRandom();
        random.nextBytes(iv);
        Cipher cipher = Cipher.getInstance(ALGORITHM);
        cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_LENGTH, iv));
        byte[] ciphertext = cipher.doFinal(plaintext.getBytes(StandardCharsets.UTF_8));
        // Output format: Base64(iv) + ":" + Base64(ciphertext including the tag)
        return Base64.getEncoder().encodeToString(iv) + ":" +
               Base64.getEncoder().encodeToString(ciphertext);
    }
    // Decryption method omitted for brevity
}
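The omitted decryption step can be sketched as follows. This is a minimal illustration that assumes the `Base64(iv):Base64(ciphertext)` format produced above. The class name `AesGcmRoundTrip`, the `newKey` helper, and the choice to wrap checked exceptions in `IllegalStateException` are assumptions made for the example, not the author's code.

```java
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Base64;

final class AesGcmRoundTrip {
    private static final String ALGORITHM = "AES/GCM/NoPadding";
    private static final int TAG_LENGTH_BITS = 128;
    private static final int IV_LENGTH_BYTES = 12;

    static String encrypt(String plaintext, SecretKey key) {
        byte[] iv = new byte[IV_LENGTH_BYTES];
        new SecureRandom().nextBytes(iv); // never reuse an IV with the same key
        byte[] ciphertext = run(Cipher.ENCRYPT_MODE, key, iv,
                plaintext.getBytes(StandardCharsets.UTF_8));
        return Base64.getEncoder().encodeToString(iv) + ":"
                + Base64.getEncoder().encodeToString(ciphertext);
    }

    static String decrypt(String encoded, SecretKey key) {
        // Standard Base64 never contains ':', so the separator is unambiguous
        String[] parts = encoded.split(":", 2);
        if (parts.length != 2) {
            throw new IllegalArgumentException("Expected <iv>:<ciphertext>");
        }
        byte[] iv = Base64.getDecoder().decode(parts[0]);
        byte[] ciphertext = Base64.getDecoder().decode(parts[1]);
        // doFinal fails if the ciphertext or tag was modified
        return new String(run(Cipher.DECRYPT_MODE, key, iv, ciphertext),
                StandardCharsets.UTF_8);
    }

    // Hypothetical helper: generates a random 256-bit AES key for the demo
    static SecretKey newKey() {
        try {
            KeyGenerator generator = KeyGenerator.getInstance("AES");
            generator.init(256);
            return generator.generateKey();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("AES is unavailable", e);
        }
    }

    private static byte[] run(int mode, SecretKey key, byte[] iv, byte[] input) {
        try {
            Cipher cipher = Cipher.getInstance(ALGORITHM);
            cipher.init(mode, key, new GCMParameterSpec(TAG_LENGTH_BITS, iv));
            return cipher.doFinal(input);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("AES-GCM operation failed", e);
        }
    }
}
```

In a real system the key would come from a key store or key management service rather than being generated next to the data.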

Key‑management comparison:

Solution 3: Data Masking (Gold)

Database‑level masking implementation:

-- Create a masked view
CREATE VIEW masked_customers AS
SELECT
    id,
    CONCAT(SUBSTR(name,1,1), '***') AS name,
    CONCAT(SUBSTR(mobile,1,3), '****', SUBSTR(mobile,8,4)) AS mobile
FROM customers;

-- Grant column‑level permissions
GRANT SELECT (id, name, mobile) ON masked_customers TO test_user;

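The view masks the key fields, so ordinary queries against masked_customers return only masked data. This protection holds only if test_user has no SELECT privilege on the underlying customers table.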

Solution 4: Data Replacement (Platinum)

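Each original value is mapped to a random surrogate, and the mapping is cached so that repeated lookups of the same value are fast and return the same replacement.

Mapping table design:

import com.google.common.cache.CacheBuilder;
import com.google.common.cache.CacheLoader;
import com.google.common.cache.LoadingCache;
import java.util.UUID;
import java.util.concurrent.TimeUnit;

// Note: an entry that expires or is evicted gets a NEW surrogate on its next lookup,
// and this cache alone gives no way to map a surrogate back to the original.
LoadingCache<String, String> dataMapping = CacheBuilder.newBuilder()
    .maximumSize(100000)
    .expireAfterAccess(30, TimeUnit.MINUTES)
    .build(new CacheLoader<String, String>() {
        @Override
        public String load(String key) {
            return UUID.randomUUID().toString().replace("-", "");
        }
    });

public String replaceData(String original) {
    // get() declares the checked ExecutionException; getUnchecked() suits a loader that cannot fail
    return dataMapping.getUnchecked(original);
}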
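If replaced values must stay stable and be convertible back, the mapping has to be kept in both directions. The following is a minimal in-memory sketch under that assumption. `ReplacementMapper`, `replace`, and `restore` are hypothetical names, and a production system would persist the mapping in a protected store rather than in process memory.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

final class ReplacementMapper {
    private final Map<String, String> originalToToken = new ConcurrentHashMap<>();
    private final Map<String, String> tokenToOriginal = new ConcurrentHashMap<>();

    /** Returns the surrogate for an original value, creating it on first use. */
    String replace(String original) {
        return originalToToken.computeIfAbsent(original, key -> {
            String token = UUID.randomUUID().toString().replace("-", "");
            // Record the reverse direction before the token becomes visible
            tokenToOriginal.put(token, key);
            return token;
        });
    }

    /** Returns the original value for a surrogate, or null if it is unknown. */
    String restore(String token) {
        return tokenToOriginal.get(token);
    }
}
```

Because the reverse map holds the sensitive originals, it needs the same access controls as the source data.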

Solution 5: Dynamic Masking (Diamond)

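Application‑layer implementation (Spring AOP example):

import org.aspectj.lang.ProceedingJoinPoint;
import org.aspectj.lang.annotation.Around;
import org.aspectj.lang.annotation.Aspect;
import org.springframework.stereotype.Component;

@Aspect
@Component
public class DataMaskAspect {
    // Intercepts every method annotated with @RequiresMasking
    @Around("@annotation(requiresMasking)")
    public Object maskData(ProceedingJoinPoint joinPoint, RequiresMasking requiresMasking) throws Throwable {
        Object result = joinPoint.proceed();
        return mask(result, requiresMasking.type());
    }

    // Masks the returned object in place; copy it first if the instance is cached or shared
    private Object mask(Object data, MaskType type) {
        if (data instanceof User) {
            User user = (User) data;
            switch (type) {
                case MOBILE:
                    user.setMobile(StringMasker.maskMobile(user.getMobile()));
                    break;
                case ID_CARD:
                    user.setIdCard(StringMasker.maskIdCard(user.getIdCard()));
                    break;
            }
        }
        return data;
    }
}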

Annotate methods with @RequiresMasking; the aspect intercepts their return values and masks them at runtime.
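The aspect relies on a `RequiresMasking` annotation and a `MaskType` enum that the article does not show. A plausible definition, inferred only from how the aspect uses them, might look like the sketch below. `UserQueryService` and `MaskingAnnotations` are hypothetical, added to show that the annotation is readable at runtime.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;

enum MaskType { MOBILE, ID_CARD }

// RUNTIME retention is required so an aspect can see the annotation
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface RequiresMasking {
    MaskType type();
}

final class UserQueryService {
    // Hypothetical service method; the real lookup is omitted
    @RequiresMasking(type = MaskType.MOBILE)
    public Object findUser(long userId) {
        return null;
    }
}

final class MaskingAnnotations {
    private MaskingAnnotations() {
    }

    /** Returns the mask type declared on the named method, or null if there is none. */
    static MaskType maskTypeOf(Class<?> owner, String methodName) {
        for (Method method : owner.getDeclaredMethods()) {
            if (method.getName().equals(methodName)) {
                RequiresMasking annotation = method.getAnnotation(RequiresMasking.class);
                return annotation == null ? null : annotation.type();
            }
        }
        return null;
    }
}
```

With the default `SOURCE` or `CLASS` retention the pointcut would never match, which is an easy mistake to make when writing such an annotation by hand.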

Solution 6: K‑Anonymity (King)

Principle

When publishing hospital visit data, setting K=3 means every record shares its combination of quasi‑identifiers (e.g., age = 25, gender = male) with at least two other records. An attacker who knows those attributes therefore cannot single out one person's row, which makes it harder to link an individual to a disease.

In the sample data (age, gender, disease), three rows carry the same age and gender values, which satisfies 3‑anonymity.

Implementation steps

Medical data generalization example:

public class KAnonymity {
    // Age generalization: exact value → range
    public static String generalizeAge(int age) {
        int range = 10; // bucket width in years (this is not the K of K‑anonymity)
        int lower = (age / range) * range;
        int upper = lower + range - 1;
        return lower + "-" + upper;
    }
}

For an input age of 28, the method returns "20-29".
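Generalization by itself does not guarantee K‑anonymity; the result still has to be checked by grouping records on their quasi‑identifiers. The sketch below illustrates that check under simple assumptions: each row is an array of already generalized quasi‑identifier values, and `KAnonymityCheck` and `isKAnonymous` are hypothetical names.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class KAnonymityCheck {
    private KAnonymityCheck() {
    }

    // Same generalization as in the example above: exact age → 10-year range
    static String generalizeAge(int age) {
        int range = 10;
        int lower = (age / range) * range;
        return lower + "-" + (lower + range - 1);
    }

    /**
     * Returns true if every combination of quasi-identifier values occurs
     * in at least k rows. Sensitive columns are not part of the rows.
     */
    static boolean isKAnonymous(List<String[]> quasiIdentifierRows, int k) {
        Map<String, Integer> groupSizes = new HashMap<>();
        for (String[] row : quasiIdentifierRows) {
            // Assumes '|' does not occur inside a value
            groupSizes.merge(String.join("|", row), 1, Integer::sum);
        }
        for (int size : groupSizes.values()) {
            if (size < k) {
                return false;
            }
        }
        return true;
    }
}
```

Three male patients aged 21, 25, and 28 all fall into the group `20-29|male`, so that data set passes with k = 3. Adding a single male patient aged 35 creates a group of size one and the check fails.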

Summary

Below is a concise comparison of the six schemes:

String Replacement – ★★ security, ★★★★ performance, irreversible, suitable for logs/display.

Encryption Algorithm – ★★★★ security, ★★ performance, reversible, suitable for payment info storage.

Data Masking – ★★★ security, ★★★ performance, partially reversible, suitable for database queries.

Data Replacement – ★★★★ security, ★★ performance, reversible, suitable for test data generation.

Dynamic Masking – ★★★★ security, ★★★ performance, dynamically controllable, suitable for production queries.

K‑Anonymity – ★★★★★ security, ★ performance, irreversible, suitable for medical/location data.

Three core recommendations:

Classify and grade data, applying different masking strategies per level.

Conduct regular audits using automated tools to scan for sensitive data leaks.

Adopt the minimization principle: do not collect sensitive data unless absolutely necessary.
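The audit recommendation can be illustrated with a very small scanner. This is only a sketch: `SensitiveDataScanner` is a hypothetical name, and the two patterns assume mainland China formats (11‑digit mobile numbers starting with 1, and 18‑character ID numbers). It reports the position of each finding rather than the value, so the scan output does not itself leak data.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SensitiveDataScanner {
    private static final Map<String, Pattern> PATTERNS = new LinkedHashMap<>();

    static {
        // Lookarounds stop a match from starting or ending inside a longer digit run
        PATTERNS.put("MOBILE", Pattern.compile("(?<!\\d)1[3-9]\\d{9}(?!\\d)"));
        PATTERNS.put("ID_CARD", Pattern.compile("(?<!\\d)\\d{17}[\\dXx](?!\\d)"));
    }

    private SensitiveDataScanner() {
    }

    /** Returns findings as "<TYPE>@<start index>", grouped by pattern type. */
    static List<String> scan(String text) {
        List<String> findings = new ArrayList<>();
        for (Map.Entry<String, Pattern> entry : PATTERNS.entrySet()) {
            Matcher matcher = entry.getValue().matcher(text);
            while (matcher.find()) {
                findings.add(entry.getKey() + "@" + matcher.start());
            }
        }
        return findings;
    }
}
```

Run against log files or test database dumps, a scanner like this flags unmasked values, while already masked strings such as `138****5678` produce no findings.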

Written by

Su San Talks Tech

Su San, former staff at several leading tech companies, is a top creator on Juejin and a premium creator on CSDN, and runs the free coding practice site www.susan.net.cn.
