Six Common Data Masking Techniques: From Simple String Replacement to K‑Anonymity
This article presents six practical data masking (desensitization) methods: string replacement, encryption, database masking, cache-based replacement, dynamic AOP masking, and K-anonymity. It explains the principle behind each, shows a Java implementation, compares security, performance, and reversibility, and offers concrete recommendations for protecting sensitive fields such as phone numbers and ID card numbers.
When a new colleague directly copied production data containing sensitive fields (phone numbers, ID cards) into a test environment, the company faced criticism, highlighting the critical need for data masking.
The offending code simply inserted the full record into the test database:
public void syncUserToTest(User user) {
    testDB.insert(user); // contains phone, ID, etc., all unmasked
}
Solution 1: String Replacement (Bronze Level) – Uses regular expressions to replace parts of a string. The example implementation masks mobile numbers and ID cards:
public class StringMasker {
    // Mask a mobile number: 13812345678 → 138****5678
    public static String maskMobile(String mobile) {
        return mobile.replaceAll("(\\d{3})\\d{4}(\\d{4})", "$1****$2");
    }
    // Mask an 18-character ID card: 110101199003077777 → 1101****7777
    public static String maskIdCard(String idCard) {
        if (idCard.length() == 18) {
            return idCard.replaceAll("(\\d{4})\\d{10}(\\w{4})", "$1****$2");
        }
        // NOTE: other lengths (e.g. 15-digit old IDs) are returned unmasked
        // and still need their own handling.
        return idCard;
    }
}
Advantages: simple and fast (a single O(n) pass over the string). Drawbacks: irreversible; the regular expressions must be extended to cover international number formats; and because the format is known, the digits left visible can help an attacker narrow down the original value.
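One way to sidestep the format dependence noted in the drawbacks is to mask by position instead of by digit pattern. The following is a minimal sketch, not part of the original article: the class name `PartialMasker` and its `mask` method are hypothetical, and `String.repeat` assumes Java 11 or later.

```java
final class PartialMasker {
    private PartialMasker() {}

    /**
     * Keeps the first `head` and last `tail` characters and replaces
     * everything in between with '*'. Hypothetical helper for illustration.
     */
    static String mask(String value, int head, int tail) {
        if (value == null) {
            return null;
        }
        int hidden = value.length() - head - tail;
        if (head < 0 || tail < 0 || hidden <= 0) {
            // Too short to keep anything visible: hide the whole value.
            return "*".repeat(value.length());
        }
        return value.substring(0, head)
                + "*".repeat(hidden)
                + value.substring(value.length() - tail);
    }
}
```

Because the number of stars equals the number of hidden characters, this variant preserves the original length, and it masks values of any length (including 15-digit IDs) instead of returning them unchanged.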
Solution 2: Encryption Algorithms (Silver Level) – Applies symmetric (AES/GCM) or asymmetric (RSA) encryption to protect data at rest. Example AES encryptor:
import javax.crypto.Cipher;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import java.nio.charset.StandardCharsets;
import java.security.SecureRandom;
import java.util.Base64;

public class AESEncryptor {
    private static final String ALGORITHM = "AES/GCM/NoPadding";
    private static final int TAG_LENGTH = 128; // authentication tag length, in bits
    public static String encrypt(String plaintext, SecretKey key) throws Exception {
        byte[] iv = new byte[12]; // GCM recommends a 12-byte IV, fresh for every message
        SecureRandom random = new SecureRandom();
        random.nextBytes(iv);
        Cipher cipher = Cipher.getInstance(ALGORITHM);
        cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_LENGTH, iv));
        byte[] ciphertext = cipher.doFinal(plaintext.getBytes(StandardCharsets.UTF_8));
        // Output format: base64(iv) + ":" + base64(ciphertext with tag)
        return Base64.getEncoder().encodeToString(iv) + ":" + Base64.getEncoder().encodeToString(ciphertext);
    }
    // decryption method omitted for brevity
}
Advantages: strong security, and reversible for anyone holding the key. Drawbacks: key management is complex, and encryption adds performance overhead.
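The decryption side that the article omits could look roughly like the sketch below. It assumes the same `base64(iv):base64(ciphertext)` output format as the encryptor above. The class name `AesGcmCodec` and the `newKey` helper are hypothetical, added only so the round trip is self-contained.

```java
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import java.nio.charset.StandardCharsets;
import java.security.SecureRandom;
import java.util.Base64;

final class AesGcmCodec {
    private static final String ALGORITHM = "AES/GCM/NoPadding";
    private static final int TAG_LENGTH_BITS = 128;
    private static final int IV_LENGTH_BYTES = 12;

    private AesGcmCodec() {}

    static String encrypt(String plaintext, SecretKey key) throws Exception {
        byte[] iv = new byte[IV_LENGTH_BYTES];
        new SecureRandom().nextBytes(iv); // never reuse an IV with the same key
        Cipher cipher = Cipher.getInstance(ALGORITHM);
        cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_LENGTH_BITS, iv));
        byte[] ciphertext = cipher.doFinal(plaintext.getBytes(StandardCharsets.UTF_8));
        return Base64.getEncoder().encodeToString(iv) + ":"
                + Base64.getEncoder().encodeToString(ciphertext);
    }

    static String decrypt(String encoded, SecretKey key) throws Exception {
        // ':' is not part of the standard Base64 alphabet, so it is a safe separator.
        String[] parts = encoded.split(":", 2);
        if (parts.length != 2) {
            throw new IllegalArgumentException("expected base64(iv):base64(ciphertext)");
        }
        byte[] iv = Base64.getDecoder().decode(parts[0]);
        byte[] ciphertext = Base64.getDecoder().decode(parts[1]);
        Cipher cipher = Cipher.getInstance(ALGORITHM);
        cipher.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(TAG_LENGTH_BITS, iv));
        // doFinal also verifies the GCM tag and throws if the data or key is wrong.
        return new String(cipher.doFinal(ciphertext), StandardCharsets.UTF_8);
    }

    // Hypothetical helper: a real system would load the key from a key store or KMS.
    static SecretKey newKey() throws Exception {
        KeyGenerator generator = KeyGenerator.getInstance("AES");
        generator.init(256);
        return generator.generateKey();
    }
}
```

Because GCM is authenticated, decrypting with the wrong key or a modified ciphertext fails with an exception rather than returning garbage, which is useful when the encrypted field is later read back from storage.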
Solution 3: Database Masking (Gold Level) – Creates masked views or uses column‑level permissions. Example view definition:
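CREATE VIEW masked_customers AS
SELECT id,
       CONCAT(SUBSTR(name, 1, 1), '***') AS name,
       CONCAT(SUBSTR(mobile, 1, 3), '****', SUBSTR(mobile, 8, 4)) AS mobile
FROM customers;
GRANT SELECT (id, name, mobile) ON masked_customers TO test_user;
The base table keeps the original values, so the masking is reversible only for accounts that can still query customers directly; test_user must not be granted access to it. Queries against the view work transparently and always return masked values.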
Solution 4: Data Replacement with Cache (Platinum Level) – Stores original‑to‑masked mappings in an LRU cache (Guava) for fast lookup.
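LoadingCache<String, String> dataMapping = CacheBuilder.newBuilder()
    .maximumSize(100000)
    .expireAfterAccess(30, TimeUnit.MINUTES)
    .build(new CacheLoader<String, String>() {
        @Override
        public String load(String key) {
            // Generate a random 32-character token for a value seen for the first time
            return UUID.randomUUID().toString().replace("-", "");
        }
    });
public String replaceData(String original) {
    // getUnchecked avoids the checked ExecutionException thrown by get()
    return dataMapping.getUnchecked(original);
}
Suitable for generating large volumes of test data. The same original maps to the same token only while its entry stays in the cache; after eviction or expiry it receives a new token, and recovering the original from a token requires a separate reverse lookup.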
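If the replacement really needs to be reversible, the mapping has to be kept in both directions. The sketch below illustrates that idea with plain `ConcurrentHashMap`s instead of the Guava cache, so it has no size limit or expiry. The class `TokenVault` and its `restore` method are hypothetical names, not from the article.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

final class TokenVault {
    // original value -> token, and token -> original value
    private final Map<String, String> tokenByOriginal = new ConcurrentHashMap<>();
    private final Map<String, String> originalByToken = new ConcurrentHashMap<>();

    /** Returns a stable random token for the given original value. */
    String replaceData(String original) {
        return tokenByOriginal.computeIfAbsent(original, key -> {
            String token = UUID.randomUUID().toString().replace("-", "");
            originalByToken.put(token, key); // record the reverse direction
            return token;
        });
    }

    /** Returns the original value for a token, or null if the token is unknown. */
    String restore(String token) {
        return originalByToken.get(token);
    }
}
```

Note that such a vault holds the sensitive originals in memory, so it needs the same protection as the source data; in a test-data pipeline it would normally live on the production side only.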
Solution 5: Dynamic Masking via Spring AOP (Diamond Level) – Uses a custom annotation and an aspect to mask fields at runtime.
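import org.aspectj.lang.ProceedingJoinPoint;
import org.aspectj.lang.annotation.Around;
import org.aspectj.lang.annotation.Aspect;
import org.springframework.stereotype.Component;

// RequiresMasking (a method-level annotation) and MaskType (an enum) are custom types not shown here.
@Aspect
@Component
public class DataMaskAspect {
    // Intercepts every method annotated with @RequiresMasking
    @Around("@annotation(requiresMasking)")
    public Object maskData(ProceedingJoinPoint joinPoint, RequiresMasking requiresMasking) throws Throwable {
        Object result = joinPoint.proceed();
        return mask(result, requiresMasking.type());
    }
    private Object mask(Object data, MaskType type) {
        if (data instanceof User) {
            User user = (User) data;
            switch (type) {
                case MOBILE:
                    user.setMobile(StringMasker.maskMobile(user.getMobile()));
                    break;
                case ID_CARD:
                    user.setIdCard(StringMasker.maskIdCard(user.getIdCard()));
                    break;
            }
        }
        return data; // other return types pass through unmasked
    }
}
Offers configurable masking with minimal code changes: annotating a method is enough, and the stored data stays untouched, so the original values remain available to code paths that are not annotated. Note that the aspect modifies the returned User object in place.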
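The annotation-driven idea can also be tried without Spring. The following framework-free sketch swaps the aspect for plain reflection and marks fields rather than methods. All names in it (`Masked`, `MaskKind`, `Account`, `ReflectiveMasker`) are hypothetical and chosen for illustration only.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;

enum MaskKind { MOBILE, ID_CARD }

// Marks a String field that must be masked before the object leaves the service.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface Masked {
    MaskKind value();
}

final class Account {
    String name;
    @Masked(MaskKind.MOBILE) String mobile;
    @Masked(MaskKind.ID_CARD) String idCard;

    Account(String name, String mobile, String idCard) {
        this.name = name;
        this.mobile = mobile;
        this.idCard = idCard;
    }
}

final class ReflectiveMasker {
    private ReflectiveMasker() {}

    /** Masks every @Masked String field of the target in place and returns it. */
    static <T> T maskInPlace(T target) throws IllegalAccessException {
        for (Field field : target.getClass().getDeclaredFields()) {
            Masked masked = field.getAnnotation(Masked.class);
            if (masked == null || field.getType() != String.class) {
                continue; // unannotated or non-String fields are left alone
            }
            field.setAccessible(true);
            String value = (String) field.get(target);
            if (value != null) {
                field.set(target, apply(masked.value(), value));
            }
        }
        return target;
    }

    // Same regular expressions as the string-replacement solution.
    private static String apply(MaskKind kind, String value) {
        switch (kind) {
            case MOBILE:
                return value.replaceAll("(\\d{3})\\d{4}(\\d{4})", "$1****$2");
            case ID_CARD:
                return value.replaceAll("(\\d{4})\\d{10}(\\w{4})", "$1****$2");
            default:
                return value;
        }
    }
}
```

Putting the annotation on fields means one call masks every sensitive field of an object, whereas the method-level annotation in the aspect selects a single mask type per call. Which granularity fits better depends on how the API responses are structured.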
Solution 6: K‑Anonymity (King Level) – Generalizes quasi‑identifiers (e.g., age) into ranges so that each group contains at least K records, making re‑identification difficult.
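public class KAnonymity {
    // Generalize an exact age into a 10-year bucket, e.g. 34 → "30-39"
    public static String generalizeAge(int age) {
        int range = 10; // bucket width in years (this is not K itself)
        int lower = (age / range) * range;
        int upper = lower + range - 1;
        return lower + "-" + upper;
    }
}
Of the six methods, this offers the strongest protection against re-identification for datasets such as medical records, at the cost of data utility. Bucketing alone does not guarantee K-anonymity: the guarantee holds only if every resulting group actually contains at least K records, which has to be verified against the data.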
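Whether a generalized dataset actually reaches a given K can be checked by counting the records in each group. The sketch below does this for the age buckets only. The class `KAnonymityCheck` and its methods are hypothetical; a real dataset would group on the combination of all quasi-identifiers (for example age range plus postal-code prefix), not on age alone.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class KAnonymityCheck {
    private KAnonymityCheck() {}

    // Same 10-year bucketing as in the article's example.
    static String generalizeAge(int age) {
        int lower = (age / 10) * 10;
        return lower + "-" + (lower + 9);
    }

    /** Size of the smallest age bucket, or 0 for an empty dataset. */
    static int smallestGroup(List<Integer> ages) {
        Map<String, Integer> counts = new HashMap<>();
        for (int age : ages) {
            counts.merge(generalizeAge(age), 1, Integer::sum);
        }
        return counts.values().stream().mapToInt(Integer::intValue).min().orElse(0);
    }

    /** True if every bucket holds at least k records (vacuously true when empty). */
    static boolean isKAnonymous(List<Integer> ages, int k) {
        return ages.isEmpty() || smallestGroup(ages) >= k;
    }
}
```

When the check fails, the usual options are to widen the buckets further or to suppress the records in the undersized groups, both of which reduce utility again.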
The summary table compares each method on security, performance, reversibility, and suitable scenarios (logging, payment storage, database queries, test data generation, production queries, medical data).
Core recommendations from the author:
Classify and grade data, applying appropriate masking strategies per level.
Conduct regular automated audits to detect sensitive data leaks.
Adopt the data minimization principle: avoid collecting unnecessary sensitive information.
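The automated audit in the second recommendation can start as a simple scan of logs or exported test data for values that still look unmasked. The sketch below is one possible starting point, not the author's tooling. The class `SensitiveDataAudit` is hypothetical, and the two patterns assume 11-digit mobile numbers beginning with 1 and 18-character ID numbers, as in the article's examples.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

final class SensitiveDataAudit {
    // Assumed formats: 11-digit mobile number starting with 13-19,
    // and 18-character ID number (17 digits plus a digit or X).
    private static final Pattern MOBILE = Pattern.compile("(?<!\\d)1[3-9]\\d{9}(?!\\d)");
    private static final Pattern ID_CARD = Pattern.compile("(?<!\\d)\\d{17}[\\dXx](?!\\d)");

    private SensitiveDataAudit() {}

    /** Returns the 1-based numbers of lines that appear to contain unmasked values. */
    static List<Integer> findLeakingLines(List<String> lines) {
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            String line = lines.get(i);
            if (MOBILE.matcher(line).find() || ID_CARD.matcher(line).find()) {
                hits.add(i + 1);
            }
        }
        return hits;
    }
}
```

Pattern-based scans produce false positives (any long digit run can match), so the result is better treated as a list of lines to review than as proof of a leak.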
Overall, the article equips developers with a toolbox of practical masking techniques, guiding them to choose the right balance between privacy, performance, and reversibility for their specific use cases.
IT Services Circle