6 Practical Data Masking Techniques to Secure Sensitive Information
This article presents six widely used data masking solutions—from simple regex string replacement to advanced K‑anonymity—detailing their principles, Java implementations, pros and cons, performance impact, and suitable application scenarios, helping developers protect sensitive data in production and test environments.
Introduction
A new colleague once synchronized production data containing phone numbers and ID numbers directly to the test environment, which led to criticism from management and highlighted the critical importance of data masking.
Solution 1: String Replacement (Bronze)
Technical principle : Use regular expressions to replace parts of sensitive fields.
Typical code implementation :
public class StringMasker {
    // Mobile masking: 13812345678 → 138****5678
    public static String maskMobile(String mobile) {
        return mobile.replaceAll("(\\d{3})\\d{4}(\\d{4})", "$1****$2");
    }

    // ID card masking: 110101199003077777 → 1101**********7777
    public static String maskIdCard(String idCard) {
        if (idCard.length() == 18) {
            // Keep the first 4 and last 4 characters; replace the 10 middle digits one-for-one
            return idCard.replaceAll("(\\d{4})\\d{10}(\\w{4})", "$1**********$2");
        }
        return idCard; // handle 15‑digit old IDs elsewhere
    }
}
Advantages: simple to implement, high performance (O(n)).
Disadvantages: irreversible, regex must handle multiple country formats, pattern can be cracked.
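One way to reduce the dependence on per-format regular expressions is a length-based helper that keeps a fixed number of leading and trailing characters. The following is only a sketch: the class name MiddleMasker and the method maskMiddle are hypothetical, and the choice to mask the whole value when it is too short is an assumption, not something the article prescribes.

```java
final class MiddleMasker {
    /**
     * Keeps the first keepHead and last keepTail characters and replaces
     * everything in between with '*'. Values too short to have a hidden
     * middle are masked completely (an assumption made for this sketch).
     */
    static String maskMiddle(String value, int keepHead, int keepTail) {
        if (value == null) {
            return null;
        }
        if (keepHead < 0 || keepTail < 0) {
            throw new IllegalArgumentException("keepHead and keepTail must be >= 0");
        }
        int hidden = value.length() - keepHead - keepTail;
        if (hidden <= 0) {
            return stars(value.length());
        }
        StringBuilder sb = new StringBuilder(value.length());
        sb.append(value, 0, keepHead);
        sb.append(stars(hidden));
        sb.append(value, value.length() - keepTail, value.length());
        return sb.toString();
    }

    // Builds a string of n asterisks.
    private static String stars(int n) {
        StringBuilder sb = new StringBuilder(n);
        for (int i = 0; i < n; i++) {
            sb.append('*');
        }
        return sb.toString();
    }
}
```

Because the helper works on length rather than digit patterns, the same call covers 11-digit mobile numbers, 18-character IDs, and 15-digit old IDs, at the cost of not validating the input format at all.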
Solution 2: Encryption Algorithm (Silver)
Algorithm selection :
Symmetric encryption – AES – fast encryption/decryption, complex key management – suitable for payment information storage.
Asymmetric encryption – RSA – slower but high security – suitable for key exchange.
National standard – SM4 – complies with Chinese standards – suitable for government/financial systems.
Full implementation example :
import javax.crypto.Cipher;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import java.nio.charset.StandardCharsets;
import java.security.SecureRandom;
import java.util.Base64;

public class AESEncryptor {
    private static final String ALGORITHM = "AES/GCM/NoPadding";
    private static final int TAG_LENGTH = 128; // authentication tag length in bits

    public static String encrypt(String plaintext, SecretKey key) throws Exception {
        byte[] iv = new byte[12]; // GCM recommends a 12‑byte IV, fresh for every encryption
        SecureRandom random = new SecureRandom();
        random.nextBytes(iv);
        Cipher cipher = Cipher.getInstance(ALGORITHM);
        cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_LENGTH, iv));
        byte[] ciphertext = cipher.doFinal(plaintext.getBytes(StandardCharsets.UTF_8));
        // The IV is not secret; it is stored next to the ciphertext because decryption needs it
        return Base64.getEncoder().encodeToString(iv) + ":" +
               Base64.getEncoder().encodeToString(ciphertext);
    }
    // Decryption method omitted for brevity
}
Key‑management comparison :
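Whatever key-management option is chosen, the key that encrypted a value has to be available again at decryption time. The round trip can be sketched as follows, assuming the "iv:ciphertext" Base64 format produced by encrypt above and a locally generated 256-bit key. The class name AesGcmRoundTrip and the newKey helper are hypothetical; in a real system the key would more likely come from a keystore or a key-management service than from KeyGenerator at runtime.

```java
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Base64;

final class AesGcmRoundTrip {
    private static final String ALGORITHM = "AES/GCM/NoPadding";
    private static final int TAG_LENGTH = 128; // authentication tag length in bits

    // Assumption for the sketch: generate a throwaway 256-bit key locally.
    static SecretKey newKey() {
        try {
            KeyGenerator generator = KeyGenerator.getInstance("AES");
            generator.init(256);
            return generator.generateKey();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Same scheme as the article's encrypt: random 12-byte IV, output "iv:ciphertext".
    static String encrypt(String plaintext, SecretKey key) {
        try {
            byte[] iv = new byte[12];
            new SecureRandom().nextBytes(iv);
            Cipher cipher = Cipher.getInstance(ALGORITHM);
            cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_LENGTH, iv));
            byte[] ciphertext = cipher.doFinal(plaintext.getBytes(StandardCharsets.UTF_8));
            return Base64.getEncoder().encodeToString(iv) + ":"
                    + Base64.getEncoder().encodeToString(ciphertext);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Splits "iv:ciphertext", then decrypts; a tampered value fails the GCM tag check.
    static String decrypt(String encoded, SecretKey key) {
        String[] parts = encoded.split(":", 2);
        if (parts.length != 2) {
            throw new IllegalArgumentException("expected iv:ciphertext");
        }
        try {
            byte[] iv = Base64.getDecoder().decode(parts[0]);
            byte[] ciphertext = Base64.getDecoder().decode(parts[1]);
            Cipher cipher = Cipher.getInstance(ALGORITHM);
            cipher.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(TAG_LENGTH, iv));
            return new String(cipher.doFinal(ciphertext), StandardCharsets.UTF_8);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Because every call draws a fresh IV, encrypting the same plaintext twice yields different outputs, so encrypted columns of this kind cannot be compared for equality without decrypting them first.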
Solution 3: Data Masking (Gold)
Database‑level masking implementation :
-- Create a masked view
CREATE VIEW masked_customers AS
SELECT
    id,
    CONCAT(SUBSTR(name,1,1), '***') AS name,
    CONCAT(SUBSTR(mobile,1,3), '****', SUBSTR(mobile,8,4)) AS mobile
FROM customers;

-- Grant column‑level permissions
GRANT SELECT (id, name, mobile) ON masked_customers TO test_user;
The view masks key fields while allowing normal queries to retrieve masked data.
Solution 4: Data Replacement (Platinum)
Original and masked data are cached to enable fast conversion.
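The idea can be sketched without any external library. The version below assumes an in-memory table with no eviction and adds a reverse map so a replacement token can be traced back to its original. ReplacementTable, replace, and restore are hypothetical names chosen for this illustration.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

final class ReplacementTable {
    // original value -> replacement token
    private final Map<String, String> forward = new ConcurrentHashMap<>();
    // replacement token -> original value
    private final Map<String, String> reverse = new ConcurrentHashMap<>();

    /** Returns the same random token every time the same original is passed in. */
    String replace(String original) {
        return forward.computeIfAbsent(original, key -> {
            String token = UUID.randomUUID().toString().replace("-", "");
            reverse.put(token, key);
            return token;
        });
    }

    /** Returns the original for a known token, or null if the token is unknown. */
    String restore(String token) {
        return reverse.get(token);
    }
}
```

Note the trade-off: once entries can expire or be evicted, the same original may receive a different token on a later call, so a table that must stay consistent across runs would need persistent storage.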
Mapping table design :
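import com.google.common.cache.CacheBuilder;
import com.google.common.cache.CacheLoader;
import com.google.common.cache.LoadingCache;
import java.util.UUID;
import java.util.concurrent.TimeUnit;

LoadingCache<String, String> dataMapping = CacheBuilder.newBuilder()
    .maximumSize(100000)
    .expireAfterAccess(30, TimeUnit.MINUTES)
    .build(new CacheLoader<String, String>() {
        @Override
        public String load(String key) {
            // Generate a random replacement the first time a value is seen
            return UUID.randomUUID().toString().replace("-", "");
        }
    });

public String replaceData(String original) {
    // getUnchecked avoids the checked ExecutionException that get(key) declares
    return dataMapping.getUnchecked(original);
}
Solution 5: Dynamic Masking (Diamond)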
Application‑layer implementation (Spring AOP example) :
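import org.aspectj.lang.ProceedingJoinPoint;
import org.aspectj.lang.annotation.Around;
import org.aspectj.lang.annotation.Aspect;
import org.springframework.stereotype.Component;

@Aspect
@Component
public class DataMaskAspect {
    // Intercepts every method annotated with @RequiresMasking
    @Around("@annotation(requiresMasking)")
    public Object maskData(ProceedingJoinPoint joinPoint, RequiresMasking requiresMasking) throws Throwable {
        Object result = joinPoint.proceed();
        return mask(result, requiresMasking.type());
    }

    // Masks the returned User in place; other return types pass through unchanged
    private Object mask(Object data, MaskType type) {
        if (data instanceof User) {
            User user = (User) data;
            switch (type) {
                case MOBILE:
                    user.setMobile(MaskUtil.maskMobile(user.getMobile()));
                    break;
                case ID_CARD:
                    user.setIdCard(MaskUtil.maskIdCard(user.getIdCard()));
                    break;
                default:
                    break;
            }
        }
        return data;
    }
}
Annotate methods with @RequiresMasking; the AOP interceptor masks their return values at runtime.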
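The aspect relies on a RequiresMasking annotation and a MaskType enum that the article does not show. A plausible shape for them is sketched below. This is an assumption inferred from how the aspect uses them (a type() element and the MOBILE and ID_CARD constants), not the author's actual definitions. User, UserQueryService, and MaskingMetadata are hypothetical stand-ins so the sketch compiles without Spring.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;

enum MaskType { MOBILE, ID_CARD }

// RUNTIME retention is needed so the annotation is still visible when the
// aspect inspects the method; METHOD matches the @annotation(...) pointcut.
@Target(ElementType.METHOD)
@Retention(RetentionPolicy.RUNTIME)
@interface RequiresMasking {
    MaskType type();
}

// Hypothetical minimal entity, standing in for the article's User class.
class User {
    private String mobile;
    String getMobile() { return mobile; }
    void setMobile(String mobile) { this.mobile = mobile; }
}

// Hypothetical service: the annotated method is what the aspect would intercept.
class UserQueryService {
    @RequiresMasking(type = MaskType.MOBILE)
    public User findById(long id) {
        User user = new User();
        user.setMobile("13812345678"); // placeholder data
        return user;
    }

    public User findRaw(long id) {
        return findById(id);
    }
}

// Reads the annotation reflectively, the same metadata the pointcut binds to.
final class MaskingMetadata {
    static MaskType maskTypeOf(Class<?> owner, String methodName) {
        for (Method method : owner.getDeclaredMethods()) {
            if (method.getName().equals(methodName)) {
                RequiresMasking marker = method.getAnnotation(RequiresMasking.class);
                return marker == null ? null : marker.type();
            }
        }
        return null;
    }
}
```

Keeping the mask type on the annotation means each query method declares what it exposes, and the masking rule can change in the aspect without touching callers.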
Solution 6: K‑Anonymity (King)
Principle
When publishing hospital visit data, setting K=3 means every combination of quasi‑identifiers that appears in the data (e.g., age = 25, gender = male) is shared by at least three records, so an attacker who knows those attributes cannot single out one individual's record.
Sample data (age, gender, disease) shows three identical rows, achieving 3‑anonymity.
Implementation steps
Medical data generalization example :
public class KAnonymity {
    // Age generalization: exact value → range
    public static String generalizeAge(int age) {
        int range = 10; // bucket width in years (this is not the K in K‑anonymity)
        int lower = (age / range) * range;
        int upper = lower + range - 1;
        return lower + "-" + upper;
    }
}
For an input age of 28, the method returns "20-29".
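Generalization alone does not show that a data set is K‑anonymous; each combination of quasi‑identifiers still has to be counted. A minimal check might look like the sketch below, assuming (generalized age, gender) as the quasi‑identifiers and parallel arrays as input. KAnonymityCheck and isKAnonymous are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;

final class KAnonymityCheck {
    // Same 10-year bucketing as generalizeAge above.
    static String generalizeAge(int age) {
        int width = 10;
        int lower = (age / width) * width;
        return lower + "-" + (lower + width - 1);
    }

    /**
     * Returns true when every (age range, gender) combination present in the
     * data occurs at least k times. ages[i] and genders[i] describe record i.
     */
    static boolean isKAnonymous(int[] ages, String[] genders, int k) {
        if (ages.length != genders.length) {
            throw new IllegalArgumentException("ages and genders must have equal length");
        }
        Map<String, Integer> groupSizes = new HashMap<>();
        for (int i = 0; i < ages.length; i++) {
            String group = generalizeAge(ages[i]) + "|" + genders[i];
            groupSizes.merge(group, 1, Integer::sum);
        }
        for (int size : groupSizes.values()) {
            if (size < k) {
                return false; // this group is small enough to single someone out
            }
        }
        return true;
    }
}
```

When the check fails, the usual options are to widen the buckets (for example 20-year ranges) or to suppress the records in undersized groups before publishing.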
Summary
Below is a concise comparison of the six schemes:
String Replacement – ★★ security, ★★★★ performance, irreversible, suitable for logs/display.
Encryption Algorithm – ★★★★ security, ★★ performance, reversible, suitable for payment info storage.
Data Masking – ★★★ security, ★★★ performance, partially reversible, suitable for database queries.
Data Replacement – ★★★★ security, ★★ performance, reversible, suitable for test data generation.
Dynamic Masking – ★★★★ security, ★★★ performance, dynamically controllable, suitable for production queries.
K‑Anonymity – ★★★★★ security, ★ performance, irreversible, suitable for medical/location data.
Three core recommendations :
Classify and grade data, applying different masking strategies per level.
Conduct regular audits using automated tools to scan for sensitive data leaks.
Adopt the minimization principle: do not collect sensitive data unless absolutely necessary.
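The audit recommendation can be prototyped with a small pattern scanner run over log lines or exported test data. The sketch below is an illustration only: the two patterns assume mainland-China 11-digit mobile numbers and 18-character ID numbers, SensitiveDataScanner is a hypothetical name, and a real audit tool would need more formats and validation such as ID checksums.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SensitiveDataScanner {
    // 11 digits starting with 1 and a second digit of 3-9, not embedded in a longer digit run.
    private static final Pattern MOBILE =
            Pattern.compile("(?<!\\d)1[3-9]\\d{9}(?!\\d)");
    // 17 digits followed by a digit or X, not embedded in a longer digit run.
    private static final Pattern ID_CARD =
            Pattern.compile("(?<!\\d)\\d{17}[\\dXx](?!\\d)");

    /**
     * Returns findings as "LABEL@offset". The matched value itself is left
     * out so the audit report does not become a second leak.
     */
    static List<String> scan(String text) {
        List<String> findings = new ArrayList<>();
        collect(MOBILE, "MOBILE", text, findings);
        collect(ID_CARD, "ID_CARD", text, findings);
        return findings;
    }

    private static void collect(Pattern pattern, String label, String text, List<String> out) {
        Matcher matcher = pattern.matcher(text);
        while (matcher.find()) {
            out.add(label + "@" + matcher.start());
        }
    }
}
```

Already-masked values such as 138****5678 produce no findings, so a non-empty result on a test-environment export is a signal that masking was skipped somewhere upstream.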
Su San Talks Tech
Su San, former staff at several leading tech companies, is a top creator on Juejin and a premium creator on CSDN, and runs the free coding practice site www.susan.net.cn.
