Transparent Data Masking with Apache ShardingSphere for New and Legacy Apps
Apache ShardingSphere provides a transparent, low‑cost data masking solution that lets both new and existing applications encrypt sensitive fields without modifying business SQL. This article covers Encrypt‑JDBC and Encrypt‑Proxy, the configurable encryption rules, and a step‑by‑step migration path for applications that already hold plaintext data.
Background
Data security and masking are critical for internet companies and traditional industries. Data masking transforms sensitive information (ID numbers, phone numbers, etc.) to protect privacy. Businesses often need to implement encryption without changing existing SQL logic, and need transparent, low‑risk migration.
ShardingSphere Overview
Apache ShardingSphere is an open‑source distributed database middleware ecosystem consisting of Sharding‑JDBC, Sharding‑Proxy, and a planned Sharding‑Sidecar. It provides data sharding, distributed transactions, and governance. The data masking module is part of ShardingSphere’s distributed governance.
Encrypt‑JDBC intercepts SQL, rewrites it according to user‑defined masking rules, stores ciphertext (and optionally plaintext) in the underlying database, and decrypts data on query, making masking transparent to applications.
Requirement Scenarios
New applications: security teams require sensitive fields (e.g., bank account, phone) to be encrypted at rest; no historical data to clean.
Existing applications: large volumes of plaintext data need encryption, and new data must be encrypted without changing business SQL.
Masking Process Overview
Encrypt‑JDBC works as a bridge between business code and the database, parsing SQL, applying encryption/decryption based on the masking configuration, and interacting with the database.
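The bridging role described above can be sketched in a few lines. This is only an illustration of the idea, not ShardingSphere's implementation: the real Encrypt‑JDBC parses full SQL, while this sketch starts from an already parsed table name, column list, and value list. The names RULES, toy_encrypt, rewrite_insert, and decrypt_row are hypothetical, and Base64 stands in for a real encryptor such as AES.

```python
import base64

# Hypothetical masking rule: table -> {logical column: cipher column}.
RULES = {"t_user": {"pwd": "pwd_cipher"}}


def toy_encrypt(plaintext: str) -> str:
    """Stand-in for a real encryptor such as AES. Base64 is NOT secure."""
    return base64.b64encode(plaintext.encode("utf-8")).decode("ascii")


def toy_decrypt(ciphertext: str) -> str:
    """Inverse of toy_encrypt."""
    return base64.b64decode(ciphertext.encode("ascii")).decode("utf-8")


def rewrite_insert(table, columns, values):
    """Rewrite an INSERT so masked logical columns target their cipher column."""
    rule = RULES.get(table, {})
    out_cols, out_vals = [], []
    for col, val in zip(columns, values):
        if col in rule:
            # Masked column: swap in the cipher column and encrypt the value.
            out_cols.append(rule[col])
            out_vals.append(toy_encrypt(val))
        else:
            out_cols.append(col)
            out_vals.append(val)
    placeholders = ", ".join("?" for _ in out_cols)
    sql = f"INSERT INTO {table} ({', '.join(out_cols)}) VALUES ({placeholders})"
    return sql, out_vals


def decrypt_row(table, row):
    """Map a physical result row back to logical columns with decrypted values."""
    cipher_to_logical = {c: l for l, c in RULES.get(table, {}).items()}
    result = {}
    for col, val in row.items():
        if col in cipher_to_logical:
            result[cipher_to_logical[col]] = toy_decrypt(val)
        else:
            result[col] = val
    return result
```

The business code keeps using the logical column pwd on both the write path (rewrite_insert) and the read path (decrypt_row); only the layer in between knows about pwd_cipher.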
Masking Configuration
The configuration consists of four parts: data source, encryptor, table (masking) definition, and query properties.
Data source: defines the JDBC connection.
Encryptor: built‑in AES or MD5, or custom implementations.
Table configuration: maps logical column (used by SQL) to plainColumn (stores plaintext) and cipherColumn (stores ciphertext).
Query property: decides whether queries return plaintext or ciphertext.
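The difference between a reversible encryptor (such as AES) and a one‑way one (such as MD5) can be illustrated with a small sketch. The Encryptor interface below is hypothetical and is not ShardingSphere's actual API; XorEncryptor is a deliberately insecure stand‑in for AES that uses only the standard library, and having the MD5 variant return the stored value from decrypt is an assumption of this sketch.

```python
import base64
import hashlib
from typing import Protocol


class Encryptor(Protocol):
    """Hypothetical encryptor interface, for illustration only."""

    def encrypt(self, plaintext: str) -> str: ...

    def decrypt(self, ciphertext: str) -> str: ...


class Md5Encryptor:
    """One-way: the digest cannot be turned back into the plaintext."""

    def encrypt(self, plaintext: str) -> str:
        return hashlib.md5(plaintext.encode("utf-8")).hexdigest()

    def decrypt(self, ciphertext: str) -> str:
        # Nothing can be recovered, so the stored digest is returned as-is.
        return ciphertext


class XorEncryptor:
    """Reversible toy cipher standing in for AES. NOT secure."""

    def __init__(self, key: str) -> None:
        if not key:
            raise ValueError("key must not be empty")
        self._key = key.encode("utf-8")

    def _xor(self, data: bytes) -> bytes:
        # XOR every byte with the repeating key; applying it twice is a no-op.
        return bytes(b ^ self._key[i % len(self._key)] for i, b in enumerate(data))

    def encrypt(self, plaintext: str) -> str:
        return base64.b64encode(self._xor(plaintext.encode("utf-8"))).decode("ascii")

    def decrypt(self, ciphertext: str) -> str:
        return self._xor(base64.b64decode(ciphertext)).decode("utf-8")
```

With a reversible encryptor, queries can return the original value; with a one‑way encryptor, the application only ever sees the digest, which suits fields that are compared but never displayed.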
Example YAML for a new application:
encryptRule:
  encryptors:
    aes_encryptor:
      type: aes
      props:
        aes.key.value: 123456abc
  tables:
    t_user:
      columns:
        pwd:
          cipherColumn: pwd
          encryptor: aes_encryptor
Solution for New Applications
Configure an AES encryptor and map the logical column to the cipher column. The business SQL uses the logical column; Encrypt‑JDBC handles encryption/decryption automatically.
Result: only ciphertext is stored; plaintext can be stored optionally by adding plainColumn.
Solution for Existing Applications
Three‑step migration:
Before migration: add a cipherColumn (e.g., pwd_cipher) to the table and keep the existing column (pwd) as the plainColumn that holds the current data.
During migration: Encrypt‑JDBC writes each new value as plaintext to the plainColumn and as ciphertext to the cipherColumn, while queries still read the plainColumn (query.with.cipher.column is false); historical plaintext must be encrypted into the cipherColumn by a separate, user‑run job.
After migration: switch query.with.cipher.column to true so that queries return data decrypted from the cipherColumn; plaintext is still written to the plainColumn so the change can be rolled back.
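The three steps can be walked through against an in‑memory SQLite database. This is a sketch under stated assumptions rather than what Encrypt‑JDBC does internally: the dual write and the read switch are written out by hand here, Base64 stands in for AES, and the helper names (insert_user, backfill, read_pwd) are hypothetical.

```python
import base64
import sqlite3


def toy_encrypt(plaintext: str) -> str:
    """Stand-in for AES. Base64 is NOT secure."""
    return base64.b64encode(plaintext.encode("utf-8")).decode("ascii")


def toy_decrypt(ciphertext: str) -> str:
    return base64.b64decode(ciphertext.encode("ascii")).decode("utf-8")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t_user (id INTEGER PRIMARY KEY, pwd TEXT)")
# A historical row that only has plaintext.
conn.execute("INSERT INTO t_user (id, pwd) VALUES (1, 'old-secret')")

# Step 1 (before migration): add the cipher column next to the plain column.
conn.execute("ALTER TABLE t_user ADD COLUMN pwd_cipher TEXT")


# Step 2 (during migration): new writes fill both columns.
def insert_user(user_id: int, pwd: str) -> None:
    conn.execute(
        "INSERT INTO t_user (id, pwd, pwd_cipher) VALUES (?, ?, ?)",
        (user_id, pwd, toy_encrypt(pwd)),
    )


def backfill() -> int:
    """One-off job: encrypt historical rows that still lack ciphertext."""
    rows = conn.execute(
        "SELECT id, pwd FROM t_user WHERE pwd_cipher IS NULL"
    ).fetchall()
    for user_id, pwd in rows:
        conn.execute(
            "UPDATE t_user SET pwd_cipher = ? WHERE id = ?",
            (toy_encrypt(pwd), user_id),
        )
    return len(rows)


# Step 3 (after migration): the flag decides which column serves reads.
def read_pwd(user_id: int, query_with_cipher_column: bool) -> str:
    if query_with_cipher_column:
        (value,) = conn.execute(
            "SELECT pwd_cipher FROM t_user WHERE id = ?", (user_id,)
        ).fetchone()
        return toy_decrypt(value)
    (value,) = conn.execute(
        "SELECT pwd FROM t_user WHERE id = ?", (user_id,)
    ).fetchone()
    return value


insert_user(2, "new-secret")
backfilled = backfill()
```

Because the plain column keeps receiving writes, flipping the flag back to false returns the application to its pre‑migration read path without any data repair.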
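Configuration once the historical data has been encrypted (YAML); while the backfill is still running, query.with.cipher.column stays false so that reads keep using the plain column: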
encryptRule:
  encryptors:
    aes_encryptor:
      type: aes
      props:
        aes.key.value: 123456abc
  tables:
    t_user:
      columns:
        pwd:
          plainColumn: pwd
          cipherColumn: pwd_cipher
          encryptor: aes_encryptor
props:
  query.with.cipher.column: true
Advantages of ShardingSphere Masking
Automated and transparent masking; no code changes required.
Multiple built‑in and third‑party encryption strategies.
Customizable masking APIs for user‑defined algorithms.
Supports switching between masking strategies.
Allows simultaneous storage of plaintext and ciphertext for seamless migration.
Applicable Scenarios
Java‑based projects.
Back‑end databases such as MySQL, Oracle, PostgreSQL, SQL Server.
Need to mask one or more columns.
Compatible with standard SQL.
Limitations
Users must handle historical data cleaning themselves.
Some special SQL statements are not supported when using masking together with sharding.
Masked columns cannot be used in comparison operations such as greater‑than/less‑than, ORDER BY, BETWEEN, or LIKE, because the database only sees ciphertext.
Aggregations (AVG, SUM) on masked columns are not supported.
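A short experiment shows why pattern matching and ordering stop being meaningful once a column holds ciphertext. The sketch uses an MD5 digest as a stand‑in for any ciphertext and an in‑memory SQLite table; the table and column names (t_user, name_cipher) are made up for the illustration.

```python
import hashlib
import sqlite3


def md5_hex(value: str) -> str:
    """Stand-in for any encryptor: the database only ever stores this output."""
    return hashlib.md5(value.encode("utf-8")).hexdigest()


names = ["alice", "alina", "bob"]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t_user (name_cipher TEXT)")
conn.executemany(
    "INSERT INTO t_user (name_cipher) VALUES (?)",
    [(md5_hex(n),) for n in names],
)

# LIKE runs against the stored digests, so a plaintext prefix matches nothing:
# hex digests contain only 0-9 and a-f, never the letters 'l' or 'i'.
like_rows = conn.execute(
    "SELECT name_cipher FROM t_user WHERE name_cipher LIKE 'ali%'"
).fetchall()

# ORDER BY sorts the digests; the result says nothing about the order of names.
ordered = [
    row[0]
    for row in conn.execute("SELECT name_cipher FROM t_user ORDER BY name_cipher")
]
```

The same reasoning applies to range predicates and to AVG or SUM: the database would be operating on the stored ciphertext, not on the values the application meant.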
Future Directions
ShardingSphere also offers Encrypt‑Proxy for language‑agnostic access, supporting MySQL and PostgreSQL protocols, allowing tools like Navicat or command‑line clients to connect to a virtual masked database.
Conclusion
ShardingSphere’s data masking module provides a low‑cost, transparent solution for both new and legacy applications, enabling secure data handling without altering business SQL, and integrates with other ShardingSphere capabilities such as sharding, read/write splitting, and distributed transactions.
dbaplus Community