Transparent Data Masking with Apache ShardingSphere for New and Legacy Apps

Apache ShardingSphere offers a transparent, low‑cost data masking solution that lets both new and existing applications encrypt sensitive fields without modifying business SQL. This article covers Encrypt‑JDBC and Encrypt‑Proxy, the configurable encryption rules, and a step‑by‑step migration path for applications that already hold plaintext data.

dbaplus Community

Background

Data security and masking are critical for internet companies and traditional industries alike. Data masking transforms sensitive information (ID numbers, phone numbers, etc.) to protect privacy. Businesses often need to add encryption without changing existing SQL logic, and they need the migration itself to be transparent and low‑risk.

ShardingSphere Overview

Apache ShardingSphere is an open‑source distributed database middleware ecosystem consisting of Sharding‑JDBC, Sharding‑Proxy, and a planned Sharding‑Sidecar. It provides data sharding, distributed transactions, and governance. The data masking module is part of ShardingSphere’s distributed governance.

Encrypt‑JDBC intercepts SQL, rewrites it according to user‑defined masking rules, stores ciphertext (and optionally plaintext) in the underlying database, and decrypts data on query, making masking transparent to applications.

Requirement Scenarios

New applications: security teams require sensitive fields (e.g., bank account, phone) to be encrypted at rest; no historical data to clean.

Existing applications: large volumes of plaintext data need encryption, and new data must be encrypted without changing business SQL.

Masking Process Overview

Encrypt‑JDBC works as a bridge between business code and the database, parsing SQL, applying encryption/decryption based on the masking configuration, and interacting with the database.
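
The write path described above can be sketched as a mapping from logical columns to physical columns. This is only an illustration under stated assumptions: the class `WriteRewriteSketch`, the `ColumnRule` type, and the `rewriteInsert` method are hypothetical names, not ShardingSphere classes, and a real implementation parses and rewrites SQL rather than working on a column/value map.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.UnaryOperator;

/** Hypothetical illustration of the write path; not ShardingSphere's own classes. */
final class WriteRewriteSketch {

    /** Physical columns behind one logical column; plainColumn may be null. */
    static final class ColumnRule {
        final String plainColumn;
        final String cipherColumn;

        ColumnRule(String plainColumn, String cipherColumn) {
            this.plainColumn = plainColumn;
            this.cipherColumn = cipherColumn;
        }
    }

    /**
     * Maps the logical column/value pairs of an INSERT onto physical columns.
     * Columns without a rule pass through unchanged.
     */
    static Map<String, String> rewriteInsert(Map<String, String> logicalValues,
                                             Map<String, ColumnRule> rulesByLogicalColumn,
                                             UnaryOperator<String> encrypt) {
        Map<String, String> physical = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : logicalValues.entrySet()) {
            ColumnRule rule = rulesByLogicalColumn.get(entry.getKey());
            if (rule == null) {
                physical.put(entry.getKey(), entry.getValue());
                continue;
            }
            if (rule.plainColumn != null) {
                // Optional plaintext copy, kept only when a plain column is configured.
                physical.put(rule.plainColumn, entry.getValue());
            }
            physical.put(rule.cipherColumn, encrypt.apply(entry.getValue()));
        }
        return physical;
    }

    public static void main(String[] args) {
        Map<String, ColumnRule> rules = Map.of("pwd", new ColumnRule("pwd", "pwd_cipher"));
        Map<String, String> insert = new LinkedHashMap<>();
        insert.put("name", "alice");
        insert.put("pwd", "123456");
        // Placeholder "cipher" so the mapping is easy to read; a real encryptor goes here.
        System.out.println(rewriteInsert(insert, rules, v -> "ENC(" + v + ")"));
    }
}
```

With the placeholder encryptor, the logical value of `pwd` ends up in both `pwd` (plaintext) and `pwd_cipher` (encrypted), while `name` is passed through untouched. The business code only ever names the logical column.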

Figure: Overall architecture of ShardingSphere masking

Masking Configuration

The configuration consists of four parts: data source, encryptor, table (masking) definition, and query properties.

Data source: defines the JDBC connection.

Encryptor: built‑in AES or MD5, or custom implementations.

Table configuration: maps logical column (used by SQL) to plainColumn (stores plaintext) and cipherColumn (stores ciphertext).

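Query property: decides whether queries read the plaintext column directly or read the cipher column and decrypt the result. In both cases the application receives plaintext.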

Example YAML for a new application:

encryptRule:
  encryptors:
    aes_encryptor:
      type: aes
      props:
        aes.key.value: 123456abc
  tables:
    t_user:
      columns:
        pwd:
          cipherColumn: pwd
          encryptor: aes_encryptor

Solution for New Applications

Configure an AES encryptor and map the logical column to the cipher column. The business SQL uses the logical column; Encrypt‑JDBC handles encryption/decryption automatically.

Result: only ciphertext is stored; plaintext can be stored optionally by adding plainColumn.
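
To make the role of a reversible encryptor concrete, here is a minimal AES round trip using only the JDK. It is a sketch, not ShardingSphere's built‑in `aes` encryptor: the class name `AesRoundTripSketch`, the SHA‑256 key derivation, and the choice of `AES/ECB/PKCS5Padding` are assumptions made for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.util.Arrays;
import java.util.Base64;
import javax.crypto.Cipher;
import javax.crypto.spec.SecretKeySpec;

/** Hypothetical reversible encryptor; not ShardingSphere's built-in implementation. */
final class AesRoundTripSketch {

    // Assumption: derive a 128-bit key from the configured secret via SHA-256.
    private static SecretKeySpec keyFrom(String secret) throws GeneralSecurityException {
        byte[] digest = MessageDigest.getInstance("SHA-256")
                .digest(secret.getBytes(StandardCharsets.UTF_8));
        return new SecretKeySpec(Arrays.copyOf(digest, 16), "AES");
    }

    /** Encrypts plaintext and returns Base64 text that fits a character column. */
    static String encrypt(String plaintext, String secret) {
        try {
            // ECB is deterministic: equal plaintexts give equal ciphertexts,
            // which permits equality lookups but reveals which rows share a value.
            Cipher cipher = Cipher.getInstance("AES/ECB/PKCS5Padding");
            cipher.init(Cipher.ENCRYPT_MODE, keyFrom(secret));
            byte[] encrypted = cipher.doFinal(plaintext.getBytes(StandardCharsets.UTF_8));
            return Base64.getEncoder().encodeToString(encrypted);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("AES encryption failed", e);
        }
    }

    /** Reverses encrypt(): Base64-decodes, decrypts, and returns the plaintext. */
    static String decrypt(String ciphertext, String secret) {
        try {
            Cipher cipher = Cipher.getInstance("AES/ECB/PKCS5Padding");
            cipher.init(Cipher.DECRYPT_MODE, keyFrom(secret));
            byte[] decrypted = cipher.doFinal(Base64.getDecoder().decode(ciphertext));
            return new String(decrypted, StandardCharsets.UTF_8);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("AES decryption failed", e);
        }
    }

    public static void main(String[] args) {
        String stored = encrypt("123456", "123456abc");
        System.out.println(stored);                       // what the cipher column would hold
        System.out.println(decrypt(stored, "123456abc")); // what the application reads back
    }
}
```

The deterministic mode is a deliberate trade‑off in this sketch: it keeps equality predicates workable on the cipher column at the cost of weaker confidentiality than a randomized mode.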

Solution for Existing Applications

Three‑step migration:

Before migration: add a cipherColumn (e.g., pwd_cipher) to the table, keep plainColumn (pwd) for existing data.

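During migration: Encrypt‑JDBC writes new data to both columns (plaintext to plainColumn, ciphertext to cipherColumn) while queries keep reading plainColumn; historical plaintext is encrypted into cipherColumn by a separate job that the user runs.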

After migration: switch query.with.cipher.column to true, so queries return decrypted data from the cipher column while still writing plaintext to the plain column for rollback capability.

Configuration for the migration phase (YAML):

encryptRule:
  encryptors:
    aes_encryptor:
      type: aes
      props:
        aes.key.value: 123456abc
  tables:
    t_user:
      columns:
        pwd:
          plainColumn: pwd
          cipherColumn: pwd_cipher
          encryptor: aes_encryptor
props:
  # false while historical data is still being encrypted; switch to true after migration
  query.with.cipher.column: false
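
The effect of query.with.cipher.column on the read path can be sketched as a small decision function. The names `ReadPathSketch`, `columnToRead`, and `needsDecryption` are hypothetical, and the fallback to the cipher column when no plain column exists is an assumption about sensible behavior rather than a statement about ShardingSphere internals.

```java
/** Hypothetical read-path decision; names are illustrative. */
final class ReadPathSketch {

    /** Physical column a SELECT on the logical column would read. */
    static String columnToRead(String plainColumn, String cipherColumn,
                               boolean queryWithCipherColumn) {
        // Without a plain column there is nothing else to read, so the flag is moot.
        if (queryWithCipherColumn || plainColumn == null) {
            return cipherColumn;
        }
        return plainColumn;
    }

    /** Values read from the cipher column must be decrypted before they are returned. */
    static boolean needsDecryption(String plainColumn, String cipherColumn,
                                   boolean queryWithCipherColumn) {
        return columnToRead(plainColumn, cipherColumn, queryWithCipherColumn)
                .equals(cipherColumn);
    }

    public static void main(String[] args) {
        System.out.println(columnToRead("pwd", "pwd_cipher", false)); // during migration
        System.out.println(columnToRead("pwd", "pwd_cipher", true));  // after migration
    }
}
```

Because the switch is a single flag, setting it back to false returns reads to the plaintext column, which is what makes keeping the plain column useful as a rollback path.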

Advantages of ShardingSphere Masking

Automated and transparent masking; no code changes required.

Multiple built‑in and third‑party encryption strategies.

Customizable masking APIs for user‑defined algorithms.

Supports switching between masking strategies.

Allows simultaneous storage of plaintext and ciphertext for seamless migration.

Applicable Scenarios

Java‑based projects.

Back‑end databases such as MySQL, Oracle, PostgreSQL, SQL Server.

Need to mask one or more columns.

Compatible with standard SQL.

Limitations

Users must handle historical data cleaning themselves.

Some special SQL statements are not supported when using masking together with sharding.

Masked columns cannot be used in comparison, ordering, range, or LIKE operations.

Aggregations (AVG, SUM) on masked columns are not supported.
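
The reason ordering and range predicates break is that the stored value has no useful relationship to the plaintext order. A minimal sketch with an MD5 digest shows this; the class `OrderLossSketch` and its methods are hypothetical, and hex‑encoded MD5 is used here only as a stand‑in for whatever the configured encryptor stores.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical demo: sorting by the stored (masked) value loses plaintext order. */
final class OrderLossSketch {

    /** Lowercase hex MD5 digest of the UTF-8 bytes of the input. */
    static String md5Hex(String value) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(value.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    /** Orders plaintexts the way ORDER BY on the masked column would order the rows. */
    static List<String> sortByStoredValue(List<String> plaintexts) {
        List<String> copy = new ArrayList<>(plaintexts);
        copy.sort(Comparator.comparing(OrderLossSketch::md5Hex));
        return copy;
    }

    public static void main(String[] args) {
        // Plaintext order is a, b, c; the order by digest differs.
        System.out.println(sortByStoredValue(List.of("a", "b", "c")));
    }
}
```

The same reasoning applies to LIKE and to SUM or AVG: the database only sees the masked value, so any operation that depends on the plaintext's order, substrings, or numeric value cannot be evaluated on the masked column.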

Future Directions

ShardingSphere also offers Encrypt‑Proxy for language‑agnostic access, supporting MySQL and PostgreSQL protocols, allowing tools like Navicat or command‑line clients to connect to a virtual masked database.

Conclusion

ShardingSphere’s data masking module provides a low‑cost, transparent solution for both new and legacy applications, enabling secure data handling without altering business SQL, and integrates with other ShardingSphere capabilities such as sharding, read/write splitting, and distributed transactions.
