How d18n Enables Cross‑Platform Data Desensitization for Secure Databases

This article introduces d18n, a Go‑based, cross‑platform data‑desensitization tool that supports multiple databases and file formats. It explains common desensitization scenarios, details d18n's sensitive‑data identification techniques (keyword matching, regular expressions, and NLP‑assisted DFA matching), and outlines six practical masking algorithms along with export and import workflows.

dbaplus Community

Introduction

Data desensitization (masking, transforming, or obscuring sensitive fields) lets organizations meet the requirements of China’s Data Security Law (effective 2021‑09‑01) while still using the data for analysis and testing.

Typical Use Cases

Testing & Development: Production databases have strict access controls; test environments need realistic but masked data samples.

Data Analysis: Large‑scale analytics must avoid exposing personal information; desensitization preserves analytical value.

Data Sharing: Inter‑company or inter‑department exchanges require tailored masking to protect core assets.

Cross‑Platform Design of d18n

d18n is implemented in Go and deliberately avoids CGO‑dependent drivers, so it builds with CGO disabled and can be cross‑compiled from a single machine for Windows, Linux, and macOS (including Apple Silicon) without a C toolchain for each target. Go 1.16’s embed feature bundles static resources (e.g., corpora) directly into the executable, eliminating external files and simplifying container deployment.

Supported relational databases are those with pure‑Go drivers, such as MySQL, PostgreSQL, Oracle, and Microsoft SQL Server. Export and import formats include Excel, TXT, CSV, JSON, SQL, and HTML, which serve both human inspection and automated pipelines.
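To make the multi‑format idea concrete, here is a minimal Go sketch that renders one result set as either CSV or JSON using only the standard library. The Table type and the export function are hypothetical and are not d18n's API; writers for Excel, SQL, or HTML would plug into the same switch.

```go
package main

import (
	"encoding/csv"
	"encoding/json"
	"fmt"
	"io"
	"os"
	"strings"
)

// Table is a hypothetical in-memory result set; Columns fixes the CSV column order.
type Table struct {
	Columns []string
	Rows    [][]string
}

// writeCSV writes a header line, then one line per row.
func writeCSV(w io.Writer, t Table) error {
	cw := csv.NewWriter(w)
	if err := cw.Write(t.Columns); err != nil {
		return err
	}
	return cw.WriteAll(t.Rows) // WriteAll also flushes and reports buffered errors
}

// writeJSON writes an array with one object per row, keyed by column name.
func writeJSON(w io.Writer, t Table) error {
	records := make([]map[string]string, 0, len(t.Rows))
	for _, row := range t.Rows {
		rec := make(map[string]string, len(t.Columns))
		for i, col := range t.Columns {
			if i < len(row) {
				rec[col] = row[i]
			}
		}
		records = append(records, rec)
	}
	return json.NewEncoder(w).Encode(records)
}

// export renders t in the requested format, or fails for unknown formats.
func export(t Table, format string) (string, error) {
	var sb strings.Builder
	var err error
	switch format {
	case "csv":
		err = writeCSV(&sb, t)
	case "json":
		err = writeJSON(&sb, t)
	default:
		err = fmt.Errorf("unsupported format %q", format)
	}
	return sb.String(), err
}

func main() {
	t := Table{Columns: []string{"id", "phone"}, Rows: [][]string{{"1", "138****0000"}}}
	for _, f := range []string{"csv", "json"} {
		out, err := export(t, f)
		if err != nil {
			fmt.Fprintln(os.Stderr, err)
			os.Exit(1)
		}
		fmt.Print(out)
	}
}
```

Keeping the row data format‑neutral and choosing the encoder at the edge is what lets the same masked rows feed both a spreadsheet for a human reviewer and a machine‑readable file for a pipeline.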

Sensitive Data Identification

d18n provides two classic detection mechanisms:

Keyword matching.

Regular‑expression matching.

A built‑in generic rule library allows immediate use, while users can add custom rules via configuration files without recompiling the source.
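A minimal sketch of how keyword and regex rules can be combined, assuming a hypothetical Rule type: keywords are checked against the column name, and the regular expression against a sample value. The three built‑in rules are illustrative stand‑ins for a generic rule library, not d18n's actual rule set, and the custom rule appended in main mimics what a user‑supplied configuration entry might add.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Rule is a hypothetical detection rule: it fires when the column name
// contains one of Keywords, or when a sample value matches Pattern.
type Rule struct {
	Name     string
	Keywords []string
	Pattern  *regexp.Regexp
}

// defaultRules stands in for a built-in generic rule library (illustrative only).
var defaultRules = []Rule{
	{Name: "mobile", Keywords: []string{"phone", "mobile", "tel"}, Pattern: regexp.MustCompile(`^1[3-9]\d{9}$`)},
	{Name: "email", Keywords: []string{"email", "mail"}, Pattern: regexp.MustCompile(`^[^@\s]+@[^@\s]+\.[^@\s]+$`)},
	{Name: "ipv4", Keywords: []string{"ip"}, Pattern: regexp.MustCompile(`^(\d{1,3}\.){3}\d{1,3}$`)},
}

// detect returns the names of all rules that fire for one column/value pair.
func detect(rules []Rule, column, value string) []string {
	col := strings.ToLower(column)
	var hits []string
	for _, r := range rules {
		matched := false
		for _, kw := range r.Keywords {
			if strings.Contains(col, kw) {
				matched = true
				break
			}
		}
		if !matched && r.Pattern != nil && r.Pattern.MatchString(value) {
			matched = true
		}
		if matched {
			hits = append(hits, r.Name)
		}
	}
	return hits
}

func main() {
	// A custom rule, as might be loaded from a configuration file at runtime.
	rules := append(defaultRules, Rule{Name: "id_card", Pattern: regexp.MustCompile(`^\d{17}[\dXx]$`)})
	fmt.Println(detect(rules, "contact", "13800138000"))
	fmt.Println(detect(rules, "remark", "11010519491231002X"))
}
```

Keyword matching on column names is cheap but coarse (the keyword ip also fires on a column named zip_code), which is why value‑level regexes, and the NLP approach described next, help improve precision.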

For higher accuracy, d18n integrates the gse natural‑language‑processing library. Text is segmented into tokens with gse, and a deterministic finite automaton (DFA) backed by a Trie of dictionary terms scans the token stream to locate matches. Sample corpora for addresses and names are included as reference data.
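The Trie‑plus‑DFA idea can be sketched in Go as follows, assuming the dictionary terms are already available as strings (gse's segmentation step is omitted). Each trie node acts as a DFA state and each rune as a transition; scanning from every start position and keeping the longest accepted match yields leftmost‑longest, non‑overlapping hits. This is a simplified illustration, not gse's or d18n's implementation.

```go
package main

import "fmt"

// node is one DFA state; following a rune edge is a state transition.
type node struct {
	next     map[rune]*node
	terminal bool // true if a dictionary term ends at this state
}

func newNode() *node { return &node{next: map[rune]*node{}} }

// buildTrie inserts every term rune by rune, so multi-byte Chinese text works.
func buildTrie(terms []string) *node {
	root := newNode()
	for _, t := range terms {
		cur := root
		for _, r := range t {
			nxt, ok := cur.next[r]
			if !ok {
				nxt = newNode()
				cur.next[r] = nxt
			}
			cur = nxt
		}
		cur.terminal = true
	}
	return root
}

// Match is a hit expressed as rune offsets [Start, End).
type Match struct {
	Start, End int
	Text       string
}

// scan walks the automaton from each start position, keeps the longest
// accepted match, then resumes right after it.
func scan(root *node, text string) []Match {
	rs := []rune(text)
	var out []Match
	for i := 0; i < len(rs); {
		cur, end := root, -1
		for j := i; j < len(rs); j++ {
			nxt, ok := cur.next[rs[j]]
			if !ok {
				break // no transition: the automaton rejects further input
			}
			cur = nxt
			if cur.terminal {
				end = j + 1
			}
		}
		if end != -1 {
			out = append(out, Match{Start: i, End: end, Text: string(rs[i:end])})
			i = end
		} else {
			i++
		}
	}
	return out
}

func main() {
	dict := buildTrie([]string{"北京", "北京市", "朝阳区", "张三"})
	for _, m := range scan(dict, "张三住在北京市朝阳区") {
		fmt.Printf("%q at runes [%d,%d)\n", m.Text, m.Start, m.End)
	}
}
```

Restarting at every position keeps the code short but costs O(n·k) for text length n and longest term k; an Aho‑Corasick automaton would avoid the restarts if the corpus grows large.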

Masking (Export) Techniques

d18n implements six masking strategies:

Nullification: Replace the value with a placeholder (e.g., *) or truncate the field.

Randomization: Substitute each character or digit with a random value of the same type; random Chinese characters are supported for Unicode text.

Data Replacement: Substitute the original value with a predefined dummy (e.g., replace every IP address with 127.0.0.1).

Encryption Replacement: Replace the value with its ciphertext under symmetric or asymmetric encryption (e.g., RSA or ECC). d18n does not store or distribute the private keys.

Differential Privacy: Use Google’s github.com/google/differential-privacy library to add calibrated noise, balancing utility and privacy.

Offset Rounding: Shift numeric or timestamp values by a fixed offset and round them to a coarser granularity, preserving approximate ranges while hiding exact values.

Import Workflow

After masking, d18n can ingest Excel, TXT, CSV, HTML, or JSON files, optionally re‑mask them, and generate SQL scripts for direct database import. This creates a seamless pipeline from raw source data to a protected test environment.

Command‑Line and Library Usage

d18n is distributed both as a CLI executable and as a Go library, allowing developers to embed desensitization logic in custom applications. Typical CLI usage follows a pattern like the one below; the flag names are illustrative, so check the project documentation for the exact options:

```sh
d18n --source=file.xlsx --db=mysql --mask=rand --out=masked.sql
```

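When used as a library, the entry points take roughly the following shape (names may differ from the actual exported API; see the project documentation):

```go
import "github.com/LianjiaTech/d18n"

// Illustrative shape: configure a masking mode, build a processor, run it.
// "..." marks further configuration options that are elided here.
cfg := d18n.Config{MaskMode: d18n.ModeRandom, ...}
processor := d18n.NewProcessor(cfg)
processor.Run(inputPath, outputPath)
```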

Open‑Source Resources

Source code, documentation, and issue tracker are hosted on GitHub:

Project documentation: https://github.com/LianjiaTech/d18n/blob/main/doc/toc.md

Repository: https://github.com/LianjiaTech/d18n

Issue feedback: https://github.com/LianjiaTech/d18n/issues

These resources provide the full codebase, usage examples, and a place for community contributions.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
