How d18n Enables Cross‑Platform Data Desensitization for Secure Databases
This article introduces d18n, a Go‑based, cross‑platform data‑desensitization tool that supports multiple databases and file formats. It explains common desensitization scenarios, describes how d18n identifies sensitive data (keyword matching, regular expressions, and NLP‑based matching with a trie and DFA), and outlines six practical masking algorithms together with the export and import workflows.
Introduction
Data desensitization (masking, transformation, or obscuring of sensitive fields) is required to comply with China’s Data Security Law (effective 2021‑09‑01) while still allowing analysis and testing.
Typical Use Cases
Testing & Development: Production databases have strict access controls; test environments need realistic but masked data samples.
Data Analysis: Large‑scale analytics must avoid exposing personal information; desensitization preserves analytical value.
Data Sharing: Inter‑company or inter‑department exchanges require tailored masking to protect core assets.
Cross‑Platform Design of d18n
d18n is implemented in Go and deliberately avoids CGO‑dependent drivers, so it can be cross‑compiled for Windows, Linux, and macOS (including Apple Silicon) without a C toolchain on the target platform. Go 1.16’s embed feature bundles static resources (e.g., corpora) directly into the executable, eliminating external files and simplifying container deployment.
Supported relational databases are those with pure‑Go drivers, including MySQL, Oracle, Microsoft SQL Server, PostgreSQL, etc. Export and import formats cover Excel, TXT, CSV, JSON, SQL, and HTML, enabling both human inspection and automated pipelines.
Sensitive Data Identification
d18n provides two classic detection mechanisms:
Keyword matching.
Regular‑expression matching.
A built‑in generic rule library allows immediate use, while users can add custom rules via configuration files without recompiling the source.
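The two mechanisms above can be sketched in a few lines of Go. This is a minimal illustration, not d18n's own rule engine: the `Rule` type, the `Detect` function, and the two sample rules are hypothetical names and patterns chosen for the example, and the patterns are deliberately loose.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Rule is a hypothetical detection rule; it is not a d18n type.
type Rule struct {
	Name     string
	Keywords []string       // matched against column names, case-insensitively
	Pattern  *regexp.Regexp // matched against cell values; may be nil
}

// Illustrative rules only; real rule sets need stricter patterns.
var rules = []Rule{
	{Name: "phone", Keywords: []string{"phone", "mobile"}, Pattern: regexp.MustCompile(`^1[3-9]\d{9}$`)},
	{Name: "email", Keywords: []string{"email", "mail"}, Pattern: regexp.MustCompile(`^[^@\s]+@[^@\s]+\.[^@\s]+$`)},
}

// Detect returns the name of the first rule that matches either the column
// name (keyword match) or the value (regex match), or "" if none matches.
func Detect(column, value string) string {
	col := strings.ToLower(column)
	for _, r := range rules {
		for _, kw := range r.Keywords {
			if strings.Contains(col, kw) {
				return r.Name
			}
		}
		if r.Pattern != nil && r.Pattern.MatchString(value) {
			return r.Name
		}
	}
	return ""
}

func main() {
	fmt.Println(Detect("user_mobile", "n/a"))           // keyword hit on the column name
	fmt.Println(Detect("contact", "alice@example.com")) // regex hit on the value
	fmt.Println(Detect("note", "hello"))                // no rule matches
}
```

Keeping rules as data rather than code is what makes it possible to load additional rules from a configuration file at startup.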
For higher accuracy, d18n integrates the gse natural‑language‑processing library for word segmentation. Dictionary terms are organized in a trie, and a deterministic finite automaton (DFA) scans the resulting token stream to locate matches. Sample corpora for addresses and names are included as reference data.
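To make the trie idea concrete, here is a small standard-library sketch of dictionary matching. It does not use gse and is not d18n's implementation; the `node` type and the three dictionary entries are assumptions made for illustration. Each trie node acts as a state, and the scanner follows one edge per character, keeping the longest dictionary word found at each start position.

```go
package main

import "fmt"

// node is one state of the trie; following a child edge consumes one rune.
type node struct {
	children map[rune]*node
	terminal bool // true if a dictionary word ends at this state
}

func newNode() *node { return &node{children: map[rune]*node{}} }

// Insert adds a dictionary word to the trie.
func (n *node) Insert(word string) {
	cur := n
	for _, r := range word {
		next, ok := cur.children[r]
		if !ok {
			next = newNode()
			cur.children[r] = next
		}
		cur = next
	}
	cur.terminal = true
}

// FindAll scans text left to right and returns the longest dictionary word
// starting at each position, skipping past every word it reports.
func (n *node) FindAll(text string) []string {
	runes := []rune(text)
	var found []string
	for i := 0; i < len(runes); {
		cur, end := n, -1
		for j := i; j < len(runes); j++ {
			next, ok := cur.children[runes[j]]
			if !ok {
				break
			}
			cur = next
			if cur.terminal {
				end = j + 1 // remember the longest match so far
			}
		}
		if end != -1 {
			found = append(found, string(runes[i:end]))
			i = end
		} else {
			i++
		}
	}
	return found
}

func main() {
	dict := newNode()
	for _, w := range []string{"北京", "北京市", "朝阳区"} {
		dict.Insert(w)
	}
	fmt.Println(dict.FindAll("住址：北京市朝阳区建国路"))
}
```

Working on runes rather than bytes matters here, because address and name corpora are largely multi-byte Chinese text.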
Masking (Export) Techniques
d18n implements six masking strategies:
Nullification: Replace the value with a placeholder (e.g., *) or truncate the field.
Randomization: Substitute each letter or digit with a random character of the same type; supports random Chinese characters for Unicode text.
Data Replacement: Substitute the original value with a predefined dummy (e.g., replace all IPs with 127.0.0.1).
Encryption Replacement: Apply symmetric or asymmetric encryption (for example, RSA or ECC in the asymmetric case). d18n does not store or distribute the private keys.
Differential Privacy: Use Google’s github.com/google/differential-privacy library to add calibrated noise, balancing utility and privacy.
Offset Rounding: Shift numeric or timestamp values by a fixed offset, preserving approximate ranges while hiding exact values.
Import Workflow
After masking, d18n can ingest Excel, TXT, CSV, HTML, or JSON files, optionally re‑mask them, and generate SQL scripts for direct database import. This creates a seamless pipeline from raw source data to a protected test environment.
Command‑Line and Library Usage
d18n is distributed both as a CLI executable and as a Go library, allowing developers to embed desensitization logic in custom applications. Typical CLI usage follows the pattern:
d18n --source=file.xlsx --db=mysql --mask=rand --out=masked.sql
When used as a library, the main API functions are:
import "github.com/LianjiaTech/d18n"
cfg := d18n.Config{MaskMode: d18n.ModeRandom, ...}
processor := d18n.NewProcessor(cfg)
processor.Run(inputPath, outputPath)
Open‑Source Resources
Source code, documentation, and issue tracker are hosted on GitHub:
Project documentation: https://github.com/LianjiaTech/d18n/blob/main/doc/toc.md
Repository: https://github.com/LianjiaTech/d18n
Issue feedback: https://github.com/LianjiaTech/d18n/issues
These resources provide the full codebase, usage examples, and a place for community contributions.
dbaplus Community
Enterprise-level professional community for Database, BigData, and AIOps. Daily original articles, weekly online tech talks, monthly offline salons, and quarterly XCOPS&DAMS conferences—delivered by industry experts.
