Tokenization for Data Security: Design, Implementation, and Engineering Practices

The article explains how tokenization transforms data security into a built‑in attribute that automatically scales with data growth, detailing its design principles, generation methods, architectural layers, security safeguards, and practical engineering experiences to address exposure risks in modern digital businesses.

Meituan Technology Team

Sep 22, 2022

0. Introduction

Rapid digital expansion makes data a core production asset, but the fluidity and openness of data create large exposure surfaces. Traditional perimeter security and layered approvals cripple productivity and incur prohibitive costs. Regulations such as GDPR require new compliance mechanisms.

1. Data‑Technology Challenges to Security

Data fluidity and openness : Low‑cost, large‑scale data flow is essential for value, yet per‑node hardening and multi‑level approvals dramatically increase cost and reduce agility.

Copyability and loss of control : Once data has been accessed, control over it shifts away from the provider, so centrally enforced security policies become costly and ineffective.

Varied data forms and complex applications : Data traverses virtually every IT system; AI/ML workloads further complicate usage logic.

Complex, evolving threats : High commercial value attracts black‑/gray‑market actors, insiders, and nation‑state espionage.

2. Tokenization – A Digital Banking Analogy

Just as banks replaced cash with electronic deposits to reduce theft, tokenization replaces raw personally identifiable information (PII) with a one‑to‑one pseudonym (Token) at the moment data enters the organization. Tokens circulate throughout the ecosystem, while the original plaintext can be recovered only through a hardened tokenization service, rendering stolen tokens useless.

Figure 3 (illustrated in the source) shows that after tokenization the number of services with direct plaintext access drops to double digits, reducing exposure to under 1%.

Figure 4 demonstrates that tokenization can achieve “0 storage, 0 cache, 0 interface, 0 data‑warehouse” for sensitive data; only a few privileged hosts or UI components can retrieve plaintext.

3. Tokenization Scheme Overview

3.1 What is Tokenization?

Tokenization replaces sensitive data with a non‑sensitive equivalent (Token) to lower risk and meet privacy compliance. It originated in the payment‑card industry (PCI), where it replaces the primary account number (PAN), and now extends to general PII.

PII : Any identifier that can directly or indirectly identify an individual (e.g., ID card number, phone number, email address).

De‑identification : Techniques that temporarily or permanently remove the link between data and individuals (pseudonymization, anonymization, encryption).

Pseudonymization : Replaces data with artificial IDs; tokenization is a concrete implementation.

Anonymization : Irreversibly removes the link between data and the individual (e.g., data masking).

Data encryption : Produces ciphertext that must be decrypted to be usable, limiting its applicability for analytics.

3.2 Basic Design

3.2.1 Usable, Invisible

Usability : Tokens enable deduplication, statistics, and correlation in big‑data scenarios; they can fully replace plaintext in most workflows, with a fallback service to retrieve plaintext when absolutely necessary.

Invisibility : Security of the tokenization service itself is the foundation; the design must prevent attackers from deriving the original data from a token.
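To make the "usable, invisible" idea concrete, here is a minimal sketch of deduplication, counting, and cross-dataset correlation performed on tokens only. The `tokenize` helper, the `SALT` constant, and the sample records are hypothetical; a keyed HMAC merely stands in for a call to the tokenization service.

```python
import hashlib
import hmac
from collections import Counter

# Hypothetical demo secret; in practice the secret would never live in code.
SALT = b"demo-salt-not-for-production"


def tokenize(value: str) -> str:
    """Stand-in for the tokenization service: same input, same token."""
    return hmac.new(SALT, value.encode("utf-8"), hashlib.sha256).hexdigest()[:16]


# Raw records as they arrive at the ingestion boundary (illustrative data).
orders = [("alice@example.com", 30), ("bob@example.com", 12), ("alice@example.com", 8)]
tickets = ["alice@example.com", "carol@example.com"]

# Tokenize once at ingestion; everything downstream sees tokens only.
tok_orders = [(tokenize(email), amount) for email, amount in orders]
tok_tickets = {tokenize(email) for email in tickets}

unique_customers = {tok for tok, _ in tok_orders}            # deduplication
orders_per_customer = Counter(tok for tok, _ in tok_orders)  # statistics
customers_with_tickets = unique_customers & tok_tickets      # correlation
```

Because the mapping is one-to-one, these operations give the same results they would give on plaintext, while no email address appears in the derived data.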

3.2.2 Architectural Requirements

Business adaptability : Support all data‑exchange patterns, including online transactions, real‑time and batch processing, and AI/ML workloads.

Security : Protect the mapping between token and plaintext through strong algorithms and service hardening.

Performance : Introduce no noticeable degradation to system stability.

3.3 Token Generation Logic

Three common approaches are described:

Randomization : Generate completely random tokens and store a one‑to‑one mapping table. This offers the highest security because tokens have no algorithmic relationship to plaintext, but it limits distributed generation and may affect performance.

MAC‑based : Use an HMAC keyed with a secret salt to produce deterministic tokens, so that any location holding the salt derives the same token for the same input, which improves scalability. Security depends on keeping the salt secret; protect it with the same measures used for encryption keys.

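Deterministic encryption : Apply algorithms such as AES‑SIV or format‑preserving encryption (FPE) to produce reversible tokens. Because each token is computed from the plaintext under a key rather than drawn at random, security rests entirely on that key, and rotating the key is hard because tokens already in circulation were produced under the old key; the approach is generally discouraged.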
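The first two approaches can be sketched as follows. This is an illustration under stated assumptions, not the article's implementation: the class name `RandomTokenVault`, the function `hmac_token`, the `tok_` prefix, and the 128-bit token length are all hypothetical, and the in-memory dictionaries stand in for the mapping table (which a real service would keep encrypted, as section 3.4 describes).

```python
import hashlib
import hmac
import secrets


class RandomTokenVault:
    """Randomized tokens backed by a one-to-one mapping table (in memory here)."""

    def __init__(self) -> None:
        self._by_value: dict[str, str] = {}
        self._by_token: dict[str, str] = {}

    def tokenize(self, value: str) -> str:
        # Reuse the existing token so the mapping stays one-to-one.
        if value in self._by_value:
            return self._by_value[value]
        token = "tok_" + secrets.token_hex(16)  # CSPRNG output, unrelated to value
        while token in self._by_token:          # regenerate on the (unlikely) collision
            token = "tok_" + secrets.token_hex(16)
        self._by_value[value] = token
        self._by_token[token] = value
        return token

    def detokenize(self, token: str) -> str:
        return self._by_token[token]


def hmac_token(value: str, salt: bytes) -> str:
    """Deterministic token: any node holding the salt derives the same result."""
    digest = hmac.new(salt, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return "tok_" + digest[:32]


vault = RandomTokenVault()
random_tok = vault.tokenize("13800000000")

salt = secrets.token_bytes(32)  # would be protected like an encryption key
mac_tok = hmac_token("13800000000", salt)
```

The trade-off is visible in the code: `RandomTokenVault` needs shared state to stay one-to-one, which is what makes distributed generation harder, whereas `hmac_token` is stateless but only as secure as the salt.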

3.4 Logical Architecture

The tokenization service consists of three layers:

Access layer : Provides portal, API, and big‑data job interfaces, enforcing fine‑grained access control, IAM, and service authentication.

Service layer : Executes token creation, storage, and lookup.

Storage layer : Stores encrypted mapping tables (HASH → Token → Ciphertext). Applications never retrieve plaintext directly; they obtain ciphertext and decrypt locally via a Key Management Service (KMS).

3.5 Application Panorama

Tokens flow from online data sources and data warehouses through the tokenization service (both online and offline Hive), then to downstream applications. Two consumer types exist:

Regular applications : Operate directly on tokens.

Decryption applications : Convert tokens to ciphertext and then decrypt via KMS when business logic requires plaintext.
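The storage layout from section 3.4 and the decryption path described here can be sketched together. Everything named below is hypothetical: `ToyKMS` is a stand-in for a real KMS client and its XOR keystream is for illustration only, `TokenStore` is an in-memory model of the mapping table, and using a keyed hash for the index is an assumption (the source says only "HASH").

```python
import hashlib
import hmac
import secrets
from dataclasses import dataclass


class ToyKMS:
    """Hypothetical KMS stand-in. NOT real encryption; use an authenticated
    cipher from a vetted library in practice."""

    def __init__(self) -> None:
        self._key = secrets.token_bytes(32)

    def _keystream(self, nonce: bytes, length: int) -> bytes:
        out, counter = b"", 0
        while len(out) < length:
            msg = nonce + counter.to_bytes(4, "big")
            out += hmac.new(self._key, msg, hashlib.sha256).digest()
            counter += 1
        return out[:length]

    def encrypt(self, plaintext: bytes) -> bytes:
        nonce = secrets.token_bytes(16)
        stream = self._keystream(nonce, len(plaintext))
        return nonce + bytes(a ^ b for a, b in zip(plaintext, stream))

    def decrypt(self, blob: bytes) -> bytes:
        nonce, body = blob[:16], blob[16:]
        stream = self._keystream(nonce, len(body))
        return bytes(a ^ b for a, b in zip(body, stream))


@dataclass
class Record:
    token: str
    ciphertext: bytes


class TokenStore:
    """Index hash -> token -> ciphertext; no plaintext is kept at rest."""

    def __init__(self, kms: ToyKMS, index_key: bytes) -> None:
        self._kms = kms
        self._index_key = index_key
        self._by_hash: dict[str, Record] = {}
        self._by_token: dict[str, Record] = {}

    def _index(self, value: str) -> str:
        return hmac.new(self._index_key, value.encode("utf-8"), hashlib.sha256).hexdigest()

    def tokenize(self, value: str) -> str:
        h = self._index(value)
        if h not in self._by_hash:
            rec = Record("tok_" + secrets.token_hex(16),
                         self._kms.encrypt(value.encode("utf-8")))
            self._by_hash[h] = rec
            self._by_token[rec.token] = rec
        return self._by_hash[h].token

    def lookup_ciphertext(self, token: str) -> bytes:
        # The service returns ciphertext only; it never returns plaintext.
        return self._by_token[token].ciphertext


kms = ToyKMS()
store = TokenStore(kms, index_key=secrets.token_bytes(32))
token = store.tokenize("alice@example.com")          # a regular application stops here
blob = store.lookup_ciphertext(token)                # a decryption application fetches ciphertext
plaintext = kms.decrypt(blob).decode("utf-8")        # and decrypts locally via the KMS
```

A regular application never goes past `tokenize`; only a decryption application holding KMS rights performs the last two steps.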

4. Tokenization Security Implementation

4.1 Security Essentials

The security model rests on tokens being independent of plaintext: no table that links tokens to plaintext may be created or leaked. If such a table is compromised, the security model collapses.

4.2 Risks and Design Measures

Token service risks

Token generation must use a trusted RNG (hardware RNG preferred) or a cryptographically secure PRNG; HMAC‑based schemes must protect the salt.

The service runtime should run on hardened, dedicated systems.

Storage must contain only indexes, tokens, and ciphertext, with strict access controls.

API access should be authenticated with mTLS and OAuth2 tickets, and every call should be audit‑logged.

Token‑to‑plaintext conversion returns only ciphertext; decryption occurs locally via KMS.

UI operations require IAM and ABAC‑based fine‑grained controls.
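The attribute-based control and audit logging listed above might look like the following default-deny check. The attribute names (`service`, `purpose`, `data_type`, `environment`), the `Policy` class, and the sample rule are hypothetical; a real deployment would take the caller identity from the mTLS certificate or OAuth2 ticket rather than from a constructor argument.

```python
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class Request:
    service: str      # caller identity (assumed to come from the authenticated channel)
    purpose: str      # declared business purpose
    data_type: str    # e.g. "phone", "email"
    environment: str  # e.g. "prod", "test"


@dataclass
class Policy:
    # Each rule lists attribute values that must ALL match for access to be granted.
    rules: list[dict[str, str]]
    audit_log: list[dict[str, str]] = field(default_factory=list)

    def authorize(self, req: Request) -> bool:
        attrs = asdict(req)
        # Default deny: access is granted only if some rule matches completely.
        allowed = any(
            all(attrs.get(key) == expected for key, expected in rule.items())
            for rule in self.rules
        )
        # Record every decision, including denials, for later review.
        self.audit_log.append({
            **attrs,
            "allowed": str(allowed),
            "at": datetime.now(timezone.utc).isoformat(),
        })
        return allowed


policy = Policy(rules=[{
    "service": "billing-ui",
    "purpose": "refund",
    "data_type": "phone",
    "environment": "prod",
}])

ok = policy.authorize(Request("billing-ui", "refund", "phone", "prod"))
denied = policy.authorize(Request("report-job", "analytics", "phone", "prod"))
```

Logging denials as well as grants gives the monitoring described under downstream risks something to alert on.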

Downstream ecosystem risks

All downstream services with token access must be included in security reviews.

Export or forwarding of token‑plaintext mapping tables is prohibited.

Proxying is disallowed; services must call the tokenization service directly.

Comprehensive monitoring, scanning, and incident response must be implemented for all ecosystem components.

5. Engineering Practice Experience

Consistency strategy : Publish unified policies, guidelines, and tooling so that all teams understand tokenization requirements, including decryption, access control, and AI data baselines.

Incremental rollout : Decompose migration into fine‑grained service‑level units, enabling gray‑scale (gradual, canary‑style) adoption.

DevOps transformation : Package tokenization logic into easy‑to‑use SDKs, and automate testing, validation, and data‑cleaning pipelines.

Service reliability : Design tokenization services for high performance, availability, and graceful degradation, with continuous testing and optimization.

Operation and governance : Monitor token usage, scan for uncovered data islands, and ensure full token coverage across cold, static, and isolated data sets.

Learning and iteration : Continuously adapt to new data forms, applications, and privacy‑preserving technologies such as secure multi‑party computation.
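As one illustration of the governance point about scanning for uncovered data, the sketch below flags columns whose sampled values still look like plaintext PII. The regular expressions, the 50% threshold, and the sample table are assumptions made for the example; production discovery tools use far richer detection.

```python
import re

# Illustrative patterns only; real scanners cover many more PII types.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE = re.compile(r"(?<!\d)1[3-9]\d{9}(?!\d)")  # 11-digit mobile format (assumed)


def scan_columns(table: dict[str, list[str]], threshold: float = 0.5) -> dict[str, str]:
    """Return {column: pii_type} for columns where at least `threshold` of the
    sampled values match a PII pattern."""
    findings: dict[str, str] = {}
    for column, values in table.items():
        if not values:
            continue  # nothing sampled, nothing to report
        for label, pattern in (("email", EMAIL), ("phone", PHONE)):
            hits = sum(1 for value in values if pattern.search(value))
            if hits / len(values) >= threshold:
                findings[column] = label
                break
    return findings


sample = {
    "contact": ["alice@example.com", "bob@example.com", "n/a"],
    "mobile": ["13800000000", "13900000001"],
    "user_token": ["tok_9f2c4a7be1d04c55", "tok_0a1b2c3d4e5f6a7b"],
    "empty": [],
}
findings = scan_columns(sample)
```

Here `contact` and `mobile` would be reported as untokenized, while `user_token` passes because its values no longer resemble the original identifiers.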

6. Unaddressed Issues

Tokenization does not currently handle unstructured data (images, video) or cross‑enterprise data exchange, which may require encryption or privacy‑preserving computation. It focuses on structured PII in databases and Hive; semi‑structured JSON logs and file‑based PII need complementary data‑discovery tools.

Overall, tokenization establishes a “default security” paradigm for internal data use, enabling traceability from collection to third‑party exchange while supporting lossless deletion and other advanced capabilities.
