Data Security Practices and Solutions at Meituan: Application Systems and Data Warehouse

Meituan‑Dianping's Information Security Center combats data leakage with layered safeguards across both its application systems and its massive data warehouse. At the application layer these include device fingerprinting, CAPTCHAs, behavior‑based crawler detection, robust watermarking, honey‑pot datasets, UEBA analytics, and data masking. At the warehouse layer they include masking and tokenization, privacy‑preserving techniques, asset mapping, and automated database scanning.

Meituan Technology Team

May 17, 2018

Background: Data security incidents have grown more severe in recent years. Internet companies have reached a consensus: a compromised server is tolerable, but sensitive data must never leak, because a leak can cause serious reputational and financial damage.

Traditional data‑security life‑cycle models and vendor solutions often fail in practice because they do not scale to the massive data volumes and complex environments of large internet companies.

For example, manually classifying and grading data is impractical when there are millions of tables and the number keeps growing.

Vendor‑provided audit solutions that rely on hardware boxes are unsuitable for Hadoop‑based environments, where the cost of such hardware is prohibitive.

Therefore, Meituan‑Dianping’s Information Security Center has explored concrete measures at two layers: the application system layer and the data‑warehouse layer.

1. Application Systems

Application‑level security covers external attack resistance (OWASP Top‑10 risks, SDL, red‑blue exercises) and internal safeguards. The discussion below focuses on the latter.

1.1 Account Sweeping and Crawlers

Account sweeping covers credential stuffing (replaying leaked credentials) and weak‑password guessing. Countermeasures include device fingerprinting, complex CAPTCHAs, IP reputation, sensor‑based behavior analysis, and battery‑level checks. For instance, if the accelerometer and gyroscope readings do not change at all during a login, the request is likely scripted.
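The sensor heuristic can be sketched as follows. This is a minimal illustration, not the production logic: the function name `looks_scripted`, the variance threshold, and the idea of passing sensor magnitudes as plain lists of floats are all assumptions made for the example.

```python
from statistics import pvariance


def looks_scripted(accel_samples, gyro_samples, min_variance=1e-4):
    """Return True when neither sensor stream shows meaningful motion.

    accel_samples / gyro_samples: sensor magnitudes captured while the
    login form was open (hypothetical input format).
    min_variance: illustrative threshold; a real system would tune it.
    """
    def is_flat(samples):
        # Fewer than two readings gives no evidence of motion at all.
        if len(samples) < 2:
            return True
        return pvariance(samples) < min_variance

    return is_flat(accel_samples) and is_flat(gyro_samples)


# A hand-held phone jitters slightly; an emulator usually reports constants.
handheld = looks_scripted([9.7, 9.9, 10.2, 9.5], [0.01, 0.3, -0.2, 0.1])
emulator = looks_scripted([9.81] * 5, [0.0] * 5)
```

A signal like this is best treated as one input among several (device fingerprint, IP reputation), since a phone lying on a desk also produces nearly flat readings.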

Crawlers consume resources without business value and increase data‑leakage risk. In the era of internet finance, crawlers have evolved from unauthorized scraping to authorized data collection, where users voluntarily provide credentials for credit‑scoring, raising new privacy concerns.

Defending against crawlers now also involves machine‑learning models that distinguish normal from abnormal behavior, but attackers continuously adapt, leading to an arms race between machines.

1.2 Watermark

Robust watermarking is needed to trace the leakage of internal sensitive files. The techniques involved span spatial filtering, Fourier transforms, and geometric distortion, with the goal that the embedded information survives harsh conditions.

1.3 Data Honey‑Pot

A honey‑pot creates a fake data set that records any access, allowing the detection of attacker behavior. Implementation often involves embedding a “trojan” in a data file that reports back when opened.
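A simpler variant of the same idea, shown here instead of the call‑back file described above, is to plant decoy records that no legitimate workflow ever touches and flag any access to them. The sketch below assumes an access log of dictionaries with `user` and `record_id` keys; the decoy IDs and the function name are hypothetical.

```python
# Hypothetical IDs of planted records; no real business flow reads them.
DECOY_IDS = frozenset({"user-900001", "user-900002"})


def find_honeypot_hits(access_log, decoy_ids=DECOY_IDS):
    """Return the log entries that touched a decoy record."""
    return [entry for entry in access_log if entry["record_id"] in decoy_ids]


access_log = [
    {"user": "alice", "record_id": "user-000123"},
    {"user": "mallory", "record_id": "user-900001"},
]
hits = find_honeypot_hits(access_log)
```

Because decoys have no legitimate readers, every hit is worth investigating, which keeps the false‑positive rate low compared with general anomaly detection.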

1.4 Big‑Data Behavior Auditing

Big data enables correlation‑based anomaly detection. Traditional security‑audit products are limited; large internet companies must build their own solutions, such as UEBA (User and Entity Behavior Analytics), clustering algorithms to spot outliers, and association models that link IP, device, MAC, GPS, logistics, and financial flows to identify malicious groups.

Examples include detecting insider threats by rules like “shared a device with a known bad actor” or spotting abnormal transaction sequences (login → password change → large‑value order).
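Both rules can be sketched in a few lines. The event names, the ten‑minute window, and the input shapes below are assumptions chosen for illustration; the article does not specify how the real rules are encoded.

```python
from collections import defaultdict

# Hypothetical event labels for the sequence mentioned in the text.
RISKY_PATTERN = ("login", "password_change", "large_order")


def has_risky_sequence(events, pattern=RISKY_PATTERN, window_seconds=600):
    """events: list of (timestamp_seconds, event_type), sorted by time.

    Returns True if the pattern occurs in order, with every step falling
    within window_seconds of the first step.
    """
    for i, (start_ts, kind) in enumerate(events):
        if kind != pattern[0]:
            continue
        matched = 1
        for ts, later_kind in events[i + 1:]:
            if matched == len(pattern) or ts - start_ts > window_seconds:
                break
            if later_kind == pattern[matched]:
                matched += 1
        if matched == len(pattern):
            return True
    return False


def accounts_sharing_device_with(bad_accounts, device_logins):
    """device_logins: iterable of (device_id, account_id) pairs.

    Returns accounts that logged in from a device also used by a known
    bad actor, excluding the bad actors themselves.
    """
    bad = set(bad_accounts)
    by_device = defaultdict(set)
    for device_id, account_id in device_logins:
        by_device[device_id].add(account_id)
    flagged = set()
    for accounts in by_device.values():
        if accounts & bad:
            flagged |= accounts - bad
    return flagged
```

The same grouping approach extends to other linking keys named above, such as IP or MAC address, by swapping the device ID for that key.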

1.5 Data Masking

Masking protects sensitive fields in both external‑facing and internal systems. External masking typically replaces critical data (bank card, ID, phone) with asterisks, while still allowing privileged users to view full data after additional verification. Internally, masking can be driven by log analysis or front‑end JavaScript, with role‑based policies (e.g., risk‑control staff see full data, customer‑service staff see masked data). A global view of masking activity helps monitor and reduce sensitive‑data exposure.
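A role‑based masking policy of this kind might look like the sketch below. The role names and the helper functions are hypothetical; only the idea that risk‑control staff see full values while other roles see asterisks comes from the text.

```python
# Hypothetical role names; roles listed here may see unmasked values.
FULL_VIEW_ROLES = {"risk_control"}


def mask_middle(value, keep_start=3, keep_end=4):
    """Replace the middle of a string with asterisks."""
    if len(value) <= keep_start + keep_end:
        # Too short to keep any part safely: hide everything.
        return "*" * len(value)
    hidden = len(value) - keep_start - keep_end
    return value[:keep_start] + "*" * hidden + value[len(value) - keep_end:]


def render_phone(phone, role):
    """Return the phone number as the given role is allowed to see it."""
    return phone if role in FULL_VIEW_ROLES else mask_middle(phone)


masked = render_phone("13912340011", "customer_service")
full = render_phone("13912340011", "risk_control")
```

Applying the policy on the server side, before the response leaves the service, avoids sending the full value to a browser that is only supposed to display the masked one.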

2. Data Warehouse

The data warehouse is the core of the company's data, and its security is a subset of overall data governance. The key security building blocks are data masking, privacy protection, big‑data asset mapping, and database scanning.

2.1 Data Masking

Masking in the warehouse transforms sensitive data for analysis purposes. Simple partial masking (e.g., 139****0011) is cheap but insufficient for some scenarios, which may require tokenization, range‑based segmentation, or base64‑level image masking. Choices involve trade‑offs between storage cost (copying masked tables) and runtime cost (visual masking).
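Two of the alternatives to partial masking can be sketched as follows. This is an illustration under stated assumptions: tokenization is approximated with a keyed hash (HMAC‑SHA256), which is deterministic and not reversible, whereas a vault‑based tokenization service would keep a lookup table; the function names, token prefix, and bucket width are hypothetical.

```python
import hashlib
import hmac


def tokenize(value, secret_key):
    """Map a sensitive value to a stable, non-reversible token.

    The same input always yields the same token, so joins and
    distinct counts still work on the masked column.
    """
    digest = hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256)
    return "tok_" + digest.hexdigest()[:16]


def to_range(amount, width=100):
    """Replace a non-negative integer with the bucket that contains it."""
    low = (amount // width) * width
    return f"{low}-{low + width - 1}"


key = b"demo-key-not-for-production"  # assumption: real keys come from a KMS
token_a = tokenize("13912340011", key)
token_b = tokenize("13912340011", key)
bucket = to_range(257)
```

Deterministic tokens preserve analytical usefulness, but they also let anyone holding the key test guesses against the column, so the key needs the same protection as the raw data.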

2.2 Privacy Protection

Academic methods such as k‑anonymity, l‑diversity, and differential privacy are referenced, but production use is still limited. Google DLP API is a common commercial tool, yet it is complex and scenario‑specific.
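To make k‑anonymity concrete: a table is k‑anonymous with respect to a set of quasi‑identifier columns if every combination of their values appears in at least k rows. The check below is a minimal sketch; the column names and sample rows are invented for the example.

```python
from collections import Counter


def group_sizes(rows, quasi_identifiers):
    """Count rows per combination of quasi-identifier values."""
    return Counter(tuple(row[col] for col in quasi_identifiers) for row in rows)


def is_k_anonymous(rows, quasi_identifiers, k):
    """True if every quasi-identifier combination occurs at least k times."""
    return all(n >= k for n in group_sizes(rows, quasi_identifiers).values())


def violating_groups(rows, quasi_identifiers, k):
    """Return the combinations that occur fewer than k times."""
    sizes = group_sizes(rows, quasi_identifiers)
    return [combo for combo, n in sizes.items() if n < k]


rows = [
    {"age_range": "20-29", "zip_prefix": "1000", "diagnosis": "flu"},
    {"age_range": "20-29", "zip_prefix": "1000", "diagnosis": "cold"},
    {"age_range": "30-39", "zip_prefix": "2000", "diagnosis": "flu"},
]
ok = is_k_anonymous(rows, ["age_range", "zip_prefix"], k=2)
rare = violating_groups(rows, ["age_range", "zip_prefix"], k=2)
```

Here the single row in the "30-39" / "2000" group could be re‑identified by anyone who knows those two attributes, so it would need further generalization or suppression before release.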

2.3 Big‑Data Asset Map

An asset map visualizes data flow, usage, and permissions across departments, enabling security teams to monitor high‑sensitivity assets and trigger alerts or permission revocations.

2.4 Database Scanner

Automated scanners discover sensitive fields in massive tables using regex patterns and machine‑learning‑augmented labeling, addressing the impracticality of manual classification.
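The regex half of such a scanner can be sketched as below. The patterns are simplified illustrations (an 11‑digit mainland mobile number, an 18‑character ID number, a loose e‑mail shape), and the sampling approach, match threshold, and names are assumptions rather than the scanner described in the article.

```python
import re

# Simplified, illustrative patterns; production rules would be stricter
# and would typically add checksum validation.
PATTERNS = {
    "cn_mobile": re.compile(r"1[3-9]\d{9}"),
    "cn_id_card": re.compile(r"\d{17}[\dXx]"),
    "email": re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+"),
}


def classify_column(sample_values, min_match_ratio=0.8):
    """Guess the sensitive type of a column from sampled string values.

    Returns a label from PATTERNS, or None when no pattern matches at
    least min_match_ratio of the non-empty samples.
    """
    values = [v for v in sample_values if v]
    if not values:
        return None
    for label, pattern in PATTERNS.items():
        hits = sum(1 for v in values if pattern.fullmatch(v))
        if hits / len(values) >= min_match_ratio:
            return label
    return None


phones = classify_column(["13912340011", "13800138000", "15012345678", ""])
emails = classify_column(["a@example.com", "b@example.org"])
other = classify_column(["order-1", "order-2"])
```

Using a match ratio rather than requiring every sample to match tolerates dirty data, and columns that fall just under the threshold are natural candidates for the machine‑learning‑assisted labeling mentioned above.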

In summary, data security becomes more critical as the business scales. At the micro level, building dedicated tooling reduces the impact on day‑to‑day operations; at the macro level, cooperating with partners and subsidiaries extends the security perimeter.

Author: Pengfei, Head of Data Security, Meituan‑Dianping Group.

Team: The security department aggregates top security experts and engineers to build a multi‑layered, big‑data‑plus‑machine‑learning security perception system covering network, virtualization, OS, runtime, VM, web, and data‑access layers.
