How Alibaba Cloud SLS’s New mask Function Simplifies Large‑Scale Log Desensitization

In the AI era, massive volumes of interaction data fuel the rapid growth of intelligent applications, but the personal information in that data makes robust masking essential. Alibaba Cloud Log Service (SLS) now offers a mask function that replaces complex regex pipelines with concise configuration, improving performance, reducing maintenance effort, and helping meet strict regulations such as GDPR and China’s Personal Information Protection Law.

Alibaba Cloud Native

Background

AI‑driven applications generate huge volumes of interaction data that often contain personal information, which creates serious security challenges. Regulations such as GDPR and China’s Data Security Law and Personal Information Protection Law make data desensitization a compliance requirement.

Existing SLS Desensitization Solutions

SLS already provides three flexible pipelines for data masking:

Logtail‑side masking : the Logtail client masks data in plugin mode, using regular‑expression plugins or SPL statements for precise field replacement.

Logtail + Ingest Processor : Logtail handles collection while Ingest Processor performs server‑side masking via regexp_replace, reducing client resource consumption.

SDK + Ingest Processor : logs are written via SDK; masking rules are defined in the Ingest Processor using regexp_replace, keeping processing off the client.

These solutions rely heavily on regex matching, which leads to three major drawbacks:

Configuration complexity : handling dozens of sensitive fields requires writing and maintaining many complex regexes.

Performance bottlenecks : nested regex operations significantly slow real‑time processing.

Scenario adaptation difficulty : mixed log formats (JSON, URI, plain text) are hard to cover with a single regex configuration.
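
To make the maintenance cost concrete, here is a minimal Python sketch of the kind of regex pipeline described above. It is an illustration only, not SLS code: the field names (`phone`, `email`, `token`), the patterns, and the `regex_pipeline` helper are all hypothetical, and the patterns are deliberately simplified.

```python
import re

# Hypothetical, simplified rules: one hand-written regex per sensitive field.
# Every new field or log format means another entry to write and maintain.
RULES = [
    # JSON style: "phone":"13812345678" -> keep the first 3 and last 4 digits
    (re.compile(r'("phone"\s*:\s*")(\d{3})\d+(\d{4})(")'), r'\1\2****\3\4'),
    # JSON style: "email":"alice@example.com" -> hide the local part
    (re.compile(r'("email"\s*:\s*")[^"@]+(@[^"]+")'), r'\1***\2'),
    # Query-string style: token=abc123 -> hide the whole value
    (re.compile(r'(\btoken=)[^&\s"]+'), r'\1***'),
]

def regex_pipeline(line: str) -> str:
    """Apply every rule in turn; each rule is a separate pass over the line."""
    for pattern, replacement in RULES:
        line = pattern.sub(replacement, line)
    return line

sample = '{"phone":"13812345678","email":"alice@example.com","url":"/pay?token=abc123&lang=en"}'
masked = regex_pipeline(sample)
print(masked)
```

Even in this toy version, three fields in two formats already need three separately tuned patterns, and each pattern costs one more scan of every log line.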

Introducing the mask Function

To address these issues, SLS has released a new mask function. It is available first in the Ingest Processor and will later be extended to LoongCollector and other components.

Function Syntax

mask(field, varchar params)

Parameter Overview

The params JSON can specify one or more masking rules. Two built‑in modes are supported:

keyword mode : automatically detects key‑value patterns such as "key":"value" or key=value and masks the values.

buildin mode : the built‑in mode (spelled "buildin" in the params JSON), which provides six predefined types: EMAIL, PHONE, IDCARD, LANDLINE_PHONE, IP_ADDRESS, CREDIT_CARD.
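
Because params is a JSON string embedded in an SPL statement, it can be easier to generate it than to write it by hand. The following Python sketch shows one way to do that. The `build_mask_statement` helper is hypothetical; the rule fields (`mode`, `keys`, `types`, `maskChar`, `keepPrefix`, `keepSuffix`) are taken from the examples in this article, and mixing a keyword rule with a buildin rule in one call is an assumption based on the configuration tips.

```python
import json

# Rule objects using only the fields that appear in this article's examples.
rules = [
    {"mode": "keyword", "keys": ["uid", "token"],
     "maskChar": "*", "keepPrefix": 2, "keepSuffix": 2},
    {"mode": "buildin", "types": ["EMAIL", "IP_ADDRESS"]},
]

def build_mask_statement(field: str, rules: list) -> str:
    """Return an SPL statement that applies mask() to `field` (hypothetical helper)."""
    params = json.dumps(rules, separators=(",", ":"))
    # params is wrapped in single quotes inside the statement; this sketch
    # does not attempt to escape them, so it rejects such input instead.
    if "'" in params:
        raise ValueError("single quotes in params are not handled by this sketch")
    return f"* | extend {field} = mask({field},'{params}')"

statement = build_mask_statement("content", rules)
print(statement)
```

Generating the string from a data structure keeps the JSON well formed as the rule list grows, which matters once a configuration covers dozens of keys.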

Performance Comparison

A benchmark comparing the traditional regex pipeline with the new mask function was run in an end‑to‑end SLS ingestion environment. Test data ranged from 70 KB to 7 MB, and three masking configurations were tested: 1 keyword, 3 keywords, and more than 100 keywords plus the 6 built‑in rules. The metric was average processing latency in milliseconds. The mask function showed consistently lower latency, most notably for large data volumes and complex configurations.
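
For readers who want to run a similar comparison on their own masking code, the sketch below shows one way to compute an average-latency metric in Python. It is not the harness used for the benchmark above: the `average_latency_ms` helper, the synthetic record, and the placeholder processor are all assumptions, and it measures local function calls rather than end-to-end ingestion.

```python
import time
from typing import Callable, Iterable

def average_latency_ms(process: Callable[[str], str],
                       payloads: Iterable[str],
                       repeats: int = 5) -> float:
    """Return the mean wall-clock time per call to `process`, in milliseconds."""
    payloads = list(payloads)
    if not payloads or repeats < 1:
        raise ValueError("need at least one payload and one repeat")
    start = time.perf_counter()
    for _ in range(repeats):
        for payload in payloads:
            process(payload)
    elapsed = time.perf_counter() - start
    return elapsed * 1000.0 / (repeats * len(payloads))

# Synthetic payloads of roughly 70 KB and 7 MB, the size range quoted above.
record = '{"uid":"us123456745","msg":"hello"}\n'
payloads = [record * (size // len(record)) for size in (70_000, 7_000_000)]

def placeholder(text: str) -> str:
    """Stand-in for a real masking function; replace with the code under test."""
    return text.replace("us123456745", "us*******45")

latency = average_latency_ms(placeholder, payloads, repeats=2)
print(f"average latency: {latency:.3f} ms")
```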

Use Cases

Case 1 – Transaction Data Desensitization

DeFi platforms generate nested JSON logs containing wallet addresses, IPs, phone numbers, etc. The mask function in keyword mode can mask multiple fields in a single rule while preserving log structure.

* | extend content = mask(content,'[ {"mode":"keyword","keys":["wallet","address","sourceIp","phone","transactionHash"],"maskChar":"*","keepPrefix":3,"keepSuffix":3} ]')

Resulting logs hide sensitive values but keep enough characters for traceability.

Case 2 – Large‑Model Interaction Logs

Chat logs from AI assistants contain unstructured user inputs with various PII. Using buildin mode, the function automatically detects and masks emails, phone numbers, IDs, credit cards, etc., without writing dozens of regexes.

* | extend content = mask(content,'[ {"mode":"buildin","types":["IP_ADDRESS","EMAIL","LANDLINE_PHONE"]}, {"mode":"buildin","types":["PHONE","IDCARD","CREDIT_CARD"],"maskChar":"*","keepPrefix":3,"keepSuffix":4} ]')

Case 3 – Nginx URI Parameter Masking

API gateway logs expose uid and token parameters in the URI. Keyword mode can target specific key‑value pairs, masking only the sensitive parts while leaving other parameters intact.

* | extend content = mask(content,'[ {"mode":"keyword","keys":["uid","token"],"maskChar":"*","keepPrefix":2,"keepSuffix":2} ]')

After processing, uid becomes us*******45 and token becomes bf**********04, preserving analysis value.
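
The effect of keepPrefix and keepSuffix can be checked offline with a small Python emulation. This is a sketch of the behaviour as described in this article, not the SLS implementation: `mask_value` and `mask_keywords` are hypothetical helpers, the sample values are invented, and two details are assumptions (one mask character per hidden character, and full masking when a value is too short to keep both ends).

```python
import re

def mask_value(value, mask_char="*", keep_prefix=0, keep_suffix=0):
    """Hide the middle of `value`, keeping the requested leading/trailing characters."""
    hidden = len(value) - keep_prefix - keep_suffix
    if hidden <= 0:
        # Assumption: values too short to keep both ends are masked entirely.
        return mask_char * len(value)
    return value[:keep_prefix] + mask_char * hidden + value[len(value) - keep_suffix:]

def mask_keywords(text, keys, mask_char="*", keep_prefix=0, keep_suffix=0):
    """Mask values that follow any of `keys` in "key":"value" or key=value form."""
    names = "|".join(re.escape(k) for k in keys)
    json_form = r'("(?:%s)"\s*:\s*")([^"]*)' % names   # groups 1 and 2
    kv_form = r'(\b(?:%s)=)([^&\s"]*)' % names          # groups 3 and 4
    pattern = re.compile(json_form + "|" + kv_form)

    def repl(match):
        if match.group(1) is not None:
            prefix, value = match.group(1), match.group(2)
        else:
            prefix, value = match.group(3), match.group(4)
        return prefix + mask_value(value, mask_char, keep_prefix, keep_suffix)

    return pattern.sub(repl, text)

# Hypothetical inputs: an 11-character uid and a 14-character token.
uri = "GET /api/order?uid=us123456745&token=bf1a2b3c4d5e04&page=2"
masked_uri = mask_keywords(uri, ["uid", "token"], "*", 2, 2)
masked_json = mask_keywords('{"uid":"us123456745","name":"bob"}', ["uid"], "*", 2, 2)
print(masked_uri)
print(masked_json)
```

The input values were chosen so that the masked output has the same shape as the results quoted above, while the untargeted page parameter and name field pass through unchanged.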

Configuration Tips

Use keepPrefix and keepSuffix to retain characters needed for business tracing.

Combine multiple rule objects in a single mask call to handle heterogeneous log formats.

Prefer buildin mode for unstructured text; keyword mode excels with structured JSON or query‑string logs.

References

For detailed configuration and best‑practice guidance, see the original article “云上数据安全保护:敏感日志扫描与脱敏实践详解” (“Data Security Protection in the Cloud: Sensitive Log Scanning and Desensitization in Practice”) and the official SLS Ingest Processor documentation.

Written by

Alibaba Cloud Native
