How Alibaba Cloud SLS’s New mask Function Simplifies Large‑Scale Log Desensitization
In the AI era, massive volumes of interaction data fuel the rapid growth of smart applications, but the personal information in that data makes robust masking essential. Alibaba Cloud Log Service (SLS) introduces a versatile mask function that replaces complex regex pipelines with concise configuration, improving performance, reducing maintenance, and helping meet strict regulations such as GDPR and China’s Personal Information Protection Law.
Background
AI‑driven applications generate huge volumes of interaction data that often contain personal privacy information, creating severe security challenges. Regulations like GDPR, China’s Data Security Law and Personal Information Protection Law make data desensitization a mandatory compliance requirement.
Existing SLS Desensitization Solutions
SLS already provides three flexible pipelines for data masking:
Logtail side‑masking : plugin mode using regular‑expression plugins or SPL statements for precise field replacement.
Logtail + Ingest Processor : Logtail handles collection while Ingest Processor performs server‑side masking via regexp_replace, reducing client resource consumption.
SDK + Ingest Processor : logs are written via SDK; masking rules are defined in the Ingest Processor using regexp_replace, keeping processing off the client.
These solutions rely heavily on regex matching, which leads to three major drawbacks:
Configuration complexity : handling dozens of sensitive fields requires writing and maintaining many complex regexes.
Performance bottlenecks : nested regex operations significantly slow real‑time processing.
Scenario adaptation difficulty : mixed log formats (JSON, URI, plain text) are hard to cover with a single regex configuration.
Introducing the mask Function
To address the above issues, SLS releases a new mask function, initially available in the Ingest Processor and later expanding to LoongCollector and other components.
Function Syntax
mask(field, varchar params)
Parameter Overview
The params JSON can specify one or more masking rules. Two built‑in modes are supported:
keyword mode : automatically detects key‑value patterns such as "key":"value" or key=value and masks the values.
buildin mode : provides six predefined types – EMAIL, PHONE, IDCARD, LANDLINE_PHONE, IP_ADDRESS, CREDIT_CARD.
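As a rough illustration of how the params JSON is put together, the following Python sketch renders a list of rule objects into a statement of the form used later in this article. The helper name build_mask_statement and its validation checks are hypothetical and not part of SLS; only the mode names, the six built‑in types, and the rule fields come from the article. It also assumes that no key or value contains a single quote, since the JSON is wrapped in a single‑quoted string.

```python
import json

# Modes and built-in types as listed in the article.
MODES = {"keyword", "buildin"}
BUILTIN_TYPES = {
    "EMAIL", "PHONE", "IDCARD",
    "LANDLINE_PHONE", "IP_ADDRESS", "CREDIT_CARD",
}


def build_mask_statement(field, rules):
    """Hypothetical helper: render rule objects into a mask() statement string."""
    for rule in rules:
        mode = rule.get("mode")
        if mode not in MODES:
            raise ValueError(f"unknown mode: {mode!r}")
        if mode == "buildin":
            unknown = set(rule.get("types", [])) - BUILTIN_TYPES
            if unknown:
                raise ValueError(f"unknown built-in types: {sorted(unknown)}")
    # Compact JSON; assumes no single quotes appear in keys or values.
    params = json.dumps(rules, separators=(",", ":"))
    return f"* | extend {field} = mask({field},'{params}')"


rules = [
    {"mode": "keyword", "keys": ["uid", "token"],
     "maskChar": "*", "keepPrefix": 2, "keepSuffix": 2},
    {"mode": "buildin", "types": ["EMAIL", "PHONE"]},
]
statement = build_mask_statement("content", rules)
print(statement)
```

Generating the JSON from data structures rather than writing it by hand keeps quoting consistent and catches a misspelled type name before the rule is deployed.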
Performance Comparison
A benchmark compared the traditional regex pipeline with the new mask function in an end‑to‑end SLS ingestion environment. Test data ranged from 70 KB to 7 MB, and masking complexity was tested at three levels: 1 keyword, 3 keywords, and more than 100 keywords plus 6 built‑in rules. The metric was average processing latency (ms). Results show the mask function consistently yields lower latency, especially for large data volumes and complex configurations.
Use Cases
Case 1 – Transaction Data Desensitization
DeFi platforms generate nested JSON logs containing wallet addresses, IPs, phone numbers, etc. The mask function in keyword mode can mask multiple fields in a single rule while preserving log structure.
* | extend content = mask(content,'[ {"mode":"keyword","keys":["wallet","address","sourceIp","phone","transactionHash"],"maskChar":"*","keepPrefix":3,"keepSuffix":3} ]')
Resulting logs hide sensitive values but keep enough characters for traceability.
Case 2 – Large‑Model Interaction Logs
Chat logs from AI assistants contain unstructured user inputs with various PII. Using buildin mode, the function automatically detects and masks emails, phone numbers, IDs, credit cards, etc., without writing dozens of regexes.
* | extend content = mask(content,'[ {"mode":"buildin","types":["IP_ADDRESS","EMAIL","LANDLINE_PHONE"]}, {"mode":"buildin","types":["PHONE","IDCARD","CREDIT_CARD"],"maskChar":"*","keepPrefix":3,"keepSuffix":4} ]')
Case 3 – Nginx URI Parameter Masking
API gateway logs expose uid and token parameters in the URI. Keyword mode can target specific key‑value pairs, masking only the sensitive parts while leaving other parameters intact.
* | extend content = mask(content,'[ {"mode":"keyword","keys":["uid","token"],"maskChar":"*","keepPrefix":2,"keepSuffix":2} ]')
After processing, uid becomes us*******45 and token becomes bf**********04, preserving analysis value.
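To make the keepPrefix and keepSuffix behavior concrete, here is a minimal Python sketch that mimics keyword‑style masking on a query string. It is not the SLS implementation: the function names, the regex, the sample URI, and the choice to mask short values entirely are all assumptions made for illustration. The sample values were picked so that the output has the same shape as the result quoted above.

```python
import re


def mask_value(value, mask_char="*", keep_prefix=0, keep_suffix=0):
    """Keep the first/last characters and replace the middle with mask_char."""
    if len(value) <= keep_prefix + keep_suffix:
        # Assumption: values too short to keep both ends are masked entirely.
        return mask_char * len(value)
    middle = len(value) - keep_prefix - keep_suffix
    return (value[:keep_prefix]
            + mask_char * middle
            + value[len(value) - keep_suffix:])


def mask_query_params(uri, keys, **opts):
    """Mask only the values of the listed key=value pairs in a URI."""
    alternatives = "|".join(re.escape(k) for k in keys)
    pattern = re.compile(r"\b(?P<key>%s)=(?P<val>[^&\s]+)" % alternatives)
    return pattern.sub(
        lambda m: f"{m.group('key')}={mask_value(m.group('val'), **opts)}",
        uri,
    )


# Hypothetical sample request line; not taken from the article.
uri = "/api/v1/user?uid=user1234545&token=bf012345678904&page=2"
masked = mask_query_params(uri, ["uid", "token"], keep_prefix=2, keep_suffix=2)
print(masked)
# /api/v1/user?uid=us*******45&token=bf**********04&page=2
```

The page parameter is left untouched because it is not in the key list, which is the property that keeps masked gateway logs usable for traffic analysis.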
Configuration Tips
Use keepPrefix and keepSuffix to retain characters needed for business tracing.
Combine multiple rule objects in a single mask call to handle heterogeneous log formats.
Prefer buildin mode for unstructured text; keyword mode excels with structured JSON or query‑string logs.
References
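For detailed configuration and best‑practice guidance, see the original article “云上数据安全保护:敏感日志扫描与脱敏实践详解” (“Data Security Protection in the Cloud: Sensitive Log Scanning and Desensitization Practices Explained”) and the official SLS Ingest Processor documentation.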
Alibaba Cloud Native
