How Alibaba Cloud’s New mask Function Boosts Log Data Security and Performance
This article explains why data desensitization is now a compliance must‑have, reviews Alibaba Cloud Log Service’s existing masking pipelines, introduces the new mask function with its keyword and built‑in modes, compares its performance against regex solutions, and showcases three real‑world use cases covering transaction logs, large‑model interactions, and Nginx URI parameters.
In the AI era, massive volumes of interaction data fuel intelligent applications, but the personal information they contain creates serious security risks; data desensitization has shifted from optional to mandatory for compliance.
Stricter compliance requirements: Regulations such as GDPR and China’s Data Security Law and Personal Information Protection Law impose heavy fines when sensitive data (identity, financial, medical) is leaked, and a leak also erodes user trust.
Urgent security guarantees: Regular sensitive-data scanning and masking reduces the exposure of sensitive data to unauthorized access and improves overall system security.
Alibaba Cloud Log Service (SLS) already offers three flexible data‑masking pipelines that combine collection and masking to meet various business scenarios.
SLS existing masking solutions
Logtail client‑side masking via configurable plugins (regex‑based field replacement).
Logtail SPL statement mode for flexible client‑side masking.
Logtail + Ingest Processor joint masking, where Logtail focuses on collection and Ingest Processor performs high‑performance server‑side masking using SPL regexp_replace.
Mask function overview
The new mask function simplifies masking for both structured and unstructured logs and is already available in SLS Ingest Processors, with future expansion to LoongCollector and other scenarios.
Function syntax
mask(field, varchar params)
Parameters are defined in JSON. Two modes are supported:
keyword mode: intelligently detects key-value pairs such as "key":"value", 'key':'value' or key=value and masks the value.
buildin mode: provides six built-in rules for email, phone (CN), ID card, landline, IP address, and credit card.
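Because the rule list is a JSON array embedded in a single-quoted SPL string, hand-written quoting is easy to get wrong. The following is a minimal sketch, not part of SLS, that generates the statement from a Python data structure. The helper name build_mask_statement is hypothetical; the rule keys (mode, keys, maskChar, keepPrefix, keepSuffix) are the ones that appear in this article's examples.

```python
import json


def build_mask_statement(field, rules):
    """Return an SPL statement that applies mask() to `field` using `rules`.

    Hypothetical helper: it only formats text and does not call any SLS API.
    """
    # Compact separators keep the embedded JSON short; ensure_ascii=False
    # leaves non-ASCII keys readable.
    params = json.dumps(rules, separators=(",", ":"), ensure_ascii=False)
    if "'" in params:
        # Escaping single quotes inside the SPL literal is out of scope here.
        raise ValueError("single quotes are not handled by this sketch")
    return f"* | extend {field} = mask({field},'{params}')"


# Same rule as the Nginx URI use case in this article.
rules = [
    {"mode": "keyword", "keys": ["uid", "token"],
     "maskChar": "*", "keepPrefix": 2, "keepSuffix": 2},
]
statement = build_mask_statement("content", rules)
print(statement)
```

Generating the parameter string this way means a malformed rule fails in Python before it ever reaches the ingest pipeline.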
Performance comparison
In an end‑to‑end SLS ingestion test, the mask function was benchmarked against regex‑based masking across data packets ranging from 70 KB to 7 MB and masking complexities from a single keyword to over 100 keywords plus six built‑in rules. The mask function consistently showed lower average latency (ms), demonstrating higher efficiency for large‑scale and complex masking scenarios.
Use case 1: Transaction data masking
DeFi platforms generate JSON logs containing wallet addresses, IPs, phone numbers, etc. The mask function’s keyword mode can precisely mask these fields while preserving log structure.
2025-08-20 18:04:40,998 INFO blockchain-event-poller-3 [10.0.1.20] [com.service.listener.TransactionStatusListener:65] {"message":"On-chain transaction successfully confirmed","confirmationDetails":{"transactionHash":"0x2baf892e9a164b1979","status":"success","blockNumber":45101239,"gasUsed":189543,"effectiveGasPrice":"58.2 Gwei","userProfileSnapshot":{"wallet":"0x71C7656EC7a5f6d8A7C4","sourceIp":"203.0.113.55","phone":"19901012345","address":"上海市浦东新区文明路1000号","birthday":null}}}
Mask configuration (keyword mode, keep 3 prefix and 3 suffix characters):
* | extend content = mask(content,'[ {"mode":"keyword","keys":["wallet","address","sourceIp","phone","transactionHash"],"maskChar":"*","keepPrefix":3,"keepSuffix":3} ]')
The resulting logs show all sensitive fields masked while retaining enough characters for traceability.
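To make the keepPrefix and keepSuffix semantics concrete, here is a minimal Python sketch of what a keyword rule does to JSON-style pairs. It is an illustrative approximation, not the SLS implementation: it only handles the "key":"value" form, and it assumes the masked value keeps its original length and that values too short to keep both ends are masked entirely. The function names are hypothetical.

```python
import re


def mask_value(value, mask_char="*", keep_prefix=0, keep_suffix=0):
    """Mask the middle of `value`, keeping the requested prefix and suffix."""
    # Assumption: values too short to keep both ends are masked completely.
    if len(value) <= keep_prefix + keep_suffix:
        return mask_char * len(value)
    hidden = len(value) - keep_prefix - keep_suffix
    return value[:keep_prefix] + mask_char * hidden + value[len(value) - keep_suffix:]


def mask_keywords(text, keys, mask_char="*", keep_prefix=0, keep_suffix=0):
    """Mask the values of "key":"value" pairs whose key is listed in `keys`."""
    alternation = "|".join(re.escape(k) for k in keys)
    pattern = re.compile(r'("(?:%s)"\s*:\s*")([^"]*)(")' % alternation)

    def repl(match):
        masked = mask_value(match.group(2), mask_char, keep_prefix, keep_suffix)
        return match.group(1) + masked + match.group(3)

    return pattern.sub(repl, text)


log = ('{"wallet":"0x71C7656EC7a5f6d8A7C4","sourceIp":"203.0.113.55",'
       '"phone":"19901012345","status":"success"}')
masked = mask_keywords(log, ["wallet", "sourceIp", "phone"],
                       keep_prefix=3, keep_suffix=3)
print(masked)
```

Keys that are not listed, such as status here, pass through untouched, which is what preserves the log structure.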
Use case 2: Large‑model interaction logs
Chat logs often contain arbitrary PII. Using buildin mode, the mask function automatically detects emails, phones, IPs, ID cards, and credit cards without maintaining dozens of regexes.
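Hello, I need urgent help! I am a long-term paying user of your platform. My account appears to be locked, and an annual membership renewal payment failed. I am very anxious because I need to use your premium features tonight to finish a project.
Here is all of my information. Please have your system administrator or technical support verify it and resolve the issue immediately:
Name: Zhang Wei
My registered mobile number is 19901012345
My registered email is [email protected]
The IP address of my most recent login is 203.0.113.55
My ID card number is 110105199003070033
The credit card used for payment is as follows:
Card type: Visa
The card number is 4539-1488-0343-6467
Cardholder name: ZHANG WEI
Expiry date: 12/25
CVV: 123
Please handle this as soon as possible. Thank you so much! I really need your help!
Mask configuration (buildin mode with partial masking for phone, ID, credit card):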
* | extend content = mask(content,'[ {"mode":"buildin","types":["IP_ADDRESS","EMAIL","LANDLINE_PHONE"]}, {"mode":"buildin","types":["PHONE","IDCARD","CREDIT_CARD"],"maskChar":"*","keepPrefix":3,"keepSuffix":4} ]')
The output masks every detected PII value (phone, ID card, and credit card numbers keep their first 3 and last 4 characters) while keeping the conversational context intact.
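For comparison, the regex-per-type approach that buildin mode replaces might look like the Python sketch below. The patterns are simplified, hypothetical detectors written for this illustration; the article does not publish the service's actual built-in rules, and real detectors would need stricter validation. The sketch also assumes that a rule without keepPrefix or keepSuffix masks the whole match.

```python
import re

# Hypothetical, simplified detectors (NOT the service's built-in rules).
DETECTORS = {
    "PHONE": re.compile(r"(?<!\d)1[3-9]\d{9}(?!\d)"),        # CN mobile, 11 digits
    "IDCARD": re.compile(r"(?<!\d)\d{17}[\dXx](?!\d)"),      # CN ID card, 18 chars
    "IP_ADDRESS": re.compile(r"(?<![\d.])(?:\d{1,3}\.){3}\d{1,3}(?![\d.])"),
}


def partial_mask(value, keep_prefix=0, keep_suffix=0, mask_char="*"):
    """Mask `value`, keeping `keep_prefix` leading and `keep_suffix` trailing chars."""
    if len(value) <= keep_prefix + keep_suffix:
        return mask_char * len(value)
    hidden = len(value) - keep_prefix - keep_suffix
    return value[:keep_prefix] + mask_char * hidden + value[len(value) - keep_suffix:]


def mask_builtin(text, types, keep_prefix=0, keep_suffix=0):
    """Apply each named detector in turn and mask every match it finds."""
    for name in types:
        text = DETECTORS[name].sub(
            lambda m: partial_mask(m.group(0), keep_prefix, keep_suffix), text)
    return text


message = "Phone 19901012345, last login IP 203.0.113.55, ID 110105199003070033."
step1 = mask_builtin(message, ["IP_ADDRESS"])                        # full masking
step2 = mask_builtin(step1, ["PHONE", "IDCARD"], keep_prefix=3, keep_suffix=4)
print(step2)
```

Every additional data type means another pattern to write, test, and keep free of false positives, which is the maintenance burden the built-in rules are meant to remove.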
Use case 3: Nginx URI parameter masking
API gateway logs often expose user IDs, tokens, and API keys in query strings. Keyword mode can target specific parameters for selective masking.
uri: "uid=user12345&token=bf81639a41d604&from=web"
Mask configuration (keep 2 prefix and 2 suffix characters):
* | extend content = mask(content,'[ {"mode":"keyword","keys":["uid","token"],"maskChar":"*","keepPrefix":2,"keepSuffix":2} ]')
Result: uid=us*******45&token=bf**********04&from=web, protecting sensitive parameters while preserving other query data.
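When the same selective masking has to happen outside SLS, for example before a URI is written to an application log, the query string can be parsed instead of pattern-matched. The sketch below uses Python's standard urllib.parse module; the function name mask_query is hypothetical, and the sketch assumes the masked value keeps the original length, so the number of mask characters it emits may differ from the service's output.

```python
from urllib.parse import parse_qsl, urlencode


def mask_query(query, keys, keep_prefix=0, keep_suffix=0, mask_char="*"):
    """Mask the values of the query parameters named in `keys`."""
    masked = []
    for name, value in parse_qsl(query, keep_blank_values=True):
        if name in keys:
            if len(value) > keep_prefix + keep_suffix:
                hidden = len(value) - keep_prefix - keep_suffix
                value = (value[:keep_prefix] + mask_char * hidden
                         + value[len(value) - keep_suffix:])
            else:
                # Too short to keep both ends: mask the whole value.
                value = mask_char * len(value)
        masked.append((name, value))
    # safe=mask_char stops urlencode from percent-encoding the mask character.
    return urlencode(masked, safe=mask_char)


result = mask_query("uid=user12345&token=bf81639a41d604&from=web",
                    {"uid", "token"}, keep_prefix=2, keep_suffix=2)
print(result)
```

Parsing the query string keeps parameter order and leaves unlisted parameters such as from unchanged.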
Alibaba Cloud Observability
Driving continuous progress in observability technology!
