Baidu Geek Talk
Jan 6, 2025 · Information Security
MarkupLM-based Detection of Malicious Content Scraping
The article presents a MarkupLM-based approach that augments BERT with XPath embeddings to jointly model a webpage's text and structure. This enables site-level detection of malicious content-scraping pages that bypass traditional rule-based filters, and shows that structural cues are critical to improving spam classification accuracy.
Document Understanding · MarkupLM · Web Security
16 min read