Creative Front‑End Anti‑Crawling Tricks Every Developer Should Know
This article surveys a range of front‑end anti‑crawling techniques, from font‑face obfuscation and background‑image sprites to pseudo‑elements and iframe loading. It shows how developers can make data extraction harder for bots, while acknowledging that no method is foolproof.
1. Introduction
We generally want a web page to have good structure and clear content so that search engines can index it. Sometimes, though, certain data, such as e‑commerce revenue figures or exam questions, should not be easy to harvest. That is where crawlers and anti‑crawling measures come in.
2. Common Anti‑Crawler Strategies
There is no perfect solution. Most back‑end methods try to tell humans and bots apart, for example:
User‑Agent and Referer checks
Account and Cookie verification
CAPTCHA
IP rate limiting
Crawlers can mimic humans using headless browsers, OCR for CAPTCHAs, or purchased proxy IPs, so 100% protection is impossible.
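Of the back‑end measures listed above, IP rate limiting is the simplest to illustrate. The TypeScript below is a minimal fixed‑window sketch, assuming a single server process with in‑memory state; the class name FixedWindowLimiter and its parameters are hypothetical and not taken from any framework.

```typescript
// Minimal fixed-window rate limiter keyed by client IP (illustrative sketch).
// Assumptions: one server process, in-memory state, and a caller that passes
// the client IP and the current time in milliseconds.
class FixedWindowLimiter {
  private readonly limit: number;
  private readonly windowMs: number;
  private readonly windows = new Map<string, { start: number; count: number }>();

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Returns true if the request may proceed, false if this IP is over its quota.
  allow(ip: string, nowMs: number): boolean {
    const w = this.windows.get(ip);
    if (w === undefined || nowMs - w.start >= this.windowMs) {
      // First request from this IP, or its previous window has expired.
      this.windows.set(ip, { start: nowMs, count: 1 });
      return true;
    }
    if (w.count < this.limit) {
      w.count += 1;
      return true;
    }
    return false;
  }
}
```

In an HTTP handler, a false result would typically become a 429 Too Many Requests response. A fixed window is the easiest variant to reason about; sliding windows or token buckets smooth out bursts at window boundaries, and as noted above, rotating proxy IPs can sidestep any per‑IP limit.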
3. Front‑End Anti‑Crawler Techniques
3.1 Font‑Face Obfuscation
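Example: Maoyan Movies – numeric data is rendered with a custom web font whose glyphs are mapped to arbitrary Unicode code points, so the raw HTML contains meaningless characters and a crawler must download and decode the font to recover the numbers. The font URL changes on each refresh, which makes it harder to hard‑code a single mapping.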
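A minimal sketch of the server‑side half of this idea, assuming digits are remapped into the Unicode Private Use Area (U+E000 onward) with a fresh random table per request. Building the matching font file, whose glyph at each code point draws the intended digit, is assumed to happen elsewhere with a font tool and is not shown; the function names are hypothetical.

```typescript
const PUA_BASE = 0xe000; // first code point of the Unicode Private Use Area

// Build a random digit -> code point table (hypothetical helper). A new table,
// served with a matching font at a new URL, is generated per request.
function makeDigitMapping(rand: () => number = Math.random): Map<string, string> {
  const codePoints = Array.from({ length: 10 }, (_, i) => PUA_BASE + i);
  // Fisher-Yates shuffle so each digit lands on an unpredictable code point.
  for (let i = codePoints.length - 1; i > 0; i--) {
    const j = Math.floor(rand() * (i + 1));
    [codePoints[i], codePoints[j]] = [codePoints[j], codePoints[i]];
  }
  const mapping = new Map<string, string>();
  for (let d = 0; d < 10; d++) {
    mapping.set(String(d), String.fromCodePoint(codePoints[d]));
  }
  return mapping;
}

// Replace every digit with its obfuscated code point; other characters pass through.
function obfuscateDigits(text: string, mapping: Map<string, string>): string {
  return Array.from(text, (ch) => mapping.get(ch) ?? ch).join("");
}
```

The resulting text only reads correctly when the element uses the matching @font-face; a crawler that extracts the raw characters gets private‑use code points unless it also fetches and parses the font.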
3.2 Background‑Image Sprites
Example: Meituan – numbers are displayed as slices of a single background‑image sprite, with a different background‑position offset selecting each digit.
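The sketch below generates that kind of markup, assuming a sprite whose glyph cells are laid out left to right in a shuffled order, each a fixed 12px wide. The cell width, height, sprite URL, and function name are hypothetical values for illustration.

```typescript
const DIGIT_WIDTH = 12; // px, assumed width of one glyph cell in the sprite

// spriteOrder[k] is the digit drawn in the k-th cell of the sprite image.
function renderSpriteDigits(value: string, spriteOrder: string, spriteUrl: string): string {
  return Array.from(value, (ch) => {
    const cell = spriteOrder.indexOf(ch);
    if (cell < 0) {
      return ch; // not in the sprite (e.g. "."): emit as plain text
    }
    // A negative x offset slides the sprite so only the wanted cell shows.
    const x = -cell * DIGIT_WIDTH;
    return (
      `<span style="display:inline-block;width:${DIGIT_WIDTH}px;height:16px;` +
      `background:url(${spriteUrl}) ${x}px 0 no-repeat"></span>`
    );
  }).join("");
}
```

For example, renderSpriteDigits("90", "3719052846", "/sprite.png") produces two empty spans with offsets -36px and -48px. Changing the cell order (and the image) per page means the offsets alone do not reveal the digits.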
3.3 Hidden Characters
Some WeChat Official Account articles insert random characters into the text and hide them with CSS, so naive text extraction picks up the junk along with the real content.
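A minimal sketch of the hidden‑character trick, assuming the page ships a CSS rule such as .hx { display: none; }. The class name hx, the noise rate, and the function name are hypothetical.

```typescript
// Minimal HTML escaping for the real text content.
function escapeHtml(s: string): string {
  return s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// After each real character, sometimes insert a decoy character wrapped in a
// span that the (assumed) stylesheet hides from human readers.
function withHiddenNoise(text: string, rand: () => number = Math.random, rate = 0.3): string {
  const pool = "abcdefghijklmnopqrstuvwxyz0123456789";
  let html = "";
  for (const ch of Array.from(text)) {
    html += escapeHtml(ch);
    if (rand() < rate) {
      const junk = pool[Math.floor(rand() * pool.length)];
      html += `<span class="hx">${junk}</span>`;
    }
  }
  return html;
}
```

A scraper that strips tags gets the decoys mixed in; one that applies the stylesheet, or simply drops .hx elements, gets the clean text, which is why this only stops the simplest extractors.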
3.4 Pseudo‑Element Content
Example: Autohome – critical information is placed in CSS pseudo‑elements, forcing the crawler to parse CSS to retrieve it.
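The sketch below moves each piece of sensitive text into a ::before rule and leaves only empty placeholder spans in the HTML. The random per‑page class names, the k prefix, and the function names are hypothetical; it assumes trusted, single‑line text.

```typescript
interface PseudoContentResult {
  html: string; // markup containing only empty placeholder spans
  css: string;  // stylesheet whose ::before rules supply the visible text
}

// Quote text as a CSS string literal: escape backslashes first, then quotes.
function cssString(text: string): string {
  return '"' + text.replace(/\\/g, "\\\\").replace(/"/g, '\\"') + '"';
}

function toPseudoContent(words: string[], rand: () => number = Math.random): PseudoContentResult {
  const rules: string[] = [];
  const spans: string[] = [];
  words.forEach((word, i) => {
    // Random per-page class name; the "_i" suffix keeps names unique.
    const cls = "k" + Math.floor(rand() * 1e6).toString(36) + "_" + i;
    rules.push(`.${cls}::before { content: ${cssString(word)}; }`);
    spans.push(`<span class="${cls}"></span>`);
  });
  return { html: spans.join(""), css: rules.join("\n") };
}
```

Because the class names change per page, a crawler has to parse the stylesheet and match rules to elements rather than reading a fixed selector.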
3.5 Element Position Overlap
Example: Qunar – the price markup contains several i tags holding decoy digits, and two absolutely positioned b tags are laid over some of them with the real digits; the correct price appears only after visual rendering.
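A minimal model of the overlap trick, assuming every digit cell is a fixed 16px wide inside a position:relative container. The tag names follow the description above; the cell width, inline styles, and function names are hypothetical, and the price is assumed to consist of single‑width characters.

```typescript
const DIGIT_W = 16; // px, assumed width of each digit cell

interface Overlay { digit: string; left: number } // left offset in px

// Fill the i tags with random decoys, then overlay the real digit wherever
// the decoy is wrong.
function buildOverlapPrice(price: string, rand: () => number = Math.random) {
  const decoy = Array.from(price, () => String(Math.floor(rand() * 10)));
  const overlays: Overlay[] = [];
  Array.from(price).forEach((real, i) => {
    if (decoy[i] !== real) overlays.push({ digit: real, left: i * DIGIT_W });
  });
  const cell = `display:inline-block;width:${DIGIT_W}px`;
  const html =
    `<b style="position:relative;display:inline-block;width:${price.length * DIGIT_W}px">` +
    decoy.map((d) => `<i style="${cell}">${d}</i>`).join("") +
    overlays
      .map((o) => `<b style="position:absolute;top:0;left:${o.left}px;width:${DIGIT_W}px;background:#fff">${o.digit}</b>`)
      .join("") +
    `</b>`;
  return { decoy: decoy.join(""), overlays, html };
}

// What a human sees after layout: each overlay covers the cell beneath it.
function visiblePrice(decoy: string, overlays: Overlay[]): string {
  const cells = Array.from(decoy);
  for (const o of overlays) cells[Math.round(o.left / DIGIT_W)] = o.digit;
  return cells.join("");
}
```

With decoys "777" and a real price of "738", two overlay b tags are produced, the raw text reads "77738", and only a crawler that reproduces the layout arithmetic recovers "738".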
3.6 Iframe Asynchronous Loading
Example: NetEase Cloud Music – the initial HTML contains only an empty iframe (src="about:blank"); JavaScript later injects the full page into the iframe, requiring the crawler to execute scripts or intercept the network request.
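A minimal sketch of the injection step, using the standard document.open(), write(), and close() methods on the iframe's contentDocument. The small interfaces are assumptions that the browser's real HTMLIFrameElement and Document satisfy, so the logic can also run outside a browser; the fetch endpoint in the comment is hypothetical.

```typescript
interface WritableDoc {
  open(): unknown;
  write(html: string): void;
  close(): void;
}
interface FrameLike {
  contentDocument: WritableDoc | null;
}

// Write a full page into an iframe that started as src="about:blank".
function injectIntoFrame(frame: FrameLike, html: string): boolean {
  const doc = frame.contentDocument;
  if (doc === null) return false; // not attached yet, or cross-origin
  doc.open();
  doc.write(html);
  doc.close();
  return true;
}

// In a browser this might be wired up roughly like (endpoint is hypothetical):
//   const frame = document.querySelector("iframe") as HTMLIFrameElement;
//   fetch("/api/page-fragment")
//     .then((r) => r.text())
//     .then((h) => injectIntoFrame(frame, h));
```

A crawler that only downloads the initial HTML sees the empty iframe; it must either run the script or find and call the request that supplies the content.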
3.7 Split‑Node Digits
Example: Proxy‑IP listings – the IP address is split across separate DOM nodes, with decoy digits inserted between them and hidden from human visitors, which confuses simple scrapers that read the raw text.
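A minimal sketch of split‑node markup for an IP address, assuming decoys are hidden with an inline display:none (real sites may use classes instead) and each real chunk is a single character. The function names are hypothetical.

```typescript
function splitIpMarkup(ip: string, rand: () => number = Math.random): string {
  let html = "";
  for (const ch of Array.from(ip)) {
    // Real character: rendered and present in the DOM text.
    html += `<span>${ch}</span>`;
    if (rand() < 0.5) {
      // Decoy digit: present in the DOM text, but never rendered.
      html += `<span style="display:none">${Math.floor(rand() * 10)}</span>`;
    }
  }
  return html;
}

// What a naive scraper gets by stripping all tags (decoys included).
function naiveText(html: string): string {
  return html.replace(/<[^>]+>/g, "");
}
```

Dropping the hidden nodes before reading the text defeats this, so in practice it only filters out scrapers that ignore styling.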
3.8 Character‑Set Replacement
Example: Qunar mobile – the HTML contains "3211", but a custom font loaded through CSS draws swapped glyphs for those digits, so the page visually shows "1233" and crawlers reading the raw text get the wrong value.
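The sketch below models the glyph swap as a lookup table describing the custom font ("3" is drawn as "1" and vice versa) and computes which characters to put in the HTML so that the rendered result is correct. The table format and function names are hypothetical, and the table is assumed to be a one‑to‑one mapping so it can be inverted.

```typescript
type GlyphMap = Record<string, string>; // source character -> digit the font draws

// Given what the user should see, compute the text to place in the HTML.
function encodeForSwappedFont(visible: string, glyphFor: GlyphMap): string {
  // Invert the font table: drawn digit -> character that produces it.
  const sourceFor: Record<string, string> = {};
  for (const src of Object.keys(glyphFor)) sourceFor[glyphFor[src]] = src;
  return Array.from(visible, (ch) => sourceFor[ch] ?? ch).join("");
}

// What a browser using the custom font displays for the raw HTML text.
function renderWithFont(html: string, glyphFor: GlyphMap): string {
  return Array.from(html, (ch) => glyphFor[ch] ?? ch).join("");
}
```

With the table { "1": "3", "2": "2", "3": "1" }, encoding "1233" yields "3211", matching the example above, and rendering "3211" through the same table gives back "1233".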
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
Tencent IMWeb Frontend Team
The IMWeb Frontend Community brings together front‑end development enthusiasts. Follow us for curated live courses from top experts and cutting‑edge technical posts that will sharpen your front‑end skills.
