Tagged articles

anti-scraping

25 articles · Page 1 of 1

May 6, 2026 · Backend Development

Scrapling: Self‑Healing Web Scraper That Bypasses Cloudflare and Is 784× Faster Than BS4

Scrapling is an open‑source, adaptive web‑scraping framework that automatically tracks element changes, bypasses Cloudflare and other anti‑scraping defenses, offers multiple fetchers (including stealth mode), and delivers extraction speeds up to 784× faster than BeautifulSoup (BS4) while supporting concurrency, AI integration, and easy CLI usage.

Python · Scrapling · Web Scraping

0 likes · 16 min read

Python Crawling & Data Mining

Sep 22, 2024 · Backend Development

Master Python Web Scraping with requests_html: A Step-by-Step Guide

Learn how to overcome anti‑scraping defenses by using Python's requests_html library to fetch and parse dynamic web pages, with a complete code example that extracts company names from a target site, plus tips on handling cookies, headers, and Unicode decoding.

anti-scraping · data-mining · requests-html

0 likes · 6 min read

Architecture and Beyond

Jul 1, 2023 · Industry Insights

Web Crawlers Unveiled: History, Value, and How to Tackle Their Challenges

This article traces the development of web crawlers from their 1990s origins to modern implementations, examines their multifaceted value in search, data analysis, and archiving, outlines technical, ethical, and legal challenges for both crawler creators and target sites, and presents practical strategies to mitigate malicious crawling.

anti-scraping · data extraction · robots.txt

0 likes · 24 min read

Python Programming Learning Circle

Oct 11, 2022 · Backend Development

How to Earn Money with Python: Web Scraping, Platforms, and Practical Tips

This guide explains how unemployed developers can use Python, especially web‑scraping techniques, to secure freelance gigs by leveraging various platforms, community groups, and effective order‑taking strategies while warning about common pitfalls and anti‑scraping challenges.

anti-scraping · data processing · freelance

0 likes · 7 min read

IT Services Circle

Mar 26, 2022 · Information Security

Common Reasons Why Your Proxy Fails to Hide Your Web Scraper

The article explains several typical situations—such as not configuring HTTPS proxies, using server IPs, non‑anonymous proxies, polluted IP pools, and lack of HTTP/2 support—that cause websites to easily detect that a request is made through a proxy, even for beginner Python scrapers.

HTTP · Python · Web Scraping

0 likes · 7 min read

Python Crawling & Data Mining

Jan 31, 2022 · Backend Development

Bypass Anti‑Scraping Measures with Python’s requests_html

This article walks through a real‑world Python web‑scraping case, explains why standard requests failed and how the requests_html library can overcome anti‑scraping defenses, and provides a complete, runnable code example with analysis and results.

anti-scraping · requests_html

0 likes · 6 min read

Python Crawling & Data Mining

Nov 27, 2021 · Backend Development

Crack Custom Font Anti‑Scraping on Dianping: A Complete Python Guide

This article walks through decoding custom font anti‑scraping on a Chinese group‑buying site using Python, covering page fetching, CSS font URL extraction, .woff download, glyph mapping with fontTools, OCR via ddddocr, DOM manipulation with BeautifulSoup, and exporting the cleaned data to Excel.

Python · anti-scraping · beautifulsoup

0 likes · 28 min read

MaGe Linux Operations

Jul 25, 2021 · Backend Development

How to Bypass Common Anti‑Scraping Mechanisms with Python

This guide explains common anti‑scraping defenses—identity verification via request headers and IP rate limiting—and shows how to bypass them in Python using custom user‑agents, request throttling, and BeautifulSoup to successfully scrape Douban’s Top 250 movies.

anti-scraping

0 likes · 6 min read

Python Programming Learning Circle

Jul 14, 2021 · Backend Development

Bypassing Anti‑Scraping Mechanisms: User‑Agent Spoofing and IP Rate Limiting with Python

This article explains how to overcome common anti‑scraping defenses such as identity verification and IP rate limiting by spoofing the User‑Agent header and adding request delays, providing complete Python code examples using requests and BeautifulSoup to scrape Douban's Top 250 movies.

IP throttling · User-Agent · Web Scraping

0 likes · 6 min read

Python Programming Learning Circle

Dec 25, 2020 · Backend Development

Bypassing Anti‑Scraping Measures on Mayi Short‑Rent Site Using Cookies and BeautifulSoup

This tutorial explains how to analyze the Mayi short‑rent website, overcome its anti‑scraping defenses by setting appropriate Cookie and User‑Agent headers, and use Python's urllib2 and BeautifulSoup to extract rental details, store them in CSV, and optionally employ Selenium.

Cookie · Python · anti-scraping

0 likes · 8 min read

Python Programming Learning Circle

Dec 17, 2020 · Backend Development

Request Header Spoofing and Anti‑Anti‑Scraping Techniques for Web Crawlers

This article explains how to disguise a web crawler's identity by customizing request headers, managing request frequency with sleep and proxy settings, and tackling common anti‑scraping mechanisms such as captchas, dynamic loading, and encrypted content using tools like Selenium.

anti-scraping · proxies · request headers

0 likes · 6 min read

Python Crawling & Data Mining

Jul 29, 2020 · Backend Development

How to Build a Python Web Scraper that Automatically Downloads and Organizes Images

This tutorial walks you through creating a Python web scraper that extracts images from a target site, handles anti‑scraping measures, saves them into categorized folders, and logs the download results, while explaining the required libraries, code structure, and best practices.

Image Download · Python · Web Scraping

0 likes · 6 min read

Python Crawling & Data Mining

Jul 9, 2020 · Big Data

How to Build a Python Web Scraper for Job Listings and Bypass Anti‑Scraping Measures

This tutorial explains how to crawl 58.com job listings with Python, extract location, company, and salary information, handle anti‑scraping defenses using realistic headers and random User‑Agents, and save the results into a text file.

Python · Web Scraping · anti-scraping

0 likes · 7 min read

Python Programming Learning Circle

Jun 20, 2020 · Information Security

Bypassing Implicit Style‑CSS Anti‑Scraping: Analysis and Restoration of Obfuscated Content

This article explains how many Chinese websites use CSS ::before content to hide characters, shows how to locate the relevant network request and decode the span class mappings from obfuscated JavaScript, and restores the original text for successful web scraping.

JavaScript · Obfuscation · anti-scraping

0 likes · 10 min read

Python Programming Learning Circle

Jun 6, 2020 · Information Security

Understanding CSS Sprites and Techniques to Bypass Sprite‑Based Anti‑Scraping

This article explains the concept and benefits of CSS sprites, analyzes their drawbacks for web performance and security, and provides a step‑by‑step Python‑based method—including code snippets—to extract and sum numbers hidden behind sprite images used as an anti‑scraping measure.

Front-end · Sprite · Web Scraping

0 likes · 9 min read

Python Crawling & Data Mining

Apr 17, 2020 · Backend Development

Bypass Anti‑Scraping Measures with Python Requests and Proxy Pools

This tutorial explains how to overcome common anti‑scraping defenses of a proxy‑listing website by capturing legitimate HTTP headers with Fiddler, configuring the Python requests library, and building a dynamic proxy pool to keep your crawler running smoothly.

Fiddler · Python · anti-scraping

0 likes · 5 min read

Sohu Tech Products

Mar 25, 2020 · Information Security

Designing Anti‑Scraping Techniques Using Custom Base64 Encoding

This article explains how to hide real intentions behind visible actions by using text obfuscation and custom Base64‑like encoding to defeat standard web scrapers, detailing the underlying principles, decoding challenges, and Python implementations of a flexible Custom64 encoder.

Base64 · Python · anti-scraping

0 likes · 10 min read

Python Programming Learning Circle

Dec 27, 2019 · Backend Development

How to Bypass Anti‑Scraping Measures: Delays, Headers, Proxies & Distributed Crawling

This guide explains practical techniques to avoid IP bans and 403 errors when web‑scraping, covering explicit and implicit waiting, User‑Agent spoofing, proxy usage, IP pools, and distributed crawling architectures.

Python · Selenium · Web Scraping

0 likes · 8 min read

Python Programming Learning Circle

Oct 19, 2019 · Backend Development

How to Bypass Anti‑Scraping Measures: User‑Agent, Cookies & Proxies

This guide explains practical techniques such as faking User‑Agent headers, rotating cookies, adding random delays, and using proxy pools to prevent IP bans while crawling large amounts of data from websites with anti‑scraping defenses.

User-Agent · Web Scraping · anti-scraping

0 likes · 4 min read

21CTO

Sep 28, 2019 · Backend Development

Cracking Dazhong Dianping’s CSS Encryption: A Step‑by‑Step Web Scraping Guide

This article walks through the challenges of scraping Dazhong Dianping, explains how the site hides numeric data with custom CSS fonts, and provides a complete Python workflow—including HTTP requests, font extraction, glyph rendering, and OCR—to decode and retrieve the protected information.

CSS encryption · OCR · Python

0 likes · 13 min read

MaGe Linux Operations

Aug 10, 2019 · Backend Development

Master Web Scraping in Python: From Basics to Bypassing Anti‑Scraping

Learn how to start web scraping with Python by mastering the three core steps (fetching, analyzing, and storing data) with urllib and requests, handling login, evading anti‑scraping measures with User‑Agent spoofing and IP proxies, and saving results to JSON, CSV, or MongoDB.

Python · Selenium · anti-scraping

0 likes · 9 min read

JD Tech

Sep 7, 2018 · Information Security

Big Data and AI Security Insights from ISC 2018 Conference

The ISC 2018 conference highlighted the growing importance of big data and artificial intelligence security, presenting JD's research on anti‑scraping techniques, AI‑driven defenses against black‑market attacks, and a service‑oriented approach to protecting user data across enterprises.

AI security · Big Data · anti-scraping

0 likes · 5 min read

ITPUB

May 2, 2017 · Backend Development

How to Bypass Common Anti‑Scraping Measures with Scrapy

This guide explains why websites employ anti‑scraping defenses, outlines the most common header checks such as User‑Agent, Referer, and Cookies, and provides practical Scrapy code snippets for rotating user agents, managing proxies, handling X‑Forwarded‑For, limiting request rates, and dealing with dynamic AJAX content using Selenium or PhantomJS.

Headers · Scrapy · Web Scraping

0 likes · 7 min read

ITPUB

Mar 21, 2016 · Backend Development

How to Bypass Common Anti‑Scraping Measures: Headers, Behavior, and Dynamic Pages

This guide outlines the main anti‑scraping techniques used by websites—including header validation, user‑behavior monitoring, and dynamic content loading—and provides practical methods such as header spoofing, IP proxy rotation, request throttling, and Selenium/PhantomJS automation to overcome them.

Headers · PhantomJS · Selenium

0 likes · 6 min read

21CTO

Jan 26, 2016 · Backend Development

How to Bypass Common Anti‑Scraping Measures: Headers, Behavior, and Dynamic Pages

This article summarizes common anti‑scraping techniques—including header checks, user‑behavior detection, and dynamic page defenses—and provides practical ways to circumvent them using custom headers, IP proxies, request timing, and tools like Selenium with PhantomJS to simulate real browsers.

Headers · Selenium · Web Scraping

0 likes · 6 min read