Tagged articles
25 articles
AI Architecture Path
May 6, 2026 · Backend Development

Scrapling: Self‑Healing Web Scraper That Bypasses Cloudflare and Is 784× Faster Than BS4

Scrapling is an open‑source, adaptive web‑scraping framework. It automatically tracks element changes, bypasses Cloudflare and other anti‑scraping defenses, and offers multiple fetchers, including a stealth mode. It claims extraction speeds up to 784× faster than BeautifulSoup (BS4). It also supports concurrency, AI integration, and a simple CLI.

Python · Scrapling · Web Scraping
16 min read
Architecture and Beyond
Jul 1, 2023 · Industry Insights

Web Crawlers Unveiled: History, Value, and How to Tackle Their Challenges

This article traces the development of web crawlers from their 1990s origins to modern implementations, examines their multifaceted value in search, data analysis, and archiving, outlines technical, ethical, and legal challenges for both crawler creators and target sites, and presents practical strategies to mitigate malicious crawling.

Data Extraction · Security · Web Crawling
24 min read
IT Services Circle
Mar 26, 2022 · Information Security

Common Reasons Why Your Proxy Fails to Hide Your Web Scraper

The article is aimed at beginner Python scraper authors. It explains the typical situations that let websites easily detect that a request is coming through a proxy: not configuring an HTTPS proxy, using server IPs, using non‑anonymous proxies, drawing from a polluted IP pool, and lacking HTTP/2 support.

HTTP · Proxy · Python
7 min read
MaGe Linux Operations
Jul 25, 2021 · Backend Development

How to Bypass Common Anti‑Scraping Mechanisms with Python

This guide explains common anti‑scraping defenses—identity verification via request headers and IP rate limiting—and shows how to bypass them in Python using custom user‑agents, request throttling, and BeautifulSoup to successfully scrape Douban’s Top 250 movies.

anti-scraping
6 min read
Python Programming Learning Circle
Jul 14, 2021 · Backend Development

Bypassing Anti‑Scraping Mechanisms: User‑Agent Spoofing and IP Rate Limiting with Python

This article explains how to overcome common anti‑scraping defenses such as identity verification and IP rate limiting by spoofing the User‑Agent header and adding request delays, providing complete Python code examples using requests and BeautifulSoup to scrape Douban's Top 250 movies.

IP throttling · User-Agent · Web Scraping
6 min read
Sohu Tech Products
Mar 25, 2020 · Information Security

Designing Anti‑Scraping Techniques Using Custom Base64 Encoding

This article explains how to hide real intentions behind visible actions by using text obfuscation and custom Base64‑like encoding to defeat standard web scrapers, detailing the underlying principles, decoding challenges, and Python implementations of a flexible Custom64 encoder.

Base64 · Python · anti-scraping
10 min read
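The summary above mentions a flexible Custom64 encoder but does not show it. The following is a minimal sketch only. It assumes the scheme works by remapping the standard Base64 alphabet. The `Custom64` class and `CUSTOM_ALPHABET` below are hypothetical illustrations, not the article's code. Python's `base64` module plus `str.maketrans` is enough to express the idea:

```python
import base64

# RFC 4648 standard Base64 alphabet, in index order.
STANDARD_ALPHABET = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789+/"
)

# Hypothetical custom alphabet: 64 distinct characters, none of them "=".
CUSTOM_ALPHABET = (
    "QWERTYUIOPASDFGHJKLZXCVBNM"
    "qwertyuiopasdfghjklzxcvbnm"
    "9876543210-_"
)


class Custom64:
    """Base64 with a substituted alphabet (a sketch, not the article's Custom64)."""

    def __init__(self, alphabet: str) -> None:
        # The alphabet must be a one-to-one replacement for the 64 standard symbols;
        # "=" is reserved for padding, which passes through unchanged.
        if len(alphabet) != 64 or len(set(alphabet)) != 64 or "=" in alphabet:
            raise ValueError("alphabet must contain 64 distinct characters other than '='")
        self._to_custom = str.maketrans(STANDARD_ALPHABET, alphabet)
        self._to_standard = str.maketrans(alphabet, STANDARD_ALPHABET)

    def encode(self, data: bytes) -> str:
        # Encode normally, then swap each standard symbol for its custom counterpart.
        return base64.b64encode(data).decode("ascii").translate(self._to_custom)

    def decode(self, text: str) -> bytes:
        # Undo the substitution, then decode as ordinary Base64.
        return base64.b64decode(text.translate(self._to_standard))


if __name__ == "__main__":
    codec = Custom64(CUSTOM_ALPHABET)
    token = codec.encode("price: 42".encode("utf-8"))
    print(token)
    print(codec.decode(token).decode("utf-8"))
```

A scraper that applies plain `base64.b64decode` to such a token gets the wrong bytes or an error, which is the point of the technique. Anyone who recovers the alphabet can still decode it, so this is obfuscation rather than encryption.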
21CTO
Sep 28, 2019 · Backend Development

Cracking Dazhong Dianping’s CSS Encryption: A Step‑by‑Step Web Scraping Guide

This article walks through the challenges of scraping Dazhong Dianping, explains how the site hides numeric data with custom CSS fonts, and provides a complete Python workflow—including HTTP requests, font extraction, glyph rendering, and OCR—to decode and retrieve the protected information.

CSS encryption · OCR · Python
13 min read
JD Tech
Sep 7, 2018 · Information Security

Big Data and AI Security Insights from ISC 2018 Conference

The ISC 2018 conference highlighted the growing importance of big data and artificial intelligence security, presenting JD's research on anti‑scraping techniques, AI‑driven defenses against black‑market attacks, and a service‑oriented approach to protecting user data across enterprises.

AI security · Big Data · anti-scraping
5 min read
ITPUB
May 2, 2017 · Backend Development

How to Bypass Common Anti‑Scraping Measures with Scrapy

This guide explains why websites employ anti‑scraping defenses, outlines the most common header checks such as User‑Agent, Referer, and Cookies, and provides practical Scrapy code snippets for rotating user agents, managing proxies, handling X‑Forwarded‑For, limiting request rates, and dealing with dynamic AJAX content using Selenium or PhantomJS.

Headers · Proxy · Scrapy
7 min read
ITPUB
Mar 21, 2016 · Backend Development

How to Bypass Common Anti‑Scraping Measures: Headers, Behavior, and Dynamic Pages

This guide outlines the main anti‑scraping techniques used by websites—including header validation, user‑behavior monitoring, and dynamic content loading—and provides practical methods such as header spoofing, IP proxy rotation, request throttling, and Selenium/PhantomJS automation to overcome them.

Headers · PhantomJS · Selenium
6 min read
21CTO
Jan 26, 2016 · Backend Development

How to Bypass Common Anti‑Scraping Measures: Headers, Behavior, and Dynamic Pages

This article summarizes common anti‑scraping techniques—including header checks, user‑behavior detection, and dynamic page defenses—and provides practical ways to circumvent them using custom headers, IP proxies, request timing, and tools like Selenium with PhantomJS to simulate real browsers.

Headers · Proxy · Selenium
6 min read