Tagged articles
83 articles
Python Programming Learning Circle
Jun 7, 2025 · Backend Development

Master Python Web Scraping: From Requests to Selenium and Scrapy

Learn how to efficiently scrape web pages using Python by exploring multiple approaches—including simple requests with BeautifulSoup, fast parsing with lxml, dynamic content extraction with Selenium, and large‑scale crawling with Scrapy—complete with installation steps, code snippets, and detailed explanations.

Python · Scrapy · Selenium
0 likes · 10 min read
php Courses
May 14, 2025 · Backend Development

Python Advantages for Web Scraping and Core Library Guide

This article outlines Python's advantages for web crawling, introduces core libraries such as Requests, BeautifulSoup, and Scrapy, details a step-by-step development workflow, provides practical code examples for extracting news titles, and highlights important considerations and advanced techniques for robust scraper implementation.

Data Extraction · Python · Scrapy
0 likes · 5 min read
Python Programming Learning Circle
Dec 10, 2024 · Big Data

23 Python Web Scraping Projects with GitHub Links

This article compiles twenty‑three Python web‑scraping projects, each described with its purpose, key features, and a direct GitHub repository link, offering developers a ready‑made toolbox for data collection across platforms such as WeChat, DouBan, Zhihu, Bilibili, and more.

GitHub · Python · Scrapy
0 likes · 9 min read
Python Programming Learning Circle
Jun 5, 2024 · Backend Development

Various Python Methods for E‑commerce Data Collection and Web Scraping

This article introduces ten practical Python techniques—including requests, Selenium, Scrapy, Crawley, PySpider, aiohttp, asks, vibora, Pyppeteer, and Fiddler‑based reverse engineering—to efficiently collect e‑commerce and app data while addressing common challenges such as IP blocking, captchas, and authentication.

Scrapy · Selenium · aiohttp
0 likes · 8 min read
Python Programming Learning Circle
Mar 11, 2024 · Fundamentals

7 Essential Python Tools to Boost Development Efficiency

This article introduces seven practical Python tools—including Pandas, Selenium, Flask, Scrapy, Requests, Faker, and Pillow—explaining their core features, typical use cases, and providing ready‑to‑run code snippets to help developers automate tasks and accelerate project development.

Faker · Scrapy · Selenium
0 likes · 6 min read
Test Development Learning Exchange
Jul 6, 2023 · Backend Development

Scrapy Framework Overview and Usage Guide

Scrapy is a powerful Python-based web scraping framework designed for large-scale and complex website data extraction. It offers high-level abstractions, built-in data extraction tools using XPath and CSS selectors, asynchronous processing for parallel requests, and flexible pipelines for data storage, making it ideal for efficient and scalable web scraping projects.

Backend Development · Data Extraction · Python
0 likes · 5 min read
Big Data Technology Architecture
Feb 11, 2023 · Backend Development

Understanding Scrapy and Twisted: Architecture, Components, and Debugging Techniques

This article explains Scrapy's comprehensive crawling framework and Twisted's event‑driven networking engine, detailing their core concepts, workflow, code execution process, and how to debug Scrapy spiders using breakpoint tracing, providing a deep technical overview for backend developers.

Backend Development · Event-driven · Python
0 likes · 15 min read
Architecture Digest
Sep 24, 2022 · Information Security

Web Crawling and Anti‑Crawling Techniques: Principles, Implementation, and Countermeasures

This article explains the technical principles and implementation steps of web crawlers, introduces common crawling frameworks, provides a Python example for extracting app store rankings, and then details various anti‑crawling methods such as CSS offset, image camouflage, custom fonts, dynamic rendering, captchas, request signing, and honeypots, followed by counter‑strategies for each.

Python · Scrapy · Web Crawling
0 likes · 24 min read
vivo Internet Technology
Sep 14, 2022 · Information Security

Web Crawling, Anti‑Crawling, and Anti‑Anti‑Crawling Techniques: Principles, Frameworks, and Code Examples

The article covers web‑crawling basics with Python and Scrapy examples, then surveys common anti‑crawling defenses such as CSS offsets, image camouflage, custom fonts, dynamic rendering, captchas, request signatures and honeypots, and finally presents anti‑anti‑crawling countermeasures, including CSS‑offset reversal, font decoding, headless‑browser rendering and YOLOv5‑based captcha cracking, while stressing legal compliance.

Captcha · Python · Scrapy
0 likes · 25 min read
Python Programming Learning Circle
Jul 13, 2022 · Backend Development

Comprehensive Scrapy Tutorial: Architecture, XPath Basics, Installation, Project Setup, and Advanced Features

This article provides a detailed walkthrough of Scrapy, covering its event‑driven architecture, component interactions, XPath parsing fundamentals, installation steps, project creation, sample spider code, item pipelines, middleware customization, and essential configuration settings for effective web crawling in Python.

Pipeline · Scrapy · Spider
0 likes · 12 min read
Sohu Tech Products
Aug 25, 2021 · Backend Development

Scrapy Tutorial: Installation, Project Structure, Basic Usage, and Real‑World Example

This article provides a comprehensive, step‑by‑step guide to the Scrapy web‑crawling framework, covering its core components, installation methods, project layout, spider creation, data extraction techniques, pagination handling, pipeline configuration, and how to run the crawler to collect and store data.

Crawler · Data Extraction · Python
0 likes · 13 min read
21CTO
Jul 12, 2021 · Backend Development

Master Scrapy: From Basics to Advanced Spider Development

This comprehensive guide introduces Scrapy's architecture, explains its core components and data flow, teaches XPath fundamentals, walks through installation, project creation, spider coding, item and pipeline definitions, middleware customization, pagination handling, and essential settings for effective Python web crawling.

Crawler · Python · Scrapy
0 likes · 14 min read
360 Quality & Efficiency
Jul 2, 2021 · Backend Development

Integrating Scrapy with Selenium for Dynamic Web Page Crawling

This guide explains how to combine Scrapy and Selenium to scrape dynamically rendered web pages, covering installation, project setup, middleware configuration, Selenium driver handling, and code examples that demonstrate a complete end‑to‑end crawling workflow.

Dynamic Pages · Python · Scrapy
0 likes · 12 min read
Python Programming Learning Circle
Jun 30, 2021 · Backend Development

Comparison of Seven Popular Python Web Frameworks

This article introduces seven open‑source Python web frameworks—Django, Flask, Scrapy, Tornado, Web2py, Weppy, and Bottle—detailing their main features, typical use cases, and the key advantages and disadvantages of each to help developers choose the most suitable framework for their projects.

Django · Python · Scrapy
0 likes · 8 min read
MaGe Linux Operations
May 14, 2021 · Fundamentals

Boost Your Python Productivity with 7 Essential Efficiency Tools

This article introduces seven powerful Python tools—including Pandas, Selenium, Flask, Scrapy, Requests, Faker, and Pillow—explaining their core features and providing ready-to-use code snippets to help developers automate data analysis, testing, web development, crawling, API calls, fake data generation, and image processing.

Faker · Flask · Scrapy
0 likes · 6 min read
Python Crawling & Data Mining
Mar 21, 2021 · Fundamentals

Master Web Crawling: Focused, General, Incremental & Deep Techniques in Python

This article introduces various web crawling strategies—including focused crawlers, general-purpose crawlers, incremental crawlers, and deep‑web crawlers—explains their underlying principles, presents practical Python code examples for image, e‑commerce and movie data extraction, and discusses deduplication methods and form‑filling techniques.

Scrapy · deep web · incremental crawling
0 likes · 13 min read
Tencent Cloud Developer
Jan 21, 2021 · Big Data

A Beginner's Guide to Using Scrapy for Web Crawling

This beginner‑friendly guide walks readers through installing Scrapy, creating a project and spider, running and debugging crawlers, implementing parsing with CSS/XPath, and overcoming common hurdles such as JavaScript rendering, user‑agent spoofing, and proxy rotation via configurable middlewares, enabling quick start of web‑crawling projects.

Data Extraction · Proxy · Python
0 likes · 13 min read
FunTester
Dec 15, 2020 · Backend Development

Run All Scrapy Spiders Together and Fix Video Download Errors

This guide shows how to create a custom Scrapy command to launch every spider at once, separate each spider's settings for better modularity, and resolve video download problems by adjusting request headers and handling file saving correctly.

Backend · Custom Command · Python
0 likes · 5 min read
Python Crawling & Data Mining
Nov 6, 2020 · Backend Development

Extract Article Cover Images with Scrapy’s meta Parameter

This tutorial explains how to retrieve an article’s cover image URL by starting from the list page, extracting the first image, and passing its URL through Scrapy’s Request meta dictionary to subsequent parsing callbacks, highlighting why list‑page extraction is more reliable than detail‑page scraping.

Meta · Python · Scrapy
0 likes · 5 min read
MaGe Linux Operations
Oct 27, 2020 · Backend Development

Build a Distributed Scrapy Crawler in Minutes with RabbitMQ and RedisBloom

This guide walks you through installing Scrapy-Distributed, setting up RabbitMQ and RedisBloom containers, creating a sitemap spider, configuring the distributed scheduler and dupefilter, and running the spider, while explaining why this non‑intrusive solution improves over existing Scrapy‑Redis and scrapy‑rabbitmq approaches.

Python · RabbitMQ · RedisBloom
0 likes · 7 min read
Python Crawling & Data Mining
Oct 24, 2020 · Backend Development

Master Scrapy: Extract Likes, Comments, and Content with XPath

This article continues a Scrapy tutorial by showing how to extract like counts, comment counts, and full article content using XPath selectors, regular expressions, and debugging techniques, providing step‑by‑step code examples and screenshots to help Python developers automate web data collection.

Data Extraction · Python · Scrapy
0 likes · 6 min read
Full-Stack Internet Architecture
Apr 26, 2020 · Backend Development

Scrapy Tutorial: Installation, Components, Project Setup, Code Implementation, and Data Storage

This article provides a comprehensive step‑by‑step guide to installing Scrapy, understanding its core components and processing flow, creating a weather‑data crawling project, writing items, settings, middlewares, spiders, running the crawler, exporting results, and storing the scraped data into MongoDB.

Crawler · MongoDB · Python
0 likes · 15 min read
Python Crawling & Data Mining
Apr 18, 2020 · Backend Development

How to Scrape and Extract Proxy Data with Python: Step-by-Step Guide

This tutorial walks through analyzing a proxy‑listing website’s structure, building a Python scraper using requests, Scrapy, regular expressions and BeautifulSoup, extracting IP, port, location and type fields across multiple pages, and saving the collected data to files, illustrating key web‑crawling techniques.

Proxy · Python · Scrapy
0 likes · 6 min read
MaGe Linux Operations
Jan 3, 2020 · Backend Development

Master Web Scraping with Scrapy: A Complete Python Guide

This guide introduces Scrapy, a powerful Python web‑scraping framework, explains its architecture and components, walks through installation, project creation, spider development, query syntax, recursive crawling, and item pipelines, providing practical code examples for building robust crawlers.

Crawler · Data Extraction · Python
0 likes · 8 min read
Python Programming Learning Circle
Jan 2, 2020 · Backend Development

How to Crawl Responsibly: Avoid Legal Risks and Server Overload

This guide outlines responsible web‑crawling practices, covering robots.txt compliance, legal pitfalls such as unauthorized personal data and copyrighted content, recommended request intervals, and relevant Chinese data‑security regulations, helping developers avoid server overloads and potential lawsuits.
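
As a rough illustration of the robots.txt compliance and request-interval points in this summary, the following is a minimal sketch using only the Python standard library. The robots.txt content, the `example-bot` user agent, the example.com URLs, and the helper names `build_parser`, `may_fetch`, and `polite_delay` are all assumptions made for this example, not taken from the article.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, inlined so the sketch runs offline.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def may_fetch(parser: RobotFileParser, url: str, user_agent: str = "example-bot") -> bool:
    """Return True if the parsed rules allow this user agent to fetch the URL."""
    return parser.can_fetch(user_agent, url)


def polite_delay(parser: RobotFileParser, user_agent: str = "example-bot",
                 default: float = 1.0) -> float:
    """Use the site's Crawl-delay if it declares one, else an assumed default."""
    delay = parser.crawl_delay(user_agent)
    return float(delay) if delay is not None else default


parser = build_parser(ROBOTS_TXT)
for url in ("https://example.com/articles/1", "https://example.com/private/data"):
    print(url, "allowed" if may_fetch(parser, url) else "blocked")
print("seconds to wait between requests:", polite_delay(parser))
```

In a real crawler the delay would typically be passed to `time.sleep` between requests; the 1-second fallback is an arbitrary choice for the sketch.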

Backend Development · Data Ethics · Scrapy
0 likes · 4 min read
MaGe Linux Operations
Dec 27, 2019 · Backend Development

Master Scrapy: Build Powerful Python Web Crawlers Step‑by‑Step

This guide introduces the Scrapy framework, explains its architecture—including engine, scheduler, downloader, spiders, pipelines, and middlewares—covers installation, project setup, item definition, spider coding, pipeline handling, pagination, and provides practical code examples for extracting data from Douban books.

Data Extraction · Item Pipeline · Python
0 likes · 18 min read
21CTO
Aug 16, 2019 · Backend Development

Master Scrapy: Build, Deploy, and Scale a Python Web Crawler Platform

This guide walks through designing a full‑featured web‑crawler platform, covering rule maintenance, job scheduling, async and real‑time crawling with Scrapy, project setup, item pipelines, settings, local execution, custom parameters, server deployment via Scrapyd, API usage, and fast real‑time crawling with Requests, BeautifulSoup, Flask, and multithreading.

Async · Flask · Python
0 likes · 16 min read
Architecture Digest
Aug 15, 2019 · Backend Development

Design and Implementation of a Scrapy‑Based Web Crawling Platform

This article explains how to design a flexible web‑crawling platform using Scrapy, covering rule maintenance, job scheduling, asynchronous and real‑time crawlers, project setup, code structure, settings, local execution, deployment with scrapyd, API usage, and examples of Flask‑based real‑time services.

Async · Deployment · Flask
0 likes · 16 min read
Python Crawling & Data Mining
Aug 8, 2019 · Backend Development

Master Python Web Scraping: From Basics to Advanced Techniques

This comprehensive guide explains what web crawlers are, walks through HTTP request/response fundamentals, introduces essential Python modules like requests, re, XPath, BeautifulSoup, and threading, provides practical code examples, and details how to use the Scrapy framework—including its architecture, components, distributed crawling, and useful auxiliary tools.

HTTP · Python · Scrapy
0 likes · 11 min read
JavaEdge
Mar 21, 2019 · Backend Development

Master Web Crawling with Scrapy: From Tech Choices to Powerful Regex Extraction

This guide walks through selecting Scrapy over Requests + BeautifulSoup, explains web page types, outlines crawler use‑cases, details regular‑expression syntax and non‑greedy matching, demonstrates practical regex patterns with images, compares depth‑first and breadth‑first crawling, and covers URL deduplication and string‑encoding pitfalls in Python.

Python · Scrapy · Web Crawling
0 likes · 11 min read
Python Crawling & Data Mining
Feb 6, 2019 · Backend Development

Master Scrapy: Build Powerful Python Web Crawlers in Minutes

This article introduces the Scrapy framework, explains its architecture and five core components, guides you through creating a Scrapy project, configuring spiders, pipelines, and middlewares, and demonstrates how to run the crawler to efficiently collect and process web data using Python.

Backend Development · Python · Scrapy
0 likes · 7 min read
MaGe Linux Operations
Jan 14, 2019 · Backend Development

How to Build a Scrapy Spider to Crawl AutoHome Car Data in Python

This article walks through building a Python Scrapy spider to extract comprehensive car brand, series, and model data from Autohome, covering environment setup, project initialization, spider and item definitions, handling lazy-loaded pages, CSV output configuration, rate limiting, user‑agent rotation, and debugging tips.

Autohome · Car Data · Scrapy
0 likes · 10 min read
Python Crawling & Data Mining
Jan 13, 2019 · Backend Development

How to Fix Common Scrapy Installation Errors on Windows

This guide walks you through step‑by‑step solutions for typical Scrapy installation problems on Windows, covering missing libxml2/lxml wheels, Visual C++ requirements, and Twisted wheel compatibility, so you can get the framework up and running smoothly.

Python · Scrapy · Web Crawling
0 likes · 7 min read
MaGe Linux Operations
Dec 5, 2018 · Backend Development

Build a Scrapy Spider for dmoz.org in Four Simple Steps

This tutorial walks you through creating a Scrapy project, defining items, writing a spider, and exporting scraped data to JSON while covering common pitfalls like encoding errors and XPath selector usage for extracting titles, URLs, and descriptions from dmoz.org.

Crawler · Scrapy
0 likes · 12 min read
MaGe Linux Operations
Nov 23, 2018 · Backend Development

Master Scrapy: Build Powerful Python Web Crawlers Step‑by‑Step

This guide introduces Scrapy, a fast Python web‑crawling framework, explains its architecture, installation, project setup, spider creation, execution, and advanced features like XPath selectors, recursion, and item pipelines, providing a complete hands‑on tutorial.

Backend Development · Crawler · Scrapy
0 likes · 9 min read
21CTO
Sep 7, 2018 · Backend Development

Why Scaling Web Crawlers Is Harder Than You Think: Lessons from 1,000B Pages

This article outlines the major challenges of large‑scale e‑commerce product data extraction—such as ever‑changing site formats, scalable architecture, performance throughput, anti‑bot defenses, and data quality—and shares the hard‑won lessons Scrapinghub gained after crawling over a trillion product pages.

Data Extraction · Scale · Scrapy
0 likes · 15 min read
MaGe Linux Operations
Jun 9, 2018 · Backend Development

Build a Fast Scrapy Spider to Crawl Forum Posts in Minutes

This tutorial walks beginners through setting up a Python Scrapy project, writing a spider to fetch forum thread titles and contents, using XPath for parsing, and enhancing the crawler with pipelines, middleware, and common settings for robust web scraping.

Pipeline · Scrapy · XPath
0 likes · 13 min read
MaGe Linux Operations
Nov 14, 2017 · Backend Development

How to Use Scrapy to Crawl Zhihu Users and Analyze Their Data

This tutorial explains how a Python developer can set up a Scrapy project, write spiders to crawl Zhihu user profiles, store the results in a MySQL database, adjust settings for headers and delays, and finally perform simple gender and location analysis on the collected data.

Backend Development · Python · Scrapy
0 likes · 14 min read
MaGe Linux Operations
Nov 13, 2017 · Backend Development

Master Scrapy: A Complete Guide to Building Powerful Python Web Crawlers

Scrapy is a fast, high‑level Python framework for web crawling and data extraction, featuring an asynchronous Twisted engine, modular components like spiders, pipelines, and middlewares, and includes detailed installation steps, project setup, spider creation, query syntax, recursion, and item pipelines for robust scraping.

Python · Scrapy · Scrapy Tutorial
0 likes · 12 min read
MaGe Linux Operations
Jul 29, 2017 · Backend Development

Build a Fast Python Web Scraper for Novel Rankings – Step by Step

This guide walks through building a Python web crawler to extract novel titles and URLs from the qu.la ranking page, explains the site’s clear HTML structure, shows how to deduplicate entries with a set, and provides complete code snippets plus performance tips and a Scrapy upgrade path.
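
The set-based deduplication mentioned in this summary can be sketched roughly as follows. The sample `(title, url)` pairs and the function name `dedupe_entries` are hypothetical; the article's own code may structure this differently.

```python
def dedupe_entries(entries):
    """Drop repeated (title, url) pairs, keeping first-seen order.

    A set gives O(1) average-time membership checks, so each entry is
    compared against everything seen so far without rescanning a list.
    """
    seen_urls = set()
    unique = []
    for title, url in entries:
        if url in seen_urls:
            continue  # already collected from another ranking list
        seen_urls.add(url)
        unique.append((title, url))
    return unique


# Hypothetical sample: the same novel appears in two ranking lists.
scraped = [
    ("Novel A", "/book/1/"),
    ("Novel B", "/book/2/"),
    ("Novel A", "/book/1/"),
    ("Novel C", "/book/3/"),
]
unique_entries = dedupe_entries(scraped)
print(unique_entries)
```

Keying the set on the URL rather than the title is a design choice for this sketch: two different novels can share a title, while a URL identifies one page.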

Crawler · Python · Scrapy
0 likes · 5 min read
ITPUB
May 2, 2017 · Backend Development

How to Bypass Common Anti‑Scraping Measures with Scrapy

This guide explains why websites employ anti‑scraping defenses, outlines the most common header checks such as User‑Agent, Referer, and Cookies, and provides practical Scrapy code snippets for rotating user agents, managing proxies, handling X‑Forwarded‑For, limiting request rates, and dealing with dynamic AJAX content using Selenium or PhantomJS.
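
As one possible shape for the user-agent rotation this summary mentions, here is a minimal sketch of a Scrapy-style downloader middleware. It is not the article's code: the class name, the user-agent strings, and the `FakeRequest` stand-in (used only so the example runs without Scrapy installed) are assumptions. The `process_request(request, spider)` hook and the convention that returning `None` lets processing continue follow Scrapy's documented downloader-middleware interface.

```python
import random

# Hypothetical pool of user-agent strings; a real pool would be larger.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) ExampleBrowser/1.0",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) ExampleBrowser/2.0",
    "Mozilla/5.0 (X11; Linux x86_64) ExampleBrowser/3.0",
]


class RandomUserAgentMiddleware:
    """Assign a randomly chosen User-Agent header to each outgoing request."""

    def __init__(self, user_agents=None, rng=None):
        self.user_agents = list(user_agents or USER_AGENTS)
        self.rng = rng or random.Random()

    def process_request(self, request, spider):
        # Overwrite the header in place; returning None tells the framework
        # to keep passing the request down the middleware chain.
        request.headers["User-Agent"] = self.rng.choice(self.user_agents)
        return None


class FakeRequest:
    """Stand-in for a Scrapy Request, used only to run this sketch offline."""

    def __init__(self):
        self.headers = {}


middleware = RandomUserAgentMiddleware(rng=random.Random(0))
request = FakeRequest()
middleware.process_request(request, spider=None)
print(request.headers["User-Agent"])
```

In an actual project such a class would be enabled through the `DOWNLOADER_MIDDLEWARES` setting, using whatever module path the project defines for it.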

Headers · Proxy · Scrapy
0 likes · 7 min read
MaGe Linux Operations
Mar 28, 2017 · Backend Development

Master Scrapy: Step-by-Step Guide to Install the Powerful Python Web Crawler

This article walks you through the complete installation process for Scrapy, the Python-based web crawling framework, covering prerequisite Python setup, required dependencies such as lxml, setuptools, zope.interface, Twisted, pyOpenSSL, and pywin32, and finally verifying the installation, preparing you for large‑scale data extraction tasks.

Data Extraction · Installation · Python
0 likes · 4 min read
ITPUB
May 6, 2016 · Backend Development

Scrapy vs. Gevent: Choosing the Right Python Web‑Crawling Framework

This guide compares Scrapy (especially version 0.16) with gevent‑based crawling solutions, outlines their strengths, weaknesses, and common pitfalls, and provides practical tips, resource links, and deployment advice for building efficient Python web scrapers.

Backend · Python · Scraping
0 likes · 11 min read
Qunar Tech Salon
Nov 30, 2015 · Backend Development

Choosing a Web Crawler: Nutch, Crawler4j, WebMagic, WebCollector, Scrapy, or Others

This article compares distributed, Java‑based, and non‑Java web crawlers—examining Nutch, Crawler4j, WebMagic, WebCollector, Scrapy and alternatives—highlighting their strengths, limitations, and suitability for tasks such as data extraction, multi‑threading, AJAX handling, and search‑engine construction.

Nutch · Scrapy · Web Crawling
0 likes · 11 min read