Tagged articles
104 articles
ITPUB
May 6, 2016 · Backend Development

Scrapy vs. Gevent: Choosing the Right Python Web‑Crawling Framework

This guide compares Scrapy (especially version 0.16) with gevent‑based crawling solutions, outlines their strengths, weaknesses, and common pitfalls, and provides practical tips, resource links, and deployment advice for building efficient Python web scrapers.

Backend · Python · Scraping
11 min read
21CTO
Dec 22, 2015 · Big Data

How to Build a Scalable Distributed Web Crawler for Massive Data Harvesting

This article explains how to design and implement a distributed web‑crawling framework in Java that can collect, structure, and store massive amounts of data while handling anti‑scraping measures, duplicate detection, and real‑time monitoring.

Big Data · Data Extraction · Java
11 min read
Qunar Tech Salon
Nov 30, 2015 · Backend Development

Choosing a Web Crawler: Nutch, Crawler4j, WebMagic, WebCollector, Scrapy, or Others

This article compares distributed, Java‑based, and non‑Java web crawlers—examining Nutch, Crawler4j, WebMagic, WebCollector, Scrapy and alternatives—highlighting their strengths, limitations, and suitability for tasks such as data extraction, multi‑threading, AJAX handling, and search‑engine construction.

Nutch · Scrapy · Web Crawling
11 min read