Tag: Nutch

Full-Stack Internet Architecture
Jun 8, 2019 · Big Data

The Story of Doug Cutting: From Stanford to Hadoop and Beyond

This article chronicles Doug Cutting's journey from his humble beginnings at Stanford through his pioneering work on Lucene, Nutch, and Hadoop, highlighting how his innovations in search and distributed computing reshaped the big data landscape and led to the rise of Cloudera.

Cloudera · Doug Cutting · Hadoop
8 min read
Qunar Tech Salon
Nov 30, 2015 · Backend Development

Choosing a Web Crawler: Nutch, Crawler4j, WebMagic, WebCollector, Scrapy, or Others

This article compares distributed, Java‑based, and non‑Java web crawlers—examining Nutch, Crawler4j, WebMagic, WebCollector, Scrapy and alternatives—highlighting their strengths, limitations, and suitability for tasks such as data extraction, multi‑threading, AJAX handling, and search‑engine construction.

Nutch · Python · Scrapy
11 min read