Tagged articles

web crawling

108 articles · Page 2 of 2

Nov 9, 2016 · Backend Development

Unlocking the Power of Web Crawlers: How to Harvest Data Efficiently

This article explains what web crawlers are, why they’re essential for content recommendation systems, how they are implemented in different programming languages, practical use‑cases like price monitoring and news aggregation, and best practices for building efficient, ethical crawlers.

Backend Development · data extraction · web crawling

0 likes · 5 min read

360 Quality & Efficiency

Oct 28, 2016 · Backend Development

Introduction to Web Crawlers: Basics, Architecture, Workflow, and Testing Applications

This article introduces the fundamentals of web crawlers: their architecture, workflow, and implementation challenges such as handling HTTP status codes and content rendered by JavaScript and AJAX. It also discusses their applications in automated testing and large‑scale distributed systems.

crawler4j · distributed systems · web crawling

0 likes · 5 min read

Architecture Digest

Jul 16, 2016 · Big Data

Building a Closed-Loop Data Platform: Architecture, Technologies, and Case Studies

This article describes how to design and implement a closed‑loop data platform using Python, Java, and Spark stacks, covering data acquisition, structuring, mining, visualization, real‑time processing, and deployment with Docker, ELK, Kafka, and cloud services, illustrated by three industry case studies.

Docker · ELK · Spark

0 likes · 13 min read

21CTO

Jun 9, 2016 · Backend Development

Mastering Web Crawlers: From a 3‑Line Script to Scalable Distributed Scrapers

This article explains what a web crawler is, shows a minimal three‑line Python example, expands it into a functional crawler, identifies common shortcomings, and presents practical solutions such as parallelism, priority queues, DNS caching, Bloom‑filter deduplication, storage choices, and inter‑process communication for building robust distributed scrapers.

Deduplication · distributed scraping · dns cache

0 likes · 9 min read

ITPUB

May 6, 2016 · Backend Development

Scrapy vs. Gevent: Choosing the Right Python Web‑Crawling Framework

This guide compares Scrapy (especially version 0.16) with gevent‑based crawling solutions, outlines their strengths, weaknesses, and common pitfalls, and provides practical tips, resource links, and deployment advice for building efficient Python web scrapers.

Python · Scraping · Scrapy

0 likes · 11 min read

21CTO

Dec 22, 2015 · Big Data

How to Build a Scalable Distributed Web Crawler for Massive Data Harvesting

This article explains how to design and implement a distributed web‑crawling framework in Java that can collect, structure, and store massive amounts of data while handling anti‑scraping measures, duplicate detection, and real‑time monitoring.

Big Data · Java · data extraction

0 likes · 11 min read

Qunar Tech Salon

Nov 30, 2015 · Backend Development

Choosing a Web Crawler: Nutch, Crawler4j, WebMagic, WebCollector, Scrapy, or Others

This article compares distributed, Java‑based, and non‑Java web crawlers (Nutch, Crawler4j, WebMagic, WebCollector, Scrapy, and alternatives), highlighting their strengths, limitations, and suitability for tasks such as data extraction, multi‑threading, AJAX handling, and search‑engine construction.

Nutch · Scrapy · crawler frameworks

0 likes · 11 min read

21CTO

Oct 21, 2015 · Fundamentals

How Graph Traversal Powers Web Crawlers: From BFS to Internet Indexing

This article explains how graph traversal algorithms like BFS and DFS underpin web crawlers, illustrating the concepts with examples from China's road network and tracing the history from Euler's bridges to modern internet indexing.

BFS · DFS · Search Engine

0 likes · 6 min read
