Tagged articles

Web Crawler

21 articles

Sep 26, 2024 · Backend Development

Step-by-Step Guide to Building a Spring Boot Backend and Douyin Hot Search Crawler

This tutorial walks through creating a multi‑module, Maven‑based Spring Boot backend project and configuring its pom.xml files, application properties, and logging. It then adds a scheduled Douyin hot‑search crawler built with OkHttp, giving a complete end‑to‑end setup for a web service.

Douyin · Spring Boot · Web Crawler

31 min read

Ops Development & AI Practice

Apr 21, 2024 · Backend Development

Mastering Go Concurrency: Goroutines, Channels, and a Real‑World Web Crawler

This article explains Go's native concurrency model, covering lightweight goroutines, channel communication patterns, and scheduling details, and then walks through a practical web‑crawler example that uses these features for efficient parallel processing.

Channel · Go · Goroutine

5 min read

php Courses

Jan 18, 2024 · Backend Development

Building an Efficient Web Crawler with PHP and Selenium

This article explains how to set up a web crawler using PHP and Selenium, covering installation of Selenium and its PHP bindings via Composer, configuring a Chrome WebDriver, simulating user actions to fetch news links, extracting titles and content, and storing results, with tips for further optimization.

Automation · PHP · Selenium

4 min read

php Courses

Dec 14, 2023 · Backend Development

Building a Simple Web Crawler with PHP on Linux

This article explains how to create a basic web crawler in a Linux environment using PHP, covering prerequisite installations, script development with cURL and DOMDocument, execution steps, and sample output while emphasizing legal and ethical considerations for web scraping.

DOMDocument · Linux · PHP

4 min read

php Courses

May 4, 2023 · Backend Development

How to Write a Simple PHP Web Crawler

This guide explains how to create a basic PHP web crawler that fetches pages with cURL, parses the HTML with DOMDocument and XPath, and stores the extracted data. It also provides a complete example script and reminders about legal and ethical considerations.

Backend Development · DOMDocument · PHP

3 min read

php Courses

Apr 10, 2023 · Backend Development

A PHP Web Crawler: Design, Implementation, and Challenges

This article describes a PHP‑based web crawler that extracts links and images using regular expressions, stores URLs in MySQL, handles duplicate detection via MD5, discusses performance limitations, and provides the full source code and usage instructions.

Backend Development · MySQL · PHP

8 min read

Python Crawling & Data Mining

Feb 28, 2023 · Backend Development

How to Fix Common Python Web‑Crawler Issues in PyCharm

This article walks through a Python web‑crawler problem raised by a community member, showing step by step how to start the project, troubleshoot terminal errors in PyCharm, and verify the directory structure with the tree command, giving beginners a clear solution.

Python · Troubleshooting · Web Crawler

3 min read

Python Programming Learning Circle

Dec 31, 2021 · Information Security

Photon: High‑Efficiency Multithreaded Web Crawler – Features, Compatibility, and Usage Guide

Photon is a fast, multithreaded Python web crawler that extracts URLs, files, and other intelligence from target sites. It offers flexible options, a Ninja mode, and extensive command‑line parameters, and it supports Linux, Windows, macOS, and Termux.

Web Crawler · command-line · information security

10 min read

MaGe Linux Operations

Sep 2, 2021 · Backend Development

Build a Python Baidu Baike Crawler: Step-by-Step Guide

This article demonstrates how to create a Python web crawler that fetches Baidu Baike entries, covering the main program structure, URL manager, page downloader, HTML parser using BeautifulSoup, and output generator, with complete code snippets and sample results.

Python · Web Crawler · baidu-baike

8 min read

ITPUB

Jun 17, 2021 · Information Security

How Illegal Web Crawlers Stole Over 1 Billion Chinese Users’ Data and Got Sent to Prison

A recent Chinese court case reveals that a university graduate used a custom web crawler to harvest more than 1.18 billion Taobao user records, which were then sold to a partner who ran fraudulent WeChat groups. Both perpetrators were convicted of violating personal information protection laws.

China · Data Scraping · Web Crawler

10 min read

Python Programming Learning Circle

Apr 25, 2020 · Backend Development

Building a Node.js Web Crawler for Indeed Job Listings with MongoDB

This article details how to build a Node.js web crawler for Indeed job listings, covering entry page selection, HTML parsing with Cheerio, request handling, MongoDB task storage, and a modular architecture that extracts city, category, search, brief, and detail data for a searchable job engine.

MongoDB · Web Crawler · backend

15 min read

FunTester

Oct 9, 2019 · Backend Development

How to Build a Java/Groovy Web Crawler with Regex and MySQL Storage

This article demonstrates a JVM web crawler written in Groovy that uses regular expressions to parse paginated company data from a government portal, builds SQL INSERT statements, and stores the results in MySQL. Full source code and screenshots of the project structure are included.

Groovy · Java · MySQL

6 min read

Java Architecture Diary

Aug 2, 2019 · Backend Development

Mastering Mica-HTTP v1.1.7: A Lightweight Web Crawler Guide

This tutorial continues the complete mica-http guide and showcases the new v1.1.7 release: proxy support, retries, and page crawling, along with a model visualization, sample results, documentation links, and recommended open‑source tools for building efficient backend crawlers.

HTTP · Java · Web Crawler

3 min read

21CTO

May 22, 2019 · Fundamentals

What Is a Web Crawler? Definitions, Types, and How It Works

This article explains what web crawlers are, how they are classified, their typical use cases, and their step‑by‑step workflow. It covers the robots protocol, then delves into HTTP and HTTPS fundamentals: request and response structures, common methods, headers, status codes, and the security trade‑offs of HTTPS.

HTTP · Status Codes · Web Crawler

10 min read

MaGe Linux Operations

Apr 22, 2019 · Backend Development

Build a Robust Python Web Crawler: Modular Architecture & Full Code Walkthrough

This article explains how to design a modular Python web crawler by breaking the system into five core components—scheduler, URL manager, downloader, parser, and data storage—provides detailed code examples for each module, and demonstrates a complete end‑to‑end crawling workflow on a sample website.

Backend Development · Python · Scraping

12 min read

Sohu Tech Products

Dec 5, 2018 · Backend Development

Overview of Web Crawler Types and the Architecture of the Mole Crawler System

This article explains the evolution and classification of web crawlers, describes the design and components of the Mole distributed crawler (scheduler, fetcher, processor, rate limiting, URL deduplication, and Elasticsearch storage optimization), and outlines common strategies for getting around anti‑crawling measures.

Elasticsearch · Web Crawler · anti‑crawling

12 min read

ITFLY8 Architecture Home

May 23, 2018 · Backend Development

Designing a Robust Web Crawler Architecture: Insights from Three Iterations

This article examines how a web crawler architecture evolved across three versions, highlighting the importance of completeness, UML‑based standardization, clear goals, accuracy, and maintainability for building a scalable, cost‑effective backend system.

System Design · UML · Web Crawler

6 min read

Tencent IMWeb Frontend Team

Jan 18, 2018 · Backend Development

Build a Simple Node.js Web Crawler in 16 Lines with Request & Cheerio

This guide walks you through creating a lightweight Node.js web crawler using the request and cheerio modules, covering preparation, installation, core code, and testing steps, so you can fetch page HTML, parse data, and store results with just a few dozen lines of code.

Node.js · Web Crawler · cheerio

5 min read

Architecture Digest

Jan 17, 2018 · Backend Development

Design and Implementation of a Java Web Crawler Framework Inspired by Scrapy

This article explains how to design and build a lightweight Java web crawler framework, covering crawler fundamentals, anti‑scraping challenges, core components such as URL manager, scheduler, downloader, parser and pipeline, and provides concrete code examples and architectural diagrams.

Java · Scrapy · Web Crawler

14 min read

21CTO

Nov 13, 2016 · Backend Development

How to Build a Simple PHP Web Crawler: From Robots.txt to cURL

This guide explains the fundamentals of creating a PHP web crawler, covering server communication basics, interpreting robots.txt and sitemap files, and providing practical code examples using file_get_contents and cURL for efficient content retrieval.

Backend Development · PHP · Web Crawler

6 min read

21CTO

Oct 9, 2015 · Big Data

33 Open-Source Web Crawlers to Supercharge Your Data Collection

This article compiles 33 notable open‑source web crawler projects across multiple programming languages, detailing their core features, licensing, supported platforms, and typical use cases, helping developers choose the right tool for large‑scale data harvesting and analysis.

C# · C++ · Java

22 min read