Tagged articles
21 articles
php Courses
Jan 18, 2024 · Backend Development

Building an Efficient Web Crawler with PHP and Selenium

This article explains how to set up a web crawler using PHP and Selenium, covering installation of Selenium and its PHP bindings via Composer, configuring a Chrome WebDriver, simulating user actions to fetch news links, extracting titles and content, and storing results, with tips for further optimization.

PHP · Selenium · Web Crawler
4 min read
php Courses
Dec 14, 2023 · Backend Development

Building a Simple Web Crawler with PHP on Linux

This article explains how to create a basic web crawler in a Linux environment using PHP, covering prerequisite installations, script development with cURL and DOMDocument, execution steps, and sample output while emphasizing legal and ethical considerations for web scraping.

DOMDocument · Linux · PHP
4 min read
php Courses
May 4, 2023 · Backend Development

How to Write a Simple PHP Web Crawler

This guide explains how to create a basic PHP web crawler by using cURL to fetch pages, DOMDocument and XPath to parse HTML, and then storing the extracted data, while also providing a complete example script and reminders about legal and ethical considerations.

DOMDocument · PHP · Web Crawler
3 min read
php Courses
Apr 10, 2023 · Backend Development

A PHP Web Crawler: Design, Implementation, and Challenges

This article describes a PHP‑based web crawler that extracts links and images using regular expressions, stores URLs in MySQL, handles duplicate detection via MD5, discusses performance limitations, and provides the full source code and usage instructions.

PHP · URL processing · Web Crawler
8 min read
Python Crawling & Data Mining
Feb 28, 2023 · Backend Development

How to Fix Common Python Web‑Crawler Issues in PyCharm

This article walks through a Python web‑crawler problem raised in a community, showing step‑by‑step how to start the project, troubleshoot terminal errors in PyCharm, and verify the directory structure using the tree command, providing a clear solution for beginners.

Python · Web Crawler · troubleshooting
3 min read
MaGe Linux Operations
Sep 2, 2021 · Backend Development

Build a Python Baidu Baike Crawler: Step-by-Step Guide

This article demonstrates how to create a Python web crawler that fetches Baidu Baike entries, covering the main program structure, URL manager, page downloader, HTML parser using BeautifulSoup, and output generator, with complete code snippets and sample results.

Python · Web Crawler · baidu-baike
8 min read
ITPUB
Jun 17, 2021 · Information Security

How Illegal Web Crawlers Stole Over 1 Billion Chinese Users’ Data and Got Sent to Prison

A recent Chinese court case reveals that a university graduate used a custom web‑crawler to harvest more than 1.18 billion Taobao user records, which were then sold to a partner who ran fraudulent WeChat groups, leading to both perpetrators’ conviction for violating personal information protection laws.

China · Information Security · Web Crawler
10 min read
FunTester
Oct 9, 2019 · Backend Development

How to Build a Java/Groovy Web Crawler with Regex and MySQL Storage

This article demonstrates a Java‑based web crawler written in Groovy that uses regular‑expression parsing to retrieve paginated company data from a government portal, constructs SQL insert statements, and stores the results in MySQL, with full source code and structural screenshots.

Data Extraction · Groovy · Web Crawler
6 min read
21CTO
May 22, 2019 · Fundamentals

What Is a Web Crawler? Definitions, Types, and How It Works

This article explains what web crawlers are, how they are classified, their typical use cases, and their step‑by‑step workflow. It also covers the robots protocol, then delves into HTTP and HTTPS fundamentals: request/response structures, common methods, headers, status codes, and the security trade‑offs of HTTPS.

HTTP · Status Codes · Web Crawler
10 min read
Sohu Tech Products
Dec 5, 2018 · Backend Development

Overview of Web Crawler Types and the Architecture of the Mole Crawler System

This article explains the evolution and classification of web crawlers, describes the design and components of the Mole distributed crawler—including scheduler, fetcher, processor, rate‑limiting, URL deduplication, and Elasticsearch storage optimization—and outlines common anti‑anti‑crawling strategies.

Elasticsearch · Web Crawler · anti‑crawling
12 min read
21CTO
Nov 13, 2016 · Backend Development

How to Build a Simple PHP Web Crawler: From Robots.txt to cURL

This guide explains the fundamentals of creating a PHP web crawler, covering server communication basics, interpreting robots.txt and sitemap files, and providing practical code examples using file_get_contents and cURL for efficient content retrieval.

PHP · Web Crawler · backend-development
6 min read
21CTO
Oct 9, 2015 · Big Data

33 Open-Source Web Crawlers to Supercharge Your Data Collection

This article compiles 33 notable open‑source web crawler projects across multiple programming languages, detailing their core features, licensing, supported platforms, and typical use cases, helping developers choose the right tool for large‑scale data harvesting and analysis.

C++ · Data Extraction · PHP
22 min read