Tag

web crawling


Nightwalker Tech
Mar 14, 2025 · Backend Development

Overview and Installation Guide for Various MCP Services and Their Use with Sequential Thinking for Manus‑like Effects

This article introduces several Model Context Protocol (MCP) services—including Sequential Thinking, Firecrawl, Fetch, Hot News, Playwright, Magic, and Brave Search—provides their GitHub links, detailed Mac and Windows installation commands, and explains how to combine them with a Sequential Thinking prompt to achieve a Manus‑style AI agent workflow.

AI · Automation · Installation
9 min read
Python Programming Learning Circle
Dec 21, 2024 · Backend Development

Comprehensive List of Python Libraries for Web Crawling, Data Processing, and Web Development

This article provides an extensive overview of Python libraries and frameworks for web crawling, data extraction, parsing, storage, browser automation, asynchronous programming, and popular web development frameworks, helping readers choose appropriate tools for their projects.

Data Processing · Libraries · Python
9 min read
Rare Earth Juejin Tech Community
Nov 7, 2024 · Backend Development

Integrating XXL‑Job for Scheduled Hot‑Search Crawlers in a Java Backend

This tutorial explains how to replace the basic @Scheduled annotation with the flexible XXL‑Job distributed scheduler, covering repository download, admin deployment, database initialization, Spring‑Boot executor configuration, job registration for Douyin and Bilibili hot‑search crawling, and a Vue front‑end component for displaying ranked results with real‑time update timestamps.

Backend Development · Java · Spring Boot
14 min read
Rare Earth Juejin Tech Community
Oct 29, 2024 · Backend Development

Storing Douyin and Baidu Hot Search Data with MySQL, MyBatis Generator, and Java Crawlers

This tutorial explains how to design a MySQL table for hot‑search records, generate Java entity and mapper classes using MyBatis Generator, create unique IDs for each entry, and implement scheduled Java crawlers for Douyin and Baidu hot‑search data that persist the results via Spring‑Boot services.

Database Design · Java · MyBatis
19 min read
FunTester
Jan 29, 2024 · Operations

Curated Index of Technical Articles on Testing, Bugs, Web Crawling, UI Automation, and Selenium

This collection provides a curated list of technical articles covering performance testing strategies, bug case studies, web crawling implementations, UI automation techniques, UiAutomator usage, Selenium best practices, and mobile app performance monitoring, each with its original title and publication date.

Automation · Bug Analysis · performance
8 min read
Python Programming Learning Circle
Jan 24, 2024 · Backend Development

Running Scrapy Crawlers: Command‑Line, CrawlerProcess, and CrawlerRunner Approaches

This tutorial demonstrates how to execute Scrapy spiders from the command line, run them within Python files using cmdline, and manage single or multiple spiders with CrawlerProcess and CrawlerRunner, highlighting configuration steps, limitations, and best‑practice recommendations.

Backend Development · CrawlerProcess · CrawlerRunner
3 min read
Python Programming Learning Circle
Jan 23, 2024 · Backend Development

Comprehensive Guide to Python Libraries for Web Crawling, Web Development, and Asynchronous Programming

This article provides an extensive overview of Python libraries and frameworks for web crawling, data extraction, asynchronous networking, browser automation, and popular web development frameworks, helping developers choose the right tools for backend projects and avoid common misconceptions when selecting a framework.

Async Programming · Libraries · web crawling
9 min read
Python Programming Learning Circle
Dec 15, 2023 · Backend Development

Comprehensive List of Python Libraries for Web Crawling, Web Development, and Related Technologies

This article provides an extensive overview of Python libraries and frameworks for web crawling, HTTP handling, HTML parsing, text processing, asynchronous programming, queue management, cloud execution, WebSocket communication, DNS resolution, computer vision, proxy servers, and popular web frameworks such as Django, Flask, Web2py, Tornado, and CherryPy, helping developers choose appropriate tools for backend development.

Asynchronous Programming · Backend Development · Libraries
10 min read
Big Data Technology Architecture
Feb 11, 2023 · Backend Development

Understanding Scrapy and Twisted: Architecture, Components, and Debugging Techniques

This article explains Scrapy's comprehensive crawling framework and Twisted's event‑driven networking engine, detailing their core concepts, workflow, code execution process, and how to debug Scrapy spiders using breakpoint tracing, providing a deep technical overview for backend developers.

Backend Development · Python · Scrapy
15 min read
Architecture Digest
Sep 24, 2022 · Information Security

Web Crawling and Anti‑Crawling Techniques: Principles, Implementation, and Countermeasures

This article explains the technical principles and implementation steps of web crawlers, introduces common crawling frameworks, provides a Python example for extracting app store rankings, and then details various anti‑crawling methods such as CSS offset, image camouflage, custom fonts, dynamic rendering, captchas, request signing, and honeypots, followed by counter‑strategies for each.

Python · Scrapy · anti-crawling
24 min read
Python Programming Learning Circle
Sep 23, 2022 · Backend Development

Understanding Static and Dynamic Web Pages for Effective Web Crawling

This article explains what web crawlers are, compares static and dynamic web pages, outlines their characteristics, advantages, and challenges, and provides practical tips for extracting data from both types of pages using tools like browser developer consoles and packet‑capture utilities.

AJAX · Data Extraction · HTTP
5 min read
vivo Internet Technology
Sep 14, 2022 · Information Security

Web Crawling, Anti‑Crawling, and Anti‑Anti‑Crawling Techniques: Principles, Frameworks, and Code Examples

The article explains web‑crawling basics with Python and Scrapy examples, then surveys common anti‑crawling defenses such as CSS offsets, image camouflage, custom fonts, dynamic rendering, captchas, request signatures, and honeypots. It closes with anti‑anti‑crawling countermeasures, including CSS‑offset reversal, font decoding, headless‑browser rendering, and YOLOv5‑based captcha cracking, and stresses legal compliance.

Python · Scrapy · Security
25 min read
Java Architect Essentials
Aug 12, 2022 · Information Security

Case Study: Illegal Web Crawling Causing System Outage and Criminal Conviction

This article recounts the 2018 legal case in which a company's automated web crawler overloaded a municipal residence‑permit system, causing service disruption and data leakage, leading to the CTO and programmer’s conviction for damaging computer information systems.

computer crime · information security · legal case
8 min read
Python Programming Learning Circle
Jul 13, 2022 · Backend Development

Comprehensive Scrapy Tutorial: Architecture, XPath Basics, Installation, Project Setup, and Advanced Features

This article provides a detailed walkthrough of Scrapy, covering its event‑driven architecture, component interactions, XPath parsing fundamentals, installation steps, project creation, sample spider code, item pipelines, middleware customization, and essential configuration settings for effective web crawling in Python.

Python · Scrapy · middleware
12 min read
IT Services Circle
Jul 5, 2022 · Backend Development

Optimizing feapder Spider with Gevent: Reducing CPU Usage and Thread Count

This article demonstrates how adding two gevent monkey‑patch lines to a feapder spider reduces CPU usage from 121% to 99% while changing the effective thread count from 36 to 12, and discusses the underlying principle, performance trade‑offs, and future directions for coroutine support.

CPU optimization · Python · feapder
6 min read
Java Architect Essentials
Jun 19, 2022 · Backend Development

Java Sogou Image Crawler: Fetching and Downloading Images with WebMagic

This tutorial explains how to use Java and the WebMagic framework to crawl thousands of images from Sogou image search, parse the JSON responses to extract image URLs, and download the pictures locally using multithreaded processing and custom HTTP utilities.

HTTP · Image Download · Java
17 min read
IT Services Circle
Feb 25, 2022 · Backend Development

Detecting and Handling Gzip Bombs in Web Crawling with Python Requests

This article explains how to identify gzip‑compressed responses that may be gzip bombs, how to inspect HTTP headers and raw response data using Python's requests library, and provides command‑line and code examples for measuring compressed and uncompressed sizes without triggering decompression.

Python · Requests · backend
5 min read
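As a related illustration of guarding against a gzip bomb, the sketch below counts the decompressed size of a gzip payload in bounded chunks and gives up once a limit is passed. It is a standard-library sketch under stated assumptions, not the article's code: the `inflated_size` helper, the chunk size, and the limit are hypothetical.

```python
import gzip
import zlib


def inflated_size(compressed: bytes, limit: int, chunk: int = 64 * 1024) -> int:
    """Count the decompressed size of gzip data, giving up once it passes `limit`."""
    # wbits = MAX_WBITS | 16 tells zlib to expect a gzip header and trailer.
    decomp = zlib.decompressobj(wbits=zlib.MAX_WBITS | 16)
    total = 0
    data = compressed
    while data and not decomp.eof:
        # Emit at most `chunk` bytes per call; unread input stays in unconsumed_tail.
        out = decomp.decompress(data, chunk)
        total += len(out)
        if total > limit:
            raise ValueError(f"decompressed size exceeds {limit} bytes")
        data = decomp.unconsumed_tail
    total += len(decomp.flush())  # small remainder still buffered, if any
    if total > limit:
        raise ValueError(f"decompressed size exceeds {limit} bytes")
    return total


if __name__ == "__main__":
    bomb = gzip.compress(b"\0" * 5_000_000)
    print(len(bomb), inflated_size(bomb, limit=10_000_000))
```

Because output is produced one bounded chunk at a time, memory use stays near `chunk` bytes no matter how large the payload would expand. With `requests`, one way to obtain the still-compressed bytes for such a check is to pass `stream=True` and read from `response.raw` with `decode_content=False`.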
Architecture Digest
Feb 19, 2022 · Information Security

Case Study: Illegal Web Crawling and Criminal Conviction in China

This article recounts how a corporate web‑crawling tool designed to automate housing‑loan data collection overloaded a municipal residence‑permit system, triggered a large‑scale denial‑of‑service attack, and led to the CTO and programmer being prosecuted for damaging a computer information system.

computer crime · cyberlaw · data scraping
8 min read
Selected Java Interview Questions
Sep 5, 2021 · Backend Development

Crawling and Downloading Thousands of Images from Sogou Using Java

This tutorial explains how to crawl thousands of images from Sogou using Java, detailing the request URL analysis, parameter extraction, multithreaded downloading logic, and providing complete source code for the image processor, pipeline, and HTTP utility classes.

Backend Development · HTTP · Image Download
17 min read
Python Programming Learning Circle
Aug 20, 2021 · Backend Development

Python Crawler for Scraping Baidu Baike Articles

This article presents a complete Python web crawler example that extracts Baidu Baike entries, detailing the implementation of URL management, page downloading, HTML parsing with BeautifulSoup, data collection, and output generation, along with sample code and usage instructions.

Baike · BeautifulSoup · Python
9 min read
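The URL-management component mentioned in this summary can be sketched as a small class that hands out each URL once and ignores repeats. This is an illustrative sketch only: the `UrlManager` name and its methods are hypothetical and are not taken from the article.

```python
from collections import deque


class UrlManager:
    """Track which URLs are waiting to be crawled and which were already seen."""

    def __init__(self) -> None:
        self._pending = deque()   # URLs waiting to be crawled, in arrival order
        self._seen = set()        # every URL ever accepted, pending or crawled

    def add(self, url: str) -> bool:
        """Queue a URL unless it is empty or was seen before; report whether it was queued."""
        if not url or url in self._seen:
            return False
        self._seen.add(url)
        self._pending.append(url)
        return True

    def add_many(self, urls) -> None:
        """Queue every new URL from an iterable, for example links parsed from a page."""
        for url in urls:
            self.add(url)

    def has_pending(self) -> bool:
        return bool(self._pending)

    def next_url(self) -> str:
        """Return the oldest pending URL; raises IndexError when none are left."""
        return self._pending.popleft()
```

A crawl loop would call `next_url()`, download and parse the page, then feed the extracted links back through `add_many()`. Keeping the seen set separate from the queue is what stops the crawler from revisiting pages that link to each other.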