Tagged articles

Web Scraping

472 articles · Page 5 of 5

Nov 17, 2018 · Backend Development

How to Crawl Zhihu’s Funniest Answers with Python: A Simple Two‑Step Guide

This article shows how to use Python to scrape Zhihu answers, store them in MongoDB, and filter for short, high‑upvote replies, then presents a collection of programmer‑centric jokes that illustrate the kind of "god replies" the crawler can retrieve.

MongoDB · Web Scraping · programmer jokes

0 likes · 14 min read


MaGe Linux Operations

Oct 29, 2018 · Backend Development

How to Efficiently Scrape Novel Rankings with Python: De‑duplication and Speed Tips

This guide explains how to extract novel titles and links from a structured ranking website, remove duplicate entries using a set, handle HTML tags, and improve crawling speed with multithreading or the Scrapy framework, all while keeping the code modular and reusable.

Deduplication · Python · Web Scraping

0 likes · 5 min read

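A minimal sketch of the set-based de-duplication mentioned in the summary above. It is not taken from the article; the function name `dedupe_entries` and the sample titles and URLs are assumptions made for illustration.

```python
def dedupe_entries(entries):
    """Return (title, url) pairs with repeated URLs dropped, keeping first-seen order."""
    seen = set()      # URLs already emitted
    unique = []
    for title, url in entries:
        if url not in seen:
            seen.add(url)
            unique.append((title, url))
    return unique


# Hypothetical sample data standing in for scraped ranking entries.
sample = [
    ("Novel A", "https://example.com/book/1"),
    ("Novel B", "https://example.com/book/2"),
    ("Novel A", "https://example.com/book/1"),
]
print(dedupe_entries(sample))
```

A set gives constant-time membership checks, while the separate list preserves the order in which entries were first seen.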

MaGe Linux Operations

Oct 13, 2018 · Backend Development

Master Distributed Web Crawling with Scrapy‑Redis: Setup, Architecture, and Code

This guide explains how to scale web crawling to hundreds of sites using Scrapy‑Redis, covering its components, distributed workflow, Redis installation and configuration, proxy pool handling, and provides complete Python code examples for spiders and pipelines.

Python · Web Scraping · distributed crawling

0 likes · 7 min read


Beike Product & Technology

Oct 12, 2018 · Fundamentals

Headless Browser Automation: Selenium vs Puppeteer

This article explores headless browser automation technologies including Selenium, PhantomJS, Puppeteer, and Headless Chrome, comparing their architectures, use cases, and implementation differences.

Chrome DevTools Protocol · Headless Browser · PhantomJS

0 likes · 9 min read


Qunar Tech Salon

Sep 30, 2018 · Backend Development

Analyzing National Day Travel Crowds Using Python Web Scraping and Search Index Data

This article describes how to use Python, Selenium, and search‑index services to scrape and visualize popularity data for Chinese tourist spots during the National Day holiday, presenting a ranking of destinations and providing full code examples for data collection, cleaning, and storage.

MongoDB · Python · Selenium

0 likes · 6 min read


Python Crawling & Data Mining

Sep 20, 2018 · Fundamentals

Master Python Regex for Web Scraping: Quick Guide with Real Code

This article explains why regular expressions are essential for Python web scraping, introduces the special characters ^, ., and *, and demonstrates their use with clear code examples, showing how to extract specific patterns such as numbers from HTML content.

Python · Web Scraping · tutorial

0 likes · 5 min read

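To illustrate the three special characters named in the summary above, the following sketch extracts a number from a string. It is not the article's code, the HTML fragment is made up, and the lazy `?` modifier is an addition beyond the three characters listed.

```python
import re

# Made-up HTML fragment used only for illustration.
html = "<span>Price: 1299 yuan</span>"

# "^" anchors the match at the start of the string, "." matches any character
# except a newline, and "*" repeats the preceding item zero or more times.
greedy = re.match(r"^.*(\d+).*", html)
# Greedy ".*" consumes as much as possible, so only the last digit is captured.
print(greedy.group(1))  # 9

# Adding "?" makes ".*" lazy, so the whole number is captured.
lazy = re.match(r"^.*?(\d+).*", html)
print(lazy.group(1))  # 1299
```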

MaGe Linux Operations

Sep 1, 2018 · Backend Development

How to Scrape NetEase Cloud Music Hot Comments and Song List with Python

This tutorial walks through using Python and browser developer tools to locate the NetEase Cloud Music comment API, extract encrypted request parameters, retrieve hot comments in JSON, and parse the Hot Songs chart to collect each song's name and ID with regular expressions.

HTTP · NetEase Cloud Music · Web Scraping

0 likes · 8 min read


MaGe Linux Operations

Aug 16, 2018 · Backend Development

Master Python Web Scraping: GET, POST, Proxies, Cookies, and Multithreading

This article provides a comprehensive Python web scraping guide covering basic page retrieval with GET and POST, proxy usage, cookie handling, header spoofing, page parsing techniques, captcha processing, gzip compression, and multithreaded crawling, complete with code snippets for each step.

Web Scraping · cookies · proxy

0 likes · 7 min read


MaGe Linux Operations

Aug 8, 2018 · Backend Development

Scrape WeChat Moments with Python Scrapy: A Step‑by‑Step Guide

This tutorial shows how to export WeChat Moments using a third‑party service, then build a Python Scrapy spider to crawl the exported pages, parse the JSON data, and save the moments to a file, with detailed commands and code examples.

Scrapy · WeChat · Web Scraping

0 likes · 8 min read


Python Crawling & Data Mining

Jul 31, 2018 · Big Data

Can Web‑Scraped Movie Reviews Predict Box Office? A Python Data‑Mining Case Study

Using Python to scrape over ten thousand Maoyan comments for the comedy film “Hello Mr. Billionaire” (西虹市首富), this article demonstrates data cleaning, geographic heat‑maps, city‑wise rating analysis, word‑cloud generation, and a simple box‑office forecast based on a comparable movie, illustrating practical web‑scraping and data‑mining techniques.

Box Office Prediction · Movie Reviews · Python

0 likes · 10 min read


MaGe Linux Operations

Jul 28, 2018 · Backend Development

Master Web Scraping with Beautiful Soup: A Hands‑On Python Guide

This article introduces Beautiful Soup, a Python library for parsing HTML/XML into a navigable tree, covering installation, object initialization, tag and attribute access, tree traversal, searching techniques like find_all, find, CSS selectors, and practical code examples.

Web Scraping · beautifulsoup · data extraction

0 likes · 11 min read


Tencent Cloud Developer

Jul 26, 2018 · Artificial Intelligence

Python‑Based Scraping, Cleaning, Sentiment Analysis and Visualization of Douban Movie Reviews

The article walks through a full Python workflow that scrapes up to 500 Douban movie reviews for "Dying to Survive" and "Hidden Man," cleans and stores them in pandas, performs SnowNLP sentiment analysis, and visualizes city distribution, rating trends, and word clouds with pyecharts.

Pandas · Pyecharts · Python

0 likes · 23 min read


MaGe Linux Operations

Jul 3, 2018 · Backend Development

How to Automate Course Registration on a Legacy University System with Python

Learn to automate course enrollment on outdated university portals by simulating login with Python, extracting session data, handling CAPTCHAs, constructing POST requests, and parsing responses, enabling fast, reliable class registration even when schedules conflict.

Automation · Course Registration · Web Scraping

0 likes · 14 min read


MaGe Linux Operations

Jun 25, 2018 · Backend Development

How to Scrape WeChat Moments with Python Scrapy: Step‑by‑Step Guide

This tutorial walks you through obtaining WeChat Moments data via a third‑party export service, setting up a Scrapy project, analyzing the JSON responses, and implementing the spider code to extract and save posts and timestamps.

Scrapy · WeChat · Web Scraping

0 likes · 7 min read


MaGe Linux Operations

May 27, 2018 · Backend Development

How to Scrape Super Schedule App Data with Python: Login & Topic Extraction

This guide demonstrates how to programmatically log into the Super Schedule mobile app using Python, capture its JSON responses, and continuously fetch user-generated topics by mimicking the app’s HTTP requests, complete with sample code and request headers.

HTTP · Login Automation · Mobile App

0 likes · 7 min read


Python Crawling & Data Mining

May 26, 2018 · Backend Development

How to Scrape and Visualize Your WeChat Friends’ Locations with Python

This tutorial shows how to use Python's itchat library to extract your WeChat contacts' province and city information, analyze the distribution, and create visual maps, providing step‑by‑step guidance and sample screenshots for each stage.

Data Visualization · Python · WeChat

0 likes · 5 min read


Python Crawling & Data Mining

May 20, 2018 · Backend Development

How to Use Python to Scrape Your WeChat Friend Count and Gender Distribution

This tutorial shows how to employ Python's itchat library to log into WeChat, retrieve the total number of friends, and analyze their gender distribution, providing step‑by‑step code, screenshots of results, and tips for verifying the data.

Python · WeChat · Web Scraping

0 likes · 5 min read


Python Crawling & Data Mining

May 15, 2018 · Fundamentals

Create Stunning Word Clouds from WeChat Moments Using Python and Jieba

This tutorial walks you through extracting WeChat Moments with a Python web scraper, processing the Chinese text using jieba, and visualizing the most frequent words as beautiful word clouds with customizable shapes and fonts.

Data Visualization · Python · Web Scraping

0 likes · 6 min read


Python Crawling & Data Mining

May 9, 2018 · Backend Development

How to Scrape WeChat Moments with Python and Scrapy: Step‑by‑Step Guide

Learn how to export WeChat Moments using a third‑party service, set up a Scrapy project in Python, analyze the dynamic JSON responses, and write a crawler to extract timeline data, complete with screenshots and command‑line instructions for a fully functional scraper.

Python · WeChat · Web Scraping

0 likes · 6 min read


MaGe Linux Operations

May 8, 2018 · Fundamentals

How to Scrape Lagou Python Job Data and Visualize Trends with Python

This tutorial demonstrates how to collect Python job postings from Lagou using Python's requests library, process the JSON response with pandas, and create insightful visualizations—including bar charts, word clouds, and geographic heatmaps—while handling anti‑scraping measures and data cleaning steps.

Data Visualization · Lagou · Matplotlib

0 likes · 9 min read


AutoHome Frontend

May 4, 2018 · Backend Development

Master Web Automation with Puppeteer: PDFs, Testing, and Performance Tracing

This article introduces Puppeteer, a Node library for controlling Chrome/Chromium, explains its installation, key APIs, and demonstrates practical use cases such as generating PDFs, automating UI tests on mobile emulators, and capturing performance traces for web pages.

Puppeteer · Web Automation · Web Scraping

0 likes · 12 min read


MaGe Linux Operations

Apr 23, 2018 · Backend Development

Essential Python Libraries for Web Scraping and Data Processing

A comprehensive catalog of Python libraries covering network communication, web crawling frameworks, HTML/XML parsing, text manipulation, file format handling, natural language processing, browser automation, concurrency, cloud services, email processing, URL manipulation, multimedia extraction, WebSocket support, DNS resolution, computer vision, proxy servers, and other useful tools for developers.

Automation · Parsing · Python

0 likes · 16 min read


MaGe Linux Operations

Apr 17, 2018 · Backend Development

Build a Python Web Scraper to Grab Maoyan TOP100 Movies in Minutes

This step‑by‑step tutorial shows Python beginners how to create a simple web scraper that downloads, parses, and stores the Maoyan movie TOP 100 list using requests, regular expressions, and multiprocessing for fast data collection.

Multiprocessing · Python · Web Scraping

0 likes · 5 min read


MaGe Linux Operations

Apr 16, 2018 · Fundamentals

Unlock Python’s Power: A Step‑by‑Step Guide to Mastering the Language

This article explains why Python is a top choice for beginners and professionals, outlines its rich library ecosystem, and presents a four‑stage learning path—from basic syntax and data structures to advanced features and real‑world projects—helping readers become competent Python developers.

Automation · Python · Web Scraping

0 likes · 7 min read


MaGe Linux Operations

Apr 2, 2018 · Backend Development

Build a Python Scraper for Lagou.com to Extract Job Requirements with Baidu NLP

This article demonstrates a compact, runnable Python 3 scraper that fetches job listings from Lagou.com based on a keyword, filters by city and salary, extracts detailed job requirements using XPath, and applies Baidu's free NLP service for word segmentation and part‑of‑speech tagging to reveal key skill terms.

Baidu AI · Job Data · Lagou

0 likes · 15 min read


MaGe Linux Operations

Mar 28, 2018 · Cloud Computing

How to Discover and Simulate Baidu Cloud’s Transfer API for Automated File Saving

This guide walks you through logging into Baidu Cloud, capturing the transfer API via browser dev tools, analyzing required cookies and parameters, constructing the proper request headers and URL, and programmatically extracting share information to automate file transfers using Python.

API · Automation · Baidu Cloud

0 likes · 7 min read


MaGe Linux Operations

Mar 13, 2018 · Backend Development

Crawl Zhihu’s “Beautiful Women” Images and Filter by AI Face Scores in Python

This guide explains how to collect images from Zhihu’s “美女” topic using Python’s Requests and lxml, filter them with Baidu’s AipFace API based on gender, face presence, authenticity, and beauty score, and store the high‑quality results locally, including setup and optional customizations.

Baidu AI · Data Filtering · Python

0 likes · 7 min read


MaGe Linux Operations

Mar 11, 2018 · Artificial Intelligence

Generate Tang Poetry with Python: Scraping, Processing, and Rhyme Creation

This tutorial explains how to build a Python program that crawls 71,000 Tang poems, extracts and tokenizes the text, analyzes word frequencies, and assembles new five‑character regulated verses with proper rhymes, including acrostic poems, while offering code snippets and future AI enhancements.

Poetry Generation · Python · Rhyme Detection

0 likes · 7 min read


MaGe Linux Operations

Mar 9, 2018 · Backend Development

How to Crawl CSDN Geek News Articles with Python: A Step‑by‑Step Guide

This article walks through using Python to log into CSDN, capture the dynamic Geek News list via its JSON API, handle request parameters, and extract article titles and links, providing concise code screenshots for a complete web‑scraping solution.

CSDN · HTTP · Python

0 likes · 5 min read


Python Crawling & Data Mining

Jan 30, 2018 · Backend Development

How to Scrape Weather Data with Python and Auto‑Email It Daily

Learn how to use Python's BeautifulSoup to scrape real‑time weather data from Sohu Weather, format the information, and automatically send it via email using SMTP, with step‑by‑step code examples and tips for handling different email providers.

Python · SMTP · Web Scraping

0 likes · 4 min read


Python Crawling & Data Mining

Jan 27, 2018 · Backend Development

How to Scrape Real-Time Weather Data with Python and BeautifulSoup

This guide demonstrates how to use Python's BeautifulSoup library to crawl the Green Breath website, extract real‑time weather and PM2.5 information, handle missing data with conditional checks, and display the results directly in the PyCharm console, providing a practical example of web‑scraping for environmental monitoring.

Air Quality · Python · Weather Data

0 likes · 3 min read


MaGe Linux Operations

Jan 27, 2018 · Backend Development

Scrape 2018 Chinese City Job Listings with Python and Visualize the Results

This tutorial shows how to use Python to crawl all Chinese city names from Zhaopin, retrieve the number of Android job postings for each city via HTTP GET requests, parse the results with regex, store them in a dictionary, and finally plot the data with Matplotlib for clear visual comparison.

Python · Web Scraping · job market analysis

0 likes · 6 min read


Python Crawling & Data Mining

Jan 24, 2018 · Backend Development

Which Python Selector Is Best for Web Scraping? Regex, BeautifulSoup, lxml, CSS Compared

This article compares four Python web‑scraping selectors—regular expressions, BeautifulSoup, lxml/XPath, and CSS—detailing their strengths, weaknesses, performance, and installation difficulty to help developers choose the most suitable tool for extracting data from sites like JD.com.

Python · Selectors · Web Scraping

0 likes · 6 min read


Python Crawling & Data Mining

Jan 21, 2018 · Backend Development

Master XPath for Precise JD.com Product Scraping with Python

This tutorial shows how to use Python's urllib and XPath to accurately extract product details such as name, link, image, and price from JD.com search results, offering a clearer alternative to regular expressions and BeautifulSoup.

JD.com · Python · Web Scraping

0 likes · 4 min read


Python Crawling & Data Mining

Jan 18, 2018 · Backend Development

How to Accurately Scrape JD.com Product Data with BeautifulSoup

This tutorial shows how to use Python's urllib and BeautifulSoup libraries to encode search keywords, request JD.com pages, parse the HTML tree, and reliably extract product names, links, images, and prices, offering a simpler alternative to complex regular‑expression scrapers.

JD.com · Python · Web Scraping

0 likes · 4 min read

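The keyword-encoding step mentioned in the summary above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the article's code: the helper `build_search_url`, the base URL `https://example.com/search`, and the `keyword` parameter name are hypothetical and do not describe JD.com's real endpoint.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_search_url(base_url, keyword):
    """Append a percent-encoded 'keyword' query parameter to base_url."""
    return f"{base_url}?{urlencode({'keyword': keyword})}"


# Hypothetical endpoint; non-ASCII keywords are encoded as UTF-8 percent escapes.
url = build_search_url("https://example.com/search", "手机")
print(url)

# Round-trip check: parsing the query string gives back the original keyword.
print(parse_qs(urlsplit(url).query)["keyword"][0])
```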

Architecture Digest

Jan 17, 2018 · Backend Development

Design and Implementation of a Java Web Crawler Framework Inspired by Scrapy

This article explains how to design and build a lightweight Java web crawler framework, covering crawler fundamentals, anti‑scraping challenges, core components such as URL manager, scheduler, downloader, parser and pipeline, and provides concrete code examples and architectural diagrams.

Java · Scrapy · Web Crawler

0 likes · 14 min read


MaGe Linux Operations

Jan 11, 2018 · Backend Development

Master Python Web Scraping: From Basic Requests to Multithreaded Crawlers

This comprehensive guide walks you through Python web‑scraping techniques—including basic URL fetching, proxy usage, cookie and form handling, browser impersonation, gzip/deflate support, captcha processing, multithreading with thread pools and Twisted async I/O, plus practical tips on connection pooling, thread stack size, retries, timeouts and login automation—providing a solid foundation for building robust crawlers.

Python · Web Scraping · gzip

0 likes · 17 min read


MaGe Linux Operations

Dec 20, 2017 · Fundamentals

Mastering XPath: Powerful Techniques for Precise Web Scraping

This guide explains how to use XPath efficiently for web scraping, covering node selection, axes, functions, numeric comparisons, and advanced combinations, while emphasizing concise and readable expressions to improve performance and maintainability.

Python · Web Scraping · XML

0 likes · 5 min read


21CTO

Dec 15, 2017 · Backend Development

Master Web Scraping with Python: Regex, BeautifulSoup & Selenium

This guide demonstrates how to combine Python's regex, BeautifulSoup, and Selenium (including Chrome and headless PhantomJS) for powerful web scraping, covering tag matching, Ajax handling, iframes, cookie management, and practical code examples for extracting and interacting with dynamic web content.

Headless Browser · Selenium · Web Scraping

0 likes · 10 min read


Huawei Cloud Developer Alliance

Dec 14, 2017 · Artificial Intelligence

What Do Douban Reviews Reveal? Sentiment Analysis of “The Hunt” TV Drama

This article scrapes over 22,000 Douban comments for the TV series “The Hunt,” cleans the data, applies SnowNLP‑based sentiment classification and word‑cloud visualization, and shows that while overall ratings are mixed, detailed short reviews are overwhelmingly positive.

Python · TV drama · Web Scraping

0 likes · 5 min read


AI Large-Model Wave and Transformation Guide

Nov 23, 2017 · Backend Development

How to Build a Simple Python Spider to Download Images from Baidu Tieba

This tutorial walks through using Python's urllib and regular expressions to crawl a Baidu Tieba page, extract all .jpg image URLs, and download each image locally with a sequential naming scheme.

Python · Web Scraping · image-downloader

0 likes · 6 min read


MaGe Linux Operations

Nov 14, 2017 · Backend Development

How to Use Scrapy to Crawl Zhihu Users and Analyze Their Data

This tutorial explains how a Python developer can set up a Scrapy project, write spiders to crawl Zhihu user profiles, store the results in a MySQL database, adjust settings for headers and delays, and finally perform simple gender and location analysis on the collected data.

Backend Development · Python · Scrapy

0 likes · 14 min read


ITPUB

Oct 26, 2017 · Backend Development

Mastering Python urllib2: GET, POST, Proxies, Cookies, Headers, GZIP, and Multithreaded Crawling

This guide walks through using Python's urllib2 library for web crawling, covering basic GET/POST requests, handling proxy IPs, managing cookies, spoofing browser headers, processing gzip-compressed responses, and implementing multithreaded fetching with a simple thread‑pool template.

Web Scraping · cookies · gzip

0 likes · 7 min read

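A minimal sketch of the gzip-handling step mentioned in the summary above. It is written for Python 3 with the standard `gzip` module rather than the Python 2 urllib2 API the article covers, and the helper name `decode_body` is an assumption made for illustration. The compressed response is simulated locally, so no network request is made.

```python
import gzip


def decode_body(raw_body, content_encoding):
    """Decompress raw_body when the Content-Encoding header value is gzip."""
    if (content_encoding or "").lower() == "gzip":
        return gzip.decompress(raw_body)
    return raw_body


# Simulate a compressed response body instead of fetching a real page.
original = b"<html><body>hello</body></html>"
compressed = gzip.compress(original)

print(decode_body(compressed, "gzip") == original)
print(decode_body(original, None) == original)
```

Checking the header before decompressing avoids errors on servers that ignore the `Accept-Encoding` request header and return plain bodies.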

21CTO

Oct 14, 2017 · Backend Development

How etlpy Simplifies Python Web Scraping and Data Cleaning in Under 500 Lines

etlpy is a lightweight Python framework that lets you define web‑crawling and data‑cleaning pipelines via XML, using generators for streaming, built‑in thread pools for parallelism, and a plug‑in architecture that handles everything from regex parsing to JSON conversion, all within a single 500‑line core file.

ETL · Web Scraping · data cleaning

0 likes · 14 min read


MaGe Linux Operations

Sep 19, 2017 · Backend Development

How to Scrape and Analyze Quanmin K‑Song User Data with Python

This tutorial walks through using Python and BeautifulSoup to crawl user profiles, fan lists, and song information from the Quanmin K‑Song app, clean and store the data in MongoDB, handle pagination, and prepare the dataset for further analysis.

API · MongoDB · Python

0 likes · 8 min read


MaGe Linux Operations

Sep 16, 2017 · Backend Development

How to Scrape NetEase Cloud Music Hot Comments with Python

This tutorial walks through using Python and browser developer tools to locate, decode, and retrieve the hot comments from NetEase Cloud Music's hot song chart, including extracting song IDs, handling encrypted request parameters, and applying regex to gather song metadata.

API · NetEase Cloud Music · Python

0 likes · 9 min read


MaGe Linux Operations

Sep 14, 2017 · Backend Development

How to Scrape Qunar Tourist Spot Data and Visualize It with Baidu Maps & ECharts

This guide walks you through using Python to crawl Qunar’s tourist spot pages, retrieve location and sales data, enrich it with Baidu Map’s geocoding API, and then create interactive heatmaps and ranking charts with ECharts, while storing results in JSON and Excel files for further analysis.

Baidu Maps · ECharts · Python

0 likes · 11 min read


MaGe Linux Operations

Sep 13, 2017 · Backend Development

Build a Car Model Scraper with Scrapy: Complete Step-by-Step Tutorial

Learn how to set up a Scrapy project to crawl comprehensive car brand, series, and model data from Autohome, covering environment preparation, project initialization, spider and pipeline creation, CSV output, rate limiting, and useful debugging tips.

Autohome · CSV export · Car Data

0 likes · 10 min read


Hujiang Technology

Sep 12, 2017 · Backend Development

Using Nightmare with Electron for Web Automation and Zhihu Topic Crawling

This article introduces Electron and the Nightmare framework, explains how to install and use Nightmare for web automation and crawling, and provides a complete example of scraping Zhihu topic data with JavaScript, Node.js, and Cheerio, including code snippets and JSON output.

Electron · JavaScript · Nightmare

0 likes · 9 min read


MaGe Linux Operations

Sep 9, 2017 · Backend Development

Build a Python Image Downloader: Step‑by‑Step Web Scraping Tutorial

This tutorial walks through building a Python web scraper that automatically downloads images from Baidu by analyzing requirements, inspecting page source, crafting regex patterns, and implementing the crawler with requests, offering step‑by‑step guidance, code snippets, and troubleshooting tips.

Python · Web Scraping · image-downloader

0 likes · 7 min read


MaGe Linux Operations

Sep 6, 2017 · Backend Development

How I Built a High‑Performance Novel Site Crawler with MongoDB

Inspired by a tutorial, I created a MongoDB‑backed crawler for the Yisou novel website, extracting category links, managing URL states across multiple processes, handling millions of pages, and finally deduplicating the results to obtain a clean collection of books.

MongoDB · Multiprocessing · Python

0 likes · 3 min read


MaGe Linux Operations

Jul 29, 2017 · Backend Development

Build a Fast Python Web Scraper for Novel Rankings – Step by Step

This guide walks through building a Python web crawler to extract novel titles and URLs from the qu.la ranking page, explains the site’s clear HTML structure, shows how to deduplicate entries with a set, and provides complete code snippets plus performance tips and a Scrapy upgrade path.

Crawler · Python · Scrapy

0 likes · 5 min read


MaGe Linux Operations

Jul 10, 2017 · Backend Development

How to Build a Zhihu Crawler with Python, ELK, and Visual Analytics

This article walks through creating a Python-based Zhihu web crawler, detailing the tech stack, data collection, visualization of user demographics and top contributors, the crawler architecture, authorization handling, and suggestions for performance and storage improvements.

ELK · Web Scraping · zhihu

0 likes · 6 min read


MaGe Linux Operations

Jun 19, 2017 · Big Data

How to Scrape 700K Ximalaya Audio Records with Python and MongoDB

This article details a step‑by‑step process for crawling all popular Ximalaya channels, extracting each audio's metadata, and storing roughly 700,000 records in MongoDB, while also showing how to speed up the crawl with asynchronous requests.

MongoDB · Python · Web Scraping

0 likes · 5 min read


MaGe Linux Operations

Jun 9, 2017 · Backend Development

How to Scrape All News from Sichuan University Public Administration Site with Python

This guide walks through the complete process of crawling the Sichuan University Public Administration College website to extract every news article, covering target identification, rule definition, code implementation, handling pagination, and troubleshooting missing items.

Crawler · Python · Sichuan University

0 likes · 6 min read


MaGe Linux Operations

May 20, 2017 · Backend Development

5 Must‑Use Python Libraries to Supercharge Your Projects

This article introduces five highly practical Python packages—yagmail, requests, psutil, BeautifulSoup, and a collection of utility scripts—explaining how each simplifies common tasks such as sending emails, making HTTP calls, system monitoring, web scraping, and code reuse, complete with concise code examples.

Python · Web Scraping · libraries

0 likes · 14 min read


ITPUB

May 2, 2017 · Backend Development

How to Bypass Common Anti‑Scraping Measures with Scrapy

This guide explains why websites employ anti‑scraping defenses, outlines the most common header checks such as User‑Agent, Referer, and Cookies, and provides practical Scrapy code snippets for rotating user agents, managing proxies, handling X‑Forwarded‑For, limiting request rates, and dealing with dynamic AJAX content using Selenium or PhantomJS.

Headers · Scrapy · Web Scraping

0 likes · 7 min read


MaGe Linux Operations

Apr 28, 2017 · Backend Development

Scrape NetEase Cloud Music Hot Comments and Visualize Them with Word Clouds

This tutorial demonstrates how to capture hot comments from NetEase Cloud Music using web‑scraping techniques, handle the platform's encrypted API, and generate a Chinese word cloud with Python's WordCloud library for visual insight.

Data Visualization · NetEase Cloud Music · Python

0 likes · 5 min read


Tencent IMWeb Frontend Team

Apr 27, 2017 · Frontend Development

Build a Chrome Extension to Auto‑Click and Grab a Huawei Honor V9

This tutorial walks through creating a Chrome extension that automates page clicks to repeatedly refresh and purchase a Huawei Honor V9, covering element inspection, simulated clicks with timers, JavaScript fundamentals, packaging the extension, and additional handy scripts.

Automation · Chrome Extension · JavaScript

0 likes · 5 min read


MaGe Linux Operations

Mar 21, 2017 · Backend Development

Master Web Scraping in Python with urllib2: A Step‑by‑Step Guide

This article explains how to use Python's urllib2 module to fetch web pages, covering basic request handling, creating Request objects, sending GET and POST data, setting custom headers, and demonstrates practical examples with code snippets and screenshots.

HTTP requests · Python · Web Scraping

0 likes · 6 min read

Huawei Cloud Developer Alliance

May 28, 2016 · Backend Development

Extract All @Mentions from a Zhihu Page with Simple Scripts

This guide shows how to collect every @mentioned user on a Zhihu question page by using a JavaScript bookmarklet or a Python script, explains the extraction process, provides the necessary code snippets, and discusses why following programmers on Zhihu may not be the most effective learning method.

JavaScript · Python · Web Scraping

0 likes · 6 min read

21CTO

Apr 12, 2016 · Backend Development

How to Build a PHP cURL Spider to Scrape Zhihu User Data and Visualize It

This article walks through using PHP's cURL extension to crawl tens of thousands of Zhihu user profiles, parse the HTML with regular expressions, store the extracted data efficiently, and present the results with responsive charts and dashboards.

Shell script · Web Scraping · cURL

0 likes · 9 min read

21CTO

Mar 22, 2016 · Information Security

How to Outsmart AI-Powered Web Scrapers: Two Powerful Anti‑Crawling Tricks

Web crawlers, especially AI‑driven ones, threaten site performance and data ownership, so this article reviews common anti‑scraping methods—from IP and header analysis to behavior detection—and reveals two unconventional defenses: data poisoning and a deposit‑based access model that penalize malicious bots.

AI · Data Protection · Web Scraping

0 likes · 5 min read

ITPUB

Mar 21, 2016 · Backend Development

How to Bypass Common Anti‑Scraping Measures: Headers, Behavior, and Dynamic Pages

This guide outlines the main anti‑scraping techniques used by websites—including header validation, user‑behavior monitoring, and dynamic content loading—and provides practical methods such as header spoofing, IP proxy rotation, request throttling, and Selenium/PhantomJS automation to overcome them.

Headers · PhantomJS · Selenium

0 likes · 6 min read

Qunar Tech Salon

Jan 29, 2016 · Big Data

Python Data Analysis Learning Roadmap (16‑Week Plan)

This article presents a 16‑week Python data‑analysis learning roadmap covering environment setup, basic syntax, web‑scraping techniques, data‑analysis libraries such as pandas and NumPy, and data‑visualization with matplotlib, along with curated free resources and tutorials for each stage.

NumPy · Pandas · Roadmap

0 likes · 6 min read

21CTO

Jan 26, 2016 · Backend Development

How to Bypass Common Anti‑Scraping Measures: Headers, Behavior, and Dynamic Pages

This article summarizes common anti‑scraping techniques—including header checks, user‑behavior detection, and dynamic page defenses—and provides practical ways to circumvent them using custom headers, IP proxies, request timing, and tools like Selenium with PhantomJS to simulate real browsers.

Headers · Selenium · Web Scraping

0 likes · 6 min read

ITPUB

Dec 17, 2015 · Backend Development

Build a Simple Python Image Scraper on macOS – Step‑by‑Step Guide

This tutorial walks you through setting up a macOS environment, inspecting a web page, and writing a Python script with the requests library to locate and download all images from a target site, complete with code explanations and execution tips.

Python · Web Scraping · image-downloader

0 likes · 7 min read

21CTO

Nov 13, 2015 · Backend Development

Essential Python Libraries for Web Scraping and Data Processing

This article collects Python libraries for network requests, web crawling frameworks, HTML/XML parsing, text manipulation, file format handling, natural language processing, browser automation, and asynchronous programming, giving developers a toolkit for web scraping and data processing tasks.

Parsing · Python · Web Scraping

0 likes · 18 min read

21CTO

Oct 1, 2015 · Backend Development

How to Scrape 1.1 Million Zhihu Users with PHP cURL, Multi‑Threading, and Redis

This tutorial walks through collecting over a million Zhihu user profiles using PHP on Ubuntu, handling cookies, bypassing image hot‑link protection, scaling requests with curl_multi, de‑duplicating MySQL inserts, and coordinating work with Redis and multi‑process pcntl for efficient large‑scale web scraping.

Linux · Multi‑processing · PHP

0 likes · 15 min read

Architect

Sep 18, 2015 · Big Data

Web Data Mining and Analysis of the “Da Gai Er” Section of the Caoliu Forum Using PHP

This article presents a PHP‑based web‑scraping experiment that collects and visualizes several months of data from the “Da Gai Er” board of the Caoliu forum, revealing user activity patterns, image hosting distribution, registration trends, and overall forum health through charts and statistical summaries.

Big Data · PHP · Web Scraping

0 likes · 7 min read

MaGe Linux Operations

Aug 18, 2014 · Operations

Automating Social Media Thanks, Subtitles, IMDb Lookups, and More with Python Scripts

This article compiles the top Quora answers showcasing Python scripts that automate Facebook birthday thank‑you comments, one‑click subtitle downloads, IMDb data extraction, comic and e‑card scraping, and explains how these projects even helped the author land a job.

Comic Downloader · IMDb API · Subtitle Downloader

0 likes · 8 min read

MaGe Linux Operations

Jul 1, 2014 · Backend Development

Master Python Web Scraping: Proxies, Login, Multithreading, and Captcha Hacks

This guide walks through practical Python web‑scraping techniques using urllib2, covering basic page fetching, proxy usage, cookie handling for logins, form submission, header spoofing, anti‑hotlink tricks, multithreaded crawling, and strategies for bypassing simple captchas, all illustrated with code snippets.

Web Scraping · captcha · multithreading

0 likes · 7 min read
