Tagged articles
466 articles
Python Crawling & Data Mining
Nov 30, 2020 · Backend Development

Build a Python Movie Scraper: Download Films from FilmSky with Ease

This guide walks you through setting up a Python environment, installing required libraries, constructing a FilmSky scraper class, handling pagination, parsing HTML with regex, and saving movie titles and download links, enabling you to browse and download movies from the FilmSky website efficiently.

Python · Web Scraping · movie downloader
6 min read
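The FilmSky summary mentions pagination, regex parsing, and saving title/link pairs. The sketch below shows those three steps under stated assumptions. The sample markup, the `ulink` class name, and the URL template are hypothetical stand-ins, and FilmSky's real HTML may differ.

```python
import re
from typing import List, Tuple

# Hypothetical markup standing in for one list page; the real site may differ.
SAMPLE_HTML = """
<a href="/html/movie/1.html" class="ulink">Movie One</a>
<a href="/html/movie/2.html" class="ulink">Movie Two</a>
"""

# Capture the link target and the (non-greedy) title text of each entry.
LINK_RE = re.compile(r'<a href="([^"]+)" class="ulink">(.*?)</a>')


def page_urls(template: str, pages: int) -> List[str]:
    """Expand a hypothetical '{page}' URL template into one URL per list page."""
    return [template.format(page=n) for n in range(1, pages + 1)]


def parse_movies(html: str) -> List[Tuple[str, str]]:
    """Return (title, detail_url) pairs found in one list page."""
    return [(title, href) for href, title in LINK_RE.findall(html)]


def save(movies: List[Tuple[str, str]], path: str) -> None:
    """Write one tab-separated title/link pair per line."""
    with open(path, "w", encoding="utf-8") as fh:
        for title, link in movies:
            fh.write(f"{title}\t{link}\n")


if __name__ == "__main__":
    print(page_urls("https://example.com/list_{page}.html", 2))
    print(parse_movies(SAMPLE_HTML))
```

Regex parsing is fragile. It works here only because the sample markup is uniform, and a change in attribute order would break the pattern.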
Huawei Cloud Developer Alliance
Nov 26, 2020 · Backend Development

How to Crawl Real-Time Data with Python WebSocket: A Step‑by‑Step Guide

This article explains how crawler engineers can fetch real‑time data such as sports scores, stock quotes, or cryptocurrency prices by comparing polling and WebSocket approaches, introducing the aiowebsocket library, and providing complete Python code to perform handshake, subscription, and continuous data streaming.

Python · Web Scraping · WebSocket
10 min read
MaGe Linux Operations
Oct 27, 2020 · Backend Development

Build a Distributed Scrapy Crawler in Minutes with RabbitMQ and RedisBloom

This guide walks you through installing Scrapy-Distributed, setting up RabbitMQ and RedisBloom containers, creating a sitemap spider, configuring the distributed scheduler and dupefilter, and running the spider, while explaining why this non-intrusive solution improves on the existing Scrapy-Redis and scrapy-rabbitmq approaches.

Python · RabbitMQ · RedisBloom
7 min read
Python Crawling & Data Mining
Oct 24, 2020 · Backend Development

Master Scrapy: Extract Likes, Comments, and Content with XPath

This article continues a Scrapy tutorial by showing how to extract like counts, comment counts, and full article content using XPath selectors, regular expressions, and debugging techniques, providing step‑by‑step code examples and screenshots to help Python developers automate web data collection.

Data Extraction · Python · Scrapy
6 min read
MaGe Linux Operations
Sep 28, 2020 · Backend Development

Build a Scalable Python Web Scraper for 3000+ Companies

This article walks through creating a Python web scraper that extracts financial data for over three thousand listed companies, starting from a simple pandas script and progressively adding error handling, MySQL storage, and multiprocessing to build a robust, production‑ready tool.

Data Extraction · Python · Web Scraping
7 min read
Python Crawling & Data Mining
Sep 4, 2020 · Big Data

How to Scrape and Visualize 3,000 Chinese Recipes with Python

This article demonstrates how to use Python to crawl 3,032 Chinese recipe entries from Douguo.com, clean the data with Pandas, and create insightful visualizations—including rating distributions, cuisine comparisons, and ingredient word clouds—using pyecharts, providing complete code snippets and analysis of the results.

Chinese Cuisine · Pyecharts · Python
15 min read
vivo Internet Technology
Aug 5, 2020 · Frontend Development

Using Puppeteer for Emoji Scraping, Headless Chrome, and Front‑End Automation Testing

The article demonstrates how to use Puppeteer—a Node.js API built on the Chrome DevTools Protocol—to run headless Chrome for tasks such as scraping Google emoji images, generating screenshots or PDFs, and automating front‑end tests by launching a browser, navigating pages, handling cookies, simulating user input, capturing responses, and saving results.

Browser Automation · Headless Chrome · Node.js
15 min read
MaGe Linux Operations
Jul 28, 2020 · Fundamentals

Top 8 Python Tools Every Programmer and Student Should Know

This article reviews eight essential Python tools—including IDLE, Scikit‑learn, Theano, Selenium, TestComplete, BeautifulSoup, Pandas, and PuLP—explaining their main features, typical use cases, and why they are valuable for developers and students across web, data science, automation, and optimization tasks.

Data Science · Python · Web Scraping
5 min read
Efficient Ops
Jul 13, 2020 · Operations

What 13,966 Ops Job Listings Reveal About Salary, Skills, and Hot Cities

This article analyzes 13,966 Chinese operations‑engineer job postings scraped from 51job, cleaning the data with Python and Pandas, then visualizing industry demand, city concentration, salary ranges, education requirements, company size distribution, and keyword trends to guide job seekers and recruiters.

Data visualization · Operations · Python
14 min read
Python Crawling & Data Mining
Jun 20, 2020 · Artificial Intelligence

Essential Python Libraries for Data Acquisition, Cleaning, Visualization & Modeling

The article provides a comprehensive guide to Python libraries essential for data analysis, detailing tools for data acquisition (Selenium, Scrapy, Beautiful Soup), cleaning (spaCy, NumPy, pandas), visualization (Matplotlib, Pyecharts), modeling (scikit‑learn, PyTorch, TensorFlow), model inspection (LIME), audio (Librosa), image processing (OpenCV, scikit‑image), database access (PyMongo) and web deployment (Flask, Django).

Python · Web Scraping · libraries
12 min read
Python Crawling & Data Mining
Jun 5, 2020 · Backend Development

Build a Python Image Scraper for 51miz.com in Minutes

This tutorial walks you through creating a Python web scraper that fetches image URLs from 51miz.com using requests and lxml, filters them with regular expressions, downloads the images, and demonstrates the complete workflow with code snippets and screenshots.

Python · Web Scraping · XPath
5 min read
Python Crawling & Data Mining
May 28, 2020 · Backend Development

Multithreaded Python Crawl of Xiaomi App Store Games

This tutorial demonstrates how to use the requests library together with Python's threading and queue modules to build a multithreaded crawler that extracts game names and download links from the Xiaomi App Store and reports the total execution time, complete with code examples and performance tips.

Python · Web Scraping · Xiaomi App Store
7 min read
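The thread-plus-queue pattern from the Xiaomi crawler can be sketched as follows. Worker threads pull page numbers from a shared queue until it is empty, append results under a lock, and the main thread times the whole run. `fetch_page` is a hypothetical stand-in for the real `requests.get(...)` call, and its fields are illustrative only.

```python
import queue
import threading
import time
from typing import Dict, List


def fetch_page(page: int) -> List[Dict[str, str]]:
    """Hypothetical stand-in for requests.get(...).json() on one listing page."""
    return [
        {"name": f"game-{page}-{i}", "url": f"https://example.com/apk/{page}/{i}"}
        for i in range(2)
    ]


def worker(tasks: queue.Queue, results: list, lock: threading.Lock) -> None:
    """Pull page numbers until the queue is empty."""
    while True:
        try:
            page = tasks.get_nowait()
        except queue.Empty:
            return  # no pages left, so this thread exits
        items = fetch_page(page)
        with lock:  # guard the shared results list
            results.extend(items)


def crawl(pages: int, n_threads: int = 4) -> List[Dict[str, str]]:
    tasks: queue.Queue = queue.Queue()
    for page in range(1, pages + 1):
        tasks.put(page)  # every task is queued before any thread starts
    results: List[Dict[str, str]] = []
    lock = threading.Lock()
    threads = [
        threading.Thread(target=worker, args=(tasks, results, lock))
        for _ in range(n_threads)
    ]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # wait for every worker before reporting timing
    print(f"{len(results)} games in {time.perf_counter() - start:.2f}s")
    return results


if __name__ == "__main__":
    crawl(pages=5)
```

All pages are enqueued up front, so `queue.Empty` reliably signals that a worker can stop. A producer that keeps adding pages would need sentinel values or `Queue.join()` instead.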
Liangxu Linux
May 26, 2020 · Frontend Development

How to Bypass Copy Restrictions and Extract Text from Web Pages

This guide explains several techniques—including using browser developer tools, console commands, and a Windows utility—to copy protected text from websites and download documents like Baidu Docs, while noting their limitations and required steps.

Baidu Docs · Web Scraping · browser devtools
6 min read
Full-Stack Internet Architecture
Apr 26, 2020 · Backend Development

Scrapy Tutorial: Installation, Components, Project Setup, Code Implementation, and Data Storage

This article provides a comprehensive step‑by‑step guide to installing Scrapy, understanding its core components and processing flow, creating a weather‑data crawling project, writing items, settings, middlewares, spiders, running the crawler, exporting results, and storing the scraped data into MongoDB.

Crawler · MongoDB · Python
15 min read
Python Programming Learning Circle
Mar 27, 2020 · Backend Development

Accessing Login‑Protected Pages with Cookies, urllib, requests, and Selenium in Python

This guide explains four practical methods—using a known cookie, simulating login with urllib or requests, maintaining a session, and employing a headless Selenium browser—to programmatically retrieve pages that require user authentication, complete with step‑by‑step instructions and code examples.

HTTP · Selenium · Web Scraping
14 min read
TAL Education Technology
Mar 6, 2020 · Backend Development

DIY Technical News Acquisition: Framework, Practices, and Code Samples

This article explains why personalized tech‑news gathering is valuable, proposes a DIY framework for controlling sources, collection, filtering, reading experience and iteration, and demonstrates three concrete Node.js scraping examples—HTML pages, API data, and WeChat public accounts—plus extended thoughts on building a simple product.

Node.js · Puppeteer · Web Scraping
17 min read
Python Programming Learning Circle
Feb 24, 2020 · Fundamentals

Beginner's Python Web Crawler for High‑Resolution Game Skins and Hero Background Stories

This tutorial introduces a lightweight Python web crawler that, without relying on third‑party scraping frameworks, fetches high‑resolution skin images and all hero background stories from a popular game, explaining required libraries, implementation steps, and showing the resulting outputs.

Web Scraping · beginner tutorial · game data
3 min read
Python Programming Learning Circle
Feb 21, 2020 · Backend Development

Introduction to Python Web Scraping: Basics, HTTP/HTTPS, Requests Library, Proxies, and Data Extraction

This article provides a comprehensive introduction to Python web scraping, covering the fundamental concepts of spiders, HTTP/HTTPS protocols, the Requests library usage, custom headers, proxies, cookies, and various data extraction techniques such as JSON parsing, XPath, and regular expressions.

Data Extraction · HTTP · Web Scraping
9 min read
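The introduction covers custom headers, proxies, cookies, and JSON and regex extraction with the requests library. To run without third-party installs, the sketch below swaps in the standard library's `urllib.request`. The header values, the proxy address, and the JSON layout (`items`, `title`, `price`) are all hypothetical placeholders.

```python
import json
import re
import urllib.request

# Custom headers; the User-Agent string and cookie value are placeholders.
HEADERS = {
    "User-Agent": "Mozilla/5.0 (compatible; example-crawler/0.1)",
    "Cookie": "sessionid=placeholder",
}
# Hypothetical local proxy; replace it with a real one or drop the ProxyHandler.
PROXIES = {"http": "http://127.0.0.1:8080", "https": "http://127.0.0.1:8080"}


def build_request(url: str) -> urllib.request.Request:
    """Attach the custom headers to a GET request."""
    return urllib.request.Request(url, headers=HEADERS)


def build_opener() -> urllib.request.OpenerDirector:
    """Route requests through the proxies above."""
    return urllib.request.build_opener(urllib.request.ProxyHandler(PROXIES))


def fetch(url: str) -> str:
    """Perform the request (needs network access and a working proxy)."""
    with build_opener().open(build_request(url), timeout=10) as resp:
        return resp.read().decode("utf-8")


PRICE_RE = re.compile(r"\d+(?:\.\d+)?")


def extract(body: str) -> list:
    """Parse a JSON body and pull the numeric part out of each price string."""
    data = json.loads(body)
    results = []
    for item in data["items"]:
        match = PRICE_RE.search(item["price"])
        results.append((item["title"], float(match.group()) if match else None))
    return results


SAMPLE = '{"items": [{"title": "Book A", "price": "¥12.50"}, {"title": "Book B", "price": "n/a"}]}'

if __name__ == "__main__":
    print(extract(SAMPLE))
```

With requests, the equivalent call would pass `headers=`, `proxies=`, and `cookies=` directly to `requests.get`. The split into build, fetch, and extract steps stays the same.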
21CTO
Feb 10, 2020 · Backend Development

Top 8 PHP Libraries for Efficient Web Scraping

This article reviews eight PHP web‑scraping libraries—Goutte, Simple HTML DOM, htmlSQL, cURL, Request, HTTPful, Buzz, and Guzzle—detailing their features, requirements, licensing, and documentation to help developers choose the right tool for their backend data‑extraction projects.

Backend · Goutte · Guzzle
9 min read
MaGe Linux Operations
Jan 3, 2020 · Backend Development

Master Web Scraping with Scrapy: A Complete Python Guide

This guide introduces Scrapy, a powerful Python web‑scraping framework, explains its architecture and components, walks through installation, project creation, spider development, query syntax, recursive crawling, and item pipelines, providing practical code examples for building robust crawlers.

Crawler · Data Extraction · Python
8 min read
Liangxu Linux
Dec 25, 2019 · Backend Development

How Python Bots Beat 12306 Ticket Crashes: Open‑Source Tools & Features

When the Chinese railway ticketing system 12306 crashes under heavy load, developers turn to open‑source Python bots that simulate user behavior, query seat availability, and automate order submission, with detailed feature lists, repository links, and real‑world log examples.

12306 · Automation · Backend
9 min read
21CTO
Dec 3, 2019 · Information Security

When Is Web Scraping Legal? A Developer’s Guide to Chinese Cyber Laws

This article explains the legal boundaries of web crawling in China, covering recent cybersecurity regulations, what makes a crawler illegal or legal, common developer questions, and practical advice to avoid personal‑data violations and criminal liability.

Chinese law · Legal Compliance · Web Scraping
10 min read
FunTester
Nov 14, 2019 · Backend Development

Web Scraping CBA Match Data with Java: Methodology and Full Code Example

This article explains how to scrape Chinese Basketball Association (CBA) match data from a portal website, analyzes the page structure, extracts table rows using regular expressions, converts them to CSV format, and provides a complete Java/Groovy code example for automated data collection.

CBA · CSV · Java
8 min read
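The CBA article implements row extraction and CSV conversion in Java/Groovy. The sketch below shows the same regex-rows-to-CSV idea in Python, for consistency with the other examples on this page. The table markup and the four column names are hypothetical, and this kind of regex parsing only holds up for simple, regular tables.

```python
import csv
import io
import re

# Hypothetical match table; the portal's real markup is not shown in the summary.
SAMPLE_TABLE = """
<table>
<tr><td>2019-11-01</td><td>Team A</td><td>98-95</td><td>Team B</td></tr>
<tr><td>2019-11-02</td><td>Team C</td><td>101-88</td><td>Team D</td></tr>
</table>
"""

ROW_RE = re.compile(r"<tr>(.*?)</tr>", re.S)   # one match per table row
CELL_RE = re.compile(r"<td>(.*?)</td>", re.S)  # one match per cell in a row


def rows_to_csv(html: str) -> str:
    """Convert every <tr> with <td> cells into one CSV line."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["date", "home", "score", "away"])  # assumed column names
    for row in ROW_RE.findall(html):
        cells = [c.strip() for c in CELL_RE.findall(row)]
        if cells:  # skip header rows that use <th> instead of <td>
            writer.writerow(cells)
    return buf.getvalue()


if __name__ == "__main__":
    print(rows_to_csv(SAMPLE_TABLE), end="")
```

The `csv` module handles quoting. A team name that contained a comma would therefore still produce a valid CSV line.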
Python Programming Learning Circle
Nov 10, 2019 · Fundamentals

7 Fun Python Projects You Can Build in Minutes

This article presents seven practical Python scripts, from scraping Zhihu images and building chatbots to poetry author detection, lottery number generation, auto-drafting apologies, screen recording, and GIF creation, showing how to automate diverse tasks quickly without reinventing the wheel.

AI · Automation · Code Examples
9 min read
FunTester
Oct 22, 2019 · Backend Development

How to Scrape 7.2 Million Historical Weather Records with Groovy

This article explains how to use a Groovy script to crawl over 7 million historical weather entries for 3,200 cities spanning 2011‑2019, process the JSON responses, and store the cleaned data into a MySQL table, while sharing practical tips and code snippets.

Groovy · Java · Weather Data
7 min read
MaGe Linux Operations
Oct 5, 2019 · Fundamentals

How to Scrape and Analyze Holiday Tourist Spot Data with Python

This tutorial walks you through using Python to collect tourism data from Qunar, extract key fields such as name, price, and rating, store the results in Excel with pandas, and visualize sales and popularity trends using pyecharts, including a simple recommendation algorithm.

Pyecharts · Python · Tourism
8 min read
Efficient Ops
Sep 29, 2019 · Backend Development

How to Scrape and Visualize 6,000+ Chinese Tourist Spots with Selenium and Python

This article demonstrates how to use Selenium and Python to crawl over 6,000 Chinese tourist attractions from Qunar, extract ratings, popularity and sales data, and visualize the results with pandas, seaborn, matplotlib, and pyecharts, revealing the most visited sites and regional travel trends during the 2019 National Day holiday.

Data visualization · Python · Selenium
9 min read
21CTO
Sep 28, 2019 · Backend Development

Cracking Dazhong Dianping’s CSS Encryption: A Step‑by‑Step Web Scraping Guide

This article walks through the challenges of scraping Dazhong Dianping, explains how the site hides numeric data with custom CSS fonts, and provides a complete Python workflow—including HTTP requests, font extraction, glyph rendering, and OCR—to decode and retrieve the protected information.

CSS encryption · OCR · Python
13 min read
FunTester
Sep 17, 2019 · Backend Development

Building a Multithreaded Java Web Scraper to Harvest 100k Records

After finding an unprotected API that allowed unlimited resource access, the author wrote a rough Java program that uses a fixed-size thread pool and a CountDownLatch to fetch 100,000 items in parallel, with each thread retrieving 10,000 records via HTTP GET requests.

HTTP · Java · Web Scraping
6 min read
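The article's program is Java, using a fixed thread pool plus a CountDownLatch. The rough Python analogue below is a sketch, not a port. It uses `concurrent.futures`, where `ThreadPoolExecutor` is the fixed-size pool and `wait()` plays the latch's role of blocking until every task has finished. `fetch_range` is a hypothetical stand-in for the HTTP GET calls and returns fake record IDs.

```python
from concurrent.futures import ThreadPoolExecutor, wait
from typing import List

TOTAL = 100_000    # items to harvest, per the summary
PER_TASK = 10_000  # records fetched by each task, per the summary


def fetch_range(start: int, count: int) -> List[int]:
    """Hypothetical stand-in for the HTTP GET calls; returns fake record IDs."""
    return list(range(start, start + count))


def harvest(total: int = TOTAL, per_task: int = PER_TASK, workers: int = 10) -> List[int]:
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [
            pool.submit(fetch_range, start, min(per_task, total - start))
            for start in range(0, total, per_task)
        ]
        wait(futures)  # block until all tasks finish, like CountDownLatch.await()
    records: List[int] = []
    for future in futures:  # keep submission order; result() re-raises task errors
        records.extend(future.result())
    return records


if __name__ == "__main__":
    print(len(harvest()))
```

Leaving the `with` block would also wait for the pool. The explicit `wait()` is kept only to mirror the latch in the original design.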
FunTester
Sep 12, 2019 · Backend Development

Scraping HTML Tables with Java Regex and Generating SQL Inserts

The article walks through a Java solution for extracting multilingual data from an HTML table using regular expressions, handling spacing and encoding issues, splitting fields, and constructing INSERT statements to populate a country_code database table.

Backend · Data Extraction · Java
6 min read
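The article cleans table cells and then builds INSERT statement strings in Java. This Python sketch makes one deliberate change: it uses parameterized `executemany()` against an in-memory SQLite database, which avoids quoting problems with multilingual text. Only the `country_code` table name comes from the summary. The column names and sample rows are hypothetical.

```python
import html
import re
import sqlite3
from typing import List, Tuple

# Hypothetical rows with stray spaces and an HTML entity to clean up.
SAMPLE = """
<tr><td> CN </td><td>China</td><td>中国</td></tr>
<tr><td>FR</td><td>France&nbsp;</td><td>法国</td></tr>
"""

ROW_RE = re.compile(r"<tr>(.*?)</tr>", re.S)
CELL_RE = re.compile(r"<td>(.*?)</td>", re.S)


def clean(cell: str) -> str:
    """Decode entities, turn non-breaking spaces into spaces, collapse whitespace."""
    text = html.unescape(cell).replace("\xa0", " ")
    return re.sub(r"\s+", " ", text).strip()


def parse_rows(markup: str) -> List[Tuple[str, ...]]:
    rows = []
    for row in ROW_RE.findall(markup):
        cells = [clean(c) for c in CELL_RE.findall(row)]
        if len(cells) == 3:  # skip malformed rows instead of misaligning columns
            rows.append(tuple(cells))
    return rows


def load(rows: List[Tuple[str, ...]]) -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE country_code (code TEXT, name_en TEXT, name_zh TEXT)")
    conn.executemany("INSERT INTO country_code VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


if __name__ == "__main__":
    conn = load(parse_rows(SAMPLE))
    print(conn.execute("SELECT * FROM country_code").fetchall())
```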
MaGe Linux Operations
Jul 19, 2019 · Backend Development

How to Scrape High‑Resolution Images from ColorHub with Python

Learn a step‑by‑step Python solution to locate, download, and store high‑resolution, royalty‑free images from ColorHub by navigating its three‑tier page structure, generating request headers, parsing HTML with BeautifulSoup, and saving files locally, enabling offline PPT creation without copyright concerns.

Automation · Image Download · Python
5 min read
MaGe Linux Operations
Jul 2, 2019 · Backend Development

Master Web Scraping with BeautifulSoup: A Complete Python Guide

This tutorial introduces BeautifulSoup, a powerful Python library for parsing HTML and XML, covering installation, basic usage, tag selection, attribute extraction, navigation of parent and sibling nodes, method and CSS selectors, and best‑practice recommendations for efficient web data extraction.

Data Extraction · Python · Web Scraping
30 min read
Youzan Coder
Jun 5, 2019 · Backend Development

Building a Poster Rendering Service with Puppeteer

The article explains how to build a poster‑rendering service with Puppeteer, detailing its advantages over canvas, the Redis‑based caching and CDN workflow, optimization tricks for headless Chromium, and future plans to boost QPS and pre‑generate popular posters.

CDN · Canvas API · Puppeteer
9 min read
360 Tech Engineering
May 20, 2019 · Fundamentals

A Data‑Driven Guide to Finding a Partner: From Crawling Zhihu Answers to Ranking Candidates

This article walks through a complete data‑analysis workflow—scraping Zhihu dating‑preference answers, cleaning and filtering the data, deriving gender and activity metrics, designing a four‑step screening process, and finally ranking candidates with a custom like‑to‑comment index—to help a single programmer create a concise, high‑quality list of potential partners.

Metrics · Web Scraping · data analysis
9 min read
MaGe Linux Operations
May 6, 2019 · Big Data

How to Scrape Python Job Listings and Visualize Trends with pyecharts

This article walks through collecting Python job postings from Lagou by handling anti‑scraping measures, parsing POST requests, storing results in Excel, and then using pyecharts to create bar, map, and pie visualizations that reveal city distribution, salary ranges, and experience requirements.

Pyecharts · Python · Web Scraping
13 min read
Tencent Cloud Developer
Mar 26, 2019 · Mobile Development

Building a WeChat Mini Program with Taro and Cloud Development: A Japanese Sentence Helper Case Study

The article explains how to create a WeChat Mini Program backend with Tencent Cloud development, use the React‑based Taro framework to build a Japanese sentence helper, consolidate multiple cloud functions via tcb-router, and scrape example sentences with superagent and cheerio, highlighting setup tips and known limitations.

React · Superagent · Taro
7 min read
Efficient Ops
Jan 21, 2019 · Big Data

Scraping and Visualizing China’s Tourist Spot Data: From Web Crawl to Insights

This article details a complete workflow for extracting nationwide tourist attraction data from Qunar, cleaning and enriching it with geographic coordinates, and performing multi‑level statistical analysis and visualizations—including sales rankings, popularity metrics, heatmaps, and word clouds—to reveal regional tourism patterns across China.

Data visualization · Geocoding · Tourism Data
15 min read
MaGe Linux Operations
Jan 14, 2019 · Backend Development

How to Build a Scrapy Spider to Crawl Autohome Car Data in Python

This article walks through building a Python Scrapy spider to extract comprehensive car brand, series, and model data from Autohome, covering environment setup, project initialization, spider and item definitions, handling lazy-loaded pages, CSV output configuration, rate limiting, user‑agent rotation, and debugging tips.

Autohome · Car Data · Scrapy
10 min read
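The CSV output and rate-limiting parts of the Autohome spider map onto Scrapy settings. The `settings.py` fragment below assumes Scrapy 2.1 or later, which introduced the `FEEDS` setting; older releases, including those current when the article was written, used `FEED_FORMAT` and `FEED_URI` instead. The file name `cars.csv` and the numeric values are illustrative, not taken from the article. User-agent rotation needs a custom downloader middleware and is not shown.

```python
# settings.py (fragment): illustrative values, not the article's configuration.

ROBOTSTXT_OBEY = True

# Rate limiting: a base delay, randomized per request, plus adaptive throttling.
DOWNLOAD_DELAY = 1.0
RANDOMIZE_DOWNLOAD_DELAY = True
AUTOTHROTTLE_ENABLED = True
CONCURRENT_REQUESTS_PER_DOMAIN = 2

# Export every scraped item to a UTF-8 CSV file (Scrapy 2.1+ FEEDS syntax).
FEEDS = {
    "cars.csv": {"format": "csv", "encoding": "utf-8"},
}
```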
Beike Product & Technology
Oct 12, 2018 · Fundamentals

Headless Browser Automation: Selenium vs Puppeteer

This article explores headless browser automation technologies including Selenium, PhantomJS, Puppeteer, and Headless Chrome, comparing their architectures, use cases, and implementation differences.

Automated Testing · Browser Automation · Chrome DevTools Protocol
9 min read