Tagged articles
143 articles
Python Programming Learning Circle
Feb 21, 2020 · Backend Development

Introduction to Python Web Scraping: Basics, HTTP/HTTPS, Requests Library, Proxies, and Data Extraction

This article provides a comprehensive introduction to Python web scraping, covering the fundamental concepts of spiders, HTTP/HTTPS protocols, the Requests library usage, custom headers, proxies, cookies, and various data extraction techniques such as JSON parsing, XPath, and regular expressions.

Data Extraction · HTTP · Web Scraping
0 likes · 9 min read
21CTO
Feb 19, 2020 · Backend Development

How to Use Python to Capture Chrome History and Email It Automatically

This tutorial shows how to extract a target computer's Chrome browsing history with Python, save it as a text file, and automatically send it via email using a QQ SMTP account, detailing the required environment, scripts, and common pitfalls.

Chrome History · Data Extraction · Python
0 likes · 3 min read
MaGe Linux Operations
Jan 3, 2020 · Backend Development

Master Web Scraping with Scrapy: A Complete Python Guide

This guide introduces Scrapy, a powerful Python web‑scraping framework, explains its architecture and components, walks through installation, project creation, spider development, query syntax, recursive crawling, and item pipelines, providing practical code examples for building robust crawlers.

Crawler · Data Extraction · Python
0 likes · 8 min read
MaGe Linux Operations
Dec 27, 2019 · Backend Development

Master Scrapy: Build Powerful Python Web Crawlers Step‑by‑Step

This guide introduces the Scrapy framework, explains its architecture—including engine, scheduler, downloader, spiders, pipelines, and middlewares—covers installation, project setup, item definition, spider coding, pipeline handling, pagination, and provides practical code examples for extracting data from Douban books.

Data Extraction · Item Pipeline · Python
0 likes · 18 min read
Programmer DD
Dec 7, 2019 · Backend Development

Why Choose Java Over Python for Web Crawling? A Practical Guide

The article shares the author's journey from manual data collection to mastering Java web crawlers, explains why Java is preferred over Python, outlines the five-step crawling workflow, covers essential Java basics, HTTP fundamentals, and provides code examples for URL queuing, time parsing, and timestamp conversion.

Backend Development · Data Extraction · HTTP
0 likes · 12 min read
FunTester
Oct 19, 2019 · Backend Development

Building a Fast Historical‑Today Crawler with Java and MySQL

An open‑source Java crawler that fetches historical‑today events from a public API is presented, detailing three practical challenges—GET request length limits, ambiguous JSON value types, and month string construction—along with a full code example and a GitHub repository link for reference.

Data Extraction · GitHub · HTTP
0 likes · 5 min read
FunTester
Oct 9, 2019 · Backend Development

How to Build a Java/Groovy Web Crawler with Regex and MySQL Storage

This article demonstrates a Java‑based web crawler written in Groovy that uses regular‑expression parsing to retrieve paginated company data from a government portal, constructs SQL insert statements, and stores the results in MySQL, with full source code and structural screenshots.

Data Extraction · Groovy · Java
0 likes · 6 min read
FunTester
Sep 15, 2019 · Backend Development

How to Build a Java HttpClient Spider for Scraping Movie Details and Download Links

This article explains how to update and use a Java HttpClient‑based spider that removes duplicate links, handles legacy page formats, extracts movie metadata and download URLs (magnet, ed2k, Baidu Pan), and stores the results in a MySQL database, with complete source code examples.

Data Extraction · HttpClient · Java
0 likes · 12 min read
FunTester
Sep 12, 2019 · Backend Development

Scraping HTML Tables with Java Regex and Generating SQL Inserts

The article walks through a Java solution for extracting multilingual data from an HTML table using regular expressions, handling spacing and encoding issues, splitting fields, and constructing INSERT statements to populate a country_code database table.

Backend · Data Extraction · Java
0 likes · 6 min read
MaGe Linux Operations
Jul 2, 2019 · Backend Development

Master Web Scraping with BeautifulSoup: A Complete Python Guide

This tutorial introduces BeautifulSoup, a powerful Python library for parsing HTML and XML, covering installation, basic usage, tag selection, attribute extraction, navigation of parent and sibling nodes, method and CSS selectors, and best‑practice recommendations for efficient web data extraction.

Data Extraction · Python · Web Scraping
0 likes · 30 min read
MaGe Linux Operations
Mar 18, 2019 · Backend Development

Master Web Scraping with Beautiful Soup: A Hands‑On Python Guide

Learn how to install and use Beautiful Soup 4 in Python to parse HTML, navigate the document tree, access tags, attributes, and text, and perform powerful searches with methods like find_all, CSS selectors, and traversal techniques for effective web scraping.

BeautifulSoup4 · Data Extraction · Python
0 likes · 12 min read
MaGe Linux Operations
Feb 3, 2019 · Backend Development

Master Web Scraping with Beautiful Soup: A Step‑by‑Step Python Guide

This article introduces Beautiful Soup, a Python library for parsing HTML/XML into a navigable tree, covering installation, object initialization, tag navigation, attribute handling, searching techniques like find_all, CSS selectors, and practical code examples for effective web data extraction.

Data Extraction · Python · beautifulsoup
0 likes · 13 min read
Python Crawling & Data Mining
Jan 25, 2019 · Backend Development

Master Web Crawlers: How Python Scrapes the Web Efficiently

As the volume of online information grows, manual data collection no longer keeps up. This article introduces Python web crawlers, which fetch pages by URL using libraries such as urllib, urllib2, and re, and notes how crawler frameworks make web data extraction faster, more accurate, and more automated.

Data Extraction · data mining · web scraper
0 likes · 5 min read
Python Crawling & Data Mining
Oct 1, 2018 · Fundamentals

Master Python Regex for Web Crawling: Quick Guide to ^, ., and *

This article explains why regular expressions are essential for Python web crawling, introduces the special characters ^, ., and *, and demonstrates their use with clear code examples and output screenshots, helping readers quickly grasp regex fundamentals for extracting patterns from HTML content.

Data Extraction · Tutorial · regex
0 likes · 5 min read
21CTO
Sep 7, 2018 · Backend Development

Why Scaling Web Crawlers Is Harder Than You Think: Lessons from 1,000B Pages

This article outlines the major challenges of large‑scale e‑commerce product data extraction—such as ever‑changing site formats, scalable architecture, performance throughput, anti‑bot defenses, and data quality—and shares the hard‑won lessons Scrapinghub gained after crawling over a trillion product pages.

Data Extraction · Scale · Scrapy
0 likes · 15 min read
MaGe Linux Operations
Jul 28, 2018 · Backend Development

Master Web Scraping with Beautiful Soup: A Hands‑On Python Guide

This article introduces Beautiful Soup, a Python library for parsing HTML/XML into a navigable tree, covering installation, object initialization, tag and attribute access, tree traversal, searching techniques like find_all, find, CSS selectors, and practical code examples.

Data Extraction · Web Scraping · beautifulsoup
0 likes · 11 min read
MaGe Linux Operations
Apr 12, 2018 · Operations

Master Awk: Unlock Powerful Text Processing on the Command Line

This comprehensive guide explains Awk’s role as a versatile, stream‑oriented text processor and pattern‑matching language, covering its command‑line syntax, records and fields, scripts, patterns, regular expressions, operators, statements, built‑in variables, functions, I/O handling, and practical code examples for Linux users.

Data Extraction · Linux · Shell scripting
0 likes · 28 min read
MaGe Linux Operations
Mar 5, 2018 · Fundamentals

Master AWK: Powerful Text Processing Techniques and Real-World Examples

This article introduces AWK—a versatile text‑analysis language—explaining its origins, core concepts, command‑line usage, built‑in variables, printing functions, programming constructs, conditionals, loops, and associative arrays, and provides practical Linux examples for extracting and summarizing data.

Data Extraction · Shell scripting · awk
0 likes · 12 min read
MaGe Linux Operations
Nov 20, 2017 · Backend Development

Mastering Web Crawlers: Core Principles, Architecture, and Modern Challenges

This article explains how web crawlers work—from initial URL seeding and request handling to flow control, content extraction, and handling dynamic pages—while covering essential modules, HTTP details, common obstacles like JavaScript rendering, anti‑scraping measures, and strategies for large‑scale, distributed crawling.

Data Extraction · Distributed Systems · HTTP
0 likes · 14 min read
MaGe Linux Operations
Sep 18, 2017 · Backend Development

How to Scrape Meituan Waimai App Comments via AJAX and JavaScript

This guide walks you through analyzing the Meituan Waimai app's comment section, uncovering the AJAX requests that load data, constructing the proper URL pattern, looping through pages to fetch JSON, and storing the results in files or a database for further analysis.

Data Extraction · JavaScript · Meituan
0 likes · 5 min read
MaGe Linux Operations
May 11, 2017 · Backend Development

How to Scrape and Export Your Sina Weibo Favorites with Python

This tutorial explains how to use Python to log into Sina Weibo, download your favorite posts, comments, images, and video links, save the raw HTML, and outlines the steps and script needed to automate the extraction and later processing of the collected data.

Data Extraction · Python · Weibo
0 likes · 3 min read
MaGe Linux Operations
Mar 28, 2017 · Backend Development

Master Scrapy: Step-by-Step Guide to Install the Powerful Python Web Crawler

This article walks you through the complete installation process for Scrapy, the Python-based web crawling framework, covering prerequisite Python setup, required dependencies such as lxml, setuptools, zope.interface, Twisted, pyOpenSSL, and pywin32, and verification of the installation, preparing you for large‑scale data extraction tasks.

Data Extraction · Installation · Python
0 likes · 4 min read
21CTO
Nov 20, 2016 · Backend Development

Mastering Web Crawlers: Strategies, Tools, and Practical Code Samples

This article explores the fundamentals and advanced techniques of building web crawlers, covering crawler types, essential features, RSS/ATOM harvesting, custom scraping methods, PHP header manipulation, regex extraction, and concurrency, providing actionable code examples for backend developers.

Backend Development · Data Extraction · RSS
0 likes · 9 min read
21CTO
Nov 9, 2016 · Backend Development

Unlocking the Power of Web Crawlers: How to Harvest Data Efficiently

This article explains what web crawlers are, why they’re essential for content recommendation systems, the technical approaches across languages, practical use‑cases like price monitoring and news aggregation, and best practices for building efficient, ethical crawlers.

Backend Development · Data Extraction · Web Crawling
0 likes · 5 min read
21CTO
Dec 22, 2015 · Big Data

How to Build a Scalable Distributed Web Crawler for Massive Data Harvesting

This article explains how to design and implement a distributed web‑crawling framework in Java that can collect, structure, and store massive amounts of data while handling anti‑scraping measures, duplicate detection, and real‑time monitoring.

Big Data · Data Extraction · Java
0 likes · 11 min read
21CTO
Oct 9, 2015 · Big Data

33 Open-Source Web Crawlers to Supercharge Your Data Collection

This article compiles 33 notable open‑source web crawler projects across multiple programming languages, detailing their core features, licensing, supported platforms, and typical use cases, helping developers choose the right tool for large‑scale data harvesting and analysis.

C · C++ · Data Extraction
0 likes · 22 min read
Qunar Tech Salon
Nov 11, 2014 · Fundamentals

A One‑Minute Guide to AWK: Basics, Syntax, and Common Use Cases

This article provides a concise, one‑minute introduction to AWK, covering its origin, line‑by‑line processing principle, basic syntax of pattern‑action, built‑in variables and functions, operators, control structures, and how to interact with the shell, illustrated with practical command‑line examples.

Data Extraction · Shell scripting · awk
0 likes · 6 min read