Tagged articles

html-parsing

33 articles

Jun 4, 2025 · Big Data

How to Master Python Web Scraping with Pandas: From HTML to CSV in Minutes

This article walks through using Pandas to directly read HTML pages, extract table data, handle AJAX‑loaded JSON and CSV formats, and save results, providing concise code examples and visual steps for effective Python web scraping and data mining.

Pandas · Python · Web Scraping

0 likes · 4 min read

Spring Full-Stack Practical Cases

Apr 25, 2025 · Backend Development

Master jsoup: Real‑World Spring Boot 3 Examples for HTML Parsing

This tutorial walks through practical jsoup usage within Spring Boot 3, covering dependency setup, parsing HTML from strings, fragments, URLs or files, extracting titles, links, images, applying CSS selectors, modifying elements, and sanitizing content to prevent XSS attacks.

Java · Spring Boot · Web Scraping

0 likes · 10 min read

Python Crawling & Data Mining

Apr 24, 2025 · Backend Development

How to Skip Table Headers with XPath in Python Web Scraping

This article explains how to use XPath in Python to skip the first table header row during web scraping, provides a concise code example, and discusses alternative approaches, helping readers efficiently extract desired list items from HTML structures.

Backend Development · Python · Web Scraping

0 likes · 3 min read

Code Mala Tang

Apr 19, 2025 · Fundamentals

Master HTML Parsing in Python: BeautifulSoup, lxml, and html.parser Compared

Learn why HTML parsing is essential for web scraping, explore three popular Python libraries—BeautifulSoup, lxml, and the built‑in html.parser—covering installation, core usage, advanced techniques, and a comparative analysis to help you choose the right tool for your project.

Python · beautifulsoup · html-parsing

0 likes · 11 min read
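
As a taste of the third library named in the summary above, here is a minimal sketch (not taken from the article) of link extraction using only the standard-library `html.parser` module. The class name `LinkCollector`, the helper `extract_links`, and the sample markup are hypothetical and exist only for illustration.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects (href, text) pairs for every <a href=...> element."""

    def __init__(self):
        super().__init__()
        self.links = []     # finished (href, text) pairs
        self._href = None   # href of the <a> currently open, if any
        self._text = []     # text chunks seen inside that <a>

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) tuples
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only keep text while an <a href=...> is open
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_links(html):
    """Hypothetical helper: feed a string and return the collected links."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


sample = '<ul><li><a href="/a">First</a></li><li><a href="/b"> Second <b>item</b></a></li></ul>'
print(extract_links(sample))
# [('/a', 'First'), ('/b', 'Second item')]
```

The event-driven style needs no installation but leaves state tracking to the caller, which is the usual trade-off against tree-building libraries such as BeautifulSoup and lxml.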

Python Crawling & Data Mining

Apr 1, 2025 · Backend Development

How to Scrape NetEase Cloud Music Hot Tracks with Python and XPath

Learn how to extract song names and URLs from NetEase Cloud Music's hot tracks page using Python's requests library and XPath selectors, including handling malformed HTML, code examples, and tips for replacing interfering tags to ensure reliable scraping.

Python · Web Scraping · XPath

0 likes · 5 min read

Python Programming Learning Circle

Dec 28, 2024 · Backend Development

Getting Started with requests-html: Installation, Basic Usage, Advanced Features, and Web Scraping Examples

This article introduces the Python requests-html library, covering installation, basic operations such as fetching pages and extracting links and elements, and advanced capabilities like JavaScript rendering, pagination, and custom requests. It also provides practical web‑scraping examples for sites such as Jianshu and Tianya.

Automation · html-parsing · requests-html

0 likes · 16 min read

The Dominant Programmer

Mar 30, 2024 · Backend Development

Scrape Web Pages with Jsoup and Export to Excel Using EasyExcel in Java

This article demonstrates how to use Jsoup to fetch and parse HTML content from a web page, extract specific table data via CSS selectors, map the data to Java objects, and efficiently write the results to an Excel file with EasyExcel, highlighting memory advantages over POI.

EasyExcel · Excel Export · Web Scraping

0 likes · 9 min read

Open Source Linux

Nov 29, 2023 · Frontend Development

What Really Happens When You Click a Link? A Step‑by‑Step Journey from URL to Rendered Page

This article walks through the entire browser request lifecycle—from entering a URL, resolving the IP via caches and DNS, establishing a secure HTTPS connection, to the server’s response and the browser’s parsing of HTML, CSS, and JavaScript into a rendered page.

HTTPS · Web Browsing · browser rendering

0 likes · 3 min read

Java High-Performance Architecture

Jun 13, 2023 · Backend Development

How to Scrape China’s County GDP Rankings with Java Jsoup and EasyExcel

This tutorial explains how to collect 2022 county‑level GDP and public budget data from the Chinese National Bureau of Statistics using Java's Jsoup library and transform the HTML tables into structured Excel files with EasyExcel. It provides complete source code and step‑by‑step analysis.

EasyExcel · Java · Web Scraping

0 likes · 12 min read

Python Crawling & Data Mining

Sep 23, 2022 · Backend Development

How to Fix Common Python Web Scraping Errors and Extract Rankings

This article walks through a Python web‑crawling issue, explains typical errors like missing data and string comparison pitfalls, and provides a clear BeautifulSoup code example that extracts ranking information, helping readers resolve similar scraping problems.

Error handling · beautifulsoup · html-parsing

0 likes · 3 min read

Python Programming Learning Circle

Aug 10, 2022 · Backend Development

Python Web Scraping Tutorial: Using requests and BeautifulSoup to Extract Weather Data

This article demonstrates how to use Python's requests library and BeautifulSoup to inspect webpage source, set request headers, fetch weather page HTML, parse it with CSS selectors, extract daytime and nighttime temperatures, and extend the script to handle multiple cities, providing complete code examples.

html-parsing · web-scraping

0 likes · 7 min read

Python Crawling & Data Mining

Jun 8, 2022 · Backend Development

How to Skip Table Headers with XPath in Python Web Scraping

This article explains how to use XPath in Python to bypass the first table header row during web scraping, provides a concise code example, and walks through the implementation steps so readers can adapt the technique to similar crawling tasks.

Backend Development · XPath · html-parsing

0 likes · 3 min read

Sohu Tech Products

May 18, 2022 · Fundamentals

Overview of a Web Page Content Extraction Algorithm and Its Practical Demo

This article introduces a web page content extraction algorithm that automatically structures titles, timestamps, body text, authors, and sources from arbitrary news pages, explains how to use an online demo, compares it with existing solutions, and discusses its broader applications and limitations.

Content Extraction · GNE · Web Scraping

0 likes · 8 min read

Baidu Geek Talk

Mar 21, 2022 · Frontend Development

How WebKit Parses HTML: Decoding, Tokenization, and DOM Tree Construction

The article details WebKit’s rendering pipeline in WKWebView, describing how the network process streams HTML bytes to the rendering process, which decodes them via TextResourceDecoder, tokenizes the characters with HTMLTokenizer’s state machine, and constructs an efficient DOM tree using HTMLTreeBuilder and queued insertion tasks.

DOM · Tokenization · WebKit

0 likes · 33 min read

Python Programming Learning Circle

Mar 8, 2022 · Backend Development

XPath Basics and Web Scraping with Python lxml: Concepts, Syntax, and Practical Examples

This tutorial explains the fundamental concepts and parsing principles of XPath, shows how to set up the Python lxml environment, demonstrates instantiating etree objects, details XPath expression syntax, and provides multiple real‑world web‑scraping examples with complete code snippets.

Python · Web Scraping · XPath

0 likes · 9 min read

Baidu App Technology

Mar 7, 2022 · Mobile Development

How WKWebView Parses HTML: Decoding, Tokenization, and DOM Tree Construction

WKWebView parses HTML by streaming bytes from the network process to the rendering process, decoding them into characters, tokenizing into HTML tokens, building a DOM tree through node creation and insertion, and finally laying out and painting the document using a doubly‑linked in‑memory structure.

DOM · Tokenization · WKWebView

0 likes · 37 min read

Programmer DD

Dec 28, 2021 · Backend Development

Master Web Scraping with Java: Getting Started with Jsoup

This article introduces Jsoup, an open‑source Java library for extracting and manipulating HTML, explains its key features such as DOM traversal and CSS selectors, and provides a concise code example that fetches Wikipedia headlines, helping developers automate web data collection.

Java · Web Scraping · backend

0 likes · 3 min read

MaGe Linux Operations

Sep 2, 2021 · Backend Development

Build a Python Baidu Baike Crawler: Step-by-Step Guide

This article demonstrates how to create a Python web crawler that fetches Baidu Baike entries, covering the main program structure, URL manager, page downloader, HTML parser using BeautifulSoup, and output generator, with complete code snippets and sample results.

Python · Web Crawler · baidu-baike

0 likes · 8 min read

Python Programming Learning Circle

Apr 12, 2021 · Backend Development

Common Regular Expressions and Methods for Python Web Scraping

This article presents a practical collection of Python regular‑expression techniques for extracting HTML elements such as table rows, links, titles, images, and scripts, showing how to filter tags and handle URL parameters during web crawling.

Python · Web Scraping · data extraction

0 likes · 20 min read
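
The kind of patterns the summary above describes can be sketched as follows. This is an illustrative set written for this listing, not the article's own expressions: the pattern names, the `extract` and `query_params` helpers, and the sample page are all hypothetical. Regular expressions like these assume well-formed, quoted attributes and are best kept to quick one-off extraction.

```python
import re
from urllib.parse import parse_qs, urlsplit

# Illustrative patterns (assumptions, not the article's own expressions)
SCRIPT_RE = re.compile(r"<script\b[^>]*>.*?</script>", re.IGNORECASE | re.DOTALL)
TITLE_RE = re.compile(r"<title[^>]*>(.*?)</title>", re.IGNORECASE | re.DOTALL)
LINK_RE = re.compile(
    r'<a\b[^>]*?\bhref\s*=\s*["\']([^"\']*)["\'][^>]*>(.*?)</a>',
    re.IGNORECASE | re.DOTALL,
)
IMG_SRC_RE = re.compile(r'<img\b[^>]*?\bsrc\s*=\s*["\']([^"\']*)["\']', re.IGNORECASE)
TAG_RE = re.compile(r"<[^>]+>")


def strip_tags(fragment):
    """Remove any markup left inside a captured fragment."""
    return TAG_RE.sub("", fragment).strip()


def extract(html):
    """Return the title, links, and image sources found in an HTML string."""
    html = SCRIPT_RE.sub("", html)  # drop scripts so their contents are not matched
    title = TITLE_RE.search(html)
    return {
        "title": strip_tags(title.group(1)) if title else None,
        "links": [(href, strip_tags(text)) for href, text in LINK_RE.findall(html)],
        "images": IMG_SRC_RE.findall(html),
    }


def query_params(url):
    """Split a URL's query string into a dict of value lists."""
    return parse_qs(urlsplit(url).query)


page = (
    "<html><head><title> Demo </title>"
    "<script>var a = \"<a href='x'>no</a>\";</script></head>"
    '<body><a class="n" href="/p?id=1">One <b>bold</b></a>'
    '<img alt="" src="/i.png"></body></html>'
)
result = extract(page)
print(result["title"], result["links"], result["images"])
print(query_params(result["links"][0][0]))
```

Removing `<script>` blocks first keeps markup embedded in JavaScript strings from being picked up as links.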

Python Crawling & Data Mining

Apr 6, 2021 · Backend Development

Master BeautifulSoup: Quick Guide to Web Scraping with Python

This article introduces the BeautifulSoup library, explains how to install it, demonstrates core parsing methods such as find, find_all, select, and relationship navigation, and provides a complete example of scraping novel titles from Qidian using Python requests.

Python · Web Scraping · beautifulsoup

0 likes · 8 min read

MaGe Linux Operations

Sep 18, 2020 · Backend Development

Master Web Scraping with Python requests‑html: Install, Basics & Advanced Tips

This tutorial introduces Python's requests‑html library, covering installation, basic page fetching, link extraction, element selection with CSS and XPath, rendering JavaScript, pagination, direct HTML usage, custom request options, form login, and practical crawling examples.

Python · html-parsing · requests-html

0 likes · 17 min read

Python Crawling & Data Mining

Aug 13, 2020 · Backend Development

Why a Python HTML Extractor’s Cache Failed: Garbage Collection Got You

The article explains how using an element's string representation as a cache key caused duplicate extraction in a Python news‑page parser, reveals the role of Python's garbage collection and memory reuse, and shows how switching to XPath keys resolves the bug.

Bug Fix · Caching · Garbage Collection

0 likes · 7 min read
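
The fix described in the summary above, keying a cache on an element's position in the tree rather than on a representation that embeds a memory address, might look roughly like this. It is a sketch under assumptions, not the article's code: it uses the standard-library `xml.etree.ElementTree` and a hand-built positional path as a stand-in for an XPath, and `element_paths`, `text_length`, and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET


def element_paths(root):
    """Map every element to a positional path such as /html/body[1]/div[2]."""
    paths = {root: "/" + root.tag}
    stack = [root]
    while stack:
        parent = stack.pop()
        seen = {}  # how many children with each tag we have passed so far
        for child in parent:
            seen[child.tag] = seen.get(child.tag, 0) + 1
            paths[child] = f"{paths[parent]}/{child.tag}[{seen[child.tag]}]"
            stack.append(child)
    return paths


doc = ET.fromstring("<html><body><div>short</div><div>longer text</div></body></html>")
paths = element_paths(doc)

cache = {}


def text_length(element):
    """Hypothetical cached computation, keyed by the element's stable path."""
    key = paths[element]
    if key not in cache:
        cache[key] = len(element.text or "")
    return cache[key]


first, second = doc.find("body")
print(paths[first], text_length(first))
print(paths[second], text_length(second))
```

A key such as the default `repr()` of an element includes its memory address, and an interpreter may hand the same address to a new object once the old one has been collected. A path key depends only on where the element sits in the document.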

Xianyu Technology

Apr 16, 2020 · Mobile Development

Design and Implementation of RichText Mixed Content in Flutter for Xianyu Messaging

The article details Xianyu’s migration of its messaging rich‑text system to Flutter, explaining how RichText became a MultiChildRenderObjectWidget, how custom emoji placeholders are converted to HTML tags and parsed into TextSpan and WidgetSpan elements, enabling colored text, clickable links, and emoji rendering across Flutter versions.

Emoji · Flutter · Mobile Development

0 likes · 9 min read

Python Programming Learning Circle

Apr 10, 2020 · Fundamentals

Introduction to BeautifulSoup (bs4) for HTML/XML Parsing in Python

This article introduces BeautifulSoup, a Python library for parsing HTML/XML, explains how to import it, choose among parsers, demonstrates tag navigation, searching with find/find_all, CSS selection, and tree traversal methods, and provides extensive code examples.

beautifulsoup · bs4 · html-parsing

0 likes · 13 min read

Python Programming Learning Circle

Dec 12, 2019 · Backend Development

Master Web Scraping with Python: Requests + BeautifulSoup Step‑by‑Step

This tutorial walks you through using Python's requests library to fetch a web page and BeautifulSoup4 to parse HTML, covering object creation, common attributes, tag properties, and the find() / find_all() methods for extracting specific content.

Python · beautifulsoup · find_all

0 likes · 6 min read

WecTeam

Oct 17, 2019 · Fundamentals

How to Build a Simple HTML AST Parser in JavaScript

This article explains how to transform raw HTML strings into a structured abstract syntax tree (AST) using JavaScript regular expressions and a step‑by‑step parsing algorithm, covering tag, attribute, and text node handling, with a complete example implementation of a lightweight AST parser.

AST · JavaScript · compiler fundamentals

0 likes · 16 min read
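
The approach the summary above describes (regular expressions that split HTML into tag, attribute, and text tokens, plus a stack that nests them) can be sketched as below. The article works in JavaScript. This is an independent Python sketch of the same idea, and every name in it (`TOKEN_RE`, `ATTR_RE`, `VOID_TAGS`, `parse_attrs`, `parse_html`, the node dictionaries) is an assumption made for illustration. It does not handle `>` inside attribute values or raw-text elements such as `<script>`.

```python
import re

# One match per token: comment, closing tag, opening tag, or text run.
TOKEN_RE = re.compile(
    r"<!--.*?-->"
    r"|</(?P<close>[A-Za-z][\w-]*)\s*>"
    r"|<(?P<open>[A-Za-z][\w-]*)(?P<attrs>[^>]*?)(?P<selfclose>/?)>"
    r"|(?P<text>[^<]+)",
    re.DOTALL,
)
# name, then an optional double-quoted, single-quoted, or bare value
ATTR_RE = re.compile(r'([^\s=/]+)(?:\s*=\s*(?:"([^"]*)"|\'([^\']*)\'|([^\s>]+)))?')
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}


def parse_attrs(raw):
    """Turn the raw attribute string of a tag into a dict."""
    return {name: dq or sq or bare for name, dq, sq, bare in ATTR_RE.findall(raw)}


def parse_html(html):
    """Build a nested dict tree (a small AST) from an HTML string."""
    root = {"type": "root", "children": []}
    stack = [root]  # the innermost open node is always stack[-1]
    for m in TOKEN_RE.finditer(html):
        if m.group("open"):
            tag = m.group("open").lower()
            node = {"type": "element", "tag": tag,
                    "attrs": parse_attrs(m.group("attrs")), "children": []}
            stack[-1]["children"].append(node)
            if not m.group("selfclose") and tag not in VOID_TAGS:
                stack.append(node)  # later tokens become its children
        elif m.group("close"):
            tag = m.group("close").lower()
            # Pop back to the nearest matching open element; ignore stray closers.
            for i in range(len(stack) - 1, 0, -1):
                if stack[i]["tag"] == tag:
                    del stack[i:]
                    break
        elif m.group("text") is not None:
            text = m.group("text").strip()
            if text:
                stack[-1]["children"].append({"type": "text", "value": text})
        # comments match no named group and are skipped
    return root


tree = parse_html(
    '<div id="app" class="a b"><p>Hello <b>world</b></p>'
    '<!-- note --><img src="x.png"/><input disabled></div>'
)
div = tree["children"][0]
print(div["attrs"], [child["tag"] for child in div["children"]])
```

The stack is what turns a flat token stream into a tree: an opening tag pushes a node, a closing tag pops back to its match, and everything in between attaches to whatever is on top.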

MaGe Linux Operations

Jul 2, 2019 · Backend Development

Master Web Scraping with BeautifulSoup: A Complete Python Guide

This tutorial introduces BeautifulSoup, a powerful Python library for parsing HTML and XML, covering installation, basic usage, tag selection, attribute extraction, navigation of parent and sibling nodes, method and CSS selectors, and best‑practice recommendations for efficient web data extraction.

Parsing · Python · Web Scraping

0 likes · 30 min read

MaGe Linux Operations

Mar 18, 2019 · Backend Development

Master Web Scraping with Beautiful Soup: A Hands‑On Python Guide

Learn how to install and use Beautiful Soup 4 in Python to parse HTML, navigate the document tree, access tags, attributes, and text, and perform powerful searches with methods like find_all, CSS selectors, and traversal techniques for effective web scraping.

BeautifulSoup4 · Python · beautifulsoup

0 likes · 12 min read

MaGe Linux Operations

Feb 3, 2019 · Backend Development

Master Web Scraping with Beautiful Soup: A Step‑by‑Step Python Guide

This article introduces Beautiful Soup, a Python library for parsing HTML/XML into a navigable tree, covering installation, object initialization, tag navigation, attribute handling, searching techniques like find_all, CSS selectors, and practical code examples for effective web data extraction.

Python · beautifulsoup · data extraction

0 likes · 13 min read

MaGe Linux Operations

Jul 28, 2018 · Backend Development

Master Web Scraping with Beautiful Soup: A Hands‑On Python Guide

This article introduces Beautiful Soup, a Python library for parsing HTML/XML into a navigable tree, covering installation, object initialization, tag and attribute access, tree traversal, searching techniques like find_all, find, CSS selectors, and practical code examples.

Web Scraping · beautifulsoup · data extraction

0 likes · 11 min read

Python Crawling & Data Mining

Jan 18, 2018 · Backend Development

How to Accurately Scrape JD.com Product Data with BeautifulSoup

This tutorial shows how to use Python's urllib and BeautifulSoup libraries to encode search keywords, request JD.com pages, parse the HTML tree, and reliably extract product names, links, images, and prices, offering a simpler alternative to complex regular‑expression scrapers.

JD.com · Python · Web Scraping

0 likes · 4 min read

MaGe Linux Operations

Dec 20, 2017 · Fundamentals

Mastering XPath: Powerful Techniques for Precise Web Scraping

This guide explains how to use XPath efficiently for web scraping, covering node selection, axes, functions, numeric comparisons, and advanced combinations, while emphasizing concise and readable expressions to improve performance and maintainability.

Python · Web Scraping · XML

0 likes · 5 min read

21CTO

Dec 14, 2017 · Backend Development

Master Web Scraping with Python: BeautifulSoup, Selenium & Error Handling

Learn how to scrape static pages, AJAX content, iFrames, and handle cookies using Python libraries such as BeautifulSoup, Selenium, and PhantomJS, while mastering HTTP and URL error handling, CSS‑based element extraction, and practical code examples for robust web data extraction.

HTTP Errors · beautifulsoup · html-parsing

0 likes · 7 min read
