Tagged articles
33 articles
Code Mala Tang
Apr 19, 2025 · Fundamentals

Master HTML Parsing in Python: BeautifulSoup, lxml, and html.parser Compared

Learn why HTML parsing is essential for web scraping and explore three popular Python options: BeautifulSoup, lxml, and the built‑in html.parser. The article covers installation, core usage, advanced techniques, and a comparative analysis to help you choose the right tool for your project.

Python · beautifulsoup · html-parsing
0 likes · 11 min read
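As a rough illustration of the built-in html.parser option named in the summary above, the sketch below collects link targets using only the Python standard library. The names `LinkCollector` and `extract_links` are hypothetical and chosen for this example; they are not taken from the article.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect href values from <a> tags (hypothetical helper for illustration)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lowercased; attrs is a list of (name, value) pairs.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html):
    """Return the href of every <a> tag in document order."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links
```

The event-driven style shown here is what distinguishes html.parser from tree-building libraries: the caller decides what to keep as each tag is encountered.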
Python Programming Learning Circle
Dec 28, 2024 · Backend Development

Getting Started with requests-html: Installation, Basic Usage, Advanced Features, and Web Scraping Examples

This article introduces the Python requests-html library. It covers installation, basic operations such as fetching pages and extracting links and elements, and advanced capabilities such as JavaScript rendering, pagination, and custom requests, and it provides practical web‑scraping examples for sites like Jianshu and Tianya.

automation · html-parsing · requests-html
0 likes · 16 min read
Python Programming Learning Circle
Aug 10, 2022 · Backend Development

Python Web Scraping Tutorial: Using requests and BeautifulSoup to Extract Weather Data

This article demonstrates how to use Python's requests library and BeautifulSoup to inspect webpage source, set request headers, fetch weather page HTML, parse it with CSS selectors, extract daytime and nighttime temperatures, and extend the script to handle multiple cities, providing complete code examples.

html-parsing · web-scraping
0 likes · 7 min read
Sohu Tech Products
May 18, 2022 · Fundamentals

Overview of a Web Page Content Extraction Algorithm and Its Practical Demo

This article introduces a web page content extraction algorithm that automatically structures titles, timestamps, body text, authors, and sources from arbitrary news pages, explains how to use an online demo, compares it with existing solutions, and discusses its broader applications and limitations.

Content Extraction · GNE · Web Scraping
0 likes · 8 min read
Baidu Geek Talk
Mar 21, 2022 · Frontend Development

How WebKit Parses HTML: Decoding, Tokenization, and DOM Tree Construction

The article details WebKit’s rendering pipeline in WKWebView, describing how the network process streams HTML bytes to the rendering process, which decodes them via TextResourceDecoder, tokenizes the characters with HTMLTokenizer’s state machine, and constructs an efficient DOM tree using HTMLTreeBuilder and queued insertion tasks.

DOM · WebKit · browser engine
0 likes · 33 min read
Programmer DD
Dec 28, 2021 · Backend Development

Master Web Scraping with Java: Getting Started with Jsoup

This article introduces Jsoup, an open‑source Java library for extracting and manipulating HTML, explains its key features such as DOM traversal and CSS selectors, and provides a concise code example that fetches Wikipedia headlines, helping developers automate web data collection.

Backend · Data Extraction · Web Scraping
0 likes · 3 min read
MaGe Linux Operations
Sep 2, 2021 · Backend Development

Build a Python Baidu Baike Crawler: Step-by-Step Guide

This article demonstrates how to create a Python web crawler that fetches Baidu Baike entries, covering the main program structure, URL manager, page downloader, HTML parser using BeautifulSoup, and output generator, with complete code snippets and sample results.

Python · Web Crawler · baidu-baike
0 likes · 8 min read
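The summary above lists a URL manager as one component of the crawler. A minimal sketch of what such a component might look like follows. The class name `UrlManager` and its methods are assumptions made for illustration and do not reproduce the article's code.

```python
from collections import deque


class UrlManager:
    """Track URLs waiting to be crawled and URLs already seen (hypothetical sketch)."""

    def __init__(self):
        self._pending = deque()  # URLs queued for download, first in first out
        self._seen = set()       # every URL ever added, used to skip duplicates

    def add(self, url):
        # Ignore empty values and URLs that were queued or crawled before.
        if url and url not in self._seen:
            self._seen.add(url)
            self._pending.append(url)

    def has_next(self):
        return bool(self._pending)

    def next(self):
        # Raises IndexError when the queue is empty; call has_next() first.
        return self._pending.popleft()
```

Keeping the seen set separate from the queue is one way to ensure a page that links back to itself is not downloaded twice.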
Xianyu Technology
Apr 16, 2020 · Mobile Development

Design and Implementation of RichText Mixed Content in Flutter for Xianyu Messaging

The article details Xianyu’s migration of its messaging rich‑text system to Flutter, explaining how RichText became a MultiChildRenderObjectWidget, how custom emoji placeholders are converted to HTML tags and parsed into TextSpan and WidgetSpan elements, enabling colored text, clickable links, and emoji rendering across Flutter versions.

Emoji · Flutter · Mobile Development
0 likes · 9 min read
WecTeam
Oct 17, 2019 · Fundamentals

How to Build a Simple HTML AST Parser in JavaScript

This article explains how to transform raw HTML strings into a structured abstract syntax tree (AST) using JavaScript regular expressions and a step‑by‑step parsing algorithm, covering tag, attribute, and text node handling, with a complete example implementation of a lightweight AST parser.

AST · JavaScript · compiler fundamentals
0 likes · 16 min read
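The regex-and-stack approach described in the summary above can be sketched in a few lines. The article's implementation is in JavaScript; this version is written in Python and is an independent illustration, not a port of it. The function name `parse_html`, the node layout, and the short void-tag list are assumptions, and the sketch handles only double-quoted attributes and well-nested markup.

```python
import re

# Matches opening, closing, and self-closing tags; text between matches becomes text nodes.
TAG_RE = re.compile(r"<(/?)([A-Za-z][\w-]*)([^<>]*?)(/?)>")
# Simplification: only double-quoted attribute values are recognised.
ATTR_RE = re.compile(r'([\w-]+)\s*=\s*"([^"]*)"')
# Partial list of elements that never have children (assumed subset for the sketch).
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}


def parse_html(html):
    """Return a nested dict tree (root -> element/text nodes) for simple HTML."""
    root = {"type": "root", "children": []}
    stack = [root]  # stack[-1] is the element currently being filled
    pos = 0

    def add_text(raw):
        text = raw.strip()
        if text:
            stack[-1]["children"].append({"type": "text", "value": text})

    for match in TAG_RE.finditer(html):
        add_text(html[pos:match.start()])
        pos = match.end()
        closing, name, raw_attrs, self_closing = match.groups()
        name = name.lower()
        if closing:
            # Pop only when the closing tag matches the open element; otherwise ignore it.
            if stack[-1].get("tag") == name:
                stack.pop()
            continue
        node = {
            "type": "element",
            "tag": name,
            "attrs": dict(ATTR_RE.findall(raw_attrs)),
            "children": [],
        }
        stack[-1]["children"].append(node)
        if not self_closing and name not in VOID_TAGS:
            stack.append(node)

    add_text(html[pos:])
    return root
```

Comments, doctype declarations, and script contents are not handled and would be treated as plain text, which is one reason production parsers use a full tokenizer state machine instead of regular expressions.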
MaGe Linux Operations
Jul 2, 2019 · Backend Development

Master Web Scraping with BeautifulSoup: A Complete Python Guide

This tutorial introduces BeautifulSoup, a powerful Python library for parsing HTML and XML, covering installation, basic usage, tag selection, attribute extraction, navigation of parent and sibling nodes, method and CSS selectors, and best‑practice recommendations for efficient web data extraction.

Data Extraction · Python · Web Scraping
0 likes · 30 min read
MaGe Linux Operations
Mar 18, 2019 · Backend Development

Master Web Scraping with Beautiful Soup: A Hands‑On Python Guide

Learn how to install and use Beautiful Soup 4 in Python to parse HTML, navigate the document tree, access tags, attributes, and text, and perform powerful searches with methods like find_all, CSS selectors, and traversal techniques for effective web scraping.

BeautifulSoup4 · Data Extraction · Python
0 likes · 12 min read
MaGe Linux Operations
Feb 3, 2019 · Backend Development

Master Web Scraping with Beautiful Soup: A Step‑by‑Step Python Guide

This article introduces Beautiful Soup, a Python library for parsing HTML/XML into a navigable tree, covering installation, object initialization, tag navigation, attribute handling, searching techniques like find_all, CSS selectors, and practical code examples for effective web data extraction.

Data Extraction · Python · beautifulsoup
0 likes · 13 min read
MaGe Linux Operations
Jul 28, 2018 · Backend Development

Master Web Scraping with Beautiful Soup: A Hands‑On Python Guide

This article introduces Beautiful Soup, a Python library for parsing HTML/XML into a navigable tree, covering installation, object initialization, tag and attribute access, tree traversal, searching techniques like find_all, find, CSS selectors, and practical code examples.

Data Extraction · Web Scraping · beautifulsoup
0 likes · 11 min read
21CTO
Dec 14, 2017 · Backend Development

Master Web Scraping with Python: BeautifulSoup, Selenium & Error Handling

Learn how to scrape static pages, AJAX content, iFrames, and handle cookies using Python libraries such as BeautifulSoup, Selenium, and PhantomJS, while mastering HTTP and URL error handling, CSS‑based element extraction, and practical code examples for robust web data extraction.

HTTP Errors · beautifulsoup · html-parsing
0 likes · 7 min read