
Master Web Scraping with Python: BeautifulSoup, Selenium & Error Handling

Learn how to scrape static pages, AJAX content, iFrames, and handle cookies using Python libraries such as BeautifulSoup, Selenium, and PhantomJS, while mastering HTTP and URL error handling, CSS‑based element extraction, and practical code examples for robust web data extraction.

21CTO

Dec 14, 2017

About Web Scraping

Web scraping extracts data from the web for analysis, storage (CSV, XLS, databases) or further processing with libraries like NLTK.

How to Use BeautifulSoup

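This article assumes basic Python knowledge. Install BeautifulSoup with pip:

pip install beautifulsoup4

The examples in this article pass "html5lib" as the parser, which is a separate package:

pip install html5lib

To verify the installation, put the following line in a script:

from bs4 import BeautifulSoup

Run the script (e.g., python myfile.py). If no error appears, BeautifulSoup is installed correctly.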

First Web Crawler

Example code:

from urllib.request import urlopen
from bs4 import BeautifulSoup
html = urlopen("https://www.python.org/")
res = BeautifulSoup(html.read(), "html5lib")
print(res.title)

The script fetches the page, parses it into a BeautifulSoup object, and prints the page's <title> element.
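
To make the result of res.title concrete, here is a minimal offline sketch. SAMPLE_HTML is a made-up stand-in for a downloaded page, and the parser is switched to Python's built-in "html.parser" so the snippet runs without html5lib. Both are assumptions for illustration, not part of the original example.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for a downloaded page, so the sketch runs offline.
SAMPLE_HTML = (
    "<html><head><title>Welcome to Python.org</title></head>"
    "<body></body></html>"
)

# "html.parser" ships with Python; "html5lib" would need a separate install.
res = BeautifulSoup(SAMPLE_HTML, "html.parser")

title_tag = res.title               # the whole <title> element
title_text = res.title.get_text()   # only the text inside the element

print(title_tag)    # <title>Welcome to Python.org</title>
print(title_text)   # Welcome to Python.org
```

Printing res.title shows the tag with its markup, so call get_text() when only the text is needed.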

Handling HTTP Exceptions

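Wrap the request in try/except. urlopen raises HTTPError when the server responds with an error status (for example 404 or 500):

from urllib.request import urlopen
from urllib.error import HTTPError, URLError
from bs4 import BeautifulSoup
try:
    html = urlopen("https://www.python.org/")
except HTTPError as e:
    # The server responded, but with an error status
    print(e)
except URLError:
    # No response at all: server down or incorrect domain
    print("Server down or incorrect domain")
else:
    # Runs only if the request succeeded
    res = BeautifulSoup(html.read(), "html5lib")
    print(res.title)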

Handling URL Exceptions

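urlopen raises URLError when the site cannot be reached at all, for example when the server is down or the domain does not exist. HTTPError is a subclass of URLError, so the except HTTPError clause must come before except URLError, as in the code above.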
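
One way to reuse this pattern is to wrap it in a small helper. The sketch below is an illustration, not the article's code: fetch_html, its timeout default, and the file:// URL are hypothetical names chosen here. The missing file:// path is used only because it triggers URLError without needing a network connection.

```python
from urllib.error import HTTPError, URLError
from urllib.request import urlopen


def fetch_html(url, timeout=10):
    """Return the response body as bytes, or None if the request fails."""
    try:
        with urlopen(url, timeout=timeout) as response:
            return response.read()
    except HTTPError as e:
        # The server answered, but with an error status such as 404 or 500.
        print(f"HTTP error {e.code} for {url}")
    except URLError as e:
        # No usable answer: DNS failure, refused connection, missing file...
        print(f"Could not reach {url}: {e.reason}")
    return None


# HTTPError is a subclass of URLError, so it has to be caught first.
print(issubclass(HTTPError, URLError))  # True

# A file:// URL that does not exist raises URLError without network access.
missing = fetch_html("file:///no/such/file-for-this-example.html")
print(missing)  # None
```

Returning None on failure lets the caller decide whether to retry, skip the page, or stop.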

Using BeautifulSoup for Category Search

Search for elements by tag name and CSS class with find_all (findAll is the older alias for the same method):

tags = res.find_all("h3", {"class": "post-title"})
for tag in tags:
    print(tag.get_text())

This prints the text of every h3 heading whose class is post-title.
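
The class search can be tried offline against a small document. In the sketch below, SAMPLE_HTML and its headings are invented for illustration, and "html.parser" is used instead of html5lib so nothing extra needs to be installed. It also shows two equivalent spellings, the class_ keyword and a CSS selector via select.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for a blog index page.
SAMPLE_HTML = """
<div>
  <h3 class="post-title">First post</h3>
  <h3 class="post-title featured">Second post</h3>
  <h3 class="sidebar-title">Not a post</h3>
</div>
"""

res = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Three ways to ask for <h3> elements carrying the class "post-title".
by_dict = res.find_all("h3", {"class": "post-title"})
by_kwarg = res.find_all("h3", class_="post-title")  # class is a Python keyword
by_css = res.select("h3.post-title")

titles = [tag.get_text() for tag in by_dict]
print(titles)  # ['First post', 'Second post']
```

An element with several classes, such as "post-title featured", still matches, because the filter is checked against each class individually.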

All BeautifulSoup Examples

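Additional find_all usages:

# Match any of several tag names
tags = res.find_all(["span", "a", "img"])
# Match <a> elements that have either of two classes
tags = res.find_all("a", {"class": ["url", "readmorebtn"]})
# Match text nodes by exact content (string= is called text= in versions before 4.4.0)
tags = res.find_all(string="Python Programming Basics with Examples")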
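
The following sketch runs these three variants against a made-up snippet so the results are visible. SAMPLE_HTML, its href values, and the use of "html.parser" are assumptions for illustration. The string= argument is the current name for what older BeautifulSoup versions call text=.

```python
from bs4 import BeautifulSoup

# Hypothetical markup containing the tags and classes used above.
SAMPLE_HTML = """
<p><span>intro</span>
<a class="url" href="/a">Python Programming Basics with Examples</a>
<a class="readmorebtn" href="/b">Read more</a>
<a class="other" href="/c">Elsewhere</a>
<img src="logo.png"></p>
"""

res = BeautifulSoup(SAMPLE_HTML, "html.parser")

# A list of names matches any of those tags, in document order.
any_of_tags = res.find_all(["span", "a", "img"])
tag_names = [tag.name for tag in any_of_tags]
print(tag_names)  # ['span', 'a', 'a', 'a', 'img']

# A list of classes matches elements that have at least one of them.
either_class = res.find_all("a", {"class": ["url", "readmorebtn"]})
hrefs = [tag["href"] for tag in either_class]
print(hrefs)  # ['/a', '/b']

# string= matches text nodes whose content equals the given string exactly.
exact_text = [str(s) for s in res.find_all(
    string="Python Programming Basics with Examples")]
print(exact_text)  # ['Python Programming Basics with Examples']
```

Note that the string search returns text nodes rather than tags, so the surrounding element is reached through each node's parent.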

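Use the limit parameter to cap the number of results, or find() to get only the first match. Searches can also be chained to navigate into child nodes, for example to pick the fourth link inside a navigation element: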

tag = res.find("nav", {"id": "site-navigation"}).select("a")[3]
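
A chained lookup like this fails with an exception if the nav element is missing (find returns None) or has fewer than four links. The sketch below shows limit, find, and a more defensive version of the same lookup. The markup and the id "does-not-exist" are hypothetical, and "html.parser" is assumed instead of html5lib.

```python
from bs4 import BeautifulSoup

# Hypothetical navigation block with four links.
SAMPLE_HTML = """
<nav id="site-navigation">
  <a href="/">Home</a><a href="/about">About</a>
  <a href="/blog">Blog</a><a href="/contact">Contact</a>
</nav>
"""

res = BeautifulSoup(SAMPLE_HTML, "html.parser")

# limit stops the search after the given number of matches.
first_two = [a.get_text() for a in res.find_all("a", limit=2)]
print(first_two)  # ['Home', 'About']

# find() returns the first match, or None when nothing matches.
first = res.find("a")
missing = res.find("nav", {"id": "does-not-exist"})
print(first.get_text(), missing)  # Home None

# Defensive version of find(...).select("a")[3].
nav = res.find("nav", {"id": "site-navigation"})
links = nav.select("a") if nav is not None else []
fourth = links[3] if len(links) > 3 else None
fourth_text = fourth.get_text() if fourth is not None else None
print(fourth_text)  # Contact
```

Checking for None before chaining keeps a scraper from crashing when a page layout changes.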

These examples illustrate powerful HTML parsing capabilities of BeautifulSoup.
