Master Web Scraping with Python: BeautifulSoup, Selenium & Error Handling
Learn how to scrape static pages, AJAX content, iFrames, and handle cookies using Python libraries such as BeautifulSoup, Selenium, and PhantomJS, while mastering HTTP and URL error handling, CSS‑based element extraction, and practical code examples for robust web data extraction.
About Web Scraping
Web scraping extracts data from the web for analysis, storage (CSV, XLS, databases) or further processing with libraries like NLTK.
How to Use BeautifulSoup
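This guide assumes basic Python knowledge. Install BeautifulSoup with pip:
pip install beautifulsoup4
The examples below use the html5lib parser, which is a separate package:
pip install html5lib
To verify the installation, put the following import in a script:
from bs4 import BeautifulSoup
Run the script (e.g., python myfile.py); if it finishes without errors, the installation works.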
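A slightly stronger check than a bare import is to parse a small document. The sketch below is only an illustration: the inline HTML string is made up, and it uses Python's built-in html.parser so that the check does not depend on html5lib or on a network connection.

```python
import bs4
from bs4 import BeautifulSoup

# Parse a tiny inline document with the parser that ships with Python.
soup = BeautifulSoup("<p>Hello, <b>scraper</b>!</p>", "html.parser")

# Report which Beautiful Soup version was imported.
print("bs4 version:", bs4.__version__)

# get_text() joins all text nodes in the document.
print(soup.get_text())
```

If both lines print without a traceback, the package is importable and able to parse HTML.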
First Web Crawler
Example code:
from urllib.request import urlopen
from bs4 import BeautifulSoup
html = urlopen("https://www.python.org/")
res = BeautifulSoup(html.read(), "html5lib")
print(res.title)
The script fetches the page, parses it into a BeautifulSoup object, and prints the title.
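Note that res.title is the whole title element, not just its text. The following sketch shows the difference on a made-up HTML string (SAMPLE_HTML is an assumption for illustration, and html.parser is used so the example runs offline).

```python
from bs4 import BeautifulSoup

# Hypothetical page content standing in for a downloaded document.
SAMPLE_HTML = """
<html>
  <head><title>Sample Page</title></head>
  <body><h1>Hello</h1></body>
</html>
"""

res = BeautifulSoup(SAMPLE_HTML, "html.parser")

title_tag = res.title          # the whole <title> element
title_text = res.title.string  # only the text inside it

print(title_tag)   # <title>Sample Page</title>
print(title_text)  # Sample Page
```

Use .string (or get_text()) when only the text is wanted, for example before writing it to a CSV file.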
Handling HTTP Exceptions
Wrap the request in try/except to catch HTTPError and other issues:
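from urllib.request import urlopen
from urllib.error import HTTPError, URLError
from bs4 import BeautifulSoup

try:
    html = urlopen("https://www.python.org/")
except HTTPError as e:
    # The server responded, but with an error status such as 404 or 500.
    print(e)
except URLError:
    # No response at all: the server is down or the domain is wrong.
    print("Server down or incorrect domain")
else:
    # Parse only when the request succeeded.
    res = BeautifulSoup(html.read(), "html5lib")
    print(res.title)
Handling URL Exceptions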
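URLError is raised when the site cannot be reached at all, for example when the server is down or the domain is wrong. Because HTTPError is a subclass of URLError, the except clause for HTTPError must come first, as in the code above; otherwise the URLError clause would catch both.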
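The two exception handlers can be folded into one reusable helper. This is a minimal sketch, not part of the original article: fetch_html, its opener parameter, and the three stand-in openers are hypothetical names introduced so that every code path can be exercised without a network connection.

```python
import io
from urllib.error import HTTPError, URLError
from urllib.request import urlopen


def fetch_html(url, opener=urlopen, timeout=10):
    """Return the response body as bytes, or None if the request fails.

    `opener` defaults to urlopen; it is a parameter only so that the
    error paths can be tested offline with stand-in functions.
    """
    try:
        response = opener(url, timeout=timeout)
    except HTTPError as e:
        # The server answered, but with an error status (404, 500, ...).
        print(f"HTTP error {e.code}: {e.reason}")
        return None
    except URLError as e:
        # No usable response: bad domain, refused connection, no network.
        print(f"Could not reach server: {e.reason}")
        return None
    return response.read()


# Stand-in openers (hypothetical) that simulate each outcome.
def always_404(url, timeout=None):
    raise HTTPError(url, 404, "Not Found", None, io.BytesIO(b""))


def always_unreachable(url, timeout=None):
    raise URLError("name resolution failed")


def fake_ok(url, timeout=None):
    return io.BytesIO(b"<html><title>ok</title></html>")
```

In real use, call fetch_html("https://www.python.org/") and pass the result to BeautifulSoup only when it is not None.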
Using BeautifulSoup to Search by Class
Search for elements by CSS class with find_all (findAll is the older name for the same method):
tags = res.find_all("h3", {"class": "post-title"})
for tag in tags:
    print(tag.get_text())
This prints the text of every h3 heading whose class is post-title.
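To see the class filter at work without depending on a live site, here is a small sketch on made-up markup (SAMPLE_HTML and its headings are assumptions for illustration). It also shows the class_ keyword, which Beautiful Soup accepts as an alternative to the attribute dictionary.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for a blog index page.
SAMPLE_HTML = """
<div>
  <h3 class="post-title">First post</h3>
  <h3 class="post-title featured">Second post</h3>
  <h3 class="sidebar-title">Not a post</h3>
</div>
"""

res = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Attribute-dictionary form, as used in the article.
by_dict = res.find_all("h3", {"class": "post-title"})

# Keyword form; class_ has a trailing underscore because
# "class" is a reserved word in Python.
by_keyword = res.find_all("h3", class_="post-title")

# A tag matches if any one of its classes equals the search value,
# so the heading with two classes is included as well.
titles = [tag.get_text() for tag in by_dict]
print(titles)  # ['First post', 'Second post']
```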
All BeautifulSoup Examples
Additional find_all usages:
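# Match any of several tag names.
tags = res.find_all(["span", "a", "img"])
# Match <a> tags that carry either of the two classes.
tags = res.find_all("a", {"class": ["url", "readmorebtn"]})
# Match text nodes equal to the given string (newer versions name this argument string).
tags = res.find_all(text="Python Programming Basics with Examples")
Use the limit parameter to cap the number of results, or find() to retrieve only the first match. Calls can also be chained to reach nested elements:
tag = res.find("nav", {"id": "site-navigation"}).select("a")[3]
This selects the fourth link inside the nav element whose id is site-navigation. These examples illustrate the HTML parsing capabilities of BeautifulSoup.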
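The variations above can be tried together on a small document. The sketch below uses made-up markup (SAMPLE_HTML, the link texts, and the variable names are assumptions for illustration) and guards against find() returning None, which happens when no element matches.

```python
from bs4 import BeautifulSoup

# Hypothetical markup with a navigation bar and one "read more" link.
SAMPLE_HTML = """
<nav id="site-navigation">
  <a href="/">Home</a>
  <a href="/blog">Blog</a>
  <a href="/about">About</a>
  <a href="/contact">Contact</a>
</nav>
<p><a class="readmorebtn" href="/post-1">Read more</a></p>
"""

res = BeautifulSoup(SAMPLE_HTML, "html.parser")

# limit caps the number of results returned by find_all.
first_two = res.find_all("a", limit=2)

# find() returns only the first match, or None if nothing matches.
first_link = res.find("a")

# A list of class names matches tags that carry any one of them.
buttons = res.find_all("a", {"class": ["url", "readmorebtn"]})

# Chain find() and select(), checking for None before indexing.
nav = res.find("nav", {"id": "site-navigation"})
fourth_nav_link = nav.select("a")[3] if nav is not None else None

# string= matches text nodes whose content equals the value exactly.
exact = res.find_all(string="Read more")
```

Checking the result of find() before chaining avoids an AttributeError on pages where the expected element is missing.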
21CTO
21CTO (21CTO.com) offers developers community, training, and services, making it your go‑to learning and service platform.
