
Comprehensive Guide to Python urllib Library: Modules, Functions, and Usage Examples

This article provides a detailed tutorial on Python's urllib library, covering its main modules (request, error, parse, robotparser), their key functions and classes, and code examples for URL fetching, parsing, encoding, and handling robots.txt, making it a practical resource for backend developers and web scrapers.

Python Programming Learning Circle

Python's urllib library provides tools for handling URLs and fetching web content.

The library consists of several modules: urllib.request for opening and reading URLs, urllib.error for handling exceptions, urllib.parse for parsing and constructing URLs, and urllib.robotparser for interpreting robots.txt files.

urllib.request offers the urlopen function and the Request class, which support custom headers, authentication, and timeout settings. Example:

<code>import urllib.request

# Fetch a page and decode the response body as UTF-8
response = urllib.request.urlopen("https://www.baidu.com")
print(response.read().decode('utf-8'))</code>
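The Request class mentioned above can carry custom headers alongside the URL; a minimal sketch, in which the target URL and User-Agent string are placeholders:

```python
import urllib.request
from urllib.error import URLError

# A Request bundles the URL with custom headers; urlopen accepts it
# plus an optional timeout in seconds.
req = urllib.request.Request(
    "https://www.python.org/",
    headers={"User-Agent": "Mozilla/5.0"},
)
print(req.get_header("User-agent"))  # headers are stored on the Request

try:
    with urllib.request.urlopen(req, timeout=10) as response:
        print(response.status)
except URLError as exc:
    # raised on DNS failures, refused connections, timeouts, etc.
    print("request failed:", exc.reason)
```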

Common methods of the response object include read(), readline(), info(), getcode(), and geturl().
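A sketch of these methods in use; a data: URL keeps the snippet free of network access (note that getcode() only returns a numeric status for http(s) responses):

```python
import urllib.request

# urlopen also understands data: URLs, which embed the body inline
with urllib.request.urlopen("data:text/plain;charset=utf-8,hello%20urllib") as response:
    body = response.read()          # whole body as bytes
    print(body.decode("utf-8"))     # hello urllib
    print(response.geturl())        # the URL that was actually fetched
    print(response.info())          # headers as an email.message.Message
    print(response.getcode())       # HTTP status for http(s); None here
```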

urllib.error defines the URLError and HTTPError exceptions: URLError indicates a network-level failure, while HTTPError (a subclass of URLError) represents an HTTP error status returned by the server.

Example handling:

<code>from urllib import request, error

try:
    response = request.urlopen("http://invalid.url")
except error.HTTPError as e:
    # HTTPError is a subclass of URLError, so it must be caught first
    print(e.code)
except error.URLError as e:
    print(e.reason)</code>

urllib.parse provides functions for URL parsing (urlparse, urlsplit) and construction (urlunparse, urlunsplit), as well as encoding utilities (quote, urlencode, unquote). Example parsing:

<code>from urllib.parse import urlparse
o = urlparse("https://docs.python.org/3/library/urllib.parse.html")
print('scheme:', o.scheme)
print('netloc:', o.netloc)</code>
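The construction side can be sketched with urlunparse, which rebuilds a URL from the same 6-tuple of components that urlparse produces:

```python
from urllib.parse import urlparse, urlunparse

# urlunparse takes (scheme, netloc, path, params, query, fragment)
parts = ("https", "docs.python.org", "/3/library/urllib.parse.html",
         "", "highlight=urlparse", "")
url = urlunparse(parts)
print(url)  # https://docs.python.org/3/library/urllib.parse.html?highlight=urlparse

# Round-tripping through urlparse recovers the components
print(urlparse(url).netloc)  # docs.python.org
```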

Encoding a query string:

<code>from urllib import parse

# urlencode percent-encodes the query parameters ('爬虫' means "web crawler")
query = parse.urlencode({'wd': '爬虫'})
url = f"http://www.baidu.com/s?{query}"
print(url)</code>
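The quote and unquote utilities handle percent-encoding of a single URL component, as opposed to urlencode's whole query string; non-ASCII text is encoded as UTF-8 bytes first:

```python
from urllib.parse import quote, unquote

# quote percent-encodes characters that are unsafe in a URL component;
# unquote reverses it
encoded = quote("爬虫")
print(encoded)           # %E7%88%AC%E8%99%AB
print(unquote(encoded))  # 爬虫
```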

urllib.robotparser parses robots.txt files to determine crawling permissions. It provides methods such as set_url, read, and can_fetch for managing crawl policies.
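A minimal sketch: here the rules are fed in as a list of lines via parse() so the example runs offline; in practice, set_url() followed by read() would fetch the site's robots.txt over the network:

```python
import urllib.robotparser

# Build a parser from inline robots.txt rules
rp = urllib.robotparser.RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /private/",
])

# can_fetch(useragent, url) answers whether crawling is permitted
print(rp.can_fetch("MyBot", "https://example.com/public/page.html"))    # True
print(rp.can_fetch("MyBot", "https://example.com/private/secret.html")) # False
```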

Written by

Python Programming Learning Circle

A global community of Chinese Python developers offering technical articles, columns, original video tutorials, and problem sets. Topics include web full‑stack development, web scraping, data analysis, natural language processing, image processing, machine learning, automated testing, DevOps automation, and big data.
