urllib vs requests: Which Python Library Wins for Web Scraping?

This article compares Python's built-in urllib library with the third-party requests library. It demonstrates both through code examples, highlights the differences in request construction and response handling, covers practical considerations for web scraping, and closes with a recommendation on which tool is more convenient.

Python Crawling & Data Mining

May 19, 2021

Introduction

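When crawling the web with Python, you need to send HTTP requests programmatically. The two most common choices are the built-in urllib and the third-party requests library. requests is generally recommended because it offers a simpler, higher-level API. Note that requests is built on urllib3, a separate third-party package, not on the standard-library urllib.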

urllib library

Overview

With urllib, you first build a request object (urllib.request.Request) and pass it to urllib.request.urlopen, which returns an HTTP response object. Calling .read() on the response returns bytes, and .decode() converts those bytes to a Unicode string (UTF-8 by default).

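from urllib import parse, request

# Request headers: a browser-like User-Agent
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.186 Safari/537.36"
}
# Query parameters; urlencode percent-encodes the non-ASCII value
wd = {"wd": "中国"}
url = "http://www.baidu.com/s?" + parse.urlencode(wd)

# Step 1: build the request object; step 2: send it
req = request.Request(url, headers=headers)
response = request.urlopen(req)
print(type(response))
print(response)

# read() returns bytes; decode() turns them into str (UTF-8 by default)
res = response.read().decode()
print(type(res))
print(res)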

Running the code prints the response object and its type, followed by the HTML of the returned page.
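Calling .decode() with no argument assumes UTF-8, which can fail on pages served in another encoding such as GBK. The following is a minimal sketch of one way to handle that with urllib alone, by reading the charset the server declares. The helper name fetch_text, its fallback_encoding parameter, and the data: URL used for the offline demonstration are illustrative assumptions, not part of the original example.

```python
from urllib import request


def fetch_text(url, headers=None, timeout=10, fallback_encoding="utf-8"):
    """Fetch url with urllib and decode the body using the declared charset."""
    req = request.Request(url, headers=headers or {})
    with request.urlopen(req, timeout=timeout) as response:
        raw = response.read()  # always bytes
        # get_content_charset() reads the charset from the Content-Type header
        # and returns None when the server does not declare one.
        encoding = response.headers.get_content_charset() or fallback_encoding
    # errors="replace" avoids an exception if the declared charset is wrong.
    return raw.decode(encoding, errors="replace")


# Offline demonstration: urllib can open data: URLs, so no network is needed.
sample_url = "data:text/plain;charset=utf-8,%E4%B8%AD%E5%9B%BD"
print(fetch_text(sample_url))
```

For a real page you would pass the page URL and the headers dictionary instead of the sample data: URL.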

Note: When constructing HTTP requests for crawling, you often need to add headers such as User-Agent or Cookie, or route traffic through a proxy, to get past anti-scraping mechanisms.
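As a rough sketch of what the note describes, the snippet below builds a urllib request with a User-Agent and a Cookie header and prepares an opener that would route traffic through a proxy. The helper names, the cookie value, and the proxy address 127.0.0.1:8080 are placeholders chosen for illustration. Nothing is sent over the network until opener.open(req) is called.

```python
from urllib import parse, request


def build_request(base_url, query, user_agent, cookie=None):
    """Return a urllib Request with an encoded query string and extra headers."""
    url = base_url + "?" + parse.urlencode(query)
    headers = {"User-Agent": user_agent}
    if cookie:
        headers["Cookie"] = cookie
    return request.Request(url, headers=headers)


def build_proxy_opener(proxy_url):
    """Return an opener that sends HTTP and HTTPS traffic through proxy_url."""
    handler = request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return request.build_opener(handler)


# Placeholder values for illustration only.
req = build_request(
    "http://www.baidu.com/s",
    {"wd": "中国"},
    user_agent="Mozilla/5.0",
    cookie="session=abc123",
)
opener = build_proxy_opener("http://127.0.0.1:8080")

print(req.full_url)        # the query value is percent-encoded
print(req.header_items())  # headers attached to the request
# response = opener.open(req)  # uncomment to actually send via the proxy
```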

requests library

Overview

The requests library sends a request with requests.get (or requests.post) and returns a Response object: .text gives the body decoded to a Unicode string, .content gives the raw bytes, and .json() parses a JSON body.

Use .text for textual data and .content for binary files such as images.
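The relationship between the three accessors can be illustrated without any network access. The sketch below uses only the standard library and a made-up JSON payload to stand in for a server response. The variable names are illustrative, and the comments map each step to the corresponding requests attribute for a UTF-8 body.

```python
import json

# Stand-in for the raw bytes a server sends; with requests this is response.content.
raw_body = '{"wd": "中国", "count": 2}'.encode("utf-8")

# Decoding the bytes gives what response.text would hold for a UTF-8 body.
text_body = raw_body.decode("utf-8")

# Parsing the text gives what response.json() would return.
parsed = json.loads(text_body)

print(type(raw_body), len(raw_body))    # bytes: length counted in bytes
print(type(text_body), len(text_body))  # str: length counted in characters
print(parsed["wd"])
```

The byte length is larger than the character length here because each Chinese character takes three bytes in UTF-8, which is why binary data such as images should be written from .content rather than .text.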

Advantages of requests

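For web crawling, requests is usually preferred because a single call such as requests.get or requests.post both builds and sends the request, query parameters and headers included. With urllib.request, you encode the parameters yourself, build a Request object to attach headers, and then pass it to urlopen.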

import requests

headers = {
    "User-Agent": "Mozilla/5.0 (Linux; U; Android 8.1.0; zh-cn; BLA-AL00 Build/HUAWEIBLA-AL00) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/57.0.2987.132 MQQBrowser/8.9 Mobile Safari/537.36"
}
# Query parameters; requests encodes them and appends them to the URL
wd = {"wd": "中国"}
url = "http://www.baidu.com/s"
# One call builds and sends the GET request
response = requests.get(url, params=wd, headers=headers)
data = response.text      # body decoded to str
data2 = response.content  # body as raw bytes
print(response)
print(type(response))
print(data)
print(type(data))
print(data2)
print(type(data2))
# Decoding the bytes manually (UTF-8 by default) also yields str
print(data2.decode())
print(type(data2.decode()))

The output shows the Response object and its type, followed by the full HTML page as text and as raw bytes.
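To see what requests builds before anything is sent, you can prepare a request without dispatching it. The sketch below assumes the requests package is installed and uses requests.Request(...).prepare() to inspect how a GET query string and a POST form body are encoded. The form field reuses the wd parameter from the example above purely for illustration.

```python
import requests

params = {"wd": "中国"}

# Prepare a GET request: params are encoded into the URL's query string.
get_req = requests.Request("GET", "http://www.baidu.com/s", params=params).prepare()

# Prepare a POST request: the same dict sent as data becomes a form-encoded body.
post_req = requests.Request("POST", "http://www.baidu.com/s", data=params).prepare()

print(get_req.method, get_req.url)
print(post_req.method, post_req.body)
print(post_req.headers["Content-Type"])
# To send a prepared request: requests.Session().send(get_req)
```

In everyday use, requests.get and requests.post perform this preparation and the sending in one step.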

Conclusion

This article compared urllib and requests using only basic Python knowledge.

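With urllib.request, you encode query parameters, build the request object with its headers, and decode the response bytes yourself. requests is a higher-level library (built on urllib3, not on the standard-library urllib) that handles these steps for you, making it more convenient for most crawling tasks.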

Web scraping is a practical skill; trying it hands‑on helps overcome many real‑world challenges.

Collaboration and shared learning are encouraged.
