Master Python’s Requests Library: Essential HTTP Techniques for Web Scraping

This guide introduces Python’s Requests library, covering installation, GET and POST requests, handling headers, response codes, cookies, redirects, timeouts, and proxy settings, with practical code examples to help developers perform reliable HTTP operations and avoid common pitfalls in web scraping.

MaGe Linux Operations

Requests is the most commonly used HTTP library in Python, and mastering it is essential for web scraping and API interaction. Install it with pip (pip install requests) or through PyCharm's package manager.

1. Response and Encoding

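import requests

url = 'http://www.baidu.com'
r = requests.get(url)
print(type(r))        # <class 'requests.models.Response'>
print(r.status_code)  # HTTP status code, e.g. 200
print(r.encoding)     # encoding guessed from the response headers
print(r.cookies)      # cookies set by the server, as a RequestsCookieJar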
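
The value of r.encoding decides how r.text decodes the body, so a wrong guess produces garbled text. The following is a minimal sketch, assuming you already know the page's real encoding; the helper name decode_with is hypothetical and not part of Requests.

```python
import requests


def decode_with(response, encoding):
    """Return the body text decoded with an explicitly chosen encoding."""
    # Assigning response.encoding changes how response.text decodes the raw bytes
    response.encoding = encoding
    return response.text
```

For a live page this might be called as decode_with(requests.get(url), 'utf-8'). When the headers are missing or misleading, r.apparent_encoding offers a guess based on the body itself.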

2. GET request

values = {'user':'aaa','id':'123'}
url = 'http://www.baidu.com'
r = requests.get(url, params=values)
print(r.url)
# Output: http://www.baidu.com/?user=aaa&id=123
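
To see how params is turned into a query string without touching the network, the request can be prepared locally. This is only a sketch of one way to inspect the final URL; it reuses the values from the example above.

```python
import requests

values = {'user': 'aaa', 'id': '123'}

# Build and prepare the request locally; nothing is sent over the network
prepared = requests.Request('GET', 'http://www.baidu.com', params=values).prepare()
print(prepared.url)
# Output: http://www.baidu.com/?user=aaa&id=123
```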

3. POST request

values = {'user':'aaa','id':'123'}
url = 'http://www.baidu.com'
r = requests.post(url, data=values)
print(r.url)
# Output: http://www.baidu.com/ (the form data is sent in the request body, not in the URL)
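
The difference between form data and JSON bodies is easiest to see by preparing both requests locally. The sketch below sends nothing over the network; it only shows what Requests would put in the body and the Content-Type header for data= versus json=.

```python
import json

import requests

values = {'user': 'aaa', 'id': '123'}
url = 'http://www.baidu.com'

# data= produces a URL-encoded form body
form_req = requests.Request('POST', url, data=values).prepare()
# json= serializes the dict to JSON and sets the matching Content-Type
json_req = requests.Request('POST', url, json=values).prepare()

print(form_req.headers['Content-Type'], form_req.body)
print(json_req.headers['Content-Type'], json.loads(json_req.body))
```

Use data= for endpoints that expect an HTML form submission and json= for APIs that expect a JSON payload.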

4. Handling request headers

# The header value must be a plain string, not a nested dict
user_agent = 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.104 Safari/537.36'
header = {'User-Agent': user_agent}
url = 'http://www.baidu.com/'
r = requests.get(url, headers=header)
print(r.content)

Note: Many servers inspect the User-Agent header and reject requests that do not appear to come from a browser, which is a common anti‑scraping measure. Sending a browser-like User-Agent helps avoid such refusals.
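
When many requests need the same User-Agent, a Session can carry it as a default header. A minimal sketch, assuming the same browser string as above; the request is only prepared here, not sent, so the merged headers can be inspected.

```python
import requests

user_agent = ('Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 '
              '(KHTML, like Gecko) Chrome/53.0.2785.104 Safari/537.36')

session = requests.Session()
# Headers set on the session are merged into every request it prepares or sends
session.headers.update({'User-Agent': user_agent})

prepared = session.prepare_request(requests.Request('GET', 'http://www.baidu.com/'))
print(prepared.headers['User-Agent'])
```

After this setup, session.get(url) sends the header automatically, so it does not have to be repeated on every call.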

5. Response code and headers

url = 'http://www.baidu.com'
r = requests.get(url)
if r.status_code == requests.codes.ok:
    print(r.status_code)
    print(r.headers)
    print(r.headers.get('content-type'))
else:
    r.raise_for_status()
# Sample output includes status code 200 and a dictionary of response headers.
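
raise_for_status() raises requests.exceptions.HTTPError for 4xx and 5xx responses, so it is usually wrapped in try/except. The helper below is a sketch; the name status_summary is hypothetical and not part of Requests.

```python
import requests


def status_summary(response):
    """Return 'ok' for a successful response, or a short error description."""
    try:
        # Does nothing for 2xx/3xx responses, raises HTTPError for 4xx/5xx
        response.raise_for_status()
    except requests.exceptions.HTTPError as exc:
        return f'failed: {exc}'
    return 'ok'
```

A call such as status_summary(requests.get(url)) then reports failures without crashing the scraper.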

6. Cookie handling

url = 'https://www.zhihu.com/'
r = requests.get(url)
print(r.cookies)
print(r.cookies.keys())
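
Cookies can also be sent with a request and converted to a plain dict for inspection. The sketch below works offline; the cookie name sessionid and its value are made up for illustration.

```python
import requests

# Send cookies with a request by passing a dict; here the request is only prepared
prepared = requests.Request(
    'GET', 'https://www.zhihu.com/', cookies={'sessionid': 'abc123'}
).prepare()
print(prepared.headers['Cookie'])

# Convert a cookie jar (such as r.cookies) into a plain dict
jar = requests.cookies.RequestsCookieJar()
jar.set('sessionid', 'abc123', domain='www.zhihu.com', path='/')
cookie_dict = requests.utils.dict_from_cookiejar(jar)
print(cookie_dict)
```

To keep cookies across several requests, such as after a login, use requests.Session(), which stores cookies from each response and sends them on later requests.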

7. Redirects and history

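Control redirects with the allow_redirects parameter, which defaults to True for GET requests. After a redirect, r.url is the final address and r.history lists the intermediate responses that were followed.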

r = requests.get(url, allow_redirects=True)
print(r.url)
print(r.status_code)
print(r.history)
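
Each entry in r.history is itself a Response, so the full redirect path can be listed. A minimal sketch; the helper name redirect_chain is hypothetical.

```python
import requests


def redirect_chain(response):
    """Return (status_code, url) for every hop, ending with the final response."""
    hops = [(r.status_code, r.url) for r in response.history]
    hops.append((response.status_code, response.url))
    return hops
```

Calling redirect_chain(requests.get(url)) returns one tuple per hop. With allow_redirects=False the history stays empty and the first 3xx response is returned as-is.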

8. Timeout setting

url = 'http://www.baidu.com'
# timeout is in seconds; requests.exceptions.Timeout is raised if the server is too slow
r = requests.get(url, timeout=2)
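
Without a timeout a request can hang indefinitely, and with one the resulting exception still has to be handled. The sketch below assumes that returning None on timeout suits the caller; the helper name fetch and the default values are illustrative only.

```python
import requests


def fetch(url, timeout=(3.05, 10)):
    """Return the response, or None if the request times out."""
    try:
        # A tuple sets (connect timeout, read timeout) separately, in seconds
        return requests.get(url, timeout=timeout)
    except requests.exceptions.Timeout:
        return None
```

A single number such as timeout=2 applies the same limit to both connecting and reading.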

9. Proxy settings

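# A dict cannot hold duplicate keys, so map each URL scheme to one proxy.
# The addresses below are placeholders; substitute a real proxy server.
proxies = {
    'http': 'http://10.10.1.10:3128',
    'https': 'http://10.10.1.10:1080'
}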
url = 'http://www.baidu.com'
r = requests.get(url, proxies=proxies)
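
Because the proxies dict maps each scheme to a single proxy, rotating across several proxies means choosing one per request. A minimal sketch, assuming a pool of placeholder addresses; PROXY_POOL and pick_proxies are hypothetical names.

```python
import random

import requests

# Placeholder proxy addresses; replace them with servers you actually control
PROXY_POOL = [
    'http://10.10.1.10:3128',
    'http://10.10.1.11:3128',
    'http://10.10.1.12:3128',
]


def pick_proxies(pool=PROXY_POOL):
    """Choose one proxy at random and use it for both http and https URLs."""
    proxy = random.choice(pool)
    return {'http': proxy, 'https': proxy}
```

Each call such as requests.get(url, proxies=pick_proxies(), timeout=5) then goes through a randomly selected proxy.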