Master Python’s Requests Library: Essential HTTP Techniques for Web Scraping
This guide introduces Python’s Requests library, covering installation, GET and POST requests, handling headers, response codes, cookies, redirects, timeouts, and proxy settings, with practical code examples to help developers perform reliable HTTP operations and avoid common pitfalls in web scraping.
Requests is the most widely used HTTP library in Python; mastering it is essential for web scraping and API interaction.
Requests
requests is the most common HTTP request library in Python. Install it with pip (pip install requests) or from within PyCharm.
1. Response and Encoding
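The value of r.encoding is guessed from the charset declared in the response's Content-Type header. A minimal sketch of that rule, assuming the helper get_encoding_from_headers from requests.utils and two made-up header sets, so that nothing is sent over the network:

```python
from requests.utils import get_encoding_from_headers

# Hypothetical header sets, standing in for real server responses
with_charset = {'content-type': 'text/html; charset=utf-8'}
without_charset = {'content-type': 'text/html'}

declared = get_encoding_from_headers(with_charset)
fallback = get_encoding_from_headers(without_charset)

print(declared)  # charset taken from the header
print(fallback)  # default applied to text responses that declare no charset
```

If the fallback is wrong for a page, one option is to set r.encoding = r.apparent_encoding (or a charset you know to be correct) before reading r.text.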
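import requests
url = 'http://www.baidu.com'
r = requests.get(url)
print(type(r))        # <class 'requests.models.Response'>
print(r.status_code)  # HTTP status code, e.g. 200
print(r.encoding)     # encoding guessed from the response headers
print(r.cookies)      # cookies set by the server
2. GET request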
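A GET request carries its parameters in the URL's query string. The sketch below prepares a request without sending it, which makes the encoding easy to inspect offline. The extra keys tag and skip are hypothetical, added only to show how list values and None are handled.

```python
import requests

url = 'http://www.baidu.com'
# 'tag' and 'skip' are illustrative parameters, not part of any real API
values = {'user': 'aaa', 'id': '123', 'tag': ['x', 'y'], 'skip': None}

# Build the request without sending it, then inspect the final URL
prepared = requests.Request('GET', url, params=values).prepare()
print(prepared.url)
# http://www.baidu.com/?user=aaa&id=123&tag=x&tag=y
```

A list value is repeated once per item, and a key whose value is None is left out of the query string.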
values = {'user': 'aaa', 'id': '123'}
url = 'http://www.baidu.com'
# params is URL-encoded and appended to the URL as a query string
r = requests.get(url, params=values)
print(r.url)
# Output: http://www.baidu.com/?user=aaa&id=123
3. POST request
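A POST request carries its parameters in the request body rather than in the URL. As a rough offline sketch, the same dictionary can be prepared once with data= and once with json= to compare the body and Content-Type that Requests would send. Nothing is transmitted here.

```python
import json
import requests

url = 'http://www.baidu.com'
values = {'user': 'aaa', 'id': '123'}

# data= produces a form-encoded body
form = requests.Request('POST', url, data=values).prepare()
print(form.headers['Content-Type'])  # application/x-www-form-urlencoded
print(form.body)                     # user=aaa&id=123

# json= serializes the same dict as a JSON document
as_json = requests.Request('POST', url, json=values).prepare()
print(as_json.headers['Content-Type'])  # application/json
print(json.loads(as_json.body))         # parsed back into the original dict
```

Which form to use depends on what the server expects: HTML form endpoints usually want data=, while many web APIs want json=.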
values = {'user': 'aaa', 'id': '123'}
url = 'http://www.baidu.com'
# data is form-encoded and sent in the request body, so the URL is unchanged
r = requests.post(url, data=values)
print(r.url)
# Output: http://www.baidu.com/
4. Handling request headers
# The User-Agent value must be a plain string, not a nested dict
user_agent = 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.104 Safari/537.36'
header = {'User-Agent': user_agent}
url = 'http://www.baidu.com/'
r = requests.get(url, headers=header)
print(r.content)
Note: Many servers check the User-Agent header to decide whether a request comes from a browser. Sending a browser-like User-Agent helps avoid being denied access, a common anti-scraping measure.
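When many requests need the same User-Agent, one option is to set it once on a Session instead of passing headers= every time. The sketch below reuses the User-Agent string from the example above and inspects the prepared request offline.

```python
import requests

USER_AGENT = (
    'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 '
    '(KHTML, like Gecko) Chrome/53.0.2785.104 Safari/537.36'
)

# What a server sees when no User-Agent is set explicitly
default_agent = requests.utils.default_user_agent()
print(default_agent)  # starts with "python-requests/"

session = requests.Session()
session.headers.update({'User-Agent': USER_AGENT})

# prepare_request merges the session defaults into the outgoing request
prepared = session.prepare_request(requests.Request('GET', 'http://www.baidu.com/'))
print(prepared.headers['User-Agent'])
```

Every session.get() or session.post() call made afterwards would carry the same header.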
5. Response code and headers
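requests.codes gives readable names for status codes, and raise_for_status() turns a 4xx or 5xx response into an exception. The sketch below builds a Response object by hand, with a hypothetical URL, purely to show that behaviour offline. Real responses come from requests.get() and similar calls.

```python
import requests

# requests.codes maps readable names to numeric status codes
print(requests.codes.ok)         # 200
print(requests.codes.not_found)  # 404

# Hand-built response, used only to demonstrate raise_for_status() offline
resp = requests.Response()
resp.status_code = 404
resp.url = 'http://www.baidu.com/missing'  # hypothetical URL

try:
    resp.raise_for_status()
    outcome = 'ok'
except requests.exceptions.HTTPError as exc:
    outcome = f'failed with status {exc.response.status_code}'

print(outcome)  # failed with status 404
```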
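url = 'http://www.baidu.com'
r = requests.get(url)
if r.status_code == requests.codes.ok:
    print(r.status_code)                  # numeric status code
    print(r.headers)                      # all response headers
    print(r.headers.get('content-type'))  # a single header; lookup is case-insensitive
else:
    r.raise_for_status()  # raises requests.exceptions.HTTPError for 4xx/5xx responses
# Sample output includes status code 200 and a dictionary of response headers.
6. Cookie handling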
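Cookies can be sent as well as read. A minimal sketch, assuming made-up cookie names and values (session_id, theme): a per-request cookies= dict ends up in the Cookie header, while a Session keeps a cookie jar that is reused across its requests. Nothing is sent over the network here.

```python
import requests

url = 'https://www.zhihu.com/'

# Cookies passed per request end up in the Cookie header
# ('session_id' and 'abc123' are placeholder values)
prepared = requests.Request('GET', url, cookies={'session_id': 'abc123'}).prepare()
print(prepared.headers['Cookie'])  # session_id=abc123

# A Session keeps a cookie jar that persists across its requests
session = requests.Session()
session.cookies.set('theme', 'dark')
print(session.cookies.get_dict())  # {'theme': 'dark'}
```

Cookies that a server sets in response to a session request are stored in session.cookies and sent back automatically on later requests.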
url = 'https://www.zhihu.com/'
r = requests.get(url)
print(r.cookies)         # RequestsCookieJar holding the cookies set by the server
print(r.cookies.keys())  # names of those cookies
7. Redirects and history
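Control redirects with the allow_redirects parameter; for GET requests it defaults to True.
r = requests.get(url, allow_redirects=True)
print(r.url)          # final URL after any redirects
print(r.status_code)  # status code of the final response
print(r.history)      # intermediate redirect responses, oldest first
8. Timeout setting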
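A timeout is only useful if the resulting exception is handled. The helper below is a hypothetical sketch (the name fetch_status and the default timeout values are assumptions): it passes a (connect, read) timeout tuple and returns None instead of raising when the request times out or fails for another reason.

```python
import requests


def fetch_status(url, timeout=(3.05, 10)):
    """Return the HTTP status code, or None if the request fails or times out."""
    try:
        # timeout is (connect timeout, read timeout), both in seconds
        response = requests.get(url, timeout=timeout)
    except requests.exceptions.Timeout:
        print(f'Timed out: {url}')
        return None
    except requests.exceptions.RequestException as exc:
        # Covers connection errors, invalid URLs, and other request failures
        print(f'Request failed: {exc}')
        return None
    return response.status_code
```

Calling fetch_status('http://www.baidu.com', timeout=2) would use a single 2-second limit for both the connect and read phases.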
url = 'http://www.baidu.com'
# timeout is in seconds; without it, a request can wait indefinitely
r = requests.get(url, timeout=2)
9. Proxy settings
# One proxy per URL scheme: a dict cannot hold several 'http' keys (only the last one would survive).
# The proxy addresses below are placeholders; substitute a proxy server you can actually use.
proxies = {
    'http': 'http://10.10.1.10:3128',
    'https': 'http://10.10.1.10:1080'
}
url = 'http://www.baidu.com'
r = requests.get(url, proxies=proxies)
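Because the proxies dictionary is keyed by URL scheme, each scheme maps to exactly one proxy. The sketch below uses placeholder proxy addresses (assumptions, not working servers) together with the select_proxy helper from requests.utils to show which entry would apply to a given URL. It then attaches the same mapping to a Session. Nothing is sent over the network.

```python
import requests
from requests.utils import select_proxy

# Placeholder proxy addresses; replace with proxies you are allowed to use
proxies = {
    'http': 'http://10.10.1.10:3128',
    'https': 'http://10.10.1.10:1080',
}

# select_proxy returns the proxy entry matching the URL's scheme
http_choice = select_proxy('http://www.baidu.com', proxies)
https_choice = select_proxy('https://www.zhihu.com/', proxies)
print(http_choice)   # http://10.10.1.10:3128
print(https_choice)  # http://10.10.1.10:1080

# Setting proxies on a Session applies them to every request it sends
session = requests.Session()
session.proxies.update(proxies)
```

To rotate between several proxies, a scraper would typically choose one address per request and build a fresh proxies dictionary from it.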
MaGe Linux Operations