Master Python’s Requests: From Basics to Advanced Web Scraping Techniques
This tutorial introduces Python’s Requests library, covering installation, core methods like GET, POST, PUT, PATCH, DELETE, detailed parameters, session handling, exception management, header customization, proxy usage, and practical code examples to empower effective web scraping.
Requests is a Python library that simplifies sending HTTP requests such as GET and POST. It is built on top of urllib3 rather than the standard-library urllib module.
Install it with pip install requests.
Basic Usage
Requests provides several convenient methods:
requests.request() : Construct a generic request.
requests.get() : Send a GET request and receive a response.
requests.head() : Retrieve only the response headers.
requests.post() : Submit data to the server, often used for form submissions.
requests.put() : Replace the target document with new data.
requests.patch() : Apply partial updates to a resource.
requests.delete() : Request the server to delete a specified resource.
request() Method Parameters
The request() method accepts many arguments, such as url, params, timeout, headers, auth, verify, proxies, cookies, allow_redirects, stream, and cert. These control the request URL, query parameters, timeout, custom headers, authentication, SSL verification, proxy settings, cookie handling, redirect behavior, streaming, and client certificates.
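As a rough illustration of how a few of these arguments shape a request, the sketch below builds a GET request without sending it, so the final URL and headers can be inspected. The endpoint, query values, and the helper names build_request and send_request are placeholders invented for this example.

```python
import requests

# Hypothetical endpoint, used only for illustration.
URL = "https://example.com/search"

def build_request(url, params=None, headers=None):
    """Prepare a GET request without sending it, so the result can be inspected."""
    return requests.Request("GET", url, params=params, headers=headers).prepare()

def send_request(url, params=None, headers=None, timeout=5):
    """Send the same request; a single timeout value (seconds) covers connect and read."""
    return requests.request("GET", url, params=params, headers=headers, timeout=timeout)

prepared = build_request(
    URL,
    params={"q": "python", "page": 2},
    headers={"Accept": "application/json"},
)
print(prepared.url)      # query parameters are URL-encoded into the final URL
print(prepared.headers)  # headers that would be sent
```

Calling send_request with the same arguments would perform the actual network call.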
GET Method
GET is typically used to retrieve data. It returns a Response object with useful attributes:
response.url : The final URL, after any redirects.
response.status_code : HTTP status code.
response.encoding : Encoding used to decode response.text, guessed from the response headers.
response.cookies : Cookies set by the server.
response.headers : Response headers.
response.content : Raw bytes.
response.text : Decoded string.
response.json() : Body parsed as JSON (typically a dictionary or list).
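A minimal sketch of reading these attributes after a GET. The helper names summarize and get_summary are invented for this example, and the 5-second timeout is an arbitrary choice.

```python
import requests

def summarize(response):
    """Collect commonly used Response attributes into a plain dict."""
    return {
        "url": response.url,
        "status": response.status_code,
        "encoding": response.encoding,
        "content_type": response.headers.get("Content-Type"),
        "size_bytes": len(response.content),
        "preview": response.text[:80],
    }

def get_summary(url, timeout=5):
    """Fetch url with GET and summarize the result (performs a network call)."""
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # raise HTTPError on 4xx/5xx status codes
    return summarize(response)
```

For instance, get_summary("https://example.com") would return a dictionary describing that page, assuming the host is reachable.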
POST Method
POST is commonly used for form submissions, file uploads, and sending JSON payloads.
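The three kinds of POST body can be compared side by side. The sketch below only prepares the requests, so nothing is sent; the endpoint, field names, and file content are hypothetical.

```python
import requests

URL = "https://example.com/api/users"  # hypothetical endpoint

# Form submission: data= is sent as application/x-www-form-urlencoded.
form_req = requests.Request("POST", URL, data={"username": "alice", "lang": "en"}).prepare()

# JSON payload: json= serializes the dict and sets Content-Type: application/json.
json_req = requests.Request("POST", URL, json={"username": "alice"}).prepare()

# File upload: files= builds a multipart/form-data body.
# The tuple is (filename, content); an open binary file object also works as content.
file_req = requests.Request("POST", URL, files={"report": ("report.txt", b"hello")}).prepare()

for name, req in [("form", form_req), ("json", json_req), ("file", file_req)]:
    print(name, "->", req.headers["Content-Type"])
```

To actually send one of these, the same keyword arguments go to requests.post, for example requests.post(URL, json={"username": "alice"}, timeout=5).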
PUT Method
PUT replaces the content of a specified document on the server with data from the client.
PATCH Method
PATCH submits partial updates to a URL.
DELETE Method
DELETE requests the server to remove the specified resource.
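PUT, PATCH, and DELETE share the same call shape. The sketch below assumes a hypothetical REST endpoint for articles; the function names and the http parameter (which accepts the requests module or a requests.Session) are choices made for this example only.

```python
import requests

BASE = "https://example.com/api/articles"  # hypothetical REST endpoint

def replace_article(article_id, article, http=requests, timeout=5):
    """PUT: send the full representation, replacing what the server stores."""
    return http.put(f"{BASE}/{article_id}", json=article, timeout=timeout)

def update_title(article_id, title, http=requests, timeout=5):
    """PATCH: send only the fields that change."""
    return http.patch(f"{BASE}/{article_id}", json={"title": title}, timeout=timeout)

def delete_article(article_id, http=requests, timeout=5):
    """DELETE: ask the server to remove the resource."""
    return http.delete(f"{BASE}/{article_id}", timeout=timeout)
```

Passing a Session as http lets all three calls reuse the same connection and cookies.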
Advanced Operations
Session Persistence
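A Session keeps cookies and default headers across requests. The offline sketch below shows that state being merged into a request; the User-Agent string, cookie name and value, and URL are all made up for illustration.

```python
import requests

session = requests.Session()
# Defaults set on the session are merged into every request it sends.
session.headers.update({"User-Agent": "demo-client/1.0"})  # hypothetical UA string
# Cookies normally arrive via Set-Cookie on a response; one is set by hand here.
session.cookies.set("sessionid", "abc123")

# prepare_request() applies the session state without touching the network.
request = requests.Request("GET", "https://example.com/profile")
prepared = session.prepare_request(request)
print(prepared.headers["User-Agent"])
print(prepared.headers.get("Cookie"))
```

In real use, session.get or session.post sends the request, and any cookies the server returns are stored on the session for later calls.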
# Simulate Taobao login
import requests
url = 'https://login.taobao.com/member/login.jhtml?redirectURL=https%3A%2F%2Fai.taobao.com%2F%3Fpid%3Dmm_26632323_6762370_25910879'
headers = {'user-agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36'}
formdata = {'TPL_username': 'fsdafdfasf', 'TPL_password': 'fsadfasf'}
se = requests.Session()  # create a session so cookies persist across requests
ss = se.post(url=url, headers=headers, data=formdata)
# A 200 status only means the server answered; it does not by itself prove the login worked
if ss.status_code == 200:
    print('Login succeeded')
else:
    print('Login failed')
Exception Handling
Common exceptions include Timeout, ConnectionError, and TooManyRedirects. All explicit exceptions inherit from requests.exceptions.RequestException.
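A request that fails at the network level, such as a timeout or a refused connection, raises one of these exceptions instead of returning a Response.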
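A minimal sketch of catching these exceptions, ordered from the most specific to the RequestException base class. The function name safe_get is invented for this example, and the 5-second timeout is an arbitrary choice.

```python
import requests

def safe_get(url, timeout=5):
    """Return the Response, or None if the request fails."""
    try:
        response = requests.get(url, timeout=timeout)
        response.raise_for_status()  # turn 4xx/5xx responses into HTTPError
        return response
    except requests.exceptions.Timeout:
        print("The server did not respond in time")
    except requests.exceptions.ConnectionError:
        print("Could not connect (DNS failure, refused connection, ...)")
    except requests.exceptions.TooManyRedirects:
        print("Redirect limit exceeded")
    except requests.exceptions.RequestException as exc:
        # Base class: covers HTTPError, invalid URLs, and anything not handled above.
        print(f"Request failed: {exc}")
    return None
```

Because every branch returns None, callers only need to check the return value before using the response.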
Certificate Verification
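import requests
import urllib3
# Suppress the InsecureRequestWarning that verify=False would otherwise trigger
urllib3.disable_warnings()
# verify=False skips TLS certificate verification; use it only for testing
rep = requests.get("https://www.baidu.com", verify=False)
print(rep.status_code)
Cookie Parsing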
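import requests
# Raw Cookie header value copied from a browser (truncated here)
cookie_header = '_NTES_PASSPORT=...'
cookie = {}
for i in cookie_header.split(';'):
    # Split on the first '=' only, since cookie values may themselves contain '='
    k, v = i.strip().split('=', 1)
    cookie[k] = v
for k, v in cookie.items():
    print(k, ':', v)
# Convert dict to CookieJar and back
cookiesJar = requests.utils.cookiejar_from_dict(cookie, cookiejar=None, overwrite=True)
print(requests.utils.dict_from_cookiejar(cookiesJar))
Browser Emulation (Headers)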
Common request headers and their purposes:
Accept : Content types the client can handle (e.g., text/html, application/xml).
Accept-Encoding : Compression algorithms supported (e.g., gzip, deflate).
Accept-Language : Preferred languages (e.g., zh-CN, en-US).
User-Agent : Identifies the client software, OS, and browser version.
Connection : Indicates whether to keep the TCP connection alive.
Host : The target server’s domain name.
Referer : The URL of the page that linked to the requested resource.
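With Requests, headers like these are passed as a dictionary through the headers argument. A minimal sketch, assuming a trimmed-down header set; the Referer value and the function name fetch_as_browser are hypothetical.

```python
import requests

# A reduced set of browser-like headers (values are illustrative).
BROWSER_HEADERS = {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.9",
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36"
    ),
    "Referer": "https://example.com/",  # hypothetical referring page
}

def fetch_as_browser(url, timeout=5):
    """GET url while sending the browser-like headers above."""
    return requests.get(url, headers=BROWSER_HEADERS, timeout=timeout)
```

Requests fills in Accept-Encoding and Connection by default, and the Host header is derived from the URL, so those rarely need to be set by hand.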
Using Proxy Servers
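In Requests, a proxy is usually passed through the proxies argument, which maps a URL scheme to a proxy address. A minimal sketch, assuming an HTTP proxy listening on 127.0.0.1:8000 (the same address as in the urllib listing that follows); build_proxies and fetch_via_proxy are names made up for this example.

```python
import requests

def build_proxies(address):
    """Map both URL schemes to one HTTP proxy, e.g. address='127.0.0.1:8000'."""
    proxy_url = f"http://{address}"
    return {"http": proxy_url, "https": proxy_url}

def fetch_via_proxy(url, address, timeout=5):
    """GET url through the proxy listening at address (performs a network call)."""
    return requests.get(url, proxies=build_proxies(address), timeout=timeout)

print(build_proxies("127.0.0.1:8000"))
```

The standard-library listing after this does the same job with urllib.request.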
import urllib.request
import http.cookiejar
url = "https://www.baidu.com"
headers = {
    'Accept':'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
    # Note: urllib does not decompress responses automatically, so with this header the saved file may contain compressed bytes
    'Accept-Encoding':'gzip, deflate, br',
    'Accept-Language':'zh-CN,zh;q=0.9',
    'Cache-Control':'max-age=0',
    'Connection':'keep-alive',
    'Cookie':'BAIDUID=...; ...',
    'Host':'www.baidu.com',
    'Sec-Fetch-Mode':'navigate',
    'Sec-Fetch-Site':'cross-site',
    'Sec-Fetch-User':'?1',
    'Upgrade-Insecure-Requests':'1',
    'User-Agent':'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36'
}
jar = http.cookiejar.CookieJar()
# The target URL is HTTPS, so the proxy is registered for both schemes
proxy = urllib.request.ProxyHandler({'http': "127.0.0.1:8000", 'https': "127.0.0.1:8000"})
opener = urllib.request.build_opener(proxy, urllib.request.HTTPHandler, urllib.request.HTTPCookieProcessor(jar))
# addheaders expects a list of (name, value) tuples
head = []
for k, v in headers.items():
    head.append((k, v))
opener.addheaders = head
urllib.request.install_opener(opener)
data = urllib.request.urlopen(url).read()
with open(r"C:\Users\Administrator\Desktop\et.html", "wb") as f:
    f.write(data)
Without Proxy Server
import urllib.request
import http.cookiejar
url = "https://www.baidu.com"
headers = { ... same as above ... }
jar = http.cookiejar.CookieJar()
opener = urllib.request.build_opener(urllib.request.HTTPHandler, urllib.request.HTTPCookieProcessor(jar))
head = []
for k, v in headers.items():
    head.append((k, v))
opener.addheaders = head
urllib.request.install_opener(opener)
data = urllib.request.urlopen(url).read()
with open(r"C:\Users\Administrator\Desktop\et.html", "wb") as f:
    f.write(data)
Conclusion
This article examined seven commonly used methods of the requests library, providing code snippets and explanations to help readers effectively perform web scraping and HTTP interactions with Python.
Python Crawling & Data Mining
