Master Python Requests: Web Scraping Basics with GET, POST, and File Saving
This tutorial walks you through installing the Python requests library, using GET, POST, and PUT methods, handling query parameters, setting custom headers to bypass anti‑scraping measures, and saving both HTML content and images to local files, complete with runnable code examples.
Key Topics Covered
How web interaction works Using requests.get and requests.post Response object attributes and methods Opening and saving files in Python
Installing the requests Library
For Windows, run:
pip install -i https://pypi.tuna.tsinghua.edu.cn/simple requestsFor Linux, prepend sudo if necessary:
sudo pip install -i https://pypi.tuna.tsinghua.edu.cn/simple requests1. Crawl a Baidu Page
# First crawler example – fetch Baidu homepage
import requests
response = requests.get("http://www.baidu.com")
response.encoding = response.apparent_encoding
print("Status code:" + str(response.status_code))
print(response.text)2. GET Request Example
# GET request to httpbin.org
import requests
response = requests.get("http://httpbin.org/get")
print(response.status_code)
print(response.text)3. POST Request Example
# POST request to httpbin.org
import requests
response = requests.post("http://httpbin.org/post")
print(response.status_code)
print(response.text)4. PUT Request Example
# PUT request to httpbin.org
import requests
response = requests.put("http://httpbin.org/put")
print(response.status_code)
print(response.text)5. GET with URL Parameters
# GET with query string in URL
import requests
response = requests.get("http://httpbin.org/get?name=hezhi&age=20")
print(response.status_code)
print(response.text)6. GET with Params Dictionary
# GET using params dict
import requests
data = {"name": "hezhi", "age": 20}
response = requests.get("http://httpbin.org/get", params=data)
print(response.status_code)
print(response.text)7. POST with Params Dictionary
# POST using params dict
import requests
data = {"name": "hezhi", "age": 20}
response = requests.post("http://httpbin.org/post", params=data)
print(response.status_code)
print(response.text)8. Bypassing Anti‑Scraping (Example: Zhihu)
# First request without headers – likely blocked
import requests
response = requests.get("http://www.zhihu.com")
print("No headers status:" + str(response.status_code))
# Request with a realistic User‑Agent header
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.122 Safari/537.36"}
response = requests.get("http://www.zhihu.com", headers=headers)
print("With headers status:" + str(response.status_code))
print(response.text)9. Save Crawled HTML to a Local File
import requests
url = "http://www.baidu.com"
response = requests.get(url)
response.encoding = "utf-8"
print("Response type:" + str(type(response)))
print("Status code:" + str(response.status_code))
print("Headers:" + str(response.headers))
print("Content:" + response.text)
# Write to file
with open("D:\\crawler\\baidu.html", "w", encoding="utf-8") as file:
file.write(response.text)10. Download an Image and Save It
import requests
response = requests.get("https://www.baidu.com/img/baidu_jgylogo3.gif")
with open("D:\\crawler\\baidu_logo.gif", "wb") as file:
file.write(response.content)Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
Raymond Ops
Linux ops automation, cloud-native, Kubernetes, SRE, DevOps, Python, Golang and related tech discussions.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
