Backend Development
Master Python Requests: Web Scraping Basics with GET, POST, and File Saving
This tutorial walks you through installing the Python requests library, using GET, POST, and PUT methods, handling query parameters, setting custom headers to bypass anti‑scraping measures, and saving both HTML content and images to local files, complete with runnable code examples.
Raymond Ops
Key Topics Covered
How web interaction works
Using requests.get and requests.post
Response object attributes and methods
Opening and saving files in Python
Installing the requests Library
For Windows, run the following (the -i flag points pip at the Tsinghua PyPI mirror; omit it to use the default index):
<code>pip install -i https://pypi.tuna.tsinghua.edu.cn/simple requests</code>
For Linux, prepend sudo if necessary:
<code>sudo pip install -i https://pypi.tuna.tsinghua.edu.cn/simple requests</code>
1. Crawl a Baidu Page
<code># First crawler example – fetch Baidu homepage
import requests
response = requests.get("http://www.baidu.com")
response.encoding = response.apparent_encoding
print("Status code:" + str(response.status_code))
print(response.text)
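# A few more Response attributes worth knowing (all part of the requests API;
# shown here as an aside, not required for the crawl itself):
print(response.ok)                           # True when status_code < 400
print(response.encoding)                     # encoding used to decode response.text
print(response.headers.get("Content-Type"))  # headers act like a case-insensitive dict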
</code>
2. GET Request Example
<code># GET request to httpbin.org
import requests
response = requests.get("http://httpbin.org/get")
print(response.status_code)
print(response.text)
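# httpbin.org replies with JSON, so the body can be parsed directly;
# response.json() is a standard requests method that returns the parsed dict.
payload = response.json()
print(payload["url"])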
</code>
3. POST Request Example
<code># POST request to httpbin.org
import requests
response = requests.post("http://httpbin.org/post")
print(response.status_code)
print(response.text)
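# A POST usually carries a body. A form-encoded body is sent with data= (not
# params=); httpbin echoes the fields back under its "form" key. Sketch:
form_response = requests.post("http://httpbin.org/post", data={"name": "hezhi"})
print(form_response.json()["form"])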
</code>
4. PUT Request Example
<code># PUT request to httpbin.org
import requests
response = requests.put("http://httpbin.org/put")
print(response.status_code)
print(response.text)
</code>
5. GET with URL Parameters
<code># GET with query string in URL
import requests
response = requests.get("http://httpbin.org/get?name=hezhi&age=20")
print(response.status_code)
print(response.text)
</code>
6. GET with Params Dictionary
<code># GET using params dict
import requests
data = {"name": "hezhi", "age": 20}
response = requests.get("http://httpbin.org/get", params=data)
print(response.status_code)
print(response.text)
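# requests URL-encodes the params dict and appends it as the query string;
# response.url shows the final URL that was actually requested.
print(response.url)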
</code>
7. POST with Params Dictionary
<code># POST using params dict
import requests
data = {"name": "hezhi", "age": 20}
response = requests.post("http://httpbin.org/post", params=data)
print(response.status_code)
print(response.text)
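# Note: params= always goes into the query string, even on a POST, so httpbin
# echoes these values under "args" rather than "form". Use data= for a form body.
print(response.json()["args"])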
</code>
8. Bypassing Anti‑Scraping (Example: Zhihu)
<code># First request without headers – likely blocked
import requests
response = requests.get("http://www.zhihu.com")
print("No headers status:" + str(response.status_code))
# Request with a realistic User‑Agent header
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.122 Safari/537.36"}
response = requests.get("http://www.zhihu.com", headers=headers)
print("With headers status:" + str(response.status_code))
print(response.text)
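# In a real scraper it is prudent to set a timeout and surface HTTP errors;
# timeout= and raise_for_status() are standard requests features (sketch only).
try:
    response = requests.get("http://www.zhihu.com", headers=headers, timeout=10)
    response.raise_for_status()  # raises requests.HTTPError on 4xx/5xx responses
except requests.RequestException as exc:
    print("Request failed:", exc)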
</code>
9. Save Crawled HTML to a Local File
<code>import requests
url = "http://www.baidu.com"
response = requests.get(url)
response.encoding = "utf-8"
print("Response type:" + str(type(response)))
print("Status code:" + str(response.status_code))
print("Headers:" + str(response.headers))
print("Content:" + response.text)
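# The target folder must exist before open() can create a file in it;
# os.makedirs creates it, and exist_ok=True makes this safe to rerun.
import os
os.makedirs("D:\\crawler", exist_ok=True)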
# Write to file
with open("D:\\crawler\\baidu.html", "w", encoding="utf-8") as file:
file.write(response.text)
</code>
10. Download an Image and Save It
<code>import requests
response = requests.get("https://www.baidu.com/img/baidu_jgylogo3.gif")
with open("D:\\crawler\\baidu_logo.gif", "wb") as file:
file.write(response.content)
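# For larger files, stream the download so the whole body is never held in
# memory at once; stream=True and iter_content are standard requests features.
# The _stream filename below is just an illustrative choice.
with requests.get("https://www.baidu.com/img/baidu_jgylogo3.gif", stream=True) as r:
    with open("D:\\crawler\\baidu_logo_stream.gif", "wb") as f:
        for chunk in r.iter_content(chunk_size=8192):
            f.write(chunk)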
</code>
Written by
Raymond Ops
Linux ops automation, cloud-native, Kubernetes, SRE, DevOps, Python, Golang and related tech discussions.