
Master Python Requests: Web Scraping Basics with GET, POST, and File Saving

This tutorial walks you through installing the Python requests library, using GET, POST, and PUT methods, handling query parameters, setting custom headers to bypass anti‑scraping measures, and saving both HTML content and images to local files, complete with runnable code examples.

Raymond Ops

Key Topics Covered

- How web interaction works
- Using requests.get and requests.post
- Response object attributes and methods
- Opening and saving files in Python

Installing the requests Library

For Windows, run the following (the -i flag points pip at the Tsinghua PyPI mirror; drop it to install from the default index):

<code>pip install -i https://pypi.tuna.tsinghua.edu.cn/simple requests</code>

For Linux, prepend sudo if necessary:

<code>sudo pip install -i https://pypi.tuna.tsinghua.edu.cn/simple requests</code>
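
To confirm the install worked, you can print the library version; a quick sanity check, nothing more:

<code># Verify the installation by printing the version
import requests
print(requests.__version__)
</code>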

1. Crawl a Baidu Page

<code># First crawler example – fetch Baidu homepage
import requests
response = requests.get("http://www.baidu.com")
response.encoding = response.apparent_encoding  # use the encoding guessed from the body to avoid garbled text
print("Status code:" + str(response.status_code))
print(response.text)
</code>
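
Beyond text, the Response object exposes several other useful attributes. A minimal sketch against the same URL, using attributes from the requests API:

<code># Common Response attributes
import requests
response = requests.get("http://www.baidu.com")
print(response.status_code)        # HTTP status code, e.g. 200
print(response.url)                # final URL after any redirects
print(response.headers)            # response headers as a dict-like object
print(response.encoding)           # encoding assumed from the headers
print(response.apparent_encoding)  # encoding guessed from the body itself
print(len(response.content))       # raw bytes, as opposed to decoded .text
</code>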

2. GET Request Example

<code># GET request to httpbin.org
import requests
response = requests.get("http://httpbin.org/get")
print(response.status_code)
print(response.text)
</code>
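
httpbin.org echoes request details back as JSON, so instead of reading raw text you can parse the body directly with response.json(); a minimal sketch:

<code># Parse a JSON response body into a dict
import requests
response = requests.get("http://httpbin.org/get")
data = response.json()  # decode the JSON body
print(data["url"])      # httpbin echoes the requested URL
print(data["headers"])  # and the headers it received
</code>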

3. POST Request Example

<code># POST request to httpbin.org
import requests
response = requests.post("http://httpbin.org/post")
print(response.status_code)
print(response.text)
</code>
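
The POST above sends no body. requests accepts data= for form-encoded payloads and json= for JSON bodies; a minimal sketch (the payload values are arbitrary examples):

<code># POST with a JSON body
import requests
payload = {"name": "hezhi", "age": 20}
response = requests.post("http://httpbin.org/post", json=payload)
print(response.json()["json"])  # httpbin echoes JSON bodies under "json"
</code>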

4. PUT Request Example

<code># PUT request to httpbin.org
import requests
response = requests.put("http://httpbin.org/put")
print(response.status_code)
print(response.text)
</code>
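
The remaining HTTP verbs follow the same calling convention; a quick sketch against httpbin's matching endpoints:

<code># DELETE, HEAD, and OPTIONS work the same way
import requests
print(requests.delete("http://httpbin.org/delete").status_code)
print(requests.head("http://httpbin.org/get").status_code)  # headers only, no body
print(requests.options("http://httpbin.org/get").status_code)
</code>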

5. GET with URL Parameters

<code># GET with query string in URL
import requests
response = requests.get("http://httpbin.org/get?name=hezhi&age=20")
print(response.status_code)
print(response.text)
</code>

6. GET with Params Dictionary

<code># GET using params dict
import requests
data = {"name": "hezhi", "age": 20}
response = requests.get("http://httpbin.org/get", params=data)
print(response.status_code)
print(response.text)
</code>
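
One advantage of the params dict over hand-building the query string: requests percent-encodes the values for you. A minimal sketch (the query values are arbitrary examples):

<code># requests URL-encodes params automatically
import requests
data = {"q": "hello world", "city": "北京"}
response = requests.get("http://httpbin.org/get", params=data)
print(response.url)  # spaces and non-ASCII characters arrive percent-encoded
</code>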

7. POST with Params Dictionary

<code># POST using params dict
import requests
data = {"name": "hezhi", "age": 20}
response = requests.post("http://httpbin.org/post", params=data)
print(response.status_code)
print(response.text)
</code>
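
Note that params= attaches values to the query string even on a POST, so httpbin reports them under "args". To put them in the request body instead, use data=, which httpbin reports under "form"; a sketch of the difference:

<code># params= goes to the query string, data= goes to the body
import requests
payload = {"name": "hezhi", "age": 20}
response = requests.post("http://httpbin.org/post", data=payload)
print(response.json()["form"])  # form-encoded body fields echoed here
print(response.json()["args"])  # empty: nothing was sent as a query string
</code>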

8. Bypassing Anti‑Scraping (Example: Zhihu)

<code># First request without headers – likely blocked
import requests
response = requests.get("http://www.zhihu.com")
print("No headers status:" + str(response.status_code))
# Request with a realistic User‑Agent header
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.122 Safari/537.36"}
response = requests.get("http://www.zhihu.com", headers=headers)
print("With headers status:" + str(response.status_code))
print(response.text)
</code>
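
If you make several requests to the same site, a Session lets you set the headers once and reuses the underlying connection; a minimal sketch with the same User-Agent:

<code># Reuse headers (and the connection) with a Session
import requests
session = requests.Session()
session.headers.update({"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.122 Safari/537.36"})
response = session.get("http://www.zhihu.com")
print(response.status_code)
</code>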

9. Save Crawled HTML to a Local File

<code>import requests
url = "http://www.baidu.com"
response = requests.get(url)
response.encoding = "utf-8"
print("Response type:" + str(type(response)))
print("Status code:" + str(response.status_code))
print("Headers:" + str(response.headers))
print("Content:" + response.text)
# Write to file
with open("D:\\crawler\\baidu.html", "w", encoding="utf-8") as file:
    file.write(response.text)
</code>
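
The open() call above fails if D:\crawler does not exist yet; creating the directory first avoids that. A minimal sketch, reusing the path from the example above:

<code># Create the target directory before writing
import os
import requests
os.makedirs("D:\\crawler", exist_ok=True)  # no error if it already exists
response = requests.get("http://www.baidu.com")
response.encoding = "utf-8"
with open("D:\\crawler\\baidu.html", "w", encoding="utf-8") as file:
    file.write(response.text)
</code>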

10. Download an Image and Save It

<code>import requests
response = requests.get("https://www.baidu.com/img/baidu_jgylogo3.gif")
with open("D:\\crawler\\baidu_logo.gif", "wb") as file:
    file.write(response.content)
</code>
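
For large files, loading the whole body into memory via response.content is wasteful; stream=True plus iter_content() writes the download in chunks instead. A minimal sketch using the same logo URL:

<code># Stream a download to disk in chunks
import requests
response = requests.get("https://www.baidu.com/img/baidu_jgylogo3.gif", stream=True)
with open("D:\\crawler\\baidu_logo.gif", "wb") as file:
    for chunk in response.iter_content(chunk_size=8192):
        file.write(chunk)
</code>
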
Tags: Python, HTTP, Tutorial, Web Scraping, Requests, File I/O
Written by Raymond Ops

Linux ops automation, cloud-native, Kubernetes, SRE, DevOps, Python, Golang and related tech discussions.