
Master Python Requests: Web Scraping Basics with GET, POST, and File Saving

This tutorial walks you through installing the Python requests library, using GET, POST, and PUT methods, handling query parameters, setting custom headers to bypass anti‑scraping measures, and saving both HTML content and images to local files, complete with runnable code examples.

Raymond Ops

Key Topics Covered

- How web interaction works
- Using requests.get and requests.post
- Response object attributes and methods
- Opening and saving files in Python

Installing the requests Library

For Windows, run the following (the -i flag points pip at the Tsinghua PyPI mirror; drop it to install from the default index):

<code>pip install -i https://pypi.tuna.tsinghua.edu.cn/simple requests</code>

For Linux, prepend sudo if necessary:

<code>sudo pip install -i https://pypi.tuna.tsinghua.edu.cn/simple requests</code>
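
To confirm the install worked, you can print the library version; a quick sanity check, nothing more:

<code># Verify the installation by printing the version
import requests
print(requests.__version__)
</code>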

1. Crawl a Baidu Page

<code># First crawler example – fetch Baidu homepage
import requests
response = requests.get("http://www.baidu.com")
response.encoding = response.apparent_encoding  # use the encoding guessed from the body to avoid garbled text
print("Status code:" + str(response.status_code))
print(response.text)
</code>
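
Beyond text, the Response object exposes several other useful attributes. A minimal sketch against the same URL, using attributes from the requests API:

<code># Common Response attributes
import requests
response = requests.get("http://www.baidu.com")
print(response.status_code)        # HTTP status code, e.g. 200
print(response.url)                # final URL after any redirects
print(response.headers)            # response headers as a dict-like object
print(response.encoding)           # encoding assumed from the headers
print(response.apparent_encoding)  # encoding guessed from the body itself
print(len(response.content))       # raw bytes, as opposed to decoded .text
</code>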

2. GET Request Example

<code># GET request to httpbin.org
import requests
response = requests.get("http://httpbin.org/get")
print(response.status_code)
print(response.text)
</code>
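
httpbin.org echoes request details back as JSON, so instead of reading raw text you can parse the body directly with response.json(); a minimal sketch:

<code># Parse a JSON response body into a dict
import requests
response = requests.get("http://httpbin.org/get")
data = response.json()  # decode the JSON body
print(data["url"])      # httpbin echoes the requested URL
print(data["headers"])  # and the headers it received
</code>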

3. POST Request Example

<code># POST request to httpbin.org
import requests
response = requests.post("http://httpbin.org/post")
print(response.status_code)
print(response.text)
</code>
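
The POST above sends no body. requests accepts data= for form-encoded payloads and json= for JSON bodies; a minimal sketch (the payload values are arbitrary examples):

<code># POST with a JSON body
import requests
payload = {"name": "hezhi", "age": 20}
response = requests.post("http://httpbin.org/post", json=payload)
print(response.json()["json"])  # httpbin echoes JSON bodies under "json"
</code>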

4. PUT Request Example

<code># PUT request to httpbin.org
import requests
response = requests.put("http://httpbin.org/put")
print(response.status_code)
print(response.text)
</code>
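
The remaining HTTP verbs follow the same calling convention; a quick sketch against httpbin's matching endpoints:

<code># DELETE, HEAD, and OPTIONS work the same way
import requests
print(requests.delete("http://httpbin.org/delete").status_code)
print(requests.head("http://httpbin.org/get").status_code)  # headers only, no body
print(requests.options("http://httpbin.org/get").status_code)
</code>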

5. GET with URL Parameters

<code># GET with query string in URL
import requests
response = requests.get("http://httpbin.org/get?name=hezhi&age=20")
print(response.status_code)
print(response.text)
</code>

6. GET with Params Dictionary

<code># GET using params dict
import requests
data = {"name": "hezhi", "age": 20}
response = requests.get("http://httpbin.org/get", params=data)
print(response.status_code)
print(response.text)
</code>
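
One advantage of the params dict over hand-building the query string: requests percent-encodes the values for you. A minimal sketch (the query values are arbitrary examples):

<code># requests URL-encodes params automatically
import requests
data = {"q": "hello world", "city": "北京"}
response = requests.get("http://httpbin.org/get", params=data)
print(response.url)  # spaces and non-ASCII characters arrive percent-encoded
</code>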

7. POST with Params Dictionary

<code># POST using params dict
import requests
data = {"name": "hezhi", "age": 20}
response = requests.post("http://httpbin.org/post", params=data)
print(response.status_code)
print(response.text)
</code>
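
Note that params= attaches values to the query string even on a POST, so httpbin reports them under "args". To put them in the request body instead, use data=, which httpbin reports under "form"; a sketch of the difference:

<code># params= goes to the query string, data= goes to the body
import requests
payload = {"name": "hezhi", "age": 20}
response = requests.post("http://httpbin.org/post", data=payload)
print(response.json()["form"])  # form-encoded body fields echoed here
print(response.json()["args"])  # empty: nothing was sent as a query string
</code>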

8. Bypassing Anti‑Scraping (Example: Zhihu)

<code># First request without headers – likely blocked
import requests
response = requests.get("http://www.zhihu.com")
print("No headers status:" + str(response.status_code))
# Request with a realistic User‑Agent header
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.122 Safari/537.36"}
response = requests.get("http://www.zhihu.com", headers=headers)
print("With headers status:" + str(response.status_code))
print(response.text)
</code>
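
If you make several requests to the same site, a Session lets you set the headers once and reuses the underlying connection; a minimal sketch with the same User-Agent:

<code># Reuse headers (and the connection) with a Session
import requests
session = requests.Session()
session.headers.update({"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.122 Safari/537.36"})
response = session.get("http://www.zhihu.com")
print(response.status_code)
</code>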

9. Save Crawled HTML to a Local File

<code>import requests
url = "http://www.baidu.com"
response = requests.get(url)
response.encoding = "utf-8"
print("Response type:" + str(type(response)))
print("Status code:" + str(response.status_code))
print("Headers:" + str(response.headers))
print("Content:" + response.text)
# Write to file
with open("D:\\crawler\\baidu.html", "w", encoding="utf-8") as file:
    file.write(response.text)
</code>
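
The open() call above fails if D:\crawler does not exist yet; creating the directory first avoids that. A minimal sketch, reusing the path from the example above:

<code># Create the target directory before writing
import os
import requests
os.makedirs("D:\\crawler", exist_ok=True)  # no error if it already exists
response = requests.get("http://www.baidu.com")
response.encoding = "utf-8"
with open("D:\\crawler\\baidu.html", "w", encoding="utf-8") as file:
    file.write(response.text)
</code>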

10. Download an Image and Save It

<code>import requests
response = requests.get("https://www.baidu.com/img/baidu_jgylogo3.gif")
with open("D:\\crawler\\baidu_logo.gif", "wb") as file:
    file.write(response.content)
</code>
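
For large files, loading the whole body into memory via response.content is wasteful; stream=True plus iter_content() writes the download in chunks instead. A minimal sketch using the same logo URL:

<code># Stream a download to disk in chunks
import requests
response = requests.get("https://www.baidu.com/img/baidu_jgylogo3.gif", stream=True)
with open("D:\\crawler\\baidu_logo.gif", "wb") as file:
    for chunk in response.iter_content(chunk_size=8192):
        file.write(chunk)
</code>
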
Tags: Python, HTTP, Tutorial, Web Scraping, Requests, File I/O
Written by Raymond Ops

Linux ops automation, cloud-native, Kubernetes, SRE, DevOps, Python, Golang and related tech discussions.