Master Python Requests: Step-by-Step Web Scraping and Data Saving Guide

This tutorial shows how to install the Python requests library on Windows and Linux, send GET, POST, and PUT requests, pass parameters, set a browser-like User-Agent header to get past basic anti-scraping checks, and save fetched HTML or images to local files, with runnable code examples for each step.

MaGe Linux Operations

Mar 23, 2024

Below are the main knowledge points covered:

How web interaction works
Using requests.get and requests.post
Key attributes and methods of the Response object
Opening and saving files in Python

Installing the requests library

On Windows, open a command prompt and run the command below (the same command works on Linux). The -i option points pip at the Tsinghua PyPI mirror:

pip install -i https://pypi.tuna.tsinghua.edu.cn/simple requests

Linux users who hit a permissions error can prepend sudo, although installing into a virtual environment or with pip's --user option avoids modifying the system Python:

sudo pip install -i https://pypi.tuna.tsinghua.edu.cn/simple requests
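
To confirm that the installation worked, a quick check along the following lines can be run in the same environment. It only imports the package and prints its version, so it needs no network access; the variable name installed_version is just an illustrative choice.

```python
# Confirm that requests is importable and report which version pip installed
import requests

installed_version = requests.__version__
print("requests version:", installed_version)
```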

1. Fetch a well-known page (e.g., Baidu) and print response info

# First crawler example – crawl Baidu page
import requests  # import the library
response = requests.get("http://www.baidu.com")  # generate a Response object
response.encoding = response.apparent_encoding  # set encoding
print("Status code:" + str(response.status_code))  # print status code
print(response.text)  # output the fetched HTML
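
The Response object used above exposes more than status_code and text. As a rough sketch, the helper below gathers several commonly inspected attributes into a dictionary; summarize_response is a hypothetical name chosen for illustration and is not part of requests.

```python
import requests


def summarize_response(response):
    """Collect commonly inspected Response attributes into a plain dict.

    summarize_response is an illustrative helper, not a requests API.
    """
    return {
        "status_code": response.status_code,  # numeric HTTP status, e.g. 200
        "ok": response.ok,                    # False for 4xx and 5xx status codes
        "encoding": response.encoding,        # encoding used to decode .text, may be None
        "content_type": response.headers.get("Content-Type"),
        "size_bytes": len(response.content),  # .content is the raw body as bytes
    }
```

For instance, summarize_response(requests.get("http://www.baidu.com", timeout=10)) would return the dictionary for the Baidu page; the 10-second timeout is an arbitrary value picked for this sketch.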

2. GET method example

# Second GET example
import requests
response = requests.get("http://httpbin.org/get")
print(response.status_code)  # status code
print(response.text)  # response body

3. POST method example

# Third POST example
import requests
response = requests.post("http://httpbin.org/post")
print(response.status_code)
print(response.text)

4. PUT method example

# Fourth PUT example
import requests
response = requests.put("http://httpbin.org/put")
print(response.status_code)
print(response.text)
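
The GET, POST, and PUT examples differ only in the HTTP method and the httpbin endpoint. One way to see this without sending anything is to build the requests with requests.Request and inspect the prepared result. The sketch below assumes the same three httpbin URLs; the names endpoints and prepared_requests are illustrative.

```python
import requests

# Build (but do not send) one request per HTTP method against httpbin
endpoints = {
    "GET": "http://httpbin.org/get",
    "POST": "http://httpbin.org/post",
    "PUT": "http://httpbin.org/put",
}

prepared_requests = [
    requests.Request(method, url).prepare() for method, url in endpoints.items()
]

for prepared in prepared_requests:
    # A PreparedRequest holds the method and final URL that would be sent
    print(prepared.method, prepared.url)
```

A prepared request can then be sent with requests.Session().send(prepared, timeout=10), which returns the same kind of Response object as requests.get.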

5. GET with query parameters (inline)

# Fifth GET with parameters in URL
import requests
response = requests.get("http://httpbin.org/get?name=hezhi&age=20")
print(response.status_code)
print(response.text)
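
httpbin replies with a JSON document that echoes the query string under an "args" key, so the body can be decoded with response.json() instead of being printed as raw text. A minimal sketch follows; extract_query_args is a hypothetical helper name, not a requests function.

```python
import requests


def extract_query_args(response):
    """Return the "args" object from an httpbin-style JSON reply.

    extract_query_args is an illustrative helper, not a requests API.
    """
    payload = response.json()  # decode the JSON body into Python objects
    return payload.get("args", {})
```

Calling extract_query_args(requests.get("http://httpbin.org/get?name=hezhi&age=20", timeout=10)) should give a dictionary with the keys name and age. Note that httpbin reports query values as strings, so age comes back as "20" rather than 20.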

6. GET with parameters using a dictionary

# Sixth GET with params dict
import requests
data = {"name": "hezhi", "age": 20}
response = requests.get("http://httpbin.org/get", params=data)
print(response.status_code)
print(response.text)
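
Passing a dictionary through params produces the same URL as writing the query string by hand. This can be checked locally by preparing the request and printing the URL that requests builds, as in the sketch below; nothing is sent over the network.

```python
import requests

data = {"name": "hezhi", "age": 20}

# Prepare the request locally to see the URL that requests builds from params
prepared = requests.Request("GET", "http://httpbin.org/get", params=data).prepare()
print(prepared.url)  # http://httpbin.org/get?name=hezhi&age=20
```

The dictionary form is usually easier to maintain because requests takes care of URL-encoding the values.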

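7. POST with parameters passed as a dictionary

# Seventh example: POST with a params dict
import requests
data = {"name": "hezhi", "age": 20}
# params= appends the values to the URL query string, exactly as with GET;
# to send them in the request body as form fields, pass data=data instead
response = requests.post("http://httpbin.org/post", params=data)
print(response.status_code)
print(response.text)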
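
The POST example above passes the dictionary through params, which places the values in the query string. POST requests more often carry their data in the request body, which requests supports through the data and json arguments. The sketch below prepares all three variants locally so the difference can be inspected without contacting httpbin; the variable names are illustrative.

```python
import json
import requests

data = {"name": "hezhi", "age": 20}
url = "http://httpbin.org/post"

# params= puts the values in the query string and leaves the body empty
as_query = requests.Request("POST", url, params=data).prepare()
# data= sends the values as a form-encoded request body
as_form = requests.Request("POST", url, data=data).prepare()
# json= serializes the values to a JSON request body
as_json = requests.Request("POST", url, json=data).prepare()

print(as_query.url, as_query.body)
print(as_form.headers["Content-Type"], as_form.body)
print(as_json.headers["Content-Type"], json.loads(as_json.body))
```

When sent to httpbin, the params variant shows up under "args", the data variant under "form", and the json variant under "json" in the echoed reply.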

8. Bypassing anti‑scraping mechanisms (example with Zhihu)

# Example of setting a custom User‑Agent header
import requests
# First request without headers – likely blocked
response = requests.get("http://www.zhihu.com")
print("First request status:" + str(response.status_code))
# Set headers to mimic a browser
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.122 Safari/537.36"}
response = requests.get("http://www.zhihu.com", headers=headers)
print("Second request status:" + str(response.status_code))  # 200 if the browser-like User-Agent is accepted
print(response.text)
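
When several pages are fetched from the same site, the headers can be set once on a requests.Session rather than passed to every call. The following is a minimal sketch under that assumption; build_browser_session and BROWSER_USER_AGENT are hypothetical names, and the User-Agent string is the one from the example above.

```python
import requests

# Same browser-like User-Agent string as in the example above
BROWSER_USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/80.0.3987.122 Safari/537.36"
)


def build_browser_session(user_agent=BROWSER_USER_AGENT):
    """Return a Session that sends the given User-Agent on every request.

    build_browser_session is an illustrative helper, not a requests API.
    """
    session = requests.Session()
    session.headers.update({"User-Agent": user_agent})
    return session


session = build_browser_session()
print(session.headers["User-Agent"])
```

A call such as session.get("http://www.zhihu.com", timeout=10) then sends the header automatically, and the session also keeps cookies between requests.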

9. Save fetched HTML to a local file

# Crawl an HTML page and save it locally
import requests
url = "http://www.baidu.com"
response = requests.get(url)
response.encoding = "utf-8"
print("Response type:" + str(type(response)))
print("Status code:" + str(response.status_code))
print("Headers:" + str(response.headers))
print("Content:" + response.text)
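# Save to file (the D:\crawler directory must already exist);
# the with statement closes the file even if the write fails
with open("D:\\crawler\\baidu.html", "w", encoding="utf-8") as file:
    file.write(response.text)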
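
Opening a file for writing fails if the target directory does not exist. One possible way to handle this is to create the directory first with pathlib, as in the sketch below. save_text is a hypothetical helper name, and the UTF-8 default mirrors the encoding used in the example above.

```python
from pathlib import Path


def save_text(text, path, encoding="utf-8"):
    """Write text to path, creating parent directories first.

    save_text is an illustrative helper; it returns the number of
    characters written.
    """
    target = Path(path)
    target.parent.mkdir(parents=True, exist_ok=True)  # no error if it already exists
    with target.open("w", encoding=encoding) as file:
        return file.write(text)
```

With this helper, the save step could be written as save_text(response.text, "D:\\crawler\\baidu.html").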

10. Download an image and save it locally

# Save Baidu logo image to local disk
import requests
response = requests.get("https://www.baidu.com/img/baidu_jgylogo3.gif")
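# Open in binary mode ("wb") because response.content is raw bytes;
# the D:\crawler directory must already exist
with open("D:\\crawler\\baidu_logo.gif", "wb") as file:
    file.write(response.content)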
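
Reading response.content loads the whole body into memory, which is fine for a small logo. For larger files, a common alternative is to request the body as a stream and write it in chunks with iter_content. The sketch below assumes that approach; save_binary_stream and the 8192-byte chunk size are illustrative choices, not values from the original article.

```python
import requests


def save_binary_stream(response, path, chunk_size=8192):
    """Write a Response body to path chunk by chunk and return the byte count.

    save_binary_stream is an illustrative helper, not a requests API.
    """
    response.raise_for_status()  # stop early on 4xx/5xx replies
    written = 0
    with open(path, "wb") as file:
        for chunk in response.iter_content(chunk_size=chunk_size):
            written += file.write(chunk)
    return written
```

The helper is meant to be paired with a streaming request, for example: with requests.get(url, stream=True, timeout=10) as response: save_binary_stream(response, "D:\\crawler\\baidu_logo.gif").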

For more details, refer to the original article: https://www.cnblogs.com/h3zh1/p/12548946.html
