Master Python Requests: Step-by-Step Web Scraping and Data Saving Guide
This tutorial walks you through installing the Python requests library on Windows and Linux. It then shows how to send GET, POST, and PUT requests, pass parameters, get past simple anti-scraping checks by sending a browser User-Agent, and save fetched HTML or images to local files, with clear, runnable code examples.
Main topics covered:
- How web interaction works
- Using requests.get and requests.post
- Key attributes and methods of the Response object
- Opening and saving files in Python
Installation of the requests library
On Windows (the command is the same on Linux), open a command prompt and run:
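pip install -i https://pypi.tuna.tsinghua.edu.cn/simple requests
The -i option points pip at the Tsinghua University PyPI mirror, which is usually faster from mainland China; drop it to install from the default index. Linux users may need to prepend sudo if permissions are insufficient: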
sudo pip install -i https://pypi.tuna.tsinghua.edu.cn/simple requests
1. Crawl a well-known page (e.g., Baidu) and print response info
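This section's fetch can be made a little more defensive in two ways. A timeout stops a stalled server from hanging the script. A call to raise_for_status() turns 4xx/5xx responses into exceptions, so an error page is not silently returned. This is a sketch: fetch_html is a hypothetical helper name, and the 10-second default timeout is an arbitrary choice.

```python
import requests


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Fetch url and return its decoded HTML (hypothetical helper, not from the article)."""
    # timeout limits how long requests waits to connect and between received bytes
    response = requests.get(url, timeout=timeout)
    # raise requests.HTTPError for 4xx/5xx status codes
    response.raise_for_status()
    # guess the charset from the body, as the article's example does
    response.encoding = response.apparent_encoding
    return response.text
```

For example, print(fetch_html("http://www.baidu.com")[:200]) prints the first 200 characters of the page.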
# First crawler example – crawl Baidu page
import requests # import the library
response = requests.get("http://www.baidu.com") # send a GET request; returns a Response object
response.encoding = response.apparent_encoding # guess the charset from the body so Chinese text is not garbled
print("Status code:" + str(response.status_code)) # print status code
print(response.text) # output the fetched HTML
2. GET method example
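httpbin.org/get answers with a JSON document that describes the request it received. Instead of printing the raw text, you can decode it with response.json(). The sketch below gathers a few commonly used Response attributes in one place. summarize is a hypothetical helper name.

```python
import requests


def summarize(response: requests.Response) -> dict:
    """Collect a few commonly used Response attributes (hypothetical helper)."""
    return {
        "status": response.status_code,  # numeric HTTP status, e.g. 200
        # headers is a case-insensitive dict, so "content-type" would work too
        "content_type": response.headers.get("Content-Type"),
        "size": len(response.content),  # raw body length in bytes
        "json": response.json(),  # parsed body; raises a ValueError subclass if it is not JSON
    }
```

With network access, summarize(requests.get("http://httpbin.org/get", timeout=10))["json"]["url"] gives the URL that httpbin saw.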
# Second GET example
import requests
response = requests.get("http://httpbin.org/get")
print(response.status_code) # status code
print(response.text) # response body
3. POST method example
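The bare POST in this section sends an empty body. When a server expects JSON, requests can serialize a dict for you through the json= argument, which also sets the Content-Type header. This sketch prepares the request without sending it, so you can inspect what would go over the wire. The payload values are borrowed from later examples and are only illustrative.

```python
import json

import requests

payload = {"name": "hezhi", "age": 20}
# Build the request but do not send it, so it can be inspected offline
prepared = requests.Request("POST", "http://httpbin.org/post", json=payload).prepare()

print(prepared.headers["Content-Type"])  # application/json
print(json.loads(prepared.body))         # {'name': 'hezhi', 'age': 20}

# To actually send it:
# with requests.Session() as session:
#     response = session.send(prepared, timeout=10)
```

In everyday code the one-liner requests.post(url, json=payload) does the same thing.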
# Third POST example
import requests
response = requests.post("http://httpbin.org/post")
print(response.status_code)
print(response.text)
4. PUT method example
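requests.get, requests.post and requests.put are thin wrappers around requests.request(method, url, **kwargs), so one function can issue any HTTP verb. The send helper here is hypothetical. It only adds a default timeout before delegating.

```python
import requests


def send(method: str, url: str, **kwargs) -> requests.Response:
    """Issue any HTTP verb via requests.request (hypothetical helper)."""
    kwargs.setdefault("timeout", 10)  # don't hang forever on an unresponsive server
    return requests.request(method, url, **kwargs)


# Preparing (not sending) shows what a PUT carrying form data looks like
prepared = requests.Request("PUT", "http://httpbin.org/put", data={"name": "hezhi"}).prepare()
print(prepared.method)  # PUT
print(prepared.body)    # name=hezhi
```

With network access, send("PUT", "http://httpbin.org/put") behaves like the requests.put call in this section, plus a 10-second timeout.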
# Fourth PUT example
import requests
response = requests.put("http://httpbin.org/put")
print(response.status_code)
print(response.text)
5. GET with query parameters (inline)
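Typing the query string by hand is fine for plain ASCII values. Characters such as spaces, & or Chinese text, however, must be percent-encoded first. The standard library's urllib.parse.urlencode handles this. build_url is a hypothetical helper that assumes the base URL has no query string yet.

```python
from urllib.parse import urlencode


def build_url(base: str, params: dict) -> str:
    """Append percent-encoded query parameters to base (assumes base has no '?' yet)."""
    return base + "?" + urlencode(params)


print(build_url("http://httpbin.org/get", {"name": "hezhi", "age": 20}))
# http://httpbin.org/get?name=hezhi&age=20
print(build_url("http://httpbin.org/get", {"q": "a b&c"}))
# http://httpbin.org/get?q=a+b%26c
```

urlencode uses quote_plus by default, which is why the space becomes + and the & becomes %26.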
# Fifth GET with parameters in URL
import requests
response = requests.get("http://httpbin.org/get?name=hezhi&age=20")
print(response.status_code)
print(response.text)
6. GET with parameters using a dictionary
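Passing a dict through params= lets requests do the encoding. Two behaviours are worth knowing: a list value produces a repeated key, and a key whose value is None is dropped. Preparing the request without sending it shows the resulting URL. After a real request, the same string is available as response.url. The extra tag and debug keys are illustrative additions.

```python
import requests

params = {"name": "hezhi", "age": 20, "tag": ["a", "b"], "debug": None}
# Prepare (without sending) to see the URL requests builds from the dict
prepared = requests.Request("GET", "http://httpbin.org/get", params=params).prepare()
print(prepared.url)
# http://httpbin.org/get?name=hezhi&age=20&tag=a&tag=b
```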
# Sixth GET with params dict
import requests
data = {"name": "hezhi", "age": 20}
response = requests.get("http://httpbin.org/get", params=data)
print(response.status_code)
print(response.text)
7. POST with parameters (similar to GET)
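For a POST, params= and data= put the same dict in different places. params= appends it to the URL's query string, while data= form-encodes it into the request body (httpbin reports these under "args" and "form" respectively). Preparing both variants side by side makes the difference visible without any network access.

```python
import requests

data = {"name": "hezhi", "age": 20}
url = "http://httpbin.org/post"

as_query = requests.Request("POST", url, params=data).prepare()  # dict goes into the URL
as_form = requests.Request("POST", url, data=data).prepare()     # dict goes into the body

print(as_query.url, as_query.body)      # http://httpbin.org/post?name=hezhi&age=20 None
print(as_form.url, as_form.body)        # http://httpbin.org/post name=hezhi&age=20
print(as_form.headers["Content-Type"])  # application/x-www-form-urlencoded
```

Most HTML forms and many APIs expect the data= form, so that is usually what "POST parameters" means.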
# Seventh POST with params dict
import requests
data = {"name": "hezhi", "age": 20}
response = requests.post("http://httpbin.org/post", data=data) # data= sends the dict as a form body (params= would append it to the URL)
print(response.status_code)
print(response.text)
8. Getting past simple anti-scraping checks with a User-Agent header (example with Zhihu)
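When several pages of the same site are crawled, you can set the header once on a requests.Session rather than passing headers= to every call. A Session also keeps cookies between requests and reuses connections. Whether a particular site accepts the header is not guaranteed: a browser-like User-Agent only gets past checks that look at that header. make_session is a hypothetical helper.

```python
import requests

# The User-Agent string from this section's example
BROWSER_UA = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/80.0.3987.122 Safari/537.36"
)


def make_session(user_agent: str = BROWSER_UA) -> requests.Session:
    """Return a Session that sends user_agent with every request (hypothetical helper)."""
    session = requests.Session()
    session.headers.update({"User-Agent": user_agent})
    return session


session = make_session()
# Prepare without sending to confirm the header is attached
prepared = session.prepare_request(requests.Request("GET", "http://www.zhihu.com"))
print(prepared.headers["User-Agent"] == BROWSER_UA)  # True
```

Real requests then go through session.get("http://www.zhihu.com", timeout=10).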
# Example of setting a custom User‑Agent header
import requests
# First request without headers – likely blocked
response = requests.get("http://www.zhihu.com")
print("First request status:" + str(response.status_code))
# Set headers to mimic a browser
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.122 Safari/537.36"}
response = requests.get("http://www.zhihu.com", headers=headers)
print(response.status_code) # typically 200 if the site accepts the browser User-Agent
print(response.text)
9. Save fetched HTML to a local file
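Writing to a fixed path such as D:\crawler only works if that folder already exists. A sketch using pathlib creates the parent folders first and handles opening and closing the file in one call. save_text is a hypothetical helper, and the path in the usage line is only an example.

```python
from pathlib import Path


def save_text(text: str, path: str, encoding: str = "utf-8") -> Path:
    """Write text to path, creating missing parent folders (hypothetical helper)."""
    target = Path(path)
    target.parent.mkdir(parents=True, exist_ok=True)  # no error if the folder exists
    target.write_text(text, encoding=encoding)         # opens, writes and closes the file
    return target
```

Usage with this section's response: save_text(response.text, r"D:\crawler\baidu.html").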
# Crawl an HTML page and save it locally
import requests
url = "http://www.baidu.com"
response = requests.get(url)
response.encoding = "utf-8"
print("Response type:" + str(type(response)))
print("Status code:" + str(response.status_code))
print("Headers:" + str(response.headers))
print("Content:" + response.text)
# Save to file (the D:\crawler folder must already exist)
with open("D:\\crawler\\baidu.html", "w", encoding="utf-8") as file:
    file.write(response.text)
10. Download an image and save it locally
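response.content loads the whole file into memory, which is fine for a small logo. For larger files, a common alternative is to pass stream=True and write the body in chunks with iter_content(). download is a hypothetical helper, and the chunk size and timeout are arbitrary choices.

```python
import requests


def download(url: str, path: str, chunk_size: int = 8192) -> int:
    """Stream url to path in chunks and return the number of bytes written (a sketch)."""
    written = 0
    # stream=True defers reading the body until iter_content is consumed
    with requests.get(url, stream=True, timeout=10) as response:
        response.raise_for_status()
        with open(path, "wb") as file:  # binary mode: images are bytes, not text
            for chunk in response.iter_content(chunk_size=chunk_size):
                file.write(chunk)
                written += len(chunk)
    return written
```

For example, download("https://www.baidu.com/img/baidu_jgylogo3.gif", r"D:\crawler\baidu_logo.gif") returns the size of the saved file in bytes.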
# Save Baidu logo image to local disk
import requests
response = requests.get("https://www.baidu.com/img/baidu_jgylogo3.gif")
with open("D:\\crawler\\baidu_logo.gif", "wb") as file: # "wb": the image is binary data
    file.write(response.content) # .content holds the raw bytes of the body
For more details, refer to the original article: https://www.cnblogs.com/h3zh1/p/12548946.html
MaGe Linux Operations
