How to Scrape Sogou Wallpaper Images with Python: A Step‑by‑Step Guide
This tutorial walks you through using Python's requests and fake_useragent libraries to locate Sogou wallpaper JSON endpoints, extract image URLs, bypass basic anti‑scraping measures, and download the pictures to a local folder, complete with full code examples and explanations.
Introduction
This article demonstrates how to use Python to crawl Sogou wallpaper images, covering the whole process from finding the JSON API to downloading the pictures.
Project Goal
Teach readers how to obtain Sogou wallpapers and download their preferred categories.
Preparation
Software: PyCharm
Required libraries: requests and fake_useragent (both installed separately), plus json from the standard library
Finding the Real JSON URL
Open the Sogou wallpaper site, press F12, go to Network → XHR, refresh the page and locate the request URL in the Headers. The original URL looks like:
http://pic.sogou.com/pics/channel/getAllRecomPicByTag.jsp?category=%E5%A3%81%E7%BA%B8&tag=%E5%85%A8%E9%83%A8&start=0&len=15&width=1536&height=864
After removing unnecessary parameters, the simplified URL is:
http://pic.sogou.com/pics/channel/getAllRecomPicByTag.jsp?category=%E5%A3%81%E7%BA%B8&tag=%E5%85%A8%E9%83%A8&start=0&len=15
In this URL, category denotes the wallpaper category, start is the start index, and len is the number of images to fetch.
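The query string can also be assembled programmatically instead of being typed out by hand. The sketch below uses urlencode from the standard library; the helper name build_url and its default arguments are illustrative assumptions, not part of the original code.

```python
from urllib.parse import urlencode

BASE_URL = 'http://pic.sogou.com/pics/channel/getAllRecomPicByTag.jsp'


def build_url(category='壁纸', tag='全部', start=0, length=15):
    """Return the JSON endpoint URL for one batch of images.

    build_url is a hypothetical helper; the query parameter names
    (category, tag, start, len) are the ones seen in the URL above.
    """
    query = urlencode({
        'category': category,  # wallpaper category, '壁纸' means "wallpaper"
        'tag': tag,            # '全部' means "all"
        'start': start,        # index of the first image
        'len': length,         # number of images to fetch
    })
    return BASE_URL + '?' + query


print(build_url())
```

Called with its defaults, the helper reproduces the simplified URL. Passing start=15 would ask for the next batch, assuming the endpoint pages by index as the start parameter suggests.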
Extracting Image URLs
Open the JSON response in the Preview tab and locate the pic_url field of each item. This field contains the direct image address.
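The same lookup can be done in code. The sketch below parses a hypothetical, heavily trimmed response body; the all_items key is the one the article's own code reads in the Implementation section, while the example.com addresses and the helper name extract_pic_urls are made up for illustration.

```python
import json

# Hypothetical, trimmed response body: the real payload carries more
# fields per item; only 'all_items' and 'pic_url' are used here.
sample_text = '''
{"all_items": [
    {"pic_url": "http://example.com/a.jpg"},
    {"pic_url": "http://example.com/b.jpg"}
]}
'''


def extract_pic_urls(text):
    """Parse the JSON text and collect every item's pic_url."""
    items = json.loads(text)['all_items']
    return [item['pic_url'] for item in items]


print(extract_pic_urls(sample_text))
```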
Anti‑Scraping Measures
Set realistic HTTP request headers when using requests.
Generate random User‑Agent strings with fake_useragent.
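The two measures above amount to sending a headers dictionary whose User-Agent changes between runs. The sketch below shows the idea with a small hard-coded pool standing in for fake_useragent, so it runs without any third-party package; the pool contents and the helper name random_headers are assumptions made for illustration.

```python
import random

# Stand-in pool for illustration only; fake_useragent would supply
# such strings from its own data set.
USER_AGENTS = [
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
    '(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
    'Mozilla/5.0 (X11; Linux x86_64; rv:121.0) Gecko/20100101 Firefox/121.0',
]


def random_headers():
    """Return request headers with a randomly chosen User-Agent."""
    return {'User-Agent': random.choice(USER_AGENTS)}


headers = random_headers()
print(headers['User-Agent'])
# The dict would then be passed as requests.get(url, headers=headers).
```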
Implementation
Below is the core Python code.
import json

import requests
from fake_useragent import UserAgent


class ShouGO(object):
    # Skeleton of the scraper; both methods are filled in step by step
    def __init__(self):
        pass

    def main(self):
        pass


if __name__ == '__main__':
    Siper = ShouGO()
    Siper.main()
Generate random User‑Agents:
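    def __init__(self):
        # verify_ssl=False is kept from the original; if your installed
        # fake_useragent rejects this argument, call UserAgent() instead
        ua = UserAgent(verify_ssl=False)
        # Pick one random User-Agent for every request this object makes
        self.headers = {'User-Agent': ua.random}
Define the method to fetch images: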
    def Shou(self, category, length, path):
        # Request the JSON list of images for the chosen category
        url = ('http://pic.sogou.com/pics/channel/getAllRecomPicByTag.jsp?category='
               + category + '&tag=%E5%85%A8%E9%83%A8&start=0&len=' + str(length))
        imgs = requests.get(url, headers=self.headers)
        jd = json.loads(imgs.text)['all_items']
        # Collect the direct address of every image
        imgs_url = [j['pic_url'] for j in jd]
        # Download each image and save it as <path><category><index>.jpg
        for m, img_url in enumerate(imgs_url):
            print('***** ' + category + str(m) + '.jpg ***** Downloading...')
            img = requests.get(url=img_url, headers=self.headers).content
            with open(path + category + str(m) + '.jpg', 'wb') as f:
                f.write(img)
        print('Download complete!')
Call the method in main:
    def main(self):
        # '汽车' is the "cars" category; the target folder must already exist
        self.Shou('汽车', 2000, './壁纸2/')
Result
Running the script prints download progress in the console and saves the images to the specified folder. Sample screenshots of the console output and saved pictures are shown below.
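Saving to the specified folder only works if that folder exists, because open() does not create missing directories. A minimal sketch of a save step that creates the folder first is shown below; the helper name save_image, the 'cars' prefix, and the temporary demo folder are assumptions for illustration, and placeholder bytes stand in for a real download.

```python
import os
import tempfile


def save_image(data, folder, category, index):
    """Write image bytes to <folder>/<category><index>.jpg and return the path."""
    os.makedirs(folder, exist_ok=True)  # create the folder if it is missing
    file_path = os.path.join(folder, category + str(index) + '.jpg')
    with open(file_path, 'wb') as f:
        f.write(data)
    return file_path


# Demonstration with placeholder bytes instead of a real download.
demo_folder = os.path.join(tempfile.mkdtemp(), 'wallpapers')
saved = save_image(b'not a real image', demo_folder, 'cars', 0)
print(saved)
```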
Conclusion
Avoid excessive crawling to prevent server overload.
The article provides a practical solution for scraping Sogou wallpapers and handling basic anti‑scraping techniques.
It also demonstrates string concatenation and list type conversion in Python.
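One way to act on the first point is to pause between requests. The sketch below is an assumption about how that could look, not something the original script does: run_throttled is a hypothetical helper, and the one-second default delay is an arbitrary choice.

```python
import time


def run_throttled(tasks, delay_seconds=1.0, sleep=time.sleep):
    """Run each zero-argument callable in order, pausing between calls.

    The sleep argument exists only so the pause can be swapped out in tests.
    """
    results = []
    for position, task in enumerate(tasks):
        if position > 0:
            sleep(delay_seconds)  # wait before every call except the first
        results.append(task())
    return results


# Placeholder tasks; in the scraper each one would be a single download.
print(run_throttled([lambda: 'a', lambda: 'b'], delay_seconds=0.1))
```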
Python Crawling & Data Mining