How to Scrape Sogou Wallpaper Images with Python: A Step‑by‑Step Guide

This tutorial walks you through using Python's requests and fake_useragent libraries to locate Sogou wallpaper JSON endpoints, extract image URLs, bypass basic anti‑scraping measures, and download the pictures to a local folder, complete with full code examples and explanations.

Python Crawling & Data Mining

Sep 23, 2020

Introduction

This article demonstrates how to use Python to crawl Sogou wallpaper images, covering the whole process from finding the JSON API to downloading the pictures.

Project Goal

Teach readers how to obtain Sogou wallpapers and download their preferred categories.

Preparation

Software: PyCharm

Required libraries: requests and fake_useragent (both third-party), plus json from the standard library

Finding the Real JSON URL

Open the Sogou wallpaper site, press F12 to open the developer tools, go to Network → XHR, refresh the page, and locate the request URL under Headers. The original URL looks like:

http://pic.sogou.com/pics/channel/getAllRecomPicByTag.jsp?category=%E5%A3%81%E7%BA%B8&tag=%E5%85%A8%E9%83%A8&start=0&len=15&width=1536&height=864

After removing unnecessary parameters, the simplified URL is:

http://pic.sogou.com/pics/channel/getAllRecomPicByTag.jsp?category=%E5%A3%81%E7%BA%B8&tag=%E5%85%A8%E9%83%A8&start=0&len=15

In this URL, category denotes the wallpaper category, start is the start index, and len is the number of images to fetch.
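To make the parameter roles concrete, here is a minimal sketch that rebuilds the simplified URL from its parts. The helper name build_list_url is hypothetical; the endpoint and parameter names come from the URL above, whose percent-encoded values decode to 壁纸 ("wallpaper") and 全部 ("all").

```python
from urllib.parse import urlencode

# Endpoint copied from the simplified URL above.
BASE_URL = 'http://pic.sogou.com/pics/channel/getAllRecomPicByTag.jsp'


def build_list_url(category, start=0, length=15, tag='全部'):
    """Hypothetical helper: assemble the listing URL from its query parameters."""
    # urlencode percent-encodes the UTF-8 bytes of non-ASCII values.
    query = urlencode({'category': category, 'tag': tag, 'start': start, 'len': length})
    return BASE_URL + '?' + query


print(build_list_url('壁纸'))
```

Passing a different category, start, or length changes only the corresponding query parameter, which is what the later code relies on when it swaps in another category.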

Extracting Image URLs

Open one of the JSON responses in the Preview tab and locate the pic_url field of each item; it contains the direct image address.
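As a rough illustration of that lookup, the sketch below parses a small hand-written payload. The sample JSON is invented for illustration (a real response carries many more fields); only the all_items and pic_url keys are taken from this article's code, and extract_pic_urls is a hypothetical name.

```python
import json

# Invented sample that mimics only the two keys this article relies on.
SAMPLE_RESPONSE = '''
{"all_items": [
    {"pic_url": "http://example.com/a.jpg"},
    {"pic_url": "http://example.com/b.jpg"},
    {"title": "item without an image address"}
]}
'''


def extract_pic_urls(response_text):
    """Return the pic_url of every item that has one."""
    items = json.loads(response_text).get('all_items', [])
    return [item['pic_url'] for item in items if 'pic_url' in item]


print(extract_pic_urls(SAMPLE_RESPONSE))
```

Skipping items without a pic_url is a defensive choice made here, not something the original code does; it avoids a KeyError if the response ever contains an incomplete entry.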

Anti‑Scraping Measures

Set realistic HTTP request headers when using requests.

Generate random User‑Agent strings with fake_useragent.
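The two measures above might be combined as in the following sketch. It assumes fake_useragent may be missing or may fail to load its data, in which case it falls back to a short hard-coded list; the fallback strings and the build_headers name are illustrative assumptions, not part of the original article.

```python
import random

# Illustrative fallback values, used only when fake_useragent is unavailable.
FALLBACK_AGENTS = [
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
    '(KHTML, like Gecko) Chrome/85.0.4183.102 Safari/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_6) AppleWebKit/605.1.15 '
    '(KHTML, like Gecko) Version/14.0 Safari/605.1.15',
]


def build_headers():
    """Return request headers carrying a randomly chosen User-Agent."""
    try:
        from fake_useragent import UserAgent
        user_agent = UserAgent().random
    except Exception:
        # Import or data-loading problems: pick from the fallback list instead.
        user_agent = random.choice(FALLBACK_AGENTS)
    return {'User-Agent': user_agent}


print(build_headers())
```

The returned dictionary can be passed as the headers argument of requests.get.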

Implementation

Below is the core Python code.

import requests, json
from fake_useragent import UserAgent

class ShouGO(object):
    def __init__(self):
        pass

    def main(self):
        pass

if __name__ == '__main__':
    Siper = ShouGO()
    Siper.main()

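Generate a random User‑Agent. These lines go inside __init__, replacing the pass placeholder:

# verify_ssl=False as in the original; newer fake_useragent releases may not accept it.
ua = UserAgent(verify_ssl=False)
# One assignment is enough; a loop here would only overwrite self.headers each time.
self.headers = {'User-Agent': ua.random}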

Define the method to fetch images:

def Shou(self, category, length, path):
    n = length
    cate = category
    # Request the JSON listing, sending the same headers used for the image downloads.
    imgs = requests.get('http://pic.sogou.com/pics/channel/getAllRecomPicByTag.jsp?category=' + cate + '&tag=%E5%85%A8%E9%83%A8&start=0&len=' + str(n), headers=self.headers)
    # 'all_items' holds one dictionary per wallpaper.
    jd = json.loads(imgs.text)['all_items']
    # Collect the direct image address of every item.
    imgs_url = []
    for j in jd:
        imgs_url.append(j['pic_url'])
    m = 0
    for img_url in imgs_url:
        print('***** ' + cate + str(m) + '.jpg *****   Downloading...')
        img = requests.get(url=img_url, headers=self.headers).content
        # path must already exist and end with a slash, for example './壁纸2/'.
        with open(path + cate + str(m) + '.jpg', 'wb') as f:
            f.write(img)
        m += 1
    print('Download complete!')
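Because the method builds file names by concatenating path, the category, and a counter, it fails if the target folder does not exist. One possible way around this, sketched with a hypothetical image_path helper that is not part of the original code, is to create the folder on demand with pathlib:

```python
import tempfile
from pathlib import Path


def image_path(folder, category, index):
    """Hypothetical helper: create the folder if needed and return the file path."""
    target_dir = Path(folder)
    # parents=True creates missing parent folders; exist_ok=True tolerates reruns.
    target_dir.mkdir(parents=True, exist_ok=True)
    return target_dir / f'{category}{index}.jpg'


# Demo in a temporary directory so nothing is written to the project folder.
demo_root = tempfile.mkdtemp()
demo_path = image_path(Path(demo_root) / 'wallpapers', 'cars', 0)
print(demo_path.name)  # cars0.jpg
```

Inside the download loop, open(image_path(path, cate, m), 'wb') could then stand in for the string concatenation.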

Call the method in main:

def main(self):
    # '汽车' is the "cars" category; the folder './壁纸2/' ("wallpaper 2") must already exist.
    self.Shou('汽车', 2000, './壁纸2/')
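The call above asks for 2,000 images in a single request. A gentler alternative, sketched here under stated assumptions, is to walk through the results in small pages using the start and len parameters and to pause between requests. The page size of 15 mirrors the len value seen in the browser's URL, the one-second delay is an arbitrary choice, and fetch_page is a hypothetical callback; Shou as written always sends start=0, so it would need a start parameter to be used this way.

```python
import time


def page_ranges(total, page_size=15):
    """Yield (start, length) pairs that cover `total` images in small pages."""
    for start in range(0, total, page_size):
        yield start, min(page_size, total - start)


def crawl_politely(total, fetch_page, delay_seconds=1.0):
    """Call fetch_page(start, length) for each page, pausing after each call."""
    for start, length in page_ranges(total):
        fetch_page(start, length)
        time.sleep(delay_seconds)


print(list(page_ranges(40)))  # [(0, 15), (15, 15), (30, 10)]
```

Each (start, length) pair maps directly onto the start and len query parameters of the JSON URL.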

Result

Running the script prints download progress in the console and saves the images to the specified folder.

Conclusion

Avoid excessive crawling to prevent server overload.

The article provides a practical solution for scraping Sogou wallpapers and handling basic anti‑scraping techniques.

It also demonstrates two basic Python techniques: string concatenation, used to build the request URL and the file names, and collecting values from parsed JSON into a list.
