How to Scrape Douban TV Shows with Python: From Requests to CSV

This tutorial walks through building a Python web scraper that fetches Douban TV show titles, ratings, detail links, and images, parses JSON responses, handles anti‑scraping measures, and stores the results in a CSV file with step‑by‑step code examples.

Python Crawling & Data Mining

Jun 18, 2020

Project Background

Douban Movie provides up-to-date film and TV information, reviews, showtimes, and ticketing. Users can mark titles they want to watch, are watching, or have watched, and can rate and review them.

Project Goal

Collect each show's title, rating, and detail link, download its image, and save the results to a CSV file.

Libraries and Websites

URL template:

https://movie.douban.com/j/search_subjects?type=tv&tag=美剧&sort=recommend&page_limit=20&page_start={}

Libraries used: requests, fake_useragent, json, csv.

IDE: PyCharm.

Project Analysis

1. How to request multiple pages?

Increase the page_start parameter by 20 for each page and iterate with a for loop.

2. How to get the real request URL?

Douban loads data via JavaScript; use browser dev tools (Network tab) to find the request URL and preview the JSON.

In the JSON, title is the movie name and rate is the rating.
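
As a rough illustration of that structure, the sketch below parses a small hand-written sample shaped like the response and pulls out title and rate. The sample values and the function name extract_titles_and_ratings are placeholders made up for this example, not real Douban data.

```python
import json

# Hypothetical sample shaped like the response; the values are placeholders,
# not real Douban data.
SAMPLE_RESPONSE = """
{"subjects": [
    {"title": "Example Show A", "rate": "9.1"},
    {"title": "Example Show B", "rate": "8.4"}
]}
"""


def extract_titles_and_ratings(text):
    """Return (title, rate) pairs from a search_subjects-style JSON string."""
    subjects = json.loads(text)["subjects"]
    return [(item["title"], item["rate"]) for item in subjects]


if __name__ == "__main__":
    for title, rate in extract_titles_and_ratings(SAMPLE_RESPONSE):
        print(title, rate)
```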

3. How to construct page URLs?

Example URLs:

https://movie.douban.com/j/search_subjects?type=tv&tag=美剧&sort=recommend&page_limit=20&page_start=0
https://movie.douban.com/j/search_subjects?type=tv&tag=美剧&sort=recommend&page_limit=20&page_start=20
https://movie.douban.com/j/search_subjects?type=tv&tag=美剧&sort=recommend&page_limit=20&page_start=40
https://movie.douban.com/j/search_subjects?type=tv&tag=美剧&sort=recommend&page_limit=20&page_start=60
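
The pattern in these URLs can be generated rather than typed out. A minimal sketch, assuming the URL template shown earlier and a page size of 20 (build_page_urls is a name chosen for this example):

```python
URL_TEMPLATE = (
    "https://movie.douban.com/j/search_subjects"
    "?type=tv&tag=美剧&sort=recommend&page_limit=20&page_start={}"
)


def build_page_urls(pages, page_size=20):
    """Return the request URL for each of the first `pages` result pages."""
    # page_start advances by page_size per page: 0, 20, 40, ...
    return [URL_TEMPLATE.format(page * page_size) for page in range(pages)]


if __name__ == "__main__":
    for url in build_page_urls(4):
        print(url)
```

Calling build_page_urls(4) yields the four example URLs listed above, ending in page_start=0, 20, 40, and 60.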

Implementation

Class definition

import requests, json
from fake_useragent import UserAgent
import csv

class Doban(object):
    def __init__(self):
        self.url = "https://movie.douban.com/j/search_subjects?type=tv&tag=美剧&sort=recommend&page_limit=20&page_start={}"
    def main(self):
        pass

if __name__ == '__main__':
    Siper = Doban()
    Siper.main()

Random UserAgent

# Inside __init__: create the generator, then pick one random User-Agent string
ua = UserAgent()
self.headers = {
    'User-Agent': ua.random,
}
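
If fake_useragent is not installed, the same idea can be approximated with the standard library alone. This is a swapped-in technique, not the article's method: the sketch below picks from a small hand-written pool with random.choice. The strings in USER_AGENTS and the helper random_headers are illustrative choices for this example.

```python
import random

# Illustrative pool of browser User-Agent strings chosen for this sketch.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/83.0.4103.97 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_5) AppleWebKit/605.1.15 "
    "(KHTML, like Gecko) Version/13.1.1 Safari/605.1.15",
    "Mozilla/5.0 (X11; Linux x86_64; rv:77.0) Gecko/20100101 Firefox/77.0",
]


def random_headers():
    """Build request headers with a User-Agent picked at random from the pool."""
    return {"User-Agent": random.choice(USER_AGENTS)}


if __name__ == "__main__":
    print(random_headers())
```

Calling random_headers() before each request, rather than once per run, gives a different header per page.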

Send request and get page

def get_page(self, url):
    res = requests.get(url=url, headers=self.headers)
    html = res.content.decode("utf-8")
    return html

Parse JSON

data = json.loads(html)['subjects']
# print(data[0])

Iterate and extract fields

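# Each item in data is a dict. 'title' and 'rate' are the fields noted earlier;
# the 'url' and 'cover' keys are assumptions about the response layout.
for i in data:
    id = i['title']        # show title
    rate = i['rate']       # rating
    urll = i['url']        # detail page link
    img_url = i['cover']   # image link
    print(id, rate, urll)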

Write to CSV

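# Create the CSV file; newline='' keeps the csv module from adding blank rows on Windows
csv_file = open('scr.csv', 'a', newline='', encoding='gbk')
csv_writer = csv.writer(csv_file)
# Write the header once (columns: show, rating, detail page)
csv_writer.writerow(['电影', '评分', '详情页'])
# Write one row per show, inside the loop over the parsed entries
csv_writer.writerow([id, rate, urll])
# Close the file after the last row so everything is flushed to disk
csv_file.close()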
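
The same step can be wrapped in a small function so the file is closed automatically. A minimal sketch, assuming English column names and a UTF-8 output file instead of gbk; write_rows and the sample row are hypothetical names and data for this example.

```python
import csv


def write_rows(file_obj, rows):
    """Write a header line followed by (title, rate, detail_url) rows."""
    writer = csv.writer(file_obj)
    writer.writerow(["title", "rate", "detail_url"])
    writer.writerows(rows)


if __name__ == "__main__":
    # Placeholder row, not real scraped data.
    rows = [("Example Show A", "9.1", "https://example.com/a")]
    # The with block closes the file even if writing fails partway through.
    with open("scr.csv", "w", newline="", encoding="utf-8-sig") as f:
        write_rows(f, rows)
```

Passing the file object in, rather than opening it inside the function, also makes the writer easy to check against an in-memory buffer.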

Download images

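import os
# img_url is the image address for the current show (assumed to come from the
# item's 'cover' field); urll is the detail page and would not return an image.
os.makedirs("./图", exist_ok=True)  # make sure the output folder exists
img = requests.get(url=img_url, headers=self.headers).content
dirname = "./图/" + id + ".jpg"
with open(dirname, 'wb') as f:
    f.write(img)
print("%s 【下载成功！！！！】" % id)  # message means "download succeeded"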
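
Two details can trip up the saving step: the target folder may not exist, and a title may contain characters that are not valid in file names. The sketch below handles both for already-downloaded bytes. The helpers image_path and save_image, and the choice of replacement character, are assumptions made for this example.

```python
import os
import re


def image_path(title, folder="./图"):
    """Return a file path for the image of `title`, creating `folder` if needed."""
    os.makedirs(folder, exist_ok=True)
    # Replace characters that are not allowed in file names on common systems.
    safe_title = re.sub(r'[\\/:*?"<>|]', "_", title)
    return os.path.join(folder, safe_title + ".jpg")


def save_image(content, title, folder="./图"):
    """Write raw image bytes to disk and return the path that was used."""
    path = image_path(title, folder)
    with open(path, "wb") as f:
        f.write(content)
    return path
```

In the scraper, content would be the bytes returned by the image request and title the show name.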

Run workflow

html = self.get_page(url)
self.parse_page(html)

Optimization

Add a delay between requests with time.sleep(1.4) so the crawler does not hit the server too quickly. Use a counter variable u to track the current page number.
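
One way these two ideas might fit together is sketched below. The function crawl_pages and its fetch parameter are hypothetical: fetch stands in for whatever requests and parses a page given its page_start value, which keeps the loop itself free of network code.

```python
import time


def crawl_pages(fetch, pages, page_size=20, delay=1.4):
    """Call fetch(page_start) for each page, pausing `delay` seconds in between."""
    results = []
    u = 0  # page counter
    for page_start in range(0, pages * page_size, page_size):
        u += 1
        print("Fetching page %d (page_start=%d)" % (u, page_start))
        results.append(fetch(page_start))
        if u < pages:
            time.sleep(delay)  # wait before the next request
    return results


if __name__ == "__main__":
    # Stand-in fetch that only echoes its argument; no request is sent.
    print(crawl_pages(lambda page_start: page_start, pages=3, delay=0.1))
```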

Results

Running the script prints its progress, saves the CSV file, and downloads the show images.

Conclusion

Do not scrape excessive data to avoid overloading the server. This tutorial covered the main challenges of parsing JSON, handling dynamic content, and avoiding anti‑scraping measures. It also demonstrated basic CSV handling, string formatting, and image downloading with Python.
