Backend Development 7 min read

Scraping Douban Top 250 Movies with Python and Analyzing Year Distribution in Excel

This tutorial demonstrates how to use Python's requests and BeautifulSoup libraries to scrape the titles and release years of the top 250 movies from Douban, clean the data, export it to Excel, and then create pivot tables and charts to visualize yearly film production trends.

Python Programming Learning Circle

Feb 25, 2021

Scraping Douban Top 250 Movies with Python and Analyzing Year Distribution in Excel

First, import the required modules with from bs4 import BeautifulSoup and import requests. Use requests.get to fetch the HTML of the Douban Top 250 page and parse it with BeautifulSoup(html.text, 'html.parser').

Iterate over each movie block using for item in soup.find_all('div', "info"):. Extract the title with title = item.div.a.span.string and the year line with yearline = item.find('div','bd').p.contents[2].string. Clean the year string by removing spaces and newline characters:

yearline = yearline.replace(' ', '')
yearline = yearline.replace('
', '')

Obtain the four‑digit year using slicing year = yearline[0:4] and print the result with a tab separator.

To scrape all 250 movies, wrap the request in a loop that increments the start parameter by 25 for each page (0, 25, 50, …). The full loop looks like:

import requests
from bs4 import BeautifulSoup
start = 0
for n in range(0, 10):
    html = requests.get('https://movie.douban.com/top250?start=' + str(start))
    start += 25
    soup = BeautifulSoup(html.text, 'html.parser')
    for item in soup.find_all('div', "info"):
        title = item.div.a.span.string
        yearline = item.find('div','bd').p.contents[2].string
        yearline = yearline.replace(' ', '')
        yearline = yearline.replace('
', '')
        year = yearline[0:4]
        print(title, '\t', year)

Copy the printed output into an Excel sheet, placing the movie titles and years in two columns. Insert a pivot table (Data → PivotTable) on a new worksheet, drag the year field to the Rows area and also to the Values area, then set the value field to “Count” to count how many movies were released each year.

Finally, create a column chart from the pivot table to visualize the distribution of high‑quality movies over the years, revealing trends such as a surge in the early 1990s and a peak around 2010.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

Excel requests data-analysis web-scraping

Written by

Python Programming Learning Circle

A global community of Chinese Python developers offering technical articles, columns, original video tutorials, and problem sets. Topics include web full‑stack development, web scraping, data analysis, natural language processing, image processing, machine learning, automated testing, DevOps automation, and big data.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.