Python Basics, Common Pitfalls, and a Simple Web Scraper for Douban Book Ratings

This article introduces Python's core concepts and conceptual hierarchy, highlights ten frequent beginner mistakes, and walks through building a basic web scraper. The scraper extracts book information from Douban, loads it into pandas, and displays the resulting data, giving beginners a practical path through Python fundamentals.

Python Programming Learning Circle

Apr 24, 2020

The guide starts with a concise overview of Python's conceptual hierarchy: expressions create objects, statements contain expressions, statements are grouped into logical units such as functions and classes, modules are .py files, packages are directories that conventionally contain an __init__.py, and programs consist of multiple packages and files.
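The lower levels of this hierarchy can be illustrated with a minimal sketch. The file name `shapes.py` and the names `area` and `Circle` are hypothetical, chosen only for illustration; they do not come from the original article.

```python
# shapes.py: a module is simply a .py file (the file name is hypothetical).
import math


def area(radius):
    """A function groups statements into a reusable logical unit."""
    # `math.pi * radius ** 2` is an expression; evaluating it creates a float object.
    # The `return` line is a statement that contains that expression.
    return math.pi * radius ** 2


class Circle:
    """A class is another logical unit, bundling data with behavior."""

    def __init__(self, radius):
        self.radius = radius  # an assignment statement

    def area(self):
        return area(self.radius)


if __name__ == "__main__":
    # Runs only when the file is executed as a program, not when it is imported.
    unit = Circle(1)
    print(unit.area())
```

Placing `shapes.py` in a directory alongside an `__init__.py` would turn that directory into a package, and a program would then import from one or more such packages.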

It then lists ten typical beginner errors, such as poor variable naming, confusion between numeric and string operations, differences between lists and dictionaries, indexing issues, misuse of range(), mixing up the assignment (=) and comparison (==) operators, infinite loops, and confusing a function's return value with printed output.
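Several of these pitfalls are easiest to see side by side. The following is a small illustrative sketch rather than the article's own examples; the names `shout` and `upper_text` are hypothetical.

```python
# Numbers vs. strings: "+" adds numbers but concatenates strings.
total = 1 + 2                 # 3
joined = "1" + "2"            # "12"
converted = int("1") + 2      # 3, after an explicit conversion

# Indexing starts at 0, and range(n) stops before n.
letters = ["a", "b", "c"]
first = letters[0]            # "a"
numbers = list(range(3))      # [0, 1, 2]

# Lists are accessed by position, dictionaries by key.
scores = {"alice": 90}
alice_score = scores["alice"]

# "=" assigns a value, "==" compares two values.
x = 5
is_five = (x == 5)

# A while loop must change the condition it tests, or it never ends.
count = 3
while count > 0:
    count -= 1                # without this line the loop would run forever


def shout(text):
    """Prints the result, so the caller receives None."""
    print(text.upper())


def upper_text(text):
    """Returns the result, so the caller can store or reuse it."""
    return text.upper()
```

Assigning `result = shout("hi")` leaves `result` as `None`, whereas `upper_text("hi")` hands the string back to the caller.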

Following the pitfalls, the article demonstrates how to create a simple web scraper that fetches book ratings from Douban. First, it imports the required modules:

import requests
from bs4 import BeautifulSoup
print('Modules imported successfully')

It then shows code snippets for extracting HTML tags, printing titles and links, and iterating over URLs to collect data:

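# Assumed setup: the snippet needs a parsed page; the URL is the one used later in the article
r = requests.get(url='https://book.douban.com/latest')
soup = BeautifulSoup(r.text, 'lxml')

# Extract tags
# print(soup.head)  # the <head> section
print(soup.title)  # the <title> tag
print(soup.a)      # the first <a> tag on the page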
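Tag extraction can also be tried without any network access. The sketch below parses a small hand-written HTML string (made-up content, not a real Douban page) and uses Python's bundled `html.parser` instead of `lxml` so that no extra parser needs to be installed.

```python
from bs4 import BeautifulSoup

# A tiny hand-written page stands in for a downloaded one (hypothetical content).
html = """
<html>
  <head><title>Sample page</title></head>
  <body>
    <a href="/first">First link</a>
    <a href="/second">Second link</a>
  </body>
</html>
"""

soup = BeautifulSoup(html, "html.parser")

print(soup.title)        # the whole <title> tag
print(soup.title.text)   # only the text inside it
print(soup.a)            # the first <a> tag only
print(soup.a["href"])    # an attribute of that tag

# find_all returns every matching tag, which is what collecting all links requires.
links = [a["href"] for a in soup.find_all("a")]
print(links)
```

Attribute-style access such as `soup.a` returns only the first match, so `find_all` is the usual choice when every occurrence is needed.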

The scraper defines a function get_data(ui) that sends a GET request, parses the page with BeautifulSoup, locates the book details, and assembles a list of dictionaries containing the book name, rating, additional info, and summary:

def get_data(ui):
    """Fetch one Douban listing page and return a list of book dictionaries."""
    ri = requests.get(url=ui)
    soupi = BeautifulSoup(ri.text, 'lxml')
    infors = soupi.find_all('div', class_="detail-frame")
    lst = []
    for i in infors:
        dic = {}
        dic['书名'] = i.find('h2').text.replace('\n', '')  # book title
        dic['评分'] = i.find_all('p')[0].text.replace('\n', '').replace(' ', '')  # rating
        dic['其他信息'] = i.find_all('p')[1].text.replace('\n', '').replace(' ', '')  # other info
        dic['简介'] = i.find_all('p')[2].text.replace('\n', '').replace(' ', '')  # summary
        lst.append(dic)
    return lst

url = 'https://book.douban.com/latest'
result = get_data(url)
print(result[:3])
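The function above assumes that every request succeeds and that every entry contains at least three `<p>` tags. A more defensive variant might separate downloading from parsing, as in the sketch below. The names `fetch_html`, `clean`, and `parse_books`, the English dictionary keys, the User-Agent value, and the sample HTML are all assumptions made for illustration; the `detail-frame` class comes from the article and may not match Douban's current markup.

```python
import requests
from bs4 import BeautifulSoup


def fetch_html(url):
    """Download a page, raising an exception on HTTP errors or a slow response."""
    headers = {"User-Agent": "Mozilla/5.0"}  # assumed value; some sites reject the default
    response = requests.get(url, headers=headers, timeout=10)
    response.raise_for_status()
    return response.text


def clean(tag, sep=""):
    """Return a tag's text with whitespace runs replaced by sep, or '' if the tag is missing."""
    if tag is None:
        return ""
    return sep.join(tag.text.split())


def parse_books(html):
    """Build one dictionary per 'detail-frame' block found in the HTML."""
    soup = BeautifulSoup(html, "html.parser")
    books = []
    for frame in soup.find_all("div", class_="detail-frame"):
        paragraphs = list(frame.find_all("p"))
        # Pad with None so entries with fewer than three <p> tags do not raise IndexError.
        paragraphs += [None] * (3 - len(paragraphs))
        books.append({
            "title": clean(frame.find("h2"), sep=" "),
            "rating": clean(paragraphs[0]),
            "info": clean(paragraphs[1]),
            "summary": clean(paragraphs[2]),
        })
    return books


# Made-up markup for an offline check; the summary paragraph is deliberately missing.
sample = """
<div class="detail-frame">
  <h2><a href="#">Example Book</a></h2>
  <p>8.5</p>
  <p>Some Author / 2020</p>
</div>
"""
books = parse_books(sample)
print(books)
```

With network access, `parse_books(fetch_html('https://book.douban.com/latest'))` would play the role of `get_data(url)`. Keeping the parser free of network calls makes it possible to test it against saved HTML.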

The extracted data is then converted into a pandas DataFrame for further analysis:

import pandas as pd
df = pd.DataFrame(result)
print(df)

The printed result is a list of dictionaries, each holding a book's title, score, publication details, and brief introduction, which illustrates how the scraper gathers and structures real-world data.
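Scraped values arrive as text, so a typical next step is converting the rating column to numbers before sorting or filtering. The sketch below uses made-up rows with English column names and assumes the rating text is either a plain number or empty; real Douban data may need additional cleaning.

```python
import pandas as pd

# Made-up rows standing in for the scraper's output.
rows = [
    {"title": "Book A", "rating": "8.5"},
    {"title": "Book B", "rating": ""},      # no rating available
    {"title": "Book C", "rating": "9.1"},
]

df = pd.DataFrame(rows)

# errors="coerce" turns values that cannot be parsed as numbers into NaN.
df["rating"] = pd.to_numeric(df["rating"], errors="coerce")

# Drop unrated books and list the rest from highest to lowest rating.
top = df.dropna(subset=["rating"]).sort_values("rating", ascending=False)
print(top)
```

Coercing instead of raising keeps a single malformed entry from stopping the whole analysis, at the cost of silently producing missing values that should be counted or inspected.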
