How to Scrape and Download Novels from Biquw.com Using Python

This guide walks you through extracting novel chapters from the Biquw.com website with Python, explaining how to obtain the book ID, retrieve the chapter list, download each chapter using requests and BeautifulSoup, handle common anti‑scraping measures, and organize the downloaded files.

Python Crawling & Data Mining

Aug 6, 2024

Preface

Hello, I am a Python enthusiast sharing a practical web‑scraping example.

1. Novel Download

To download a novel from the site, open its page and note the numeric ID in the URL (e.g., 951). The script uses this ID to build the chapter-list URL, http://www.biquw.com/book/<ID>/, and the URL of every chapter under it.
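If you would rather paste the whole address than pick the number out by hand, the ID can be pulled from the URL with a regular expression. This is a minimal sketch, not part of the original script: the helper name extract_book_id is hypothetical, and it assumes the /book/<ID>/ URL layout that the script itself uses.

```python
import re


def extract_book_id(url):
    """Return the numeric book ID from a /book/<ID>/ style URL, or None."""
    # Assumption: the ID is the run of digits right after "/book/".
    match = re.search(r'/book/(\d+)', url)
    return match.group(1) if match else None


if __name__ == '__main__':
    print(extract_book_id('http://www.biquw.com/book/951/'))
```

The function returns the ID as a string, which is what the script expects, since it only ever interpolates the ID into URLs and folder names.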

2. Implementation Details

The complete Python code is shown below:

# coding: utf-8
'''
Biquw novel downloader.
For code study only.
Not for commercial use.
Please delete within 24 hours.
'''
import requests
import os
from bs4 import BeautifulSoup
import time

def book_page_list(book_id):
    '''
    Fetch the full chapter list for the book with the given ID.
    :param book_id: numeric book ID taken from the book's URL
    :return: list of <a> tags, one per chapter (title text and href)
    '''
    url = 'http://www.biquw.com/book/{}/'.format(book_id)
    headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/94.0.4606.71 Safari/537.36'}
    # Pass headers by keyword: the second positional argument of requests.get is params.
    response = requests.get(url, headers=headers)
    response.encoding = response.apparent_encoding
    soup = BeautifulSoup(response.text, 'lxml')
    booklist = soup.find('div', class_='book_list').find_all('a')
    return booklist

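def book_page_text(bookid, booklist):
    '''
    Download every chapter in the chapter list and save each one to a file.
    :param bookid: str, the book ID
    :param booklist: chapter links returned by book_page_list
    :return: None
    '''
    try:
        for book_page in booklist:
            page_name = book_page.text.replace('*', '')
            page_id = book_page['href']
            time.sleep(3)  # pause between requests so the site is not hit too quickly
            url = 'http://www.biquw.com/book/{}/{}'.format(bookid, page_id)
            headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/94.0.4606.71 Safari/537.36'}
            # Pass headers by keyword: the second positional argument of requests.get is params.
            response_book = requests.get(url, headers=headers)
            response_book.encoding = response_book.apparent_encoding
            response_book = BeautifulSoup(response_book.text, 'lxml')
            book_content = response_book.find('div', id="htmlContent")
            # Write as UTF-8 so Chinese text saves correctly regardless of the OS default encoding.
            with open("./{}/{}.txt".format(bookid, page_name), 'a', encoding='utf-8') as f:
                f.write(book_content.text.replace('\xa0', ''))
                print("Downloading chapter: {}".format(page_name))
    except Exception as e:
        print(e)
        print("Failed to fetch chapter content. Check that the book ID is correct and that the book has content.")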

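if __name__ == '__main__':
    bookid = input("Enter the book ID (digits only): ")
    # Create the output folder, named after the book ID, if it does not exist yet.
    if not os.path.isdir('./{}'.format(bookid)):
        os.mkdir('./{}'.format(bookid))
    try:
        booklist = book_page_list(bookid)
        print("Chapter list fetched successfully!")
        time.sleep(5)
        book_page_text(bookid, booklist)
    except Exception as e:
        print(e)
        print("Failed to fetch the chapter list. Check that the book ID is correct.")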

Run the script, enter the numeric book ID when prompted, and the program will create a folder named after the ID, storing each chapter as a separate text file.
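Because each file is named after its chapter title, a title containing a character that is not allowed in file names (for example / or ?) will make the write fail. The script strips only the * character. A broader cleanup could look like the sketch below; the helper name safe_filename and the choice of _ as the replacement character are assumptions for illustration, not part of the original code.

```python
import re


def safe_filename(title, fallback='untitled'):
    """Return a version of title that is safe to use as a file name."""
    # Replace characters rejected by Windows or POSIX file systems,
    # plus line breaks and tabs that sometimes appear in scraped text.
    cleaned = re.sub(r'[\\/:*?"<>|\r\n\t]', '_', title)
    # Leading or trailing spaces and dots cause trouble on Windows.
    cleaned = cleaned.strip(' .')
    return cleaned or fallback


if __name__ == '__main__':
    print(safe_filename('Chapter 1: What?'))
```

In the script, this would replace the book_page.text.replace('*', '') expression that builds page_name.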

3. Common Issues

Most failures happen when the site blocks rapid, repeated requests. To reduce the chance of being blocked, rotate the User‑Agent header, route requests through proxies, and keep a delay between requests, as the script already does with time.sleep(3).
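The User‑Agent rotation and delay advice above can be sketched as a small wrapper. This is an illustrative sketch under stated assumptions rather than the original author's code: the names USER_AGENTS, random_headers, and fetch_with_retry are hypothetical, the User‑Agent strings are placeholders to replace with values you have checked, and the fetch argument is any callable that accepts headers as a keyword the way requests.get does.

```python
import random
import time

# Hypothetical pool of User-Agent strings; swap in values you have verified.
USER_AGENTS = [
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/94.0.4606.71 Safari/537.36',
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/94.0.4606.71 Safari/537.36',
    'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/94.0.4606.71 Safari/537.36',
]


def random_headers():
    """Build a headers dict with a User-Agent picked at random from the pool."""
    return {'User-Agent': random.choice(USER_AGENTS)}


def fetch_with_retry(fetch, url, retries=3, min_delay=2.0, max_delay=5.0,
                     sleep=time.sleep):
    """Call fetch(url, headers=...) up to `retries` times.

    A random pause of min_delay..max_delay seconds precedes every attempt,
    and each attempt uses a freshly chosen User-Agent.
    """
    if retries < 1:
        raise ValueError('retries must be at least 1')
    last_error = None
    for _ in range(retries):
        sleep(random.uniform(min_delay, max_delay))
        try:
            return fetch(url, headers=random_headers())
        except Exception as error:  # remember the failure and try again
            last_error = error
    raise last_error
```

With requests installed, a call would look like fetch_with_retry(requests.get, url). Note that requests.get raises only on connection-level failures, so a blocked request that comes back with an error status is retried only if the callable you pass in also calls response.raise_for_status().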

4. Conclusion

This article demonstrates how to download novel content with a Python web scraper, using the requests library to fetch pages and BeautifulSoup's find and find_all methods to locate the chapter list and chapter text. It also covers typical anti‑scraping problems and ways to work around them.
