Scrape and Download Novels from Biquw.com with Python – Step‑by‑Step Guide
This article demonstrates how to use Python's requests and BeautifulSoup libraries to crawl the Biquw.com novel site, extract chapter lists, download each chapter into text files, handle common anti‑scraping measures, and organize the results in a folder named after the book ID.
Introduction
In this tutorial a Python enthusiast shares a complete solution for downloading novels from the Biquw.com website. The method relies on the requests library for HTTP access and BeautifulSoup for HTML parsing.
Novel Download
To start, open the target novel's page on the site and note the numeric identifier in its URL (for example, 951 in http://www.biquw.com/book/951/). This number is the book ID, and the script uses it to build every request URL.
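A minimal sketch of how the book ID maps to request URLs, using the same URL pattern as the script in the Implementation section. The helper names book_url and chapter_url and the sample href 1.html are hypothetical and only serve as illustration.

```python
BASE_URL = "http://www.biquw.com/book"  # same host and path the script uses


def book_url(book_id):
    """Build the chapter-list URL for a book ID, rejecting non-numeric input."""
    book_id = str(book_id).strip()
    if not (book_id.isascii() and book_id.isdigit()):
        raise ValueError("book ID must be numeric, got {!r}".format(book_id))
    return "{}/{}/".format(BASE_URL, book_id)


def chapter_url(book_id, href):
    """Build a chapter URL from the book ID and a chapter link's relative href."""
    return book_url(book_id) + href.lstrip("/")


print(book_url(951))                 # http://www.biquw.com/book/951/
print(chapter_url("951", "1.html"))  # "1.html" is a made-up href for illustration
```

Validating the ID before any request is sent gives a clearer error than a failed page lookup later on.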
Implementation
The following code performs the entire workflow: fetching the chapter list, iterating over each chapter, saving the text to files, and handling errors.
# coding: utf-8
'''
Biquw novel downloader.
For code study only.
Do not use for commercial purposes.
Please delete downloaded content within 24 hours.
'''
import os
import time

import requests
from bs4 import BeautifulSoup


def book_page_list(book_id):
    '''
    Fetch the chapter list for the book with the given ID.
    :param book_id: numeric book ID taken from the URL
    :return: list of <a> tags, one per chapter (title text and href)
    '''
    url = 'http://www.biquw.com/book/{}/'.format(book_id)
    headers = {
        'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/94.0.4606.71 Safari/537.36'
    }
    # Pass headers by keyword: the second positional argument of requests.get is params.
    response = requests.get(url, headers=headers)
    response.encoding = response.apparent_encoding
    soup = BeautifulSoup(response.text, 'lxml')
    booklist = soup.find('div', class_='book_list').find_all('a')
    return booklist
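

def book_page_text(bookid, booklist):
    '''
    Download every chapter in the chapter list and save each one to a text file.
    :param bookid: str, book ID (also the name of the output folder)
    :param booklist: chapter <a> tags returned by book_page_list
    :return: None
    '''
    try:
        for book_page in booklist:
            page_name = book_page.text.replace('*', '')
            page_id = book_page['href']
            time.sleep(3)  # pause between requests to reduce the chance of being blocked
            url = 'http://www.biquw.com/book/{}/{}'.format(bookid, page_id)
            headers = {
                'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/94.0.4606.71 Safari/537.36'
            }
            # Pass headers by keyword: the second positional argument of requests.get is params.
            response_book = requests.get(url, headers=headers)
            response_book.encoding = response_book.apparent_encoding
            soup = BeautifulSoup(response_book.text, 'lxml')
            book_content = soup.find('div', id="htmlContent")
            # Write as UTF-8 so the text saves correctly whatever the platform default is.
            with open("./{}/{}.txt".format(bookid, page_name), 'a', encoding='utf-8') as f:
                f.write(book_content.text.replace('\xa0', ''))
            print("Downloaded chapter: {}".format(page_name))
    except Exception as e:
        print(e)
        print("Failed to fetch chapter content. Check that the book ID is correct and that the book has content.")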


if __name__ == '__main__':
    bookid = input("Enter the book ID (digits only): ")
    # Create the output folder, named after the book ID, if it does not exist yet.
    if not os.path.isdir('./{}'.format(bookid)):
        os.mkdir('./{}'.format(bookid))
    try:
        booklist = book_page_list(bookid)
        print("Chapter list fetched successfully!")
        time.sleep(5)
        book_page_text(bookid, booklist)
    except Exception as e:
        print(e)
        print("Failed to fetch the chapter list. Check that the book ID is correct!")

Run the script and enter the numeric book ID when prompted. The program creates a folder named after the ID and stores each chapter in it as a separate text file.
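Because each chapter title becomes a file name, a title containing characters such as / or ? can make open() fail on some systems, and the script only strips the * character. Here is a minimal sketch of a broader clean-up. It assumes a hypothetical helper named safe_filename, which is not part of the original script.

```python
import re

# Characters that Windows rejects in file names (a superset of what Linux
# and macOS reject), plus ASCII control characters.
_UNSAFE = re.compile(r'[\\/:*?"<>|\x00-\x1f]')


def safe_filename(title, fallback="untitled"):
    """Return a version of `title` that is safe to use as a file name.

    Hypothetical helper for illustration; not part of the original script.
    """
    # Drop unsafe characters, then trim surrounding whitespace and trailing dots.
    cleaned = _UNSAFE.sub("", title).strip().rstrip(". ")
    return cleaned or fallback


print(safe_filename("Chapter 1: What/Why?*"))  # Chapter 1 WhatWhy
print(safe_filename("***"))                    # untitled
```

In the script, the result of safe_filename(book_page.text) could stand in for the page_name value that is currently built with replace('*', '').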
Common Issues
If requests are sent too quickly, the site's anti‑scraping protection may block them and return a block page instead of the chapter list or chapter text. Mitigate this by rotating User‑Agent strings, routing requests through proxies, or lengthening the pause between requests (the script waits 3 seconds between chapters).
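The mitigations above can be sketched as follows. This is one possible approach rather than the original author's method: the names USER_AGENTS, random_headers, jittered_delay, and fetch_with_retry are hypothetical, and the example User-Agent strings and delay values are assumptions to adjust as needed.

```python
import random
import time

# Example browser User-Agent strings (assumed values; any realistic ones work).
USER_AGENTS = [
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/94.0.4606.71 Safari/537.36",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/94.0.4606.71 Safari/537.36",
    "Mozilla/5.0 (X11; Linux x86_64; rv:93.0) Gecko/20100101 Firefox/93.0",
]


def random_headers():
    """Pick a User-Agent at random for each request."""
    return {"User-Agent": random.choice(USER_AGENTS)}


def jittered_delay(base=3.0, spread=2.0):
    """Return a delay in seconds between base and base + spread."""
    return base + random.uniform(0, spread)


def fetch_with_retry(fetch, url, retries=3, sleep=time.sleep):
    """Call fetch(url, headers) until it succeeds or the retries run out.

    `fetch` is any callable that raises on failure. The wait before each
    new attempt grows with the attempt number.
    """
    if retries < 1:
        raise ValueError("retries must be at least 1")
    last_error = None
    for attempt in range(1, retries + 1):
        try:
            return fetch(url, random_headers())
        except Exception as error:  # a real script would catch narrower errors
            last_error = error
            if attempt < retries:
                sleep(jittered_delay() * attempt)
    raise last_error
```

With requests, fetch could be a small function that calls requests.get(url, headers=headers, timeout=10) and then response.raise_for_status(). Proxies can be supplied through the proxies argument of requests.get.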
Conclusion
The guide shows how to programmatically obtain novel content from a public website using Python web‑scraping techniques, manage files locally, and address typical anti‑scraping challenges. Readers are encouraged to experiment responsibly and respect the target site’s usage policies.
Python Crawling & Data Mining
