Scrape Unlimited Novel Site and Auto‑Download Texts with Python

This tutorial explains how to use Python's requests, lxml, and fake_useragent libraries to crawl the free novel website "无限小说网", extract each novel's download link, and automatically download the corresponding text files.

Python Crawling & Data Mining

1. Introduction

With the rise of online reading, many people prefer web novels, but most require payment. This tutorial shows how to crawl the free novel site "无限小说网" and directly download the text files.

2. Project Goal

Obtain the download link for a given novel and download the corresponding .txt file.

3. Preparation

Tools: PyCharm. Required libraries: requests, lxml, fake_useragent. Target URL pattern: https://www.555x.org/html/wuxiaxianxia/list_29_{page}.html where {page} is the page number.
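The listing-page URLs can be generated directly from this pattern. The following is a minimal sketch, assuming pages are numbered from 1; `build_page_urls` is a hypothetical helper name that does not appear in the original code.

```python
# Listing-page URL pattern from the tutorial; {page} is the page number.
URL_PATTERN = "https://www.555x.org/html/wuxiaxianxia/list_29_{page}.html"


def build_page_urls(start, end):
    """Return the listing-page URLs for pages start..end (inclusive)."""
    return [URL_PATTERN.format(page=page) for page in range(start, end + 1)]


if __name__ == "__main__":
    for url in build_page_urls(1, 3):
        print(url)
```

An empty list is returned when `start` is greater than `end`, so the caller does not need a separate range check.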

4. Implementation

4.1 Define spider class

import requests
from lxml import etree
from fake_useragent import UserAgent
import time

class xiaoshuo(object):
    def __init__(self):
        self.url = "https://www.555x.org/html/wuxiaxianxia/list_29_{}.html"
    def main(self):
        pass

if __name__ == '__main__':
    spider = xiaoshuo()
    spider.main()

4.2 Random User‑Agent

ua = UserAgent()  # create once, before the loop
for i in range(1, 50):
    # pick a new random User-Agent for each page
    self.headers = {'User-Agent': ua.random}
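The header rotation can also be isolated in a small function. This is only a sketch: `random_headers` and `FALLBACK_AGENTS` are hypothetical names, and the fallback strings are illustrative values used when `fake_useragent` is not installed.

```python
import random

try:
    from fake_useragent import UserAgent
except ImportError:  # the library is optional in this sketch
    UserAgent = None

# Illustrative fallback values, used only when no UserAgent object is given.
FALLBACK_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    "Mozilla/5.0 (X11; Linux x86_64; rv:121.0) Gecko/20100101 Firefox/121.0",
]


def random_headers(ua=None):
    """Build request headers with a randomly chosen User-Agent."""
    if ua is not None:
        # fake_useragent exposes a fresh random string via the .random property
        return {"User-Agent": ua.random}
    return {"User-Agent": random.choice(FALLBACK_AGENTS)}
```

In the spider, one would create `ua = UserAgent()` once and call `random_headers(ua)` before each request, so that consecutive pages are fetched with different User-Agent strings.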

4.3 Request page

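def get_page(self, url):
    # a timeout keeps the crawler from hanging on a stalled connection
    res = requests.get(url=url, headers=self.headers, timeout=10)
    res.raise_for_status()  # stop early on HTTP errors such as 404
    html = res.content.decode("utf-8")
    return html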

4.4 Parse first‑level page with XPath

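Use the browser's developer tools to inspect the listing (first‑level) page and find the link to each novel's detail (second‑level) page. On the detail page, locate the download button; its link leads to the third‑level page, which holds the actual download link.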
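Extracting links at each level follows the same pattern: parse the HTML, run an XPath expression, and turn relative links into absolute ones. The sketch below runs on made-up markup; the `xiashu` class name, the sample paths, and the `extract_links` helper are assumptions for illustration, and the real XPath expressions have to be read from the site with the developer tools.

```python
from urllib.parse import urljoin

from lxml import etree

BASE_URL = "https://www.555x.org/"

# Made-up listing markup for illustration; the real page structure may differ.
LIST_HTML = """
<div class="xiashu">
  <ul>
    <li><a href="/html/wuxiaxianxia/txt1.html">Novel A</a></li>
    <li><a href="/html/wuxiaxianxia/txt2.html">Novel B</a></li>
  </ul>
</div>
"""


def extract_links(html, xpath_expr, base_url=BASE_URL):
    """Return absolute URLs for every href matched by xpath_expr."""
    tree = etree.HTML(html)
    return [urljoin(base_url, href.strip()) for href in tree.xpath(xpath_expr)]


# Second-level (detail page) URLs found on the first-level listing page.
second_level = extract_links(LIST_HTML, '//div[@class="xiashu"]//li/a/@href')
print(second_level)
```

The same helper could be called again on each detail page with a different XPath expression to reach the third‑level page.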

4.5 Parse third‑level page and extract title and download link

# `three` holds the nodes selected from the third-level page
for rd in three:
    # novel title
    b = rd.xpath('..//div[@class="shutou"]//b/text()')[0].strip()
    # download link for the text file
    tress = rd.xpath('..//div[@class="shuji"]//ul/li/a/@href')[0].strip()

4.6 Save result

read = '《%s》 download link: %s' % (b, tress)
print(read)

4.7 Execute workflow

# inside main(), for each page number i
url = self.url.format(i)
html = self.get_page(url)
self.parse_page(html)
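The whole workflow amounts to a loop over page numbers that fetches, parses, and then pauses. The sketch below is one way to write it as a standalone function; `crawl` and its parameters are hypothetical, and it assumes the parse step returns a list of results rather than printing them.

```python
import time


def crawl(start, end, fetch, parse, url_pattern, delay=1.0):
    """Fetch and parse pages start..end (inclusive), pausing between requests."""
    results = []
    for page in range(start, end + 1):
        url = url_pattern.format(page)
        html = fetch(url)
        results.extend(parse(html))
        time.sleep(delay)  # pause so the site is not flooded with requests
    return results
```

In the spider class, `fetch` would be `self.get_page`, `parse` would be `self.parse_page`, and `url_pattern` would be `self.url`. The start and end pages could be read with `int(input(...))` before calling the function.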

5. Result

Run the script, input start and end pages, and the console displays each novel’s title and download URL. Clicking the link downloads the text file, which can be opened locally.
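Instead of clicking each link, the file can be saved from the script. This is a minimal sketch, assuming the extracted link points directly at the text file; `safe_filename` and `download_txt` are hypothetical helpers that are not part of the original code.

```python
import re

import requests


def safe_filename(title):
    """Replace characters that are invalid in common file systems."""
    return re.sub(r'[\\/:*?"<>|]', "_", title.strip()) + ".txt"


def download_txt(link, title, headers=None):
    """Stream the file at `link` to disk and return the local file name."""
    filename = safe_filename(title)
    with requests.get(link, headers=headers, stream=True, timeout=30) as res:
        res.raise_for_status()
        with open(filename, "wb") as f:
            # write in chunks so large novels are not held in memory at once
            for chunk in res.iter_content(chunk_size=8192):
                f.write(chunk)
    return filename
```

Inside the parsing loop, a call such as `download_txt(tress, b, headers=self.headers)` would save each novel under its title.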

6. Conclusion

Avoid excessive crawling to reduce server load.

The project demonstrates how to use Python’s requests, lxml and fake_useragent to fetch novel download links.

Hands‑on practice helps deepen understanding of web scraping.


Written by

Python Crawling & Data Mining