How to Scrape Recipes from XiaChuFang with Python: A Step‑by‑Step Guide

This tutorial walks you through building a Python web scraper that extracts recipe names, ingredients, and recipe page links from the XiaChuFang cooking website. It gets past basic anti-scraping checks by sending custom request headers with random User-Agent strings, and saves the collected data to a .doc file for future use.

Python Crawling & Data Mining

Jun 13, 2020

Introduction

This article explains how to use Python to crawl the XiaChuFang cooking website, extract recipe information, and save it to a .doc file that Word can open.

Project Goal

Collect the name, ingredients, and page link of every recipe listed on several pages of XiaChuFang's explore section, and append them to a .doc file.
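
Each recipe therefore reduces to three fields. As a minimal sketch of that record (the Recipe dataclass and format_recipe function are hypothetical, not part of the original script), here is a formatter that renders one recipe in a layout modelled on the one the script writes: 第 N 种 (recipe No. N), 菜名 (name), 原料 (ingredients), and 下载链接 (link).

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Recipe:
    """One scraped recipe (hypothetical helper type, not in the original script)."""
    name: str
    ingredients: List[str] = field(default_factory=list)
    url: str = ""


def format_recipe(index: int, recipe: Recipe) -> str:
    """Render a recipe as a numbered text block followed by a separator line."""
    sep = "=" * 65
    return (
        f"第 {index} 种\n"
        f"菜 名 : {recipe.name}\n"
        f"原 料 : {', '.join(recipe.ingredients)}\n"
        f"下 载 链 接 : {recipe.url}\n"
        f"{sep}\n"
    )
```

Ending each block with a newline matters when blocks are appended one after another to the same file; without it, the next 第 N 种 line would be glued onto the separator.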

Preparation

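Software: PyCharm (any Python 3.6+ environment works; the script uses f-strings)

Required libraries: requests, lxml, fake_useragent (install with pip install requests lxml fake-useragent); time is part of the standard library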

Handling Anti‑Scraping Measures

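The script deals with two problems: the site returns no data to requests that lack realistic HTTP headers, and repeated requests from the same IP address can get that IP blocked. The first is solved by sending a browser-like User-Agent header, generated at random by fake_useragent so that successive requests do not all carry the same signature. A random User-Agent does not hide your IP address, though; the second problem is mitigated only by slowing down, which the script does with a short pause between requests.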
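
To see what the header-setting step produces without installing fake_useragent, here is a stdlib-only stand-in: a minimal sketch that picks a User-Agent at random from a small, hypothetical pool. In the real script, fake_useragent's UserAgent().random plays this role with a far larger set of browser strings. The build_headers name and the Accept and Accept-Language values are illustrative assumptions; the original script sends only User-Agent.

```python
import random
from typing import Dict, Optional, Sequence

# Hypothetical sample pool of browser User-Agent strings (assumption, for illustration only)
USER_AGENTS = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/83.0.4103.97 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_5) AppleWebKit/605.1.15 "
    "(KHTML, like Gecko) Version/13.1.1 Safari/605.1.15",
    "Mozilla/5.0 (X11; Linux x86_64; rv:77.0) Gecko/20100101 Firefox/77.0",
)


def build_headers(pool: Sequence[str] = USER_AGENTS,
                  rng: Optional[random.Random] = None) -> Dict[str, str]:
    """Return browser-like request headers with a randomly chosen User-Agent."""
    rng = rng if rng is not None else random.Random()
    return {
        "User-Agent": rng.choice(pool),
        # Extra headers below are assumptions; the original script omits them
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
        "Accept-Language": "zh-CN,zh;q=0.9,en;q=0.8",
    }
```

Usage would look like requests.get(url, headers=build_headers(), timeout=10). Passing a seeded random.Random makes the choice reproducible, which is handy when testing.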

Implementation

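import time
from urllib.parse import urljoin

import requests
from lxml import etree
from fake_useragent import UserAgent


class Kitchen:
    """Crawl XiaChuFang explore pages and append each recipe to 菜谱.doc."""

    def __init__(self):
        self.url = "https://www.xiachufang.com/explore/?page={}"
        self.base_url = "https://www.xiachufang.com/"
        self.u = 0              # counter: recipes written so far
        self.headers = {}
        self.ua = UserAgent()   # source of random User-Agent strings

    def set_headers(self):
        # Use a fresh random User-Agent for each listing page
        self.headers = {"User-Agent": self.ua.random}

    def get_page(self, url):
        # timeout stops a stalled connection from hanging the crawl;
        # raise_for_status() surfaces 4xx/5xx (e.g. a block) instead of parsing an error page
        res = requests.get(url=url, headers=self.headers, timeout=10)
        res.raise_for_status()
        return res.content.decode("utf-8")

    def parse_page(self, html):
        # Links from a listing page to the individual recipe pages
        parse_html = etree.HTML(html)
        return parse_html.xpath('//li/div/a/@href')

    def run(self, start_page, end_page):
        for page in range(start_page, end_page + 1):
            self.set_headers()
            html = self.get_page(self.url.format(page))
            for href in self.parse_page(html):
                # urljoin avoids "//" when href already starts with "/"
                detail_url = urljoin(self.base_url, href)
                detail_tree = etree.HTML(self.get_page(detail_url))
                # XPath selectors as published (2020); the site's markup may have changed since
                name = detail_tree.xpath('.//li[@class="container"]/p/text()')
                ingredients = detail_tree.xpath('.//td//a/text()')
                # xpath() returns lists of text nodes; join them into readable strings
                name_text = " ".join(s.strip() for s in name if s.strip())
                ingredients_text = ", ".join(s.strip() for s in ingredients if s.strip())
                self.u += 1
                # Labels: 第 N 种 = "recipe No. N", 菜名 = name, 原料 = ingredients, 下载链接 = link
                food_info = f"""第 {self.u} 种
菜 名 : {name_text}
原 料 : {ingredients_text}
下 载 链 接 : {detail_url}
=================================================================
"""
                # Append as UTF-8 plain text (the .doc extension does not make it a binary Word file)
                with open('菜谱.doc', 'a', encoding='utf-8') as f:
                    f.write(food_info)
                print(f"Saved recipe {self.u}: {detail_url}")
                time.sleep(1.4)  # pause between detail-page requests


if __name__ == '__main__':
    spider = Kitchen()
    spider.run(start_page=1, end_page=5)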
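
Building detail URLs by plain string concatenation ("https://www.xiachufang.com/" + href) yields a double slash whenever href is root-relative. The standard library's urllib.parse.urljoin resolves relative and absolute hrefs correctly. A minimal sketch; the absolute_links helper and the sample hrefs are hypothetical, since the exact href shape on XiaChuFang's listing pages is an assumption here.

```python
from typing import Iterable, List
from urllib.parse import urljoin

BASE = "https://www.xiachufang.com/"


def absolute_links(base: str, hrefs: Iterable[str]) -> List[str]:
    """Resolve each href against base, the way a browser would."""
    return [urljoin(base, h) for h in hrefs]


# Hypothetical hrefs in the shapes a listing page might contain
hrefs = [
    "/recipe/100000001/",                            # root-relative
    "recipe/100000002/",                             # path-relative
    "https://www.xiachufang.com/recipe/100000003/",  # already absolute
]

for href, joined in zip(hrefs, absolute_links(BASE, hrefs)):
    print(f"{BASE + href}\n  -> {joined}")
```

The first printed pair shows the problem: concatenation produces .com//recipe/..., while urljoin produces a clean URL. An already absolute href passes through unchanged.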

Optimization

The script pauses for 1.4 seconds (time.sleep(1.4)) after each recipe page request, which reduces load on the server and the risk of an IP block, and uses the counter self.u to number the recipes it has processed.
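
A fixed pause spaces requests out but does nothing when a request fails outright. A common extension, not part of the original script, is to retry with exponential backoff plus random jitter. A minimal sketch: fetch_with_retry is a hypothetical helper, and fetch stands in for any callable that takes a URL, such as the script's get_page method. The default retry_on=(OSError,) also catches requests' exceptions, because requests.RequestException subclasses IOError (an alias of OSError).

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def fetch_with_retry(
    fetch: Callable[[str], T],
    url: str,
    retries: int = 3,
    base_delay: float = 1.4,
    retry_on: Tuple[Type[BaseException], ...] = (OSError,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fetch(url); on a listed exception, wait and try again (hypothetical helper).

    The wait doubles after each failure (1.4 s, 2.8 s, 5.6 s, ...) and adds up to
    0.5 s of random jitter so that retries do not fire at perfectly regular intervals.
    """
    for attempt in range(retries + 1):
        try:
            return fetch(url)
        except retry_on:
            if attempt == retries:
                raise  # out of retries: let the caller see the last error
            sleep(base_delay * 2 ** attempt + random.uniform(0, 0.5))
    raise ValueError("retries must be >= 0")
```

Inside the scraper this would read html = fetch_with_retry(self.get_page, detail_url). The sleep parameter exists so tests can record the waits instead of actually pausing.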

Result Display

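Running the script shows progress in the console and appends each recipe to 菜谱.doc. Note that this file is UTF-8 plain text with a .doc extension, not a binary Word document; Word and most text editors can still open it. The original article includes screenshots of the console output and the generated file.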
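
Plain text is easy to read but awkward to load back into other tools. If you want the same fields in a form a spreadsheet can open, one option, not used in the original article, is the standard csv module. (For a genuine .docx file, the third-party python-docx package is the usual route.) A minimal sketch; save_recipes_csv and its column names are hypothetical.

```python
import csv
from typing import Iterable, Sequence


def save_recipes_csv(path: str, rows: Iterable[Sequence[str]]) -> int:
    """Write (name, ingredients, url) rows to a CSV file; return the number of data rows."""
    count = 0
    # utf-8-sig writes a byte-order mark, which helps Excel detect UTF-8 for the Chinese text;
    # newline="" lets the csv module control line endings itself
    with open(path, "w", encoding="utf-8-sig", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["name", "ingredients", "url"])
        for row in rows:
            writer.writerow(row)
            count += 1
    return count
```

Because csv quotes fields that contain commas, an ingredients string such as "番茄, 鸡蛋" stays in a single column.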

Conclusion

The guide demonstrates a simple yet effective Python web‑scraping workflow for gathering cooking recipes, handling anti‑scraping defenses, and exporting the data for personal use.
