
Build a Simple Python Web Scraper to Fetch Recipes in 5 Minutes

This guide walks you through creating a lightweight Python web scraper that fetches recipe data from a cooking website, covering HTTP requests, HTML parsing with BeautifulSoup, extracting titles and images, and wrapping the process into an interactive console application.

MaGe Linux Operations

Mar 15, 2024

Most people have heard of web crawlers. This article starts from scratch and builds a simple Python crawler in about five minutes that fetches the content you want from a recipe website.

Crawler Overview

The crawler works by simulating a user browsing a site: it first accesses the main page, follows links if needed, and downloads the desired images or text once they are found.
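
The visit, follow, and collect loop described above can be sketched with the standard library alone. This is a minimal illustration, not the code used later in the article: `LinkCollector`, `crawl`, `FAKE_SITE`, and the `example.test` URLs are hypothetical names, and the `fetch` argument stands in for whatever function downloads a page.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def crawl(start_url, fetch, max_pages=10):
    """Breadth-first crawl: visit a page, queue its unseen links, repeat.

    `fetch` is any callable mapping a URL to HTML text (an assumption of
    this sketch; a real crawler would download the page here).
    """
    seen = {start_url}
    queue = deque([start_url])
    pages = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        pages[url] = html
        parser = LinkCollector()
        parser.feed(html)
        for href in parser.links:
            absolute = urljoin(url, href)  # resolve relative links
            if absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return pages


# A tiny in-memory "site" so the sketch runs without network access.
FAKE_SITE = {
    "https://example.test/": '<a href="/a">A</a> <a href="/b">B</a>',
    "https://example.test/a": '<a href="/">home</a>',
    "https://example.test/b": "<p>no links</p>",
}
pages = crawl("https://example.test/", FAKE_SITE.__getitem__)
print(list(pages))
```

The `seen` set keeps the crawler from revisiting a page, and `max_pages` bounds the run so a link cycle cannot keep it going forever.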

Crawling Web HTML

When building a crawler, the first step is sending an HTTP request to retrieve the page data. While many APIs return JSON, a crawler usually needs the raw HTML. In Python you can choose any request library; the example uses urllib.request.

Before starting, install the third‑party parsing library beautifulsoup4 (for example with pip install beautifulsoup4). urllib.request ships with Python and needs no installation.

A basic example:

from urllib.request import urlopen, Request
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36 Edg/122.0.0.0'}
req = Request("https://www.meishij.net/?from=space_block", headers=headers)
# Send the request and read the response
html = urlopen(req)
# The body arrives as bytes, so decode it to text (UTF-8 is assumed here)
html_text = html.read().decode('utf-8')
print(html_text)

This retrieves the full HTML of the recipe page, similar to viewing the source in a browser.
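
If you plan to fetch more than one page, it may help to wrap the request in a function that sets a timeout and reports failures instead of crashing. The following is a sketch under assumptions: `fetch_html` and `USER_AGENT` are hypothetical names, the User-Agent value is a placeholder, and falling back to UTF-8 when the server declares no charset is a guess that will not suit every site.

```python
from urllib.error import HTTPError, URLError
from urllib.request import Request, urlopen

USER_AGENT = "Mozilla/5.0 (compatible; recipe-demo)"  # placeholder value


def fetch_html(url, timeout=10):
    """Return the page at `url` as text, or None if the request fails."""
    req = Request(url, headers={"User-Agent": USER_AGENT})
    try:
        with urlopen(req, timeout=timeout) as resp:
            # Use the charset from the Content-Type header when present
            charset = resp.headers.get_content_charset() or "utf-8"
            return resp.read().decode(charset, errors="replace")
    except HTTPError as err:
        # The server answered, but with an error status such as 403 or 404
        print(f"Server returned {err.code} for {url}")
    except URLError as err:
        # The request never got a usable answer (DNS failure, refused, ...)
        print(f"Could not reach {url}: {err.reason}")
    return None
```

Calling `fetch_html("https://www.meishij.net/?from=space_block")` would then replace the three request lines in the example above. `HTTPError` is caught first because it is a subclass of `URLError`.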

Parsing Elements

The simplest method is string parsing, but Python offers powerful libraries like BeautifulSoup. Below we use BeautifulSoup to extract hot‑search recipes.

Hot Recipes

Here we parse and analyze the recipes that appear in the hot search list.

from urllib.request import urlopen, Request
from bs4 import BeautifulSoup as bf
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36 Edg/122.0.0.0'}
req = Request("https://www.meishij.net/?from=space_block", headers=headers)
# Get HTML
html = urlopen(req)
html_text = html.read().decode('utf-8')
# Parse with BeautifulSoup
obj = bf(html_text, 'html.parser')
# Each hot-search entry is an <a class="sancan_item"> that wraps a <strong class="title">
index_hotlist = obj.find_all('a', class_='sancan_item')
for item in index_hotlist:
    for title in item.find_all('strong', class_='title'):
        print(title.get_text())

The main steps are: fetch the HTML, locate the a elements with class sancan_item, and extract the text of the strong elements with class title inside them.
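
To experiment with the selectors without hitting the site, you can run the same find_all calls against a small hand-written snippet. The sample below is an assumption that mimics the structure the selectors expect; the real page's markup may differ, and `extract_titles` is a hypothetical helper name.

```python
from bs4 import BeautifulSoup

# Hand-written sample; it only imitates the structure the selectors look for.
SAMPLE_HTML = """
<div>
  <a class="sancan_item" href="/r/1"><strong class="title">Braised Pork Ribs</strong></a>
  <a class="sancan_item" href="/r/2"><strong class="title">Mapo Tofu</strong></a>
  <a class="other" href="/about"><strong class="title">About us</strong></a>
</div>
"""


def extract_titles(html_text):
    """Return the text of every strong.title found inside an a.sancan_item."""
    soup = BeautifulSoup(html_text, "html.parser")
    titles = []
    for item in soup.find_all("a", class_="sancan_item"):
        for strong in item.find_all("strong", class_="title"):
            titles.append(strong.get_text(strip=True))
    return titles


titles = extract_titles(SAMPLE_HTML)
print(titles)
```

The third link is skipped because its class is not sancan_item, which shows why the outer find_all matters: searching for strong.title alone would also pick up unrelated headings.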

Random Meal

To solve the "what to eat" problem, we collect all recipes into a list and let the program randomly choose one.

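from random import choice
from urllib.request import urlopen, Request
from bs4 import BeautifulSoup as bf
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36 Edg/122.0.0.0'}
all_food = []
for i in range(3):
    url = f"https://www.meishij.net/chufang/diy/jiangchangcaipu/?&page={i}"
    # Send the same browser-like headers as in the earlier examples
    html = urlopen(Request(url, headers=headers))
    html_text = html.read().decode('utf-8')
    obj = bf(html_text, 'html.parser')
    # Treat the alt text of each image as a recipe name
    for img in obj.find_all('img'):
        if img.get('alt'):
            all_food.append(img.get('alt'))
# Pick one dish at random
print(choice(all_food))

This requests three listing pages (page=0 through page=2), treats the alt text of every image as a recipe name, collects the names into all_food, and prints one chosen at random. Because it accepts any image with alt text, non-recipe images such as logos may slip into the list.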
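
Names gathered from alt attributes can contain blanks and repeats, for instance when the same dish appears on more than one page. A small cleanup step before the random pick is sketched below; `clean_names` and the sample values are hypothetical and only stand in for what find_all('img') might return.

```python
import random


def clean_names(alts):
    """Strip whitespace, drop empty entries, and remove duplicates in order."""
    stripped = (alt.strip() for alt in alts if alt)
    # dict.fromkeys keeps the first occurrence of each name, in order
    return list(dict.fromkeys(name for name in stripped if name))


# Made-up alt values, including a missing one, a blank one, and a repeat
raw_alts = ["Mapo Tofu", None, " Mapo Tofu ", "", "Fried Rice", "   "]
names = clean_names(raw_alts)
print(names)

rng = random.Random(42)  # seeded only so repeated runs pick the same dish
today = rng.choice(names)
print(today)
```

Without deduplication, a dish that appears several times would be proportionally more likely to be chosen.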

Recipe Tutorial

After selecting a recipe, we search the site for its name, follow the first result to the detailed tutorial page, and print the cooking steps.

import string
from urllib.parse import quote
from urllib.request import urlopen
from bs4 import BeautifulSoup as bf
url = "https://so.meishij.net/index.php?q=红烧排骨"
# Percent-encode the non-ASCII characters; printable ASCII such as ":/?=" is left as-is
url = quote(url, safe=string.printable)
html = urlopen(url)
html_text = html.read().decode('utf-8')
obj = bf(html_text, 'html.parser')
# The first search result links to the recipe's detail page
index_hotlist = obj.find_all('a', class_='img')
url = index_hotlist[0].get('href')
html = urlopen(url)
html_text = html.read().decode('utf-8')
obj = bf(html_text, 'html.parser')
# Each cooking step is a <p> inside a <div class="step_content">
for div in obj.find_all('div', class_='step_content'):
    for p in div.find_all('p'):
        print(p.get_text())
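
Quoting the whole URL with safe=string.printable leaves every printable ASCII character untouched, so a dish name containing a space or an & would end up in the query string unescaped. One alternative, sketched here, is to encode only the query parameter with urlencode. `build_search_url` and `SEARCH_ENDPOINT` are hypothetical names; the endpoint itself is the one used in the article.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

SEARCH_ENDPOINT = "https://so.meishij.net/index.php"  # endpoint from the article


def build_search_url(food):
    """Return the search URL for `food` with the query string percent-encoded."""
    query = urlencode({"q": food})  # encodes non-ASCII text, spaces, and "&"
    return f"{SEARCH_ENDPOINT}?{query}"


url = build_search_url("红烧排骨")
print(url)
# Round trip: decoding the query string gives back the original dish name
print(parse_qs(urlsplit(url).query)["q"][0])
```

The resulting string is plain ASCII and can be passed to urlopen directly.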

Packaging It

The steps above work, but manually repeating them is inefficient. The following script wraps the whole process into a simple console application. It needs no GUI toolkit, but it does rely on three more third‑party packages besides beautifulsoup4: colorama, termcolor, and readchar.

# Import required modules
from urllib.request import urlopen, Request
import urllib, string
from bs4 import BeautifulSoup as bf
from random import choice, sample
from colorama import init
from os import system
from termcolor import colored
from readchar import readkey

FGS = ['green', 'yellow', 'blue', 'cyan', 'magenta', 'red']
print(colored('Searching recipes...', choice(FGS)))
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36 Edg/122.0.0.0'}
req = Request("https://www.meishij.net/?from=space_block", headers=headers)
html = urlopen(req)
html_text = bytes.decode(html.read())
hot_list = []
all_food = []
food_page = 3

def draw_menu(menu_list):
    clear()
    for idx, i in enumerate(menu_list):
        print(colored(f'{idx}:{i}', choice(FGS)))
    print(colored('8:Random selection', choice(FGS)))

def draw_word(word_list):
    clear()
    for i in word_list:
        print(colored(i, choice(FGS)))

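def clear():
    # "cls" exists only on Windows; other systems use "clear"
    from os import name
    system('cls' if name == 'nt' else 'clear')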

def hot_list_func():
    global html_text
    obj = bf(html_text, 'html.parser')
    index_hotlist = obj.find_all('a', class_='sancan_item')
    for ul in index_hotlist:
        for li in ul.find_all('strong', class_='title'):
            hot_list.append(li.get_text())

def search_food_detail(food):
    print('Searching detailed tutorial, please wait...')
    url = f"https://so.meishij.net/index.php?q={food}"
    url = urllib.parse.quote(url, safe=string.printable)
    html = urlopen(url)
    html_text = bytes.decode(html.read())
    obj = bf(html_text, 'html.parser')
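    index_hotlist = obj.find_all('a', class_='img')
    # Stop early if the search returned no results, instead of raising IndexError
    if not index_hotlist:
        print(colored(f'No tutorial found for {food}', choice(FGS)))
        return
    url = index_hotlist[0].get('href')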
    html = urlopen(url)
    html_text = bytes.decode(html.read())
    obj = bf(html_text, 'html.parser')
    random_color = choice(FGS)
    print(colored(f"{food} steps:", random_color))
    for div in obj.find_all('div', class_='step_content'):
        for p in div.find_all('p'):
            print(colored(p.get_text(), random_color))

def get_random_food():
    global food_page
    if not all_food:
        for i in range(food_page):
            url = f"https://www.meishij.net/chufang/diy/jiangchangcaipu/?&page={i}"
            html = urlopen(url)
            html_text = bytes.decode(html.read())
            obj = bf(html_text, 'html.parser')
            for p in obj.find_all('img'):
                if p.get('alt'):
                    all_food.append(p.get('alt'))
    my_food = choice(all_food)
    print(colored(f'Randomly selected, today\'s meal: {my_food}', choice(FGS)))
    return my_food

init()
hot_list_func()
print(colored('Search completed!', choice(FGS)))
random_food = []  # dishes shown in the current menu (empty until 'm' is pressed)
my_food = ''      # dish currently selected
while True:
    move = readkey()
    if move == 'q':
        # Quit the program
        break
    if move == 'c':
        clear()
    elif move == 'm':
        # Show up to eight hot recipes; sample() fails if asked for more than exist
        random_food = sample(hot_list, min(8, len(hot_list)))
        draw_menu(random_food)
    elif move.isdigit():
        if int(move) == 8:
            # 8 is the "Random selection" menu entry
            my_food = get_random_food()
        elif int(move) < len(random_food):
            my_food = random_food[int(move)]
            print(my_food)
    elif move == 'd' and my_food:
        search_food_detail(my_food)
        my_food = ''

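Running the program fetches the hot list and then waits for single key presses: m draws a colored menu of hot recipes, a digit from 0 to 7 selects a dish from that menu, 8 picks a random dish from the listing pages, d shows the cooking steps for the selected dish, c clears the screen, and q quits.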
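
The key handling can also be written as a small function that takes the pressed key and the current state, which lets you exercise the menu logic without a terminal or a network connection. This is a sketch, not part of the original script: `handle_key`, the `state` dictionary, and the sample dishes are hypothetical, and the "d" branch only reports what it would fetch.

```python
from random import Random


def handle_key(key, state, hot_list, rng):
    """Apply one key press to `state` and return the text to display."""
    if key == "m":
        # Show up to eight dishes; sample() fails if asked for more than exist
        state["menu"] = rng.sample(hot_list, min(8, len(hot_list)))
        return "\n".join(f"{i}:{name}" for i, name in enumerate(state["menu"]))
    if key.isdigit() and int(key) < len(state["menu"]):
        state["selected"] = state["menu"][int(key)]
        return f"Selected: {state['selected']}"
    if key == "d" and state["selected"]:
        return f"Would fetch steps for: {state['selected']}"
    return ""  # unknown key, or nothing to act on yet


state = {"menu": [], "selected": ""}
dishes = ["Mapo Tofu", "Fried Rice", "Braised Pork Ribs"]  # stand-in data
rng = Random(0)
for key in ["0", "m", "1", "d"]:
    print(repr(handle_key(key, state, dishes, rng)))
```

Pressing a digit before the menu exists returns an empty string rather than raising an error, because the index is checked against the current menu length first.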

Conclusion

Building a basic web scraper can be done in about five minutes. Start by fetching a site’s HTML, parse it with BeautifulSoup, and optionally wrap the logic into an interactive tool. For more complex sites you may need to handle sessions, logins, or CAPTCHAs.
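
For sites that track a session with cookies, one possible starting point with the standard library is an opener that stores cookies between requests. This is a sketch only: `make_session` and the User-Agent value are hypothetical, and logins or CAPTCHAs need considerably more than this.

```python
from http.cookiejar import CookieJar
from urllib.request import HTTPCookieProcessor, build_opener


def make_session(user_agent="Mozilla/5.0 (compatible; recipe-demo)"):
    """Return (opener, jar): an opener that stores and resends cookies."""
    jar = CookieJar()
    opener = build_opener(HTTPCookieProcessor(jar))
    # These headers are sent with every request made through the opener
    opener.addheaders = [("User-Agent", user_agent)]
    return opener, jar


opener, jar = make_session()
print(len(jar))  # the jar stays empty until a response sets a cookie
```

You would then call `opener.open(url)` wherever the earlier examples call `urlopen(url)`, so that cookies set by one response are sent back with the next request.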
