
How to Scrape Meituan Food Data with Python: Step-by-Step Guide

This tutorial explains how to analyze Meituan food page URLs, use browser developer tools to locate AJAX JSON responses, construct Python requests with proper headers, extract restaurant information via regular expressions, and save the results to a local file.

Python Crawling & Data Mining

Jul 15, 2020


1. Analyze Meituan food page URL parameters

1) Search criteria

Site: Meituan Food; city: Beijing; keyword: 火锅 (hot pot).

2) URL to crawl

https://bj.meituan.com/s/%E7%81%AB%E9%94%85/

3) Explanation

The browser percent-encodes the Chinese keyword as UTF-8, so 火锅 ("hot pot") becomes %E7%81%AB%E9%94%85. From the URL structure, "bj" is the subdomain for Beijing and the segment after /s/ is the search keyword.
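Python's standard library reproduces this encoding. The sketch below uses urllib.parse.quote to build a search-page URL from a city subdomain and keyword, and urlsplit/unquote to read them back. The helper names are hypothetical, and it assumes the https://<city>.meituan.com/s/<keyword>/ pattern observed for Beijing also holds for other subdomains.

```python
from __future__ import annotations

from urllib.parse import quote, unquote, urlsplit


def build_search_url(city: str, keyword: str) -> str:
    """Build a search-page URL, e.g. city='bj', keyword='火锅'.

    Hypothetical helper: assumes the https://<city>.meituan.com/s/<keyword>/
    pattern seen for Beijing.
    """
    # quote() percent-encodes the UTF-8 bytes of the keyword.
    return f"https://{city}.meituan.com/s/{quote(keyword)}/"


def parse_search_url(url: str) -> tuple[str, str]:
    """Recover (city subdomain, decoded keyword) from a search-page URL."""
    parts = urlsplit(url)
    city = parts.hostname.split(".")[0]            # 'bj' from 'bj.meituan.com'
    encoded = parts.path.strip("/").split("/")[1]  # '/s/<kw>/' -> '<kw>'
    return city, unquote(encoded)


if __name__ == "__main__":
    url = build_search_url("bj", "火锅")
    print(url)                    # https://bj.meituan.com/s/%E7%81%AB%E9%94%85/
    print(parse_search_url(url))  # ('bj', '火锅')
```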

2. Analyze page data source (F12 developer tools)

Open the developer tools (F12) and reload the page. When you move to the second page of results, the URL in the address bar does not change, which indicates the results are loaded via AJAX.

In the Network tab, filter by XHR to find the request whose response contains the results.

The response is JSON. The request URLs for pages 2 and 3 are:

Page 2: https://apimobile.meituan.com/group/v4/poi/pcsearch/1?uuid=xxx&userid=-1&limit=32&offset=32&cateId=-1&q=%E7%81%AB%E9%94%85

Page 3: https://apimobile.meituan.com/group/v4/poi/pcsearch/1?uuid=xxx&userid=-1&limit=32&offset=64&cateId=-1&q=%E7%81%AB%E9%94%85
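The comparison can also be done programmatically. This sketch parses both captured URLs with urllib.parse and reports which query parameters differ; the helper names are illustrative, and uuid=xxx is the placeholder from the captured requests.

```python
from urllib.parse import parse_qs, urlsplit

PAGE2 = ("https://apimobile.meituan.com/group/v4/poi/pcsearch/1"
         "?uuid=xxx&userid=-1&limit=32&offset=32&cateId=-1&q=%E7%81%AB%E9%94%85")
PAGE3 = ("https://apimobile.meituan.com/group/v4/poi/pcsearch/1"
         "?uuid=xxx&userid=-1&limit=32&offset=64&cateId=-1&q=%E7%81%AB%E9%94%85")


def query_params(url: str) -> dict:
    """Return the (percent-decoded) query string as {name: first value}."""
    return {k: v[0] for k, v in parse_qs(urlsplit(url).query).items()}


def diff_params(url_a: str, url_b: str) -> dict:
    """Map each parameter whose value differs to its (a, b) pair."""
    a, b = query_params(url_a), query_params(url_b)
    return {k: (a.get(k), b.get(k))
            for k in a.keys() | b.keys() if a.get(k) != b.get(k)}


if __name__ == "__main__":
    print(diff_params(PAGE2, PAGE3))  # {'offset': ('32', '64')}
```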

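Comparing them: offset increases by 32 per page, limit is the number of results per request (32), and q is the URL-encoded search keyword. The 1 at the end of the path (pcsearch/1) is the city ID for Beijing, and uuid (shown as the placeholder xxx) must be replaced with the value from your own browser session.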
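Putting the pattern into a formula, page n (1-based) uses offset = (n - 1) × limit. The sketch below builds the API URL for any page with urllib.parse.urlencode; page_url is a hypothetical helper, and it assumes the offset rule inferred from the two captured requests holds for every page.

```python
from urllib.parse import urlencode

BASE = "https://apimobile.meituan.com/group/v4/poi/pcsearch/{city_id}"


def page_url(page: int, keyword: str, city_id: int = 1,
             limit: int = 32, uuid: str = "xxx") -> str:
    """Build the search-API URL for a 1-based page number.

    Assumes offset = (page - 1) * limit, as inferred from the captured
    requests; uuid='xxx' is a placeholder for your own value.
    """
    params = {
        "uuid": uuid,
        "userid": -1,
        "limit": limit,
        "offset": (page - 1) * limit,
        "cateId": -1,
        "q": keyword,  # urlencode() percent-encodes the UTF-8 keyword
    }
    return BASE.format(city_id=city_id) + "?" + urlencode(params)


if __name__ == "__main__":
    for page in (1, 2, 3):
        print(page_url(page, "火锅"))
```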

3. Construct request to fetch Meituan food data

The script loops over the result pages, extracts the shop fields from each response, and appends them to a local file. The full Python script is:

import requests
import re
import json
import time

def start():
    # Headers can be copied from the browser's Network panel.
    # 'br' is left out of Accept-Encoding: requests only decodes Brotli
    # responses when the optional brotli package is installed.
    headers = {
        'Accept': '*/*',
        'Accept-Encoding': 'gzip, deflate',
        'Accept-Language': 'zh-CN,zh;q=0.9',
        'Connection': 'keep-alive',
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.25 Safari/537.36 Core/1.70.3741.400 QQBrowser/10.5.3863.400',
        'Host': 'apimobile.meituan.com',
        'Origin': 'https://bj.meituan.com',
        'Referer': 'https://bj.meituan.com/s/%E7%81%AB%E9%94%85/'
    }
    # offset = 0, 32, ..., 1568: 50 pages of 32 results (at most 1600 items),
    # capped to avoid sending too many requests. Page number = w // 32 + 1.
    for w in range(0, 1600, 32):
        try:
            # Replace the placeholder xxx with your own UUID.
            url = 'https://apimobile.meituan.com/group/v4/poi/pcsearch/1?uuid=xxx&userid=-1&limit=32&offset=' + str(w) + '&cateId=-1&q=%E7%81%AB%E9%94%85'
            response = requests.get(url, headers=headers, timeout=10)
            response.raise_for_status()
            # Extract each field from the raw response text with regular expressions.
            titles = re.findall(r'"","title":"(.*?)","address":"', response.text)
            addresses = re.findall(r'"address":"(.*?)",', response.text)
            avgprices = re.findall(r'"avgprice":(.*?),', response.text)
            avgscores = re.findall(r'"avgscore":(.*?),', response.text)
            comments = re.findall(r'"comments":(.*?),', response.text)
            # The five counts should match; if they differ, the regexes are
            # out of sync with the response and rows may be misaligned.
            print(len(titles), len(addresses), len(avgprices), len(avgscores), len(comments))
            # zip() stops at the shortest list instead of raising IndexError.
            for title, address, avgprice, avgscore, comment in zip(titles, addresses, avgprices, avgscores, comments):
                file_data(title, address, avgprice, avgscore, comment)
        except Exception as e:
            # Report the failed page and continue with the next one.
            print('offset', w, 'failed:', e)
        # Pause between requests to keep the request rate low.
        time.sleep(1)

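def file_data(title, address, avgprice, avgscore, comment):
    data = {
        '店铺名称': title,        # shop name
        '店铺地址': address,      # shop address
        '平均消费价格': avgprice,  # average spend per person
        '店铺评分': avgscore,     # shop rating
        '评价人数': comment       # number of reviews
    }
    # Append one JSON object per line to 美团美食.txt ("Meituan food").
    with open('美团美食.txt', 'a', encoding='utf-8') as fb:
        fb.write(json.dumps(data, ensure_ascii=False) + '\n')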

if __name__ == '__main__':
    start()
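Because the endpoint returns JSON, an alternative to the regular expressions is to parse the body with json.loads (or response.json() in requests) and read the fields by key. This sketch assumes the records sit in a list at data → searchResult and use the same field names the regexes target; that path is an assumption to check against the response in your own Network panel, and the sample payload is made up for illustration.

```python
import json

# Made-up sample in the assumed shape; not a real Meituan response.
SAMPLE = json.dumps({
    "data": {
        "searchResult": [
            {"title": "Example Hot Pot", "address": "Example Rd 1",
             "avgprice": 120, "avgscore": 4.6, "comments": 2048},
            {"title": "Another Shop", "address": "Example Rd 2",
             "avgprice": 95, "avgscore": 4.3, "comments": 311},
        ]
    }
}, ensure_ascii=False)

FIELDS = ("title", "address", "avgprice", "avgscore", "comments")


def extract_shops(body: str) -> list:
    """Return one dict per shop, keeping only FIELDS.

    Assumption: results live at payload['data']['searchResult'].
    Missing keys come back as None instead of raising.
    """
    payload = json.loads(body)
    results = (payload.get("data") or {}).get("searchResult") or []
    return [{f: item.get(f) for f in FIELDS} for item in results]


if __name__ == "__main__":
    for shop in extract_shops(SAMPLE):
        print(shop)
```

Unlike five parallel regex lists, each shop's fields stay together in one dict, so a missing field cannot shift values onto the wrong shop.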

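Running the script prints the five match counts for each page and appends one JSON record per line to 美团美食.txt.

[Screenshot: console output]

[Screenshot: local file content]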
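Each line of 美团美食.txt is a standalone JSON object, so the file can be loaded back line by line. A minimal sketch; load_records is a hypothetical helper, and it skips the file check rather than failing if the script has not been run yet.

```python
import json
from pathlib import Path


def load_records(path: str = "美团美食.txt") -> list:
    """Read the file written by file_data(): one JSON object per line."""
    records = []
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:  # skip blank lines
                records.append(json.loads(line))
    return records


if __name__ == "__main__":
    if Path("美团美食.txt").exists():
        rows = load_records()
        print(len(rows), "records")
        if rows:
            print(rows[0])
```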

4. Summary

To search for a different keyword or city, change the q parameter and the city ID in the API path, and update the city subdomain in the Origin and Referer headers to match. Practicing these steps is a good way to get comfortable extracting data from AJAX-loaded pages.
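Those adjustments can be kept in one place. This sketch builds the URL and the city-dependent headers together; build_request is a hypothetical helper, only bj/1 comes from the captured requests, and the subdomain and city ID for any other city are values you would read from your own browser session.

```python
from urllib.parse import quote, urlencode


def build_request(city: str, city_id: int, keyword: str, offset: int = 0,
                  uuid: str = "xxx") -> tuple:
    """Return (url, headers) for one search-API page.

    city is the subdomain (e.g. 'bj') and city_id the number in the API
    path (1 for Beijing in the captured requests); uuid is a placeholder.
    """
    params = urlencode({"uuid": uuid, "userid": -1, "limit": 32,
                        "offset": offset, "cateId": -1, "q": keyword})
    url = f"https://apimobile.meituan.com/group/v4/poi/pcsearch/{city_id}?{params}"
    origin = f"https://{city}.meituan.com"
    headers = {
        "Accept": "*/*",
        "Accept-Language": "zh-CN,zh;q=0.9",
        "Origin": origin,
        # The Referer mirrors the search page the browser would be on.
        "Referer": f"{origin}/s/{quote(keyword)}/",
    }
    return url, headers
```

The pair can be passed straight to requests.get(url, headers=headers, timeout=10).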

