
Bypassing Anti‑Scraping Measures on Mayi Short‑Rent Site Using Cookies and BeautifulSoup

This tutorial explains how to analyze the Mayi short‑rent website, overcome its anti‑scraping defenses by setting appropriate Cookie and User‑Agent headers, and use Python's urllib2 and BeautifulSoup to extract rental details, store them in CSV, and optionally employ Selenium.

Python Programming Learning Circle

When crawling the Mayi short-rent website, the server may block requests with a message like "The current visit is suspected to be a hacker attack and has been intercepted by the administrator." The article first shows how to inspect the page, locating the rental information inside <dd> elements and the room-detail clearfloat div, and presents a basic BeautifulSoup script that fails because of this anti-scraping protection.

To bypass the block, the tutorial demonstrates adding custom request headers: a realistic User-Agent string and the required Cookie value copied from the browser's Network tab. The core header-setting code is:

# -*- coding: utf-8 -*-
import urllib2
from bs4 import BeautifulSoup

user_agent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36"
cookie = "mediav=%7B%22eid%22%3A%22387123..."  # copied from the browser's Network tab
headers = {"User-Agent": user_agent, "Cookie": cookie}

# Attach the headers so the request looks like a normal browser visit
request = urllib2.Request(url, headers=headers)
response = urllib2.urlopen(request)
contents = response.read()
soup = BeautifulSoup(contents, "html.parser")
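Since urllib2 exists only in Python 2, the same header setup in Python 3 goes through urllib.request. A minimal equivalent sketch (the URL and cookie value here are placeholders, not from the original article):

```python
import urllib.request

url = "http://www.mayi.com/"  # hypothetical listing URL
user_agent = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
              "AppleWebKit/537.36 (KHTML, like Gecko) "
              "Chrome/51.0.2704.103 Safari/537.36")
cookie = "mediav=..."  # copy the current value from the browser's Network tab
headers = {"User-Agent": user_agent, "Cookie": cookie}

# Build the request with browser-like headers attached
request = urllib.request.Request(url, headers=headers)
# urllib.request.urlopen(request) would then fetch the page as before
```

Everything downstream of the fetch (reading the body, handing it to BeautifulSoup) works the same way in either Python version.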

With these headers, the script can successfully iterate over each <dd> node, extract the rental name, price, rating/comments, and the detail-page URL, and print them. Sample output includes page numbers and fields such as "[Rental name] 大唐东原财富广场--城市简约复式民宿" (Datang Dongyuan Fortune Plaza -- simple duplex city guesthouse) and "[Rental price] 298".
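The per-listing extraction step can be sketched against a simplified, hypothetical markup; the real page's tag layout, class names, and sample values below are assumptions, not copied from the article:

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup mirroring the page's <dd> listing structure
html = """
<dl>
  <dd><a href="/room/1001">City Loft Guesthouse</a><span class="price">298</span></dd>
  <dd><a href="/room/1002">Riverside Studio</a><span class="price">186</span></dd>
</dl>
"""

soup = BeautifulSoup(html, "html.parser")
rentals = []
for dd in soup.find_all("dd"):          # one <dd> per listing
    link = dd.find("a")
    price = dd.find("span", class_="price")
    rentals.append({
        "name": link.get_text(strip=True),
        "url": link["href"],             # detail-page URL
        "price": price.get_text(strip=True),
    })

for r in rentals:
    print("[Rental name]", r["name"])
    print("[Rental price]", r["price"])
```

The same loop shape carries over to the real page: find the container nodes, then pull each field out with `find` and attribute access.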

The article then extends the crawler to fetch detailed information such as the address and occupancy, and calculates a per-person price. It writes all the data to a UTF-8 CSV file using the csv module. The complete code for data extraction and CSV writing is provided within a ... block.
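The CSV-writing step can be sketched as follows; this is a minimal Python 3 version, and the column names and sample rows are illustrative rather than taken from the article:

```python
import csv

# Sample (name, price, occupancy) rows -- not real scraped data
rows = [
    ("City Loft Guesthouse", 298, 2),
    ("Riverside Studio", 186, 1),
]

# newline="" avoids blank lines on Windows; encoding="utf-8" matches the article
with open("rentals.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["name", "price", "occupancy", "price_per_person"])
    for name, price, occupancy in rows:
        # Derive the per-person price from the listing price and occupancy
        writer.writerow([name, price, occupancy, round(price / occupancy, 2)])
```

In Python 2, as used in the article, the UTF-8 handling requires encoding each string manually before writing, which is why the original opens the file in binary mode.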

Finally, the author notes that the Cookie expires roughly every hour and must be refreshed manually, and suggests Selenium as an alternative way to scrape the site. The tutorial ends with a copyright disclaimer.

Python · Web Scraping · cookie · BeautifulSoup · anti-scraping
Written by

Python Programming Learning Circle

A global community of Chinese Python developers offering technical articles, columns, original video tutorials, and problem sets. Topics include web full‑stack development, web scraping, data analysis, natural language processing, image processing, machine learning, automated testing, DevOps automation, and big data.
