
Web Scraping and Data Analysis of Pet Cat Breeds Using Python

This article shows how to scrape cat breed information from a breed-listing website, store it in an Excel file, and analyze it in Python with requests, lxml, pandas, pyecharts, and stylecloud. The analysis covers a breed-alias relationship graph, geographic distribution, body-size proportions, the cheapest and most expensive breeds, and word clouds.

Rare Earth Juejin Tech Community

The article begins with a brief introduction to the Juejin "Use Code to Attract Cats" activity, posing two questions about cat ownership and curiosity, and explains the author's motivation to learn about various pet cat breeds through coding.

Data is collected by crawling the cat breed website www.maomijiaoyi.com. The following Python code fetches the breed index page, extracts each breed's name, price, and detail-page URL, and prints the results:

from lxml import etree
import requests

headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.198 Safari/537.36"}
url_base = "http://www.maomijiaoyi.com"
session = requests.Session()

# Access the breed index page and collect detail links
url = url_base + "/index.php?/pinzhongdaquan_5.html"
res = session.get(url, headers=headers)
html = etree.HTML(res.text)
main_data = []
for a_tag in html.xpath("//div[@class='pinzhong_left']/a"):
    # Build the absolute URL of this breed's detail page
    detail_url = url_base + a_tag.xpath("./@href")[0]
    # Name and price may be missing, so default to None
    pet_name, pet_price = None, None
    pet_name_tag = a_tag.xpath("./div[@class='pet_name']/text()")
    if pet_name_tag:
        pet_name = pet_name_tag[0].strip()
    pet_price_tag = a_tag.xpath("./div[@class='pet_price']/span/text()")
    if pet_price_tag:
        pet_price = pet_price_tag[0].strip()
    print(pet_name, pet_price, detail_url)
    main_data.append((pet_name, pet_price, detail_url))

After obtaining the links, the script visits each detail page, parses the basic attributes, appearance attributes, detailed description, and image URLs, and then downloads the images. The extracted data is saved to an Excel file named 猫咪.xlsx ("cats"). The original post shows sample screenshots of the scraped data and downloaded images at this point.
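The detail-page code is not reproduced here, so the following is only a minimal sketch of how one page could be turned into a record and how the records could be written to Excel. The HTML snippet, the `row`, `label`, `value`, and `photo` class names, and the helper names `parse_detail` and `save_records` are all invented for illustration; the real pages would need to be inspected to find the right XPath expressions.

```python
from lxml import etree
import pandas as pd

# Hypothetical detail-page markup; the real site's structure may differ.
SAMPLE_HTML = """
<div class="details">
  <div class="row"><span class="label">中文学名</span><span class="value">Breed A</span></div>
  <div class="row"><span class="label">参考价格</span><span class="value">2000-8000</span></div>
  <img class="photo" src="/uploads/breed_a.jpg"/>
</div>
"""


def parse_detail(html_text, url_base="http://www.maomijiaoyi.com"):
    """Return one record (a dict) for a breed detail page."""
    html = etree.HTML(html_text)
    record = {}
    # Each row is assumed to hold one label/value pair
    for row in html.xpath("//div[@class='row']"):
        label = row.xpath("string(./span[@class='label'])").strip()
        value = row.xpath("string(./span[@class='value'])").strip()
        if label:
            record[label] = value
    # Turn relative image paths into absolute URLs
    record["image_urls"] = [
        url_base + src for src in html.xpath("//img[@class='photo']/@src")
    ]
    return record


def save_records(records, path="猫咪.xlsx"):
    """Write the records to Excel (needs an Excel writer such as openpyxl)."""
    frame = pd.DataFrame(records)
    # Excel cells cannot hold lists, so join the image URLs into one string
    frame["image_urls"] = frame["image_urls"].str.join(";")
    frame.to_excel(path, index=False)


record = parse_detail(SAMPLE_HTML)
print(record)
```

In a real run, `parse_detail` would be called on `session.get(detail_url, headers=headers).text` for every entry in `main_data`, and the resulting list of records passed to `save_records`.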

Data analysis starts by loading the Excel file with pandas:

import pandas as pd

df = pd.read_excel("猫咪.xlsx")

Various visualizations are created using the pyecharts library:

A relationship graph shows each breed and its aliases.

A bar chart displays the geographic distribution of breeds.

A treemap visualizes the distribution of breeds across countries.

A pie chart illustrates the proportion of different body sizes.

from pyecharts import options as opts
from pyecharts.charts import Graph, Bar, TreeMap, Pie
# (code omitted for brevity – the full snippets are present in the source)
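Since the chart code is omitted, here is a rough sketch of how the data for these charts could be prepared with pandas and handed to pyecharts. The column names 别名 (alias), 原产地 (origin), and 体型 (body size), the "、" alias separator, the helper names, and the sample rows are assumptions for illustration and may not match the real spreadsheet.

```python
import pandas as pd

# Invented sample rows; column names other than 中文学名 are assumptions.
df = pd.DataFrame({
    "中文学名": ["Breed A", "Breed B", "Breed C"],
    "别名": ["Alias A1、Alias A2", None, "Alias C1"],
    "原产地": ["Country X", "Country Y", "Country X"],
    "体型": ["中型", "大型", "中型"],
})


def count_pairs(series):
    """Return [(category, count), ...] with plain Python ints, most common first."""
    counts = series.dropna().value_counts()
    return [(str(name), int(n)) for name, n in counts.items()]


def alias_graph(frame, sep="、"):
    """Build Graph nodes and links that connect each breed to its aliases."""
    sizes, links = {}, []
    for breed, aliases in zip(frame["中文学名"], frame["别名"]):
        sizes[breed] = 20
        if not isinstance(aliases, str):
            continue  # no alias recorded for this breed
        for alias in (a.strip() for a in aliases.split(sep)):
            if alias:
                sizes.setdefault(alias, 10)  # node names must stay unique
                links.append({"source": breed, "target": alias})
    nodes = [{"name": name, "symbolSize": size} for name, size in sizes.items()]
    return nodes, links


def render_charts(frame, prefix="cats"):
    """Write three charts to HTML files (requires pyecharts)."""
    # Imported here so the data helpers above work without pyecharts installed
    from pyecharts import options as opts
    from pyecharts.charts import Bar, Graph, Pie

    nodes, links = alias_graph(frame)
    Graph().add("", nodes, links, repulsion=200).render(f"{prefix}_aliases.html")

    origins = count_pairs(frame["原产地"])
    bar = Bar()
    bar.add_xaxis([name for name, _ in origins])
    bar.add_yaxis("Breeds", [n for _, n in origins])
    bar.render(f"{prefix}_origins.html")

    pie = Pie().add("", count_pairs(frame["体型"]))
    pie.set_global_opts(title_opts=opts.TitleOpts(title="Body size"))
    pie.render(f"{prefix}_sizes.html")


size_pairs = count_pairs(df["体型"])
nodes, links = alias_graph(df)
print(size_pairs)
print(links)
```

The same `count_pairs` output could also feed a treemap by mapping each pair to a `{"name": ..., "value": ...}` dictionary.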

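Price analysis splits the "参考价格" (reference price) column into a minimum and a maximum price, identifies the cheapest and most expensive breeds, and prints the results:

# Split "low-high" price strings into two columns
tmp = df.参考价格.str.split("-", expand=True)
tmp.columns = ["最低价格", "最高价格"]  # lowest price, highest price
# Drop rows without a complete price range, then convert to integers
tmp.dropna(inplace=True)
tmp = tmp.astype("int")
# Look up the breed names (中文学名) at the price extremes, keeping ties
cheap_cat = df.loc[tmp.index[tmp.最低价格 == tmp.最低价格.min()], "中文学名"].to_list()
costly_cat = df.loc[tmp.index[tmp.最高价格 == tmp.最高价格.max()], "中文学名"].to_list()
print("Cheapest breeds:", cheap_cat)
print("Most expensive breeds:", costly_cat)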
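The split-based approach relies on every price having the form low-high; a value with more than one hyphen would break the two-column assignment. As a hedged alternative, the sketch below uses a regular expression so that malformed values are simply skipped. The sample rows and prices are invented for illustration.

```python
import pandas as pd

# Invented sample data; only the two column names come from the article.
df = pd.DataFrame({
    "中文学名": ["Breed A", "Breed B", "Breed C", "Breed D"],
    "参考价格": ["500-2000", "3000-10000", None, "500-800"],
})

# Capture the two integers around the hyphen; non-matching rows become NaN
prices = df["参考价格"].str.extract(r"^\s*(\d+)\s*-\s*(\d+)\s*$")
prices.columns = ["最低价格", "最高价格"]  # lowest price, highest price
prices = prices.dropna().astype(int)

# Select every breed that ties for the minimum or maximum
is_cheapest = prices["最低价格"] == prices["最低价格"].min()
is_costliest = prices["最高价格"] == prices["最高价格"].max()
cheapest = df.loc[prices.index[is_cheapest], "中文学名"].to_list()
costliest = df.loc[prices.index[is_costliest], "中文学名"].to_list()

print("Cheapest breeds:", cheapest)
print("Most expensive breeds:", costliest)
```

Because `prices` keeps the original row index, the lookup back into `df` still works after rows with missing prices are dropped.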

Word clouds are generated for the descriptive columns using the stylecloud library, with jieba handling Chinese word segmentation. The original post builds a general word cloud plus separate clouds for personality traits and living habits; only the imports are reproduced here:

import jieba
import stylecloud
from IPython.display import Image
# (code omitted for brevity – the full snippets are present in the source)
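As the word cloud code is omitted, the following is a minimal sketch of one way to do it. Chinese text has no spaces between words, so each description is segmented first and the tokens are rejoined with spaces before being passed to stylecloud. The helper names, the `fas fa-cat` icon choice, and the column name in the usage comment are assumptions, not taken from the original post.

```python
def build_cloud_text(texts, tokenize, min_len=2):
    """Tokenize each description and join all tokens with spaces."""
    words = []
    for text in texts:
        if not isinstance(text, str):
            continue  # skip missing cells (None or NaN)
        # Drop very short tokens, which are mostly particles and punctuation
        words.extend(w for w in tokenize(text) if len(w.strip()) >= min_len)
    return " ".join(words)


def make_cloud(texts, output_name, font_path, icon_name="fas fa-cat"):
    """Render a word cloud image (requires jieba and stylecloud)."""
    # Imported here so build_cloud_text works without these packages installed
    import jieba
    import stylecloud

    stylecloud.gen_stylecloud(
        text=build_cloud_text(texts, jieba.lcut),
        icon_name=icon_name,      # Font Awesome icon used as the mask
        font_path=font_path,      # must point to a font with Chinese glyphs
        output_name=output_name,
    )


# Demo with a whitespace tokenizer as a stand-in for jieba.lcut
demo_text = build_cloud_text(["lively and gentle", None, "a gentle cat"], str.split)
print(demo_text)

# Hypothetical usage on one column of the spreadsheet:
# make_cloud(df["性格特点"], "personality.png", font_path="path/to/chinese_font.ttf")
```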

Finally, a mind‑map style diagram groups breeds by body size, producing a hierarchical view of the cat taxonomy.
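The diagram code is not shown, so this is only a sketch of how such a hierarchy could be built, assuming a 体型 (body size) column and using the pyecharts `Tree` chart as the renderer. The original post may use a different chart type or tool; the sample rows and helper names are invented.

```python
import pandas as pd

# Invented sample rows; the 体型 column name is an assumption.
df = pd.DataFrame({
    "中文学名": ["Breed A", "Breed B", "Breed C"],
    "体型": ["中型", "大型", "中型"],
})


def size_tree(frame, root="Cats"):
    """Group breed names under their body size, under a single root node."""
    children = [
        {"name": str(size), "children": [{"name": str(b)} for b in group["中文学名"]]}
        for size, group in frame.dropna(subset=["体型"]).groupby("体型")
    ]
    return [{"name": root, "children": children}]


def render_tree(data, path="cat_tree.html"):
    """Write the hierarchy to an HTML file (requires pyecharts)."""
    # Imported here so size_tree works without pyecharts installed
    from pyecharts.charts import Tree

    Tree().add("", data).render(path)


tree = size_tree(df)
print(tree)
```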

References:

https://juejin.cn/post/7024369534119182367

http://www.maomijiaoyi.com/

Tags: Python, data analysis, web scraping, pandas, pyecharts, cat breeds