What 200K Cat Listings Reveal About Breeds, Prices, and Market Trends
This article describes how a Python web‑scraper collected over 200,000 cat‑trading records and breed information, then performed exploratory data analysis to uncover patterns in breed distribution, geographic hotspots, price determinants, and other market insights.
Introduction
After noticing many friends keeping cats, the author explored a dedicated cat‑trading website (http://www.maomijiaoyi.com/) and scraped 200,000 transaction records together with detailed breed information to study the cat market.
Data Acquisition
The scraper first retrieved the list of cat breeds, which displayed only breed names and reference prices. By visiting each breed’s detail page, additional fields such as Chinese scientific name, basic info, temperament, habits, pros/cons, and feeding methods were collected.
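The first step, pulling breed names and detail-page links out of the breed list, might look like the minimal sketch below. The article does not describe the site's markup, so the `class="breed"` attribute, the URL paths in `SAMPLE_HTML`, and the names `BreedLinkParser` and `parse_breed_list` are all hypothetical. The sketch parses an inline HTML sample rather than making a live request.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

BASE_URL = "http://www.maomijiaoyi.com/"  # site named in the article


class BreedLinkParser(HTMLParser):
    """Collect (name, detail_url) pairs from anchors with class="breed".

    The class name is a hypothetical placeholder; the real markup is unknown.
    """

    def __init__(self):
        super().__init__()
        self.breeds = []
        self._href = None  # href of the breed anchor currently being read
        self._text = []    # text fragments seen inside that anchor

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("class") == "breed":
            self._href = attrs.get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            name = "".join(self._text).strip()
            # Resolve relative links against the site root.
            self.breeds.append((name, urljoin(BASE_URL, self._href)))
            self._href = None


def parse_breed_list(html):
    """Return every (breed name, absolute detail URL) found in the page."""
    parser = BreedLinkParser()
    parser.feed(html)
    return parser.breeds


# Made-up sample standing in for the breed list page.
SAMPLE_HTML = """
<ul>
  <li><a class="breed" href="/breed/1.html">Ragdoll</a></li>
  <li><a class="breed" href="/breed/2.html">Maine Coon</a></li>
  <li><a href="/about.html">About</a></li>
</ul>
"""

breeds = parse_breed_list(SAMPLE_HTML)
for name, url in breeds:
    print(name, url)
```

Each returned URL would then be fetched in turn to read the detail fields (temperament, habits, feeding methods, and so on) with a second parser of the same shape.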
Transaction data were then harvested from the buy‑sell pages, capturing price, title, number of listings, cat age, vaccination status, shipping cost, purity, and video availability. A progress bar was added to the crawler, multiple processes were used to speed up collection, and the final dataset was limited to 200,000 entries.
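One way to combine multiple worker processes, a progress indicator, and a cap on the dataset size is sketched below. The article does not say which libraries were used, so the use of `multiprocessing.Pool` is an assumption, as are `fetch_page`, `collect`, and `PER_PAGE`. `fetch_page` returns synthetic rows instead of downloading anything, so the sketch runs offline.

```python
import sys
from multiprocessing import Pool

MAX_RECORDS = 200_000  # dataset size reported in the article
PER_PAGE = 10          # hypothetical number of listings per page


def fetch_page(page):
    """Stand-in for downloading and parsing one listing page.

    A real worker would request the page and extract the fields; this one
    returns synthetic records so the sketch needs no network access.
    """
    return [{"page": page, "title": f"listing {page}-{i}"} for i in range(PER_PAGE)]


def collect(page_results, limit=MAX_RECORDS):
    """Gather rows from an iterable of per-page results until `limit` is reached."""
    records = []
    for rows in page_results:
        records.extend(rows)
        # Plain-text progress indicator, rewritten in place on stderr.
        done = min(len(records), limit)
        sys.stderr.write(f"\rcollected {done}/{limit}")
        if len(records) >= limit:
            break
    sys.stderr.write("\n")
    return records[:limit]


if __name__ == "__main__":
    pages = range(1, 51)
    with Pool(processes=2) as pool:
        # imap_unordered yields each page's rows as soon as a worker finishes.
        data = collect(pool.imap_unordered(fetch_page, pages), limit=120)
    print(len(data))
```

Keeping `collect` independent of the pool means the same cap and progress logic also works with a plain `map(fetch_page, pages)` when debugging in a single process.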
Exploratory Analysis
The following dimensions were examined:
Breed word cloud
Origin countries (world map)
Size proportion (donut chart)
Appearance description word cloud
Transaction distribution map
Breed proportion tree diagram
Average price ranking (bar chart)
Views vs. price (scatter plot)
Age distribution (histogram)
Price vs. age (box plot)
Price vs. vaccination count (box plot)
Price vs. shipping cost (box plot)
Price vs. purity (box plot)
Price vs. video availability (box plot)
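Two of the dimensions above, breed proportion and the average price ranking, reduce to simple group-by aggregations. The sketch below uses only the standard library. The records and field names (`breed`, `price`) are synthetic placeholders rather than scraped data, and the helpers `breed_share` and `average_price_ranking` are illustrative names.

```python
from collections import Counter, defaultdict
from statistics import mean

# Synthetic records; field names and values are illustrative only.
listings = [
    {"breed": "Ragdoll", "price": 3000},
    {"breed": "Ragdoll", "price": 5000},
    {"breed": "Maine Coon", "price": 6000},
    {"breed": "British Shorthair", "price": 1500},
    {"breed": "British Shorthair", "price": 2500},
    {"breed": "British Shorthair", "price": 2000},
]


def breed_share(rows):
    """Fraction of listings per breed, most common first."""
    counts = Counter(r["breed"] for r in rows)
    total = sum(counts.values())
    return {breed: n / total for breed, n in counts.most_common()}


def average_price_ranking(rows):
    """(breed, mean price) pairs sorted from most to least expensive."""
    prices = defaultdict(list)
    for r in rows:
        prices[r["breed"]].append(r["price"])
    ranking = [(breed, mean(values)) for breed, values in prices.items()]
    return sorted(ranking, key=lambda item: item[1], reverse=True)


shares = breed_share(listings)
ranking = average_price_ranking(listings)

for breed, avg in ranking:
    print(f"{breed}: average price {avg}, share {shares[breed]:.0%}")
```

The resulting dictionaries and lists can be passed to any plotting library to draw the tree diagram and bar chart.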
Key Findings
Breed analysis showed many varieties beyond common ones like Ragdoll and orange cats. Most breeds originated from Canada, the United States, the United Kingdom, ancient Egypt, Thailand, and Afghanistan.
Size distribution revealed only one large breed (Ragdoll); the rest were medium or small.
Color descriptors most frequently used were blue, black, and red; temperament was often described as friendly, and side or rear views were preferred.
Transaction heatmap highlighted Sichuan, Chongqing, and Guangdong as the top provinces for cat sales.
Breed popularity in transactions ranked orange cats first, followed by coffee cats, Ragdoll, and British Shorthair.
Average price analysis showed Maine Coon as the most expensive, with Ragdoll close behind.
Age distribution concentrated between 1–9 months, indicating most cats sold were under one year old.
Statistical tests indicated that price correlated with age, vaccination count, shipping cost, purity, and video availability, while view count showed no clear relationship.
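The article does not name the tests it used. As one possible check, the sketch below compares median price across groups (the quantity a box plot centres on) and computes a Pearson correlation coefficient by hand. Pearson is a technique chosen here for illustration, not one confirmed by the source. The rows are synthetic, and the field names `vaccinations`, `views`, and `price` are assumptions.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean, median


def group_medians(rows, key, value="price"):
    """Median of `value` for each distinct `key`, ordered by key."""
    groups = defaultdict(list)
    for r in rows:
        groups[r[key]].append(r[value])
    return {k: median(v) for k, v in sorted(groups.items())}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Synthetic rows built so that price rises with vaccination count
# while views vary without a clear pattern.
rows = [
    {"vaccinations": 0, "views": 120, "price": 1000},
    {"vaccinations": 0, "views": 40, "price": 1200},
    {"vaccinations": 1, "views": 90, "price": 1800},
    {"vaccinations": 1, "views": 60, "price": 2200},
    {"vaccinations": 2, "views": 110, "price": 2900},
    {"vaccinations": 2, "views": 50, "price": 3100},
]

prices = [r["price"] for r in rows]
medians = group_medians(rows, "vaccinations")
r_vacc = pearson([r["vaccinations"] for r in rows], prices)
r_views = pearson([r["views"] for r in rows], prices)

print(medians)
print(round(r_vacc, 2), round(r_views, 2))
```

On this toy data the vaccination coefficient is close to 1 and the views coefficient is close to 0, which mirrors the pattern the article reports. Real listings would need outlier handling before such numbers mean much.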
Conclusion
Age, vaccination count, shipping cost, breed purity, and video availability are the factors most clearly associated with cat prices in the online market; view count is not.
