How We Scraped and Analyzed the Global Top 100 Most Beautiful Women with Python

In this article we demonstrate how to use Python’s requests and BeautifulSoup to scrape a static webpage listing the world’s top 100 most beautiful women, process the data with pandas and pyecharts for continent, nationality and profession statistics, and apply Baidu’s face‑recognition API to score each celebrity’s attractiveness.

Python Crawling & Data Mining

We explore a Python‑based workflow that extracts, cleans, visualizes, and evaluates the "Global Top 100 Most Beautiful Women" ranking published by an overseas media outlet.

Evaluation Criteria

The ranking score is calculated by the formula:

Total Score = 0.3 × fan votes + 0.5 × official facial proportion score + 0.2 × (personality, body, charity, etc.)
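As a quick worked instance of the formula, the sketch below computes the weighted total for one entrant. The function name, the 0 to 100 scale, and the sample values are assumptions for illustration; the source gives only the three weights.

```python
def total_score(fan_votes: float, facial_score: float, other_score: float) -> float:
    """Weighted total using the article's weights (0.3, 0.5, 0.2)."""
    return 0.3 * fan_votes + 0.5 * facial_score + 0.2 * other_score


# Hypothetical component scores on an assumed 0-100 scale.
example = total_score(fan_votes=80, facial_score=90, other_score=70)
print(round(example, 2))  # 83.0
```

Because the facial proportion score carries half the weight, a 10-point change in it moves the total by 5 points, versus 3 for fan votes and 2 for the remaining factors.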

Data Acquisition

Because the page is static, we can parse its HTML source directly. The script uses requests to fetch the page and BeautifulSoup to locate the li element for each entry. Since each element's id varies with the ranking position, a loop constructs the id for every rank, extracts the name, nationality, and photo URL, and saves the images locally with a self.downloadImg helper.

[Figure: Scraping code screenshot]
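Since the original code survives only as a screenshot, here is a minimal sketch of the loop described above. The id pattern `item-<rank>`, the tags and class names inside each li, and the function names are all hypothetical; the real page's markup would need to be inspected first.

```python
from bs4 import BeautifulSoup

# Assumption (not from the article): ids look like "item-1", "item-2", ...
ID_PATTERN = "item-{rank}"


def fetch_html(url: str) -> str:
    """Download the static page and return its HTML text."""
    import requests  # imported lazily so parse_ranking works offline

    response = requests.get(url, timeout=10)
    response.raise_for_status()
    return response.text


def parse_ranking(html: str, max_rank: int = 100) -> list[dict]:
    """Collect name, nationality, and photo URL for each ranked <li>."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for rank in range(1, max_rank + 1):
        item = soup.find("li", id=ID_PATTERN.format(rank=rank))
        if item is None:  # skip ranks missing from the page
            continue
        # Hypothetical inner markup: <h3> name, <span class="nationality">, <img>.
        name = item.find("h3")
        nationality = item.find("span", class_="nationality")
        image = item.find("img")
        rows.append({
            "rank": rank,
            "name": name.get_text(strip=True) if name else None,
            "nationality": nationality.get_text(strip=True) if nationality else None,
            "photo_url": image.get("src") if image else None,
        })
    return rows


sample_html = """
<ul>
  <li id="item-1"><h3>Jane Doe</h3><span class="nationality">Thai</span>
      <img src="https://example.com/1.jpg"></li>
  <li id="item-2"><h3>Mary Roe</h3><span class="nationality">Filipina-American</span>
      <img src="https://example.com/2.jpg"></li>
</ul>
"""
rows = parse_ranking(sample_html)
print(rows)
```

Keeping the download step (`fetch_html`) separate from the parsing step makes the parser easy to check against a saved copy of the page before running it live.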

Data Analysis

1) Continent Statistics

We aggregate the data by continent, counting how many entries originate from each region. Entrants with dual nationality (e.g., "Filipina‑American") are counted twice, once for each nationality.

[Figure: Continent distribution chart]
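The double-counting rule for dual nationals can be sketched with pandas as follows. The lookup table and function names are hypothetical, and the sample labels are made up; a real run needs one lookup entry for every nationality label in the scraped data.

```python
import re

import pandas as pd

# Hypothetical demonym-to-continent lookup (illustrative entries only).
CONTINENT_OF = {
    "American": "North America",
    "Filipina": "Asia",
    "Thai": "Asia",
    "British": "Europe",
}


def split_nationalities(label: str) -> list[str]:
    """Split 'Filipina-American' into parts (ASCII or non-breaking hyphen)."""
    return [part.strip() for part in re.split(r"[-\u2011]", label) if part.strip()]


def continent_counts(nationalities: pd.Series) -> pd.Series:
    """Count entries per continent, once per nationality for dual nationals."""
    parts = nationalities.apply(split_nationalities).explode()
    # Labels missing from the lookup map to NaN and are dropped by value_counts.
    return parts.map(CONTINENT_OF).value_counts()


sample = pd.Series(["Thai", "Filipina-American", "British", "American"])
counts = continent_counts(sample)
print(counts.to_dict())
```

Note that with this rule the counts can sum to more than the number of entrants: the four sample labels above yield five continent entries.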

2) Nationality Statistics

We normalize nationality names (e.g., converting "Thai" to "Thailand") to match the standard country names required by pyecharts. After cleaning, the visualization shows the United States with the most entries, followed by the United Kingdom. China and South Korea also appear, and none of their entrants hold dual nationality.

[Figure: Nationality distribution chart]
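The normalization step might look like the sketch below, which turns raw labels into (country, count) pairs ready to hand to a charting library. Only the "Thai" to "Thailand" mapping comes from the article; the other entries, the target country names, and the function names are assumptions, and dual labels are assumed to be split the same way as in the continent count.

```python
from collections import Counter

# Hypothetical demonym-to-country lookup. The target names must match
# whatever country names the map component actually expects.
COUNTRY_OF = {
    "Thai": "Thailand",
    "American": "United States",
    "British": "United Kingdom",
    "Filipina": "Philippines",
}


def normalize(label: str) -> list[str]:
    """Return the country name(s) for a label such as 'Filipina-American'."""
    parts = label.replace("\u2011", "-").split("-")
    # Fall back to the raw label so unmapped values stay visible for review.
    return [COUNTRY_OF.get(p.strip(), p.strip()) for p in parts if p.strip()]


def country_pairs(labels: list[str]) -> list[tuple[str, int]]:
    """Build (country, count) pairs sorted from most to least frequent."""
    counter = Counter(c for label in labels for c in normalize(label))
    return counter.most_common()


pairs = country_pairs(["Thai", "American", "Filipina-American", "British"])
print(pairs)
```

Leaving unmapped labels unchanged, rather than dropping them, makes it easy to spot which names still need an entry in the lookup.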

3) Profession Analysis

The list contains only three professions: model, actor, and singer. Actors dominate, plausibly because facial appearance carries the highest weight (0.5) in the scoring formula.

[Figure: Profession distribution]

4) AI Beauty Scoring

We employ Baidu’s face‑recognition API. The FaceScore function reads each downloaded image as binary, encodes it with base64, sends it as a POST request, and extracts the returned beauty score.

[Figure: AI beauty scoring process]
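The read, encode, POST, and extract steps can be sketched as below. This is not the article's FaceScore function: the endpoint, request fields, and response shape are assumptions based on Baidu's v3 face detection REST API and should be verified against the official documentation. The access token is assumed to have been obtained separately.

```python
import base64

# Assumption: endpoint and field names follow Baidu's v3 face detection API.
DETECT_URL = "https://aip.baidubce.com/rest/2.0/face/v3/detect"


def build_payload(image_bytes: bytes) -> dict:
    """Base64-encode raw image bytes and request the beauty field."""
    return {
        "image": base64.b64encode(image_bytes).decode("ascii"),
        "image_type": "BASE64",
        "face_field": "beauty",
    }


def extract_beauty(response_json: dict):
    """Return the first face's beauty score, or None if no face was found."""
    result = response_json.get("result") or {}
    faces = result.get("face_list") or []
    return faces[0].get("beauty") if faces else None


def face_score(image_path: str, access_token: str):
    """Read an image file, POST it to the API, and return the beauty score."""
    import requests  # imported lazily so the helpers above work offline

    with open(image_path, "rb") as f:
        payload = build_payload(f.read())
    response = requests.post(
        DETECT_URL, params={"access_token": access_token}, json=payload, timeout=10
    )
    response.raise_for_status()
    return extract_beauty(response.json())
```

Returning None when no face is detected lets the caller skip unusable photos instead of failing partway through the 100 images.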

The top‑5 scores are displayed in the following chart:

[Figure: Top 5 beauty scores]

Overall, the tutorial shows how to combine web scraping, data cleaning, visual analytics, and AI‑driven scoring to derive insights from a seemingly entertainment‑focused ranking.
