How to Scrape Proxy Data and Visualize City Distribution with Python and Pyecharts

This guide walks you through using Python to crawl proxy server data, extract city names with the cpca library, count occurrences per city, and create an interactive heat‑map of nationwide proxy distribution using the Pyecharts visualization toolkit.

Python Crawling & Data Mining

1. Introduction

This article continues a two‑part series on crawling proxy data from a website with Python. The focus here is on visualizing the collected data, which is done with the pyecharts library.

2. Proxy Distribution Statistics

A heat‑map needs city names, but the scraped location strings are not standardized. The cpca library, which relies on Jieba segmentation, extracts province, city, and district names from raw location text. For example, from "湖北十堰" (Shiyan, Hubei) it extracts the city "十堰".
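The extraction step can be sketched roughly as follows. This is a minimal illustration, not the article's original script: the function name `extract_cities` and its `transform` parameter are hypothetical, and it assumes `cpca.transform` returns a table with a `市` (city) column, as the cpca README describes.

```python
def extract_cities(locations, transform=None):
    """Return the city names that could be parsed from raw location strings.

    `transform` defaults to cpca.transform; it is a parameter (an assumption
    of this sketch) only so the function can be tried without cpca installed.
    """
    if transform is None:
        import cpca  # third-party package: pip install cpca
        transform = cpca.transform

    # Expected to be a table with 省 / 市 / 区 columns (province / city / district).
    table = transform(locations)

    # Rows where no city was recognized may hold an empty or missing value,
    # so keep only non-empty strings.
    return [city for city in table["市"] if isinstance(city, str) and city]
```

Calling `extract_cities(["湖北十堰"])` with cpca installed should yield the city for that string. Depending on the cpca version, the returned name may carry the 市 suffix, so it can be worth normalizing names before counting.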

After extracting city names, a list is built to count how many proxies appear in each city: if a city is not yet in the list, it is added with a count of 1; otherwise the existing count is incremented.
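The counting rule described above might look like the sketch below. The function names are illustrative; `count_cities_manual` follows the add-or-increment logic literally, while `count_cities` shows the standard-library `collections.Counter` as a shorter alternative that the original script may or may not have used.

```python
from collections import Counter


def count_cities_manual(cities):
    """Count occurrences with an explicit list of [city, count] entries."""
    counts = []
    for city in cities:
        for entry in counts:
            if entry[0] == city:
                entry[1] += 1  # city already seen: increment its count
                break
        else:
            counts.append([city, 1])  # first occurrence: add with a count of 1
    return counts


def count_cities(cities):
    """Same result as (city, count) tuples, sorted by descending count."""
    return Counter(cities).most_common()
```

The manual version rescans the list for every city, which is fine for a few thousand proxies; `Counter` does a dictionary lookup per city and scales better.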

3. Heat‑Map Generation

With the per‑city proxy counts ready, pyecharts is used to generate the heat‑map. The prepared list is passed to the library, and the script writes an HTML file named 全国代理分布.html ("nationwide proxy distribution"). Opening this file in a browser such as Chrome or Firefox displays the heat‑map, where darker colors indicate higher proxy density.
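A heat‑map of this kind could be produced along the following lines. This sketch assumes the pyecharts 1.x or later API (`Geo`, `ChartType.HEATMAP`); the original article may have used an older pyecharts release with a different call style. The helper names `to_data_pairs` and `render_heatmap`, the series name, and the chart title are illustrative.

```python
def to_data_pairs(city_counts):
    """Convert a {city: count} mapping into (name, value) pairs for Geo."""
    return [(city, int(n)) for city, n in city_counts.items()]


def render_heatmap(city_counts, output="全国代理分布.html"):
    """Render the counts as a heat-map over a map of China; return the file path."""
    # Imported here so to_data_pairs stays usable without pyecharts installed.
    from pyecharts import options as opts
    from pyecharts.charts import Geo
    from pyecharts.globals import ChartType

    pairs = to_data_pairs(city_counts)
    top = max((n for _, n in pairs), default=1)  # upper bound of the color scale

    chart = (
        # Skip cities that are missing from pyecharts' built-in coordinate table.
        Geo(is_ignore_nonexistent_coord=True)
        .add_schema(maptype="china")
        .add("Proxy count", pairs, type_=ChartType.HEATMAP)
        .set_series_opts(label_opts=opts.LabelOpts(is_show=False))
        .set_global_opts(
            title_opts=opts.TitleOpts(title="Nationwide proxy distribution"),
            visualmap_opts=opts.VisualMapOpts(max_=top),
        )
    )
    return chart.render(output)
```

City names have to match the entries in pyecharts' coordinate table; names that do not match are silently dropped here because of `is_ignore_nonexistent_coord=True`, so a sparse map may point to a naming mismatch instead of missing data.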

The map shows that proxies are concentrated in eastern China, especially around Guangzhou, the Jiang‑Zhe region (Jiangsu and Zhejiang), and Shandong, while western regions have very few. The article reads this as a reflection of uneven internet infrastructure development.

4. Conclusion

The project demonstrates how to:

Use pyecharts for data visualization.

Apply the cpca library for Chinese address parsing.

Key takeaways are the regional imbalance of proxy distribution and the value of Python web‑scraping as a comprehensive skill for data acquisition and analysis.
