
How to Scrape Proxy Data and Visualize City Distribution with Python and Pyecharts

This guide walks you through using Python to crawl proxy server data, extract city names with the cpca library, count occurrences per city, and create an interactive heat‑map of nationwide proxy distribution using the Pyecharts visualization toolkit.

Python Crawling & Data Mining

Apr 20, 2020


1. Introduction

This article continues a two-part series on using Python to crawl proxy data from a website. This installment focuses on visualizing the collected data with the pyecharts library.

2. Proxy Distribution Statistics

Building a heat-map requires city names, but the scraped location strings are not standardized. The cpca library, which relies on Jieba word segmentation, is used to extract province, city, and district names from the raw location strings. For example, from "湖北十堰" (Shiyan, Hubei) it extracts the city "十堰" (Shiyan).
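A minimal sketch of the extraction step, assuming a recent cpca release in which `cpca.transform()` takes a list of strings and returns a pandas DataFrame with a `市` (city) column. The helper names `short_city_name` and `extract_cities` are hypothetical, and stripping a trailing "市" is an assumption about how "十堰" rather than "十堰市" ends up in the data; the original article's code may differ.

```python
def short_city_name(full_name):
    """Hypothetical helper: drop a trailing '市' so '十堰市' becomes '十堰'."""
    return full_name[:-1] if full_name.endswith("市") else full_name


def extract_cities(raw_locations):
    """Return short city names for the raw location strings cpca can parse.

    Assumes cpca.transform() returns a pandas DataFrame with a '市' (city)
    column; rows where no city was recognized are skipped.
    """
    import cpca  # third-party (pip install cpca); imported lazily

    df = cpca.transform(raw_locations)
    cities = []
    for city in df["市"]:
        # Unmatched cells may be None or empty, so keep only non-empty strings
        if isinstance(city, str) and city:
            cities.append(short_city_name(city))
    return cities
```

If cpca resolves "湖北十堰" to the city "十堰市", `extract_cities(["湖北十堰"])` would yield `["十堰"]`; the exact column contents can vary between cpca versions, so it is worth printing the DataFrame once before relying on it.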

After extracting city names, a list is built to count how many proxies appear in each city: if a city is not yet in the list, it is added with a count of 1; otherwise the existing count is incremented.
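The counting rule can be sketched as follows. `count_by_city` follows the list-based approach literally, storing `[city, count]` pairs; `count_by_city_counter` is a shorter equivalent built on `collections.Counter`, shown as an alternative rather than the author's method. Both function names are hypothetical.

```python
from collections import Counter


def count_by_city(cities):
    """Literal version: a list of [city, count] pairs, scanned for each city."""
    counts = []
    for city in cities:
        for pair in counts:
            if pair[0] == city:
                pair[1] += 1  # city already listed: increment its count
                break
        else:
            counts.append([city, 1])  # first occurrence: add with count 1
    return counts


def count_by_city_counter(cities):
    """Equivalent result via Counter, which keeps first-seen order."""
    return [[city, n] for city, n in Counter(cities).items()]
```

The list scan costs one pass over the distinct cities per proxy, which is harmless at this scale; the `Counter` version does a single hash lookup per proxy and produces the same pairs in the same first-seen order.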

3. Heat‑Map Generation

With the per-city proxy counts ready, pyecharts is used to generate the heat-map. The prepared list is passed to the library, and the script writes an HTML file named 全国代理分布.html ("nationwide proxy distribution"). Opening this file in a browser such as Chrome or Firefox displays the heat-map, where darker colors indicate a higher density of proxies.
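A sketch of the heat-map step, assuming the pyecharts 1.x API (`Geo`, `ChartType.HEATMAP`, `opts.VisualMapOpts`); older 0.5.x releases, which an article from 2020 may have used, expose a different interface. The function names, series name, and title are hypothetical. Because `Geo` can raise an error for place names it has no coordinates for, the sketch keeps only cities for which `get_coordinate()` returns a value.

```python
def to_data_pair(city_counts):
    """Turn [city, count] pairs into (city, count) tuples, highest count first."""
    return sorted(((c, n) for c, n in city_counts), key=lambda p: p[1], reverse=True)


def render_heatmap(city_counts, path="全国代理分布.html"):
    """Render a China-wide heat-map of proxy counts and return the HTML path."""
    # pyecharts 1.x assumed; imported here so to_data_pair() works without it
    from pyecharts import options as opts
    from pyecharts.charts import Geo
    from pyecharts.globals import ChartType

    geo = Geo().add_schema(maptype="china")
    # Skip names the built-in coordinate table cannot place on the map
    data = [(c, n) for c, n in to_data_pair(city_counts) if geo.get_coordinate(c)]
    geo.add("proxies", data, type_=ChartType.HEATMAP)
    geo.set_global_opts(
        title_opts=opts.TitleOpts(title="Proxy distribution by city"),
        # Scale the color range to the busiest city
        visualmap_opts=opts.VisualMapOpts(max_=max((n for _, n in data), default=1)),
    )
    return geo.render(path)
```

Setting `max_` from the data rather than leaving the default keeps the color scale meaningful when counts are much larger or smaller than 100.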

The map shows that proxies are concentrated mainly in eastern China, especially around Guangzhou, the Jiang-Zhe region (Jiangsu and Zhejiang), and Shandong, while western regions have very few proxies. The author attributes this to the uneven development of internet infrastructure across regions.

4. Conclusion

The project demonstrates how to:

Use pyecharts for data visualization.

Apply the cpca library for Chinese address parsing.

Key takeaways are the regional imbalance of proxy distribution and the value of Python web‑scraping as a comprehensive skill for data acquisition and analysis.


Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.

