
Exploring Python’s Data Visualization Ecosystem: From Matplotlib to Folium

This article walks through Python’s scientific stack by loading the OpenFlights dataset, cleaning it with pandas, and visualizing it with several libraries (matplotlib, seaborn, bokeh, pygal, Basemap, folium, and networkx) to illustrate each tool’s strengths and typical use cases.

MaGe Linux Operations

Exploring the Dataset

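We begin by loading the OpenFlights data, which consists of three CSV files: airports, airlines, and routes. Each file is read with pandas.read_csv using header=None, because the original files have no header row, and the column names are then assigned manually. Passing dtype=str keeps every field as text, so IDs compare as strings and numeric columns such as latitude must be converted before use.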

import pandas
airports = pandas.read_csv("airports.csv", header=None, dtype=str)
airports.columns = ["id", "name", "city", "country", "code", "icao", "latitude", "longitude", "altitude", "offset", "dst", "timezone"]
airlines = pandas.read_csv("airlines.csv", header=None, dtype=str)
airlines.columns = ["id", "name", "alias", "iata", "icao", "callsign", "country", "active"]
routes = pandas.read_csv("routes.csv", header=None, dtype=str)
routes.columns = ["airline", "airline_id", "source", "source_id", "dest", "dest_id", "codeshare", "stops", "equipment"]

After loading, we preview each DataFrame to verify the data structure.
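A quick way to do that preview is head() together with shape. The sketch below reads a two-row inline sample instead of the real airports.csv; the rows are made up for illustration, so it runs without downloading anything.

```python
import io
import pandas

# Two made-up rows standing in for airports.csv (illustrative values only).
sample_csv = io.StringIO(
    '1,"Airport A","City A","Country A","AAA","AAAA",10.5,20.25,100,1,"U","Zone/A"\n'
    '2,"Airport B","City B","Country B","BBB","BBBB",-33.0,151.0,20,10,"U","Zone/B"\n'
)
columns = ["id", "name", "city", "country", "code", "icao", "latitude",
           "longitude", "altitude", "offset", "dst", "timezone"]

# names= assigns the headers at read time; dtype=str keeps every field as text.
sample = pandas.read_csv(sample_csv, header=None, names=columns, dtype=str)

print(sample.shape)   # (rows, columns)
print(sample.head())  # first rows, to check that values line up with the column names
```

Passing names= to read_csv is equivalent to assigning .columns afterwards; either form works.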

We then clean the data. OpenFlights marks missing values with the literal string \N, so we drop the routes whose airline_id is \N.

routes = routes[routes["airline_id"] != "\\N"]
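An alternative to comparing against the literal \N string is to let pandas treat it as missing while parsing and then drop those rows. This is a minimal sketch on made-up inline rows, not the article's own approach.

```python
import io
import pandas

# Made-up rows in the routes.csv layout; the second row has a missing airline_id.
sample_csv = io.StringIO(
    "AA,100,SRC,1,DST,2,,0,738\n"
    "BB,\\N,SRC,1,DST,3,,0,320\n"
)
columns = ["airline", "airline_id", "source", "source_id", "dest",
           "dest_id", "codeshare", "stops", "equipment"]

# na_values turns the \N placeholder into NaN during parsing.
sample = pandas.read_csv(sample_csv, header=None, names=columns,
                         dtype=str, na_values=["\\N"])

# Drop only the rows whose airline_id is missing.
cleaned = sample.dropna(subset=["airline_id"])
print(len(sample), "->", len(cleaned))
```

With this variant the other columns also get NaN wherever the file contains \N, which may or may not be what later steps expect.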

Creating a Histogram

We compute the great‑circle distance between each route's source and destination airports with the haversine formula, then plot a histogram of the route lengths with matplotlib.
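For reference, this is the haversine formula as it is usually written, with latitudes \varphi and longitudes \lambda in radians and R the Earth radius (the code that follows uses R = 6367 km):

```latex
a = \sin^2\!\left(\frac{\varphi_2 - \varphi_1}{2}\right)
    + \cos\varphi_1 \cos\varphi_2 \, \sin^2\!\left(\frac{\lambda_2 - \lambda_1}{2}\right),
\qquad
d = 2R \arcsin\!\left(\sqrt{a}\right)
```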

import math
def haversine(lon1, lat1, lon2, lat2):
    lon1, lat1, lon2, lat2 = map(float, [lon1, lat1, lon2, lat2])
    lon1, lat1, lon2, lat2 = map(math.radians, [lon1, lat1, lon2, lat2])
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = math.sin(dlat/2)**2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon/2)**2
    c = 2 * math.asin(math.sqrt(a))
    return 6367 * c

def calc_dist(row):
    try:
        source = airports[airports["id"] == row["source_id"]].iloc[0]
        dest = airports[airports["id"] == row["dest_id"]].iloc[0]
        return haversine(dest["longitude"], dest["latitude"], source["longitude"], source["latitude"])
    except (ValueError, IndexError):
        return 0

route_lengths = routes.apply(calc_dist, axis=1)
import matplotlib.pyplot as plt
# Jupyter/IPython magic; remove this line when running as a plain script
%matplotlib inline
plt.hist(route_lengths, bins=20)

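The histogram shows that most routes are short‑range, with far fewer long‑distance flights. Routes whose airports could not be matched are recorded as 0 km by calc_dist, which slightly inflates the first bin.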
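The row-by-row lookup in calc_dist scans the airports table twice per route, which can be slow on the full dataset. One possible alternative, sketched here on made-up coordinates, is to join the coordinates onto the routes with merge and apply the same formula to whole columns with NumPy. The helper name haversine_np and the sample frames are illustrative assumptions, not part of the original article.

```python
import numpy as np
import pandas

def haversine_np(lon1, lat1, lon2, lat2, radius_km=6367):
    """Vectorized haversine distance in km; inputs are numeric degrees."""
    lon1, lat1, lon2, lat2 = (np.radians(v) for v in (lon1, lat1, lon2, lat2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * radius_km * np.arcsin(np.sqrt(a))

# Made-up sample data in the same string-typed layout as the article's frames.
airports_s = pandas.DataFrame({"id": ["1", "2"],
                               "latitude": ["0", "0"],
                               "longitude": ["0", "90"]})
routes_s = pandas.DataFrame({"source_id": ["1", "1"], "dest_id": ["2", "9"]})

coords = airports_s[["id", "latitude", "longitude"]]
# Left joins keep every route; unknown airports end up with NaN coordinates.
merged = (routes_s
          .merge(coords.add_prefix("source_"), on="source_id", how="left")
          .merge(coords.add_prefix("dest_"), on="dest_id", how="left"))

# The columns were read as strings, so convert them before doing arithmetic.
cols = ["source_longitude", "source_latitude", "dest_longitude", "dest_latitude"]
num = merged[cols].apply(pandas.to_numeric)

lengths = haversine_np(num["source_longitude"], num["source_latitude"],
                       num["dest_longitude"], num["dest_latitude"])
print(lengths)
```

Unmatched routes come out as NaN here rather than 0; calling lengths.fillna(0) would reproduce the fallback used by calc_dist.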

Using Seaborn

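Seaborn builds on matplotlib and can overlay a smoothed kernel density estimate (KDE) on the histogram.

import seaborn as sns
# distplot has been deprecated since seaborn 0.11; histplot with kde=True is the replacement
sns.histplot(route_lengths, bins=20, kde=True)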

Bar Chart of Average Route Length per Airline

We compute the mean route length for each airline and plot a bar chart.

route_length_df = pandas.DataFrame({"length": route_lengths, "id": routes["airline_id"]})
# Mean route length per airline ID, longest first
airline_route_lengths = route_length_df.groupby("id").mean()
airline_route_lengths = airline_route_lengths.sort_values("length", ascending=False)
plt.bar(range(len(airline_route_lengths)), airline_route_lengths["length"])
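The averages are keyed by airline ID, which is hard to read on a chart. One way to attach airline names is to merge the averages with the airlines table on the ID. The sketch below uses made-up rows so that it runs on its own; the frame names are stand-ins for the ones built above.

```python
import pandas

# Made-up stand-ins for route_length_df and airlines (IDs kept as strings).
route_length_df = pandas.DataFrame({"id": ["10", "10", "20"],
                                    "length": [1000.0, 3000.0, 500.0]})
airlines_s = pandas.DataFrame({"id": ["10", "20"],
                               "name": ["Airline X", "Airline Y"]})

# Mean length per airline ID, longest first.
means = (route_length_df.groupby("id", as_index=False)["length"].mean()
         .sort_values("length", ascending=False))

# Attach the airline name to each ID; how="left" keeps IDs with no match.
named = means.merge(airlines_s[["id", "name"]], on="id", how="left")
print(named)
```

The resulting name column can then serve as the category label in any of the charting libraries.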

Interactive Bar Chart with Bokeh

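Bokeh renders the chart in the browser and adds interactive tools such as pan, zoom, and hover tooltips.

from bokeh.io import output_notebook, show
from bokeh.plotting import figure
output_notebook()
# bokeh.charts has been removed from Bokeh, so this uses bokeh.plotting instead
ids = list(airline_route_lengths.index)
p = figure(x_range=ids, title="Average airline route lengths",
           tooltips=[("airline id", "@x"), ("mean km", "@top")])
p.vbar(x=ids, top=list(airline_route_lengths["length"]), width=0.8)
show(p)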

Horizontal Bar Chart with Pygal

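We sort the routes into short (at most 2,000 km), medium (over 2,000 and up to 10,000 km), and long (over 10,000 km) bands, then chart each band's share of all routes as a percentage. Pygal renders the chart to an SVG file.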

long_routes = len([k for k in route_lengths if k > 10000]) / len(route_lengths)
medium_routes = len([k for k in route_lengths if 2000 < k <= 10000]) / len(route_lengths)
short_routes = len([k for k in route_lengths if k <= 2000]) / len(route_lengths)
import pygal
chart = pygal.HorizontalBar()
chart.title = 'Long, medium, and short routes'
chart.add('Long', long_routes * 100)
chart.add('Medium', medium_routes * 100)
chart.add('Short', short_routes * 100)
chart.render_to_file('routes.svg')
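The three list comprehensions can also be expressed as a single binning step. This is a sketch using pandas.cut on made-up lengths, with the same thresholds as above; the variable names are illustrative.

```python
import pandas

# Made-up route lengths in km, including values on both bin edges.
lengths = pandas.Series([0, 150, 2000, 2001, 10000, 10001, 12000, 800])

# Right-closed bins: (-inf, 2000], (2000, 10000], (10000, inf)
bands = pandas.cut(lengths,
                   bins=[float("-inf"), 2000, 10000, float("inf")],
                   labels=["Short", "Medium", "Long"])

# normalize=True returns fractions; multiply by 100 for percentages.
percentages = bands.value_counts(normalize=True).sort_index() * 100
print(percentages)
```

Because pandas.cut closes each bin on the right by default, a route of exactly 2,000 km lands in the short band, matching the `k <= 2000` test in the original code.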

Scatter Plot of Airline ID vs. Name Length

We compare airline IDs with the length of their names.

name_lengths = airlines["name"].apply(lambda x: len(str(x)))
plt.scatter(airlines["id"].astype(int), name_lengths)

Static World Map with Basemap

Using mpl_toolkits.basemap we plot all airports on a Mercator world map.

from mpl_toolkits.basemap import Basemap
m = Basemap(projection='merc', llcrnrlat=-80, urcrnrlat=80, llcrnrlon=-180, urcrnrlon=180, lat_ts=20, resolution='c')
m.drawcoastlines()
m.drawmapboundary()
x, y = m(list(airports["longitude"].astype(float)), list(airports["latitude"].astype(float)))
m.scatter(x, y, 1, marker='o', color='red')
plt.show()

Interactive Map with Folium

Folium, a Python wrapper around Leaflet.js, produces a zoomable, clickable map.

import folium
airports_map = folium.Map(location=[30, 0], zoom_start=2)
for _, row in airports.iterrows():
    if row["name"] != "SouthPole Station":
        # Map.circle_marker was removed from folium; CircleMarker objects are attached with add_to
        folium.CircleMarker(location=[float(row["latitude"]), float(row["longitude"])],
                            popup=row["name"]).add_to(airports_map)
airports_map.save('airports.html')

Great‑Circle Arcs with Basemap

We draw great‑circle arcs for the first 3000 routes on the Basemap instance m created earlier, skipping any route whose endpoints are 90 or more degrees of longitude apart.

for _, row in routes[:3000].iterrows():
    try:
        source = airports[airports["id"] == row["source_id"]].iloc[0]
        dest = airports[airports["id"] == row["dest_id"]].iloc[0]
        if abs(float(source["longitude"]) - float(dest["longitude"])) < 90:
            m.drawgreatcircle(float(source["longitude"]), float(source["latitude"]),
                               float(dest["longitude"]), float(dest["latitude"]), linewidth=1, color='b')
    except (ValueError, IndexError):
        pass
plt.show()
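The 90-degree check compares raw longitudes, so a short hop across the antimeridian (say from 179 to -179 degrees) looks like a 358-degree jump and is skipped. If that matters, a wrap-aware difference is one option. The helper lon_diff below is hypothetical and not part of Basemap; whether Basemap then draws such an arc cleanly on a Mercator map is a separate question, so treat this only as a sketch of the distance check.

```python
def lon_diff(lon1, lon2):
    """Smallest absolute difference between two longitudes, in degrees (0 to 180)."""
    diff = abs(float(lon1) - float(lon2)) % 360
    return 360 - diff if diff > 180 else diff

print(lon_diff(179, -179))   # neighbours across the antimeridian
print(lon_diff("10", "50"))  # string inputs, as stored in the airports frame
```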

Network Graph with NetworkX

We build a weighted network in which nodes are airports and edges connect airport pairs that appear at least twice in the routes table. The edge weight is the number of times the pair appears.

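# Build weight dictionary: count each source/destination pair, keeping pairs seen at least twice
weights = {}
added_keys = set()  # pairs seen at least once; a set keeps the membership test fast
for _, row in routes.iterrows():
    key = f"{row['source_id']}_{row['dest_id']}"
    if key in weights:
        weights[key] += 1
    elif key in added_keys:
        weights[key] = 2
    else:
        added_keys.add(key)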
# Create graph
import networkx as nx
graph = nx.Graph()
nodes = set()
for k, weight in weights.items():
    try:
        src, dst = map(int, k.split('_'))
        if src not in nodes:
            graph.add_node(src)
        if dst not in nodes:
            graph.add_node(dst)
        nodes.update([src, dst])
        graph.add_edge(src, dst, weight=weight)
    except (ValueError, IndexError):
        pass
pos = nx.spring_layout(graph)
nx.draw_networkx_nodes(graph, pos, node_color='red', node_size=10, alpha=0.8)
nx.draw_networkx_edges(graph, pos, width=1.0, alpha=1)
plt.show()
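The weight dictionary can also be built with collections.Counter from the standard library. This is a sketch on made-up ID pairs; on the real data the pairs would come from the source_id and dest_id columns of routes.

```python
from collections import Counter

# Made-up (source_id, dest_id) pairs standing in for the routes rows.
pairs = [("1", "2"), ("1", "2"), ("1", "3"), ("2", "1"), ("1", "2")]

# Count every directed source/destination pair in one pass.
counts = Counter(f"{src}_{dst}" for src, dst in pairs)

# Keep only pairs that occur at least twice, as the loop above does.
weights = {key: n for key, n in counts.items() if n >= 2}
print(weights)
```

Note that the keys are directed ("1_2" and "2_1" are counted separately), while nx.Graph is undirected, so a later add_edge call for the reverse pair overwrites the weight of the first.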

Conclusion

Python offers a growing ecosystem of data‑visualization libraries, many built on top of matplotlib, that cover a wide range of use cases, from quick statistical plots with seaborn to interactive maps with folium and network graphs with networkx.

Author: Open Source China
Source: https://www.oschina.net/translate/python-data-visualization-libraries