Data Visualization and Exploratory Graphs with Pandas
This tutorial explains how to use Pandas for data visualization and exploratory analysis, covering line, scatter, histogram, bar, pie, box, and heatmap charts with code examples on the Iris, American Community Survey, and Boston Housing datasets.
Data visualization presents data using graphics or tables, allowing clear insight into data properties and relationships; exploratory graphs help users understand characteristics, discover trends, and lower the barrier to data comprehension.
Common chart examples are demonstrated using Pandas, whose DataFrame and Series plot methods wrap Matplotlib, so charts can be drawn directly from a DataFrame; matplotlib.pyplot (imported as plt) is still needed for the plt.show() calls in the examples.
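The chart snippets that follow use a DataFrame named df_iris without showing how it is built. One way to create it, assuming scikit-learn is installed (the source does not say how the data was loaded), is sketched here; the column names produced this way match the ones used in the examples.

```python
import matplotlib.pyplot as plt  # provides the plt.show() used by the chart examples
from sklearn.datasets import load_iris  # assumption: scikit-learn is available

# as_frame=True returns pandas objects instead of NumPy arrays
iris = load_iris(as_frame=True)

# .frame holds the four measurement columns plus an integer "target" column
df_iris = iris.frame

print(df_iris.shape)
print(df_iris.columns.tolist())
```

Any other source works as well, provided the resulting frame has the same column names.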
1. Line chart shows how values change continuously along an ordered axis (here, the row index). Example:
df_iris[["sepal length (cm)"]].plot.line()
plt.show()
ax = df_iris[["sepal length (cm)"]].plot.line(color="green", title="Demo", style="--")
ax.set(xlabel="index", ylabel="length")
plt.show()
2. Scatter chart examines the relationship between two numeric variables, drawing each observation as a discrete point:
df = df_iris
df.plot.scatter(x='sepal length (cm)', y='sepal width (cm)')
# Pass the colormap by name (cm.get_cmap was removed in Matplotlib 3.9);
# marker size s takes a Series, scaled here by petal length
df.plot.scatter(x='sepal length (cm)', y='sepal width (cm)', s=df['petal length (cm)']*20, c=df['target'], cmap='Spectral', title='different circle size by petal length (cm)')
plt.show()
3. Histogram / Bar chart display the distribution of numeric columns or compare counts across categories:
df[["sepal length (cm)", "sepal width (cm)", "petal length (cm)", "petal width (cm)"]].plot.hist()
plt.show()
# Series plots reuse the current axes, so show the histogram before drawing the bar chart
df.target.value_counts().plot.bar()
plt.show()
4. Pie chart / Box plot illustrate the proportion of each category and differences in distribution:
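df.target.value_counts().plot.pie(legend=True)
plt.show()
# Box plot of one feature per class, so distributions can be compared across categories
df.boxplot(column='sepal length (cm)', by='target', figsize=(10,5))
plt.show()
Practical data exploration is then shown with two real datasets.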
1. 2013 American Community Survey – a large census dataset (≈3.5 million households). After loading the CSV, the shape and descriptive statistics are inspected, then selected columns (SCHL, PINCP, ESR) are concatenated from two files and grouped by education level to examine distribution and average income.
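Because the survey files are large, it may help to parse only the columns that are needed. The sketch below illustrates this with the usecols parameter of pd.read_csv and the same group-by step, run on a tiny in-memory stand-in CSV. All values in the stand-in are invented for illustration, and the extra SERIALNO and AGEP columns are there only to show that they are skipped.

```python
import io

import pandas as pd

# Stand-in for a large person file; every value here is made up.
fake_csv = io.StringIO(
    "SERIALNO,SCHL,PINCP,ESR,AGEP\n"
    "1,16,20000,1,30\n"
    "2,16,30000,1,41\n"
    "3,21,60000,1,35\n"
    "4,21,80000,2,52\n"
)

cols = ["SCHL", "PINCP", "ESR"]

# usecols makes read_csv parse only the listed columns
part = pd.read_csv(fake_csv, usecols=cols)

# Mean personal income (PINCP) per education level (SCHL)
mean_income = part.groupby("SCHL")["PINCP"].mean()
print(mean_income)
```

With the real files, the same usecols argument could be passed to each read_csv call before concatenating, and mean_income.plot.bar() would chart the result.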
# Read data
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv("./ss13husa.csv")
print(df.shape) # (756065, 231)
print(df.describe())
# Concatenate two parts
pusa = pd.read_csv("ss13pusa.csv")
pusb = pd.read_csv("ss13pusb.csv")
cols = ['SCHL', 'PINCP', 'ESR']
ac_survey = pd.concat([pusa[cols], pusb[cols]], axis=0)
group = ac_survey.groupby('SCHL')
print('Education distribution:', group.size())
# Average only the income column; ESR is a categorical code, so its mean is not meaningful
print('Average income:', group['PINCP'].mean())
2. Boston Housing dataset – 506 samples with 13 features. After loading, the shape and descriptive statistics are shown, a histogram of the target variable (MEDV) is plotted, scatter plots explore relationships (e.g., MEDV vs. RM), and a Pearson correlation matrix is visualized with a heatmap.
# Load Boston housing data
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
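# Assumes housing.data is comma-separated with a header row that includes MEDV and RM;
# the raw UCI file is whitespace-delimited with no header and would need
# sep=r"\s+", header=None and an explicit names=[...] list
df = pd.read_csv("./housing.data")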
print(df.shape) # (506, 14)
print(df.describe())
# Histogram of house price
df[['MEDV']].plot.hist()
plt.show()
# Scatter plot of price vs. number of rooms
df.plot.scatter(x='MEDV', y='RM')
plt.show()
# Correlation heatmap
corr = df.corr()
sns.heatmap(corr)
plt.show()
These examples illustrate how Pandas can be used for quick visual exploration of datasets, helping identify key variables, relationships, and patterns before deeper modeling or analysis.
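As one hedged illustration of using a correlation matrix to identify key variables, the sketch below ranks columns by the absolute value of their Pearson correlation with a target column. The helper name rank_by_correlation is hypothetical, and the numbers in the toy frame are invented. Only the column names RM, LSTAT, and MEDV are borrowed from the Boston Housing dataset.

```python
import pandas as pd


def rank_by_correlation(df: pd.DataFrame, target: str) -> pd.Series:
    """Return each feature's Pearson correlation with `target`,
    ordered from strongest to weakest absolute correlation."""
    corr = df.corr(method="pearson")[target].drop(target)
    order = corr.abs().sort_values(ascending=False).index
    return corr.reindex(order)


# Toy data with made-up values, for illustration only
toy = pd.DataFrame({
    "RM": [5.0, 6.0, 6.5, 7.0, 8.0],
    "LSTAT": [20.0, 12.0, 14.0, 6.0, 3.0],
    "MEDV": [15.0, 21.0, 23.0, 30.0, 45.0],
})

ranking = rank_by_correlation(toy, "MEDV")
print(ranking)
```

The sign of each value shows the direction of the relationship, and the ordering suggests which columns to inspect first with scatter plots.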
Python Programming Learning Circle
A global community of Chinese Python developers offering technical articles, columns, original video tutorials, and problem sets. Topics include web full‑stack development, web scraping, data analysis, natural language processing, image processing, machine learning, automated testing, DevOps automation, and big data.