Visualizing Synthetic Data with Pandas and Seaborn: A Step‑by‑Step Guide

This tutorial demonstrates how to generate synthetic datasets with NumPy, organize them into a Pandas DataFrame, and explore their distributions using Seaborn’s histograms, KDE plots, boxplots, violin plots, as well as multivariate visualizations like heatmaps, pair plots, and joint plots.

Model Perspective

Preparing Data and Required Libraries

This tutorial relies mainly on the Pandas and Seaborn libraries, with NumPy used to generate the data.

import pandas as pd
import numpy as np
import seaborn as sns
# Jupyter/IPython magic for inline figures; omit it when running as a plain script
%matplotlib inline

Generate four arrays and combine them into a single DataFrame.

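xarray = np.linspace(0, 10, 100)  # 100 evenly spaced values from 0 to 10
yarray = xarray**3 + np.random.normal(0, 100, 100)  # y = x^3 + normal noise (sd 100)
zarray = -100 * xarray + np.random.normal(0, 10, 100)  # z = -100x + normal noise (sd 10)
warray = 200 * xarray**0.5 + np.random.normal(0, 10, 100)  # w = 200*sqrt(x) + normal noise (sd 10)

# Combine the four arrays into a DataFrame; the plots below refer to columns x, y, z, w
df = pd.DataFrame({'x': xarray, 'y': yarray, 'z': zarray, 'w': warray})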
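Because the noise is random, each run produces slightly different numbers. A minimal sketch of a reproducible variant follows; the seeded generator and the seed value 42 are assumptions added for illustration, not part of the original code.

```python
import numpy as np
import pandas as pd

# Seeded generator so the noise is reproducible (the seed value is arbitrary).
rng = np.random.default_rng(42)

x = np.linspace(0, 10, 100)
df = pd.DataFrame({
    "x": x,
    "y": x**3 + rng.normal(0, 100, 100),         # cubic trend, noise sd 100
    "z": -100 * x + rng.normal(0, 10, 100),      # decreasing linear trend, noise sd 10
    "w": 200 * x**0.5 + rng.normal(0, 10, 100),  # square-root trend, noise sd 10
})

print(df.shape)       # rows and columns of the assembled DataFrame
print(df.describe())  # count, mean, std, min, quartiles, max per column
```

Checking the shape and summary statistics before plotting is a quick way to confirm that all four columns landed in the DataFrame with the expected ranges.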

Univariate Analysis

Histogram of Frequency Distribution

df.hist(bins=15, color='steelblue', edgecolor='black', linewidth=1.0,
        xlabelsize=8, ylabelsize=8, grid=False)
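To see what bins=15 does to a single column, the binning can be reproduced with NumPy alone. This is a sketch under the assumption that the plotted counts follow the same equal-width binning as np.histogram; the seed and the variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed for reproducibility
w = 200 * np.linspace(0, 10, 100) ** 0.5 + rng.normal(0, 10, 100)

# bins=15 splits the range [min, max] into 15 equal-width intervals.
counts, edges = np.histogram(w, bins=15)

print(len(edges))      # 16 edges delimit 15 bins
print(counts.sum())    # every sample lands in exactly one bin
print(np.diff(edges))  # all widths equal (max - min) / 15
```

Fewer bins smooth the picture and hide detail; more bins reveal detail but make the counts noisier with only 100 samples.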

Probability Density Curve

sns.kdeplot(df['w'])
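A kernel density estimate places a small Gaussian bump on every observation and averages them. The sketch below hand-rolls that idea with NumPy to show the mechanics; it is an illustration, not Seaborn's implementation, and the bandwidth uses Scott's rule, which Seaborn documents as its default method.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed
w = 200 * np.linspace(0, 10, 100) ** 0.5 + rng.normal(0, 10, 100)

n = w.size
h = w.std(ddof=1) * n ** (-1 / 5)  # Scott's rule bandwidth for 1-D data

# Evaluate the estimate on a grid that extends a few bandwidths past the data.
grid = np.linspace(w.min() - 4 * h, w.max() + 4 * h, 500)

# Average of Gaussian kernels centred on each observation.
u = (grid[:, None] - w[None, :]) / h
density = np.exp(-0.5 * u**2).sum(axis=1) / (n * h * np.sqrt(2 * np.pi))

# Riemann-sum check: a density should integrate to roughly 1.
dx = grid[1] - grid[0]
area = density.sum() * dx
print(round(area, 3))
```

A larger bandwidth gives a smoother curve, a smaller one follows the sample more closely; in Seaborn the bw_adjust parameter of kdeplot scales the bandwidth in the same spirit.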

Box Plot

sns.boxplot(data=df)
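The box in a box plot is built from the quartiles, and the whiskers conventionally reach to the most extreme points within 1.5 times the interquartile range. A minimal sketch of those numbers for one column, computed with Pandas (the seed and variable names are illustrative assumptions):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # arbitrary seed
x = np.linspace(0, 10, 100)
y = pd.Series(x**3 + rng.normal(0, 100, 100), name="y")

# Quartiles define the box; the median is the line inside it.
q1, median, q3 = y.quantile([0.25, 0.5, 0.75])
iqr = q3 - q1

# Conventional 1.5 * IQR fences; points beyond them are drawn as outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = y[(y < lower_fence) | (y > upper_fence)]

print(f"Q1={q1:.1f}, median={median:.1f}, Q3={q3:.1f}, IQR={iqr:.1f}")
print(f"{len(outliers)} point(s) outside the fences")
```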

Violin Plot

A violin plot combines a box plot with a kernel density estimate: for each column, the width of the violin shows the estimated probability density at each value.

sns.violinplot(data=df)

Multivariate Analysis

Correlation Heatmap

sns.heatmap(df.corr().round(2), annot=True, cmap="coolwarm", fmt='.2f', linewidths=.05)

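Cell color tracks the value of each correlation coefficient, so strongly related attributes stand out at a glance: with the coolwarm colormap, higher (more positive) correlations appear in warmer colors and lower (more negative) ones in cooler colors.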
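The numbers behind the heatmap come from DataFrame.corr(), which computes Pearson correlation by default. As a rough sketch, the same matrix can be flattened into a ranked list of pairs; the seed and the helper names mask and pairs are illustrative.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # arbitrary seed
x = np.linspace(0, 10, 100)
df = pd.DataFrame({
    "x": x,
    "y": x**3 + rng.normal(0, 100, 100),
    "z": -100 * x + rng.normal(0, 10, 100),
    "w": 200 * x**0.5 + rng.normal(0, 10, 100),
})

corr = df.corr()  # Pearson correlation matrix, symmetric with 1.0 on the diagonal

# Keep each pair once (upper triangle, diagonal excluded), then rank by |r|.
mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
pairs = corr.where(mask).stack().dropna().sort_values(key=abs, ascending=False)

print(pairs.round(2))
```

Since z decreases as x grows, that pair should show a strong negative coefficient, while the pairs involving y are weakened by its larger noise and nonlinear trend.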

Pair Plot

sns.pairplot(data=df, diag_kind='kde')

Joint Probability Distribution

sns.jointplot(x='x', y='y', data=df, kind='kde')
Written by

Model Perspective

Insights, knowledge, and enjoyment from a mathematical modeling researcher and educator. Hosted by Haihua Wang, a modeling instructor and author of "Clever Use of Chat for Mathematical Modeling", "Modeling: The Mathematics of Thinking", "Mathematical Modeling Practice: A Hands‑On Guide to Competitions", and co‑author of "Mathematical Modeling: Teaching Design and Cases".
