Master Data Distribution Visualization with Seaborn: Histograms to Violin Plots

This tutorial walks through essential seaborn techniques for visualizing data distributions: univariate and conditional histograms, KDE curves, ECDFs, boxplots, violin plots, bivariate histograms, and joint plots. Each section pairs code snippets with parameter explanations, using the penguins dataset throughout.

Baidu Geek Talk

Effective data analysis often begins with understanding the distribution of a dataset, and seaborn offers a rich set of visual tools to explore these properties. The following sections demonstrate how to create and customize various distribution plots using the penguins dataset.

Univariate Histogram

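A histogram groups observations into bins and draws one bar per bin, with the bar height giving the number of observations in that bin. The bin count (bins) or bin width (binwidth) can change the apparent shape considerably: too few bins hide structure, while too many make the plot noisy.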

import seaborn as sns
# Load the penguins dataset, caching it in a local directory
data = sns.load_dataset("penguins", data_home="./data/seaborn-data")
# Different bin specifications
# 20 equal-width bins
sns.displot(data, x='flipper_length_mm', bins=20)
# 50 equal-width bins
sns.displot(data, x='flipper_length_mm', bins=50)
# Bins that are each 5 mm wide
sns.displot(data, x='flipper_length_mm', binwidth=5)
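
The relationship between a bin count and a bin width can be sketched without seaborn. The example below uses only the standard library and made-up flipper lengths (not the penguins data); `equal_width_histogram` and `bins_for_width` are hypothetical helpers written for illustration, and seaborn's own choice of bin edges may differ in detail.

```python
import math

# Made-up flipper lengths in mm (illustrative, not the penguins data).
values = [172, 181, 186, 190, 191, 195, 197, 201, 208, 210, 215, 222, 230, 231]

def equal_width_histogram(values, n_bins):
    """Return (edges, counts) for n_bins equal-width bins spanning the data range."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    edges = [lo + i * width for i in range(n_bins + 1)]
    counts = [0] * n_bins
    for v in values:
        # The maximum value belongs to the last bin rather than a new one.
        idx = min(int((v - lo) / width), n_bins - 1)
        counts[idx] += 1
    return edges, counts

def bins_for_width(values, binwidth):
    """Number of bins of a given width needed to cover the data range."""
    return math.ceil((max(values) - min(values)) / binwidth)

edges_coarse, counts_coarse = equal_width_histogram(values, 3)
edges_fine, counts_fine = equal_width_histogram(values, 12)
n_bins_5mm = bins_for_width(values, 5)
```

Both histograms account for every observation; only the grouping changes. A 5 mm bin width over this 59 mm range works out to 12 bins, which is why a bin width and a bin count are two ways of specifying the same thing.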

Histogram Normalization

Normalization makes histograms comparable across samples of different sizes. Two common options are density normalization (stat='density', total bar area = 1) and probability normalization (stat='probability', bar heights sum to 1).

# Density normalization
sns.displot(data, x='flipper_length_mm', bins=20, stat='density')
# Probability normalization
sns.displot(data, x='flipper_length_mm', bins=20, stat='probability')
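
The difference between the two normalizations is a single division by the bin width. The sketch below works through the arithmetic on made-up bin counts, assuming all bins share the same width; it illustrates the definitions rather than seaborn's internal code.

```python
# Made-up bin counts and a common bin width (illustrative only).
counts = [2, 5, 9, 3, 1]
binwidth = 5.0
total = sum(counts)

# Probability: each bar is the share of observations in that bin.
probability = [c / total for c in counts]

# Density: divide the share by the bin width so the bar areas sum to 1.
density = [c / (total * binwidth) for c in counts]

# Total area under the density bars (height times width, summed).
area = sum(d * binwidth for d in density)
```

Probability bars can be read directly as shares of the sample, while density bars are the ones that line up with a KDE curve drawn on the same axes.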

Conditional Histogram

By setting the hue parameter, you can compare the distribution of a variable across categories such as species, sex, or island.

sns.displot(data, x="flipper_length_mm", hue='species')
sns.displot(data, x="flipper_length_mm", hue='sex')
sns.displot(data, x="flipper_length_mm", hue='island')

Normalization in Conditional Histograms

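By default, a normalized histogram with hue is normalized over the whole dataset, so the bars of all groups together sum to 1 and larger groups get taller bars. Setting common_norm=False normalizes each condition independently, highlighting differences in shape rather than scale.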

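# Raw counts per species
sns.displot(data, x="flipper_length_mm", hue='species')
# Proportions of the whole dataset (all bars together sum to 1)
sns.displot(data, x="flipper_length_mm", hue='species', stat='probability')
# Proportions within each species (each species sums to 1)
sns.displot(data, x="flipper_length_mm", hue='species', stat='probability', common_norm=False)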
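
The effect of common_norm can be checked by hand. The sketch below uses made-up per-species bin counts over the same three bins, with deliberately unequal group sizes, and computes the bar heights under both settings.

```python
# Made-up per-species bin counts over the same three bins (illustrative only).
counts = {"Adelie": [30, 15, 5], "Gentoo": [5, 10, 10]}

grand_total = sum(sum(c) for c in counts.values())

# Common normalization: every bar is divided by the overall total.
common = {k: [c / grand_total for c in v] for k, v in counts.items()}

# Independent normalization: each group is divided by its own total.
independent = {k: [c / sum(v) for c in v] for k, v in counts.items()}

# Sum of all bars under common normalization, across both groups.
common_sum = sum(sum(v) for v in common.values())
```

Under common normalization the smaller group's bars are compressed (its tallest bar is 10/75), while independent normalization puts both groups on the same 0 to 1 scale.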

Kernel Density Estimation (KDE)

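KDE provides a smooth estimate of the underlying probability density. The bandwidth controls the smoothness, and bw_adjust scales the default bandwidth: values below 1 reveal more detail but can introduce noise, while values above 1 smooth more and can hide real features.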

# Default bandwidth
sns.displot(data, x="flipper_length_mm", kind="kde", bw_adjust=1)
# Narrower bandwidth: more detail, more noise
sns.displot(data, x="flipper_length_mm", kind="kde", bw_adjust=0.25)
# Wider bandwidth: smoother curve
sns.displot(data, x="flipper_length_mm", kind="kde", bw_adjust=1.5)
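
To see why a smaller bandwidth produces a bumpier curve, here is a minimal Gaussian KDE written with the standard library. It assumes a Scott's-rule bandwidth (sample standard deviation times n^(-1/5)) scaled by bw_adjust, which is an assumption about the default rather than a copy of seaborn's implementation. The sample is made up, and `gaussian_kde` and `count_peaks` are hypothetical helpers.

```python
import math
import statistics

# Made-up sample (illustrative, not the penguins data).
sample = [181, 186, 190, 191, 195, 197, 201, 208, 210, 215, 222, 230]

def gaussian_kde(sample, x, bw_adjust=1.0):
    """Evaluate a Gaussian KDE at x; bandwidth is Scott's rule scaled by bw_adjust."""
    n = len(sample)
    h = statistics.stdev(sample) * n ** (-1 / 5) * bw_adjust
    norm = 1 / (n * h * math.sqrt(2 * math.pi))
    return norm * sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in sample)

step = 0.5
grid = [150 + i * step for i in range(241)]  # 150 to 270 in 0.5 mm steps
smooth = [gaussian_kde(sample, x, bw_adjust=1.5) for x in grid]
rough = [gaussian_kde(sample, x, bw_adjust=0.25) for x in grid]

def count_peaks(curve):
    """Number of strict local maxima in a sampled curve."""
    return sum(1 for a, b, c in zip(curve, curve[1:], curve[2:]) if a < b > c)

# Riemann-sum approximation of the area under the smooth curve.
smooth_area = sum(smooth) * step
```

Each observation contributes one Gaussian bump. With a narrow bandwidth the bumps stay separate and the curve shows several peaks; with a wide one they merge. Both curves still integrate to approximately 1.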

Combining Histogram and KDE

Overlaying a KDE curve on a histogram gives both discrete and continuous perspectives.

sns.displot(data, x="flipper_length_mm", kde=True)
sns.displot(data, x="flipper_length_mm", hue="species", kde=True)

Boxplot

Boxplots summarize key statistics (median, quartiles, IQR, outliers) and are useful for quick comparisons across categories. Setting showmeans=True also marks the mean, and meanprops controls how that marker is drawn.

sns.boxplot(data=data, x="flipper_length_mm", y='species', showmeans=True,
            meanprops={"marker":"o", "markerfacecolor":"red", "markeredgecolor":"black", "markersize":5})

sns.boxplot(data=data, x="flipper_length_mm", y='species', hue='sex', showmeans=True,
            meanprops={"marker":"o", "markerfacecolor":"red", "markeredgecolor":"black", "markersize":5})

sns.boxplot(data=data, x="flipper_length_mm", showmeans=True,
            meanprops={"marker":"o", "markerfacecolor":"red", "markeredgecolor":"black", "markersize":5})
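
The numbers behind a boxplot can be computed directly. The sketch below uses the standard library on a made-up sample with one extreme value. It assumes the common convention of linearly interpolated quartiles and whisker limits at 1.5 times the IQR; the plotting library's exact quartile rule may differ slightly.

```python
import statistics

# Made-up sample with one extreme value (illustrative only).
sample = [181, 186, 190, 191, 195, 197, 201, 208, 210, 215, 260]

# 'inclusive' interpolates linearly between data points.
q1, median, q3 = statistics.quantiles(sample, n=4, method="inclusive")
iqr = q3 - q1

# Points beyond 1.5 * IQR from the box are drawn individually as outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [v for v in sample if v < lower_fence or v > upper_fence]

# The value that showmeans=True would mark.
mean = statistics.fmean(sample)
```

Here the single large value pulls the mean above the median, which is the kind of skew the mean marker makes visible on top of the box.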

Violin Plot

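Violin plots combine the summary role of a boxplot with a KDE curve, showing the full shape of the distribution. Setting inner="stick" additionally draws one line per observation, and split=True places the two hue levels on opposite halves of each violin.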

sns.violinplot(data=data, x="flipper_length_mm", y='species', inner="stick")
sns.violinplot(data=data, x="flipper_length_mm", y='species', hue='sex', split=True, inner="stick")
sns.violinplot(data=data, x="flipper_length_mm", inner="stick")

Empirical Cumulative Distribution Function (ECDF)

An ECDF plot shows, for each value, the proportion of observations at or below it. It requires no bin or bandwidth choice, and every observation is represented directly.

sns.displot(data, x="flipper_length_mm", kind="ecdf")
sns.displot(data, x="flipper_length_mm", hue="species", kind="ecdf")
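
Because an ECDF is just sorted data paired with running proportions, it is easy to compute by hand. The sketch below uses a made-up sample; `ecdf` and `proportion_at_most` are hypothetical helpers for illustration.

```python
# Made-up sample (illustrative only).
sample = [190, 181, 197, 190, 210, 201]

def ecdf(sample):
    """Return the sorted values and the running proportion at each position."""
    xs = sorted(sample)
    n = len(xs)
    ys = [(i + 1) / n for i in range(n)]
    return xs, ys

def proportion_at_most(sample, threshold):
    """Read a single point off the ECDF: share of observations <= threshold."""
    return sum(v <= threshold for v in sample) / len(sample)

xs, ys = ecdf(sample)
half_point = proportion_at_most(sample, 190)
```

The curve rises by 1/n at each observation (by 2/n where a value is repeated, as at 190 here) and always ends at 1.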

Bivariate Histogram and KDE

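Passing both x and y to displot draws a two-dimensional histogram, shown as a heatmap in which color encodes the count in each cell; cbar=True adds a colorbar for those counts. One of the variables can be categorical, as in the first example. With kind="kde", displot draws bivariate density contours instead. The multiple="stack" option applies to univariate conditional histograms, where it stacks the hue groups rather than overlaying them.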

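# Heatmap of flipper length by species, with a colorbar for the counts
sns.displot(data, x="flipper_length_mm", y="species", cbar=True)
# Bivariate histogram of bill length and depth, colored by species
sns.displot(data, x="bill_length_mm", y="bill_depth_mm", hue="species")
# Bivariate KDE contours for each species
sns.displot(data, x="bill_length_mm", y="bill_depth_mm", hue="species", kind="kde")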
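
A two-dimensional histogram is the same counting idea applied to rectangular cells. The sketch below bins made-up (bill length, bill depth) pairs with the standard library. `bin_2d` is a hypothetical helper that aligns cells to multiples of the cell size, whereas seaborn derives its bin edges from the data range, so the cell boundaries will not match exactly.

```python
from collections import Counter

# Made-up (bill_length_mm, bill_depth_mm) pairs (illustrative only).
points = [(39.1, 18.7), (39.5, 17.4), (40.3, 18.0), (46.5, 17.9),
          (50.0, 16.3), (49.9, 16.1), (46.1, 13.2), (50.0, 15.2)]

def bin_2d(points, x_width, y_width):
    """Count points per rectangular cell, keyed by the cell's lower-left corner."""
    cells = Counter()
    for x, y in points:
        corner = (x // x_width * x_width, y // y_width * y_width)
        cells[corner] += 1
    return cells

cells = bin_2d(points, x_width=5.0, y_width=2.0)
```

Each count in `cells` corresponds to one colored rectangle in the heatmap; the colorbar maps those counts to colors.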

Joint Plot and JointGrid

Joint plots combine scatter (or histogram) visualizations of two variables with marginal distributions. JointGrid allows custom combinations of plot types.

# Simple joint scatter with marginal histograms
sns.jointplot(data=data, x="bill_length_mm", y="bill_depth_mm")

# Joint KDE with marginal KDEs
sns.jointplot(data=data, x="bill_length_mm", y="bill_depth_mm", hue="species", kind="kde")

# Custom JointGrid examples
g = sns.JointGrid(data=data, x="bill_length_mm", y="bill_depth_mm")
g.plot_joint(sns.histplot)
g.plot_marginals(sns.kdeplot)

g = sns.JointGrid(data=data, x="bill_length_mm", y="bill_depth_mm")
g.plot_joint(sns.histplot)
g.plot_marginals(sns.boxplot)

g = sns.JointGrid(data=data, x="bill_length_mm", y="bill_depth_mm")
g.plot_joint(sns.scatterplot)
g.plot_marginals(sns.boxplot)
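
The link between the central panel and the marginal panels of a joint plot is that each marginal is the joint distribution summed over the other variable. The sketch below shows this on made-up joint counts with hypothetical bin labels; it illustrates the concept only and does not use the JointGrid API.

```python
from collections import Counter

# Made-up joint counts keyed by (x_bin, y_bin) labels (illustrative only).
joint = Counter({("short", "shallow"): 3, ("short", "deep"): 12,
                 ("long", "shallow"): 20, ("long", "deep"): 5})

# Each marginal sums the joint counts over the other variable.
x_marginal = Counter()
y_marginal = Counter()
for (x_bin, y_bin), n in joint.items():
    x_marginal[x_bin] += n
    y_marginal[y_bin] += n
```

The marginals alone would not reveal that short bills tend to be deep and long bills shallow in this made-up data, which is why the joint panel is worth drawing alongside them.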

These seaborn visualizations provide a comprehensive toolbox for exploring and presenting data distributions, from simple univariate histograms to complex joint and conditional plots.
