Effective Visualization of Multi‑Dimensional Data Using Python and the Wine Quality Dataset
This article explains how to explore and visualize one‑ to six‑dimensional data with Python libraries such as pandas, matplotlib and seaborn, using the UCI Wine Quality dataset to demonstrate histograms, density plots, pairwise matrices, 3‑D scatter, bubble charts and facet grids for both numeric and categorical variables.
Data aggregation, summarization and visualization are the three pillars of data analysis. While 2‑D visualizations are common, higher‑dimensional data require more sophisticated strategies. This tutorial walks through effective techniques for visualizing 1‑D to 6‑D data using the Wine Quality dataset.
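Every snippet in this article reads from a DataFrame named wines, which the text never constructs. As a minimal sketch, assuming the red and white tables have already been read into two DataFrames, the combined frame could be built as follows. The helper name build_wines and the quality_label thresholds (5 and below is low, 6 to 7 is medium, 8 and above is high) are assumptions for illustration, not taken from the text.

```python
import pandas as pd


def build_wines(red: pd.DataFrame, white: pd.DataFrame) -> pd.DataFrame:
    """Stack the red and white tables and add the two categorical columns."""
    red = red.assign(wine_type="red")
    white = white.assign(wine_type="white")
    wines = pd.concat([red, white], ignore_index=True)
    # Assumed bucketing of the integer quality score (not specified in the article):
    # (0, 5] -> low, (5, 7] -> medium, (7, 10] -> high
    wines["quality_label"] = pd.cut(
        wines["quality"], bins=[0, 5, 7, 10], labels=["low", "medium", "high"]
    ).astype(str)
    return wines


# Tiny stand-in frames with made-up values, so the sketch runs without the real files
red_demo = pd.DataFrame({"alcohol": [9.4, 10.5], "quality": [5, 8]})
white_demo = pd.DataFrame({"alcohol": [8.8, 12.0], "quality": [6, 7]})
wines_demo = build_wines(red_demo, white_demo)
print(wines_demo)
```

With the UCI files on disk, the two inputs would typically come from pd.read_csv('winequality-red.csv', sep=';') and pd.read_csv('winequality-white.csv', sep=';'), since those files are semicolon-delimited; the local paths are an assumption.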
1. One‑Dimensional Visualization
Histograms and kernel density plots are the quickest way to see the distribution of a single numeric attribute. The following code creates a histogram of the sulphates column:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
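# Jupyter/IPython magic that renders figures inline; remove it in a plain .py script
%matplotlib inline
# Assumes `wines` is the combined red/white DataFrame with added `wine_type` and `quality_label` columns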
wines["sulphates"].hist(bins=15, color='steelblue', edgecolor='black')
plt.title('Sulphates Content in Wine')
plt.xlabel('Sulphates')
plt.ylabel('Frequency')
plt.show()
Box plots and violin plots are useful for comparing the distribution of a numeric variable across categorical groups (e.g., wine quality).
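To make concrete what a box plot summarizes, the sketch below computes the per-group quartiles that each box is drawn from. The small demo frame and its values are invented for illustration; on the real data the same call would run against wines.

```python
import pandas as pd

# Made-up values: three wines at quality 5 and three at quality 6
demo = pd.DataFrame({
    "quality": [5, 5, 5, 6, 6, 6],
    "alcohol": [9.0, 10.0, 11.0, 10.0, 12.0, 14.0],
})

# One row per quality level, one column per quantile (0.25, 0.5, 0.75)
quartiles = (
    demo.groupby("quality")["alcohol"]
    .quantile([0.25, 0.5, 0.75])
    .unstack()
)
print(quartiles)
```

The 0.5 column is the line drawn inside each box, and the 0.25 and 0.75 columns are the box edges.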
sns.boxplot(x="quality", y="alcohol", data=wines)
plt.show()  # finish this figure so the violin plot is not drawn on the same axes
sns.violinplot(x="quality", y="sulphates", hue="wine_type", data=wines, split=True)
plt.show()
2. Two‑Dimensional Visualization
Correlation matrices visualized as heatmaps reveal relationships between numeric attributes. Pairwise scatter plots (with optional regression lines) help spot patterns.
corr = wines.corr(numeric_only=True)  # skip string columns such as wine_type
sns.heatmap(corr, annot=True, cmap='coolwarm')
pp = sns.pairplot(wines, hue='wine_type')
Bar charts, stacked bar charts and count plots (via sns.countplot) are effective for categorical variables.
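Stacked bar charts are mentioned here without code. One way to sketch them is a cross-tabulation followed by the pandas plotting wrapper. The demo values and the two bar colors are invented for illustration.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up categorical data standing in for the wines table
demo = pd.DataFrame({
    "quality": [5, 5, 6, 6, 6, 7],
    "wine_type": ["red", "white", "red", "white", "white", "white"],
})

# Rows are quality levels, columns are wine types, cells are counts
counts = pd.crosstab(demo["quality"], demo["wine_type"])

# stacked=True piles the wine-type counts on top of each other per quality level
ax = counts.plot(kind="bar", stacked=True, color=["firebrick", "khaki"])
ax.set_xlabel("Quality")
ax.set_ylabel("Frequency")
ax.set_title("Wine Type by Quality")
```

The count plot shown next draws the same counts side by side rather than stacked.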
sns.countplot(x='quality', hue='wine_type', data=wines)
3. Three‑Dimensional Visualization
Scatter plots with a third axis (z‑axis) or bubble size can encode an additional numeric dimension. Hue can encode a categorical dimension.
fig = plt.figure(figsize=(8,6))
ax = fig.add_subplot(111, projection='3d')
ax.scatter(wines['residual sugar'], wines['alcohol'], wines['fixed acidity'],
           c=wines['wine_type'].map({'red':'red','white':'yellow'}), alpha=0.4)
ax.set_xlabel('Residual Sugar')
ax.set_ylabel('Alcohol')
ax.set_zlabel('Fixed Acidity')
plt.show()
Bubble charts use point size to represent a fourth variable, while hue still distinguishes wine type.
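fill_colors = wines['wine_type'].map({'red': 'red', 'white': 'yellow'})  # same mapping as the 3-D scatter
edge_colors = fill_colors  # assumption: the original edge palette is not shown
plt.scatter(wines['fixed acidity'], wines['alcohol'],
            s=wines['residual sugar']*25, c=fill_colors, edgecolors=edge_colors, alpha=0.4)
4. Four‑Dimensional Visualization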
Adding depth (z‑axis) and hue to the two planar axes displays four variables in a single 3‑D scatter plot; a bubble chart reaches the same count by trading depth for point size.
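As a self-contained sketch of the depth-plus-hue variant, the block below plots three numeric columns on the x, y and z axes and maps wine type to color. The four demo rows are made-up values, and the red/yellow palette simply follows the mapping used elsewhere in this article.

```python
import matplotlib.pyplot as plt
import pandas as pd
from mpl_toolkits.mplot3d import Axes3D  # noqa: F401  (only needed on older matplotlib)

# Made-up rows standing in for the wines table
demo = pd.DataFrame({
    "residual sugar": [1.9, 2.6, 20.7, 1.6],
    "alcohol": [9.4, 9.8, 8.8, 9.5],
    "fixed acidity": [7.4, 7.8, 7.0, 6.3],
    "wine_type": ["red", "red", "white", "white"],
})

# Fourth dimension: a categorical column translated to one color per point
palette = {"red": "red", "white": "yellow"}
colors = demo["wine_type"].map(palette).tolist()

fig = plt.figure(figsize=(8, 6))
ax = fig.add_subplot(111, projection="3d")
ax.scatter(
    demo["residual sugar"].to_numpy(),
    demo["alcohol"].to_numpy(),
    demo["fixed acidity"].to_numpy(),
    c=colors,
    alpha=0.4,
)
ax.set_xlabel("Residual Sugar")
ax.set_ylabel("Alcohol")
ax.set_zlabel("Fixed Acidity")
```

Calling plt.show() afterwards displays the figure in an interactive session.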
# See also the 3‑D scatter in section 3, where depth encodes 'fixed acidity' and hue encodes 'wine_type'.
5. Five‑Dimensional Visualization
A 5‑D plot can be built by adding another variable as point size (e.g., total sulfur dioxide) while keeping depth, hue and the two spatial axes.
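The loop that follows relies on data_points, colors, ss and ax, none of which are defined in the text. A minimal sketch of how they might be prepared is given here. The helper name build_5d_inputs, the choice of axis columns (borrowed from the 3‑D scatter in section 3) and the demo values are all assumptions.

```python
import matplotlib.pyplot as plt
import pandas as pd
from mpl_toolkits.mplot3d import Axes3D  # noqa: F401  (only needed on older matplotlib)


def build_5d_inputs(frame, palette):
    """Return per-point (x, y, z) tuples, colors and marker sizes."""
    xs = frame["residual sugar"]
    ys = frame["alcohol"]
    zs = frame["fixed acidity"]
    data_points = list(zip(xs, ys, zs))                   # three spatial dimensions
    colors = frame["wine_type"].map(palette).tolist()     # fourth dimension: hue
    ss = frame["total sulfur dioxide"].tolist()           # fifth dimension: size
    return data_points, colors, ss


# Made-up rows standing in for the wines table
demo = pd.DataFrame({
    "residual sugar": [1.9, 2.6, 20.7, 1.6],
    "alcohol": [9.4, 9.8, 8.8, 9.5],
    "fixed acidity": [7.4, 7.8, 7.0, 6.3],
    "total sulfur dioxide": [34.0, 67.0, 170.0, 132.0],
    "wine_type": ["red", "red", "white", "white"],
})

data_points, colors, ss = build_5d_inputs(demo, {"red": "red", "white": "yellow"})

fig = plt.figure(figsize=(8, 6))
ax = fig.add_subplot(111, projection="3d")
```

On the full dataset, wines would take the place of demo.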
# data_points holds (x, y, z) tuples; colors and ss hold each point's hue and marker size
for (x, y, z), color, size in zip(data_points, colors, ss):
    ax.scatter(x, y, z, c=color, s=size, alpha=0.4, edgecolors='none')
6. Six‑Dimensional Visualization
Six dimensions are visualized by adding marker shape to encode a categorical variable (quality label) on top of depth, hue, size and the two axes.
markers = [',' if q == 'high' else 'x' if q == 'medium' else 'o' for q in wines['quality_label']]
for (x, y, z), color, size, mark in zip(data_points, colors, ss, markers):
    ax.scatter(x, y, z, c=color, s=size, marker=mark, alpha=0.4, edgecolors='none')
Facet grids (sns.FacetGrid) can replace depth by creating separate subplots for categorical dimensions, while hue and point size continue to encode additional variables.
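g = sns.FacetGrid(wines, row='wine_type', col='quality', hue='quality_label', height=4)  # `size` was renamed `height` in seaborn 0.9
# Map the size column as a third variable so each facet receives only its own rows
g.map(lambda x, y, s, **kw: plt.scatter(x, y, s=s * 2, **kw), 'residual sugar', 'alcohol', 'total sulfur dioxide', alpha=0.5)
plt.show()
Conclusion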
Effective multi‑dimensional visualization combines basic chart components (axes, color, size, shape, depth, and faceting) to convey complex relationships without overwhelming the viewer. The techniques demonstrated with the wine dataset carry over to other tabular datasets that mix numeric and categorical attributes.
Python Programming Learning Circle
A global community of Chinese Python developers offering technical articles, columns, original video tutorials, and problem sets. Topics include web full‑stack development, web scraping, data analysis, natural language processing, image processing, machine learning, automated testing, DevOps automation, and big data.
