Multidimensional Data Visualization Strategies Using Python and the Wine Quality Dataset
This article explores effective strategies for visualizing one‑ to six‑dimensional data using Python libraries such as pandas, matplotlib, and seaborn, demonstrating each technique with the UCI Wine Quality dataset and providing code snippets for histograms, heatmaps, pair plots, 3‑D scatter plots, bubble charts, and more.
Data aggregation, summarization, and visualization are three pillars of data analysis, and while traditional 2‑D visualizations are powerful, they become limited when dealing with multidimensional datasets. This tutorial examines a range of visualization techniques from 1‑D to 6‑D using the UCI Wine Quality dataset, illustrating each method with Python code.
1. Single‑Variable (1‑D) Visualization
Histograms and density plots are generated with pandas and seaborn to quickly assess the distribution of individual attributes such as sulphates.
import pandas as pd
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
import matplotlib as mpl
import numpy as np
import seaborn as sns
%matplotlib inline
Data is loaded, wine type labels are added, and quality scores are bucketed into low, medium, and high categories.
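The bucketing step can also be sketched with pd.cut, shown here on made-up scores. The thresholds mirror the rule used in this article (5 or below is low, 6 to 7 is medium, 8 or above is high); the helper name bucket_quality is hypothetical. One difference to keep in mind: pd.cut returns an ordered categorical by default.

```python
import numpy as np
import pandas as pd

def bucket_quality(scores):
    """Map integer quality scores to low/medium/high labels."""
    return pd.cut(
        scores,
        bins=[-np.inf, 5, 7, np.inf],  # right-closed: <=5, 6-7, >=8
        labels=['low', 'medium', 'high'],
    )

# Made-up scores for illustration only.
demo = pd.Series([3, 5, 6, 7, 8, 9])
labels = bucket_quality(demo)
print(labels.tolist())
# ['low', 'low', 'medium', 'medium', 'high', 'high']
```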
white_wine = pd.read_csv('winequality-white.csv', sep=';')
red_wine = pd.read_csv('winequality-red.csv', sep=';')
red_wine['wine_type'] = 'red'
white_wine['wine_type'] = 'white'
red_wine['quality_label'] = red_wine['quality'].apply(lambda value: 'low' if value <= 5 else ('medium' if value <= 7 else 'high'))
red_wine['quality_label'] = pd.Categorical(red_wine['quality_label'], categories=['low','medium','high'])
white_wine['quality_label'] = white_wine['quality'].apply(lambda value: 'low' if value <= 5 else ('medium' if value <= 7 else 'high'))
white_wine['quality_label'] = pd.Categorical(white_wine['quality_label'], categories=['low','medium','high'])
wines = pd.concat([red_wine, white_wine])
wines = wines.sample(frac=1, random_state=42).reset_index(drop=True)
2. Two‑Variable (2‑D) Visualization
Correlation matrices are visualized as heatmaps, and pairwise scatter plots reveal relationships between attributes.
f, ax = plt.subplots(figsize=(10,6))
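# Correlate numeric columns only; wine_type and quality_label are non-numeric
corr = wines.select_dtypes(include='number').corr()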
hm = sns.heatmap(round(corr,2), annot=True, ax=ax, cmap="coolwarm", fmt='.2f', linewidths=.05)
f.subplots_adjust(top=0.93)
t = f.suptitle('Wine Attributes Correlation Heatmap', fontsize=14)
3. Three‑Variable (3‑D) Visualization
Scatter plots with hue encode wine type, while 3‑D plots add depth for a third numeric attribute.
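The hue-based variant can be sketched with plain matplotlib by drawing one scatter call per category. The data is synthetic, and the helper name scatter_by_category and the color choices are assumptions made for illustration.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

def scatter_by_category(df, x, y, hue, palette):
    """2-D scatter of x against y, one color per value of the hue column."""
    fig, ax = plt.subplots(figsize=(7, 5))
    for label, group in df.groupby(hue):
        ax.scatter(group[x], group[y], color=palette[label],
                   label=label, alpha=0.6, edgecolors='w')
    ax.set_xlabel(x)
    ax.set_ylabel(y)
    ax.legend(title=hue)
    return fig, ax

# Synthetic stand-in for the wine data; NOT real measurements.
rng = np.random.default_rng(2)
n = 200
demo = pd.DataFrame({
    'sulphates': rng.normal(0.53, 0.15, n),
    'alcohol': rng.normal(10.5, 1.2, n),
    'wine_type': np.tile(['red', 'white'], n // 2),
})
fig, ax = scatter_by_category(demo, 'sulphates', 'alcohol', 'wine_type',
                              palette={'red': 'firebrick', 'white': 'gold'})
```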
fig = plt.figure(figsize=(8,6))
ax = fig.add_subplot(111, projection='3d')
xs = wines['residual sugar']
ys = wines['fixed acidity']
zs = wines['alcohol']
ax.scatter(xs, ys, zs, s=50, alpha=0.6, edgecolors='w')
ax.set_xlabel('Residual Sugar')
ax.set_ylabel('Fixed Acidity')
ax.set_zlabel('Alcohol')
4. Four‑Dimensional (4‑D) Visualization
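Depth and color are combined: three numeric attributes sit on the axes and wine type sets the marker color. The snippet also scales each marker by total sulfur dioxide, the size encoding that the 5‑D section reuses.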
fig = plt.figure(figsize=(8,6))
ax = fig.add_subplot(111, projection='3d')
xs = list(wines['residual sugar'])
ys = list(wines['alcohol'])
zs = list(wines['fixed acidity'])
colors = ['red' if wt=='red' else 'yellow' for wt in wines['wine_type']]
size = wines['total sulfur dioxide']
for x,y,z,c,s in zip(xs,ys,zs,colors,size):
    ax.scatter(x, y, z, alpha=0.4, c=c, edgecolors='none', s=s)
ax.set_xlabel('Residual Sugar')
ax.set_ylabel('Alcohol')
ax.set_zlabel('Fixed Acidity')
5. Five‑Dimensional (5‑D) Visualization
Bubble charts incorporate size for a fifth dimension while retaining depth and hue for the previous dimensions.
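Because scatter accepts array-like values for both color and size, this encoding can also be drawn in a single call instead of one call per point. The sketch below is an alternative under stated assumptions, not the article's own code, and it runs on synthetic values.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Synthetic stand-in for the wine data; NOT real measurements.
rng = np.random.default_rng(3)
n = 150
demo = pd.DataFrame({
    'residual sugar': rng.uniform(1, 20, n),
    'alcohol': rng.normal(10.5, 1.2, n),
    'fixed acidity': rng.normal(7.2, 1.3, n),
    'total sulfur dioxide': rng.uniform(10, 300, n),
    'wine_type': np.tile(['red', 'white'], n // 2),
})

# Hue: one color name per row, derived from wine type.
colors = demo['wine_type'].map({'red': 'red', 'white': 'yellow'}).tolist()

fig = plt.figure(figsize=(8, 6))
ax = fig.add_subplot(111, projection='3d')
# Three axes carry numeric attributes; color and marker area carry two more.
points = ax.scatter(
    demo['residual sugar'].to_numpy(),
    demo['alcohol'].to_numpy(),
    demo['fixed acidity'].to_numpy(),
    c=colors,
    s=demo['total sulfur dioxide'].to_numpy(),
    alpha=0.4, edgecolors='none',
)
ax.set_xlabel('Residual Sugar')
ax.set_ylabel('Alcohol')
ax.set_zlabel('Fixed Acidity')
```

A single call is usually much faster than a per-point loop on a dataset with thousands of rows.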
# (same code as 4‑D example, with size mapped to 'total sulfur dioxide')
6. Six‑Dimensional (6‑D) Visualization
Shape is added to encode wine quality labels, completing a six‑dimensional representation.
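A single scatter call takes only one marker, so shape has to be applied per group of points. The sketch below draws one call per quality label rather than one per point; it is a hedged alternative on synthetic data, reusing the marker mapping from this section (',' for high, 'x' for medium, 'o' for low).

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Synthetic stand-in for the wine data; NOT real measurements.
rng = np.random.default_rng(4)
n = 150
demo = pd.DataFrame({
    'residual sugar': rng.uniform(1, 20, n),
    'alcohol': rng.normal(10.5, 1.2, n),
    'fixed acidity': rng.normal(7.2, 1.3, n),
    'total sulfur dioxide': rng.uniform(10, 300, n),
    'wine_type': np.tile(['red', 'white'], n // 2),
    'quality_label': np.tile(['low', 'medium', 'high'], n // 3),
})

marker_for = {'low': 'o', 'medium': 'x', 'high': ','}
color_for = {'red': 'red', 'white': 'yellow'}

fig = plt.figure(figsize=(8, 6))
ax = fig.add_subplot(111, projection='3d')
# One scatter call per quality label, so each group gets its own marker.
for label, group in demo.groupby('quality_label'):
    ax.scatter(
        group['residual sugar'].to_numpy(),
        group['alcohol'].to_numpy(),
        group['fixed acidity'].to_numpy(),
        c=group['wine_type'].map(color_for).tolist(),
        s=group['total sulfur dioxide'].to_numpy(),
        marker=marker_for[label], alpha=0.4, label=label,
    )
ax.set_xlabel('Residual Sugar')
ax.set_ylabel('Alcohol')
ax.set_zlabel('Fixed Acidity')
ax.legend(title='quality_label')
```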
# Start a fresh 3-D figure; xs, ys, zs, colors, and size come from the 4‑D example
fig = plt.figure(figsize=(8,6))
ax = fig.add_subplot(111, projection='3d')
markers = [',' if q=='high' else ('x' if q=='medium' else 'o') for q in wines['quality_label']]
for x,y,z,c,s,m in zip(xs,ys,zs,colors,size,markers):
    ax.scatter(x, y, z, alpha=0.4, c=c, edgecolors='none', s=s, marker=m)
The article concludes that effective visualization is essential for extracting insights from high‑dimensional data and encourages readers to apply these techniques to their own datasets.
Python Programming Learning Circle