Multidimensional Data Visualization Strategies Using Python and the Wine Quality Dataset

This article explores effective strategies for visualizing one‑ to six‑dimensional data using Python libraries such as pandas, matplotlib, and seaborn, demonstrating each technique with the UCI Wine Quality dataset and providing code snippets for histograms, heatmaps, pair plots, 3‑D scatter plots, bubble charts, and more.

Python Programming Learning Circle

Nov 18, 2022

Data aggregation, summarization, and visualization are three pillars of data analysis. Traditional 2‑D visualizations are powerful, but they become limited when a dataset has more than two or three dimensions. This tutorial examines visualization techniques for 1‑D through 6‑D data using the UCI Wine Quality dataset, illustrating each method with Python code.

1. Single‑Variable (1‑D) Visualization

Histograms and density plots are generated with pandas and seaborn to quickly assess the distribution of individual attributes such as sulphates.

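import pandas as pd
import matplotlib.pyplot as plt
# Registers the '3d' projection on matplotlib < 3.2; newer versions do this automatically
from mpl_toolkits.mplot3d import Axes3D
import matplotlib as mpl
import numpy as np
import seaborn as sns
# Jupyter/IPython magic that renders figures inside the notebook; omit it in a plain .py script
%matplotlib inline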

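The red and white wine files are loaded, each row is labeled with its wine type, and the integer quality scores are bucketed into low (5 or below), medium (6 or 7), and high (8 or above) categories. The two tables are then combined and shuffled.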

# The UCI files are semicolon-delimited
white_wine = pd.read_csv('winequality-white.csv', sep=';')
red_wine = pd.read_csv('winequality-red.csv', sep=';')
# Tag every row with its wine type before combining the tables
red_wine['wine_type'] = 'red'
white_wine['wine_type'] = 'white'
# Bucket quality: <= 5 -> low, 6-7 -> medium, >= 8 -> high (ordered categorical)
red_wine['quality_label'] = red_wine['quality'].apply(lambda value: 'low' if value <= 5 else ('medium' if value <= 7 else 'high'))
red_wine['quality_label'] = pd.Categorical(red_wine['quality_label'], categories=['low','medium','high'])
white_wine['quality_label'] = white_wine['quality'].apply(lambda value: 'low' if value <= 5 else ('medium' if value <= 7 else 'high'))
white_wine['quality_label'] = pd.Categorical(white_wine['quality_label'], categories=['low','medium','high'])
# Stack red and white wines, then shuffle the rows with a fixed seed for reproducibility
wines = pd.concat([red_wine, white_wine])
wines = wines.sample(frac=1, random_state=42).reset_index(drop=True)

2. Two‑Variable (2‑D) Visualization

Correlation matrices are visualized as heatmaps, and pairwise scatter plots reveal relationships between attributes.

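f, ax = plt.subplots(figsize=(10,6))
# numeric_only=True skips the text columns (wine_type, quality_label); pandas >= 2.0 raises an error without it
corr = wines.corr(numeric_only=True)
hm = sns.heatmap(round(corr,2), annot=True, ax=ax, cmap="coolwarm", fmt='.2f', linewidths=.05)
f.subplots_adjust(top=0.93)
t = f.suptitle('Wine Attributes Correlation Heatmap', fontsize=14)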

3. Three‑Variable (3‑D) Visualization

Scatter plots with hue encode wine type, while 3‑D plots add depth for a third numeric attribute.

fig = plt.figure(figsize=(8,6))
ax = fig.add_subplot(111, projection='3d')
xs = wines['residual sugar']
ys = wines['fixed acidity']
zs = wines['alcohol']
ax.scatter(xs, ys, zs, s=50, alpha=0.6, edgecolors='w')
ax.set_xlabel('Residual Sugar')
ax.set_ylabel('Fixed Acidity')
ax.set_zlabel('Alcohol')
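The section text also mentions 2‑D scatter plots that use hue for wine type, which the 3‑D code does not show. A minimal sketch with `seaborn.scatterplot` follows; `hue_scatter` is a hypothetical helper name, and the column pair in the usage line is only an example.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns


def hue_scatter(df: pd.DataFrame, x: str, y: str, hue: str = 'wine_type'):
    """2-D scatter of two numeric columns, with color (hue) encoding a categorical third."""
    fig, ax = plt.subplots(figsize=(8, 6))
    # seaborn assigns one color per category and builds the legend automatically
    sns.scatterplot(data=df, x=x, y=y, hue=hue, alpha=0.6, edgecolor='w', ax=ax)
    ax.set_title(f'{y} vs {x}, colored by {hue}')
    return fig, ax
```

For example, `hue_scatter(wines, 'sulphates', 'alcohol')` plots three variables in two spatial dimensions, which is the same idea the 3‑D plot extends with depth.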

4. Four‑Dimensional (4‑D) Visualization

A 3‑D scatter plot places three numeric attributes on the x, y, and depth axes, and color (hue) adds the categorical wine type as the fourth dimension.

fig = plt.figure(figsize=(8,6))
ax = fig.add_subplot(111, projection='3d')
xs = list(wines['residual sugar'])
ys = list(wines['alcohol'])
zs = list(wines['fixed acidity'])
# Color encodes wine type: red wines in red, white wines in yellow
colors = ['red' if wt=='red' else 'yellow' for wt in wines['wine_type']]
# Marker areas for the bubble-chart (5-D) variant; this 4-D plot uses a constant size
size = wines['total sulfur dioxide']
# One vectorized scatter() call instead of one call per point
ax.scatter(xs, ys, zs, c=colors, s=30, alpha=0.4, edgecolors='none')
ax.set_xlabel('Residual Sugar')
ax.set_ylabel('Alcohol')
ax.set_zlabel('Fixed Acidity')
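The list of color strings in the 4‑D code gives each point a color but produces no legend. One way to get a legend, sketched here under the assumption that only the red and white wine types occur (anything else falls back to gray), is to call `scatter()` once per wine type with a `label`. The helper `scatter_3d_by_type` and the `WINE_COLORS` mapping are illustrative names.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Illustrative mapping that mirrors the 4-D example: red wines red, white wines yellow
WINE_COLORS = {'red': 'red', 'white': 'yellow'}


def scatter_3d_by_type(df: pd.DataFrame, x: str, y: str, z: str, hue: str = 'wine_type'):
    """3-D scatter with one scatter() call per category, so each category gets a legend entry."""
    fig = plt.figure(figsize=(8, 6))
    ax = fig.add_subplot(111, projection='3d')
    for category, group in df.groupby(hue, observed=True):
        ax.scatter(group[x], group[y], group[z], c=WINE_COLORS.get(category, 'gray'),
                   s=30, alpha=0.4, edgecolors='none', label=category)
    ax.set_xlabel(x)
    ax.set_ylabel(y)
    ax.set_zlabel(z)
    ax.legend(title=hue)
    return fig, ax
```

For example, `scatter_3d_by_type(wines, 'residual sugar', 'alcohol', 'fixed acidity')`. Grouping also keeps the number of `scatter()` calls equal to the number of categories rather than the number of rows.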

5. Five‑Dimensional (5‑D) Visualization

Bubble charts incorporate size for a fifth dimension while retaining depth and hue for the previous dimensions.

# (same code as 4‑D example, with size mapped to 'total sulfur dioxide')
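Compared with a fixed marker size, the 5‑D bubble chart only changes the `s` argument: matplotlib treats `s` as marker area in points², so a column such as total sulfur dioxide can drive bubble size directly. The self-contained sketch below assumes the same wine-type colors as the 4‑D example; `bubble_3d`, `WINE_COLORS`, and the `size_scale` parameter are hypothetical names added for illustration.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Illustrative color choice matching the 4-D example: red wines red, white wines yellow
WINE_COLORS = {'red': 'red', 'white': 'yellow'}


def bubble_3d(df: pd.DataFrame, x: str, y: str, z: str, size_col: str,
              hue: str = 'wine_type', size_scale: float = 1.0):
    """3-D bubble chart: position = (x, y, z), color = hue category, marker area = size_col."""
    colors = [WINE_COLORS.get(v, 'gray') for v in df[hue]]
    fig = plt.figure(figsize=(8, 6))
    ax = fig.add_subplot(111, projection='3d')
    # `s` is marker area in points**2, so each value of size_col becomes a bubble area
    points = ax.scatter(df[x], df[y], df[z], c=colors, s=df[size_col] * size_scale,
                        alpha=0.4, edgecolors='none')
    ax.set_xlabel(x)
    ax.set_ylabel(y)
    ax.set_zlabel(z)
    return fig, ax, points
```

`bubble_3d(wines, 'residual sugar', 'alcohol', 'fixed acidity', 'total sulfur dioxide')` gives the 5‑D view; a `size_scale` below 1 shrinks the bubbles if they overlap too heavily.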

6. Six‑Dimensional (6‑D) Visualization

Marker shape is added to encode the quality label, so one plot carries six variables: three numeric axes, color for wine type, bubble size for total sulfur dioxide, and shape for quality.

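fig, ax = plt.subplots(figsize=(8,6), subplot_kw={'projection': '3d'})  # fresh 3-D axes rather than drawing over the 4-D plot
for q, m in [('low', 'o'), ('medium', '^'), ('high', 's')]:  # one scatter() per marker shape; filled shapes suit edgecolors='none'
    sub = wines[wines['quality_label'] == q]
    ax.scatter(sub['residual sugar'], sub['alcohol'], sub['fixed acidity'], s=sub['total sulfur dioxide'], marker=m,
               c=['red' if wt == 'red' else 'yellow' for wt in sub['wine_type']], alpha=0.4, edgecolors='none')
ax.set_xlabel('Residual Sugar'); ax.set_ylabel('Alcohol'); ax.set_zlabel('Fixed Acidity')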

The article concludes that effective visualization is essential for extracting insights from high‑dimensional data and encourages readers to apply these techniques to their own datasets.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please get in touch and we will review it promptly.
