Master Data Visualization: Core Concepts, Chart Selection, and Python Code Samples

This guide explains what data visualization is and why it matters, then shows how to choose the right chart type, preprocess data, design effective visuals, and select appropriate Python tools. It includes code examples for pie, donut, stacked bar, histogram, box, grouped bar, bubble, scatter, and deviation charts.

Data Party THU

What is Data Visualization

Data visualization converts numerical data into visual forms that humans can perceive quickly, enabling rapid identification of patterns, trends, and anomalies. Common visual forms include charts, maps, graphs, dashboards, heatmaps, word clouds, and animated visuals.

Why Data Visualization Matters

Visualization speeds up comprehension because people take in shapes and colors faster than tables of numbers. It reveals hidden patterns in large datasets, supports rational decision‑making, and works as a common language across audiences. Anscombe’s Quartet makes the point concrete: four small datasets share nearly identical means, variances, correlations, and regression lines, yet look completely different when plotted, which is why visual inspection is needed alongside summary statistics.
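The Anscombe's Quartet claim can be checked numerically. The sketch below hard-codes the four classic datasets and computes the usual summaries with NumPy; the variable names (`quartet`, `summary`) are illustrative choices, not part of the original article. The four rows should agree to about two decimal places even though the scatter plots differ sharply.

```python
import numpy as np

# Anscombe's Quartet: datasets I-III share the same x values.
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    "I":   (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II":  (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV":  ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
            [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

summary = {}
for name, (x, y) in quartet.items():
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Least-squares line y = slope * x + intercept
    slope, intercept = np.polyfit(x, y, 1)
    summary[name] = {
        "mean_y": y.mean(),
        "var_y": y.var(ddof=1),          # sample variance
        "r": np.corrcoef(x, y)[0, 1],    # Pearson correlation
        "slope": slope,
        "intercept": intercept,
    }
    s = summary[name]
    print(f"{name:>3}: mean_y={s['mean_y']:.2f} var_y={s['var_y']:.2f} "
          f"r={s['r']:.3f} fit: y={s['slope']:.2f}x+{s['intercept']:.2f}")
```

Plotting each pair with `plt.scatter` is the natural next step: one set is roughly linear, one is curved, one is linear with a single outlier, and one is a vertical cluster with a single high-leverage point.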

How to Create Effective Visualizations

1. Choose the Right Chart Type

Visualization goals fall into five categories: comparison, distribution, relationship/trend, composition, and deviation. Recommended chart types:

Comparison: bar, grouped bar, stacked bar.

Distribution: histogram, kernel density plot, box plot, violin plot.

Relationship/Trend: scatter, bubble, line, area, heatmap.

Composition: pie, donut, stacked bar, treemap, radar.

Deviation: error bars, residual plots.
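The goal-to-chart mapping above can be kept as a small lookup table in code. This is only a sketch: `CHART_GUIDE` and `suggest_charts` are hypothetical names introduced here, and the entries simply restate the list from this section.

```python
# Hypothetical lookup table restating the recommendations in this section.
CHART_GUIDE = {
    "comparison": ["bar", "grouped bar", "stacked bar"],
    "distribution": ["histogram", "kernel density plot", "box plot", "violin plot"],
    "relationship/trend": ["scatter", "bubble", "line", "area", "heatmap"],
    "composition": ["pie", "donut", "stacked bar", "treemap", "radar"],
    "deviation": ["error bars", "residual plot"],
}


def suggest_charts(goal):
    """Return candidate chart types for a visualization goal (case-insensitive)."""
    key = goal.strip().lower()
    if key not in CHART_GUIDE:
        valid = ", ".join(CHART_GUIDE)
        raise ValueError(f"Unknown goal {goal!r}; expected one of: {valid}")
    return CHART_GUIDE[key]


print(suggest_charts("Distribution"))
```

A table like this is a starting point rather than a rule: a stacked bar appears under both comparison and composition, so the final choice still depends on what the reader needs to see first.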

2. Data Pre‑Processing

Before plotting, clean and transform data: merge or sample to simplify structure, apply dimensionality reduction, perform feature selection or generation, and discretize or transform attributes for clearer numeric representation.
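As a rough illustration of these steps, the sketch below samples a synthetic dataset, reduces it to two dimensions with PCA (implemented directly with an SVD), log-transforms a skewed attribute, and discretizes it into quartile bins. The data, the sizes, and names such as `features` and `income` are assumptions made for the example; real projects often reach for pandas or scikit-learn for the same steps.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical raw data: 10,000 rows of 5 correlated numeric features,
# plus one right-skewed attribute ("income").
latent = rng.normal(size=(10_000, 2))
mixing = rng.normal(size=(2, 5))
features = latent @ mixing + 0.05 * rng.normal(size=(10_000, 5))
income = rng.lognormal(mean=10, sigma=1, size=10_000)

# 1. Sample: plot 1,000 random rows instead of all 10,000.
idx = rng.choice(len(features), size=1_000, replace=False)
sample = features[idx]

# 2. Dimensionality reduction: PCA via SVD on the centered sample.
centered = sample - sample.mean(axis=0)
_, singular_values, components = np.linalg.svd(centered, full_matrices=False)
projected = centered @ components[:2].T          # shape (1000, 2), ready for a scatter plot
explained = (singular_values[:2] ** 2).sum() / (singular_values ** 2).sum()

# 3. Transform: a log scale makes the skewed attribute easier to read.
log_income = np.log10(income[idx])

# 4. Discretize: assign each row to a quartile bin (0 = lowest, 3 = highest).
edges = np.quantile(log_income, [0.25, 0.5, 0.75])
bins = np.digitize(log_income, edges)

print(f"projected shape: {projected.shape}")
print(f"variance kept by 2 components: {explained:.1%}")
print(f"rows per quartile bin: {np.bincount(bins)}")
```

The projected coordinates can feed a scatter plot, with the quartile bin used as the color channel.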

3. Design and Interaction

Balance accuracy with aesthetics. Use color contrast to highlight key information and add interactive elements (hover, zoom) when possible.
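One common way to apply color contrast is to draw most marks in a muted tone and reserve a single saturated color for the value that matters. The sketch below assumes made-up categories and values, and `highlight_colors` is a hypothetical helper; it covers only the color-contrast half of the advice, since hover and zoom need an interactive backend or library.

```python
import matplotlib.pyplot as plt


def highlight_colors(values, highlight="crimson", muted="lightgray"):
    """Return one color per value, accenting only the largest value."""
    top = max(values)
    return [highlight if v == top else muted for v in values]


# Hypothetical data for illustration
categories = ["A", "B", "C", "D"]
values = [12, 31, 18, 9]
colors = highlight_colors(values)

fig, ax = plt.subplots()
bars = ax.bar(categories, values, color=colors)
ax.set_title("Category B leads")   # the title states the takeaway
ax.set_ylabel("Value")
plt.show()
```

Muting everything except the key bar lets the reader find the message without first reading the axis.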

4. Tool Selection (Python)

Popular libraries:

Matplotlib/Seaborn: foundational, suitable for research and teaching.

Plotly/Bokeh: enable interactive visualizations for sharing.

ECharts, D3.js: JavaScript front‑end libraries rather than Python packages, suited to large‑scale interactive visualizations in the browser.

Code Examples

Composition Charts

Pie Chart

import matplotlib.pyplot as plt
labels = ['A', 'B', 'C', 'D']
sizes = [25, 30, 20, 25]
plt.pie(sizes, labels=labels, autopct='%1.1f%%', startangle=90,
        colors=['red', 'lightgreen', 'lightblue', 'yellow'])
plt.title('Composition - Pie Chart')
plt.show()

Donut Chart

import matplotlib.pyplot as plt
labels = ['Category A', 'Category B', 'Category C', 'Category D']
sizes = [25, 30, 20, 25]
plt.pie(sizes, labels=labels, autopct='%1.1f%%', startangle=90,
        colors=['red', 'green', 'blue', 'yellow'],
        wedgeprops=dict(width=0.3))
plt.title('Donut Chart - Data Composition')
plt.show()

Stacked Bar Chart

import matplotlib.pyplot as plt
import numpy as np
categories = ['Category A', 'Category B', 'Category C']
values1 = [25, 30, 20]
values2 = [10, 15, 25]
bar_width = 0.5
index = np.arange(len(categories))
plt.bar(index, values1, width=bar_width, label='Group 1', color='red')
plt.bar(index, values2, width=bar_width, bottom=values1,
        label='Group 2', color='orange')
plt.xlabel('Categories')
plt.ylabel('Values')
plt.title('Stacked Bar Chart - Data Composition')
plt.xticks(index, categories)
plt.legend()
plt.show()

Distribution Charts

Histogram with Kernel Density Estimate

import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
data = np.random.randn(1000)
plt.hist(data, bins=30, density=True, color='skyblue', edgecolor='black')
sns.kdeplot(data, color='red')
mean_value = np.mean(data)
std_dev = np.std(data)
plt.axvline(mean_value, color='green', linestyle='dashed',
            label=f'Mean: {mean_value:.2f}')
plt.axvline(mean_value + std_dev, color='orange', linestyle='dashed',
            label=f'Mean ± 1 Std Dev (Std Dev: {std_dev:.2f})')
plt.axvline(mean_value - std_dev, color='orange', linestyle='dashed')
plt.title('Histogram and KDE with Mean and Std Dev')
plt.xlabel('Values')
plt.ylabel('Density')
plt.legend()
plt.show()

Box Plot

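import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
data = np.random.randn(1000)
plt.figure(figsize=(12, 6))
# Left panel: horizontal box plot
plt.subplot(1, 2, 1)
sns.boxplot(x=data, color='skyblue')
plt.title('Horizontal')
# Right panel: vertical box plot of the same data
plt.subplot(1, 2, 2)
sns.boxplot(y=data, color='skyblue')
plt.title('Vertical')
# plt.title labels only the current subplot; suptitle labels the whole figure
plt.suptitle('Box Plot - Data Distribution')
plt.show()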

Comparison Charts

Grouped Horizontal Bar Chart

import matplotlib.pyplot as plt
import numpy as np
categories = ['Category A', 'Category B', 'Category C']
values1 = [4, 7, 3]
values2 = [2, 5, 8]
bar_height = 0.35
index = np.arange(len(categories))
plt.barh(index, values1, height=bar_height, label='Group 1', color='blue')
plt.barh(index + bar_height, values2, height=bar_height,
         label='Group 2', color='orange')
plt.ylabel('Categories')
plt.xlabel('Values')
plt.title('Grouped Horizontal Bar Chart')
plt.yticks(index + bar_height/2, categories)
plt.legend()
plt.show()

Grouped Vertical Bar Chart

import matplotlib.pyplot as plt
import numpy as np
categories = ['Category A', 'Category B', 'Category C']
values1 = [4, 7, 3]
values2 = [2, 5, 8]
bar_width = 0.35
index = np.arange(len(categories))
plt.bar(index, values1, width=bar_width, label='Group 1', color='blue')
plt.bar(index + bar_width, values2, width=bar_width,
        label='Group 2', color='orange')
plt.xlabel('Categories')
plt.ylabel('Values')
plt.title('Grouped Vertical Bar Chart')
plt.xticks(index + bar_width/2, categories)
plt.legend()
plt.show()

Relationship & Trend Charts

Bubble Chart

import matplotlib.pyplot as plt
import numpy as np
x = np.random.rand(30)
y = np.random.rand(30)
sizes = np.random.rand(30) * 1000
plt.scatter(x, y, s=sizes, alpha=0.7, c='skyblue', edgecolors='black')
plt.title('Bubble Plot')
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.show()

Scatter & Line Chart (Trend Over Time)

import matplotlib.pyplot as plt
import numpy as np
time = np.arange(0, 10, 0.1)
data = np.sin(time) + 0.2 * np.random.randn(len(time))
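# Label each series so the legend distinguishes samples from the true signal
plt.scatter(time, data, color='green', label='Noisy samples')
plt.plot(time, data, color='blue', label='Samples connected')
plt.plot(time, np.sin(time), color='red', label='Underlying sin(t)')
plt.title('Scatter and Line Plot - Trend Over Time')
plt.xlabel('Time')
plt.ylabel('Values')
plt.legend()
plt.show()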

Deviation Charts

Error Bar Plot

import matplotlib.pyplot as plt
import numpy as np
x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 3, 5, 4, 6])
error = np.array([0.5, 0.3, 0.2, 0.4, 0.6])
plt.errorbar(x, y, yerr=error, fmt='o', color='blue', capsize=5)
plt.title('Error Bar Plot')
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.show()

Residual Plot (Regression)

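import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
x = np.random.rand(100)
y = 2 * x + 1 + 0.1 * np.random.randn(100)
# Draw the fit and its residuals on separate axes so they do not overlap
fig, (ax_fit, ax_res) = plt.subplots(1, 2, figsize=(12, 5))
sns.regplot(x=x, y=y, ci=None, line_kws={'color': 'red'}, ax=ax_fit)
ax_fit.set_title('Linear Fit')
ax_fit.set_xlabel('X-axis')
ax_fit.set_ylabel('Y-axis')
# residplot fits the regression of y on x itself and plots the residuals,
# so pass the raw y; lowess=True adds a smoothed trend and requires statsmodels
sns.residplot(x=x, y=y, lowess=True, color='blue', ax=ax_res)
ax_res.set_title('Residual Plot')
ax_res.set_xlabel('X-axis')
ax_res.set_ylabel('Residuals')
plt.tight_layout()
plt.show()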


Written by

Data Party THU

Official platform of Tsinghua Big Data Research Center, sharing the team's latest research, teaching updates, and big data news.
