
What Do Heart‑Disease Data Reveal? A Python‑Driven Exploratory Analysis

This article walks through a Python‑based exploratory analysis of a public heart‑disease dataset. It loads the data, describes its 14 columns (13 clinical attributes plus the disease label), visualizes how sex, age, heart rate, blood pressure and cholesterol relate to disease, and presents correlation insights that help explain patterns of disease prevalence.

Python Crawling & Data Mining

Dataset Loading and Basic Description

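First, import the necessary libraries and load the data into a DataFrame named data, which every later snippet uses. The filename heart.csv is an assumption; point it at your own copy of the dataset.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
# Assumed filename: replace with the path to your copy of the heart-disease CSV
data = pd.read_csv("heart.csv")
print(data.shape)  # (303, 14) for the version described below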

The dataset contains 303 rows and 14 columns. There are 13 clinical attributes: age, sex, chest pain type, resting blood pressure, cholesterol, fasting blood sugar, resting ECG, maximum heart rate, exercise‑induced angina, ST depression, slope, number of major vessels and thalassemia. The remaining column is the target label (0 = no disease, 1 = disease).
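A quick structural check after loading can confirm that a given copy of the file matches this description. The sketch below uses a hypothetical helper, `check_heart_data`, which is not from the original article. It runs on two made-up rows held in an in-memory CSV. On the real file you would pass `data` instead and expect 303 rows.

```python
import io

import pandas as pd

# The 14 columns described in this section (13 attributes plus the target label)
EXPECTED_COLUMNS = ["age", "sex", "cp", "trestbps", "chol", "fbs", "restecg",
                    "thalach", "exang", "oldpeak", "slope", "ca", "thal", "target"]


def check_heart_data(df: pd.DataFrame) -> dict:
    """Hypothetical helper: basic structural checks on a loaded heart-disease table."""
    return {
        "n_rows": len(df),
        "missing_columns": [c for c in EXPECTED_COLUMNS if c not in df.columns],
        "n_missing_values": int(df.isna().sum().sum()),
        "target_values": sorted(df["target"].unique().tolist()),
    }


# Two made-up rows standing in for the real CSV (illustration only)
toy_csv = io.StringIO(
    "age,sex,cp,trestbps,chol,fbs,restecg,thalach,exang,oldpeak,slope,ca,thal,target\n"
    "50,1,2,130,240,0,1,160,0,1.0,1,0,2,1\n"
    "60,0,0,140,260,0,0,140,1,2.0,1,1,3,0\n"
)
toy = pd.read_csv(toy_csv)
report = check_heart_data(toy)
print(report)
```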

age: patient age

sex: 1 = male, 0 = female

cp: chest‑pain type (1‑typical angina, 2‑atypical angina, 3‑non‑anginal pain, 4‑asymptomatic)

trestbps: resting blood pressure (mm Hg)

chol: serum cholesterol (mg/dl)

fbs: fasting blood sugar > 120 mg/dl (1 = true, 0 = false)

restecg: resting ECG results (0‑normal, 1‑ST‑T wave abnormality, 2‑left ventricular hypertrophy)

thalach: maximum heart rate achieved

exang: exercise‑induced angina (1 = yes, 0 = no)

oldpeak: ST depression induced by exercise relative to rest

slope: slope of the peak exercise ST segment (1‑upsloping, 2‑flat, 3‑downsloping)

ca: number of major vessels (0‑4) colored by fluoroscopy

thal: thalassemia (3‑normal, 6‑fixed defect, 7‑reversible defect)

target: heart disease (0 = no, 1 = yes)

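These variables describe physiological measurements but lack lifestyle factors such as smoking or exercise habits. This limits how far the results can support health recommendations. The category codes listed above also follow the original UCI documentation. The CSV analysed here, however, codes at least cp starting from 0 (the chest‑pain section below refers to type 0), so check each column's actual values before attaching labels.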
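It is cheap to list which codes actually occur in each coded column before mapping them to labels. Here is a minimal sketch using a hypothetical helper, `observed_codes`, on a made-up four-row table. Only `sex` receives readable labels, using the 1 = male, 0 = female coding given in the variable list.

```python
import pandas as pd

# Coded (categorical) columns from the variable list
CATEGORICAL = ["sex", "cp", "fbs", "restecg", "exang", "slope", "ca", "thal"]


def observed_codes(df):
    """Hypothetical helper: sorted codes actually present in each categorical column."""
    return {c: sorted(df[c].unique().tolist()) for c in CATEGORICAL if c in df.columns}


# Made-up rows containing only a few of the columns (illustration only)
toy = pd.DataFrame({"sex": [1, 0, 1, 1], "cp": [0, 2, 3, 0], "exang": [1, 0, 0, 1]})
codes = observed_codes(toy)
print(codes)  # {'sex': [0, 1], 'cp': [0, 2, 3], 'exang': [0, 1]}

# Label only sex, whose coding (1 = male, 0 = female) is stated in the variable list
toy["sex_label"] = toy["sex"].map({1: "male", 0: "female"})
```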

Gender Ratio and Disease Prevalence

countNoDisease = len(data[data.target == 0])
countHaveDisease = len(data[data.target == 1])
countfemale = len(data[data.sex == 0])
countmale = len(data[data.sex == 1])
print(f'No disease: {countNoDisease}', end=', ')
print("No disease rate: {:.2f}%".format(countNoDisease / len(data.target) * 100))
print(f'Disease: {countHaveDisease}', end=', ')
print("Disease rate: {:.2f}%".format(countHaveDisease / len(data.target) * 100))
print(f'Female: {countfemale}', end=', ')
print("Female ratio: {:.2f}%".format(countfemale / len(data.sex) * 100))
print(f'Male: {countmale}', end=', ')
print("Male ratio: {:.2f}%".format(countmale / len(data.sex) * 100))

Output:
No disease: 138, No disease rate: 45.54%
Disease: 165, Disease rate: 54.46%
Female: 96, Female ratio: 31.68%
Male: 207, Male ratio: 68.32%

About 54% of the 303 patients have heart disease, and men outnumber women roughly two to one (207 vs. 96).
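`value_counts(normalize=True)` produces the same percentages more compactly. A row-normalized `pd.crosstab` adds something the counts above do not show: the disease rate within each sex. The sketch below applies a hypothetical helper to a made-up five-row table. On the real data, `prevalence_summary(data)` should reproduce the 54.46% / 45.54% target split printed above.

```python
import pandas as pd


def prevalence_summary(df):
    """Hypothetical helper: percentages for target and sex, plus disease rate within each sex."""
    target_pct = df["target"].value_counts(normalize=True).mul(100).round(2)
    sex_pct = df["sex"].value_counts(normalize=True).mul(100).round(2)
    # normalize="index": each sex row sums to 100 after scaling
    rate_by_sex = pd.crosstab(df["sex"], df["target"], normalize="index").mul(100).round(2)
    return target_pct, sex_pct, rate_by_sex


# Made-up rows (illustration only); sex: 1 = male, 0 = female
toy = pd.DataFrame({"sex": [1, 1, 1, 0, 0], "target": [1, 0, 0, 1, 1]})
target_pct, sex_pct, rate_by_sex = prevalence_summary(toy)
print(target_pct)
print(rate_by_sex)
```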

Age and Disease Relationship

pd.crosstab(data.age, data.target).plot(kind="bar", figsize=(25,8))
plt.title('Disease Distribution Across Ages')
plt.xlabel('Age')
plt.ylabel('Count')
plt.show()
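With one pair of bars per year of age, the chart above is dense. You can group ages into bands with `pd.cut` and take the mean of the 0/1 target, which gives the disease share per band directly. The band edges below are an arbitrary illustrative choice, the helper name is hypothetical, and the ages are made up.

```python
import pandas as pd


def disease_rate_by_age_band(df, edges=(29, 40, 50, 60, 70, 80)):
    """Hypothetical helper: count and share of target == 1 per age band (edges are assumed)."""
    # right=False gives half-open bands such as [40, 50)
    bands = pd.cut(df["age"], bins=list(edges), right=False)
    # observed=True drops bands that contain no patients
    return df.groupby(bands, observed=True)["target"].agg(["count", "mean"])


# Made-up rows (illustration only)
toy = pd.DataFrame({"age": [35, 45, 47, 62, 65], "target": [0, 1, 1, 0, 1]})
rates = disease_rate_by_age_band(toy)
print(rates)
```

Calling `rates["mean"].plot(kind="bar")` then draws one bar per band instead of one per year.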

Age‑Heart Rate‑Disease Relationship

# Scatter plot
plt.scatter(x=data.age[data.target==1], y=data.thalach[data.target==1], c="red")
plt.scatter(x=data.age[data.target==0], y=data.thalach[data.target==0], c="#41D3BD")
plt.legend(["Disease", "No disease"])
plt.xlabel("Age")
plt.ylabel("Maximum Heart Rate")
plt.show()
# Violin plot of resting blood pressure by disease
sns.violinplot(x='target', y='trestbps', data=data)
plt.xlabel('Disease (0 = no, 1 = yes)')
plt.ylabel('Resting Blood Pressure (mm Hg)')
plt.show()
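A small numeric summary can back up the scatter and violin plots: the median of each measurement within each target group. The helper below is a hypothetical sketch, shown on made-up values.

```python
import pandas as pd


def median_by_target(df, columns=("thalach", "trestbps")):
    """Hypothetical helper: median of each column within each target group."""
    return df.groupby("target")[list(columns)].median()


# Made-up rows (illustration only)
toy = pd.DataFrame({
    "target":   [1, 1, 1, 0, 0],
    "thalach":  [170, 160, 175, 140, 130],
    "trestbps": [130, 120, 140, 135, 125],
})
medians = median_by_target(toy)
print(medians)
```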

Age and Resting Blood Pressure Distribution

plt.scatter(x=data.age[data.target==1], y=data.trestbps[data.target==1], c="#FFA773")
plt.scatter(x=data.age[data.target==0], y=data.trestbps[data.target==0], c="#8DE0FF")
plt.legend(["Disease", "No disease"])
plt.xlabel("Age")
plt.ylabel("Resting Blood Pressure")
plt.show()

The scatter plot shows blood pressure spread fairly evenly across ages, with disease and no‑disease points overlapping throughout. This suggests that resting blood pressure alone is not a strong predictor of heart disease in this sample.
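One way to put a number on "not a strong predictor" is the correlation between a measurement and the 0/1 target. For a binary variable, the Pearson correlation is the point-biserial correlation. The made-up values below are chosen so that blood pressure does not differ between groups while heart rate does. The helper name is hypothetical. Note that a near-zero value only rules out a linear association, not every kind of relationship.

```python
import pandas as pd


def correlation_with_target(df, column):
    """Hypothetical helper: Pearson (point-biserial) correlation of a column with the 0/1 target."""
    return df[column].corr(df["target"])


# Made-up values: trestbps has equal group means, thalach is higher when target == 1
toy = pd.DataFrame({
    "target":   [0, 0, 1, 1],
    "trestbps": [120, 140, 130, 130],
    "thalach":  [150, 145, 170, 175],
})
r_bp = correlation_with_target(toy, "trestbps")
r_hr = correlation_with_target(toy, "thalach")
print(f"trestbps: {r_bp:.2f}, thalach: {r_hr:.2f}")
```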

Blood Pressure and Heart Rate Relationship

plt.scatter(x=data.thalach[data.target==1], y=data.trestbps[data.target==1], c="#FFA773")
plt.scatter(x=data.thalach[data.target==0], y=data.trestbps[data.target==0], c="#8DE0FF")
plt.legend(["Disease", "No disease"])
plt.xlabel("Maximum Heart Rate")
plt.ylabel("Resting Blood Pressure")
plt.show()

In this dataset, blood pressure and heart rate appear uncorrelated for both disease and non‑disease groups.
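You can check the statement "uncorrelated for both groups" by computing the Pearson correlation between `thalach` and `trestbps` separately within each target group. The sketch uses a hypothetical helper and made-up values.

```python
import pandas as pd


def within_group_corr(df, x="thalach", y="trestbps"):
    """Hypothetical helper: Pearson correlation of x and y inside each target group."""
    return {target: group[x].corr(group[y]) for target, group in df.groupby("target")}


# Made-up rows (illustration only)
toy = pd.DataFrame({
    "target":   [0, 0, 0, 1, 1, 1],
    "thalach":  [150, 160, 170, 140, 150, 160],
    "trestbps": [120, 140, 130, 150, 130, 110],
})
corrs = within_group_corr(toy)
print(corrs)
```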

Chest Pain Type, Disease and Blood Pressure

sns.swarmplot(x='target', y='trestbps', hue='cp', data=data, size=6)
plt.xlabel('Disease')
plt.show()
fig,ax=plt.subplots(1,2,figsize=(14,5))
sns.countplot(x='cp', data=data, hue='target', palette='Set3', ax=ax[0])
ax[0].set_xlabel('Chest Pain Type')
data.cp.value_counts().plot.pie(ax=ax[1], autopct='%1.1f%%', explode=[0.01,0.01,0.01,0.01], shadow=True, cmap='Blues')
ax[1].set_title('Chest Pain Type Distribution')
plt.show()

Patients with chest‑pain type 0 dominate the non‑disease group, while types 1‑3 are more common among those with disease.
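Raw counts mix two effects: how common each chest-pain type is, and how often it co-occurs with disease. Normalizing the crosstab by row gives the share of each target value within each `cp` code, which makes the types directly comparable. Here is a sketch on made-up rows.

```python
import pandas as pd

# Made-up rows (illustration only)
toy = pd.DataFrame({
    "cp":     [0, 0, 0, 1, 2, 3, 3],
    "target": [0, 0, 1, 1, 1, 1, 0],
})
# normalize="index": each row (one chest-pain code) sums to 1
shares = pd.crosstab(toy["cp"], toy["target"], normalize="index")
print(shares.round(2))
```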

Exercise‑Induced Angina, Disease and Heart Rate

sns.swarmplot(x='exang', y='thalach', hue='target', data=data, size=6)
plt.xlabel('Exercise‑induced Angina')
plt.ylabel('Maximum Heart Rate')
plt.show()

Patients without exercise‑induced angina (exang = 0) tend to reach higher maximum heart rates (roughly 160‑180 bpm) and fall more often in the disease group. Patients with angina mostly peak at lower rates (roughly 120‑150 bpm), and many of them are disease‑free.
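Both halves of this observation can be quantified. A row-normalized crosstab gives the disease share for each `exang` value, and a groupby gives the median maximum heart rate. The helper name is hypothetical and the rows are made up.

```python
import pandas as pd


def exang_summary(df):
    """Hypothetical helper: disease share and median max heart rate for each exang value."""
    share = pd.crosstab(df["exang"], df["target"], normalize="index")
    median_hr = df.groupby("exang")["thalach"].median()
    return share, median_hr


# Made-up rows (illustration only)
toy = pd.DataFrame({
    "exang":   [0, 0, 0, 1, 1],
    "target":  [1, 1, 0, 0, 0],
    "thalach": [170, 165, 150, 130, 140],
})
share, median_hr = exang_summary(toy)
print(share.round(2))
print(median_hr)
```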

Number of Major Vessels (ca), Blood Pressure and Disease

plt.figure(figsize=(15,5))
sns.swarmplot(y='trestbps', x='ca', hue='target', data=data, palette='RdBu_r', size=7)
plt.xlabel('Number of Major Vessels')
plt.ylabel('Resting Blood Pressure')
plt.show()
# catplot creates its own figure, so size it with height/aspect rather than plt.figure
g = sns.catplot(x='ca', y='age', hue='target', kind='swarm', data=data, palette='RdBu_r', height=5, aspect=3)
g.set_axis_labels('Number of Major Vessels', 'Age')
plt.show()

In this dataset, patients with zero major vessels colored by fluoroscopy (ca = 0) fall predominantly in the disease group (target = 1).
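The swarm plots show this visually. A groupby that reports, for each `ca` value, how many patients there are and what share has `target == 1` makes the association explicit. The helper name is hypothetical and the rows are made up.

```python
import pandas as pd


def disease_rate_by_value(df, column):
    """Hypothetical helper: patient count and share of target == 1 for each value of column."""
    return df.groupby(column)["target"].agg(["count", "mean"])


# Made-up rows (illustration only)
toy = pd.DataFrame({
    "ca":     [0, 0, 0, 0, 1, 2, 2],
    "target": [1, 1, 1, 0, 0, 0, 1],
})
by_ca = disease_rate_by_value(toy, "ca")
print(by_ca)
```

Reporting the count alongside the share matters here, because rare `ca` values rest on only a handful of patients.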

Age and Cholesterol Relationship

plt.scatter(x=data.age[data.target==1], y=data.chol[data.target==1], c="orange")
plt.scatter(x=data.age[data.target==0], y=data.chol[data.target==0], c="green")
plt.legend(["Disease", "No disease"])
plt.xlabel("Age")
plt.ylabel("Serum Cholesterol (mg/dl)")
plt.show()
sns.boxplot(x='target', y='chol', data=data)
plt.show()

Cholesterol levels show no clear separation between disease and non‑disease groups; the box plot indicates only a slight decrease in median cholesterol for diseased patients.
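The box plot's median and box edges are the 50th, 25th and 75th percentiles, so you can print them per group to check how large the median difference actually is. A sketch with a hypothetical helper on made-up values:

```python
import pandas as pd


def quartiles_by_target(df, column="chol"):
    """Hypothetical helper: 25th, 50th and 75th percentiles of column for each target group."""
    # quantile with a list yields a (target, quantile) MultiIndex; unstack turns quantiles into columns
    return df.groupby("target")[column].quantile([0.25, 0.5, 0.75]).unstack()


# Made-up rows (illustration only)
toy = pd.DataFrame({
    "target": [0, 0, 0, 0, 1, 1, 1],
    "chol":   [200, 240, 260, 280, 210, 230, 250],
})
q = quartiles_by_target(toy)
print(q)
```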

Correlation Analysis

plt.figure(figsize=(15,10))
# Only matplotlib 3.1.1 clipped the top and bottom heatmap rows; other versions need no ylim adjustment
sns.heatmap(data.corr(), cmap=plt.cm.RdYlBu_r, annot=True, fmt='.2f')
plt.show()

The heatmap reveals that the target variable is positively correlated with chest‑pain type (cp), maximum heart rate (thalach) and slope, and negatively correlated with exercise‑induced angina (exang), ST depression (oldpeak), number of vessels (ca) and thalassemia (thal).
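To read the heatmap's `target` row as a ranked list, select the `target` column of the correlation matrix and sort it. Keep in mind that `cp`, `slope`, `ca` and `thal` are category codes, so their Pearson correlations depend on how the codes happen to be ordered. A sketch with a hypothetical helper on made-up values:

```python
import pandas as pd


def target_correlations(df):
    """Hypothetical helper: Pearson correlation of every other column with target, ascending."""
    return df.corr()["target"].drop("target").sort_values()


# Made-up rows (illustration only)
toy = pd.DataFrame({
    "target":  [0, 0, 1, 1],
    "thalach": [140, 150, 170, 175],
    "oldpeak": [2.0, 1.5, 0.5, 0.0],
})
ranked = target_correlations(toy)
print(ranked.round(2))
```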

Overall, this exploratory analysis visualizes several relationships within the heart‑disease dataset, highlighting gender imbalance, age‑related disease trends, and the limited predictive power of individual clinical measurements without further modeling.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
