Concrete Strength Data Analysis Using Pandas: A Step‑by‑Step Tutorial

This tutorial walks through a complete pandas‑based workflow for analyzing a concrete‑strength dataset, covering data loading, cleaning, exploratory visualizations, correlation analysis, and targeted sub‑group investigations to uncover factors influencing product strength and suggest improvement measures.

DataFunSummit

Dec 15, 2021

The article explains why many learners struggle with pandas despite following examples, emphasizing the need for a systematic analysis mindset that tells a coherent story with data.

It outlines the objectives of a concrete‑strength case study: verify whether product strength varies significantly, identify key factors affecting strength, and propose improvement suggestions.

First, essential libraries are imported and their versions printed:

# Import the required packages and print their versions; pandas 0.24 or later is recommended
import numpy as np
import pandas as pd
import matplotlib as mpl
import seaborn as sns
import matplotlib.pyplot as plt
for module in np, pd, mpl, sns:
    print(module.__name__, module.__version__)

Warnings are suppressed and a plotting style (e.g., plt.style.use('bmh')) is set for nicer figures.

The dataset is loaded, column names are simplified, and df.head() is displayed to inspect the first rows.

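# Simplify the column names. In order: cement, blast-furnace slag, fly ash, water,
# superplasticizer, coarse aggregate, fine aggregate, age, strength (MPa)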
df.columns = ['水泥含量','高炉矿渣含量','粉煤灰量','含水量','减水剂含量','粗骨料含量','细骨料含量','龄期','强度/Mpa']
df.head()
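
The loading call itself is not shown in the source, and the file is not named. The sketch below is one way the step could look: it assumes a CSV file with a header row, and both the helper name load_concrete and the file name in the usage comment are placeholders introduced here.

```python
import pandas as pd

# Simplified column names from the article, in the original column order.
COLUMNS = ['水泥含量', '高炉矿渣含量', '粉煤灰量', '含水量', '减水剂含量',
           '粗骨料含量', '细骨料含量', '龄期', '强度/Mpa']


def load_concrete(path):
    """Read the raw table and apply the simplified column names.

    Assumes a CSV file with a header row; the article does not state the format.
    """
    raw = pd.read_csv(path)
    # Fail early if the file does not have the nine expected columns.
    if raw.shape[1] != len(COLUMNS):
        raise ValueError(f'expected {len(COLUMNS)} columns, got {raw.shape[1]}')
    raw.columns = COLUMNS
    return raw


# Hypothetical usage; 'concrete.csv' is a placeholder file name.
# df = load_concrete('concrete.csv')
# df.head()
```

Checking the column count before renaming guards against silently mislabeling columns if the file layout differs from what is assumed.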

Basic data inspection uses df.info() and df.describe() to reveal data types, missing values, and summary statistics.

>>> df.info()
... (output omitted for brevity) ...
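
Since the output is omitted here, the following sketch shows what the inspection calls return on a tiny frame. The rows are made up for illustration and are not taken from the real dataset; only the column names follow the article.

```python
import numpy as np
import pandas as pd

# Made-up rows for illustration only; not taken from the real dataset.
toy = pd.DataFrame({
    '水泥含量': [300.0, 250.0, np.nan, 400.0],
    '龄期': [7, 28, 90, 28],
    '强度/Mpa': [20.0, 35.0, 40.0, 50.0],
})

toy.info()                      # dtypes and non-null counts per column
missing = toy.isnull().sum()    # explicit count of missing values per column
summary = toy.describe()        # count, mean, std, min, quartiles, max

print(missing)
print(summary.loc[['count', 'mean', 'max']])
```

info() reports non-null counts, so isnull().sum() is a convenient complement when the missing-value count itself is what matters.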

Strength distribution is visualized with a histogram and a box plot:

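plt.figure(figsize=(15,6))
# Left panel: histogram of strength
plt.subplot(121)
df['强度/Mpa'].plot(kind='hist', width=3.5)
plt.xlabel('强度/Mpa')
plt.title('Distribution of product strength')
# Right panel: box plot of strength
plt.subplot(122)
plt.boxplot(df['强度/Mpa'])
plt.title('Box plot of strength')
plt.show()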

The histogram shows a roughly normal distribution, and the box plot shows a long upper tail of high-strength outliers, which is consistent with the client's complaint about unstable strength.
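
The outliers drawn by a box plot can also be listed numerically. The sketch below applies the 1.5 × IQR rule, which matches matplotlib's default whisker setting. The helper name iqr_outliers and the sample values are introduced here for illustration and do not come from the article.

```python
import pandas as pd


def iqr_outliers(values, k=1.5):
    """Return the entries lying more than k * IQR outside the quartiles."""
    q1, q3 = values.quantile([0.25, 0.75])
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return values[(values < lower) | (values > upper)]


# Made-up strength values; only the last one is far from the rest.
strength = pd.Series([30, 32, 35, 36, 38, 40, 41, 43, 45, 82], name='强度/Mpa')
print(iqr_outliers(strength))
```

On the real data the same call would be iqr_outliers(df['强度/Mpa']), which gives a count and a list to go with the visual impression.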

Next, the tutorial examines each predictor variable with box plots and scatter plots against strength, revealing positive correlations for cement and super‑plasticizer, a negative correlation for water content, and ambiguous relationships for slag and age.

# Box plot of each predictor (every column except strength)
plt.figure(figsize=(20,16))
for i, feature in enumerate(list(df.columns[:-1])):
    plt.subplot(2,4,i+1)
    plt.boxplot(df[feature])
    plt.title(feature)
# Scatter plot of each predictor against strength
plt.figure(figsize=(20,16))
for i, feature in enumerate(list(df.columns[:-1])):
    plt.subplot(3,3,i+1)
    plt.scatter(df[feature], df['强度/Mpa'])
    plt.xlabel(feature)
    plt.ylabel('强度/Mpa')
plt.show()

Quantitative correlation is computed with df.corr(), highlighting that cement and super‑plasticizer have notable positive Pearson coefficients with strength, while age shows a weaker but present relationship.
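
To read the correlation matrix quickly, it helps to pull out only the strength column and sort it. The sketch below does this on synthetic data: the coefficients in the generating formula are arbitrary choices made here to produce one positive and one negative relationship, not estimates from the real dataset.

```python
import numpy as np
import pandas as pd

# Synthetic data for illustration; the formula below is an assumption,
# chosen only so that cement helps and water hurts strength.
rng = np.random.default_rng(0)
n = 200
cement = rng.uniform(100, 500, n)
water = rng.uniform(120, 250, n)
strength = 40 + 0.08 * cement - 0.15 * water + rng.normal(0, 5, n)

demo = pd.DataFrame({'水泥含量': cement, '含水量': water, '强度/Mpa': strength})

# Pearson correlation of every predictor with strength, strongest positive first.
corr_with_strength = (
    demo.corr()['强度/Mpa']
    .drop('强度/Mpa')
    .sort_values(ascending=False)
)
print(corr_with_strength)
```

Pearson coefficients only capture linear association, so a weak value does not rule out a nonlinear relationship; that is one reason the scatter plots are checked first.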

To address noisy age values, samples with age > 56 days are filtered out, and the analysis is repeated, resulting in a clearer positive correlation between age and strength.

df_age56 = df[df['龄期'] <= 56]
df_age56.shape
# repeat visualizations and correlation on df_age56
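
The effect of the filter on the age coefficient can be checked directly with Series.corr. The sketch below uses made-up values in which strength grows with the logarithm of age and then plateaus after 56 days. That shape is an assumption for illustration, and the real dataset may behave differently.

```python
import numpy as np
import pandas as pd

# Made-up curve: strength rises with log(age), then stays flat past 56 days.
ages = [1, 3, 7, 14, 28, 56, 90, 180, 365]
demo = pd.DataFrame({'龄期': ages})
demo['强度/Mpa'] = 10 * np.log(np.minimum(demo['龄期'], 56))

# Pearson correlation on all samples versus the filtered subset.
r_full = demo['龄期'].corr(demo['强度/Mpa'])
subset = demo[demo['龄期'] <= 56]
r_filtered = subset['龄期'].corr(subset['强度/Mpa'])

print(f'all ages:  r = {r_full:.2f}')
print(f'age <= 56: r = {r_filtered:.2f}')
```

In this toy case the filtered coefficient comes out higher, because the long flat tail of old samples no longer pulls the linear fit down.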

The article concludes that combining qualitative visual inspection with quantitative correlation yields a robust understanding of factors influencing concrete strength, and encourages readers to apply similar grouping and variable‑selection strategies to other datasets.
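
As one possible form of the grouping strategy mentioned above, the sketch below bins age with pd.cut and compares mean strength per bin. The bin edges and sample rows are assumptions chosen here for illustration; the source does not specify them.

```python
import pandas as pd

# Made-up rows for illustration only.
demo = pd.DataFrame({
    '龄期': [3, 7, 14, 28, 56, 90],
    '强度/Mpa': [10.0, 20.0, 30.0, 40.0, 45.0, 46.0],
})

# Assumed bin edges in days; intervals are right-inclusive, e.g. (0, 7].
bins = [0, 7, 28, 56, 365]
demo['age_group'] = pd.cut(demo['龄期'], bins=bins)

# Mean strength within each age bin.
group_means = demo.groupby('age_group', observed=True)['强度/Mpa'].mean()
print(group_means)
```

Comparing group means this way makes a nonlinear trend visible even when a single correlation coefficient looks weak.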

Additional resources and links to other data‑analysis case studies are provided at the end of the article.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
