Common Statistical Methods for Data Analysis with Python Code Examples
This article introduces ten common statistical techniques used in data analysis: descriptive statistics, correlation, the t‑test, ANOVA, linear regression, PCA, outlier detection, frequency distribution, time‑series analysis, and non‑parametric tests. Each comes with a concise explanation and a Python code snippet.
In data analysis, various statistical methods help reveal trends, relationships, and distributions within datasets.
1. Descriptive Statistics : Computes basic metrics such as mean, median, and standard deviation to provide an overall summary of the data.
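One choice hides inside "standard deviation": whether to divide by n (population) or by n - 1 (sample). The sketch below shows both on the same five values used in this section; the names `population_std` and `sample_std` are illustrative and not part of the original example.

```python
import numpy as np

data = [1, 2, 3, 4, 5]

# Population standard deviation: divide by n (NumPy's default, ddof=0).
population_std = np.std(data)

# Sample standard deviation: divide by n - 1 (Bessel's correction, ddof=1).
sample_std = np.std(data, ddof=1)

print("Population std:", population_std)
print("Sample std:", sample_std)
```

When the data are a sample drawn from a larger population, the ddof=1 form is usually the one to report.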
import numpy as np
data = [1, 2, 3, 4, 5]
mean = np.mean(data) # calculate mean
median = np.median(data) # calculate median
std = np.std(data) # calculate population standard deviation (NumPy's default, ddof=0)
print("Mean:", mean)
print("Median:", median)
print("Std:", std)
2. Correlation Analysis : Measures the linear relationship between two variables using the Pearson correlation coefficient.
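To see what the coefficient measures before reaching for the library call, here is a minimal sketch that computes Pearson's r directly from its definition. The `y` values are made up for illustration (deliberately not perfectly linear), and `r_manual` is a hypothetical name.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([2, 1, 4, 3, 5], dtype=float)  # illustrative values, not from the article

# Deviations of each variable from its own mean.
dx = x - x.mean()
dy = y - y.mean()

# Pearson's r = sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))
r_manual = np.sum(dx * dy) / np.sqrt(np.sum(dx ** 2) * np.sum(dy ** 2))

print("r from the definition:", r_manual)
print("r from np.corrcoef:", np.corrcoef(x, y)[0, 1])
```

The two printed values should agree up to floating-point rounding. The data in the example that follows are perfectly linear (y = 2x), so that example yields a correlation of 1.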
import numpy as np
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]
correlation = np.corrcoef(x, y)[0, 1] # calculate correlation coefficient
print("Correlation:", correlation)
3. t‑Test : Compares the means of two independent samples to determine if they differ significantly.
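By default, `stats.ttest_ind` assumes both groups share the same variance. When that is doubtful, Welch's variant (`equal_var=False`) is a common alternative. A small sketch comparing the two on the same groups used in this section; the variable names are illustrative.

```python
from scipy import stats

group1 = [1, 2, 3, 4, 5]
group2 = [2, 4, 6, 8, 10]

# Student's t-test: assumes a common variance (SciPy's default).
t_student, p_student = stats.ttest_ind(group1, group2)

# Welch's t-test: does not assume equal variances.
t_welch, p_welch = stats.ttest_ind(group1, group2, equal_var=False)

print("Student:", t_student, p_student)
print("Welch:  ", t_welch, p_welch)
```

Because the two groups have equal sizes here, the t statistics coincide; only the degrees of freedom, and therefore the p-values, differ.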
from scipy import stats
group1 = [1, 2, 3, 4, 5]
group2 = [2, 4, 6, 8, 10]
t_statistic, p_value = stats.ttest_ind(group1, group2)
print("t statistic:", t_statistic)
print("p value:", p_value)
4. ANOVA (Analysis of Variance) : Extends the t‑test to compare means across three or more groups.
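The F statistic is the ratio of between-group variability to within-group variability. As a rough sketch of what the library call computes, the following works it out by hand with NumPy for the same three groups used in this section; all variable names are illustrative.

```python
import numpy as np

groups = [
    np.array([1, 2, 3, 4, 5], dtype=float),
    np.array([2, 4, 6, 8, 10], dtype=float),
    np.array([3, 6, 9, 12, 15], dtype=float),
]

all_values = np.concatenate(groups)
grand_mean = all_values.mean()

# Between-group sum of squares: distance of each group mean from the grand mean.
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)

# Within-group sum of squares: spread of observations around their own group mean.
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

df_between = len(groups) - 1
df_within = len(all_values) - len(groups)

# F is the ratio of the two mean squares.
f_manual = (ss_between / df_between) / (ss_within / df_within)
print("F statistic (by hand):", f_manual)
```

This should match the F statistic reported by `stats.f_oneway` for the same groups, up to rounding.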
from scipy import stats
group1 = [1, 2, 3, 4, 5]
group2 = [2, 4, 6, 8, 10]
group3 = [3, 6, 9, 12, 15]
f_statistic, p_value = stats.f_oneway(group1, group2, group3)
print("F statistic:", f_statistic)
print("p value:", p_value)
5. Linear Regression : Fits a linear model to predict a dependent variable from an independent variable using the least‑squares method.
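For a single predictor, the least-squares solution has a closed form, which makes it easy to see what the fitted slope and intercept mean. A minimal sketch with NumPy only; the slightly noisy `y` values are invented for illustration, and `r_squared` is an added quantity not computed in the original example.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])  # illustrative noisy values

# Closed-form least squares for one predictor:
# slope = covariance(x, y) / variance(x); the line passes through the means.
slope = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
intercept = y.mean() - slope * x.mean()

# Coefficient of determination: share of the variance in y explained by the line.
residuals = y - (intercept + slope * x)
r_squared = 1 - np.sum(residuals ** 2) / np.sum((y - y.mean()) ** 2)

print("Slope:", slope)
print("Intercept:", intercept)
print("R^2:", r_squared)
```

scikit-learn's `LinearRegression` should return the same slope and intercept for this data; it becomes the more convenient tool once there are several predictors.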
import numpy as np
from sklearn.linear_model import LinearRegression
x = np.array([1, 2, 3, 4, 5]).reshape(-1, 1)
y = np.array([2, 4, 6, 8, 10])
regression = LinearRegression()
regression.fit(x, y)
intercept = regression.intercept_ # intercept
slope = regression.coef_[0] # slope
print("Intercept:", intercept)
print("Slope:", slope)
6. Principal Component Analysis (PCA) : Reduces data dimensionality by projecting the data onto the directions of greatest variance (the principal components).
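How much is lost by dropping components depends on how much variance each one carries. The sketch below is not the scikit-learn implementation; it performs PCA with a plain NumPy SVD on the same 3x3 matrix used in this section, and the names `centered`, `explained_variance_ratio`, and `reduced` are illustrative.

```python
import numpy as np

data = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]], dtype=float)

# PCA operates on mean-centered data.
centered = data - data.mean(axis=0)

# SVD of the centered data: the rows of vt are the principal directions.
u, s, vt = np.linalg.svd(centered, full_matrices=False)

# Share of total variance carried by each component.
explained_variance_ratio = s ** 2 / np.sum(s ** 2)

# Project onto the first two principal directions.
reduced = centered @ vt[:2].T

print("Explained variance ratio:", explained_variance_ratio)
print("Reduced data:")
print(reduced)
```

The three rows of this matrix lie on a single line, so the first component carries essentially all of the variance and the second column of the projection is numerically zero. Component signs may differ from scikit-learn's output, since the sign of a principal direction is arbitrary.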
import numpy as np
from sklearn.decomposition import PCA
data = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
pca = PCA(n_components=2)
reduced_data = pca.fit_transform(data)
print("Reduced data:")
print(reduced_data)
7. Outlier Detection : Identifies anomalous observations, for example using a box plot.
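A box plot flags points visually; the same rule can be applied numerically. By default, matplotlib draws whiskers out to 1.5 times the interquartile range (IQR) beyond the quartiles, and anything further out is plotted as an outlier. A minimal sketch of that rule on the data used in this section, with illustrative variable names:

```python
import numpy as np

data = np.array([1, 2, 3, 4, 5, 10])

# First and third quartiles, and the interquartile range.
q1, q3 = np.percentile(data, [25, 75])
iqr = q3 - q1

# Points beyond 1.5 * IQR from the quartiles are flagged as outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = data[(data < lower_fence) | (data > upper_fence)]

print("Fences:", lower_fence, upper_fence)
print("Outliers:", outliers)
```

The 1.5 multiplier is a convention rather than a law; stricter or looser thresholds may suit other datasets.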
import matplotlib.pyplot as plt
data = [1, 2, 3, 4, 5, 10]
plt.boxplot(data)
plt.show()
8. Frequency Distribution : Calculates counts and frequencies of values and visualizes them with a histogram.
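For discrete values, counts per distinct value can be obtained without plotting or choosing bins. A short sketch using `np.unique` on the same data as the histogram example in this section; `values`, `counts`, and `relative` are illustrative names.

```python
import numpy as np

data = np.array([1, 2, 2, 3, 3, 3, 4, 4, 5])

# Distinct values and the number of times each one occurs.
values, counts = np.unique(data, return_counts=True)

# Relative frequencies sum to 1.
relative = counts / counts.sum()

for v, c, f in zip(values, counts, relative):
    print(f"value={v} count={c} frequency={f:.3f}")
```

This avoids the dependence on bin edges that a histogram introduces, which matters when the values are categories or small integers.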
import numpy as np
import matplotlib.pyplot as plt
data = np.array([1, 2, 2, 3, 3, 3, 4, 4, 5])
counts, bins, _ = plt.hist(data, bins=5)
plt.show()
print("Counts:", counts)
print("Frequencies:", counts / len(data))
9. Time‑Series Analysis : Examines trends and seasonality in data indexed by time.
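A simple way to expose a trend is to smooth short-term fluctuations with a moving average. The sketch below uses pandas' `rolling` on a made-up seven-day series; the values and the window size of 3 are assumptions chosen only for illustration.

```python
import pandas as pd

# Illustrative daily values with some noise around an upward trend.
series = pd.Series(
    [3, 5, 4, 6, 8, 7, 9],
    index=pd.date_range("2021-01-01", periods=7),
)

# 3-day moving average; the first two entries are NaN because the window is incomplete.
rolling_mean = series.rolling(window=3).mean()

print(rolling_mean)
```

Detecting seasonality generally needs a series that spans several full cycles, which short examples like this one do not.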
import pandas as pd
import matplotlib.pyplot as plt
data = pd.Series([1, 2, 3, 4, 5], index=pd.date_range('2021-01-01', periods=5))
data.plot()
plt.show()
10. Non‑Parametric Tests : Perform statistical inference without assuming a specific data distribution; the Mann‑Whitney U test is one example.
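The Mann‑Whitney U statistic is built from ranks rather than raw values, which is why it needs no distributional assumption. As a sketch of the idea, the following derives U from rank sums for the same two groups used in this section; `rank_sum_1`, `u1`, and `u2` are illustrative names.

```python
from scipy import stats

group1 = [1, 2, 3, 4, 5]
group2 = [2, 4, 6, 8, 10]

# Rank all observations together; tied values share the average rank.
ranks = stats.rankdata(group1 + group2)
rank_sum_1 = ranks[: len(group1)].sum()

n1, n2 = len(group1), len(group2)

# U for group1: its rank sum minus the smallest rank sum it could have.
u1 = rank_sum_1 - n1 * (n1 + 1) / 2

# The two U statistics always add up to n1 * n2.
u2 = n1 * n2 - u1

print("U1:", u1)
print("U2:", u2)
```

A U value far from n1 * n2 / 2 indicates that one group tends to rank higher than the other; the library call adds the p-value.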
from scipy import stats
group1 = [1, 2, 3, 4, 5]
group2 = [2, 4, 6, 8, 10]
u_statistic, p_value = stats.mannwhitneyu(group1, group2)
print("U statistic:", u_statistic)
print("p value:", p_value)
These examples cover the most common statistical methods used in data analysis, allowing you to select and implement the appropriate technique based on your specific data characteristics and analytical goals.
Test Development Learning Exchange