Unlock Hidden Patterns: A Practical Guide to Factor Analysis with Python
This guide explains factor analysis, a statistical technique for uncovering latent common factors among variables. It covers how the method differs from PCA, the procedural steps, adequacy tests, and a hands-on Python implementation using the factor_analyzer library, with visualizations and factor rotation methods.
Factor Analysis
Factor analysis is a common statistical method used to explore relationships among multiple variables, identify common factors, and combine them into fewer dimensions (factors) to explain variance.
For example, a questionnaire on personal health may contain variables such as weight, height, exercise frequency, and diet habits, which can be grouped into factors like "physical health" and "dietary habits".
Relationship with Principal Component Analysis (PCA)
Both are dimensionality‑reduction techniques, but their goals differ. PCA seeks linear combinations that retain maximal variance, while factor analysis focuses on uncovering latent common factors that explain correlations among observed variables.
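The contrast can be written compactly. The notation below is a standard textbook formulation rather than something taken from the source: x is the vector of standardized observed variables, f the common factors, A the loading matrix, and Ψ a diagonal matrix of unique variances. It assumes f and ε are uncorrelated.

```latex
\begin{aligned}
\text{Factor model:}\quad & x = A f + \varepsilon, \qquad
  \operatorname{Cov}(f) = I,\quad \operatorname{Cov}(\varepsilon) = \Psi \ (\text{diagonal}) \\
\text{Implied correlations:}\quad & R = A A^{\top} + \Psi \\
\text{PCA:}\quad & z = W^{\top} x, \qquad W = \text{leading eigenvectors of } R
\end{aligned}
```

In the factor model the off-diagonal entries of R come entirely from the shared loadings A, which is why the method targets correlations, whereas PCA simply re-expresses x along the directions of largest variance.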
Steps of Factor Analysis
The main steps are:
Standardize the data sample.
Compute the correlation matrix R.
Obtain eigenvalues and eigenvectors of R.
Determine the number of factors based on cumulative contribution.
Calculate the factor loading matrix A.
Finalize the factor model.
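The steps above can be sketched with NumPy alone. This is a minimal illustration under stated assumptions, not the factor_analyzer implementation: the data are synthetic, the 0.8 cumulative-contribution threshold is an arbitrary choice, and the loadings use the principal component method (eigenvectors scaled by the square roots of their eigenvalues).

```python
import numpy as np

# Synthetic data (assumption for illustration): 200 samples, 6 variables
# driven by 2 hidden factors plus noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
true_loadings = np.array([
    [0.9, 0.0], [0.8, 0.1], [0.7, 0.2],
    [0.1, 0.8], [0.0, 0.9], [0.2, 0.7],
])
X = latent @ true_loadings.T + 0.3 * rng.normal(size=(200, 6))

# 1. Standardize each column to zero mean and unit variance.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

# 2. Correlation matrix R.
R = np.corrcoef(Z, rowvar=False)

# 3. Eigenvalues and eigenvectors of R, sorted from largest to smallest.
eigvals, eigvecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# 4. Number of factors: smallest k whose cumulative contribution
#    reaches the chosen threshold (0.8 is an illustrative value).
cumulative = np.cumsum(eigvals) / eigvals.sum()
k = int(np.searchsorted(cumulative, 0.8) + 1)

# 5. Loading matrix A: first k eigenvectors scaled by sqrt(eigenvalue).
A = eigvecs[:, :k] * np.sqrt(eigvals[:k])

# 6. One way to summarize the resulting model: the communality of each
#    variable is the row sum of its squared loadings.
communalities = (A ** 2).sum(axis=1)

print("factors kept:", k)
print("loadings:\n", A.round(2))
print("communalities:", communalities.round(2))
```

Because R is a correlation matrix, its eigenvalues sum to the number of variables, so each eigenvalue divided by that count is the share of total variance attributed to the corresponding factor.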
factor_analyzer Library
A widely used Python library for factor analysis is factor_analyzer. Install it with pip install factor_analyzer. It supports fitting by principal factors, minimum residual (minres), and maximum likelihood, offers optional factor rotation and factor scores, and exposes methods to retrieve loadings, communalities, factor variance, and eigenvalues.
# Data processing
import pandas as pd
import numpy as np
# Plotting
import seaborn as sns
import matplotlib.pyplot as plt
plt.rcParams['font.sans-serif'] = ['SimHei']
plt.rcParams['axes.unicode_minus'] = False
# Factor analysis
from factor_analyzer import FactorAnalyzer
Load the dataset:
# Keep every column except the last three, which are assumed not to be analysis variables
df = pd.read_excel('data/grades2.xlsx', index_col=0).iloc[:, :-3]
df = df.dropna()
df.head()
Adequacy Tests
Before performing factor analysis, adequacy tests such as Bartlett’s test of sphericity and the KMO test assess whether variables are sufficiently correlated.
Bartlett’s Test
from factor_analyzer.factor_analyzer import calculate_bartlett_sphericity
chi_square_value, p_value = calculate_bartlett_sphericity(df)
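chi_square_value, p_value
Bartlett’s test checks the null hypothesis that the correlation matrix is an identity matrix. A small p-value (for example, below 0.05) rejects that hypothesis, which means the variables are correlated and factor analysis is appropriate.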
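To make the test less of a black box, here is a hedged sketch of the standard Bartlett statistic computed by hand with NumPy. The helper name bartlett_statistic and the synthetic data are illustrative assumptions. The p-value would come from the upper tail of a chi-square distribution with the returned degrees of freedom (for example via scipy.stats.chi2.sf), which is left out to keep the block dependency-light.

```python
import numpy as np

def bartlett_statistic(data):
    """Return (chi_square, degrees_of_freedom) for Bartlett's sphericity test."""
    data = np.asarray(data, dtype=float)
    n, p = data.shape
    R = np.corrcoef(data, rowvar=False)
    # slogdet is numerically safer than log(det(R)) when R is near-singular.
    _, log_det = np.linalg.slogdet(R)
    chi_square = -(n - 1 - (2 * p + 5) / 6) * log_det
    dof = p * (p - 1) // 2
    return chi_square, dof

# Synthetic comparison (assumption): four variables sharing one common
# component versus four independent noise variables.
rng = np.random.default_rng(1)
shared = rng.normal(size=(300, 1))
correlated = shared + 0.5 * rng.normal(size=(300, 4))
independent = rng.normal(size=(300, 4))

chi_corr, dof = bartlett_statistic(correlated)
chi_ind, _ = bartlett_statistic(independent)
print("correlated data: chi2 =", round(chi_corr, 1), "dof =", dof)
print("independent data: chi2 =", round(chi_ind, 1), "dof =", dof)
```

The statistic grows as the determinant of R moves away from 1, so strongly correlated variables produce a much larger value than independent ones.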
KMO Test
# KMO test
from factor_analyzer.factor_analyzer import calculate_kmo
kmo_all, kmo_model = calculate_kmo(df)
kmo_model
A KMO value above 0.6 (e.g., 0.885) indicates sufficient correlation for factor analysis.
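The overall KMO statistic compares squared correlations with squared partial correlations. The sketch below follows the usual textbook definition and is not a copy of the library's code. The function name kmo_overall and the synthetic data are assumptions made for illustration.

```python
import numpy as np

def kmo_overall(data):
    """Overall Kaiser-Meyer-Olkin measure of sampling adequacy."""
    data = np.asarray(data, dtype=float)
    R = np.corrcoef(data, rowvar=False)
    R_inv = np.linalg.inv(R)
    # Partial correlations are obtained from the inverse correlation matrix.
    d = np.sqrt(np.diag(R_inv))
    partial = -R_inv / np.outer(d, d)
    # Only off-diagonal entries enter the statistic.
    off_diag = ~np.eye(R.shape[0], dtype=bool)
    r2 = (R[off_diag] ** 2).sum()
    a2 = (partial[off_diag] ** 2).sum()
    return r2 / (r2 + a2)

# Synthetic comparison (assumption): variables with a shared component
# versus independent noise.
rng = np.random.default_rng(2)
shared = rng.normal(size=(300, 1))
correlated = shared + 0.5 * rng.normal(size=(300, 4))
independent = rng.normal(size=(300, 4))

kmo_corr = kmo_overall(correlated)
kmo_ind = kmo_overall(independent)
print("KMO, correlated data:", round(kmo_corr, 3))
print("KMO, independent data:", round(kmo_ind, 3))
```

When the correlations between variables are largely explained by the other variables, the partial correlations are small and KMO approaches 1. When variables are nearly independent, the two sums are comparable and KMO stays well below that.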
Choosing the Number of Factors
Compute eigenvalues of the correlation matrix and plot them (scree plot) to decide how many factors to retain.
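Alongside reading a scree plot by eye, one widely used rule of thumb is the Kaiser criterion, which keeps factors whose eigenvalue exceeds 1. The source does not name this rule, so treat the sketch as a complementary check. The eigenvalues are made up for illustration.

```python
# Hypothetical eigenvalues of a 6-variable correlation matrix,
# sorted in descending order (they sum to 6, the number of variables).
eigenvalues = [2.8, 1.5, 0.7, 0.5, 0.3, 0.2]

# Kaiser criterion: retain factors whose eigenvalue is greater than 1.
n_factors = sum(1 for value in eigenvalues if value > 1)

# Cumulative share of total variance after each successive factor.
total = sum(eigenvalues)
cumulative = []
running = 0.0
for value in eigenvalues:
    running += value / total
    cumulative.append(running)

print("factors with eigenvalue > 1:", n_factors)
print("cumulative contribution:", [round(c, 3) for c in cumulative])
```

An eigenvalue above 1 means the factor accounts for more variance than a single standardized variable does, which is the intuition behind the threshold.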
faa = FactorAnalyzer(25, rotation=None)
faa.fit(df)
# ev: eigenvalues of the correlation matrix; v: eigenvalues of the common-factor solution
ev, v = faa.get_eigenvalues()
ev, v
Visualize the eigenvalues:
# Scatter and line plot of eigenvalues
plt.scatter(range(1, df.shape[1] + 1), ev)
plt.plot(range(1, df.shape[1] + 1), ev)
plt.title("Scree Plot")
plt.xlabel("Factors")
plt.ylabel("Eigenvalue")
plt.grid()
plt.show()
Factor Rotation
Building the Factor Model
Here we choose varimax (maximum variance) rotation with two factors.
# Choose varimax rotation with 2 factors
faa_two = FactorAnalyzer(2, rotation='varimax')
faa_two.fit(df)
faa_two.get_communalities()
Other rotation options include promax, oblimin, oblimax, quartimin, quartimax, and equamax.
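A small NumPy sketch can show what an orthogonal rotation such as varimax does and does not change. The loading matrix and the 40-degree angle are hypothetical. Varimax would choose the rotation itself, and oblique methods such as promax or oblimin are not covered by this argument because their factors are allowed to correlate.

```python
import numpy as np

# Hypothetical unrotated loadings: 4 variables on 2 factors.
loadings = np.array([
    [0.7,  0.5],
    [0.6,  0.6],
    [0.7, -0.4],
    [0.6, -0.5],
])

# An orthogonal rotation by a hand-picked angle (illustrative only).
theta = np.deg2rad(40)
rotation = np.array([
    [np.cos(theta), -np.sin(theta)],
    [np.sin(theta),  np.cos(theta)],
])
rotated = loadings @ rotation

# Communality of each variable: row sum of squared loadings.
communalities_before = (loadings ** 2).sum(axis=1)
communalities_after = (rotated ** 2).sum(axis=1)

# Variance attributed to each factor: column sum of squared loadings.
ss_before = (loadings ** 2).sum(axis=0)
ss_after = (rotated ** 2).sum(axis=0)

print("communalities before:", communalities_before.round(3))
print("communalities after: ", communalities_after.round(3))
print("per-factor variance before:", ss_before.round(3))
print("per-factor variance after: ", ss_after.round(3))
```

The communalities and the total explained variance are unchanged by the rotation. Only the way that variance is split between the factors moves, which is what makes rotated loadings easier to interpret.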
Inspecting Factor Variance
faa_two.get_factor_variance()
Visualizing Loadings
df1 = pd.DataFrame(np.abs(faa_two.loadings_), index=df.columns)
ax = sns.heatmap(df1, annot=True, cmap="BuPu")
ax.yaxis.set_tick_params(labelsize=15)
plt.title("Factor Analysis", fontsize="xx-large")
plt.ylabel("Variables", fontsize="xx-large")
plt.show()
Transforming to New Variables
After confirming two factors are appropriate, transform the original data into two new features.
df2 = pd.DataFrame(faa_two.transform(df))
Reference: 洋洋菜鸟, 因子分析——python (Factor Analysis in Python), https://blog.csdn.net/qq_25990967/article/details/122566533
Model Perspective
Insights, knowledge, and enjoyment from a mathematical modeling researcher and educator. Hosted by Haihua Wang, a modeling instructor and author of "Clever Use of Chat for Mathematical Modeling", "Modeling: The Mathematics of Thinking", "Mathematical Modeling Practice: A Hands‑On Guide to Competitions", and co‑author of "Mathematical Modeling: Teaching Design and Cases".
