Unlock Hidden Patterns: A Practical Guide to Factor Analysis with Python
This guide explains factor analysis, a statistical technique for uncovering latent common factors among observed variables: how it differs from PCA, the procedural steps, adequacy tests, and a hands‑on Python implementation using the factor_analyzer library, with visualizations and factor rotation methods.
Factor Analysis
Factor analysis is a common statistical method used to explore relationships among multiple variables, identify common factors, and combine them into fewer dimensions (factors) to explain variance.
For example, a questionnaire on personal health may contain variables such as weight, height, exercise frequency, and diet habits, which can be grouped into factors like "physical health" and "dietary habits".
Relationship with Principal Component Analysis (PCA)
Both are dimensionality‑reduction techniques, but their goals differ. PCA seeks linear combinations that retain maximal variance, while factor analysis focuses on uncovering latent common factors that explain correlations among observed variables.
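To make the contrast concrete, here is a minimal sketch using scikit-learn (an assumption of this illustration; it is not used elsewhere in this guide) that fits both methods to the same synthetic data:

```python
import numpy as np
from sklearn.decomposition import PCA, FactorAnalysis

# Synthetic data: two latent factors drive six observed variables
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
W = rng.normal(size=(2, 6))
X = latent @ W + 0.3 * rng.normal(size=(200, 6))

# PCA: finds directions of maximal variance
pcs = PCA(n_components=2).fit_transform(X)

# Factor analysis: models the correlations via latent common factors
fas = FactorAnalysis(n_components=2).fit_transform(X)

print(pcs.shape, fas.shape)  # both reduce six variables to two dimensions
```

Both yield two-dimensional representations, but PCA's components are variance-maximizing linear combinations, while the factor model separates shared variance from variable-specific noise.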
Steps of Factor Analysis
The main steps are:
Standardize the data sample.
Compute the correlation matrix R.
Obtain eigenvalues and eigenvectors of R.
Determine the number of factors based on cumulative contribution.
Calculate the factor loading matrix A.
Finalize the factor model.
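The steps above can be sketched in plain NumPy. This illustration uses principal-component extraction of the loadings on hypothetical data; the factor_analyzer library introduced below automates all of this:

```python
import numpy as np

rng = np.random.default_rng(42)
X = rng.normal(size=(100, 5))  # hypothetical sample: 100 observations, 5 variables

# 1. Standardize the data
Z = (X - X.mean(axis=0)) / X.std(axis=0)

# 2. Correlation matrix R
R = np.corrcoef(Z, rowvar=False)

# 3. Eigenvalues and eigenvectors of R, sorted in descending order
eigval, eigvec = np.linalg.eigh(R)
order = np.argsort(eigval)[::-1]
eigval, eigvec = eigval[order], eigvec[:, order]

# 4. Number of factors m from cumulative contribution (e.g. >= 80%)
cum = np.cumsum(eigval) / eigval.sum()
m = int(np.searchsorted(cum, 0.80)) + 1

# 5. Factor loading matrix A: eigenvectors scaled by sqrt(eigenvalue)
A = eigvec[:, :m] * np.sqrt(eigval[:m])

# 6. The model approximates R as A @ A.T plus variable-specific variances
print(m, A.shape)
```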
factor_analyzer Library
The core Python library for factor analysis is factor_analyzer. Install it with pip install factor_analyzer. It provides principal-component, minimum-residual, and maximum-likelihood extraction, factor rotation, factor scores, and accessors for loadings, communalities, variance, and eigenvalues.
<code># Data processing
import pandas as pd
import numpy as np
# Plotting
import seaborn as sns
import matplotlib.pyplot as plt
plt.rcParams['font.sans-serif'] = ['SimHei']
plt.rcParams['axes.unicode_minus'] = False
# Factor analysis
from factor_analyzer import FactorAnalyzer
</code>Load the dataset:
<code>df = pd.read_excel('data/grades2.xlsx', index_col=0).iloc[:, :-3]  # drop the last three columns, keeping a DataFrame of variables
df = df.dropna()
df.head()
</code>Adequacy Tests
Before performing factor analysis, adequacy tests such as Bartlett’s test of sphericity and the KMO test assess whether variables are sufficiently correlated.
Bartlett’s Test
<code>from factor_analyzer.factor_analyzer import calculate_bartlett_sphericity
chi_square_value, p_value = calculate_bartlett_sphericity(df)
chi_square_value, p_value
</code>Bartlett’s test checks the null hypothesis that the correlation matrix is an identity matrix. A p‑value below 0.05 rejects this hypothesis: the variables are correlated and factor analysis is appropriate.
KMO Test
<code># KMO test
from factor_analyzer.factor_analyzer import calculate_kmo
kmo_all, kmo_model = calculate_kmo(df)
kmo_model
</code>A KMO value above 0.6 (e.g., 0.885) indicates sufficient correlation for factor analysis.
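Kaiser's widely cited rule-of-thumb labels for KMO values can be encoded as a small helper (illustrative; the function name is ours, not part of factor_analyzer):

```python
def kmo_verdict(kmo):
    """Map a KMO statistic to Kaiser's rule-of-thumb label."""
    thresholds = [(0.9, "marvelous"), (0.8, "meritorious"), (0.7, "middling"),
                  (0.6, "mediocre"), (0.5, "miserable")]
    for cutoff, label in thresholds:
        if kmo >= cutoff:
            return label
    return "unacceptable"  # below 0.5: factor analysis is not advisable

print(kmo_verdict(0.885))  # the example value above falls in the 0.8-0.9 band
```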
Choosing the Number of Factors
Compute eigenvalues of the correlation matrix and plot them (scree plot) to decide how many factors to retain.
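Besides eyeballing the scree plot, the Kaiser criterion offers a quick rule: retain only factors whose eigenvalue exceeds 1. A minimal sketch with hypothetical eigenvalues:

```python
import numpy as np

ev = np.array([3.1, 1.6, 0.9, 0.7, 0.4, 0.3])  # hypothetical sorted eigenvalues
n_factors = int((ev > 1).sum())  # Kaiser criterion: keep eigenvalues above 1
print(n_factors)  # -> 2
```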
<code>faa = FactorAnalyzer(25, rotation=None)  # no rotation; n_factors set high so all eigenvalues are extracted
faa.fit(df)
ev, v = faa.get_eigenvalues()  # original eigenvalues and common-factor eigenvalues
ev, v
</code>Visualize the eigenvalues:
<code># Scatter and line plot of eigenvalues
plt.scatter(range(1, df.shape[1] + 1), ev)
plt.plot(range(1, df.shape[1] + 1), ev)
plt.title("Scree Plot")
plt.xlabel("Factors")
plt.ylabel("Eigenvalue")
plt.grid()
plt.show()
</code>Factor Rotation
Building the Factor Model
Here we choose varimax (maximum variance) rotation with two factors.
<code># Choose varimax rotation with 2 factors
faa_two = FactorAnalyzer(2, rotation='varimax')
faa_two.fit(df)
faa_two.get_communalities()  # variance of each variable explained by the common factors
</code>Other rotation options include promax, oblimin, oblimax, quartimin, quartimax, and equamax.
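For intuition about what varimax does, the rotation can be sketched directly in NumPy (the standard SVD-based iteration; factor_analyzer performs this internally when rotation='varimax'):

```python
import numpy as np

def varimax(loadings, max_iter=100, tol=1e-6):
    """Orthogonal varimax rotation (Kaiser's criterion, SVD-based iteration)."""
    p, k = loadings.shape
    R = np.eye(k)
    d = 0.0
    for _ in range(max_iter):
        L = loadings @ R
        # SVD of the gradient of the varimax criterion
        u, s, vt = np.linalg.svd(
            loadings.T @ (L**3 - L * (L**2).sum(axis=0) / p)
        )
        R = u @ vt
        d_new = s.sum()
        if d_new < d * (1 + tol):
            break
        d = d_new
    return loadings @ R

# Demo: rotation leaves each variable's communality unchanged
rng = np.random.default_rng(0)
A = rng.normal(size=(6, 2))   # hypothetical unrotated loadings
A_rot = varimax(A)
print(np.allclose((A_rot**2).sum(axis=1), (A**2).sum(axis=1)))  # True
```

Because the rotation is orthogonal, each variable's communality is preserved; only the distribution of loadings across factors changes, making them easier to interpret.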
Inspecting Factor Variance
<code>faa_two.get_factor_variance()  # variance, proportion of variance, and cumulative proportion per factor
</code>Visualizing Loadings
<code>df1 = pd.DataFrame(np.abs(faa_two.loadings_), index=df.columns)
ax = sns.heatmap(df1, annot=True, cmap="BuPu")
ax.yaxis.set_tick_params(labelsize=15)
plt.title("Factor Analysis", fontsize="xx-large")
plt.ylabel("Sepal Width", fontsize="xx-large")
plt.show()
</code>Transforming to New Variables
After confirming two factors are appropriate, transform the original data into two new features.
<code>df2 = pd.DataFrame(faa_two.transform(df))  # factor scores: one column per factor
</code>Reference: 洋洋菜鸟, "Factor Analysis with Python" (因子分析——python), https://blog.csdn.net/qq_25990967/article/details/122566533
Model Perspective
Insights, knowledge, and enjoyment from a mathematical modeling researcher and educator. Hosted by Haihua Wang, a modeling instructor and author of "Clever Use of Chat for Mathematical Modeling", "Modeling: The Mathematics of Thinking", "Mathematical Modeling Practice: A Hands‑On Guide to Competitions", and co‑author of "Mathematical Modeling: Teaching Design and Cases".