
Unlock Hidden Patterns: A Practical Guide to Factor Analysis with Python

Factor analysis, a statistical technique for uncovering underlying common factors among variables, is explained alongside its distinction from PCA, detailed procedural steps, adequacy tests, and a hands‑on Python implementation using the factor_analyzer library with visualizations and factor rotation methods.


Factor Analysis

Factor analysis is a common statistical method used to explore the relationships among multiple variables, identify the common factors underlying them, and reduce the data to fewer dimensions (factors) that explain the shared variance.

For example, a questionnaire on personal health may contain variables such as weight, height, exercise frequency, and diet habits, which can be grouped into factors like "physical health" and "dietary habits".

Relationship with Principal Component Analysis (PCA)

Both are dimensionality‑reduction techniques, but their goals differ. PCA seeks linear combinations that retain maximal variance, while factor analysis focuses on uncovering latent common factors that explain correlations among observed variables.
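To make the contrast concrete, here is a minimal sketch using scikit-learn (an assumption for this comparison only; the rest of this guide uses factor_analyzer). PCA keeps the orthogonal directions of maximal variance, while factor analysis additionally estimates a per-variable noise term. The data here are synthetic.

```python
import numpy as np
from sklearn.decomposition import PCA, FactorAnalysis

# Synthetic data: 150 observations of 5 correlated variables (hypothetical)
rng = np.random.default_rng(0)
X = rng.normal(size=(150, 5)) @ rng.normal(size=(5, 5))

# PCA: linear combinations that retain maximal variance
X_pca = PCA(n_components=2).fit_transform(X)

# Factor analysis: latent common factors plus a noise variance per variable
fa = FactorAnalysis(n_components=2).fit(X)
X_fa = fa.transform(X)
```

Both reduce five columns to two, but only the factor model exposes a per-variable noise estimate (`fa.noise_variance_`), reflecting its goal of explaining correlations rather than raw variance.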

Steps of Factor Analysis

The main steps are:

1. Standardize the data sample.
2. Compute the correlation matrix R.
3. Obtain the eigenvalues and eigenvectors of R.
4. Determine the number of factors based on the cumulative contribution.
5. Calculate the factor loading matrix A.
6. Finalize the factor model.
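The steps above can be sketched end-to-end in plain NumPy on synthetic data. This is a minimal illustration using principal-component extraction of the loadings; the 80% cutoff is an illustrative choice, not a rule.

```python
import numpy as np

# Synthetic sample: 200 observations of 6 variables driven by 2 latent factors
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
X = latent @ rng.normal(size=(2, 6)) + 0.5 * rng.normal(size=(200, 6))

# 1. Standardize the sample
Z = (X - X.mean(axis=0)) / X.std(axis=0)

# 2. Correlation matrix R
R = np.corrcoef(Z, rowvar=False)

# 3. Eigenvalues and eigenvectors of R, sorted in descending order
eigvals, eigvecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# 4. Number of factors from the cumulative contribution (80% threshold here)
cumulative = np.cumsum(eigvals) / eigvals.sum()
k = int(np.argmax(cumulative >= 0.80) + 1)

# 5. Factor loading matrix A (principal-component extraction)
A = eigvecs[:, :k] * np.sqrt(eigvals[:k])
```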

factor_analyzer Library

The core Python library for factor analysis is factor_analyzer. Install it with pip install factor_analyzer. It supports several extraction methods (principal components, minimum residual, maximum likelihood), factor rotation, and factor scores, along with methods to retrieve loadings, communalities, explained variance, and eigenvalues.

<code># Data processing
import pandas as pd
import numpy as np

# Plotting
import seaborn as sns
import matplotlib.pyplot as plt
# Font settings for Chinese labels (optional if your labels are English)
plt.rcParams['font.sans-serif'] = ['SimHei']
plt.rcParams['axes.unicode_minus'] = False

# Factor analysis
from factor_analyzer import FactorAnalyzer
</code>

Load the dataset:

<code>df = pd.read_excel('data/grades2.xlsx', index_col=0).iloc[:, -3]
df = df.dropna()
df.head()
</code>

Adequacy Tests

Before performing factor analysis, adequacy tests such as Bartlett’s test of sphericity and the KMO test assess whether variables are sufficiently correlated.

Bartlett’s Test

<code>from factor_analyzer.factor_analyzer import calculate_bartlett_sphericity

chi_square_value, p_value = calculate_bartlett_sphericity(df)
chi_square_value, p_value
</code>

Bartlett's test checks whether the correlation matrix is an identity matrix. A significant result (p < 0.05) means the variables are correlated, so factor analysis is appropriate.
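For intuition, the statistic can also be computed by hand. A sketch on synthetic stand-in data, assuming SciPy is available for the chi-square tail probability (factor_analyzer does all of this for you):

```python
import numpy as np
from scipy import stats

# Synthetic correlated data standing in for df (hypothetical)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5))
n, p = X.shape
R = np.corrcoef(X, rowvar=False)

# Bartlett statistic: -(n - 1 - (2p + 5)/6) * ln|R|, chi-square with p(p-1)/2 df
chi_square = -(n - 1 - (2 * p + 5) / 6) * np.log(np.linalg.det(R))
dof = p * (p - 1) / 2
p_value = stats.chi2.sf(chi_square, dof)
```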

KMO Test

<code># KMO test
from factor_analyzer.factor_analyzer import calculate_kmo
kmo_all, kmo_model = calculate_kmo(df)
kmo_model
</code>

A KMO value above 0.6 (e.g., 0.885) indicates sufficient correlation for factor analysis.
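The overall KMO statistic compares squared correlations with squared partial correlations: when partial correlations are small relative to raw ones, the variables share common variance. A NumPy sketch on synthetic stand-in data:

```python
import numpy as np

# Synthetic correlated data standing in for df (hypothetical)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5))
R = np.corrcoef(X, rowvar=False)

# Anti-image (partial) correlations from the inverse of R
inv_R = np.linalg.inv(R)
scale = np.sqrt(np.outer(np.diag(inv_R), np.diag(inv_R)))
partial = -inv_R / scale

# Overall KMO: squared correlations vs. squared partial correlations
mask = ~np.eye(R.shape[0], dtype=bool)
r2 = (R[mask] ** 2).sum()
a2 = (partial[mask] ** 2).sum()
kmo = r2 / (r2 + a2)
```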

Choosing the Number of Factors

Compute eigenvalues of the correlation matrix and plot them (scree plot) to decide how many factors to retain.

<code>faa = FactorAnalyzer(25, rotation=None)
faa.fit(df)
ev, v = faa.get_eigenvalues()
ev, v
</code>
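The cumulative-contribution rule from the steps above can be applied directly to the eigenvalue array. The values below are illustrative stand-ins; in practice, use the ev returned by get_eigenvalues().

```python
import numpy as np

# Stand-in eigenvalue array (hypothetical)
ev = np.array([4.2, 2.1, 0.9, 0.4, 0.25, 0.15])

# Contribution of each factor and the running total
proportion = ev / ev.sum()
cumulative = np.cumsum(proportion)

# Smallest number of factors reaching an 80% cumulative contribution
n_factors = int(np.argmax(cumulative >= 0.80) + 1)
print(n_factors)  # 3
```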

Visualize the eigenvalues:

<code># Scatter and line plot of eigenvalues
plt.scatter(range(1, df.shape[1] + 1), ev)
plt.plot(range(1, df.shape[1] + 1), ev)
plt.title("Scree Plot")
plt.xlabel("Factors")
plt.ylabel("Eigenvalue")
plt.grid()
plt.show()
</code>

Factor Rotation

Building the Factor Model

Here we choose varimax (maximum variance) rotation with two factors.

<code># Choose varimax rotation with 2 factors
faa_two = FactorAnalyzer(2, rotation='varimax')
faa_two.fit(df)
faa_two.get_communalities()
</code>

Other rotation options include promax, oblimin, oblimax, quartimin, quartimax, and equamax.
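Varimax itself is a small orthogonal-rotation algorithm that iteratively maximizes the variance of the squared loadings so each variable loads strongly on few factors. A minimal NumPy sketch (not factor_analyzer's implementation):

```python
import numpy as np

def varimax(loadings, max_iter=100, tol=1e-6):
    """Varimax rotation of a loading matrix (minimal sketch)."""
    p, k = loadings.shape
    rotation = np.eye(k)
    d = 0.0
    for _ in range(max_iter):
        L = loadings @ rotation
        # Gradient of the varimax criterion
        grad = loadings.T @ (L ** 3 - L @ np.diag((L ** 2).sum(axis=0)) / p)
        u, s, vt = np.linalg.svd(grad)
        rotation = u @ vt
        d_new = s.sum()
        if d_new < d * (1 + tol):
            break
        d = d_new
    return loadings @ rotation

rng = np.random.default_rng(0)
L = rng.normal(size=(6, 2))
rotated = varimax(L)

# An orthogonal rotation never changes the communalities (row sums of squares)
assert np.allclose((L ** 2).sum(axis=1), (rotated ** 2).sum(axis=1))
```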

Inspecting Factor Variance

<code>faa_two.get_factor_variance()
</code>
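get_factor_variance() returns three arrays: the variance per factor (sum of squared loadings), the proportional variance, and the cumulative variance. These can be reproduced from any loading matrix; the matrix below is a hypothetical stand-in for faa_two.loadings_.

```python
import numpy as np

# Hypothetical 6x2 loading matrix (stand-in for faa_two.loadings_)
loadings = np.array([
    [0.80, 0.10], [0.75, 0.20], [0.70, 0.15],
    [0.10, 0.85], [0.20, 0.80], [0.15, 0.70],
])

n_vars = loadings.shape[0]
ss_loadings = (loadings ** 2).sum(axis=0)  # variance explained per factor
proportional = ss_loadings / n_vars        # share of total variance
cumulative = np.cumsum(proportional)       # running total across factors
```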

Visualizing Loadings

<code>df1 = pd.DataFrame(np.abs(faa_two.loadings_), index=df.columns)
ax = sns.heatmap(df1, annot=True, cmap="BuPu")
ax.yaxis.set_tick_params(labelsize=15)
plt.title("Factor Analysis", fontsize="xx-large")
plt.ylabel("Variables", fontsize="xx-large")
plt.show()
</code>

Transforming to New Variables

After confirming two factors are appropriate, transform the original data into two new features.

<code>df2 = pd.DataFrame(faa_two.transform(df))
</code>
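transform() computes factor scores for each observation. One common approach is the regression (Thomson) method, F = Z R⁻¹ A; a NumPy sketch with stand-in data and loadings (hypothetical, for illustration only):

```python
import numpy as np

# Stand-ins: standardized data Z and a loading matrix A (hypothetical)
rng = np.random.default_rng(0)
Z = rng.normal(size=(100, 6))
Z = (Z - Z.mean(axis=0)) / Z.std(axis=0)
A = rng.normal(size=(6, 2)) * 0.5

# Regression method for factor scores: F = Z @ inv(R) @ A
R = np.corrcoef(Z, rowvar=False)
scores = Z @ np.linalg.solve(R, A)
```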

Reference: 洋洋菜鸟, "Factor Analysis with Python" (因子分析——python), https://blog.csdn.net/qq_25990967/article/details/122566533

Tags: python, statistics, data preprocessing, dimensionality reduction, factor analysis, factor_analyzer
Written by

Model Perspective

Insights, knowledge, and enjoyment from a mathematical modeling researcher and educator. Hosted by Haihua Wang, a modeling instructor and author of "Clever Use of Chat for Mathematical Modeling", "Modeling: The Mathematics of Thinking", "Mathematical Modeling Practice: A Hands‑On Guide to Competitions", and co‑author of "Mathematical Modeling: Teaching Design and Cases".
