Unlock Hidden Patterns: A Practical Guide to Factor Analysis with Python
This guide explains factor analysis, a statistical technique for uncovering latent common factors among observed variables: how it differs from PCA, the procedural steps, adequacy tests, and a hands‑on Python implementation using the factor_analyzer library, with visualizations and factor rotation methods.
Factor Analysis
Factor analysis is a common statistical method used to explore relationships among multiple variables, identify common factors, and combine them into fewer dimensions (factors) to explain variance.
For example, a questionnaire on personal health may contain variables such as weight, height, exercise frequency, and diet habits, which can be grouped into factors like "physical health" and "dietary habits".
Relationship with Principal Component Analysis (PCA)
Both are dimensionality‑reduction techniques, but their goals differ. PCA seeks linear combinations that retain maximal variance, while factor analysis focuses on uncovering latent common factors that explain correlations among observed variables.
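To make the contrast concrete, here is a minimal sketch using scikit-learn (an assumption of this illustration; it is not used elsewhere in this guide) that fits both methods to the same synthetic data:

```python
import numpy as np
from sklearn.decomposition import PCA, FactorAnalysis

# Synthetic data: two latent factors drive six observed variables
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
W = rng.normal(size=(2, 6))
X = latent @ W + 0.3 * rng.normal(size=(200, 6))

# PCA: finds directions of maximal variance
pcs = PCA(n_components=2).fit_transform(X)

# Factor analysis: models the correlations via latent common factors
fas = FactorAnalysis(n_components=2).fit_transform(X)

print(pcs.shape, fas.shape)  # both reduce six variables to two dimensions
```

Both yield two-dimensional representations, but PCA's components are variance-maximizing linear combinations, while the factor model separates shared variance from variable-specific noise.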
Steps of Factor Analysis
The main steps are:
Standardize the data sample.
Compute the correlation matrix R.
Obtain eigenvalues and eigenvectors of R.
Determine the number of factors based on cumulative contribution.
Calculate the factor loading matrix A.
Finalize the factor model.
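The steps above can be sketched in plain NumPy. This illustration uses principal-component extraction of the loadings on hypothetical data; the factor_analyzer library introduced below automates all of this:

```python
import numpy as np

rng = np.random.default_rng(42)
X = rng.normal(size=(100, 5))  # hypothetical sample: 100 observations, 5 variables

# 1. Standardize the data
Z = (X - X.mean(axis=0)) / X.std(axis=0)

# 2. Correlation matrix R
R = np.corrcoef(Z, rowvar=False)

# 3. Eigenvalues and eigenvectors of R, sorted in descending order
eigval, eigvec = np.linalg.eigh(R)
order = np.argsort(eigval)[::-1]
eigval, eigvec = eigval[order], eigvec[:, order]

# 4. Number of factors m from cumulative contribution (e.g. >= 80%)
cum = np.cumsum(eigval) / eigval.sum()
m = int(np.searchsorted(cum, 0.80)) + 1

# 5. Factor loading matrix A: eigenvectors scaled by sqrt(eigenvalue)
A = eigvec[:, :m] * np.sqrt(eigval[:m])

# 6. The model approximates R as A @ A.T plus variable-specific variances
print(m, A.shape)
```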
factor_analyzer Library
The core Python library for factor analysis is factor_analyzer. Install it with pip install factor_analyzer. It provides principal-component, minimum-residual, and maximum-likelihood extraction, factor rotation, factor scores, and accessors for loadings, communalities, variance, and eigenvalues.
<code># Data processing
import pandas as pd
import numpy as np
# Plotting
import seaborn as sns
import matplotlib.pyplot as plt
plt.rcParams['font.sans-serif'] = ['SimHei']
plt.rcParams['axes.unicode_minus'] = False
# Factor analysis
from factor_analyzer import FactorAnalyzer
</code>Load the dataset:
<code>df = pd.read_excel('data/grades2.xlsx', index_col=0).iloc[:, :-3]  # drop the last three columns, keeping a DataFrame of variables
df = df.dropna()
df.head()
</code>Adequacy Tests
Before performing factor analysis, adequacy tests such as Bartlett’s test of sphericity and the KMO test assess whether variables are sufficiently correlated.
Bartlett’s Test
<code>from factor_analyzer.factor_analyzer import calculate_bartlett_sphericity
chi_square_value, p_value = calculate_bartlett_sphericity(df)
chi_square_value, p_value
</code>Bartlett’s test checks the null hypothesis that the correlation matrix is an identity matrix. A p‑value below 0.05 rejects this hypothesis: the variables are correlated and factor analysis is appropriate.
KMO Test
<code># KMO test
from factor_analyzer.factor_analyzer import calculate_kmo
kmo_all, kmo_model = calculate_kmo(df)
kmo_model
</code>A KMO value above 0.6 (e.g., 0.885) indicates sufficient correlation for factor analysis.
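Kaiser's widely cited rule-of-thumb labels for KMO values can be encoded as a small helper (illustrative; the function name is ours, not part of factor_analyzer):

```python
def kmo_verdict(kmo):
    """Map a KMO statistic to Kaiser's rule-of-thumb label."""
    thresholds = [(0.9, "marvelous"), (0.8, "meritorious"), (0.7, "middling"),
                  (0.6, "mediocre"), (0.5, "miserable")]
    for cutoff, label in thresholds:
        if kmo >= cutoff:
            return label
    return "unacceptable"  # below 0.5: factor analysis is not advisable

print(kmo_verdict(0.885))  # the example value above falls in the 0.8-0.9 band
```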
Choosing the Number of Factors
Compute eigenvalues of the correlation matrix and plot them (scree plot) to decide how many factors to retain.
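Besides eyeballing the scree plot, the Kaiser criterion offers a quick rule: retain only factors whose eigenvalue exceeds 1. A minimal sketch with hypothetical eigenvalues:

```python
import numpy as np

ev = np.array([3.1, 1.6, 0.9, 0.7, 0.4, 0.3])  # hypothetical sorted eigenvalues
n_factors = int((ev > 1).sum())  # Kaiser criterion: keep eigenvalues above 1
print(n_factors)  # -> 2
```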
<code>faa = FactorAnalyzer(25, rotation=None)  # no rotation; n_factors set high so all eigenvalues are extracted
faa.fit(df)
ev, v = faa.get_eigenvalues()  # original eigenvalues and common-factor eigenvalues
ev, v
</code>Visualize the eigenvalues:
<code># Scatter and line plot of eigenvalues
plt.scatter(range(1, df.shape[1] + 1), ev)
plt.plot(range(1, df.shape[1] + 1), ev)
plt.title("Scree Plot")
plt.xlabel("Factors")
plt.ylabel("Eigenvalue")
plt.grid()
plt.show()
</code>Factor Rotation
Building the Factor Model
Here we choose varimax (maximum variance) rotation with two factors.
<code># Choose varimax rotation with 2 factors
faa_two = FactorAnalyzer(2, rotation='varimax')
faa_two.fit(df)
faa_two.get_communalities()  # variance of each variable explained by the common factors
</code>Other rotation options include promax, oblimin, oblimax, quartimin, quartimax, and equamax.
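For intuition about what varimax does, the rotation can be sketched directly in NumPy (the standard SVD-based iteration; factor_analyzer performs this internally when rotation='varimax'):

```python
import numpy as np

def varimax(loadings, max_iter=100, tol=1e-6):
    """Orthogonal varimax rotation (Kaiser's criterion, SVD-based iteration)."""
    p, k = loadings.shape
    R = np.eye(k)
    d = 0.0
    for _ in range(max_iter):
        L = loadings @ R
        # SVD of the gradient of the varimax criterion
        u, s, vt = np.linalg.svd(
            loadings.T @ (L**3 - L * (L**2).sum(axis=0) / p)
        )
        R = u @ vt
        d_new = s.sum()
        if d_new < d * (1 + tol):
            break
        d = d_new
    return loadings @ R

# Demo: rotation leaves each variable's communality unchanged
rng = np.random.default_rng(0)
A = rng.normal(size=(6, 2))   # hypothetical unrotated loadings
A_rot = varimax(A)
print(np.allclose((A_rot**2).sum(axis=1), (A**2).sum(axis=1)))  # True
```

Because the rotation is orthogonal, each variable's communality is preserved; only the distribution of loadings across factors changes, making them easier to interpret.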
Inspecting Factor Variance
<code>faa_two.get_factor_variance()  # variance, proportion of variance, and cumulative proportion per factor
</code>Visualizing Loadings
<code>df1 = pd.DataFrame(np.abs(faa_two.loadings_), index=df.columns)
ax = sns.heatmap(df1, annot=True, cmap="BuPu")
ax.yaxis.set_tick_params(labelsize=15)
plt.title("Factor Analysis", fontsize="xx-large")
plt.ylabel("Sepal Width", fontsize="xx-large")
plt.show()
</code>Transforming to New Variables
After confirming two factors are appropriate, transform the original data into two new features.
<code>df2 = pd.DataFrame(faa_two.transform(df))  # factor scores: one column per factor
</code>Reference: 洋洋菜鸟, "Factor Analysis with Python" (因子分析——python), https://blog.csdn.net/qq_25990967/article/details/122566533
Model Perspective
Insights, knowledge, and enjoyment from a mathematical modeling researcher and educator. Hosted by Haihua Wang, a modeling instructor and author of "Clever Use of Chat for Mathematical Modeling", "Modeling: The Mathematics of Thinking", "Mathematical Modeling Practice: A Hands‑On Guide to Competitions", and co‑author of "Mathematical Modeling: Teaching Design and Cases".