Master Factor Analysis in Python: From Theory to Practical Implementation
This article explains the origins and core concepts of factor analysis, outlines its algorithmic steps, demonstrates how to perform the analysis using Python's factor_analyzer library—including data preparation, adequacy tests, eigenvalue selection, rotation, and visualization—culminating in extracting new latent variables.
Origin
Factor analysis originated in 1904, when the British psychologist Charles Spearman observed strong correlations among students' English, French, and classical language scores and hypothesized a common underlying factor, which he called "language ability". This insight led to the definition of factor analysis as a method for uncovering hidden common factors behind correlated variables.
Basic Idea
The basic idea is illustrated with a student who scores perfectly in mathematics, physics, chemistry, and biology, suggesting a strong "rational thinking" factor that drives high scores in science subjects. Factor analysis assumes that observed variables are generated by one or more latent variables (factors) that cannot be measured directly.
It reduces a set of correlated variables to a smaller number of factors that represent the original variables and can be used for classification.
Algorithm Uses
Factor analysis, similar to principal component analysis, aims to describe hidden, unobservable variables underlying a set of measured variables and can be used for comprehensive evaluation.
By exploiting correlations among indicators, it infers latent common factors that jointly influence the indicators, reducing the number of variables while preserving essential information.
Steps of Factor Analysis
Standardize the data sample.
Compute the correlation matrix R.
Obtain eigenvalues and eigenvectors of R.
Determine the number of principal factors based on cumulative contribution.
Calculate the factor loading matrix A.
Finalize the factor model.
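As a sketch under stated assumptions (synthetic data with one built-in latent factor; the 80% threshold is illustrative), the six steps above can be carried out directly with NumPy:

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic sample: one latent factor drives four observed variables
latent = rng.normal(size=(200, 1))
X = latent @ np.array([[0.9, 0.8, 0.7, 0.6]]) + 0.4 * rng.normal(size=(200, 4))

# 1. Standardize the data sample
Z = (X - X.mean(axis=0)) / X.std(axis=0)
# 2. Compute the correlation matrix R
R = np.corrcoef(Z, rowvar=False)
# 3. Eigenvalues and eigenvectors of R, sorted in descending order
eigvals, eigvecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]
# 4. Number of factors from cumulative contribution (80% threshold here)
cum = np.cumsum(eigvals) / eigvals.sum()
k = int(np.searchsorted(cum, 0.80)) + 1
# 5. Loading matrix A: eigenvectors scaled by sqrt of their eigenvalues
A = eigvecs[:, :k] * np.sqrt(eigvals[:k])
# 6. The finalized model keeps the first k factors
print(k, A.shape)
```

This is the principal-component style of extraction; the factor_analyzer library used below wraps a similar pipeline behind a scikit-learn-like interface.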
factor_analyzer Library
The core Python library for factor analysis is factor_analyzer, whose main modules are:
factor_analyzer.factor_analyzer (key module: the FactorAnalyzer class plus the adequacy tests calculate_bartlett_sphericity and calculate_kmo)
factor_analyzer.rotator (the Rotator class used for factor rotation)
Detailed Example
Using a student grades dataset, the following code demonstrates the workflow.
<code># Data processing
import pandas as pd
import numpy as np
# Plotting
import seaborn as sns
import matplotlib.pyplot as plt
plt.rcParams['font.sans-serif'] = ['SimHei'] # SimHei font so Chinese labels render
plt.rcParams['axes.unicode_minus'] = False # Fix minus sign display
# Factor analysis
from factor_analyzer import FactorAnalyzer
</code>Load the data:
<code>df = pd.read_excel('data/grades2.xlsx', index_col=0).iloc[:, :-3]
df = df.dropna()
df.head()
</code>Adequacy Tests
Before performing factor analysis, test the adequacy of the correlation matrix.
Bartlett's Sphericity Test
<code>from factor_analyzer.factor_analyzer import calculate_bartlett_sphericity
chi_square_value, p_value = calculate_bartlett_sphericity(df)
chi_square_value, p_value
</code>Result: (638.4879, 2.33e-126). The p-value is effectively zero, so we reject the null hypothesis that the correlation matrix is an identity matrix: the variables are correlated enough for factor analysis.
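The statistic itself is easy to reproduce. A minimal sketch on synthetic stand-in data (the grades file is assumed unavailable here), using Bartlett's formula chi2 = -(n - 1 - (2p + 5)/6) * ln|R| with p(p - 1)/2 degrees of freedom:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
# Synthetic correlated data standing in for the grades table
latent = rng.normal(size=(150, 1))
X = latent + 0.5 * rng.normal(size=(150, 4))

n, p = X.shape
R = np.corrcoef(X, rowvar=False)
# Bartlett's sphericity statistic and its chi-squared p-value
chi_square = -(n - 1 - (2 * p + 5) / 6) * np.log(np.linalg.det(R))
dof = p * (p - 1) / 2
p_value = stats.chi2.sf(chi_square, dof)
print(chi_square, p_value)
```

Strongly correlated columns give a near-singular R, a small determinant, and hence a large statistic with a tiny p-value, exactly the pattern seen in the result above.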
KMO Test
<code>from factor_analyzer.factor_analyzer import calculate_kmo
kmo_all, kmo_model = calculate_kmo(df)
kmo_model
</code>Result: 0.8849 (>0.6), confirming suitability for factor analysis.
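The KMO measure compares raw correlations with partial correlations: if the partials stay small, the correlations are plausibly driven by shared factors. A minimal sketch on synthetic stand-in data:

```python
import numpy as np

rng = np.random.default_rng(2)
# Synthetic correlated data standing in for the grades table
latent = rng.normal(size=(200, 1))
X = latent + 0.5 * rng.normal(size=(200, 4))

R = np.corrcoef(X, rowvar=False)
# Partial correlations come from the inverse correlation matrix
Rinv = np.linalg.inv(R)
scale = np.sqrt(np.outer(np.diag(Rinv), np.diag(Rinv)))
partial = -Rinv / scale

off = ~np.eye(R.shape[0], dtype=bool)  # off-diagonal mask
r2 = (R[off] ** 2).sum()
q2 = (partial[off] ** 2).sum()
kmo = r2 / (r2 + q2)  # approaches 1 when partials are small
print(kmo)
```

By convention, values above 0.6 are considered adequate and values above 0.8 good, so the 0.8849 reported above comfortably clears the bar.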
Selecting the Number of Factors
Compute eigenvalues of the correlation matrix and sort them in descending order.
Eigenvalues and Eigenvectors
<code># Extract as many factors as there are variables, without rotation
faa = FactorAnalyzer(df.shape[1], rotation=None)
faa.fit(df)
# get_eigenvalues returns the original eigenvalues and the common-factor eigenvalues
ev, cfev = faa.get_eigenvalues()
ev
</code>Eigenvalues: [3.7605, 0.7315, 0.4438, 0.3891, 0.3708, 0.3043]
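Two common selection rules can be checked directly on the eigenvalues reported above: the Kaiser criterion (keep factors whose eigenvalue exceeds 1) and the cumulative contribution of the leading factors:

```python
import numpy as np

# Eigenvalues of the correlation matrix reported above
ev = np.array([3.7605, 0.7315, 0.4438, 0.3891, 0.3708, 0.3043])

# Kaiser criterion: number of eigenvalues greater than 1
kaiser_k = int((ev > 1).sum())
# Contribution of each factor and cumulative contribution of the first two
contrib = ev / ev.sum()
cum_two = contrib[:2].sum()
print(kaiser_k, cum_two)
```

The Kaiser rule alone would keep a single factor here; keeping two, as the article does below, raises the cumulative contribution to roughly 75%.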
Visualization
<code># Scatter and line plot of eigenvalues
plt.scatter(range(1, df.shape[1] + 1), ev)
plt.plot(range(1, df.shape[1] + 1), ev)
plt.title('Scree Plot')
plt.xlabel('Factors')
plt.ylabel('Eigenvalue')
plt.grid()
plt.show()
</code>Factor Rotation
Building the Factor Model
Choose varimax (maximum variance) rotation with two factors.
<code># Choose varimax rotation with 2 factors
faa_two = FactorAnalyzer(2, rotation='varimax')
faa_two.fit(df)
# Communalities (shared variance)
faa_two.get_communalities()
</code>Communalities: [0.5189, 0.6104, 0.6212, 0.6098, 0.6657, 0.6816]
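A communality is the share of a variable's variance explained by the retained factors, i.e. the row sum of its squared loadings. A quick check with an illustrative (not fitted) loading matrix:

```python
import numpy as np

# Illustrative loadings for three variables on two factors (hypothetical values)
loadings = np.array([
    [0.70, 0.15],
    [0.10, 0.75],
    [0.60, 0.40],
])
# Communality per variable: row sum of squared loadings
communalities = (loadings ** 2).sum(axis=1)
print(communalities)
```

Low communalities flag variables that the chosen factors explain poorly; the values above (all around 0.5 to 0.7) indicate a reasonable two-factor fit.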
Other rotation options include varimax, promax, oblimin, oblimax, quartimin, quartimax, and equamax; varimax, oblimax, quartimax, and equamax are orthogonal rotations, while promax, oblimin, and quartimin are oblique.
Factor Variance
<code>faa_two.get_factor_variance()
</code>This returns three arrays, one entry per factor: the variance (sum of squared loadings), the proportional variance, and the cumulative variance.
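These three quantities can be recomputed from the loading matrix itself; a sketch with an illustrative loading matrix (hypothetical values, not the fitted ones):

```python
import numpy as np

# Illustrative loadings: 6 variables, 2 factors (hypothetical values)
loadings = np.array([
    [0.7, 0.2], [0.6, 0.3], [0.8, 0.1],
    [0.2, 0.7], [0.3, 0.6], [0.1, 0.8],
])
n_vars = loadings.shape[0]

# Variance: sum of squared loadings per factor
ss_loadings = (loadings ** 2).sum(axis=0)
# Proportional variance: share of total variance explained by each factor
proportion = ss_loadings / n_vars
# Cumulative variance across the factors
cumulative = np.cumsum(proportion)
print(ss_loadings, proportion, cumulative)
```

The last entry of the cumulative array is the overall share of variance the retained factors explain, the same figure used earlier to choose the number of factors.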
Visualizing Latent Variables
Heatmap of absolute factor loadings to see which variables relate strongly to each latent factor.
<code>df1 = pd.DataFrame(np.abs(faa_two.loadings_), index=df.columns)
ax = sns.heatmap(df1, annot=True, cmap="BuPu")
ax.yaxis.set_tick_params(labelsize=15)
plt.title('Factor Analysis', fontsize='xx-large')
plt.ylabel('Feature', fontsize='xx-large')
plt.show()
</code>Transforming to New Variables
Convert the original data into the two extracted factors.
<code>df2 = pd.DataFrame(faa_two.transform(df))
</code>The resulting table shows the scores for each observation on the two latent factors.
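A common follow-up for comprehensive evaluation (not part of the original post) is to combine the factor scores into a single ranking, weighting each factor by its variance contribution. A sketch with hypothetical scores and weights:

```python
import numpy as np
import pandas as pd

# Hypothetical factor scores for four students on the two factors
scores = pd.DataFrame(
    [[1.2, -0.3], [0.1, 0.8], [-0.9, 0.2], [-0.4, -0.7]],
    columns=['factor1', 'factor2'],
)
# Hypothetical variance proportions of the two factors
weights = np.array([0.62, 0.12])

# Composite score: variance-weighted combination of the factor scores
scores['composite'] = scores[['factor1', 'factor2']] @ (weights / weights.sum())
print(scores.sort_values('composite', ascending=False))
```

In practice the weights would come from get_factor_variance() and the scores from transform(df); the normalization makes the weights sum to one so the composite stays on the same scale as the factor scores.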
Reference
Author: 洋洋菜鸟 – https://blog.csdn.net/qq_25990967/article/details/122566533
Model Perspective
Insights, knowledge, and enjoyment from a mathematical modeling researcher and educator. Hosted by Haihua Wang, a modeling instructor and author of "Clever Use of Chat for Mathematical Modeling", "Modeling: The Mathematics of Thinking", "Mathematical Modeling Practice: A Hands‑On Guide to Competitions", and co‑author of "Mathematical Modeling: Teaching Design and Cases".