
Understanding Pearson, Spearman, and Kendall Correlation Coefficients with Pandas

Learn how Pearson, Spearman, and Kendall correlation coefficients measure linear and monotonic relationships between variables, explore their mathematical properties, interpret their value ranges, and see practical Python examples using Pandas to compute each coefficient on generated data.

Model Perspective

Aug 27, 2022


Pearson Correlation Coefficient

Consider a dataset with two features x and y, each having n values, forming n pairs (x_i, y_i). The Pearson correlation coefficient, usually denoted r, measures the linear relationship between the two features. It is the covariance of x and y divided by the product of their standard deviations:

$$r = \frac{\sum_{i=1}^{n}(x_i-\mu_x)(y_i-\mu_y)}{\sqrt{\sum_{i=1}^{n}(x_i-\mu_x)^2}\,\sqrt{\sum_{i=1}^{n}(y_i-\mu_y)^2}}$$

Here, μ_x and μ_y denote the means of x and y. If larger x values tend to pair with larger y values, the products in the numerator are mostly positive and r is positive; if larger x values tend to pair with smaller y values, the numerator and r are negative. Important facts about Pearson's r:

The coefficient can take any real value in the range [-1, 1]. The maximum value +1 corresponds to a perfect positive linear relationship.

A value of 0 indicates no linear correlation.

A value of -1 indicates a perfect negative linear relationship.

In short, the larger the absolute value of r, the stronger the linear correlation; the closer to zero, the weaker.
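As a minimal NumPy sketch of the definition (covariance divided by the product of the standard deviations), the hypothetical helper pearson_r below computes r from deviations about the means and reproduces the reference cases from the list. The 1/n (or 1/(n-1)) factors in the covariance and standard deviations cancel, so they are omitted. np.corrcoef is NumPy's own routine and serves as a cross-check.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson's r from its definition (hypothetical helper, not a library function)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()  # deviations from mu_x
    dy = y - y.mean()  # deviations from mu_y
    # Sum of cross-products over the product of root sums of squares;
    # the 1/n factors of covariance and standard deviation cancel out.
    return np.sum(dx * dy) / (np.sqrt(np.sum(dx**2)) * np.sqrt(np.sum(dy**2)))

x = np.arange(1, 11, dtype=float)

print(pearson_r(x, 2 * x + 1))       # approximately +1: perfect positive line
print(pearson_r(x, -3 * x + 5))      # approximately -1: perfect negative line
print(pearson_r(x, (x - 5.5) ** 2))  # approximately 0: U shape, no linear trend

# Cross-check against NumPy's built-in correlation matrix
y = x**3
print(pearson_r(x, y), np.corrcoef(x, y)[0, 1])
```

The U-shaped case is a reminder that r = 0 rules out only a linear relationship: there, y is completely determined by x.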

Spearman Correlation Coefficient

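The Spearman correlation coefficient, denoted ρ (rho), is the Pearson correlation applied to the ranks of the two features rather than to their raw values. For the n pairs (x_i, y_i), replace each x_i by its rank among the x values and each y_i by its rank among the y values, then apply the Pearson formula to the ranked data. Because only the ordering matters, ρ measures how well the relationship can be described by a monotonic function, not necessarily a linear one.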
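The two steps (rank, then Pearson) can be sketched with pandas on a small made-up sample. Series.rank assigns rank 1 to the smallest value and averages the ranks of tied values. When there are no ties, the result also equals the well-known shortcut 1 - 6Σd_i²/(n(n² - 1)), where d_i is the difference between the two ranks of observation i; the sketch checks all three routes against each other.

```python
import pandas as pd

# Small illustrative sample (made-up values)
x = pd.Series([3.1, 1.2, 5.8, 4.4, 2.0, 9.7])
y = pd.Series([10.0, 2.5, 12.0, 30.1, 4.2, 500.0])

# Step 1: replace each value by its rank (1 = smallest; ties would get the average rank)
rx = x.rank()
ry = y.rank()

# Step 2: apply the ordinary Pearson formula to the ranks
rho_from_ranks = rx.corr(ry, method="pearson")

# pandas' built-in Spearman option should give the same number
rho_builtin = x.corr(y, method="spearman")

# No-ties shortcut: rho = 1 - 6 * sum(d_i^2) / (n (n^2 - 1))
d = rx - ry
n = len(x)
rho_shortcut = 1 - 6 * (d**2).sum() / (n * (n**2 - 1))

print(rho_from_ranks, rho_builtin, rho_shortcut)
```

Note that the outlier 500.0 has no special influence here: it simply receives the top rank.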

It ranges between -1 and 1.

The maximum value +1 corresponds to a perfectly monotonic increasing relationship.

The minimum value -1 corresponds to a perfectly monotonic decreasing relationship.
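To illustrate the difference from Pearson's r, the sketch below (assuming pandas and NumPy; the curves x^3 and exp(-x) are arbitrary choices) uses relationships that are strictly monotonic but curved. Spearman's ρ reaches the extremes because the ranks line up perfectly, while Pearson's r stays below 1 because the points do not lie on a straight line.

```python
import numpy as np
import pandas as pd

x = pd.Series(np.linspace(0, 10, 50))
y_up = x**3          # strictly increasing, but not linear
y_down = np.exp(-x)  # strictly decreasing, but not linear

print(x.corr(y_up, method="pearson"))     # noticeably below 1: not a straight line
print(x.corr(y_up, method="spearman"))    # about +1: ranks agree perfectly
print(x.corr(y_down, method="spearman"))  # about -1: ranks are exactly reversed
```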

Kendall Correlation Coefficient

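Kendall's tau, denoted τ, is also based on ordering. Take any two observations (x_i, y_i) and (x_j, y_j). The pair is concordant if (x_i - x_j)(y_i - y_j) > 0, meaning the observation with the larger x also has the larger y; discordant if (x_i - x_j)(y_i - y_j) < 0; and tied if the product is zero. Kendall's tau compares the number of concordant pairs n_c with the number of discordant pairs n_d relative to the total number of pairs n(n - 1)/2:

$$\tau = \frac{n_c - n_d}{n(n-1)/2}$$

This is the basic form (tau-a); variants such as tau-b adjust the denominator when ties are present.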
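Counting pairs directly makes the definition concrete. The helper kendall_tau_a below is a hypothetical, brute-force O(n²) sketch of the basic tau-a formula for illustration, not an efficient implementation. On tie-free data tau-a coincides with the tie-adjusted variants, so it should agree with pandas' method='kendall' on this toy example, which has 8 concordant and 2 discordant pairs out of 10.

```python
from itertools import combinations

import pandas as pd

def kendall_tau_a(x, y):
    """Kendall's tau-a by brute-force pair counting (hypothetical helper, O(n^2))."""
    n_concordant = 0
    n_discordant = 0
    for i, j in combinations(range(len(x)), 2):
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            n_concordant += 1  # x and y move in the same direction
        elif s < 0:
            n_discordant += 1  # x and y move in opposite directions
        # s == 0 is a tie and counts as neither
    n_pairs = len(x) * (len(x) - 1) // 2
    return (n_concordant - n_discordant) / n_pairs

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 1.0, 4.0, 3.0, 5.0]

print(kendall_tau_a(x, y))  # (8 - 2) / 10 = 0.6
print(pd.Series(x).corr(pd.Series(y), method="kendall"))
```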

It can take any real value in the range [-1, 1].

The maximum value +1 occurs when all pairs are concordant.

The minimum value -1 occurs when all pairs are discordant.
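Like Spearman's ρ, Kendall's τ depends only on ordering, so any strictly increasing transform of x makes every pair concordant and any strictly decreasing transform makes every pair discordant. A short check, assuming pandas and NumPy, with log and reciprocal as arbitrary example transforms:

```python
import numpy as np
import pandas as pd

x = pd.Series(np.arange(1, 21, dtype=float))

# Strictly increasing transform: every pair concordant
tau_up = x.corr(np.log(x), method="kendall")
# Strictly decreasing transform: every pair discordant
tau_down = x.corr(1 / x, method="kendall")

print(tau_up, tau_down)  # about +1 and -1
```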

Calculating Correlations with Pandas

Generate data and plot

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

xarray = np.linspace(0, 10, 100)  # 100 evenly spaced values from 0 to 10
yarray = xarray**3 + np.random.normal(0, 100, 100)  # y = x^3 + Gaussian noise (mean 0, std 100); unseeded, so it changes on every run

plt.scatter(xarray, yarray)
plt.xlabel('X')
plt.ylabel('Y')
plt.show()
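The snippet above draws unseeded noise, so each run produces a different scatter and different coefficients. A minimal reproducible variant, assuming NumPy's Generator API (np.random.default_rng) and an arbitrary seed of 42 chosen here for illustration; note that it will not reproduce the exact values quoted in this article:

```python
import numpy as np

rng = np.random.default_rng(42)               # seeded generator: same noise on every run
xarray = np.linspace(0, 10, 100)              # 100 evenly spaced values in [0, 10]
yarray = xarray**3 + rng.normal(0, 100, 100)  # y = x^3 + noise with std 100
```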

Convert to Pandas Series

xseries = pd.Series(xarray)  # convert to Series
yseries = pd.Series(yarray)
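Instead of calling Series.corr once per method, the same coefficients can be collected from a DataFrame: DataFrame.corr accepts the same method names and returns a symmetric matrix with 1.0 on the diagonal. A sketch that regenerates the data with an arbitrary seed (0) so it runs on its own:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only
xarray = np.linspace(0, 10, 100)
yarray = xarray**3 + rng.normal(0, 100, 100)

df = pd.DataFrame({"x": xarray, "y": yarray})

# One 2x2 matrix per method; the off-diagonal entry is the x-y coefficient
matrices = {m: df.corr(method=m) for m in ("pearson", "spearman", "kendall")}
for method, mat in matrices.items():
    print(method, round(mat.loc["x", "y"], 4))
```

With more columns, the same call returns every pairwise coefficient at once.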

Compute correlations

Pearson

xseries.corr(yseries, method='pearson')  # Pearson correlation

Result: 0.840850116329609 (from one run; the noise is unseeded, so exact values differ between runs)

Spearman

xseries.corr(yseries, method='spearman')  # Spearman correlation

Result: 0.8455325532553255

Kendall

xseries.corr(yseries, method='kendall')  # Kendall correlation

Result: 0.6755555555555557

Reference: Data STUDIO https://mp.weixin.qq.com/s/3XR2_0Mca50-rZO9ZRAzuA


Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.


Written by

Model Perspective

Insights, knowledge, and enjoyment from a mathematical modeling researcher and educator. Hosted by Haihua Wang, a modeling instructor and author of "Clever Use of Chat for Mathematical Modeling", "Modeling: The Mathematics of Thinking", "Mathematical Modeling Practice: A Hands‑On Guide to Competitions", and co‑author of "Mathematical Modeling: Teaching Design and Cases".
