Kendall’s τ (tau)

Kendall’s \(τ\) (tau)#

Kendall’s \(\tau\) is another non-parametric correlation measure, but instead of ranks it uses pairwise concordance.

  • Pairwise concordance means for every possible pair of data points you classify them as concordant if they move in the same direction, and discordant if they move in different directions. So:

    • Concordant pairs move in the same direction:

      • \(x_{i} < x_{j}\), and \(y_{i} < y_{j}\), \(x_{i} > x_{j}\), and \(y_{i} > y_{j}\)

    • Discordant pairs move in opposite directions:

      • \(x_{i} < x_{j}\), and \(y_{i} > y_{j}\), \(x_{i} > x_{j}\), and \(y_{i} < y_{j}\)

Kendall’s \(\tau\) is ideal for smaller sample sizes, and is better about handling tied ranks than Spearman’s \(\rho\).

Think of it as estimating the difference in probability that two variables are in the same order versus the opposite order.

Assumptions

  1. At least ordinal level data.

  2. Monotonic relationship.

As you can see again, linearity and approximate normality are not required for the data set.

\[ \tau = \frac{\text{# concordant pairs} - \text{# discordant pairs}}{\binom{n}{2}} \]
  • \(\binom{n}{2}\) is the number of unique pairs. You can also represent it as \(\frac{n(n-1)}{2}\)

How to Interpret

τ Value Range

Description

\(\tau = 0\)

No association

\(0 < \tau < 0.3\)

Weak positive association

\(-0.3 < \tau < 0\)

Weak negative association

\(0.3 \leq \tau < 0.7\)

Moderate association

\(\tau \geq 0.7\)

Strong association

Python Code Example#

# Import
import numpy as np
import pandas as pd

# Correlation coefficient
import scipy.stats as stats

# Visualizations
import matplotlib.pyplot as plt
import seaborn as sns
# Load Planets dataset from Seaborn - drop missing variables and choose a smaller sample
planets = sns.load_dataset('planets').dropna().sample(100, random_state=42)
df = pd.DataFrame(planets)

# View some details
df.head()
method number orbital_period mass distance year
632 Radial Velocity 3 8.1352 0.0360 25.87 2011
142 Radial Velocity 4 61.1166 2.2756 4.70 1998
351 Radial Velocity 1 6.2760 1.9000 58.82 2001
294 Radial Velocity 1 388.0000 9.1000 20.98 2005
358 Radial Velocity 1 1260.0000 3.0600 53.05 2008
# Skew and kurtosis
pd.DataFrame(data={'skew': df.iloc[:, 2:5].skew(), 'kurtosis': df.iloc[:, 2:5].kurtosis()}).style.background_gradient(cmap='coolwarm')
  skew kurtosis
orbital_period 6.755347 55.439942
mass 2.016572 3.960250
distance 2.960721 10.280670
# Pairplot
sns.pairplot(df.iloc[:, 2:5], corner=True, kind='reg', plot_kws={'line_kws': {'color': 'black'}})

plt.tight_layout()
plt.show()
../../_images/d4e17e21552f773cc4a676a3fa0a7b9932e369c75efdfcccb6f2ced7bc71e95d.png
### Calculate Kendall's tau individuall
for col in df.iloc[:, 2:5].columns:
  # Exclude orbital_period
  if col != 'orbital_period':
    # Calculate Kendall's tau
    tau, p = stats.kendalltau(df['orbital_period'], df[col])
    # Output display
    print(f'Kendall\'s tau between Orbital Period and {col}: {round(tau,2)} | p-value: {p}')
Kendall's tau between Orbital Period and mass: 0.35 | p-value: 3.5840403947937143e-07
Kendall's tau between Orbital Period and distance: 0.12 | p-value: 0.08302418600209442
### Seaborn heatmap version
planets_corr = df.iloc[:, 2:5].corr(method='kendall')

mask = np.triu(np.ones_like(planets_corr, dtype=bool))

plt.figure(figsize=(6,4))
sns.heatmap(planets_corr, mask=mask, annot=True, fmt='.2g', cmap='coolwarm', linewidths=1)
plt.title('Correlation Matrix')
plt.tight_layout()
../../_images/40ee7f5888dbafa40a0b2f08e8aee1794b31780712a207e8a993a30b9897f8d2.png