25 Feb Pearson and Spearman Correlations in R
In a previous post we explained the concept of correlation between two variables and discussed the Spearman and Pearson correlations specifically. Here, we work through a practical example in R.
The following script creates a scatter plot from two columns of the mtcars dataset, showing the relationship between weight (wt) and miles per gallon (mpg), with the fitted linear regression line drawn in red.
> x <- mtcars$wt
> y <- mtcars$mpg
# scatter plot
> plot(x, y, main = "wt VS mpg", xlab = "wt", ylab = "mpg", pch = 18, frame = FALSE)
> abline(lm(y ~ x, data = mtcars), col = "red")
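Pearson correlation is a parametric measure. The coefficient itself can be computed for any pair of numeric variables, but the usual inference about it assumes that x and y are approximately normally distributed. In our example, we can check this assumption using the Shapiro-Wilk normality test.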
# Shapiro-Wilk normality test for wt
> shapiro.test(x)
Shapiro-Wilk normality test
data: x
W = 0.94326, p-value = 0.09265
# Shapiro-Wilk normality test for mpg
> shapiro.test(y)
Shapiro-Wilk normality test
data: y
W = 0.94756, p-value = 0.1229
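In this test, the null hypothesis is that the data come from a normal distribution. When the p-value is below 0.05, we reject the null hypothesis in favor of the alternative, that the data are not normally distributed. Here, both p-values are above 0.05 (0.09265 for wt and 0.1229 for mpg), so we fail to reject the null hypothesis for either variable. The test gives no evidence against normality, and we proceed as if x and y are normally distributed.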
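The decision rule above can be sketched as a small helper. This is only an illustration: the function name `is_plausibly_normal` and its `alpha` parameter are hypothetical and not part of the original script.

```r
# Hypothetical helper: TRUE when the Shapiro-Wilk test does NOT reject
# normality at significance level alpha (it does not prove normality).
is_plausibly_normal <- function(v, alpha = 0.05) {
  shapiro.test(v)$p.value >= alpha
}

x <- mtcars$wt
y <- mtcars$mpg

normal_x <- is_plausibly_normal(x)
normal_y <- is_plausibly_normal(y)
print(c(wt = normal_x, mpg = normal_y))
```

With only 32 observations the test has limited power, so a Q-Q plot (for example `qqnorm(x); qqline(x)`) is a reasonable complementary check.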
Calculating the Pearson and Spearman correlations with the following lines, we have:
# pearson
> cor(x, y, method = "pearson")
[1] -0.8676594
# spearman
> cor(x, y, method = "spearman")
[1] -0.886422
Both metrics indicate a strong negative correlation between the weight and mpg variables of the mtcars dataset: heavier cars tend to travel fewer miles per gallon. The Spearman coefficient is slightly larger in magnitude because it measures how monotonic the relationship is, not how strictly linear it is.
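To make the difference between the two coefficients concrete, here is a minimal sketch showing that Spearman's coefficient is Pearson's coefficient computed on the ranks of the data. It also uses `cor.test` from base R to obtain a p-value for the Pearson correlation. The variable names `rho_direct`, `rho_ranks` and `pearson_test` are chosen here for illustration only.

```r
x <- mtcars$wt
y <- mtcars$mpg

# Spearman's rho equals Pearson's r applied to the ranks
# (rank() assigns average ranks to tied values by default).
rho_direct <- cor(x, y, method = "spearman")
rho_ranks  <- cor(rank(x), rank(y), method = "pearson")
print(all.equal(rho_direct, rho_ranks))

# cor.test() also tests the null hypothesis of zero correlation.
pearson_test <- cor.test(x, y, method = "pearson")
print(pearson_test$estimate)
print(pearson_test$p.value)
```

Because ranks discard the spacing between values, a relationship that is consistently decreasing but curved can produce a Spearman coefficient that is larger in magnitude than the Pearson one.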