Pearson and Spearman Correlations in R

In a previous post we explained the concept of correlation between two variables and discussed the Spearman and Pearson correlations in particular. Here, we work through a practical example in R.

With the following script we create a scatter plot from two columns of the mtcars dataset, showing the relationship between weight (wt, in 1000 lbs) and miles per gallon (mpg).

> x <- mtcars$wt
> y <- mtcars$mpg

#scatter plot
> plot(x, y, main = "wt VS mpg",
     xlab = "wt", ylab = "mpg",
     pch = 18, frame = FALSE)
# add the least-squares regression line (x and y are plain vectors, so no data argument is needed)
> abline(lm(y ~ x), col = "red")

Assumptions

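Pearson correlation is a parametric measure of linear association. The coefficient itself can be computed for any pair of numeric variables, but the usual significance test for it assumes that x and y are approximately normally distributed. In our example, we can check this assumption for each variable using the Shapiro-Wilk normality test.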

# Shapiro-Wilk normality test for wt
> shapiro.test(x)

	Shapiro-Wilk normality test

data:  x
W = 0.94326, p-value = 0.09265

# Shapiro-Wilk normality test for mpg
> shapiro.test(y)

	Shapiro-Wilk normality test

data:  y
W = 0.94756, p-value = 0.1229

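In this test, the null hypothesis is that the data come from a normal distribution. When the p-value is below 0.05 we reject the null hypothesis in favor of the alternative. Here, both p-values (0.09265 for x and 0.1229 for y) are above 0.05, so we fail to reject the null hypothesis: the test gives no evidence that x or y departs from normality, and we proceed as if both are normally distributed.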
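
The decision rule can also be applied in code rather than by reading the printed output. The following is a minimal sketch, assuming the conventional 0.05 threshold; the names alpha, sw_x, sw_y, no_evidence_x and no_evidence_y are illustrative and not part of the original script.

```r
x <- mtcars$wt
y <- mtcars$mpg

alpha <- 0.05  # assumed significance level

# shapiro.test() returns a list-like object; the p-value is in $p.value
sw_x <- shapiro.test(x)
sw_y <- shapiro.test(y)

# TRUE means the test gives no evidence against normality at level alpha
no_evidence_x <- sw_x$p.value >= alpha
no_evidence_y <- sw_y$p.value >= alpha

print(c(wt = no_evidence_x, mpg = no_evidence_y))
```

A TRUE here only means the test did not detect a departure from normality; with 32 observations it is worth looking at a histogram or Q-Q plot as well.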

Correlations

We calculate the Pearson and Spearman correlations with the following lines:

#pearson
> cor(x,y,method = "pearson")
[1] -0.8676594

#spearman
> cor(x,y,method = "spearman") 
[1] -0.886422

Both coefficients indicate a strong negative correlation between the weight and mpg variables of the mtcars dataset: heavier cars tend to travel fewer miles per gallon. The Spearman coefficient is slightly larger in magnitude (0.886 versus 0.868) because it measures any monotonic relationship rather than a strictly linear one.
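
To make the link between the two coefficients concrete, the sketch below checks that Spearman's coefficient is Pearson's coefficient computed on the ranks of the data, and then uses cor.test() to attach a p-value and confidence interval to the Pearson estimate. The variable names rho_builtin, rho_ranks and pearson_test are illustrative and not part of the original script.

```r
x <- mtcars$wt
y <- mtcars$mpg

# Spearman's coefficient equals Pearson's coefficient computed on the ranks
# (rank() gives tied values their average rank by default)
rho_builtin <- cor(x, y, method = "spearman")
rho_ranks   <- cor(rank(x), rank(y), method = "pearson")
print(all.equal(rho_builtin, rho_ranks))

# cor.test() adds a significance test and, for Pearson, a 95% confidence interval
pearson_test <- cor.test(x, y, method = "pearson")
print(pearson_test$estimate)
print(pearson_test$conf.int)
print(pearson_test$p.value)
```

Because Spearman works on ranks, it is unaffected by any monotonic transformation of x or y, which is why it can differ from Pearson when the relationship is monotonic but curved.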
