Assumptions of Linear Regression
Previously we gave a brief introduction to linear regression theory. In this post we discuss the assumptions that should hold when building a linear regression model.
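The first and most obvious assumption is a linear relationship between the dependent (response) variable and the independent variables. An easy way to check this is a scatter plot of the response against each independent variable: the points should roughly follow a straight line rather than a curve.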
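A scatter plot remains the primary check. As a numeric complement, the sketch below (our own illustration, not a standard named test) fits both a straight line and a quadratic with NumPy and compares their R^2 values: if adding a squared term raises R^2 noticeably, the relationship is probably curved. The helper names `r_squared` and `linear_vs_quadratic_r2`, the synthetic data, and the choice of a quadratic as the alternative are all assumptions made for illustration.

```python
import numpy as np


def r_squared(y, y_hat):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_res / ss_tot


def linear_vs_quadratic_r2(x, y):
    """Return (R^2 of a straight-line fit, R^2 of a quadratic fit) of y on x."""
    linear_fit = np.polyval(np.polyfit(x, y, 1), x)
    quadratic_fit = np.polyval(np.polyfit(x, y, 2), x)
    return r_squared(y, linear_fit), r_squared(y, quadratic_fit)


rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 200)
y_linear = 3.0 + 2.0 * x + rng.normal(0, 1, 200)       # truly linear
y_curved = 3.0 + 0.5 * x ** 2 + rng.normal(0, 1, 200)  # curved

for name, y in [("linear", y_linear), ("curved", y_curved)]:
    r2_lin, r2_quad = linear_vs_quadratic_r2(x, y)
    print(f"{name}: R^2 linear = {r2_lin:.3f}, quadratic = {r2_quad:.3f}")
```

With the truly linear data the two R^2 values should be nearly identical, while the curved data should show a clear gain from the quadratic term. What counts as a "noticeable" gain is a judgment call, so the scatter plot is still worth looking at.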
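A second assumption is the absence of multicollinearity. This means that the independent variables should not be highly correlated with each other: perfectly correlated predictors make the coefficients impossible to estimate uniquely, and strongly correlated ones inflate the standard errors of their coefficients. One way to check for multicollinearity is to inspect a correlation matrix of the independent variables or to compute the variance inflation factor (VIF) of each one.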
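The sketch below computes both checks with NumPy on synthetic data: `np.corrcoef` for the pairwise correlation matrix, and a hand-rolled `vif` helper (a hypothetical name) that regresses each predictor on the others plus an intercept and returns VIF_j = 1 / (1 - R_j^2). The data are made up so that `x3` is nearly a copy of `x1`.

```python
import numpy as np


def vif(X):
    """Variance inflation factor for each column of X (n_samples x n_features).

    VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing column j
    on all the other columns plus an intercept.
    """
    n, k = X.shape
    vifs = np.empty(k)
    for j in range(k):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n), others])  # add the intercept column
        coef, *_ = np.linalg.lstsq(A, target, rcond=None)
        resid = target - A @ coef
        r2 = 1.0 - (resid @ resid) / np.sum((target - target.mean()) ** 2)
        vifs[j] = 1.0 / (1.0 - r2)
    return vifs


rng = np.random.default_rng(1)
n = 500
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)                  # independent of x1
x3 = x1 + rng.normal(scale=0.1, size=n)  # nearly a copy of x1
X = np.column_stack([x1, x2, x3])

print(np.round(np.corrcoef(X, rowvar=False), 2))  # pairwise correlation matrix
print(np.round(vif(X), 1))                        # one VIF per column
```

Here `x1` and `x3` should get very large VIFs while `x2` stays close to 1. A common rule of thumb treats a VIF above 5 or 10 as a warning sign. The VIF also catches collinearity involving three or more predictors, which a pairwise correlation matrix can miss. If you already use statsmodels, `variance_inflation_factor` in `statsmodels.stats.outliers_influence` is a ready-made alternative; it expects you to include the constant column in the design matrix yourself.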
Another assumption is homoscedasticity: the residuals have constant variance across all values of the independent variables. A scatter plot of the residuals against the predicted values is usually enough to check for it; a cone-shaped pattern, where the spread of the residuals widens or narrows as the predicted values grow, is a sign of heteroscedasticity.
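The residual plot is the main tool, but a formal test can back it up. The sketch below swaps in the Breusch-Pagan test (not mentioned above) in Koenker's n * R^2 form, written by hand with NumPy for a single predictor: regress the squared residuals on x and compare n * R^2 with a chi-square distribution with 1 degree of freedom. The helper names `ols_fit` and `breusch_pagan_1d` and the synthetic data are hypothetical.

```python
import math

import numpy as np


def ols_fit(x, y):
    """Simple linear regression y = b0 + b1 * x; returns (fitted values, residuals)."""
    A = np.column_stack([np.ones(len(x)), x])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    fitted = A @ coef
    return fitted, y - fitted


def breusch_pagan_1d(x, resid):
    """Koenker's n * R^2 form of the Breusch-Pagan test with one regressor.

    Squared residuals are regressed on x; under homoscedasticity the
    statistic LM = n * R^2 is approximately chi-square with 1 degree of freedom.
    """
    e2 = resid ** 2
    _, aux_resid = ols_fit(x, e2)
    r2 = 1.0 - (aux_resid @ aux_resid) / np.sum((e2 - e2.mean()) ** 2)
    lm = max(float(len(x) * r2), 0.0)  # guard against tiny negative round-off
    # Chi-square(1) survival function: P(Z^2 > lm) = erfc(sqrt(lm / 2))
    p_value = math.erfc(math.sqrt(lm / 2.0))
    return lm, p_value


rng = np.random.default_rng(2)
n = 500
x = rng.uniform(1, 10, n)
y_hom = 1.0 + 2.0 * x + rng.normal(0, 1.0, n)      # constant error variance
y_het = 1.0 + 2.0 * x + rng.normal(0, 0.5 * x, n)  # spread grows with x (cone shape)

for name, y in [("homoscedastic", y_hom), ("heteroscedastic", y_het)]:
    fitted, resid = ols_fit(x, y)
    lm, p = breusch_pagan_1d(x, resid)
    print(f"{name}: LM = {lm:.2f}, p = {p:.3g}")
```

The heteroscedastic data, whose noise grows with x, should produce a very small p-value. With several predictors you would regress the squared residuals on all of them and use as many degrees of freedom as there are regressors; statsmodels provides this as `het_breuschpagan` in `statsmodels.stats.diagnostic`.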
In addition, the residuals must be normally distributed. This assumption can be checked visually with a Q-Q plot or a histogram of the residuals. A statistical test such as the Kolmogorov-Smirnov test applied to the residuals is another option.
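Both checks can be sketched with NumPy and the standard library's `statistics.NormalDist` (Python 3.8+). `qq_points` returns the coordinates you would plot for a Q-Q plot, using (i - 0.5) / n plotting positions (one of several conventions), and `ks_statistic_normal` computes the Kolmogorov-Smirnov distance between the standardized residuals and a standard normal distribution. The function names and the simulated residuals are assumptions for illustration.

```python
from statistics import NormalDist

import numpy as np

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def standardize(resid):
    """Center residuals and scale them to unit sample standard deviation."""
    return (resid - resid.mean()) / resid.std(ddof=1)


def qq_points(resid):
    """Return (theoretical normal quantiles, sorted standardized residuals)."""
    z = np.sort(standardize(resid))
    n = len(z)
    probs = (np.arange(1, n + 1) - 0.5) / n  # plotting positions
    theoretical = np.array([STD_NORMAL.inv_cdf(p) for p in probs])
    return theoretical, z


def ks_statistic_normal(resid):
    """Largest gap between the residuals' empirical CDF and the N(0, 1) CDF."""
    z = np.sort(standardize(resid))
    n = len(z)
    cdf = np.array([STD_NORMAL.cdf(v) for v in z])
    i = np.arange(1, n + 1)
    d_plus = np.max(i / n - cdf)         # empirical CDF above the normal CDF
    d_minus = np.max(cdf - (i - 1) / n)  # empirical CDF below the normal CDF
    return float(max(d_plus, d_minus))


rng = np.random.default_rng(3)
resid_normal = rng.normal(0, 1, 500)
resid_skewed = rng.exponential(1.0, 500)  # right-skewed, clearly not normal

for name, r in [("normal", resid_normal), ("skewed", resid_skewed)]:
    theoretical, z = qq_points(r)
    qq_corr = np.corrcoef(theoretical, z)[0, 1]  # near 1 if the points hug a line
    print(f"{name}: Q-Q correlation = {qq_corr:.4f}, KS D = {ks_statistic_normal(r):.3f}")
```

Because the mean and standard deviation are estimated from the residuals themselves, p-values from the standard Kolmogorov-Smirnov table are too conservative here; the Lilliefors variant of the test corrects for this.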