One Sample Statistical Test
In this series of posts we will discuss hypothesis testing, an important component of statistical analysis when building econometric models.
Student's t-test is one of the most common ways of comparing means of samples. In its two-sample form it evaluates whether two samples are significantly different from each other by comparing their means. The one-sample t-test, specifically, is used to compare the mean of a sample (m) with a theoretical mean (μ); the theoretical mean usually comes from previous experiments done to answer a similar research problem.
The common research questions the one-sample t-test attempts to answer are the following:
- Is the sample mean equal to the theoretical mean?
- Is the sample mean larger than the theoretical mean?
- Is the sample mean less than the theoretical mean?
Therefore, the one-sample t-test constructs the following hypotheses in order to answer the aforementioned research questions. The null hypotheses are as follows:
- H0: m=μ (two-tailed)
- H0: m≥μ (one-tailed)
- H0: m≤μ (one-tailed)
The respective alternative hypotheses are constructed as follows:
- H1: m≠μ
- H1: m<μ
- H1: m>μ
The t-statistic can be calculated as follows:
t = (m − μ) / (s / √n)
where m: the sample mean, μ: the theoretical value, s: the sample standard deviation (computed with n-1 degrees of freedom) and n: the sample size.
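As a rough illustration, the computation can be sketched in Python using only the standard library. The sketch assumes the usual form of the statistic, t = (m − μ) / (s / √n); the function name one_sample_t_statistic and the weights data are hypothetical and serve only as an example.

```python
import math
import statistics


def one_sample_t_statistic(sample, mu):
    """Return (t, degrees of freedom) for a one-sample t-test.

    Hypothetical helper for illustration; not part of any library.
    """
    n = len(sample)
    if n < 2:
        raise ValueError("at least two observations are needed")
    m = statistics.mean(sample)    # sample mean
    s = statistics.stdev(sample)   # sample standard deviation (n - 1 denominator)
    if s == 0:
        raise ValueError("the sample has zero variance; t is undefined")
    t = (m - mu) / (s / math.sqrt(n))
    return t, n - 1


# Made-up measurements compared against a theoretical mean of 20.0
weights = [19.5, 20.1, 21.3, 18.9, 20.7, 22.0, 19.8, 20.4]
t, df = one_sample_t_statistic(weights, mu=20.0)
print(f"t = {t:.4f} with {df} degrees of freedom")
```

Note that statistics.stdev divides by n-1, which matches the definition of s given above.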
As a parametric procedure, the one-sample t-test makes several assumptions. The main ones are the following:
- The dependent variable should be approximately normally distributed. One can verify this in several ways, for example by inspecting the Q-Q plot or running the Shapiro-Wilk test.
- The observations are independent. This can be reasonably assumed if the data collection process was random without replacement.
- The dependent variable should not contain any extreme values (outliers) and should be continuous. Assessing whether a variable contains outliers can be very tricky and depends on the nature of the observed data. However, some standard procedures exist, such as box-plot analysis.
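Two of the checks mentioned above can be sketched in Python, assuming NumPy and SciPy are available. The helper name check_assumptions, the 0.05 cut-off for the Shapiro-Wilk test and the 1.5 × IQR fences (the rule a box plot conventionally uses to flag points) are assumptions made for this example, not fixed requirements of the t-test.

```python
import numpy as np
from scipy import stats


def check_assumptions(sample, alpha=0.05):
    """Rough normality and outlier screening (hypothetical helper)."""
    x = np.asarray(sample, dtype=float)

    # Shapiro-Wilk test: its null hypothesis is that the data are normal,
    # so a small p-value is evidence AGAINST normality.
    _, p_normal = stats.shapiro(x)

    # Box-plot style fences: points beyond 1.5 * IQR from the quartiles
    # are flagged as potential outliers.
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = x[(x < lower) | (x > upper)]

    return {
        "shapiro_p": float(p_normal),
        "normality_not_rejected": bool(p_normal > alpha),
        "outliers": outliers.tolist(),
    }


# Made-up data with one suspicious value appended
data = [19.5, 20.1, 21.3, 18.9, 20.7, 22.0, 19.8, 20.4, 35.0]
report = check_assumptions(data)
print(report)
```

A flagged point is only a candidate outlier; whether to keep it still depends on the nature of the data, as noted above.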
After calculating the sample mean and standard deviation as well as the test statistic, the p-value (i.e. the probability of observing the test statistic or a more extreme value under the null hypothesis) can be calculated as follows, where T denotes a random variable following the t-distribution with n-1 degrees of freedom:
- p = 2 ⋅ Pr(T > |t|) (two-tailed)
- p = Pr(T > t) (upper-tailed)
- p = Pr(T < t) (lower-tailed)
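The three p-values can be computed from the t-distribution with n-1 degrees of freedom, for instance with SciPy. The following is a minimal sketch under that assumption; the function name one_sample_p_values and the weights data are hypothetical. The built-in scipy.stats.ttest_1samp is called at the end only as a cross-check of the two-tailed value.

```python
import numpy as np
from scipy import stats


def one_sample_p_values(sample, mu):
    """Return the t-statistic and the three p-values (hypothetical helper)."""
    x = np.asarray(sample, dtype=float)
    n = x.size
    # ddof=1 gives the sample standard deviation (n - 1 denominator)
    t = (x.mean() - mu) / (x.std(ddof=1) / np.sqrt(n))
    df = n - 1
    return {
        "t": float(t),
        # sf is the survival function, i.e. Pr(T > value)
        "two_tailed": float(2 * stats.t.sf(abs(t), df)),
        "upper_tailed": float(stats.t.sf(t, df)),
        "lower_tailed": float(stats.t.cdf(t, df)),
    }


# Made-up measurements compared against a theoretical mean of 20.0
weights = [19.5, 20.1, 21.3, 18.9, 20.7, 22.0, 19.8, 20.4]
result = one_sample_p_values(weights, mu=20.0)
print(result)

# Cross-check: SciPy's one-sample t-test is two-sided by default
builtin = stats.ttest_1samp(weights, popmean=20.0)
print(builtin.statistic, builtin.pvalue)
```

The upper-tailed and lower-tailed p-values always sum to 1, so only the one matching the chosen alternative hypothesis should be reported.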
As a rule of thumb, a large p-value means that a t-statistic at least as extreme as the observed one would be quite likely if the null hypothesis were true. In that case the statistical test provides no evidence against the null hypothesis, and we fail to reject it. On the other hand, a small p-value indicates decreased support for the null hypothesis. However, the possibility that a very rare value was obtained while the null hypothesis is true can never be ruled out completely. The threshold for determining statistical significance (the significance level) is usually set to 0.05 or less, and the null hypothesis is rejected when the p-value falls below it.