Nonparametric methods provide statistical tools that can be used when the usual assumptions of normality and large sample size are violated. When stating hypothesis tests in the nonparametric framework, it is common for the null hypothesis to be that all the population distributions are identical; however, when the population distributions all have the same shape and differ only in their location, such a hypothesis reduces to the claim that the populations have the same mean or median. So while the methods apply to very general populations, in practice the hypotheses are often still about parameters. Because we make fewer assumptions about the population distributions when using nonparametric methods, the tests are typically less powerful than the corresponding parametric tests when the normality or large sample size assumption holds. Thus, if the data come from normally distributed populations, a researcher using a nonparametric test is less likely to reject the null hypothesis when it is false.
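The loss of power under normality can be illustrated with a small simulation. The sketch below is only an illustration under assumed settings that do not come from the text (two normal populations with unit standard deviation, a location shift of 1, 12 observations per group, 1000 replications). It estimates how often the two-sample t-test and the Mann-Whitney U test reject a false null hypothesis at level 0.05.

```r
# Illustrative power comparison under normality (all settings are assumptions)
set.seed(1)
n     <- 12     # observations per group
shift <- 1      # true difference in location
reps  <- 1000   # number of simulated experiments
alpha <- 0.05

reject <- replicate(reps, {
  a <- rnorm(n, mean = 0)
  b <- rnorm(n, mean = shift)
  c(t  = t.test(a, b)$p.value < alpha,
    mw = wilcox.test(a, b)$p.value < alpha)
})

# Proportion of simulated experiments in which each test rejects
power <- rowMeans(reject)
print(power)
```

With normal data the estimated power of the t-test is usually somewhat higher than that of the Mann-Whitney U test, although the gap depends on the chosen settings.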
Nonparametric tests often rely on ranks because, under the hypothesis that all the populations are identically distributed, the distributions of the test statistics based on these ranks can be determined. Each of the test statistics in this section is based on ranks and has a distribution that does not depend on the unknown population distribution. Ranks are simply scores assigned to the pooled data that indicate their relative position in an ordered list. As an example of what is meant by ranks, suppose \(x_1=3, x_2=7, x_3=1, x_4=12\), and \(y_1=5, y_2=10, y_3=7, y_4=15\). First, we order the pooled data from smallest to largest: 1, 3, 5, 7, 7, 10, 12, 15. Then, we assign the lowest value a rank of 1, the next lowest value a rank of 2, and so on. When there are ties in the sample, the recommended procedure is to assign each tied observation the mean of the ranks that the tied observations occupy. Here the two 7s occupy positions 4 and 5, so each receives a rank of 4.5. So, listed in increasing order, the ranks of the \(x\)-values are 1, 2, 4.5, 7, and the ranks of the \(y\)-values are 3, 4.5, 6, 8.
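The ranking in this example can be reproduced in R. The following is a minimal sketch using the base function rank() with mean ranks for ties; the variable names pooled_ranks, rank_x and rank_y are chosen here for illustration.

```r
x <- c(3, 7, 1, 12)
y <- c(5, 10, 7, 15)

# Rank the pooled data; tied values receive the mean of the ranks they occupy
pooled_ranks <- rank(c(x, y), ties.method = "average")

# Split the pooled ranks back into the two samples
rank_x <- pooled_ranks[seq_along(x)]
rank_y <- pooled_ranks[length(x) + seq_along(y)]

sort(rank_x)  # 1.0 2.0 4.5 7.0
sort(rank_y)  # 3.0 4.5 6.0 8.0
```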
As indicated in Flowchart D, nonparametric analysis for two groups uses the Mann-Whitney U test when the samples are independent, and the Sign Test or the Wilcoxon Signed Rank Test when the observations are paired. For more than two groups, the Kruskal-Wallis test should be used. For the case of one group, which is not included in the flowchart, bootstrapping is recommended.
Note: The Mann-Whitney U test is also called the Wilcoxon Rank-Sum Test.
Assumptions (Mann-Whitney U Test):
1) Independent random samples.
2) Population distributions for two groups have identical shapes but potentially different locations (means or medians).
The Mann-Whitney U test is used as a replacement for the two-sample t-test when the populations are not normally distributed and the sample sizes are not greater than 30. When the assumptions of the t-test are violated in this way, the researcher can perform the Mann-Whitney U test, which has less stringent assumptions. Since the shapes of the population distributions for the two groups are assumed to be the same and the only distinguishing feature is location, the test can be used to determine whether two independent samples come from populations with the same mean or median.
Process:
Effects of Grinding of Dental Restorations on Tooth Color: 24 specimens of a composite dental material stained in target color A2 are ground down by 100 micrometers. 12 of the specimens are made from a composite material produced by company A, and the other 12 from a composite from company B. The color of each specimen is measured before and after grinding, and the ΔE00 coefficient is calculated to measure the color loss as a result of grinding. The researcher is interested in comparing the mean color loss for the dental materials between the two companies. However, the data are not normally distributed and the sample sizes are less than 30, so a t-test should not be performed. Instead, the researcher performs the Mann-Whitney U test, which does not require normality and only assumes that the two groups are independent and have population distributions with the same shape. Independence is a reasonable assumption in this case, as the dental materials are produced by two different companies with different manufacturing techniques. A Q-Q plot pairing the quantiles of these two groups has a strong linear trend, which is consistent with the assumption that the population distributions have the same shape.
The Mann-Whitney U test tests the null hypothesis that there is no difference in color loss (ΔE00) between group A and group B. Since we assumed the same shape, this is really a test for a difference in the means or medians of ΔE00. A Mann-Whitney U test is performed at level 0.05, and the researcher calculates U=0 (R reports this statistic as W=0) with a p-value of 7.396E-07. Since the p-value is less than 0.05, the researcher rejects the null hypothesis and concludes that there is a significant difference between the color loss ΔE00 for company A and company B.
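The reported p-value can be checked by hand. A statistic of U=0 means the two samples are completely separated, and under the null hypothesis each of the ways of assigning 12 of the 24 pooled ranks to one group is equally likely, so the exact two-sided p-value is 2 divided by the number of such assignments. The short sketch below assumes there are no ties in the data; the variable names are illustrative.

```r
n_a <- 12  # specimens from company A
n_b <- 12  # specimens from company B

# Only the two most extreme rank assignments give complete separation
p_by_counting <- 2 / choose(n_a + n_b, n_a)

# Same value from the exact null distribution of the statistic
p_from_null <- 2 * pwilcox(0, n_a, n_b)

signif(p_by_counting, 4)  # 7.396e-07
```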
Assumptions (Sign Test):
1) Paired observations.
2) Population distributions are not discrete.
The Sign Test is used as a replacement for the paired t-test when the populations are not normally distributed and the sample sizes are not greater than 30. While the paired t-test compares the means of two groups, the Sign Test compares the medians of two groups by analyzing the signs of the differences between the matched pairs from those groups.
Process:
If the difference within a matched pair is zero, it has neither a positive nor a negative sign, so the researcher should delete such observations and reduce the sample size \(n\) for the test accordingly. Another approach, which ensures a higher power for the test, is as follows:
1) If the number of zeros is even, the researcher should use a random number generator to assign the zero differences positive or negative signs.
2) If the number of zeros is odd, the researcher should randomly drop one, and then follow the procedure above for an even number of zeros producing \(n-1\) total signs.
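The second approach can be sketched in R as follows. This is a minimal illustration rather than a definitive implementation: the function name sign_test_randomized_zeros and the toy data are hypothetical, and the test itself is carried out with the base function binom.test(), since under the null hypothesis the number of positive signs follows a binomial distribution with success probability 0.5.

```r
# Sign test that keeps zero differences by giving them random signs
# (hypothetical helper written for illustration)
sign_test_randomized_zeros <- function(x, y, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  d <- x - y
  n_pos  <- sum(d > 0)
  n_neg  <- sum(d < 0)
  n_zero <- sum(d == 0)

  # Odd number of zeros: drop one. Zeros are interchangeable, so dropping
  # one "at random" simply lowers the count by one.
  n_zero_kept <- n_zero - (n_zero %% 2)

  # Give each remaining zero a random sign and count the positive ones
  extra_pos <- sum(sample(c(TRUE, FALSE), n_zero_kept, replace = TRUE))

  # Exact two-sided binomial test on the number of positive signs
  binom.test(n_pos + extra_pos, n_pos + n_neg + n_zero_kept,
             p = 0.5, alternative = "two.sided")
}

# Toy data (not from the text): 8 pairs, 3 of which have a zero difference
before <- c(5, 3, 4, 6, 2, 7, 4, 5)
after  <- c(4, 3, 2, 6, 1, 5, 4, 3)
res <- sign_test_randomized_zeros(before, after, seed = 1)
res$parameter  # number of signs used: 7, that is n - 1
```

Because the signs of the zeros are random, the p-value can change from run to run unless a seed is fixed.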
Assumptions (Wilcoxon Signed Rank Test):
1) Paired observations.
2) Population distribution of differences is symmetric.
Another nonparametric test for the matched pairs design is the Wilcoxon Signed Rank Test, which replaces the paired t-test when the assumptions of normality and large sample size are violated. Compared to the Sign Test, the Wilcoxon Signed Rank Test extracts more information from the data by incorporating both the signs and the magnitudes of the differences in the analysis.
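To make the role of signs and magnitudes concrete, the sketch below computes both statistics by hand for a small set of hypothetical paired measurements (the numbers are made up for illustration and contain no zero or tied differences). The Sign Test uses only the number of positive differences, while the signed rank statistic sums the ranks of the absolute differences that belong to positive differences.

```r
# Hypothetical paired data, not taken from the text
before <- c(1.9, 2.4, 1.1, 3.0, 2.2, 1.6)
after  <- c(1.5, 2.5, 0.8, 2.1, 1.6, 1.4)
d <- before - after

# Sign Test statistic: number of positive differences
S <- sum(d > 0)

# Signed rank statistic: rank |d|, then add the ranks of the positive differences
abs_ranks <- rank(abs(d))
V_by_hand <- sum(abs_ranks[d > 0])

# Same statistic as reported by wilcox.test (labelled V for paired data)
V_from_r <- unname(wilcox.test(before, after, paired = TRUE)$statistic)

c(S = S, V_by_hand = V_by_hand, V_from_r = V_from_r)  # 5 20 20
```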
Process:
Comparing Results of Two Teaching Techniques: 10 dentistry students are learning to assess the color of natural teeth. At the end of a semester-long course, the 10 students are asked to assess the color of a specific tooth specimen. The color of this tooth was measured by a spectroradiometer, and the difference between the color indicated by each student and the true color of the restoration is calculated using the \(\Delta E_{00}\) coefficient. The 10 students are then asked to complete an additional 1-week workshop that trains them to recognize color, and at the end of the week the students are once again tested and the \(\Delta E_{00}\) coefficient between each student’s color choice and that measured by the spectroradiometer is recorded. The researcher is interested in studying whether there is a significant difference between the students’ responses before and after the workshop. Since the measurements are taken on the same students, the two sets of measurements are dependent, so the Sign Test and the Wilcoxon Signed Rank Test can be performed. Each matched pair is formed using the \(\Delta E_{00}\) scores before and after the workshop for the same student.
The Sign Test yields a test statistic of \(S=8\), which has a p-value of 0.1094 in the two-sided case; thus the researcher does not reject the null hypothesis of equal medians at level 0.05. There is no significant difference between the medians of the \(\Delta E_{00}\) scores before and after the workshop. On the other hand, the Wilcoxon Signed Rank Test yields a test statistic of \(W=51\), which has a p-value of 0.01367 in the two-sided case. Thus, according to this test, the researcher rejects the null hypothesis that the difference between the median \(\Delta E_{00}\) scores before and after the workshop is zero at level 0.05. The example is consistent with the Wilcoxon Signed Rank Test having more power than the Sign Test.
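Both p-values can be checked against the exact null distributions. The sketch below assumes that none of the 10 differences is zero and that there are no ties among the absolute differences, so that the number of positive signs is binomial with 10 trials and the signed rank statistic follows the distribution given by psignrank(). The variable names are illustrative.

```r
n <- 10  # matched pairs, assuming no zero differences

# Sign Test: two-sided p-value for 8 positive signs out of 10
p_sign <- binom.test(8, n, p = 0.5)$p.value
p_sign_by_hand <- 2 * pbinom(7, n, 0.5, lower.tail = FALSE)  # 2 * P(S >= 8)

# Wilcoxon Signed Rank Test: two-sided p-value for W = 51
p_wilcoxon <- 2 * psignrank(50, n, lower.tail = FALSE)       # 2 * P(W >= 51)

round(p_sign, 4)      # 0.1094
round(p_wilcoxon, 5)  # 0.01367
```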
The Sign Test can be used in great generality, no matter the shapes of the population distributions, so long as they are not discrete. Its power when applied to matched pairs is lower than that of the Wilcoxon Signed Rank Test; however, in order to use the latter test, the distribution of the differences must be symmetric. For the Wilcoxon Signed Rank Test, the individual population distributions need not be symmetric; it is only the differences that must have a symmetric distribution. This will be the case, for example, if the two population distributions have the same shape and differ only in their location (mean or median).
Assumptions (Kruskal-Wallis Test):
1) Independent random samples.
2) Population distributions for the k groups have identical shapes but potentially different locations (means or medians).
The Kruskal-Wallis test is used as a replacement for the analysis of variance F-test when the populations are not normally distributed and the sample sizes are not greater than 30. When the assumptions of the F-test are violated in this way, the researcher can perform the Kruskal-Wallis test, which has less stringent assumptions. Since the population distribution is assumed to have the same shape for each of the k groups, the only distinguishing feature is location, so the test can be used to determine whether k independent samples come from populations with the same mean or median.
Process:
Effects of Grinding of Dental Restorations on Tooth Color: 48 specimens of a composite dental material stained in target color A2 are ground down by 100 micrometers. 12 of the specimens are made from a white composite produced by company A (group 1), 12 from a pre-colored composite from company A (group 2), 12 from a pz-composite from company B (group 3), and the remaining 12 specimens are from a pa-composite from company B (group 4). The color of each specimen is measured before and after grinding, and the ΔE00 coefficient is calculated to measure the color loss as a result of grinding. The researcher is interested in comparing the mean color loss for the dental materials among the four groups. However, the data are not normally distributed and the sample sizes are less than 30, so a classic ANOVA test should not be performed. Instead, the researcher performs the Kruskal-Wallis test, which does not require normality and only assumes that the four groups are independent and have population distributions with the same shape. Independence is a reasonable assumption in this case, as the dental materials are produced by two different companies for different composite materials with different manufacturing techniques.
The Kruskal-Wallis test tests the null hypothesis that there is no difference in color loss (ΔE00) among groups 1, 2, 3, and 4. The Kruskal-Wallis test is performed at level 0.05, and the researcher calculates Kruskal-Wallis H = 38.062 with an approximate p-value of 2.742E-08, calculated using a \(\chi^2\)-distribution with 3 degrees of freedom. Since H = 38.062 is greater than the \(\chi^2\) critical value of 7.815 for 3 degrees of freedom, and p is much less than 0.05, the researcher rejects the null hypothesis and concludes that there is a significant difference in the color loss ΔE00 among the 4 groups. Since the distributions have the same shape, this means that at least two groups have different mean ΔE00 scores.
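The critical value and the approximate p-value quoted in this example follow from the \(\chi^2\)-distribution with \(k-1=3\) degrees of freedom and can be checked in R. In this short sketch the value of H is copied from the example and the variable names are illustrative.

```r
H     <- 38.062  # Kruskal-Wallis statistic reported in the example
k     <- 4       # number of groups
alpha <- 0.05
df    <- k - 1

# Reject the null hypothesis when H exceeds this value
critical_value <- qchisq(1 - alpha, df)

# Upper-tail probability of the chi-squared distribution at H
p_value <- pchisq(H, df, lower.tail = FALSE)

round(critical_value, 3)  # 7.815
signif(p_value, 4)        # 2.742e-08
```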
To better understand which of the groups are significantly different and which are not, the researcher then performs 6 Mann-Whitney U tests, one for each pair of groups, to compare the color loss ΔE00 between two groups at a time.
Using a significance level of 0.05, the researcher concludes that every pair of groups, except groups 2 and 3, is significantly different. The p-value for the comparison between groups 2 and 3 is 0.068, so since \(p > 0.05\) the researcher does not reject the null hypothesis for this Mann-Whitney U test and concludes that the color loss ΔE00 scores of groups 2 and 3 are not significantly different.
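The six pairwise comparisons can also be generated in a loop instead of being written out one by one. The sketch below uses small hypothetical samples (not the dental measurements) and the illustrative names groups, pairs and p_values; combn() lists every pair of group names and wilcox.test() is applied to each pair.

```r
# Hypothetical toy data for four groups, chosen without ties
groups <- list(
  g1 = c(2.6, 3.0, 3.1, 2.4, 2.8),
  g2 = c(1.4, 1.2, 1.3, 1.5, 1.6),
  g3 = c(0.9, 1.1, 1.25, 1.35, 1.05),
  g4 = c(0.7, 0.8, 1.0, 0.95, 0.75)
)

# All choose(4, 2) = 6 pairs of group names, one pair per column
pairs <- combn(names(groups), 2)

# Mann-Whitney U test for each pair; keep only the p-values
p_values <- apply(pairs, 2, function(pr) {
  wilcox.test(groups[[pr[1]]], groups[[pr[2]]])$p.value
})
names(p_values) <- apply(pairs, 2, paste, collapse = " vs ")

# Pairs whose unadjusted p-value falls below 0.05
p_values[p_values < 0.05]
```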
R Code and Examples
Mann-Whitney U Test: R script file
###-----------------------
### Mann-Whitney U Test
###-----------------------
x<-c(2.598720605,
2.956664701,
3.132831941,
3.036701609,
2.370539221,
2.778687614,
2.492212481,
3.573790004,
2.653324785,
2.405669672,
3.105197183,
2.934973586)
y<-c(1.4447,
1.2080,
1.2921,
1.2478,
1.3718,
1.3171,
1.3198,
1.4491,
1.3573,
1.2901,
1.6266,
1.4688)
# Q-Q plot comparing the quantiles of the two groups
qqplot(x,y, xlab="Group A Quantiles", ylab="Group B Quantiles")
# Mann-Whitney U test (R reports the U statistic as W)
wilcox.test(y,x)
Sign Test and Wilcoxon Signed Rank Test: R script file
###------------------------------------------
### Sign Test and Wilcoxon Signed Rank Test
###------------------------------------------
# Generate Data
a1<-1.1
a2<-2.0
as<-0.2
b1<-0.9
b2<-1.7
bs<-0.15
set.seed(168585)
x<-c(rnorm(n=5,mean=a1,sd=as),rnorm(n=5,mean=a2,sd=as))
y<-c(rnorm(n=5,mean=b1,sd=bs),rnorm(n=5,mean=b2,sd=bs))
# QQplot
qqplot(x,y)
qqnorm(x)
# Histogram
hist(x)
hist(y)
# Load the BSDA package, which provides SIGN.test()
library(BSDA)
# Sign test
SIGN.test(x, y, md = 0, alternative = "two.sided")
# Histogram
hist(x-y)
# Wilcoxon Signed Rank test
wilcox.test(x, y, paired = TRUE, alternative = "two.sided")
Kruskal-Wallis Test: R script file
###-----------------------
### Kruskal-Wallis Test
###-----------------------
# Enter Data
x<-c(2.598720605,
2.956664701,
3.132831941,
3.036701609,
2.370539221,
2.778687614,
2.492212481,
3.573790004,
2.653324785,
2.405669672,
3.105197183,
2.934973586)
y<-c(1.4447,
1.2080,
1.2921,
1.2478,
1.3718,
1.3171,
1.3198,
1.4491,
1.3573,
1.2901,
1.6266,
1.4688)
z<-c(0.888180159,
1.122454322,
1.060978933,
1.050607838,
1.221687248,
1.516789236,
1.329441948,
1.244581366,
1.375049939,
1.327936771,
1.247591327,
1.321586341)
w<-c(1.15722013,
0.847002643,
0.748706216,
0.968027886,
0.985208816,
0.984114276,
0.899525211,
0.750247285,
0.745000625,
1.10559267,
1.212459628,
1.018301528)
# Kruskal-Wallis test across the four groups
kruskal.test(list(x,y,z,w))
# Pairwise Mann-Whitney U tests, one for each of the six pairs of groups
wilcox.test(x,y)
wilcox.test(x,z)
wilcox.test(x,w)
wilcox.test(y,z)
wilcox.test(y,w)
wilcox.test(z,w)