Assessing Normality
Many statistical methods in this handbook require that the sample data come from an underlying population that is normally distributed. When the shape of the population distribution is known, this extra knowledge allows for better decisions by producing narrower confidence intervals and more powerful hypothesis tests for a given sample size. To assess the validity of this normality assumption, graphical methods (visual inspection of the histogram and the normal probability plot) or normality tests (for example, the Ryan-Joiner test) can be employed.
(1) Graphical Methods
- Histogram: The histogram of the data should follow a symmetric, bell-shaped pattern.
- Skewness: The presence of skewness in the shape of the distribution is an indicator of non-normality, and further analysis is indicated.
- Presence of Outliers: A significant number of outliers relative to the sample size is potentially indicative of a population that is not normally distributed.
- Normal Probability Plot: The data points in the normal probability plot should follow a linear trend. Pronounced deviations from a linear trend, such as an S-shape, are of concern and an indicator of non-normality.
Determining Normality Graphically: In example (a), the histogram is close to a bell-shaped pattern and the points on the normal probability plot do not substantially deviate from a linear trend, so the plots provide no evidence for rejecting the assumption that the population distribution is normal. On the other hand, examples (b), (c), and (d) offer reasons to reject the assumption that the underlying population is normally distributed. The pronounced S-shape in the normal probability plot in example (b) indicates heavy tails, as the area in the tails of the histogram lies outside the bounds of the normal curve, whereas the distribution in example (c) has light tails. A combination of the two patterns can be seen in example (d), in which the distribution is right-skewed.
Graphical methods are designed to illuminate non-normality. However, the reader should keep in mind that when the sample size is small, there is a lot of variability present in the shape of the normal probability plot even when the data is sampled from a normal population. When looking for evidence to reject the normality assumption, the reader should only look for a strong pattern of deviation from a linear trend in the normal probability plot. The question of how close the population is to a normal distribution presents many challenges, as there are no clear cut-offs. These methods are only designed to assess the plausibility of the normality assumption of the population.
Variability in Normal Probability Plots: To demonstrate the variability present in normal probability plots for small sample sizes, we include below examples of four such graphs generated from normal distributions. The deviations from the linear pattern are, however, not pronounced, so only strong patterns should be taken as evidence against normality.
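Panels of this kind are easy to reproduce. The sketch below, which uses only base R graphics, draws four normal probability plots, each from a fresh sample drawn from a standard normal distribution; the sample size of 10 and the seed are illustrative choices, not part of the original figure.

```r
## four normal probability plots of small samples from a normal population
set.seed(1)            # illustrative seed, for reproducibility only
par(mfrow = c(2, 2))   # arrange the four plots in a 2 x 2 panel
for (i in 1:4) {
  x <- rnorm(10)       # sample of size 10 from a standard normal
  qqnorm(x, main = paste("Normal sample", i))
  qqline(x, col = "red", lwd = 2)
}
par(mfrow = c(1, 1))   # restore the default single-plot layout
```

Rerunning the script without the set.seed line gives a fresh set of panels, which makes the sample-to-sample variability easy to see.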
(2) Tests of Normality
There are several tests to assess normality, such as the Kolmogorov-Smirnov test, the \(\chi^2\)-test, or the Anderson-Darling test, though these tests do not always provide consistent answers. We offer below an example of such a test based on a simple and intuitive idea, known as the Ryan-Joiner test or the correlation test for normality. The reader is encouraged to accompany these tests with the normal probability plot, as for large sample sizes these tests may indicate a poor fit when the plot indicates otherwise.
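Of the tests named above, the Kolmogorov-Smirnov test is available in base R as ks.test. The sketch below reuses the nine-point sample from the scripts later in this section; note that estimating the mean and standard deviation from the same data (and the repeated value 4, which creates a tie) makes the reported p-value only approximate, which is one reason these tests can disagree.

```r
## Kolmogorov-Smirnov test against a fitted normal distribution
my_data <- c(1, 4, 2, 12, 6, 0, 3, 4, 8)   # sample data from the scripts below
## Caveat: fitting mean and sd from the same data, and the tie at 4,
## make the p-value approximate (R will warn about the tie).
res <- ks.test(my_data, "pnorm", mean = mean(my_data), sd = sd(my_data))
res$p.value   # a small p-value is evidence against normality
```
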
The Ryan-Joiner test is based on the idea that the correlation coefficient r is an indicator of how well the points in the normal probability plot fit a line. Since in this setting the null hypothesis is that the population has a normal distribution, small p-values are evidence that the distribution is not normal. The table below offers specific language for interpreting the p-value corresponding to the observed correlation coefficient.
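The correlation at the heart of this test can be computed directly in base R. The sketch below is only the intuitive core of the idea, not a full Ryan-Joiner implementation, which additionally requires tables of critical values to turn r into a p-value; it reuses the nine-point sample from the scripts later in this section.

```r
## correlation between the sample and theoretical normal quantiles
my_data <- c(1, 4, 2, 12, 6, 0, 3, 4, 8)   # sample data from the scripts below
pts <- qqnorm(my_data, plot.it = FALSE)    # x = theoretical quantiles, y = data
r <- cor(pts$x, pts$y)                     # r close to 1 suggests a good linear fit
r
```

An r close to 1 means the normal probability plot is nearly linear; the formal test then asks whether r is far enough below 1 to be surprising under normality.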
What to do if Normality is Violated: If the data is almost normal or roughly normal, then some statistical methods, such as the z-test and unpooled t-test, are robust to deviations from the normal distribution. Some statistical methods can also be employed in the absence of normality when the sample size is large \((n \geq 30)\). This of course presents a tradeoff, as collecting data on an experiment with 30 specimens instead of 12 specimens can be prohibitively expensive. Other options are to use non-parametric methods, in which the underlying distribution of the population does not need to be known, or to transform the data to correct the deficiencies.
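As an illustration of the transformation option, a logarithmic transformation often reduces right skewness. The data below are simulated purely for demonstration; with real measurements, the histograms before and after the transformation should be compared as described earlier in this section.

```r
## log transformation of right-skewed data (simulated for illustration)
set.seed(1)
x <- rexp(50)    # a right-skewed sample (exponential distribution)
y <- log(x)      # log transform; requires strictly positive data
par(mfrow = c(1, 2))
hist(x, col = "lightblue", main = "Original (right-skewed)")
hist(y, col = "lightblue", main = "After log transform")
par(mfrow = c(1, 1))
```

Note that the log transform only applies to strictly positive data, and any conclusions drawn on the transformed scale must be interpreted on that scale.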
The Assumption of Independence
Probabilistic independence is a common modeling assumption underlying the performance of hypothesis tests. Outcomes on n trials are called probabilistically independent if learning the outcomes of some trials does not change the probabilities associated with the outcomes of the other trials. When outcomes are not independent, they are called dependent. For example, suppose 10 patients in a group are asked if they brush their teeth twice a day through a written survey. Then it is reasonable to assume that their answers are independent of each other. On the other hand, if the 10 patients are in the same room and are asked to raise their hands if they brush their teeth twice a day, then social pressure may be felt, which could lead to a change in patient responses, and thus to dependence. To ensure independence of the patients’ answers, the researcher should prevent them from seeing each other’s responses. The preceding example illustrates a common method of achieving independence, specifically, that of preserving the system being observed. Independence can also often be reasonably obtained by ensuring that the system we observe does not change from one trial to the next. In particular, with experiments involving dental material specimens, independence can often be achieved by cleaning equipment or recalibrating machines between trials.
The required independence applies within groups as well as between groups. For example, suppose two instructors each teach a group of 20 students to perform a dental procedure. After their training, the students are assessed on their ability to perform this procedure. The instructors will be compared by how well their students perform. If the students within the first group can observe others in their group as they perform this procedure, then this can change how well they perform when it is their turn. This would result in within group dependence. Similarly, if the second group of students could watch the students from the first group perform the procedure, then this would create a between group dependence. In each case, the within and between group dependence could be removed by denying students the ability to witness each other’s performance.
There are also experimental designs that preserve independence even when using the same subjects for two treatments. This can be achieved by incorporating washout periods between the treatments. For example, a researcher might treat all subjects with a first drug, and then wait long enough for the effects of the first drug to clear (or wash out of) the person before administering the second drug. This design limits the impact of any carryover effects, which reduces the dependence between the treatments.
The reader is encouraged to ensure that their samples and groups are independent through the design of the experiment. There are several tests that can be used to assess the independence of the observations; however, they are not frequently used in practice. One such test is the Test for Association, which can be used to assess whether the observations from two or more groups are independent.
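A test for association on count data is commonly carried out as a chi-squared test on a contingency table, available in base R as chisq.test. The counts below are hypothetical, echoing the tooth-brushing survey above.

```r
## chi-squared test of association on a hypothetical 2 x 2 table of counts
counts <- matrix(c(18, 2,
                   12, 8),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(Group   = c("Group 1", "Group 2"),
                                 Brushes = c("Yes", "No")))
res <- chisq.test(counts)
res$p.value   # a small p-value is evidence of association (dependence)
```

As with the normality tests, a non-significant result does not prove independence; careful experimental design remains the primary safeguard.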
How to Check Equality of Variance
To check if the population standard deviations are equal \((\sigma_1=\sigma_2)\), we can perform an F-test. The underlying assumption for the F-test is that the data is normally distributed or that the sample is large. However, the F-test is not robust, in that it is sensitive to even mild deviations from normality of the underlying populations. The reader should be cautious when employing this method for small sample sizes and should instead follow the recommended method above.
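Base R implements this F-test for comparing two variances as var.test. The two samples below are simulated for illustration only.

```r
## F-test for equality of two population variances
## (samples simulated for illustration only)
set.seed(1)
x <- rnorm(15, mean = 10, sd = 2)
y <- rnorm(15, mean = 10, sd = 2)
res <- var.test(x, y)   # H0: the two population variances are equal
res$p.value             # a small p-value is evidence the variances differ
```

Since var.test inherits the F-test's sensitivity to non-normality, the caution above about small sample sizes applies here as well.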
R Code and Examples
QQplot: R script file
### ------------------------------------------------------
### create qqplot of your data vs theoretical normal data
### ------------------------------------------------------
## Two Steps
#step 1 - store your data as my_data
my_data<-c(1,4,2,12,6,0,3,4,8)
#step 2 - graph qqplot of your data vs theoretical normal
qqnorm(y=my_data,
       pch=21, #round dots with border
       bg="blue", #fill color
       col="blue", #border color
       cex=1.2 #120% of regular dot size
)
#add line to graph
qqline(my_data,col="red",lwd=2)
Histogram: R script file
###---------------------------------
### using R to make histograms
### --------------------------------
# example 1 - with very basic plotting options
#1# store data
my_data<-c(1,4,2,12,6,0,3,4,8)
#2# plot histogram
hist(my_data,col="lightblue")
# example 2 - choosing options to enhance display
#1# store data
my_data<-c(1,4,2,12,6,0,3,4,8)
#2# make histogram
#choose histogram color - R knows html color names: htmlcolorcodes.com/color-names
hc<-col2rgb("lightblue")/255
#plot histogram
hist(x=my_data,
breaks=6, #number of cells
labels=TRUE, #displays counts - use FALSE to turn off
main="Your Title",
cex.main=1.5, #150% of normal Title font
xlab="X Axis",
ylab="Y Axis",
cex.lab=1.25, #125% of normal axes label font
sub="This is a Subtitle",
cex.sub=1, #subtitle size is 100% of normal
#histogram color with opacity alpha=0.25
col=rgb(red=hc[1],green=hc[2],blue=hc[3],alpha=0.25)
)
#enclose in a box
box(col="red")
#abline(h=0) #optional: draw a line along the x axis