In the earlier section of one-way analysis of variance, we studied the effects of one factor on a response variable. For example, a one-way ANOVA is used to investigate if the training level of students influences their ability to correctly recognize tooth color. If the researcher suspects that gender also influences this ability, then a two-way analysis of variance is appropriate.
The two-way analysis of variance (Two-Way ANOVA) is used to examine the effects of two factors A and B on the response variable \(y\). In the setting above, factor A is the training level of the student with 3 different levels (low, moderate, excellent), factor B is the gender of the student with 2 levels (female, male), and the response variable \(y\) is the student’s ability to recognize color (measured as CIEDE2000 deviation from actual tooth color). With 3 levels of factor A and 2 levels of factor B, this is called a 3 × 2 design and it has 6 treatments formed from choosing the levels of each of the two factors. The table below displays the observed mean responses (CIEDE2000 deviation scores) for each treatment group, which are commonly called cell means.
Alternatively, the cell means can be displayed in a table like the one below.
A two-way analysis of variance is concerned with determining whether the response variable is influenced by the levels of factors A and B. This includes the effect of factor A, the influence of factor B, and the interaction influence of factors A and B. Two factors A and B are said to interact if the effect of one factor on the response variable depends on the level of the other factor.
In the setting above, the researcher can test whether the ability to recognize color is influenced by training level (factor A), gender (factor B), and the simultaneous effects and possible interactions of the two factors A and B. If the influence of training level on the ability to recognize color is the same for each gender, then there is no interaction between factors A and B. In the example listed in the interaction plot (\(i\)) below, we note that the differences between the CIEDE2000 deviation scores by gender are almost the same at each training level (0.3, 0.3, and 0.2), and thus the curves of mean responses for the different levels of factor B (gender) are almost parallel. The interaction plot therefore suggests that there is no significant interaction between factors A and B.
On the other hand, the modified example listed in the interaction plot (\(ii\)) above illustrates a case for which the difference between mean responses for the three levels of factor A – training level is not constant across the two levels of factor B – gender (they are 3.8, 2.6 and 0.1), and thus the curves are not parallel. This suggests that the influence of training level on the response variable depends on gender (male student performance improves greatly with training whereas female students improve less with training), and thus there is a factor interaction effect between training level and gender.
The analysis of variance method is robust to both deviations from the assumption of normality and of constant variance when the sample sizes between groups are equal (or near equal). For this reason, the use of a balanced experimental design with groups that have the same number of specimens is recommended.
Another consideration about the experimental design is the number of observations for each treatment (or cell). The analysis of variance performed depends on whether the design is balanced, having an equal number of observations in each cell, or unbalanced in which the number of observations varies across cells. In the example above, the researcher could select 10 students for each of the 6 treatments with a total number of 60 students in the experiment, thus making the design balanced. Alternatively, the researcher could select 10 male students for each of the training levels (30 male students in total) and 5 female students at each training level (15 female students in total), which would correspond to an unbalanced design for the experiment.
(1) Since an unbalanced design with an unequal number of observations is more complex than the balanced design, we recommend using an equal number of observations for each treatment (cell). However, if practical concerns such as cost, time, and restrictions to gathering data for certain cells are present, then the computer software output will provide the necessary analysis for this design.
(2) To run a valid test of interactions, every cell must have at least two observations.
Experiment:
1) One response variable.
2) Two factors with two or more levels (groups).
Assumptions:
1) The response variable is normally distributed.
2) The observations are independent.
3) The variance is constant (the same in each group).
For unbalanced designs, the type I and II error rates are inflated more when there is non-constant variance than in the balanced case. For this reason, the use of a balanced experimental design is recommended.
The model for the two-factor design can be expressed in the form of a cell means or a factor effects model. The model that expresses the values of \(y\) in terms of the cell means and random errors is called the cell means model. Here, for each level \(i\) of factor A and level \(j\) of factor B, the \(k^{th}\) response to that treatment combination is denoted \(y_{ijk}\), the population mean response to that treatment is denoted $\mu_{ij}$, and the random errors in using that mean to predict the value of \(y_{ijk}\) are denoted by \(\varepsilon_{ijk}\).
On the other hand, the factor effects model provides a more detailed insight into the influences of the two factors by expanding the cell mean \(\mu_{ij}\) as the sum of the overall experimental mean \(\mu\), the influences of factor A, factor B, and the interaction of factors A and B. The coefficient \(\alpha_i\) is called the main effect of the \(i^{th}\) level of factor A, \(\beta_j\) is called the main effect of the \(j^{th}\) level of factor B, \((\alpha\beta)_{ij}\) is called the interaction effect of factors A and B when they are set to levels \(i\) and \(j\), respectively. If there is no interaction between factors A and B, then \((\alpha\beta)_{ij}=0\) for all \(i\) and \(j\), and the resulting model without the \((\alpha\beta)_{ij}\)-term is said to be additive because the cell mean is a pure sum of \(\mu\) and the main effects of factors A and B. It is worth noting that the coefficient \(\alpha_i\) is the deviation of the marginal effect of factor A at level \(i\) from the overall experimental mean \(\mu\), that is \(\mu_{i.}-\mu\). Similarly, the coefficient \(\beta_j\) is the deviation of the marginal effect of factor B at level \(j\) from the overall experimental mean \(\mu (\mu_{.j} – \mu)\).
In the standard two-way analysis of variance, the overall F-tests assess three null hypotheses: whether the main effects of factor A are all zero, whether the main effects of factor B are all zero, and whether the interaction effects between factors A and B are all zero. We assume in the following that the design of the experiment is balanced.
Process:
Effect of Material Type and Thickness on Color Stability of Restorations Subjected to Coffee Staining (Two-Way ANOVA with Balanced Design and Interaction):A researcher is interested in studying the effect of 2 material types (LDS, MonZr, ZLS) and 3 thickness levels (0.5, 0.7, 1.0 mm) on the color stability of a restoration subjected to repeated exposure to coffee staining. The color stability of the restoration is quantified using the CIEDE2000 score that measures the difference in color of the restoration before and after the staining occurs. For each of the \(3\times3\times9\) treatments in the experiment, four specimens are generated for a two-way ANOVA with balanced design.
The 3 boxplots below offer a visual inspection of the main effects and the variability. From the first boxplot on factor A – thickness, we observe that color stability does appear to depend on this factor in that as the thickness increases, the mean color stability decreases. From the second boxplot, we note that the effect of factor B – material type is not constant. Finally, the third boxplot shows a widely varying interaction effects on color stability.
The researcher constructs an interaction plot to explore the presence and nature of the interactions between thickness and material type.
Since the 3 curves are not parallel, they display interactions between the two factors. In particular, when thickness is 0.5 mm, the 3 material types produce highly distinct effects on mean CIEDE2000, having material type ZLS as inducing the largest mean CIEDE2000. When the thickness is 0.7 or 1.0 mm, ZLS and MonZr have almost an identical effect, which is much less than the effect of material type LDS.
The results of the two-way ANOVA F-tests are displayed in the table below. While the boxplots suggest a departure from constant variance, a Levene’s Test revealed that departure was not statistically significant. Overall, the assumptions of the two-way ANOVA are satisfied.
The test for the main effects of factor A – thickness yields a p-value of 0.009406, which is less than the significance level \(\alpha=.05\), so the researcher rejects the null hypothesis and concludes that there is sufficient evidence to indicate that thickness has a non-constant effect on the mean response CIEDE2000. Similarly, the small p-value ( p< .05) for factor B – material type shows that material type has a non-constant effect on the mean response CIEDE2000. Finally, with a p-value of 0.000201 which is less than the significance level \(\alpha=.05\), the test for interactions between the two factors reveals that there is sufficient evidence of a non-zero interaction between thickness and material type.
Effect of Tooth Shade and Ceramic Type on Translucency (Two-Way ANOVA with Unbalanced Design and No Interaction): A researcher is interested in studying the effect of 2 ceramic shades (A2 and A3) and 4 types of ceramics (Slip-cast, Zirconia block, Feldspathic block, Heat pressed) on the translucency of a restoration. There are several different brands of Zirconia blocks and Slip-cast blocks available, so the researcher decides to measure the translucency of 7 specimens for each of the different combinations of ceramic brands and shades available that are listed in the table below.
While the experimental design with factor A – ceramic shade (A2 and A3) and factor B – ceramic type (Slip-cast, Zirconia block, Feldspathic block, Heat pressed) has \(2\times4=8\) treatments, the observations are not equally distributed across these treatments, and so the design for the two-way ANOVA is unbalanced. The frequency table for the experiment is included below.
The 3 boxplots below display a visual representation of the main effects and variability. The first boxplot shows the effects of the shade designations (A2 and A3) on translucency are nearly the same, while in the second boxplot the observed effects of ceramic type on the TP-value are not constant, and so it is plausible to expect that the type of ceramic influences the translucency of the restoration. The third boxplot shows that the cell means depend on the treatment combinations since their centers are not at an approximately constant height.
Since in the interaction plot below the four curves are nearly parallel, the data suggest there is no interaction between the two factors shade and ceramic type.
The results of the two-way ANOVA F-tests are displayed in the table below.
The test for the main effects of factor A – shade designation yields a p-value of 0.203, which is greater than the significance level=.05, so the researcher does not reject the null hypothesis and concludes that there is not sufficient evidence to indicate that shade has a non-constant effect on the mean response TP-value. On the other hand, the small p-value (p < .05) for factor B – ceramic type shows that ceramic type has a non-constant effect on the mean response translucency parameter value. Finally, with a p-value of 0.971 which is greater than the significance level \(\alpha=0.05\), the test for interactions between the two factors reveals that there is not a significant interaction between thickness and material type. Since the F-test revealed no interactions between the two factors, the model is additive, and the analysis can proceed as if there were no interactions between shade and ceramic type.
As with the one-way setting, a single comparison two-way ANOVA is a test of a single hypothesis about the population treatment means. If on the other hand, a researcher is interested in simultaneously testing multiple hypothesis, then a multiple comparison two-way ANOVA procedure (described in the next section) should be employed. The hypothesis for these procedures are expressed using contrasts, just as they were in the one-factor setting. The reader is encouraged to revisit the concept of contrast and the procedures in the one-factor case in the earlier section.
The procedure for the single comparison two-way ANOVA varies depending on whether interaction was present.
Procedure:
Effect of Tooth Shade and Ceramic Type on Translucency (Single Contrast Two-Way ANOVA with No Interaction): Having performed the overall F-test for the two-way ANOVA and rejected the hypothesis for factor B, the researcher can formulate and test new hypothesis of interest. For example, the researcher can choose to test whether the Slip-cast material type, which exhibits the lowest TP-values for both A2 and A3 shades, performs significantly better than the average of the other 3 material types. A single contrast two-way ANOVA with no interaction present for factor B – material type is performed. The contrast \(\mu_{.3} – (\mu_{.1}+\mu_{.2}+\mu_{.4})/3\) is of interest. A hypothesis test with \(H_0: \mu_{.3} – \frac{\mu_{.1}+\mu_{.2}+\mu_{.4}}{3} = 0\) vs \(H_a: \mu_{.3} – \frac{\mu_{.1}+\mu_{.2}+\mu_{.4}}{3} \neq 0\) can be performed to determine if the observed contrast is statistically significant.
Procedure:
Effect of Material Type and Thickness on Color Stability of Restorations Subjected to Coffee Staining (Single Contrast Two-Way ANOVA with Interaction Present):Since the overall F-test for the two-way ANOVA rejected the three null hypotheses, the researcher is now interested in determining if the LDS material at thickness level 1.0 mm performs significantly worse than the average of the other two material types (MonZr and ZLS) for the same thickness level. The boxplot showed that the CIEDE2000 score for LDS at this thickness was larger than that of the other two materials. A single contrast two-way ANOVA with interaction present for factor A – material type is performed, with the contrast of interest being \(\mu_{13}-(\mu_{23}+\mu_{33})/2\). A hypothesis test with \(H_0: \mu_{13} – \frac{\mu_{23}+\mu_{33}}{2}=0\) vs \(H_a: \mu_{13} – \frac{\mu_{23}+\mu_{33}}{2} \neq 0\) can be performed to determine if the observed contrast is statistically significant.
All the techniques presented for one-way ANOVA multiple comparisons can be applied to the two-way ANOVA setting if multiple comparisons are of interest to the researcher.
R Code and Examples
Two Factor ANOVA Example 1: R script file
###--------------------------
### Two-way ANOVA Example 1
###--------------------------
# Generate Data
set.seed(1869232)
MonZr5 <- rnorm(4, 0.46, 0.179)
MonZr7 <- rnorm(4, 0.58, 0.291)
MonZr10 <- rnorm(4, 0.26, 0.112)
LDS5 <- rnorm(4, 1.42, 0.582)
LDS7 <- rnorm(4, 1.23, 0.276)
LDS10 <- rnorm(4, 1.38, 0.168)
ZLS5 <- rnorm(4, 1.92, 0.337)
ZLS7 <- rnorm(4, 0.73, 0.403)
ZLS10 <- rnorm(4, 0.37, 0.112)
CIEDE <- c(MonZr5, MonZr7, MonZr10, LDS5, LDS7, LDS10, ZLS5, ZLS7, ZLS10)
Thickness <- c(rep(c(rep('0.5', 4), rep('0.7', 4), rep('1.0', 4)), 3))
Materials <- c(rep('MonZr', 12), rep('LDS', 12), rep('ZLS', 12))
MyData <- data.frame(CIEDE, Thickness, Materials)
# Frequency table
table(MyData$Thickness, MyData$Materials)
## Box plots
par(mfrow = c(1,2))
boxplot(CIEDE~Thickness, data = MyData, col = 2:4, xlab = "Thickness")
boxplot(CIEDE~Materials, data = MyData, col = 2:4, xlab = "Materials")
par(mfrow = c(1,1))
boxplot(CIEDE~Thickness*Materials, data = MyData, col = 2:10, xlab = "Thickness*Materials")
## Interaction Plots
require(graphics)
with(MyData, {
interaction.plot(Thickness,Materials,CIEDE,col=1:3,type="b",pch=19)
})
with(MyData, {
interaction.plot(Materials,Thickness,CIEDE,col=1:3,type="b",pch=19)
})
# Two-way ANOVA analysis
MyMod <- aov(CIEDE~ Thickness*Materials, data = MyData)
summary(MyMod)
Two Factor ANOVA Example 2: R script file
###-----------------------
# Two-way ANOVA Example 2
###-----------------------
# Generate Data
set.seed(185719)
ICS2 <- rnorm(7, 8, 0.765)
ICA2 <- rnorm(7, 4.4, 0.332)
AZC2 <- rnorm(7, 7.8, 0.255)
DIZ2 <- rnorm(7, 8.3, 0.663)
VIZ2 <- rnorm(7, 7.5, 0.408)
MK22 <- rnorm(7, 11.5, 0.255)
EM22 <- rnorm(7, 12, 0.204)
ICS3 <- rnorm(7, 8.2, 0.408)
ICA3 <- rnorm(7, 4.4, 0.316)
AZC3 <- rnorm(7, 8.3, 0.357)
DIZ3 <- rnorm(7, 8.1, 0.357)
VIZ3 <- rnorm(7, 8, 0.765)
MK23 <- rnorm(7, 11.8, 0.255)
EM23 <- rnorm(7, 12, 0.255)
TPvalue <- c(ICS2, ICA2, AZC2, DIZ2, VIZ2, MK22, EM22, ICS3, ICA3, AZC3, DIZ3, VIZ3, MK23, EM23)
Shade <- c(rep('A2', 49), rep('A3', 49))
Type <- rep(c(rep('Slipcast',14), rep('Zirconia', 21), rep('Feldspathic', 7), rep('Heatpressed', 7)),2)
Code <- c(rep('ICS2', 7), rep('ICA2', 7), rep('AZC2', 7), rep('DIZ2', 7), rep('VIZ2', 7),
rep('MK22', 7), rep('EM22', 7), rep('ICS3', 7), rep('ICA3', 7), rep('AZC3', 7), rep('DIZ3', 7), rep('VIZ3', 7),
rep('MK23', 7), rep('EM23', 7))
TPData <- data.frame(TPvalue, Shade, Type, Code)
# Frequency table
table(TPData$Shade, TPData$Type)
## Box plots
par(mfrow = c(1,2))
boxplot(TPvalue~Shade, data = TPData, col = 2:3, xlab = "Shade Designation")
boxplot(TPvalue~Type, data = TPData, col = 2:5, xlab = "Type")
par(mfrow = c(1,1))
boxplot(TPvalue~Shade*Type, data = TPData, col = 2:9, xlab = "Shade Designation*Type")
boxplot(TPvalue~Code, data = TPData, col = 2:15, xlab = "Code")
## Interaction Plots
require(graphics)
with(TPData, {
interaction.plot(Shade,Type,TPvalue,col=1:4,type="b",pch=19)
})
with(TPData, {
interaction.plot(Type,Shade,TPvalue,col=1:2,type="b",pch=19)
})
# Two-way ANOVA analysis
TPMod <- aov(TPvalue~ Shade*Type, data = TPData)
summary(TPMod)