In the typical research scenario, the value of the population standard deviation \(\sigma\) is unknown and must be estimated, which leads to the use of the t-test instead of the z-test to assess the plausibility of the underlying research hypothesis. Moreover, even when the population standard deviation is known, the t critical value exceeds the corresponding z critical value, so for the same standard error the t-confidence interval is wider than the z-confidence interval; the t-test is therefore more conservative and should be given preference.
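The comparison of interval widths comes down to the critical values. The following is a minimal sketch in R, assuming a 95% confidence level and a handful of illustrative sample sizes (both are assumptions, not values from the text). It compares the multipliers only; an actual t-interval also replaces \(\sigma\) with the sample standard deviation \(s\).

```r
# Compare the two-sided critical values used by the z- and t-intervals.
alpha <- 0.05                             # illustrative significance level
n <- c(5, 10, 30, 100)                    # illustrative sample sizes
z_crit <- qnorm(1 - alpha / 2)            # upper alpha/2 point of the standard normal
t_crit <- qt(1 - alpha / 2, df = n - 1)   # upper alpha/2 point of t with n - 1 df

# The t multiplier is larger than the z multiplier and shrinks toward it as n grows.
data.frame(n = n, z_crit = z_crit, t_crit = t_crit, ratio = t_crit / z_crit)
```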
a) Z-Test
Assumptions:
1) Normal distribution or large sample.
2) Population standard deviation is known.
Process:
b) Z-Confidence Interval
- Two-sided: The \((1-\alpha)100\%\) confidence interval for \(\mu\) is \(\bar{x} \pm z_{\alpha/2} \frac{\sigma}{\sqrt{n}}\), where \(z_{\alpha/2}\) is the upper \(100*\alpha/2\)th percentile of the standard normal distribution.
- One-sided:
- The upper \((1-\alpha)100\%\) confidence interval for \(\mu\) is from \(\bar{x}-z_\alpha \frac{\sigma}{\sqrt{n}}\) to positive infinity, where \(z_\alpha\) is the upper \(100*\alpha\)th percentile of normal distribution.
- The lower \((1-\alpha)100\%\) confidence interval for \(\mu\) is from negative infinity to \(\bar{x}+z_\alpha \frac{\sigma}{\sqrt{n}}\).
The population standard deviation \(\sigma\) is almost always unknown, and so \(\sigma\) must be estimated, which leads to the use of the t-test as the recommended path to assess the plausibility of the underlying research hypothesis. We offer below an example of a possible scenario where the population standard deviation \(\sigma\) is known and the z-test should be used.
Machine Calibration: A researcher needs to calibrate a measuring instrument such as a scale, whose factory specifications give a standard deviation of \(\sigma=0.05g\). So while the variability of the instrument is known, the researcher is trying to determine whether the scale needs to be adjusted to give accurate measurements. A 1-gram mass is placed on the scale and its weight is recorded. The process is repeated \(n=30\) times to produce the data set \(x_1, x_2, \cdots, x_n\) with sample mean \(\bar{x}=1.02g\). The null hypothesis is that the machine is well calibrated, that is, the population mean is \(\mu=1g\). The alternative hypothesis is that \(\mu \neq 1g\). The z-test is employed and the test statistic is calculated as \(z_0=\frac{\bar{x}-\mu_0}{\sigma/\sqrt{n}}=\frac{1.02-1}{0.05/\sqrt{30}}=2.19\). Using R, the two-sided p-value is calculated as \(p=0.0285\). At a significance level of \(\alpha=0.05\), since \(p \leq \alpha\), we reject the null hypothesis and conclude that it is implausible that the scale is correctly calibrated for 1g masses. The z-confidence interval \((1.0021, 1.0379)\) shows that, with \(95\%\) confidence, the mean reading of this scale for a 1-gram mass lies between 1.0021 and 1.0379 grams.
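The numbers in this example can be reproduced from the summary statistics alone. The sketch below uses only base R and the values stated in the example; the variable names are chosen here for illustration.

```r
# Summary statistics taken from the calibration example.
xbar  <- 1.02   # sample mean (g)
mu0   <- 1      # hypothesized mean under H0 (g)
sigma <- 0.05   # known population standard deviation (g)
n     <- 30     # number of repeated weighings
alpha <- 0.05   # significance level

se <- sigma / sqrt(n)                               # standard error of the mean
z0 <- (xbar - mu0) / se                             # z test statistic
p_value <- 2 * pnorm(-abs(z0))                      # two-sided p-value
ci <- xbar + c(-1, 1) * qnorm(1 - alpha / 2) * se   # 95% z-confidence interval

round(c(z0 = z0, p_value = p_value), 4)
round(ci, 4)
```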
a) T-Test
Assumptions:
1) Normal distribution or large sample.
2) The population standard deviation is unknown.
Process:
b) T-Confidence Interval
- Two-sided: The \((1-\alpha)100\%\) confidence interval for \(\mu\) is \(\bar{x} \pm t_{\alpha/2, n-1} \frac{s}{\sqrt{n}}\), where \(t_{\alpha/2, n-1}\) is the upper \(100*\alpha/2\)th percentile of \(t\) distribution with \(n-1\) degrees of freedom.
- One-sided:
- The upper \((1-\alpha)100\%\) confidence interval for \(\mu\) is from \(\bar{x}-t_{\alpha, n-1} \frac{s}{\sqrt{n}}\) to positive infinity, where \(t_{\alpha, n-1}\) is the upper \(100*\alpha\)th percentile of \(t\) distribution with \(n-1\) degrees of freedom.
- The lower \((1-\alpha)100\%\) confidence interval for \(\mu\) is from negative infinity to \(\bar{x}+t_{\alpha, n-1} \frac{s}{\sqrt{n}}\).
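The three t-intervals above can be computed directly from their formulas and checked against `t.test()`. This is a minimal sketch on simulated data; the sample, its mean and standard deviation, and the seed are illustrative assumptions, not data from the text.

```r
set.seed(42)                         # fixed seed so the illustration is reproducible
x <- rnorm(30, mean = 10, sd = 3)    # hypothetical sample
alpha <- 0.05
n <- length(x)
se <- sd(x) / sqrt(n)                # estimated standard error s / sqrt(n)

# Two-sided interval: xbar +/- t_{alpha/2, n-1} * s / sqrt(n)
ci_two <- mean(x) + c(-1, 1) * qt(1 - alpha / 2, df = n - 1) * se
# Upper one-sided interval: from xbar - t_{alpha, n-1} * s / sqrt(n) to Inf
ci_upper <- c(mean(x) - qt(1 - alpha, df = n - 1) * se, Inf)
# Lower one-sided interval: from -Inf to xbar + t_{alpha, n-1} * s / sqrt(n)
ci_lower <- c(-Inf, mean(x) + qt(1 - alpha, df = n - 1) * se)

# The same intervals as reported by t.test()
t.test(x)$conf.int
t.test(x, alternative = "greater")$conf.int
t.test(x, alternative = "less")$conf.int
```

Note that in `t.test()` the upper interval corresponds to `alternative = "greater"` and the lower interval to `alternative = "less"`.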
In the typical research scenario, the value of the population standard deviation is unknown, so a t-test is used to assess the hypothesis. When using statistical software, the researcher should distinguish between the one-sided (right- or left-tailed) and the two-sided cases.
Irradiation Effects on Flexural Strength: 30 specimens of a composite dental material are subjected for 40 days to ionizing radiation of the kind used in the treatment of cancer patients. The flexural strength of the material is measured for each of the 30 specimens after irradiation. One research goal is to estimate the mean flexural strength after irradiation and determine if it exceeds a minimally acceptable threshold for flexural strength \(\mu_0\) of composite materials used in dental restorations. The researcher is concerned that the composite material is weakened as a result of irradiation. Wanting to guard against using composite materials that would become unacceptably weak after irradiation such as that experienced by cancer patients, the researcher chooses the null hypothesis \(H_0: \mu \leq \mu_0\) against the alternative \(H_1: \mu > \mu_0\). Since the sample size is \(n=30\), the researcher can use the t-test and obtains a p-value of 0.00172. Since the p-value is less than the significance level \(\alpha= 0.05\), the researcher rejects the null hypothesis and concludes that the composite material has the required minimal flexural strength after irradiation. The example is continued in section 5, when the researcher is interested in comparing the flexural strength of the material before and after irradiation, and so a t-test for two groups will be employed.
Irradiation Effects on Hardness: 10 specimens of a composite dental material are subjected to the same irradiation. The goal is to estimate the mean hardness after irradiation and determine if it exceeds a minimally acceptable threshold for hardness \(\mu_0\) of fixed dental restorations. Since \(n=10<30\), normality should be assessed before using this parametric test. If normality is not present, then a non-parametric test or transformation of the data should be used.
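One possible workflow for this small-sample case is sketched below. The hardness values, the threshold `mu0`, and the 0.05 cutoff for the Shapiro-Wilk test are all hypothetical choices made for illustration. The Shapiro-Wilk test is used here as a numerical supplement to a Q-Q plot, and the Wilcoxon signed-rank test stands in as one example of a non-parametric alternative; it addresses the median under a symmetry assumption rather than the mean.

```r
set.seed(7)                                 # fixed seed for a reproducible illustration
hardness <- rnorm(10, mean = 62, sd = 4)    # hypothetical hardness measurements
mu0 <- 55                                   # hypothetical minimally acceptable threshold

# Shapiro-Wilk test of normality (H0: the data come from a normal distribution)
sw <- shapiro.test(hardness)

if (sw$p.value > 0.05) {
  # No evidence against normality: one-sided one-sample t-test
  res <- t.test(hardness, mu = mu0, alternative = "greater")
} else {
  # Evidence against normality: one-sample Wilcoxon signed-rank test
  res <- wilcox.test(hardness, mu = mu0, alternative = "greater")
}
res
```

With only 10 observations the normality test has little power, so a visual check with `qqnorm()` and `qqline()` remains worthwhile.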
a) Z-Test
Assumptions:
1) Normal distribution or large sample.
2) Population standard deviations for the two groups are known.
Process:
b) Z-Confidence Interval
- Two-sided: The \((1-\alpha)100\%\) confidence interval for the difference between two population means, \(\mu_1 - \mu_2\), is \(\bar{x}_1-\bar{x}_2 \pm z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1}+\frac{\sigma_2^2}{n_2}}\).
- One-sided:
- The upper \((1-\alpha)100\%\) confidence interval for the difference between two population means, \(\mu_1 - \mu_2\), is from \(\bar{x}_1-\bar{x}_2 - z_{\alpha}\sqrt{\frac{\sigma_1^2}{n_1}+\frac{\sigma_2^2}{n_2}}\) to positive infinity.
- The lower \((1-\alpha)100\%\) confidence interval for the difference between two population means, \(\mu_1 - \mu_2\), is from negative infinity to \(\bar{x}_1-\bar{x}_2 + z_{\alpha}\sqrt{\frac{\sigma_1^2}{n_1}+\frac{\sigma_2^2}{n_2}}\).
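The two-group z-intervals above can be evaluated from summary statistics with base R alone. In this sketch the sample means, the known standard deviations, and the sample sizes are hypothetical values chosen only to illustrate the formulas.

```r
# Hypothetical summary statistics for two independent groups.
xbar1 <- 10.2; xbar2 <- 11.9    # sample means
sigma1 <- 2;   sigma2 <- 3      # known population standard deviations
n1 <- 50;      n2 <- 50         # sample sizes
alpha <- 0.05

se <- sqrt(sigma1^2 / n1 + sigma2^2 / n2)   # standard error of xbar1 - xbar2
delta <- xbar1 - xbar2                      # observed difference in means

ci_two   <- delta + c(-1, 1) * qnorm(1 - alpha / 2) * se   # two-sided interval
ci_upper <- c(delta - qnorm(1 - alpha) * se, Inf)          # upper one-sided interval
ci_lower <- c(-Inf, delta + qnorm(1 - alpha) * se)         # lower one-sided interval

ci_two; ci_upper; ci_lower
```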
The population standard deviations \(\sigma_1\) and \(\sigma_2\) are almost always unknown, and so they must be estimated, which leads to the use of the t-test as the recommended path to assess the plausibility of the underlying research hypothesis.
a) Paired T-Test
Assumptions:
1) Normal distribution or large sample.
2) Matched-pair data from dependent samples of two populations, or each subject is measured twice and the measurements are paired.
Process:
b) T-Confidence Interval
- Two-sided: The \((1-\alpha) 100\%\) confidence interval for the mean of the paired differences \(\mu_d\) is \(\bar{d} \pm t_{\alpha/2, n-1} \frac{s_d}{\sqrt{n}}\), where \(\bar{d}\) and \(s_d\) are the sample mean and sample standard deviation of the \(n\) paired differences.
- One-sided:
- The upper \((1-\alpha)100\%\) confidence interval for the mean of the paired differences \(\mu_d\) is from \(\bar{d} - t_{\alpha, n-1} \frac{s_d}{\sqrt{n}}\) to positive infinity.
- The lower \((1-\alpha)100\%\) confidence interval for the mean of the paired differences \(\mu_d\) is from negative infinity to \(\bar{d} + t_{\alpha, n-1} \frac{s_d}{\sqrt{n}}\).
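A paired t-test is a one-sample t-test applied to the within-pair differences, which the following sketch illustrates. The before and after values are simulated under assumed parameters and do not come from the text.

```r
set.seed(3)                                   # fixed seed for a reproducible illustration
n <- 12
before <- rnorm(n, mean = 70, sd = 2)         # hypothetical first measurement
after  <- before - 1.5 + rnorm(n, sd = 0.5)   # hypothetical second measurement, same specimens
d <- after - before                           # within-pair differences
alpha <- 0.05

# Two-sided interval from the formula: dbar +/- t_{alpha/2, n-1} * s_d / sqrt(n)
se_d <- sd(d) / sqrt(n)
ci_two <- mean(d) + c(-1, 1) * qt(1 - alpha / 2, df = n - 1) * se_d

# The same analysis through t.test(), in its two equivalent forms
paired     <- t.test(after, before, paired = TRUE)
one_sample <- t.test(d, mu = 0)

ci_two
paired$conf.int
```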
The paired t-test is used for comparing differences in means when you have matched pairs. The most common ways that matched pairs arise are:
- two measurements taken on the same individual subjected to two different treatments,
- before and after measurements taken on the same individual subjected to one treatment,
- measurements of similar individuals from different groups subjected to two treatments.
A paired t-test should be considered for measurements of similar individuals from different groups when the observations are dependent. In this setting, matched pairs allow the researcher to rule out potentially confounding influences such as age, gender, or manufacturer as the underlying reasons for the difference in means.
Measuring Translucency with Two Instruments: The translucency for 7 specimens of a type of dental material is measured with two different instruments: spectroradiometer (SR) and spectrophotometer (SP). The researcher is interested in determining whether the two instruments perform differently. A paired t-test is performed to compare the mean values of translucency under the 2 instruments, where the null hypothesis is H0: \(\mu_d=0\). Since the sample size is small, the researcher first assesses normality through a qqplot. The paired t-test yields a p-value that is greater than \(\alpha= 0.05\), so the researcher concludes that there is no statistically significant difference between the measurements of the two instruments.
Effects of Grinding of Dental Restorations on Tooth Color: 12 specimens of a composite dental material stained in target color A2 are ground down by 100 micrometers. The lightness (L*) of the specimens is measured before and after the grinding. A paired t-test is performed to assess the difference in lightness as a result of grinding. A one-sided or two-sided hypothesis can be formulated depending on the interests of the researcher. To assess whether the grinding changed the lightness of the specimens, the researcher can test the null hypothesis H0: \(\mu_d=0\). However, to test whether there is a difference in lightness that is perceptible to the human eye, the researcher can test the null hypothesis \(\mu_d \leq \mu_0\), where \(\mu_0\) is the perceptibility threshold for lightness. Since the sample size is small, the researcher first assesses normality using a Q-Q plot, looking for strong deviations from normality. Suppose the researcher decides to test the null hypothesis \(\mu_d \leq \mu_0\) that the difference in lightness is not perceptible to the human eye. Given that the p-value p = 2.173E-8 for the paired t-test is less than \(\alpha=0.05\), the researcher concludes that the difference in lightness as a result of grinding is significantly larger than the perceptibility threshold.
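Testing against a non-zero threshold only requires setting the `mu` argument of `t.test()`. The sketch below assumes simulated lightness values and a hypothetical perceptibility threshold `mu0 = 1`; neither is taken from the study described above. The difference is taken as before minus after, so that a positive difference represents a loss of lightness.

```r
set.seed(11)                                 # fixed seed for a reproducible illustration
n <- 12
before <- rnorm(n, mean = 75, sd = 1.5)      # hypothetical L* before grinding
after  <- before - 3 + rnorm(n, sd = 0.4)    # hypothetical L* after grinding
mu0 <- 1                                     # hypothetical perceptibility threshold

# H0: mu_d <= mu0 versus H1: mu_d > mu0, with d = before - after
res <- t.test(before, after, paired = TRUE, mu = mu0, alternative = "greater")
res
```

A small p-value here would indicate that the mean change in lightness exceeds the chosen threshold, not merely that it differs from zero.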
When deciding between the pooled and the unpooled t-test, the researcher should proceed as if \(\sigma_1 \neq \sigma_2\), unless they have strong reason to believe that \(\sigma_1 = \sigma_2\).
How to Check Equality of Variance: To check if the population standard deviations are equal \(\sigma_1=\sigma_2\), we must perform an F-test. The underlying assumption for the F-test is that the data is normally distributed or we have a large sample. However, the F-test is not robust, in that it is sensitive to mild deviations from normality of the underlying populations. The reader should be cautious when employing this method for small sample sizes and should instead follow the recommended method above.
Process:
a) Pooled T-Test
Assumptions:
1) Normal distribution or large sample for both groups.
2) Population standard deviations are unknown but assumed to be equal.
Process:
b) Confidence Interval
- Two-sided: The \((1-\alpha) 100\%\) confidence interval for the difference between two population means \(\mu_1 - \mu_2\) is \(\bar{x}_1 - \bar{x}_2 \pm t_{\alpha/2, df} \cdot s_p \sqrt{\frac{1}{n_1}+ \frac{1}{n_2}}\), where \(s_p = \sqrt{\frac{(n_1-1)s_1^2+(n_2-1)s_2^2}{n_1+n_2-2}}\) is the pooled standard deviation and \(df = n_1+n_2-2\).
- One-sided:
- The upper \((1-\alpha)100\%\) confidence interval for the difference between two population means \(\mu_1 - \mu_2\) is from \(\bar{x}_1 - \bar{x}_2 - t_{\alpha, df} \cdot s_p \sqrt{\frac{1}{n_1}+ \frac{1}{n_2}}\) to positive infinity.
- The lower \((1-\alpha)100\%\) confidence interval for the difference between two population means \(\mu_1 - \mu_2\) is from negative infinity to \(\bar{x}_1 - \bar{x}_2 + t_{\alpha, df} \cdot s_p \sqrt{\frac{1}{n_1}+ \frac{1}{n_2}}\).
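The pooled interval can be computed step by step and compared with `t.test(..., var.equal = TRUE)`. The sketch uses the standard pooled standard deviation \(s_p = \sqrt{\frac{(n_1-1)s_1^2+(n_2-1)s_2^2}{n_1+n_2-2}}\) with \(df = n_1+n_2-2\). The simulated data and their parameters are illustrative assumptions.

```r
set.seed(5)                              # fixed seed for a reproducible illustration
x <- rnorm(30, mean = 1.5, sd = 0.03)    # hypothetical group 1
y <- rnorm(30, mean = 2.0, sd = 0.03)    # hypothetical group 2, same standard deviation
n1 <- length(x); n2 <- length(y)
alpha <- 0.05

df <- n1 + n2 - 2                                           # pooled degrees of freedom
sp <- sqrt(((n1 - 1) * var(x) + (n2 - 1) * var(y)) / df)    # pooled standard deviation
se <- sp * sqrt(1 / n1 + 1 / n2)                            # standard error of xbar1 - xbar2

# Two-sided pooled interval for mu_1 - mu_2
ci_two <- (mean(x) - mean(y)) + c(-1, 1) * qt(1 - alpha / 2, df) * se

fit <- t.test(x, y, var.equal = TRUE)
ci_two
fit$conf.int
```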
When deciding between the pooled and the unpooled t-test, the researcher should proceed as if \(\sigma_1 \neq \sigma_2\), unless they have strong reason to believe that \(\sigma_1 = \sigma_2\). The pooled method can be used in settings when the samples are drawn from the same population, and so it is not unreasonable to assume that the samples are from populations with the same standard deviations. However, the researcher should still verify that the population standard deviations are equal (see the technical note above).
Comparing Effects of Irradiation on Two Composite Materials: 30 specimens of a composite dental material from company C1 and 30 specimens of a second composite material from company C2 are subjected to irradiation for 40 days. Flexural strength of the material is measured for each of the 60 specimens after treatment, and the sample means \(\bar{x}_1\) and \(\bar{x}_2\) as well as the sample standard deviations \(s_1\) and \(s_2\) for group 1 and group 2 are calculated. The goal is to compare the mean strength between the two composite materials C1 and C2, and determine if there is a significant difference. The null hypothesis is H0: \(\mu_1=\mu_2\). If there are industry standards for flexural strength of composite dental materials, it may be the case that the two materials have the same population standard deviations. To assess if this is the case, we perform a hypothesis test using an F-test (see the technical note above). If the population standard deviations are the same, the researcher should use the pooled t-test. If they are different, then the researcher should use an unpooled t-test. Since for this data the p-value for the F-test is greater than \(\alpha= 0.05\), there is no evidence against equal variances, and so the researcher performs a pooled t-test for the null hypothesis H0: \(\mu_1=\mu_2\). The p-value for the pooled t-test is p = 0.0167, which is less than \(\alpha= 0.05\), so the researcher concludes there is a significant difference between the mean strength of the two composite materials.
a) Unpooled T-Test
Assumptions:
1) Normal distribution or large sample for both groups.
2) Population standard deviations are unknown and not assumed to be equal.
Process:
b) Confidence Interval
- Two-sided: The \((1-\alpha) 100\%\) confidence interval for the difference between two population means \(\mu_1 - \mu_2\) is \(\bar{x}_1 - \bar{x}_2 \pm t_{\alpha/2, df} \sqrt{\frac{s_1^2}{n_1}+ \frac{s_2^2}{n_2}}\), where \(df\) is the Welch-Satterthwaite approximation to the degrees of freedom.
- One-sided:
- The upper \((1-\alpha)100\%\) confidence interval for the difference between two population means \(\mu_1 - \mu_2\) is from \(\bar{x}_1 - \bar{x}_2 - t_{\alpha, df} \sqrt{\frac{s_1^2}{n_1}+ \frac{s_2^2}{n_2}}\) to positive infinity.
- The lower \((1-\alpha)100\%\) confidence interval for the difference between two population means \(\mu_1 - \mu_2\) is from negative infinity to \(\bar{x}_1 - \bar{x}_2 + t_{\alpha, df} \sqrt{\frac{s_1^2}{n_1}+ \frac{s_2^2}{n_2}}\).
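For the unpooled test, R's `t.test()` uses the Welch-Satterthwaite approximation \(df = \frac{(s_1^2/n_1 + s_2^2/n_2)^2}{\frac{(s_1^2/n_1)^2}{n_1-1} + \frac{(s_2^2/n_2)^2}{n_2-1}}\), which is generally not an integer. The sketch below computes it by hand on simulated data (the data and their parameters are illustrative assumptions) and compares the result with `t.test()`.

```r
set.seed(9)                             # fixed seed for a reproducible illustration
x <- rnorm(30, mean = 2.0, sd = 0.10)   # hypothetical group 1
y <- rnorm(29, mean = 2.5, sd = 0.02)   # hypothetical group 2, smaller spread
n1 <- length(x); n2 <- length(y)
alpha <- 0.05

v1 <- var(x) / n1                       # s_1^2 / n_1
v2 <- var(y) / n2                       # s_2^2 / n_2
se <- sqrt(v1 + v2)                     # unpooled standard error of xbar1 - xbar2
df <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))   # Welch-Satterthwaite df

# Two-sided unpooled interval for mu_1 - mu_2
ci_two <- (mean(x) - mean(y)) + c(-1, 1) * qt(1 - alpha / 2, df) * se

fit <- t.test(x, y, var.equal = FALSE)
c(df_by_hand = df, df_t_test = unname(fit$parameter))
ci_two
fit$conf.int
```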
While the unpooled t-test still requires normality or large sample size, it is not necessary to check whether the population standard deviations for the two groups are the same.
Comparing Effects of Irradiation on Two Composite Materials: 12 specimens of a composite dental material C1 and 12 specimens of a second composite material C2 are subjected to irradiation for 40 days. Flexural strength of the material is measured for each of the 24 specimens after treatment, and the sample means \(\bar{x}_1\) and \(\bar{x}_2\) as well as the sample standard deviations \(s_1\) and \(s_2\) for group 1 and group 2 are calculated. The goal is to compare the mean strength between the two composite materials C1 and C2, and determine if there is a significant difference. Normality needs to be assessed since the sample sizes are small. If the normality assumption is satisfied, then the unpooled t-test can be used to assess the null hypothesis H0: \(\mu_1=\mu_2\). Since the p-value p = 6.731E-12 is less than \(\alpha= 0.05\), the researcher concludes that there is a significant difference between the mean strength of the two composite materials.
Comparing Results of Two Teaching Techniques: 60 dentistry students are learning to assess the color of natural teeth. 30 students in the treatment group are taught using a new innovative teaching technique (T1) and the other 30 students in the control group are taught using the classical teaching technique (T2). One student from group 2 withdrew from the class, and so the test was performed with \(n_1=30\) and \(n_2=29\). At the end of the semester, students are asked to assess the color of a tooth. The color of this tooth was measured by a spectroradiometer and recorded as A3.5. The difference between the color indicated by each student and the true color of the tooth is calculated using the \(\Delta E_{00}\) coefficient in the CIELab color system. The research goal is to assess whether the mean color difference \(\Delta E_{00}\) in group T1 is significantly smaller (that is, better) than the mean color difference in group T2. The unpooled t-test is used to assess the null hypothesis H0: \(\mu_1 \geq \mu_2\) against the alternative \(\mu_1 < \mu_2\). Since the p-value for the one-sided t-test, p = 0.0276, is less than \(\alpha= 0.05\), the researcher rejects the null hypothesis and concludes that students taught with the new innovative technique achieve a smaller mean color difference than those taught with the classical technique.
R code and Examples
One-group z-test: R script file
###-----------------------
### One-group z-test
###-----------------------
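# Generate Data (seed fixed so the example is reproducible)
set.seed(1)
mu <- 1          # true mean; named mu so the variable does not share a name with mean()
sigma <- 0.05    # known population standard deviation
x <- rnorm(30, mu, sigma)
mean(x)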
# QQ plot
qqnorm(x)
qqline(x)
# One-group z-test
library(BSDA)
z.test(x, sigma.x = sigma, mu = 1) # Two-sided test
z.test(x, sigma.x = sigma, mu = 1, alternative = "greater") # One-sided test
One-group t-test: R script file
###-----------------------
### One-group t-test
###-----------------------
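# Generate Data (seed fixed so the example is reproducible)
set.seed(1)
mu <- 10         # true mean; named mu so the variable does not share a name with mean()
sigma <- 3       # true standard deviation (treated as unknown by the t-test)
x <- rnorm(30, mu, sigma)
mean(x)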
# QQ plot
qqnorm(x)
qqline(x)
# One-group t-test
t.test(x, mu = 10) # Two-sided test
t.test(x, mu = 10, alternative = "less") # One-sided test
Two-group z-test: R script file
###-----------------------
### Two-group z-test
###-----------------------
# Generate Data
mean1=10
sigma1=2
mean2=12
sigma2=3
x<-rnorm(50, mean1, sigma1)
y<-rnorm(50, mean2, sigma2)
# QQ plot
par(mfrow=c(1,2)) # Partition the graph window
qqnorm(x,col="red")
qqline(x)
qqnorm(y,col="blue")
qqline(y)
# Two-group z-test
library(BSDA)
z.test(x, y, sigma.x = sigma1, sigma.y = sigma2)
Two-group Paired t-test: R script file
###-----------------------
### Paired t-test
###-----------------------
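# Generate Data (paired: y is a second measurement on the same 35 subjects)
set.seed(1)
mean1 <- 2
sigma1 <- 0.02
shift <- 1          # true mean difference between the two measurements
sigma_d <- 0.01     # standard deviation of the within-pair differences
x <- rnorm(35, mean1, sigma1)
y <- x + rnorm(35, shift, sigma_d)
# Difference
d <- y - x
# QQ plot of the differences (the normality assumption applies to d, not to x and y)
qqnorm(d, col = "red")
qqline(d)
# Paired t-test
t.test(y, x, paired = TRUE)   # Equivalent to t.test(d, mu = 0)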
F-test to check equality of variance: R script file
###-----------------------------------------
### F-test to check equality of variances
###-----------------------------------------
# Generate Data
mean1=1
sigma1=0.02
mean2=1.5
sigma2=0.03
x<-rnorm(30, mean1, sigma1)
y<-rnorm(30, mean2, sigma2)
# QQ plot
par(mfrow=c(1,2)) # Partition the graph window
qqnorm(x,col="red")
qqline(x)
qqnorm(y,col="blue")
qqline(y)
# F-Test to check equality of variances
var.test(x, y)
Two-group pooled t-test: R script file
###----------------------------------
### Two-group pooled t-test
###----------------------------------
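# Generate Data (equal standard deviations, as the pooled test assumes)
set.seed(1)
mean1 <- 1.5
mean2 <- 2
sigma <- 0.03       # common standard deviation for both groups
x <- rnorm(30, mean1, sigma)
y <- rnorm(30, mean2, sigma)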
# QQ plot
par(mfrow=c(1,2)) # Partition the graph window
qqnorm(x,col="red")
qqline(x)
qqnorm(y,col="blue")
qqline(y)
# Two-group pooled t-test
t.test(x,y,var.equal=TRUE) # Two-sided test
t.test(x,y,var.equal=TRUE, alternative = "greater") # One-sided test
Two-group unpooled t-test: R script file
###----------------------------------
### Two-group unpooled t-test
###----------------------------------
# Generate Data
mean1=2
sigma1=0.1
mean2=2.5
sigma2=0.02
x<-rnorm(30, mean1, sigma1)
y<-rnorm(29, mean2, sigma2)
# QQ plot
par(mfrow=c(1,2)) # Partition the graph window
qqnorm(x,col="red")
qqline(x)
qqnorm(y,col="blue")
qqline(y)
# Two-group unpooled t-test
t.test(x,y,var.equal=FALSE) # Two-sided test
t.test(x,y,var.equal=FALSE, alternative = "less") # One-sided test