ALL STATISTICAL TESTS IN R

İrem Tanrıverdi
Analytics Vidhya
Published in
29 min readFeb 14, 2022

--

A statistical test provides a mechanism for making quantitative decisions about a process. The goal is to determine if there is sufficient evidence to “reject” an assumption or hypothesis about the process. To use statistical tests, processes must satisfied some conditions (assumptions).

There is two types of tests which are parametric and non-parametric tests. Let first start with the parametric tests.

Testing Population Mean

Parametric Test in R

One-Sample T-Test

It test if the population mean equal (bigger or smaller) to the specific value or not.

Assumptions:

  • Independence of the observations
  • Normality of the sample
  • Randomness

Let use the “diamonds” dataset to show how statistical tests are used.

library(ggplot2)
library(kableExtra)
library(knitr)
library(rsample)

data(diamonds)
kbl(head(diamonds), caption="Dataset", booktabs = T) %>% kable_styling(latex_options = c("striped", "hold_position","scale_down"),font_size=12, stripe_index = c(1,3,5,7))
Figure 1. Diamonds Dataset

Test if the price of the diamonds bigger than 3500 or not.

First we check the normality with shapiro-wilk normality test and qq-plot.

Q-Q plot: Q-Q plot (or quantile-quantile plot) draws the correlation between a given sample and the normal distribution. A 45-degree reference line is also plotted.

library(ggpubr)
ggqqplot(diamonds$price)
Figure 2. QQPlot with “ggpubr” package
library("car")
qqPlot(diamonds$price)
Figure 3. QQPlot with “car” package

When we look at the qqplot for “price” variable, there is huge deviations in the upper tail and lower tails. Besides, there is outliers. Then, let test the normality with appropriate test.

shapiro.test(diamonds$price)

Error in shapiro.test(diamonds$price) : sample size must be between 3 and 5000
  • As seen shapiro test does not work when sample size is bigger than 5000. In such a cases, Kolmogorov-Smirnov test or Jarque-Bera normality test are used.

Ho: Sample normally distributed

H1: Sample does not normally distributed

ks.test(x=diamonds$price,y='pnorm',alternative='two.sided')
##
## One-sample Kolmogorov-Smirnov test
##
## data: diamonds$price
## D = 1, p-value < 2.2e-16
## alternative hypothesis: two-sided

The Jarque-Bera test is a goodness-of-fit test that determines whether or not sample data have skewness and kurtosis that matches a normal distribution.

library(tseries)
jarque.bera.test(diamonds$price)
##
## Jarque Bera Test
##
## data: diamonds$price
## X-squared = 34201, df = 2, p-value < 2.2e-16

As seen both Jarque-Bera test and Kolmogorov-Smirnov test, p-value is smaller then the significance level of 0.05, so null hypothesis (Ho) is rejected. That means price of the diamonds does not normally distributed. Normally, in this situation we can not use t-test, but as an example for how to conduct t-test in R, let move assuming normality is satisfied.

Hypothesis:

Ho: μ=3500 vs H1: μ>3500

t.test(diamonds$price,mu=3500,alternative = "greater")
##
## One Sample t-test
##
## data: diamonds$price
## t = 25.196, df = 53939, p-value < 2.2e-16
## alternative hypothesis: true mean is greater than 3500
## 95 percent confidence interval:
## 3904.545 Inf
## sample estimates:
## mean of x
## 3932.8

alternative = a character string specifying the alternative hypothesis, must be one of "two.sided" (default), "greater" or "less".

When we look at the t-test result, p-value is smaller than the significance level of 0.05, so we reject the null hypothesis. That means real population mean is bigger than 3500. As seen real population mean is 3932.8.

TWO SAMPLE T-TEST

It test if the population means of two samples are equal (bigger or smaller) to the specific value or not.

Assumptions:

  • Samples should be independent from each other
  • Normality of the samples
  • Sample variances’ should be equal (homogeneous)

Test if the price of the diamonds of ideal cut and premium cut is equal or not.

First we check the normality with shapiro-wilk normality test and qq-plot.

First, let filter the ideal cut and premium cut diamonds from the data with the “filter” command.

#select ideal and premium cuts from data
library(dplyr)
diamonds_cut<- diamonds %>% filter(cut=="Ideal" | cut=="Premium")
ggqqplot(diamonds$price[diamonds$cut=="Ideal"])
ggqqplot(diamonds$price[diamonds$cut=="Premium"])
Figure 4. QQPlot for two samples

As seen from qqplot of price for ideal cut and premium cut, upper and lower tail of both sample show huge deviations, so both samples do not distribute normally. To be sure, let use an appropriate test for normality.

Ho: Sample is normally distributed.

H1: Sample is not normally distributed.

jarque.bera.test(diamonds$price[diamonds$cut=="Ideal"])
##
## Jarque Bera Test
##
## data: diamonds$price[diamonds$cut=="Ideal"]
## X-squared = 22169, df = 2, p-value < 2.2e-16
jarque.bera.test(diamonds$price[diamonds$cut=="Premium"])
##
## Jarque Bera Test
##
## data: diamonds$price[diamonds$cut=="Premium"]
## X-squared = 21398, df = 2, p-value < 2.2e-16
  • As seen two samples are not normality distributed.

Homogeneity of the variances

Two tests are used in the literature to test the homogeneity of variances. These are Bartlett test and levene test. Difference between these two test will be specified later.

Let use the Bartlett test in this example.

Ho: Sample variances are equal

H1: Sample variances are not equal

bartlett.test(price ~ cut, data = diamonds_cut)
##
## Bartlett test of homogeneity of variances
##
## data: price by cut
## Bartlett's K-squared = 301.5, df = 1, p-value < 2.2e-16
  • As can be seen, the variances of the two samples are not equal (p-value=0). The assumptions of normality and homogeneity of variance were not met for the t-test, but let see how to do the t-test in R as an example.

Hypothesis:

Ho: μ(ideal cut)=μ(premium cut)

H1: μ(ideal cut)μ(premium cut)

t.test(diamonds_cut$price ~ diamonds_cut$cut, alternative = "two.sided",conf.level = 0.95, var.equal = F)
##
## Welch Two Sample t-test
##
## data: diamonds_cut$price by diamonds_cut$cut
## t = 24.918, df = 26552, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 1038.088 1215.344
## sample estimates:
## mean in group Premium mean in group Ideal
## 4584.258 3457.542

If variances of two sample are equal write “var.equal=T”.

  • The two population mean is not equal (p-value=0). The average price of the Premium group is 4584,258 and the average price of the Ideal group is 3457,542.

T-TEST FOR DEPENDENT SAMLPLES (PAIRED SAMPLES T-TEST)

The paired samples t-test is used to compare the means between two related groups of samples.

  • As an example of data, 20 patients received a treatment for diabetes during 6 months. We want to know whether the treatment for diabetes has an impact on the insulin level of the patients.

To answer to this question, the insulin level of the 20 patients has been measured before and after the treatment. This gives us 20 sets of values before treatment and 20 sets of values after treatment from measuring twice the insulin level of the same individuals.

In such situations, paired t-test can be used to compare the mean insulin level before and after treatment.

Paired t-test analysis is performed as follow:

  1. Calculate the difference (d) between each pair of value
  2. Compute the mean (m) and the standard deviation (s) of d
  3. Compare the average difference to 0. If there is any significant difference between the two pairs of samples, then the mean of d (m) is expected to be far from 0.
# Insulin level of the patients before treatment
before <-c(200.1, 190.9, 192.7, 213, 241.4, 196.9, 172.2, 185.5, 205.2, 193.7)
# Insulin level of the patients before treatment
after <-c(392.9, 393.2, 345.1, 393, 434, 427.9, 422, 383.9, 392.3, 352.2)
# Create a data frame
my_data <- data.frame(group = rep(c("before", "after"), each = 10),
insulin = c(before, after))
head(my_data)
## group weight
## 1 before 200.1
## 2 before 190.9
## 3 before 192.7
## 4 before 213.0
## 5 before 241.4
## 6 before 196.9
# Plot weight by group and color by group
library("ggpubr")
ggboxplot(my_data, x = "group", y = "insulin",
color = "group", palette = c("#00AFBB", "#E7B800"),
order = c("before", "after"),
ylab = "insulin", xlab = "Groups")
Figure 5. Boxplot for before and after treatment

Assumptions of paired sample t-test

  • Samples should be dependent
  • Sample size should be large
  • Samples should be normally distributed.

In our case:

  1. Samples are dependent since the data have been collected from measuring twice the insulin level of the same individual.
  2. Sample size is small, because n < 30. Since the sample size is not large enough (less than 30), we need to check whether the differences of the pairs follow a normal distribution.

That means we should look for normality distance(d) when sample is smaller than 30. (distance means difference between before and after)

Let print distance.

# compute the difference
d <- my_data$insulin[my_data$group == "before"] - my_data$insulin[my_data$group == "after"]
print(d)## [1] -192.8 -202.3 -152.4 -180.0 -192.6 -231.0 -249.8 -198.4 -187.1 -158.5

Ho: Distance is normally distributed

H1: Distance is not normally distributed

shapiro.test(d)## 
## Shapiro-Wilk normality test
##
## data: d
## W = 0.94536, p-value = 0.6141
  • Since p-value is bigger than the significance level of 0.05, so distance is normally distributed.

Ho: μ(after treatment)=μ(before treatment)

H1: μ(after treatment)μ(before treatment)

# Compute t-test
# we write paired=T
t.test(insulin ~ group, data = my_data, paired = TRUE)##
## Paired t-test
##
## data: weight by group
## t = 20.883, df = 9, p-value = 6.2e-09
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 173.4219 215.5581
## sample estimates:
## mean of the differences
## 194.49

As seen that p-value is smaller than the significance level of 0.05, so we reject the null hypothesis. That means insulin level of patients before treatment and after treatment are not equal. Hence, treatment is useful.

ONE WAY ANOVA

The one-way analysis of variance (ANOVA) is an extension of independent two-samples t-test for comparing means in a situation where there are more than two groups. In one-way ANOVA, the data is organized into several groups base on one single grouping variable (also called factor variable).

Assumptions:

  • The observations are obtained independently and randomly from the population defined by the factor levels
  • The data of each factor level are normally distributed.
  • These normal populations have a common variance. (Levene’s test or bartlett test can be used to check this.)

Let test if there is significant difference between mean of the all cut groups’ prices. Remember there is 5 cut groups, so we compare more than 2 means. That is why we uses one-way ANOVA.

Ho: μ1=μ2=μ3=μ4=μ5

H1: At least one of them is different

library(ggplot2)
ggplot(diamonds, aes(x=price, y=cut, fill=cut)) + geom_boxplot(outlier.colour="red", outlier.shape=18,
outlier.size=3, notch=F)+labs(title = "Boxplot of price by cut") + scale_fill_brewer(palette = "BuPu")
Figure 6. Boxplot of price by cut

The R function aov() is used to apply ANOVA. The function summary.aov() is used to summarize the analysis of variance model.

anova<-aov(price ~ cut, data = diamonds)
summary(anova)
## Df Sum Sq Mean Sq F value Pr(>F)
## cut 4 1.104e+10 2.760e+09 175.7 <2e-16 ***
## Residuals 53935 8.474e+11 1.571e+07
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

As the p-value is less than the significance level 0.05, we can conclude that there are significant differences between the groups highlighted with “*” in the model summary.

Multiple pairwise-comparison between the means of groups

In one-way ANOVA test, a significant p-value indicates that some of the group means are different, but we don’t know which pairs of groups are different.

It’s possible to perform multiple pairwise-comparison, to determine if the mean difference between specific pairs of group are statistically significant.

Tukey multiple pairwise-comparisons

As the ANOVA test is significant, we can compute Tukey HSD (Tukey Honest Significant Differences, R function: TukeyHSD()) for performing multiple pairwise-comparison between the means of groups. (If anova test is not significant, there is no need to perform pair-wise comparison)

TukeyHSD(anova)
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = price ~ cut, data = diamonds)
##
## $cut
## diff lwr upr p adj
## Good-Fair -429.89331 -740.44880 -119.3378 0.0014980
## Very Good-Fair -376.99787 -663.86215 -90.1336 0.0031094
## Premium-Fair 225.49994 -59.26664 510.2665 0.1950425
## Ideal-Fair -901.21579 -1180.57139 -621.8602 0.0000000
## Very Good-Good 52.89544 -130.15186 235.9427 0.9341158
## Premium-Good 655.39325 475.65120 835.1353 0.0000000
## Ideal-Good -471.32248 -642.36268 -300.2823 0.0000000
## Premium-Very Good 602.49781 467.76249 737.2331 0.0000000
## Ideal-Very Good -524.21792 -647.10467 -401.3312 0.0000000
## Ideal-Premium -1126.71573 -1244.62267 -1008.8088 0.0000000
  • diff: difference between means of the two groups
  • lwr, upr: the lower and the upper end point of the confidence interval at 95% (default)
  • p adj: p-value after adjustment for the multiple comparisons.

We can say that the means of groups with p-values below 0.05 are significantly different.

#we can use also this function to performn pairwise comparison
pairwise.t.test(diamonds$price, diamonds$cut,
p.adjust.method = "BH")
##
## Pairwise comparisons using t tests with pooled SD
##
## data: diamonds$price and diamonds$cut
##
## Fair Good Very Good Premium
## Good 0.00023 - - -
## Very Good 0.00042 0.43055 - -
## Premium 0.03419 < 2e-16 < 2e-16 -
## Ideal < 2e-16 9.5e-14 < 2e-16 < 2e-16
##
## P value adjustment method: BH

Check ANOVA assumptions for test validity

The ANOVA test assumes that, the data are normally distributed and the variance across groups are homogeneous. We can check that with some diagnostic plots.

Homogeneity of variances

# 1. Homogeneity of variances
plot(anova, 1)
Figure 7. Residuals vs Fitted values
  • For equal variances, the points in the plot should be around the zero line, but in our case the equal variance assumption seems to be broken. Let’s check this result with the test. We used the Bartlett test to test the homogeneity of variance in the t-tests, this time let’s use the Levene test.

Levene's test is less sensitive to deviations from the Normal distribution. In other words, in cases where normality is not provided, it is more accurate to use the Levenes test instead of the Bartlett test.

Ho: The variance across groups is equal.

H1: The variance across groups is statistically significantly different.

library(car)
leveneTest(price ~ cut, data = diamonds)
## Levene's Test for Homogeneity of Variance (center = median)
## Df F value Pr(>F)
## group 4 123.6 < 2.2e-16 ***
## 53935
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
bartlett.test(price ~ cut, data = diamonds)
##
## Bartlett test of homogeneity of variances
##
## data: price by cut
## Bartlett's K-squared = 406.7, df = 4, p-value < 2.2e-16

From the output above, we can see that the p-value is less than the significance level of 0.05. This means that there is an evidence to suggest that the variance across groups is significantly different. We reach the same result in the Bartlett test.

ANOVA test with no assumption of equal variances

An alternative procedure (Welch one-way test), that does not require that assumption have been implemented in the function oneway.test().

oneway.test(price ~ cut, data = diamonds)
##
## One-way analysis of means (not assuming equal variances)
##
## data: price and cut
## F = 166.04, num df = 4.0, denom df = 9398.6, p-value < 2.2e-16
pairwise.t.test(diamonds$price, diamonds$cut,
p.adjust.method = "BH", pool.sd = FALSE)
##
## Pairwise comparisons using t tests with non-pooled SD
##
## data: diamonds$price and diamonds$cut
##
## Fair Good Very Good Premium
## Good 4.5e-05 - - -
## Very Good 0.00011 0.40560 - -
## Premium 0.02122 < 2e-16 < 2e-16 -
## Ideal < 2e-16 1.7e-15 < 2e-16 < 2e-16
##
## P value adjustment method: BH

NORMALITY

  • Normality plot of residuals. In the plot below, the quantiles of the residuals are plotted against the quantiles of the normal distribution. A 45-degree reference line is also plotted.
  • The normal probability plot of residuals is used to check the assumption that the residuals are normally distributed. It should approximately follow a straight line.
plot(anova,2)
Figure 8. Normal Q-Q plot of residuals
aov_residuals <- residuals(object = anova )
jarque.bera.test(x = aov_residuals )
##
## Jarque Bera Test
##
## data: aov_residuals
## X-squared = 34438, df = 2, p-value < 2.2e-16
  • Normality assumption is also not met. In such a cases, we should use non-parametric version of the one-way anova which is “Kruskal-Wallis rank sum test”. (Mentioned in Non-parametric tests section)

TWO-WAY ANOVA

Two-way ANOVA test is used to evaluate simultaneously the effect of two grouping variables (A and B) on a response variable. The grouping variables are also known as factors. The different categories (groups) of a factor are called levels. The number of levels can vary between factors.

  • When the sample sizes within cells are equal, we have the so-called balanced design. In this case the standard two-way ANOVA test can be applied.
  • When the sample sizes within each level of the independent variables are not the same (case of unbalanced designs), the ANOVA test should be handled differently.
Figure 9. Representation of two-way anova

Assumptions of two-way ANOVA test

Two-way ANOVA, like all ANOVA tests, assumes that the observations within each cell are normally distributed and have equal variances.

Two-way ANOVA test hypotheses

Ho:There is no difference in the means of factor AHoThere is no difference in means of factor BHo:There is no interaction between factors A and B

As an example we group price of diamonds by their color and cut, and we test 2 things which are:

  • If there is difference in means of cut group
  • If there is difference in means of color group
ggplot(diamonds, aes(x=price, y=cut, fill=color)) + geom_boxplot(outlier.colour="red", outlier.shape=18,
outlier.size=0.3, notch=F)+labs(title = "Boxplot of price by cut and color") + scale_fill_brewer(palette = "BuPu")
Figure 10. Price by cut and color

We want to know if price depends on cut and color.

res.aov2 <- aov(price ~ cut +color , data =diamonds)
summary(res.aov2)
## Df Sum Sq Mean Sq F value Pr(>F)
## cut 4 1.104e+10 2.760e+09 181.1 <2e-16 ***
## color 6 2.551e+10 4.251e+09 278.9 <2e-16 ***
## Residuals 53929 8.219e+11 1.524e+07
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

From the ANOVA table we can conclude that both cut and colors are statistically significant. Let include interaction between cut and color in the model.

# Two-way ANOVA with interaction effect
res.aov3 <- aov(price ~ cut*color , data =diamonds)
summary(res.aov3)
## Df Sum Sq Mean Sq F value Pr(>F)
## cut 4 1.104e+10 2.760e+09 181.405 <2e-16 ***
## color 6 2.551e+10 4.251e+09 279.371 <2e-16 ***
## cut:color 24 1.653e+09 6.889e+07 4.527 1e-12 ***
## Residuals 53905 8.203e+11 1.522e+07
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

From the ANOVA results, you can conclude the following, based on the p-values and a significance level of 0.05:

  • the p-value of cut is ~0(significant), which indicates that the levels of cut are associated with significant different price.
  • the p-value of color is < 2e-16 (significant), which indicates that colors are associated with significant different price.
  • the p-value for the interaction between cut*color is ~0 (significant), which indicates that the relationships between color and price depends on the cut.
#multiple comparison like in the ANOVA
TukeyHSD(res.aov2)
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = price ~ cut + color, data = diamonds)
##
## $cut
## diff lwr upr p adj
## Good-Fair -429.89331 -735.75636 -124.03026 0.0011913
## Very Good-Fair -376.99787 -659.52768 -94.46806 0.0025302
## Premium-Fair 225.49994 -54.96387 505.96375 0.1821643
## Ideal-Fair -901.21579 -1176.35038 -626.08121 0.0000000
## Very Good-Good 52.89544 -127.38604 233.17692 0.9305749
## Premium-Good 655.39325 478.36707 832.41943 0.0000000
## Ideal-Good -471.32248 -639.77829 -302.86667 0.0000000
## Premium-Very Good 602.49781 469.79831 735.19731 0.0000000
## Ideal-Very Good -524.21792 -645.24787 -403.18797 0.0000000
## Ideal-Premium -1126.71573 -1242.84112 -1010.59035 0.0000000
##
## $color
## diff lwr upr p adj
## E-D -104.4894 -286.36133 77.38248 0.6201928
## F-D 537.8278 354.96509 720.69052 0.0000000
## G-D 820.6730 643.79152 997.55443 0.0000000
## H-D 1260.0195 1071.58200 1448.45698 0.0000000
## I-D 1885.6973 1675.96202 2095.43256 0.0000000
## J-D 2064.7478 1806.41658 2323.07904 0.0000000
## F-E 642.3172 476.76688 807.86758 0.0000000
## G-E 925.1624 766.24356 1084.08123 0.0000000
## H-E 1364.5089 1192.82072 1536.19711 0.0000000
## I-E 1990.1867 1795.36107 2185.01235 0.0000000
## J-E 2169.2372 1922.85710 2415.61737 0.0000000
## G-F 282.8452 122.79337 442.89696 0.0000039
## H-F 722.1917 549.45426 894.92911 0.0000000
## I-F 1347.8695 1152.11859 1543.62038 0.0000000
## J-F 1526.9200 1279.80758 1774.03243 0.0000000
## H-G 439.3465 272.95393 605.73911 0.0000000
## I-G 1065.0243 874.84890 1255.19973 0.0000000
## J-G 1244.0748 1001.35519 1486.79449 0.0000000
## I-H 625.6778 424.70932 826.64627 0.0000000
## J-H 804.7283 553.46259 1055.99405 0.0000000
## J-I 179.0505 -88.55864 446.65968 0.4322041

Check ANOVA assumptions: test validity

ANOVA assumes that the data are normally distributed and the variance across groups are homogeneous. We can check that with some diagnostic plots.

Check the homogeneity of variance assumption

# 1. Homogeneity of variances
plot(res.aov3, 1)
Figure 11. Residuals vs Fitted values
library(car)
leveneTest(price ~ cut*color , data =diamonds)
## Levene's Test for Homogeneity of Variance (center = median)
## Df F value Pr(>F)
## group 34 50.915 < 2.2e-16 ***
## 53905
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
  • Since p-value of the levene test is smaller than the significance level of 0.05, null hypothesis is rejceted. Hence, variances are not equal.

Normality

# 2. Normality
plot(res.aov3, 2)
Figure 12. Normal Q-Q plot
# Extract the residuals
aov_residuals <- residuals(object = res.aov3)
# Run Shapiro-Wilk test
jarque.bera.test(x = aov_residuals )
##
## Jarque Bera Test
##
## data: aov_residuals
## X-squared = 33271, df = 2, p-value < 2.2e-16
  • The conclusion above, is supported by the jarque.bera test on the ANOVA residuals (X-squared = 33271, p= 2.2e-16) which finds indication that normality is violated.

Compute two-way ANOVA test in R for unbalanced designs

An unbalanced design includes an unequal number of subjects in each group. When we group the 2 factors (cut and color) we use in two-way ANOVA, let’s see how many observations there are in each cell.

table(diamonds$cut,diamonds$color)##            
## D E F G H I J
## Fair 163 224 312 314 303 175 119
## Good 662 933 909 871 702 522 307
## Very Good 1513 2400 2164 2299 1824 1204 678
## Premium 1603 2337 2331 2924 2360 1428 808
## Ideal 2834 3903 3826 4884 3115 2093 896
  • As seen in each cell, there is not equal number of observations. Thus, we have unbalanced design.
  • To set up an unbalanced design in two-way ANOVA, we need to write Anova(type = “III”).
library(car)
my_anova <- aov(price ~ cut*color , data =diamonds)
Anova(my_anova, type = "III")
## Anova Table (Type III tests)
##
## Response: price
## Sum Sq Df F value Pr(>F)
## (Intercept) 3.7606e+11 1 24713.0009 < 2.2e-16 ***
## cut 8.7891e+09 4 144.3969 < 2.2e-16 ***
## color 9.4602e+09 6 103.6142 < 2.2e-16 ***
## cut:color 1.6535e+09 24 4.5274 1.001e-12 ***
## Residuals 8.2027e+11 53905
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

MANOVA

In the situation where there multiple response variables you can test them simultaneously using a multivariate analysis of variance (MANOVA). (we have two continuous dependent variable and one independent categoric variable)

Assumptions of MANOVA

MANOVA can be used in certain conditions:

  • The dependent variables should be normally distribute within groups. The R function mshapiro.test( )[in the mvnormtest package] can be used to perform the Shapiro-Wilk test for multivariate normality. This is useful in the case of MANOVA, which assumes multivariate normality.
  • Homogeneity of variances across the range of predictors.
  • Linearity between all pairs of dependent variables, all pairs of covariates, and all dependent variable-covariate pairs in each cell

Hypotheses:

Ho: Treatment variable does not have significant effect on two continuous variable

We want to know if there is significant difference price and carat when they grouped by cut.

# MANOVA test
res.man <- manova(cbind(price, carat) ~ cut, data = diamonds)
summary(res.man)
## Df Pillai approx F num Df den Df Pr(>F)
## cut 4 0.081688 574.18 8 107870 < 2.2e-16 ***
## Residuals 53935
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
  • we see that p-value is smaller than the significance level of 0.05. That means, cut(treatment) has significance effect on price and carat of diamonds.
# Look to see which differ
summary.aov(res.man)
## Response price :
## Df Sum Sq Mean Sq F value Pr(>F)
## cut 4 1.1042e+10 2760436340 175.69 < 2.2e-16 ***
## Residuals 53935 8.4743e+11 15712087
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Response carat :
## Df Sum Sq Mean Sq F value Pr(>F)
## cut 4 429.7 107.435 495.69 < 2.2e-16 ***
## Residuals 53935 11689.6 0.217
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## [1] 1996

Check multivariate normality:

library(MVN)
result <- mvn(data = diamonds2[,c(1,7)], mvnTest = "royston")
result$multivariateNormality
## Test H p value MVN
## 1 Royston 251.0394 6.571757e-56 NO
# create univariate Q-Q plots
result <- mvn(data = diamonds2[,c(1,7)], mvnTest = "royston", univariatePlot = "qqplot")
Figure 13. Multivariate normal Q-Q plot
  • We see that MVN= No, so multivariate normality is not met.

LATIN SQUARE DESIGN

The Latin square design is used where the researcher desires to control the variation in an experiment that is related to rows and columns in the field.

Assumptions:

  • Treatments are assigned at random within rows and columns, with each treatment once per row and once per column.
  • There are equal numbers of rows, columns, and treatments.
  • Useful where the experimenter desires to control variation in two different directions

The formula used for this kind of three-way ANOVA are:

For Example:

  • we have 3 categorical variable(row effect, column effect and treatment effect), and 1 continuous varible(target).
#create new datasetfertil <- c(rep("fertil1",1), rep("fertil2",1), rep("fertil3",1), rep("fertil4",1), rep("fertil5",1))
treat <- c(rep("treatA",5), rep("treatB",5), rep("treatC",5), rep("treatD",5), rep("treatE",5))
seed <- c("A","E","C","B","D", "C","B","A","D","E", "B","C","D","E","A", "D","A","E","C","B", "E","D","B","A","C")
freq <- c(42,45,41,56,47, 47,54,46,52,49, 55,52,57,49,45, 51,44,47,50,54, 44,50,48,43,46)

mydata <- data.frame(treat, fertil, seed, freq)
kbl(head(mydata), caption="Dataset", booktabs = T) %>% kable_styling(latex_options = c("striped", "hold_position","scale_down"),font_size=12, stripe_index = c(1,3,5,7))
#visualize data
mydata %>%
ggplot(aes(x= freq, y=treat, fill=fertil)) +
geom_boxplot() +
geom_jitter(width=0.1,alpha=0.2) +
xlab("freq")+
facet_wrap(~seed) +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
Figure 14. Boxplot for fertilizer data

Hypothesis:

Ho: Row has no effect on Response

Ho: Column has no effect on Response

Ho: No Treatment effect on Response

myfit <- lm(freq ~ fertil+treat+seed, mydata)
anova(myfit)
## Analysis of Variance Table
##
## Response: freq
## Df Sum Sq Mean Sq F value Pr(>F)
## fertil 4 17.76 4.440 0.7967 0.549839
## treat 4 109.36 27.340 4.9055 0.014105 *
## seed 4 286.16 71.540 12.8361 0.000271 ***
## Residuals 12 66.88 5.573
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
  • The difference between group considering the fertilizer is not significant (p-value > 0.1).
  • The difference between group considering the tillage is quite significant (p-value < 0.05).
  • The difference between group considering the seed is very significant (p-value < 0.001).

NON- PARAMETRIC TESTS IN R

When the assumptions of the parametric t-test, non-parametric version of the tests are used. The common assumptions in nonparametric tests are randomness and independence. The chi-square test is one of the nonparametric tests for testing three types of statistical tests: the goodness of fit, independence, and homogeneity. In nonparametric analysis, the Mann-Whitney U test is used for comparing two groups of cases on one variable. The Kruskal-Wallis test is considered as an alternative test to the parametric one-way analysis of variance (ANOVA) for comparing more than two groups on one variable. The Wilcoxon Signed-Rank test is an alternative test to the parametric “Paired-samples T-Test” to test the statistical differences in the mean between two related/dependent random samples.

Sign Test

Sign test is used to test if the median of the sample is equal to specific value or not. Remember for diamonds dataset normality assumptions are not met, so let apply same test using non-parametric versions.

Let’s test with the sign test whether the median of the price of diamonds is equal to 3900 or not.

Ho: Median=3900

H1: Median ≠3900

library(BSDA)
SIGN.test(diamonds$price, md = 3500)
##
## One-sample Sign-Test
##
## data: diamonds$price
## s = 21440, p-value < 2.2e-16
## alternative hypothesis: true median is not equal to 3500
## 95 percent confidence interval:
## 2376 2441
## sample estimates:
## median of x
## 2401
##
## Achieved and Interpolated Confidence Intervals:
##
## Conf.Level L.E.pt U.E.pt
## Lower Achieved CI 0.9499 2376 2441
## Interpolated CI 0.9500 2376 2441
## Upper Achieved CI 0.9509 2376 2441
  • As seen median of the priced of the diamonds are not 3500, it is 2400.

Wilcoxon rank sum test

The ‘Wilcoxon Rank Sum test’ (also called the ‘Mann-Whitney test’) is a distribution-free alternative to the t-test (doesn’t assume normality) and is used to test the hypothesis that the distributions in two groups have the same median.

Ho: Median1=Median2

H1: Median1≠Median2

# independent 2-group Mann-Whitney U Test
wilcox.test(diamonds_cut$price~diamonds_cut$cut)
##
## Wilcoxon rank sum test with continuity correction
##
## data: diamonds_cut$price by diamonds_cut$cut
## W = 174286667, p-value < 2.2e-16
## alternative hypothesis: true location shift is not equal to 0
# where y is numeric and A is A binary factor
  • As can be seen, the actual position shift is not equal to 0 (p-value=0). That means, the difference between the medians is not 0, so the medians are not equal. We reject the null hypothesis because the P value is close to 0. As you can see, the median of diamond prices is 2400, not 3500. We also see the upper and lower confidence intervals for diamond prices. The confidence interval for the median is (2376, 2441).

Wilcoxon Signed Rank Test

The non-parametric equivalent of the t-test for matched pairs (dependent samples) is the ‘Wilcoxon signed rank test’.

Ho: Median (before)=Median (after)

H1: Median (before) ≠ Median (after)

# dependent 2-group Wilcoxon Signed Rank Test 
wilcox.test(insulin ~ group, data = my_data, paired=TRUE)
##
## Wilcoxon signed rank exact test
##
## data: weight by group
## V = 55, p-value = 0.001953
## alternative hypothesis: true location shift is not equal to 0
  • As seen medians of the before-after is not equal (like in the means).

Note: We know that outliers in the dataset, break the normality and “mean” of the population is heavily influenced by outliers. However, median is not affected by outliers. That is why in non-parametric tests, medians are compared and tested.

Kruskal-Wallis rank sum test

The Kruskal-Wallis H test (sometimes called “one-way ANOVA on ranks”) is a rank-based nonparametric test that can be used to determine whether there are statistically significant differences between two or more groups of an independent variable. It is considered a non-parametric alternative to one-way ANOVA and an extension of the Mann-Whitney U test to allow comparison of more than two independent groups.

Assumptions

  • 1: Your dependent variable should be measured at the ordinal or continuous level.
  • 2: Your argument must consist of two or more categorical, independent groups.
  • 3: You must be independent of the observations, i.e. there should be no relationship between the observations in each group or between the groups themselves

Because the Kruskal-Wallis H test does not assume normality in the data and is much less sensitive to outliers, it can be used when these assumptions are violated and the use of one-way ANOVA is inappropriate. . Additionally, if your data is ordinal, one-way ANOVA is not appropriate, but the Kruskal-Wallis H test is. However, the Kruskal-Wallis H test comes with an additional assumption. This assumption;

  • 4: To know how to interpret the results from a Kruskal-Wallis H test, you must determine whether the distributions in each group have the same shape (ie have the same variability).

Briefly;

Assumptions

  • Observations are independent within and between samples.
  • The studied variable is continuous
  • Populations are the same with respect to the median.

The test statistic is distributed approximately as chi-square with (k-1) degrees of freedom.

Ho: Median of the all groups are equal (Ho: M1=M2=M3=M4=M5)

H1: At least one of the medians is different

kruskal.test(price ~ cut, data=diamonds)## 
## Kruskal-Wallis rank sum test
##
## data: price by cut
## Kruskal-Wallis chi-squared = 978.62, df = 4, p-value < 2.2e-16
  • As the p-value is less than the significance level 0.05, we can conclude that there are significant differences between the treatment groups.

Multiple pairwise-comparison between groups

  • From the output of the Kruskal-Wallis test, we know that there is a significant difference between groups, but we don’t know which pairs of groups are different.
  • It’s possible to use the function pairwise.wilcox.test() to calculate pairwise comparisons between group levels with corrections for multiple testing.
pairwise.wilcox.test(diamonds$price, diamonds$cut,
p.adjust.method = "BH")
##
## Pairwise comparisons using Wilcoxon rank sum test with continuity correction
##
## data: diamonds$price and diamonds$cut
##
## Fair Good Very Good Premium
## Good 7.9e-14 - - -
## Very Good < 2e-16 0.023 - -
## Premium 1.5e-05 1.6e-12 < 2e-16 -
## Ideal < 2e-16 < 2e-16 < 2e-16 < 2e-16
##
## P value adjustment method: BH

FRIEDMAN TEST

The Friedman test is a non-parametric alternative to the one-way repeated measures ANOVA test. It extends the Sign test in the situation where there are more than two groups to compare.

Friedman test is used to assess whether there are any statistically significant differences between the distributions of three or more paired groups. It’s recommended when the normality assumptions of the one-way repeated measures ANOVA test is not met or when the dependent variable is measured on an ordinal scale.

Assumptions:

  • Two-way data arranged in an unreplicated complete block design
  • Dependent variable is ordinal, interval, or ratio
  • Treatment or group independent variable is a factor with two or more levels. That is, two or more groups
  • Blocking variable is a factor with two or more levels
  • Blocks are independent of each other and have no interaction with treatments
  • In order to be a test of medians, the distribution of the differences in scores between each pair of groups are all symmetrical, or the distributions of values for each group have similar shape and spread. Otherwise the test is a test of distributions.

Hypotheses: If the distribution of the differences in scores between each pair of groups are all symmetrical, or the distributions of values for each group have similar shape and spread:

Null hypothesis: The medians of values for each group are equal.

Alternative hypothesis (two-sided): The medians of values for each group are not equal.

Post-hoc tests: The outcome of the Friedman test tells you if there are differences among the groups, but doesn’t tell you which groups are different from other groups. In order to determine which groups are different from others, post-hoc testing can be conducted.

friedman.test(price~ cut|color, diamonds)
Friedman rank sum test
Friedman chi-squared = 23.139, df = 4, p-value = 0.0001188

TESTING POPULATION VARIANCE

One-sample test of variance

We estimate the variance. Using the chi-square test, where the variance is equal to a user-specified value, we test the null hypothesis and construct a confidence interval for the variance.

Let’s test if the variance of the diamond price is equal to 90.

Ho: σ2 = 90

H1: σ2 ≠ 90

library(EnvStats)varTest(diamonds$price ,alternative = "two.sided", conf.level = 0.95,
sigma.squared = 90)
##
## Chi-Squared Test on Variance
##
## data: diamonds$price
## Chi-Squared = 9538590395, df = 53939, p-value < 2.2e-16
## alternative hypothesis: true variance is not equal to 90
## 95 percent confidence interval:
## 15727376 16107299
## sample estimates:
## variance
## 15915629
  • As can be seen, the sample estimate for the variance is 15915629, not 90. (p-value=0)

Two-Sample test for sample variances

When should you use the F-test?

Comparing two variances is useful in many situations, including:

  • When we want to run a two-sample t-test to check the equality of variances of two samples.
  • When we want to compare the variability of a new measurement method with an old measurement method. To test if the new method reduces the variability of the measure.
  • Note, the more this ratio deviates from 1, the stronger the evidence for unequal population variances.
  • Note that the F test requires a normal distribution of two samples. (we already checked normality for these variables)

Let test whether the variances of the ideal cut price and the premium cut are equal.

Ho: σ2 (ideal cut)=σ2 (premium cut)

H1: σ2 (ideal cut) ≠ σ2 (premium cut)

var.test(price~cut, diamonds_cut, alternative = "two.sided",alpha=0.05)## 
## F test to compare two variances
##
## data: price by cut
## F = 1.3042, num df = 13790, denom df = 21550, p-value < 2.2e-16
## alternative hypothesis: true ratio of variances is not equal to 1
## 95 percent confidence interval:
## 1.265403 1.344263
## sample estimates:
## ratio of variances
## 1.30417
  • As can be seen, the variance ratios are greater than 1, which means that the variances are not equal. (also specified at p-value=0)

IMPORTANT NOTE

There are different types of tests that can be used to evaluate the equality of variances.

  1. F-test: It is used to compare two groups of variance. The data should be normally distributed.
  2. Bartlett test: It is used for two or more groups of variance comparison. The data should be normally distributed.
  3. Levene’s test: It is an alternative to Bartlett’s test for non-normally distributed data.
  4. Figner-Killeen test: It is a non-parametric test for abnormal data.

We conclude that the nonparametric version of the F test is the levene test. We have already shown how to do the Levene test.

library(car)
leveneTest(price ~ cut, data = diamonds_cut)
## Levene's Test for Homogeneity of Variance (center = median)
## Df F value Pr(>F)
## group 1 424.97 < 2.2e-16 ***
## 35340
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

From the above output, we can see that the p-value is less than the 0.05 significance level. This means that there is evidence to suggest that the variance between groups is significantly different.

TESTING POPULATION PROPORTION

“prop.test” can be used to test the hypothesis that the odds (probability of success) in several groups are the same or equal to certain given values. Our main goal is to find out if there is a difference between the sample rate p̂ and the claimed value of the population ratio $p_0 $. To perform this test, the sampling distribution of the sample rate must be approximately normally distributed, with its mean p̂ and its standard deviation:

hjklş

Assumptions:

The following 3 conditions must be met:

  • The sample should be obtained by a simple random sampling process. (randomness)
  • n* p * (1 — p) ≥ 10
  • n ≤ 0.05*N where n is the sample size and N is the population size.

One- sample proportion test

Let test if the ratio of ideally cut diamonds is equal to 0.1. First, let’s see the number of all cuts and the ideal cut.

Ho: p=0.1

H1: p≠0.1

table(diamonds$cut)## 
## Fair Good Very Good Premium Ideal
## 1610 4906 12082 13791 21551

Syntax for the proportion test; is prop.test(x, n , p, alternative). X is the number of categories tested, n is the number of all samples, p is the rate tested on the null hypothesis.

In our case;

  • x: number of ideal cut (21551)
  • n: sum of all numbers (1610 +4906 +12082+13791+21551 = 53940)
  • p: 0.1
prop.test(x = 21551, n = 53940, p = 0.1, correct = FALSE,alternative = "two.sided")## 
## 1-sample proportions test without continuity correction
##
## data: 21551 out of 53940, null probability 0.1
## X-squared = 53773, df = 1, p-value < 2.2e-16
## alternative hypothesis: true p is not equal to 0.1
## 95 percent confidence interval:
## 0.3954104 0.4036770
## sample estimates:
## p
## 0.3995365
  • We see that the ideal cut ratio is 0.4, not 0.1.

Comparison of proportion of two samples

For statistical purposes, we can compare two populations or groups when the variable is categorical (smokers/non-smoker, Democrat/Republican, support/opposes an opinion, etc.) or the proportion of individuals with a particular trait (such as the proportion of smokers).

In order to make this comparison, an independent (separate) random sample must be selected from each population.

Note that the Z-statistics formula is valid only when the sample size (n) is large enough.

Situation for Large sample sizes

Let’s test whether there is a significant difference between the ideal and premium cut diamond rates. In the first part, we see the numbers of these categories.

Ho: p1=p2

H1: p1≠p2

prop.test(x = c( 13791, 21551 ), n =c(53940, 53940),
alternative = "two.sided")
##
## 2-sample test for equality of proportions with continuity correction
##
## data: c(13791, 21551) out of c(53940, 53940)
## X-squared = 2533.4, df = 1, p-value < 2.2e-16
## alternative hypothesis: two.sided
## 95 percent confidence interval:
## -0.1494173 -0.1383098
## sample estimates:
## prop 1 prop 2
## 0.2556730 0.3995365
  • We see that the premium cut rate is 0.26, and the ideal cut rate is 0.40. Therefore, the ratios of these two samples are different. (p=0)

SUMMARY

--

--

İrem Tanrıverdi
Analytics Vidhya

Research and Teaching Assistant. MSc in Statistics. Interested in programming, machine learning and artificial inteligence.