Analysis of Variance
“He uses statistics as a drunken man uses lamp posts—for support rather than for illumination.”
— Marissa Mayer
Whole big books have been written about Analysis of Variance (ANOVA). Although there are many ANOVA experimental designs available, biologists are taught to pay special attention to the design of experiments, and generally make sure that the experiments are fully factorial (in the case of two-way or higher ANOVAs) and balanced. For this reason we will focus in this Introductory Statistics course on one-way and factorial ANOVAs only.
As t-tests, ANOVAs require that some assumptions are met:
- Normally distributed data
- Homogeneity of variances
- Independence of data
- In our case, we will encourage also that the data are balanced
If some of the above assumptions are violated, then your course of action is either to transform the data (if non-normal) or to use a generalised linear model (also when non-normal), or to use a linear mixed model (when the assumption on non-independence cannot be guaranteed). We will get to some of these methods in later chapters. Linked to the above, ANOVAs are also sensitive to the presence of outliers (see our earlier discussion about the mean and how it differs from the median), so we need to ensure that outliers are not present (they can be removed, and there are many ways of finding them and eliminating them). If outliers are an important feature of the data, then a non-parametric test can be used, or some other test that works well with extreme values can be applied.
Rather than talking about t-tests and ANOVAs as separate things, let us acknowledge that they are similar ways of asking the same question. That question being, are the means of these two or more things we want to compare different, or the same? At this stage it is important to note that the independent variable is categorical (i.e. a factor denoting two or more different treatments or sampling conditions) and that the dependent variable is continuous. You may perhaps be more familiar with this question when it is presented as a set of hypotheses.
H0: Group A is not different from group B.
H1: Group A is different from group B.
This is a scientific question in the simplest sense. Often, for basic inquiries such as that posed above, we need to see if one group differs significantly from another. The way in which we accomplish this is by looking at the mean and variance within a set of data compared against another similar set. In order to do so appropriately however we need to first assume that both sets of data are normally distributed, and that the variance found within each set of data is similar. These are the two primary assumptions we learned about in the Chapter on t-tests, and if they are met then we may use parametric tests. We will learn in the Chapter on Transformations what we can do if these assumptions are not meant and we cannot adequately transform our data, meaning we will need to use non-parametric tests.
Remember the t-test
As you know, a t-test is used when we want to compare two
different sample sets against one another. This is also known as a
two-factor or two level test. When one wants to compare multiple (more
than two) sample sets against one another an ANOVA is required (see
below). Remember how to perform a t-test in R: we will revisit
this test using the chicks
data, but only for Diets 1 and 2
from day 21.
# First grab the data
<- as_tibble(ChickWeight)
chicks
# Then subset out only the sample sets to be compared
<- chicks %>%
chicks_sub filter(Diet %in% c(1, 2), Time == 21)
Once we have filtered our data we may now perform the
t-test. Traditionally this would be performed with
t.test()
, but recent developments in R have made any
testing for the comparison of means more convenient by wrapping
everything up into the one single function compare_means()
.
We may use only this one single function for many of the tests we will
perform in this chapter as well as the Chapter on t-tests.
To use compare_means()
for a t-test we must simply
specify this in the method
argument, as seen below:
compare_means(weight ~ Diet, data = chicks_sub, method = "t.test")
R> # A tibble: 1 × 8
R> .y. group1 group2 p p.adj p.format p.signif method
R> <chr> <chr> <chr> <dbl> <dbl> <chr> <chr> <chr>
R> 1 weight 1 2 0.218 0.22 0.22 ns T-test
Unfortunately, the means are not provided as part of the outcome, but fret not! The means are:
%>%
chicks_sub group_by(Diet) %>%
summarise(mean_mass = mean(weight))
R> # A tibble: 2 × 2
R> Diet mean_mass
R> <fct> <dbl>
R> 1 1 178.
R> 2 2 215.
Contrast the output with that of the atandard t-test
function, t.test()
:
t.test(weight ~ Diet, data = chicks_sub)
R>
R> Welch Two Sample t-test
R>
R> data: weight by Diet
R> t = -1.2857, df = 15.325, p-value = 0.2176
R> alternative hypothesis: true difference in means between group 1 and group 2 is not equal to 0
R> 95 percent confidence interval:
R> -98.09263 24.19263
R> sample estimates:
R> mean in group 1 mean in group 2
R> 177.75 214.70
The choice of whether to use compare_means()
or
t.test()
is yours.
Note: If you do decide to use the
compare_means()
function, please make sure that you know exactly which test is being used! This is, t-test for two means, or ANOVA for more than two means. In your Methods and Results you will want to report the name of the test being used internally withincompare_means()
, i.e.t.test()
oraov()
. Do NOT say you used thecompare_means()
function – rather, know what it does!
As one may recall from the previous chapter, whenever we want to give
a formula to a function in R, we use the ~
. The formula
used above, weight ~ Diet
, reads in plain English as
“weight as a function of diet”. This is perhaps easier to understand as
“Y as a function of X”. This means that we are assuming whatever is to
the left of the ~
is the dependant variable, and whatever
is to the right is the independent variable. We then tell
compare_means()
to run a t-test on our
chicks_sub
dataframe and it does the rest. We see in the
output above that this function gives us a rather tidy read-out of the
information we require to test a potential hypothesis. Let’s take a
moment to look through the help file for this function and see what all
of this means. Did the Diet 1 and 2 produce significantly fatter
birds?
One could also supplement the output by producing a graph:
library(ggstatsplot)
## since the confidence intervals for the effect sizes are computed using
## bootstrapping, important to set a seed for reproducibility
set.seed(13)
## parametric t-test and box plot
<- ggbetweenstats(
p1 data = chicks_sub,
x = Diet,
y = weight,
xlab = "Diet",
ylab = "Chick mass (g)",
plot.type = "box",
type = "p",
results.subtitle = FALSE,
conf.level = 0.95,
title = "t-test",
package = "ggsci",
palette = "nrc_npg"
) p1
Notice above that we did not need to specify to use a
t-test. The ggbetweenstats()
function
automatically determines if an independent samples t-test or a
1-way ANOVA is required based on whether there are two groups or three
or more groups within the grouping (factor) variable.
ANOVA
In the chicks
data we have four diets, not only two as
in the t-test example just performed. Why not then simply do a
t-test multiple times, once for each pair of diets given to the
chickens? The problem is that the chances of committing a Type I error
increases as more multiple comparisons are done. So, the overall chance
of rejecting the null hypothesis increases. Why? If one sets \(\alpha=0.05\) (the significance level below
which the null hypothesis is no longer accepted), one will still reject
the null hypothesis 5% of the time when it is in fact true (i.e. when
there is no difference between the groups). When many pairwise
comparisons are made, the probability of rejecting the null hypothesis
at least once is higher because we take this 5% risk each time we repeat
a t-test. In the case of the chicken diets, we would have to
perform six t-tests, and the error rate would increase to
slightly less than \(6\times5\%\). If
you insist in creating more work for yourself and do t-tests
many times, one way to overcome the problem of committing Type I errors
that stems from multiple comparisons is to apply a Bonferroni
correction.
Or better still, we do an ANOVA that controls for these Type I errors so that it remains at 5%.
A suitable null hypothesis for our chicken weight data is:
\[H_{0}:\mu_{1}=\mu_{2}=\mu_{3}=\mu_{4}\] where \(\mu_{1...4}\) are the means of the four diets.
At this point I was very tempted to put many equations here, but I ommitted them for your sake. Let us turn to some examples.
Single factor
We continue with the chicken data. The t-test showed that
Diets 1 and 2 resulted in the same chicken masses at the end of the
experiment at Day 21. What about the other two diets? Our null
hypothesis is that, at Day 21, \(\mu_{1}=\mu_{2}=\mu_{3}=\mu_{4}\). Is there
a statistical difference between chickens fed these four diets, or do we
retain the null hypothesis? The R function for an ANOVA is
aov()
. To look for significant differences between all four
diets on the last day of sampling we use this one line of code:
<- aov(weight ~ Diet, data = filter(chicks, Time == 21))
chicks.aov1 summary(chicks.aov1)
R> Df Sum Sq Mean Sq F value Pr(>F)
R> Diet 3 57164 19055 4.655 0.00686 **
R> Residuals 41 167839 4094
R> ---
R> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Task 1: What does the outcome say about the chicken masses? Which ones are different from each other?
Task 2: Devise a graphical display of this outcome.
If this seems too easy to be true, it’s because we aren’t quite done
yet. You could use your graphical display to eyeball where the
significant differences are, or we can turn to a more ‘precise’
approach. The next step one could take is to run a Tukey HSD test on the
results of the ANOVA by wrapping tukeyHSD()
around
aov()
:
TukeyHSD(chicks.aov1)
R> Tukey multiple comparisons of means
R> 95% family-wise confidence level
R>
R> Fit: aov(formula = weight ~ Diet, data = filter(chicks, Time == 21))
R>
R> $Diet
R> diff lwr upr p adj
R> 2-1 36.95000 -32.11064 106.01064 0.4868095
R> 3-1 92.55000 23.48936 161.61064 0.0046959
R> 4-1 60.80556 -10.57710 132.18821 0.1192661
R> 3-2 55.60000 -21.01591 132.21591 0.2263918
R> 4-2 23.85556 -54.85981 102.57092 0.8486781
R> 4-3 -31.74444 -110.45981 46.97092 0.7036249
The output of tukeyHSD()
shows us that pairwise
comparisons of all of the groups we are comparing.
We may also produce a graphical summary:
set.seed(666)
## parametric t-test and box plot
<- ggbetweenstats(
p2 data = filter(chicks, Time == 21),
x = Diet,
y = weight,
xlab = "Diet",
ylab = "Chick mass (g)",
plot.type = "box",
type = "parametric",
results.subtitle = TRUE,
pairwise.comparisons = TRUE,
pairwise.display = "non-significant",
conf.level = 0.95,
title = "ANOVA",
package = "ggsci",
palette = "nrc_npg"
) p2
Task 3: Look at the help file for this function to better understand what the output means.
Task 4: How does one interpret the results? What does this tell us about the effect that that different diets has on the chicken weights at Day 21?
Task 5: Figure out a way to plot the Tukey HSD outcomes.
Task 6: Why does the ANOVA return a significant result, but the Tukey test shows that not all of the groups are significantly different from one another?
Task 7: Produce a graphical display of the Tukey HSD result.
Now that we’ve seen how to perform a single factor ANOVA, let’s watch some animations that highlight how certain aspects of our data may affect our results.
When the sample size changes:
When the mean of a sample changes:
When the variance of a sample changes:
Multiple factors
What if we have multiple grouping variables, and not just one? In the case of the chicken data, there is also time that seems to be having an effect.
Task 8: How is time having an effect?
Task 9: What hypotheses can we construct around time?
Let us look at some variations around questions concerning time. We might ask, at a particular time step, are there differences amongst the effect due to diet on chicken mass? Let’s see when diets are starting the have an effect by examining the outcomes at times 0, 2, 10, and 21:
summary(aov(weight ~ Diet, data = filter(chicks, Time %in% c(0))))
R> Df Sum Sq Mean Sq F value Pr(>F)
R> Diet 3 4.32 1.440 1.132 0.346
R> Residuals 46 58.50 1.272
summary(aov(weight ~ Diet, data = filter(chicks, Time %in% c(2))))
R> Df Sum Sq Mean Sq F value Pr(>F)
R> Diet 3 158.4 52.81 4.781 0.00555 **
R> Residuals 46 508.1 11.05
R> ---
R> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(aov(weight ~ Diet, data = filter(chicks, Time %in% c(10))))
R> Df Sum Sq Mean Sq F value Pr(>F)
R> Diet 3 8314 2771 6.46 0.000989 ***
R> Residuals 45 19304 429
R> ---
R> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(aov(weight ~ Diet, data = filter(chicks, Time %in% c(21))))
R> Df Sum Sq Mean Sq F value Pr(>F)
R> Diet 3 57164 19055 4.655 0.00686 **
R> Residuals 41 167839 4094
R> ---
R> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Task 10: What do you conclude from the above series of ANOVAs?
Task 11: What problem is associated with running multiple tests in the way that we have done here?
Or we may ask, regardless of diet (i.e. disregarding the effect of diet by clumping all chickens together), is time having an effect?
<- aov(weight ~ as.factor(Time), data = filter(chicks, Time %in% c(0, 2, 10, 21)))
chicks.aov2 summary(chicks.aov2)
R> Df Sum Sq Mean Sq F value Pr(>F)
R> as.factor(Time) 3 939259 313086 234.8 <2e-16 ***
R> Residuals 190 253352 1333
R> ---
R> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Task 12: What do you conclude from the above ANOVA?
Or, to save ourselves a lot of time and reduce the coding effort, we
may simply run a two-way ANOVA and look at the effects of
Diet
and Time
simultaneously. To specify the
different factors we put them in our formula and separate them with a
+
:
summary(aov(weight ~ Diet + as.factor(Time), data = filter(chicks, Time %in% c(0, 21))))
R> Df Sum Sq Mean Sq F value Pr(>F)
R> Diet 3 39595 13198 5.987 0.00091 ***
R> as.factor(Time) 1 734353 734353 333.120 < 2e-16 ***
R> Residuals 90 198402 2204
R> ---
R> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Task 13: What question are we asking with the above line of code? What is the answer? Also, why did we wrap
Time
inas.factor()
?
It is also possible to look at what the interaction effect between
grouping variables (i.e. in this case the effect of time on diet—does
the effect of time depend on which diet we are looking at?), and not
just within the individual grouping variables. To do this we replace the
+
in our formula with *
:
summary(aov(weight ~ Diet * as.factor(Time), data = filter(chicks, Time %in% c(4, 21))))
R> Df Sum Sq Mean Sq F value Pr(>F)
R> Diet 3 40914 13638 6.968 0.000298 ***
R> as.factor(Time) 1 582221 582221 297.472 < 2e-16 ***
R> Diet:as.factor(Time) 3 25530 8510 4.348 0.006684 **
R> Residuals 86 168322 1957
R> ---
R> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Task 14: How do these results differ from the previous set?
One may also run a post-hoc Tukey test on these results the same as for a single factor ANOVA:
TukeyHSD(aov(weight ~ Diet * as.factor(Time), data = filter(chicks, Time %in% c(20, 21))))
R> Tukey multiple comparisons of means
R> 95% family-wise confidence level
R>
R> Fit: aov(formula = weight ~ Diet * as.factor(Time), data = filter(chicks, Time %in% c(20, 21)))
R>
R> $Diet
R> diff lwr upr p adj
R> 2-1 36.18030 -9.301330 81.66194 0.1663037
R> 3-1 90.63030 45.148670 136.11194 0.0000075
R> 4-1 62.25253 15.223937 109.28111 0.0045092
R> 3-2 54.45000 3.696023 105.20398 0.0305957
R> 4-2 26.07222 -26.072532 78.21698 0.5586643
R> 4-3 -28.37778 -80.522532 23.76698 0.4863940
R>
R> $`as.factor(Time)`
R> diff lwr upr p adj
R> 21-20 8.088223 -17.44017 33.61661 0.5303164
R>
R> $`Diet:as.factor(Time)`
R> diff lwr upr p adj
R> 2:20-1:20 35.188235 -40.67378 111.050253 0.8347209
R> 3:20-1:20 88.488235 12.62622 164.350253 0.0111136
R> 4:20-1:20 63.477124 -14.99365 141.947897 0.2035951
R> 1:21-1:20 7.338235 -58.96573 73.642198 0.9999703
R> 2:21-1:20 44.288235 -31.57378 120.150253 0.6116081
R> 3:21-1:20 99.888235 24.02622 175.750253 0.0023872
R> 4:21-1:20 68.143791 -10.32698 146.614563 0.1371181
R> 3:20-2:20 53.300000 -31.82987 138.429869 0.5234263
R> 4:20-2:20 28.288889 -59.17374 115.751515 0.9723470
R> 1:21-2:20 -27.850000 -104.58503 48.885027 0.9486212
R> 2:21-2:20 9.100000 -76.02987 94.229869 0.9999766
R> 3:21-2:20 64.700000 -20.42987 149.829869 0.2732059
R> 4:21-2:20 32.955556 -54.50707 120.418182 0.9377007
R> 4:20-3:20 -25.011111 -112.47374 62.451515 0.9862822
R> 1:21-3:20 -81.150000 -157.88503 -4.414973 0.0305283
R> 2:21-3:20 -44.200000 -129.32987 40.929869 0.7402877
R> 3:21-3:20 11.400000 -73.72987 96.529869 0.9998919
R> 4:21-3:20 -20.344444 -107.80707 67.118182 0.9960548
R> 1:21-4:20 -56.138889 -135.45396 23.176184 0.3619622
R> 2:21-4:20 -19.188889 -106.65152 68.273738 0.9972631
R> 3:21-4:20 36.411111 -51.05152 123.873738 0.8984019
R> 4:21-4:20 4.666667 -85.06809 94.401428 0.9999998
R> 2:21-1:21 36.950000 -39.78503 113.685027 0.8067041
R> 3:21-1:21 92.550000 15.81497 169.285027 0.0075185
R> 4:21-1:21 60.805556 -18.50952 140.120628 0.2629945
R> 3:21-2:21 55.600000 -29.52987 140.729869 0.4679025
R> 4:21-2:21 23.855556 -63.60707 111.318182 0.9896157
R> 4:21-3:21 -31.744444 -119.20707 55.718182 0.9486128
Task 15: Yikes! That’s a massive amount of results. What does all of this mean, and why is it so verbose?
Summary: To summarise t-tests, single-factor (1-way) and multifactor (2- or 3-way, etc.) ANOVAs:
A t-test is applied to situations where one wants to compare the means of only two groups of a response variable within one categorical independent variable (we say a factor with two levels).
A 1-way ANOVA also looks at the means of a response variable belonging to one categorical independent variable, but the categorical response variable has more than two levels in it.
Following on from there, a 2-way ANOVA compares the means of response variables belonging to all the levels within two categorical independent variables (e.g. Factor 1 might have three levels, and Factor 2 five levels). In the simplest formulaton, it does so by looking at the main effects, which is the group differences between the three levels of Factor 1 and disregarding the contribution due to the group membership to Factor 2, and also the group differences amongst the levels of Factor 2 but disregarding the group membership of Factor 1. In addition to looking at the main effects, a 2-way ANOVA can also consider the interaction (or combined effect) of Factors 1 and 2 in influencing the means.
Examples
Snakes!
These data could be analysed by a two-way ANOVA without replication, or a repeated measures ANOVA. Here I will analyse it by using a two-way ANOVA without replication.
Place and Abramson (2008) placed diamondback rattlesnakes (Crotalus atrox) in a “rattlebox,” a box with a lid that would slide open and shut every 5 minutes. At first, the snake would rattle its tail each time the box opened. After a while, the snake would become habituated to the box opening and stop rattling its tail. They counted the number of box openings until a snake stopped rattling; fewer box openings means the snake was more quickly habituated. They repeated this experiment on each snake on four successive days, which is treated as an influential variable here. Place and Abramson (2008) used 10 snakes, but some of them never became habituated; to simplify this example, data from the six snakes that did become habituated on each day are used.
First, we read in the data, making sure to convert the column named
day
to a factor. Why? Because ANOVAs work with factor
independent variables, while day
as it is encoded by
default is in fact a continuous variable.
<- read_csv("../data/snakes.csv")
snakes $day = as.factor(snakes$day) snakes
The first thing we do is to create some summaries of the data. Refer to the summary statistics Chapter.
<- snakes %>%
snakes.summary group_by(day, snake) %>%
summarise(mean_openings = mean(openings),
sd_openings = sd(openings)) %>%
ungroup()
snakes.summary
R> # A tibble: 24 × 4
R> day snake mean_openings sd_openings
R> <fct> <chr> <dbl> <dbl>
R> 1 1 D1 85 NA
R> 2 1 D11 40 NA
R> 3 1 D12 65 NA
R> 4 1 D3 107 NA
R> 5 1 D5 61 NA
R> 6 1 D8 22 NA
R> 7 2 D1 58 NA
R> 8 2 D11 45 NA
R> 9 2 D12 27 NA
R> 10 2 D3 51 NA
R> # … with 14 more rows
Task 16: Something seems… off. What’s going on here? Please explain this outcome.
To fix this problem, let us ignore the grouping by both
snake
and day
.
<- snakes %>%
snakes.summary group_by(day) %>%
summarise(mean_openings = mean(openings),
sd_openings = sd(openings)) %>%
ungroup()
snakes.summary
R> # A tibble: 4 × 3
R> day mean_openings sd_openings
R> <fct> <dbl> <dbl>
R> 1 1 63.3 30.5
R> 2 2 47 12.2
R> 3 3 34.5 26.0
R> 4 4 25.3 18.1
library(Rmisc)
<- summarySE(data = snakes, measurevar = "openings", groupvars = c("day")) snakes.summary2
Now we turn to some visual data summaries.
ggplot(data = snakes, aes(x = day, y = openings)) +
geom_segment(data = snakes.summary2, aes(x = day, xend = day, y = openings - ci, yend = openings + ci, colour = day),
size = 2.0, linetype = "solid", show.legend = F) +
geom_boxplot(aes(fill = day), alpha = 0.6, show.legend = F) +
geom_jitter(width = 0.05)
What are our null hypotheses?
- H0: There is no difference between snakes with respect to the number of openings at which they habituate.
- H0: There is no difference between days in terms of the number of openings at which the snakes habituate.
Fit the ANOVA model to test these hypotheses:
<- aov(openings ~ day + snake, data = snakes)
snakes.aov summary(snakes.aov)
R> Df Sum Sq Mean Sq F value Pr(>F)
R> day 3 4878 1625.9 3.320 0.0487 *
R> snake 5 3042 608.4 1.242 0.3382
R> Residuals 15 7346 489.7
R> ---
R> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Now we need to test of the assumptions hold true (i.e. erros are normally distributed and heteroscedastic). Also, where are the differences?
par(mfrow = c(2, 2))
# Checking assumptions...
# make a histogram of the residuals;
# they must be normal
<- residuals(snakes.aov)
snakes.res hist(snakes.res, col = "red")
# make a plot of residuals and the fitted values;
# # they must be normal and homoscedastic
plot(fitted(snakes.aov), residuals(snakes.aov), col = "red")
<- TukeyHSD(snakes.aov, which = "day", conf.level = 0.90)
snakes.tukey plot(snakes.tukey, las = 1, col = "red")
Alternatives to ANOVA
In the first main section of this chapter we learned how to test
hypotheses based on the comparisons of means between sets of data when
we were able to meet our two base assumptions. These parametric tests
are preferred over non-parametric tests because they are more robust.
However, when we simply aren’t able to meet these assumptions we must
not despair. Non-parametric tests are still useful. In this chapter we
will learn how to run non-parametric tests for two sample and multiple
sample datasets. To start, let’s load our libraries and
chicks
data if we have not already.
# First activate libraries
library(tidyverse)
library(ggpubr)
# Then load data
<- as_tibble(ChickWeight) chicks
With our libraries and data loaded, let’s find a day in which at least one of our assumptions are violated.
# Then check for failing assumptions
%>%
chicks filter(Time == 0) %>%
group_by(Diet) %>%
summarise(norm_wt = as.numeric(shapiro.test(weight)[2]),
var_wt = var(weight))
R> norm_wt var_wt
R> 1 0.000221918 1.282041
Wilcox rank sum test
The non-parametric version of a t-test is a Wilcox rank sum
test. To perform this test in R we may again use
compare_means()
and specify the test we want:
compare_means(weight ~ Diet, data = filter(chicks, Time == 0, Diet %in% c(1, 2)), method = "wilcox.test")
R> # A tibble: 1 × 8
R> .y. group1 group2 p p.adj p.format p.signif method
R> <chr> <chr> <chr> <dbl> <dbl> <chr> <chr> <chr>
R> 1 weight 1 2 0.235 0.23 0.23 ns Wilcoxon
What do our results show?
Kruskall-Wallis rank sum test
Single factor
The non-parametric version of an ANOVA is a Kruskall-Wallis rank sum
test. As you may have by now surmised, this may be done with
compare_means()
as seen below:
compare_means(weight ~ Diet, data = filter(chicks, Time == 0), method = "kruskal.test")
R> # A tibble: 1 × 6
R> .y. p p.adj p.format p.signif method
R> <chr> <dbl> <dbl> <chr> <chr> <chr>
R> 1 weight 0.475 0.48 0.48 ns Kruskal-Wallis
As with the ANOVA, this first step with the Kruskall-Wallis test is
not the last. We must again run a post-hoc test on our results. This
time we will need to use pgirmess::kruskalmc()
, which means
we will need to load a new library.
library(pgirmess)
kruskalmc(weight ~ Diet, data = filter(chicks, Time == 0))
R> Multiple comparison test after Kruskal-Wallis
R> p.value: 0.05
R> Comparisons
R> obs.dif critical.dif difference
R> 1-2 6.95 14.89506 FALSE
R> 1-3 6.90 14.89506 FALSE
R> 1-4 4.15 14.89506 FALSE
R> 2-3 0.05 17.19933 FALSE
R> 2-4 2.80 17.19933 FALSE
R> 3-4 2.75 17.19933 FALSE
Let’s consult the help file for kruskalmc()
to
understand what this print-out means.
Multiple factors
The water becomes murky quickly when one wants to perform multiple factor non-parametric comparison of means tests. To that end, we will not cover the few existing methods here. Rather, one should avoid the necessity for these types of tests when designing an experiment.
The SA time data
<- as_tibble(read_csv("../data/SA_time.csv", col_types = list(col_double(), col_double(), col_double())))
sa_time <- sa_time %>%
sa_time_long gather(key = "term", value = "minutes") %>%
filter(minutes < 300) %>%
mutate(term = as.factor(term))
<- list( c("now", "now_now"), c("now_now", "just_now"), c("now", "just_now") )
my_comparisons
ggboxplot(sa_time_long, x = "term", y = "minutes",
color = "term", palette = c("#00AFBB", "#E7B800", "#FC4E07"),
add = "jitter", shape = "term")
ggviolin(sa_time_long, x = "term", y = "minutes", fill = "term",
palette = c("#00AFBB", "#E7B800", "#FC4E07"),
add = "boxplot", add.params = list(fill = "white")) +
stat_compare_means(comparisons = my_comparisons, label = "p.signif") + # Add significance levels
stat_compare_means(label.y = 50) # Add global the p-value
Exercises
Exercise 1 Here is bunch of data for pigs raised on different diets. The experiment is similar to the chicken one. Does feed type have an effect on the mass of pigs at the end of the experiment?
# enter the mass at the end of the experiment
<- c(60.8, 57.0, 65.0, 58.6, 61.7)
feed_1 <- c(68.7, 67.7, 74.0, 66.3, 69.8)
feed_2 <- c(102.6, 102.1, 100.2, 96.5)
feed_3 <- c(87.9, 84.2, 83.1, 85.7, 90.3)
feed_4
# make a dataframe
<- as.tibble(data.frame(
bacon feed = c(
rep("Feed 1", length(feed_1)),
rep("Feed 2", length(feed_2)),
rep("Feed 3", length(feed_3)),
rep("Feed 4", length(feed_4))
),mass = c(feed_1, feed_2, feed_3, feed_4)
))
Exercise 2 Construct suitable null and alternative hypotheses for the built-in
ToothGrowth
data, and test your hypotheses using an ANOVA.
<- datasets::ToothGrowth teeth
Exercise 3 Find or generate your own data that lend themselves to being analysed by a two-way ANOVA. Generate suitable hypotheses about your data, and analyse it. Supplement your analysis by providing a suitable descriptive statistical summary and graph(s) of your data.