To get started, open your web browser, head to rstudio.cloud and log in using your RStudio Cloud account.
Once you have logged in, start a New Project by clicking the button to the right of “Your Projects”.
After you click the button, rstudio.cloud will deploy a new RStudio instance, which may take a minute or two, so be patient.
Once the RStudio instance is running, rename your project by clicking on the “Untitled Project” label at the top of the screen. Give the project a useful name, such as “biol 266”.
It is good practice to enter your R code into an R script. An R script is a plain text file that we save with the extension .R. All your code for the lab should go in this file, because a script makes it easy to save, edit, and rerun your work.
To start a new script, go to the File menu, then select New File > R Script
Or you can press Ctrl + Shift + Alt + N on Windows or Cmd + Shift + Option + N on a Mac.
Your RStudio screen should look like this
Once you have a new session running and named, click in the Console window at the prompt and run the code below by typing it exactly as shown and hitting Return
install.packages(c('purrr', 'readxl', 'tidyr', 'dplyr', 'forcats',
'ggplot2', 'tibble', 'stringr', 'emmeans'))
This will take a few minutes to complete.
We need to load the packages we just installed so they are ready to use. You need to do this each time you start R and want to use these packages. The installation step you just performed does not need to be repeated each time you start R.
Click in the script window and enter the lines of code below. The hash character # allows you to enter comments, which R will ignore.
library('purrr') # iteration tools (map())
library('readxl') # read from Excel sheets
library('tidyr') # data processing
library('dplyr') # mo data processing
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library('forcats') # mo mo data processing
library('tibble') # creating data frames
library('stringr') # process strings (text)
library('ggplot2') # plotting
library('emmeans') # for post hoc comparisons
theme_set(theme_bw()) # Set plot theme
Once you have entered all those lines of code into the script window, move the cursor to the first line (or click on it), and then press Ctrl + Enter on Windows or Cmd + Enter on a Mac.
This will send the current line of code to the console window and move the cursor to the start of the next line. Continue hitting Ctrl + Enter or Cmd + Enter until you have sent each line of code to the console. As you do so, R will load each package; some messages will be printed to the console, but this is OK as long as you do not see any errors.
From now on, enter the code in the shaded grey boxes into the Script window and use Ctrl + Enter or Cmd + Enter to send the code to the Console. Whenever R produces output, it should match (or be similar to; you will see output for your own data) the text shown in the outlined boxes, where each line starts with ##. Note that the output in the console won't have the ## characters; these are just used to indicate output in this document.
Before we can work with the data from the experiment, we will need to upload them to our rstudio.cloud project. Make sure you have downloaded the rn266data.xlsx
file from UR Courses and that you have it somewhere on your local computer. Next, make sure the Files tab is selected in the lower right quarter of the application, and locate the Upload button
Click the Upload button — this will bring up the Upload files dialogue box
Click the Choose file button — this will bring up the file chooser on your computer. In the file chooser, navigate to where you saved the rn266data.xlsx
file and open it in the chooser so that the file is listed next to the Choose file button, as shown below
Once you have rn266data.xlsx
showing in the dialogue box, you can click OK to upload the file to rstudio.cloud.
When you have uploaded the file, your Files pane should list the rn266data.xlsx
file as shown below
At this point, it is worth saving your script. Find the disk icons in the main toolbar and in the toolbar for your script.
Click either of these icons to save the file. The first time you do this you'll be prompted to enter a file name; use something memorable and be sure to give it the extension .R. In the image below I named my script biol266-n-exp.R. Then click Save.
Be sure to save your work regularly during class. You can use Ctrl + S on Windows or Cmd + S on a Mac to save changes you make to your script, or click on the disk icon as before.
The experimental data exist in rn266data.xlsx, an Excel workbook. We're going to work with all the data collected by everyone who took part in the experiment. To do this, we need to load the data from each sheet in turn, which we do with an iterator, using the map() function to call read_excel() on each sheet name in turn
sheets <- excel_sheets("rn266data.xlsx") %>%
    set_names()
plants <- map(sheets, read_excel, path = "rn266data.xlsx")
We need to do a little cleaning of the data before we can combine all the sheets into a single object for analysis. The data cleaning requires a couple of functions
## fixes the sheet names to clean them up a bit
fix_names <- function(x) {
    nms <- names(x) %>%
        str_trim(side = "right") %>%
        str_to_lower() %>%
        str_replace_all(pattern = ": ", replacement = ":") %>%
        str_replace_all(pattern = "heatlh", replacement = "health")
    names(x) <- nms
    x
}
## wrapper around the two pivot operations needed to get the data
## the way we want them
pivot_fun <- function(x) {
    x <- pivot_longer(x, -(treatment:plantid),
                      names_to = c("variable", "day"),
                      names_sep = ':',
                      names_transform = list(day = as.integer))
    x <- pivot_wider(x, names_from = variable, values_from = value)
    x <- mutate(x, id = paste(treatment, plantid, sep = '_'))
    x <- mutate(x, treatment = fct_relevel(treatment, 'control'))
    x
}
Copy and paste these functions into your script, then highlight them in the script and run them within R so that they are available for use in your session.
Now we can clean up the column names in the sheets
plants <- map(plants, fix_names)
and finally do the two pivot operations you saw in the lab analysis videos, to get the data exactly how we want them
plants <- map(plants, pivot_fun)
Next we need to combine all of the processed data objects in plants
into a single data frame, rather than the list of separate data frames it is currently. We start by storing some info about the sheets and how many rows of data each has
sheet_names <- names(plants)           # names of sheets
obs_per_sheet <- map_int(plants, nrow) # how many obs per sheet
then we combine the sheets and add in an extra variable observer
(who did the experiment), as well as modifying the id
variable and setting the reference level of the treatment
factor to be the control
level.
plants <- bind_rows(plants) %>%
    mutate(observer = rep(sheet_names, times = obs_per_sheet),
           id = paste(observer, id, sep = '_'),
           treatment = fct_relevel(treatment, 'control'))
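The rep() call above repeats each observer's name once for every row of data from that observer's sheet. A minimal base-R illustration (the names and counts here are made up, not the lab data):

```r
# rep() with a `times` vector repeats each element the matching
# number of times -- this is how each sheet name becomes a column
# entry for every row that came from that sheet
obs  <- c("AH", "CC")   # hypothetical sheet names
rows <- c(2L, 3L)       # hypothetical rows per sheet
rep(obs, times = rows)
## [1] "AH" "AH" "CC" "CC" "CC"
```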
OK, that’s a lot of data processing. Let’s look at the data.
We can plot the plant height data over time using the following code
ggplot(plants, aes(x = day, y = height, group = id,
colour = treatment)) +
geom_line() +
facet_wrap(~ observer) +
labs(y = 'Height (mm)', x = 'Day', colour = 'Treatment')
## Warning: Removed 650 row(s) containing missing values (geom_path).
Note, we specify the data object (plants
) and the mapping of variables to axes and the colour channel. We want to group the data for each plant and draw lines through the observations to better show how the height of each plant changed over time. We indicate this grouping via the group
aesthetic, which we set to the id
variable we created earlier. We also draw the data for each observer in its own panel, which ggplot calls faceting. The final line adds some nice labels to the plot.
Some preliminary analysis suggests we need to model plant height on the log scale; bigger things get bigger more quickly than smaller things. As such we should look at the data on that scale too
ggplot(plants, aes(x = day, y = log1p(height), group = id,
colour = treatment)) +
geom_line() +
facet_wrap(~ observer) +
labs(y = expression(log[e](Height) ~ (log[e]*mm)), x = 'Day', colour = 'Treatment')
## Warning: Removed 650 row(s) containing missing values (geom_path).
The log1p() function adds 1 to each observation of plant height before taking logs, as there are some plants with a height of 0. The value log(0) is minus infinity, which is not a helpful number and can't easily be represented in the computer in a way that we can include in the model. Adding 1 to all values means we have no zero values in the data before we log-transform them.
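A quick sketch you can run in the console shows why log1p() is the right tool here:

```r
# log(0) is -Inf, which the model cannot use;
# log1p(x) computes log(x + 1), so zero heights become 0
log(0)     # -Inf
log1p(0)   # 0
log1p(9)   # same as log(10)
```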
The final thing we’ll do before we start analysing the data, just to simplify things, is to take only those rows of data where we have a height measurement, excluding all the NA
values in the height
variable. We do this by filtering the plants
object
plant_h <- plants %>% filter(!is.na(height))
The model we are fitting is sometimes called an analysis of covariance or ANCOVA, which is another special case of a linear model that can also be fitted using the lm()
function. In an ANCOVA, we control for the effects of a continuous variable whilst interest is on a categorical treatment effect.
As before, it is helpful to think about the kinds of result we might expect from the statistical analysis. When considering the plant height data we could think of five basic types of result:
We specify the model structure using a symbolic formula. The formula we’ll use is
height ~ day * treatment + observer
The response (dependent) variable is on the left hand side of the ~
(the tilde symbol). This is the variable we wish to explain. The right hand side of the formula contains the independent variables in the model. Here, the independent variables represent the experimental setup. In addition we have + observer
, which allows for a different mean plant height for each observer; we do this as there are likely to be differences between each student’s experimental set-up (temperature, amount of light, soil, etc) that will lead to systematic variation between observers. In a more formal analysis, we’d handle this using what are known as random effects, but here we keep things simple and just consider observer
as a fixed offset and we don’t investigate any interactions between the experimental variables and observer
, although that could be done.
Given our experimental design, the most complex model we can fit is one that allows for different growth rates (effects of day
) for each treatment level. In other words, we want a different growth rate for each treatment group.
Such a model implies a two-way interaction between day
, and treatment
. We indicate interactions in the formula using the *
symbol.
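To see exactly what the * does, we can ask R to expand the formula for us (a quick check you can run in the console):

```r
# In a model formula, a * b expands to a + b + a:b,
# i.e. both main effects plus their interaction
attr(terms(~ day * treatment), "term.labels")
# returns "day", "treatment", "day:treatment"
```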
m1 <- lm(log1p(height) ~ day * treatment + observer, data = plant_h)
As with the plot above, we are modelling log1p(height)
, which adds 1
to each height observation before applying a log transformation.
We can summarize the model to look for some key diagnostics of the fit
summary(m1)
##
## Call:
## lm(formula = log1p(height) ~ day * treatment + observer, data = plant_h)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.8584 -0.2474 0.0955 0.3959 1.3116
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.348031 0.083498 28.121 < 2e-16 ***
## day 0.087206 0.003517 24.797 < 2e-16 ***
## treatment1x 0.018353 0.081643 0.225 0.82216
## treatment3x -0.245240 0.081264 -3.018 0.00258 **
## observerAH 1.161503 0.083345 13.936 < 2e-16 ***
## observerALi 0.532873 0.085831 6.208 6.44e-10 ***
## observerCC 1.517712 0.085973 17.653 < 2e-16 ***
## observerHB 0.011267 0.085090 0.132 0.89467
## observerHG -1.068374 0.082880 -12.891 < 2e-16 ***
## observerJC 0.357956 0.078808 4.542 5.89e-06 ***
## observerJE 0.892810 0.083166 10.735 < 2e-16 ***
## observerJL 0.892757 0.075715 11.791 < 2e-16 ***
## observerKK 0.599455 0.075709 7.918 3.89e-15 ***
## observerLZ 0.449775 0.080785 5.568 2.92e-08 ***
## observerMF -0.588969 0.082718 -7.120 1.48e-12 ***
## observerMM 0.429434 0.080524 5.333 1.07e-07 ***
## observerNW 1.073622 0.081501 13.173 < 2e-16 ***
## observerTC 1.034075 0.081634 12.667 < 2e-16 ***
## observerTH 0.795451 0.080720 9.854 < 2e-16 ***
## day:treatment1x -0.008454 0.004877 -1.733 0.08319 .
## day:treatment3x -0.007639 0.004896 -1.560 0.11886
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.6347 on 2087 degrees of freedom
## Multiple R-squared: 0.626, Adjusted R-squared: 0.6224
## F-statistic: 174.7 on 20 and 2087 DF, p-value: < 2.2e-16
Most of this isn’t very helpful in this state, except the omnibus F test which indicates if the total variance explained by all the model terms is significantly large relative to the unexplained variance.
If we think back to the plot of the data we made earlier, the relationship between plant height and time was non-linear, even on the log scale. The model we just fitted is a linear model, only capable of fitting straight lines and as such is not likely to follow the data closely. Statisticians call this bias. To try to reduce the bias we can fit a non-linear curve to the data using polynomials of day
. Polynomials are \(day\), \(day^2\), \(day^3\), etc. To model a saturating curve — one which levels off — we might be able to simply use \(\text{day} + \text{day}^2\), but to allow for decreasing height at later days we will use a cubic polynomial, adding \(\text{day}^3\) as well. We can then compare all three models, with increasing amounts of non-linearity, and see which fits the data best.
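R's poly() function builds these polynomial terms for us. A small illustration with made-up values (not the lab data):

```r
# poly(x, degree) returns a matrix with one column per polynomial
# degree; by default these are orthogonal polynomials, which are
# numerically stabler than the raw powers x, x^2, x^3
p <- poly(1:10, degree = 3)
dim(p)       # 10 rows, 3 columns
colnames(p)  # "1" "2" "3" -- one column per degree
```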
We create the polynomial models by updating the existing model. First we add first- and second-order polynomials and their interaction with treatment, which requires us to remove the day * treatment terms from the original model
m2 <- update(m1, . ~ poly(day, 2) * treatment + observer)
Next we repeat the above but this time adding the third-degree polynomial
m3 <- update(m1, . ~ poly(day, 3) * treatment + observer)
Now we have three models, we can compare them using an analysis of variance table or Akaike’s Information Criterion (AIC). We start with the analysis of variance table
anova(m1, m2, m3)
## Analysis of Variance Table
##
## Model 1: log1p(height) ~ day * treatment + observer
## Model 2: log1p(height) ~ poly(day, 2) + treatment + observer + poly(day,
## 2):treatment
## Model 3: log1p(height) ~ poly(day, 3) + treatment + observer + poly(day,
## 3):treatment
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 2087 840.79
## 2 2084 710.92 3 129.87 134.402 < 2.2e-16 ***
## 3 2081 670.28 3 40.64 42.058 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Here we compare the models sequentially from simplest to most complex. Each line tests whether the more complex model significantly reduces the residual sum of squares (RSS, the lack of fit of the model) relative to the previous model; the RSS will decrease as we make the model more complex. We see significant reductions in the RSS as we add the extra polynomial terms, suggesting that the cubic polynomial of day gives the best fit to the data.
We confirm this using AIC, where smaller values of AIC indicate better-fitting models
AIC(m1, m2, m3)
## df AIC
## m1 22 4088.661
## m2 25 3740.974
## m3 28 3622.887
Again, we find that the cubic polynomial provides the best fit. As such, we will use this model, m3, to explore in more detail how plant height varied across the experimental period.
To get more interpretable output we will again use estimated marginal means; remember that we are interested in the effect of day
, a continuous variable, as well as the treatment effects. The day
effect will be represented by curved lines (regression lines), the slopes of which reflect the average growth rate of plants in each treatment group. Because the lines are not linear, the rate of growth varies throughout the experimental period. These slopes are often called trends, hence we use the emtrends()
function to estimate the marginal trends instead of the marginal means.
As before, we will start by comparing the treated plants with the control plants. This time, however, we need to indicate the trend variable via the argument var
. In our case the trend variable is day
(which is indicated by the name in quotes). We also want the growth rate on the original (not log) scale, so we add transform = "response"
, and because the growth rate is not constant we ask for the comparison of treatments at day 5
m3mt5 <- emtrends(m3, trt.vs.ctrl ~ treatment, var = "day",
                  transform = "response",
                  at = list(day = 5))
m3mt5
## $emtrends
## treatment day.trend SE df lower.CL upper.CL
## control 6.45 0.318 2081 5.82 7.07
## 1x 5.65 0.308 2081 5.05 6.26
## 3x 4.84 0.250 2081 4.35 5.33
##
## Results are averaged over the levels of: observer
## Confidence level used: 0.95
##
## $contrasts
## contrast estimate SE df t.ratio p.value
## 1x - control -0.795 0.438 2081 -1.816 0.1277
## 3x - control -1.612 0.400 2081 -4.031 0.0001
##
## Results are averaged over the levels of: observer
## P value adjustment: dunnettx method for 2 tests
You should be familiar with this output now. The $emtrends
section contains similar outputs to the $emmeans
sections we looked at earlier when analyzing the fresh weight of the plants. The main difference is in the interpretation of the numbers in the day.trend
column, which replaces the emmean
from before. The day.trend
column contains estimates of the slope of the regression line for the day
variable at day == 5
.
The numbers in the column are the estimated change in the height of the plant for a single day of growth:
- If the day.trend number is positive, it means the plants in that treatment group increased in height over the experiment, and
- if the day.trend number is negative, it means the plants in that treatment group decreased in height over the experiment.

The $contrasts section contains the same kind of comparisons of treatments vs. the control. Remember here that the estimate
column contains the estimated difference in the slopes (growth rates) of the stated treatment and the control group. Hence the null hypothesis being tested is that the difference in slopes is equal to zero.
- A negative value in the estimate column means the average change in height in the stated treatment was lower than in the control group; in other words, the plants in that treatment group grew more slowly on average than the plants in the control group, and
- a positive value in the estimate column means the plants in that treatment group grew, on average, at a faster rate than plants in the control group.

As before, we can plot these estimated marginal trends using the plot()
method:
plot(m3mt5, comparisons = TRUE)
This plot is similar to one we created in the earlier lab when we looked at the data from the 2019 experiment:
The main difference in this plot is that we no longer show the estimated marginal mean on the x axis. Instead the slope of the regression line (the growth rate) is plotted on the x axis. Remember how we interpret these numbers: the values are the estimated change, on average, in the height of a plant in a single day of growth.
It looks like there is little evidence in favor of different treatment effects at day 5. What about a point later in the experiment? We can try day == 10
for example
m3mt10 <- emtrends(m3, trt.vs.ctrl ~ treatment,
                   var = "day",
                   transform = "response",
                   at = list(day = 10))
m3mt10
## $emtrends
## treatment day.trend SE df lower.CL upper.CL
## control 9.55 0.476 2081 8.62 10.49
## 1x 7.39 0.400 2081 6.61 8.18
## 3x 5.78 0.324 2081 5.14 6.41
##
## Results are averaged over the levels of: observer
## Confidence level used: 0.95
##
## $contrasts
## contrast estimate SE df t.ratio p.value
## 1x - control -2.16 0.606 2081 -3.569 0.0007
## 3x - control -3.78 0.563 2081 -6.713 <.0001
##
## Results are averaged over the levels of: observer
## P value adjustment: dunnettx method for 2 tests
Look at the $contrasts
component and see if there are significant differences at this point in the experiment.
As before, we can plot these estimated marginal trends using the plot()
method:
plot(m3mt10, comparisons = TRUE)
You can repeat these commands for different days in the experimental period.
All this talk of estimated trends or slopes may be a little opaque, but we can plot the estimated regression lines to visualize the average growth rates of plants in each of the treatment groups using the emmip()
function:
emmip(m3, treatment ~ day, cov.reduce = FALSE, CIs = TRUE)
Here the formula means you want to show the treatment
effects against day
. The cov.reduce = FALSE
part means that we want to plot for all values of day
so that we get the full regression lines. Confidence intervals are added with CIs = TRUE
— if you want to turn them off use CIs = FALSE
.
If you want formal comparisons of the growth rates for all pairs of treatments, we can do all pairwise comparisons using the emtrends() function with the pairwise option instead of trt.vs.ctrl. As before, we can assess the estimated growth rate (slope) at a given day, here for day == 5
emtrends(m3, pairwise ~ treatment,
var = "day",
transform = "response",
at = list(day = 5))
## $emtrends
## treatment day.trend SE df lower.CL upper.CL
## control 6.45 0.318 2081 5.82 7.07
## 1x 5.65 0.308 2081 5.05 6.26
## 3x 4.84 0.250 2081 4.35 5.33
##
## Results are averaged over the levels of: observer
## Confidence level used: 0.95
##
## $contrasts
## contrast estimate SE df t.ratio p.value
## control - 1x 0.795 0.438 2081 1.816 0.1645
## control - 3x 1.612 0.400 2081 4.031 0.0002
## 1x - 3x 0.818 0.392 2081 2.083 0.0935
##
## Results are averaged over the levels of: observer
## P value adjustment: tukey method for comparing a family of 3 estimates
and for day == 20
emtrends(m3, pairwise ~ treatment,
var = "day",
transform = "response",
at = list(day = 20))
## $emtrends
## treatment day.trend SE df lower.CL upper.CL
## control 2.51 0.696 2081 1.149 3.88
## 1x 2.74 0.597 2081 1.569 3.91
## 3x 1.78 0.448 2081 0.905 2.66
##
## Results are averaged over the levels of: observer
## Confidence level used: 0.95
##
## $contrasts
## contrast estimate SE df t.ratio p.value
## control - 1x -0.226 0.908 2081 -0.249 0.9663
## control - 3x 0.730 0.819 2081 0.892 0.6456
## 1x - 3x 0.956 0.736 2081 1.299 0.3959
##
## Results are averaged over the levels of: observer
## P value adjustment: tukey method for comparing a family of 3 estimates
We don't see any differences in the estimated growth rates at day 20, but are there differences in the average plant height between the treatment groups? We can test that using the emmeans() function, again using the pairwise option instead of trt.vs.ctrl. We use emmeans() here because we want estimates of the expected mean height for each treatment at a set of days, not the slope of a trend line.
emmeans(m3, pairwise ~ treatment | day,
at = list(day = c(1, 5, 10, 15, 20)))
## Note: Use 'contrast(regrid(object), ...)' to obtain contrasts of back-transformed estimates
## $emmeans
## day = 1:
## treatment emmean SE df lower.CL upper.CL
## control 1.36 0.1271 2081 1.108 1.61
## 1x 1.59 0.1335 2081 1.330 1.85
## 3x 1.19 0.1308 2081 0.938 1.45
##
## day = 5:
## treatment emmean SE df lower.CL upper.CL
## control 2.88 0.0473 2081 2.783 2.97
## 1x 2.89 0.0493 2081 2.798 2.99
## 3x 2.66 0.0478 2081 2.561 2.75
##
## day = 10:
## treatment emmean SE df lower.CL upper.CL
## control 3.98 0.0354 2081 3.906 4.05
## 1x 3.85 0.0348 2081 3.780 3.92
## 3x 3.64 0.0353 2081 3.574 3.71
##
## day = 15:
## treatment emmean SE df lower.CL upper.CL
## control 4.46 0.0304 2081 4.402 4.52
## 1x 4.29 0.0298 2081 4.229 4.35
## 3x 4.03 0.0307 2081 3.966 4.09
##
## day = 20:
## treatment emmean SE df lower.CL upper.CL
## control 4.63 0.0367 2081 4.555 4.70
## 1x 4.46 0.0361 2081 4.394 4.54
## 3x 4.15 0.0383 2081 4.079 4.23
##
## Results are averaged over the levels of: observer
## Results are given on the log1p (not the response) scale.
## Confidence level used: 0.95
##
## $contrasts
## day = 1:
## contrast estimate SE df t.ratio p.value
## control - 1x -0.235 0.1820 2081 -1.290 0.4011
## control - 3x 0.163 0.1802 2081 0.903 0.6385
## 1x - 3x 0.397 0.1845 2081 2.154 0.0795
##
## day = 5:
## contrast estimate SE df t.ratio p.value
## control - 1x -0.019 0.0671 2081 -0.283 0.9567
## control - 3x 0.220 0.0661 2081 3.336 0.0025
## 1x - 3x 0.239 0.0675 2081 3.550 0.0011
##
## day = 10:
## contrast estimate SE df t.ratio p.value
## control - 1x 0.127 0.0494 2081 2.582 0.0267
## control - 3x 0.332 0.0497 2081 6.686 <.0001
## 1x - 3x 0.205 0.0492 2081 4.156 0.0001
##
## day = 15:
## contrast estimate SE df t.ratio p.value
## control - 1x 0.175 0.0424 2081 4.120 0.0001
## control - 3x 0.435 0.0431 2081 10.090 <.0001
## 1x - 3x 0.260 0.0427 2081 6.105 <.0001
##
## day = 20:
## contrast estimate SE df t.ratio p.value
## control - 1x 0.162 0.0512 2081 3.169 0.0044
## control - 3x 0.473 0.0527 2081 8.961 <.0001
## 1x - 3x 0.310 0.0522 2081 5.949 <.0001
##
## Results are averaged over the levels of: observer
## Note: contrasts are still on the log1p scale
## P value adjustment: tukey method for comparing a family of 3 estimates
There is a lot of output here, but it follows what we have seen previously. The $emmeans
component contains the point estimates of plant height for the three treatments for each of the requested days, while the $contrasts
component shows the pairwise comparisons for the treatment.
Are any of these pairwise comparisons statistically significant? Does which pairs of comparisons are significantly different change over time? What does this suggest about the effects of the treatment and the cultivar on the growth of the plants?
The plant health data are not as easily analyzed as the plant height data; health
is an ordered categorical response and that requires a more specialized model, such as a proportional odds model or assuming an ordered categorical response distribution. These models are non-trivial and as such we won’t explore the health
data statistically here although doing so is most certainly possible.
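For the curious, one such model is the proportional-odds model, which can be fitted with the polr() function from the MASS package (which ships with R). As a sketch only, here it is applied to MASS's built-in housing data, whose satisfaction score is an ordered factor (Low < Medium < High) much like our health scores; this is an illustration, not the lab analysis:

```r
library(MASS)  # provides polr() and the housing example data

# proportional-odds model: ordered satisfaction modelled against
# housing covariates, weighted by the count of households
fit <- polr(Sat ~ Infl + Type + Cont, weights = Freq, data = housing)
coef(fit)  # log-odds effects of each covariate
```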
Instead, we’ll just look at the data, which you can describe.
ggplot(plants, aes(x = day, y = health, group = id,
colour = treatment)) +
geom_line() +
facet_wrap(~ observer) +
labs(y = 'Health (1-5)', x = 'Day', colour = 'Treatment')
## Warning: Removed 624 row(s) containing missing values (geom_path).
We can analyse the number of internodes in a similar way to that for plant height, with an ANCOVA. The data can be plotted using
ggplot(plants, aes(x = day, y = log1p(internodes), group = id,
colour = treatment)) +
geom_line() +
facet_wrap(~ observer) +
labs(y = 'Number of internodes (log1p scale)', x = 'Day', colour = 'Treatment')
## Warning: Removed 670 row(s) containing missing values (geom_path).
First we filter the data
plants_i <- filter(plants, !is.na(internodes))
Then we fit the models, again allowing for polynomials of day
i1 <- lm(log1p(internodes) ~ day * treatment + observer, data = plants_i)
i2 <- update(i1, . ~ poly(day, 2) * treatment + observer)
i3 <- update(i1, . ~ poly(day, 3) * treatment + observer)
And finally we can compare these to see which fits the data best
anova(i1, i2, i3)
## Analysis of Variance Table
##
## Model 1: log1p(internodes) ~ day * treatment + observer
## Model 2: log1p(internodes) ~ poly(day, 2) + treatment + observer + poly(day,
## 2):treatment
## Model 3: log1p(internodes) ~ poly(day, 3) + treatment + observer + poly(day,
## 3):treatment
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 2067 294.70
## 2 2064 245.88 3 48.818 139.942 < 2.2e-16 ***
## 3 2061 239.66 3 6.231 17.861 1.915e-11 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
AIC(i1, i2, i3)
## df AIC
## i1 22 1881.196
## i2 25 1509.053
## i3 28 1461.461
Working from i3, then, we can repeat some of the emtrends() and emmeans() code we used earlier, applied to i3, to investigate changes and differences in the number of internodes over time and between treatments.
To save a plot to a file, click on the Plots tab in the lower right window, and use the arrow icons to page through the plots to the one you want. Then click the Export button. From the menu that appears, choose Save as Image…
This brings up the Save Plot as Image dialogue box. Give the image a useful file name, making sure it has the .png extension. When you have made the required changes, click Save.
To export saved images or your script to your local machine, select the Files tab in the lower right window
Check the boxes next to the files you wish to export (download) as shown above, and then click the More button in the menu and select the Export… option.
Enter a filename — if you are exporting more than one file, RStudio will compress all the files into a single ZIP archive — if needed, but the provided name should be OK. Then click Download to download the file or the ZIP archive.
Your work will be saved in your rstudio.cloud workspace for some time after the lab, so you will be able to resume from where you left off if you log in to your workspace at a later time. To be sure that the latest state of your work is saved, it is best to log out of rstudio.cloud at the end of class today.
Locate your user name in the top right of the screen
Click your user name to open the menu
Click “Log out” to securely log out of rstudio.cloud.