Getting started

To get started, open your web browser, head to rstudio.cloud and log in using your RStudio Cloud account.

Once you have logged in, start a new project by clicking the button to the right of “Your Projects”.

After you click the button, rstudio.cloud will deploy a new RStudio instance, which may take a minute or two, so be patient.

Once the RStudio instance is running, rename your project by clicking on the “Untitled Project” label at the top of the screen. Give the project a useful name, such as “biol 266”.

Opening a new script window

It is good practice to enter your R code into an R script. An R script is a plain text file that we save with the extension .R. All your code for the lab should go in this file because

  1. you can save the file and all your work as you go along so you have a record of what you did,
  2. it is easier to edit your code and fix errors, mistakes, or typos in the script window, and
  3. you can share the script file if you need help to track down an error.

To start a new script, go to the File menu, then select New File > R Script.

Or you can press Ctrl + Shift + Alt + N on Windows or Cmd + Shift + Option + N on a Mac.

Your RStudio screen should look like this

Installing R Packages

Once you have a new session running and named, click in the Console window next to the prompt, type the code below exactly as shown, and hit Return

install.packages(c('purrr', 'readxl', 'tidyr', 'dplyr', 'forcats',
                   'ggplot2', 'tibble', 'stringr', 'emmeans'))

This will take a few minutes to complete.

Load Packages

We need to load the packages we just installed so they are ready to use. You need to do this each time you start R and want to use these packages. The installation step you just performed does not need to be repeated each time you start R.

Click in the script window. Enter the lines of code below into the script window. The hash character # allows you to enter comments, which R will ignore.

library('purrr')   # iteration: map() and friends
library('readxl')  # read from Excel sheets
library('tidyr')   # data processing
library('dplyr')   # mo data processing
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library('forcats') # mo mo data processing
library('tibble')  # creating data frames
library('stringr') # process strings (text)
library('ggplot2') # plotting
library('emmeans') # for post hoc comparisons

theme_set(theme_bw()) # Set plot theme

Once you have entered all those lines of code into the script window, move the cursor to the first line or click on it, and then press the following keys:

  • On Windows press Ctrl + Enter
  • On Mac press Cmd + Enter

This will send the current line of code to the console window and move the cursor to the start of the next line. Continue pressing Ctrl + Enter or Cmd + Enter until you have sent each line of code to the console. As you do so, R will load each package; some messages will be printed to the console, which is fine as long as you do not see any errors.

From now on, enter the code in the shaded grey boxes into the script window and use Ctrl + Enter or Cmd + Enter to send the code to the console. Whenever R produces output, it should match, or at least resemble, the text shown in the outlined boxes where each line starts with ## (you will see output for your own data).

Note that the output in the console won’t have the ##; these characters just indicate output in this document.

Upload data

Before we can work with the data from the experiment, we will need to upload them to our rstudio.cloud project. Make sure you have downloaded the rn266data.xlsx file from UR Courses and that you have it somewhere on your local computer. Next, make sure the Files tab is selected in the lower-right quarter of the application, and locate the Upload button.

Click the Upload button — this will bring up the Upload files dialogue box

Click the Choose file button — this will bring up the file chooser on your computer. In the file chooser, navigate to where you saved the rn266data.xlsx file and open it in the chooser so that the file is listed next to the Choose file button, as shown below

Once you have rn266data.xlsx showing in the dialogue box, you can click OK to upload the file to rstudio.cloud.

When you have uploaded the file, your Files pane should list the rn266data.xlsx file as shown below

Saving your work

At this point, it is worth saving your script. Find the disk icons in the main toolbar or in the toolbar of your script window

Click either of these icons to save the file. The first time you do this you’ll be prompted to enter a file name — use something memorable and be sure to give it the extension .R. In the image below I named my script biol266-n-exp.R.

Then click Save.

Be sure to save your work regularly during class. You can use Ctrl + S on Windows or Cmd + S on Mac to save changes you make to your script, or click on the disk icon as before.

Load Data

The experimental data are stored in rn266data.xlsx, an Excel workbook. We’re going to work with all the data collected by everyone who took part in the experiment. To do this, we need to load the data from each sheet, which we do by iterating with the map() function, calling read_excel() on each sheet name in turn

sheets <- excel_sheets("rn266data.xlsx") %>%
    set_names()
plants <- map(sheets, read_excel, path = "rn266data.xlsx")

We need to do a little cleaning of the data before we can combine all the sheets into a single object for analysis. The data cleaning requires a couple of functions

## clean up the column names in a sheet
fix_names <- function(x) {
    nms <- names(x) %>%
        str_trim(side = "right") %>%
        str_to_lower() %>%
        str_replace_all(pattern = ": ", replacement = ":") %>%
        str_replace_all(pattern = "heatlh", replacement = "health")
    names(x) <- nms
    x
}

## wrapper around the two pivot operations needed to get the data
## into the shape we want
pivot_fun <- function(x) {
    x <- pivot_longer(x,
                      -(treatment:plantid),
                      names_to = c("variable", "day"),
                      names_sep = ':',
                      names_transform = list(day = as.integer))
    x <- pivot_wider(x, names_from = variable, values_from = value)
    x <- mutate(x, id = paste(treatment, plantid, sep = '_'))
    x <- mutate(x, treatment = fct_relevel(treatment, 'control'))
    x
}

Copy and paste these functions into your script, then highlight them in the script and run them within R so that they are available for use in your session.

Now we can clean up the column names in the sheets

plants <- map(plants, fix_names)

and finally do the two pivot operations you saw in the lab analysis videos, to get the data exactly how we want them

plants <- map(plants, pivot_fun)

Next we need to combine all of the processed data objects in plants into a single data frame, rather than the list of separate data frames it currently is. We start by storing some information about the sheets and how many rows of data each has

sheet_names <- names(plants) # names of sheets
obs_per_sheet <- map_int(plants, nrow) # how many obs per sheet

then we combine the sheets, add an extra variable observer (who did the experiment), modify the id variable, and set the reference level of the treatment factor to be the control level.

plants <- bind_rows(plants) %>%
    mutate(observer = rep(sheet_names, times = obs_per_sheet),
           id = paste(observer, id, sep = '_'),
           treatment = fct_relevel(treatment, 'control'))

OK, that’s a lot of data processing. Let’s look at the data.

Plant height

We can plot the plant height data over time using the following code

ggplot(plants, aes(x = day, y = height, group = id,
                   colour = treatment)) +
    geom_line() +
    facet_wrap(~ observer) +
    labs(y = 'Height (mm)', x = 'Day', colour = 'Treatment')
## Warning: Removed 650 row(s) containing missing values (geom_path).

Note that we specify the data object (plants) and the mapping of variables to the axes and the colour channel. We want to group the data for each plant and draw lines through the observations to better show how the height of each plant changed over time. We indicate this grouping via the group aesthetic, which we set to the id variable we created earlier. We also draw the data for each observer in its own panel, which ggplot calls faceting. The final line adds some nice labels to the plot.

Some preliminary analysis suggests we need to model plant height on the log scale; bigger things get bigger more quickly than smaller things. As such we should look at the data on that scale too

ggplot(plants, aes(x = day, y = log1p(height), group = id,
                   colour = treatment)) +
    geom_line() +
    facet_wrap(~ observer) +
    labs(y = expression(log[e](Height) ~ (log[e]*mm)), x = 'Day', colour = 'Treatment')
## Warning: Removed 650 row(s) containing missing values (geom_path).

The log1p() function adds 1 to each observation of plant height before taking logs, as there are some plants with height 0. The value log(0) is minus infinity, which is not a helpful number and cannot easily be represented in the computer in a way that lets us include it in the model. Adding 1 to all values means there are no zeros in the data before we log-transform them.
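You can check this behaviour in the console. log1p() also has an exact inverse, expm1(), which undoes the log and subtracts the 1 again:

```r
log(0)            # -Inf, which we cannot use in the model
log1p(0)          # 0, since log(0 + 1) = 0
expm1(log1p(25))  # recovers the original value, 25
```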

The final thing we’ll do before we start analysing the data, just to simplify things, is to take only those rows of data where we have a height measurement, excluding all the NA values in the height variable. We do this by filtering the plants object

plant_h <- plants %>% filter(!is.na(height)) 

ANCOVA

The model we are fitting is sometimes called an analysis of covariance or ANCOVA, which is another special case of the linear model that can also be fitted using the lm() function. In an ANCOVA, we control for the effect of a continuous variable while our interest is in a categorical treatment effect.
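As a toy illustration of this idea (simulated data; the names here are made up and this is not part of the lab), an ANCOVA fits a slope for the continuous covariate plus a shift in the mean for each group:

```r
set.seed(1)
toy <- data.frame(x = runif(30),  # continuous covariate
                  group = gl(3, 10, labels = c('control', 'a', 'b')))
## response: slope of 3 for x, plus a different mean shift per group
toy$y <- 2 + 3 * toy$x + c(0, 1, 2)[toy$group] + rnorm(30, sd = 0.2)

## the ANCOVA: control for x whilst estimating the group effects
coef(lm(y ~ x + group, data = toy))
```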

Potential results

As before, it is helpful to think about the kinds of result we might expect from the statistical analysis. When considering the plant height data we could think of five basic types of result:

We specify the model structure using a symbolic formula. The formula we’ll use is

height ~ day * treatment + observer

The response (dependent) variable is on the left-hand side of the ~ (the tilde symbol). This is the variable we wish to explain. The right-hand side of the formula contains the independent variables in the model. Here, the independent variables represent the experimental setup. In addition we have + observer, which allows for a different mean plant height for each observer; we do this as there are likely to be differences between each student’s experimental set-up (temperature, amount of light, soil, etc.) that will lead to systematic variation between observers. In a more formal analysis we’d handle this using what are known as random effects, but here we keep things simple and just treat observer as a fixed offset; we don’t investigate any interactions between the experimental variables and observer, although that could be done.

Given our experimental design, the most complex model we can fit is one that allows for different growth rates (effects of day) for each treatment level. In other words, we want a different growth rate in each treatment group.

Such a model implies a two-way interaction between day and treatment. We indicate interactions in the formula using the * symbol.
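Note that day * treatment is shorthand for the two main effects plus their interaction. We can confirm that the two ways of writing the formula generate the same model terms, namely day, treatment, and day:treatment (y here is just a placeholder for the response):

```r
f1 <- y ~ day * treatment
f2 <- y ~ day + treatment + day:treatment
all.equal(attr(terms(f1), 'term.labels'),
          attr(terms(f2), 'term.labels'))  # TRUE
```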

m1 <- lm(log1p(height) ~ day * treatment + observer, data = plant_h)

As with the plot above, we are modelling log1p(height), which adds 1 to each height observation before applying a log transformation.

We can summarize the model to look for some key diagnostics of the fit

summary(m1)
## 
## Call:
## lm(formula = log1p(height) ~ day * treatment + observer, data = plant_h)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.8584 -0.2474  0.0955  0.3959  1.3116 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)      2.348031   0.083498  28.121  < 2e-16 ***
## day              0.087206   0.003517  24.797  < 2e-16 ***
## treatment1x      0.018353   0.081643   0.225  0.82216    
## treatment3x     -0.245240   0.081264  -3.018  0.00258 ** 
## observerAH       1.161503   0.083345  13.936  < 2e-16 ***
## observerALi      0.532873   0.085831   6.208 6.44e-10 ***
## observerCC       1.517712   0.085973  17.653  < 2e-16 ***
## observerHB       0.011267   0.085090   0.132  0.89467    
## observerHG      -1.068374   0.082880 -12.891  < 2e-16 ***
## observerJC       0.357956   0.078808   4.542 5.89e-06 ***
## observerJE       0.892810   0.083166  10.735  < 2e-16 ***
## observerJL       0.892757   0.075715  11.791  < 2e-16 ***
## observerKK       0.599455   0.075709   7.918 3.89e-15 ***
## observerLZ       0.449775   0.080785   5.568 2.92e-08 ***
## observerMF      -0.588969   0.082718  -7.120 1.48e-12 ***
## observerMM       0.429434   0.080524   5.333 1.07e-07 ***
## observerNW       1.073622   0.081501  13.173  < 2e-16 ***
## observerTC       1.034075   0.081634  12.667  < 2e-16 ***
## observerTH       0.795451   0.080720   9.854  < 2e-16 ***
## day:treatment1x -0.008454   0.004877  -1.733  0.08319 .  
## day:treatment3x -0.007639   0.004896  -1.560  0.11886    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.6347 on 2087 degrees of freedom
## Multiple R-squared:  0.626,  Adjusted R-squared:  0.6224 
## F-statistic: 174.7 on 20 and 2087 DF,  p-value: < 2.2e-16

Most of this isn’t very helpful at this stage, except the omnibus F test, which indicates whether the total variance explained by all the model terms is significantly large relative to the unexplained variance.

Polynomial regression

If we think back to the plot of the data we made earlier, the relationship between plant height and time was non-linear, even on the log scale. The model we just fitted is a linear model, only capable of fitting straight lines, and as such it is not likely to follow the data closely. Statisticians call this bias. To try to reduce the bias we can fit a non-linear curve to the data using polynomials of day. Polynomials are \(\text{day}\), \(\text{day}^2\), \(\text{day}^3\), etc. To model a saturating curve (one which levels off) we might be able to simply use \(\text{day} + \text{day}^2\), but to allow for decreasing height at later days we will include the cubic term \(\text{day}^3\) as well. We can then compare the three models with increasing amounts of non-linearity and see which fits the data best.
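A brief aside on poly(): by default it generates orthogonal polynomial terms rather than raw powers, which is numerically more stable but gives exactly the same fitted curve. A small sketch with made-up data (the names are illustrative):

```r
set.seed(42)
d <- data.frame(day = rep(1:20, 3))
d$y <- log1p(5 + 2 * d$day - 0.05 * d$day^2 + rnorm(nrow(d)))

fit_orth <- lm(y ~ poly(day, 2), data = d)    # orthogonal polynomials
fit_raw  <- lm(y ~ day + I(day^2), data = d)  # raw powers
all.equal(fitted(fit_orth), fitted(fit_raw))  # TRUE: identical fits
```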

First we create the polynomial models by updating the existing model. We start by adding the first- and second-order polynomial terms and their interaction with treatment, which replaces the day * treatment terms of the original model

m2 <- update(m1, . ~ poly(day, 2) * treatment + observer)

Next we repeat the above but this time adding the third-degree polynomial

m3 <- update(m1, . ~ poly(day, 3) * treatment + observer)

Now we have three models, we can compare them using an analysis of variance table or Akaike’s Information Criterion (AIC). We start with the analysis of variance table

anova(m1, m2, m3)
## Analysis of Variance Table
## 
## Model 1: log1p(height) ~ day * treatment + observer
## Model 2: log1p(height) ~ poly(day, 2) + treatment + observer + poly(day, 
##     2):treatment
## Model 3: log1p(height) ~ poly(day, 3) + treatment + observer + poly(day, 
##     3):treatment
##   Res.Df    RSS Df Sum of Sq       F    Pr(>F)    
## 1   2087 840.79                                   
## 2   2084 710.92  3    129.87 134.402 < 2.2e-16 ***
## 3   2081 670.28  3     40.64  42.058 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Here we are comparing the models sequentially from simplest to most complex. Each line tests whether the more complex model significantly reduces the residual sum of squares (RSS), our measure of the lack of fit, which decreases as we make the model more complex. We see significant reductions in the RSS as we add the extra polynomial terms, suggesting that the cubic polynomial of day gives the best fit to the data.

We confirm this using AIC, where smaller values of AIC indicate better-fitting models

AIC(m1, m2, m3)
##    df      AIC
## m1 22 4088.661
## m2 25 3740.974
## m3 28 3622.887

Again, we find that the cubic polynomial provides the best fit. As such, we will use this model, m3, to explore in more detail how plant height varied across the experimental period.

Treatments vs controls

To get more interpretable output we will again use estimated marginal means; remember that we are interested in the effect of day, a continuous variable, as well as the treatment effects. The day effect will be represented by curved regression lines, whose slopes reflect the average growth rate of plants in each treatment group. Because the lines are not linear, the rate of growth varies throughout the experimental period. These lines are often called trends, hence we use the emtrends() function to estimate marginal trends instead of marginal means.

As before we will start by comparing the treated plants with the control plants. This time, however, we need to indicate the trend variable via the var argument; in our case the trend variable is day (indicated by its name in quotes). We also want the growth rate on the original (not log) scale, so we add transform = "response", and because the growth rate is not constant we ask for the comparison of treatments at day 5

m3mt5 <- emtrends(m3, trt.vs.ctrl ~ treatment, var = "day",
                  transform = "response",
                  at = list(day = 5))
m3mt5
## $emtrends
##  treatment day.trend    SE   df lower.CL upper.CL
##  control        6.45 0.318 2081     5.82     7.07
##  1x             5.65 0.308 2081     5.05     6.26
##  3x             4.84 0.250 2081     4.35     5.33
## 
## Results are averaged over the levels of: observer 
## Confidence level used: 0.95 
## 
## $contrasts
##  contrast     estimate    SE   df t.ratio p.value
##  1x - control   -0.795 0.438 2081  -1.816  0.1277
##  3x - control   -1.612 0.400 2081  -4.031  0.0001
## 
## Results are averaged over the levels of: observer 
## P value adjustment: dunnettx method for 2 tests

You should be familiar with this output now. The $emtrends section contains similar outputs to the $emmeans sections we looked at earlier when analyzing the fresh weight of the plants. The main difference is in the interpretation of the numbers in the day.trend column, which replaces the emmean from before. The day.trend column contains estimates of the slope of the regression line for the day variable at day == 5.

The numbers in this column are the estimated change in the height of the plant for a single day of growth at the chosen day:

  • If the day.trend number is positive, it means the plants in that treatment group were increasing in height at that point in the experiment, and
  • if the day.trend number is negative, it means the plants in that treatment group were decreasing in height at that point in the experiment.

The $contrasts section contains the same kind of comparisons of treatments vs the control. Remember here that the estimate column contains the estimated difference in the slopes (growth rates) of the stated treatment and the control group. Hence the null hypothesis being tested is that the difference in slopes is equal to zero.

  • A negative value in the estimate column means the average change in height in the stated treatment was lower than in the control group; in other words, the plants in that treatment group grew more slowly on average than the plants in the control group, and
  • a positive value in the estimate column means the plants in that treatment group grew, on average, at a faster rate than plants in the control group.

As before, we can plot these estimated marginal trends using the plot() method:

plot(m3mt5, comparisons = TRUE)

This plot is similar to one we created in the earlier lab when we looked at the data from the 2019 experiment:

  • the black points are the estimated growth rates (regression slopes) for each treatment group,
  • the 95% confidence interval around the estimated growth rate is shown by the pale blue bar, and
  • the red arrows are approximate pairwise comparison regions — if arrows for a pair of treatments overlap one another we fail to reject the null hypothesis of equal growth rates for plants in the two treatments.

The main difference in this plot is that we no longer show the estimated marginal mean on the x axis. Instead the slope of the regression line (the growth rate) is plotted on the x axis. Remember how we interpret these numbers: the values are the estimated change, on average, in the height of a plant in a single day of growth.

It looks like there is little evidence in favor of different treatment effects at day 5. What about at a point later in the experiment? We can try day == 10, for example

m3mt10 <- emtrends(m3, trt.vs.ctrl ~ treatment,
                   var = "day",
                   transform = "response",
                   at = list(day = 10))
m3mt10
## $emtrends
##  treatment day.trend    SE   df lower.CL upper.CL
##  control        9.55 0.476 2081     8.62    10.49
##  1x             7.39 0.400 2081     6.61     8.18
##  3x             5.78 0.324 2081     5.14     6.41
## 
## Results are averaged over the levels of: observer 
## Confidence level used: 0.95 
## 
## $contrasts
##  contrast     estimate    SE   df t.ratio p.value
##  1x - control    -2.16 0.606 2081  -3.569  0.0007
##  3x - control    -3.78 0.563 2081  -6.713  <.0001
## 
## Results are averaged over the levels of: observer 
## P value adjustment: dunnettx method for 2 tests

Look at the $contrasts component and see if there are significant differences at this point in the experiment.

As before, we can plot these estimated marginal trends using the plot() method:

plot(m3mt10, comparisons = TRUE)

You can repeat these commands for different days in the experimental period.
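For example, to look at the trends at day 15 (the object name m3mt15 is just a convenient choice):

```r
m3mt15 <- emtrends(m3, trt.vs.ctrl ~ treatment, var = "day",
                   transform = "response",
                   at = list(day = 15))
m3mt15
plot(m3mt15, comparisons = TRUE)
```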

All this talk of estimated trends or slopes may be a little opaque, but we can plot the estimated regression lines to visualize the average growth rates of plants in each of the treatment groups using the emmip() function:

emmip(m3, treatment ~ day, cov.reduce = FALSE, CIs = TRUE)

Here the formula means you want to show the treatment effects against day. The cov.reduce = FALSE part means that we want to plot for all values of day so that we get the full regression lines. Confidence intervals are added with CIs = TRUE — if you want to turn them off use CIs = FALSE.

Pairwise comparisons

If you want formal comparisons of the growth rates for all pairs of treatments, we can do all pairwise comparisons using the emtrends() function with the pairwise option instead of trt.vs.ctrl. As before, we can assess the estimated growth rate (slope) at a given day, here for day == 5

emtrends(m3, pairwise ~ treatment,
         var = "day",
         transform = "response",
         at = list(day = 5))
## $emtrends
##  treatment day.trend    SE   df lower.CL upper.CL
##  control        6.45 0.318 2081     5.82     7.07
##  1x             5.65 0.308 2081     5.05     6.26
##  3x             4.84 0.250 2081     4.35     5.33
## 
## Results are averaged over the levels of: observer 
## Confidence level used: 0.95 
## 
## $contrasts
##  contrast     estimate    SE   df t.ratio p.value
##  control - 1x    0.795 0.438 2081   1.816  0.1645
##  control - 3x    1.612 0.400 2081   4.031  0.0002
##  1x - 3x         0.818 0.392 2081   2.083  0.0935
## 
## Results are averaged over the levels of: observer 
## P value adjustment: tukey method for comparing a family of 3 estimates

and for day == 20

emtrends(m3, pairwise ~ treatment,
         var = "day",
         transform = "response",
         at = list(day = 20))
## $emtrends
##  treatment day.trend    SE   df lower.CL upper.CL
##  control        2.51 0.696 2081    1.149     3.88
##  1x             2.74 0.597 2081    1.569     3.91
##  3x             1.78 0.448 2081    0.905     2.66
## 
## Results are averaged over the levels of: observer 
## Confidence level used: 0.95 
## 
## $contrasts
##  contrast     estimate    SE   df t.ratio p.value
##  control - 1x   -0.226 0.908 2081  -0.249  0.9663
##  control - 3x    0.730 0.819 2081   0.892  0.6456
##  1x - 3x         0.956 0.736 2081   1.299  0.3959
## 
## Results are averaged over the levels of: observer 
## P value adjustment: tukey method for comparing a family of 3 estimates

We don’t see any differences in the estimated growth rates, but are there differences in the average plant height between the treatments? We can test that using the emmeans() function, again with the pairwise option instead of trt.vs.ctrl. We use emmeans() here because we want estimates of the expected mean height for each treatment at a set of days, not the slope of a trend line.

emmeans(m3, pairwise ~ treatment | day,
        at = list(day = c(1, 5, 10, 15, 20)))
## Note: Use 'contrast(regrid(object), ...)' to obtain contrasts of back-transformed estimates
## $emmeans
## day =  1:
##  treatment emmean     SE   df lower.CL upper.CL
##  control     1.36 0.1271 2081    1.108     1.61
##  1x          1.59 0.1335 2081    1.330     1.85
##  3x          1.19 0.1308 2081    0.938     1.45
## 
## day =  5:
##  treatment emmean     SE   df lower.CL upper.CL
##  control     2.88 0.0473 2081    2.783     2.97
##  1x          2.89 0.0493 2081    2.798     2.99
##  3x          2.66 0.0478 2081    2.561     2.75
## 
## day = 10:
##  treatment emmean     SE   df lower.CL upper.CL
##  control     3.98 0.0354 2081    3.906     4.05
##  1x          3.85 0.0348 2081    3.780     3.92
##  3x          3.64 0.0353 2081    3.574     3.71
## 
## day = 15:
##  treatment emmean     SE   df lower.CL upper.CL
##  control     4.46 0.0304 2081    4.402     4.52
##  1x          4.29 0.0298 2081    4.229     4.35
##  3x          4.03 0.0307 2081    3.966     4.09
## 
## day = 20:
##  treatment emmean     SE   df lower.CL upper.CL
##  control     4.63 0.0367 2081    4.555     4.70
##  1x          4.46 0.0361 2081    4.394     4.54
##  3x          4.15 0.0383 2081    4.079     4.23
## 
## Results are averaged over the levels of: observer 
## Results are given on the log1p (not the response) scale. 
## Confidence level used: 0.95 
## 
## $contrasts
## day =  1:
##  contrast     estimate     SE   df t.ratio p.value
##  control - 1x   -0.235 0.1820 2081  -1.290  0.4011
##  control - 3x    0.163 0.1802 2081   0.903  0.6385
##  1x - 3x         0.397 0.1845 2081   2.154  0.0795
## 
## day =  5:
##  contrast     estimate     SE   df t.ratio p.value
##  control - 1x   -0.019 0.0671 2081  -0.283  0.9567
##  control - 3x    0.220 0.0661 2081   3.336  0.0025
##  1x - 3x         0.239 0.0675 2081   3.550  0.0011
## 
## day = 10:
##  contrast     estimate     SE   df t.ratio p.value
##  control - 1x    0.127 0.0494 2081   2.582  0.0267
##  control - 3x    0.332 0.0497 2081   6.686  <.0001
##  1x - 3x         0.205 0.0492 2081   4.156  0.0001
## 
## day = 15:
##  contrast     estimate     SE   df t.ratio p.value
##  control - 1x    0.175 0.0424 2081   4.120  0.0001
##  control - 3x    0.435 0.0431 2081  10.090  <.0001
##  1x - 3x         0.260 0.0427 2081   6.105  <.0001
## 
## day = 20:
##  contrast     estimate     SE   df t.ratio p.value
##  control - 1x    0.162 0.0512 2081   3.169  0.0044
##  control - 3x    0.473 0.0527 2081   8.961  <.0001
##  1x - 3x         0.310 0.0522 2081   5.949  <.0001
## 
## Results are averaged over the levels of: observer 
## Note: contrasts are still on the log1p scale 
## P value adjustment: tukey method for comparing a family of 3 estimates

There is a lot of output here, but it follows what we have seen previously: the $emmeans component contains the point estimates of plant height for the three treatments for each of the requested days, while the $contrasts component shows the pairwise comparisons of the treatments.

Are any of these pairwise comparisons statistically significant? Does which pairs of comparisons are significantly different change over time? What does this suggest about the effect of the treatment on the growth of the plants?

Plant health

The plant health data are not as easily analyzed as the plant height data; health is an ordered categorical response, which requires a more specialized model, such as a proportional odds model or a model that assumes an ordered categorical response distribution. These models are non-trivial, so we won’t explore the health data statistically here, although doing so is certainly possible.
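For the curious, here is a minimal sketch of how such a model might be fitted using the polr() function from the MASS package. This is illustrative only and not part of the lab; it assumes the health scores can be treated as an ordered factor, and the object names are made up:

```r
library('MASS')  # provides polr() for proportional odds models

## drop missing health scores and convert health to an ordered factor
plants_ph <- plants %>%
    filter(!is.na(health)) %>%
    mutate(health = factor(health, ordered = TRUE))

## fit a proportional odds model; interpreting it is beyond this lab
ph1 <- polr(health ~ day * treatment, data = plants_ph)
```

Note that loading MASS masks dplyr’s select() function, so load it with care.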

Instead, we’ll just look at the data, which you can describe.

ggplot(plants, aes(x = day, y = health, group = id,
                   colour = treatment)) +
    geom_line() +
    facet_wrap(~ observer) +
    labs(y = 'Health (1-5)', x = 'Day', colour = 'Treatment')
## Warning: Removed 624 row(s) containing missing values (geom_path).

Number of internodes

We can analyse the number of internodes in a similar way to plant height, with an ANCOVA. The data can be plotted using

ggplot(plants, aes(x = day, y = log1p(internodes), group = id,
                   colour = treatment)) +
    geom_line() +
    facet_wrap(~ observer) +
    labs(y = 'Number of internodes (log scale)', x = 'Day', colour = 'Treatment')
## Warning: Removed 670 row(s) containing missing values (geom_path).

First we filter the data

plants_i <- filter(plants, !is.na(internodes))

Then we fit the models, again allowing for polynomials of day

i1 <- lm(log1p(internodes) ~ day * treatment + observer, data = plants_i)
i2 <- update(i1, . ~ poly(day, 2) * treatment + observer)
i3 <- update(i1, . ~ poly(day, 3) * treatment + observer)

And finally we can compare these to see which fits the data best

anova(i1, i2, i3)
## Analysis of Variance Table
## 
## Model 1: log1p(internodes) ~ day * treatment + observer
## Model 2: log1p(internodes) ~ poly(day, 2) + treatment + observer + poly(day, 
##     2):treatment
## Model 3: log1p(internodes) ~ poly(day, 3) + treatment + observer + poly(day, 
##     3):treatment
##   Res.Df    RSS Df Sum of Sq       F    Pr(>F)    
## 1   2067 294.70                                   
## 2   2064 245.88  3    48.818 139.942 < 2.2e-16 ***
## 3   2061 239.66  3     6.231  17.861 1.915e-11 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
AIC(i1, i2, i3)
##    df      AIC
## i1 22 1881.196
## i2 25 1509.053
## i3 28 1461.461

Working from i3, then, we can repeat some of the emtrends() and emmeans() code we used earlier, applied to i3, to investigate changes and differences in the number of internodes over time and between treatments.
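For example, a treatment-vs-control comparison of the internode trends at day 10, mirroring what we did for plant height (the object name i3mt10 is just a suggestion):

```r
i3mt10 <- emtrends(i3, trt.vs.ctrl ~ treatment, var = "day",
                   transform = "response",
                   at = list(day = 10))
i3mt10
plot(i3mt10, comparisons = TRUE)
```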

Exporting images & files

To save a plot to a file, click on the Plots tab in the lower-right window, and use the arrow icons to page through the plots to the one you want. Then click the Export button

From the menu that appears choose Save as Image…

This brings up the Save Plot as Image dialogue box. Make the following changes:

  1. In the Image format box, select PNG
  2. In the File name box, enter the file name you want for the image, including the .png extension
  3. Change the height and width to suit
    • You can set the height and width in pixels in the boxes at the top right of the dialogue box. If you change these, you need to click the Update Preview button to redraw the plot
    • Alternatively, you can drag the hotspot at the bottom-right corner of the plot to resize it. Click the Update Preview button to redraw the plot at the desired size

When you have made the required changes, click Save.

To export saved images or your script to your local machine, select the Files tab in the lower right window

Check the boxes next to the files you wish to export (download) as shown above, and then click the More button in the menu and select the Export… option.

Enter a filename if needed (the provided name should be OK); if you are exporting more than one file, RStudio will compress all of them into a single ZIP archive. Then click Download to download the file or the ZIP archive.

Logging Off

Your work will be saved in your rstudio.cloud workspace for some time after the lab, and you will be able to resume from where you left off if you log in to your workspace at a later time. To be sure that the latest state of your work is saved, it is best to log out of rstudio.cloud at the end of class today.

Locate your user name in the top right of the screen

Click your user name to open the menu

Click “Log out” to securely log out of rstudio.cloud.