Additive models

Solutions

Load packages and data

Today

By the end of today you will…

  • understand the difference between an additive and an interaction model
  • understand the geometric picture of multiple linear regression
  • be able to build, fit and interpret linear models with \(>1\) predictor
  • think critically about r-squared as a model selection tool

Fitting the additive model

To fit an additive model, we separate the predictors with a + sign in the model formula. Use the plus sign to add species to the linear model from Monday's class.

model1 <- lm(flipper_length_mm ~ bill_length_mm + species, data = penguins)

summary(model1)

Call:
lm(formula = flipper_length_mm ~ bill_length_mm + species, data = penguins)

Residuals:
     Min       1Q   Median       3Q      Max 
-24.7485  -3.4135  -0.0681   3.6607  15.9965 

Coefficients:
                 Estimate Std. Error t value Pr(>|t|)    
(Intercept)      147.9511     4.1738  35.447   <2e-16 ***
bill_length_mm     1.0828     0.1069  10.129   <2e-16 ***
speciesChinstrap  -5.0039     1.3698  -3.653    3e-04 ***
speciesGentoo     17.7986     1.1698  15.216   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 5.826 on 338 degrees of freedom
  (2 observations deleted due to missingness)
Multiple R-squared:  0.8299,    Adjusted R-squared:  0.8284 
F-statistic: 549.6 on 3 and 338 DF,  p-value: < 2.2e-16
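The summary reports both a multiple and an adjusted R-squared. As a quick check on how the two relate, the sketch below recomputes adjusted R-squared from the printed multiple R-squared. It assumes n = 342 observations (338 residual degrees of freedom plus the 4 estimated coefficients, which also matches 344 penguins minus the 2 deleted rows) and p = 3 predictor terms. Because the inputs are copied from the printed output, the result matches only to the printed precision.

```r
# Values copied from summary(model1) above
r2 <- 0.8299   # Multiple R-squared
n  <- 338 + 4  # residual df + number of estimated coefficients = 342
p  <- 3        # predictor terms: bill_length_mm, speciesChinstrap, speciesGentoo

# Adjusted R-squared penalizes R-squared for the number of predictors
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)
round(adj_r2, 4)  # 0.8284, matching the printed Adjusted R-squared
```

Because the penalty grows with p, adding a predictor that barely improves the fit can lower adjusted R-squared, even though multiple R-squared never decreases when a predictor is added.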

Prediction using R

Let's use R to make predictions using this additive model. Use R to predict the flipper length for a Gentoo penguin that has a bill length of 60 mm.

predict(model1, newdata = data.frame(bill_length_mm = 60, species = "Gentoo"))
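To see what predict() is doing, we can plug the same values into the fitted equation by hand. This sketch copies the estimates from the summary output above (rounded to four decimals), so it agrees with predict(model1, ...) only up to that rounding. For a Gentoo penguin, the Gentoo indicator is 1 and the Chinstrap indicator is 0; the variable names are just illustrative.

```r
# Estimates copied from summary(model1)
b0          <- 147.9511  # (Intercept): Adelie baseline
b_bill      <- 1.0828    # bill_length_mm
b_chinstrap <- -5.0039   # speciesChinstrap
b_gentoo    <- 17.7986   # speciesGentoo

bill         <- 60  # bill length in mm
is_chinstrap <- 0   # indicator variables for a Gentoo penguin
is_gentoo    <- 1

pred <- b0 + b_bill * bill + b_chinstrap * is_chinstrap + b_gentoo * is_gentoo
pred  # about 230.72 mm
```

In practice, prefer predict(), which uses the unrounded coefficients and builds the indicator variables for you.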

Interpretation

Now, let’s interpret these coefficients in the context of the problem:

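Intercept: For an Adelie penguin with a bill length of 0 mm, we estimate the mean flipper length to be 147.951 mm. No penguin has a bill length of 0 mm, so this is an extrapolation that anchors the fitted lines rather than describing real penguins.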

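speciesChinstrap: Holding bill length constant, we estimate the mean flipper length of Chinstrap penguins to be 5.004 mm lower than that of Adelie penguins.

speciesGentoo: Holding bill length constant, we estimate the mean flipper length of Gentoo penguins to be 17.799 mm higher than that of Adelie penguins.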

bill_length_mm: Holding species constant, for each 1 mm increase in bill length, we estimate the mean flipper length to increase by 1.083 mm.
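Writing out the fitted model makes the "holding constant" language concrete. With indicator variables for Chinstrap and Gentoo (Adelie is the baseline), the estimates from the summary output give the equation below. Plugging in each species shows that the additive model fits three parallel lines: they share the bill-length slope and differ only in their intercepts.

$$
\widehat{\text{flipper}} = 147.951 + 1.083\,\text{bill} - 5.004\,\text{Chinstrap} + 17.799\,\text{Gentoo}
$$

$$
\begin{aligned}
\text{Adelie:}\quad & \widehat{\text{flipper}} = 147.951 + 1.083\,\text{bill} \\
\text{Chinstrap:}\quad & \widehat{\text{flipper}} = (147.951 - 5.004) + 1.083\,\text{bill} = 142.947 + 1.083\,\text{bill} \\
\text{Gentoo:}\quad & \widehat{\text{flipper}} = (147.951 + 17.799) + 1.083\,\text{bill} = 165.750 + 1.083\,\text{bill}
\end{aligned}
$$

An interaction model, by contrast, would allow each species to have its own slope.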

Can we do this with 2 quantitative variables?

Yes! Let’s look at the explanatory variables bill length (mm) and body mass (g).

The concept is the same, but the picture is a bit different! What if, instead of species, we wanted to use body_mass_g? Note: the following plotting code is here to help us understand the material and is not a learning objective of the course. The code you need to know is lm.

library(scatterplot3d)  # provides scatterplot3d()

s3d <- penguins |>
  dplyr::select(bill_length_mm, body_mass_g, flipper_length_mm) |>
  as.data.frame() |>  # a plain data frame avoids the tibble `color` column warning
  scatterplot3d(xlab = "bill length (mm)",
                ylab = "body mass (g)",
                zlab = "flipper length (mm)",
                main = "additive model with 2 quantitative variables")
model2 <- lm(flipper_length_mm ~ bill_length_mm + body_mass_g, data = penguins)

s3d$plane3d(model2)  # overlay the fitted regression plane on the 3D scatterplot

round(summary(model2)$coefficients,3)
               Estimate Std. Error t value Pr(>|t|)
(Intercept)     121.956      2.855  42.715        0
bill_length_mm    0.549      0.080   6.859        0
body_mass_g       0.013      0.001  23.939        0

The p-values display as 0 only because of rounding to three decimal places; each is below 0.0005.
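With two quantitative predictors, the fitted model is a plane rather than a set of parallel lines. As a rough illustration, the sketch below evaluates that plane by hand for a hypothetical penguin with a 45 mm bill and a body mass of 4000 g, using the rounded estimates printed above. Because the body_mass_g estimate is rounded to 0.013 and then multiplied by thousands of grams, this hand calculation can be off by up to about 2 mm; predict(model2, ...) uses the full-precision coefficients.

```r
# Rounded estimates copied from round(summary(model2)$coefficients, 3)
b0     <- 121.956  # (Intercept)
b_bill <- 0.549    # bill_length_mm, holding body mass constant
b_mass <- 0.013    # body_mass_g, holding bill length constant

# Hypothetical penguin (values chosen for illustration only)
bill <- 45    # mm
mass <- 4000  # g

plane_pred <- b0 + b_bill * bill + b_mass * mass
plane_pred  # about 198.66 mm

# A 1 g change is tiny; rescaling to a 1000 g (1 kg) change is easier to read
b_mass * 1000  # about 13 mm per kg
```

So, holding bill length constant, a penguin that is 1 kg heavier is estimated to have flippers roughly 13 mm longer on average (approximate, because of the rounding above).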