In today’s lab you will practice the following concepts from today’s class:
- Interpreting linear regression coefficients of numeric covariates
- Interpreting linear regression coefficients of categorical variables
- Fitting linear regression models with interaction terms
- Interpreting linear regression coefficients of interaction terms
We’ll begin by loading some packages and importing the data.
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────────────────── tidyverse 1.2.1 ──
## ✔ ggplot2 3.3.2 ✔ purrr 0.3.3
## ✔ tibble 2.1.3 ✔ dplyr 0.8.3
## ✔ tidyr 1.0.0 ✔ stringr 1.4.0
## ✔ readr 1.3.1 ✔ forcats 0.4.0
## Warning: package 'ggplot2' was built under R version 3.6.2
## ── Conflicts ────────────────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
# Import data
gapminder <- read_delim("http://www.andrew.cmu.edu/user/achoulde/94842/data/gapminder_five_year.txt", delim = "\t")
## Parsed with column specification:
## cols(
## country = col_character(),
## year = col_double(),
## pop = col_double(),
## continent = col_character(),
## lifeExp = col_double(),
## gdpPercap = col_double()
## )
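As a warm-up for the exercises below, here is a minimal sketch of an additive model (no interaction terms) with one numeric and one categorical covariate. It uses R's built-in mtcars data rather than the lab data, and the names cars and fit_additive are illustrative assumptions, not part of the lab.

```r
# Built-in mtcars data; am is coded 0/1, so recode it as a labelled factor.
cars <- transform(
  mtcars,
  am = factor(am, levels = c(0, 1), labels = c("automatic", "manual"))
)

# Additive model: a single slope for wt, plus a shift for the
# non-baseline level of am ("automatic" is the baseline).
fit_additive <- lm(mpg ~ wt + am, data = cars)

# One coefficient per numeric covariate, and (number of levels - 1)
# coefficients for a factor.
names(coef(fit_additive))  # "(Intercept)" "wt" "ammanual"
```

The wt coefficient is the expected change in mpg per one-unit increase in wt, holding transmission type fixed. The ammanual coefficient is the expected difference in mpg between manual and automatic cars of the same weight.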
(a) Run a linear regression to better understand how birthweight varies with the mother’s age and smoking status (do not include interaction terms).
# Edit me
(b) What is the coefficient of mother.age in your regression? How do you interpret this coefficient?
# Edit me
(c) How many coefficients are estimated for the mother’s smoking status variable? How do you interpret these coefficients?
# Edit me
(d) What does the intercept mean in this model?
(e) Using ggplot, construct a scatterplot with birthweight on the y-axis and mother’s age on the x-axis. Color the points by mother’s smoking status, and add smoking status-specific linear regression lines using the stat_smooth layer.
# Edit me
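For reference, the general pattern for group-specific regression lines can be sketched on a different dataset. This example again uses the built-in mtcars data, so the variable names (wt, mpg, am) are stand-ins and not the lab's variables. Mapping colour to a factor inside aes() also defines the groups, so stat_smooth(method = "lm") fits one line per group.

```r
library(ggplot2)

# Scatterplot of mpg against wt, coloured by transmission type.
# Because colour is mapped to a factor, stat_smooth fits a separate
# least-squares line within each transmission group.
p <- ggplot(mtcars, aes(x = wt, y = mpg, colour = factor(am))) +
  geom_point() +
  stat_smooth(method = "lm", se = FALSE) +
  labs(colour = "am")

print(p)
```

Setting se = FALSE drops the confidence bands, which makes it easier to compare the slopes of the lines by eye.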
(f) Do the regression lines plotted in part (e) correspond to the model you fit in part (a)? How can you tell?
(g) Fit a linear regression model that now models potential interactions between mother’s age and smoking status in their effect on birthweight.
# Edit me
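As a syntax reminder, in an R model formula a * b expands to a + b + a:b, so it adds both main effects and their interaction. The sketch below shows this on the built-in mtcars data; cars and fit_interaction are illustrative names, not part of the lab.

```r
# Built-in mtcars data with am recoded as a labelled factor.
cars <- transform(
  mtcars,
  am = factor(am, levels = c(0, 1), labels = c("automatic", "manual"))
)

# wt * am is shorthand for wt + am + wt:am.
fit_interaction <- lm(mpg ~ wt * am, data = cars)

# The interaction term appears as its own coefficient.
names(coef(fit_interaction))  # "(Intercept)" "wt" "ammanual" "wt:ammanual"

# Estimates, standard errors, t statistics, and p-values for each term.
summary(fit_interaction)$coefficients
```

Here the wt coefficient is the slope for the baseline group (automatic), and wt:ammanual is the difference in slope between manual and automatic cars. The p-value in the wt:ammanual row tests whether that difference in slopes is zero.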
(h) Interpret your model. Is the interaction term statistically significant? What does it mean?
# Edit me