In today’s Lab you will gain practice with the following concepts from today’s class:
- Interpreting linear regression coefficients of numeric covariates
- Interpreting linear regression coefficients of categorical variables
- Applying the “2 standard error rule” to construct approximate 95% confidence intervals for regression coefficients
- Using the confint command to construct confidence intervals for regression coefficients
- Using pairs plots to diagnose collinearity
- Using the update command to update a linear regression model object
- Diagnosing violations of linear model assumptions using plot
We’ll begin by loading some packages.
library(tidyverse)
## ── Attaching packages ──────────────────────────────────────────────────────── tidyverse 1.2.1 ──
## ✔ ggplot2 3.3.2 ✔ purrr 0.3.3
## ✔ tibble 2.1.3 ✔ dplyr 0.8.3
## ✔ tidyr 1.0.0 ✔ stringr 1.4.0
## ✔ readr 1.3.1 ✔ forcats 0.4.0
## Warning: package 'ggplot2' was built under R version 3.6.2
## ── Conflicts ─────────────────────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
library(knitr)
Cars93 <- as_tibble(MASS::Cars93)
# If you want to experiment with the ggpairs command,
# you'll want to run the following code:
# install.packages("GGally")
# library(GGally)
(a) Use the lm() function to regress Price on: EngineSize, Origin, MPG.highway, MPG.city and Horsepower.
# Edit me
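As an illustration of the general pattern, here is a minimal sketch on the built-in mtcars data (an assumption made for the example; it is not the lab's Cars93 data). It fits a model with one numeric covariate and one categorical covariate, and shows how the categorical variable appears in the coefficient names.

```r
# Illustration on the built-in mtcars data (not the lab's Cars93 data).
cars <- mtcars

# Treat transmission type as a categorical variable with labelled levels.
# "automatic" is listed first, so it becomes the baseline level.
cars$am <- factor(cars$am, levels = c(0, 1), labels = c("automatic", "manual"))

# Regress fuel economy on one numeric and one categorical covariate.
fit <- lm(mpg ~ wt + am, data = cars)

# The factor contributes one coefficient per non-baseline level,
# named by pasting the variable name and the level together.
names(coef(fit))
```

In this sketch the ammanual coefficient is the estimated difference in average mpg between manual and automatic cars of the same weight, while the wt coefficient is the estimated change in average mpg per unit increase in wt for cars with the same transmission type.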
(b) Use the kable() command to produce a nicely formatted coefficients table. Ensure that values are rounded to an appropriate number of decimal places.
# Edit me
Replace this text with your answer.
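One way to get a coefficients table into kable() is to pull the coefficient matrix out of the model summary. The sketch below does this on the built-in mtcars data (again an assumption for illustration, not the lab's model). The choice of 3 digits is arbitrary.

```r
library(knitr)

# Illustrative model on the built-in mtcars data (not the lab's Cars93 data).
fit <- lm(mpg ~ wt + hp, data = mtcars)

# summary(fit)$coefficients is a numeric matrix with one row per coefficient
# and columns for the estimate, standard error, t statistic and p-value.
coef_table <- summary(fit)$coefficients

# kable() formats the matrix as a table; digits controls the rounding.
kable(coef_table, digits = 3)
```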
(c) Interpret the coefficient of Originnon-USA. Is it statistically significant?
# Edit me
Replace this text with your answer.
(d) Interpret the coefficient of MPG.highway. Is it statistically significant?
# Edit me
Replace this text with your answer.
(d) Use the “2 standard error rule” to construct an approximate 95% confidence interval for the coefficient of MPG.highway. Compare this to the 95% CI obtained by using the confint command.
# Edit me
Replace this text with your answer.
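The “2 standard error rule” takes the estimate plus or minus twice its standard error, whereas confint uses the exact t quantile for the model's residual degrees of freedom. The following sketch compares the two on the built-in mtcars data (an assumption for illustration, not the lab's model).

```r
# Illustrative model on the built-in mtcars data (not the lab's Cars93 data).
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Extract the estimate and standard error for one coefficient.
coef_table <- coef(summary(fit))
est <- coef_table["wt", "Estimate"]
se  <- coef_table["wt", "Std. Error"]

# Approximate 95% CI: estimate +/- 2 standard errors.
approx_ci <- c(lower = est - 2 * se, upper = est + 2 * se)

# Exact 95% CI based on the t distribution.
exact_ci <- confint(fit, parm = "wt", level = 0.95)

# The multiplier confint uses in place of 2.
t_mult <- qt(0.975, df = df.residual(fit))

approx_ci
exact_ci
t_mult
```

The two intervals differ only through the multiplier. With few residual degrees of freedom the t multiplier is somewhat larger than 2, and as the sample size grows it approaches 1.96, so the two intervals are generally close.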
(e) Run the pairs command on the following set of variables: EngineSize, MPG.highway, MPG.city and Horsepower. Display the correlations in one of the panels, using the panel.cor function defined below. Do you observe any collinearities?
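# Panel function for pairs(): prints the absolute correlation of x and y,
# with text size scaled by the strength of the correlation.
panel.cor <- function(x, y, digits = 2, prefix = "", cex.cor, ...)
{
  usr <- par("usr"); on.exit(par(usr = usr))  # restore the coordinate system on exit
  par(usr = c(0, 1, 0, 1))
  r <- abs(cor(x, y))
  txt <- format(c(r, 0.123456789), digits = digits)[1]
  txt <- paste0(prefix, txt)
  if(missing(cex.cor)) cex.cor <- 0.4/strwidth(txt)
  text(0.5, 0.5, txt, cex = pmax(1, cex.cor * r))
}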
# Edit me
Replace this text with your answer.
(f) Use the update command to update your regression model to exclude EngineSize and MPG.city. Display the resulting coefficients table nicely using the kable() command.
# Edit me
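The update command takes an existing model and a formula in which a dot stands for "whatever was there before". A minimal sketch on the built-in mtcars data (an assumption for illustration, not the lab's model) might look like this.

```r
# Illustrative full model on the built-in mtcars data (not the lab's Cars93 data).
full_fit <- lm(mpg ~ wt + hp + disp + cyl, data = mtcars)

# ". ~ ." keeps the same response and predictors;
# "- disp - cyl" then drops those two terms and refits the model.
reduced_fit <- update(full_fit, . ~ . - disp - cyl)

# Check which terms remain, and refit directly for comparison.
formula(reduced_fit)
direct_fit <- lm(mpg ~ wt + hp, data = mtcars)
all.equal(coef(reduced_fit), coef(direct_fit))
```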
(g) Does the coefficient of MPG.highway change much from the original model? Calculate a 95% confidence interval and compare your answer to part (d). Does the CI change much from before? Explain.
# Edit me
Replace this text with your answer.
(h) Run the plot command on the linear model you constructed in part (f). Do you notice any issues?
# Edit me
Replace this text with your answer.
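Calling plot on an lm object produces a series of diagnostic plots, four by default. The sketch below arranges them in a 2 x 2 grid, using the built-in mtcars data (an assumption for illustration, not the lab's model).

```r
# Illustrative model on the built-in mtcars data (not the lab's Cars93 data).
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Show the four default diagnostic plots on one page,
# then restore the previous plotting layout.
old_par <- par(mfrow = c(2, 2))
plot(fit)
par(old_par)
```

The default plots are residuals vs. fitted values, a normal Q-Q plot of the residuals, a scale-location plot, and residuals vs. leverage. Curvature or a funnel shape in the first and third plots suggests non-linearity or non-constant variance, points far off the line in the Q-Q plot suggest non-normal residuals, and points flagged in the last plot may be unusually influential.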