Lab 4 Solution

Remember to change the `author:` field on this Rmd file to your own name.

library(tidyverse)

## ── Attaching packages ───────────────────────── tidyverse 1.2.1 ──

## ✔ ggplot2 3.2.1     ✔ purrr   0.3.3
## ✔ tibble  2.1.3     ✔ dplyr   0.8.3
## ✔ tidyr   1.0.0     ✔ stringr 1.4.0
## ✔ readr   1.3.1     ✔ forcats 0.4.0

## ── Conflicts ──────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()

Cars93 <- MASS::Cars93

1. Loop practice

(a) Write a function called calculateRowMeans that uses a for loop to calculate the row means of a matrix x.

# calculateRowMeans computes the row means of a matrix x
# input: matrix x
# output: vector of length nrow(x) giving row means of x
calculateRowMeans <- function(x) {
  row.means <- numeric(nrow(x))
  for(i in seq_len(nrow(x))) {
    row.means[i] <- mean(x[i,])
  }
  row.means
}
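As a quick sanity check, the function can be tried on a matrix small enough to verify by hand. This is only a sketch: the matrix `tiny` is a made-up example, and the function is repeated so that the block runs on its own.

```r
# Row-mean function from part (a), repeated so this block is self-contained
calculateRowMeans <- function(x) {
  row.means <- numeric(nrow(x))
  for (i in seq_len(nrow(x))) {
    row.means[i] <- mean(x[i, ])
  }
  row.means
}

# Hypothetical 2 x 3 test matrix, filled column by column:
# row 1 is (1, 3, 5) and row 2 is (2, 4, 6)
tiny <- matrix(1:6, nrow = 2)
calculateRowMeans(tiny)  # row means are 3 and 4
```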

(b) Try out your function on the random matrix fake.data defined below.

set.seed(12345) # Set seed of random number generator
fake.data <- matrix(runif(800), nrow=25)
calculateRowMeans(fake.data)

##  [1] 0.5339087 0.6259388 0.4966049 0.5399315 0.5049318 0.5633372 0.4686503
##  [8] 0.4196579 0.5273801 0.4639143 0.5472661 0.5043049 0.6169601 0.4690874
## [15] 0.4920191 0.5841288 0.6108891 0.4879246 0.5401770 0.5223512 0.5086669
## [22] 0.4643891 0.5250635 0.4791480 0.5795024

(c) Use the apply() function to calculate the row means of the matrix fake.data

apply(fake.data, MARGIN=1, FUN=mean)

##  [1] 0.5339087 0.6259388 0.4966049 0.5399315 0.5049318 0.5633372 0.4686503
##  [8] 0.4196579 0.5273801 0.4639143 0.5472661 0.5043049 0.6169601 0.4690874
## [15] 0.4920191 0.5841288 0.6108891 0.4879246 0.5401770 0.5223512 0.5086669
## [22] 0.4643891 0.5250635 0.4791480 0.5795024

(d) Compare this to the output of the rowMeans() function to check that your calculation is correct.

all.equal(calculateRowMeans(fake.data), rowMeans(fake.data))

## [1] TRUE
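Results computed by different routines can differ in their last few bits, so a tolerance-based check is a useful companion to an exact one. The following is a minimal sketch, assuming the same seed and matrix as in part (b); the names `loop.means`, `apply.means` and `builtin.means` are illustrative and not part of the lab.

```r
set.seed(12345)
fake.data <- matrix(runif(800), nrow = 25)

# Three ways of getting the row means
loop.means <- numeric(nrow(fake.data))
for (i in seq_len(nrow(fake.data))) {
  loop.means[i] <- mean(fake.data[i, ])
}
apply.means <- apply(fake.data, MARGIN = 1, FUN = mean)
builtin.means <- rowMeans(fake.data)

# Tolerance-based comparisons
all.equal(loop.means, apply.means)
all.equal(loop.means, builtin.means)

# Largest absolute discrepancy between the loop and rowMeans()
max(abs(loop.means - builtin.means))
```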

2. summarize() practice

(a) Use group_by() and summarize() commands on the Cars93 data set to create a table showing the average Turn.circle of cars, broken down by vehicle Type and DriveTrain

Cars93 %>%
  group_by(Type, DriveTrain) %>%
  summarize(mean(Turn.circle))

## # A tibble: 14 x 3
## # Groups:   Type [6]
##    Type    DriveTrain `mean(Turn.circle)`
##    <fct>   <fct>                    <dbl>
##  1 Compact 4WD                       37  
##  2 Compact Front                     38.8
##  3 Compact Rear                      35.5
##  4 Large   Front                     42  
##  5 Large   Rear                      43.8
##  6 Midsize Front                     40.5
##  7 Midsize Rear                      39  
##  8 Small   4WD                       33.5
##  9 Small   Front                     35.3
## 10 Sporty  4WD                       39.5
## 11 Sporty  Front                     37  
## 12 Sporty  Rear                      41.2
## 13 Van     4WD                       41.8
## 14 Van     Front                     41.8

(b) Are all combinations of Type and DriveTrain shown in the table? If not, which ones are missing? Why are they missing?

No. Only 14 of the 18 possible combinations appear. For example, there is no entry for Large 4WD cars, because there are no vehicles in this category.

sum(Cars93$Type == "Large" & Cars93$DriveTrain == "4WD")

## [1] 0
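To check every combination at once rather than one at a time, a cross-tabulation of the two factors can be used. This is a sketch using base R's `table()`; the names `combo.counts` and `empty` are illustrative.

```r
Cars93 <- MASS::Cars93

# Count vehicles in each Type x DriveTrain cell
combo.counts <- table(Cars93$Type, Cars93$DriveTrain)
combo.counts

# List the cells that contain no vehicles
empty <- which(combo.counts == 0, arr.ind = TRUE)
data.frame(
  Type = rownames(combo.counts)[empty[, "row"]],
  DriveTrain = colnames(combo.counts)[empty[, "col"]]
)
```

The empty cells are the combinations that are absent from the summary table in part (a).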

(c) Add the argument .drop = FALSE to your group_by command, and then re-run your code. What happens now?

Cars93 %>%
  group_by(Type, DriveTrain, .drop = FALSE) %>%
  summarize(mean(Turn.circle))

## # A tibble: 18 x 3
## # Groups:   Type [6]
##    Type    DriveTrain `mean(Turn.circle)`
##    <fct>   <fct>                    <dbl>
##  1 Compact 4WD                       37  
##  2 Compact Front                     38.8
##  3 Compact Rear                      35.5
##  4 Large   4WD                      NaN  
##  5 Large   Front                     42  
##  6 Large   Rear                      43.8
##  7 Midsize 4WD                      NaN  
##  8 Midsize Front                     40.5
##  9 Midsize Rear                      39  
## 10 Small   4WD                       33.5
## 11 Small   Front                     35.3
## 12 Small   Rear                     NaN  
## 13 Sporty  4WD                       39.5
## 14 Sporty  Front                     37  
## 15 Sporty  Rear                      41.2
## 16 Van     4WD                       41.8
## 17 Van     Front                     41.8
## 18 Van     Rear                     NaN

The .drop argument, which is set to TRUE by default, controls whether variable combinations that never appear together in the data are dropped. When we set .drop = FALSE, the combinations with 0 counts still appear in the table, but the summary in those cells is NaN (not a number), because the mean of zero observations is undefined.
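The NaN values come from taking the mean of an empty vector. A minimal base R sketch of that behaviour, where `empty.group` is a hypothetical stand-in for the Turn.circle values of a group with no cars:

```r
# A group with no observations is a numeric vector of length 0
empty.group <- numeric(0)

length(empty.group)        # 0 observations
mean(empty.group)          # 0 / 0, which R reports as NaN
is.nan(mean(empty.group))  # TRUE
```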

(d) Having a car with a small turn radius makes city driving much easier. What Type of car should city drivers opt for?

Small cars have the smallest average turning circles in the table (33.5 for 4WD and 35.3 for Front), so city drivers should opt for a Small car.
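One way to back this up is to average Turn.circle by Type alone, ignoring DriveTrain. This is a sketch using base R's `tapply()`; the name `type.means` is illustrative.

```r
Cars93 <- MASS::Cars93

# Average turning circle for each vehicle Type
type.means <- tapply(Cars93$Turn.circle, Cars93$Type, mean)

# Sorted from smallest to largest average
sort(type.means)

# Type with the smallest average turning circle
names(which.min(type.means))
```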

(e) Does the vehicle’s DriveTrain appear to have an impact on turn radius?

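There is no consistent association. For example, Rear drive has the smallest average turning circle among Compact and Midsize cars, but the largest among Large and Sporty cars.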
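Comparing drivetrains within each Type is easier when the averages are laid out as a two-way table, with one row per Type and one column per DriveTrain. This is a sketch using base R's `tapply()` with two grouping factors; the name `turn.by.cell` is illustrative.

```r
Cars93 <- MASS::Cars93

# Mean Turn.circle for every Type x DriveTrain cell;
# cells with no vehicles are returned as NA
turn.by.cell <- with(
  Cars93,
  tapply(Turn.circle, list(Type, DriveTrain), mean)
)
round(turn.by.cell, 1)
```

Reading across each row shows whether any one drivetrain is consistently smallest or largest.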
