Remember to change the author: field on this Rmd file to your own name.

Learning objectives

In today’s Lab you will gain practice with the following concepts from Lecture 5:

Problems

library(tidyverse)
## ── Attaching packages ───────────────────────────────────────── tidyverse 1.2.1 ──
## ✔ ggplot2 3.2.1     ✔ purrr   0.3.3
## ✔ tibble  2.1.3     ✔ dplyr   0.8.3
## ✔ tidyr   1.0.0     ✔ stringr 1.4.0
## ✔ readr   1.3.1     ✔ forcats 0.4.0
## ── Conflicts ──────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
Cars93 <- as_tibble(MASS::Cars93)  # Pull Cars93 from MASS

1. map() practice

Note: This question previously (accidentally) appeared on Lab 4. Feel free to skip it if you already completed it last week.

(a) The nlevels() function tells you the number of levels in a factor variable. Use this function in combination with summarize_if() to show the number of levels for each factor variable in the Cars93 data.

Cars93 %>%
  summarize_if(is.factor, nlevels)
## # A tibble: 1 x 9
##   Manufacturer Model  Type AirBags DriveTrain Cylinders Man.trans.avail
##          <int> <int> <int>   <int>      <int>     <int>           <int>
## 1           32    93     6       3          3         6               2
## # … with 2 more variables: Origin <int>, Make <int>
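The output above is a one-row tibble rather than a plain integer vector. If a named integer vector is wanted instead, one possible sketch (an alternative to the solution above, with `n_levels` being a name chosen here for illustration) combines select_if() with purrr's map_int():

```r
library(tidyverse)
Cars93 <- as_tibble(MASS::Cars93)

# Keep only the factor columns, then count the levels in each one.
# map_int() returns a named integer vector instead of a tibble.
n_levels <- Cars93 %>%
  select_if(is.factor) %>%
  map_int(nlevels)

n_levels[["Manufacturer"]]  # 32, matching the tibble output above
```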

(b) levels() returns the possible levels of a factor variable. Use this function in combination with select() and map() to create a list of all the levels of the Manufacturer, AirBags, DriveTrain, and Man.trans.avail variables.

Cars93 %>%
  select(Manufacturer, AirBags, DriveTrain, Man.trans.avail) %>%
  map(levels)
## $Manufacturer
##  [1] "Acura"         "Audi"          "BMW"           "Buick"        
##  [5] "Cadillac"      "Chevrolet"     "Chrylser"      "Chrysler"     
##  [9] "Dodge"         "Eagle"         "Ford"          "Geo"          
## [13] "Honda"         "Hyundai"       "Infiniti"      "Lexus"        
## [17] "Lincoln"       "Mazda"         "Mercedes-Benz" "Mercury"      
## [21] "Mitsubishi"    "Nissan"        "Oldsmobile"    "Plymouth"     
## [25] "Pontiac"       "Saab"          "Saturn"        "Subaru"       
## [29] "Suzuki"        "Toyota"        "Volkswagen"    "Volvo"        
## 
## $AirBags
## [1] "Driver & Passenger" "Driver only"        "None"              
## 
## $DriveTrain
## [1] "4WD"   "Front" "Rear" 
## 
## $Man.trans.avail
## [1] "No"  "Yes"

2. mutate() variants with Cars93

(a) Use the toupper() command in combination with mutate_if() to produce a new version of Cars93 where every factor variable has been converted to upper case.

Cars93 %>%
  mutate_if(is.factor, toupper)
## # A tibble: 93 x 27
##    Manufacturer Model Type  Min.Price Price Max.Price MPG.city MPG.highway
##    <chr>        <chr> <chr>     <dbl> <dbl>     <dbl>    <int>       <int>
##  1 ACURA        INTE… SMALL      12.9  15.9      18.8       25          31
##  2 ACURA        LEGE… MIDS…      29.2  33.9      38.7       18          25
##  3 AUDI         90    COMP…      25.9  29.1      32.3       20          26
##  4 AUDI         100   MIDS…      30.8  37.7      44.6       19          26
##  5 BMW          535I  MIDS…      23.7  30        36.2       22          30
##  6 BUICK        CENT… MIDS…      14.2  15.7      17.3       22          31
##  7 BUICK        LESA… LARGE      19.9  20.8      21.7       19          28
##  8 BUICK        ROAD… LARGE      22.6  23.7      24.9       16          25
##  9 BUICK        RIVI… MIDS…      26.3  26.3      26.3       19          27
## 10 CADILLAC     DEVI… LARGE      33    34.7      36.3       16          25
## # … with 83 more rows, and 19 more variables: AirBags <chr>,
## #   DriveTrain <chr>, Cylinders <chr>, EngineSize <dbl>, Horsepower <int>,
## #   RPM <int>, Rev.per.mile <int>, Man.trans.avail <chr>,
## #   Fuel.tank.capacity <dbl>, Passengers <int>, Length <int>,
## #   Wheelbase <int>, Width <int>, Turn.circle <int>, Rear.seat.room <dbl>,
## #   Luggage.room <int>, Weight <int>, Origin <chr>, Make <chr>
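Note from the `<chr>` column types above that toupper() returns character vectors, so the converted columns are no longer factors. If they should remain factors, one possible variant relabels the levels with forcats' fct_relabel() instead. This is a sketch, not part of the original solution, and `Cars93.upper` is a name introduced here for illustration:

```r
library(tidyverse)
Cars93 <- as_tibble(MASS::Cars93)

# fct_relabel() applies toupper() to the levels only,
# so each column keeps its factor type.
Cars93.upper <- Cars93 %>%
  mutate_if(is.factor, ~ fct_relabel(.x, toupper))

levels(Cars93.upper$AirBags)
```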

(b) Currently the price columns of Cars93 are in thousands of dollars. Use mutate_at() to create a version of Cars93 where all prices are in dollars (e.g., what used to be a price of 12.9 should become 12900).

Cars93 %>% 
  mutate_at(vars(contains("Price")), ~ .x * 1000)
## # A tibble: 93 x 27
##    Manufacturer Model Type  Min.Price Price Max.Price MPG.city MPG.highway
##    <fct>        <fct> <fct>     <dbl> <dbl>     <dbl>    <int>       <int>
##  1 Acura        Inte… Small     12900 15900    18800        25          31
##  2 Acura        Lege… Mids…     29200 33900    38700        18          25
##  3 Audi         90    Comp…     25900 29100    32300.       20          26
##  4 Audi         100   Mids…     30800 37700    44600        19          26
##  5 BMW          535i  Mids…     23700 30000    36200        22          30
##  6 Buick        Cent… Mids…     14200 15700    17300        22          31
##  7 Buick        LeSa… Large     19900 20800    21700        19          28
##  8 Buick        Road… Large     22600 23700    24900        16          25
##  9 Buick        Rivi… Mids…     26300 26300    26300        19          27
## 10 Cadillac     DeVi… Large     33000 34700    36300        16          25
## # … with 83 more rows, and 19 more variables: AirBags <fct>,
## #   DriveTrain <fct>, Cylinders <fct>, EngineSize <dbl>, Horsepower <int>,
## #   RPM <int>, Rev.per.mile <int>, Man.trans.avail <fct>,
## #   Fuel.tank.capacity <dbl>, Passengers <int>, Length <int>,
## #   Wheelbase <int>, Width <int>, Turn.circle <int>, Rear.seat.room <dbl>,
## #   Luggage.room <int>, Weight <int>, Origin <fct>, Make <fct>
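In dplyr 1.0.0 and later, scoped verbs such as mutate_at() are superseded by across(). The session shown at the top of this lab uses dplyr 0.8.3, where across() is not available, so the following is only a sketch for readers on a newer version (`Cars93.dollars` is an illustrative name):

```r
library(tidyverse)  # assumes dplyr >= 1.0.0 for across()
Cars93 <- as_tibble(MASS::Cars93)

# across() selects the three price columns and applies the same
# multiplication to each, just as mutate_at() does above.
Cars93.dollars <- Cars93 %>%
  mutate(across(contains("Price"), ~ .x * 1000))

Cars93.dollars$Min.Price[1]  # 12900
```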

(c) Use mutate_if to normalize all of the numeric variables in the Cars93 data to have variance 1. Save the resulting mutated data in a variable called Cars93.norm. (Hint: this is equivalent to dividing each of the columns by the standard deviation of the given column.)

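# na.rm = TRUE keeps columns with missing values (e.g. Luggage.room)
# from being turned entirely into NA
Cars93.norm <- Cars93 %>%
  mutate_if(is.numeric, ~ .x / sd(.x, na.rm = TRUE))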

To check that you’ve succeeded, you can confirm that the following lines of code all return the answer 1.

var(Cars93.norm$Min.Price)
## [1] 1
var(Cars93.norm$Horsepower)
## [1] 1
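The two spot checks above can be extended to every numeric column at once. A minimal sketch follows. Because Rear.seat.room and Luggage.room contain missing values, this version passes na.rm = TRUE to both sd() and var(); `Cars93.norm.all` and `variances` are names introduced here for illustration.

```r
library(tidyverse)
Cars93 <- as_tibble(MASS::Cars93)

# Normalize, ignoring missing values when computing each sd
Cars93.norm.all <- Cars93 %>%
  mutate_if(is.numeric, ~ .x / sd(.x, na.rm = TRUE))

# One variance per numeric column, again ignoring missing values
variances <- Cars93.norm.all %>%
  summarize_if(is.numeric, var, na.rm = TRUE)

# TRUE if every variance equals 1 up to floating-point error
all(abs(unlist(variances) - 1) < 1e-8)
```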

3. summarize() variants

Use summarize_if to calculate the standard deviation of every numeric column in the original Cars93 data. You’ll want to further specify na.rm = TRUE to ensure that you get a non-NA output value even for variables that have some missing (NA) observations.

Cars93 %>%
  summarize_if(is.numeric, sd, na.rm = TRUE)
## # A tibble: 1 x 18
##   Min.Price Price Max.Price MPG.city MPG.highway EngineSize Horsepower
##       <dbl> <dbl>     <dbl>    <dbl>       <dbl>      <dbl>      <dbl>
## 1      8.75  9.66      11.0     5.62        5.33       1.04       52.4
## # … with 11 more variables: RPM <dbl>, Rev.per.mile <dbl>,
## #   Fuel.tank.capacity <dbl>, Passengers <dbl>, Length <dbl>,
## #   Wheelbase <dbl>, Width <dbl>, Turn.circle <dbl>, Rear.seat.room <dbl>,
## #   Luggage.room <dbl>, Weight <dbl>
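To see why na.rm = TRUE matters here, compare the two calls on a single column with missing values. Luggage.room has some NA entries in Cars93, so sd() without na.rm propagates the NA:

```r
library(tidyverse)
Cars93 <- as_tibble(MASS::Cars93)

sum(is.na(Cars93$Luggage.room))        # number of missing observations
sd(Cars93$Luggage.room)                # NA, because of the missing values
sd(Cars93$Luggage.room, na.rm = TRUE)  # computed from the non-missing values
```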