In today’s Lab you will gain practice with the following concepts from Lecture 5:
- apply and map as loop alternatives
- mutate commands to manipulate data
- summarize commands to produce simple tabular summaries, and interpreting the results

library(tidyverse)
## ── Attaching packages ───────────────────────────────────────── tidyverse 1.2.1 ──
## ✔ ggplot2 3.2.1 ✔ purrr 0.3.3
## ✔ tibble 2.1.3 ✔ dplyr 0.8.3
## ✔ tidyr 1.0.0 ✔ stringr 1.4.0
## ✔ readr 1.3.1 ✔ forcats 0.4.0
## ── Conflicts ──────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
Cars93 <- as_tibble(MASS::Cars93) # Pull Cars93 from MASS
Note: This question previously (accidentally) appeared on Lab 4. Feel free to skip it if you already succeeded on this question in the previous week.
(a) The nlevels() command tells you the number of levels in a factor variable. Use this function in combination with summarize_if() to produce a one-row summary of integers giving the number of levels of each factor variable in the Cars93 data.
Cars93 %>%
  summarize_if(is.factor, nlevels)
## # A tibble: 1 x 9
## Manufacturer Model Type AirBags DriveTrain Cylinders Man.trans.avail
## <int> <int> <int> <int> <int> <int> <int>
## 1 32 93 6 3 3 6 2
## # … with 2 more variables: Origin <int>, Make <int>
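For readers on a newer dplyr than the 0.8.3 shown in the startup messages, the scoped verbs such as summarize_if() are superseded by across(). The following is a minimal sketch of the same computation, assuming dplyr 1.0.0 or later; the name factor_levels is only illustrative.

```r
library(dplyr)

Cars93 <- as_tibble(MASS::Cars93)

# across(where(is.factor), nlevels) applies nlevels() to every factor column.
# Assumption: dplyr >= 1.0.0, which is newer than the version used in this lab.
factor_levels <- Cars93 %>%
  summarize(across(where(is.factor), nlevels))

factor_levels
```

The result should match the summarize_if() table above: one row, one integer column per factor variable.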
(b) levels() returns the possible levels of a factor variable. Use this function in combination with select and map to create a list of all the levels of the Manufacturer, AirBags, DriveTrain, and Man.trans.avail variables.
Cars93 %>%
  select(Manufacturer, AirBags, DriveTrain, Man.trans.avail) %>%
  map(levels)
## $Manufacturer
## [1] "Acura" "Audi" "BMW" "Buick"
## [5] "Cadillac" "Chevrolet" "Chrylser" "Chrysler"
## [9] "Dodge" "Eagle" "Ford" "Geo"
## [13] "Honda" "Hyundai" "Infiniti" "Lexus"
## [17] "Lincoln" "Mazda" "Mercedes-Benz" "Mercury"
## [21] "Mitsubishi" "Nissan" "Oldsmobile" "Plymouth"
## [25] "Pontiac" "Saab" "Saturn" "Subaru"
## [29] "Suzuki" "Toyota" "Volkswagen" "Volvo"
##
## $AirBags
## [1] "Driver & Passenger" "Driver only" "None"
##
## $DriveTrain
## [1] "4WD" "Front" "Rear"
##
## $Man.trans.avail
## [1] "No" "Yes"
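To make the "loop alternative" idea concrete, here is a small sketch comparing an explicit for loop with lapply() and map(). The data frame toy and its columns are hypothetical stand-ins, chosen only to keep the example short.

```r
library(purrr)

# Small stand-in data frame (hypothetical, for illustration only)
toy <- data.frame(
  size  = factor(c("small", "large", "small")),
  color = factor(c("red", "blue", "green"))
)

# Explicit loop: pre-allocate a named list, then fill it column by column
levels_loop <- vector("list", ncol(toy))
names(levels_loop) <- names(toy)
for (col in names(toy)) {
  levels_loop[[col]] <- levels(toy[[col]])
}

# The same computation with base R's lapply() and purrr's map()
levels_lapply <- lapply(toy, levels)
levels_map    <- map(toy, levels)

# Compare the loop result with the lapply() result
identical(levels_loop, levels_lapply)
str(levels_map)
```

The loop needs bookkeeping (allocation, names, indexing) that lapply() and map() handle automatically, which is the main reason to prefer them here.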
mutate() variants with Cars93
(a) Use the toupper() command in combination with mutate_if() to produce a new version of Cars93 where every factor variable has been converted to upper case.
Cars93 %>%
  mutate_if(is.factor, toupper)
## # A tibble: 93 x 27
## Manufacturer Model Type Min.Price Price Max.Price MPG.city MPG.highway
## <chr> <chr> <chr> <dbl> <dbl> <dbl> <int> <int>
## 1 ACURA INTE… SMALL 12.9 15.9 18.8 25 31
## 2 ACURA LEGE… MIDS… 29.2 33.9 38.7 18 25
## 3 AUDI 90 COMP… 25.9 29.1 32.3 20 26
## 4 AUDI 100 MIDS… 30.8 37.7 44.6 19 26
## 5 BMW 535I MIDS… 23.7 30 36.2 22 30
## 6 BUICK CENT… MIDS… 14.2 15.7 17.3 22 31
## 7 BUICK LESA… LARGE 19.9 20.8 21.7 19 28
## 8 BUICK ROAD… LARGE 22.6 23.7 24.9 16 25
## 9 BUICK RIVI… MIDS… 26.3 26.3 26.3 19 27
## 10 CADILLAC DEVI… LARGE 33 34.7 36.3 16 25
## # … with 83 more rows, and 19 more variables: AirBags <chr>,
## # DriveTrain <chr>, Cylinders <chr>, EngineSize <dbl>, Horsepower <int>,
## # RPM <int>, Rev.per.mile <int>, Man.trans.avail <chr>,
## # Fuel.tank.capacity <dbl>, Passengers <int>, Length <int>,
## # Wheelbase <int>, Width <int>, Turn.circle <int>, Rear.seat.room <dbl>,
## # Luggage.room <int>, Weight <int>, Origin <chr>, Make <chr>
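Note that the converted columns in the output above are shown as <chr>: toupper() returns a character vector, so each factor is replaced by a character column. If the factor type should be kept, one option is to upper-case the levels instead. The sketch below uses forcats::fct_relabel() on a hypothetical toy factor f; it is one possible approach, not part of the lab's solution.

```r
library(forcats)

# Hypothetical toy factor, for illustration only
f <- factor(c("Small", "Midsize", "Small", "Large"))

# toupper() on a factor returns a plain character vector
class(toupper(f))

# fct_relabel() applies the function to the levels and keeps the factor type
f_upper <- fct_relabel(f, toupper)
class(f_upper)
levels(f_upper)
```

Inside a pipeline this would correspond to something like mutate_if(is.factor, fct_relabel, toupper).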
(b) Currently the price columns of Cars93 are recorded in thousands of dollars. Use mutate_at to create a version of Cars93 where all prices are in dollars (e.g., what used to be a price of 12.9 should become 12900).
Cars93 %>%
  mutate_at(vars(contains("Price")), ~ .x * 1000)
## # A tibble: 93 x 27
## Manufacturer Model Type Min.Price Price Max.Price MPG.city MPG.highway
## <fct> <fct> <fct> <dbl> <dbl> <dbl> <int> <int>
## 1 Acura Inte… Small 12900 15900 18800 25 31
## 2 Acura Lege… Mids… 29200 33900 38700 18 25
## 3 Audi 90 Comp… 25900 29100 32300. 20 26
## 4 Audi 100 Mids… 30800 37700 44600 19 26
## 5 BMW 535i Mids… 23700 30000 36200 22 30
## 6 Buick Cent… Mids… 14200 15700 17300 22 31
## 7 Buick LeSa… Large 19900 20800 21700 19 28
## 8 Buick Road… Large 22600 23700 24900 16 25
## 9 Buick Rivi… Mids… 26300 26300 26300 19 27
## 10 Cadillac DeVi… Large 33000 34700 36300 16 25
## # … with 83 more rows, and 19 more variables: AirBags <fct>,
## # DriveTrain <fct>, Cylinders <fct>, EngineSize <dbl>, Horsepower <int>,
## # RPM <int>, Rev.per.mile <int>, Man.trans.avail <fct>,
## # Fuel.tank.capacity <dbl>, Passengers <int>, Length <int>,
## # Wheelbase <int>, Width <int>, Turn.circle <int>, Rear.seat.room <dbl>,
## # Luggage.room <int>, Weight <int>, Origin <fct>, Make <fct>
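Before mutating, it can help to confirm which columns a selection helper actually matches. A quick sketch of that check, using the same contains("Price") helper with select(); the name price_cols is only illustrative.

```r
library(dplyr)

Cars93 <- as_tibble(MASS::Cars93)

# List the columns that contains("Price") selects
# (contains() ignores case by default)
price_cols <- Cars93 %>%
  select(contains("Price")) %>%
  names()

price_cols
```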
(c) Use mutate_if to normalize all of the numeric variables in the Cars93 data to have variance 1. Save the resulting mutated data in a variable called Cars93.norm. (Hint: this is equivalent to dividing each of the columns by the standard deviation of the given column.)
Cars93.norm <- Cars93 %>%
  # na.rm = TRUE keeps columns that contain NAs from becoming entirely NA
  mutate_if(is.numeric, ~ .x / sd(.x, na.rm = TRUE))
To check that you’ve succeeded, you can confirm that the following lines of code all return the answer 1.
var(Cars93.norm$Min.Price)
## [1] 1
var(Cars93.norm$Horsepower)
## [1] 1
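Rather than checking columns one at a time, the variance of every numeric column can be computed in a single call. The sketch below rebuilds the normalized data so that it is self-contained; the name variances and the tolerance 1e-8 are illustrative choices.

```r
library(dplyr)

Cars93 <- as_tibble(MASS::Cars93)

# Divide each numeric column by its standard deviation.
# na.rm = TRUE keeps columns with missing values from turning entirely NA.
Cars93.norm <- Cars93 %>%
  mutate_if(is.numeric, ~ .x / sd(.x, na.rm = TRUE))

# Variance of every numeric column, ignoring missing values
variances <- Cars93.norm %>%
  summarize_if(is.numeric, var, na.rm = TRUE)

# Compare against 1 with a small tolerance for floating-point error
all(abs(unlist(variances) - 1) < 1e-8)
```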
summarize() variants
Use summarize_if to calculate the standard deviation of every numeric column in the original Cars93 data. You’ll want to further specify na.rm = TRUE to ensure that you get a non-NA output value even for variables that have some missing (NA) observations.
Cars93 %>%
  summarize_if(is.numeric, sd, na.rm = TRUE)
## # A tibble: 1 x 18
## Min.Price Price Max.Price MPG.city MPG.highway EngineSize Horsepower
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 8.75 9.66 11.0 5.62 5.33 1.04 52.4
## # … with 11 more variables: RPM <dbl>, Rev.per.mile <dbl>,
## # Fuel.tank.capacity <dbl>, Passengers <dbl>, Length <dbl>,
## # Wheelbase <dbl>, Width <dbl>, Turn.circle <dbl>, Rear.seat.room <dbl>,
## # Luggage.room <dbl>, Weight <dbl>
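The effect of na.rm = TRUE is easiest to see on a small example. In the sketch below, the tibble toy and its columns are hypothetical; extra arguments given to summarize_if() after the function are passed on to sd().

```r
library(dplyr)

# Hypothetical toy data: column a has one missing value, column b has none
toy <- tibble(
  a  = c(2, 4, NA, 8),
  b  = c(1, 2, 3, 4),
  id = c("w", "x", "y", "z")
)

# Without na.rm, any column containing an NA summarizes to NA
without_narm <- toy %>% summarize_if(is.numeric, sd)

# With na.rm = TRUE, sd() is computed from the non-missing values only
with_narm <- toy %>% summarize_if(is.numeric, sd, na.rm = TRUE)

without_narm
with_narm
```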