There are certain situations where we want to transform right-skewed data before analysing it. Taking the log of right-skewed data often helps to make it more normally distributed.
Here are histograms of the MPG.highway and MPG.city variables.
qplot(MPG.city, data = Cars93, bins = 10)
qplot(MPG.highway, data = Cars93, bins = 10)
(a) Do the city and highway gas-mileage figures appear to have right-skewed distributions?
Your answer: Yes. Most of the mass is concentrated near low MPG values, and there’s a long right tail indicating a small proportion of cars that have very high MPG.
(b) Use the mutate() and log() functions to create a new data frame called Cars93.log that has MPG.highway and MPG.city replaced with log(MPG.highway) and log(MPG.city).
Cars93.log <- mutate(Cars93, MPG.highway = log(MPG.highway), MPG.city = log(MPG.city))
(c) Run the histogram commands again, this time using your new Cars93.log dataset instead of Cars93.
qplot(MPG.city, data = Cars93.log, bins = 10)
qplot(MPG.highway, data = Cars93.log, bins = 10)
(d) Do the distributions appear less skewed than before?
The MPG highway distribution does look more symmetric.
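The visual impression can be backed up with a number. The sketch below computes a simple moment-based sample skewness before and after the log transform. It assumes Cars93 comes from the MASS package; the helper name sample_skewness is made up for this illustration and is not part of any package.

```r
# Cars93 is assumed to come from the MASS package
library(MASS)

# Hypothetical helper: moment-based sample skewness
# (values near 0 indicate a roughly symmetric distribution)
sample_skewness <- function(x) {
  centered <- x - mean(x)
  mean(centered^3) / mean(centered^2)^1.5
}

skew.raw <- c(city = sample_skewness(Cars93$MPG.city),
              highway = sample_skewness(Cars93$MPG.highway))
skew.log <- c(city = sample_skewness(log(Cars93$MPG.city)),
              highway = sample_skewness(log(Cars93$MPG.highway)))

# Compare the two side by side
rbind(raw = skew.raw, log = skew.log)
```

A positive value indicates a right tail, so a smaller positive value in the log row means the transform has pulled the tail in.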
(a) Use the table() function to tabulate the data by DriveTrain and Origin.
table(Cars93$DriveTrain, Cars93$Origin)
##        
##         USA non-USA
##   4WD     5       5
##   Front  34      33
##   Rear    9       7
(b) Repeat part (a), this time using the count() function.
Cars93 %>%
  count(DriveTrain, Origin)
## # A tibble: 6 x 3
##   DriveTrain Origin      n
##   <fct>      <fct>   <int>
## 1 4WD        USA         5
## 2 4WD        non-USA     5
## 3 Front      USA        34
## 4 Front      non-USA    33
## 5 Rear       USA         9
## 6 Rear       non-USA     7
(c) Does it look like foreign car manufacturers had different Drivetrain production preferences compared to US manufacturers?
Your answer: The counts for each Drivetrain category are nearly the same for US and non-US manufacturers. The table suggests that they had similar Drivetrain production preferences.
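Raw counts can be hard to compare when the groups differ in size, so it can help to look at within-group proportions. The sketch below rebuilds the table from the counts shown above and uses prop.table() to get the share of each drivetrain within each origin. The object names drivetrain.counts and drivetrain.props are just for illustration.

```r
# Counts copied from the table() output above
drivetrain.counts <- matrix(
  c(5, 34, 9,    # USA
    5, 33, 7),   # non-USA
  nrow = 3,
  dimnames = list(DriveTrain = c("4WD", "Front", "Rear"),
                  Origin = c("USA", "non-USA"))
)

# margin = 2 makes each column (origin) sum to 1
drivetrain.props <- prop.table(drivetrain.counts, margin = 2)
round(drivetrain.props, 2)
```

With the original data, prop.table(table(Cars93$DriveTrain, Cars93$Origin), margin = 2) should give the same proportions.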
(a) Write a function called isPassingGrade whose input x is a number, and which returns FALSE if x is lower than 50 and TRUE otherwise.
isPassingGrade <- function(x) {
  x >= 50
}
(b) Write a function called sendMessage whose input x is a number, and which prints Congratulations if isPassingGrade(x) is TRUE and prints Oh no! if isPassingGrade(x) is FALSE.
sendMessage <- function(x) {
  if (isPassingGrade(x)) {
    print("Congratulations!")
  } else {
    print("Oh no!")
  }
}
# Here's another way of accomplishing the same thing
sendMessage2 <- function(x) print(ifelse(isPassingGrade(x), "Congratulations!", "Oh no!"))
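One practical difference between the two versions: if() expects a single TRUE or FALSE, while ifelse() works element by element. Because >= is vectorised, isPassingGrade() already accepts a whole vector of scores. A small sketch, where the scores vector is made up for illustration:

```r
isPassingGrade <- function(x) {
  x >= 50
}

# Made-up scores, including the boundary value 50
scores <- c(49, 50, 51)

# One TRUE/FALSE per score
passed <- isPassingGrade(scores)

# ifelse() picks a message for each element of passed
messages <- ifelse(passed, "Congratulations!", "Oh no!")

print(passed)
print(messages)
```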
(c) Write a function called gradeSummary whose input x is a number. Your function will return a list with two elements, named letter.grade and passed. The letter grade will be "A" if x is at least 90. The letter grade will be "B" if x is at least 80 but lower than 90. The letter grade will be "F" if x is lower than 80. If the student’s letter grade is an A or B, passed should be TRUE; passed should be FALSE otherwise.
gradeSummary <- function(x) {
  if (x >= 90) {
    letter.grade <- "A"
    passed <- TRUE
  } else if (x >= 80) {
    letter.grade <- "B"
    passed <- TRUE
  } else {
    letter.grade <- "F"
    passed <- FALSE
  }
  list(letter.grade = letter.grade, passed = passed)
}
gradeSummary(91)
## $letter.grade
## [1] "A"
##
## $passed
## [1] TRUE
gradeSummary(62)
## $letter.grade
## [1] "F"
##
## $passed
## [1] FALSE
To check if your function works, try the following cases:
x = 91 should return
## $letter.grade
## [1] "A"
##
## $passed
## [1] TRUE
x = 62 should return
## $letter.grade
## [1] "F"
##
## $passed
## [1] FALSE
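The two cases above can also be turned into automated checks, together with the boundary scores 90 and 80, which are easy to get wrong. This is a sketch using base R's stopifnot() and identical(); gradeSummary is repeated here, with passed derived from the letter grade, so that the block runs on its own.

```r
gradeSummary <- function(x) {
  if (x >= 90) {
    letter.grade <- "A"
  } else if (x >= 80) {
    letter.grade <- "B"
  } else {
    letter.grade <- "F"
  }
  # A and B are the only passing grades
  list(letter.grade = letter.grade, passed = letter.grade %in% c("A", "B"))
}

# The two cases from the exercise
stopifnot(identical(gradeSummary(91), list(letter.grade = "A", passed = TRUE)))
stopifnot(identical(gradeSummary(62), list(letter.grade = "F", passed = FALSE)))

# Boundary scores
stopifnot(identical(gradeSummary(90)$letter.grade, "A"))
stopifnot(identical(gradeSummary(80)$letter.grade, "B"))
stopifnot(identical(gradeSummary(79)$passed, FALSE))
```

stopifnot() is silent when every condition holds and raises an error otherwise, so no output here means all checks passed.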