library(tidyverse)
## ── Attaching packages ──────────────────
## ✔ ggplot2 3.2.1 ✔ purrr 0.3.2
## ✔ tibble 2.1.3 ✔ dplyr 0.8.3
## ✔ tidyr 1.0.0 ✔ stringr 1.4.0
## ✔ readr 1.3.1 ✔ forcats 0.4.0
## ── Conflicts ── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
library(plotly) # for interactive graphics
##
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
##
## last_plot
## The following object is masked from 'package:stats':
##
## filter
## The following object is masked from 'package:graphics':
##
## layout
library(DT)
options(scipen = 4)
We’ll illustrate some examples using a bunch of different data sets.
flights
: This data contains information on all flights departing from one of the 3 NYC airports (EWR, LGA, JFK) in 2013.
diamonds
: You’ve seen this one before.
txhousing
: This data contains information on the Texas housing market from 2000 to 2015.
gapminder
: You’ve seen this one before.
# You'll need to run install.packages("nycflights13") and
# install.packages("gapminder")
flights <- nycflights13::flights
# Load the data from the gapminder library
data(gapminder, package = "gapminder")
Sometimes it’s helpful to output interactive summary or data tables into our reports. We can do this with the datatable function from the DT package.
# Printing data
flights %>%
group_by(carrier, origin) %>%
summarize(`Average delay (mins)` = round(mean(dep_delay, na.rm = TRUE), 0))
## # A tibble: 35 x 3
## # Groups: carrier [16]
## carrier origin `Average delay (mins)`
## <chr> <chr> <dbl>
## 1 9E EWR 6
## 2 9E JFK 19
## 3 9E LGA 9
## 4 AA EWR 10
## 5 AA JFK 10
## 6 AA LGA 7
## 7 AS EWR 6
## 8 B6 EWR 13
## 9 B6 JFK 13
## 10 B6 LGA 15
## # … with 25 more rows
# datatable
flights %>%
group_by(carrier, origin) %>%
summarize(`Average delay (mins)` = round(mean(dep_delay, na.rm = TRUE), 0)) %>%
datatable(options = list(pageLength = 12))
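datatable() accepts more than the options list. Here is a minimal sketch using a few other documented DT::datatable() arguments: filter = "top" adds a search box above each column, rownames = FALSE hides the row index, and caption adds a title. The column name avg_delay and the caption text are our own choices for illustration.

```r
library(dplyr)
library(DT)

# Same summary as above: average departure delay by carrier and origin
delays <- nycflights13::flights %>%
  group_by(carrier, origin) %>%
  summarize(avg_delay = round(mean(dep_delay, na.rm = TRUE), 0)) %>%
  ungroup()

# Interactive table with per-column filters, no row names, and a caption
tbl <- datatable(
  delays,
  rownames = FALSE,
  filter = "top",
  caption = "Average departure delay (mins) by carrier and origin",
  options = list(pageLength = 12)
)
tbl
```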
One of the simplest ways to get started with interactive graphics in R is to use the ggplotly function in the plotly library. It converts ggplot objects into their interactive counterparts.
Let’s create some plots with ggplot and see what happens when we make them interactive.
# Form a bar chart showing the number of flights from each airport
p <- ggplot(flights, aes(x = origin)) +
geom_bar()
p
ggplotly(p)
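The object that ggplotly() returns is no longer a ggplot; it is a plotly htmlwidget. A short sketch of what that means in practice: plotly functions such as layout() can be applied to it directly, and plotly_build() exposes the plotly.js traces it generated. The title string is just an illustration.

```r
library(ggplot2)
library(plotly)

# The same bar chart of flights per origin airport
p <- ggplot(nycflights13::flights, aes(x = origin)) +
  geom_bar()

# ggplotly() returns a plotly htmlwidget rather than a ggplot object
w <- ggplotly(p)
class(w)

# Because it is a plotly object, plotly's own functions apply to it
w <- w %>% layout(title = "Flights by origin airport")

# plotly_build() shows the underlying plotly.js traces
built <- plotly_build(w)
length(built$x$data)
```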
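Here’s a boxplot example which shows the distribution of departure delays across airports, with the y-axis on a log2 scale. Because log2 is undefined for negative delays (flights that left early) and gives -Inf for a delay of 0 (flights that left exactly on time), ggplot drops those rows along with the missing delays, which is what the warnings below report.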
p <- ggplot(flights, aes(x = origin, y = dep_delay)) +
geom_boxplot() +
scale_y_continuous(trans='log2')
p
## Warning in self$trans$transform(x): NaNs produced
## Warning: Transformation introduced infinite values in continuous y-axis
## Warning: Removed 208344 rows containing non-finite values (stat_boxplot).
ggplotly(p)
## Warning in self$trans$transform(x): NaNs produced
## Warning: Transformation introduced infinite values in continuous y-axis
## Warning: Removed 208344 rows containing non-finite values (stat_boxplot).
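A quick tally of dep_delay by sign shows where the dropped rows in those warnings come from. This is a small sketch: log2() maps 0 to -Inf and negative numbers to NaN, so only the positive delays survive the log2 axis, and the missing, zero, and negative counts together make up the removed rows.

```r
library(dplyr)
flights <- nycflights13::flights

# Count departure delays by category; only "positive" is finite after log2()
delay_counts <- flights %>%
  summarize(missing  = sum(is.na(dep_delay)),
            zero     = sum(dep_delay == 0, na.rm = TRUE),
            negative = sum(dep_delay < 0, na.rm = TRUE),
            positive = sum(dep_delay > 0, na.rm = TRUE))
delay_counts
```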
Note that plotly is its own graphing library. It just happens to be particularly convenient to use ggplotly, because it lets us make interactive versions of graphics that we already have experience constructing. Here’s an example of a ggplotly version vs. a plotly version of the boxplot. I’m switching to the gapminder data because htmlwidgets get very resource intensive with large data (flights has more than 336,000 rows).
p <- ggplot(gapminder, aes(continent, lifeExp, color=continent)) +
geom_boxplot()
ggplotly(p)
plot_ly(gapminder, x = ~continent, y = ~lifeExp, color = ~continent, type = "box")
Here’s how we would do log-scaling for a plotly plot. First, a plot without log scaling on the y-axis.
plot_ly(gapminder, x = ~continent, y = ~gdpPercap, color = ~continent, type = "box")
Now a plot with logarithmic y-axis scaling, as controlled through the layout function:
plot_ly(gapminder, x = ~continent, y = ~gdpPercap, color = ~continent, type = "box") %>%
layout(yaxis = list(type = "log"))
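For comparison, here is a sketch of the two routes to a log-scaled y-axis, using the diamonds data so that no values are dropped (price is always positive). With plot_ly(), layout(yaxis = list(type = "log")) only changes how the axis is drawn, so the box statistics are computed on the raw prices. With ggplot2, scale_y_log10() transforms the data before the box statistics are computed, so the whiskers and outliers can differ between the two versions.

```r
library(ggplot2)
library(plotly)

# Native plotly: statistics on raw prices, log scale applied to the axis only
fig_native <- plot_ly(diamonds, x = ~cut, y = ~price, type = "box") %>%
  layout(yaxis = list(type = "log"))

# ggplot2 + ggplotly: price is log10-transformed before geom_boxplot's stats
p <- ggplot(diamonds, aes(x = cut, y = price)) +
  geom_boxplot() +
  scale_y_log10()
fig_gg <- ggplotly(p)

fig_native
fig_gg
```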
Now let’s look at an example where we calculate the average departure delay for flights out of LGA for each destination airport, and produce a plot that contains that information. In this plot the dot size represents the number of flights from LGA to that destination.
p <- flights %>%
filter(origin == "LGA") %>%
group_by(dest) %>%
summarize(av_dep_delay = mean(dep_delay, na.rm = TRUE),
count = n()) %>%
filter(count > 50) %>%
mutate(dest = reorder(dest, av_dep_delay)) %>%
ggplot(aes(x = dest, y = av_dep_delay,
size = count)) +
geom_point(alpha = 0.5) +
scale_size_area() +
ylab("Average departure delay") +
xlab("Destination airport") +
theme(axis.text.x = element_text(angle = 90, hjust = 1))
p
ggplotly(p)
Now here’s a scatterplot example with the diamonds data. We’ll start by subsampling the data so we don’t have so many points. The sample_n function makes it easy to sample a subset of the rows of the data.
diamonds.sub <- diamonds %>%
sample_n(2000)
p <- ggplot(diamonds.sub, aes(x = carat, y = price, color = color)) +
geom_point()
p
ggplotly(p)
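Because sample_n() draws rows at random, the subsample (and therefore the plot) changes every time the chunk runs. A minimal sketch of making it reproducible with set.seed(); the seed value 2013 is arbitrary. In dplyr 1.0.0 and later, slice_sample() supersedes sample_n(), so the last part only runs on those versions.

```r
library(dplyr)
library(ggplot2)

# Fixing the seed gives the same subsample on every run
set.seed(2013)
sub1 <- sample_n(diamonds, 2000)
set.seed(2013)
sub2 <- sample_n(diamonds, 2000)
identical(sub1, sub2)   # TRUE

# Newer dplyr: slice_sample() is the replacement for sample_n()
if (packageVersion("dplyr") >= "1.0.0") {
  set.seed(2013)
  sub3 <- slice_sample(diamonds, n = 2000)
  nrow(sub3)
}
```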
The default behavior for ggplotly is to show the values of all aesthetic mappings in the hover text. It is also possible to customize what gets displayed. The most general way of doing this is to specify a text aesthetic that contains the information you want to see. In the example below we set text to the carat, clarity, color and cut of the diamond. The paste function pastes values together into a single string, separated by the sep argument. Setting sep = "\n" puts every element on a new line. Passing tooltip = "text" to ggplotly then restricts the hover text to that aesthetic.
p <- ggplot(diamonds.sub, aes(x = carat, y = price, color = color,
text = paste(carat, clarity, color, cut, sep = "\n"))) +
geom_point()
p
ggplotly(p, tooltip = "text")
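Unlabelled values can be hard to read in a tooltip. Here is a sketch that adds labels with paste0(), which joins its pieces with no separator, so the "\n" line breaks are written out explicitly. It also shows the other use of the tooltip argument: passing a vector of aesthetic names, here c("x", "y"), to show only those mappings. The label wording is our own.

```r
library(dplyr)
library(ggplot2)
library(plotly)

set.seed(2013)
diamonds.sub <- sample_n(diamonds, 2000)

# Labelled hover text, one field per line
p <- ggplot(diamonds.sub,
            aes(x = carat, y = price, color = color,
                text = paste0("Carat: ", carat,
                              "\nClarity: ", clarity,
                              "\nCut: ", cut))) +
  geom_point()

# Show only the custom text
w_text <- ggplotly(p, tooltip = "text")

# Show only the x and y mappings
w_xy <- ggplotly(p, tooltip = c("x", "y"))

w_text
w_xy
```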
p <- ggplot(diamonds.sub, aes(x = carat, y = price, color = color)) +
geom_point(alpha = 0.5) +
geom_smooth()
p
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
ggplotly(p)
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
Here we’ll have a look at how home sales have varied over time. We’ll focus first on sales in Austin, TX.
p <- txhousing %>%
filter(city == "Austin") %>%
ggplot(aes(x = month, y = sales, group = year)) +
geom_line()
ggplotly(p)
ggplot and plotly make it really easy to create animations across time (or across any other variable of interest). To do this, you simply need to specify a frame variable.
p <- txhousing %>%
filter(city == "Austin") %>%
ggplot(aes(x = month, y = sales, frame = year)) +
geom_line()
ggplotly(p)
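The same animation can be written directly in plotly, where frame is an argument of plot_ly() rather than a ggplot aesthetic. A minimal sketch, assuming we want a plain line chart (type = "scatter" with mode = "lines"):

```r
library(dplyr)
library(ggplot2)   # provides the txhousing data
library(plotly)

austin <- txhousing %>% filter(city == "Austin")

# One animation frame per value of year
fig <- plot_ly(austin, x = ~month, y = ~sales, frame = ~year,
               type = "scatter", mode = "lines")
fig
```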
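You can animate certain layers while keeping others static. It all depends on which layer you specify the frame variable in. Here’s an example where we have all of the years in the background, with the current year highlighted in blue. ggplot2 warns that it is ignoring the unknown frame aesthetic; this is expected, since frame is only used by ggplotly.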
p <- txhousing %>%
filter(city == "Austin") %>%
ggplot(aes(x = month, y = sales)) +
geom_line(aes(group = year), alpha = 0.2) +
geom_line(aes(frame = year), color = "steelblue", size = 2)
## Warning: Ignoring unknown aesthetics: frame
ggplotly(p)
Let’s have a look at several cities at the same time. Note that we’re using the animation_opts() function here to change properties of the plotly animation. Its frame argument controls the amount of time between frames (in milliseconds).
p <- txhousing %>%
filter(city %in% c("Austin", "Dallas", "Houston", "San Antonio")) %>%
ggplot(aes(x = month, y = sales)) +
geom_line(aes(group = year), alpha = 0.2) +
geom_line(aes(frame = year), color = "steelblue", size = 1) +
facet_grid(. ~ city)
## Warning: Ignoring unknown aesthetics: frame
ggplotly(p) %>%
animation_opts(frame = 1000)
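animation_opts() also has a transition argument for the length of the smooth transition between frames, and animation_slider() customizes the slider under the plot. A sketch on the Austin data, assuming a 500 ms transition and a "Year: " prefix on the slider label (both our own choices):

```r
library(dplyr)
library(ggplot2)
library(plotly)

austin <- txhousing %>% filter(city == "Austin")

fig <- plot_ly(austin, x = ~month, y = ~sales, frame = ~year,
               type = "scatter", mode = "lines") %>%
  # Frames start 1000 ms apart; the transition between them takes 500 ms
  animation_opts(frame = 1000, transition = 500) %>%
  # Prefix the slider's current-value label
  animation_slider(currentvalue = list(prefix = "Year: "))
fig
```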
Through the animation options you can also change how the frames transition from one to the next by setting the easing parameter. There are many options, such as "linear", "bounce", and "elastic"; the help page ?animation_opts points to the full list.
ggplotly(p) %>%
animation_opts(frame = 1000, easing = "elastic")
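First we’ll look at how life expectancy changes over time across countries. The animation starts in 1952, and the countries are ordered by their life expectancy in that first year: function(.x) .x[1] picks each country’s first value, which is its 1952 value because gapminder is sorted by year within each country.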
p <- gapminder %>%
mutate(country = reorder(country, lifeExp, function(.x) .x[1])) %>%
ggplot(aes(x = country, y = lifeExp, color = continent, size = pop)) +
geom_point(aes(frame = year)) +
theme(axis.text.x = element_text(angle = 60, vjust = 1, hjust = 1))
## Warning: Ignoring unknown aesthetics: frame
ggplotly(p) %>%
animation_opts(1000)
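reorder() sorts factor levels by a per-group summary, so the function you pass matters. Here is a toy comparison (hypothetical values, not gapminder) of ordering by each group’s first value, as the code above does, versus by its minimum, which gives a different order when a country’s life expectancy dips after 1952.

```r
# Hypothetical toy data, sorted by year within each country like gapminder
toy <- data.frame(
  country = factor(c("A", "A", "B", "B")),
  year    = c(1952, 1957, 1952, 1957),
  lifeExp = c(50, 40, 45, 60)
)

# Order by first (1952) value: B (45) comes before A (50)
by_first <- reorder(toy$country, toy$lifeExp, function(.x) .x[1])
levels(by_first)   # "B" "A"

# Order by minimum value: A (40) comes before B (45)
by_min <- reorder(toy$country, toy$lifeExp, FUN = min)
levels(by_min)     # "A" "B"
```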
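Here’s an animated plot that shows life expectancy and GDP evolving over time. The faint geom_point(alpha = 0.1) layer has no frame, so every year stays visible in the background. The ids = country aesthetic tells plotly which point belongs to which country, so each point moves smoothly from one frame to the next, and the redraw = FALSE option skips a full redraw of the plot after each transition, which makes the animation faster.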
p <- ggplot(gapminder, aes(x = gdpPercap, y = lifeExp, color = continent, size = pop)) +
geom_point(alpha = 0.1) +
geom_point(aes(frame = year, ids = country)) +
scale_x_continuous(trans = "log10")
## Warning: Ignoring unknown aesthetics: frame, ids
ggplotly(p) %>%
animation_opts(1000, redraw = FALSE)
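For comparison, here is a sketch of the same bubble animation written natively in plot_ly(), where frame and ids are plot_ly() arguments and the log x-axis comes from layout(). Unlike the ggplotly version above, it leaves out the faint background layer.

```r
library(plotly)
data(gapminder, package = "gapminder")

fig <- plot_ly(gapminder, x = ~gdpPercap, y = ~lifeExp,
               size = ~pop, color = ~continent,
               frame = ~year, ids = ~country,
               type = "scatter", mode = "markers") %>%
  # Log-scale the x-axis, as scale_x_continuous(trans = "log10") does above
  layout(xaxis = list(type = "log")) %>%
  animation_opts(1000, redraw = FALSE)
fig
```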
There’s a ton more that one can do with interactive graphics (and tables!) in R.
Some of the examples used in today’s lecture were borrowed from Carson Sievert’s awesome slides. I encourage you to have a further look through those slides to see some of the other things you can do with ggplotly. Things like joint “brushing” and “filtering” are particularly useful if you’re designing interactive dashboards.
You should also have a look at htmlwidgets, the framework that both plotly and DT are built on.