94-842
February 19, 2015
Common import/export tasks
R with SQL
sqldf
packageRSQLite
, RMySQL
In the next set of slides we'll discuss the following tasks
Import/export from Excel spreadsheets (xlsx files)
Import/export from other common statistical analysis software
Saving data from your R workspace
Saving images and tables
xlsx
package contains useful functions for export/import/modification of xlsx files
One approach: export your xlsx file to csv then using read.csv()
Better approach: use the read.xlsx()
function from the xlsx
package
read.xlsx(file, sheetIndex, sheetName=NULL, rowIndex=NULL,
startRow=NULL, endRow=NULL, colIndex=NULL,
as.data.frame=TRUE, header=TRUE, colClasses=NA,
keepFormulas=FALSE, encoding="unknown", ...)
spreadsheet.xlsx
would simply look likelibrary(xlsx)
my.df <- read.xlsx("spreadsheet.xlsx", header = TRUE)
write.xlsx(x = my.df, file = "spreadsheet.xlsx", sheetName="Sheet1", ...)
foreign
package allows you to import data from various other kinds of statistical software.Function name | Description |
---|---|
read.dta |
Read Stata binary files |
read.mtp |
Read a Minitab Portable Worksheet |
read.spss |
Read an SPSS data file |
read.ssd |
Obtain a Data Frame from a SAS Permanent Dataset, via read.xport |
read.xport |
Read a SAS XPORT Format Library |
write.dta |
Write Files in Stata Binary Format |
write.foreign |
Write text files and code to read them |
Fitting models to large data sets can take a lot of computation time
Instead of re-running all of your code every time you re-open R, you can save variables from your workspace (e.g., save that list of linear models that took plyr
30 minutes to produce)
To save your entire workspace, use
save.image(file = "myworkspace.RData")
save(..., file = "myvariables.RData")
df
and a linear model list lm.list
that you want to store for later use, you can save by writingsave(df, lm.list, file = "myvariables.RData")
load()
functionload(file = "myvariables.RData")
Suppose you have a data frame or matrix that contains variable summaries (e.g., you have some output from your generateDataSummary()
function from Homework 4)
To save the table, you can use write.table()
or write.csv()
data.summary <- generateDataSummary(df, var.names, group.name)
# write.table approach, output as tab-delimited
write.table(data.summary, file = "mytable.txt", sep = "\t")
# write.csv approach, output as csv file
write.table(data.summary, file = "mytable.csv")
By default, plots are output to your graphics window (Plots tab in RStudio)
You can save plots manually from the Plots tab by clicking Export > Save as PDF
To save them automatically you can use bmp()
, jpeg()
, png()
, tiff()
or pdf()
.
pdf()
gives nice vectorized graphics, so it's often the best one to use
pdf(file = "myplot.pdf", width = 6, height = 6) # Start a pdf graphics device
# Code
# to
# generate
# plot
dev.off() # Close the pdf graphics device
Here's an example that saves plots in a loop
library(ggplot2)
# Generate and save a scatterplot of price vs carat for every level of cut
for(cut.level in levels(diamonds[["cut"]])) {
file.name <- paste("./figures/", cut.level, ".pdf", sep = "") # Define file name based on cut
pdf(file = file.name, width = 6, height = 5) # Open pdf graphics device
print(qplot(carat, price, colour = color, data = subset(diamonds, subset = cut == cut.level)))
dev.off() # Close pdf graphics device
}
Most businesses use some kind of database for storing their data (commonly SQL)
We'll discuss two topics:
sqldf
package to interact with R data frames with SQL queriesRSQLite
and RMySQL
to interact with SQL databasesWhy bother with databases?
If you already know some SQL, you might find R's data manipulation functions somewhat unnatural
sqldf
allows you to use SQL queries on your data frames
The data frame is still stored in memory! (You can just query it in a different way)
# Note, if you don't have X11 on your Mac, loading sqldf will crash your machine
library(sqldf)
Let's go back to our diamonds data set and see how we can interact with it using sqldf()
This is easier to do in a demo than with slides…