Wrapping up Lecture 1 content
Importing data
Simple summaries of categorical and continuous data
Coding style
Review homework grading rubric
Lab 2
Fall 2020
Let’s go back to where we left off in the lecture 1 slides.
# Load tidyverse
library(tidyverse)
If you don't have tidyverse installed yet, run the following code in your Console:
install.packages("tidyverse")
In this class we’ll learn both about “base R” and the “tidyverse”
The “tidyverse” describes a set of packages developed by RStudio in an attempt to streamline and unify data import, export, manipulation, summarization, and visualization tasks
To be able to write custom functions and work with packages outside of the tidyverse (most R packages are not “tidy”!) it’s worth learning some base R
Start with survey results from “Homework 0”
To import tabular data into R, we use the read.table() command
survey <- read.table("http://www.andrew.cmu.edu/user/achoulde/94842/data/survey_data2020.csv", header=TRUE, sep=",")
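This tells R to read in survey_data2020.csv, which is a file on the course website
header=TRUE tells R that the file has a header as its first row
sep="," tells R that the values in each row are separated by commas
We could also have used read.csv(), which is read.table() with presets that include header=TRUE and sep=","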
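As a small offline check of the read.csv() remark, the sketch below reads the same comma-separated text both ways. The toy values and the object names csv.text, toy.table and toy.csv are made up for illustration; the text= argument stands in for a file path or URL.

```r
# Toy comma-separated text standing in for a CSV file (hypothetical values)
csv.text <- "Program,TVhours
PPM,10.5
Other,3
MISM,0"

# read.table() has to be told about the header row and the separator
toy.table <- read.table(text = csv.text, header = TRUE, sep = ",")

# read.csv() presets header = TRUE and sep = ","
toy.csv <- read.csv(text = csv.text)

# Compare the two results
all.equal(toy.table, toy.csv)
```

For simple files like this one the two calls should give the same data frame; read.csv() just saves some typing.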
survey is a data.frame object. This is the standard base R data object.
class(survey)
## [1] "data.frame"
head()
head(survey, 3)
##   Program                PriorExp      Rexperience OperatingSystem TVhours
## 1     PPM         Some experience       Never used         Windows    10.5
## 2   Other    Extensive experience Basic competence        Mac OS X     3.0
## 3    MISM Never programmed before Basic competence         Windows     0.0
##           Editor
## 1          Other
## 2 Microsoft Word
## 3 Microsoft Word
head(data.frame, n) returns the first n rows of the data frame
In the Console, you can also use View(survey) to get a spreadsheet view
Use the str() function to get a simple summary of your data frame object
str(survey)
## 'data.frame': 57 obs. of 6 variables:
##  $ Program        : Factor w/ 3 levels "MISM","Other",..: 3 2 1 3 3 3 3 3 3 2 ...
##  $ PriorExp       : Factor w/ 3 levels "Extensive experience",..: 3 1 2 2 2 3 2 3 3 3 ...
##  $ Rexperience    : Factor w/ 4 levels "Basic competence",..: 4 1 1 4 4 1 4 3 1 1 ...
##  $ OperatingSystem: Factor w/ 3 levels "Linux/Unix","Mac OS X",..: 3 2 3 3 3 2 2 2 3 3 ...
##  $ TVhours        : num 10.5 3 0 10 4 0 2 20 4 0 ...
##  $ Editor         : Factor w/ 5 levels "Excel","LaTeX",..: 4 3 3 1 3 3 3 4 3 3 ...
summary(survey)
##   Program                     PriorExp                Rexperience
##  MISM : 9   Extensive experience   : 8   Basic competence    :24
##  Other:10   Never programmed before: 8   Experienced         : 6
##  PPM  :38   Some experience        :41   Installed on machine: 7
##                                          Never used          :20
##
##
##    OperatingSystem    TVhours                  Editor
##  Linux/Unix: 2     Min.   : 0.000   Excel         : 1
##  Mac OS X  :19     1st Qu.: 3.000   LaTeX         : 5
##  Windows   :36     Median : 5.000   Microsoft Word:40
##                    Mean   : 6.763   Other         : 8
##                    3rd Qu.:10.000   R Markdown    : 3
##                    Max.   :21.000
We will talk more about lists and data frames (and their “tidy” variants, tibbles) next week, but here are a few basics
To see what an R object is made up of, you can use attributes()
attributes(survey)
## $names
## [1] "Program"         "PriorExp"        "Rexperience"     "OperatingSystem"
## [5] "TVhours"         "Editor"
##
## $class
## [1] "data.frame"
##
## $row.names
##  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
## [24] 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46
## [47] 47 48 49 50 51 52 53 54 55 56 57
An R data frame is a list whose columns you can refer to by name or index
The $ symbols in the output are what tell you it’s a list of some kind
Use nrow() and ncol() to determine the number of survey responses and the number of survey questions
nrow(survey) # Number of rows (responses)
## [1] 57
ncol(survey) # Number of columns (questions)
## [1] 6
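To make the "a data frame is a list" point concrete, here is a minimal sketch on a made-up three-row data frame (toy and its values are hypothetical, not the survey data).

```r
# Hypothetical toy data frame with two columns
toy <- data.frame(Program = c("PPM", "Other", "MISM"),
                  TVhours = c(10.5, 3, 0))

is.list(toy)      # a data frame is a list under the hood
names(toy)        # names of the list elements, i.e. the columns
length(toy)       # number of list elements, the same as ncol(toy)
toy[[2]]          # refer to a column by index
toy[["TVhours"]]  # refer to the same column by name
```

Each list element is one column, which is why length() and ncol() agree here.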
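Inline code such as `r nrow(survey)` inserts the result into your text, so the number updates automatically whenever nrow(survey) changes.
Here’s a more complex example of inline code use.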
We collected data on `r ncol(survey)` survey questions from `r nrow(survey)` respondents. Respondents represented `r length(unique(survey[["Program"]]))` CMU programs. `r sum(survey[["Program"]] == "PPM")` of the respondents were from PPM.
Which results in:
We collected data on 6 survey questions from 57 respondents. Respondents represented 3 CMU programs. 38 of the respondents were from PPM.
IMPORTANT: You are expected to use inline code chunks instead of copying and pasting output whenever possible.
survey[["Program"]] # "Program" element
## [1] PPM Other MISM PPM PPM PPM PPM PPM PPM Other PPM ## [12] PPM PPM PPM PPM PPM PPM MISM MISM PPM PPM Other ## Levels: MISM Other PPM
survey$Program # "Program" element
## [1] PPM Other MISM PPM PPM PPM PPM PPM PPM Other PPM ## [12] PPM PPM PPM PPM PPM PPM MISM MISM PPM PPM Other ## Levels: MISM Other PPM
survey[,1] # Data from 1st column
## [1] PPM Other MISM PPM PPM PPM PPM PPM PPM Other PPM ## [12] PPM PPM PPM PPM PPM PPM MISM MISM PPM PPM Other ## Levels: MISM Other PPM
survey[["Program"]] # Returns the Program column as a vector
## [1] PPM Other MISM PPM PPM PPM PPM PPM PPM Other PPM ## [12] PPM PPM PPM PPM PPM PPM MISM MISM PPM PPM Other ## Levels: MISM Other PPM
survey["Program"] # single column data frame containing only "Program"
## Program ## 1 PPM ## 2 Other ## 3 MISM ## 4 PPM ## 5 PPM ## 6 PPM ## 7 PPM ## 8 PPM ## 9 PPM ## 10 Other ## 11 PPM ## 12 PPM ## 13 PPM ## 14 PPM ## 15 PPM ## 16 PPM ## 17 PPM ## 18 MISM ## 19 MISM ## 20 PPM ## 21 PPM ## 22 Other ## 23 PPM ## 24 PPM ## 25 Other ## 26 Other ## 27 Other ## 28 MISM ## 29 PPM ## 30 PPM ## 31 PPM ## 32 MISM ## 33 PPM ## 34 PPM ## 35 PPM ## 36 PPM ## 37 MISM ## 38 MISM ## 39 Other ## 40 PPM ## 41 PPM ## 42 Other ## 43 PPM ## 44 PPM ## 45 PPM ## 46 PPM ## 47 PPM ## 48 PPM ## 49 MISM ## 50 PPM ## 51 Other ## 52 PPM ## 53 PPM ## 54 PPM ## 55 PPM ## 56 Other ## 57 MISM
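The difference between the extraction styles above is mostly about what type comes back. A minimal sketch on a hypothetical toy data frame (not the survey data); drop = FALSE is a standard argument of `[` for data frames:

```r
# Hypothetical toy data frame
toy <- data.frame(Program = c("PPM", "Other", "MISM"),
                  TVhours = c(10.5, 3, 0))

col.vec <- toy[["Program"]]  # double brackets: the column itself, as a vector
col.df  <- toy["Program"]    # single brackets: a one-column data frame

class(col.vec)
class(col.df)

# $ and [, j] also return the column as a vector
identical(toy$Program, col.vec)
identical(toy[, 1], col.vec)

# Keep the data frame structure when selecting one column by position
toy[, 1, drop = FALSE]
```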
Here we’ll use qplot() from the ggplot2 library (part of tidyverse)
qplot(survey$Program)
qplot(survey[["TVhours"]], binwidth = 3, fill = I("steelblue"))
# Data from 1st and 5th columns
survey[, c(1,5)]
## Program TVhours ## 1 PPM 10.5 ## 2 Other 3.0 ## 3 MISM 0.0 ## 4 PPM 10.0 ## 5 PPM 4.0 ## 6 PPM 0.0 ## 7 PPM 2.0 ## 8 PPM 20.0 ## 9 PPM 4.0 ## 10 Other 0.0 ## 11 PPM 15.0 ## 12 PPM 5.0 ## 13 PPM 20.0 ## 14 PPM 10.0 ## 15 PPM 5.0 ## 16 PPM 2.0 ## 17 PPM 14.0 ## 18 MISM 10.0 ## 19 MISM 4.0 ## 20 PPM 3.0 ## 21 PPM 6.0 ## 22 Other 10.0 ## 23 PPM 2.0 ## 24 PPM 3.0 ## 25 Other 3.0 ## 26 Other 1.0 ## 27 Other 1.0 ## 28 MISM 3.0 ## 29 PPM 5.0 ## 30 PPM 10.0 ## 31 PPM 20.0 ## 32 MISM 0.0 ## 33 PPM 9.0 ## 34 PPM 3.0 ## 35 PPM 4.0 ## 36 PPM 8.0 ## 37 MISM 7.0 ## 38 MISM 8.0 ## 39 Other 10.0 ## 40 PPM 10.0 ## 41 PPM 4.0 ## 42 Other 10.0 ## 43 PPM 4.0 ## 44 PPM 0.0 ## 45 PPM 1.0 ## 46 PPM 7.0 ## 47 PPM 2.0 ## 48 PPM 15.0 ## 49 MISM 8.0 ## 50 PPM 10.0 ## 51 Other 2.0 ## 52 PPM 3.0 ## 53 PPM 4.0 ## 54 PPM 21.0 ## 55 PPM 10.0 ## 56 Other 20.0 ## 57 MISM 0.0
# Data from "Program" and "Editor"
survey[c("Program", "Editor")]
## Program Editor ## 1 PPM Other ## 2 Other Microsoft Word ## 3 MISM Microsoft Word ## 4 PPM Excel ## 5 PPM Microsoft Word ## 6 PPM Microsoft Word ## 7 PPM Microsoft Word ## 8 PPM Other ## 9 PPM Microsoft Word ## 10 Other Microsoft Word ## 11 PPM Microsoft Word ## 12 PPM LaTeX ## 13 PPM Microsoft Word ## 14 PPM Microsoft Word ## 15 PPM Other ## 16 PPM Other ## 17 PPM Microsoft Word ## 18 MISM LaTeX ## 19 MISM Microsoft Word ## 20 PPM Microsoft Word ## 21 PPM Microsoft Word ## 22 Other Microsoft Word ## 23 PPM Microsoft Word ## 24 PPM Other ## 25 Other LaTeX ## 26 Other Microsoft Word ## 27 Other Microsoft Word ## 28 MISM Microsoft Word ## 29 PPM R Markdown ## 30 PPM R Markdown ## 31 PPM Microsoft Word ## 32 MISM Microsoft Word ## 33 PPM Microsoft Word ## 34 PPM Microsoft Word ## 35 PPM Microsoft Word ## 36 PPM Microsoft Word ## 37 MISM Microsoft Word ## 38 MISM R Markdown ## 39 Other Microsoft Word ## 40 PPM Microsoft Word ## 41 PPM Microsoft Word ## 42 Other Microsoft Word ## 43 PPM Microsoft Word ## 44 PPM Microsoft Word ## 45 PPM Other ## 46 PPM Microsoft Word ## 47 PPM Other ## 48 PPM Microsoft Word ## 49 MISM LaTeX ## 50 PPM Microsoft Word ## 51 Other Microsoft Word ## 52 PPM Microsoft Word ## 53 PPM Microsoft Word ## 54 PPM Microsoft Word ## 55 PPM LaTeX ## 56 Other Microsoft Word ## 57 MISM Other
It is preferable to use the select() function to select subsets of columns
select(survey, Program, Editor)
## Program Editor ## 1 PPM Other ## 2 Other Microsoft Word ## 3 MISM Microsoft Word ## 4 PPM Excel ## 5 PPM Microsoft Word ## 6 PPM Microsoft Word ## 7 PPM Microsoft Word ## 8 PPM Other ## 9 PPM Microsoft Word ## 10 Other Microsoft Word ## 11 PPM Microsoft Word ## 12 PPM LaTeX ## 13 PPM Microsoft Word ## 14 PPM Microsoft Word ## 15 PPM Other ## 16 PPM Other ## 17 PPM Microsoft Word ## 18 MISM LaTeX ## 19 MISM Microsoft Word ## 20 PPM Microsoft Word ## 21 PPM Microsoft Word ## 22 Other Microsoft Word ## 23 PPM Microsoft Word ## 24 PPM Other ## 25 Other LaTeX ## 26 Other Microsoft Word ## 27 Other Microsoft Word ## 28 MISM Microsoft Word ## 29 PPM R Markdown ## 30 PPM R Markdown ## 31 PPM Microsoft Word ## 32 MISM Microsoft Word ## 33 PPM Microsoft Word ## 34 PPM Microsoft Word ## 35 PPM Microsoft Word ## 36 PPM Microsoft Word ## 37 MISM Microsoft Word ## 38 MISM R Markdown ## 39 Other Microsoft Word ## 40 PPM Microsoft Word ## 41 PPM Microsoft Word ## 42 Other Microsoft Word ## 43 PPM Microsoft Word ## 44 PPM Microsoft Word ## 45 PPM Other ## 46 PPM Microsoft Word ## 47 PPM Other ## 48 PPM Microsoft Word ## 49 MISM LaTeX ## 50 PPM Microsoft Word ## 51 Other Microsoft Word ## 52 PPM Microsoft Word ## 53 PPM Microsoft Word ## 54 PPM Microsoft Word ## 55 PPM LaTeX ## 56 Other Microsoft Word ## 57 MISM Other
Use df[rows, cols] to extract specified rows and cols from a data frame df.
survey[6, 5] # row 6, column 5
## [1] 0
survey[6, "Program"] # Program of 6th survey respondent
## [1] PPM ## Levels: MISM Other PPM
survey[["Program"]][6] # Program of 6th survey respondent
## [1] PPM ## Levels: MISM Other PPM
If you leave, e.g., the rows value blank in df[rows, cols], it will pull all of the rows for the specified cols
Leaving cols blank pulls all the columns for the specified rows
survey[6,] # 6th row
## Program PriorExp Rexperience OperatingSystem TVhours ## 6 PPM Some experience Basic competence Mac OS X 0 ## Editor ## 6 Microsoft Word
survey[,2] # 2nd column
## [1] Some experience Extensive experience ## [3] Never programmed before Never programmed before ## [5] Never programmed before Some experience ## [7] Never programmed before Some experience ## [9] Some experience Some experience ## [11] Never programmed before Extensive experience ## [13] Extensive experience Some experience ## [15] Some experience Some experience ## [17] Some experience Some experience ## [19] Some experience Some experience ## [21] Extensive experience Extensive experience ## [23] Some experience Some experience ## [25] Some experience Some experience ## [27] Some experience Some experience ## [29] Some experience Some experience ## [31] Some experience Some experience ## [33] Some experience Some experience ## [35] Some experience Some experience ## [37] Extensive experience Some experience ## [39] Never programmed before Some experience ## [41] Some experience Some experience ## [43] Some experience Some experience ## [45] Extensive experience Never programmed before ## [47] Some experience Some experience ## [49] Some experience Some experience ## [51] Never programmed before Some experience ## [53] Some experience Some experience ## [55] Extensive experience Some experience ## [57] Some experience ## 3 Levels: Extensive experience ... Some experience
In Lab 1, you were introduced to the colon operator :
We can use this operator for indexing
survey[1:3,] # equivalent to head(survey, 3)
## Program PriorExp Rexperience OperatingSystem TVhours ## 1 PPM Some experience Never used Windows 10.5 ## 2 Other Extensive experience Basic competence Mac OS X 3.0 ## 3 MISM Never programmed before Basic competence Windows 0.0 ## Editor ## 1 Other ## 2 Microsoft Word ## 3 Microsoft Word
survey[3:5, c(1,5)]
## Program TVhours ## 3 MISM 0 ## 4 PPM 10 ## 5 PPM 4
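The indexing patterns above can be tried on any small data frame. The sketch below uses a made-up five-row data frame (toy, with hypothetical columns id, hours and program) so the results are easy to check by eye.

```r
# Hypothetical toy data frame with 5 rows and 3 columns
toy <- data.frame(id = 1:5,
                  hours = c(10.5, 3, 0, 10, 4),
                  program = c("PPM", "Other", "MISM", "PPM", "PPM"))

toy[2, 3]           # a single entry: row 2, column 3
toy[2, ]            # cols left blank: all columns for row 2
toy[, 2]            # rows left blank: all rows of column 2, as a vector
toy[1:3, ]          # rows 1 through 3, the same as head(toy, 3)
toy[3:5, c(1, 3)]   # rows 3 through 5, columns 1 and 3
```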
We are often interested in learning something about a specific subset of the data
survey[survey$Program=="MISM", ] # Data from the MISM students
survey[which(survey$Program=="MISM"), ] # Does the same thing
## Program PriorExp Rexperience OperatingSystem ## 3 MISM Never programmed before Basic competence Windows ## 18 MISM Some experience Experienced Windows ## 19 MISM Some experience Basic competence Mac OS X ## 28 MISM Some experience Basic competence Mac OS X ## 32 MISM Some experience Experienced Windows ## 37 MISM Extensive experience Experienced Mac OS X ## 38 MISM Some experience Experienced Windows ## 49 MISM Some experience Never used Linux/Unix ## 57 MISM Some experience Basic competence Mac OS X ## TVhours Editor ## 3 0 Microsoft Word ## 18 10 LaTeX ## 19 4 Microsoft Word ## 28 3 Microsoft Word ## 32 0 Microsoft Word ## 37 7 Microsoft Word ## 38 8 R Markdown ## 49 8 LaTeX ## 57 0 Other
Let’s pull all of the PPM students who have never used R before
survey[survey$Program=="PPM" & survey$Rexperience=="Never used", ]
## Program PriorExp Rexperience OperatingSystem TVhours ## 1 PPM Some experience Never used Windows 10.5 ## 4 PPM Never programmed before Never used Windows 10.0 ## 5 PPM Never programmed before Never used Windows 4.0 ## 7 PPM Never programmed before Never used Mac OS X 2.0 ## 11 PPM Never programmed before Never used Windows 15.0 ## 12 PPM Extensive experience Never used Linux/Unix 5.0 ## 15 PPM Some experience Never used Windows 5.0 ## 17 PPM Some experience Never used Windows 14.0 ## 24 PPM Some experience Never used Windows 3.0 ## 31 PPM Some experience Never used Windows 20.0 ## 43 PPM Some experience Never used Windows 4.0 ## 44 PPM Some experience Never used Windows 0.0 ## 45 PPM Extensive experience Never used Mac OS X 1.0 ## 46 PPM Never programmed before Never used Windows 7.0 ## 53 PPM Some experience Never used Windows 4.0 ## Editor ## 1 Other ## 4 Excel ## 5 Microsoft Word ## 7 Microsoft Word ## 11 Microsoft Word ## 12 LaTeX ## 15 Other ## 17 Microsoft Word ## 24 Other ## 31 Microsoft Word ## 43 Microsoft Word ## 44 Microsoft Word ## 45 Other ## 46 Microsoft Word ## 53 Microsoft Word
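It can help to look at the intermediate logical vector that drives this kind of subsetting. The sketch below uses a hypothetical toy data frame with one missing Program value, to show the one case where indexing by the logical vector and indexing by which() differ. Judging by the summary() output, the survey data has no missing values, so there the two forms agree.

```r
# Hypothetical toy data with one missing Program value
toy <- data.frame(Program = c("PPM", "MISM", NA, "MISM"),
                  TVhours = c(10.5, 4, 2, 0))

is.mism <- toy$Program == "MISM"  # logical vector; NA where Program is missing
is.mism
which(is.mism)                    # positions of the TRUE entries only

by.logical <- toy[is.mism, ]         # the NA entry produces a row of NAs
by.which   <- toy[which(is.mism), ]  # which() silently drops the NA entry

nrow(by.logical)
nrow(by.which)
```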
filter()
In general, it is preferable to use the filter() function
Here’s an example of selecting all responses from students who are either in PPM or Other and who listed their R experience as “Basic competence”.
filter(survey, (Program == "PPM" | Program == "Other") & Rexperience == "Basic competence")
## Program PriorExp Rexperience OperatingSystem TVhours ## 1 Other Extensive experience Basic competence Mac OS X 3 ## 2 PPM Some experience Basic competence Mac OS X 0 ## 3 PPM Some experience Basic competence Windows 4 ## 4 Other Some experience Basic competence Windows 0 ## 5 PPM Extensive experience Basic competence Mac OS X 20 ## 6 PPM Some experience Basic competence Windows 10 ## 7 PPM Some experience Basic competence Windows 3 ## 8 PPM Extensive experience Basic competence Windows 6 ## 9 Other Extensive experience Basic competence Mac OS X 10 ## 10 PPM Some experience Basic competence Mac OS X 2 ## 11 Other Some experience Basic competence Mac OS X 3 ## 12 PPM Some experience Basic competence Mac OS X 5 ## 13 PPM Some experience Basic competence Windows 10 ## 14 PPM Some experience Basic competence Windows 9 ## 15 PPM Some experience Basic competence Windows 10 ## 16 Other Some experience Basic competence Windows 10 ## 17 PPM Some experience Basic competence Mac OS X 2 ## 18 PPM Some experience Basic competence Windows 15 ## 19 PPM Some experience Basic competence Windows 3 ## 20 PPM Some experience Basic competence Windows 21 ## Editor ## 1 Microsoft Word ## 2 Microsoft Word ## 3 Microsoft Word ## 4 Microsoft Word ## 5 Microsoft Word ## 6 Microsoft Word ## 7 Microsoft Word ## 8 Microsoft Word ## 9 Microsoft Word ## 10 Microsoft Word ## 11 LaTeX ## 12 R Markdown ## 13 R Markdown ## 14 Microsoft Word ## 15 Microsoft Word ## 16 Microsoft Word ## 17 Other ## 18 Microsoft Word ## 19 Microsoft Word ## 20 Microsoft Word
filter() allows you to split conditions across lines
filter(survey,
       (Program == "PPM" | Program == "Other") &
         Rexperience == "Basic competence")
filter(survey, Program == "PPM" | Program == "Other", Rexperience == "Basic competence")
## Program PriorExp Rexperience OperatingSystem TVhours ## 1 Other Extensive experience Basic competence Mac OS X 3 ## 2 PPM Some experience Basic competence Mac OS X 0 ## 3 PPM Some experience Basic competence Windows 4 ## 4 Other Some experience Basic competence Windows 0 ## 5 PPM Extensive experience Basic competence Mac OS X 20 ## 6 PPM Some experience Basic competence Windows 10 ## 7 PPM Some experience Basic competence Windows 3 ## 8 PPM Extensive experience Basic competence Windows 6 ## 9 Other Extensive experience Basic competence Mac OS X 10 ## 10 PPM Some experience Basic competence Mac OS X 2 ## 11 Other Some experience Basic competence Mac OS X 3 ## 12 PPM Some experience Basic competence Mac OS X 5 ## 13 PPM Some experience Basic competence Windows 10 ## 14 PPM Some experience Basic competence Windows 9 ## 15 PPM Some experience Basic competence Windows 10 ## 16 Other Some experience Basic competence Windows 10 ## 17 PPM Some experience Basic competence Mac OS X 2 ## 18 PPM Some experience Basic competence Windows 15 ## 19 PPM Some experience Basic competence Windows 3 ## 20 PPM Some experience Basic competence Windows 21 ## Editor ## 1 Microsoft Word ## 2 Microsoft Word ## 3 Microsoft Word ## 4 Microsoft Word ## 5 Microsoft Word ## 6 Microsoft Word ## 7 Microsoft Word ## 8 Microsoft Word ## 9 Microsoft Word ## 10 Microsoft Word ## 11 LaTeX ## 12 R Markdown ## 13 R Markdown ## 14 Microsoft Word ## 15 Microsoft Word ## 16 Microsoft Word ## 17 Other ## 18 Microsoft Word ## 19 Microsoft Word ## 20 Microsoft Word
What if we want to use both filter and select?
# First, get the desired rows
row.subset <- filter(survey, Program == "PPM" | Program == "Other", Rexperience == "Basic competence")
# Then, get the right columns
select(row.subset, TVhours, Editor)
## TVhours Editor ## 1 3 Microsoft Word ## 2 0 Microsoft Word ## 3 4 Microsoft Word ## 4 0 Microsoft Word ## 5 20 Microsoft Word ## 6 10 Microsoft Word ## 7 3 Microsoft Word ## 8 6 Microsoft Word ## 9 10 Microsoft Word ## 10 2 Microsoft Word ## 11 3 LaTeX ## 12 5 R Markdown ## 13 10 R Markdown ## 14 9 Microsoft Word ## 15 10 Microsoft Word ## 16 10 Microsoft Word ## 17 2 Other ## 18 15 Microsoft Word ## 19 3 Microsoft Word ## 20 21 Microsoft Word
%>%
%>% is pronounced “pipe”
filter(survey, Program == "PPM" | Program == "Other", Rexperience == "Basic competence") %>%
  select(TVhours, Editor)
## TVhours Editor ## 1 3 Microsoft Word ## 2 0 Microsoft Word ## 3 4 Microsoft Word ## 4 0 Microsoft Word ## 5 20 Microsoft Word ## 6 10 Microsoft Word ## 7 3 Microsoft Word ## 8 6 Microsoft Word ## 9 10 Microsoft Word ## 10 2 Microsoft Word ## 11 3 LaTeX ## 12 5 R Markdown ## 13 10 R Markdown ## 14 9 Microsoft Word ## 15 10 Microsoft Word ## 16 10 Microsoft Word ## 17 2 Other ## 18 15 Microsoft Word ## 19 3 Microsoft Word ## 20 21 Microsoft Word
When piping, it is best to pipe right from the start
# OK:
filter(survey, Program == "PPM" | Program == "Other", Rexperience == "Basic competence") %>%
  select(TVhours, Editor)
# Better:
survey %>%
  filter(Program == "PPM" | Program == "Other", Rexperience == "Basic competence") %>%
  select(TVhours, Editor)
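One way to read the pipe is that x %>% f(y) is just another way of writing f(x, y). The sketch below checks this on a hypothetical toy data frame, assuming the dplyr package (part of the tidyverse) is installed; the object names nested and piped are made up for illustration.

```r
library(dplyr)  # provides filter(), select() and the %>% pipe

# Hypothetical toy data frame
toy <- data.frame(Program = c("PPM", "Other", "PPM", "MISM"),
                  Rexperience = c("Never used", "Basic competence",
                                  "Basic competence", "Experienced"),
                  TVhours = c(10.5, 3, 4, 0))

# Nested calls: read from the inside out
nested <- select(filter(toy, Program == "PPM"), TVhours)

# Piped calls: read from top to bottom
piped <- toy %>%
  filter(Program == "PPM") %>%
  select(TVhours)

identical(nested, piped)
```

Both versions do the same work; the piped form simply lists the steps in the order they happen.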
As your function calls get longer and more complicated, you may find it useful to split them over multiple lines
Suppose you had something like this:
survey[(survey$Program == "PPM" | survey$Program == "Other") & survey$Rexperience == "Basic competence", ]
You could split it across lines like this:
survey[(survey$Program == "PPM" | survey$Program == "Other") &
         survey$Rexperience == "Basic competence", ]
Note that the line break comes after the & operator
mean(survey$TVhours[survey$Program == "PPM"]) # Average time PPM students spent watching TV
## [1] 7.513158
mean(survey$TVhours[survey$Program == "MISM"]) # Average time MISM's spent watching TV
## [1] 4.444444
mean(survey$TVhours[survey$Program == "Other"]) # Average time "Others" spent watching TV
## [1] 6
group_by and summarize
Here’s a much easier and cleaner way of getting the average TV hours watched by students in each program. We use group_by and summarize
survey %>% group_by(Program) %>% summarize(mean(TVhours))
## # A tibble: 3 x 2
##   Program `mean(TVhours)`
##   <fct>             <dbl>
## 1 MISM               4.44
## 2 Other              6
## 3 PPM                7.51
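The summary column above gets the unwieldy name `mean(TVhours)`. As a minimal sketch, assuming dplyr is installed, summarize() also accepts name = value pairs, which gives the result readable column names. The toy data and the names mean.tv and n.students are hypothetical.

```r
library(dplyr)  # provides group_by(), summarize(), n() and %>%

# Hypothetical toy data frame
toy <- data.frame(Program = c("PPM", "MISM", "PPM", "MISM"),
                  TVhours = c(10, 4, 2, 0))

by.program <- toy %>%
  group_by(Program) %>%
  summarize(mean.tv = mean(TVhours),  # named summary column
            n.students = n())         # number of rows in each group

by.program
```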
tv.hours <- survey$TVhours # Vector of TVhours watched
mean(tv.hours) # Average time spent watching TV
## [1] 6.763158
sd(tv.hours) # Standard deviation of TV watching time
## [1] 5.778737
sum(tv.hours >= 5) # How many people watched 5 or more hours of TV?
## [1] 29
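The sum(tv.hours >= 5) line works because a comparison returns a logical vector, and TRUE counts as 1 when summed. A small sketch with made-up numbers (hours is a hypothetical vector, not the survey data):

```r
# Hypothetical vector of weekly TV hours
hours <- c(10.5, 3, 0, 10, 4)

at.least.five <- hours >= 5  # logical vector, one TRUE/FALSE per person
at.least.five

sum(at.least.five)   # number of TRUE values: people with 5 or more hours
mean(at.least.five)  # proportion of TRUE values
```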
Coding style (and code commenting) will become increasingly important as we get into more advanced and involved programming tasks
Borrowing Hadley Wickham’s words:
You don’t have to use my style, but you really should use a consistent style.
This style guide is short and easy to follow
We’ll revisit the question of coding style several times over the course of the class
Assignment operator: use <- instead of = as the assignment operator
student.names <- c("Eric", "Hao", "Jennifer") # Good
student.names = c("Eric", "Hao", "Jennifer") # Bad
Use = when specifying function arguments
sort(tv.hours, decreasing=TRUE) # Good
sort(tv.hours, decreasing<-TRUE) # Works, but not what you want: it also creates a variable named decreasing
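To see why <- inside a function call is "not what you want", here is a minimal sketch with a made-up vector (hours is hypothetical). The second call still sorts in decreasing order, but only because TRUE happens to land in the second argument position; as a side effect it also creates a variable in your workspace.

```r
# Hypothetical vector of weekly TV hours
hours <- c(3, 10.5, 0)

# = names the argument; nothing else changes
sorted.good <- sort(hours, decreasing = TRUE)

# <- performs an assignment first, then passes the value by position
sorted.odd <- sort(hours, decreasing <- TRUE)

identical(sorted.good, sorted.odd)  # same sorted result this time
decreasing                          # a variable created by the second call
```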
Binary operators should have spaces around them
Commas should have a space after, but not before (just like in writing)
3 * 4 # Good
3*4 # Bad
which(student.names == "Eric") # Good
which(student.names=="Eric") # Bad
Spacing around = when specifying function arguments is optional
sort(tv.hours, decreasing=TRUE) # Accepted
sort(tv.hours, decreasing = FALSE) # Accepted
To make code easy to read, debug, and maintain, you should use concise but descriptive variable names
Terms in variable names should be separated by _ or .
# Accepted
day_one
day.one
day_1
day.1
day1
# Bad
d1
DayOne
dayone
# Can be made more concise:
first.day.of.the.month
# EXTREMELY bad:
c
T
pi
sum
mean