Your grade in this course will be determined by a series of five weekly homework assignments, lab participation, two exams, and a final project.
Assignments (20%)
Each weekly assignment will take the form of a single R Markdown file, that is, code snippets integrated with captions and other narrative. Unless otherwise indicated, all assignments are due before the start of the Wednesday class session (10:30 AM) on the dates indicated on Canvas.
Your assignment score for the course will be calculated by averaging your four (4) highest homework scores. That is, your lowest homework score will not count toward your grade.
While the homework assignments may vary in length and/or difficulty, each will be graded out of a possible 20 points.
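The drop-lowest rule above can be sketched in a few lines. This is only an illustration, written in Python rather than the R used in the course; the function name and the example scores are hypothetical, and it assumes all five assignments have a recorded score (a missing assignment would be entered as 0).

```python
def assignment_score(scores, keep=4, max_points=20):
    """Average the `keep` highest homework scores, as a fraction of max_points.

    Hypothetical helper: assumes every assignment has a recorded score.
    """
    if len(scores) < keep:
        raise ValueError("need at least `keep` scores")
    # Sort from highest to lowest and keep the best four; the lowest is dropped.
    top = sorted(scores, reverse=True)[:keep]
    return sum(top) / (keep * max_points)


# Hypothetical scores out of 20 for the five assignments.
example_scores = [18, 12, 20, 16, 19]
print(assignment_score(example_scores))  # the 12 is dropped: 73 / 80 = 0.9125
```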
Lab participation (10%)
In addition to the two lectures, there is a weekly lab session that meets in HBH 1002 from 10:30 to 11:30 AM each Friday. Lab attendance is mandatory and counts for 10% of your final grade. During the one-hour lab session, students will get hands-on practice with the week's material by completing a set of structured data analytic exercises. Tasks may include, but are not limited to, running or modifying code from the lecture, applying the week's methods to data, creating visualizations, and writing short reports.
There is a lab every Friday, with the exception of the Thanksgiving holiday in late November and the last week of class. Thus there are a total of 5 lab sessions. The 4th session is reserved for an in-class midterm and therefore does not count toward your participation score, which leaves 4 regular sessions. Your participation score for the course will be calculated based on the number of "regular" (non-midterm) lab sessions you attend and participate in, as specified by the table below.
Labs attended     | 0 | 1   | 2   | 3-4 |
Points (max = 10) | 0 | 3.3 | 6.7 | 10  |
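As a quick check of how the table is read, the lookup can be written as a small mapping. This is a sketch in Python for illustration only; the names `PARTICIPATION_POINTS` and `participation_points` are hypothetical, and the values are copied directly from the table.

```python
# Points earned for each number of regular (non-midterm) labs attended,
# copied from the table; 3 and 4 labs both earn the maximum of 10.
PARTICIPATION_POINTS = {0: 0.0, 1: 3.3, 2: 6.7, 3: 10.0, 4: 10.0}


def participation_points(labs_attended):
    """Return the participation points for a count of regular labs attended."""
    if labs_attended not in PARTICIPATION_POINTS:
        raise ValueError("labs_attended must be an integer from 0 to 4")
    return PARTICIPATION_POINTS[labs_attended]


print(participation_points(2))  # 6.7
print(participation_points(3))  # 10.0
```

Because 3 and 4 labs map to the same score, one regular lab can be missed without losing participation points.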
Midterm exam (15%)
The midterm exam will take place from 10:30 to 11:50 AM on Friday, November 15, in HBH 1206.
Only material covered during the first 3 weeks of class is eligible for the midterm exam.
The midterm exam will take the form of an open-book written test consisting of several problems. Just about every problem will be true/false, multiple choice, or an "and explain your answer" variant of such questions.
Sample question. Linear regression is only useful if you're certain that the true relationship between Y and your inputs X is linear. TRUE or FALSE? In a sentence or two, explain your answer.
General comment: The midterm is intended to assess your conceptual understanding of the material we covered in the first 3 weeks of class. Because the test is open note, I will not be asking questions where the answer is explicitly written out in the notes. E.g., I will not ask you to write out a step-by-step description of Cross-validation.
However, I could ask you something like: Suppose that we have n = 2000 observations and we perform 20-fold cross-validation. How many observations are used for training at each step?
(Answer: There will be 2000 / 20 = 100 observations in each fold, so at each step 2000 - 100 = 1900 observations will be used for training and 100 for testing.)
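The arithmetic in this answer can be checked with a short sketch. It is written in Python for illustration only, the function name `cv_split_sizes` is hypothetical, and it assumes the number of folds divides n evenly, as it does in the example.

```python
def cv_split_sizes(n, k):
    """Return (training size, test size) for one step of k-fold cross-validation.

    Assumes k divides n evenly, so every fold has the same size.
    """
    if n % k != 0:
        raise ValueError("this sketch assumes k divides n evenly")
    fold_size = n // k          # observations held out at each step
    return n - fold_size, fold_size


train_size, test_size = cv_split_sizes(2000, 20)
print(train_size, test_size)  # 1900 100
```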
Final exam (30%)
The time for the final exam is set by the University. Please check the official calendars for the latest time and date information.
The final exam will be a closed book written exam. This exam is intended to test your complete knowledge of the concepts and methods covered in the class.
Regardless of grading basis, students must receive a score of at least 50% on the final exam in order to pass the class.
Final project (25%)
This will be a data analysis project to be conducted in groups of 2-4 students. More details to follow.
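To see how the component weights listed in the section headings fit together, here is a minimal sketch in Python, for illustration only. It assumes each component is first expressed as a fraction between 0 and 1 and that the components are combined as a simple weighted sum; the names and example fractions are hypothetical, and the mapping from the resulting percentage to a letter grade is not specified here.

```python
# Component weights taken from the section headings (they sum to 100%).
WEIGHTS = {
    "assignments": 0.20,
    "lab": 0.10,
    "midterm": 0.15,
    "final_exam": 0.30,
    "project": 0.25,
}


def course_percentage(fractions):
    """Weighted sum of component scores (each a fraction in [0, 1]), as a percentage."""
    return 100 * sum(WEIGHTS[name] * fractions[name] for name in WEIGHTS)


def meets_final_exam_minimum(final_exam_fraction):
    """The final exam score must be at least 50% in order to pass the class."""
    return final_exam_fraction >= 0.50


# Hypothetical component scores for one student.
example = {"assignments": 0.9, "lab": 1.0, "midterm": 0.8,
           "final_exam": 0.7, "project": 0.85}
print(round(course_percentage(example), 2))
print(meets_final_exam_minimum(example["final_exam"]))
```

Note that the 50% minimum on the final exam applies separately from the weighted total: a high overall percentage does not offset a final exam score below that threshold.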