Date
|
Lectures and Readings
|
Out / Due
|
3/21
Review
|
(Recitation) Lecture 0: Set up
- Installation of Hadoop and Spark on your local machine
- Setting up AWS clusters
Please take this Python mini-quiz before the course. If you need to learn Python or refresh your Python knowledge, take this Python mini-course.
|
|
3/21
|
Lecture 1: Introduction
- Big Data applications
- Technologies for handling big data
- Apache Hadoop and Spark overview
|
|
3/23
3/28
|
Lecture 2: Hadoop Fundamentals
- Hadoop architecture
- HDFS and the MapReduce paradigm
- Hadoop ecosystem: Mahout, Pig, Hive, HBase, Spark
|
HW1 out |
3/28
3/30
|
Lecture 3: Introduction to Apache Spark
- Big data and hardware trends
- History of Apache Spark
- Spark's Resilient Distributed Datasets (RDDs)
- Transformations and actions
|
|
4/4
|
Lecture 4: Machine Learning Overview
- Basic machine learning concepts
- Steps of typical supervised learning pipelines
- Linear algebra review
- Computational complexity / Big O notation review
| |
4/6
4/11
|
Lecture 5: Linear Regression and Distributed ML Principles
- Linear regression
  - formulation and closed-form solution
  - gradient descent
  - grid search
- Distributed machine learning principles
  - computation, storage, and communication
| HW1 due, HW2 out
|
4/13
4/18
|
Lecture 6: Logistic Regression and Click-through Rate Prediction
- Online advertising
- Linear classification
- Logistic regression
  - working with probabilistic predictions
  - categorical data and one-hot encoding
  - feature hashing for dimensionality reduction
|
HW2 due, HW3 out |
4/20
|
No classes; Spring Carnival
|
|
4/18
4/25
|
Lecture 7: Principal Component Analysis and Neuroimaging
- Exploratory data analysis
- Principal Component Analysis (PCA)
- Formulations and solution
- Distributed PCA
|
|
4/27
|
Lecture 8: Big Data ML with MLlib
- k-means Clustering
- Decision Trees and Random Forests
- Recommenders
|
HW3 due, HW4 out |
5/2
|
Lecture 9: Introduction to SparkSQL
- Working with tables in Spark
- Higher-level declarative programming
|
|
5/4
|
Lecture 10: Analyzing Networks with GraphX
- Understanding network structure
- Computing graph statistics
|
HW4 due, Project out |
TBD
|
Final Exam
|