SBU
95-869 Big Data and Large Scale Computing
Spring 2019

Tentative Syllabus


Date    Lectures and Readings    Out / Due


 

Review

Please take the Python mini-quiz before the course begins. If you need to learn Python or refresh your Python knowledge, take the Python mini-course as well.

   

3/19

 

Lecture 1: Introduction

  • Big Data applications
  • Technologies for handling big data
  • Apache Hadoop and Spark overview



3/19

3/26

Lecture 2: Hadoop Fundamentals

  • Hadoop architecture
  • HDFS and the MapReduce paradigm
  • Hadoop ecosystem: Mahout, Pig, Hive, HBase, Spark




HW0 out


3/26

Lecture 3: Introduction to Apache Spark

  • Big data and hardware trends
  • History of Apache Spark
  • Spark's Resilient Distributed Datasets (RDDs)
  • Transformations and actions




HW1 out

   

4/2

Lecture 4: Machine Learning Overview

  • Basic machine learning concepts
  • Steps of typical supervised learning pipelines
  • Linear algebra review
  • Computational complexity / Big O notation review

   

4/2


4/9

Lecture 5: Linear Regression and Distributed ML Principles

  • Linear regression
    • formulation and closed-form solution
    • gradient descent
    • grid search
  • Distributed machine learning principles
    • computation, storage, and communication
HW1 due    HW2 out





4/9


4/16

Lecture 6: Logistic Regression and Click-through Rate Prediction

  • Online advertising
  • Linear classification
  • Logistic regression
    • working with probabilistic predictions
    • categorical data and one-hot-encoding
    • feature hashing for dimensionality reduction
HW2 due    HW3 out



   

4/16

 

4/23

Lecture 7: Principal Component Analysis and Neuroimaging

  • Exploratory data analysis
  • Principal Component Analysis (PCA)
  • Formulations and solution
  • Distributed PCA
HW3 due    HW4 out


 

4/23

 

Lecture 8: Big Data ML with MLlib

  • k-means Clustering
  • Decision Trees and Random Forests
  • Recommenders
HW4 due   HW5 out



4/30

Lecture 9: Introduction to SparkSQL

  • Working with tables in Spark
  • Higher-level declarative programming

   

4/30

Lecture 10: Analyzing Networks with GraphX

  • Understanding network structure
  • Computing graph statistics
HW5 due   

See here

Final Exam