Tentative Syllabus
Disclaimer: This is an ambitious list of topics that I aim to cover in this course. I will adjust the pace based on students' progress and feedback in class, so it is possible that only a subset of these topics will be covered. HW and exams will be adjusted accordingly.
Date | Lectures and Readings | Out / Due
1/16, 1/18
Lecture 1: Intro to ML
- What is ML?
- ML applications
- Machine learning paradigms
- Supervised learning (classification, regression, feature selection)
- Unsupervised learning (density estimation, clustering, dimensionality reduction)
- Basic data types
- (Mixed) attribute data, text, time series, sequence, network data
- The problem solving process:
- Business/project understanding, data understanding through EDA, data preparation, modeling, evaluation, deployment
Readings:
- Witten & Frank Chapter 1.1-1.3
- Provost & Fawcett Chapter 2
PART I: PRELIMINARY ANALYSIS AND DATA PREPARATION
1/18, 1/23
Lecture 2: Exploratory Data Analysis
- Getting to know your data
- Data types
- Attribute types
- Data quality issues
- Data visualization
- Histogram, Kernel Density Estimation
- Charts, plots, infographics
- Correlation analysis
Readings:
- Witten & Frank Chapter 2
- Bishop Chapter 2.5.1
1/25, 1/30
Lecture 3: Data Preparation
- Feature creation
- Data cleaning
- Missing, inaccurate, duplicate values
- Data transformation
- Feature type conversion
- Discretization
- Normalization / Standardization
- Data reduction
- Feature and record selection
- Principal Component Analysis
- Multidimensional scaling
- Manifold learning (Isomap, LLE)
Readings:
HW1 out
PART II: SUPERVISED LEARNING
2/1
Lecture 4: Learning Distributions
- Point estimation
- Maximum Likelihood Estimation (MLE)
- Bayesian learning
- Maximum A Posteriori (MAP) Estimation
- MLE vs. MAP
- Gaussians
- What is ML revisited
Readings:
2/6, 2/8
Lecture 5: Linear Models
- Linear Regression
- Robust Regression
- Sparse Linear Models
- Feature subset selection: revisited
- Shrinkage methods: ridge regression and Lasso
- Principal components regression, Partial least squares
Readings:
- ISLR (James, Witten, Hastie, Tibshirani) Chapter 3.1, 3.2, 3.3, 3.4
- ISLR (James, Witten, Hastie, Tibshirani) Chapter 6.1, 6.2.1, 6.2.2, 6.3.1, 6.3.2
Other readings:
- Hastie Chapter 3.1-3.4, 4.4
- Shalizi Chapter 2, 11
- Murphy Chapters 1.4, 7.1-7.5, 13.3-13.5
- Provost & Fawcett Chapter 4
- Witten & Frank Chapter 7.5
2/13
Lecture 6: Naive Bayes
- Bayes Optimal Classifier
- Conditional Independence
- Naive Bayes
- Gaussian Naive Bayes
Readings:
2/15, 2/20
Lecture 7: Logistic Regression and Generalized Models
- Logistic Regression decision rule and boundary
- Logistic Regression loss function
- Gradient descent
- Non-linear basis expansions
Readings:
- ISLR (James, Witten, Hastie, Tibshirani) Chapter 4.1, 4.2, 4.3
- ISLR (James, Witten, Hastie, Tibshirani) Chapter 7.1, 7.2, 7.3, 7.4, 7.6, 7.7
Other readings:
- Hastie Chapter 9.1, 9.3, 9.6
- Shalizi Chapter 12
2/20, 2/22
Lecture 8: Model Selection
- What is a good model?
- Overfitting
- Decomposition of error
- Bias-Variance tradeoff
- Cross Validation
- Regularization
- Information Criteria (AIC, BIC, MDL)
Readings:
- Hastie Chapter 7.1-7.10
- Provost & Fawcett Chapter 5
HW1 due; HW2 out
Project proposal due
2/27
Lecture 9: Model Evaluation
- Performance measures for Machine Learning
- Creating baseline methods for comparison
- Visualizing model performance
Readings:
- Witten & Frank Chapter 5
- Provost & Fawcett Chapter 7, 8, 11
- Shalizi Chapter 3, 10
3/1, 3/6
Lecture 10: Tree-based Methods
- Classification trees
- From trees to rules
- Missing values and pruning
- Regression trees
Readings:
- Hastie Chapter 9.2
- Witten & Frank Chapter 4.3-4.4, 6.1-6.2
- Provost & Fawcett Chapter 3
- Shalizi Chapter 13
- Murphy Chapter 16.2
3/8
Midterm Exam (in class)
3/12-16
Spring Break; No Classes
HW2 due; HW3 out
3/20, 3/22
Lecture 11: Support Vector Machines
- SVM intuition, formulation, and the dual
- Slack variables, Hinge loss
- The Kernel trick
- Kernel SVM
- Kernel Logistic Regression
- Kernel PCA
Readings:
3/22, 3/27
Lecture 12: Instance-based Learning
- Kernel Density Estimation
- k-Nearest Neighbor Classifier
- Kernel Regression
- Locally-Weighted Linear Regression
Readings:
- Hastie Chapter 6.1-6.3, 6.6.1-6.6.2
- Murphy Chapter 1.4.1-1.4.3, 14.7
- Shalizi Chapter 7.1, 7.5
3/29
Lecture 13: Ensemble Learning
- Combining multiple models
- Bagging
- Random Forests
- Boosting
Readings:
- Witten & Frank Chapter 8
- Hastie Chapter 10.1, 15, 16
- ISLR (James, Witten, Hastie, Tibshirani) Chapter 8.2
PART III: UNSUPERVISED AND SEMI-SUPERVISED LEARNING
4/3, 4/5, 4/10
Lecture 14: Clustering
- Distance functions
- Hierarchical clustering
- k-means clustering
- Kernel k-means clustering
- k-medians clustering
- Mixture models
- The EM algorithm
- Spectral clustering
Readings:
- Witten & Frank Chapter 6.8
- ISLR (James, Witten, Hastie, Tibshirani) Chapter 10.3
- Provost & Fawcett Chapter 6, 12 (part)
- Spectral Clustering tutorial by Ulrike von Luxburg
HW3 due; Project midway report due; HW4 out
4/12
Lecture 15: Semi-supervised Learning
- Assumptions (smoothness, cluster, manifold)
- Semi-supervised learning
- Self-training
- Generative methods
- Graph-based methods
- Co-training
Readings:
PART IV: LEARNING WITH COMPLEX DATA
4/17, 4/24
Lecture 16: Unstructured Data: ML for Text
- Representing text
- Topic modeling, Applications
- Latent Dirichlet Allocation (LDA)
- Inference: Gibbs sampling
- Collapsed Gibbs sampling for LDA
Readings:
- Witten & Frank Chapter 9.5, 9.6
- Provost & Fawcett Chapter 10
4/24, 4/26
Lecture 17: Dependent Data: ML for Networks
- Transductive learning
- Learning in networks with and without attributes
- Probabilistic relational network classifier
- Iterative classification
- Loopy belief propagation
- Applications to auction, accounting, opinion fraud
Readings:
5/1, 5/3
Project Presentations I (today's presenters turn in their final report on 5/3)
Project Presentations II (today's presenters turn in their final report on 5/1)
HW4 due
Last modified by Leman Akoglu, Dec 2017