Syllabus (download as pdf)
1/28, 1/30, 2/4: Introduction to Machine Learning, Basics (3 lectures)
Lecture 1: Intro to ML
- What is ML? ML applications
- Learning paradigms
- Supervised learning (regression, classification)
- Unsupervised learning (density estimation, clustering, dimensionality reduction)
Readings:
- Bishop 2.1, Appendix B
- (Optional) Mitchell, Ch 1
- (Optional) Murphy, 1.1, 1.2, 1.3.1
Recitation (Basics of Probability & Intro to Matlab)
Lecture 2: Learning Distributions
- Point estimation
- Maximum Likelihood Estimation (MLE)
- Bayesian learning
- Maximum A Posteriori (MAP) Estimation
- MLE vs. MAP
- Gaussians
- What is ML revisited
Readings:
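For concreteness, a toy NumPy sketch (not from the course materials) contrasting the MLE and MAP estimates of a Gaussian mean; the prior N(mu0, tau^2) and all numbers are illustrative assumptions:

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.normal(loc=2.0, scale=1.0, size=20)   # toy data, true mean = 2

    # MLE: the sample mean maximizes the likelihood.
    mu_mle = x.mean()

    # MAP with a Gaussian prior N(mu0, tau^2) on the mean (variance sigma^2 known):
    # the posterior mean is a precision-weighted average of prior and sample means.
    mu0, tau2, sigma2, n = 0.0, 1.0, 1.0, len(x)
    mu_map = (mu0 / tau2 + n * x.mean() / sigma2) / (1 / tau2 + n / sigma2)

    print(mu_mle, mu_map)   # MAP is shrunk toward the prior mean mu0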
2/6, 2/11, 2/13: Linear Models (Regression, Classification) (3 lectures)
Lecture 3: Linear Regression
- Linear Regression [Applet]
- Regularized Least Squares
- Overfitting
- Bias-Variance Tradeoff
Readings:
- Bishop 1.1 to 1.4
- Bishop 3.1, 3.1.1, 3.1.4, 3.1.5, 3.2, 3.3, 3.3.1, 3.3.2
- (Additional Resource) Andrew Moore’s Tutorial on regression
- (Optional) Hastie, Ch 7
- (Optional) Murphy, 1.4, Ch 7
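A minimal NumPy sketch of regularized least squares in closed form (illustrative only; the function name ridge_fit and the toy data are assumptions, not course code):

    import numpy as np

    def ridge_fit(X, y, lam):
        # w = (X^T X + lam I)^(-1) X^T y ; lam = 0 recovers ordinary least squares
        d = X.shape[1]
        return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

    rng = np.random.default_rng(0)
    X = rng.normal(size=(50, 3))
    y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=50)
    print(ridge_fit(X, y, lam=0.1))  # larger lam shrinks weights: more bias, less variance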
Lecture 4: Naive Bayes
- Bayes Optimal Classifier
- Conditional Independence
- Naive Bayes [Applet]
- Gaussian Naive Bayes
Readings:
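A toy Gaussian Naive Bayes sketch in NumPy (illustrative; the function names gnb_fit and gnb_predict are assumed for this example):

    import numpy as np

    def gnb_fit(X, y):
        stats = {}
        for c in np.unique(y):
            Xc = X[y == c]
            # per-feature mean/variance under conditional independence, plus prior P(y=c)
            stats[c] = (Xc.mean(0), Xc.var(0) + 1e-9, len(Xc) / len(X))
        return stats

    def gnb_predict(stats, x):
        def log_post(c):
            mu, var, prior = stats[c]
            # log P(y=c) + sum over features of log N(x_j; mu_j, var_j)
            ll = -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mu) ** 2 / var)
            return np.log(prior) + ll
        return max(stats, key=log_post)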
Lecture 5: Logistic Regression
- Generative vs. Discriminative
- Logistic Regression [Applet]
Readings:
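A minimal sketch of binary logistic regression trained by batch gradient descent (illustrative; the learning rate and step count are arbitrary assumptions):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def logreg_fit(X, y, lr=0.1, steps=1000):
        # y in {0,1}; gradient of the average negative log-likelihood
        # is X^T (sigmoid(Xw) - y) / n
        w = np.zeros(X.shape[1])
        for _ in range(steps):
            w -= lr * X.T @ (sigmoid(X @ w) - y) / len(y)
        return w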
2/18, 2/20, 2/25, 2/27, 3/4: Non-linear Models and Model Selection (5 lectures)
Lecture 6: Decision Trees
- Decision Trees [Applet]
- Entropy, Information Gain
- Overfitting, Pre- and Post-pruning
Readings:
- (Bishop 1.6) Information Theory
- (Bishop 14.4) Tree-based Models
- (Recommended) Quantities of Information Wikipedia entry
- (Recommended) Nils Nilsson's ML book (Ch 6, all sections): Decision Trees
- (Optional) Mitchell, Ch 3
- (Optional) Murphy, 16.2
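A short NumPy sketch of entropy and information gain, the split criterion named above (illustrative only):

    import numpy as np

    def entropy(y):
        _, counts = np.unique(y, return_counts=True)
        p = counts / counts.sum()
        return -np.sum(p * np.log2(p))

    def information_gain(y, mask):
        # mask: boolean array, True where the feature test sends the example left
        n, nl = len(y), mask.sum()
        return entropy(y) - (nl / n) * entropy(y[mask]) - ((n - nl) / n) * entropy(y[~mask])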
Lecture 7: Boosting
- Combining weak classifiers
- Adaboost algorithm [Adaboost Applet]
- Comparison with logistic regression and bagging
Readings:
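An illustrative AdaBoost sketch that combines one-feature threshold stumps as weak classifiers (the stump learner and round count are assumptions for this toy example):

    import numpy as np

    def stump_fit(X, y, w):
        # pick the (feature, threshold, sign) with lowest weighted error
        best = None
        for j in range(X.shape[1]):
            for t in np.unique(X[:, j]):
                for s in (1, -1):
                    pred = s * np.where(X[:, j] <= t, 1, -1)
                    err = w[pred != y].sum()
                    if best is None or err < best[0]:
                        best = (err, j, t, s)
        return best

    def adaboost_fit(X, y, rounds=10):
        # y in {-1,+1}; returns a list of (alpha, feature, threshold, sign)
        n = len(y)
        w = np.full(n, 1 / n)
        ensemble = []
        for _ in range(rounds):
            err, j, t, s = stump_fit(X, y, w)
            alpha = 0.5 * np.log((1 - err) / max(err, 1e-12))
            pred = s * np.where(X[:, j] <= t, 1, -1)
            w *= np.exp(-alpha * y * pred)   # upweight misclassified examples
            w /= w.sum()
            ensemble.append((alpha, j, t, s))
        return ensemble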
Lecture 8: Model Selection
- Cross Validation
- Simple Model Selection
- Regularization
- Information Criteria (AIC, BIC, MDL)
Readings:
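A minimal k-fold cross-validation sketch (illustrative; the fit/predict callable interface is an assumption of this example, not a course API):

    import numpy as np

    def kfold_score(X, y, fit, predict, k=5):
        # fit(X, y) -> model ; predict(model, X) -> predicted labels
        idx = np.random.default_rng(0).permutation(len(y))
        folds = np.array_split(idx, k)
        accs = []
        for i in range(k):
            test = folds[i]
            train = np.concatenate([folds[j] for j in range(k) if j != i])
            model = fit(X[train], y[train])
            accs.append(np.mean(predict(model, X[test]) == y[test]))
        return np.mean(accs)  # pick the hyperparameter with the best average score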
Lecture 9: Neural Networks
- Neural Nets [Applet]
- Prediction: Forward-propagation
- Training: Back-propagation
Readings:
- (Bishop 5.1) Feed-forward Network Functions
- (Bishop 5.2) Network Training
- (Bishop 5.3) Error Back-propagation
- (Additional Resource) [CMU Course] on Neural Nets
- (Optional) Murphy, 16.5
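A toy sketch of forward- and back-propagation for one hidden tanh layer with squared error (illustrative only, not the course's code):

    import numpy as np

    def forward(x, W1, W2):
        h = np.tanh(W1 @ x)       # hidden activations
        return h, W2 @ h          # linear output layer

    def backprop(x, y, W1, W2):
        h, yhat = forward(x, W1, W2)
        delta2 = yhat - y                      # dLoss/d(output) for 0.5*||yhat - y||^2
        dW2 = np.outer(delta2, h)
        delta1 = (W2.T @ delta2) * (1 - h**2)  # chain rule through tanh
        dW1 = np.outer(delta1, x)
        return dW1, dW2                        # gradients for a gradient-descent step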
Lecture 10: Nonparametric Methods
- Instance-based Learning [Applet]
- Histogram, Kernel Density Estimation
- K-NN Classifier
- Kernel Regression
Readings:
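Toy sketches of a k-NN classifier and Nadaraya-Watson kernel regression (illustrative; k and the bandwidth h are arbitrary choices):

    import numpy as np
    from collections import Counter

    def knn_predict(Xtrain, ytrain, x, k=3):
        d = np.linalg.norm(Xtrain - x, axis=1)
        nearest = ytrain[np.argsort(d)[:k]]
        return Counter(nearest).most_common(1)[0][0]  # majority vote

    def kernel_regress(Xtrain, ytrain, x, h=1.0):
        # Gaussian-kernel weighted average of the training targets
        w = np.exp(-np.sum((Xtrain - x) ** 2, axis=1) / (2 * h**2))
        return w @ ytrain / w.sum()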
3/6, 3/11: Margin-based Approaches (2 lectures)
Lecture 11: Support Vector Machines
Readings:
Lecture 12: The Kernel Trick
- Dual SVM
- Kernel Trick
- Comparison with Kernel Regression and Logistic Regression
Readings:
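An illustrative sketch of the kernel trick, using kernel ridge regression with an RBF kernel as a stand-in (the dual SVM has the same flavor but needs a QP solver; lam and gamma are arbitrary assumptions):

    import numpy as np

    def rbf_kernel(A, B, gamma=1.0):
        # K[i, j] = exp(-gamma * ||a_i - b_j||^2), never forming explicit features
        sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
        return np.exp(-gamma * sq)

    def krr_fit(X, y, lam=0.1, gamma=1.0):
        K = rbf_kernel(X, X, gamma)
        return np.linalg.solve(K + lam * np.eye(len(y)), y)  # dual coefficients alpha

    def krr_predict(X, alpha, Xnew, gamma=1.0):
        return rbf_kernel(Xnew, X, gamma) @ alpha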
3/13: Midterm Exam
3/18, 3/20: NO CLASS (Spring Break)
3/25, 3/27: Learning Theory (2 lectures)
Lecture 13: PAC Learning
- PAC-learning [Applets]
- Sample complexity
- Haussler bound, Hoeffding's bound
Readings:
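A one-function sketch of the Haussler sample-complexity bound for a finite, consistent hypothesis class, m >= (1/eps)(ln|H| + ln(1/delta)) (the example numbers are illustrative):

    import math

    def haussler_m(H_size, eps, delta):
        # samples sufficient so a consistent learner has error <= eps w.p. >= 1 - delta
        return math.ceil((math.log(H_size) + math.log(1 / delta)) / eps)

    print(haussler_m(H_size=2**10, eps=0.1, delta=0.05))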
Lecture 14: VC Dimension
- VC Dimension
- Mistake Bounds
- Midterm exam review
Readings:
4/1, 4/3, 4/8, 4/10: Structured Models (Graphical Models and HMM) (4 lectures)
Lecture 15: Bayesian Networks - Representation
Readings:
Lecture 16: Bayesian Networks - Inference
- Marginalization
- Variable Elimination
Readings:
- (Bishop 8.4.1, 8.4.2) Inference in Chain/Tree Structures
- (Optional) Murphy, 10.3
Lecture 17: Bayesian Networks - Structure Learning
- Learning CPTs
- Learning structure - Chow-Liu Algorithm
Readings:
Lecture 18: Hidden Markov Models
- HMM Representation
- Forward Algorithm
- Forward-Backward Algorithm
- Viterbi Algorithm
- Baum-Welch Algorithm
Readings:
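A compact Viterbi sketch for the most likely HMM state sequence (illustrative; the matrix shapes are as noted in the comments, and zero probabilities are assumed away for simplicity):

    import numpy as np

    def viterbi(pi, A, B, obs):
        # pi: initial probs (S,), A: transition probs (S, S),
        # B: emission probs (S, O), obs: sequence of observation indices
        S, T = len(pi), len(obs)
        logd = np.log(pi) + np.log(B[:, obs[0]])
        back = np.zeros((T, S), dtype=int)
        for t in range(1, T):
            scores = logd[:, None] + np.log(A)   # scores[i, j]: best path ending i -> j
            back[t] = scores.argmax(0)
            logd = scores.max(0) + np.log(B[:, obs[t]])
        path = [int(logd.argmax())]
        for t in range(T - 1, 0, -1):            # backtrack from the best final state
            path.append(int(back[t, path[-1]]))
        return path[::-1]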
4/15, 4/17, 4/22, 4/24: Unsupervised and Semi-supervised Learning (4 lectures)
Lecture 19: Clustering I
- Hierarchical Clustering
- Spectral Clustering [Demo]
Readings:
Lecture 20: Clustering II
Readings:
- (Bishop 9.1, 9.2) K-means, Mixtures of Gaussians
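A toy k-means sketch (illustrative; the initialization from random data points and the iteration cap are arbitrary choices):

    import numpy as np

    def kmeans(X, k, iters=100, seed=0):
        rng = np.random.default_rng(seed)
        centers = X[rng.choice(len(X), k, replace=False)]
        for _ in range(iters):
            # assign each point to its nearest center, then recompute the means
            labels = np.argmin(np.linalg.norm(X[:, None] - centers[None], axis=2), axis=1)
            new = np.array([X[labels == j].mean(0) if np.any(labels == j) else centers[j]
                            for j in range(k)])
            if np.allclose(new, centers):
                break
            centers = new
        return centers, labels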
Lecture 21: Expectation Maximization
Readings:
Lecture 22: Semi-Supervised Learning
- Mixture Models
- Graph Regularization
- Co-training
Readings:
4/29: Learning in High Dimensions (1 lecture)
Lecture 23: Dimensionality reduction
- Curse of Dimensionality
- Feature Selection
- Principal Component Analysis (PCA)
Readings:
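A minimal PCA sketch via SVD of the centered data matrix (illustrative only):

    import numpy as np

    def pca(X, n_components):
        Xc = X - X.mean(0)                       # center the data
        U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
        W = Vt[:n_components].T                  # top principal directions
        return Xc @ W, W                         # projected data and components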
5/1: Project Presentations I
5/6: Project Presentations II
5/8: Final Exam Overview
5/20: Final Exam
Last modified 2014 by Leman Akoglu