Syllabus
You can find the list of topics by lecture below. Readings for each lecture will be posted here.
Note that the syllabus is tentative and will be adjusted, if needed, as the semester proceeds.
Date
|
Lectures and Readings
|
Out / Due
|
10/24
10/26
|
Lecture 1: Introduction and Fast Similarity Search
- kd-trees and Locality Sensitive Hashing
Reading:
| |
10/31
11/2
|
Lecture 2: Frequent Itemsets and Association Rules
- Market-basket analysis and the Apriori algorithm
- Handling large datasets with limited-RAM and limited-pass algorithms
Reading:
| |
11/7
11/9
|
Lecture 3: Data Decomposition
- Singular Value Decomposition (SVD)
- SVD applications, case studies
- CUR for sparse decomposition
Reading:
| |
11/14
11/16
|
Lecture 4: Clustering
- Distance measures
- Hierarchical clustering, k-means, BFR, CURE algorithms
Reading:
| |
11/16
11/21
|
Lecture 5: Outlier Mining
- Extreme-value analysis
- Density-based outlier detection
- Ensemble methods
Reading:
- Isolation Forest, F. T. Liu, K. M. Ting, and Z.-H. Zhou. ICDM, 2008.
- (Optional) LOF: Identifying density-based local outliers, M. M. Breunig, H.-P. Kriegel, R. T. Ng, and J. Sander. SIGMOD, pages 93–104, 2000.
- (Optional) LOCI: Fast outlier detection using the local correlation integral, S. Papadimitriou, H. Kitagawa, P. B. Gibbons, and C. Faloutsos. ICDE, 2003.
| |
11/23
|
No Class: Thanksgiving
| |
11/28
11/30
|
Lecture 6: Graphs: Link Analysis
- Ranking nodes in a graph
- Random walks (with restart), PageRank, Topic-sensitive PageRank, HITS
Reading:
| |
11/30
12/5
|
Lecture 7: Text Mining
- Topic modeling with LDA and visualization
Reading:
| |
12/7
|
Lecture 8: Data Streams
- Uniform sampling: Reservoir sampling
- Filtering: the Bloom filter
- Counting distinct elements: Flajolet-Martin algorithm
- Counting frequencies: Count-min sketch
Reading:
| |