Lectures:
Recitations (A4/B4): Fridays 5pm-6:20pm, HBH A301
Instructor: George Chen (email: georgechen ♣ cmu.edu) ‐ replace "♣" with the "at" symbol
Teaching assistants:
Office hours (starting second week of class): Check the course Canvas homepage for the office hour times and locations.
Contact: Please use Piazza (follow the link to it within Canvas) and, whenever possible, post so that everyone can see (if you have a question, chances are other people can benefit from the answer as well!).
Companies, governments, and other organizations now collect massive amounts of data such as text, images, audio, and video. How do we turn this heterogeneous mess of data into actionable insights? A common problem is that we often do not know ahead of time what structure underlies the data, which is why such data are often referred to as "unstructured". This course takes a practical, two-step approach to unstructured data analysis: exploratory data analysis first (Part I of the schedule), then predictive data analysis (Part II).
We will be coding lots of Python and dabbling a bit in GPU computing (Google Colab).
Prerequisite: If you are a Heinz student, then you must have taken 95-888 "Data-Focused Python" or 90-819 "Intermediate Programming with Python". If you are not a Heinz student and would like to take the course, please contact the instructor and clearly state what Python courses you have taken/what Python experience you have.
Helpful but not required: Math at the level of calculus and linear algebra may help you appreciate some of the material more.
Grading: Homework (30%), Quiz 1 (35%), Quiz 2 (35%*)
*Students with the most instructor-endorsed posts on Piazza will receive a slight bonus at the end of the mini, which will be added directly to their Quiz 2 score (a maximum of 10 bonus points, so that it is possible to get 110 out of 100 points on Quiz 2).
Letter grades are determined based on a curve.
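To make the weighting concrete, here is a minimal sketch of how a raw score might be combined from the three components. It assumes each component is scored out of 100 and that the Piazza bonus is simply added to the Quiz 2 score before weighting; the function name `weighted_course_score` and its parameters are hypothetical, and the result is a pre-curve number only, since letter grades depend on the curve.

```python
def weighted_course_score(homework, quiz1, quiz2, piazza_bonus=0.0):
    """Return a raw (pre-curve) course score under the assumptions stated above.

    Assumption: homework, quiz1, and quiz2 are each scored out of 100.
    """
    # The bonus is capped at 10 points and added directly to Quiz 2,
    # so the Quiz 2 score can reach 110 out of 100.
    bonus = min(max(piazza_bonus, 0.0), 10.0)
    quiz2_with_bonus = quiz2 + bonus

    # Weights from the syllabus: Homework 30%, Quiz 1 35%, Quiz 2 35%.
    return 0.30 * homework + 0.35 * quiz1 + 0.35 * quiz2_with_bonus


# Hypothetical example: 90 on homework, 80 on Quiz 1, 100 on Quiz 2, full bonus.
print(weighted_course_score(90, 80, 100, piazza_bonus=10))
```

Under these assumptions, the maximum bonus of 10 points adds at most 3.5 points to the raw weighted score.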
Previous version of course (including lecture slides and demos): 95-865 Fall 2024 mini 2
Date | Topic | Supplemental Materials |
---|---|---|
Part I. Exploratory data analysis | ||
Week 1 | ||
Mon Mar 10 | Lecture 1: Course overview, analyzing text using frequencies | |
Wed Mar 12 | Lecture 2: Basic text analysis demo (requires Anaconda Python 3 & spaCy) | |
Fri Mar 14 | Recitation slot: Lecture 3 — Basic text analysis (cont'd), co-occurrence analysis | |
Week 2 | ||
Mon Mar 17 | Lecture 4: Co-occurrence analysis (cont'd), visualizing high-dimensional data with PCA | |
Wed Mar 19 | Lecture 5: PCA (cont'd), manifold learning (Isomap, MDS) | |
Fri Mar 21 | Recitation slot: More on dimensionality reduction | |
Week 3 | ||
Mon Mar 24 | HW1 due Monday Mar 24, 11:59pm. Lecture 6: Manifold learning, intro to clustering | |
Wed Mar 26 | Lecture 7: Clustering | |
Fri Mar 28 | Recitation slot: Quiz 1 (80-minute exam) — material coverage is up to and including Mon Mar 24's lecture (i.e., Lecture 6) | |
Week 4 | ||
Mon Mar 31 | Lecture 8: Clustering (cont'd) | |
Wed Apr 2 | Lecture 9: Wrap up clustering, topic modeling | |
Fri Apr 4 | No class (CMU Spring Carnival) 🎪 | |
Part II. Predictive data analysis | ||
Week 5 | ||
Mon Apr 7 | Lecture 10: Intro to predictive data analysis | |
Wed Apr 9 | Lecture 11: Wrap up intro to predictive data analysis; intro to neural nets & deep learning | |
Fri Apr 11 | Recitation slot: Some key concepts for prediction | |
Week 6 | ||
Mon Apr 14 | HW2 due Monday Apr 14, 11:59pm. Lecture 12: Wrap up neural net basics; image analysis with convolutional neural nets (also called CNNs or convnets) | |
Wed Apr 16 | Lecture 13: Time series analysis with recurrent neural nets (RNNs) | |
Fri Apr 18 | Recitation slot: TBD | |
Week 7 | ||
Mon Apr 21 | Lecture 14: Text generation with generative pretrained transformers (GPTs) | |
Wed Apr 23 | Lecture 15: Other deep learning topics; course wrap-up | |
Fri Apr 25 | Recitation slot: TBD | |
Final exam week | ||
Mon Apr 28 | HW3 due 11:59pm | |
Fri May 2 | Quiz 2 (80-minute exam): 1pm-2:20pm, location TBD. Quiz 2 focuses on material from Wed Mar 26's lecture (Lecture 7) onwards. Because of how the course is set up, material from Lecture 7 onwards at times builds on Lectures 1–6, so some ideas from those earlier lectures could still show up on Quiz 2, but please focus your studying on material from Lecture 7 onwards. | |