Announcements
- 3/30:
Final Exam scheduled: Thursday, May 11,
9:30am - 12:30pm
HBH A301
- 3/14:
HW 3 out, due April 11.
- 3/2:
HW 2 submission date extended to Mar 13 (hard-copy submissions only).
- 2/4:
HW 1 submission date extended to Feb 21.
- 2/1:
TA information has been updated.
- 2/1:
You can find the course lectures posted on Blackboard.
- 1/17:
Welcome to the class! Hope you will enjoy it :)
CLASS MEETS:
Time: Tue & Thu 4:30PM - 5:50PM
Place: HBH 1204
PEOPLE:
Instructor: Leman Akoglu
- Office: 2118C Hamburg Hall
- Office hours: Thu 12pm - 1pm
- Email: invert (cs.cmu.edu @ lakoglu)
Teaching Assistants:
Runshan Fu
- Office: TA room: HBH 3034
- Office hours: Tue @09:30-10:30am
- Email: invert (andrew.cmu.edu @ runshanf)
Zhe Zhang
- Office: TA room: HBH 3034
- Office hours: Fri @3pm-4pm
- Email: invert (cmu.edu @ zhezhang)
Grader: Qixin He
- Email: invert (andrew.cmu.edu @ qixinh)
COURSE DESCRIPTION:
Machine Learning (ML) is centered around automated methods that improve their own performance by learning patterns in data and then use the uncovered patterns to predict the future and make decisions. ML is heavily used in a wide variety of domains, such as business, finance, healthcare, and security, for problems including display advertising, fraud detection, disease diagnosis and treatment, face/speech/handwriting/object recognition, and automated navigation, to name a few.
See this for an extended introduction.
"If I had an hour to solve a problem I'd spend 55 minutes thinking about the problem and 5 minutes thinking about solutions." -- Albert Einstein
"A problem well put is half solved." -- John Dewey
This course aims to equip students with the practical knowledge and experience of recognizing and formulating machine learning problems in the wild, as well as of applying machine learning techniques effectively in practice. The emphasis will be on learning and practicing the machine learning process, involving the cycle of feature design, modeling, and scaling.
"All models are wrong, but some models are useful." -- George Box
As there exists "no free lunch", we will cover a wide range of different models and learning algorithms, which can be applied to a variety of problems and have varying speed-accuracy-scalability-interpretability tradeoffs. In particular, the topics include generalized linear models, decision trees, Bayesian networks, feature selection, ensemble methods, semi-supervised learning, density estimation, latent factor models, network-based classification, and sequence models.
See the syllabus for more.
This course is designed to give a graduate-level student a thorough grounding in the methodologies, technologies, and best practices used in machine learning. It does not assume any prior exposure to machine learning theory or practice. Undergraduates need the instructor's permission to enroll. PhD students can either enroll or, with permission, audit the course.
Learning Objectives
By the end of this class, students will
- learn the main concepts, methodologies, and tools for machine learning
- be able to recognize machine learning tasks in real-world problems
- develop the critical thinking for comparing and contrasting models for a given task
- learn to reliably perform model selection and evaluation
- gain the experience of applying the data science process to various problems end-to-end
BULLETIN BOARD and other info
- For course material, assignments, announcements, and grades please see Blackboard.
- For questions and discussions please use Piazza.
- Carnegie Mellon 2016-2017 Official academic calendar
There is no official textbook for the course. I will post all the lecture notes and several readings on the course website.
Below you can find a list of recommended reading. We will follow different parts of these various books.
I recommend the top 3 books in this list as regular reading for the course, and the rest for consulting various subjects and for further reading.
- Data Mining: Practical Machine Learning Tools and Techniques, 3rd Edition, Morgan Kaufmann Series
Ian H. Witten, Eibe Frank and Mark A. Hall
- Data Science for Business: What You Need to Know about Data Mining and Data-Analytic Thinking, O'Reilly
Foster Provost and Tom Fawcett
- An Introduction to Statistical Learning: with Applications in R, FREE!
Gareth James, Daniela Witten, Trevor Hastie, Robert Tibshirani
- The Elements of Statistical Learning: Data Mining, Inference, and Prediction, FREE!
Trevor Hastie, Robert Tibshirani, Jerome Friedman
- Data Mining: The Textbook, Springer 2015.
Charu C. Aggarwal
- Machine Learning: A Probabilistic Perspective, The MIT Press 2012.
Kevin P. Murphy
- Advanced Data Analysis from an Elementary Point of View, Cambridge U. Press 2015.
Cosma R. Shalizi
Further reading: see the list at http://aioptify.com/topdatasciencebooks.php.
MISC - FUN:
Fake (ML) protest