Course Overview
The objective of this course is to introduce students to state-of-the-art algorithms in large-scale machine learning and distributed optimization, with a particular focus on the emerging field of federated learning. Topics to be covered include, but are not limited to, the following (an illustrative sketch of one such algorithm, local-update SGD, appears after the list):
- Mini-batch SGD and its convergence analysis
- Momentum and variance reduction methods
- Synchronous and asynchronous SGD
- Local-update SGD
- Decentralized SGD
- Gradient compression/quantization
- Data heterogeneity in federated learning
- Computational heterogeneity in federated learning
- Client selection and partial participation in federated learning
- Differential privacy in federated learning
- Secure aggregation in federated learning
- Robustness to adversaries in federated learning
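To preview the style of algorithms analyzed in the course, here is a minimal illustrative sketch of local-update SGD (the template behind FedAvg) on a synthetic least-squares problem. The client count, data, and hyperparameters below are assumptions made purely for illustration and are not part of the course materials.

```python
import numpy as np

# Illustrative sketch of local-update SGD (FedAvg-style aggregation) on a
# synthetic least-squares problem. All data and hyperparameters are
# assumptions chosen for demonstration only.

rng = np.random.default_rng(0)
num_clients, dim = 4, 5
w_true = rng.normal(size=dim)

# Each client holds its own local dataset.
client_data = []
for _ in range(num_clients):
    X = rng.normal(size=(100, dim))
    y = X @ w_true + 0.1 * rng.normal(size=100)
    client_data.append((X, y))

def local_sgd_steps(w, X, y, lr=0.01, steps=10, batch=10):
    """Run a few mini-batch SGD steps on one client's local loss."""
    for _ in range(steps):
        idx = rng.choice(len(y), size=batch, replace=False)
        grad = X[idx].T @ (X[idx] @ w - y[idx]) / batch
        w = w - lr * grad
    return w

w_global = np.zeros(dim)
for _ in range(50):  # communication rounds
    # Each client starts from the current global model and takes local steps.
    local_models = [local_sgd_steps(w_global, X, y) for X, y in client_data]
    # The server averages the locally updated models.
    w_global = np.mean(local_models, axis=0)

print("distance to w_true:", np.linalg.norm(w_global - w_true))
```

Setting the number of local steps to 1 recovers fully synchronous mini-batch SGD; larger values reduce the number of communication rounds at the cost of client drift when the local datasets are heterogeneous, a trade-off studied later in the course.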
Prerequisites
- An introductory course in machine learning (18-461/661, 10-601/701, or equivalent) is required.
- Undergraduate-level training or coursework in algorithms, linear algebra, calculus, probability, and statistics is strongly encouraged.
- A background in programming will also be necessary for the problem sets; students are expected to be familiar with Python or to learn it during the course.
Comparison with Related Courses
- 18-660: Optimization: While 18-660 covers the fundamentals of convex and non-convex optimization and stochastic gradient descent, 18-667 will discuss state-of-the-art research papers in federated learning and optimization. 18-667 can be taken after or along with 18-660.
- 18-661: Introduction to Machine Learning: 18-661 covers a breadth of machine learning methods including linear and logistic regression, neural networks, SVMs, decision trees, and online and reinforcement learning. Many of these methods use stochastic gradient descent (SGD) to train the model parameters. In 18-667, we will dive deeper into SGD and, more specifically, its distributed implementations. While 18-661 covers classic and foundational concepts, 18-667 will discuss state-of-the-art research papers in federated learning and optimization. 18-667 can be taken after or along with 18-661.
Textbooks
Students are expected to read the research paper discussed in each lecture and review the lecture slides to prepare for the quizzes and homework assignments. Material covered in the first part of the class also appears in Prof. Joshi's book, Optimization Algorithms for Distributed Machine Learning, available through the CMU Libraries.
Piazza
We will use Piazza for class discussions. We strongly encourage students to post on this forum rather than emailing the course staff directly (this will be more efficient for both students and staff). Students should use Piazza to:
- Ask clarifying questions about the course material.
- Share useful resources with classmates (so long as they do not contain homework solutions).
- Look for students to form study and project groups.
- Answer questions posted by other students to solidify your own understanding of the material.
Tentative Grading Policy
Grades will be based on the following components:
- Homework (40%): There will be 4 equally weighted homeworks, each consisting of a mix of mathematical and implementation questions.
- You are given 3 late days (self-granted 24-hour extensions) that you can use without penalty. At most one late day can be used per assignment. Late-day usage will be tracked automatically via Gradescope.
- Solutions will be graded on both correctness and clarity. If you cannot solve a problem completely, you will get more partial credit by identifying the gaps in your argument than by attempting to cover them up.
- Three Quizzes (45%): Each quiz will be a mix of multiple-choice and descriptive questions, and will cover only the papers discussed during the lectures and recitations preceding that quiz.
- Class Project (15%): Students will form teams of 4 to conduct a detailed literature survey, original research, and/or an implementation study on one of the following project topics. Projects on a topic outside this list are also welcome -- please contact the instructor to discuss your idea. At the end of the semester, each team will submit a 4-page review paper and give a 15-minute project presentation.
- Survey of Stochastic Variance Reduction Methods
- Concept/Data Drift in Federated Learning
- Data Unlearning in Federated Learning
- Convergence Analysis of Differentially Private Distributed Optimization Algorithms
- Federated Reinforcement Learning
- Client Selection in Federated Learning
- Incentivizing Client Participation in Federated Learning
- Federated Multi-armed Bandits and Online Learning
- Model-Parallel Training, Split Federated Learning, and Independent Subnet Training
- Federated Training of Heterogeneously Sized Models
- One-shot Federated Learning and Model Fusion
- Efficient Distributed Inference on Large Models
- Parameter-efficient Federated Finetuning of LLMs
- Hyperparameter optimization in Distributed and Federated ML
Collaboration Policy
Group studying and collaborating on problem sets are encouraged, as working together is a great way to understand new material. Students are free to discuss the homework problems with anyone under the following conditions:
- Students must write their own solutions and understand the solutions that they wrote down. AI tools like ChatGPT are considered collaborators, and their use must be acknowledged.
- Students must list the names of their collaborators (i.e., anyone with whom the assignment was discussed).
- Students may not use old solution sets from other classes under any circumstances, unless the instructor grants special permission.
Schedule (subject to change)
Date | Lecture/Recitation | Readings | Announcements |
---|---|---|---|
08/26 | Intro and Logistics [Slides] | | |
08/28 | SGD and its Variants in Machine Learning [Slides] | | |
08/30 | Math Review | | HW1 release |
09/02 | Labor Day; No classes | | |
09/04 | SGD Convergence Analysis [Slides] [Annotated] | | |
09/06 | PyTorch Tutorial | | |
09/09 | Variance-reduced SGD, Distributed Synchronous SGD [Slides] [Annotated] | | |
09/11 | Asynchronous SGD, Hogwild [Slides] [Annotated] | | |
09/13 | Guest Lecture | | |
09/16 | Local-update SGD [Slides] [Annotated] | | |
09/18 | Adacomm, Elastic Averaging, Overlap SGD [Slides] [Annotated] | | |
09/20 | Concept Review and Practice | | HW1 due; HW2 release |
09/23 | Quiz 1 | | |
09/25 | Quantized and Sparsified Distributed SGD [Slides] [Annotated] | | |
09/27 | Guest Lecture: Leveraging Correlation in Sparsified SGD | | |
09/30 | Decentralized SGD [Slides] [Annotated] | | |
10/02 | Federated Learning Intro [Slides] [Annotated] | | |
10/04 | Guest Lecture on Decentralized SGD | | |
10/07 | Data Heterogeneity in FL [Slides] [Annotated] | | |
10/09 | Computational Heterogeneity in FL [Slides] [Annotated] | | |
10/11 | Guest Lecture: FedExp | | HW2 due; HW3 release |
10/14 | Fall Break | | |
10/16 | Fall Break | | |
10/18 | Fall Break | | |
10/21 | Client Selection and Partial Participation [Slides] [Annotated] | | |
10/23 | Personalized Federated Learning [Slides] [Annotated] | | |
10/25 | Concept Review and Practice | | |
10/28 | Quiz 2 | | |
10/30 | Multi-task Learning [Slides] [Annotated] | | |
11/01 | Guest Lecture | | HW3 due |
11/04 | Federated Min-max Optimization [Slides] [Annotated] | | |
11/06 | Fairness and Participation Incentives [Slides] [Annotated] | | |
11/08 | Guest Lecture | | Project titles and teams due; HW4 release |
11/11 | Differential Privacy in Distributed Optimization [Slides] [Annotated] | | |
11/13 | Secure Aggregation in Distributed Learning [Slides] | | |
11/15 | Guest Lecture | | |
11/18 | Robustness to Adversaries | | |
11/20 | Federated Learning in the LLM Era | | |
11/22 | Concept Review and Practice | | |
11/25 | Quiz 3 | | |
11/27 | Thanksgiving Break | | |
11/29 | Thanksgiving Break | | |
12/02 | Project Presentations | | |
12/04 | Project Presentations | | HW4 due |
12/06 | Project Presentations | | |
12/09 | | | Project reports due |