Course Overview
The objective of this course is to introduce students to state-of-the-art algorithms in large-scale machine learning and distributed optimization, in particular, the emerging field of federated learning. Topics to be covered include but are not limited to:
- Mini-batch SGD and its convergence analysis
- Momentum and variance reduction methods
- Synchronous and asynchronous SGD
- Local-update SGD
- Decentralized SGD
- Gradient compression/quantization
- Data heterogeneity in federated learning
- Computational heterogeneity in federated learning
- Client selection and partial participation in federated learning
- Differential privacy in federated learning
- Secure aggregation in federated learning
- Robustness to adversaries in federated learning
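To give a flavor of the local-update SGD topic above, here is a minimal, hypothetical sketch of the idea: several clients run a few SGD steps on their own data, and a server averages the resulting models. The toy problem (1-D least squares, one client per synthetic mean, and all step sizes and counts) is an illustrative assumption, not course material.

```python
import random

# Toy local-update SGD (FedAvg-style averaging) on a 1-D least-squares
# problem. Hypothetical setup: each "client" holds samples drawn around
# a different mean (mimicking data heterogeneity); the global optimum of
# the average loss (w - y)^2 / 2 is the overall mean of all samples.

random.seed(0)
clients = [[c + random.gauss(0, 0.1) for _ in range(20)] for c in range(4)]

def local_sgd(w, data, steps, lr):
    """Run `steps` single-sample SGD steps on f(w) = (w - y)^2 / 2."""
    for _ in range(steps):
        y = random.choice(data)
        w -= lr * (w - y)          # gradient of (w - y)^2 / 2 is (w - y)
    return w

w = 0.0                            # global model (a single scalar here)
for _ in range(50):                # communication rounds
    local_models = [local_sgd(w, d, steps=5, lr=0.1) for d in clients]
    w = sum(local_models) / len(local_models)   # server averages models

print(w)  # close to the overall mean of all client data (about 1.5)
```

The same round structure (local steps, then averaging) underlies the convergence analyses and the heterogeneity questions covered later in the course.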
Prerequisites
- An introductory course in machine learning (18-461/661, 10-601/701, or equivalent) is a prerequisite.
- Undergraduate-level coursework in algorithms, linear algebra, calculus, probability, and statistics is strongly encouraged.
- A background in programming will also be necessary for the problem sets; students are expected to be familiar with Python or learn it during the course.
Comparison with Related Courses
- 18-660: Optimization: While 18-660 covers the fundamentals of convex and non-convex optimization and stochastic gradient descent, 18-667 will discuss state-of-the-art research papers in federated learning and optimization. 18-667 can be taken after or along with 18-660.
- 18-661: Introduction to Machine Learning: 18-661 covers a breadth of machine learning methods including linear and logistic regression, neural networks, SVMs, decision trees, and online and reinforcement learning. Many of these methods use stochastic gradient descent (SGD) to train the model parameters. In 18-667, we will dive deeper into SGD and, more specifically, distributed implementations of SGD. While 18-661 covers classic and foundational concepts, 18-667 will discuss state-of-the-art research papers in federated learning and optimization. 18-667 can be taken after or along with 18-661.
Textbooks
Students are expected to read the research paper discussed in each lecture and review the lecture slides to prepare for the quizzes and homework assignments. Material covered in the first part of the class is also in Prof. Joshi's book on Optimization Algorithms for Distributed Machine Learning, available through CMU libraries.
Piazza
We will use Piazza for class discussions. We strongly encourage students to post on this forum rather than emailing the course staff directly (this will be more efficient for both students and staff). Students should use Piazza to:
- Ask clarifying questions about the course material.
- Share useful resources with classmates (so long as they do not contain homework solutions).
- Look for students to form study and project groups.
- Answer questions posted by other students to solidify your own understanding of the material.
Tentative Grading Policy
Grades will be based on the following components:
- Homework (30%): There will be 4 equally weighted homeworks, each consisting of a mix of mathematical and implementation questions.
- You are given 3 late days (self-granted 24-hr extensions) which you can use to give yourself extra time without penalty. At most one late day can be used per assignment. This will be monitored automatically via Gradescope.
- Solutions will be graded on both correctness and clarity. If you cannot solve a problem completely, you will get more partial credit by identifying the gaps in your argument than by attempting to cover them up.
- Three Quizzes (45%): Each quiz will be a mix of multiple-choice and descriptive questions, and will be based only on the papers discussed in the lectures and recitations preceding that quiz.
- In-class quizzes (5%): There will be short (1-2 question) in-class quizzes conducted on Gradescope, primarily to encourage attendance and attention. The best 10 quiz scores will count toward the final grade.
- Class Project (20%): Students will form teams of 3-5 to conduct original research and implementation on one of the following project topics. Projects on a topic outside this list are also welcome -- please contact the instructor to discuss your idea. At the end of the semester, each team will give a 15-min project presentation and submit a project report.
- Beyond Central Coordination: Hierarchical and Semi-Decentralized FL
- Decision Making in Federated Settings: Multi-armed Bandits, Online learning, and Federated Reinforcement Learning
- Collaboration and Incentives: Data sharing, Valuation, and Incentivizing Client Participation
- Privacy Preserving FL, Robustness and Security
- Unlearning or Data Deletion in Federated Learning
- Parameter-Efficient Federated and Distributed Fine-Tuning of Large Language Models (LLMs)
- Continual and Lifelong Learning: Concept/Data Drift and Federated Continual Learning
- Scalable Inference: Efficient Distributed Inference on Large Models
- Model parallelism, split federated learning, and independent subnet training
- Federated Knowledge/Ensemble Distillation from Heterogeneous Local Models
- Model Fusion: Heterogeneous Model Fusion and One-Shot Federated Learning
- Personalization: Personalized Federated Learning, Fairness and Multi-Objective optimization
- Hyperparameter optimization: Tuning hyperparameters in Distributed and Federated ML
- Agentic Workflows: Distributed Training and Design of LLM based Agentic workflows
Collaboration Policy
Group studying and collaborating on problem sets are encouraged, as working together is a great way to understand new material. Students are free to discuss the homework problems with anyone under the following conditions:
- Students must write their own solutions and understand the solutions that they write down. AI tools like ChatGPT are considered collaborators, and their use must be acknowledged.
- Students must list the names of their collaborators (i.e., anyone with whom the assignment was discussed).
- Students may not use old solution sets from other classes under any circumstances, unless the instructor grants special permission.
Schedule (subject to change)
Date | Lecture/Recitation | Readings | Announcements |
---|---|---|---|
08/25 | Intro and Logistics [Slides] [Annotated] | | |
08/27 | SGD in Machine Learning [Slides] [Annotated] | | |
08/29 | Math Review | | HW1 release |
09/01 | Labor Day; No classes | | |
09/03 | Lecture Cancelled | | |
09/05 | Review of SGD convergence analysis | | |
09/08 | SGD Convergence Analysis I [Slides] [Annotated] | | |
09/10 | SGD Convergence Analysis II [Slides] [Annotated] | | |
09/12 | Review of Order Statistics | | |
09/15 | Distributed Synchronous SGD [Slides] [Annotated] | | |
09/17 | Asynchronous SGD [Slides] | | |
09/19 | Concept Review and Practice | | HW1 due; HW2 release |
09/22 | Quiz 1 | | |
09/24 | Local-update SGD | | |
09/26 | Review of Local-update SGD | | |
09/29 | Adacomm, Elastic Averaging, Overlap SGD | | |
10/01 | Quantized and Sparsified Distributed SGD | | |
10/03 | Review of Concepts and Proofs | | |
10/06 | Decentralized SGD | | |
10/08 | Federated Learning Intro | | |
10/10 | No recitation | | HW2 due; HW3 release |
10/13 | Fall Break | | |
10/15 | Fall Break | | |
10/17 | Fall Break | | |
10/20 | Data Heterogeneity in FL | | |
10/22 | Computational Heterogeneity in FL | | |
10/24 | Concept Review and Practice | | Project proposal due |
10/27 | Client Selection and Partial Participation | | |
10/29 | Quiz 2 | | |
10/31 | No recitation | | HW3 due |
11/03 | Personalized Federated Learning | | |
11/05 | | | HW4 release; HW3 due |
11/07 | Concept Review and Project Office Hours | | Project Checkpoint I |
11/10 | Fairness and Participation Incentives | | |
11/12 | Differential Privacy and Secure Aggregation | | |
11/14 | Concept Review and Project Office Hours | | Project Checkpoint II |
11/17 | Robustness to Adversaries | | |
11/19 | Federated Learning in the LLM Era | | |
11/21 | Concept Review and Practice | | |
11/24 | Quiz 3 | | |
11/27 | Thanksgiving Break | | |
11/29 | Thanksgiving Break | | |
12/01 | Project Presentations | | |
12/03 | Project Presentations | | HW4 due |
12/05 | Project Presentations | | |
12/10 | | | Project reports due |