18-847F: Foundations of Cloud and ML Infrastructure
Course Description
The objective of this seminar course is to introduce students to modern cloud and machine learning infrastructure and its theoretical foundations. Students will read, present, and critique a curated set of research papers from both theory and systems. There will also be a final project based on the topics discussed.
The first half of the course will cover distributed computing and storage systems. We will study frameworks such as MapReduce and Spark, and discuss the scheduling and load-balancing policies used in them. In the context of distributed storage systems, we will discuss coding-theoretic techniques used to improve availability and repair failed nodes. The second half of the course will focus on machine learning infrastructure. A key discussion topic will be stochastic gradient descent and its implementation in large-scale systems. Other topics include hyperparameter tuning in neural networks and generative adversarial networks.
Class Hours
MW 4:30-6:00 pm, starting Aug 30th
Scaife Hall 222