ISE 633: Large Scale Optimization for Machine Learning
Goal: To introduce the large-scale optimization algorithms that arise in modern data science and machine learning applications.
Course Topics: The course covers the theory and tools of large-scale optimization for modern data science and machine learning. Topics include stochastic optimization, accelerated methods, parallelization, nonsmooth optimization, online optimization, variance reduction, differential privacy in optimization, and min-max games with applications to generative adversarial networks.
Link to Slides, Lecture Notes, and Videos:
- Here you can download all the slides.
- Here you can download the course lecture notes.
- Here you can watch lecture videos.
Course Plan:
- Basics and preliminaries (Week 1)
- Optimization overview, examples in machine learning, large-scale optimization, memory/time/CPU requirements
- Mathematical basics
- Unconstrained optimization: optimality conditions and algorithms (Weeks 2-3)
- Necessary and sufficient optimality conditions
- Convex versus nonconvex problems, with examples
- First-order methods (unconstrained)
- Convergence analysis: lower and upper bounds on iteration complexity
- Momentum and accelerated methods (see the sketch after this block)
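As an illustration of these first-order methods, here is a minimal sketch, assuming a synthetic least-squares objective, that compares plain gradient descent with Nesterov's accelerated method; the data, the step size 1/L, and the iteration count are illustrative choices, not course material.

```python
import numpy as np

# A minimal sketch, assuming f(x) = 0.5 * ||Ax - b||^2 with gradient
# Lipschitz constant L = ||A||_2^2 (largest singular value squared).
rng = np.random.default_rng(0)
A = rng.standard_normal((200, 50))
b = rng.standard_normal(200)
L = np.linalg.norm(A, 2) ** 2

def grad(x):
    return A.T @ (A @ x - b)

# Plain gradient descent: x_{k+1} = x_k - (1/L) * grad(x_k)
x = np.zeros(50)
for _ in range(300):
    x -= grad(x) / L

# Nesterov acceleration: take the gradient step at an extrapolated point y_k
x_acc, y, t = np.zeros(50), np.zeros(50), 1.0
for _ in range(300):
    x_next = y - grad(y) / L
    t_next = (1 + np.sqrt(1 + 4 * t * t)) / 2
    y = x_next + ((t - 1) / t_next) * (x_next - x_acc)
    x_acc, t = x_next, t_next

print("gradient descent loss:", 0.5 * np.linalg.norm(A @ x - b) ** 2)
print("accelerated loss:     ", 0.5 * np.linalg.norm(A @ x_acc - b) ** 2)
```

On smooth convex problems, plain gradient descent has an O(1/k) suboptimality bound while the accelerated iterates achieve O(1/k^2); the two final losses above are meant to make that gap tangible.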
- Constrained optimization (Weeks 4-6)
- Examples of constrained optimization in machine learning: fairness, safety, etc.
- KKT optimality conditions and Lagrange multipliers
- Projection-based algorithms, with examples in machine learning (see the sketch after this block)
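As an illustration of projection-based methods, here is a minimal sketch of projected gradient descent, assuming a least-squares objective over an l2-norm ball; the data, the radius r, and the step size are hypothetical choices.

```python
import numpy as np

# A minimal sketch, assuming least squares over the constraint set
# {x : ||x||_2 <= r}; data, radius, and step size are illustrative.
rng = np.random.default_rng(1)
A = rng.standard_normal((100, 20))
b = rng.standard_normal(100)
r = 1.0
step = 1.0 / np.linalg.norm(A, 2) ** 2

def project_l2_ball(x, r):
    # Euclidean projection onto the ball: rescale x if it lies outside
    norm = np.linalg.norm(x)
    return x if norm <= r else (r / norm) * x

x = np.zeros(20)
for _ in range(500):
    # gradient step on 0.5 * ||Ax - b||^2, then project back onto the set
    x = project_l2_ball(x - step * A.T @ (A @ x - b), r)

print("||x|| =", np.linalg.norm(x), "(feasible iff <=", r, ")")
print("objective:", 0.5 * np.linalg.norm(A @ x - b) ** 2)
```

The same pattern applies to any closed convex constraint set whose Euclidean projection is cheap to compute.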
- Nonsmooth optimization (Weeks 7-8)
- Examples of nonsmooth optimization in machine learning: the Lasso, the Netflix problem and nuclear-norm regularization, nonsmooth activations in deep learning, etc.
- Necessary and sufficient optimality conditions
- Algorithms and their analysis: block coordinate descent, successive upper-bound minimization, the alternating direction method of multipliers, subgradient descent, and the proximal gradient method (see the sketch after this block)
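As an illustration of the proximal gradient method, here is a minimal sketch applied to the Lasso, whose l1 proximal operator is the soft-thresholding map; the synthetic data, the weight lam, and the step size are illustrative assumptions.

```python
import numpy as np

# A minimal sketch, assuming the Lasso problem
#   min_x 0.5 * ||Ax - b||^2 + lam * ||x||_1
# on synthetic sparse data; lam and the step size 1/L are illustrative.
rng = np.random.default_rng(2)
A = rng.standard_normal((100, 50))
x_true = np.zeros(50)
x_true[:5] = 1.0                                  # sparse ground truth
b = A @ x_true + 0.01 * rng.standard_normal(100)
lam = 0.1
step = 1.0 / np.linalg.norm(A, 2) ** 2

def soft_threshold(v, t):
    # proximal operator of t * ||.||_1: shrink each coordinate toward zero
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

x = np.zeros(50)
for _ in range(1000):
    # gradient step on the smooth part, prox step on the l1 part
    x = soft_threshold(x - step * A.T @ (A @ x - b), step * lam)

print("recovered support:", np.flatnonzero(np.abs(x) > 1e-3))
```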
- Empirical risk minimization and stochastic optimization (Weeks 9-11)
- Stochastic/online/incremental optimization; sample average approximation (SAA) and stochastic approximation (SA)
- Bias/variance tradeoffs
- Algorithms and their analysis: (robust) stochastic (sub-)gradient descent, stochastic variance-reduced gradient descent (SVRG); see the sketch after this block
- Parallel optimization: synchronous vs asynchronous
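As an illustration of stochastic methods and variance reduction, here is a minimal sketch contrasting plain SGD with SVRG on a least-squares empirical risk; the constant step size and epoch length are illustrative, untuned assumptions.

```python
import numpy as np

# A minimal sketch, assuming the empirical risk min_x (1/2n) * ||Ax - b||^2.
rng = np.random.default_rng(3)
n, d = 500, 20
A = rng.standard_normal((n, d))
b = rng.standard_normal(n)

def grad_i(x, i):                        # gradient of the i-th sample's loss
    return A[i] * (A[i] @ x - b[i])

def full_grad(x):                        # full empirical-risk gradient
    return A.T @ (A @ x - b) / n

# Plain SGD: unbiased but noisy updates with a small constant step size
x_sgd, eta = np.zeros(d), 0.01
for _ in range(20 * n):
    i = rng.integers(n)
    x_sgd -= eta * grad_i(x_sgd, i)

# SVRG: a periodic full gradient at a snapshot recenters each stochastic
# gradient, reducing its variance
x = np.zeros(d)
for epoch in range(20):
    snap, g_snap = x.copy(), full_grad(x)
    for _ in range(n):
        i = rng.integers(n)
        x -= eta * (grad_i(x, i) - grad_i(snap, i) + g_snap)

print("SGD  risk:", 0.5 * np.mean((A @ x_sgd - b) ** 2))
print("SVRG risk:", 0.5 * np.mean((A @ x - b) ** 2))
```

The snapshot correction gives SVRG a stochastic gradient whose variance shrinks as the iterates and the snapshot approach the minimizer, which is what permits a constant step size.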
- Online convex optimization (Weeks 11-12)
- Examples and regret definition
- Algorithms and their analysis: follow-the-leader, follow-the-regularized-leader, online gradient descent (see the sketch after this block)
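As an illustration, here is a minimal sketch of online gradient descent with the standard step size eta_t = 1/sqrt(t), assuming a stream of quadratic losses over an l2-ball decision set, plus a crude estimate of regret against the best fixed decision in hindsight; the loss stream and the decision set are illustrative assumptions.

```python
import numpy as np

# A minimal sketch, assuming losses f_t(x) = 0.5 * (a_t . x - b_t)^2
# over the decision set {x : ||x||_2 <= 1}.
rng = np.random.default_rng(4)
d, T = 10, 1000

def project(x, r=1.0):
    norm = np.linalg.norm(x)
    return x if norm <= r else (r / norm) * x

x = np.zeros(d)
losses, stream = [], []
for t in range(1, T + 1):
    a, b = rng.standard_normal(d), rng.standard_normal()
    losses.append(0.5 * (a @ x - b) ** 2)     # loss suffered at round t
    g = a * (a @ x - b)                       # gradient revealed after playing x
    x = project(x - g / np.sqrt(t))           # OGD step with eta_t = 1/sqrt(t)
    stream.append((a, b))

# Crude comparator: best fixed point in hindsight, found by projected
# gradient descent on the cumulative loss.
As = np.array([a for a, _ in stream])
bs = np.array([b for _, b in stream])
u, step = np.zeros(d), 1.0 / np.linalg.norm(As, 2) ** 2
for _ in range(2000):
    u = project(u - step * As.T @ (As @ u - bs))

print("regret estimate:", sum(losses) - 0.5 * np.sum((As @ u - bs) ** 2))
```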
- Learning via min-max optimization (Weeks 13-14)
- Examples: generative adversarial networks, adversarial learning
- Tools and algorithms: Danskin's theorem, the gradient descent-ascent algorithm, proximal-type methods (see the sketch after this block)
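As an illustration, here is a minimal sketch of simultaneous gradient descent-ascent, assuming a toy strongly-convex-strongly-concave objective; the objective and the step size are illustrative choices.

```python
# A minimal sketch of simultaneous gradient descent-ascent on the toy
# saddle problem  min_x max_y  0.5*x^2 + 2*x*y - 0.5*y^2, whose unique
# saddle point is (0, 0); the step size eta is an illustrative choice.
def grad_x(x, y):
    return x + 2 * y      # d f / d x

def grad_y(x, y):
    return 2 * x - y      # d f / d y

x, y, eta = 1.0, 1.0, 0.05
for _ in range(500):
    # descend in x and ascend in y simultaneously
    x, y = x - eta * grad_x(x, y), y + eta * grad_y(x, y)

print("final iterate:", (x, y))   # converges near the saddle point (0, 0)
```

On a purely bilinear objective such as f(x, y) = x*y, these same simultaneous updates spiral away from the saddle point, which is one motivation for the proximal-type methods listed above.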
- Differential privacy and optimization (Weeks 14-15, time permitting)
- Differential privacy definition and examples
- Lower and upper bounds on differentially private convex optimization (see the sketch after this block)
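As an illustration of private optimization, here is a minimal sketch of noisy gradient descent for logistic regression using per-sample gradient clipping and Gaussian noise; the clip norm, noise scale sigma, step size, and iteration count are hypothetical placeholders and do not amount to a calibrated (epsilon, delta) accounting.

```python
import numpy as np

# A minimal sketch of noisy gradient descent for logistic regression:
# per-sample gradients are clipped to bound sensitivity, and Gaussian
# noise is added to each averaged gradient. All constants below are
# hypothetical placeholders, not a calibrated privacy budget.
rng = np.random.default_rng(5)
n, d = 400, 10
A = rng.standard_normal((n, d))
w_true = rng.standard_normal(d)
y = (A @ w_true > 0).astype(float)          # synthetic binary labels

clip, sigma, eta = 1.0, 1.0, 0.1
w = np.zeros(d)
for _ in range(200):
    probs = 1.0 / (1.0 + np.exp(-(A @ w)))
    per_sample = (probs - y)[:, None] * A            # per-sample gradients
    scale = np.maximum(np.linalg.norm(per_sample, axis=1) / clip, 1.0)
    g = (per_sample / scale[:, None]).mean(axis=0)   # clipped average
    g += (sigma * clip / n) * rng.standard_normal(d) # Gaussian perturbation
    w -= eta * g

print("training accuracy:", np.mean(((A @ w) > 0) == (y == 1.0)))
```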