ISE 633: Large Scale Optimization for Machine Learning

Goal: The objective of the course is to introduce large scale optimization algorithms that arise in modern data science and machine learning applications.

Course Topics: First order methods, accelerated methods, stochastic and online optimization, variance reduction methods, block methods (ADMM, BCD,…), conjugate gradient, Parallelization, Randomized linear algebra, Non-convex optimization, min-max problems and applications (GANs, Robust training, …)

Textbook: There is no required textbook for the class. All course materials will be presented in class or will be available online as notes. The following textbooks cover parts of the course materials and you may find them useful:

  • D. P. Bertsekas, Nonlinear Programming, Belmont: Athena scientific, 1999.

Slides and Lecture Notes

Course Plan:

  • Week 1: Optimization overview, examples in machine learning, large scale optimization, memory and CPU requirement, mathematical basics.
  • Week 2: Unconstrained optimization, necessary optimality conditions, smooth versus non-smooth optimization, sufficient optimality conditions, convex versus non-convex optimization
  • Week 3: Gradient methods (unconstrained), choices of direction, asymptotic convergence, Newton method
  • Week 4: Rate of convergence of gradient descent, first order oracle model, lower and upper bounds in the oracle model,
  • Week 5: Accelerated Nesterov, constrained optimization, optimality conditions
  • Week 6: KKT optimality conditions and Lagrange multipliers, projection and algorithms, examples in machine learning
  • Week 7: Exploiting multi-block structure of the problem, examples in machine learning, block coordinate descent methods, different block selection rules and convergence analysis
  • Week 8: Block successive upper-bound minimization and its convergence, midterm
  • Week 9: Alternating direction method of multipliers, non-smooth optimization and examples in machine learning
  • Week 10: Necessary and sufficient conditions in non-smooth optimization, successive upper-bound minimization, proximal operator, multi-block methods in non-smooth optimization
  • Week 11: Stochastic, online, incremental optimization, incremental gradient and its analysis
  • Week 12: Sample average approximation and stochastic approximation
  • Week 13: Parallelization: synchronous vs asynchronous, adversarial viewpoint and regret analysis
  • Week 14: Non-convexity and examples in machine learning: principal component analysis, deep learning, non-negative matrix factorization, local optimality results
  • Week 15: Hessian-vector product methods, min-max problems and generative adversarial networks.

Course Requirement and Grading:

  • In-class midterm (30%)
  • Final exam (35%)
  • Homework assignments (Best 4 out of 6: 20%)
  • Participation (5%)
  • Scribing (10%)

Homework assignments:

  • All homework assignments are due by 4:30pm on the date indicated.
  • Homework assignments must be submitted via email to the TA with cc’ing the instructor. Only one pdf file should be submitted for each homework assignment. You can submit latex pdf files or scanned images which are converted to pdf format.
  • Late homework submissions are not accepted under any circumstances. Start your homework assignments early.
  • There will be six homework assignments. The two lowest scores will not be considered in your final grade.
  • You are encouraged to discuss homework assignments with other students. However, each student is required to submit his/her own personal work.

Scribing: In order to gain experience with technical writing, each student is required to scribe notes for two lectures. These notes will be revised by the instructor and will be posted on the course website. The scribed notes should be written in a way that they are completely understandable to a student who may have missed the class.