ISE-529 Predictive Analytics

Course Description

This is a foundational course on predictive analytics and is required for all ISE MS Analytics majors.  The course covers the following topics:

  • Module 1:  Introduction to Predictive Analytics and Python/Pandas
  • Module 2:  Modeling Introduction.  Statistical learning, modeling types, model assessment and selection
  • Module 3:  Linear Regression Introduction.  Model definition and model assessment
  • Module 4:  Linear Model Diagnosis.  Resampling methods and model variance
  • Module 5:  Linear Model Validation.
  • Module 6:  Linear Model Selection and Regularization.  Subset selection, shrinkage methods, dimension reduction methods, high-dimensional data.
  • Module 7:  Classification.  Logistic regression, linear discriminant analysis, and generalized linear models.
  • Module 8:  Generalized Linear Models and Poisson Regression
  • Module 9:  Moving Beyond Linearity
  • Module 10:  Tree-Based Methods and Ensemble Models.  Decision trees, forests, gradient boosting.
  • Module 11:  Support Vector Machines.
  • Module 12:  Introduction to Neural Networks

Course Objectives

  • Develop an advanced level of proficiency with the primary classes of predictive modeling used by data scientists.
  • Develop skills in using the Python programming environment and the primary packages and tools currently used by data scientists.
  • Understand key concepts for measuring the performance of analytical models and techniques for enhancing their performance.

Texts

This class is based on the following text, which is mandatory.  It can be downloaded free of charge from the authors’ website at:  https://www.statlearning.com/

  • James et al., An Introduction to Statistical Learning with Applications in R, 2nd edition, Springer, 2021 (ISLR)
  • We will augment this text with a systems view of the methodology for choosing the most appropriate model types and for configuring and diagnosing the models, using material from the following two optional textbooks:
    • Harrell, Regression Modeling Strategies, 2nd edition, Springer, 2015 (RMS)
    • Kuhn et al., Applied Predictive Modeling, Springer, 2016 (APM)

This class will be based on Python and several major analytics libraries, including NumPy, pandas, scikit-learn, and statsmodels.  The following references will be used for this software:

  • Heydt, Learning Pandas, Packt, 2017, ISBN 978-1-78712-313-7 (LP)
  • VanderPlas, Python Data Science Handbook, O’Reilly, 2017 (PDS)
  • Müller, Introduction to Machine Learning with Python, O’Reilly, 2017 (MLP)