ISE-535 Data Mining

Course Description

Data mining is the discipline of extracting useful insights from large quantities of data. Accordingly, this class focuses on inference rather than prediction (prediction is the focus of ISE-529).

This course is organized into three broad sections:

  • Data preprocessing, data wrangling, and data cleaning to prepare data for analysis.
  • Exploratory and statistical data analysis techniques to extract useful information from the data.
  • Algorithmic data mining techniques of classification, clustering, association rule mining, linear modeling for inference, and tree-based modeling for inference.

Wherever possible, this course teaches concepts through case studies that use actual business data or simulated but realistic business data.

Learning Objectives and Outcomes

  • Develop advanced proficiency in data preprocessing, visualization, and statistical analysis, as well as in several of the primary algorithmic data mining techniques.
  • Develop skills in the R programming environment and in R packages widely used by data scientists in industry (primarily the Tidyverse packages).
  • Review and reinforce basic statistical concepts that are important in the field of data science.
  • By the end of the semester, students will be able to take raw data and perform all of the steps needed to produce a professional data analysis report.

Textbooks

The theoretical material in the course is drawn from the following texts:

  • Shmueli et al., Data Mining for Business Analytics: Concepts, Techniques, and Applications in R, Wiley, 2017 (DMBA)
  • Larose et al., Discovering Knowledge in Data: An Introduction to Data Mining, Wiley, 2014 (DKD)
  • James et al., An Introduction to Statistical Learning with Applications in R, second edition, Springer, 2021 (ISLR)
  • Bruce et al., Practical Statistics for Data Scientists, O’Reilly, 2020 (PSDS)

In addition, the following text will be used as our reference for R programming:

  • Wickham and Grolemund, R for Data Science, O’Reilly, 2017 (RDS)