Summer 2024 Research Activities

This summer, I will continue work on the two projects that began this past spring and expand them to include preparing materials that I hope to use in my classes during the upcoming academic year and, ultimately, to publish in some form. You can read more about them here.

As we did last spring, I will divide the students into smaller groups and select one student per group to serve as team lead and coordinate the team’s activities. We will meet once per week (probably on Wednesday afternoons) to review progress and discuss next steps. These meetings will generally be held in person on campus, but they can also be attended via Zoom.

Note that during the summer, this work is done on a “volunteer” basis only. If you are interested, you may continue in the fall for directed research (ISE-590) credit. See here for more general information on directed research credit.

Below are the specific research activities I would like students to help me with this summer:

  • Analytics case studies. I am preparing a series of “case studies” for teaching use in our ISE-535 Data Mining class. Each case study involves selecting a dataset, defining a business question or problem, and developing analytical solutions to it. Each case study will result in a 10-20 page document describing the case that is suitable for distribution to students and potential publication. Participants must have previously taken ISE-529 (ISE-535 is optional) and have good Python skills.
  • Data pipeline development. In conjunction with the current analytic dataset team, I am developing a large-scale data pipeline project for use in future ISE-558 classes and potential publication. This involves defining and populating realistic data sources (relational DBMS, semi-structured, and file-based), designing a large data warehouse (multiple dimensional data marts with conformed dimensions), and developing an ETL data pipeline to populate the data warehouse. Participants must have previously taken ISE-558 and have good Python and SQL skills.
  • Other data engineering research. Perform research and develop materials suitable for incorporation into class PowerPoint slides and, ultimately, a textbook on analytic dataset structuring, data cleansing, and feature engineering.
  • Google Cloud Platform Vertex AI research. In my Spring 2024 ISE-543 class, we spent about half of the semester learning about the Vertex AI platform. I would like to extend this work to cover some of the platform’s more advanced features, including its feature store, model registry, lineage features, and advanced MLOps functions for deploying, monitoring, and maintaining models in production.
  • Additional cloud analytics platforms. In conjunction with the Vertex AI team’s activities (described above), investigate the AWS SageMaker and Databricks platforms to learn how they implement the functionality found in Vertex AI. The ultimate goal is a paper or textbook chapter on cloud analytics platforms that compares the major platforms in use today.

I will finalize the team by mid-May and will consider all applications received before May 3. Please see this page for my general procedures for directed research and the application process.