For the fall semester of 2024, I will continue working on the following projects from the summer:
- Automated Database Generation. This team will complete the population and testing of the “Consulting Firm Relational Database” developed last spring/summer and work on generalizing the specification and generation of relational databases for academic purposes using business process modeling and/or Unified Modeling Language methodologies and discrete event simulation techniques. This team currently meets with me on Wednesdays at 4:00 PM.
- Data Pipeline Development. I am developing a large-scale data pipeline project for use in future ISE-558 classes and potential publication. This involves defining and populating realistic data sources (relational DBMS, semi-structured, and file-based), defining a large data warehouse (multiple dimensional data marts with conformed dimensions), and developing an ETL data pipeline to populate the data warehouse using a data orchestration tool. Having previously taken ISE-558 and good Python and SQL skills are required. This team currently meets with me on Wednesdays at 2:30 PM.
- Cloud Analytics Platforms. In my Spring 2024 ISE-543 class we spent about half of the semester learning about the Google Cloud Vertex AI platform. I would like to extend this work to include some of the more advanced features of the platform, including its feature store, model registry, lineage features, and advanced MLOps functions related to the deployment, monitoring, and maintenance of models placed into production. I would also like to investigate the AWS SageMaker and Databricks platforms to learn how they implement the functionality found in Vertex AI. The ultimate goal would be a paper or textbook chapter on cloud analytics platforms comparing the major platforms in use today. Having previously taken my ISE-543 class is very helpful. This team currently meets with me on Wednesdays at 5:00 PM.
- Time Series Clustering. A small team of students is expanding the work from my PhD thesis described here. The first step is to rebuild the evaluation environment in Python (from the original MATLAB code). Strong Python skills are needed.
- Analytic Dataset Generation. This project continues work on automating the generation of analytic datasets with specified statistical properties, generating multiple unique datasets for individual students, and automating the grading of student solutions. More background can be found in our paper published in 2023. Strong programming skills in Python or JavaScript are required.