DataFest is a collaborative CKIDS data science project done with GRIDS (Graduates Rising in Information and Data Science). Projects were proposed by USC faculty and researchers through an open call for proposals. The following are the descriptions of the projects that were selected for Spring 2022:
Selected DataFest Spring 2022 Projects
1. Machine Learning Enabled Fault Detection and Diagnosis of Quantum Circuits
Description: This is an interdisciplinary data science project that draws on expertise from quantum information theory and machine learning. We plan to develop and implement a novel approach to substantially improving the performance of quantum computers using advances in machine-learning-enabled fault detection and diagnosis. We will adapt and further develop existing machine learning protocols to efficiently and reliably detect and diagnose faulty quantum circuits. The protocols are expected to reach beyond the current state of the art in error diagnosis of quantum circuits, and to provide detailed and transparent information about the various sources of errors with significantly fewer queries to the quantum circuit and considerably fewer repeated experiments. This project will allow students to learn and acquire expertise in topics that cross quantum information theory, quantum computing, and machine learning.
Skills Needed: Python; experience with ML and quantum information is preferred.
What you will learn: The students will learn and acquire expertise in topics that cross quantum information theory, quantum computing, and machine learning.
Advisor:
- Amir Kalev, Information Sciences Institute, Viterbi School of Engineering
2. Decoding How Humans Encode Memories
Description: Advancements in closed-loop deep brain stimulation (DBS) have enabled more intelligent autonomy for therapeutic intervention across a wide range of neurologic and psychiatric disorders. The predominant approach relies on control-theoretic approximations of the brain’s complex functional relationships with the external environment, in particular a mapping between targeted stimulation and the naturalistic responses of different regions of the brain. However, existing approaches fail to capture the environmental context of neuronal biomarkers. We therefore leverage a set of IoT sensors to capture the human experience and environmental context, i.e., a subset of human sensory channels, in order to estimate the state of the human brain and provide the foundation for smarter, context-dependent DBS. We explore neural-symbolic approaches that integrate the powerful perception capabilities of deep learning with human logic to reason about the complex dependencies across a heterogeneous set of sensors.
Skills Needed: Python, Basic to intermediate Deep Learning
What you will learn: The students will learn how to reason about complex spatiotemporal data across a heterogeneous set of IoT sensors. In particular, they will explore the limitations of state-of-the-art deep learning approaches in terms of reasoning about complex events, e.g., reasoning about the audio, video, and inertial measurement data to detect when a person “walks through a doorway.” The students will also work with real-world patient data.
Advisor:
- Luis Garcia, Information Sciences Institute, Viterbi School of Engineering
3. Characterizing the counter-narratives of climate change
Description: Top climate scientists post their findings and views regularly on social media. These very scientists are met with tweets from those with opposing views, often containing vitriolic and false information. It is important that we can identify and characterize these tweets to understand the counter-narratives of climate change. We will address topics including false information, bot campaigns, and harassment.
Skills Needed: Python, Twitter data collection, Basic classification, Statistics
What you will learn: Data scraping, Machine learning, Text classification, Computational social science
Advisors:
- Deborah Khider, Information Sciences Institute, Viterbi School of Engineering
- Fred Morstatter, Information Sciences Institute, Viterbi School of Engineering
4. Scientific Concept Discovery: Using Machine Learning to Advance Scientific Research
Description: Our group focuses on the question of how to design a learning framework that promotes the generalizability of machine learning models. In this project, you will explore how neural networks acquire information from training examples and how they learn to solve various physical problems (e.g., emulation of simple quantum systems). The premise of this project is that by observing how a machine learning model learns to solve a specific task, we can learn about the underlying problem itself. As an example, by analyzing the weights of a trained neural network, you can discover non-trivial symmetries of the modeled physical system, determine the relative importance of features, or identify some non-trivial interplay between underlying physical mechanisms. Your task will be to learn various tools for interpreting deep neural networks. You will test them in practice and explore methods that promote model transparency and interpretability.
Skills Needed: Python 3.x, TensorFlow 2.x (preferred) or PyTorch, basic Bash shell scripting, a basic understanding of quantum mechanics (not required, but desirable).
What you will learn: How to model physical processes using machine learning algorithms. How to interpret deep neural networks. How to measure feature attribution. How to determine important weights/neurons in the network. How to measure the robustness of the predictions. How to create custom loss functions and how to visualize loss function landscapes. You will have a chance to participate in weekly research meetings. Working alongside us, you will learn about the scientific methods of research. The topic that you will study is novel and timely: you will gain knowledge and skills that can be applied to many domains outside science as well (e.g., knowing how to interpret a machine learning model is a desirable skill in many industrial endeavors).
Advisor:
- Marcin Abram, Information Sciences Institute, Viterbi School of Engineering
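To give a flavor of what "measuring feature attribution" can look like for project 4, here is a minimal sketch using permutation importance: shuffle one input feature at a time and record how much the prediction error grows. This is only one common attribution technique and is not necessarily the one the project uses. The toy data, the least-squares stand-in model (in place of a TensorFlow or PyTorch network), and the helper name `permutation_importance` are all assumptions made for illustration.

```python
import numpy as np


def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Mean increase in MSE when each feature column is shuffled (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    baseline = np.mean((predict(X) - y) ** 2)
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        increases = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            # Break the link between feature j and the target by shuffling it.
            X_perm[:, j] = rng.permutation(X_perm[:, j])
            increases.append(np.mean((predict(X_perm) - y) ** 2) - baseline)
        importances[j] = np.mean(increases)
    return importances


# Toy data (assumed): y depends strongly on feature 0, weakly on feature 1,
# and not at all on feature 2.
data_rng = np.random.default_rng(42)
X = data_rng.normal(size=(500, 3))
y = 3.0 * X[:, 0] + 0.5 * X[:, 1] + 0.1 * data_rng.normal(size=500)

# Stand-in "trained model": a least-squares linear fit instead of a neural network.
weights, *_ = np.linalg.lstsq(X, y, rcond=None)


def predict(data):
    return data @ weights


scores = permutation_importance(predict, X, y)
ranking = np.argsort(scores)[::-1]  # feature indices, most important first
print("importance scores:", scores)
print("ranking:", ranking)
```

Because `permutation_importance` only needs a `predict` callable, the same function could in principle be pointed at a trained neural network's prediction function instead of the linear stand-in.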
5. Characterizing Online Attitudes, Expectations, and Concerns about Novel Medical Treatments
Description: Novel or hypothesized medical treatments, such as COVID-19 vaccines and male contraception, are regularly discussed on social media. For example, on the AskReddit subreddit, users post questions of the form “Would you take [x] if it existed?” Beyond indicating willingness to use these novel treatments, the answers to these questions contain important clues to people’s latent concerns and barriers to adoption of novel medications. Understanding them can provide crucial information about how to introduce, communicate, and counsel about new medications when they come to market. In this project you will use pre-collected Reddit data spanning 10 years to answer questions including: What concerns do individuals have about a novel medication? How do these concerns vary by demographics, such as cultural background? How have these concerns evolved over time? What has caused users to become more or less accepting of the treatment over time?
Skills Needed: Machine Learning; Python
What you will learn: Clustering; Data Analysis; Latent Analysis
Advisors:
- Fred Morstatter, Information Sciences Institute, Viterbi School of Engineering
- Researchers from the Keck School of Medicine
6. Community Economic Tool
Description: Determining what makes a region “most attractive” for new business will require exploratory research into which variable(s) are most indicative of potential economic growth opportunities. Additional research will be conducted to identify predictors of our indicator variable, such as the unemployment rate and educational level of tract-level residents and of neighboring-tract residents, and broadband accessibility. In other words, we ask, “What does this region of Miami do?” Which industries are the largest employers in each region, and are they also the ones generating the most revenue? Once each region has been characterized, one can identify the dominant predictors of economic growth within it and compare them with those of other tracts that have the same dominant industry structure. This will highlight the role geographic regions play in the economic development and growth of various industries.
Skills Needed: Python, ML, Statistics
What you will learn: Geospatial analysis and economic modeling
Advisor:
- Palak Agarwal, US Ignite
7. OSINT Social Networks on GitHub
Description: Open-source intelligence (“OSINT”) is a rapidly growing area of cybersecurity. This project seeks to explore OSINT information available on GitHub. We’ll use the GitHub API and related tools to build networks to try to answer a number of interesting questions, such as “can you tell what software a company uses based on its employees’ networks?”, “do white hat hackers have social networks that look different from those of black hat hackers?” and others. If you’re interested in cybersecurity, OSINT, social networks, databases, APIs, etc., then this is the project for you!
Skills Needed: Strong Python knowledge required! Graph databases/GraphQL/Neo4j desired but not required
What you will learn: OSINT gathering tools, cybersecurity tradecraft, social network analysis, data integration
Advisor:
- Jeremy Abramson, Information Sciences Institute, Viterbi School of Engineering
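As a rough illustration of the kind of network building project 7 describes, the sketch below turns a list of (user, repository) contribution records into a weighted collaboration graph, where two users are linked if they contributed to the same repository. The records are made up and hard-coded here; in practice they would come from the GitHub API, which this sketch does not call. The function names and the choice of edge definition are assumptions for illustration, not the project's actual method.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical (user, repository) contribution records; real data would be
# collected from the GitHub API.
contributions = [
    ("alice", "org/web-app"),
    ("bob", "org/web-app"),
    ("carol", "org/web-app"),
    ("alice", "org/cli-tool"),
    ("bob", "org/cli-tool"),
    ("dave", "other/lib"),
]


def build_collaboration_graph(records):
    """Map each unordered user pair to the number of repositories they share."""
    repo_members = defaultdict(set)
    for user, repo in records:
        repo_members[repo].add(user)

    edge_weights = defaultdict(int)
    for members in repo_members.values():
        # Sorting gives each pair a canonical (a, b) order with a < b.
        for a, b in combinations(sorted(members), 2):
            edge_weights[(a, b)] += 1
    return dict(edge_weights)


def degrees(edge_weights):
    """Count how many distinct collaborators each user has."""
    degree = defaultdict(int)
    for a, b in edge_weights:
        degree[a] += 1
        degree[b] += 1
    return dict(degree)


graph = build_collaboration_graph(contributions)
collaborator_counts = degrees(graph)

for (a, b), weight in sorted(graph.items()):
    print(f"{a} -- {b}: {weight} shared repo(s)")
```

Users with no shared repositories (here, the sole contributor to `other/lib`) do not appear in the graph at all, which is one design choice among several; a graph database such as Neo4j would let you keep them as isolated nodes.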
8. Studying Scientific Innovation with Temporal Knowledge Graph Representation Learning
Description: What’s the next big idea, and who’s going to discover it? Our project is trying to understand how researchers make new discoveries and innovate new ideas. To do that, we will apply deep learning techniques for temporal knowledge graph learning (RE-NET, CyGNet, HINGE, StarE) to a huge citation network dataset. We have assembled a knowledge graph (KG) with 260M research papers, 270M authors, and 700K fields. To learn representations, our training tasks include citation prediction, author collaboration prediction, and field of study prediction.
Skills Needed: Python, ideally: familiarity with PyTorch/TensorFlow, knowledge graphs like Wikidata
What you will learn: Knowledge graphs, citation network analysis, representation learning
Advisors:
- Dong-Ho Lee
- Kian Ahrabian
- Jay Pujara, Information Sciences Institute, Viterbi School of Engineering