Why a Center for Knowledge-Powered Interdisciplinary Data Science at USC?
We live in interesting times. Our ability to study and understand complex phenomena has been profoundly amplified by data. The intriguing patterns we find in data lead us to exciting new questions. Our curiosity is continuously drawn to the possibilities that only data uncovers.
The new USC Center for Knowledge-Powered Interdisciplinary Data Science (CKIDS) was established in the Spring of 2019 as a campus-wide organized research unit that serves as a nexus for data science collaborations across schools and departments in the university. USC faculty and students across many departments and schools are working on data science, and CKIDS seeks to connect them with forward-looking data science tools and concepts.
CKIDS also puts forward a unique vision for data science at USC.
From Big Data to Data Science
The swell of data in recent years has been astonishing. We were overwhelmed by its magnitude, heterogeneity, and tempo, and dubbed this data deluge “big data”.
The ubiquity of data is due just as much to technological advances as it is to our insatiable desire to interact with technology. We want to contribute our own data, and to expand the kinds of data that can be collected. We want data to materialize in every possible domain of study, and we want to maximize its impact to improve our world.
The era of big data soon gave way to the realization of the unprecedented challenges of data-rich inquiry. Data science arises from the recognition of data as an object of study that requires the development of effective approaches to harness data.
So, just as we have microscopes, telescopes, cyclotrons, incubators, ionizers, mass spectrometers, and myriad other scientific instruments, we need to develop appropriate instruments to augment, integrate, separate, clean, improve, analyze, visualize, summarize, store, collect, and disseminate data. Computer scientists, mathematicians, and engineers are diligently investigating more capable and more powerful instruments and methodologies for data. But instruments are just one ingredient of successful data-driven research.
Interdisciplinary Data Science
One can look at data de novo, and endeavor to find patterns and exceptions. One can also look at data to test an existing theory or model. Data science techniques enable both approaches.
But data alone does not tell us what the patterns mean or whether an exception is significant. It is a body of existing knowledge that enables us to understand those patterns and exceptions, and place them in context. It is the existing theories and models that enable us to ask important questions that can only be tested with data.
Interdisciplinary data science brings together the theories and models from one or more domain disciplines with the instruments and methodologies developed by computer science, mathematics, and engineering.
Interdisciplinary data science is undoubtedly the most profound intellectual challenge of our time for university campuses. There are several reasons for this. The fields of study relevant to a data-rich inquiry are highly fragmented and very hard to bridge. The advanced computing techniques that data science relies on demand a level of technical expertise that puts them out of reach for many. And we are entering new technological territory while studying highly complex real-world problems.
Yet, this is where we need to go if we want to address important questions of societal relevance, whether understanding our planet, preventing disease, enhancing brain function, fighting poverty, trusting information, improving businesses, discovering new materials, or restoring natural resources.
Knowledge-Powered Data Science
When data science is informed by existing models and theories, data really comes alive. Based on their extensive knowledge, domain experts can pose worthy questions, provide powerful observations, justify unusual methods, and validate surprising results. If harnessing data is already hard, harnessing and integrating this knowledge makes data science even more challenging but orders of magnitude more effective.
Knowledge-powered data science focuses on developing effective approaches to harness data that incorporate domain knowledge in order to advance interdisciplinary data science. How do we identify and articulate the knowledge relevant to a data-rich inquiry? How do we express this knowledge in ways that can be incorporated in data science methodologies? How do we develop data science methodologies that can be used directly by domain experts and end users? What kinds of support do researchers in different disciplines need to carry out data science projects efficiently? These are some of the key questions that we are addressing in CKIDS.
Key research areas in knowledge-powered data science include human-guided machine learning, crowdsourcing data models, semi-automated extraction of knowledge graphs, intelligent workflows, natural interfaces, and meaningful data-backed reporting.
CKIDS offers unique methodologies for capturing and integrating knowledge to tackle data science challenges. These methodologies are designed to accelerate the design and implementation of complex interdisciplinary data science projects.
CKIDS: Carpe Data
In CKIDS, we believe it is the pairing of data and knowledge that gives us an edge in discovery.
While data dominates our conversations, the knowledge that domain experts bring to bear is what amplifies our possibilities and expands our ambitions. Data humbles us, but knowledge emboldens us to pursue what is seemingly impossible.
I hope you will join one of our events and contact us to learn about ongoing work at CKIDS!
Yolanda Gil
Director, USC Center for Knowledge-Powered Interdisciplinary Data Science
February 5, 2019