The research collaboration on spoken language grows from a shared interest in deploying real-time dynamic MR imaging in the investigation of human speech production, both in typical speech and in breakdown. The capacity for articulated spoken language is a unique aspect of the human endowment, yet only recently has real-time MRI provided a view of this process inside the body, from larynx to lips. These movements are rapid and complex, and they are the causal events shaping the acoustic speech signal that links speech production and speech perception. MR video imaging provides information about how the articulatory actions of the vocal tract unfold in time and space, and it can be deployed to capture variation across languages, across the lifespan, across individuals, and across speaking styles, both in health and when impacted by disorder and disease.
Structural magnetic resonance imaging (MRI) is a powerful tool for obtaining vocal tract data without radiation risks. These structural images have good signal-to-noise ratio, are amenable to computerized 3-D modeling, and provide excellent structure differentiation. Vocal tract (airway) area and volume can be directly calculated, and computational and machine learning tools can be deployed to discover underlying control primitives. In 2004, the USC Speech Production and Articulation kNowledge (SPAN) group (originated by Narayanan & Byrd) was the first to report real-time MRI (RT-MRI) movies of vocal tract speech production at image reconstruction rates of 24 images per second, using gradient echo imaging with a fast interleaved spiral acquisition (Narayanan, Nayak, Lee, Sethy, & Byrd 2004); we have since extended this imaging to rates of up to 96 images per second (Lingala et al. 2015, 2017). Our hardware and signal-processing innovations have also provided for simultaneous image-synchronized audio recording (Bresch, Nielsen, Nayak, & Narayanan 2006), wherein an MRI-specific noise-canceling procedure reduces scan noise.

The Dynamic Imaging Science Center (DISC) houses an 8-channel upper-airway coil, designed specifically for our USC speech production MRI studies by Stark Contrast LLC (Erlangen, Germany), which is used with an OptoAcoustics FOMRI-III audio communication, de-noising, and recording system. DISC’s research-dedicated high-performance low-field (HPLF, 0.55T) MR instrument provides a quieter environment than the loud noise produced by conventional higher-field MRI, yielding better-quality acoustic recordings and a more natural ambient noise environment for participants. Additionally, the HPLF configuration is highly suitable for speech production applications because it eases a technical limitation of conventional configurations, namely artifacts at air-tissue interfaces.
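For readers curious how acquisition parameters translate into these frame rates, the sketch below shows the basic arithmetic for interleaved-spiral real-time MRI. The repetition time, interleaf count, and view-sharing step are illustrative assumptions chosen only to demonstrate the calculation; they are not the actual SPAN/DISC protocol parameters.

```python
# Illustrative sketch (not the SPAN/DISC protocol): how repetition time (TR),
# the number of spiral interleaves combined per image, and sliding-window
# (view-shared) reconstruction relate to the achievable frame rate in
# interleaved-spiral real-time MRI. All numbers are assumptions for illustration.

def frame_rate(tr_ms, interleaves_per_frame, sliding_window_step=None):
    """Reconstructed frames per second for an interleaved spiral acquisition.

    tr_ms                 -- repetition time per spiral interleaf (milliseconds)
    interleaves_per_frame -- spiral arms combined to form one image
    sliding_window_step   -- interleaves advanced between successive reconstructed
                             frames (view sharing); defaults to a full frame
    """
    step = sliding_window_step if sliding_window_step else interleaves_per_frame
    return 1000.0 / (tr_ms * step)

# Hypothetical parameters: TR ~ 6 ms with 7 interleaves per image
print(round(frame_rate(6.0, 7), 1))      # ~23.8 fps with full-frame updates
print(round(frame_rate(6.0, 7, 2), 1))   # ~83.3 fps if frames advance every 2 interleaves
```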
The interdisciplinary SPAN group includes faculty and students ranging from theoretical linguists and computer scientists to signal processing specialists and speech and biomedical engineers. In addition to fostering research across departments at USC (e.g., Electrical & Computer Engineering, Linguistics, Computer Science, Neurobiology, Otolaryngology-Head & Neck Surgery, and Radiology), the group has promoted interdisciplinary education and training through integrated exposure to multiple aspects of speech and language research and vigorous knowledge sharing through: overview and tutorial papers for science, engineering, and clinical audiences (Lingala et al. 2016, Toutios & Narayanan 2016, Ramanarayanan et al. 2018, Hagedorn et al. 2019, Toutios et al. 2019); invited lectures to broad audiences (e.g., the 2017 Zemlin Lecture for ASHA-SIG19 and the 2019 Plenary Lecture at the International Congress on Acoustics); and the creation and public dissemination of novel databases and tools that are transforming the practice of speech production research. In addition to the large multimodal articulatory USC-TIMIT database (Narayanan et al. 2014), the growing list of publicly shared resources includes: an emotional speech database (Kim et al. 2014, 2020), a multi-speaker volumetric and RT-MRI database (Sorensen et al. 2017a), a comprehensive (and much accessed) web-based RT-MRI illustration of the IPA chart by prominent phoneticians (Toutios et al. 2016), and most recently the public release of a multi-speaker dataset of raw and reconstructed speech production real-time MRI video and 3D volumetric images, alongside analysis software tools, in Scientific Data (Lim et al. 2021).
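As a minimal illustration of working with such shared recordings, the sketch below loads a reconstructed RT-MRI video and its synchronized audio using generic Python tools. The file names, formats, and frame rate are hypothetical placeholders; the released dataset and its accompanying software (Lim et al. 2021) define the actual file layout and loaders.

```python
# Minimal sketch of inspecting one reconstructed RT-MRI recording together with its
# synchronized, denoised audio. File names, formats, and the frame rate below are
# hypothetical placeholders, not the released dataset's actual structure.
import imageio.v3 as iio
from scipy.io import wavfile

video_path = "speaker01_utt01_rtmri.mp4"   # hypothetical file name
audio_path = "speaker01_utt01_audio.wav"   # hypothetical file name

frames = iio.imread(video_path)            # array of shape (n_frames, height, width[, channels])
sample_rate, audio = wavfile.read(audio_path)

fps = 83                                   # assumed reconstruction rate; read from metadata in practice
print(f"video: {frames.shape[0]} frames (~{frames.shape[0] / fps:.2f} s)")
print(f"audio: {audio.shape[0] / sample_rate:.2f} s at {sample_rate} Hz")
```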
Team Members
Dani Byrd, PhD
Professor of Linguistics
Dr. Byrd is an expert in how dynamic MR imaging can be leveraged to understand the movements of the vocal tract inside the human body during speech production and planning. These movements include the complex geometries and movement trajectories of the tongue, lips, jaw, velum, pharynx, and larynx as used in speaking the languages of the world. She brings to the team deep experience in how linguistic structure conditions the temporal realization of speech, as articulation cannot be understood independently of the linguistic organization of syllables, words, and phrases.
Louis Goldstein, PhD
Professor of Linguistics
Dr. Goldstein’s research program for the past thirty-five years has involved developing novel techniques for acquiring, analyzing, and modeling speech kinematics and dynamics, along with pioneering a theoretical framework for phonological description (Articulatory Phonology) that formally decomposes information content into units of articulatory action, thus making direct contact with these data sources. He has substantial experience developing tools that help linguists leverage RT-MRI to answer fundamental questions in phonetics and phonology.
Khalil Iskarous, PhD
Associate Professor of Linguistics
Dr. Iskarous studies the motor organization of the human speech production system and uses dynamic MR imaging to probe the marvel of how we speak. He is especially interested in how advanced machine learning methods can be used to quantify movements of the vocal organs, and in how what is learned at this facility about the relationship between speech acoustics and speech motor control can be used to improve machine learning for the documentation and revitalization of under-resourced languages.
Shrikanth (Shri) Narayanan, PhD
University Professor and Niki & Max Nikias Chair in Engineering
Dr. Narayanan’s work focuses on developing and applying engineering methods to understand human communication, interaction, and behavior, and to develop technologies that support human experiences. This includes sensing and imaging to illuminate the intricate details of speech production, signal processing and computing to visualize, model, and interpret multimodal data, and technologies to support novel clinical applications.
Krishna S. Nayak, PhD
Professor of Electrical and Computer Engineering, Biomedical Engineering, and Radiology
Dr. Nayak is an expert in MRI physics and real-time MRI technology. He leads the development of the MRI acquisition and reconstruction methods used by this team to visualize the dynamics of vocal tract shaping.
Yijing Lu, MA
PhD student in Linguistics
Yijing Lu’s research focuses on speech production and articulation, especially in atypical speakers. Leveraging vocal tract RT-MRI, her dissertation project aims to reveal the articulatory mechanisms underlying stuttering. The movements of all vocal tract structures during stuttering have never been visualized before, and RT-MRI provides an unprecedented opportunity to do so. With the abundance and quality of articulatory data that RT-MRI enables, new insights can be drawn about the nature of articulatory breakdowns during stuttering, as well as the mechanisms that maintain articulator coordination in fluent speech.