Special Sessions
The 2012 Conference received a number of outstanding special session applications. The following special sessions have been selected to proceed to the second phase of the selection process. Once a suitable number of high-quality papers have been accepted to a given special session, the Organizing Committee will finalize the acceptance of that Special Session. Upon approval, the special session organizers will work directly with the Organizing Committee to implement the session at the Conference. Please note that some special sessions may not materialize at the end of Step 2.
Paper submitters are required to select one or more additional submission topics in case the Special Session is not selected in the second round. Papers will then be distributed to reviewers in the secondary category and considered for presentation in a standard conference session.
Special Session Proposal Information
Special Session 1: Analysis of Spoken Language Disorders for Speech Processing, Forensic and Health Applications Organizer: Marina Nastasenko, Speech Technology Center, Russia
Summary: This session, focused on the processing of speech and language disorders, aims to promote the importance and goals of this research field by addressing interdisciplinary problems, tasks, solutions and conclusions related to:
1. Speech Processing: acoustic and linguistic analysis of speech and language disorders, discourse analysis of pathological occurrences in spoken language, automatic speech recognition and spoken dialogue systems in the presence of pathologies
2. Forensic Applications: speaker identification with speech and language disorders, analysis tools and evaluation systems involving speech and language pathologies
3. Health Applications: diagnostic tools and training systems for medical conditions involving speech and language disorders, coping with spoken defects in man-machine interaction
Special Session 2: Beyond timing: New directions in speech rhythm analysis Organizers: Emily Nava and Joseph Tepperman, Rosetta Stone
Summary: This special session comes in response to a growing dissatisfaction with the state of speech rhythm analysis. To date, the most widely used speech rhythm metrics propose to capture phonotactic characteristics through the empirical timing patterns of vocalic and consonantal intervals. These measures lack essential analyses of grouping rhythmic units based on perceived phrasal prominence and other hierarchical criteria. The goal of this session is to facilitate dialogue, discuss the potential merits of new directions, and create collective momentum in the search for rhythm models that subscribe to a more holistic definition that includes prominence, grouping, and other perceptually salient information.
Special Session 3: Exemplar-based methods for Speech Processing Organizers: Jort F. Gemmeke, K.U. Leuven (Belgium), Tara N. Sainath, IBM T.J. Watson Research Center (USA), Bhuvana Ramabhadran, IBM T.J. Watson Research Center (USA)
Summary: For the past few decades, speech processing has been dominated by all-data approaches to describing the acoustics (e.g. Gaussian Mixture Models, Neural Networks), which estimate parameters from all training data. With recent advances in computational power, storage and algorithms, however, there is a renewed interest in exemplar-based models to alleviate some of the shortcomings of these all-data methods, including the loss of information in individual training samples and the need for large amounts of training data.
Special Session 4: Glottal Source Processing: from Analysis to Applications Organizers: Thomas Drugman, Paavo Alku, B. Yegnanarayana and Abeer Alwan
Summary: Most current speech processing systems focus on features generated by the vocal tract. However, the glottal source conveys complementary information which has been shown to be useful for voice pathology detection, emotion/voice quality recognition, speech synthesis, speaker identification, etc. Recent advances in imaging and signal processing techniques have resulted in a number of new and interesting ways to visualize and model the source, as well as to inverse filter the speech signal and study source-related features. The goal of the present special session is to gather people interested in the glottal source to discuss new analysis techniques and investigate their potential use in various speech-technology applications.
Special Session 5: New Trends in Vowel Nasalization: The Articulation of Nasal Vowels Organizers: Ryan Shosted and Christopher Carignan
Summary: This session will concentrate on the unique problems of studying vowel nasalization through a novel perspective: oro-pharyngeal articulation. A significant challenge in the study of nasal vowels is separating the relative contribution of the oral and naso-pharyngeal tracts to the acoustic output. A growing body of research shows that it is possible to measure differences in the physical configuration of the oro-pharyngeal tract during nasal and oral vowel congeners. These articulatory differences have acoustic consequences relating to the oral/nasal contrast. This research has implications for speech processing, the biomedical diagnosis and treatment of velopharyngeal dysfunction, phonetics, and phonology.
Special Session 6: Non‐Statistical Voice Transformation Organizers: Fernando Villavicencio and Alexander Kain
Summary: Voice Transformation (VT) is a field of increasing interest to the speech signal processing community. VT aims to modify the auditory perception of a speaker in such dimensions as timbre, pitch, speaking style, vocal quality, and speaker mimicry. In recent years, a number of contributions have focused on the statistical modeling and machine learning aspects of the transformation approaches. In contrast, this session aims to provide a distinctive place for the study of the accompanying signal analysis/synthesis issues and to offer an opportunity for discussion among the participants in this field. The session welcomes research work that provides valuable contributions to the study of signal modification for the purposes of VT.
Special Session 7: Prosodic Prominence: Annotation, Prediction, Applications Organizers: Petra Wagner (Bielefeld University, Germany), Fabio Tamburini (Università di Bologna, Italy)
Summary: Recent years have shown a renewed interest in various aspects of manual and automatic prominence annotation and its integration into computational systems or technical applications. Also, we are beginning to understand how prominence perception and annotation are influenced by various constraints, ranging from auditory processing to top-down expectancies and multimodality. Among other things, prominence information helps in differentiating speaking styles, conversational settings and inter-speaker dynamics. Furthermore, it provides important cues to periods of high information density within a stretch of speech, thereby facilitating both comprehension and learning.
We invite submissions on all aspects related to prosodic prominence.
Special Session 8: Multimedia Robust Speech Retrieval in Heterogeneous and Noisy Archives Organizers: Murat Akbacak, Sadaoki Furui
Summary: Several speech retrieval systems have been developed in different domains (broadcast news, meetings, lectures, etc.) in the past, but most assumed that the collection is homogeneous in terms of acoustic conditions and lexical/topical content, and that noise levels are moderate. As speech retrieval finds its way into real-life applications (e.g., search on internet data or digital library archives) and the amount of data becomes larger and more heterogeneous, new challenges arise for acoustic, language, and lexical modeling. This special session aims to bring together researchers working on different aspects of speech retrieval, and to motivate them to propose novel techniques that will work more robustly in heterogeneous and/or noisy collections.
Special Session 9: Speech and Audio Analysis of Consumer and Semi-Professional Multimedia
Organizers: General chairs: Florian Metze, Guillaume Gravier, Murat Akbacak, Xavier Anguera, Dan Ellis, Gareth Jones, Martha Larson, Rohit Prasad
Summary: Media sharing sites and the one-click upload capability of portable devices have led to a deluge of online multimedia content, whose diversity poses significant challenges to the current state of the art in multimedia analytics. Data are mostly generated by users, on non-professional equipment, in the wild, and have little or no manual labeling. Large-scale multimodal analysis of audio-visual material is a key enabler for language understanding, human action recognition, or scene identification algorithms, with applications in media retrieval, robotics, interactive agents, etc. This special session caters to all aspects of speech and audio processing, including non-speech audio, multimedia segmentation, retrieval and summarization, and multi-modal and cross-modal labeling and training, with the goal of fostering new research directions and building a strong scientific community.
Special Session 10: Speech and Language Technologies for STEM Organizers: Maxine Eskenazi, Abeer Alwan, Diane Litman, Martin Russell, Klaus Zechner
Summary: This session explores the use of speech and language processing in education across learning disciplines, especially in the areas of Science, Technology, Engineering and Mathematics (STEM). It will cover current work and chart possible future research directions.
While the focus of this special session is the application of speech and language technologies for STEM applications, we encourage submissions concerned with all topics related to speech and language technologies for education. This includes, but is not limited to, the following topics:
Special Event 1: Speech Processing Tools Organizers: Christoph Draxler, LMU Munich, Germany; Nick Campbell, TCD Dublin, Ireland; Martin Cooke, Ikerbasque Bilbao, Spain; Han Sloetjes, MPI Nijmegen, The Netherlands
Summary: The special session on Speech Processing Tools is about software and web services for speech technology development and research - tools which support tasks relevant to the processing of speech data and which are intended to serve developers and researchers in their daily work. The special session will give tool and service developers a forum to present their software and to obtain academic credit for their important but often unappreciated work. Contributions to this special session should focus on, and critically evaluate, the innovative features, joy of use, stability, appropriateness, availability, and interoperability of the software or web service.
Special Event 2: Speaker Trait Challenge Organizers: Stefan Steidl, Anton Batliner, Alessandro Vinciarelli, Felix Burkhardt, Rob Van Son
Summary: Whereas the first open comparative challenges in the field of paralinguistics targeted more "conventional" phenomena such as emotion, age, and gender, there still exists a multiplicity of not yet covered, but highly relevant speaker states and traits. In the last instalment, we focused on speaker states, namely sleepiness and intoxication. Consequently, we now want to focus on speaker traits: the InterSpeech 2012 Speaker Trait Challenge broadens the scope by addressing three less researched speaker traits: the computational analysis of personality, likability, and pathology in speech. Apart from intelligent and socially competent future agents and robots, main applications are found in the medical domain and surveillance.
Questions? Contact:
Shri Narayanan
Special Sessions Chair