Important Dates

  • April 1, 2012
    Full Paper Submission Deadline
  • June 8, 2012
    Notification of Paper Acceptance
  • June 16, 2012
    Grant Application Deadline
  • June 22, 2012
    Camera-ready Paper Due
  • June 30, 2012
    Early Registration Deadline
    Deadline for Presenters to Register
  • August 8, 2012
    Hotel and Standard Registration Deadline

Organizing Secretariat

Conference Solutions

Computational Paralinguistics: Emotion, Affect, and Personality in Speech and Language Processing 

Abstract

When there is spoken or written language, there is paralinguistic information. Social awareness of this information on affect, personality, and other states and traits can be expected to become an integral factor in future multi-modal user interfaces and in large-scale retrieval systems for speech, text, and audiovisual databases. In this vein, the tutorial aims to cover the young fields of automatic recognition of human affect, emotion, personality, and speaker states and traits as reflected in one’s speech or written text. It will first introduce the general topic, covering a short history of the field as well as definitions of and examples for basic terms. Next, we will contrast the formal aspects of the linguistic code with the formal aspects of the “non-linguistic”, i.e. paralinguistic, code. The part on functional aspects will cover the most important phenomena, such as biological (age, gender) and cultural (regional/foreign accent) trait primitives, as well as the “big” topics of personality, emotion, and pathology. For these “big” topics, we will also address theoretical foundations as well as fundamental aspects (e.g., categorical vs. dimensional modelling). This will be followed by corpus engineering, including annotation and the selection of units. We will further introduce important corpora and benchmarks, as well as the use of synthesized speech for training and semi-supervised learning. Next come signal processing and machine learning aspects, including pre-processing, feature extraction, and machine learning algorithms, followed by acoustic and linguistic analyses, either in isolation or combined by efficient and synergistic fusion. As for the integration of this information in a system context, we will discuss standards for encoding paralinguistic information, the handling of error-prone prediction results and confidence measurement, real-time issues, application design, and the real-life evaluation of systems.
Finally, a practical “hands-on” part includes examples using our open-source “openEAR” toolkit for emotion and affect recognition and general speaker classification (the official feature and baseline toolkit of the series of Challenges the presenters have co-organised at INTERSPEECH since 2009), as well as data from these Challenges.

The objective of this tutorial is thus to give a comprehensive introduction to and broad overview of recent algorithms and methodologies of “real-life” speech processing, focusing on paralinguistic aspects. We will present the facets and nuances that can be extracted from speaker states and traits such as affect, emotion, personality, and behavioural and social signals, including practical aspects such as current datasets and research tools. While it will not be possible to discuss all aspects of “Computational Paralinguistics”, a participant in this tutorial will gain the skills needed to identify algorithms and tools for solving a particular problem in her/his field.

The main target audience is a broad group of scholars, practitioners, and experts in speech processing, natural language understanding, or even human-computer interaction: “real-life” speech touches all of these fields. The tutorial will assume very little knowledge of signal processing principles (so it will be suitable for the non-specialist), but it will cover many state-of-the-art subjects, so that specialists will also find it interesting.

Outline

1. Introduction [20 mins]
1.1. What is Computational Paralinguistics?
1.2. History and Subject Area
1.3. Form vs. Function
1.4. Taxonomies
1.5. Generation and Synthesis, Multi-Modality
1.6. Usability and Applications, Ethics

2. Formal Aspects [15 mins]
2.1. The Linguistic Code
2.2. The Non-Distinctive Use of Phonetic Elements
2.3. The Non-Distinctive Use of Linguistic Elements
2.4. Non-Verbal, Vocal Events

3. Functional Aspects [15 mins]
3.1. Biological Trait Primitives
3.2. Cultural Trait Primitives
3.3. Personality
3.4. Emotion
3.5. Pathology
3.6. Non-Verbal Events

4. Aspects of Modelling [15 mins]
4.1. Personality, Emotion, and Affect
4.2. Prototypes and Fringe Phenomena
4.3. Categories and Dimensions
4.4. Complex Phenomena and Varying Degrees of Markedness

5. Corpus Engineering [20 mins]
5.1. Selecting a Corpus (Availability, Suitability)
5.2. Designing a Corpus
5.3. Important Corpora and Benchmarks

6. Computational Modelling of Paralinguistics: Overview [15 mins]
6.1. Chain of Processing
6.2. Pre-Processing and Enhancement
6.3. Space Compression
6.4. Static/Dynamic Classification, Regression
6.5. Adaptation, Semi- and Unsupervised Learning
6.6. Early, Late and Hybrid Fusion
6.7. Testing and Interpretation

7. Acoustic Analysis [15 mins]
7.1. Acoustic Segmentation
7.2. Continuous Descriptors
7.3. Systematic Feature Generation
7.4. Higher Level Features
7.5. Acoustic and Phonetic Robustness

8. Linguistic Analysis [15 mins]
8.1. Capturing
8.2. Tokenisation, Tolerant Mapping and Tagging
8.3. Stopping and Stemming
8.4. Data-based Modelling
8.5. Knowledge Bases
8.6. Out of Vocabulary Resolution

9. System Integration and Application [10 mins]
9.1. Standards for Emotion and Personality
9.2. Confidence Measurement
9.3. Real-time Issues and Incremental Processing
9.4. Application Design
9.5. Real-life Evaluation

10. “Hands-on”: Existing Toolkits and Practical Tutorial [30 mins]
10.1. Existing Tools
10.2. openEAR
10.3. Databases for Experimentation

11. Discussion [10 mins]

Short Biographies 

Björn Schuller, TUM, Munich/Germany 

Björn Schuller received his diploma in 1999 and his doctoral degree for his study on Automatic Speech and Emotion Recognition in 2006, both in electrical engineering and information technology from TUM (Munich University of Technology). Since 2006, he has been tenured as Senior Lecturer in Pattern Recognition and Speech Processing, heading the Intelligent Audio Analysis Group at TUM’s Institute for Human-Machine Communication. From 2009 to 2010 he lived in Paris/France and was with the CNRS-LIMSI Spoken Language Processing Group in Orsay/France, working on affective and social signals in speech. In 2010 he was also a visiting scientist at Imperial College London's Department of Computing in London/UK, working on audiovisual behaviour recognition. In 2011 he was a guest lecturer at the Università Politecnica delle Marche (UNIVPM) in Ancona/Italy and a visiting researcher at NICTA in Sydney/Australia. He is best known for his work advancing Speech Processing, Affective Computing, and Music Information Retrieval. Dr. Schuller has (co-)authored 3 books and more than 250 publications (2,300 citations, H-index 25). He serves as member and secretary of the steering committee, associate editor, and guest editor of the IEEE Transactions on Affective Computing; as associate editor for the IEEE Transactions on Systems, Man and Cybernetics: Part B and the IEEE Transactions on Neural Networks and Learning Systems; as repeated guest editor for Computer Speech and Language; and as guest editor for the IEEE Intelligent Systems Magazine, Speech Communication, Image and Vision Computing, Cognitive Computation, and the EURASIP Journal on Advances in Signal Processing. He has also served as an organiser of, among others, the INTERSPEECH 2009 Emotion, 2010 Paralinguistic, 2011 Speaker State, and 2012 Speaker Trait Challenges and the 2011 and 2012 Audio/Visual Emotion Challenge and Workshop, and as a programme committee member of more than 30 international workshops and conferences.
His steering of and involvement in research projects include the EC-funded ASC-Inclusion project, as coordinator, and the award-winning SEMAINE project. His advisory activities include membership as an invited expert in the W3C Emotion Markup Language Incubator Group and repeated election to the Executive Committee of the HUMAINE Association, where he chairs the Special Interest Group Speech.
Webpage: http://www.schuller.it
Software: http://www.openaudio.eu

Anton Batliner, FAU, Erlangen/Germany

Anton Batliner has been a member of the research staff of the Institute for Pattern Recognition since 1997. He is co-editor of one book and author/co-author of more than 200 technical articles, with a current H-index of 30 and more than 3,300 citations. His research interests are the modelling and automatic recognition of emotional user states, all aspects of prosody and paralinguistics in speech processing, uni- and multi-modal focus of attention, pronunciation assessment, and spontaneous speech phenomena such as disfluencies, irregular phonation, etc. He served as Workshop/Session (co-)organizer for Emotional Corpora I, II, III (LREC 2006/2008/2010), Paralinguistics (ICPhS 07), Non-prototypical Emotions (ACCI 09), Emotion Challenge (INTERSPEECH 2009), Paralinguistic Challenge (INTERSPEECH 2010), Computer Aided Pronunciation Training (Prosody 2010), and Speaker State Challenge (INTERSPEECH 2011); he was guest editor for AHCI, Computer Speech and Language, and Speech Communication, and is Associate Editor for the IEEE Transactions on Affective Computing as well as a reviewer for numerous leading journals and conferences.
Webpage: http://www.batliner.de

Thank you to our Sponsors

“Microsoft is a trademark of the Microsoft group of companies and is used under license from Microsoft.”

http://www.ets.org/

“Intel” and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other Countries.