Important Dates

  • April 1, 2012
    Full Paper Submission Deadline
  • June 8, 2012
    Notification of Paper Acceptance
  • June 16, 2012
    Grant Application Deadline
  • June 22, 2012
    Camera-ready Paper Due
  • June 30, 2012
    Early Registration Deadline
  • August 8, 2012
    Hotel and Standard Registration Deadline

With Support From

ISCA Speech

Organizing Secretariat

Conference Solutions

 

Speaker Trait Challenge

Personality, Likability, Pathology

Official Challenge Website

 

The Challenge:

Whereas the first open comparative challenges in the field of paralinguistics targeted more "conventional" phenomena such as emotion, age, and gender, many highly relevant speaker states and traits have not yet been covered. In the last instalment, we focused on speaker states, namely sleepiness and intoxication. We now turn to speaker traits: the InterSpeech 2012 Speaker Trait Challenge broadens the scope by addressing three less researched speaker traits in the computational analysis of speech: personality, likability, and pathology. Besides intelligent and socially competent future agents and robots, the main applications lie in the medical domain and in surveillance.

 

For these Challenge tasks, some of the organisers will provide the SPEAKER PERSONALITY CORPUS (SPC), the SPEAKER LIKABILITY DATABASE (SLD), and the VOICE PATHOLOGY CORPUS (VPC), which offer a high diversity of speakers with different personalities and degrees of likability, as well as genuine pathologies. The first, SPC, consists of French speech from 330 speakers labelled by 11 judges with standardised personality assessment tests, and will serve to evaluate features and algorithms for estimating speakers' personality traits in the popular "Big Five" OCEAN dimensions (openness, conscientiousness, extraversion, agreeableness, and neuroticism). The second, SLD, is based on the aGender corpus as employed in the InterSpeech 2010 Paralinguistic Challenge; likability annotations by 32 labellers were added for 800 speakers, perfectly balanced across age classes and gender. Finally, VPC provides Dutch speech from 40 speakers with head and neck cancer (tumours located in the vocal tract and larynx), recorded before and at various times after treatment. VPC was created within the scope of an unrestricted research grant from Atos Medical, Sweden. The corpora further provide rich annotation such as speaker meta-data, orthographic transcripts, phonemic transcripts, segmentation, and multiple annotation tracks. All three come with distinct definitions of training, development, and test partitions that respect speaker independence, as needed in most real-life settings. Benchmark results of the most popular approaches will be provided.
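Speaker independence means that no speaker appears in more than one of the training, development, and test partitions. As a minimal sketch, assuming each partition is represented simply as a set of speaker IDs (the function name, partition names, and IDs below are hypothetical and not part of the distributed corpora), this property could be checked as follows:

```python
from itertools import combinations


def is_speaker_independent(partitions: dict[str, set[str]]) -> bool:
    """Return True if no speaker ID occurs in more than one partition.

    Assumption (illustrative only): `partitions` maps a partition name
    such as "train" to the set of speaker IDs it contains.
    """
    for speakers_a, speakers_b in combinations(partitions.values(), 2):
        if speakers_a & speakers_b:  # a shared speaker breaks independence
            return False
    return True


# Hypothetical speaker IDs, for illustration only.
split = {
    "train": {"spk01", "spk02", "spk03"},
    "devel": {"spk04", "spk05"},
    "test": {"spk06"},
}
print(is_speaker_independent(split))  # True
```

Checking every pair of partitions keeps the sketch simple; with only three partitions, the pairwise set intersections are cheap.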

 

Three Sub-Challenges are addressed:

  • In the Personality Sub-Challenge, the personality of a speaker has to be determined from acoustics, potentially including linguistics: for each of the five OCEAN personality dimensions, it has to be decided whether the speaker lies above or below average.
  • In the Likability Sub-Challenge, the likability of a speaker's voice has to be determined by a suitable learning algorithm and acoustic features. Two classes have to be recognized accordingly: likability above or below average.
  • In the Pathology Sub-Challenge, the intelligibility of a speaker has to be determined by a suitable classification algorithm and acoustic features.
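The Likability Sub-Challenge, and the Personality Sub-Challenge per OCEAN dimension, reduce a continuous rating to two classes: above or below average. A minimal sketch of such a reduction follows, assuming "average" means the arithmetic mean of the given ratings and that a rating exactly at the mean falls into the lower class; the official threshold may differ, and the class names "above"/"below" and the example scores are hypothetical.

```python
from statistics import mean


def binarize_by_average(ratings: dict[str, float]) -> dict[str, str]:
    """Map each speaker's continuous rating to "above" or "below" average.

    Assumptions (illustrative only): the threshold is the arithmetic mean
    of all given ratings, and ratings equal to the mean become "below".
    """
    threshold = mean(ratings.values())
    return {
        speaker: "above" if score > threshold else "below"
        for speaker, score in ratings.items()
    }


# Hypothetical likability scores for four speakers (mean = 2.0).
scores = {"spk01": 3.0, "spk02": 1.0, "spk03": 4.0, "spk04": 0.0}
print(binarize_by_average(scores))
# {'spk01': 'above', 'spk02': 'below', 'spk03': 'above', 'spk04': 'below'}
```

In practice the threshold would be derived from the annotations as defined by the organisers; the mean is used here only to make the "above or below average" wording concrete.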

 

Labels for the training and development sets will be made available. All Sub-Challenges allow contributors to use their own features and their own machine learning algorithms. However, a standard feature set will be provided for each corpus, which participants may use. Participants must adhere to the given definition of training, development, and test sets. They may report results obtained on the development set, but have only five trials to upload their results on the test sets, whose labels are unknown to them. Each participation must be accompanied by a paper presenting the results; this paper undergoes peer review and has to be accepted for the conference in order to participate in the Challenge. The organizers reserve the right to re-evaluate the findings, but will not participate in the Challenge themselves. Participants are encouraged to compete in all Sub-Challenges.

 

Overall, contributions using the provided or an equivalent database are sought in (but not limited to) the following areas:

  • Participation in the Personality Sub-Challenge
  • Participation in the Likability Sub-Challenge
  • Participation in the Pathology Sub-Challenge
  • Novel features and algorithms for the analysis of speaker traits
  • Unsupervised learning methods for speaker trait analysis
  • Perception studies, additional annotation and feature analysis on the given sets
  • Context exploitation in speaker trait assessment

 

The results of the Challenge shall be presented at InterSpeech 2012 in Portland, Oregon.

 

Literature on the Predecessor Events and the Corpora used for the Challenge:

  • B. Schuller, A. Batliner, S. Steidl, F. Schiel, J. Krajewski: "The InterSpeech 2011 Speaker State Challenge", Proc. InterSpeech 2011, ISCA, Florence, Italy, pp. 3201-3204, 2011.
  • B. Schuller, S. Steidl, A. Batliner, F. Burkhardt, L. Devillers, C. Müller, S. Narayanan: "The InterSpeech 2010 Paralinguistic Challenge", Proc. InterSpeech 2010, ISCA, Makuhari, Japan, pp. 2794-2797, 2010.
  • B. Schuller, S. Steidl, A. Batliner: "The InterSpeech 2009 Emotion Challenge", Proc. InterSpeech 2009, ISCA, Brighton, UK, pp. 312-315, 2009.
  • B. Schuller, A. Batliner, S. Steidl, D. Seppi: "Recognising Realistic Emotions and Affect in Speech: State of the Art and Lessons Learnt from the First Challenge", Speech Communication, Elsevier, Vol. 53, No. 9/10, pp. 1062-1087, November/December 2011.
  • F. Burkhardt, B. Schuller, B. Weiss, F. Weninger: "'Would You Buy A Car From Me?' – On the Likability of Telephone Voices", Proc. InterSpeech 2011, ISCA, Florence, Italy, pp. 1557-1560, 2011.
  • G. Mohammadi, M. Mortillaro, A. Vinciarelli: "The Voice of Personality: Mapping Nonverbal Vocal Behavior into Trait Attributions", Proc. International Workshop on Social Signal Processing, ACM, Florence, Italy, pp. 17-20, 2010.

 

Organizers:

BJÖRN W. SCHULLER received his diploma in 1999 and his doctoral degree in 2006, both in electrical engineering and information technology, from TUM (Munich University of Technology, Germany). Since 2006, he has been tenured as Senior Lecturer in Pattern Recognition and Speech Processing, heading the Intelligent Audio Analysis Group at TUM's Institute for Human-Machine Communication. From 2009 to 2010 he lived in Paris, France, and was with the CNRS-LIMSI Spoken Language Processing Group in Orsay, France. In 2010 he was also a visiting scientist in the Department of Computing at Imperial College London, UK. In 2011 he was a guest lecturer at the Università Politecnica delle Marche (UNIVPM) in Ancona, Italy. He is best known for his work advancing Semantic Audio and Audiovisual Processing and Affective Computing. Dr. Schuller has (co-)authored 2 books and 240 publications in the field, which have received more than 2,000 citations; his current H-index is 24. He serves as member and secretary of the steering committee, associate editor, and repeated guest editor of the IEEE Transactions on Affective Computing; as associate editor and repeated guest editor of Computer Speech and Language; as associate editor of the IEEE Transactions on Neural Networks and Learning Systems; and as guest editor of Speech Communication, Image and Vision Computing, Cognitive Computation, and the EURASIP Journal on Advances in Signal Processing. He has also been a challenge organizer, including the first-of-their-kind InterSpeech 2009 Emotion, 2010 Paralinguistic, and 2011 Speaker State Challenges and the 2011 Audio/Visual Emotion Challenge and Workshop, and chairman and programme committee member of numerous further international workshops and conferences.
His steering of and involvement in current and past research projects include the European Community funded ASC-Inclusion STREP project, as coordinator, and the awarded SEMAINE project, as well as projects funded by the German Research Foundation (DFG) and by companies such as BMW, Continental, Daimler, HUAWEI, Siemens, Toyota, and VDO. His advisory board activities comprise his membership as invited expert in the W3C Emotion Incubator and Emotion Markup Language Incubator Groups, and his repeated election to the Executive Committee of the HUMAINE Association, where he chairs the Special Interest Group Speech.

 

STEFAN STEIDL received his diploma degree in Computer Science in 2002 from the Friedrich-Alexander University of Erlangen-Nuremberg in Germany, where he also received his doctoral degree in 2008 for his work on Automatic Classification of Emotion-Related User States in Spontaneous Children's Speech. He is currently pursuing his state doctorate (habilitation). Most recently, he was a research scholar at the International Computer Science Institute (ICSI) at Berkeley, CA, U.S.A. His primary research interests are the automatic classification of naturally occurring emotion-related user states in human-machine interaction and the recognition of atypical speech (children's speech, speech of elderly people, pathological voices). Dr. Steidl has (co-)authored more than 60 publications in journals and peer-reviewed conference proceedings. His current H-index is 19. Dr. Steidl has co-organized the special sessions "InterSpeech 2009 Emotion Challenge", "InterSpeech 2010 Paralinguistic Challenge", and "InterSpeech 2011 Speaker State Challenge", and was guest editor for special issues of the ISCA journals Computer Speech and Language and Speech Communication. He has served as a reviewer for several journals and conferences in this area of research and has been a member of the Network of Excellence HUMAINE (Human-Machine Interaction Network on Emotion) in the 7th framework programme of the European Community.

 

ANTON BATLINER has been a member of the research staff of the Institute for Pattern Recognition since 1997. He is co-editor of one book and author or co-author of more than 200 technical articles, with a current H-index of 29 and more than 3000 citations. His research interests are the modelling and automatic recognition of emotional user states, all aspects of prosody and paralinguistics in speech processing, uni- and multi-modal focus of attention, pronunciation assessment, and spontaneous speech phenomena such as disfluencies and irregular phonation. He served as workshop/session (co-)organizer for Emotional Corpora I, II, III (LREC 2006/2008/2010), Paralinguistics (ICPhS 07), Non-prototypical Emotions (ACCI 09), Emotion Challenge (InterSpeech 2009), Paralinguistic Challenge (InterSpeech 2010), Computer Aided Pronunciation Training (Prosody 2010), and Speaker State Challenge (InterSpeech 2011); he was guest editor for AHCI, Computer Speech and Language, and Speech Communication, and is Associate Editor for the IEEE Transactions on Affective Computing as well as a reviewer for numerous leading journals and conferences.

 

ELMAR NÖTH obtained his 'Diplom' in Computer Science and his doctoral degree from the University of Erlangen-Nuremberg (FAU) in 1985 and 1990, respectively. From 1985 to 1990 he was a member of the research staff of the Institute for Pattern Recognition (Lehrstuhl für Mustererkennung), working on the use of prosodic information in automatic speech understanding. Since 1990 he has been a professor at the same institute and head of the speech group. He is one of the founders of the Sympalog company, which markets conversational dialogue systems.

 

ALESSANDRO VINCIARELLI is Lecturer at the University of Glasgow (UK) and Senior Researcher at the Idiap Research Institute (Switzerland). After earning his PhD in Applied Mathematics at the University of Bern (Switzerland), he spent seven years at the Idiap Research Institute, where he worked on Social Signal Processing. Since 2010, he has been with the School of Computing Science of the University of Glasgow, where he continues his research on the automatic analysis of social signals in human behaviour, including the detection of personality markers, automatic role recognition in multiparty conversations, conflict analysis, and automatic proxemics understanding. He is author or co-author of more than 60 technical papers, including 20 journal papers, and of one book. His works have been cited more than 1100 times. Dr. Vinciarelli is coordinator of SSPNet, the FP7 European Network of Excellence on Social Signal Processing (www.sspnet.eu), and head of IM2.SSP, the Swiss research initiative on Social Signal Processing (www.im2.ch). Furthermore, he is, or has been, Principal Investigator of five national (Swiss) and international projects on multimedia retrieval and behaviour analysis. Last, but not least, he is the co-founder of Klewel (www.klewel.com), a multimedia retrieval company that has won several national and international prizes (Swiss Venture Leaders, European Seal of Excellence, IMD E-MBA Start-up) and whose customers include Nestlé, the United Nations, ACM, the Swiss Federal Polytechnic Institute, and several other prestigious institutions. Dr. Vinciarelli was program chair of the IEEE International Conference on Social Computing 2011 (SocialCom 2011) and area chair of the 2011 International Conference on Multimodal Interfaces (ICMI 2011).
Furthermore, he has been initiator and co-organizer of the Social Signal Processing Workshop (SSPW 2009-2011), of the International Workshop on Socially Intelligent Surveillance and Monitoring (SISM 2010-2011) and of the Workshop on Human Behaviour Understanding (HBU 2010-2011). He is Associate Editor of the IEEE Signal Processing Magazine (for the Social Sciences) and of the Journal of Discourse, Context and Media (Elsevier).

 

FELIX BURKHARDT has a longstanding background in language technology. Originally an expert in speech synthesis at the Technical University of Berlin, he has been working for Deutsche Telekom AG since 2000. He does tutoring, consulting, research, and development in the fields of VoiceXML-based voice portal architectures, text-to-speech synthesis, speaker classification, ontology-based language modelling, and emotional human-machine interfaces. He has been a member of the European Network of Excellence HUMAINE on emotion-oriented computing and of the W3C Emotion Markup Language Incubator Group.

 

ROB VAN SON (born 1960) studied Biology at Nijmegen University from 1978 to 1984. He obtained his PhD from the University of Amsterdam in 1993 (Spectro-temporal features of vowel segments). He worked at the Institute of Phonetic Sciences of the University of Amsterdam on various NWO postdoctoral projects (on consonant reduction, 1993-1997, and on speech efficiency, 1999-2002). In 2004 he received an NWO VIDI grant for his project "Integration of information in spoken communication". Since 2009, he has been a senior researcher at the Department of Head and Neck Oncology and Surgery of the Netherlands Cancer Institute, where he coordinates research on the use of speech technology for the evaluation of voice communication disorders and its implementation in the treatment and rehabilitation of patients with head and neck cancer.

 

Sponsors:

HUMAINE Association

Telekom Innovation Laboratories

Thank you to our Sponsors

http://www.ets.org/

“Intel” and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other Countries.