Important Dates

  • April 1, 2012
    Full Paper Submission Deadline
  • June 8, 2012
    Notification of Paper Acceptance
  • June 16, 2012
    Grant Application Deadline
  • June 22, 2012
    Camera-ready Paper Due
  • June 30, 2012
    Early Registration Deadline
    Deadline for Presenters to Register
  • August 8, 2012
    Hotel and Standard Registration Deadline


Organizing Secretariat

Conference Solutions

 

From Stationary to Adaptive Sinusoidal Modeling of Speech with Applications to Speech Technologies and Voice Function Assessment

Overview

This tutorial will discuss (a) the transition from non-adaptive, stationary analysis of speech to adaptive, non-stationary analysis, and (b) the use of this analysis framework for speech processing, focusing on pathological speech as well as on its potential in speech technologies such as speech synthesis and speech modification. In the first part of the tutorial, novel algorithms for adaptive speech analysis will be presented, together with how they relate to the well-known sinusoidal representation and to non-linear frequency estimators such as Newton-Gauss. The second part will be dedicated to applications such as tremor estimation, estimation of jitter and shimmer through a mathematical, sinusoid-based description, and objective evaluation of spasmodic dysphonia and vocal fatigue. The potential that such a speech analysis framework may offer to well-known speech technologies will also be discussed. Care will be taken to balance theoretical and application aspects.
 
The main target audience of the proposed tutorial includes students, researchers, and engineers with specific interests in recent developments in sinusoidal models, frequency estimation, non-linear speech signal processing, speech synthesis and modification, and novel signal processing algorithms applied to voice function assessment and pathological voices.
 

Outline

Part I: Adaptive and Non-stationary Sinusoidal Speech Modeling (2 h)

  • Stationary Sinusoidal Modeling of Speech (Overview: Harmonic w/o Mixed Excitation model, Sinusoidal model) (30 min)
  • Frequency Estimation (in Speech) (1 h)
    • Single-/multi-tone estimation (non-adaptive)
      • Linear approaches: FFT-based
      • Non-linear approaches: Newton-Gauss, Prony-based, …
      • Estimation in noise (SNR, Cramér-Rao bound)
    • Signal-adaptive frequency estimators
      • Parametric approaches
      • Non-parametric approaches
  • Adaptive and Non-stationary Modeling of Speech (30 min)
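The outline pairs FFT-based (linear) and Newton-Gauss (non-linear) single-tone frequency estimation. As an illustration only, the sketch below assumes a single real, stationary sinusoid whose frequency lies away from 0 and the Nyquist frequency. It takes a coarse estimate from the peak of a zero-padded DFT grid with parabolic interpolation, then refines frequency and amplitude with a Gauss-Newton least-squares fit of x[n] ≈ p·cos(ωn) + q·sin(ωn). The function names, the 4× zero-padding factor and the iteration count are hypothetical choices for this sketch, not the tutorial's method.

```python
import math


def dft_magnitude(x, omega):
    """|X(e^{j*omega})| of the real sequence x at one frequency (rad/sample)."""
    re = sum(v * math.cos(omega * n) for n, v in enumerate(x))
    im = -sum(v * math.sin(omega * n) for n, v in enumerate(x))
    return math.hypot(re, im)


def fft_style_estimate(x, pad=4):
    """Coarse estimate: peak of a zero-padded DFT grid, refined by a parabola
    through the log-magnitudes of the peak bin and its two neighbours."""
    n_fft = pad * len(x)          # hypothetical zero-padding factor
    step = 2 * math.pi / n_fft    # grid spacing in rad/sample
    mags = [dft_magnitude(x, k * step) for k in range(n_fft // 2)]
    # Skip DC (k = 0) and keep k + 1 inside the grid.
    k = max(range(1, n_fft // 2 - 1), key=lambda i: mags[i])
    a, b, c = (math.log(mags[i]) for i in (k - 1, k, k + 1))
    offset = 0.5 * (a - c) / (a - 2 * b + c)  # parabola vertex, in bins
    return (k + offset) * step


def solve3(m, v):
    """Solve the 3x3 system m d = v (Gaussian elimination, partial pivoting)."""
    a = [list(row) + [v[i]] for i, row in enumerate(m)]
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        for r in range(col + 1, 3):
            f = a[r][col] / a[col][col]
            for cc in range(col, 4):
                a[r][cc] -= f * a[col][cc]
    d = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        d[r] = (a[r][3] - sum(a[r][cc] * d[cc] for cc in range(r + 1, 3))) / a[r][r]
    return d


def gauss_newton_tone(x, omega, iters=20):
    """Refine (omega, amplitude) of x[n] ~ p*cos(omega*n) + q*sin(omega*n)
    by Gauss-Newton least squares over the parameters (p, q, omega)."""
    N = len(x)
    # Starting amplitudes by correlation at the initial frequency.
    p = 2.0 / N * sum(v * math.cos(omega * n) for n, v in enumerate(x))
    q = 2.0 / N * sum(v * math.sin(omega * n) for n, v in enumerate(x))
    for _ in range(iters):
        jtj = [[0.0] * 3 for _ in range(3)]
        jtr = [0.0] * 3
        for n, v in enumerate(x):
            c, s = math.cos(omega * n), math.sin(omega * n)
            # Partial derivatives of the model w.r.t. p, q and omega.
            row = (c, s, n * (q * c - p * s))
            r = v - (p * c + q * s)  # residual
            for i in range(3):
                jtr[i] += row[i] * r
                for j in range(3):
                    jtj[i][j] += row[i] * row[j]
        # Normal equations (J^T J) delta = J^T r.
        dp, dq, dw = solve3(jtj, jtr)
        p, q, omega = p + dp, q + dq, omega + dw
        if abs(dw) < 1e-12:
            break
    return omega, math.hypot(p, q)
```

For example, for x = [1.5 * math.cos(w0 * n + 0.3) for n in range(256)] with w0 = 2π·0.1234, gauss_newton_tone(x, fft_style_estimate(x)) recovers w0 and the amplitude 1.5 to high precision. The DFT is evaluated directly only to keep the sketch dependency-free; a practical version would use an FFT routine such as numpy.fft.rfft.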
 
Part II: Applications (1 h)
  • Voice Function Assessment (40 min)
    • Jitter and shimmer in sinusoidal modeling
    • Spectral jitter
    • Estimation of tremor
    • Spasmodic dysphonia
  • Speech Technology (20 min)
    • Speech synthesis and speech modification
    • Speech enhancement
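The outline lists jitter and shimmer, which the tutorial describes through a sinusoidal model. For reference only, the sketch below computes the conventional cycle-to-cycle ("local") definitions, similar to the local measures reported by tools such as Praat, from pitch periods and per-cycle peak amplitudes that are assumed to have been extracted already. This is not the sinusoid-based formulation, and the function names are hypothetical.

```python
import math


def _consecutive_abs_diffs(values):
    """Absolute differences between consecutive entries."""
    return [abs(b - a) for a, b in zip(values, values[1:])]


def local_jitter(periods):
    """Local jitter: mean |T[i] - T[i-1]| divided by the mean period."""
    if len(periods) < 2:
        raise ValueError("need at least two glottal periods")
    diffs = _consecutive_abs_diffs(periods)
    return (sum(diffs) / len(diffs)) / (sum(periods) / len(periods))


def local_shimmer(amplitudes):
    """Local shimmer: mean |A[i] - A[i-1]| divided by the mean peak amplitude."""
    if len(amplitudes) < 2:
        raise ValueError("need at least two cycle amplitudes")
    diffs = _consecutive_abs_diffs(amplitudes)
    return (sum(diffs) / len(diffs)) / (sum(amplitudes) / len(amplitudes))


def shimmer_db(amplitudes):
    """Shimmer in dB: mean |20*log10(A[i] / A[i-1])| over consecutive cycles."""
    if len(amplitudes) < 2:
        raise ValueError("need at least two cycle amplitudes")
    ratios = [20 * math.log10(b / a) for a, b in zip(amplitudes, amplitudes[1:])]
    return sum(abs(r) for r in ratios) / len(ratios)
```

For example, periods of 10, 11, 10 and 11 ms give local_jitter([0.010, 0.011, 0.010, 0.011]) of about 0.095, i.e. roughly 9.5 % cycle-to-cycle period perturbation.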

 

Biography

Yannis Stylianou
Computer Science Department, University of Crete

 

 

Yannis Stylianou is Professor at the Computer Science Department (CSD) of the University of Crete (UOC), Associated Researcher in the Signal Processing Laboratory of the Institute of Computer Science (ICS) at FORTH, and Visiting Professor at AHOLAB, University of the Basque Country, in Bilbao, Spain (2011-2012). He received the Diploma in Electrical Engineering from the National Technical University of Athens (NTUA) in 1991 and the M.Sc. and Ph.D. degrees in Signal Processing from the Ecole Nationale Supérieure des Télécommunications (ENST), Paris, France, in 1992 and 1996, respectively. From 1996 until 2001 he was with AT&T Labs Research (Murray Hill and Florham Park, NJ, USA) as a Senior Technical Staff Member. In 2001 he joined Bell Labs, Lucent Technologies, in Murray Hill, NJ, USA (now Alcatel-Lucent). Since 2002 he has been with the Computer Science Department at the University of Crete and the Institute of Computer Science at FORTH.
His current research focuses on speech signal processing algorithms for speech analysis, statistical signal processing (detection and estimation), and time-series analysis and modelling. He has (co-)authored more than 100 scientific publications and 9 US patents, which together have received more than 1600 citations (excluding self-citations), with an h-index of 20. He co-edited the book “Progress in Nonlinear Speech Processing” (Springer-Verlag, 2007), and at Interspeech 2007 he gave a tutorial on voice conversion. He is co-organizer of the IEEE Signal Processing Society Winter School on Speech and Audio Processing for Immersive Environments and Future Interfaces (16-20 January 2012, Heraklion, Crete, Greece; http://www.s3p-saie.eu/). He has been the P.I. and scientific director of several European and Greek research programs and has participated as a leader in US research programs.
 
Among other projects, he is currently P.I. of the FP7-FET-OPEN project LISTA: “The Listening Talker”, whose goal is to develop scientific foundations for spoken language technologies based on human communicative strategies. In LISTA, he is in charge of speech modelling and speech modification, with the aim of proposing novel techniques for spoken output generation of artificial and natural speech. He has created a lab for voice function assessment, equipped with high-quality instruments for speech and voice recordings (e.g., a high-speed camera), for basic research in speech and voice as well as for services, in collaboration with the Medical School at the University of Crete.
 
He is on the Board of the International Speech Communication Association (ISCA) and of the IEEE Multimedia Communications Technical Committee. He was a member of the IEEE Speech and Language Technical Committee. He is on the Editorial Boards of the Digital Signal Processing journal (Elsevier) and the Journal of Electrical and Computer Engineering (Hindawi, JECE), and is Associate Editor of the EURASIP Journal on Audio, Speech, and Music Processing (ASMP) and of EURASIP Research Letters in Signal Processing (RLSP). He was Associate Editor of the IEEE Signal Processing Letters, Vice-Chair of COST Action 2103: "Advanced Voice Function Assessment" (VOICE), and a member of the Management Committee of COST Action 277: "Nonlinear Speech Processing".
