June 30, 2012
Early Registration Deadline
Deadline for Presenters to Register
August 8, 2012
Hotel and Standard Registration Deadline
Privacy-Preserving Speech Processing
Abstract
Speech is one of the most private forms of personal communication, yet current speech processing techniques are not designed to preserve speaker privacy and require complete access to the speech data. In this tutorial we study privacy-preserving techniques for speech processing applications, focusing on speaker verification and identification. A speaker verification system uses speech input to authenticate the user. We discuss privacy-preserving speaker verification, in which the system performs authentication without observing the speech input provided by the user, and the user does not observe the speech models used by the system. These privacy criteria are important because they prevent an adversary with unauthorized access to the user's client device or to the system data from impersonating the user in another system. We develop two privacy-preserving algorithms for speaker verification. First, we use Gaussian mixture models (GMMs) and create a protocol based on homomorphic encryption to evaluate GMMs over private data. Second, we apply locality-sensitive hashing (LSH) and one-way cryptographic functions to reduce the speaker verification problem to private string comparison.
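To make the second approach more concrete, the following is a minimal sketch of how LSH combined with a one-way hash can turn verification into string comparison. It is an illustration under stated assumptions, not the protocol presented in the tutorial: the random-hyperplane LSH family, the per-user salt, the fixed-length feature vector (for example a supervector), and all function names are hypothetical choices made for this example.

```python
import hashlib
import random


def make_hyperplanes(num_bits, dim, seed):
    """Draw num_bits random hyperplanes; the seed is shared by client and system."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]


def lsh_key(vector, hyperplanes):
    """Random-hyperplane LSH: one bit per hyperplane, set by the sign of the projection."""
    bits = []
    for plane in hyperplanes:
        dot = sum(p * v for p, v in zip(plane, vector))
        bits.append("1" if dot >= 0 else "0")
    return "".join(bits)


def hashed_key(vector, hyperplanes, salt):
    """Apply a one-way function (salted SHA-256) so the LSH key itself is never stored."""
    key = lsh_key(vector, hyperplanes)
    return hashlib.sha256(salt + key.encode("ascii")).hexdigest()


def verify(enrolled_hashes, test_vector, tables, salt, min_matches=1):
    """Accept if enough hashed LSH keys of the test vector equal the enrolled ones."""
    matches = sum(
        hashed_key(test_vector, planes, salt) == stored
        for planes, stored in zip(tables, enrolled_hashes)
    )
    return matches >= min_matches


if __name__ == "__main__":
    tables = [make_hyperplanes(16, 4, seed) for seed in range(3)]
    salt = b"hypothetical-user-salt"
    enrolled = [1.0, -2.0, 0.5, 3.0]  # stand-in for an enrollment feature vector
    stored = [hashed_key(enrolled, planes, salt) for planes in tables]
    print(verify(stored, [2.0, -4.0, 1.0, 6.0], tables, salt))
```

In this sketch the system stores only salted hashes, so the comparison step sees neither the feature vector nor the raw LSH keys. Vectors pointing in similar directions tend to share keys, which is what lets exact string matching stand in for a similarity test.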
Speaker identification is a related problem in which we want to identify, among a given set of speakers, the one that best corresponds to a given speech sample. This task finds applications in surveillance, where a security agency such as the police has access to speaker models for a set of individuals, e.g., criminals it is interested in monitoring, and an independent party such as a phone company has access to the phone conversations. The agency is interested in identifying the speaker participating in a given phone conversation among its set of speakers. The agency can demand the complete recording from the phone company if it has a warrant for that person. By using a privacy-preserving speaker identification system, the phone company can guarantee to its subscribers that the agency will not be able to obtain any phone conversation of speakers who are not under surveillance. Similarly, the agency does not need to send the list of speakers under surveillance to the phone company. Speaker identification can be considered an extension of speaker verification to the multiclass setting. We extend the GMM-based and LSH-based approaches to create analogous privacy-preserving speaker identification frameworks.
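The relationship between verification and identification can be sketched in plaintext, without any privacy protection, as follows. This is only an illustration of the multiclass view: the diagonal-covariance GMM representation, the background model used in the verification score, the threshold, and the function names are assumptions made for this example and are not taken from the tutorial material.

```python
import math


def log_gaussian_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at the frame x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def gmm_log_likelihood(frames, gmm):
    """Sum of per-frame log-likelihoods; gmm is a list of (weight, mean, var) tuples."""
    total = 0.0
    for x in frames:
        terms = [math.log(w) + log_gaussian_diag(x, mean, var) for w, mean, var in gmm]
        peak = max(terms)  # log-sum-exp for numerical stability
        total += peak + math.log(sum(math.exp(t - peak) for t in terms))
    return total


def verify(frames, speaker_gmm, background_gmm, threshold=0.0):
    """Two-class decision: claimed speaker versus an assumed background model."""
    score = gmm_log_likelihood(frames, speaker_gmm) - gmm_log_likelihood(frames, background_gmm)
    return score > threshold


def identify(frames, speaker_gmms):
    """Multiclass decision: return the speaker whose GMM scores the frames highest."""
    return max(speaker_gmms, key=lambda name: gmm_log_likelihood(frames, speaker_gmms[name]))
```

A privacy-preserving version would have to compute these same scores and the final maximum without revealing the frames to the model owner or the models to the frame owner, which is where the cryptographic protocols come in.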
As this is an emerging field that has not yet been exhaustively studied, we will have the opportunity to cover its theory from first principles. The tutorial is therefore accessible to a wide audience, and participants are not expected to have any prior experience in this area. The target participants are speech researchers without any background in privacy. We hope that this tutorial will enable these researchers to develop privacy-preserving variants of their speech processing algorithms and help foster cross-pollination between these two research areas.
Outline
Introduction
Motivations
Overview of the tutorial
Speech Processing Background
Speech-Processing Basics
Gaussian Mixture Models & Hidden Markov Models
Supervector Framework
Algorithms for Speaker Verification
Algorithms for Speaker Identification
Privacy Background
What is Privacy?
Adversarial Models
Tools
Homomorphic Encryption
Cryptographic Hash Functions
Fundamental Protocols
Private Inner Product
Private Comparison
Privacy-Preserving Speaker Verification
Problem Overview
Privacy Issues
Privacy-Preserving Algorithms
using Gaussian Mixture Models
using Supervectors
Experiments
Privacy-Preserving Speaker Identification
Problem Overview
Privacy Issues
Privacy-Preserving Algorithms
using Gaussian Mixture Models
using Supervectors
Experiments
Privacy-Preserving Speech Recognition
Problem Overview
Privacy Issues
Basic Privacy-Preserving Algorithm
using Hidden Markov Models
Experiments
Conclusions
Brief recapitulation
Current trends and ideas
Privacy in other speech processing tasks
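As a rough illustration of the "Homomorphic Encryption" and "Private Inner Product" items in the outline, the sketch below computes an inner product over encrypted integers with a toy Paillier cryptosystem. The choice of Paillier, the tiny key built from two Mersenne primes, the integer-valued vectors, and all function names are assumptions made for this example. The key size is far too small for real use, and a full protocol would also need steps such as re-randomizing or masking the result.

```python
import math
import secrets


def keygen(p, q):
    """Toy Paillier key from two distinct primes, with generator g = n + 1."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)
    mu = pow(lam, -1, n)  # modular inverse, Python 3.8+
    return n, (lam, mu)


def encrypt(n, m):
    """Enc(m) = (1 + m*n) * r^n mod n^2 for a random r coprime to n."""
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    return (1 + (m % n) * n) % n2 * pow(r, n, n2) % n2


def decrypt(n, priv, c):
    """Recover m, mapping the upper half of the plaintext range to negative values."""
    lam, mu = priv
    u = pow(c, lam, n * n)
    m = (u - 1) // n * mu % n
    return m - n if m > n // 2 else m


def encrypted_inner_product(n, enc_x, w):
    """Holder of plaintext w combines ciphertexts of x into Enc(sum of w_i * x_i)."""
    n2 = n * n
    acc = 1  # a valid encryption of 0
    for c, wi in zip(enc_x, w):
        acc = acc * pow(c, wi % n, n2) % n2  # Enc(x_i)^w_i = Enc(w_i * x_i)
    return acc


if __name__ == "__main__":
    n, priv = keygen(2**31 - 1, 2**61 - 1)  # toy primes, insecure key size
    x = [3, -1, 4]   # private vector of the key holder
    w = [2, 5, -7]   # private vector of the other party
    enc_x = [encrypt(n, xi) for xi in x]
    print(decrypt(n, priv, encrypted_inner_product(n, enc_x, w)))
```

Here the party holding `w` only ever sees ciphertexts of `x`, while the key holder learns only the final inner product, which for the vectors above is 3*2 + (-1)*5 + 4*(-7) = -27.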
Short Bios
Manas A. Pathak, Carnegie Mellon University, USA
Manas A. Pathak is a Ph.D. candidate in the Language Technologies Institute at Carnegie Mellon University. He received his M.S. from Carnegie Mellon University in 2009 and his B.Tech in Computer Science from Visvesvaraya National Institute of Technology, Nagpur, India, in 2006. He has done internships at PARC, MERL, and IBM Research, and has 10 papers published in various conferences and journals and 4 patents pending. His research interests lie at the intersection of data privacy, machine learning, and speech processing.
Bhiksha Raj is an associate professor and non-tenured faculty chair at Carnegie Mellon University's Language Technologies Institute, and also holds the position of associate professor by courtesy in the Electrical and Computer Engineering department at CMU. Dr. Raj obtained his PhD from CMU in 2000 and was at Mitsubishi Electric Research Laboratories from 2001 to 2008. Dr. Raj's chief research interests lie in robust automatic speech recognition, machine learning, and associated topics. Since 2005 he has also investigated topic models for signal processing, particularly in the context of modelling, enhancing, and modifying speech signals, and has published several papers on the topic.