INSTITUTIONAL PARTICIPANTS

Kaavya Sriskandaraja

Associate Lecturer
School of Electrical Engineering and Telecommunications

UNSW Sydney

Kaavya Sriskandaraja is currently an Associate Lecturer in the School of Electrical Engineering and Telecommunications at the University of New South Wales (UNSW Sydney), Australia. Her research interests include speech signal processing and machine learning, with a focus on applying machine learning techniques to speech processing tasks. She is currently looking into the automation of robust and reliable voice biometric systems, as well as spoofing and anti-spoofing techniques for automatic speaker verification. She received her BSc Engineering degree from the University of Peradeniya, Sri Lanka, and her PhD in Electrical Engineering from UNSW Sydney. Kaavya lives in Sydney and is interested in education development and academic careers.

Spoofing Countermeasures for Secure and Robust Voice Authentication System: Feature Extraction and Modelling

Automatic speaker verification (ASV) systems can be used without face-to-face contact, which makes them more prone to malicious spoofing attacks than most other biometric systems. The study of spoofing countermeasures has therefore become a critical area of research, and developing such countermeasures is the principal objective of this thesis. Additionally, as preliminary work, this thesis aims to make ASV systems robust to adverse noise conditions. Thus, the overarching goal of this thesis is to significantly advance the state of the art in automatic speaker verification by making these systems more secure and robust.


Spoofing attacks can be categorised into four types: impersonation, replay, voice conversion and speech synthesis. Among these, speech synthesis, voice conversion and replay attacks have been identified as the most effective and accessible forms of spoofing. Accordingly, this thesis investigates and develops a framework for extracting discriminative features to counter these three attacks.


As a pre-processing step for ASV, a novel self-adaptive voice activity detection algorithm is proposed to make ASV robust to adverse noise scenarios. The algorithm combines information from statistical modelling of cepstral features with short-term smoothed log energy to make frame-based voicing decisions. In addition, a novel two-stage post-processing technique is included to improve these voicing decisions significantly.
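
The energy part of such a detector can be illustrated with a minimal sketch. It covers only short-term smoothed log energy with a simple data-dependent threshold. The statistical modelling of cepstral features and the two-stage post-processing described above are not reproduced here, and the function names, frame sizes and midpoint threshold rule are all assumptions made for illustration.

```python
import math

def frame_log_energy(signal, frame_len, hop):
    """Log energy of each frame; a small floor avoids log(0)."""
    energies = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        energy = sum(x * x for x in frame) / frame_len
        energies.append(math.log(energy + 1e-10))
    return energies

def smooth(values, width=3):
    """Centred moving average; the window shrinks at the edges."""
    half = width // 2
    out = []
    for i in range(len(values)):
        window = values[max(0, i - half):i + half + 1]
        out.append(sum(window) / len(window))
    return out

def voicing_decisions(signal, frame_len=160, hop=80, width=3):
    """Per-frame True/False decisions.

    Assumed rule: the threshold sits halfway between the minimum and
    maximum smoothed log energy of the utterance, so it adapts to the
    recording level. This is a stand-in, not the thesis algorithm.
    """
    smoothed = smooth(frame_log_energy(signal, frame_len, hop), width)
    threshold = (min(smoothed) + max(smoothed)) / 2.0
    return [e > threshold for e in smoothed]
```

Calling `voicing_decisions` on a list of samples that contains a tone surrounded by silence should mark the frames inside the tone as active and the frames in the silence as inactive.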


Investigations are undertaken to analyse the discrimination between spoofed and genuine speech as a function of frequency band across the speech bandwidth, which in turn inform novel filter bank designs for discriminating spoofed speech. To capture a richer representation of the spectral content of speech, novel features based on a hierarchical scattering decomposition are proposed as effective front-ends for stand-alone spoofing detection. These scattering features are computed through a cascade of wavelet transforms and modulus non-linearities. The results show that the proposed scattering features are superior to all other front-ends previously benchmarked on the voice conversion, speech synthesis and replay corpora.
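
The cascade structure of a scattering decomposition can be sketched in a few lines. This is a toy illustration only: it uses Haar-like filters and a plain moving average as the low-pass filter, whereas practical scattering front-ends typically use smoother wavelets. The filter choices, scales and function names are assumptions and do not come from the thesis.

```python
def convolve_same(x, h):
    """Direct convolution with zero padding, output as long as x."""
    n, m = len(x), len(h)
    half = m // 2
    out = []
    for i in range(n):
        acc = 0.0
        for k in range(m):
            j = i + half - k
            if 0 <= j < n:
                acc += x[j] * h[k]
        out.append(acc)
    return out

def haar_wavelet(scale):
    """Zero-mean, Haar-like band-pass filter of length 2 * scale."""
    return [1.0 / (2 * scale)] * scale + [-1.0 / (2 * scale)] * scale

def averaging_filter(length):
    """Moving-average low-pass filter."""
    return [1.0 / length] * length

def scattering(x, scales=(1, 2, 4), avg_len=8):
    """Zeroth-, first- and second-order scattering coefficients.

    Each order applies a wavelet filter, takes the modulus, and then
    averages with the low-pass filter. Second-order paths only use a
    coarser scale than the first-order path they extend.
    """
    phi = averaging_filter(avg_len)
    order0 = convolve_same(x, phi)
    order1, order2 = {}, {}
    for s1 in scales:
        u1 = [abs(v) for v in convolve_same(x, haar_wavelet(s1))]
        order1[s1] = convolve_same(u1, phi)
        for s2 in scales:
            if s2 > s1:
                u2 = [abs(v) for v in convolve_same(u1, haar_wavelet(s2))]
                order2[(s1, s2)] = convolve_same(u2, phi)
    return order0, order1, order2
```

For a constant input, the first-order coefficients away from the signal edges are zero because the band-pass filters have zero mean, while the zeroth-order coefficients simply return the local average.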


Consequently, a hybrid network consisting of a scattering network followed by a convolutional network is investigated. By using scattering layers, the number of parameters to be learned is reduced, and the first layers are guaranteed to be stable. Finally, a novel approach to evaluating the similarity between pairs of speech samples is proposed to detect replayed speech, based on a suitable embedding learned by deep Siamese architectures. Siamese networks are particularly suited to this task and have been shown to be effective in problems where intra-class variability is large and the number of training samples per class is relatively small. The internal low-dimensional embedding learned by the Siamese network to accomplish this task is then used as the basis for replay detection. The proposed Siamese architecture achieves state-of-the-art performance when evaluated on the ASVspoof 2017 challenge replay corpus.
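
The idea of comparing pairs of samples through a shared embedding can be sketched as follows. The abstract does not state the network layers or the training loss, so this sketch assumes a single shared linear layer with a tanh non-linearity and a contrastive loss on Euclidean distance. All names, shapes and the margin value are hypothetical.

```python
import math

def embed(x, weights):
    """Shared embedding: one linear layer followed by tanh.

    The same weights are applied to both inputs of a pair, which is
    what makes the two branches 'Siamese'.
    """
    return [math.tanh(sum(w * v for w, v in zip(row, x))) for row in weights]

def distance(a, b):
    """Euclidean distance between two embeddings."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def contrastive_loss(d, same_class, margin=1.0):
    """Assumed training objective for a pair at embedding distance d.

    Same-class pairs are pulled together; different-class pairs are
    pushed apart until they are at least 'margin' away.
    """
    if same_class:
        return d * d
    return max(0.0, margin - d) ** 2

def pair_loss(x1, x2, same_class, weights, margin=1.0):
    """Loss for one pair of feature vectors under shared weights."""
    d = distance(embed(x1, weights), embed(x2, weights))
    return contrastive_loss(d, same_class, margin)
```

Under this assumed objective, a pair of identical inputs incurs zero loss when labelled as the same class and the full squared margin when labelled as different classes. After training, the embedding itself would serve as the feature for a downstream detector.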