TTO_Grant Catalogue Grant Catalogue | Page 28

Strategies for Protecting the Text-Independent Speaker Verification Systems against Attacks Using Hidden Markov Model Based Speech Synthesis and Developing Synthesis Algorithms for More Effective Attacks

Electrical & Electronics Engineering

ABSTRACT

Recently, there has been major progress in text-independent speaker verification systems. In particular, the success of the factor analysis approach to speaker verification has enabled the deployment of these systems in the call centers of banks and telecom operators. However, despite this progress and their increasing use in daily life, state-of-the-art systems have been found to be vulnerable to attacks with Hidden Markov Model (HMM)-based text-to-speech (TTS) systems. These synthesizers generate speech from models of the same mel-frequency cepstral coefficient (MFCC) features that speaker verification systems commonly use. Indeed, one study found that the false alarm probability of a factor analysis-based speaker verification system increased five-fold when the system was attacked with an HMM-based speech synthesizer. Despite the practical importance of the problem, the literature on attacking speaker verification systems with speech synthesis is very limited. Existing work has only documented the vulnerability of speaker verification systems, and no effective countermeasure has yet been proposed. Moreover, there is no work on improving HMM-based synthesis methods to attack speaker verification systems more effectively. This gap in the literature offers opportunities for high-impact research and publications. The proposed project has two goals. The first goal is to develop algorithms that make speaker verification systems more robust to attacks with synthetic speech. The algorithms to be investigated aim to substantially reduce the risk of using speaker verification systems in real-life applications.
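The vulnerability above is stated as a five-fold increase in false alarm probability. As a minimal illustration of what that metric measures, the sketch below computes the false alarm probability as the fraction of impostor (non-target) trials whose verification score reaches the decision threshold, plus the fold increase between two conditions. The scores are invented toy values, not data from the cited study or this project, and the function names are hypothetical.

```python
from typing import Sequence


def false_alarm_probability(impostor_scores: Sequence[float], threshold: float) -> float:
    """Fraction of impostor (non-target) trials that the verifier accepts.

    A trial counts as accepted when its score is >= threshold; using >=
    rather than > at the boundary is a convention assumed here.
    """
    if len(impostor_scores) == 0:
        raise ValueError("need at least one impostor trial")
    accepted = sum(1 for score in impostor_scores if score >= threshold)
    return accepted / len(impostor_scores)


def fold_increase(p_baseline: float, p_attack: float) -> float:
    """How many times larger the false alarm probability is under attack."""
    if p_baseline == 0:
        raise ValueError("baseline false alarm probability is zero")
    return p_attack / p_baseline


# Hypothetical toy scores, invented for illustration only.
natural_impostor_scores = [-2.1, -1.4, -0.3, 0.6, -1.8, -0.9, -2.5, -0.2]
synthetic_impostor_scores = [0.4, -0.1, 1.2, -0.8, -1.0, 0.3, -0.6, -1.5]

threshold = 0.0
p_natural = false_alarm_probability(natural_impostor_scores, threshold)
p_synthetic = false_alarm_probability(synthetic_impostor_scores, threshold)
print(p_natural, p_synthetic, fold_increase(p_natural, p_synthetic))  # 0.125 0.375 3.0
```

The threshold is held fixed across both conditions, so any increase reflects synthetic impostors scoring higher, not a change in the operating point.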
The second goal of the project is to develop synthesis methods that can attack speaker verification systems more effectively using very little data from the target speaker. Pursuing this goal makes the first goal harder to reach, but the process of improving the speech synthesizer will also yield knowledge and experience that help make speaker verification systems more robust to attacks. The results of the methods investigated toward both goals are of major importance to banks and intelligence organizations that currently use speaker verification systems.

2012 National Grants

Moreover, the speech synthesis and adaptation algorithms proposed here are expected to have an impact beyond this project. For example, the algorithms proposed to increase the speaker similarity and naturalness of the speech synthesizer are expected to benefit the broader field of parametric speech synthesis.

Four novel mechanisms for detecting synthetic speech are proposed in the project. They focus on the differences between natural and synthetic speech in the discriminative features and in the excitation signal; unnatural transitions between speech sounds will also be studied. Furthermore, as a novel approach in speaker verification, probabilistic linear discriminant analysis (PLDA) is proposed to filter out session variability while preserving the variability introduced by speech synthesis. On the synthesis side, to deceive the proposed protection mechanisms more effectively, algorithms are proposed to better model the discriminative speech features, to generate a more natural excitation signal, and to model a linear relationship between the MFCC features and the discriminative features so that the synthesized speech sounds closer to the target speaker.
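The abstract does not specify its PLDA formulation, so the following is only background: a minimal NumPy sketch of the standard two-covariance PLDA scoring rule, not the project's proposed variant that preserves synthesis-induced variability. It assumes fixed-length speaker vectors (for example, i-vectors from a factor analysis front end) and already-estimated parameters `mu`, `B` (between-speaker covariance) and `W` (within-speaker, i.e. session, covariance); the function names `plda_llr` and `gaussian_logpdf` and all parameter values are hypothetical.

```python
import numpy as np


def gaussian_logpdf(x: np.ndarray, mean: np.ndarray, cov: np.ndarray) -> float:
    """Log-density of the multivariate normal N(mean, cov) at x."""
    diff = x - mean
    sign, logdet = np.linalg.slogdet(cov)
    if sign <= 0:
        raise ValueError("covariance matrix must be positive definite")
    mahalanobis = diff @ np.linalg.solve(cov, diff)
    return float(-0.5 * (x.shape[0] * np.log(2.0 * np.pi) + logdet + mahalanobis))


def plda_llr(x1, x2, mu, B, W) -> float:
    """Two-covariance PLDA score: log p(x1, x2 | same) - log p(x1, x2 | different).

    Assumed model: x = y + e, with speaker variable y ~ N(mu, B)
    (between-speaker covariance) and residual e ~ N(0, W)
    (within-speaker, i.e. session/channel, covariance).
    """
    x1, x2, mu = (np.asarray(v, dtype=float) for v in (x1, x2, mu))
    B, W = np.asarray(B, dtype=float), np.asarray(W, dtype=float)
    total = B + W  # covariance of a single speaker vector
    zeros = np.zeros_like(B)
    pair = np.concatenate([x1, x2])
    pair_mean = np.concatenate([mu, mu])
    # Same speaker: both vectors share y, so their cross-covariance is B.
    cov_same = np.block([[total, B], [B, total]])
    # Different speakers: independent y's, so the cross-covariance is zero.
    cov_diff = np.block([[total, zeros], [zeros, total]])
    return gaussian_logpdf(pair, pair_mean, cov_same) - gaussian_logpdf(pair, pair_mean, cov_diff)


# Hypothetical 2-D parameters; real systems estimate mu, B, W from training data.
mu = np.zeros(2)
B = 4.0 * np.eye(2)
W = 0.5 * np.eye(2)
print(plda_llr([1.0, 1.0], [1.1, 0.9], mu, B, W))    # close pair: positive score
print(plda_llr([1.0, 1.0], [-1.0, -1.0], mu, B, W))  # distant pair: negative score
```

In this standard form, everything captured by `W` is treated as nuisance and effectively discounted in the score. The project's stated aim is to discount session variability in this way while keeping synthesis-induced variability out of that nuisance term, which this sketch does not attempt.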
In this way, while the robustness of the speaker verification systems is being increased, the synthesis system will be improved to bypass the new protection mechanisms, and its rate of success in doing so will be investigated. The project team has been working on speaker verification and rapid adaptation for HMM-based text-to-speech systems at the Ozyegin University speech lab since 2009. The speaker verification and text-to-speech system prototypes developed by the team are in the process of commercialization. Moreover, the team participated in the NIST 2010 Speaker Recognition Evaluation (SRE) and will participate again in the NIST SRE in 2012. An international collaboration agreement has been signed between the Ozyegin speech lab and the AT&T speech research group in the US to work on speaker verification and text-to-speech synthesis technologies. The team has expertise in both domains and has the capacity and infrastructure to create a synergy in the int