Strategies for Protecting the Text-Independent Speaker Verification
Systems against Attacks Using Hidden Markov Model Based Speech
Synthesis and Developing Synthesis Algorithms for More Effective Attacks
Electrical & Electronics Engineering
ABSTRACT
Text-independent speaker verification systems have progressed substantially in recent years. In particular, the success of the
factor analysis approach to speaker verification has enabled the deployment of these systems in the call centers of banks and
telecom operators. However, despite this progress and their increasing use in daily life, state-of-the-art systems have been found
to be vulnerable to attacks with Hidden Markov Model (HMM)-based text-to-speech (TTS) systems, which can synthesize speech with
the MFCC features that are commonly used in speaker verification systems. Indeed, one paper reports that the false alarm
probability increases 5-fold when a factor analysis based speaker verification system is attacked with an HMM-based speech
synthesizer. Despite the importance of the problem in practical scenarios, there is very limited literature on attacking speaker
verification systems with speech synthesis. The existing literature only documents the vulnerability of speaker verification
systems; no effective solution to the problem has been proposed. Moreover, there is no literature on improving HMM-based
synthesis methods to attack speaker verification systems more effectively. This gap in the literature therefore offers
opportunities for high-impact research and publications.
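To make the false alarm figure above concrete, the following minimal sketch shows how a false alarm probability is typically
estimated from verification scores at a fixed decision threshold. The function name, the threshold, and all score values are
hypothetical and chosen only for illustration; they are not taken from the cited paper or from any experiment in this project.

```python
def false_alarm_rate(impostor_scores, threshold):
    """Fraction of non-target trials whose score meets or exceeds the threshold.

    A trial is accepted when its score >= threshold, so every accepted
    non-target (impostor or synthetic) trial counts as a false alarm.
    """
    if not impostor_scores:
        raise ValueError("need at least one non-target trial")
    accepted = sum(1 for score in impostor_scores if score >= threshold)
    return accepted / len(impostor_scores)


# Hypothetical toy scores (assumption: higher score = more target-like).
human_impostors = [-2.1, -1.4, -0.9, -0.3, 0.2, -1.8, -0.7, -1.1, 0.6, -2.5]
synthetic_attacks = [0.4, 1.2, -0.2, 0.9, 0.7, 1.5, -0.6, 0.3, 1.1, 0.8]
threshold = 0.5  # hypothetical operating point

fa_human = false_alarm_rate(human_impostors, threshold)
fa_synthetic = false_alarm_rate(synthetic_attacks, threshold)

# Ratio of the two rates: how many times more often attacks are accepted.
increase = fa_synthetic / fa_human
print(f"human: {fa_human:.2f}, synthetic: {fa_synthetic:.2f}, ratio: {increase:.1f}x")
```

Comparing the two rates at the same threshold is what a statement such as "the false alarm probability increases 5-fold"
summarizes. The toy scores here are not meant to reproduce that figure.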
The proposed project has two goals. The first goal is to develop algorithms that make speaker verification systems more robust to
attacks with speech synthesis. These algorithms aim to substantially reduce the risk in real-life use of speaker verification
systems. The second goal is to develop synthesis methods that enable more effective attacks on speaker verification systems
using very little data from the target speaker. Stronger attacks make the first goal harder to reach, but the knowledge and
experience gained while improving the speech synthesizer will in turn help improve the robustness of speaker verification
systems to such attacks.
The results of the methods investigated to achieve these two goals are of major importance to the banks and
intelligence organizations that currently use speaker verification systems.
2012 National Grants
Moreover, the speech synthesis and adaptation algorithms proposed here are expected to have an impact beyond the context of this
project. For example, the algorithms proposed to increase the speaker similarity and naturalness of the speech synthesizer are
expected to have an impact on the general field of parametric speech synthesis.
Four novel mechanisms for detecting synthetic speech are proposed in the project. The proposed mechanisms focus on the
differences in discriminative features and excitation signal between natural and synthetic speech. Unnatural transitions
between speech sounds will also be studied. Furthermore, as a novel approach in speaker verification systems, probabilistic linear
discriminant analysis (PLDA) is proposed to filter out session variability effects while preserving the variability due to speech
synthesis. On the speech synthesis side, to more effectively deceive the proposed protection mechanisms, algorithms are proposed
to better model the discriminative speech features, create a more natural speech excitation, and model a linear relationship
between the MFCC features and the discriminative features to make the synthesized speech sound closer to the target speaker. In
this way, while the robustness of the speaker verification systems is being increased, the synthesis system will be improved to
bypass the new protection mechanisms, and its rate of success in doing so will be investigated.
The project team has been working on speaker verification and rapid adaptation for the HMM-based text-to-speech systems at the
Ozyegin University speech lab since 2009. The speaker verification and text-to-speech system prototypes developed by the team
are in the process of commercialization. Moreover, the team participated in the NIST 2010 Speaker Recognition Evaluation (SRE) and
will again participate in the NIST SRE challenge in 2012. An international collaboration agreement has been signed between the
Ozyegin speech lab and AT&T speech research group in the US to work on the speaker verification and text-to-speech synthesis
technologies. The team has expertise in both domains and has the capacity and infrastructure to create a synergy in the int \