MTGAN: Speaker Verification through Multitasking Triplet Generative Adversarial Networks
Ding, Wenhao; He, Liang
Subject: Computer Science - Sound | Electrical Engineering and Systems Science - Audio and Speech Processing
In this paper, we propose an enhanced triplet method that improves the encoding of embeddings by jointly using a generative adversarial mechanism and multitask optimization. We extend our triplet encoder with Generative Adversarial Networks (GANs) and softm...
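The abstract describes a triplet encoder trained jointly with additional objectives. As a rough illustration only, the sketch below shows a generic multitask objective that adds a hinge triplet loss to a classification cross-entropy term. Several parts are assumptions and are not taken from the paper: the function names, the margin of 0.2, the loss weight, and the use of a softmax speaker-classification head (the abstract is cut off before it states this). The adversarial (GAN) component is left out entirely.

```python
import math
from typing import Sequence


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two embedding vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_loss(anchor, positive, negative, margin: float = 0.2) -> float:
    """Hinge triplet loss: pull the anchor toward the positive (same speaker)
    and push it away from the negative (different speaker) by at least `margin`.
    The margin value is an illustrative assumption."""
    d_pos = squared_distance(anchor, positive)
    d_neg = squared_distance(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


def softmax_cross_entropy(logits: Sequence[float], label: int) -> float:
    """Cross-entropy of a softmax over speaker logits, computed with the
    log-sum-exp trick for numerical stability."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]


def multitask_loss(anchor, positive, negative, logits, label,
                   margin: float = 0.2, weight: float = 1.0) -> float:
    """Hypothetical weighted sum of a metric-learning (triplet) loss and a
    classification loss; not the paper's objective, and no GAN term is included."""
    return (triplet_loss(anchor, positive, negative, margin)
            + weight * softmax_cross_entropy(logits, label))
```

A triplet that already satisfies the margin contributes zero, so only the classification term drives the gradient for that sample. Weighting the two terms is the simplest way to balance tasks in a multitask setup.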