Improved Text-Independent Speaker Identification Using Fused MFCC and IMFCC Feature Sets Based on Gaussian Filter


A state-of-the-art Speaker Identification (SI) system requires a robust feature extraction unit followed by a speaker modeling scheme that gives a generalized representation of these features. Over the years, Mel-Frequency Cepstral Coefficients (MFCC), modeled on the human auditory system, have been used as a standard acoustic feature set for speech-related applications. In a recent contribution by the authors, it was shown that Inverted Mel-Frequency Cepstral Coefficients (IMFCC) form a useful feature set for SI, as they contain complementary information present in the high-frequency region. This paper introduces Gaussian-shaped filters (GF) in place of the typical triangular-shaped bins when calculating MFCC and IMFCC. The objective is to introduce a higher amount of correlation between subband outputs. The performance of both MFCC and IMFCC improves with GF over the conventional triangular filter (TF) based implementation, both individually and in combination. With the Gaussian Mixture Model (GMM) as the speaker modeling paradigm, the performance of the proposed GF-based MFCC and IMFCC, in individual and fused mode, has been verified on two standard databases, YOHO (microphone speech) and POLYCOST (telephone speech), each of which has more than 130 speakers.
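To make the change of filter shape concrete, here is a minimal Python/NumPy sketch under several assumptions that are not taken from the paper: the common mel mapping mel(f) = 2595 log10(1 + f/700); IMFCC filters obtained by flipping the MFCC filter bank along both the filter index and the frequency axis; and Gaussian filters placed at the same centres as the triangles with width sigma_i = (right edge - centre) / alpha, where `alpha` is a hypothetical tuning parameter. The mean cosine similarity of neighbouring filters (`adjacent_overlap`) is only a rough proxy for the inter-subband correlation the abstract mentions, and `fused_decision` assumes a simple weighted sum of per-speaker log-likelihoods from two separately trained GMMs; GMM training and the paper's actual fusion rule are not reproduced here.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def filter_bank(n_filters, n_fft, fs, shape="gaussian", alpha=2.0):
    """Mel-spaced filter bank, shape (n_filters, n_fft // 2 + 1)."""
    n_bins = n_fft // 2 + 1
    # n_filters + 2 edge/centre points equally spaced on the mel scale
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(fs / 2.0), n_filters + 2)
    bin_pts = mel_to_hz(mel_pts) * n_fft / fs  # fractional FFT-bin positions
    k = np.arange(n_bins)
    fb = np.zeros((n_filters, n_bins))
    for i in range(1, n_filters + 1):
        left, centre, right = bin_pts[i - 1], bin_pts[i], bin_pts[i + 1]
        if shape == "triangular":
            rise = (k - left) / (centre - left)
            fall = (right - k) / (right - centre)
            fb[i - 1] = np.clip(np.minimum(rise, fall), 0.0, None)
        else:
            # Assumption: Gaussian at the same centre, width tied to the
            # distance to the next centre via the hypothetical `alpha`.
            sigma = (right - centre) / alpha
            fb[i - 1] = np.exp(-((k - centre) ** 2) / (2.0 * sigma ** 2))
    return fb

def inverted(fb):
    """Flipped (inverted-mel) bank: dense filters move to high frequencies."""
    return fb[::-1, ::-1].copy()

def cepstra(frame, fb, n_ceps=13):
    """Log filter-bank energies of one frame followed by a DCT-II."""
    n_fft = 2 * (fb.shape[1] - 1)
    spectrum = np.abs(np.fft.rfft(frame * np.hamming(len(frame)), n=n_fft)) ** 2
    log_e = np.log(fb @ spectrum + 1e-10)
    q = fb.shape[0]
    basis = np.cos(np.pi * np.outer(np.arange(n_ceps), np.arange(q) + 0.5) / q)
    return basis @ log_e

def adjacent_overlap(fb):
    """Mean cosine similarity between responses of neighbouring filters."""
    a, b = fb[:-1], fb[1:]
    num = np.sum(a * b, axis=1)
    den = np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1)
    return float(np.mean(num / den))

def fused_decision(ll_mfcc, ll_imfcc, w=0.5):
    """Index of the speaker maximising a weighted sum of log-likelihoods."""
    scores = w * np.asarray(ll_mfcc) + (1.0 - w) * np.asarray(ll_imfcc)
    return int(np.argmax(scores))

fs, n_fft = 8000, 512
tri = filter_bank(20, n_fft, fs, shape="triangular")
gau = filter_bank(20, n_fft, fs, shape="gaussian")
frame = np.random.default_rng(0).standard_normal(256)
mfcc_gf = cepstra(frame, gau)             # GF-based MFCC for one frame
imfcc_gf = cepstra(frame, inverted(gau))  # GF-based IMFCC for one frame
```

With fs = 8000 Hz, a 512-point FFT and 20 filters, `adjacent_overlap(gau)` comes out larger than `adjacent_overlap(tri)`, because each Gaussian keeps a non-zero response beyond its neighbouring centres while each triangle stops exactly there.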

Keywords

Triangular Filter, GMM, MFCC, Gaussian Filter, Subbands, Correlation, IMFCC
