Computación y Sistemas
On-line version ISSN 2007-9737; print version ISSN 1405-5546
Comp. y Sist. vol. 15 no. 1, Mexico City, Jul./Sep. 2011
Articles
Speaker Verification on Summed-Channel Conditions with Confidence Measures
Carlos Vaquero Avilés-Casco, Jesús Villalba López, Alfonso Ortega Giménez, and Eduardo Lleida Solano
Communications Technology Group (GTC), Aragón Institute for Engineering Research (I3A), University of Zaragoza, Spain. Email: cvaquero@unizar.es, villalba@unizar.es, ortega@unizar.es, lleida@unizar.es
Article received on July 30, 2010.
Accepted on January 15, 2011.
Abstract
This paper addresses the problem of speaker verification in two-speaker conversations and proposes a set of confidence measures to assess the quality of a given speaker segmentation. We study how these measures can be used to estimate the performance of a state-of-the-art speaker verification system, the I3A submission for the core-summed condition of the NIST SRE 2010. We present a Factor Analysis-based speaker segmentation system, along with three confidence measures that are fused into a single measure. We show that this fused measure is a good estimate of segmentation accuracy when evaluated on the summed-channel telephone data of the NIST SRE 2008. Finally, we present speaker verification results obtained with the I3A submission for the NIST SRE 2010 on several conditions of this evaluation that involve summed-channel data. We show that the confidence measure also predicts the performance of a state-of-the-art speaker verification system when it faces two-speaker conversations.
Keywords: Confidence measures, speaker segmentation, speaker verification, telephone conversations.
Acknowledgements
This work was supported by project TIN2008-06856-C05-04 and by the FPU program of the MEC of the Spanish government.