Computación y Sistemas

On-line version ISSN 2007-9737; print version ISSN 1405-5546

Comp. y Sist. vol.15 no.1 Mexico City Jul./Sep. 2011

 

Articles

 

Speaker Verification on Summed–Channel Conditions with Confidence Measures

 

Verificación de locutor en condiciones de canal sumado con medidas de confianza

 

Carlos Vaquero Avilés Casco, Jesús Villalba López, Alfonso Ortega Giménez, and Eduardo Lleida Solano

 

Communications Technology Group (GTC), Aragón Institute for Engineering Research (I3A), University of Zaragoza, Spain. E–mail: cvaquero@unizar.es, villalba@unizar.es, ortega@unizar.es, lleida@unizar.es

 

Article received on July 30, 2010.
Accepted on January 15, 2011.

 

Abstract

This paper addresses speaker verification in two-speaker conversations and proposes a set of confidence measures to assess the quality of a given speaker segmentation. We study how these measures can be used to estimate the performance of a state-of-the-art speaker verification system, the I3A submission for the core-summed condition of the NIST SRE 2010. We present a Factor Analysis-based speaker segmentation system, along with three confidence measures that are fused into a single measure. We show that this fused measure is a good estimate of segmentation accuracy when evaluated on the summed-channel telephone data of the NIST SRE 2008. Finally, we present speaker verification results obtained with the I3A submission for the NIST SRE 2010 on several conditions of this evaluation that involve summed-channel data. We show that the confidence measure also predicts the performance of a state-of-the-art speaker verification system when it faces two-speaker conversations.

Keywords: Confidence measures, speaker segmentation, speaker verification, telephone conversations.

 

Summary

This article addresses the problem of speaker verification in conversations with two speakers, proposing a set of confidence measures to assess the quality of a given speaker segmentation. We study how these measures can be used to estimate the performance of a state-of-the-art speaker verification system, the I3A system for the NIST SRE 2010 speaker recognition evaluation. We present a speaker segmentation system based on Factor Analysis and three confidence measures that are combined into a single measure that constitutes a good estimate of segmentation quality when evaluated on the summed-channel recordings of the NIST SRE 2008. Finally, we present speaker verification results obtained with the I3A system on several summed-channel conditions of the NIST SRE 2010. The confidence measures are shown to also predict the performance of a speaker verification system when it faces two-speaker conversations.

Keywords: Confidence measures, speaker segmentation, speaker verification, telephone conversations.

 


 

Acknowledgements

This work was supported by project TIN2008-06856-C05-04 and by the FPU program of the MEC of the Spanish government.

 


All content of this journal, except where otherwise noted, is under a Creative Commons License.