SciELO - Scientific Electronic Library Online



Computación y Sistemas

Print version ISSN 1405-5546

Comp. y Sist. vol.18 n.2 México Apr./Jun. 2014 

Regular Articles


A Gaussian Selection Method for Speaker Verification with Short Utterances


Gaussian Selection Method for Speaker Verification with Short Signals


Flavio J. Reyes Díaz, Gabriel Hernández Sierra, and José Calvo de Lara


Advanced Technologies Application Center (CENATAV), Havana, Cuba.



Speaker recognition systems frequently use the GMM-MAP method to model speakers. This method represents each speaker with a Gaussian mixture; however, not all Gaussian components in this mixture are truly representative of the speaker. To remove this model redundancy, this work proposes a Gaussian selection method that builds a new GMM containing only the most representative Gaussian components. In speaker verification experiments, the proposal performs similarly to the baseline, while the speaker models it uses are 80% smaller than the baseline speaker model. The proposal was also applied to a speaker recognition system with short test signals of 15, 5, and 3 seconds, improving the EER by 0.43%, 2.64%, and 1.60%, respectively, compared with the baseline. Applying this method in real-world or embedded speaker verification systems could be very useful for reducing computational and memory costs.
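To make the pipeline concrete, the following is a minimal sketch under stated assumptions, not the authors' implementation. It applies standard mean-only MAP adaptation of a diagonal-covariance UBM (the classic GMM-MAP recipe of Reynolds, Quatieri and Dunn, 2000, with an assumed relevance factor of 16) and then keeps 20% of the components to mirror the 80% model reduction reported above. The abstract does not spell out the selection criterion beyond the keyword "cumulative vector", so the rule used here (count how often each component is among the `top_n` highest posteriors per frame, then keep the most frequent) is a hypothetical stand-in. All function and parameter names (`map_adapt_means`, `select_gaussians`, `top_n`, `keep_fraction`, `relevance`) are illustrative, and random data stands in for real MFCC frames.

```python
import numpy as np

def log_gauss_diag(X, means, variances):
    """Log N(x_t | mu_k, diag(var_k)) for every frame t and component k; shape (T, K)."""
    diff = X[:, None, :] - means[None, :, :]                     # (T, K, D)
    return -0.5 * (np.sum(diff ** 2 / variances[None], axis=2)
                   + np.sum(np.log(2 * np.pi * variances), axis=1)[None, :])

def posteriors(X, weights, means, variances):
    """Component occupation probabilities gamma_t(k) under the UBM; rows sum to 1."""
    log_p = np.log(weights)[None, :] + log_gauss_diag(X, means, variances)
    log_p -= log_p.max(axis=1, keepdims=True)                    # numerical stability
    p = np.exp(log_p)
    return p / p.sum(axis=1, keepdims=True)

def map_adapt_means(X, weights, means, variances, relevance=16.0):
    """Mean-only MAP adaptation; relevance=16 is an assumed, commonly used value."""
    gamma = posteriors(X, weights, means, variances)             # (T, K)
    n = gamma.sum(axis=0)                                        # soft counts n_k
    Ex = gamma.T @ X / np.maximum(n, 1e-10)[:, None]             # first-order stats E_k[x]
    alpha = (n / (n + relevance))[:, None]                       # adaptation coefficient
    return alpha * Ex + (1.0 - alpha) * means, gamma

def select_gaussians(gamma, keep_fraction=0.2, top_n=5):
    """Hypothetical selection rule: accumulate, per component, how many frames
    rank it among the top_n posteriors, then keep the most frequent components."""
    T, K = gamma.shape
    top = np.argsort(gamma, axis=1)[:, -top_n:]                  # (T, top_n) indices
    counts = np.bincount(top.ravel(), minlength=K)               # cumulative count vector
    n_keep = max(1, int(round(keep_fraction * K)))
    keep = np.sort(np.argsort(counts)[::-1][:n_keep])            # kept component indices
    return keep, counts

# Toy usage with a random 64-component UBM over 20-dimensional features.
rng = np.random.default_rng(0)
K, D = 64, 20
ubm_w = np.full(K, 1.0 / K)
ubm_mu = rng.normal(size=(K, D))
ubm_var = np.ones((K, D))
X = rng.normal(size=(500, D))                                    # stand-in for MFCC frames
spk_mu, gamma = map_adapt_means(X, ubm_w, ubm_mu, ubm_var)
keep, counts = select_gaussians(gamma, keep_fraction=0.2)
reduced_w = ubm_w[keep] / ubm_w[keep].sum()                      # renormalize kept weights
reduced_mu = spk_mu[keep]
print(len(keep))  # → 13
```

Keeping only the retained indices means both the stored model and the per-frame likelihood evaluation shrink in proportion to `keep_fraction`, which is where the memory and computation savings claimed in the abstract would come from.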

Keywords: Speaker verification, Gaussian component selection, cumulative vector, short utterances.



Speaker recognition systems frequently use the GMM-MAP method to model speakers. However, not all Gaussian components in these models are representative of the speaker. To remove this redundancy, we propose a Gaussian selection method that yields a new model containing the most representative Gaussian components. The experimental results show performance similar to the baseline, while the resulting models are 80% smaller than the speaker model used in the baseline. The proposed methods are applied to shorter test signals of 15, 5, and 3 seconds, improving the EER by 0.43%, 2.64%, and 1.60%, respectively, compared with the baseline. Applying the proposed method in real-world verification systems could be very useful for reducing computational cost and memory load.
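The results above are reported as equal error rate (EER), the operating point at which the miss rate on target trials equals the false-alarm rate on non-target trials. The sketch below is a simple illustration, not the evaluation procedure used in the paper: it approximates the EER by sweeping the decision threshold over every observed score and averaging the two error rates where they are closest. The function name `compute_eer` and the toy scores are hypothetical.

```python
import numpy as np

def compute_eer(target_scores, nontarget_scores):
    """Approximate EER by sweeping the threshold over all observed scores."""
    tgt = np.asarray(target_scores, dtype=float)
    non = np.asarray(nontarget_scores, dtype=float)
    thresholds = np.sort(np.concatenate([tgt, non]))
    # Miss: a target trial scored below the threshold is rejected.
    p_miss = np.array([(tgt < t).mean() for t in thresholds])
    # False alarm: a non-target trial scored at or above the threshold is accepted.
    p_fa = np.array([(non >= t).mean() for t in thresholds])
    i = np.argmin(np.abs(p_miss - p_fa))         # threshold where the rates cross
    return (p_miss[i] + p_fa[i]) / 2

# One target score (1.0) falls below one non-target score (1.5).
print(compute_eer([2.0, 3.0, 4.0, 1.0], [0.0, 0.5, 1.5, -1.0]))  # → 0.25
```

Production toolkits usually interpolate between thresholds on the ROC or DET curve instead of picking the nearest observed score, so small differences from published EER values are expected.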

Keywords: Speaker verification, Gaussian component selection, cumulative vector, short signals.






All contents of this journal, except where otherwise noted, are licensed under a Creative Commons Attribution License.