A pattern recognition based esophageal speech enhancement system

Mantilla-Caeiros, A.; Nakano-Miyatake, M.; Perez-Meana, H.

Journal of applied research and technology

On-line version ISSN 2448-6736; print version ISSN 1665-6423

J. appl. res. technol. vol. 8 no. 1, Ciudad de México, Apr. 2010

A. Mantilla–Caeiros¹, M. Nakano–Miyatake², H. Perez–Meana²*

¹ Instituto Tecnológico y de Estudios Superiores de Monterrey, Campus Ciudad de México, Calle del Puente 222, Ejidos de Huipulco, Tlalpan, 14380 Mexico City.

² ESIME Culhuacán, Instituto Politécnico Nacional, Av. Santa Ana 1000, Col. San Francisco Culhuacán, 04430 Mexico City. *Email: hmperezm@ipn.mx

ABSTRACT

A system is proposed for improving the intelligibility and quality of alaryngeal speech by replacing its voiced segments with the equivalent segments of normal speech. To this end, the proposed system identifies the voiced segments of the alaryngeal speech signal using isolated speech recognition methods and replaces them with the equivalent voiced segments of normal speech, leaving the silence and unvoiced segments unchanged. Results from objective and subjective evaluation methods show that the proposed system appreciably improves the quality and intelligibility of alaryngeal speech signals.

Keywords: Speech enhancement, esophageal speech, electronic larynx, multilayer perceptron, voiced and unvoiced segment detection, speech synthesis.
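
The processing flow described in the abstract (detect voiced segments, replace them with normal-speech equivalents, pass silence and unvoiced segments through) can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the energy and zero-crossing thresholds stand in for whatever voiced/unvoiced detector the authors use, and `recognize` and `normal_segments` are hypothetical placeholders for the isolated speech recognizer and the store of normal-speech segments.

```python
from itertools import groupby


def frame_signal(x, frame_len):
    """Split x into consecutive non-overlapping frames of frame_len samples."""
    return [x[i:i + frame_len] for i in range(0, len(x) - frame_len + 1, frame_len)]


def classify_frame(frame, energy_thr=0.01, zcr_thr=0.3):
    """Label a frame as 'silence', 'unvoiced' or 'voiced'.

    Assumption: simple short-time energy and zero-crossing-rate thresholds,
    used here only as a stand-in detector.
    """
    energy = sum(s * s for s in frame) / len(frame)
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    zcr = crossings / (len(frame) - 1)
    if energy < energy_thr:
        return "silence"
    return "unvoiced" if zcr > zcr_thr else "voiced"


def enhance(signal, frame_len, recognize, normal_segments):
    """Replace voiced segments with normal-speech segments; keep the rest.

    recognize: hypothetical callable mapping a voiced segment to a label.
    normal_segments: hypothetical dict mapping labels to normal-speech samples.
    """
    frames = frame_signal(signal, frame_len)
    out = []
    # Group runs of consecutive frames that share the same class label.
    for label, group in groupby(frames, key=classify_frame):
        segment = [s for frame in group for s in frame]
        if label == "voiced":
            key = recognize(segment)
            # Fall back to the original segment if no replacement is stored.
            segment = normal_segments.get(key, segment)
        out.extend(segment)
    # Samples that did not fill a whole frame are passed through unchanged.
    out.extend(signal[len(frames) * frame_len:])
    return out
```

In this sketch a voiced segment is a run of consecutive voiced frames, so the replacement is done once per segment rather than once per frame; a stored replacement of a different length would change the output duration, which a real system would have to handle.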

RESUMEN (English translation)

This article proposes a system for improving the quality and intelligibility of the speech of laryngectomized persons, based on replacing the voiced segments of laryngectomized speech with equivalent segments of normal speech. To this end, the system identifies the voiced segments of laryngectomized speech using isolated voice command recognition techniques and replaces them with the equivalent segments of normal speech, leaving the silence and unvoiced segments unchanged. Results obtained with both subjective and objective evaluation methods show that the proposed system provides a significant improvement in both the quality and the intelligibility of laryngectomized speech signals.

Acknowledgments

We thank the Consejo Nacional de Ciencia y Tecnología (CONACyT) for the support provided for this research. We also thank Dr. Xochiquetzal Hernandez of the Instituto de la Comunicación Humana of the Centro Nacional de la Rehabilitación of Mexico for her assistance with the subjective evaluation of the system.
