Computación y Sistemas

On-line version ISSN 2007-9737; Print version ISSN 1405-5546

Comp. y Sist. vol. 18 no. 1, Ciudad de México, Jan./Mar. 2014

https://doi.org/10.13053/CyS-18-1-2014-023 

Articles

 

Speech Enhancement with Local Adaptive Rank-Order Filtering

 

Mejora de voz con filtrado local adaptativo basado en estadísticas de orden

 

Vitaly Kober1, Victor Diaz Ramirez2, and Yuma Sandoval Ibarra2

 

1 Computer Science Department, CICESE, Ensenada, B.C., Mexico. vkober@cicese.mx

2 Instituto Politécnico Nacional, CITEDI, Tijuana, B.C., Mexico. vdiazr@ipn.mx, juma_san@hotmail.com

 

Abstract

A local adaptive algorithm for speech enhancement is presented. The algorithm is based on the calculation of rank-order statistics of an input speech signal over a moving window. It adapts the size and contents of the sliding window, as well as the estimation function employed to recover a clean speech signal from a noisy one. The algorithm improves the quality of a speech signal while preserving its intelligibility. Its performance in suppressing additive noise in an input test speech signal is compared with that of common speech enhancement algorithms in terms of objective metrics.

Keywords. Speech enhancement, local adaptive filtering, rank-order statistics, musical noise, intelligibility.
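
The idea summarized in the abstract can be illustrated with a short Python sketch. The code below is only a minimal illustration under assumed choices, not the algorithm published in the paper: the window-growth rule, the tolerance epsilon, and the trimmed-mean estimator are assumptions introduced for the example, while the paper's actual adaptation of window size, window contents, and estimation function is defined in the full article.

# Minimal sketch of local adaptive rank-order filtering of a 1-D signal.
# NOTE: this is NOT the algorithm of the paper; the window-growth rule,
# the tolerance `epsilon`, and the trimmed-mean estimator are illustrative
# assumptions only.
import numpy as np

def rank_order_enhance(noisy, max_half_window=16, epsilon=0.05):
    """Estimate a clean signal sample by sample from rank-order statistics
    computed over a locally adapted sliding window."""
    noisy = np.asarray(noisy, dtype=float)
    clean = np.empty_like(noisy)
    n = noisy.size
    for i in range(n):
        center = noisy[i]
        # Grow the window while the local spread stays small, so the window
        # adapts to local signal activity (assumed adaptation rule).
        half = 1
        while half < max_half_window:
            lo, hi = max(0, i - half), min(n, i + half + 1)
            if np.ptp(noisy[lo:hi]) > 4.0 * epsilon:
                break
            half += 1
        lo, hi = max(0, i - half), min(n, i + half + 1)
        window = np.sort(noisy[lo:hi])           # rank-order statistics
        # Keep only order statistics close to the current sample and use
        # their mean as the estimate (a simple trimmed-mean estimator).
        selected = window[np.abs(window - center) <= epsilon]
        clean[i] = selected.mean() if selected.size else center
    return clean

# Illustrative usage on a synthetic signal (not real speech data).
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 8000)
signal = 0.5 * np.sin(2.0 * np.pi * 220.0 * t) * np.exp(-3.0 * t)
noisy = signal + 0.02 * rng.standard_normal(t.size)
enhanced = rank_order_enhance(noisy)

In this sketch the window stops growing once the local peak-to-peak range exceeds a multiple of epsilon, keeping the window short near transients and longer in quasi-stationary segments; any such rule is only a stand-in for the adaptive criterion described in the paper.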

 

Resumen

A locally adaptive algorithm for speech enhancement is presented. The algorithm is based on the computation of rank-order statistics of a speech signal within a sliding window. It is locally adaptive because it can vary the size and contents of the signal within the sliding window, as well as the estimation function used to recover the clean signal from the noisy signal. The proposed algorithm improves speech quality while preserving the intelligibility of the message, introducing only imperceptible musical noise. The performance of the proposed algorithm is compared with that of existing algorithms in terms of several objective metrics.

Palabras clave. Speech enhancement, local adaptive filtering, rank-order statistics, musical noise, intelligibility.

 

DOWNLOAD ARTICLE IN PDF FORMAT

 

