SciELO - Scientific Electronic Library Online

 


Nova scientia

On-line version ISSN 2007-0705

Abstract

DE LUNA-ORTEGA, Carlos A. et al. Speech recognition by using cross correlation and a multilayer perceptron. Nova scientia [online]. 2014, vol.6, n.12, pp.108-124. ISSN 2007-0705.

In this paper we present an algorithmic alternative to current Automatic Speech Recognition (ASR) systems: a way to characterize words based on approximations derived from Linear Predictive Coding (LPC) coefficients. The method extracts phonetic characteristics through LPC coefficients; pattern vectors are then formed from the averages of the LPC coefficients taken over the word's samples, and the autocorrelation of the LPC coefficient sequences yields a unique vector for each pronunciation. These vectors are used to train a Multilayer Perceptron (MLP) classifier. After training, performance trials were run. The sounds of the digits zero through nine were used as the target vocabulary because of their general use. To estimate the performance of the method, two corpora were used: the UPA corpus, whose vocabulary uses a pronunciation familiar to the western part of Mexico, and the Tlatoa corpus, whose vocabulary presents a pronunciation typical of the central region of Mexico. The signals in both corpora were recorded in Spanish and sampled at 8 kHz. The recognition rates for the single-speaker UPA corpus and the multi-speaker Tlatoa corpus were 96.7% and 93.3%, respectively. Additionally, the method was compared against two classic speech recognition methods, Dynamic Time Warping (DTW) and Hidden Markov Models (HMM).

Keywords: automatic speech recognition; cross-correlation; multilayer perceptron; linear predictive coding.
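The feature-extraction steps described in the abstract (frame-wise LPC coefficients, averaging over the word, then a correlation of the averaged coefficients) can be illustrated with a minimal sketch. This is one possible reading of the abstract, not the authors' implementation: the function names `lpc` and `word_pattern`, the LPC order of 10, the 30 ms frame (240 samples at 8 kHz), the 10 ms hop, and the Hamming window are all assumptions introduced here for illustration. Only the 8 kHz sampling rate comes from the abstract.

```python
import numpy as np


def lpc(frame, order):
    """LPC coefficients by the autocorrelation method (Levinson-Durbin).

    Returns a[0..order] with a[0] = 1, for the prediction-error filter
    A(z) = 1 + a[1] z^-1 + ... + a[order] z^-order.
    """
    frame = np.asarray(frame, dtype=float)
    n = len(frame)
    # Autocorrelation of the frame at lags 0..order.
    r = np.array([np.dot(frame[:n - k], frame[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        if err <= 1e-12:  # silent or perfectly predictable frame: stop early
            break
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err  # reflection coefficient
        prev = a.copy()
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a


def word_pattern(signal, order=10, frame_len=240, hop=80):
    """Pattern vector for one spoken word (hypothetical helper).

    Assumed steps: window each frame, compute its LPC coefficients,
    average them over all frames, then autocorrelate the averaged vector.
    """
    signal = np.asarray(signal, dtype=float)
    if len(signal) < frame_len:
        raise ValueError("signal is shorter than one frame")
    window = np.hamming(frame_len)
    coeffs = [
        lpc(signal[s:s + frame_len] * window, order)[1:]  # drop a[0] = 1
        for s in range(0, len(signal) - frame_len + 1, hop)
    ]
    mean_lpc = np.mean(coeffs, axis=0)
    full = np.correlate(mean_lpc, mean_lpc, mode="full")
    return full[order - 1:]  # keep lags 0..order-1
```

Each word then maps to a fixed-length vector (here of length `order`), which is the kind of input an MLP classifier needs regardless of word duration. Such vectors could, for instance, be passed to an off-the-shelf MLP such as scikit-learn's `MLPClassifier`; the network architecture used in the paper is not stated in the abstract.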


 

Creative Commons License: All the contents of this journal, except where otherwise noted, are licensed under a Creative Commons Attribution License.