Speech recognition by using cross correlation and a multilayer perceptron

de Luna-Ortega, Carlos A.; Mora-González, Miguel; Martínez-Romo, Julio C.; Luna-Rosas, Francisco J.; Muñoz-Maciel, Jesús

Nova scientia

On-line version ISSN 2007-0705

Abstract

DE LUNA-ORTEGA, Carlos A. et al. Speech recognition by using cross correlation and a multilayer perceptron. Nova scientia [online]. 2014, vol.6, n.12, pp.108-124. ISSN 2007-0705.

In this paper we present an algorithmic alternative to current Automatic Speech Recognition (ASR) systems by proposing a way to characterize words based on approximations that use a coefficient extracted from Linear Predictive Coding (LPC). The method consists of extracting phonetic characteristics by means of LPC coefficients; pattern vectors are then formed from the averages of the LPC coefficients taken over the sampled word, so that a unique vector is created for each pronunciation through the autocorrelation of the LPC coefficient sequences. These vectors are used to train a Multilayer Perceptron (MLP) classifier. After training, performance trials were run. The sounds of the digits zero through nine were used as the target vocabulary, given their general use. To estimate the performance of the method, two corpora were used: the UPA corpus, whose vocabulary uses a pronunciation familiar to the western part of Mexico, and the Tlatoa corpus, whose vocabulary presents a pronunciation typical of the central region of Mexico. The signals in both corpora were recorded in Spanish at a sampling frequency of 8 kHz. The recognition rates for the single-speaker case from the UPA corpus and the multiple-speaker case from the Tlatoa corpus were 96.7% and 93.3%, respectively. Additionally, the method was compared against two classic speech recognition methods, Dynamic Time Warping (DTW) and Hidden Markov Models (HMM).

Keywords: automatic speech recognition; cross-correlation; multilayer perceptron; linear predictive coding.
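
The feature-extraction pipeline summarized in the abstract (per-frame LPC coefficients, averaged over the word, then autocorrelated into a pattern vector) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the LPC order, frame length, hop size, window, or normalization, so the values below (order 10, 30 ms Hamming-windowed frames with a 10 ms hop at 8 kHz, peak normalization) and the function names `lpc`, `frame_signal`, and `word_pattern` are assumptions made for illustration only.

```python
import numpy as np

def lpc(frame, order):
    """LPC coefficients a[1..order] of A(z) = 1 + sum_k a[k] z^-k,
    computed with the autocorrelation method and Levinson-Durbin recursion."""
    n = len(frame)
    # Autocorrelation of the frame at lags 0..order.
    r = np.array([np.dot(frame[:n - k], frame[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    if err <= 0.0:            # silent frame: no meaningful predictor
        return a[1:]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err        # reflection coefficient
        prev = a.copy()
        a[1:i] = prev[1:i] + k * prev[i - 1:0:-1]
        a[i] = k
        err *= 1.0 - k * k    # updated prediction error
    return a[1:]

def frame_signal(x, frame_len, hop):
    """Split x into overlapping frames (requires len(x) >= frame_len)."""
    count = 1 + (len(x) - frame_len) // hop
    return np.stack([x[i * hop:i * hop + frame_len] for i in range(count)])

def word_pattern(x, order=10, frame_len=240, hop=80):
    """Assumed reading of the abstract: average the per-frame LPC
    coefficients over the word, then autocorrelate the averaged sequence."""
    frames = frame_signal(x, frame_len, hop) * np.hamming(frame_len)
    mean_coeffs = np.array([lpc(f, order) for f in frames]).mean(axis=0)
    pattern = np.correlate(mean_coeffs, mean_coeffs, mode="full")
    peak = np.max(np.abs(pattern))
    # Peak normalization is an assumption, not stated in the abstract.
    return pattern / peak if peak > 0 else pattern
```

Under these assumptions each word yields a symmetric vector of length `2 * order - 1`, which would then serve as one training input for the MLP classifier.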
