Computación y Sistemas

Print version ISSN 1405-5546

Comp. y Sist. vol.17 n.4 México Oct./Dec. 2013

 

Regular Articles

 

3D Modeling of the Mexican Sign Language for a Speech-to-Sign Language System

 

Modelado 3D del lenguaje de señas mexicano para un sistema de voz-a-lenguaje de señas

 

Santiago-Omar Caballero-Morales, Felipe Trujillo-Romero

 

Postgraduate Division, Technological University of the Mixteca, Oaxaca, Mexico. scaballero@mixteco.utm.mx, ftrujillo@mixteco.utm.mx

 

Article received on 15/10/2012; accepted on 21/06/2013.

 

Abstract

There are many people with communication impairments, deafness being one of the most common. Deaf people use Sign Language (SL) to communicate, and translation systems (Speech/Text-to-SL) have been developed to assist this communication. However, since SLs depend on countries and cultures, grammars, vocabularies, and signs differ even between places with similar spoken languages. In Mexico, work in this field is very limited, so any development must consider the characteristics of Mexican Sign Language (MSL). In this paper we present a new approach to building a Mexican Speech-to-SL system, integrating 3D modeling of MSL with a multi-user Automatic Speech Recognizer (ASR) with dynamic adaptation. The 3D models (avatar) were developed by means of motion capture of an MSL performer: a Kinect was used as the 3D sensor for the motion-capture process, and DAZ Studio 4 was used for the animation. The multi-user ASR was developed with HTK, and Matlab was the programming platform for the Graphical User Interface (GUI). Experiments with a vocabulary of 199 words were performed to validate the system. An accuracy of 96.2% was achieved for the ASR and for the interpretation into MSL of 70 words and 20 spoken sentences. The 3D avatar produced clearer sign realizations than standard video recordings of a human MSL performer.

Keywords: Mexican sign language, automatic speech recognition, human-computer interaction.
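At its core, the system described in the abstract maps an ASR word hypothesis onto avatar animations drawn from the 199-word MSL vocabulary. The lookup step can be sketched as follows; the dictionary entries, animation file names, and the fingerspelling fallback for out-of-vocabulary words are illustrative assumptions, not the authors' actual code.

```python
# Hypothetical sketch of the word-to-sign mapping stage of a
# Speech-to-SL pipeline: the ASR output (a word sequence) is
# converted into a list of stored avatar animation clips.

# Illustrative subset of the 199-word MSL sign dictionary;
# each in-vocabulary word maps to one animation clip.
SIGN_DICTIONARY = {
    "hola": "anim_hola.duf",
    "gracias": "anim_gracias.duf",
    "casa": "anim_casa.duf",
}

def words_to_signs(asr_hypothesis):
    """Map an ASR word sequence to sign-animation clips.

    Words outside the sign vocabulary fall back to letter-by-letter
    fingerspelling, a common strategy in SL synthesis (whether the
    original system does this is an assumption).
    """
    clips = []
    for word in asr_hypothesis.lower().split():
        if word in SIGN_DICTIONARY:
            clips.append(SIGN_DICTIONARY[word])
        else:
            # Fingerspell unknown words, one clip per letter.
            clips.extend(f"anim_letter_{ch}.duf" for ch in word)
    return clips

print(words_to_signs("hola casa"))  # ['anim_hola.duf', 'anim_casa.duf']
```

In a complete system the returned clip list would be handed to the animation back end (here, DAZ Studio models driven by Kinect motion-capture data) for sequential playback.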

 

Resumen

Hay muchas personas con problemas para comunicarse, siendo la sordera uno de los más comunes. Las personas con este problema hacen uso del Lenguaje de Señas (LS) para comunicarse, y se han desarrollado sistemas de traducción (Voz/Texto-a-LS) para asistir esta tarea. Sin embargo, dado que los LS dependen de países y culturas, hay diferencias entre gramáticas, vocabularios y señas, incluso si estos provienen de lugares con lenguajes hablados similares. En México el trabajo en este campo es muy limitado, por lo que cualquier desarrollo debe considerar las características del Lenguaje de Señas Mexicano (LSM). En este artículo presentamos nuestro enfoque para un sistema de Voz-a-LS mexicano, integrando el modelado 3D del LSM con un Reconocedor Automático de Voz (RAV) multiusuario con adaptación dinámica. Los modelos 3D (avatar) fueron desarrollados por medio de captura de movimiento de un signante del LSM: un Kinect fue usado como sensor 3D para el proceso de captura de movimiento, y DAZ Studio 4 fue usado para su animación. El RAV multiusuario fue desarrollado usando HTK, y Matlab fue la plataforma de programación para la Interfaz Gráfica de Usuario (GUI). Experimentos con un vocabulario de 199 palabras fueron realizados para validar el sistema. Una precisión del 96.2% fue obtenida para el RAV y para la interpretación al LSM de 70 palabras y 20 frases habladas. Las realizaciones del avatar 3D fueron más claras que aquellas de grabaciones de video de un signante humano del LSM.

Palabras clave: Lenguaje de señas mexicano, reconocimiento automático de voz, interacción humano-computadora.
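Since the recognizer was built with HTK, the reported 96.2% is presumably HTK's standard word-accuracy measure as printed by HResults, Acc = (N − D − S − I)/N × 100, where N is the number of reference words and D, S, I are deletions, substitutions, and insertions. A minimal re-implementation for illustration (the example counts are made up, not taken from the paper):

```python
def word_accuracy(n_ref, deletions, substitutions, insertions):
    """HTK-style word accuracy: Acc = (N - D - S - I) / N * 100,
    where N is the number of words in the reference transcription."""
    return 100.0 * (n_ref - deletions - substitutions - insertions) / n_ref

# Hypothetical counts: 500 reference words with 5 deletions,
# 10 substitutions, and 4 insertions.
print(round(word_accuracy(500, 5, 10, 4), 1))  # 96.2
```

Note that, because insertions are penalized, this accuracy can be lower than the percentage of correctly recognized words (and can even go negative for very poor hypotheses).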

 


 


All the contents of this journal, except where otherwise noted, are licensed under a Creative Commons Attribution License.