A Comparison of Dynamic Naive Bayesian Classifiers and Hidden Markov Models for Gesture Recognition

Avilés-Arriaga, H.H.; Sucar-Succar, L.E.; Mendoza-Durán, C.E.; Pineda-Cortés, L.A.

Journal of applied research and technology

On-line version ISSN 2448-6736; print version ISSN 1665-6423

J. appl. res. technol. vol. 9 no. 1, Ciudad de México, Apr. 2011


H.H. Avilés-Arriaga*¹, L.E. Sucar-Succar², C.E. Mendoza-Durán³, L.A. Pineda-Cortés⁴

¹,⁴ Department of Computer Science, Instituto de Investigaciones en Matemáticas Aplicadas y en Sistemas, Universidad Nacional Autónoma de México, Circuito Escolar, Ciudad Universitaria, 04510 Mexico City, Mexico. *E-mail: haviles@live.com

² Computer Science Department, Instituto Nacional de Astrofísica, Óptica y Electrónica, Luis Enrique Erro 1, 72840 Tonantzintla, Mexico.

³ Universidad Anáhuac (México Norte), Av. Universidad Anáhuac, Núm. 46, Col. Lomas Anáhuac, 52786 Huixquilucan, Mexico.

ABSTRACT

In this paper we present a study that assesses the performance of dynamic naive Bayesian classifiers (DNBCs) versus standard hidden Markov models (HMMs) for gesture recognition. DNBCs incorporate into HMMs an explicit assumption of conditional independence among gesture features given the state. We show that this factorization offers competitive classification rates and error dispersion, requires fewer parameters, and improves training time considerably when several attributes are present. We propose a set of qualitative and natural posture and motion attributes to describe gestures, and show that these posture-motion features increase recognition rates significantly in comparison with motion features alone. Additionally, we propose an adaptive skin detection approach to cope with multiple users and different lighting conditions. We performed one of the most extensive sets of experiments presented in the literature to date, covering gestures of a single user, gestures of multiple people, and variations in distance and rotation, using a gesture database with 9441 examples of 9 different classes performed by 15 people. Results show the effectiveness of the overall approach and the reliability of DNBCs in gesture recognition.

Keywords: Gesture recognition, hidden Markov models, motion analysis, visual tracking.

SUMMARY

This paper compares the performance of dynamic naive Bayesian classifiers (DNBCs) and hidden Markov models (HMMs) for visual gesture recognition. DNBCs extend HMMs by incorporating conditional independence assumptions among the attributes given the model state. This factorization offers competitive classification rates and error dispersion, a smaller number of model parameters, and a considerable improvement in training time. To describe gestures, a set of simple posture and motion attributes is proposed that increases recognition rates compared with models that use only motion information. Additionally, an adaptive skin color detection scheme is proposed to handle different users and lighting conditions. One of the most exhaustive sets of experiments presented in the gesture recognition literature to date is described, including gestures from a single user, from different people, and with variations in distance and rotation. A database with 9441 examples of 9 gestures from 15 people is also presented. The results show the effectiveness of this approach and the reliability of DNBCs for gesture recognition.
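The claim that the factorization needs fewer parameters can be illustrated with a small count of emission parameters. The sketch below assumes discrete attributes and compares a per-state joint table over all attribute combinations (the standard HMM case) with one small table per attribute per state (the DNBC case). The function names and the attribute sizes are hypothetical and chosen only for illustration; they are not taken from the paper's experiments.

```python
from math import prod


def hmm_emission_params(n_states, cardinalities):
    """Free emission parameters when each state stores one joint
    distribution over every combination of attribute values."""
    joint_symbols = prod(cardinalities)
    return n_states * (joint_symbols - 1)


def dnbc_emission_params(n_states, cardinalities):
    """Free emission parameters when each attribute has its own
    distribution per state (conditional independence given the state)."""
    return n_states * sum(c - 1 for c in cardinalities)


# Hypothetical setup: 3 hidden states, 7 discrete attributes
# (three ternary, four binary). These sizes are illustrative only.
cards = [3, 3, 3, 2, 2, 2, 2]
print(hmm_emission_params(3, cards))   # joint table per state
print(dnbc_emission_params(3, cards))  # one small table per attribute
```

Under these assumptions the joint table grows with the product of the attribute sizes, while the factorized form grows with their sum. Initial-state and transition parameters are the same in both models, so the difference lies entirely in the emission term.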

