Object Detection with Vocabularies of Space-time Descriptors

Hernandez-Heredia, Y.; González-Linares, J.M.ª; Guil, N.; Ortiz, J.; Hernandez, R.; Cózar, J.R.

Journal of applied research and technology

On-line version ISSN 2448-6736; print version ISSN 1665-6423

J. appl. res. technol vol. 10 no. 6, Mexico City, Dec. 2012

Y. Hernandez-Heredia*¹, J.M.ª González-Linares³, N. Guil³, J. Ortiz², R. Hernandez¹, J.R. Cózar³

¹ Centro de Geoinformática y Señales Digitales, Universidad de las Ciencias Informáticas, Havana, Cuba. *yhernandezh@uci.cu

² Vicerrectoría de Tecnología, Universidad de las Ciencias Informáticas, Havana, Cuba.

³ Departamento de Arquitectura de Computadores, Universidad de Málaga, E.T.S.I. Informática, Málaga, Spain.

ABSTRACT

This paper presents a novel framework for object detection in security and broadcast videos. Our method assumes that the object classes are unknown in advance and exploits the space-time properties of the videos to create a vocabulary that describes these classes. Local space-time features have recently become a popular video representation for action recognition and object detection. Several methods for feature localization and description have been proposed in the literature, and promising recognition results have been demonstrated for a number of action classes.

In this work we propose using different kinds of descriptors to create vocabularies for different object detection tasks. To describe the videos better, we build a background model that attempts to clean up and track the regions where objects are present. The interest points that characterize the objects are computed with a temporal variant of the well-known Harris corner detector. From the descriptors obtained at these interest points, a vocabulary is built using the classes of videos to be trained. We then compute frequency histograms between the training videos and the vocabulary and use a binary classifier to learn the trained classes. The same procedure, without the vocabulary, is then followed to detect and track the objects.
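The vocabulary, histogram, and classifier steps described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not say how the vocabulary is built, so k-means with a farthest-point initialization is assumed here; a perceptron stands in for the unspecified binary classifier; and the hypothetical helper `toy_video` generates 2-D points in place of real space-time descriptors.

```python
import random

def dist2(a, b):
    """Squared Euclidean distance between two descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(point, centers):
    """Index of the center closest to point."""
    return min(range(len(centers)), key=lambda i: dist2(point, centers[i]))

def build_vocabulary(descriptors, k, iters=20):
    """Assumed vocabulary step: k-means; the k centers act as visual words."""
    # Farthest-point initialization (an illustrative, deterministic choice).
    centers = [list(descriptors[0])]
    while len(centers) < k:
        far = max(descriptors, key=lambda d: min(dist2(d, c) for c in centers))
        centers.append(list(far))
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for d in descriptors:
            groups[nearest(d, centers)].append(d)
        for i, g in enumerate(groups):
            if g:  # keep the previous center if a cluster is empty
                centers[i] = [sum(col) / len(g) for col in zip(*g)]
    return centers

def word_histogram(descriptors, vocabulary):
    """Normalized frequency of each visual word among one video's descriptors."""
    counts = [0] * len(vocabulary)
    for d in descriptors:
        counts[nearest(d, vocabulary)] += 1
    total = sum(counts) or 1
    return [c / total for c in counts]

def train_perceptron(histograms, labels, epochs=50):
    """Stand-in binary classifier (labels are +1 or -1), not the paper's."""
    w, b = [0.0] * len(histograms[0]), 0.0
    for _ in range(epochs):
        for h, y in zip(histograms, labels):
            if y * (sum(wi * hi for wi, hi in zip(w, h)) + b) <= 0:
                w = [wi + y * hi for wi, hi in zip(w, h)]
                b += y
    return w, b

def predict(histogram, model):
    """Return +1 or -1 for one video histogram."""
    w, b = model
    return 1 if sum(wi * hi for wi, hi in zip(w, histogram)) + b > 0 else -1

def toy_video(rng, n_a, n_b):
    """Hypothetical data: n_a points near (0, 0) and n_b points near (5, 5)."""
    def blob(c):
        return [c + rng.uniform(-0.5, 0.5), c + rng.uniform(-0.5, 0.5)]
    return [blob(0.0) for _ in range(n_a)] + [blob(5.0) for _ in range(n_b)]

rng = random.Random(1)
train = [toy_video(rng, 8, 2), toy_video(rng, 2, 8),
         toy_video(rng, 7, 3), toy_video(rng, 3, 7)]
labels = [1, -1, 1, -1]

# Vocabulary from all training descriptors, then one histogram per video.
vocab = build_vocabulary([d for video in train for d in video], k=2)
model = train_perceptron([word_histogram(v, vocab) for v in train], labels)

# A new video is described against the already built vocabulary.
print(predict(word_histogram(toy_video(rng, 9, 1), vocab), model))
```

In this sketch the vocabulary is computed once from the training descriptors, and new videos are only mapped onto it, which is one plausible reading of how the detection stage reuses the training procedure.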

The proposed method is also compared with a state-of-the-art method and obtains better results in both accuracy and the rejection of false objects.

Keywords: object detection, video segmentation, vocabulary, binary classifier.

SUMMARY

This article presents a novel method for detecting objects in security and television broadcast videos. Our method assumes that the object classes are not known in advance and exploits the temporal and spatial properties of the videos to create a vocabulary that describes these classes. Local space-time features have recently become a popular video representation for action recognition and object detection. Recent studies have proposed several methods for localizing and describing video features and have shown promising recognition results for action classes of people and objects.

In this work we propose using different types of descriptors to create vocabularies for different object detection tasks. To describe the videos better, we generate a background model in order to clean up and track the regions where the objects are. The interest points that characterize the objects are computed with a temporal variant of the well-known Harris corner detector. From the descriptors obtained at the interest points, a vocabulary is built with the classes of videos to be trained. Frequency histograms between the training videos and the vocabulary are then obtained so that a binary classifier can learn the trained classes; following the same procedure without the vocabulary, the objects are detected and tracked.

The proposed method is also compared with current proposals for similar situations and obtains better results in accuracy and in the rejection of false objects.
