SciELO - Scientific Electronic Library Online

 


Computación y Sistemas

On-line version ISSN 2007-9737; print version ISSN 1405-5546

Abstract

SILVA, David et al. Analysis of CNN Architectures for Human Action Recognition in Video. Comp. y Sist. [online]. 2022, vol.26, n.2, pp.623-641.  Epub 10-Mar-2023. ISSN 2007-9737.  https://doi.org/10.13053/cys-26-2-4245.

Every year, new Convolutional Neural Network (CNN) architectures appear to address different problems in image and video recognition. These architectures are usually tuned for the best performance on the ImageNet dataset, without taking into account the video task in which they are later used. This can be a problem when the task is Human Action Recognition (HAR) in video, since the CNN architectures are pre-trained on an image dataset that can contain practically any object, while the HAR problem requires consecutive frames of people performing actions. To test the idea that CNNs pre-trained on an image dataset do not always achieve the best performance on a video dataset, and that it is therefore worth comparing different CNNs under similar conditions for the HAR problem, this work presents an analysis of eight different CNN architectures. Each CNN was trained exclusively on RGB images extracted from the video frames of the different classes of the HMDB51 dataset. To classify an activity in a video, we average the predictions, taking into account the successes. We also built ensembles of the best-performing CNNs to measure the improvement in accuracy. Our results suggest that Xception is a strong baseline model that the community could use to make comparisons of their proposals more robust.

Keywords: Human action recognition; convolutional neural network; HMDB51.
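
The abstract mentions averaging frame predictions to classify a video and combining the best CNNs into ensembles, but it does not spell out the exact scheme. The following is a minimal sketch under stated assumptions: each CNN is assumed to output one class-probability vector per frame, the video-level score is assumed to be the plain mean of those vectors, and the ensemble is assumed to be the plain mean of the per-model video scores. The function names and the toy three-class data are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Sequence


def average_probabilities(rows: Sequence[Sequence[float]]) -> List[float]:
    """Element-wise mean of equally sized probability vectors."""
    if not rows:
        raise ValueError("need at least one probability vector")
    n_classes = len(rows[0])
    if any(len(row) != n_classes for row in rows):
        raise ValueError("all probability vectors must have the same length")
    return [sum(row[c] for row in rows) / len(rows) for c in range(n_classes)]


def argmax(values: Sequence[float]) -> int:
    """Index of the largest value (the first one wins on ties)."""
    return max(range(len(values)), key=values.__getitem__)


def classify_video(frame_probs: Sequence[Sequence[float]]) -> int:
    """Assumed scheme: average per-frame probabilities, then take the top class."""
    return argmax(average_probabilities(frame_probs))


def classify_video_ensemble(
    per_model_frame_probs: Dict[str, Sequence[Sequence[float]]]
) -> int:
    """Assumed ensemble: average each model's video-level scores, then take the top class."""
    video_scores = [
        average_probabilities(frames) for frames in per_model_frame_probs.values()
    ]
    return argmax(average_probabilities(video_scores))


# Toy data: three frames, three classes (hypothetical values).
frames_model_a = [[0.6, 0.3, 0.1], [0.2, 0.7, 0.1], [0.1, 0.8, 0.1]]
frames_model_b = [[0.9, 0.05, 0.05], [0.8, 0.1, 0.1]]

single_prediction = classify_video(frames_model_a)
ensemble_prediction = classify_video_ensemble(
    {"model_a": frames_model_a, "model_b": frames_model_b}
)
```

Averaging probabilities rather than taking a majority vote over frame labels keeps each frame's confidence in the final score; a weighted variant would be a natural alternative if some frames or models are considered more reliable.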
