Higher-order pooling of cnn features via kernel linearization for action recognition

Date

Authors

Cherian, Anoop
Koniusz, Piotr
Gould, Stephen

Journal Title

Journal ISSN

Volume Title

Publisher

Institute of Electrical and Electronics Engineers (IEEE Inc)

Abstract

ost successful deep learning algorithms for action recognition extend models designed for image-based tasks such as object recognition to video. Such extensions are typically trained for actions on single video frames or very short clips, and then their predictions from sliding-windows over the video sequence are pooled for recognizing the action at the sequence level. Usually this pooling step uses the first-order statistics of frame-level action predictions. In this paper, we explore the advantages of using higherorder correlations, specifically, we introduce Higher-order Kernel (HOK) descriptors generated from the late fusion of CNN classifier scores from all the frames in a sequence. To generate these descriptors, we use the idea of kernel linearization. Specifically, a similarity kernel matrix, which captures the temporal evolution of deep classifier scores, is first linearized into kernel feature maps. The HOK descriptors are then generated from the higher-order cooccurrences of these feature maps, and are then used as input to a video-level classifier. We provide experiments on two fine-grained action recognition datasets, and show that our scheme leads to state-of-The-Art results

Description

Keywords

Citation

Source

Proceedings - 2017 IEEE Winter Conference on Applications of Computer Vision, WACV 2017

Book Title

Entity type

Access Statement

License Rights

Restricted until