Higher-order pooling of CNN features via kernel linearization for action recognition
Authors
Cherian, Anoop
Koniusz, Piotr
Gould, Stephen
Publisher
Institute of Electrical and Electronics Engineers (IEEE Inc)
Abstract
Most successful deep learning algorithms for action recognition extend models designed for image-based tasks, such as object recognition, to video. Such extensions are typically trained for actions on single video frames or very short clips; their predictions from sliding windows over the video sequence are then pooled to recognize the action at the sequence level. Usually this pooling step uses only first-order statistics of the frame-level action predictions. In this paper, we explore the advantages of using higher-order correlations; specifically, we introduce Higher-order Kernel (HOK) descriptors generated from the late fusion of CNN classifier scores from all the frames in a sequence. To generate these descriptors, we use the idea of kernel linearization: a similarity kernel matrix, which captures the temporal evolution of deep classifier scores, is first linearized into kernel feature maps. The HOK descriptors are then generated from the higher-order co-occurrences of these feature maps and are used as input to a video-level classifier. We provide experiments on two fine-grained action recognition datasets and show that our scheme leads to state-of-the-art results.
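The pipeline summarized in the abstract (frame scores, then linearized kernel feature maps, then higher-order co-occurrences) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It assumes a Gaussian RBF kernel over (classifier score, normalized time) pairs and linearizes it with random Fourier features, a swapped-in approximation technique; the function names `rff_map` and `hok_descriptor` and the parameters `D`, `sigma`, `order`, and `seed` are hypothetical choices for illustration.

```python
import numpy as np

def rff_map(x, W, b):
    """Random Fourier features approximating a Gaussian RBF kernel:
    k(x, y) ~= z(x) . z(y), with z(x) = sqrt(2 / D) * cos(W x + b)."""
    D = W.shape[0]
    return np.sqrt(2.0 / D) * np.cos(x @ W.T + b)

def hok_descriptor(scores, D=16, sigma=1.0, order=3, seed=0):
    """Hypothetical HOK-style descriptor.

    scores: (T, C) array of frame-level classifier scores for a T-frame video.
    Returns the flattened order-`order` tensor obtained by averaging the
    outer products z_t (x) z_t (x) ... over all frames t.
    """
    T, C = scores.shape
    # Append normalized frame time stamps so the feature map also encodes
    # where in the sequence each score vector occurs (assumption).
    t = np.linspace(0.0, 1.0, T)[:, None]
    x = np.hstack([scores, t])                       # (T, C + 1)

    # Fixed seed: every video must share the same random projection,
    # otherwise descriptors from different videos are not comparable.
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=1.0 / sigma, size=(D, C + 1))
    b = rng.uniform(0.0, 2.0 * np.pi, size=D)
    Z = rff_map(x, W, b)                             # (T, D) linearized feature maps

    # Higher-order co-occurrence: average of per-frame rank-one tensors.
    tensor = np.zeros((D,) * order)
    for z in Z:
        outer = z
        for _ in range(order - 1):
            outer = np.multiply.outer(outer, z)
        tensor += outer
    tensor /= T
    return tensor.ravel()

# Usage: a toy video of 30 frames scored by a 5-class frame classifier.
scores = np.random.default_rng(1).random((30, 5))
desc = hok_descriptor(scores)
print(desc.shape)  # (4096,) for D=16 and order=3
```

Averaging the rank-one tensors makes the descriptor length independent of the number of frames, so videos of different durations map to vectors of the same size that a standard video-level classifier can consume. The descriptor grows as `D**order`, which is why a small `D` is used here.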
Source
Proceedings - 2017 IEEE Winter Conference on Applications of Computer Vision, WACV 2017