
Discriminatively Learned Hierarchical Rank Pooling Networks

Fernando, Basura; Gould, Stephen

Description

Rank pooling is a temporal encoding method that summarizes the dynamics of a video sequence in a single vector and has shown good results in human action recognition in prior work. In this work, we present novel temporal encoding methods for action and activity classification by extending the unsupervised rank pooling temporal encoding method in two ways. First, we present discriminative rank pooling, in which the shared weights of our video representation and the parameters of the action...
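
To make the idea in the description concrete, the following is a minimal sketch of unsupervised rank pooling and a two-layer hierarchy built from it. It is not the authors' implementation: it assumes ridge regression onto the frame index as a stand-in for a ranking machine, and the function names (`rank_pool`, `hierarchical_rank_pool`), the `ridge`, `window` and `stride` parameters, and the signed square-root non-linearity are all illustrative choices.

```python
import numpy as np


def rank_pool(frames: np.ndarray, ridge: float = 1e-3) -> np.ndarray:
    """Encode a (T, D) feature sequence as one D-dimensional vector.

    Assumption: ridge regression onto the time index replaces the
    ranking machine; the goal is a vector u whose projections of the
    smoothed frames increase with time.
    """
    T, D = frames.shape
    # Time-varying mean: average of all frames seen up to each step.
    means = np.cumsum(frames, axis=0) / np.arange(1, T + 1)[:, None]
    t = np.arange(1, T + 1, dtype=float)
    # Solve (M^T M + ridge * I) u = M^T t for u.
    gram = means.T @ means + ridge * np.eye(D)
    return np.linalg.solve(gram, means.T @ t)


def hierarchical_rank_pool(frames: np.ndarray, window: int = 4, stride: int = 2) -> np.ndarray:
    """Two-layer sketch: pool overlapping windows, then pool the results."""
    starts = range(0, len(frames) - window + 1, stride)
    # Layer 1: one rank-pooled vector per temporal sub-sequence.
    layer1 = np.stack([rank_pool(frames[s:s + window]) for s in starts])
    # Assumed non-linear feature function: signed square root.
    layer1 = np.sign(layer1) * np.sqrt(np.abs(layer1))
    # Layer 2: rank pool the sequence of sub-sequence encodings.
    return rank_pool(layer1)
```

Both functions return a fixed-length vector whatever the number of frames, so the output can be passed to a standard classifier. In the discriminative variant described here, the representation and the classifier would be learned jointly rather than in two separate steps as in this sketch.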

dc.contributor.author: Fernando, Basura
dc.contributor.author: Gould, Stephen
dc.date.accessioned: 2020-12-20T20:56:32Z
dc.date.available: 2020-12-20T20:56:32Z
dc.identifier.issn: 0920-5691
dc.identifier.uri: http://hdl.handle.net/1885/217981
dc.description.abstract: Rank pooling is a temporal encoding method that summarizes the dynamics of a video sequence in a single vector and has shown good results in human action recognition in prior work. In this work, we present novel temporal encoding methods for action and activity classification by extending the unsupervised rank pooling temporal encoding method in two ways. First, we present discriminative rank pooling, in which the shared weights of our video representation and the parameters of the action classifiers are estimated jointly for a given training dataset of labelled vector sequences using a bilevel optimization formulation of the learning problem. When the frame-level feature vectors are obtained from a convolutional neural network (CNN), we rank pool the network activations and jointly estimate all parameters of the model, including CNN filters and fully-connected weights, in an end-to-end manner; we call this model the end-to-end trainable rank-pooled CNN. Importantly, this model can make use of any existing convolutional neural network architecture (e.g., AlexNet or VGG) without modification or introduction of additional parameters. Second, we extend rank pooling to a high-capacity video representation called hierarchical rank pooling. Hierarchical rank pooling consists of a network of rank pooling functions, which encode temporal semantics over arbitrarily long video clips based on rich frame-level features. By stacking non-linear feature functions and temporal sub-sequence encoders one on top of the other, we build a high-capacity encoding network of the dynamic behaviour of the video. The resulting video representation is a fixed-length feature vector describing the entire video clip that can be used as input to standard machine learning classifiers. We demonstrate our approach on the task of action and activity recognition. We present a detailed analysis of our approach against competing methods and explore variants such as hierarchy depth and choice of non-linear feature function. The results are comparable to state-of-the-art methods on three important activity recognition benchmarks, with classification performance of 76.7% mAP on Hollywood2, 69.4% on HMDB51, and 93.6% on UCF101.
dc.description.sponsorship: This research was supported by the Australian Research Council Centre of Excellence for Robotic Vision (project number CE140100016).
dc.format.mimetype: application/pdf
dc.language.iso: en_AU
dc.publisher: Springer
dc.rights: © 2017 Springer Science+Business Media, LLC
dc.source: International Journal of Computer Vision
dc.subject: Rank pooling
dc.subject: Action recognition
dc.subject: Activity recognition
dc.subject: Convolutional neural networks
dc.title: Discriminatively Learned Hierarchical Rank Pooling Networks
dc.type: Journal article
local.description.notes: Imported from ARIES
local.identifier.citationvolume: 124
dc.date.issued: 2017
local.identifier.absfor: 091599 - Interdisciplinary Engineering not elsewhere classified
local.identifier.ariespublication: a383154xPUB7347
local.type.status: Accepted Version
local.contributor.affiliation: Fernando, Basura, College of Engineering and Computer Science, ANU
local.contributor.affiliation: Gould, Stephen, College of Engineering and Computer Science, ANU
local.bibliographicCitation.issue: 3
local.bibliographicCitation.startpage: 335
local.bibliographicCitation.lastpage: 355
local.identifier.doi: 10.1007/s11263-017-1030-x
dc.date.updated: 2020-11-23T10:26:03Z
local.identifier.scopusID: 2-s2.0-85021248197
local.identifier.thomsonID: 000407961700005
dcterms.accessRights: Open Access
dc.provenance: https://v2.sherpa.ac.uk/id/publication/13404..."Author Accepted Manuscript can be made open access on institutional repository after 12 month embargo" from SHERPA/RoMEO site (as at 4.11.2021).
Collections: ANU Research Publications

Download

File            Description                  Size     Format
1705.10420.pdf  Author Accepted Manuscript   5.09 MB  Adobe PDF


Items in Open Research are protected by copyright, with all rights reserved, unless otherwise indicated.
