DORi: Discovering object relationships for moment localization of a natural language query in a video

dc.contributor.author: Rodriguez Opazo, Cristian
dc.contributor.author: Marrese-Taylor, Edison
dc.contributor.author: Fernando, Basura
dc.contributor.author: Li, Hongdong
dc.contributor.author: Gould, Stephen
dc.coverage.spatial: Virtual, Waikoloa, HI, USA
dc.date.accessioned: 2023-07-20T22:56:46Z
dc.date.created: January 5-9, 2021
dc.date.issued: 2021
dc.date.updated: 2022-05-22T08:15:59Z
dc.description.abstract: This paper studies the task of temporal moment localization in long untrimmed videos using natural language queries. Given a query sentence, the goal is to determine the start and end of the relevant segment within the video. Our key innovation is to learn a video feature embedding through a language-conditioned message-passing algorithm suitable for temporal moment localization, which captures the relationships between humans, objects and activities in the video. These relationships are obtained by a spatial sub-graph that contextualizes the scene representation using detected objects and human features conditioned on the language query. Moreover, a temporal sub-graph captures the activities within the video through time. Our method is evaluated on three standard benchmark datasets, and we also introduce YouCookII as a new benchmark for this task. Experiments show our method outperforms state-of-the-art methods on these datasets, confirming the effectiveness of our approach. [en_AU]
dc.description.sponsorship: This research is supported in part by the Australian Research Council Centre of Excellence for Robotic Vision (CE140100016). [en_AU]
dc.format.mimetype: application/pdf [en_AU]
dc.identifier.isbn: 978-1-6654-0477-8 [en_AU]
dc.identifier.uri: http://hdl.handle.net/1885/294463
dc.language.iso: en_AU [en_AU]
dc.publisher: IEEE [en_AU]
dc.relation: http://purl.org/au-research/grants/arc/CE140100016 [en_AU]
dc.relation.ispartof: Proceedings of the 2021 IEEE Winter Conference on Applications of Computer Vision, WACV 2021 [en_AU]
dc.relation.ispartofseries: 2021 IEEE Winter Conference on Applications of Computer Vision, WACV 2021 [en_AU]
dc.rights: © 2021 IEEE [en_AU]
dc.title: DORi: Discovering object relationships for moment localization of a natural language query in a video [en_AU]
dc.type: Conference paper [en_AU]
local.bibliographicCitation.lastpage: 1087 [en_AU]
local.bibliographicCitation.startpage: 1078 [en_AU]
local.contributor.affiliation: Rodriguez Opazo, Cristian, College of Engineering and Computer Science, ANU [en_AU]
local.contributor.affiliation: Marrese-Taylor, Edison, University of Tokyo [en_AU]
local.contributor.affiliation: Fernando, Basura, A*STAR Artificial Intelligence Initiative (A*AI) [en_AU]
local.contributor.affiliation: Li, Hongdong, College of Engineering and Computer Science, ANU [en_AU]
local.contributor.affiliation: Gould, Stephen, College of Engineering and Computer Science, ANU [en_AU]
local.contributor.authoruid: Rodriguez Opazo, Cristian, u5419700 [en_AU]
local.contributor.authoruid: Li, Hongdong, u4056952 [en_AU]
local.contributor.authoruid: Gould, Stephen, u4971180 [en_AU]
local.description.embargo: 2099-12-31
local.description.notes: Imported from ARIES [en_AU]
local.description.refereed: Yes
local.identifier.absfor: 460208 - Natural language processing [en_AU]
local.identifier.absfor: 460304 - Computer vision [en_AU]
local.identifier.absfor: 461103 - Deep learning [en_AU]
local.identifier.ariespublication: a383154xPUB24258 [en_AU]
local.identifier.doi: 10.1109/WACV48630.2021.00112 [en_AU]
local.identifier.scopusID: 2-s2.0-85106097711
local.publisher.url: https://www.ieee.org/ [en_AU]
local.type.status: Published Version [en_AU]

Downloads

Original bundle

Name: DORi.pdf
Size: 6.02 MB
Format: Adobe Portable Document Format
Description: