DORi: Discovering object relationships for moment localization of a natural language query in a video
dc.contributor.author | Rodriguez Opazo, Cristian | |
dc.contributor.author | Marrese-Taylor, Edison | |
dc.contributor.author | Fernando, Basura | |
dc.contributor.author | Li, Hongdong | |
dc.contributor.author | Gould, Stephen | |
dc.coverage.spatial | Virtual, Waikoloa, HI, USA | |
dc.date.accessioned | 2023-07-20T22:56:46Z | |
dc.date.created | January 5-9, 2021 | |
dc.date.issued | 2021 | |
dc.date.updated | 2022-05-22T08:15:59Z | |
dc.description.abstract | This paper studies the task of temporal moment localization in long untrimmed videos using natural language queries. Given a query sentence, the goal is to determine the start and end of the relevant segment within the video. Our key innovation is to learn a video feature embedding through a language-conditioned message-passing algorithm suitable for temporal moment localization, which captures the relationships between humans, objects and activities in the video. These relationships are obtained by a spatial sub-graph that contextualizes the scene representation using detected objects and human features conditioned on the language query. Moreover, a temporal sub-graph captures the activities within the video through time. Our method is evaluated on three standard benchmark datasets, and we also introduce YouCookII as a new benchmark for this task. Experiments show our method outperforms state-of-the-art methods on these datasets, confirming the effectiveness of our approach. | en_AU |
dc.description.sponsorship | This research is supported in part by the Australian Research Council Centre of Excellence for Robotic Vision (CE140100016). | en_AU |
dc.format.mimetype | application/pdf | en_AU |
dc.identifier.isbn | 978-1-6654-0477-8 | en_AU |
dc.identifier.uri | http://hdl.handle.net/1885/294463 | |
dc.language.iso | en_AU | en_AU |
dc.publisher | IEEE | en_AU |
dc.relation | http://purl.org/au-research/grants/arc/CE140100016 | en_AU |
dc.relation.ispartof | Proceedings of the 2021 IEEE Winter Conference on Applications of Computer Vision, WACV 2021 | en_AU |
dc.relation.ispartofseries | 2021 IEEE Winter Conference on Applications of Computer Vision, WACV 2021 | en_AU |
dc.rights | © 2021 IEEE | en_AU |
dc.title | DORi: Discovering object relationships for moment localization of a natural language query in a video | en_AU |
dc.type | Conference paper | en_AU |
local.bibliographicCitation.lastpage | 1087 | en_AU |
local.bibliographicCitation.startpage | 1078 | en_AU |
local.contributor.affiliation | Rodriguez Opazo, Cristian, College of Engineering and Computer Science, ANU | en_AU |
local.contributor.affiliation | Marrese-Taylor, Edison, University of Tokyo | en_AU |
local.contributor.affiliation | Fernando, Basura, A*STAR Artificial Intelligence Initiative (A*AI) | en_AU |
local.contributor.affiliation | Li, Hongdong, College of Engineering and Computer Science, ANU | en_AU |
local.contributor.affiliation | Gould, Stephen, College of Engineering and Computer Science, ANU | en_AU |
local.contributor.authoruid | Rodriguez Opazo, Cristian, u5419700 | en_AU |
local.contributor.authoruid | Li, Hongdong, u4056952 | en_AU |
local.contributor.authoruid | Gould, Stephen, u4971180 | en_AU |
local.description.embargo | 2099-12-31 | |
local.description.notes | Imported from ARIES | en_AU |
local.description.refereed | Yes | |
local.identifier.absfor | 460208 - Natural language processing | en_AU |
local.identifier.absfor | 460304 - Computer vision | en_AU |
local.identifier.absfor | 461103 - Deep learning | en_AU |
local.identifier.ariespublication | a383154xPUB24258 | en_AU |
local.identifier.doi | 10.1109/WACV48630.2021.00112 | en_AU |
local.identifier.scopusID | 2-s2.0-85106097711 | |
local.publisher.url | https://www.ieee.org/ | en_AU |
local.type.status | Published Version | en_AU |