DORi: Discovering object relationships for moment localization of a natural language query in a video
Authors
Rodriguez Opazo, Cristian
Marrese-Taylor, Edison
Fernando, Basura
Li, Hongdong
Gould, Stephen
Publisher
IEEE
Abstract
This paper studies the task of temporal moment localization in long untrimmed videos using natural language queries. Given a query sentence, the goal is to determine the start and end of the relevant segment within the video. Our key innovation is to learn a video feature embedding through a language-conditioned message-passing algorithm, suitable for temporal moment localization, which captures the relationships between humans, objects and activities in the video. These relationships are obtained by a spatial sub-graph that contextualizes the scene representation using detected object and human features conditioned on the language query. Moreover, a temporal sub-graph captures the activities within the video through time. Our method is evaluated on three standard benchmark datasets, and we also introduce YouCookII as a new benchmark for this task. Experiments show that our method outperforms state-of-the-art methods on these datasets, confirming the effectiveness of our approach.
Book Title
Proceedings of the 2021 IEEE Winter Conference on Applications of Computer Vision, WACV 2021
Restricted until
2099-12-31