DORi: Discovering object relationships for moment localization of a natural language query in a video

Authors

Rodriguez Opazo, Cristian
Marrese-Taylor, Edison
Fernando, Basura
Li, Hongdong
Gould, Stephen

Publisher

IEEE

Abstract

This paper studies the task of temporal moment localization in long untrimmed videos using natural language queries. Given a query sentence, the goal is to determine the start and end of the relevant segment within the video. Our key innovation is to learn a video feature embedding through a language-conditioned message-passing algorithm suitable for temporal moment localization, which captures the relationships between humans, objects and activities in the video. These relationships are obtained by a spatial sub-graph that contextualizes the scene representation using detected objects and human features conditioned on the language query. Moreover, a temporal sub-graph captures the activities within the video through time. Our method is evaluated on three standard benchmark datasets, and we also introduce YouCookII as a new benchmark for this task. Experiments show our method outperforms state-of-the-art methods on these datasets, confirming the effectiveness of our approach.

Book Title

Proceedings of the 2021 IEEE Winter Conference on Applications of Computer Vision, WACV 2021

Restricted until

2099-12-31