Exploring Predicate Visual Context in Detecting of Human-Object Interactions

dc.contributor.author: Zhang, Frederic Z.
dc.contributor.author: Yuan, Yuhui
dc.contributor.author: Campbell, Dylan
dc.contributor.author: Zhong, Zhuoyao
dc.contributor.author: Gould, Stephen
dc.date.accessioned: 2025-05-23T17:30:22Z
dc.date.available: 2025-05-23T17:30:22Z
dc.date.issued: 2023
dc.description.abstract: Recently, the DETR framework has emerged as the dominant approach for human-object interaction (HOI) research. In particular, two-stage transformer-based HOI detectors are amongst the most performant and training-efficient approaches. However, these often condition HOI classification on object features that lack fine-grained contextual information, eschewing pose and orientation information in favour of visual cues about object identity and box extremities. This naturally hinders the recognition of complex or ambiguous interactions. In this work, we study these issues through visualisations and carefully designed experiments. Accordingly, we investigate how best to re-introduce image features via cross-attention. With an improved query design, extensive exploration of keys and values, and box pair positional embeddings as spatial guidance, our model with enhanced predicate visual context (PViC) outperforms state-of-the-art methods on the HICO-DET and V-COCO benchmarks, while maintaining low training cost.
dc.description.status: Peer-reviewed
dc.format.extent: 11
dc.identifier.isbn: 9798350307184
dc.identifier.issn: 1550-5499
dc.identifier.other: ORCID:/0000-0002-4717-6850/work/162053200
dc.identifier.scopus: 85179112613
dc.identifier.uri: http://www.scopus.com/inward/record.url?scp=85179112613&partnerID=8YFLogxK
dc.identifier.uri: https://hdl.handle.net/1885/733752826
dc.language.iso: en
dc.publisher: Institute of Electrical and Electronics Engineers Inc.
dc.relation.ispartof: Proceedings - 2023 IEEE/CVF International Conference on Computer Vision, ICCV 2023
dc.relation.ispartofseries: 2023 IEEE/CVF International Conference on Computer Vision, ICCV 2023
dc.relation.ispartofseries: Proceedings of the IEEE International Conference on Computer Vision
dc.rights: Publisher Copyright: © 2023 IEEE.
dc.title: Exploring Predicate Visual Context in Detecting of Human-Object Interactions
dc.type: Conference paper
dspace.entity.type: Publication
local.bibliographicCitation.lastpage: 10387
local.bibliographicCitation.startpage: 10377
local.contributor.affiliation: Zhang, Frederic Z.; School of Computing, ANU College of Systems and Society, The Australian National University
local.contributor.affiliation: Yuan, Yuhui; Microsoft USA
local.contributor.affiliation: Campbell, Dylan; School of Computing, ANU College of Systems and Society, The Australian National University
local.contributor.affiliation: Zhong, Zhuoyao; Microsoft USA
local.contributor.affiliation: Gould, Stephen; School of Computing, ANU College of Systems and Society, The Australian National University
local.identifier.ariespublication: a383154xPUB47170
local.identifier.doi: 10.1109/ICCV51070.2023.00955
local.identifier.pure: 1067dbcb-8175-4c5f-a136-381b7532092b
local.identifier.url: https://www.scopus.com/pages/publications/85179112613
local.type.status: Published