Partially-supervised image captioning

dc.contributor.author: Anderson, Peter
dc.contributor.author: Gould, Stephen
dc.contributor.author: Johnson, Mark
dc.date.accessioned: 2025-12-29T07:40:35Z
dc.date.available: 2025-12-29T07:40:35Z
dc.date.issued: 2018
dc.description.abstract: Image captioning models are becoming increasingly successful at describing the content of images in restricted domains. However, if these models are to function in the wild - for example, as assistants for people with impaired vision - a much larger number and variety of visual concepts must be understood. To address this problem, we teach image captioning models new visual concepts from labeled images and object detection datasets. Since image labels and object classes can be interpreted as partial captions, we formulate this problem as learning from partially-specified sequence data. We then propose a novel algorithm for training sequence models, such as recurrent neural networks, on partially-specified sequences which we represent using finite state automata. In the context of image captioning, our method lifts the restriction that previously required image captioning models to be trained on paired image-sentence corpora only, or otherwise required specialized model architectures to take advantage of alternative data modalities. Applying our approach to an existing neural captioning model, we achieve state-of-the-art results on the novel object captioning task using the COCO dataset. We further show that we can train a captioning model to describe new visual concepts from the Open Images dataset while maintaining competitive COCO evaluation scores.
dc.description.sponsorship: This research was supported by a Google award through the Natural Language Understanding Focused Program, CRP 8201800363 from Data61/CSIRO, and under the Australian Research Council's Discovery Projects funding scheme (project number DP160102156). We also thank the anonymous reviewers for their valuable comments that helped to improve the paper.
dc.description.status: Peer-reviewed
dc.format.extent: 12
dc.identifier.issn: 1049-5258
dc.identifier.scopus: 85064841019
dc.identifier.uri: https://hdl.handle.net/1885/733797270
dc.language.iso: en
dc.relation.ispartofseries: 32nd Conference on Neural Information Processing Systems, NeurIPS 2018
dc.rights: Publisher Copyright: © 2018 Curran Associates Inc. All rights reserved.
dc.source: Advances in Neural Information Processing Systems
dc.title: Partially-supervised image captioning
dc.type: Conference paper
dspace.entity.type: Publication
local.bibliographicCitation.lastpage: 1886
local.bibliographicCitation.startpage: 1875
local.contributor.affiliation: Anderson, Peter; Macquarie University
local.contributor.affiliation: Gould, Stephen; School of Computing, ANU College of Systems and Society, The Australian National University
local.contributor.affiliation: Johnson, Mark; Macquarie University
local.identifier.ariespublication: u3102795xPUB1743
local.identifier.citationvolume: 2018-December
local.identifier.pure: 324bc4b5-fe82-47a9-b6eb-62ea4a354048
local.identifier.url: https://www.scopus.com/pages/publications/85064841019
local.type.status: Published