Partially-supervised image captioning
| dc.contributor.author | Anderson, Peter | en |
| dc.contributor.author | Gould, Stephen | en |
| dc.contributor.author | Johnson, Mark | en |
| dc.date.accessioned | 2025-12-29T07:40:35Z | |
| dc.date.available | 2025-12-29T07:40:35Z | |
| dc.date.issued | 2018 | en |
| dc.description.abstract | Image captioning models are becoming increasingly successful at describing the content of images in restricted domains. However, if these models are to function in the wild - for example, as assistants for people with impaired vision - a much larger number and variety of visual concepts must be understood. To address this problem, we teach image captioning models new visual concepts from labeled images and object detection datasets. Since image labels and object classes can be interpreted as partial captions, we formulate this problem as learning from partially-specified sequence data. We then propose a novel algorithm for training sequence models, such as recurrent neural networks, on partially-specified sequences which we represent using finite state automata. In the context of image captioning, our method lifts the restriction that previously required image captioning models to be trained on paired image-sentence corpora only, or otherwise required specialized model architectures to take advantage of alternative data modalities. Applying our approach to an existing neural captioning model, we achieve state-of-the-art results on the novel object captioning task using the COCO dataset. We further show that we can train a captioning model to describe new visual concepts from the Open Images dataset while maintaining competitive COCO evaluation scores. | en |
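The abstract's central idea is that an image label can be read as a partially-specified caption, represented by a finite state automaton that accepts any full caption containing the labeled concept. As a minimal sketch of that representation (illustrative only, not the authors' implementation; class and method names are assumed), an FSA whose states track which required labels have been seen accepts exactly the captions consistent with the partial specification:

```python
class PartialCaptionFSA:
    """Accepts any token sequence containing all required labels.

    States are frozensets of required labels seen so far; the accepting
    state is the full required set. This mirrors reading image labels
    as partially-specified captions.
    """

    def __init__(self, required):
        self.required = frozenset(required)

    def step(self, state, token):
        # Transition: absorb the token if it is a required label.
        return state | ({token} & self.required)

    def accepts(self, tokens):
        state = frozenset()  # start state: no labels seen yet
        for t in tokens:
            state = self.step(state, t)
        return state == self.required
```

For example, with the required label `zebra`, the caption "a zebra standing in grass" is accepted while "a horse standing in grass" is not; during training, such an automaton constrains which complete captions count as consistent with the partial supervision.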
| dc.description.sponsorship | This research was supported by a Google award through the Natural Language Understanding Focused Program, CRP 8201800363 from Data61/CSIRO, and under the Australian Research Council's Discovery Projects funding scheme (project number DP160102156). We also thank the anonymous reviewers for their valuable comments that helped to improve the paper. | en |
| dc.description.status | Peer-reviewed | en |
| dc.format.extent | 12 | en |
| dc.identifier.issn | 1049-5258 | en |
| dc.identifier.scopus | 85064841019 | en |
| dc.identifier.uri | https://hdl.handle.net/1885/733797270 | |
| dc.language.iso | en | en |
| dc.relation.ispartofseries | 32nd Conference on Neural Information Processing Systems, NeurIPS 2018 | en |
| dc.rights | Publisher Copyright: © 2018 Curran Associates Inc. All rights reserved. | en |
| dc.source | Advances in Neural Information Processing Systems | en |
| dc.title | Partially-supervised image captioning | en |
| dc.type | Conference paper | en |
| dspace.entity.type | Publication | en |
| local.bibliographicCitation.lastpage | 1886 | en |
| local.bibliographicCitation.startpage | 1875 | en |
| local.contributor.affiliation | Anderson, Peter; Macquarie University | en |
| local.contributor.affiliation | Gould, Stephen; School of Computing, ANU College of Systems and Society, The Australian National University | en |
| local.contributor.affiliation | Johnson, Mark; Macquarie University | en |
| local.identifier.ariespublication | u3102795xPUB1743 | en |
| local.identifier.citationvolume | 2018-December | en |
| local.identifier.pure | 324bc4b5-fe82-47a9-b6eb-62ea4a354048 | en |
| local.identifier.url | https://www.scopus.com/pages/publications/85064841019 | en |
| local.type.status | Published | en |