Spatial-aware feature aggregation for cross-view image based geo-localization

Shi, Yujiao; Liu, Liu; Yu, Xin; Li, Hongdong

dc.contributor.author	Shi, Yujiao
dc.contributor.author	Liu, Liu
dc.contributor.author	Yu, Xin
dc.contributor.author	Li, Hongdong
dc.coverage.spatial	Vancouver, Canada
dc.date.accessioned	2024-01-21T23:38:47Z
dc.date.created	December 8-14 2019
dc.date.issued	2019
dc.date.updated	2022-10-02T07:17:08Z
dc.description.abstract	Recent works show that it is possible to train a deep network to determine the geographic location of a ground-level image (e.g., a Google street-view panorama) by matching it against a satellite map covering the wide geographic area of interest. However, conventional deep networks, which often cast the problem as a metric embedding task, suffer from low recall rates. One of the key reasons is the vast differences between the two view modalities, i.e., ground view versus aerial/satellite view. They not only exhibit very different visual appearances, but also have distinctive geometric configurations. Existing deep methods overlook those appearance and geometric differences, and instead use a brute-force training procedure, leading to inferior performance. In this paper, we develop a new deep network to explicitly address these inherent differences between ground and aerial views. We observe that pixels lying along the same azimuth direction in an aerial image approximately correspond to a vertical image column in the ground-view image. Thus, we propose a two-step approach to exploit this prior. The first step is to apply a regular polar transform to warp an aerial image such that its domain is closer to that of a ground-view panorama. Note that the polar transform, as a purely geometric transformation, is agnostic to scene content and hence cannot bring the two domains into full alignment. Then, we add a subsequent spatial-attention mechanism which brings corresponding deep features closer in the embedding space. To improve the robustness of the feature representation, we introduce a feature aggregation strategy via learning multiple spatial embeddings. With the above two-step approach, we achieve more discriminative deep representations, making cross-view geo-localization more accurate.
Our experiments on standard benchmark datasets show a significant performance boost, more than doubling the recall rate of the previous state of the art. Remarkably, the recall rate@top-1 improves from 22.5% in [5] (or 40.7% in [11]) to 89.8% on the CVUSA benchmark, and from 20.1% in [5] to 81.0% on the new CVACT dataset.	en_AU
dc.description.sponsorship	This research is supported in part by the China Scholarship Council (201708320417), the Australian Research Council (ARC) Centre of Excellence for Robotic Vision (CE140100016), ARC-Discovery (DP190102261) and ARC-LIEF (LE190100080), and in part by a research gift from Baidu RAL (ApolloScapes-Robotics and Autonomous Driving Lab). The authors gratefully acknowledge the GPU gift donated by NVIDIA Corporation. We thank all anonymous reviewers for their constructive comments.	en_AU
dc.format.mimetype	application/pdf	en_AU
dc.identifier.uri	http://hdl.handle.net/1885/311666
dc.language.iso	en_AU	en_AU
dc.publisher	Neural Information Processing Systems Foundation	en_AU
dc.relation	http://purl.org/au-research/grants/arc/CE140100016	en_AU
dc.relation	http://purl.org/au-research/grants/arc/DP190102261	en_AU
dc.relation	http://purl.org/au-research/grants/arc/LE190100080	en_AU
dc.relation.ispartofseries	33rd Annual Conference on Neural Information Processing Systems, NeurIPS 2019	en_AU
dc.rights	© 2019 Neural Information Processing Systems Foundation	en_AU
dc.source	Advances in Neural Information Processing Systems	en_AU
dc.title	Spatial-aware feature aggregation for cross-view image based geo-localization	en_AU
dc.type	Conference paper	en_AU
local.bibliographicCitation.lastpage	11	en_AU
local.bibliographicCitation.startpage	1	en_AU
local.contributor.affiliation	Shi, Yujiao, College of Engineering and Computer Science, ANU	en_AU
local.contributor.affiliation	Liu, Liu, College of Engineering and Computer Science, ANU	en_AU
local.contributor.affiliation	Yu, Xin, College of Engineering and Computer Science, ANU	en_AU
local.contributor.affiliation	Li, Hongdong, College of Engineering and Computer Science, ANU	en_AU
local.contributor.authoruid	Shi, Yujiao, u6293587	en_AU
local.contributor.authoruid	Liu, Liu, u1013337	en_AU
local.contributor.authoruid	Yu, Xin, u5819038	en_AU
local.contributor.authoruid	Li, Hongdong, u4056952	en_AU
local.description.embargo	2099-12-31
local.description.notes	Imported from ARIES	en_AU
local.description.refereed	Yes
local.identifier.absfor	460306 - Image processing	en_AU
local.identifier.absfor	461103 - Deep learning	en_AU
local.identifier.ariespublication	a383154xPUB14042	en_AU
local.identifier.doi	10.5555/3454287.3455192	en_AU
local.identifier.scopusID	2-s2.0-85090169788
local.publisher.url	https://dl.acm.org/	en_AU
local.type.status	Published Version	en_AU
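
The abstract's first step, a polar transform that maps each azimuth direction in the aerial image to one column of a panorama-like image, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The function name `polar_transform`, the choice of the image centre as the camera position, the convention that the top output row samples the largest radius, the clockwise-from-north column order, and the nearest-neighbour sampling are all assumptions made for illustration.

```python
import numpy as np


def polar_transform(aerial, out_h, out_w):
    """Warp a square aerial image into an (out_h, out_w) polar image.

    Hypothetical helper: each output column samples one azimuth direction
    around the image centre, each output row samples one radius.
    """
    size = aerial.shape[0]
    assert aerial.shape[1] == size, "expects a square aerial image"
    centre = (size - 1) / 2.0

    rows = np.arange(out_h).reshape(-1, 1)   # shape (out_h, 1)
    cols = np.arange(out_w).reshape(1, -1)   # shape (1, out_w)

    # Assumed convention: top row = largest radius, shrinking towards the bottom.
    radius = centre * (out_h - rows) / out_h
    # Assumed convention: column 0 points "up" (north), angles grow clockwise.
    theta = 2.0 * np.pi * cols / out_w

    # Source coordinates in the aerial image (broadcast to (out_h, out_w)).
    src_r = centre - radius * np.cos(theta)
    src_c = centre + radius * np.sin(theta)

    # Nearest-neighbour sampling, clipped to the image bounds.
    src_r = np.clip(np.rint(src_r), 0, size - 1).astype(int)
    src_c = np.clip(np.rint(src_c), 0, size - 1).astype(int)
    return aerial[src_r, src_c]


# Usage: warp a small synthetic single-channel "aerial image".
aerial = np.arange(25).reshape(5, 5)
warped = polar_transform(aerial, out_h=4, out_w=8)
print(warped.shape)
```

Because the mapping depends only on pixel coordinates, it can be precomputed once and applied to every aerial image, which is consistent with the abstract's remark that the transform is agnostic to scene content and therefore needs a learned component to close the remaining gap.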

Downloads

Original bundle

Name: Spatial-aware feature aggregation for cross-view image based geo-localization.pdf
Size: 4.02 MB
Format: Adobe Portable Document Format

Collections

ANU Research Publications