
A Supervised Statistical Learning Approach for Accurate Legionella pneumophila Source Attribution during Outbreaks

Buultjens, Andrew H.; Chua, Kyra Y. L.; Baines, Sarah L.; Kwong, Jason; Gao, Wei; Cutcher, Zoe; Adcock, Stuart; Ballard, Susan; Schultz, Mark B.; Tomita, Takehiro; Subasinghe, Nela

Description

Public health agencies are increasingly relying on genomics during Legionnaires' disease investigations. However, the causative bacterium (Legionella pneumophila) has an unusual population structure, with extreme temporal and spatial genome sequence conservation. Furthermore, Legionnaires' disease outbreaks can be caused by multiple L. pneumophila genotypes in a single source. These factors can confound cluster identification using standard phylogenomic methods. Here, we show that a statistical...

dc.contributor.author: Buultjens, Andrew H.
dc.contributor.author: Chua, Kyra Y. L.
dc.contributor.author: Baines, Sarah L.
dc.contributor.author: Kwong, Jason
dc.contributor.author: Gao, Wei
dc.contributor.author: Cutcher, Zoe
dc.contributor.author: Adcock, Stuart
dc.contributor.author: Ballard, Susan
dc.contributor.author: Schultz, Mark B.
dc.contributor.author: Tomita, Takehiro
dc.contributor.author: Subasinghe, Nela
dc.date.accessioned: 2019-12-17T01:36:36Z
dc.identifier.issn: 0099-2240
dc.identifier.uri: http://hdl.handle.net/1885/195639
dc.description.abstract: Public health agencies are increasingly relying on genomics during Legionnaires' disease investigations. However, the causative bacterium (Legionella pneumophila) has an unusual population structure, with extreme temporal and spatial genome sequence conservation. Furthermore, Legionnaires' disease outbreaks can be caused by multiple L. pneumophila genotypes in a single source. These factors can confound cluster identification using standard phylogenomic methods. Here, we show that a statistical learning approach based on L. pneumophila core genome single nucleotide polymorphism (SNP) comparisons eliminates ambiguity for defining outbreak clusters and accurately predicts exposure sources for clinical cases. We illustrate the performance of our method by genome comparisons of 234 L. pneumophila isolates obtained from patients and cooling towers in Melbourne, Australia, between 1994 and 2014. This collection included one of the largest reported Legionnaires' disease outbreaks, which involved 125 cases at an aquarium. Using only sequence data from L. pneumophila cooling tower isolates and including all core genome variation, we built a multivariate model using discriminant analysis of principal components (DAPC) to find cooling tower-specific genomic signatures and then used it to predict the origin of clinical isolates. Model assignments were 93% congruent with epidemiological data, including the aquarium Legionnaires' disease outbreak and three other unrelated outbreak investigations. We applied the same approach to a recently described investigation of Legionnaires' disease within a UK hospital and observed a model predictive ability of 86%. We have developed a promising means to breach L. pneumophila genetic diversity extremes and provide objective source attribution data for outbreak investigations.
dc.format.mimetype: application/pdf
dc.language.iso: en_AU
dc.publisher: American Society for Microbiology
dc.rights: © 2017 American Society for Microbiology
dc.source: Applied and Environmental Microbiology
dc.title: A Supervised Statistical Learning Approach for Accurate Legionella pneumophila Source Attribution during Outbreaks
dc.type: Journal article
local.description.notes: Imported from ARIES
local.identifier.citationvolume: 83
dcterms.dateAccepted: 2017-08-11
dc.date.issued: 2017-10-17
local.identifier.absfor: 111706 - Epidemiology
local.identifier.ariespublication: u4485658xPUB871
local.publisher.url: https://aem.asm.org
local.type.status: Published Version
local.contributor.affiliation: Buultjens, Andrew H, University of Melbourne
local.contributor.affiliation: Chua, Kyra Y. L., University of Melbourne
local.contributor.affiliation: Baines, Sarah L., University of Melbourne
local.contributor.affiliation: Kwong, Jason, University of Melbourne
local.contributor.affiliation: Gao, Wei, University of Melbourne
local.contributor.affiliation: Cutcher, Zoe, College of Health and Medicine, ANU
local.contributor.affiliation: Adcock, Stuart, Victorian Government Department of Health and Human Services
local.contributor.affiliation: Ballard, Susan, University of Melbourne
local.contributor.affiliation: Schultz, Mark B., University of Melbourne
local.contributor.affiliation: Tomita, Takehiro, University of Melbourne
local.contributor.affiliation: Subasinghe, Nela, University of Melbourne
local.description.embargo: 2037-12-31
local.bibliographicCitation.issue: 21
local.bibliographicCitation.startpage: 1
local.bibliographicCitation.lastpage: 13
local.identifier.doi: 10.1128/AEM.01482-17
local.identifier.absseo: 920404 - Disease Distribution and Transmission (incl. Surveillance and Response)
dc.date.updated: 2019-07-28T08:20:24Z
local.identifier.scopusID: 2-s2.0-85031700672
local.identifier.thomsonID: 000413104500011
Collections: ANU Research Publications
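The abstract describes a two-step DAPC workflow: fit a model on cooling tower isolates only, then assign each clinical isolate to the tower whose genomic signature it best matches. The sketch below illustrates that idea in NumPy: principal component analysis on a SNP matrix, followed by linear discriminant analysis on the retained components. It is not the authors' pipeline, which this record does not describe in detail. A widely used DAPC implementation is the `dapc` function in the R package adegenet, and this sketch does not claim to reproduce it either. Several choices are assumptions made for illustration: the function names `fit_dapc` and `predict_dapc`, the 0/1 coding of SNP sites, the small `ridge` term, and the three-tower toy data.

```python
import numpy as np


def fit_dapc(X, labels, n_pc, ridge=1e-6):
    """Fit a DAPC-style classifier: PCA on the SNP matrix, then LDA on the PC scores.

    X      : (n_isolates, n_sites) array of 0/1 core-genome SNP calls (assumed coding)
    labels : source label per training isolate, e.g. a hypothetical cooling tower ID
    n_pc   : number of principal components kept before the discriminant step
    ridge  : small constant added to the within-class covariance for numerical stability
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    mean = X.mean(axis=0)
    # PCA via SVD of the centred matrix; the rows of vt are the principal axes.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    axes = vt[:n_pc]
    scores = (X - mean) @ axes.T

    classes = np.unique(labels)  # sorted, so searchsorted below gives class indices
    means = np.array([scores[labels == c].mean(axis=0) for c in classes])
    priors = np.array([np.mean(labels == c) for c in classes])
    # Pooled within-class covariance of the PC scores (shared across classes, as in LDA).
    resid = scores - means[np.searchsorted(classes, labels)]
    within = resid.T @ resid / (len(X) - len(classes)) + ridge * np.eye(n_pc)
    return {"mean": mean, "axes": axes, "classes": classes, "means": means,
            "priors": priors, "within_inv": np.linalg.inv(within)}


def predict_dapc(model, X_new):
    """Project new isolates onto the training PCs; return labels and posterior probabilities."""
    scores = (np.asarray(X_new, dtype=float) - model["mean"]) @ model["axes"].T
    w = model["means"] @ model["within_inv"]  # row c is mu_c^T S^-1
    # Linear discriminant score: x^T S^-1 mu_c - 0.5 mu_c^T S^-1 mu_c + log(prior_c)
    disc = scores @ w.T - 0.5 * np.sum(w * model["means"], axis=1) + np.log(model["priors"])
    disc -= disc.max(axis=1, keepdims=True)  # subtract row maximum before exponentiating
    post = np.exp(disc)
    post /= post.sum(axis=1, keepdims=True)
    return model["classes"][post.argmax(axis=1)], post


def toy_tower(block):
    """Hypothetical isolates from one tower: 4 private marker SNPs among 12 sites."""
    full = np.zeros(12)
    full[4 * block:4 * block + 4] = 1
    isolates = [full.copy()]
    for i in range(4):
        variant = full.copy()
        variant[4 * block + i] = 0  # each variant lacks one of the tower's markers
        isolates.append(variant)
    return isolates


# Train on (hypothetical) tower isolates only, then classify "clinical" isolates.
X_train = np.vstack(toy_tower(0) + toy_tower(1) + toy_tower(2))
towers = ["A"] * 5 + ["B"] * 5 + ["C"] * 5
clinical = np.vstack([toy_tower(b)[0] for b in range(3)])

model = fit_dapc(X_train, towers, n_pc=2)
pred, post = predict_dapc(model, clinical)
print(pred.tolist())  # ['A', 'B', 'C']
```

Keeping only a few principal components before the discriminant step keeps the within-class covariance invertible when SNP sites far outnumber isolates. The number of retained components is a tuning choice. The toy matrix exists only to exercise the code and is not data from the study.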

Download

File: 01_Buultjens_A_Supervised_Statistical_2017.pdf (2.27 MB, Adobe PDF)


