Prediction of properties from simulations: a re-examination with modern statistical methods

Mansson, R A; Frey, J G; Essex, J W; Welsh, Alan

dc.contributor.author: Mansson, R A
dc.contributor.author: Frey, J G
dc.contributor.author: Essex, J W
dc.contributor.author: Welsh, Alan
dc.date.accessioned: 2015-12-13T23:04:19Z
dc.identifier.issn: 1549-9596
dc.identifier.uri: http://hdl.handle.net/1885/85319
dc.description.abstract: We discuss models fit to data collected by Duffy and Jorgensen to predict solvation free energies and partition equilibria of drugs, organic molecules, aromatic heterocycles, and other molecules. These data were originally examined using linear regression; here, more recently developed statistical models are applied. The data set is complicated by the presence of discrepant observations and by curvature in the response. In some cases it is possible to discard a small number of observations and obtain a good fit to the data, but in others, discarding an increasing proportion of the observations does not improve the fit. Our general preference is robust parameter estimation, which downweights discrepant observations to reduce their influence on the fitted models. Models are selected for four responses using linear or more complicated representations of the explanatory variables, such as cubic polynomials, B-splines, or smoothers via generalized additive models (GAMs). Variables are chosen using the traditional approach of formal tests to assess their contribution to the fit of a model, and resampling methods, including the bootstrap, are also considered for assessing the prediction error of given models. Our results indicate that GAMs are an improvement on linear models for describing the data and making predictions. In general, robust regression models and GAMs have the smallest conditional expected loss of prediction over the four responses. In addition, robust regression models offer the advantage of identifying molecules that are poorly fitted. Overall, models were identified that yielded an improvement of approximately 50% in the conditional expected loss of prediction compared with the original parametrization of Duffy and Jorgensen. We also found that using cross-validation to compare models was unreliable, and bootstrapping is preferred.
dc.publisher: American Chemical Society
dc.source: Journal of Chemical Information and Modeling
dc.subject: Free energy
dc.subject: Molecular structure
dc.subject: Parameter estimation
dc.subject: Regression analysis
dc.subject: Statistical methods
dc.subject: Data collection
dc.subject: Linear models
dc.subject: Organic molecules
dc.subject: Statistical models
dc.subject: Computer simulation
dc.title: Prediction of properties from simulations: a re-examination with modern statistical methods
dc.type: Journal article
local.description.notes: Imported from ARIES
local.description.refereed: Yes
local.identifier.citationvolume: 45
dc.date.issued: 2005
local.identifier.absfor: 010401 - Applied Statistics
local.identifier.ariespublication: MigratedxPub13662
local.type.status: Published Version
local.contributor.affiliation: Mansson, R A, University of Southampton
local.contributor.affiliation: Frey, J G, University of Southampton
local.contributor.affiliation: Essex, J W, University of Southampton
local.contributor.affiliation: Welsh, Alan, College of Physical and Mathematical Sciences, ANU
local.description.embargo: 2037-12-31
local.bibliographicCitation.startpage: 1791
local.bibliographicCitation.lastpage: 1803
local.identifier.doi: 10.1021/ci050056i
dc.date.updated: 2015-12-12T07:55:22Z
local.identifier.scopusID: 2-s2.0-28944441116
Collections: ANU Research Publications
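The abstract's two methodological points, robust estimation that downweights discrepant observations and bootstrap (rather than cross-validation) estimates of prediction error, can be illustrated with a minimal sketch. It assumes a single explanatory variable and a straight-line model, Huber weights with the conventional tuning constant c = 1.345, a median-absolute-deviation scale estimate, squared-error loss, and Efron's optimism-corrected bootstrap. None of these settings is taken from the paper, and all function names and the data are hypothetical.

```python
import random
import statistics

# Illustrative sketch only: the straight-line model, Huber weights (c = 1.345),
# MAD scale, squared-error loss and the optimism bootstrap are assumptions,
# not the paper's documented settings.

def weighted_line_fit(x, y, w):
    """Weighted least-squares fit of y = a + b*x; returns (a, b)."""
    sw = sum(w)
    xm = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ym = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xm) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xm) * (yi - ym) for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx
    return ym - b * xm, b

def ols_line_fit(x, y):
    """Ordinary least squares: every observation gets weight 1."""
    return weighted_line_fit(x, y, [1.0] * len(x))

def huber_line_fit(x, y, c=1.345, n_iter=50, tol=1e-10):
    """Huber M-estimate of y = a + b*x by iteratively reweighted least squares.

    Returns (a, b, weights). Observations whose final weight is well below 1
    are the discrepant ones whose influence was reduced.
    """
    a, b = ols_line_fit(x, y)  # start from the least-squares fit
    w = [1.0] * len(x)
    for _ in range(n_iter):
        r = [yi - (a + b * xi) for xi, yi in zip(x, y)]
        # Robust scale: median absolute deviation, rescaled for the normal
        med = statistics.median(r)
        s = statistics.median(abs(ri - med) for ri in r) / 0.6745
        if s == 0:
            break
        # Huber weights: 1 for small residuals, c*s/|r| for large ones
        w = [1.0 if abs(ri) <= c * s else c * s / abs(ri) for ri in r]
        a_new, b_new = weighted_line_fit(x, y, w)
        done = abs(a_new - a) < tol and abs(b_new - b) < tol
        a, b = a_new, b_new
        if done:
            break
    return a, b, w

def mse(a, b, x, y):
    """Mean squared prediction error of the line a + b*x on (x, y)."""
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y)) / len(x)

def bootstrap_prediction_error(x, y, fit, n_boot=200, seed=0):
    """Optimism-corrected bootstrap estimate of squared prediction error.

    Returns (estimate, apparent_error).
    """
    rng = random.Random(seed)
    n = len(x)
    a, b = fit(x, y)[:2]
    apparent = mse(a, b, x, y)
    optimism = []
    while len(optimism) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        xb = [x[i] for i in idx]
        yb = [y[i] for i in idx]
        if len(set(xb)) < 2:
            continue  # slope is not estimable from this resample
        ab, bb = fit(xb, yb)[:2]
        # How much worse the resample's fit does on the original data
        optimism.append(mse(ab, bb, x, y) - mse(ab, bb, xb, yb))
    return apparent + sum(optimism) / n_boot, apparent

# Usage on made-up data: a line with small noise and one discrepant point.
x = list(range(10))
noise = [0.1, -0.2, 0.15, -0.1, 0.05, -0.15, 0.2, 0.0, -0.05, 0.1]
y = [2 + 3 * xi + ei for xi, ei in zip(x, noise)]
y[7] += 30  # a hypothetical poorly fitted "molecule"

a_ols, b_ols = ols_line_fit(x, y)
a_rob, b_rob, weights = huber_line_fit(x, y)
print(f"OLS slope {b_ols:.3f}, Huber slope {b_rob:.3f}")
print("smallest weight at index", weights.index(min(weights)))
for name, fit in [("OLS", ols_line_fit), ("Huber", huber_line_fit)]:
    est, app = bootstrap_prediction_error(x, y, fit)
    print(f"{name}: apparent MSE {app:.2f}, bootstrap estimate {est:.2f}")
```

The final Huber weights play the role the abstract describes for robust fits: the observation with a small weight is the one flagged as poorly fitted. The bootstrap adds an estimate of how optimistic each fit's in-sample error is. This sketch does not reproduce the paper's actual model comparison.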

Download

File: 01_Mansson_Prediction_of_properties_from_2005.pdf
Size: 220.9 kB
Format: Adobe PDF


Items in Open Research are protected by copyright, with all rights reserved, unless otherwise indicated.
