Distribution of mutual information from complete and incomplete data

dc.contributor.author: Hutter, Marcus
dc.contributor.author: Zaffalon, Marco
dc.date.accessioned: 2015-09-01T04:41:28Z
dc.date.available: 2015-09-01T04:41:28Z
dc.date.issued: 2005-03-01
dc.description.abstract: Mutual information is widely used, in a descriptive way, to measure the stochastic dependence of categorical random variables. In order to address questions such as the reliability of the descriptive value, one must consider sample-to-population inferential approaches. This paper deals with the posterior distribution of mutual information, as obtained in a Bayesian framework by a second-order Dirichlet prior distribution. The exact analytical expression for the mean, and analytical approximations for the variance, skewness and kurtosis are derived. These approximations have a guaranteed accuracy level of the order O(n^{-3}), where n is the sample size. Leading order approximations for the mean and the variance are derived in the case of incomplete samples. The derived analytical expressions allow the distribution of mutual information to be approximated reliably and quickly. In fact, the derived expressions can be computed with the same order of complexity needed for descriptive mutual information. This makes the distribution of mutual information become a concrete alternative to descriptive mutual information in many applications which would benefit from moving to the inductive side. Some of these prospective applications are discussed, and one of them, namely feature selection, is shown to perform significantly better when inductive mutual information is used. [en_AU]
dc.description.sponsorship: This research was supported in part by the SNF grants 2000-61847 and 2100-067961. [en_AU]
dc.identifier.issn: 0167-9473 [en_AU]
dc.identifier.uri: http://hdl.handle.net/1885/15050
dc.publisher: Elsevier [en_AU]
dc.rights: © 2004 Elsevier B.V. http://www.sherpa.ac.uk/romeo/issn/0167-9473/... "Author's post-print on open access repository after an embargo period of between 12 months and 48 months" from SHERPA/RoMEO site (as at 1/09/15). [en_AU]
dc.source: Computational Statistics & Data Analysis [en_AU]
dc.subject: Dirichlet distribution [en_AU]
dc.subject: Expectation and variance of mutual information [en_AU]
dc.subject: Feature selection [en_AU]
dc.subject: Filters [en_AU]
dc.subject: Naive Bayes classifier [en_AU]
dc.subject: Bayesian statistics [en_AU]
dc.title: Distribution of mutual information from complete and incomplete data [en_AU]
dc.type: Journal article [en_AU]
dcterms.accessRights: Open Access
local.bibliographicCitation.issue: 3 [en_AU]
local.bibliographicCitation.lastpage: 657 [en_AU]
local.bibliographicCitation.startpage: 633 [en_AU]
local.contributor.affiliation: Hutter, M., Research School of Computer Science, The Australian National University [en_AU]
local.contributor.authoremail: marcus.hutter@anu.edu.au [en_AU]
local.contributor.authoruid: u4350841 [en_AU]
local.identifier.citationvolume: 48 [en_AU]
local.identifier.doi: 10.1016/j.csda.2004.03.010 [en_AU]
local.identifier.uidSubmittedBy: u1005913 [en_AU]
local.publisher.url: http://www.elsevier.com/ [en_AU]
local.type.status: Accepted Version [en_AU]
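
Illustration (not part of the repository record): the abstract describes the posterior distribution of mutual information under a Dirichlet prior. The sketch below is a Monte Carlo stand-in for that idea and is not the paper's analytical method, which the record does not reproduce. It draws joint probability tables from the Dirichlet posterior of a contingency table and summarises the resulting mutual information values. The function names, the example counts, and the uniform prior of 1 pseudo-count per cell are all assumptions made for this example.

```python
import math
import random


def mutual_information(p):
    """Mutual information (in nats) of a joint probability table p[i][j]."""
    row = [sum(r) for r in p]          # marginal of the row variable
    col = [sum(c) for c in zip(*p)]    # marginal of the column variable
    mi = 0.0
    for i, r in enumerate(p):
        for j, pij in enumerate(r):
            if pij > 0.0:              # 0 * log(0) is taken as 0
                mi += pij * math.log(pij / (row[i] * col[j]))
    return mi


def sample_posterior_mi(counts, prior=1.0, draws=2000, seed=0):
    """Sample mutual information under a Dirichlet posterior.

    counts: contingency table of observed cell counts (list of rows).
    prior:  assumed Dirichlet pseudo-count added to every cell.
    """
    rng = random.Random(seed)
    ncols = len(counts[0])
    alphas = [c + prior for r in counts for c in r]
    samples = []
    for _ in range(draws):
        # A Dirichlet draw is a vector of normalised Gamma(alpha, 1) draws.
        g = [rng.gammavariate(a, 1.0) for a in alphas]
        total = sum(g)
        flat = [x / total for x in g]
        table = [flat[k:k + ncols] for k in range(0, len(flat), ncols)]
        samples.append(mutual_information(table))
    return samples


# Hypothetical 2x2 contingency table, chosen only for illustration.
counts = [[30, 10], [10, 30]]
samples = sample_posterior_mi(counts)
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / (len(samples) - 1)
print(f"posterior mean of MI: {mean:.4f} nats, variance: {var:.6f}")
```

Sampling costs many evaluations of mutual information per table, whereas the abstract states that the paper's closed-form expressions have the same order of complexity as a single descriptive evaluation.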

Downloads

Original bundle

Name: Hutter and Zaffalon Distribution of Mutual Information 2005.pdf
Size: 371.18 KB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 884 B
Format: Item-specific license agreed upon to submission