A Submodularity-based Clustering Algorithm for the Information Bottleneck and Privacy Funnel
Abstract
For the relevant data S that nests in the observation
X, the information bottleneck (IB) aims to encode X into Xˆ in
order to maximize the extracted useful information I(S; Xˆ) with
the minimum coding rate I(X; Xˆ). For the dual privacy funnel
(PF) problem where S denotes the sensitive/private data, the
goal is to minimize the privacy leakage I(S; Xˆ) while maintain
a certain level of utility I(X; Xˆ). For both problems, we propose
an efficient iterative agglomerative clustering algorithm based
on the minimization of the difference of submodular functions
(IAC-MDSF). It starts with the original alphabet Xˆ := X and
iteratively merges the elements in the current alphabet Xˆ that
optimizes the Lagrangian function I(S; Xˆ)−λI(X; Xˆ). We prove
that the best merge in each iteration of IAC-MDSF can be
searched efficiently over all subsets of Xˆ by the existing MDSF
algorithms. By varying the value of the Lagrangian multiplier
λ, we obtain the experimental results on a heart disease data
set in terms of the Pareto frontier: I(S; Xˆ) vs. −I(X; Xˆ). We
show that our IAC-MDSF algorithm outperforms the existing
iterative pairwise merge approaches for both PF and IB and is
computationally much less complex
Description
Keywords
Citation
Collections
Source
Type
Book Title
Entity type
Access Statement
License Rights
Restricted until
2099-12-31