A Submodularity-based Clustering Algorithm for the Information Bottleneck and Privacy Funnel

Date

Authors

Ding, Ni
Sadeghi, Parastoo

Journal Title

Journal ISSN

Volume Title

Publisher

Institute of Electrical and Electronics Engineers Inc.

Access Statement

Research Projects

Organizational Units

Journal Issue

Abstract

For the relevant data S that nests in the observation X, the information bottleneck (IB) aims to encode X into X in order to maximize the extracted useful information I(S;;X) with the minimum coding rate I(X;;X). For the dual privacy tunnel (PF) problem where S denotes the sensitive/private data, the goal is to minimize the privacy leakage I(S;X) while maintain a certain level of utility I(X;;X). For both problems, we propose an efficient iterative agglomerative clustering algorithm based on the minimization of the difference of submodular functions (IAC-MDSF). It starts with the original alphabet ;X:= X and iteratively merges the elements in the current alphabet\;X that optimizes the Lagrangian function I(S;;X-λ I(X;X). We prove that the best merge in each iteration of IAC-MDSF can be searched efficiently over all subsets of ;X by the existing MDSF algorithms. By varying the value of the Lagrangian multiplier λ, we obtain the experimental results on a heart disease data set in terms of the Pareto frontier: I(S;;X) vs.-I(X;;X). We show that our IAC-MDSF algorithm outperforms the existing iterative pairwise merge approaches for both PF and IB and is computationally much less complex.

Description

Keywords

Citation

Source

Book Title

2019 IEEE Information Theory Workshop, ITW 2019

Entity type

Publication

Access Statement

License Rights

Restricted until