Accurate Synthetic Generation of Realistic Personal Information
Date
2009
Authors
Christen, Peter
Pudjijono, Agus
Publisher
Springer
Abstract
A large portion of data collected by many organisations today is about people, and often contains personal identifying information, such as names and addresses. Privacy and confidentiality are of great concern when such data is being shared between organisations or made publicly available. Research in (privacy-preserving) data mining and data linkage suffers from a lack of publicly available real-world data sets that contain personal information, which makes experimental evaluations difficult to conduct. In order to overcome this problem, we have developed a data generator that allows flexible creation of synthetic data containing personal information with realistic characteristics, such as frequency distributions, attribute dependencies, and error probabilities. Our generator significantly improves on earlier approaches, and allows the generation of data for individuals, families and households.
Keywords
Artificial data; Data linkage; Data matching; Data mining pre-processing; Error probabilities; Experimental evaluation; Frequency distributions; Mining; Personal information; Privacy; Privacy preserving; Probability distribution; Real world data; Synthetic data; Synthetic generation
Type
Book chapter
Book Title
Advances in Knowledge Discovery and Data Mining
Restricted until
2037-12-31