Parametric model-based clustering

Authors

Nikulin, Vladimir
Smola, Alex J.

Abstract

Parametric, model-based algorithms learn generative models from the data, with each model corresponding to one particular cluster. Accordingly, a model-based partitional algorithm selects the most suitable model for each data object (Clustering step) and recomputes the parametric models using only the data from the corresponding clusters (Maximization step). This Clustering-Maximization framework has been widely used and has shown promising results in many applications, including those with complex variable-length data. The paper proposes the Experience-Innovation (EI) method as a natural extension of the Clustering-Maximization framework. The method includes three components: 1) keeping the best past experience, which makes the empirical likelihood trajectory monotonic; 2) finding a new model as a function of existing models, so that the corresponding cluster splits existing clusters that have a larger number of elements and lower uniformity; 3) heuristic innovations, for example several trials with random initial settings. We also introduce clustering regularisation based on a balance of two conditions: 1) the significance of any particular cluster; 2) the difference between any two clusters. We illustrate the effectiveness of the proposed methods using a first-order Markov model applied to a large web-traffic dataset. The aim of the experiment is to explain and understand the way people interact with web sites.
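
The Clustering and Maximization steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes sequences are lists of integer state indices, uses additive smoothing (`alpha`) so that empty clusters still yield a valid transition matrix, and ignores initial-state probabilities. The function names `fit_markov`, `log_lik`, `cm_cluster`, and `best_of_restarts` are hypothetical. The restart wrapper only loosely mirrors the "several trials with random initial settings" idea by keeping the run with the highest final log-likelihood; the model-splitting and regularisation components are not covered.

```python
import numpy as np

def fit_markov(seqs, n_states, alpha=1.0):
    """Estimate a row-stochastic transition matrix with additive smoothing."""
    counts = np.full((n_states, n_states), alpha, dtype=float)
    for s in seqs:
        for a, b in zip(s[:-1], s[1:]):
            counts[a, b] += 1
    return counts / counts.sum(axis=1, keepdims=True)

def log_lik(seq, P):
    """Log-likelihood of the transitions in seq under matrix P."""
    return sum(np.log(P[a, b]) for a, b in zip(seq[:-1], seq[1:]))

def cm_cluster(seqs, k, n_states, n_iter=20, seed=0):
    """Alternate Maximization and Clustering steps from a random partition."""
    rng = np.random.default_rng(seed)
    labels = rng.integers(0, k, size=len(seqs))
    history = []
    for _ in range(n_iter):
        # Maximization step: refit one Markov model per cluster.
        models = [
            fit_markov([s for s, lab in zip(seqs, labels) if lab == c], n_states)
            for c in range(k)
        ]
        # Clustering step: assign each sequence to its best-scoring model.
        scores = np.array([[log_lik(s, P) for P in models] for s in seqs])
        new_labels = scores.argmax(axis=1)
        history.append(float(scores.max(axis=1).sum()))
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels, models, history

def best_of_restarts(seqs, k, n_states, n_trials=5):
    """Run several random initialisations and keep the best final score."""
    runs = [cm_cluster(seqs, k, n_states, seed=t) for t in range(n_trials)]
    return max(runs, key=lambda run: run[2][-1])
```

A call such as `best_of_restarts(sessions, k=3, n_states=10)` would return the labels, the per-cluster transition matrices, and the log-likelihood trajectory of the best run, where `sessions` stands for a hypothetical list of page-index sequences.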

Source

Proceedings of SPIE - The International Society for Optical Engineering

Entity type

Publication
