Significant term extraction by Higher Order SVD

Date

2009

Authors

Manna, Sukanya
Petres, Zoltan
Gedeon, Tamas (Tom)

Publisher

Institute of Electrical and Electronics Engineers (IEEE Inc)

Abstract

In this paper, we present a novel method for measuring term importance, called Tensor Term Indexing (TTI). It extracts significant terms from a single document as well as from a coherent collection of documents. The basic idea of the approach is to represent the whole document collection as a Term-Sentence-Document tensor and to apply higher-order singular value decomposition (HOSVD) for important term extraction. TTI uses a lower-rank approximation to reduce noise by eliminating anecdotal terms, to mitigate synonymy by merging the dimensions associated with terms that have similar meanings, and to mitigate polysemy, since components of polysemous words that point in the "right" direction are added to the components of words that share a similar meaning. Our evaluation shows that the TTI model can extract significant terms relevant to a topic from a small number of documents, which term frequency-inverse document frequency (tf-idf) cannot.
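
The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a toy corpus, whitespace tokenization, raw term counts as tensor entries, and a hypothetical scoring rule (the norm of each term's slice in the low-rank approximation), none of which are specified in this record. Only the Term-Sentence-Document layout and the use of a truncated HOSVD come from the abstract.

```python
import numpy as np

# Toy corpus (assumption): each document is a list of sentences.
docs = [
    ["tensor methods index terms", "terms matter"],
    ["tensor decomposition reduces noise", "noise hides terms"],
]

# Vocabulary from whitespace tokenization (assumption).
vocab = sorted({w for doc in docs for sent in doc for w in sent.split()})
term_id = {w: i for i, w in enumerate(vocab)}
max_sents = max(len(doc) for doc in docs)

# Term-Sentence-Document tensor filled with raw counts (assumption).
tensor = np.zeros((len(vocab), max_sents, len(docs)))
for d, doc in enumerate(docs):
    for s, sent in enumerate(doc):
        for w in sent.split():
            tensor[term_id[w], s, d] += 1.0


def unfold(t, mode):
    """Mode-n unfolding: rows index the chosen mode, columns all others."""
    return np.moveaxis(t, mode, 0).reshape(t.shape[mode], -1)


def mode_product(t, matrix, mode):
    """Multiply tensor t by matrix along the given mode."""
    return np.moveaxis(np.tensordot(matrix, t, axes=(1, mode)), 0, mode)


def truncated_hosvd(t, ranks):
    """Return (core, factors) keeping at most ranks[n] left singular
    vectors of each mode-n unfolding."""
    factors = []
    for mode, rank in enumerate(ranks):
        u, _, _ = np.linalg.svd(unfold(t, mode), full_matrices=False)
        factors.append(u[:, :rank])
    core = t
    for mode, u in enumerate(factors):
        core = mode_product(core, u.T, mode)
    return core, factors


def reconstruct(core, factors):
    """Map the core back to the original space (low-rank approximation)."""
    approx = core
    for mode, u in enumerate(factors):
        approx = mode_product(approx, u, mode)
    return approx


# Rank-(2, 2, 2) approximation; the ranks are chosen only for illustration.
core, factors = truncated_hosvd(tensor, (2, 2, 2))
approx = reconstruct(core, factors)

# Hypothetical importance score: norm of each term's slice in the approximation.
scores = np.linalg.norm(approx.reshape(len(vocab), -1), axis=1)
ranking = [vocab[i] for i in np.argsort(-scores)]
```

In this sketch `ranking` lists the vocabulary from highest to lowest score. On a real collection the tensor would be large and sparse, so the ranks and the scoring rule would need to be chosen and validated against the paper itself.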

Keywords

Approximation techniques; Basic idea; Collection of documents; Document collection; Higher order singular value decomposition; Higher order SVD; Inverse Document Frequency; Novel methods; Polysemous word; Term extraction; Term Frequency; Term importance

Source

Proceedings of the 7th International Symposium on Applied Machine Intelligence and Informatics

Type

Conference paper

Restricted until

2037-12-31