Significant term extraction by Higher Order SVD
Date
2009
Authors
Manna, Sukanya
Petres, Zoltan
Gedeon, Tamas (Tom)
Publisher
Institute of Electrical and Electronics Engineers (IEEE Inc)
Abstract
In this paper, we present a novel method for measuring term importance, called Tensor Term Indexing (TTI). It extracts significant terms from a single document as well as from a coherent collection of documents. The basic idea of this approach is to represent the whole document collection as a Term-Sentence-Document tensor and to employ higher-order singular value decomposition (HOSVD) for important term extraction. TTI uses a lower-rank approximation technique to reduce noise by eliminating anecdotal terms, to mitigate synonymy by merging the dimensions associated with terms that have similar meanings, and to mitigate polysemy, since components of polysemous words that point in the "right" direction are added to the components of words that share a similar meaning. Our evaluation shows that the TTI model can extract significant terms relevant to a topic from a small number of documents, which Term Frequency and Inverse Document Frequency (tfidf) cannot.
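The idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy corpus, the raw-count weighting, the truncation ranks, and the scoring rule (the norm of each term's slice in the low-rank tensor) are all assumptions made for illustration, and the helper names `unfold`, `mode_product`, `truncated_hosvd`, and `reconstruct` are hypothetical. Only NumPy is used.

```python
import numpy as np

# Toy corpus (hypothetical): each document is a list of sentences.
docs = [
    ["tensor decomposition extracts terms", "terms matter"],
    ["tensor methods rank terms", "singular values rank terms"],
]

vocab = sorted({w for doc in docs for sent in doc for w in sent.split()})
max_sents = max(len(doc) for doc in docs)

# tensor[t, s, d] = raw count of term t in sentence s of document d
# (raw counts are an assumed weighting; shorter documents are zero-padded).
tensor = np.zeros((len(vocab), max_sents, len(docs)))
for d, doc in enumerate(docs):
    for s, sent in enumerate(doc):
        for w in sent.split():
            tensor[vocab.index(w), s, d] += 1


def unfold(t, mode):
    """Mode-n unfolding: bring `mode` to the front and flatten the rest."""
    return np.moveaxis(t, mode, 0).reshape(t.shape[mode], -1)


def mode_product(t, matrix, mode):
    """Multiply tensor `t` by `matrix` along axis `mode`."""
    moved = np.moveaxis(t, mode, 0)
    result = np.tensordot(matrix, moved, axes=(1, 0))
    return np.moveaxis(result, 0, mode)


def truncated_hosvd(t, ranks):
    """Return the core tensor and one truncated factor matrix per mode."""
    factors = []
    for mode, rank in enumerate(ranks):
        # Left singular vectors of each unfolding give that mode's basis.
        u, _, _ = np.linalg.svd(unfold(t, mode), full_matrices=False)
        factors.append(u[:, :rank])
    core = t
    for mode, u in enumerate(factors):
        core = mode_product(core, u.T, mode)
    return core, factors


def reconstruct(core, factors):
    """Map the core tensor back to the original space."""
    approx = core
    for mode, u in enumerate(factors):
        approx = mode_product(approx, u, mode)
    return approx


# Ranks (2, 2, 2) are an arbitrary choice for this toy example.
core, factors = truncated_hosvd(tensor, ranks=(2, 2, 2))
approx = reconstruct(core, factors)

# Assumed scoring rule (not taken from the paper): a term's importance is
# the norm of its sentence-by-document slice in the low-rank tensor.
scores = np.linalg.norm(approx.reshape(len(vocab), -1), axis=1)
ranking = [vocab[i] for i in np.argsort(-scores)]
print(ranking)
```

With the full ranks `tensor.shape`, the reconstruction reproduces the original tensor exactly; lowering the ranks is what discards the weaker components that the abstract associates with noise.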
Keywords
Approximation techniques; Basic idea; Collection of documents; Document collection; Higher order singular value decomposition; Higher order SVD; Inverse Document Frequency; Novel methods; Polysemous word; Term extraction; Term Frequency; Term importance
Source
Proceedings of the 7th International Symposium on Applied Machine Intelligence and Informatics
Type
Conference paper
Restricted until
2037-12-31