Cultural advice

The Australian National University acknowledges, celebrates and pays our respects to the Ngunnawal and Ngambri people of the Canberra region and to all First Nations Australians on whose traditional lands we meet and work, and whose cultures are among the oldest continuing cultures in human history.

Aboriginal and Torres Strait Islander peoples are advised that ANU Library collections may include images, names, voices, and other representations of deceased persons.

Material in the collection may contain terms, language or views that reflect the period in which the item was created and may be considered inappropriate today.

Ideal spam nullification: understanding the impact of web spam

Authors

Jones, Timothy Lee

Abstract

Past research in Adversarial Information Retrieval (AIR) has thoroughly addressed the detection of web spam. However, eliminating all detected spam is not the optimal strategy for nullifying spam, as spam documents sometimes help users complete tasks. In this thesis, the impact of web spam on search results is examined and the ideal properties of effective spam nullification are outlined. Web spam nullification could in principle be evaluated using test collections. The properties of a suitable test collection are examined, and it is argued that, due to coverage requirements and the rapidly evolving nature of web spam, constructing such a collection is impractical. Consequently, the experiments in this thesis use low-cost user studies.

Initially, experimental subjects were asked to compare side-by-side search result pages containing varied amounts of spam. The presence of even one or two spam documents decreased the judged quality of the search results. Unfortunately, subjects often did not continue to use the experimental search engine. Additionally, selecting alternating documents from a high-quality ranking did not produce two panels of equivalent search quality. Accordingly, to improve subject retention, the impact of web spam on search results is studied using a conventional single-panel web search engine. A lightweight browser extension is created to record user interactions. In the initial experiment, click data suffered from sparsity; however, scrolling data was less sparse and could be used to reliably distinguish high- and low-quality search engines. To investigate the sensitivity of scrolling as an implicit measure of result set quality, a method for systematic, stepped degradation of the NDCG of high-quality search results is developed. Using engines degraded in this way, the scrolling classification appears to be effective only when the difference in search quality is large.
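The abstract gives neither the thesis's gain values nor its degradation procedure. The following is a minimal sketch under hypothetical relevance grades (0 to 3). It shows why splitting one ranking into two panels by alternating positions tends to favour the panel that takes rank 1, and it shows one hypothetical way to produce stepped NDCG levels by zeroing documents from the top. The function names `dcg`, `ndcg`, `alternate_panels` and `stepped_degradation` are illustrative and do not come from the thesis.

```python
import math

def dcg(gains):
    """Discounted cumulative gain with the common log2(rank + 1) discount."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains, start=1))

def ndcg(gains, pool_gains):
    """Normalise by the DCG of the best possible ordering of the judged pool,
    truncated to the length of the ranking being scored."""
    ideal = dcg(sorted(pool_gains, reverse=True)[:len(gains)])
    return dcg(gains) / ideal if ideal > 0 else 0.0

def alternate_panels(ranking):
    """Split one ranking into two panels: ranks 1, 3, 5, ... and ranks 2, 4, 6, ..."""
    return ranking[0::2], ranking[1::2]

def stepped_degradation(gains, steps):
    """Hypothetical degradation scheme (not the thesis's method): replace
    documents with zero-gain ones, starting from the top, and yield the
    NDCG after each replacement."""
    degraded = list(gains)
    for i in range(min(steps, len(degraded))):
        degraded[i] = 0
        yield ndcg(degraded, gains)

# Hypothetical graded relevance of a high-quality top-10 ranking.
pool = [3, 3, 2, 2, 2, 1, 1, 1, 0, 0]
panel_a, panel_b = alternate_panels(pool)
# Panel A gets an equal or better document at every paired position, so it scores higher.
print(ndcg(panel_a, pool), ndcg(panel_b, pool))
print(list(stepped_degradation(panel_a, 3)))
```

Because a well-ordered ranking is non-increasing in gain, the odd-position panel dominates the even-position panel position by position. This is one plausible reason why alternation alone does not yield two panels of equivalent quality.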
The results reported in this thesis suggest that the conventional use of independent document relevance as the gain value for NDCG does not accurately model result quality for information-gathering tasks. In addition, the usual log discount used in NDCG leads to equal scores for rankings that appear materially different. Finally, the impact of low-value documents in web search results is examined using observed user reactions to inserted spam or irrelevant documents, and a series of explicit judgements on hand-created document pools. It is found that when users are given two documents of equal utility, the one with the lower spam score is preferred; a result list without any spam documents is preferred to one with spam documents; and an irrelevant document high in a result list is more damaging to user satisfaction than a spam document. These are the preferences that any believable, reliable procedure for evaluating web spam nullification must reward.
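The abstract does not say which NDCG formulation produces the ties. The following is a minimal sketch of one concrete way a log discount can give equal scores to materially different rankings, assuming the original Järvelin and Kekäläinen form of DCG with log base b = 2. In that form, ranks below b are undiscounted and rank 2 is divided by log2(2) = 1, so the top two positions carry equal weight. The gain lists are hypothetical. Since both orders contain the same documents, they share the same ideal DCG, so equal DCG also means equal NDCG.

```python
import math

def dcg_jk(gains, b=2):
    """DCG as originally formulated by Jarvelin and Kekalainen: gains at ranks
    below b are not discounted; the gain at rank i >= b is divided by log_b(i)."""
    total = 0.0
    for i, g in enumerate(gains, start=1):
        total += g if i < b else g / math.log(i, b)
    return total

def dcg_log2(gains):
    """The common variant that divides the gain at rank i by log2(i + 1)."""
    return sum(g / math.log2(i + 1) for i, g in enumerate(gains, start=1))

# Hypothetical gains for the same four documents in two orders.
spam_first = [0, 3, 2, 1]      # zero-gain (spam or irrelevant) document at rank 1
relevant_first = [3, 0, 2, 1]  # highly relevant document moved to rank 1

# With b = 2, ranks 1 and 2 both have discount 1, so the two orders tie.
print(dcg_jk(spam_first) == dcg_jk(relevant_first))  # True
# The log2(i + 1) variant separates them.
print(dcg_log2(spam_first) < dcg_log2(relevant_first))  # True
```

A metric that ties these two orders cannot reward the observed preference against an irrelevant document high in the result list.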

Access Statement

Open Access
