Jones, Timothy Lee
Past research in Adversarial Information Retrieval (AIR) has thoroughly addressed the detection of web spam. However, elimination of all detected spam is not the optimum strategy for nullifying spam, as spam documents sometimes help users complete tasks. In this thesis, the impact of web spam in search results is examined and the ideal properties of effective spam nullification are outlined.
Evaluation of web spam nullification could be performed using test collections. The properties of a suitable test collection are examined, and it is argued that due to coverage and the rapidly evolving nature of web spam, the construction of such a collection is impractical. Consequently, low cost user studies are used for experiments in this thesis.
Initially, experimental subjects were asked to compare side-by-side search result pages containing varied amounts of spam. The presence of even one or two spam documents decreased the judged quality of the search results. Unfortunately, subjects often did not continue to use the experimental search engine. Additionally, selecting alternating documents from a high quality ranking did not produce two different panels of equivalent search quality. Accordingly, the impact of web spam on search results is studied using a normal single panel web search engine to improve subject retention. A lightweight browser extension is created to record user interactions. In the initial experiment, click data suffered from sparsity. However, scrolling data was less sparse and could be used to reliably distinguish high and low quality search engines.
To investigate the sensitivity of scrolling as an implicit measure of result set quality, a method for systematic stepped degradation of the NDCG of high quality search results is developed. With engines degraded in this way, the scrolling classification appears effective only when the difference in search quality is large.
The results reported in this thesis suggest that the conventional use of independent document relevance as a gain value for NDCG does not accurately model result quality for information gathering tasks. Also, the usual log discount used in NDCG leads to equal scores for rankings which appear to be materially different.
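As a rough illustration of how a log discount can assign equal scores to different rankings, the following is a minimal sketch of NDCG. It assumes the original Järvelin and Kekäläinen formulation with log base 2, in which ranks below the base are not discounted; the thesis may use a different variant. The function names and the graded gain values are hypothetical and chosen only for this example.

```python
import math


def dcg(gains, base=2):
    """Discounted cumulative gain for a ranked list of gain values.

    Assumed variant: ranks below `base` are undiscounted, and rank r >= base
    is divided by log_base(r). With base 2, ranks 1 and 2 both get a
    discount of 1.
    """
    total = 0.0
    for rank, gain in enumerate(gains, start=1):
        discount = 1.0 if rank < base else math.log(rank) / math.log(base)
        total += gain / discount
    return total


def ndcg(gains, base=2):
    """DCG normalised by the DCG of the ideal (descending gain) ordering."""
    ideal = dcg(sorted(gains, reverse=True), base)
    return dcg(gains, base) / ideal if ideal > 0 else 0.0


# Two rankings that differ only in the order of their top two documents.
# The gain values are hypothetical relevance grades.
best_first = ndcg([3, 2, 0, 1])
second_first = ndcg([2, 3, 0, 1])

# Under this variant, both rankings receive the same NDCG score.
print(math.isclose(best_first, second_first))
```

Under this assumed variant, swapping the two highest-ranked documents leaves the score unchanged, even though a user would see a different document first. This is only one way in which a log discount can produce ties and is not necessarily the case examined in the thesis.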
Finally, the impact of low value documents in web search results is examined using observed user reaction to inserted spam/irrelevant documents, and a series of explicit judgements on hand-created document pools. It is found that when users are given two documents of equal utility scores, the one with the lower spam score will be preferred; a result list without any spam documents will be preferred to one with spam documents; and an irrelevant document high in a result list is more damaging to user satisfaction than a spam document. These are the preferences that must be rewarded by any believable, reliable procedure for evaluating web spam nullification.