Nullification test collections for web spam and SEO

Jones, Timothy; Sankaranarayana, Ramesh S; Hawking, David; Craswell, Nick

Description

Research in the area of adversarial information retrieval has been facilitated by the availability of the UK-2006/UK-2007 collections, comprising crawl data, link graph, and spam labels. However, research into nullifying the negative effect of spam or excessive search engine optimisation (SEO) on the ranking of non-spam pages is not well supported by these resources. Nor is the study of cloaking techniques or of click spam. Finally, the domain-restricted nature of a .uk crawl means that only...

dc.contributor.author: Jones, Timothy
dc.contributor.author: Sankaranarayana, Ramesh S
dc.contributor.author: Hawking, David
dc.contributor.author: Craswell, Nick
dc.coverage.spatial: Madrid Spain
dc.date.accessioned: 2015-12-10T22:35:18Z
dc.date.created: April 21 2009
dc.identifier.isbn: 9781605584386
dc.identifier.uri: http://hdl.handle.net/1885/56223
dc.description.abstract: Research in the area of adversarial information retrieval has been facilitated by the availability of the UK-2006/UK-2007 collections, comprising crawl data, link graph, and spam labels. However, research into nullifying the negative effect of spam or excessive search engine optimisation (SEO) on the ranking of non-spam pages is not well supported by these resources. Nor is the study of cloaking techniques or of click spam. Finally, the domain-restricted nature of a .uk crawl means that only parts of link-farm icebergs may be visible in these crawls. We introduce the term nullification which we define as "preventing problem pages from negatively affecting search results". We show some important differences between properties of current .uk-restricted crawls and those previously reported for the Web as a whole. We identify a need for an adversarial IR collection which is not domain-restricted and which is supported by a set of appropriate query sets and (optimistically) user-behaviour data. The billion-page unrestricted crawl being conducted by CMU (web09-bst) and which will be used in the 2009 TREC Web Track is assessed as a possible basis for a new AIR test collection. We discuss the pros and cons of its scale, and the feasibility of adding resources such as query lists to enhance the utility of the collection for AIR research.
dc.publisher: Association for Computing Machinery Inc (ACM)
dc.relation.ispartofseries: International Workshop on Adversarial Information Retrieval on the Web (AIRWeb 2009)
dc.source: Proceedings of The 5th International Workshop on Adversarial Information Retrieval on the Web (AIRWeb 2009)
dc.source.uri: http://doi.acm.org/10.1145/1531914.1531927
dc.subject: Keywords: Air research; Evaluation test; Optimisations; Query lists; Search results; Test Collection; Information retrieval; Research; Sea ice; Search engines; Internet evaluation; test collections; web spam
dc.title: Nullification test collections for web spam and SEO
dc.type: Conference paper
local.description.notes: Imported from ARIES
local.description.refereed: Yes
dc.date.issued: 2009
local.identifier.absfor: 080704 - Information Retrieval and Web Search
local.identifier.ariespublication: u8803936xPUB355
local.type.status: Published Version
local.contributor.affiliation: Jones, Timothy, College of Engineering and Computer Science, ANU
local.contributor.affiliation: Sankaranarayana, Ramesh S, College of Engineering and Computer Science, ANU
local.contributor.affiliation: Hawking, David, College of Engineering and Computer Science, ANU
local.contributor.affiliation: Craswell, Nick, Microsoft Research
local.description.embargo: 2037-12-31
local.bibliographicCitation.startpage: 53
local.bibliographicCitation.lastpage: 60
local.identifier.doi: 10.1145/1531914.1531927
dc.date.updated: 2016-02-24T11:44:19Z
local.identifier.scopusID: 2-s2.0-77954447478
Collections: ANU Research Publications

Download

File: 01_Jones_Nullification_test_collections_2009.pdf
Size: 710.58 kB
Format: Adobe PDF


Items in Open Research are protected by copyright, with all rights reserved, unless otherwise indicated.
