Engineering a multi-purpose test collection for Web retrieval experiments

dc.contributor.author: Bailey, Peter
dc.contributor.author: Craswell, Nick
dc.contributor.author: Hawking, David
dc.date.accessioned: 2015-12-13T23:05:21Z
dc.date.available: 2015-12-13T23:05:21Z
dc.date.issued: 2003
dc.date.updated: 2015-12-12T07:59:54Z
dc.description.abstract: Past research into text retrieval methods for the Web has been restricted by the lack of a test collection capable of supporting experiments which are both realistic and reproducible. The 1.69 million document WT10g collection is proposed as a multi-purpose testbed for experiments with these attributes, in distributed IR, hyperlink algorithms and conventional ad hoc retrieval. WT10g was constructed by selecting from a superset of documents in such a way that desirable corpus properties were preserved or optimised. These properties include: a high degree of inter-server connectivity, integrity of server holdings, inclusion of documents related to a very wide spread of likely queries, and a realistic distribution of server holding sizes. We confirm that WT10g contains exploitable link information using a site (homepage) finding experiment. Our results show that, on this task, Okapi BM25 works better on propagated link anchor text than on full text. WT10g was used in TREC-9 and TREC-2000 and both topic relevance and homepage finding queries and judgments are available.
dc.identifier.issn: 0306-4573
dc.identifier.uri: http://hdl.handle.net/1885/85484
dc.publisher: Pergamon Press
dc.source: Information Processing and Management
dc.subject: Algorithms; Information retrieval; Optimization; Query languages; Servers; World Wide Web; Distributed information retrieval; Link-based ranking; Test collections; Web retrieval
dc.title: Engineering a multi-purpose test collection for Web retrieval experiments
dc.type: Journal article
local.bibliographicCitation.issue: 6
local.bibliographicCitation.lastpage: 871
local.bibliographicCitation.startpage: 853
local.contributor.affiliation: Bailey, Peter, College of Engineering and Computer Science, ANU
local.contributor.affiliation: Craswell, Nick, Microsoft Research
local.contributor.affiliation: Hawking, David, National ICT Australia
local.contributor.authoruid: Bailey, Peter, u4026252
local.description.notes: Imported from ARIES
local.description.refereed: Yes
local.identifier.absfor: 080708 - Records and Information Management (excl. Business Records and Information Management)
local.identifier.ariespublication: MigratedxPub13917
local.identifier.citationvolume: 39
local.identifier.doi: 10.1016/S0306-4573(02)00084-5
local.identifier.scopusID: 2-s2.0-0042766369
local.type.status: Published Version
