Engineering a multi-purpose test collection for Web retrieval experiments
| dc.contributor.author | Bailey, Peter | |
| dc.contributor.author | Craswell, Nick | |
| dc.contributor.author | Hawking, David | |
| dc.date.accessioned | 2015-12-13T23:05:21Z | |
| dc.date.available | 2015-12-13T23:05:21Z | |
| dc.date.issued | 2003 | |
| dc.date.updated | 2015-12-12T07:59:54Z | |
| dc.description.abstract | Past research into text retrieval methods for the Web has been restricted by the lack of a test collection capable of supporting experiments which are both realistic and reproducible. The 1.69-million-document WT10g collection is proposed as a multi-purpose testbed for experiments with these attributes, in distributed IR, hyperlink algorithms and conventional ad hoc retrieval. WT10g was constructed by selecting from a superset of documents in such a way that desirable corpus properties were preserved or optimised. These properties include: a high degree of inter-server connectivity, integrity of server holdings, inclusion of documents related to a very wide spread of likely queries, and a realistic distribution of server holding sizes. We confirm that WT10g contains exploitable link information using a site (homepage) finding experiment. Our results show that, on this task, Okapi BM25 works better on propagated link anchor text than on full text. WT10g was used in TREC-9 and TREC-2001, and both topic relevance and homepage finding queries and judgments are available. | |
| dc.identifier.issn | 0306-4573 | |
| dc.identifier.uri | http://hdl.handle.net/1885/85484 | |
| dc.publisher | Pergamon Press | |
| dc.source | Information Processing and Management | |
| dc.subject | Keywords: Algorithms; Information retrieval; Optimization; Query languages; Servers; Link-based ranking; World Wide Web; Distributed information retrieval; Test collections; Web retrieval | |
| dc.title | Engineering a multi-purpose test collection for Web retrieval experiments | |
| dc.type | Journal article | |
| local.bibliographicCitation.issue | 6 | |
| local.bibliographicCitation.lastpage | 871 | |
| local.bibliographicCitation.startpage | 853 | |
| local.contributor.affiliation | Bailey, Peter, College of Engineering and Computer Science, ANU | |
| local.contributor.affiliation | Craswell, Nick, Microsoft Research | |
| local.contributor.affiliation | Hawking, David, National ICT Australia | |
| local.contributor.authoruid | Bailey, Peter, u4026252 | |
| local.description.notes | Imported from ARIES | |
| local.description.refereed | Yes | |
| local.identifier.absfor | 080708 - Records and Information Management (excl. Business Records and Information Management) | |
| local.identifier.ariespublication | MigratedxPub13917 | |
| local.identifier.citationvolume | 39 | |
| local.identifier.doi | 10.1016/S0306-4573(02)00084-5 | |
| local.identifier.scopusID | 2-s2.0-0042766369 | |
| local.type.status | Published Version | |