
Server selection methods in personal metasearch: A comparative empirical study

Thomas, Paul; Hawking, David

dc.contributor.author: Thomas, Paul
dc.contributor.author: Hawking, David
dc.date.accessioned: 2015-12-10T22:35:21Z
dc.identifier.issn: 1386-4564
dc.identifier.uri: http://hdl.handle.net/1885/56254
dc.description.abstract: Server selection is an important subproblem in distributed information retrieval (DIR) but has commonly been studied with collections of more or less uniform size and with more or less homogeneous content. In contrast, realistic DIR applications may feature much more varied collections. In particular, personal metasearch, a novel application of DIR which includes all of a user's online resources, may involve collections which vary in size by several orders of magnitude, and which have highly varied data. We describe a number of algorithms for server selection, and consider their effectiveness when collections vary widely in size and are represented by imperfect samples. We compare the algorithms on a personal metasearch testbed comprising calendar, email, mailing list and web collections, where collection sizes differ by three orders of magnitude. We then explore the effect of collection size variations using four partitionings of the TREC ad hoc data used in many other DIR experiments. Kullback-Leibler divergence, previously considered poorly effective, performs better than expected in this application; other techniques thought to be effective perform poorly and are not appropriate for this problem. A strong correlation with size-based rankings for many techniques may be responsible.
dc.publisher: Kluwer Academic Publishers
dc.source: Information Retrieval
dc.subject: Keywords: Distributed information retrieval; Server selection
dc.title: Server selection methods in personal metasearch: A comparative empirical study
dc.type: Journal article
local.description.notes: Imported from ARIES
local.identifier.citationvolume: 12
dc.date.issued: 2009
local.identifier.absfor: 080704 - Information Retrieval and Web Search
local.identifier.ariespublication: u8803936xPUB356
local.type.status: Published Version
local.contributor.affiliation: Thomas, Paul, College of Engineering and Computer Science, ANU
local.contributor.affiliation: Hawking, David, College of Engineering and Computer Science, ANU
local.description.embargo: 2037-12-31
local.bibliographicCitation.issue: 5
local.bibliographicCitation.startpage: 581
local.bibliographicCitation.lastpage: 604
local.identifier.doi: 10.1007/s10791-009-9094-z
dc.date.updated: 2016-02-24T11:44:20Z
local.identifier.scopusID: 2-s2.0-78049307287
local.identifier.thomsonID: 000269320200004
Collections: ANU Research Publications

Download

File: 01_Thomas_Server_selection_methods_in_2009.pdf
Size: 544.76 kB
Format: Adobe PDF


Items in Open Research are protected by copyright, with all rights reserved, unless otherwise indicated.