Server selection methods in personal metasearch: A comparative empirical study

Thomas, Paul; Hawking, David

doi:10.1007/s10791-009-9094-z

Description

Server selection is an important subproblem in distributed information retrieval (DIR) but has commonly been studied with collections of more or less uniform size and with more or less homogeneous content. In contrast, realistic DIR applications may feature much more varied collections. In particular, personal metasearch, a novel application of DIR which includes all of a user's online resources, may involve collections which vary in size by several orders of magnitude, and which have highly varied data. We describe a number of algorithms for server selection, and consider their effectiveness when collections vary widely in size and are represented by imperfect samples. We compare the algorithms on a personal metasearch testbed comprising calendar, email, mailing list and web collections, where collection sizes differ by three orders of magnitude. We then explore the effect of collection size variations using four partitionings of the TREC ad hoc data used in many other DIR experiments. Kullback-Leibler divergence, previously considered poorly effective, performs better than expected in this application; other techniques thought to be effective perform poorly and are not appropriate for this problem. A strong correlation with size-based rankings for many techniques may be responsible.
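
To make the idea of Kullback-Leibler server selection concrete, the following is a minimal sketch, not the formulation used in the paper. It assumes each collection is represented by a small sample of tokenised documents, builds a smoothed unigram language model per collection, and ranks collections by the divergence of the query model from each collection model (lower is better). The function names, the mixing weight `lam`, the add-one background smoothing and the toy samples are all assumptions made for illustration.

```python
import math
from collections import Counter


def term_counts(docs):
    """Count term occurrences across a sample of tokenised documents."""
    counts = Counter()
    for doc in docs:
        counts.update(doc)
    return counts


def kl_divergence(query, coll, background, lam=0.5):
    """KL divergence of the query model from a smoothed collection model.

    The collection model is mixed with a background model built from all
    samples; `lam` is an illustrative mixing weight, not a tuned value.
    """
    q_counts = Counter(query)
    q_total = len(query)
    coll_total = sum(coll.values())
    bg_total = sum(background.values())
    vocab = len(background)
    divergence = 0.0
    for term, count in q_counts.items():
        p_query = count / q_total
        p_coll = coll[term] / coll_total if coll_total else 0.0
        # Add-one smoothing keeps the background probability above zero.
        p_bg = (background[term] + 1) / (bg_total + vocab + 1)
        p_mixed = lam * p_coll + (1 - lam) * p_bg
        divergence += p_query * math.log(p_query / p_mixed)
    return divergence


def rank_servers(query, samples, lam=0.5):
    """Return collection names ordered from lowest to highest divergence."""
    models = {name: term_counts(docs) for name, docs in samples.items()}
    background = Counter()
    for model in models.values():
        background.update(model)
    return sorted(models, key=lambda n: kl_divergence(query, models[n], background, lam))


# Hypothetical samples standing in for calendar, email and web collections.
samples = {
    "calendar": [["meeting", "monday", "room"], ["lunch", "friday"]],
    "email": [["project", "report", "deadline"], ["meeting", "report"]],
    "web": [["news", "weather", "sport"]],
}
ranking = rank_servers(["report", "deadline"], samples)
```

Because each score here depends only on term proportions within a sample, it does not directly favour larger collections; whether that property explains the behaviour reported in the abstract is not something this sketch can show.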

dc.contributor.author	Thomas, Paul
dc.contributor.author	Hawking, David
dc.date.accessioned	2015-12-10T22:35:21Z
dc.identifier.issn	1386-4564
dc.identifier.uri	http://hdl.handle.net/1885/56254
dc.publisher	Kluwer Academic Publishers
dc.source	Information Retrieval
dc.subject	Keywords: Distributed information retrieval; Server selection
dc.title	Server selection methods in personal metasearch: A comparative empirical study
dc.type	Journal article
local.description.notes	Imported from ARIES
local.identifier.citationvolume	12
dc.date.issued	2009
local.identifier.absfor	080704 - Information Retrieval and Web Search
local.identifier.ariespublication	u8803936xPUB356
local.type.status	Published Version
local.contributor.affiliation	Thomas, Paul, College of Engineering and Computer Science, ANU
local.contributor.affiliation	Hawking, David, College of Engineering and Computer Science, ANU
local.description.embargo	2037-12-31
local.bibliographicCitation.issue	5
local.bibliographicCitation.startpage	581
local.bibliographicCitation.lastpage	604
local.identifier.doi	10.1007/s10791-009-9094-z
dc.date.updated	2016-02-24T11:44:20Z
local.identifier.scopusID	2-s2.0-78049307287
local.identifier.thomsonID	000269320200004
Collections	ANU Research Publications