Methods for Distributed Information Retrieval
Abstract
Published methods for distributed information retrieval generally rely on cooperation from search servers. But most real servers, particularly the tens of thousands available on the Web, are not engineered for such cooperation. This means that the majority of the methods proposed, and evaluated in simulated environments of homogeneous, cooperating servers, are never applied in practice.
This thesis introduces new methods for server selection and results merging. The methods do not require search servers to cooperate, yet are as effective as the best methods that do. Two large experiments evaluate the new methods against many previously published methods. In contrast to previous experiments, they simulate a Web-like environment, where servers employ varied retrieval algorithms and tend not to sub-partition documents from a single source. ...