A comparison of lookahead and algorithmic blocking techniques for parallel matrix factorization
Date
1998
Authors
Strazdins, Peter
Journal Title
Journal ISSN
Volume Title
Publisher
Abstract
In this paper, we analyse and compare the techniques of algorithmic blocking and (storage blocking with) lookahead for distributed memory LU, LLT and QR factorizations. Concepts and some useful properties of a simplified model of lookahead are explored, including the minimal degree of lookahead required for optimal performance. Issues in the implementation of lookahead are discussed, which are more involved for the cases of LLT and QR factorizations. It is also explained how hybrid algorithmic blocking and lookahead techniques can be implemented. Implications for parallel linear algebra library design are also discussed. Results are given on the Fujitsu AP1000 and AP+ multicomputers, which have relatively high communication to computation to speeds. The results indicate that both methods are superior to storage blocking (without lookahead). They also indicate that for such machines, the hybrid method is optimal for smaller matrices, due to savings in communication startups. For larger matrices, algorithmic blocking gave the best performance, due to its better load balancing properties. An exception was LLT for the AP+, where lookahead alone gave comparable or better performance for larger matrix sizes as well. Performance models, predicting the minimum matrix size where lookahead becomes effective, indicate this trend can be expected for machines with lower communication to computation speeds, but that the range for where lookahead is superior is extended.
Description
Keywords
dense linear algebra, block cyclic decomposition, storage blocking, algorithmic blocking, pipelining, lookahead, TR-CS
Citation
Collections
Source
Type
Working/Technical Paper
Book Title
Entity type
Access Statement
License Rights
DOI
Restricted until
Downloads
File
Description