A comparison of lookahead and algorithmic blocking techniques for parallel matrix factorization

Date

1998

Authors

Strazdins, Peter

Journal Title

Journal ISSN

Volume Title

Publisher

Abstract

In this paper, we analyse and compare the techniques of algorithmic blocking and (storage blocking with) lookahead for distributed memory LU, LLT and QR factorizations. Concepts and some useful properties of a simplified model of lookahead are explored, including the minimal degree of lookahead required for optimal performance. Issues in the implementation of lookahead are discussed, which are more involved for the cases of LLT and QR factorizations. It is also explained how hybrid algorithmic blocking and lookahead techniques can be implemented. Implications for parallel linear algebra library design are also discussed. Results are given on the Fujitsu AP1000 and AP+ multicomputers, which have relatively high communication to computation to speeds. The results indicate that both methods are superior to storage blocking (without lookahead). They also indicate that for such machines, the hybrid method is optimal for smaller matrices, due to savings in communication startups. For larger matrices, algorithmic blocking gave the best performance, due to its better load balancing properties. An exception was LLT for the AP+, where lookahead alone gave comparable or better performance for larger matrix sizes as well. Performance models, predicting the minimum matrix size where lookahead becomes effective, indicate this trend can be expected for machines with lower communication to computation speeds, but that the range for where lookahead is superior is extended.

Description

Keywords

dense linear algebra, block cyclic decomposition, storage blocking, algorithmic blocking, pipelining, lookahead, TR-CS

Citation

Source

Type

Working/Technical Paper

Book Title

Entity type

Access Statement

License Rights

DOI

Restricted until

Downloads