Implementation of the BLAS Level 3 and LINPACK benchmark on the AP1000

Authors

Brent, Richard P
Strazdins, Peter

Abstract

This paper describes an implementation of Level 3 of the Basic Linear Algebra Subprograms (BLAS-3) library and the LINPACK Benchmark on the Fujitsu AP1000. The performance of these applications is regarded as important for distributed memory architectures such as the AP1000. We discuss the techniques involved in optimizing these applications without significantly sacrificing numerical stability. Many of these techniques may also be applied to other numerical applications. They include the use of software pipelining and loop unrolling to optimize scalar processor computation, the utilization of fast communication primitives on the AP1000 (particularly row and column broadcasting using wormhole routing), blocking and partitioning methods, and "fast" algorithms (which use a reduced number of floating-point operations). These techniques enable a performance of 85-90% of the AP1000's theoretical peak speed for the BLAS Level 3 procedures and up to 80% for the LINPACK Benchmark.
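
As a rough illustration of two of the techniques named in the abstract, blocking and loop unrolling, the following C sketch computes a matrix product on a single processor. It is not the authors' AP1000 code: the function names, the tile size `BLOCK`, and the unroll factor of 4 are all assumptions chosen for illustration, and the sketch does not attempt the communication, software pipelining, or "fast" algorithm aspects.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tile size; a real implementation would tune it to the cache. */
#define BLOCK 32

/* Reference version: C += A * B for n x n row-major matrices. */
void matmul_naive(size_t n, const double *a, const double *b, double *c) {
    assert(a && b && c);
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < n; j++) {
            double sum = 0.0;
            for (size_t k = 0; k < n; k++)
                sum += a[i * n + k] * b[k * n + j];
            c[i * n + j] += sum;
        }
    }
}

static size_t min_sz(size_t x, size_t y) { return x < y ? x : y; }

/* Blocked version: C += A * B, processed one BLOCK x BLOCK tile at a time
 * so that the working set of each tile product stays small. The innermost
 * loop over j is unrolled by 4, with a cleanup loop for the remainder. */
void matmul_blocked(size_t n, const double *a, const double *b, double *c) {
    assert(a && b && c);
    for (size_t ii = 0; ii < n; ii += BLOCK) {
        size_t i_end = min_sz(ii + BLOCK, n);
        for (size_t kk = 0; kk < n; kk += BLOCK) {
            size_t k_end = min_sz(kk + BLOCK, n);
            for (size_t jj = 0; jj < n; jj += BLOCK) {
                size_t j_end = min_sz(jj + BLOCK, n);
                for (size_t i = ii; i < i_end; i++) {
                    double *crow = c + i * n;
                    for (size_t k = kk; k < k_end; k++) {
                        double aik = a[i * n + k];
                        const double *brow = b + k * n;
                        size_t j = jj;
                        /* Unrolled body: four independent updates per trip. */
                        for (; j + 4 <= j_end; j += 4) {
                            crow[j]     += aik * brow[j];
                            crow[j + 1] += aik * brow[j + 1];
                            crow[j + 2] += aik * brow[j + 2];
                            crow[j + 3] += aik * brow[j + 3];
                        }
                        /* Cleanup for tile widths not divisible by 4. */
                        for (; j < j_end; j++)
                            crow[j] += aik * brow[j];
                    }
                }
            }
        }
    }
}
```

Both functions accumulate into `c`, so the caller is expected to zero it first when a plain product is wanted. The blocked version sums terms in a different order from the reference, so on general floating-point data the two results may differ by rounding error.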
