Implementation of the BLAS Level 3 and LINPACK benchmark on the AP1000

Brent, Richard P; Strazdins, Peter

Description

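This paper describes an implementation of Level 3 of the Basic Linear Algebra Sub-program (BLAS-3) library and the LINPACK Benchmark on the Fujitsu AP1000. The performance of these applications is regarded as important for distributed memory architectures such as the AP1000. We discuss the techniques involved in optimizing these applications without significantly sacrificing numerical stability. Many of these techniques may also be applied to other numerical applications. They include the use of software pipelining and loop unrolling to optimize scalar processor computation, the utilization of fast communication primitives on the AP1000 (particularly row and column broadcasting using wormhole routing), blocking and partitioning methods, and 'fast' algorithms (using reduced floating point operations). These techniques enable a performance of 85-90% of the AP1000's theoretical peak speed for the BLAS Level 3 procedures and up to 80% for the LINPACK benchmark.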
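Blocking is one of the techniques the abstract names. As a rough illustration only, the sketch below multiplies two matrices one tile at a time, so that each small sub-block is reused while it is still close to the processor. It is plain Python, not code from the report and not AP1000 code; the function name `matmul_blocked` and the `block` parameter are hypothetical names chosen for this example.

```python
def matmul_blocked(a, b, block=2):
    """Return C = A * B for list-of-lists matrices, computed tile by tile.

    Illustrative sketch of blocking; names and tile size are assumptions,
    not taken from the report.
    """
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    for i0 in range(0, n, block):          # tile rows of C
        for j0 in range(0, m, block):      # tile columns of C
            for p0 in range(0, k, block):  # tiles along the shared dimension
                # Accumulate the contribution of one A tile times one B tile.
                for i in range(i0, min(i0 + block, n)):
                    for j in range(j0, min(j0 + block, m)):
                        s = c[i][j]
                        for p in range(p0, min(p0 + block, k)):
                            s += a[i][p] * b[p][j]
                        c[i][j] = s
    return c


if __name__ == "__main__":
    a = [[1, 2, 3], [4, 5, 6]]
    b = [[7, 8], [9, 10], [11, 12]]
    print(matmul_blocked(a, b, block=2))
```

The result does not depend on the tile size; only the order in which the partial products are accumulated changes. In a tuned implementation the tile size would be chosen to suit the cache or local memory of the target processor.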

dc.contributor.author: Brent, Richard P
dc.contributor.author: Strazdins, Peter
dc.date.accessioned: 2003-07-11
dc.date.accessioned: 2004-05-19T12:57:01Z
dc.date.accessioned: 2011-01-05T08:43:41Z
dc.date.available: 2004-05-19T12:57:01Z
dc.date.available: 2011-01-05T08:43:41Z
dc.date.created: 1992
dc.identifier.uri: http://hdl.handle.net/1885/40796
dc.identifier.uri: http://digitalcollections.anu.edu.au/handle/1885/40796
dc.description.abstract: This paper describes an implementation of Level 3 of the Basic Linear Algebra Sub-program (BLAS-3) library and the LINPACK Benchmark on the Fujitsu AP1000. The performance of these applications is regarded as important for distributed memory architectures such as the AP1000. We discuss the techniques involved in optimizing these applications without significantly sacrificing numerical stability. Many of these techniques may also be applied to other numerical applications. They include the use of software pipelining and loop unrolling to optimize scalar processor computation, the utilization of fast communication primitives on the AP1000 (particularly row and column broadcasting using wormhole routing), blocking and partitioning methods, and 'fast' algorithms (using reduced floating point operations). These techniques enable a performance of 85-90% of the AP1000's theoretical peak speed for the BLAS Level 3 procedures and up to 80% for the LINPACK benchmark.
dc.format.extent: 221893 bytes
dc.format.extent: 356 bytes
dc.format.mimetype: application/pdf
dc.format.mimetype: application/octet-stream
dc.language.iso: en_AU
dc.subject: BLAS
dc.subject: LINPACK benchmark
dc.subject: SPARC processors
dc.subject: implementation
dc.subject: AP1000
dc.subject: BLAS-3 triangular matrix updates
dc.title: Implementation of the BLAS Level 3 and LINPACK benchmark on the AP1000
dc.type: Working/Technical Paper
local.description.refereed: no
local.identifier.citationmonth: oct
local.identifier.citationyear: 1992
local.identifier.eprintid: 1648
local.rights.ispublished: yes
dc.date.issued: 1992
local.contributor.affiliation: ANU
local.contributor.affiliation: Department of Computer Science, FEIT
local.citation: TR-CS-92-14
Collections: ANU Research Publications

Download

File: TR-CS-92-14.pdf
Size: 216.69 kB
Format: Adobe PDF


Items in Open Research are protected by copyright, with all rights reserved, unless otherwise indicated.

Updated: 22 January 2019 / Responsible Officer: University Librarian / Page Contact: Library Systems & Web Coordinator