Real and complex fast Fourier transforms on the Fujitsu VPP 500
Date
1994
Authors
Hegland, Markus
Abstract
Fast Fourier transforms parallelize well but need large amounts of communication. The transpose split algorithm concentrates all of the communication in one or two transposition steps. Different transposition algorithms can be used depending on data size and communication latency. A new transpose split algorithm for real and Hermitian data is presented for one-, two- and three-dimensional transforms. This algorithm is implemented on the Fujitsu VPP 500, a parallel processor with a moderate number of very fast vector processors connected by a crossbar switch. Each processor has a peak performance of 1.6 Gflop/s and can simultaneously read and write 400 MByte/s. Very long vector length, stride-one implementations of multiple FFTs on one node [Hegland, Numerische Mathematik, to appear, 1994] are combined with optimized transpositions. One third of peak performance was achieved on configurations with up to 11 processors.
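The transpose split idea described in the abstract can be illustrated with a minimal serial sketch for the two-dimensional complex case. It is not the paper's real/Hermitian algorithm and not a parallel implementation; the function names (`fft`, `transpose`, `fft2_transpose_split`, `dft2_naive`) are hypothetical and chosen for illustration only. The sketch assumes power-of-two dimensions and treats each row as the data local to one processor, so that the transposition is the only step that would require communication.

```python
import cmath

def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def transpose(a):
    """Global transposition: the step that would need communication."""
    return [list(col) for col in zip(*a)]

def fft2_transpose_split(a):
    """2D FFT as local row FFTs, a transposition, and local row FFTs again."""
    stage1 = [fft(row) for row in a]        # local: transform along the second index
    stage2 = transpose(stage1)              # communication: redistribute the data
    stage3 = [fft(row) for row in stage2]   # local: transform along the first index
    return transpose(stage3)                # restore the original layout

def dft2_naive(a):
    """Direct 2D DFT from the definition, used only as a reference."""
    n1, n2 = len(a), len(a[0])
    return [[sum(a[j1][j2]
                 * cmath.exp(-2j * cmath.pi * (j1 * k1 / n1 + j2 * k2 / n2))
                 for j1 in range(n1) for j2 in range(n2))
             for k2 in range(n2)]
            for k1 in range(n1)]
```

In this sketch the second transposition only restores the original data layout; if a transposed result is acceptable it can be dropped, which is one way to read the abstract's "one or two transposition steps".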
Keywords
fast vector processors, transposition algorithms, dimensional transforms
Type
Working/Technical Paper