A Machine Learning Approach Towards Runtime Optimisation of Matrix Multiplication

Authors

Xia, Yufan
De La Pierre, Marco
Barnard, Amanda S.
Barca, Giuseppe Maria Junior

Abstract

The GEneral Matrix Multiplication (GEMM) is one of the essential algorithms in scientific computing. Single-thread GEMM implementations are well optimised with techniques like blocking and autotuning. However, due to the complexity of modern multi-core shared-memory systems, it is challenging to determine the number of threads that minimises multi-thread GEMM runtime. We present a proof-of-concept approach to building an Architecture and Data-Structure Aware Linear Algebra (ADSALA) software library that uses machine learning to optimise the runtime performance of BLAS routines. More specifically, our method uses a machine learning model, trained on collected benchmark data, to select on the fly the optimal number of threads for a given GEMM task. Tests on two HPC node architectures, one based on a two-socket Intel Cascade Lake and the other on a two-socket AMD Zen 3, revealed a 25 to 40 per cent speedup over traditional BLAS GEMM implementations for GEMM tasks with a memory footprint within 100 MB.
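For illustration only, the selection step described in the abstract could look like the following Python sketch. This is not the ADSALA implementation: the training data, the choice of regressor, and the helper names (model, gemm) are all assumptions. The idea is to learn, offline, a mapping from GEMM dimensions (m, n, k) to the fastest observed thread count, then cap the BLAS thread pool accordingly at call time.

```python
# Hypothetical sketch of ML-guided thread selection for GEMM.
# Not the ADSALA API; names and data below are illustrative.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from threadpoolctl import threadpool_limits

# Offline phase (assumed): benchmark GEMM at several shapes and thread
# counts, keeping the fastest thread count observed for each (m, n, k).
X_train = np.array([[256, 256, 256],
                    [1024, 1024, 1024],
                    [4096, 4096, 4096]])
y_train = np.array([2, 8, 16])  # fastest thread count per shape (made up)

model = RandomForestRegressor(n_estimators=50).fit(X_train, y_train)

def gemm(A, B):
    """Compute C = A @ B with the predicted-optimal BLAS thread count."""
    m, k = A.shape
    _, n = B.shape
    n_threads = max(1, int(round(model.predict([[m, n, k]])[0])))
    with threadpool_limits(limits=n_threads):  # cap the BLAS thread pool
        return A @ B  # dispatches to the underlying BLAS GEMM
```

In practice the model would be trained on a dense sweep of shapes per target architecture, which is what makes the approach architecture-aware: the same lookup code yields different thread counts on the Cascade Lake and Zen 3 nodes.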

Source

CoRR

Entity type

Publication
