[flang] add fused matmul-transpose to the runtime
author Tom Eccles <tom.eccles@arm.com>
Fri, 17 Mar 2023 09:26:39 +0000 (09:26 +0000)
committer Tom Eccles <tom.eccles@arm.com>
Fri, 17 Mar 2023 09:30:04 +0000 (09:30 +0000)
commit 4ff8ba72b58328ebf6e8eb8c10a428eece73c89f
tree 51584a545a6a528e6b18c7a6f8607985670fed56
parent a351a60ebae456735ec32808f311a6e9cf5e751e

This fused operation should run considerably faster than transposing the
lhs array first and then multiplying the matrices in a separate step,
since it avoids materializing the transposed temporary.
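
To illustrate the idea, here is a minimal sketch of a fused
transpose-multiply on plain column-major buffers. It is not the code
added by this patch: the function name matmulTransposeSketch, the use of
std::vector<double>, and the explicit m/k/n extents are assumptions made
for the example, whereas the real runtime entry point works on Fortran
descriptors and supports many type combinations. The sketch only shows
how TRANSPOSE(A) can be read through swapped indices instead of being
copied into a temporary.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper for illustration only; not the flang runtime API.
// All matrices are stored column-major, as in Fortran.
//   a is k x m, so TRANSPOSE(a) is m x k
//   b is k x n
//   the result is m x n and equals MATMUL(TRANSPOSE(a), b)
std::vector<double> matmulTransposeSketch(const std::vector<double> &a,
    const std::vector<double> &b, std::size_t m, std::size_t k,
    std::size_t n) {
  std::vector<double> c(m * n, 0.0);
  for (std::size_t j = 0; j < n; ++j) {
    for (std::size_t i = 0; i < m; ++i) {
      double sum = 0.0;
      // TRANSPOSE(a)(i, l) is a(l, i), so row i of the transpose is
      // column i of a, which is contiguous in column-major storage.
      for (std::size_t l = 0; l < k; ++l) {
        sum += a[l + i * k] * b[l + j * k];
      }
      c[i + j * m] = sum;
    }
  }
  return c;
}
```

In this sketch the inner loop walks a column of a and a column of b, both
with unit stride, and no transposed copy of a is ever allocated.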

Based on flang/runtime/matmul.cpp

Depends on D145959

Reviewed By: klausler

Differential Revision: https://reviews.llvm.org/D145960
flang/include/flang/Runtime/matmul-transpose.h [new file with mode: 0644]
flang/runtime/CMakeLists.txt
flang/runtime/matmul-transpose.cpp [new file with mode: 0644]
flang/unittests/Runtime/CMakeLists.txt
flang/unittests/Runtime/MatmulTranspose.cpp [new file with mode: 0644]