[flang] Lower TRANSPOSE without using runtime.
author Slava Zakharin <szakharin@nvidia.com>
Fri, 8 Jul 2022 22:10:33 +0000 (15:10 -0700)
committer Slava Zakharin <szakharin@nvidia.com>
Tue, 12 Jul 2022 15:33:39 +0000 (08:33 -0700)
commit a280043b523182ab6bb3ce5caf75e931a26eaf3f
tree e0cc623d4a131b59f3fc5e748b79f3ecf62a3e5c
parent d6ef3d20b4e3768dc30fb229dfa938d8059fffef

Calling the runtime TRANSPOSE routine requires a temporary array for the
result and, sometimes, a temporary array for the argument. Lowering it
inline should produce faster code.
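
A rough illustration of the difference, written as plain C++ rather than
as the FIR that the lowering actually emits. All names here
(transposeViaTemporary, assignRuntimeStyle, assignInline) are
hypothetical and are not flang or runtime APIs. The sketch assumes
column-major storage and that the destination does not alias the
argument.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Column-major (Fortran-order) offset of element (i, j) in a matrix
// that has `rows` rows.
inline std::size_t idx(std::size_t i, std::size_t j, std::size_t rows) {
  return i + j * rows;
}

// Hypothetical stand-in for a runtime call: the transposed result is
// produced in a freshly allocated temporary.
std::vector<double> transposeViaTemporary(const std::vector<double> &a,
                                          std::size_t rows,
                                          std::size_t cols) {
  assert(a.size() == rows * cols);
  std::vector<double> tmp(rows * cols);
  for (std::size_t j = 0; j < cols; ++j)
    for (std::size_t i = 0; i < rows; ++i)
      tmp[idx(j, i, cols)] = a[idx(i, j, rows)];
  return tmp;
}

// b = TRANSPOSE(a), runtime style: build the temporary, then copy it
// into the destination.
void assignRuntimeStyle(const std::vector<double> &a, std::size_t rows,
                        std::size_t cols, std::vector<double> &b) {
  std::vector<double> tmp = transposeViaTemporary(a, rows, cols);
  b = tmp; // extra pass over the data
}

// b = TRANSPOSE(a), inline style: a single loop nest writes each
// element straight into the destination, with no temporary.
// Assumes `b` and `a` are distinct objects.
void assignInline(const std::vector<double> &a, std::size_t rows,
                  std::size_t cols, std::vector<double> &b) {
  assert(a.size() == rows * cols);
  b.resize(rows * cols);
  for (std::size_t j = 0; j < cols; ++j)
    for (std::size_t i = 0; i < rows; ++i)
      b[idx(j, i, cols)] = a[idx(i, j, rows)];
}
```

Both functions produce the same values; the inline form simply skips the
allocation and the second copy.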

I added a temporary -opt-transpose control, for debugging purposes only.
I am going to make driver changes that will disable inline lowering
at -O0. For the time being, I would like to enable it by default
to expose the code to more tests.

Differential Revision: https://reviews.llvm.org/D129497
flang/lib/Lower/ConvertExpr.cpp
flang/test/Lower/Intrinsics/transpose.f90
flang/test/Lower/Intrinsics/transpose_opt.f90 [new file with mode: 0644]