[AArch64] Add patterns for scalar FMUL, FMULX
author OverMighty <its.overmighty@gmail.com>
Fri, 30 Jun 2023 07:34:20 +0000 (08:34 +0100)
committer David Green <david.green@arm.com>
Fri, 30 Jun 2023 07:34:20 +0000 (08:34 +0100)
commit ea045b99da8ee236076fddb256bdac98681441fa
tree 95b30b7a5180dad14c02064a5540543fb30d98d2
parent 0446bfcc5ca206701b511796ed1c8316daa2d169

Scalar FMUL and FMULX instructions perform at least as well as their indexed
(by-element) counterparts.

For example, the Arm Cortex-A55 Software Optimization Guide lists the following
instructions with a throughput of 2 IPC:
 - "FP multiply" FMUL
 - "ASIMD FP multiply" FMULX

whereas it lists the following with a throughput of 1 IPC:
 - "ASIMD FP multiply, by element" FMUL, FMULX

The Arm Cortex-A510 Software Optimization Guide, however, does not separately
list "by element" variants of the "ASIMD FP multiply" instructions, which are
listed with the same throughput as the non-ASIMD ones.
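
To put the Cortex-A55 figures quoted above in perspective, the following is a
rough sketch of the arithmetic, not a performance model. It assumes a stream of
independent multiplies that is limited only by issue throughput (latency,
dependencies and other instructions are ignored). The helper name
`min_issue_cycles` and the dictionary keys are hypothetical and used only for
illustration.

```python
import math

# Throughput figures (instructions per cycle) as quoted above from the
# Arm Cortex-A55 Software Optimization Guide.
A55_IPC = {
    "scalar": 2,      # "FP multiply" FMUL / "ASIMD FP multiply" FMULX
    "by_element": 1,  # "ASIMD FP multiply, by element" FMUL, FMULX
}


def min_issue_cycles(n_multiplies: int, ipc: int) -> int:
    """Lower bound on cycles needed to issue n independent multiplies.

    Hypothetical helper: assumes the stream is purely throughput-bound.
    """
    return math.ceil(n_multiplies / ipc)


if __name__ == "__main__":
    n = 1000
    for form, ipc in A55_IPC.items():
        cycles = min_issue_cycles(n, ipc)
        print(f"{form}: at least {cycles} cycles for {n} multiplies")
```

Under these assumptions the by-element form needs twice as many issue cycles as
the scalar form on Cortex-A55, while on a core that lists both forms with the
same throughput the two bounds coincide.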

Fixes #60817.

Differential Revision: https://reviews.llvm.org/D153207
llvm/lib/Target/AArch64/AArch64InstrFormats.td
llvm/lib/Target/AArch64/AArch64InstrInfo.td
llvm/test/CodeGen/AArch64/arm64-fma-combines.ll
llvm/test/CodeGen/AArch64/arm64-fml-combines.ll
llvm/test/CodeGen/AArch64/arm64-neon-2velem.ll
llvm/test/CodeGen/AArch64/arm64-neon-scalar-by-elem-mul.ll
llvm/test/CodeGen/AArch64/complex-deinterleaving-f16-mul.ll
llvm/test/CodeGen/AArch64/fp16_intrinsic_lane.ll
llvm/test/CodeGen/AArch64/vecreduce-fmul-legalization-strict.ll