review.tizen.org Git - platform/upstream/llvm.git/commit

author	OverMighty <its.overmighty@gmail.com>
	Fri, 30 Jun 2023 07:34:20 +0000 (08:34 +0100)
committer	David Green <david.green@arm.com>
	Fri, 30 Jun 2023 07:34:20 +0000 (08:34 +0100)
commit	ea045b99da8ee236076fddb256bdac98681441fa
tree	95b30b7a5180dad14c02064a5540543fb30d98d2
parent	0446bfcc5ca206701b511796ed1c8316daa2d169

[AArch64] Add patterns for scalar FMUL, FMULX

Scalar FMUL and FMULX instructions perform as well as or better than their
indexed (by-element) forms.

For example, the Arm Cortex-A55 Software Optimization Guide lists the following
instructions with a throughput of 2 IPC:
- "FP multiply" FMUL
- "ASIMD FP multiply" FMULX

whereas it lists the following with a throughput of 1 IPC:
- "ASIMD FP multiply, by element" FMUL, FMULX
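
As a rough illustration of what the two throughput figures above imply, the
sketch below computes a lower bound on the cycles needed to issue a batch of
independent multiplies. The helper name min_cycles and the batch size N are
hypothetical and not part of the patch. The bound ignores latency, data
dependencies, and every other pipeline constraint, so it only reflects the
IPC numbers quoted from the guide.

```python
import math


def min_cycles(num_instructions: int, ipc: int) -> int:
    """Lower bound on cycles to issue independent instructions at `ipc` per cycle."""
    return math.ceil(num_instructions / ipc)


N = 1000  # hypothetical number of independent multiplies

# "FP multiply" FMUL and "ASIMD FP multiply" FMULX: 2 IPC on Cortex-A55
scalar_cycles = min_cycles(N, 2)

# "ASIMD FP multiply, by element" FMUL, FMULX: 1 IPC on Cortex-A55
indexed_cycles = min_cycles(N, 1)

print(scalar_cycles, indexed_cycles)  # prints: 500 1000
```

Under these assumptions the by-element forms need at least twice as many issue
cycles, which is the gap the scalar patterns aim to avoid where a scalar form
is applicable.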

The Arm Cortex-A510 Software Optimization Guide, however, does not separately
list "by element" variants of the "ASIMD FP multiply" instructions, which are
listed with the same throughput as the non-ASIMD ones.

Fixes #60817.

Differential Revision: https://reviews.llvm.org/D153207

llvm/lib/Target/AArch64/AArch64InstrFormats.td
llvm/lib/Target/AArch64/AArch64InstrInfo.td
llvm/test/CodeGen/AArch64/arm64-fma-combines.ll
llvm/test/CodeGen/AArch64/arm64-fml-combines.ll
llvm/test/CodeGen/AArch64/arm64-neon-2velem.ll
llvm/test/CodeGen/AArch64/arm64-neon-scalar-by-elem-mul.ll
llvm/test/CodeGen/AArch64/complex-deinterleaving-f16-mul.ll
llvm/test/CodeGen/AArch64/fp16_intrinsic_lane.ll
llvm/test/CodeGen/AArch64/vecreduce-fmul-legalization-strict.ll