[X86] Add inst fixup for `unpckpd` -> `unpckqdq`.
author Noah Goldstein <goldstein.w.n@gmail.com>
Mon, 10 Apr 2023 00:23:26 +0000 (19:23 -0500)
committer Noah Goldstein <goldstein.w.n@gmail.com>
Mon, 10 Apr 2023 05:16:57 +0000 (00:16 -0500)
commit c3f01f13b10d708b9b7ff45a6ccc2f0c3462b3af
tree cb63181bfc8242f4dedeb6e0ac132f21efb88bb5
parent 2ce1698a343c599910bceed399ca7020816b230e

`unpckqdq` seems to be treated as a shuffle from a bypass-delay
perspective (which makes sense, as it appears to share the shuffle
units on all micro-architectures).

`unpckqdq` is slightly preferable to `shufpd` as it saves 1 byte of
code size and can also be used to replace the micro-fused `rm` version.
So, if the target has no bypass delay, we should do `unpckpd` ->
`unpckqdq` instead of `shufpd`.
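
The replacement is only valid because the three forms select the same
64-bit lanes. Below is a minimal, portable sketch of that equivalence.
It is not code from this patch: `Vec128` and the helper functions are
hypothetical names that model a 128-bit register as two 64-bit lanes,
following the documented lane selection of the low/high `unpckpd`,
`punpckqdq`, and `shufpd` forms.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical model: a 128-bit XMM register as two 64-bit lanes
// (index 0 = low quadword, index 1 = high quadword).
using Vec128 = std::array<std::uint64_t, 2>;

// unpcklpd / unpckhpd: take the low (or high) lane of each source.
inline Vec128 unpcklpd(Vec128 a, Vec128 b) { return {a[0], b[0]}; }
inline Vec128 unpckhpd(Vec128 a, Vec128 b) { return {a[1], b[1]}; }

// punpcklqdq / punpckhqdq: the same lane selection in the integer domain.
inline Vec128 punpcklqdq(Vec128 a, Vec128 b) { return {a[0], b[0]}; }
inline Vec128 punpckhqdq(Vec128 a, Vec128 b) { return {a[1], b[1]}; }

// shufpd: immediate bit 0 picks the lane of `a`, bit 1 picks the lane of `b`.
inline Vec128 shufpd(Vec128 a, Vec128 b, unsigned imm) {
  return {a[imm & 1u], b[(imm >> 1) & 1u]};
}

// True when all three forms agree for both the low and the high variant.
inline bool all_forms_agree(Vec128 a, Vec128 b) {
  return unpcklpd(a, b) == punpcklqdq(a, b) &&
         unpcklpd(a, b) == shufpd(a, b, 0x0) &&
         unpckhpd(a, b) == punpckhqdq(a, b) &&
         unpckhpd(a, b) == shufpd(a, b, 0x3);
}
```

Since the results are bit-identical, the choice between the forms comes
down to encoding size and bypass delay: `shufpd` carries an extra
immediate byte, which is the 1 byte saved above.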

Reviewed By: pengfei

Differential Revision: https://reviews.llvm.org/D147728
llvm/lib/Target/X86/X86FixupInstTuning.cpp
llvm/test/CodeGen/X86/tuning-shuffle-unpckpd-avx512.ll
llvm/test/CodeGen/X86/tuning-shuffle-unpckpd.ll