[ARM,MVE] Add InstCombine rules for pred_i2v / pred_v2i.
author Simon Tatham <simon.tatham@arm.com>
Mon, 18 Nov 2019 10:38:17 +0000 (10:38 +0000)
committer Simon Tatham <simon.tatham@arm.com>
Mon, 18 Nov 2019 10:39:30 +0000 (10:39 +0000)
commit f4f77aa53e5b872bd8a93c3a193714d8eba9578c
tree 4c964244d314566dbcfe61e0c2653608fd5af011
parent 4a4dd85e5ab51aa8c01c690cd14205af157178e7

If you write C code with the ACLE MVE intrinsics that passes the
result of a `vcmp` as the predicate input of a predicated intrinsic, e.g.

  mve_pred16_t pred = vcmpeqq(v1, v2);
  v_out = vaddq_m(v_inactive, v3, v4, pred);

then clang's codegen for the compare intrinsic will emit a call to
`@llvm.arm.mve.pred.v2i` to convert the output of `icmp` into the
integer representation used by `mve_pred16_t`, and then the next
intrinsic will call `@llvm.arm.mve.pred.i2v` to convert it straight
back again. In the generated code this shows up as a `vmrs`/`vmsr`
pair that moves the predicate value pointlessly out of `p0` and back
into it.

To prevent that, I've added InstCombine rules to remove round trips of
the form `v2i(i2v(x))` and `i2v(v2i(x))`. I've also taught InstCombine
about the known and demanded bits of those intrinsics. As a result,
you now get exactly the code you wanted:

  vpt.u16 eq, q1, q2
  vaddt.u16 q0, q3, q4
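The rewrites described above can be illustrated on a toy expression tree. This C sketch is not LLVM's InstCombine API: the `Node` type, the `simplify` function and the `AND_CONST` node kind are all hypothetical. It assumes, as the 16-bit width of `mve_pred16_t` suggests, that the result of `pred.v2i` has its top 16 bits known to be zero and that `pred.i2v` demands only the low 16 bits of its input. Under those assumptions, an AND that keeps all of the low 16 bits can be dropped on either side of a conversion, which can expose a round trip that then cancels.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy IR: a leaf value, the two predicate conversions,
 * and a bitwise AND with a 32-bit constant mask. */
typedef enum { LEAF, I2V, V2I, AND_CONST } Kind;

typedef struct Node {
    Kind kind;
    struct Node *arg; /* operand; NULL for LEAF */
    uint32_t mask;    /* constant operand; used only by AND_CONST */
} Node;

#define LOW16 0x0000FFFFu

/* Simplify bottom-up. May rewrite operand pointers in place and
 * returns the node that replaces n. */
static Node *simplify(Node *n)
{
    assert(n != NULL);
    if (n->kind == LEAF)
        return n;
    assert(n->arg != NULL);
    n->arg = simplify(n->arg); /* simplify operands first */
    Node *a = n->arg;

    switch (n->kind) {
    case V2I:
        /* Round trip: v2i(i2v(x)) -> x */
        if (a->kind == I2V)
            return a->arg;
        break;
    case I2V:
        /* Round trip: i2v(v2i(x)) -> x */
        if (a->kind == V2I)
            return a->arg;
        /* Demanded bits (assumed): i2v reads only the low 16 bits, so
         * an AND that keeps all of them is redundant: i2v(x & c) -> i2v(x). */
        if (a->kind == AND_CONST && (a->mask & LOW16) == LOW16) {
            n->arg = a->arg;
            return simplify(n);
        }
        break;
    case AND_CONST:
        /* Known bits (assumed): the top 16 bits of v2i's result are zero,
         * so masking it with a constant that keeps the low 16 bits is a no-op. */
        if (a->kind == V2I && (n->mask & LOW16) == LOW16)
            return a;
        break;
    default:
        break;
    }
    return n;
}
```

For example, `i2v(v2i(p) & 0xFFFF)` simplifies to `p`: the known-bits rule removes the AND, and then the round trip cancels. A real fold would also have to check that the predicate vector types on both sides of an `i2v(v2i(x))` pair agree, which this untyped toy ignores.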

Reviewers: ostannard, MarkMurrayARM, dmgreen

Reviewed By: dmgreen

Subscribers: kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70313
llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp
llvm/test/CodeGen/Thumb2/mve-vpt-from-intrinsics.ll [new file with mode: 0644]
llvm/test/Transforms/InstCombine/ARM/mve-v2i2v.ll [new file with mode: 0644]