Only match BMI (BLSR, BLSI, BLSMSK) if the add/sub op is single use
If the add/sub is not single use, it will need to be materialized
later, in which case using the BMI instruction is a de-optimization in
terms of code-size and throughput.
i.e:
```
// Good
leal -1(%rdi), %eax
andl %eax, %eax
xorl %eax, %esi
...
```
```
// Unecessary BMI (lower throughput, larger code size)
leal -1(%rdi), %eax
blsr %edi, %eax
xorl %eax, %esi
...
```
Note, this may cause more `mov` instructions to be emitted sometimes
because BMI instructions only have 1 src and write-only to dst. A
better approach may be to only avoid BMI for (and/xor X, (add/sub
0/-1, X)) if this is the last use of X but NOT the last use of
(add/sub 0/-1, X).
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D141180