Transform AtomicRMW logic operations to BT{R|C|S} if only changing/testing a single bit.
This essentially expands on the optimizations added in D120199,
but applies them to cases where the bit being changed /
tested is not an immediate but is provably a power of 2.
The only case currently added is for patterns like:
`__atomic_fetch_xor(p, 1 << c, __ATOMIC_RELAXED) & (1 << c)`
which, instead of using a `cmpxchg` loop, can be done with `btcl; setcc; shl`.
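For illustration, a minimal C sketch of the source pattern this targets. The
function name `toggle_and_test_bit` and the `uint32_t` operand type are
assumptions made for the example and are not taken from the patch or its tests.

```c
#include <stdint.h>

/* Hypothetical helper (name and types are illustrative only).
 * Atomically flips bit `c` of `*p` and returns the previous value of
 * that bit, still in its original position (0 or 1u << c).
 * The caller must pass c < 32; larger shift counts are undefined in C. */
uint32_t toggle_and_test_bit(uint32_t *p, unsigned c) {
    uint32_t mask = 1u << c; /* a power of 2, but not a compile-time constant */
    return __atomic_fetch_xor(p, mask, __ATOMIC_RELAXED) & mask;
}
```

Because the result is masked with the same single bit that was XORed in, only
the old state of that one bit is observable, which is what should allow a
locked bit-test-and-complement to stand in for the `cmpxchg` loop.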
There are still a variety of missed cases that could/should be
addressed in the future. This commit documents many of those
cases with TODOs.
Reviewed By: pengfei
Differential Revision: https://reviews.llvm.org/D140939