Optimize a ^ ((a ^ b) & mask) to (~mask & a) | (b & mask).

From the perspective of the pipeline, the `andn + and + ior` version takes
2 cycles (the AND and ANDN do not depend on each other), whereas the
`xor + and + xor` version takes 3 cycles.
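As a sanity check of both the identity and the dependency argument, here is a minimal C sketch. The function names `xor_and_xor`, `andn_and_ior` and `check_identity_8bit` are hypothetical and not part of the patch. Because every operation is bitwise, each result bit depends only on the same bit of `a`, `b` and `mask`, so trying all 8-bit inputs covers every case.

```c
#include <assert.h>
#include <stdint.h>

/* Original form: a chain of three dependent operations.
   t1 = a ^ b, then t2 = t1 & mask, then a ^ t2.
   Each step needs the result of the previous one.  */
uint32_t xor_and_xor (uint32_t a, uint32_t b, uint32_t mask)
{
  return a ^ ((a ^ b) & mask);
}

/* Rewritten form: (~mask & a) and (b & mask) depend only on the inputs,
   so they can execute in parallel; only the final OR waits on both.  */
uint32_t andn_and_ior (uint32_t a, uint32_t b, uint32_t mask)
{
  return (~mask & a) | (b & mask);
}

/* Both forms select, bit by bit, b where mask is 1 and a where mask is 0.
   Since the operations are bitwise, trying every 8-bit combination of
   a, b and mask covers every per-bit case.  */
void check_identity_8bit (void)
{
  for (uint32_t a = 0; a < 256; a++)
    for (uint32_t b = 0; b < 256; b++)
      for (uint32_t mask = 0; mask < 256; mask++)
        assert (xor_and_xor (a, b, mask) == andn_and_ior (a, b, mask));
}
```

Calling `check_identity_8bit ()` from `main` walks all 2^24 input triples; the only difference between the two functions is the shape of the dependency graph, not the value computed.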

- xorl %edi, %esi
 andl %edx, %esi
- movl %esi, %eax
- xorl %edi, %eax
+ andn %edi, %edx, %eax
+ orl %esi, %eax
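The two instruction sequences above can be modeled in C, one statement per instruction, to confirm they compute the same value. This sketch assumes the x86-64 System V calling convention, so `a`, `b` and `mask` arrive in `%edi`, `%esi` and `%edx`; the function names and sample values are hypothetical. It also spells out the AT&T operand order of the BMI1 `andn` instruction, which is easy to misread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Before: every instruction consumes the result of the previous one.  */
uint32_t old_sequence (uint32_t edi, uint32_t esi, uint32_t edx)
{
  uint32_t eax;
  esi ^= edi;        /* xorl %edi, %esi  : esi = a ^ b          */
  esi &= edx;        /* andl %edx, %esi  : esi = (a ^ b) & mask */
  eax = esi;         /* movl %esi, %eax                          */
  eax ^= edi;        /* xorl %edi, %eax  : eax = a ^ esi        */
  return eax;
}

/* After: in AT&T syntax "andn src2, src1, dst" computes dst = ~src1 & src2,
   so "andn %edi, %edx, %eax" is eax = ~edx & edi.  The andl and andn
   read disjoint results, so neither waits for the other.  */
uint32_t new_sequence (uint32_t edi, uint32_t esi, uint32_t edx)
{
  uint32_t eax;
  esi &= edx;        /* andl %edx, %esi       : esi = b & mask  */
  eax = ~edx & edi;  /* andn %edi, %edx, %eax : eax = ~mask & a */
  eax |= esi;        /* orl %esi, %eax        : eax |= esi      */
  return eax;
}

/* Compare both sequences against the source expression on a few inputs.  */
void check_sequences (void)
{
  static const uint32_t samples[][3] = {
    { 0x12345678u, 0x9abcdef0u, 0xffff0000u },
    { 0u, 0xffffffffu, 0x0f0f0f0fu },
    { 0xdeadbeefu, 0x01234567u, 0u },
  };
  for (size_t i = 0; i < sizeof samples / sizeof samples[0]; i++)
    {
      uint32_t a = samples[i][0], b = samples[i][1], mask = samples[i][2];
      uint32_t expected = a ^ ((a ^ b) & mask);
      assert (old_sequence (a, b, mask) == expected);
      assert (new_sequence (a, b, mask) == expected);
    }
}
```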

gcc/ChangeLog:

	PR target/94790
	* config/i386/i386.md (*xor2andn): New define_insn_and_split.

gcc/testsuite/ChangeLog:

	PR target/94790
	* gcc.target/i386/pr94790-1.c: New test.
	* gcc.target/i386/pr94790-2.c: Ditto.