i386: Relax cmpxchg instruction under -mrelax-cmpxchg-loop [PR103069]
For cmpxchg, it is commonly used in spin loop, and several user code
such as pthread directly takes cmpxchg as loop condition, which cause
huge cache bouncing.
This patch extends previous implementation to relax all cmpxchg
instruction under -mrelax-cmpxchg-loop with an extra atomic load,
compare and emulate the failed cmpxchg behavior.
For original spin loop which looks like
loop: mov %eax,%r8d
or $1,%r8d
lock cmpxchg %r8d,(%rdi)
jne loop
It will now truns to
loop: mov %eax,%r8d
or $1,%r8d
mov (%r8),%rsi <--- load lock first
cmp %rsi,%rax <--- compare with expected input
jne .L2 <--- lock ne expected
lock cmpxchg %r8d,(%rdi)
jne loop
L2: mov %rsi,%rax <--- perform the behavior of failed cmpxchg
jne loop
under -mrelax-cmpxchg-loop.
gcc/ChangeLog:
PR target/103069
* config/i386/i386-expand.cc (ix86_expand_atomic_fetch_op_loop):
Split atomic fetch and loop part.
(ix86_expand_cmpxchg_loop): New expander for cmpxchg loop.
* config/i386/i386-protos.h (ix86_expand_cmpxchg_loop): New
prototype.
* config/i386/sync.md (atomic_compare_and_swap<mode>): Call new
expander under TARGET_RELAX_CMPXCHG_LOOP.
(atomic_compare_and_swap<mode>): Likewise for doubleword modes.
gcc/testsuite/ChangeLog:
PR target/103069
* gcc.target/i386/pr103069-2.c: Adjust result check.
* gcc.target/i386/pr103069-3.c: New test.
* gcc.target/i386/pr103069-4.c: Likewise.