[nvptx] Add bar.warp.sync
On a GT 1030 (sm_61), with driver version 470.94 I run into:
...
FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/parallel-dims.c \
-DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none \
-O2 execution test
...
which minimizes to the same test-case as listed in commit "[nvptx] Update
default ptx isa to 6.3".
The first divergent branch looks like:
...
{
.reg .u32 %x;
mov.u32 %x,%tid.x;
setp.ne.u32 %r59,%x,0;
}
@ %r59 bra $L15;
mov.u64 %r48,%ar0;
mov.u32 %r22,2;
ld.u64 %r53,[%r48];
mov.u32 %r55,%r22;
mov.u32 %r54,1;
$L15:
...
and when inspecting the generated SASS, the branch is not setup as a divergent
branch, but instead as a regular branch.
This causes us to execute a shfl.sync insn in divergent mode, which is likely
to cause trouble given a remark in the ptx isa version 6.3, which mentions
that for .target sm_6x or below, all threads must excute the same
shfl.sync instruction in convergence.
Fix this by placing a "bar.warp.sync 0xffffffff" at the desired convergence
point (in the example above, after $L15).
Tested on x86_64 with nvptx accelerator.
gcc/ChangeLog:
2022-01-31 Tom de Vries <tdevries@suse.de>
* config/nvptx/nvptx.cc (nvptx_single): Use nvptx_warpsync.
* config/nvptx/nvptx.md (define_c_enum "unspecv"): Add
UNSPECV_WARPSYNC.
(define_insn "nvptx_warpsync"): New define_insn.