openacc: Remove unnecessary barriers (gimple worker partitioning/broadcast)
This is an optimisation for middle-end worker-partitioning support (used
to support multiple workers on AMD GCN). At present, barriers may be
emitted in cases where they aren't needed and cannot be optimised away.
This patch stops the extraneous barriers from being emitted in the
first place.
One exception to the above (where the barrier is still needed) is for
predicated blocks of code that perform a write to gang-private shared
memory from one worker. We must execute a barrier before other workers
read that shared memory location.
gcc/
* config/gcn/gcn.c (gimple.h): Include.
(gcn_fork_join): Emit barrier for worker-level joins.
* omp-oacc-neuter-broadcast.cc (find_local_vars_to_propagate): Add
writes_gang_private bitmap parameter. Set bit for blocks
containing gang-private variable writes.
(worker_single_simple): Don't emit barrier after predicated block.
(worker_single_copy): Don't emit barrier if we're not broadcasting
anything and the block contains no gang-private writes.
(neuter_worker_single): Don't predicate blocks that only contain
NOPs or internal marker functions. Pass has_gang_private_write
argument to worker_single_copy.
(oacc_do_neutering): Add writes_gang_private bitmap handling.