nvptx: reimplement libgomp barriers [PR99555]
Instead of trying to have the GPU do CPU-with-OS-like things, this new barriers
implementation for NVPTX uses simplistic bar.* synchronization instructions.
Tasks are processed after threads have joined, and only if team->task_count != 0
It is noted that: there might be a little bit of performance forfeited for
cases where earlier arriving threads could've been used to process tasks ahead
of other threads, but that has the requirement of implementing complex
futex-wait/wake like behavior, which is what we're try to avoid with this patch.
It is deemed that task processing is not what GPU target offloading is usually
used for.
Implementation highlight notes:
1. gomp_team_barrier_wake() is now an empty function (threads never "wake" in
the usual manner)
2. gomp_team_barrier_cancel() now uses the "exit" PTX instruction.
3. gomp_barrier_wait_last() now is implemented using "bar.arrive"
4. gomp_team_barrier_wait_end()/gomp_team_barrier_wait_cancel_end():
The main synchronization is done using a 'bar.red' instruction. This reduces
across all threads the condition (team->task_count != 0), to enable the task
processing down below if any thread created a task.
(this bar.red usage means that this patch is dependent on the prior NVPTX
bar.red GCC patch)
PR target/99555
libgomp/ChangeLog:
* config/nvptx/bar.c (generation_to_barrier): Remove.
(futex_wait,futex_wake,do_spin,do_wait): Remove.
(GOMP_WAIT_H): Remove.
(#include "../linux/bar.c"): Remove.
(gomp_barrier_wait_end): New function.
(gomp_barrier_wait): Likewise.
(gomp_barrier_wait_last): Likewise.
(gomp_team_barrier_wait_end): Likewise.
(gomp_team_barrier_wait): Likewise.
(gomp_team_barrier_wait_final): Likewise.
(gomp_team_barrier_wait_cancel_end): Likewise.
(gomp_team_barrier_wait_cancel): Likewise.
(gomp_team_barrier_cancel): Likewise.
* config/nvptx/bar.h (gomp_barrier_t): Remove waiters, lock fields.
(gomp_barrier_init): Remove init of waiters, lock fields.
(gomp_team_barrier_wake): Remove prototype, add new static inline
function.