From: Alexey Bataev Date: Thu, 27 Jun 2019 18:33:09 +0000 (+0000) Subject: [OPENMP][NVPTX]Relax flush directive. X-Git-Url: http://review.tizen.org/git/?a=commitdiff_plain;h=bb55ece2692e290daf930738dd636a1375e9f261;p=platform%2Fupstream%2Fllvm.git [OPENMP][NVPTX]Relax flush directive. Summary: According to the OpenMP standard, flush makes a thread’s temporary view of memory consistent with memory and enforces an order on the memory operations of the variables explicitly specified or implied. According to the Cuda toolkit documentation (https://docs.nvidia.com/cuda/archive/8.0/cuda-c-programming-guide/index.html#memory-fence-functions), __threadfence() functions provides required functionality. __threadfence_system() also provides required functionality, but it also includes some extra functionality, like synchronization of page-locked host memory, synchronization for the host, etc. It is not required per the standard and we can use more relaxed version of memory fence operation. Reviewers: grokos, gtbercea, kkwli0 Subscribers: guansong, jfb, jdoerfert, openmp-commits, caomhin Tags: #openmp Differential Revision: https://reviews.llvm.org/D62397 llvm-svn: 364572 --- diff --git a/openmp/libomptarget/deviceRTLs/nvptx/src/sync.cu b/openmp/libomptarget/deviceRTLs/nvptx/src/sync.cu index d81aa8f..688420e 100644 --- a/openmp/libomptarget/deviceRTLs/nvptx/src/sync.cu +++ b/openmp/libomptarget/deviceRTLs/nvptx/src/sync.cu @@ -130,7 +130,7 @@ EXTERN void __kmpc_end_single(kmp_Ident *loc, int32_t global_tid) { EXTERN void __kmpc_flush(kmp_Ident *loc) { PRINT0(LD_IO, "call kmpc_flush\n"); - __threadfence_system(); + __threadfence(); } //////////////////////////////////////////////////////////////////////////////// diff --git a/openmp/libomptarget/deviceRTLs/nvptx/test/parallel/flush.c b/openmp/libomptarget/deviceRTLs/nvptx/test/parallel/flush.c new file mode 100644 index 0000000..412538b --- /dev/null +++ b/openmp/libomptarget/deviceRTLs/nvptx/test/parallel/flush.c @@ -0,0 +1,35 @@ +// RUN: %compile-run-and-check + +#include +#include + +int main(int argc, char *argv[]) { + int data, out, flag = 0; +#pragma omp target parallel num_threads(64) map(tofrom \ + : out, flag) map(to \ + : data) + { + if (omp_get_thread_num() == 0) { + /* Write to the data buffer that will be read by thread */ + data = 42; +/* Flush data to thread 32 */ +#pragma omp flush(data) + /* Set flag to release thread 32 */ +#pragma omp atomic write + flag = 1; + } else if (omp_get_thread_num() == 32) { + /* Loop until we see the update to the flag */ + int val; + do { +#pragma omp atomic read + val = flag; + } while (val < 1); + out = data; +#pragma omp flush(out) + } + } + // CHECK: out=42. + /* Value of out will be 42 */ + printf("out=%d.\n", out); + return !(out == 42); +}