[OPENMP][NVPTX]Relax flush directive.
Summary:
According to the OpenMP standard, flush makes a thread’s temporary view of memory consistent with memory and enforces an order on the memory operations of the variables explicitly specified or implied.
According to the Cuda toolkit documentation (https://docs.nvidia.com/cuda/archive/8.0/cuda-c-programming-guide/index.html#memory-fence-functions), __threadfence() functions provides required functionality.
__threadfence_system() also provides required functionality, but it also
includes some extra functionality, like synchronization of page-locked
host memory, synchronization for the host, etc. It is not required per
the standard and we can use more relaxed version of memory fence
operation.
Reviewers: grokos, gtbercea, kkwli0
Subscribers: guansong, jfb, jdoerfert, openmp-commits, caomhin
Tags: #openmp
Differential Revision: https://reviews.llvm.org/D62397
llvm-svn: 364572