From 1364d268a418bd77863f6f35e3fb285376441ecd Mon Sep 17 00:00:00 2001
From: Jeroen Ketema
Date: Mon, 9 Oct 2017 19:43:04 +0000
Subject: [PATCH] Implement mem_fence on ptx
MIME-Version: 1.0
Content-Type: text/plain; charset=utf8
Content-Transfer-Encoding: 8bit

PTX does not differentiate between read and write fences. Hence,
read_mem_fence and write_mem_fence are both lowered to a mem_fence call.
The mem_fence function compiles to the “membar.cta” instruction, which
guarantees that all prior memory reads and writes of a thread are
performed, i.e., become visible to all other threads in the same CTA
(work-group), before any of its subsequent memory accesses. The
instruction does not differentiate between global and local memory.
Hence, the flags parameter is ignored, except for deciding whether a
“membar.cta” instruction should be issued at all.

Reviewed-by: Jan Vesely

llvm-svn: 315235
---
 libclc/ptx-nvidiacl/lib/SOURCES            |  1 +
 libclc/ptx-nvidiacl/lib/mem_fence/fence.cl | 15 +++++++++++++++
 2 files changed, 16 insertions(+)
 create mode 100644 libclc/ptx-nvidiacl/lib/mem_fence/fence.cl

diff --git a/libclc/ptx-nvidiacl/lib/SOURCES b/libclc/ptx-nvidiacl/lib/SOURCES
index ce26bcb..c92c2a6 100644
--- a/libclc/ptx-nvidiacl/lib/SOURCES
+++ b/libclc/ptx-nvidiacl/lib/SOURCES
@@ -1,3 +1,4 @@
+mem_fence/fence.cl
 synchronization/barrier.cl
 workitem/get_global_id.cl
 workitem/get_group_id.cl
diff --git a/libclc/ptx-nvidiacl/lib/mem_fence/fence.cl b/libclc/ptx-nvidiacl/lib/mem_fence/fence.cl
new file mode 100644
index 0000000..16b0391
--- /dev/null
+++ b/libclc/ptx-nvidiacl/lib/mem_fence/fence.cl
@@ -0,0 +1,15 @@
+#include <clc/clc.h>
+
+_CLC_DEF void mem_fence(cl_mem_fence_flags flags) {
+  if (flags & (CLK_GLOBAL_MEM_FENCE | CLK_LOCAL_MEM_FENCE))
+    __nvvm_membar_cta();
+}
+
+// We do not have separate mechanism for read and write fences.
+_CLC_DEF void read_mem_fence(cl_mem_fence_flags flags) {
+  mem_fence(flags);
+}
+
+_CLC_DEF void write_mem_fence(cl_mem_fence_flags flags) {
+  mem_fence(flags);
+}
-- 
2.7.4
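To illustrate the flag handling described in the commit message, the following is a plain-C host model of the dispatch in fence.cl. It is a sketch, not part of the patch and not OpenCL C. The numeric flag values are assumptions chosen only so that the two flags are distinct bits, and `membar_cta_stub` is a hypothetical stand-in for `__nvvm_membar_cta()` that merely counts how many `membar.cta` instructions would be issued.

```c
/* Plain-C model of libclc's PTX fence dispatch (illustration only). */

typedef unsigned int cl_mem_fence_flags;

/* Assumed flag values: distinct bits, as the bitwise test requires. */
#define CLK_LOCAL_MEM_FENCE  0x1u
#define CLK_GLOBAL_MEM_FENCE 0x2u

/* Hypothetical stub: on a GPU the NVVM intrinsic emits membar.cta;
 * here it only records that a CTA-level fence would be issued. */
static int membar_cta_count = 0;
static void membar_cta_stub(void) { membar_cta_count++; }

/* Mirrors mem_fence: any global or local flag issues exactly one
 * CTA-level fence, since membar.cta covers both address spaces. */
static void model_mem_fence(cl_mem_fence_flags flags) {
  if (flags & (CLK_GLOBAL_MEM_FENCE | CLK_LOCAL_MEM_FENCE))
    membar_cta_stub();
}

/* PTX has no read-only or write-only fence, so both forward. */
static void model_read_mem_fence(cl_mem_fence_flags flags) {
  model_mem_fence(flags);
}

static void model_write_mem_fence(cl_mem_fence_flags flags) {
  model_mem_fence(flags);
}
```

In this model, passing `CLK_GLOBAL_MEM_FENCE | CLK_LOCAL_MEM_FENCE` issues a single fence, the same as either flag alone, because the instruction does not distinguish the two memory spaces; a flags value of 0 issues none. The read and write variants behave identically to `model_mem_fence`.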