Implement mem_fence on ptx
author: Jeroen Ketema <j.ketema@xs4all.nl>
Mon, 9 Oct 2017 19:43:04 +0000 (19:43 +0000)
committer: Jeroen Ketema <j.ketema@xs4all.nl>
Mon, 9 Oct 2017 19:43:04 +0000 (19:43 +0000)
commit: 1364d268a418bd77863f6f35e3fb285376441ecd
tree: a97f9e1701261cb48112a7e0af1fdc2eb58bc563
parent: 492d7134f3bdd76415d8e7b20a4f1c9a42b85e44
Implement mem_fence on ptx

PTX does not differentiate between read and write fences. Hence,
read_mem_fence and write_mem_fence are lowered to a mem_fence call. The
mem_fence function compiles to the “membar.cta” instruction, which commits
all outstanding reads and writes of a thread such that these become visible
to all other threads in the same CTA (i.e., work-group). The instruction
does not differentiate between global and local memory. Hence, the flags
parameter is ignored, except for deciding whether a “membar.cta”
instruction should be issued at all.
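
The lowering described above can be modelled on the host as a minimal
sketch in plain C. This is not the contents of fence.cl: the flag values,
the cl_mem_fence_flags typedef, and membar_cta_stub (a counter standing in
for whatever emits “membar.cta” on the device) are assumptions made for
illustration only.

```c
/* Hypothetical stand-ins for the OpenCL fence flags; the values used by
 * the real headers are an assumption here. */
enum { CLK_LOCAL_MEM_FENCE = 1, CLK_GLOBAL_MEM_FENCE = 2 };
typedef unsigned int cl_mem_fence_flags;

/* Hypothetical stub: counts how often a "membar.cta" would be issued. */
static int membar_cta_count = 0;
static void membar_cta_stub(void) { ++membar_cta_count; }

/* The flags only decide whether a fence is issued at all; global and
 * local memory are not distinguished. */
void mem_fence(cl_mem_fence_flags flags) {
    if (flags & (CLK_LOCAL_MEM_FENCE | CLK_GLOBAL_MEM_FENCE))
        membar_cta_stub();
}

/* Read and write fences are not distinguished either, so both simply
 * forward to mem_fence. */
void read_mem_fence(cl_mem_fence_flags flags) { mem_fence(flags); }
void write_mem_fence(cl_mem_fence_flags flags) { mem_fence(flags); }
```

In this model, calling mem_fence(0) leaves the counter untouched, while any
call with at least one of the two flags set increments it exactly once,
whichever flag it was.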

Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 315235
libclc/ptx-nvidiacl/lib/SOURCES
libclc/ptx-nvidiacl/lib/mem_fence/fence.cl [new file with mode: 0644]