review.tizen.org Git - platform/upstream/llvm.git/commit

author	Artem Belevich <tra@google.com>
	Thu, 25 Apr 2019 22:28:09 +0000 (22:28 +0000)
committer	Artem Belevich <tra@google.com>
	Thu, 25 Apr 2019 22:28:09 +0000 (22:28 +0000)
commit	5fe85a003f6b6ba3c2b83c319d1c160ca7af7c7c
tree	50365a275ecd4099aa476aca85492dd47702fb30
parent	16737538f4fc4757ae5226e95b177155ed8e13ad

[CUDA] Implemented _[bi]mma* builtins.

These builtins provide access to the new integer and
sub-integer variants of MMA (matrix multiply-accumulate) instructions
provided by CUDA-10.x on sm_75 (AKA Turing) GPUs.
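
For readers unfamiliar with the operation, here is a minimal CPU-side sketch of what an integer matrix multiply-accumulate computes: D = A * B + C, with 8-bit signed inputs and 32-bit accumulators. The function name mma_s8_reference and the row-major layout are assumptions made for illustration only. They are not the builtins added by this commit, and the sketch does not model the hardware tile shapes, per-thread fragment layout, or any saturation behaviour.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical reference helper (not part of this commit):
// computes D = A * B + C for small row-major matrices.
//   A is M x K (int8), B is K x N (int8), C and D are M x N (int32).
template <std::size_t M, std::size_t N, std::size_t K>
std::array<std::int32_t, M * N> mma_s8_reference(
    const std::array<std::int8_t, M * K>& a,
    const std::array<std::int8_t, K * N>& b,
    const std::array<std::int32_t, M * N>& c) {
  std::array<std::int32_t, M * N> d{};
  for (std::size_t i = 0; i < M; ++i) {
    for (std::size_t j = 0; j < N; ++j) {
      // Start from the accumulator input, then add the dot product.
      std::int32_t acc = c[i * N + j];
      for (std::size_t k = 0; k < K; ++k) {
        // Widen to 32 bits before multiplying so the 8-bit inputs
        // cannot overflow in the product.
        acc += static_cast<std::int32_t>(a[i * K + k]) *
               static_cast<std::int32_t>(b[k * N + j]);
      }
      d[i * N + j] = acc;
    }
  }
  return d;
}
```

The point of the sketch is the widening: narrow integer inputs are multiplied and summed into a wider accumulator, which is the general idea behind the integer MMA variants.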

Also added a feature for PTX 6.4. While Clang/LLVM does not generate
any PTX instructions that need it, we still need to pass it through to
ptxas in order to be able to compile code that uses the new 'mma'
instruction as inline assembly (e.g., as used by NVIDIA's CUTLASS library:
https://github.com/NVIDIA/cutlass/blob/master/cutlass/arch/mma.h#L101).

Differential Revision: https://reviews.llvm.org/D60279

llvm-svn: 359248

clang/include/clang/Basic/BuiltinsNVPTX.def
clang/lib/Basic/Targets/NVPTX.cpp
clang/lib/CodeGen/CGBuiltin.cpp
clang/lib/Driver/ToolChains/Cuda.cpp
clang/test/CodeGen/builtins-nvptx-mma.cu	[new file with mode: 0644]
clang/test/CodeGen/builtins-nvptx-mma.py	[new file with mode: 0644]
llvm/lib/Target/NVPTX/NVPTX.td