[CUDA] Implemented _[bi]mma* builtins.
author Artem Belevich <tra@google.com>
Thu, 25 Apr 2019 22:28:09 +0000 (22:28 +0000)
committer Artem Belevich <tra@google.com>
Thu, 25 Apr 2019 22:28:09 +0000 (22:28 +0000)
commit 5fe85a003f6b6ba3c2b83c319d1c160ca7af7c7c
tree 50365a275ecd4099aa476aca85492dd47702fb30
parent 16737538f4fc4757ae5226e95b177155ed8e13ad

These builtins provide access to the new integer and
sub-integer variants of MMA (matrix multiply-accumulate) instructions
provided by CUDA-10.x on sm_75 (AKA Turing) GPUs.

Also added a feature for PTX 6.4. While Clang/LLVM does not generate
any PTX instructions that need it, we still need to pass it through to
ptxas in order to compile code that uses the new 'mma' instruction as
inline assembly (e.g. as used by NVIDIA's CUTLASS library:
https://github.com/NVIDIA/cutlass/blob/master/cutlass/arch/mma.h#L101).
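
For illustration only, here is a minimal sketch of the kind of inline-assembly
'mma' use that needs the PTX 6.4 feature passed through to ptxas. It is not
code from this commit and not a copy of CUTLASS; the helper name
mma_m8n8k4_f16_f32 and its parameter layout are hypothetical, and it assumes
the f16-input, f32-accumulate m8n8k4 form of the instruction on sm_70 or
newer.

```cuda
#include <cuda_fp16.h>

// Hypothetical helper (an assumption, not part of this commit): one
// warp-wide m8n8k4 step computing D = A * B + C with f16 inputs and f32
// accumulators. Each thread supplies its own fragment: 4 halves of A and
// of B (packed as 2 x __half2) and 8 floats of C, and receives 8 floats
// of D.
__device__ inline void mma_m8n8k4_f16_f32(const __half2 a[2],
                                          const __half2 b[2],
                                          const float c[8], float d[8]) {
#if defined(__CUDA_ARCH__) && __CUDA_ARCH__ >= 700
  // Each __half2 is passed as one 32-bit .f16x2 register operand.
  const unsigned *A = reinterpret_cast<const unsigned *>(a);
  const unsigned *B = reinterpret_cast<const unsigned *>(b);
  asm volatile(
      "mma.sync.aligned.m8n8k4.row.col.f32.f16.f16.f32 "
      "{%0,%1,%2,%3,%4,%5,%6,%7}, {%8,%9}, {%10,%11}, "
      "{%12,%13,%14,%15,%16,%17,%18,%19};\n"
      : "=f"(d[0]), "=f"(d[1]), "=f"(d[2]), "=f"(d[3]),
        "=f"(d[4]), "=f"(d[5]), "=f"(d[6]), "=f"(d[7])
      : "r"(A[0]), "r"(A[1]), "r"(B[0]), "r"(B[1]),
        "f"(c[0]), "f"(c[1]), "f"(c[2]), "f"(c[3]),
        "f"(c[4]), "f"(c[5]), "f"(c[6]), "f"(c[7]));
#endif
}
```

Clang does not inspect the assembly string, so nothing in the compiler itself
requires PTX 6.4 here; the emitted PTX only assembles if its .version
directive is recent enough for ptxas to accept the instruction, which is why
the feature has to be passed through. The instruction is warp-synchronous, so
a caller would need all threads of the warp to reach it together.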

Differential Revision: https://reviews.llvm.org/D60279

llvm-svn: 359248
clang/include/clang/Basic/BuiltinsNVPTX.def
clang/lib/Basic/Targets/NVPTX.cpp
clang/lib/CodeGen/CGBuiltin.cpp
clang/lib/Driver/ToolChains/Cuda.cpp
clang/test/CodeGen/builtins-nvptx-mma.cu [new file with mode: 0644]
clang/test/CodeGen/builtins-nvptx-mma.py [new file with mode: 0644]
llvm/lib/Target/NVPTX/NVPTX.td