review.tizen.org Git - platform/upstream/llvm.git/commit

author	Christopher Bate <cbate@nvidia.com>
	Tue, 17 May 2022 23:54:29 +0000 (17:54 -0600)
committer	Christopher Bate <cbate@nvidia.com>
	Fri, 20 May 2022 15:42:55 +0000 (09:42 -0600)
commit	1ca772ed951e6412ef006459b56ae9a21691a97c
tree	75266e3df8ab3745fab697b6143cf3cbcae07bfd	tree \| snapshot
parent	480dcdc8975d895ebb1f35ccf324900c816a0d6d	commit \| diff

[MLIR][GPU] Add NvGpu mma.sync path to the VectorToGPU pass

This changes adds the option to lower to NvGpu dialect ops during the
VectorToGPU convsersion pass. Because this transformation reuses
existing VectorToGPU logic, a seperate VectorToNvGpu conversion pass is
not created. The option `use-nvgpu` is added to the VectorToGPU pass.
When this is true, the pass will attempt to convert slices rooted at
`vector.contract` operations into `nvgpu.mma.sync` ops, and
`vector.transfer_read` ops are converted to either `nvgpu.ldmatrix` or
one or more `vector.load` operations. The specific data loaded will
depend on the thread id within a subgroup (warp). These index
calculations depend on data type and shape of the MMA op
according to the downstream PTX specification. The code for supporting
these details is separated into `NvGpuSupport.cpp|h`.

Differential Revision: https://reviews.llvm.org/D122940

mlir/include/mlir/Conversion/Passes.td		diff \| blob \| history
mlir/include/mlir/Conversion/VectorToGPU/VectorToGPU.h		diff \| blob \| history
mlir/lib/Conversion/PassDetail.h		diff \| blob \| history
mlir/lib/Conversion/VectorToGPU/CMakeLists.txt		diff \| blob \| history
mlir/lib/Conversion/VectorToGPU/NvGpuSupport.cpp	[new file with mode: 0644]	blob
mlir/lib/Conversion/VectorToGPU/NvGpuSupport.h	[new file with mode: 0644]	blob
mlir/lib/Conversion/VectorToGPU/VectorToGPU.cpp		diff \| blob \| history
mlir/test/Conversion/VectorToGPU/vector-to-mma-ops-mma-sync.mlir	[new file with mode: 0644]	blob