[mlir][NVGPU] Handle native mma.sync and ldmatrix(x4) sizes
author Manish Gupta <manigupta@google.com>
Wed, 12 Oct 2022 05:17:32 +0000 (05:17 +0000)
committer Manish Gupta <manigupta@google.com>
Thu, 20 Oct 2022 00:10:21 +0000 (17:10 -0700)
commit 114ba722c1e58d23bafdf3654e4f8e537150c318
tree ed2e26d5d12c7b97e159bcd1d6114449d1b834f4
parent 97196a2d9282f4508dcf600d298047c910b2763c

This patch handles native `mma.sync` sizes and enables issuing `ldmatrix` on
the largest possible tiles for matrixB. Doing so requires handling
`vector.extract_strided_slice` in the vector to nvgpu lowering.
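
As a rough illustration of why the slice op shows up (a sketch under stated
assumptions, not code from this patch): assuming 16-bit elements, an
`ldmatrix` with `.x4` loads four 8x8 matrices, while a native `mma.sync`
shape such as m16n8k16 consumes a k16 x n8 matrixB operand. One x4 load then
covers more than one native matrixB operand, and the loaded vector has to be
sliced apart. The names below (`elementsPerLdMatrix`,
`nativeBOperandsPerLoad`) are hypothetical and do not exist in `MMAUtils.h`.

```cpp
#include <cassert>
#include <cstdint>

// Each matrix loaded by `ldmatrix` is 8x8 elements (16-bit elements assumed).
constexpr int64_t kLdMatrixTileDim = 8;

// Hypothetical helper: total elements loaded by one `ldmatrix` that
// requests `numMatrices` matrices (1, 2, or 4).
constexpr int64_t elementsPerLdMatrix(int64_t numMatrices) {
  return numMatrices * kLdMatrixTileDim * kLdMatrixTileDim;
}

// Hypothetical helper: how many native matrixB operands of shape k x n
// fit into the elements loaded by one `ldmatrix`.
constexpr int64_t nativeBOperandsPerLoad(int64_t numMatrices, int64_t n,
                                         int64_t k) {
  return elementsPerLdMatrix(numMatrices) / (k * n);
}

// For m16n8k16, one x4 load spans two native matrixB operands, so the
// result must be split in two before feeding `mma.sync`.
static_assert(nativeBOperandsPerLoad(/*numMatrices=*/4, /*n=*/8, /*k=*/16) == 2,
              "x4 load covers two k16 x n8 operands");
```

The sketch only counts elements; how the loaded matrices are arranged within
the larger tile depends on the addresses each thread supplies and is not
modeled here.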

Differential Revision: https://reviews.llvm.org/D135749
mlir/include/mlir/Dialect/NVGPU/Utils/MMAUtils.h
mlir/lib/Conversion/VectorToGPU/VectorToGPU.cpp
mlir/lib/Dialect/NVGPU/Utils/MMAUtils.cpp
mlir/test/Conversion/VectorToGPU/vector-to-mma-ops-mma-sync.mlir