[mlir][nvgpu] add simple pipelining for shared memory copies
author Alex Zinenko <zinenko@google.com>
Thu, 13 Jul 2023 17:55:36 +0000 (17:55 +0000)
committer Alex Zinenko <zinenko@google.com>
Mon, 17 Jul 2023 14:29:12 +0000 (14:29 +0000)
commit 371366ce27303e0b949aeb643b973a1a110da469
tree 4049b84eeb45bda0b07fb4ca166e7c0b5915fccb
parent a23d6c760ccc97eb57c7395e106a895cfd4ca536

Add a simple transform operation to the NVGPU extension that performs
software pipelining of copies to shared memory. The functionality is
extremely minimalistic in this version and only supports copies from
global to shared memory inside an `scf.for` loop with either
`vector.transfer` or `nvgpu.device_async_copy` operations when
pipelining preconditions are already satisfied in the IR. This is the
minimally useful version that uses the more general loop pipeliner in an
NVGPU-specific way. Further extensions and orthogonalizations will be
necessary.
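
For readers unfamiliar with the technique, the following is a minimal
host-side sketch of what software pipelining a copy loop means. It is
plain C++, not the transform operation added by this commit, and every
name in it (`kTile`, `Tile`, `copyTile`, `consumeTile`, `sumSequential`,
`sumPipelined`) is hypothetical. It assumes a pipeline depth of 2 and an
input whose size is a multiple of the tile size. On a GPU the copy would
be asynchronous, so the reordering yields real overlap; here it only
shows the prologue / kernel / epilogue shape of the schedule.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one tile staged in "shared memory".
constexpr std::size_t kTile = 4;
using Tile = std::array<int, kTile>;

// Stage 0: copy one tile from "global" memory into a staging buffer.
inline void copyTile(const std::vector<int> &global, std::size_t tileIdx,
                     Tile &dst) {
  for (std::size_t i = 0; i < kTile; ++i)
    dst[i] = global[tileIdx * kTile + i];
}

// Stage 1: consume a tile that has already been staged.
inline long consumeTile(const Tile &tile) {
  long sum = 0;
  for (int v : tile)
    sum += v;
  return sum;
}

// Unpipelined reference: each iteration copies, then consumes.
inline long sumSequential(const std::vector<int> &global) {
  const std::size_t numTiles = global.size() / kTile;
  Tile buffer{};
  long total = 0;
  for (std::size_t t = 0; t < numTiles; ++t) {
    copyTile(global, t, buffer);
    total += consumeTile(buffer);
  }
  return total;
}

// Pipelined with depth 2: the copy for tile t + 1 is issued in the same
// iteration that consumes tile t, using two alternating buffers.
inline long sumPipelined(const std::vector<int> &global) {
  const std::size_t numTiles = global.size() / kTile;
  if (numTiles == 0)
    return 0;
  std::array<Tile, 2> buffers{};
  long total = 0;
  // Prologue: stage the first tile before entering the loop.
  copyTile(global, 0, buffers[0]);
  // Kernel: prefetch the next tile, then consume the current one.
  for (std::size_t t = 0; t + 1 < numTiles; ++t) {
    copyTile(global, t + 1, buffers[(t + 1) % 2]);
    total += consumeTile(buffers[t % 2]);
  }
  // Epilogue: consume the last staged tile; no copy is left to issue.
  total += consumeTile(buffers[(numTiles - 1) % 2]);
  return total;
}
```

Both functions return the same result for any input; only the order in
which copies and uses are issued differs, which is the property a
pipelining transformation has to preserve.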

This required a change to the loop pipeliner itself to properly
propagate errors should the predicate generator fail.

This is loosely inspired by the version in IREE, but makes fewer unsafe
assumptions and has a more principled way of communicating decisions.

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D155223
mlir/include/mlir/Dialect/NVGPU/TransformOps/NVGPUTransformOps.h
mlir/include/mlir/Dialect/NVGPU/TransformOps/NVGPUTransformOps.td
mlir/include/mlir/Dialect/SCF/Transforms/Patterns.h
mlir/include/mlir/Dialect/SCF/Transforms/Transforms.h
mlir/lib/Dialect/NVGPU/TransformOps/CMakeLists.txt
mlir/lib/Dialect/NVGPU/TransformOps/NVGPUTransformOps.cpp
mlir/lib/Dialect/SCF/Transforms/LoopPipelining.cpp
mlir/test/Dialect/NVGPU/transform-pipeline-shared.mlir [new file with mode: 0644]
utils/bazel/llvm-project-overlay/mlir/BUILD.bazel