[mlir] Introduce device mapper attribute for `thread_dim_map` and `mapped to dims`
author Guray Ozen <guray.ozen@gmail.com>
Thu, 10 Nov 2022 16:55:49 +0000 (17:55 +0100)
committer Guray Ozen <guray.ozen@gmail.com>
Fri, 11 Nov 2022 07:44:57 +0000 (08:44 +0100)
commit 6663f3470417523141ee87923840d17b35d6b4c7
tree 370a4544c84505627cc38a4724bb7deb91d550de
parent 99d3ead44cfb21dca24c63a0b0731eaad54b491e

`scf.foreach_thread` maps its loops to processors via an integer array, as in the example below, and a lowering can use this mapping. However, an integer array is hard to read, especially when there are multiple levels of parallelism, and the op does not verify it. This change introduces device mapping attributes that make the mapping descriptive and verifiable, and it updates the GPU transform dialect to use them.

```
scf.foreach_thread (%i, %j) in (%c1, %c2) {
  scf.foreach_thread (%i2, %j2) in (%c1, %c2)
  {...} { thread_dim_mapping = [0, 1]}
} { thread_dim_mapping = [0, 1]}
```

The change first introduces `DeviceMappingInterface`, an attribute interface. `scf.foreach_thread` defines its mapping via this interface, and a lowering must define its own attributes that implement it. This makes the mapping straightforward to validate.
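To illustrate why an interface makes the mapping checkable, here is a minimal standalone C++ sketch that does not use MLIR. Every name in it (`Attribute`, `DeviceMappingAttr`, `IntegerAttr`, `MyProcessorAttr`, `verifyMapping`) is a hypothetical stand-in, and the two checks shown are assumptions about what a verifier could do, not the verifier added by this change.

```cpp
#include <cstddef>
#include <memory>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for any attribute that can be attached to the loop.
struct Attribute {
  virtual ~Attribute() = default;
};

// Hypothetical stand-in for the attribute interface: only attributes that
// derive from it count as device mappings.
struct DeviceMappingAttr : Attribute {
  virtual std::string describe() const = 0;
};

// A raw integer entry, as in the old integer-array form. It does not
// implement the interface.
struct IntegerAttr : Attribute {
  int value;
  explicit IntegerAttr(int v) : value(v) {}
};

// An attribute defined by some lowering (name and syntax are made up).
struct MyProcessorAttr : DeviceMappingAttr {
  std::string axis;
  explicit MyProcessorAttr(std::string a) : axis(std::move(a)) {}
  std::string describe() const override {
    return "#my.processor<" + axis + ">";
  }
};

using Mapping = std::vector<std::shared_ptr<Attribute>>;

// Returns an error message, or std::nullopt if the mapping passes both
// assumed checks: one entry per loop, and every entry implements the interface.
std::optional<std::string> verifyMapping(const Mapping &mapping,
                                         std::size_t numLoops) {
  if (mapping.size() != numLoops)
    return std::string("mapping must have one entry per loop");
  for (const auto &attr : mapping)
    if (!dynamic_cast<const DeviceMappingAttr *>(attr.get()))
      return std::string("mapping entry does not implement the interface");
  return std::nullopt;
}
```

In this sketch, a mapping built from two `MyProcessorAttr` entries passes for a two-loop op, while one that contains an `IntegerAttr` is rejected.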

The change also introduces two new attributes, `#gpu.thread<x/y/z>` and `#gpu.block<x/y/z>`. After this change, the code above prints as shown below, which makes the loop mappings clear. The change also teaches the transform dialect to consume these two attributes: it binds the outermost loops to thread blocks and the innermost loops to threads.

```
scf.foreach_thread (%i, %j) in (%c1, %c2) {
  scf.foreach_thread (%i2, %j2) in (%c1, %c2)
  {...} { thread_dim_mapping = [#gpu.thread<x>, #gpu.thread<y>]}
} { thread_dim_mapping = [#gpu.block<x>, #gpu.block<y>]}
```
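As a rough illustration of what binding loops to blocks and threads could mean for launch sizes, the standalone C++ sketch below places each loop's trip count in the hardware dimension named by its mapping entry. The names `Dim` and `launchDims` are hypothetical, and the default size of 1 for unmapped dimensions is an assumption. This is not the transform dialect implementation.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the <x>, <y>, <z> parameter of
// #gpu.block / #gpu.thread.
enum class Dim { X = 0, Y = 1, Z = 2 };

// Places each loop's trip count at the hardware dimension named by its
// mapping entry. Dimensions that no loop maps to keep the size 1.
// Assumes one mapping entry per loop and no repeated dimension.
std::array<std::int64_t, 3> launchDims(
    const std::vector<std::int64_t> &tripCounts,
    const std::vector<Dim> &mapping) {
  std::array<std::int64_t, 3> dims = {1, 1, 1};
  for (std::size_t i = 0; i < tripCounts.size() && i < mapping.size(); ++i)
    dims[static_cast<std::size_t>(mapping[i])] = tripCounts[i];
  return dims;
}
```

Under these assumptions, an outer loop with trip counts (4, 8) mapped to `[#gpu.block<y>, #gpu.block<x>]` would give a grid of (8, 4, 1). The same helper applied to the inner loop would give the thread block size.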

Reviewed By: ftynse, nicolasvasilache

Differential Revision: https://reviews.llvm.org/D137413
29 files changed:
mlir/include/mlir/Dialect/GPU/IR/GPUDialect.h
mlir/include/mlir/Dialect/GPU/IR/GPUOps.td
mlir/include/mlir/Dialect/GPU/TransformOps/CMakeLists.txt
mlir/include/mlir/Dialect/GPU/TransformOps/GPUDeviceMappingAttr.td [new file with mode: 0644]
mlir/include/mlir/Dialect/GPU/TransformOps/GPUTransformOps.td
mlir/include/mlir/Dialect/Linalg/TransformOps/LinalgTransformOps.h
mlir/include/mlir/Dialect/Linalg/TransformOps/LinalgTransformOps.td
mlir/include/mlir/Dialect/Linalg/Transforms/Transforms.h
mlir/include/mlir/Dialect/SCF/IR/CMakeLists.txt
mlir/include/mlir/Dialect/SCF/IR/DeviceMappingInterface.h [new file with mode: 0644]
mlir/include/mlir/Dialect/SCF/IR/DeviceMappingInterface.td [new file with mode: 0644]
mlir/include/mlir/Dialect/SCF/IR/SCF.h
mlir/include/mlir/Dialect/SCF/IR/SCFOps.td
mlir/lib/Dialect/GPU/TransformOps/CMakeLists.txt
mlir/lib/Dialect/GPU/TransformOps/GPUTransformOps.cpp
mlir/lib/Dialect/Linalg/TransformOps/LinalgTransformOps.cpp
mlir/lib/Dialect/Linalg/Transforms/Tiling.cpp
mlir/lib/Dialect/SCF/IR/CMakeLists.txt
mlir/lib/Dialect/SCF/IR/DeviceMappingInterface.cpp [new file with mode: 0644]
mlir/lib/Dialect/SCF/IR/SCF.cpp
mlir/lib/Dialect/SCF/Transforms/BufferizableOpInterfaceImpl.cpp
mlir/test/Dialect/GPU/transform-gpu-failing.mlir
mlir/test/Dialect/GPU/transform-gpu.mlir
mlir/test/Dialect/Linalg/tile-to-foreach-thread.mlir
mlir/test/Dialect/SCF/one-shot-bufferize-analysis.mlir
mlir/test/Dialect/SCF/one-shot-bufferize-tensor-copy-insertion.mlir
mlir/test/Dialect/SCF/ops.mlir
mlir/test/lib/Dialect/Tensor/TestTensorTransforms.cpp
utils/bazel/llvm-project-overlay/mlir/BUILD.bazel