[mlir:Async] Implement recursive async work splitting for scf.parallel operation...
author Eugene Zhulenev <ezhulenev@google.com>
Thu, 24 Jun 2021 12:27:42 +0000 (05:27 -0700)
committer Eugene Zhulenev <ezhulenev@google.com>
Fri, 25 Jun 2021 17:34:39 +0000 (10:34 -0700)
commit 86ad0af87054c3cccd68d32e103a6f1f6c6194c7
tree 73da8cbbc8349658b3cec1d97240cafe5ad40ece
parent d43b23608ad664f02f56e965ca78916bde220950
[mlir:Async] Implement recursive async work splitting for scf.parallel operation (async-parallel-for pass)

Depends On D104780

Recursive work splitting, instead of sequential submission of async tasks, gives a ~20%-30% speedup in microbenchmarks.

Algorithm outline:
1. Collapse scf.parallel dimensions into a single dimension
2. Compute the block size for the parallel operations from the 1d problem size
3. Launch parallel tasks
4. Each parallel task reconstructs its own bounds in the original multi-dimensional iteration space
5. Each parallel task computes the original parallel operation body using an scf.for loop nest
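
The outline above can be sketched in plain Python to show the control flow. This is an illustration only, not the code the pass generates. The names `parallel_for`, `dispatch`, `run_block` and `delinearize` are hypothetical, and the sketch assumes that every dimension starts at 0 with step 1, that one block per worker is a reasonable block size, and that a thread pool stands in for the async runtime. For brevity each task delinearizes every flat index instead of reconstructing loop-nest bounds once per block.

```python
from concurrent.futures import ThreadPoolExecutor
from math import prod
import threading


def delinearize(index, dims):
    """Map a flat index to row-major coordinates in an iteration space of shape dims."""
    coords = []
    for d in reversed(dims):
        coords.append(index % d)
        index //= d
    return tuple(reversed(coords))


def parallel_for(dims, body, num_workers=4):
    """Run body(*coords) for every point of the iteration space (hypothetical helper)."""
    total = prod(dims)  # step 1: collapse all dimensions into one
    if total == 0:
        return
    # Step 2: block size from the 1-D problem size (assumption: one block per worker).
    block_size = -(-total // num_workers)  # ceiling division
    num_blocks = -(-total // block_size)

    remaining = [num_blocks]  # blocks that have not finished yet
    lock = threading.Lock()
    done = threading.Event()

    def run_block(block):
        try:
            # Steps 4-5: recover the block's range, then run the body sequentially.
            first = block * block_size
            for flat in range(first, min(first + block_size, total)):
                body(*delinearize(flat, dims))
        finally:
            with lock:
                remaining[0] -= 1
                if remaining[0] == 0:
                    done.set()

    with ThreadPoolExecutor(max_workers=num_workers) as pool:

        def dispatch(start, end):
            # Step 3: recursive splitting. Hand the upper half of the block range
            # to a new task, keep halving the lower half, then run what is left.
            while end - start > 1:
                mid = (start + end) // 2
                pool.submit(dispatch, mid, end)
                end = mid
            run_block(start)

        pool.submit(dispatch, 0, num_blocks)
        done.wait()  # every submit happens before its task's block runs


if __name__ == "__main__":
    seen = []
    parallel_for((3, 4), lambda i, j: seen.append((i, j)))
    print(sorted(seen))
```

With sequential dispatch, one caller submits all blocks in turn. In the recursive scheme each task hands off half of its remaining range before doing any work, so task launching itself is spread across workers.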

Reviewed By: herhut

Differential Revision: https://reviews.llvm.org/D104850
mlir/include/mlir/Dialect/Async/Passes.h
mlir/include/mlir/Dialect/Async/Passes.td
mlir/lib/Dialect/Async/Transforms/AsyncParallelFor.cpp
mlir/test/Dialect/Async/async-parallel-for-async-dispatch.mlir [new file with mode: 0644]
mlir/test/Dialect/Async/async-parallel-for-seq-dispatch.mlir [moved from mlir/test/Dialect/Async/async-parallel-for.mlir with 58% similarity]
mlir/test/Integration/Dialect/Async/CPU/microbench-linalg-async-parallel-for.mlir
mlir/test/Integration/Dialect/Async/CPU/microbench-scf-async-parallel-for.mlir [new file with mode: 0644]
mlir/test/Integration/Dialect/Async/CPU/test-async-parallel-for-1d.mlir
mlir/test/Integration/Dialect/Async/CPU/test-async-parallel-for-2d.mlir