[mlir] AsyncParallelFor: align block size to be a multiple of inner loops iterations
author Eugene Zhulenev <ezhulenev@google.com>
Thu, 9 Dec 2021 11:21:22 +0000 (03:21 -0800)
committer Eugene Zhulenev <ezhulenev@google.com>
Thu, 9 Dec 2021 14:50:50 +0000 (06:50 -0800)
commit 49ce40e9ab255588bf5093f468b30cbb6b99895b
tree f0c5fb0870d75912553e49afd8988e60f8084883
parent 9f151b784be0d0b27ad29cb24629815701d60481

Depends On D115263

Aligning the block size in parallel_compute_fn to a multiple of the inner loop iterations lets LLVM later unroll and vectorize some of the inner loops that have small trip counts. This gives up to a 2x speedup in multiple benchmarks.

Reviewed By: bkramer

Differential Revision: https://reviews.llvm.org/D115436
mlir/lib/Dialect/Async/Transforms/AsyncParallelFor.cpp
mlir/test/Dialect/Async/async-parallel-for-compute-fn.mlir
mlir/test/Integration/Dialect/Async/CPU/test-async-parallel-for-2d.mlir