From fc37f717770acdfe5504bb9b969a01bb16a187f9 Mon Sep 17 00:00:00 2001
From: Yuan Yao
Date: Wed, 28 Jun 2023 13:50:25 -0700
Subject: [PATCH] [mlir][NVGPU]: Fix op description of nvgpu.device_async_wait.

According to the NVIDIA documentation on `cp.async.wait_group`
(https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#data-movement-and-conversion-instructions-cp-async-wait-group-cp-async-wait-all),
the `numGroups` attribute in `nvgpu.device_async_wait` should give an upper
bound of pending async group count (instead of a lower bound) when the
executing thread can be unblocked.

Reviewed By: christopherbate

Differential Revision: https://reviews.llvm.org/D154046
---
 mlir/include/mlir/Dialect/NVGPU/IR/NVGPU.td | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/mlir/include/mlir/Dialect/NVGPU/IR/NVGPU.td b/mlir/include/mlir/Dialect/NVGPU/IR/NVGPU.td
index e595e9d..41571fc 100644
--- a/mlir/include/mlir/Dialect/NVGPU/IR/NVGPU.td
+++ b/mlir/include/mlir/Dialect/NVGPU/IR/NVGPU.td
@@ -336,8 +336,11 @@ def NVGPU_DeviceAsyncWaitOp : NVGPU_Op<"device_async_wait", []> {
     The `nvgpu.device_async_wait` op will block the execution thread until the
     group associated with the source token is fully completed.
 
-    The optional `$numGroup` attribute gives a lower bound of the number of
-    groups uncompleted when the wait can unblock the thread.
+    The optional `$numGroups` attribute gives an upper bound of the number of
+    groups uncompleted when the wait can unblock the thread. For example, if
+    16 async groups are pushed and `$numGroups` is set to 12, then the thread
+    will unblock when 12 groups or fewer are in flight (4 groups have
+    completed).
 
     Example:
 
-- 
2.7.4
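
The upper-bound semantics described in the corrected op description can be modeled with a few lines of C++. This is only an illustrative sketch: `waitCanUnblock` and `groupsThatMustComplete` are hypothetical helper names, not part of MLIR, the NVGPU dialect, or CUDA, and the model assumes that the only thing that matters is the count of async groups still in flight.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model (not an MLIR or CUDA API): returns true when a wait
// with the given numGroups bound would let the thread proceed.
constexpr bool waitCanUnblock(uint32_t pendingGroups, uint32_t numGroups) {
  // numGroups is an upper bound on the number of groups still in flight.
  return pendingGroups <= numGroups;
}

// Hypothetical helper: how many groups must complete before the wait
// unblocks, given how many groups were pushed in total.
constexpr uint32_t groupsThatMustComplete(uint32_t pushedGroups,
                                          uint32_t numGroups) {
  return pushedGroups > numGroups ? pushedGroups - numGroups : 0;
}

// The example from the op description: 16 groups pushed, numGroups = 12.
static_assert(!waitCanUnblock(16, 12), "16 groups in flight: still blocked");
static_assert(waitCanUnblock(12, 12), "12 groups in flight: unblocked");
static_assert(groupsThatMustComplete(16, 12) == 4, "4 groups must complete");
```

Under the old "lower bound" wording, the first two assertions would read the opposite way, which is the discrepancy with the `cp.async.wait_group` documentation that this patch addresses.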