[OpenMP][CUDA] Cache the maximal number of threads per block (per kernel)
author     Johannes Doerfert <johannes@jdoerfert.de>
           Sun, 16 Aug 2020 15:49:37 +0000 (10:49 -0500)
committer  Johannes Doerfert <johannes@jdoerfert.de>
           Sun, 16 Aug 2020 19:38:33 +0000 (14:38 -0500)
commit     aa27cfc1e7d7456325e951a4ba3ced405027f7d0
tree       d83bd80ef78294c169876b31b1f323e6cca6da5c
parent     95a25e4c3203f35e9f57f9fac620b4a21bffd6e1

Instead of calling `cuFuncGetAttribute` with
`CU_FUNC_ATTRIBUTE_MAX_THREADS_PER_BLOCK` for every kernel invocation,
we can do it for the first one and cache the result as part of the
`KernelInfo` struct. The only functional change is that we now expect
`cuFuncGetAttribute` to succeed and otherwise propagate the error.
Ignoring any error seems like a slippery slope...
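The caching pattern described above can be sketched in plain C++ as follows. This is a minimal illustration under stated assumptions, not the plugin's actual code. `KernelInfoSketch`, `AttributeQuery`, `Status`, `getMaxThreadsPerBlock` and `launchBlockSize` are all hypothetical names. A callback stands in for `cuFuncGetAttribute(&Value, CU_FUNC_ATTRIBUTE_MAX_THREADS_PER_BLOCK, Func)` so the example runs without a GPU. Using `0` to mean "not queried yet" and clamping the requested block size to the cached limit are also assumptions.

```cpp
#include <cassert>
#include <functional>

// Hypothetical status codes standing in for CUresult (CUDA_SUCCESS, ...).
enum class Status { Success, Error };

// Hypothetical stand-in for the per-kernel KernelInfo struct.
// MaxThreadsPerBlock == 0 is assumed to mean "not queried yet".
struct KernelInfoSketch {
  int MaxThreadsPerBlock = 0;
};

// Hypothetical query callback. In the real plugin this role is played by
// cuFuncGetAttribute with CU_FUNC_ATTRIBUTE_MAX_THREADS_PER_BLOCK.
using AttributeQuery = std::function<Status(int &)>;

// Returns the per-kernel limit. The driver is queried only on the first
// call and the result is cached in KI. A failed query is propagated to
// the caller instead of being ignored, and nothing is cached in that case.
Status getMaxThreadsPerBlock(KernelInfoSketch &KI, const AttributeQuery &Query,
                             int &Out) {
  if (KI.MaxThreadsPerBlock == 0) {
    int Value = 0;
    Status S = Query(Value);
    if (S != Status::Success)
      return S;
    KI.MaxThreadsPerBlock = Value;
  }
  Out = KI.MaxThreadsPerBlock;
  return Status::Success;
}

// Hypothetical launch helper: caps the requested block size at the
// cached per-kernel maximum, or propagates the query error.
Status launchBlockSize(KernelInfoSketch &KI, const AttributeQuery &Query,
                       int Requested, int &BlockSize) {
  int Max = 0;
  Status S = getMaxThreadsPerBlock(KI, Query, Max);
  if (S != Status::Success)
    return S;
  BlockSize = Requested > Max ? Max : Requested;
  return Status::Success;
}
```

With a query that reports 1024, a first launch requesting 2048 threads is capped to 1024. A second launch of the same kernel reuses the cached value without calling the query again. Keeping the cache inside the per-kernel struct means each kernel pays for at most one successful query, and a failing query is never cached, so it surfaces as an error on every launch instead of being silently skipped.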

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D86038
openmp/libomptarget/plugins/cuda/src/rtl.cpp