[PGO][CUDA][HIP] Skip generating profile on the device stub and wrong-side functions.
- Skip generating profile data on `__global__` function in the host
compilation. It's a host-side stub function only and don't have
profile instrumentation generated on the real function body. The extra
profile data results in the malformed instrumentation profile data.
- Skip generating region mapping on functions in the wrong-side, i.e.,
+ For the device compilation, skip host-only functions; and,
+ For the host compilation, skip device-only functions (including
`__global__` functions.)
- As the device-side profiling is not ready yet, only host-side profile
code generation is checked.
Differential Revision: https://reviews.llvm.org/D85276