Support, Getting Involved, and FAQ
==================================
Please do not hesitate to reach out to us via openmp-dev@lists.llvm.org or join
one of our :ref:`regular calls <calls>`. Some common questions are answered in
the FAQ below.

OpenMP in LLVM Technical Call
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

- Development updates on OpenMP (and OpenACC) in the LLVM Project, including Clang, optimization, and runtime work.
- Join the `OpenMP in LLVM Technical Call <https://bluejeans.com/544112769//webrtc>`__.
- Time: Weekly call, every Wednesday at 7:00 AM Pacific time.
- Meeting minutes are `here <https://docs.google.com/document/d/1Tz8WFN13n7yJ-SCE0Qjqf9LmjGUw0dWO9Ts1ss4YOdg/edit>`__.
- Status tracking `page <https://openmp.llvm.org/docs>`__.

OpenMP in Flang Technical Call
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

- Development updates on OpenMP and OpenACC in the Flang Project.
- Join the `OpenMP in Flang Technical Call <https://bit.ly/39eQW3o>`_.
- Time: Weekly call, every Thursday at 8:00 AM Pacific time.
- Meeting minutes are `here <https://docs.google.com/document/d/1yA-MeJf6RYY-ZXpdol0t7YoDoqtwAyBhFLr5thu5pFI>`__.
- Status tracking `page <https://docs.google.com/spreadsheets/d/1FvHPuSkGbl4mQZRAwCIndvQx9dQboffiD-xD0oqxgU0/edit#gid=0>`__.

FAQ
---

The FAQ is a work in progress and most of the expected content is not
yet available. While you can expect changes, we always welcome feedback and
additions. Please contact us, e.g., through ``openmp-dev@lists.llvm.org``.

Q: How to contribute a patch to the webpage or any other part?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

All patches go through the regular `LLVM review process
<https://llvm.org/docs/Contributing.html#how-to-submit-a-patch>`_.

.. _build_offload_capable_compiler:

Q: How to build an OpenMP GPU offload capable compiler?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

To build an *effective* OpenMP offload capable compiler, only one extra CMake
option, `LLVM_ENABLE_RUNTIMES="openmp"`, is needed when building LLVM (generic
information about building LLVM is available `here
<https://llvm.org/docs/GettingStarted.html>`__). Make sure all backends that
are targeted by OpenMP are enabled. By default, Clang will be built with all
backends enabled. When building with `LLVM_ENABLE_RUNTIMES="openmp"`, OpenMP
should not be enabled in `LLVM_ENABLE_PROJECTS` because it is enabled by
default.
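
As a sketch (generator, build type, and install path are placeholders; adjust
for your setup), a configure and build might look like:

```shell
# Configure LLVM with Clang and the OpenMP runtime enabled.
cmake -G Ninja ../llvm \
  -DCMAKE_BUILD_TYPE=Release \
  -DCMAKE_INSTALL_PREFIX=/opt/llvm \
  -DLLVM_ENABLE_PROJECTS="clang" \
  -DLLVM_ENABLE_RUNTIMES="openmp"
ninja install
```
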

For Nvidia offload, please see :ref:`build_nvidia_offload_capable_compiler`.
For AMDGPU offload, please see :ref:`build_amdgpu_offload_capable_compiler`.

The compiler that generates the offload code should be the same (version) as
the compiler that builds the OpenMP device runtimes. The OpenMP host runtime
can be built by a different compiler.

.. _advanced_builds: https://llvm.org/docs/AdvancedBuilds.html

.. _build_nvidia_offload_capable_compiler:

Q: How to build an OpenMP Nvidia offload capable compiler?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The CUDA SDK is required on the machine that will execute the OpenMP application.

If your build machine is not the target machine or automatic detection of the
available GPUs failed, you should also set:

- `CLANG_OPENMP_NVPTX_DEFAULT_ARCH=sm_XX` where `XX` is the architecture of your GPU, e.g., 80.
- `LIBOMPTARGET_NVPTX_COMPUTE_CAPABILITIES=YY` where `YY` is the numeric compute capability of your GPU, e.g., 75.
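
For example, assuming an sm_80 GPU in the target machine (the values are
illustrative, not a recommendation):

```shell
# Illustrative configure options for a build machine without the target GPU.
cmake -G Ninja ../llvm \
  -DLLVM_ENABLE_PROJECTS="clang" \
  -DLLVM_ENABLE_RUNTIMES="openmp" \
  -DCLANG_OPENMP_NVPTX_DEFAULT_ARCH=sm_80 \
  -DLIBOMPTARGET_NVPTX_COMPUTE_CAPABILITIES=80
```
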

.. _build_amdgpu_offload_capable_compiler:

Q: How to build an OpenMP AMDGPU offload capable compiler?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

A subset of the `ROCm <https://github.com/radeonopencompute>`_ toolchain is
required to build the LLVM toolchain and to execute the OpenMP application.
Either install ROCm somewhere that CMake's find_package can locate it, or
build the required subcomponents ROCt and ROCr from source.

The two components used are ROCT-Thunk-Interface, roct, and ROCR-Runtime, rocr.
Roct is the userspace part of the Linux driver. It calls into the driver which
ships with the Linux kernel. It is an implementation detail of Rocr from
OpenMP's perspective. Rocr is an implementation of `HSA
<http://www.hsafoundation.com>`_.

.. code-block:: shell

    SOURCE_DIR=same-as-llvm-source # e.g. the checkout of llvm-project, next to openmp
    BUILD_DIR=somewhere-to-build   # scratch build directory
    INSTALL_PREFIX=same-as-llvm-install

    cd $SOURCE_DIR
    git clone git@github.com:RadeonOpenCompute/ROCT-Thunk-Interface.git -b roc-4.2.x
    git clone git@github.com:RadeonOpenCompute/ROCR-Runtime.git -b rocm-4.2.x

    cd $BUILD_DIR && mkdir roct && cd roct
    cmake $SOURCE_DIR/ROCT-Thunk-Interface/ -DCMAKE_INSTALL_PREFIX=$INSTALL_PREFIX \
      -DCMAKE_BUILD_TYPE=Release -DBUILD_SHARED_LIBS=OFF
    make && make install

    cd $BUILD_DIR && mkdir rocr && cd rocr
    cmake $SOURCE_DIR/ROCR-Runtime/src -DIMAGE_SUPPORT=OFF \
      -DCMAKE_INSTALL_PREFIX=$INSTALL_PREFIX -DCMAKE_BUILD_TYPE=Release \
      -DBUILD_SHARED_LIBS=ON
    make && make install

``IMAGE_SUPPORT`` requires building rocr with clang and is not used by OpenMP.

Provided CMake's find_package can find the ROCR-Runtime package, LLVM will
build a tool ``bin/amdgpu-arch`` which will print a string like ``gfx906`` when
run if it recognises a GPU on the local system. LLVM will also build a shared
library, ``libomptarget.rtl.amdgpu.so``, which is linked against rocr.

With those libraries installed, and LLVM built and installed, try:

.. code-block:: shell

    clang -O2 -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa example.c -o example && ./example

Q: What are the known limitations of OpenMP AMDGPU offload?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

``LD_LIBRARY_PATH`` or rpath/runpath are required to find ``libomp.so`` and
``libomptarget.so``.

There is no libc. That is, malloc and printf do not exist. Libm is implemented
in terms of the ROCm device library, which will be searched for if linking with
``-lm``.

Some versions of the driver for the Radeon VII (gfx906) will error unless the
environment variable ``HSA_IGNORE_SRAMECC_MISREPORT=1`` is set.

It is a recent addition to LLVM and the implementation differs from that which
has been shipping in ROCm and AOMP for some time. Early adopters should expect
to encounter some rough edges.

Q: What are the LLVM components used in offloading and how are they found?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The libraries used by an executable compiled for target offloading are:

- ``libomp.so`` (or similar), the host OpenMP runtime
- ``libomptarget.so``, the target-agnostic target offloading OpenMP runtime
- plugins loaded by ``libomptarget.so``:

  - ``libomptarget.rtl.amdgpu.so``
  - ``libomptarget.rtl.cuda.so``
  - ``libomptarget.rtl.x86_64.so``
  - ``libomptarget.rtl.ve.so``

- dependencies of those plugins, e.g. CUDA/rocr for nvptx/amdgpu

The compiled executable is dynamically linked against a host runtime, e.g.
``libomp.so``, and against the target offloading runtime, ``libomptarget.so``. These
are found like any other dynamic library, by setting rpath or runpath on the
executable, by setting ``LD_LIBRARY_PATH``, or by adding them to the system search.
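
As a sketch, assuming LLVM was installed under ``/opt/llvm`` (a placeholder
path), the runtimes can be made visible to the dynamic loader like so:

```shell
# Prepend the LLVM lib directory so the loader can find libomp.so and
# libomptarget.so; /opt/llvm is a placeholder install prefix.
export LD_LIBRARY_PATH=/opt/llvm/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}
```
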

``libomptarget.so`` has rpath or runpath (whichever the system default is) set to
``$ORIGIN``, and the plugins are located next to it, so it will find the plugins
without any environment variables set. If ``LD_LIBRARY_PATH`` is set, whether it
overrides which plugin is found depends on whether your system treats ``-Wl,-rpath``
as rpath or runpath.

The plugins will try to find their dependencies in a plugin-dependent fashion.

The cuda plugin is dynamically linked against libcuda if CMake found it at
compiler build time. Otherwise it will attempt to dlopen ``libcuda.so``. It does
not have rpath set.

The amdgpu plugin is linked against ROCr if CMake found it at compiler build
time. Otherwise it will attempt to dlopen ``libhsa-runtime64.so``. It has rpath
set to ``$ORIGIN``, so installing ``libhsa-runtime64.so`` in the same directory is a
way to locate it without environment variables.

In addition to those, there is a compiler runtime library called deviceRTL.
This is compiled from mostly common code into an architecture-specific
bitcode library, e.g. ``libomptarget-nvptx-sm_70.bc``.

Clang and the deviceRTL need to match closely as the interface between them
changes frequently. Using both from the same monorepo checkout is strongly
recommended.

Unlike the host side, which lets environment variables select components, the
deviceRTL located in the Clang lib directory is preferred. Only if it is
absent is the ``LIBRARY_PATH`` environment variable searched for a bitcode file
with the right name. This can be overridden by passing a Clang flag,
``--libomptarget-nvptx-bc-path`` or ``--libomptarget-amdgcn-bc-path``, which
can specify a directory or an exact bitcode file to use.
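
For instance (the path is a hypothetical illustration, not a real install
location), one could point Clang at an explicit deviceRTL bitcode file:

```shell
# Hypothetical path: force a specific deviceRTL bitcode library instead of
# the one next to the clang binary.
clang -O2 -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda \
  --libomptarget-nvptx-bc-path=/opt/llvm/lib/libomptarget-nvptx-sm_70.bc \
  example.c -o example
```
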

Q: Does OpenMP offloading support work in pre-packaged LLVM releases?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

For now, the answer is most likely *no*. Please see :ref:`build_offload_capable_compiler`.

Q: Does OpenMP offloading support work in packages distributed as part of my OS?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

For now, the answer is most likely *no*. Please see :ref:`build_offload_capable_compiler`.

.. _math_and_complex_in_target_regions:

Q: Does Clang support ``<math.h>`` and ``<complex.h>`` operations in OpenMP target on GPUs?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Yes, LLVM/Clang allows math functions and complex arithmetic inside OpenMP
target regions that are compiled for GPUs.

Clang provides a set of wrapper headers that are found first when ``math.h`` and
``complex.h``, for C, or ``cmath`` and ``complex``, for C++, or similar headers are
included by the application. These wrappers will eventually include the system
version of the corresponding header file after setting up a target device
specific environment. Including the system header is important because system
headers differ based on the architecture and operating system and may
contain preprocessor, variable, and function definitions that need to be
available in the target region regardless of the targeted device architecture.
However, various functions may require specialized device versions, e.g.,
``sin``, and others are only available on certain devices, e.g., ``__umul64hi``. To
provide "native" support for math and complex on the respective architecture,
Clang will wrap the "native" math functions, e.g., as provided by the device
vendor, in an OpenMP begin/end declare variant. These functions will then be
picked up instead of the host versions, while host-only variables and function
definitions are still available. Complex arithmetic and functions are supported
through a similar mechanism. It is worth noting that this support requires
`extensions to the OpenMP begin/end declare variant context selector
<https://clang.llvm.org/docs/AttributeReference.html#pragma-omp-declare-variant>`__
that are exposed through LLVM/Clang to the user as well.
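
As an illustrative sketch, assuming an offload-capable Clang on ``PATH`` and a
CUDA-capable GPU, a target region calling ``sin`` from ``<math.h>`` compiles like
any other translation unit:

```shell
# Write and compile a small test that calls a libm function on the device.
cat > math_example.c <<'EOF'
#include <math.h>
#include <stdio.h>
int main(void) {
  double r = 0.0;
  /* sin() inside the target region resolves to the device variant. */
  #pragma omp target map(tofrom : r)
  { r = sin(0.5); }
  printf("%f\n", r);
  return 0;
}
EOF
clang -O2 -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda \
  math_example.c -o math_example
```
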

Q: What is a way to debug errors from mapping memory to a target device?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

An experimental way to debug these errors is to use :ref:`remote process
offloading <remote_offloading_plugin>`.
By using ``libomptarget.rtl.rpc.so`` and ``openmp-offloading-server``, it is
possible to explicitly perform memory transfers between processes on the host
CPU and run sanitizers while doing so in order to catch these errors.

Q: Why does my application say "Named symbol not found" and abort when I run it?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

This is most likely caused by trying to use OpenMP offloading with static
libraries. Static libraries do not contain any device code, so when the runtime
attempts to execute the target region it will not be found and you will get an
error like this:

.. code-block:: text

    CUDA error: Loading '__omp_offloading_fd02_3231c15__Z3foov_l2' Failed
    CUDA error: named symbol not found
    Libomptarget error: Unable to generate entries table for device id 0.

Currently, the only solution is to change how the application is built and avoid
the use of static libraries.

Q: Can I use dynamically linked libraries with OpenMP offloading?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Dynamically linked libraries can only be used if there is no device code split
between the library and the application. Anything declared on the device inside
the shared library will not be visible to the application when it is linked.

Q: How to build an OpenMP offload capable compiler with an outdated host compiler?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Enabling the OpenMP runtime will perform a two-stage build for you.
If your host compiler is different from your system-wide compiler, you may need
to set the CMake variable `GCC_INSTALL_PREFIX` so clang will be able to find the
correct GCC toolchain in the second stage of the build.

For example, if your system-wide GCC installation is too old to build LLVM and
you would like to use a newer GCC, set the CMake variable `GCC_INSTALL_PREFIX`
to inform clang of the GCC installation you would like to use in the second stage.
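
A sketch of such a configure line (the GCC install path is a placeholder):

```shell
# Hypothetical: the system GCC is too old, so point clang at a newer GCC
# installed under /opt/gcc-11.
cmake -G Ninja ../llvm \
  -DLLVM_ENABLE_PROJECTS="clang" \
  -DLLVM_ENABLE_RUNTIMES="openmp" \
  -DGCC_INSTALL_PREFIX=/opt/gcc-11
```
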

Q: How can I include OpenMP offloading support in my CMake project?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Currently, there is an experimental CMake find module for OpenMP target
offloading provided by LLVM. It will attempt to find OpenMP target offloading
support for your compiler. The flags necessary for OpenMP target offloading will
be loaded into the ``OpenMPTarget::OpenMPTarget_<device>`` target or the
``OpenMPTarget_<device>_FLAGS`` variable if successful. Currently supported
devices are ``AMDGPU`` and ``NVPTX``.

To use this module, simply add the path to CMake's current module path and call
``find_package``. The module will be installed with your OpenMP installation by
default. Including OpenMP offloading support in an application should now only
require a few additions.

.. code-block:: cmake

    cmake_minimum_required(VERSION 3.13.4)
    project(offloadTest VERSION 1.0 LANGUAGES CXX)

    list(APPEND CMAKE_MODULE_PATH "${PATH_TO_OPENMP_INSTALL}/lib/cmake/openmp")

    find_package(OpenMPTarget REQUIRED NVPTX)

    add_executable(offload)
    target_link_libraries(offload PRIVATE OpenMPTarget::OpenMPTarget_NVPTX)
    target_sources(offload PRIVATE ${CMAKE_CURRENT_SOURCE_DIR}/src/Main.cpp)

Using this module requires at least CMake version 3.13.4. Supported languages
are C and C++, with Fortran support planned in the future. Compiler support is
best for Clang, but this module should work for other compiler vendors as well.