From: Joseph Huber
Date: Tue, 25 Apr 2023 21:23:07 +0000 (-0500)
Subject: [libc][Docs] Begin improving documentation for the GPU libc
X-Git-Tag: upstream/17.0.6~10336
X-Git-Url: http://review.tizen.org/git/?a=commitdiff_plain;h=807f0584874d61b0eec5a3ed988402387560534c;p=platform%2Fupstream%2Fllvm.git

[libc][Docs] Begin improving documentation for the GPU libc

This patch updates some of the documentation for the GPU libc project.
There is a lot of work still to be done, but this sets the general
outline.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D149194
---

diff --git a/libc/docs/gpu/index.rst b/libc/docs/gpu/index.rst
new file mode 100644
index 0000000..0ea54a7
--- /dev/null
+++ b/libc/docs/gpu/index.rst
@@ -0,0 +1,18 @@
+.. _libc_gpu:
+
+=============
+libc for GPUs
+=============
+
+.. note:: This feature is very experimental and may change in the future.
+
+The *GPU* support for LLVM's libc project aims to make a subset of the standard
+C library available on GPU based accelerators. Navigate using the links below to
+learn more about this project.
+
+.. toctree::
+
+   using
+   support
+   testing
+   rpc
diff --git a/libc/docs/gpu/rpc.rst b/libc/docs/gpu/rpc.rst
new file mode 100644
index 0000000..bdc2c4a
--- /dev/null
+++ b/libc/docs/gpu/rpc.rst
@@ -0,0 +1,17 @@
+.. _libc_gpu_rpc:
+
+======================
+Remote Procedure Calls
+======================
+
+.. contents:: Table of Contents
+  :depth: 4
+  :local:
+
+Remote Procedure Call Implementation
+====================================
+
+Certain features from the standard C library, such as allocation or printing,
+require support from the operating system. We instead implement a remote
+procedure call (RPC) interface to allow submitting work from the GPU to a host
+server that forwards it to the host system.
diff --git a/libc/docs/gpu/support.rst b/libc/docs/gpu/support.rst
new file mode 100644
index 0000000..59fdb61
--- /dev/null
+++ b/libc/docs/gpu/support.rst
@@ -0,0 +1,88 @@
+.. _libc_gpu_support:
+
+===================
+Supported Functions
+===================
+
+.. include:: ../check.rst
+
+.. contents:: Table of Contents
+  :depth: 4
+  :local:
+
+The following functions and headers are supported at least partially on the
+device. Some functions are implemented fully on the GPU, while others require a
+`remote procedure call `.
+
+ctype.h
+-------
+
+=============  =========  ============
+Function Name  Available  RPC Required
+=============  =========  ============
+isalnum        |check|
+isalpha        |check|
+isascii        |check|
+isblank        |check|
+iscntrl        |check|
+isdigit        |check|
+isgraph        |check|
+islower        |check|
+isprint        |check|
+ispunct        |check|
+isspace        |check|
+isupper        |check|
+isxdigit       |check|
+toascii        |check|
+tolower        |check|
+toupper        |check|
+=============  =========  ============
+
+string.h
+--------
+
+=============  =========  ============
+Function Name  Available  RPC Required
+=============  =========  ============
+bcmp           |check|
+bzero          |check|
+memccpy        |check|
+memchr         |check|
+memcmp         |check|
+memcpy         |check|
+memmove        |check|
+mempcpy        |check|
+memrchr        |check|
+memset         |check|
+stpcpy         |check|
+stpncpy        |check|
+strcat         |check|
+strchr         |check|
+strcmp         |check|
+strcpy         |check|
+strcspn        |check|
+strlcat        |check|
+strlcpy        |check|
+strlen         |check|
+strncat        |check|
+strncmp        |check|
+strncpy        |check|
+strnlen        |check|
+strpbrk        |check|
+strrchr        |check|
+strspn         |check|
+strstr         |check|
+strtok         |check|
+strtok_r       |check|
+strdup
+strndup
+=============  =========  ============
+
+stdlib.h
+--------
+
+=============  =========  ============
+Function Name  Available  RPC Required
+=============  =========  ============
+atoi           |check|
+=============  =========  ============
diff --git a/libc/docs/gpu/testing.rst b/libc/docs/gpu/testing.rst
new file mode 100644
index 0000000..09e875a
--- /dev/null
+++ b/libc/docs/gpu/testing.rst
@@ -0,0 +1,32 @@
+.. _libc_gpu_testing:
+
+
+============================
+Testing the GPU libc library
+============================
+
+.. contents:: Table of Contents
+  :depth: 4
+  :local:
+
+Testing Infrastructure
+======================
+
+The testing support in LLVM's libc implementation for GPUs is designed to mimic
+the standard unit tests as much as possible. We use the `remote procedure call
+` support to provide the necessary utilities like printing from
+the GPU. Execution is performed by emitting a ``_start`` kernel from the GPU
+that is then called by an external loader utility. This is an example of how
+this can be done manually:
+
+.. code-block:: sh
+
+  $> clang++ crt1.o test.cpp --target=amdgcn-amd-amdhsa -mcpu=gfx90a -flto
+  $> ./amdhsa_loader --threads 1 --blocks 1 a.out
+  Test Passed!
+
+Unlike the exported ``libcgpu.a``, the testing architecture can only support a
+single architecture at a time. This is either detected automatically, or set
+manually by the user using ``LIBC_GPU_TEST_ARCHITECTURE``. The latter is useful
+in cases where the user does not build LLVM's libc on a machine with the GPU to
+use for testing.
diff --git a/libc/docs/gpu/using.rst b/libc/docs/gpu/using.rst
new file mode 100644
index 0000000..6808f05
--- /dev/null
+++ b/libc/docs/gpu/using.rst
@@ -0,0 +1,87 @@
+.. _libc_gpu_usage:
+
+
+===================
+Using libc for GPUs
+===================
+
+.. contents:: Table of Contents
+  :depth: 4
+  :local:
+
+Building the GPU library
+========================
+
+LLVM's libc GPU support *must* be built with an up-to-date ``clang`` compiler
+due to heavy reliance on ``clang``'s GPU support. This can be done automatically
+using the ``LLVM_ENABLE_RUNTIMES=libc`` option. To enable libc for the GPU,
+enable the ``LIBC_GPU_BUILD`` option. By default, ``libcgpu.a`` will be built
+using every supported GPU architecture. To restrict the number of architectures
+built, either set ``LLVM_LIBC_GPU_ARCHITECTURES`` to the list of desired
+architectures manually or use ``native`` to detect the GPUs on your system. A
+typical ``cmake`` configuration will look like this:
+
+.. code-block:: sh
+
+  $> cd llvm-project  # The llvm-project checkout
+  $> mkdir build
+  $> cd build
+  $> cmake ../llvm -G Ninja \
+     -DLLVM_ENABLE_PROJECTS="clang;lld;compiler-rt" \
+     -DLLVM_ENABLE_RUNTIMES="libc;openmp" \
+     -DCMAKE_BUILD_TYPE= \ # Select build type
+     -DLIBC_GPU_BUILD=ON \ # Build in GPU mode
+     -DLLVM_LIBC_GPU_ARCHITECTURES=all \ # Build all supported architectures
+     -DCMAKE_INSTALL_PREFIX= \ # Where 'libcgpu.a' will live
+  $> ninja install
+
+Since we want to include ``clang``, ``lld`` and ``compiler-rt`` in our
+toolchain, we list them in ``LLVM_ENABLE_PROJECTS``. To ensure ``libc`` is built
+using a compatible compiler and to support ``openmp`` offloading, we list them
+in ``LLVM_ENABLE_RUNTIMES`` to build them after the enabled projects using the
+newly built compiler. ``CMAKE_INSTALL_PREFIX`` specifies the installation
+directory in which to install the ``libcgpu.a`` library and headers along with
+LLVM. The generated headers will be placed in ``include/gpu-none-llvm``.
+
+Usage
+=====
+
+Once the ``libcgpu.a`` static archive has been built it can be linked directly
+with offloading applications as a standard library. This process is described in
+the `clang documentation `_.
+This linking mode is used by the OpenMP toolchain, but is currently opt-in for
+the CUDA and HIP toolchains through the ``--offload-new-driver`` and
+``-fgpu-rdc`` flags. A typical usage will look like this:
+
+.. code-block:: sh
+
+  $> clang foo.c -fopenmp --offload-arch=gfx90a -lcgpu
+
+The ``libcgpu.a`` static archive is a fat-binary containing LLVM-IR for each
+supported target device. The supported architectures can be seen using LLVM's
+``llvm-objdump`` with the ``--offloading`` flag:
+
+.. code-block:: sh
+
+  $> llvm-objdump --offloading libcgpu.a
+  libcgpu.a(strcmp.cpp.o):    file format elf64-x86-64
+
+  OFFLOADING IMAGE [0]:
+  kind            llvm ir
+  arch            gfx90a
+  triple          amdgcn-amd-amdhsa
+  producer        none
+
+Because the device code is stored inside a fat binary, it can be difficult to
+inspect the resulting code. This can be done using the following utilities:
+
+.. code-block:: sh
+
+  $> llvm-ar x libcgpu.a strcmp.cpp.o
+  $> clang-offload-packager strcmp.cpp.o --image=arch=gfx90a,file=gfx90a.bc
+  $> opt -S gfx90a.bc
+  ...
+
+Please note that this fat binary format is provided for compatibility with
+existing offloading toolchains. The implementation in ``libc`` does not depend
+on any existing offloading languages and is completely freestanding.
diff --git a/libc/docs/gpu_mode.rst b/libc/docs/gpu_mode.rst
deleted file mode 100644
index b71b6ee..0000000
--- a/libc/docs/gpu_mode.rst
+++ /dev/null
@@ -1,169 +0,0 @@
-.. _GPU_mode:
-
-==============
-GPU Mode
-==============
-
-.. include:: check.rst
-
-.. contents:: Table of Contents
-  :depth: 4
-  :local:
-
-.. note:: This feature is very experimental and may change in the future.
-
-The *GPU* mode of LLVM's libc is an experimental mode used to support calling
-libc routines during GPU execution. The goal of this project is to provide
-access to the standard C library on systems running accelerators. To begin using
-this library, build and install the ``libcgpu.a`` static archive following the
-instructions in :ref:`building_gpu_mode` and link with your offloading
-application.
-
-.. _building_gpu_mode:
-
-Building the GPU library
-========================
-
-LLVM's libc GPU support *must* be built using the same compiler as the final
-application to ensure relative LLVM bitcode compatibility. This can be done
-automatically using the ``LLVM_ENABLE_RUNTIMES=libc`` option. Furthermore,
-building for the GPU is only supported in :ref:`fullbuild_mode`. To enable the
-GPU build, set the target OS to ``gpu`` via ``LLVM_LIBC_TARGET_OS=gpu``. By
-default, ``libcgpu.a`` will be built using every supported GPU architecture. To
-restrict the number of architectures build, set ``LLVM_LIBC_GPU_ARCHITECTURES``
-to the list of desired architectures or use ``all``. A typical ``cmake``
-configuration will look like this:
-
-.. code-block:: sh
-
-  $> cd llvm-project  # The llvm-project checkout
-  $> mkdir build
-  $> cd build
-  $> cmake ../llvm -G Ninja \
-     -DLLVM_ENABLE_PROJECTS="clang;lld;compiler-rt" \
-     -DLLVM_ENABLE_RUNTIMES="libc;openmp" \
-     -DCMAKE_BUILD_TYPE= \ # Select build type
-     -DLLVM_LIBC_FULL_BUILD=ON \ # We need the full libc
-     -DLIBC_GPU_BUILD=ON \ # Build in GPU mode
-     -DLLVM_LIBC_GPU_ARCHITECTURES=all \ # Build all supported architectures
-     -DCMAKE_INSTALL_PREFIX= \ # Where 'libcgpu.a' will live
-  $> ninja install
-
-Since we want to include ``clang``, ``lld`` and ``compiler-rt`` in our
-toolchain, we list them in ``LLVM_ENABLE_PROJECTS``. To ensure ``libc`` is built
-using a compatible compiler and to support ``openmp`` offloading, we list them
-in ``LLVM_ENABLE_RUNTIMES`` to build them after the enabled projects using the
-newly built compiler. ``CMAKE_INSTALL_PREFIX`` specifies the installation
-directory in which to install the ``libcgpu.a`` library along with LLVM.
-
-Usage
-=====
-
-Once the ``libcgpu.a`` static archive has been built in
-:ref:`building_gpu_mode`, it can be linked directly with offloading applications
-as a standard library. This process is described in the `clang documentation
-_`. This linking mode is used
-by the OpenMP toolchain, but is currently opt-in for the CUDA and HIP toolchains
-using the ``--offload-new-driver``` and ``-fgpu-rdc`` flags. A typical usage
-will look this this:
-
-.. code-block:: sh
-
-  $> clang foo.c -fopenmp --offload-arch=gfx90a -lcgpu
-
-The ``libcgpu.a`` static archive is a fat-binary containing LLVM-IR for each
-supported target device. The supported architectures can be seen using LLVM's
-objdump with the ``--offloading`` flag:
-
-.. code-block:: sh
-
-  $> llvm-objdump --offloading libcgpu.a
-  libcgpu.a(strcmp.cpp.o):    file format elf64-x86-64
-
-  OFFLOADING IMAGE [0]:
-  kind            llvm ir
-  arch            gfx90a
-  triple          amdgcn-amd-amdhsa
-  producer
-
-Because the device code is stored inside a fat binary, it can be difficult to
-inspect the resulting code. This can be done using the following utilities:
-
-.. code-block:: sh
-
-  $> llvm-ar x libcgpu.a strcmp.cpp.o
-  $> clang-offload-packager strcmp.cpp.o --image=arch=gfx90a,file=gfx90a.bc
-  $> opt -S out.bc
-  ...
-
-Supported Functions
-===================
-
-The following functions and headers are supported at least partially on the
-device. Currently, only basic device functions that do not require an operating
-system are supported on the device. Supporting functions like `malloc` using an
-RPC mechanism is a work-in-progress.
-
-ctype.h
--------
-
-=============  =========
-Function Name  Available
-=============  =========
-isalnum        |check|
-isalpha        |check|
-isascii        |check|
-isblank        |check|
-iscntrl        |check|
-isdigit        |check|
-isgraph        |check|
-islower        |check|
-isprint        |check|
-ispunct        |check|
-isspace        |check|
-isupper        |check|
-isxdigit       |check|
-toascii        |check|
-tolower        |check|
-toupper        |check|
-=============  =========
-
-string.h
---------
-
-=============  =========
-Function Name  Available
-=============  =========
-bcmp           |check|
-bzero          |check|
-memccpy        |check|
-memchr         |check|
-memcmp         |check|
-memcpy         |check|
-memmove        |check|
-mempcpy        |check|
-memrchr        |check|
-memset         |check|
-stpcpy         |check|
-stpncpy        |check|
-strcat         |check|
-strchr         |check|
-strcmp         |check|
-strcpy         |check|
-strcspn        |check|
-strlcat        |check|
-strlcpy        |check|
-strlen         |check|
-strncat        |check|
-strncmp        |check|
-strncpy        |check|
-strnlen        |check|
-strpbrk        |check|
-strrchr        |check|
-strspn         |check|
-strstr         |check|
-strtok         |check|
-strtok_r       |check|
-strdup
-strndup
-=============  =========
diff --git a/libc/docs/index.rst b/libc/docs/index.rst
index 9042261..5e9a602 100644
--- a/libc/docs/index.rst
+++ b/libc/docs/index.rst
@@ -52,7 +52,7 @@ stages there is no ABI stability in any form.
    usage_modes
    overlay_mode
    fullbuild_mode
-   gpu_mode
+   gpu/index.rst
 
 .. toctree::
    :hidden: