From 8c6d516286d5eb51899f380526c59e8b7af69f24 Mon Sep 17 00:00:00 2001 From: Tony Date: Thu, 17 Dec 2020 02:45:47 +0000 Subject: [PATCH] [NFC][AMDGPU] Reorganize description of scratch handling Differential Revision: https://reviews.llvm.org/D93440 --- llvm/docs/AMDGPUUsage.rst | 610 +++++++++++++++++++++------------------------- 1 file changed, 281 insertions(+), 329 deletions(-) diff --git a/llvm/docs/AMDGPUUsage.rst b/llvm/docs/AMDGPUUsage.rst index c8dda47..3dbdfa7 100644 --- a/llvm/docs/AMDGPUUsage.rst +++ b/llvm/docs/AMDGPUUsage.rst @@ -107,21 +107,21 @@ specific information. .. table:: AMDGPU Processors :name: amdgpu-processor-table - =========== =============== ============ ===== ================= =========== =============== ====================== - Processor Alternative Target dGPU/ Target Target OS Support Example - Processor Triple APU Features Properties *(see* Products - Architecture Supported `amdgpu-os`_ - *and - corresponding - runtime release - notes for - current - information and - level of - support)* - =========== =============== ============ ===== ================= =========== =============== ====================== + =========== =============== ============ ===== ================= =============== =============== ====================== + Processor Alternative Target dGPU/ Target Target OS Support Example + Processor Triple APU Features Properties *(see* Products + Architecture Supported `amdgpu-os`_ + *and + corresponding + runtime release + notes for + current + information and + level of + support)* + =========== =============== ============ ===== ================= =============== =============== ====================== **Radeon HD 2000/3000 Series (R600)** [AMD-RADEON-HD-2000-3000]_ - ------------------------------------------------------------------------------------------------------------------- + ----------------------------------------------------------------------------------------------------------------------- ``r600`` ``r600`` dGPU - 
Does not support generic @@ -143,7 +143,7 @@ specific information. address space **Radeon HD 4000 Series (R700)** [AMD-RADEON-HD-4000]_ - ------------------------------------------------------------------------------------------------------------------- + ----------------------------------------------------------------------------------------------------------------------- ``rv710`` ``r600`` dGPU - Does not support generic @@ -160,7 +160,7 @@ specific information. address space **Radeon HD 5000 Series (Evergreen)** [AMD-RADEON-HD-5000]_ - ------------------------------------------------------------------------------------------------------------------- + ----------------------------------------------------------------------------------------------------------------------- ``cedar`` ``r600`` dGPU - Does not support generic @@ -187,7 +187,7 @@ specific information. address space **Radeon HD 6000 Series (Northern Islands)** [AMD-RADEON-HD-6000]_ - ------------------------------------------------------------------------------------------------------------------- + ----------------------------------------------------------------------------------------------------------------------- ``barts`` ``r600`` dGPU - Does not support generic @@ -209,208 +209,208 @@ specific information. 
address space **GCN GFX6 (Southern Islands (SI))** [AMD-GCN-GFX6]_ - ------------------------------------------------------------------------------------------------------------------- - ``gfx600`` - ``tahiti`` ``amdgcn`` dGPU - Does not - *pal-amdpal* + ----------------------------------------------------------------------------------------------------------------------- + ``gfx600`` - ``tahiti`` ``amdgcn`` dGPU - Does not - *pal-amdpal* support generic address space - ``gfx601`` - ``pitcairn`` ``amdgcn`` dGPU - Does not - *pal-amdpal* + ``gfx601`` - ``pitcairn`` ``amdgcn`` dGPU - Does not - *pal-amdpal* - ``verde`` support generic address space - ``gfx602`` - ``hainan`` ``amdgcn`` dGPU - Does not - *pal-amdpal* + ``gfx602`` - ``hainan`` ``amdgcn`` dGPU - Does not - *pal-amdpal* - ``oland`` support generic address space **GCN GFX7 (Sea Islands (CI))** [AMD-GCN-GFX7]_ - ------------------------------------------------------------------------------------------------------------------- - ``gfx700`` - ``kaveri`` ``amdgcn`` APU - *rocm-amdhsa* - A6-7000 - - *pal-amdhsa* - A6 Pro-7050B - - *pal-amdpal* - A8-7100 - - A8 Pro-7150B - - A10-7300 - - A10 Pro-7350B - - FX-7500 - - A8-7200P - - A10-7400P - - FX-7600P - ``gfx701`` - ``hawaii`` ``amdgcn`` dGPU - *rocm-amdhsa* - FirePro W8100 - - *pal-amdhsa* - FirePro W9100 - - *pal-amdpal* - FirePro S9150 - - FirePro S9170 - ``gfx702`` ``amdgcn`` dGPU - *rocm-amdhsa* - Radeon R9 290 - - *pal-amdhsa* - Radeon R9 290x - - *pal-amdpal* - Radeon R390 - - Radeon R390x - ``gfx703`` - ``kabini`` ``amdgcn`` APU - *pal-amdhsa* - E1-2100 - - ``mullins`` - *pal-amdpal* - E1-2200 - - E1-2500 - - E2-3000 - - E2-3800 - - A4-5000 - - A4-5100 - - A6-5200 - - A4 Pro-3340B - ``gfx704`` - ``bonaire`` ``amdgcn`` dGPU - *pal-amdhsa* - Radeon HD 7790 - - *pal-amdpal* - Radeon HD 8770 - - R7 260 - - R7 260X - ``gfx705`` ``amdgcn`` APU - *pal-amdhsa* *TBA* - - *pal-amdpal* - .. TODO:: - - Add product - names. 
+ ----------------------------------------------------------------------------------------------------------------------- + ``gfx700`` - ``kaveri`` ``amdgcn`` APU - Offset - *rocm-amdhsa* - A6-7000 + flat - *pal-amdhsa* - A6 Pro-7050B + scratch - *pal-amdpal* - A8-7100 + - A8 Pro-7150B + - A10-7300 + - A10 Pro-7350B + - FX-7500 + - A8-7200P + - A10-7400P + - FX-7600P + ``gfx701`` - ``hawaii`` ``amdgcn`` dGPU - Offset - *rocm-amdhsa* - FirePro W8100 + flat - *pal-amdhsa* - FirePro W9100 + scratch - *pal-amdpal* - FirePro S9150 + - FirePro S9170 + ``gfx702`` ``amdgcn`` dGPU - Offset - *rocm-amdhsa* - Radeon R9 290 + flat - *pal-amdhsa* - Radeon R9 290x + scratch - *pal-amdpal* - Radeon R390 + - Radeon R390x + ``gfx703`` - ``kabini`` ``amdgcn`` APU - Offset - *pal-amdhsa* - E1-2100 + - ``mullins`` flat - *pal-amdpal* - E1-2200 + scratch - E1-2500 + - E2-3000 + - E2-3800 + - A4-5000 + - A4-5100 + - A6-5200 + - A4 Pro-3340B + ``gfx704`` - ``bonaire`` ``amdgcn`` dGPU - Offset - *pal-amdhsa* - Radeon HD 7790 + flat - *pal-amdpal* - Radeon HD 8770 + scratch - R7 260 + - R7 260X + ``gfx705`` ``amdgcn`` APU - Offset - *pal-amdhsa* *TBA* + flat - *pal-amdpal* + scratch .. TODO:: + + Add product + names. 
**GCN GFX8 (Volcanic Islands (VI))** [AMD-GCN-GFX8]_ - ------------------------------------------------------------------------------------------------------------------- - ``gfx801`` - ``carrizo`` ``amdgcn`` APU - xnack - *rocm-amdhsa* - A6-8500P - - *pal-amdhsa* - Pro A6-8500B - - *pal-amdpal* - A8-8600P - - Pro A8-8600B - - FX-8800P - - Pro A12-8800B - - A10-8700P - - Pro A10-8700B - - A10-8780P - - A10-9600P - - A10-9630P - - A12-9700P - - A12-9730P - - FX-9800P - - FX-9830P - - E2-9010 - - A6-9210 - - A9-9410 - ``gfx802`` - ``iceland`` ``amdgcn`` dGPU - *rocm-amdhsa* - Radeon R9 285 - - ``tonga`` - *pal-amdhsa* - Radeon R9 380 - - *pal-amdpal* - Radeon R9 385 - ``gfx803`` - ``fiji`` ``amdgcn`` dGPU - *rocm-amdhsa* - Radeon R9 Nano - - *pal-amdhsa* - Radeon R9 Fury - - *pal-amdpal* - Radeon R9 FuryX - - Radeon Pro Duo - - FirePro S9300x2 - - Radeon Instinct MI8 - \ - ``polaris10`` ``amdgcn`` dGPU - *rocm-amdhsa* - Radeon RX 470 - - *pal-amdhsa* - Radeon RX 480 - - *pal-amdpal* - Radeon Instinct MI6 - \ - ``polaris11`` ``amdgcn`` dGPU - *rocm-amdhsa* - Radeon RX 460 - - *pal-amdhsa* - - *pal-amdpal* - ``gfx805`` - ``tongapro`` ``amdgcn`` dGPU - *rocm-amdhsa* - FirePro S7150 - - *pal-amdhsa* - FirePro S7100 - - *pal-amdpal* - FirePro W7100 - - Mobile FirePro - M7170 - ``gfx810`` - ``stoney`` ``amdgcn`` APU - xnack - *rocm-amdhsa* *TBA* - - *pal-amdhsa* - - *pal-amdpal* .. TODO:: - - Add product - names. 
+ ----------------------------------------------------------------------------------------------------------------------- + ``gfx801`` - ``carrizo`` ``amdgcn`` APU - xnack - Offset - *rocm-amdhsa* - A6-8500P + flat - *pal-amdhsa* - Pro A6-8500B + scratch - *pal-amdpal* - A8-8600P + - Pro A8-8600B + - FX-8800P + - Pro A12-8800B + - A10-8700P + - Pro A10-8700B + - A10-8780P + - A10-9600P + - A10-9630P + - A12-9700P + - A12-9730P + - FX-9800P + - FX-9830P + - E2-9010 + - A6-9210 + - A9-9410 + ``gfx802`` - ``iceland`` ``amdgcn`` dGPU - Offset - *rocm-amdhsa* - Radeon R9 285 + - ``tonga`` flat - *pal-amdhsa* - Radeon R9 380 + scratch - *pal-amdpal* - Radeon R9 385 + ``gfx803`` - ``fiji`` ``amdgcn`` dGPU - *rocm-amdhsa* - Radeon R9 Nano + - *pal-amdhsa* - Radeon R9 Fury + - *pal-amdpal* - Radeon R9 FuryX + - Radeon Pro Duo + - FirePro S9300x2 + - Radeon Instinct MI8 + \ - ``polaris10`` ``amdgcn`` dGPU - Offset - *rocm-amdhsa* - Radeon RX 470 + flat - *pal-amdhsa* - Radeon RX 480 + scratch - *pal-amdpal* - Radeon Instinct MI6 + \ - ``polaris11`` ``amdgcn`` dGPU - Offset - *rocm-amdhsa* - Radeon RX 460 + flat - *pal-amdhsa* + scratch - *pal-amdpal* + ``gfx805`` - ``tongapro`` ``amdgcn`` dGPU - Offset - *rocm-amdhsa* - FirePro S7150 + flat - *pal-amdhsa* - FirePro S7100 + scratch - *pal-amdpal* - FirePro W7100 + - Mobile FirePro + M7170 + ``gfx810`` - ``stoney`` ``amdgcn`` APU - xnack - Offset - *rocm-amdhsa* *TBA* + flat - *pal-amdhsa* + scratch - *pal-amdpal* .. TODO:: + + Add product + names. 
**GCN GFX9 (Vega)** [AMD-GCN-GFX9]_ - ------------------------------------------------------------------------------------------------------------------- - ``gfx900`` ``amdgcn`` dGPU - xnack - *rocm-amdhsa* - Radeon Vega - - *pal-amdhsa* Frontier Edition - - *pal-amdpal* - Radeon RX Vega 56 - - Radeon RX Vega 64 - - Radeon RX Vega 64 - Liquid - - Radeon Instinct MI25 - ``gfx902`` ``amdgcn`` APU - xnack - *rocm-amdhsa* - Ryzen 3 2200G - - *pal-amdhsa* - Ryzen 5 2400G - - *pal-amdpal* - ``gfx904`` ``amdgcn`` dGPU - xnack - *rocm-amdhsa* *TBA* - - *pal-amdhsa* - - *pal-amdpal* .. TODO:: - - Add product - names. - - ``gfx906`` ``amdgcn`` dGPU - sramecc - *rocm-amdhsa* - Radeon Instinct MI50 - - xnack - *pal-amdhsa* - Radeon Instinct MI60 - - *pal-amdpal* - Radeon VII - - Radeon Pro VII - ``gfx908`` ``amdgcn`` dGPU - sramecc - *rocm-amdhsa* *TBA* - - xnack - .. TODO:: - - Add product - names. - - ``gfx909`` ``amdgcn`` APU - xnack - *pal-amdpal* *TBA* - - .. TODO:: - - Add product - names. - - ``gfx90c`` ``amdgcn`` APU - xnack - *pal-amdpal* - Ryzen 7 4700G - - Ryzen 7 4700GE - - Ryzen 5 4600G - - Ryzen 5 4600GE - - Ryzen 3 4300G - - Ryzen 3 4300GE - - Ryzen Pro 4000G - - Ryzen 7 Pro 4700G - - Ryzen 7 Pro 4750GE - - Ryzen 5 Pro 4650G - - Ryzen 5 Pro 4650GE - - Ryzen 3 Pro 4350G - - Ryzen 3 Pro 4350GE + ----------------------------------------------------------------------------------------------------------------------- + ``gfx900`` ``amdgcn`` dGPU - xnack - Absolute - *rocm-amdhsa* - Radeon Vega + flat - *pal-amdhsa* Frontier Edition + scratch - *pal-amdpal* - Radeon RX Vega 56 + - Radeon RX Vega 64 + - Radeon RX Vega 64 + Liquid + - Radeon Instinct MI25 + ``gfx902`` ``amdgcn`` APU - xnack - Absolute - *rocm-amdhsa* - Ryzen 3 2200G + flat - *pal-amdhsa* - Ryzen 5 2400G + scratch - *pal-amdpal* + ``gfx904`` ``amdgcn`` dGPU - xnack - *rocm-amdhsa* *TBA* + - *pal-amdhsa* + - *pal-amdpal* .. TODO:: + + Add product + names. 
+ + ``gfx906`` ``amdgcn`` dGPU - sramecc - Absolute - *rocm-amdhsa* - Radeon Instinct MI50 + - xnack flat - *pal-amdhsa* - Radeon Instinct MI60 + scratch - *pal-amdpal* - Radeon VII + - Radeon Pro VII + ``gfx908`` ``amdgcn`` dGPU - sramecc - *rocm-amdhsa* *TBA* + - xnack - Absolute + flat .. TODO:: + scratch + Add product + names. + + ``gfx909`` ``amdgcn`` APU - xnack - Absolute - *pal-amdpal* *TBA* + flat + scratch .. TODO:: + + Add product + names. + + ``gfx90c`` ``amdgcn`` APU - xnack - Absolute - *pal-amdpal* - Ryzen 7 4700G + flat - Ryzen 7 4700GE + scratch - Ryzen 5 4600G + - Ryzen 5 4600GE + - Ryzen 3 4300G + - Ryzen 3 4300GE + - Ryzen Pro 4000G + - Ryzen 7 Pro 4700G + - Ryzen 7 Pro 4750GE + - Ryzen 5 Pro 4650G + - Ryzen 5 Pro 4650GE + - Ryzen 3 Pro 4350G + - Ryzen 3 Pro 4350GE **GCN GFX10 (RDNA 1)** [AMD-GCN-GFX10-RDNA1]_ - ------------------------------------------------------------------------------------------------------------------- - ``gfx1010`` ``amdgcn`` dGPU - cumode - *rocm-amdhsa* - Radeon RX 5700 - - wavefrontsize64 - *pal-amdhsa* - Radeon RX 5700 XT - - xnack - *pal-amdpal* - Radeon Pro 5600 XT - - Radeon Pro 5600M - ``gfx1011`` ``amdgcn`` dGPU - cumode - *rocm-amdhsa* *TBA* - - wavefrontsize64 - *pal-amdhsa* - - xnack - *pal-amdpal* - .. TODO:: - - Add product - names. - - ``gfx1012`` ``amdgcn`` dGPU - cumode - *rocm-amdhsa* - Radeon RX 5500 - - wavefrontsize64 - *pal-amdhsa* - Radeon RX 5500 XT - - xnack - *pal-amdpal* + ----------------------------------------------------------------------------------------------------------------------- + ``gfx1010`` ``amdgcn`` dGPU - cumode - Absolute - *rocm-amdhsa* - Radeon RX 5700 + - wavefrontsize64 flat - *pal-amdhsa* - Radeon RX 5700 XT + - xnack scratch - *pal-amdpal* - Radeon Pro 5600 XT + - Radeon Pro 5600M + ``gfx1011`` ``amdgcn`` dGPU - cumode - *rocm-amdhsa* *TBA* + - wavefrontsize64 - Absolute - *pal-amdhsa* + - xnack flat - *pal-amdpal* + scratch .. TODO:: + + Add product + names. 
+ + ``gfx1012`` ``amdgcn`` dGPU - cumode - Absolute - *rocm-amdhsa* - Radeon RX 5500 + - wavefrontsize64 flat - *pal-amdhsa* - Radeon RX 5500 XT + - xnack scratch - *pal-amdpal* **GCN GFX10 (RDNA 2)** [AMD-GCN-GFX10-RDNA2]_ - ------------------------------------------------------------------------------------------------------------------- - ``gfx1030`` ``amdgcn`` dGPU - cumode - *rocm-amdhsa* *TBA* - - wavefrontsize64 - *pal-amdhsa* - - *pal-amdpal* .. TODO:: + ----------------------------------------------------------------------------------------------------------------------- + ``gfx1030`` ``amdgcn`` dGPU - cumode - Absolute - *rocm-amdhsa* *TBA* + - wavefrontsize64 flat - *pal-amdhsa* + scratch - *pal-amdpal* .. TODO:: - Add product - names. + Add product + names. - ``gfx1031`` ``amdgcn`` dGPU - cumode - *rocm-amdhsa* *TBA* - - wavefrontsize64 - *pal-amdhsa* - - *pal-amdpal* .. TODO:: + ``gfx1031`` ``amdgcn`` dGPU - cumode - Absolute - *rocm-amdhsa* *TBA* + - wavefrontsize64 flat - *pal-amdhsa* + scratch - *pal-amdpal* .. TODO:: - Add product - names. + Add product + names. - ``gfx1032`` ``amdgcn`` dGPU - cumode - *pal-amdhsa* *TBA* - - wavefrontsize64 - *pal-amdhsa* - - *pal-amdpal* .. TODO:: + ``gfx1032`` ``amdgcn`` dGPU - cumode - Absolute - *rocm-amdhsa* *TBA* + - wavefrontsize64 flat - *pal-amdhsa* + scratch - *pal-amdpal* .. TODO:: - Add product - names. + Add product + names. - ``gfx1033`` ``amdgcn`` APU - cumode - *pal-amdpal* *TBA* - - wavefrontsize64 - .. TODO:: + ``gfx1033`` ``amdgcn`` APU - cumode - Absolute - *pal-amdpal* *TBA* + - wavefrontsize64 flat + scratch .. TODO:: - Add product - names. + Add product + names. - =========== =============== ============ ===== ================= =========== =============== ====================== + =========== =============== ============ ===== ================= =============== =============== ====================== .. 
_amdgpu-target-features: @@ -4162,18 +4162,9 @@ SGPR register initial state is defined in (kernel descriptor enable of field) SGPRs ========== ========================== ====== ============================== - First Private Segment Buffer 4 This is 4 SGPRs: - (enable_sgpr_private + First Private Segment Buffer 4 See + (enable_sgpr_private :ref:`amdgpu-amdhsa-kernel-prolog-private-segment-buffer`. _segment_buffer) - V# that can be used, - together with Scratch - Wavefront Offset as an - offset, to access the - private memory space using a - segment address. - - CP uses the value provided - by the runtime. then Dispatch Ptr 2 64-bit address of AQL dispatch (enable_sgpr_dispatch_ptr) packet for kernel dispatch actually executing. @@ -4193,87 +4184,8 @@ SGPR register initial state is defined in then Dispatch Id 2 64-bit Dispatch ID of the (enable_sgpr_dispatch_id) dispatch packet being executed. - then Flat Scratch Init 2 This is 2 SGPRs: - (enable_sgpr_flat_scratch - _init) GFX6 - Not supported. - GFX7-GFX8 - The first SGPR is a 32-bit - byte offset from - ``SH_HIDDEN_PRIVATE_BASE_VIMID`` - to per SPI base of memory - for scratch for the queue - executing the kernel - dispatch. CP obtains this - from the runtime. (The - Scratch Segment Buffer base - address is - ``SH_HIDDEN_PRIVATE_BASE_VIMID`` - plus this offset.) The value - of Scratch Wavefront Offset must - be added to this offset by - the kernel machine code, - right shifted by 8, and - moved to the FLAT_SCRATCH_HI - SGPR register. - FLAT_SCRATCH_HI corresponds - to SGPRn-4 on GFX7, and - SGPRn-6 on GFX8 (where SGPRn - is the highest numbered SGPR - allocated to the wavefront). - FLAT_SCRATCH_HI is - multiplied by 256 (as it is - in units of 256 bytes) and - added to - ``SH_HIDDEN_PRIVATE_BASE_VIMID`` - to calculate the per wavefront - FLAT SCRATCH BASE in flat - memory instructions that - access the scratch - aperture. - - The second SGPR is 32-bit - byte size of a single - work-item's scratch memory - usage. 
CP obtains this from - the runtime, and it is - always a multiple of DWORD. - CP checks that the value in - the kernel dispatch packet - Private Segment Byte Size is - not larger and requests the - runtime to increase the - queue's scratch size if - necessary. The kernel code - must move it to - FLAT_SCRATCH_LO which is - SGPRn-3 on GFX7 and SGPRn-5 - on GFX8. FLAT_SCRATCH_LO is - used as the FLAT SCRATCH - SIZE in flat memory - instructions. Having CP load - it once avoids loading it at - the beginning of every - wavefront. - GFX9-GFX10 - This is the - 64-bit base address of the - per SPI scratch backing - memory managed by SPI for - the queue executing the - kernel dispatch. CP obtains - this from the runtime (and - divides it if there are - multiple Shader Arrays each - with its own SPI). The value - of Scratch Wavefront Offset must - be added by the kernel - machine code and the result - moved to the FLAT_SCRATCH - SGPR which is SGPRn-6 and - SGPRn-5. It is used as the - FLAT SCRATCH BASE in flat - memory instructions. + then Flat Scratch Init 2 See + :ref:`amdgpu-amdhsa-kernel-prolog-flat-scratch`. then Private Segment Size 1 The 32-bit byte size of a (enable_sgpr_private single work-item's @@ -4338,19 +4250,10 @@ SGPR register initial state is defined in then Work-Group Info 1 {first_wavefront, 14'b0000, (enable_sgpr_workgroup ordered_append_term[10:0], _info) threadgroup_size_in_wavefronts[5:0]} - then Scratch Wavefront Offset 1 This is 1 SGPR: - (enable_sgpr_private - _segment_wavefront_offset) - 32-bit byte offset from base - of scratch base of queue - executing the kernel - dispatch. Must be used as an - offset with Private segment - address when using Scratch - Segment Buffer. It must be - used to set up FLAT SCRATCH - for flat addressing (see - :ref:`amdgpu-amdhsa-kernel-prolog-flat-scratch`). + then Scratch Wavefront Offset 1 See + (enable_sgpr_private :ref:`amdgpu-amdhsa-kernel-prolog-flat-scratch`. 
+ _segment_wavefront_offset) and + :ref:`amdgpu-amdhsa-kernel-prolog-private-segment-buffer`. ========== ========================== ====== ============================== The order of the VGPR registers is defined, but the compiler can specify which @@ -4390,12 +4293,11 @@ The setting of registers is done by GPU CP/ADC/SPI hardware as follows: combination including none. 3. Scratch Wavefront Offset is set by SPI in a per wavefront basis which is why its value cannot be included with the flat scratch init value which is per - queue. + queue (see :ref:`amdgpu-amdhsa-kernel-prolog-flat-scratch`). 4. The VGPRs are set by SPI which only supports specifying either (X), (X, Y) or (X, Y, Z). - -See :ref:`amdgpu-amdhsa-kernel-prolog-flat-scratch` for Flat Scratch register -pair initialization. +5. Flat Scratch register pair initialization is described in + :ref:`amdgpu-amdhsa-kernel-prolog-flat-scratch`. The global segment can be accessed either using buffer instructions (GFX6 which has V# 64-bit address support), flat instructions (GFX7-GFX10), or global @@ -4474,48 +4376,98 @@ pointer are replaced with immediate ``0`` offsets. Flat Scratch ++++++++++++ -GFX6 - Flat scratch is not supported. +There are different methods used for initializing flat scratch: + +* If the *Target Properties* column of :ref:`amdgpu-processor-table` + specifies *Does not support generic address space*: + + Flat scratch is not supported and there is no flat scratch register pair. + +* If the *Target Properties* column of :ref:`amdgpu-processor-table` + specifies *Offset flat scratch*: + + If the kernel or any function it calls may use flat operations to access + scratch memory, the prolog code must set up the FLAT_SCRATCH register pair + (FLAT_SCRATCH_LO/FLAT_SCRATCH_HI). Initialization uses Flat Scratch Init and + Scratch Wavefront Offset SGPR registers (see + :ref:`amdgpu-amdhsa-initial-kernel-execution-state`): + + 1. 
The low word of Flat Scratch Init is the 32-bit byte offset from + ``SH_HIDDEN_PRIVATE_BASE_VIMID`` to the base of scratch backing memory + being managed by SPI for the queue executing the kernel dispatch. This is + the same value used in the Scratch Segment Buffer V# base address. + + CP obtains this from the runtime. (The Scratch Segment Buffer base address + is ``SH_HIDDEN_PRIVATE_BASE_VIMID`` plus this offset.) + + The prolog must add the value of Scratch Wavefront Offset to get the + wavefront's byte scratch backing memory offset from + ``SH_HIDDEN_PRIVATE_BASE_VIMID``. + + The Scratch Wavefront Offset must also be used as an offset with Private + segment address when using the Scratch Segment Buffer. + + Since FLAT_SCRATCH_HI is in units of 256 bytes, the offset must be right + shifted by 8 before moving into FLAT_SCRATCH_HI. + + FLAT_SCRATCH_HI corresponds to SGPRn-4 on GFX7, and SGPRn-6 on GFX8 (where + SGPRn is the highest numbered SGPR allocated to the wavefront). + FLAT_SCRATCH_HI is multiplied by 256 (as it is in units of 256 bytes) and + added to ``SH_HIDDEN_PRIVATE_BASE_VIMID`` to calculate the per wavefront + FLAT SCRATCH BASE in flat memory instructions that access the scratch + aperture. + 2. The second word of Flat Scratch Init is the 32-bit byte size of a single + work-item's scratch memory usage. + + CP obtains this from the runtime, and it is always a multiple of DWORD. CP + checks that the value in the kernel dispatch packet Private Segment Byte + Size is not larger and requests the runtime to increase the queue's scratch + size if necessary. + + CP directly loads from the kernel dispatch packet Private Segment Byte Size + field and rounds up to a multiple of DWORD. Having CP load it once avoids + loading it at the beginning of every wavefront. + + The kernel prolog code must move it to FLAT_SCRATCH_LO which is SGPRn-3 on + GFX7 and SGPRn-5 on GFX8. FLAT_SCRATCH_LO is used as the FLAT SCRATCH SIZE + in flat memory instructions.
+ +* If the *Target Properties* column of :ref:`amdgpu-processor-table` + specifies *Absolute flat scratch*: -GFX7-GFX10 If the kernel or any function it calls may use flat operations to access scratch memory, the prolog code must set up the FLAT_SCRATCH register pair (FLAT_SCRATCH_LO/FLAT_SCRATCH_HI which are in SGPRn-4/SGPRn-3). Initialization uses Flat Scratch Init and Scratch Wavefront Offset SGPR registers (see :ref:`amdgpu-amdhsa-initial-kernel-execution-state`): - GFX7-GFX8 - - 1. The low word of Flat Scratch Init is 32-bit byte offset from - ``SH_HIDDEN_PRIVATE_BASE_VIMID`` to the base of scratch backing memory - being managed by SPI for the queue executing the kernel dispatch. This is - the same value used in the Scratch Segment Buffer V# base address. The - prolog must add the value of Scratch Wavefront Offset to get the - wavefront's byte scratch backing memory offset from - ``SH_HIDDEN_PRIVATE_BASE_VIMID``. Since FLAT_SCRATCH_LO is in units of 256 - bytes, the offset must be right shifted by 8 before moving into - FLAT_SCRATCH_LO. - 2. The second word of Flat Scratch Init is 32-bit byte size of a single - work-items scratch memory usage. This is directly loaded from the kernel - dispatch packet Private Segment Byte Size and rounded up to a multiple of - DWORD. Having CP load it once avoids loading it at the beginning of every - wavefront. The prolog must move it to FLAT_SCRATCH_LO for use as FLAT - SCRATCH SIZE. - - GFX9-GFX10 - The Flat Scratch Init is the 64-bit address of the base of scratch backing - memory being managed by SPI for the queue executing the kernel dispatch. The - prolog must add the value of the wave's Scratch Wavefront Offset and moved - as a 64-bit value to the FLAT_SCRATCH pair for use as the flat scratch base - in flat memory instructions. + The Flat Scratch Init is the 64-bit address of the base of scratch backing + memory being managed by SPI for the queue executing the kernel dispatch. + + CP obtains this from the runtime. 
+ + The kernel prolog must add the value of the wave's Scratch Wavefront Offset + and move the result as a 64-bit value to the FLAT_SCRATCH SGPR register pair + which is SGPRn-6 and SGPRn-5. It is used as the FLAT SCRATCH BASE in flat + memory instructions. + + The Scratch Wavefront Offset must also be used as an offset with Private + segment address when using the Scratch Segment Buffer (see + :ref:`amdgpu-amdhsa-kernel-prolog-private-segment-buffer`). .. _amdgpu-amdhsa-kernel-prolog-private-segment-buffer: Private Segment Buffer ++++++++++++++++++++++ -A set of four SGPRs beginning at a four-aligned SGPR index are always selected -to serve as the scratch V# for the kernel as follows: +The Private Segment Buffer SGPR register is used to initialize 4 SGPRs +that are used as a V# to access scratch. CP uses the value provided by the +runtime. It is used, together with Scratch Wavefront Offset as an offset, to +access the private memory space using a segment address. See +:ref:`amdgpu-amdhsa-initial-kernel-execution-state`. + +The scratch V# occupies four SGPRs beginning at a four-aligned SGPR index and +is always selected for the kernel as follows: - If it is known during instruction selection that there is stack usage, SGPR0-3 is reserved for use as the scratch V#. Stack usage is assumed if -- 2.7.4
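Note on the Flat Scratch section in this patch: the two initialization methods reduce to a small amount of integer arithmetic, which the following sketch models in Python. It is an illustration only, not code from the patch or from LLVM. The function and parameter names are hypothetical, and the 32-bit and 64-bit wraparound masks are an assumption about how the SGPR additions behave.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def offset_flat_scratch(flat_scratch_init_lo, flat_scratch_init_hi,
                        scratch_wavefront_offset):
    """Model the prolog for "Offset flat scratch" targets.

    flat_scratch_init_lo: 32-bit byte offset from SH_HIDDEN_PRIVATE_BASE_VIMID
        to the queue's scratch backing memory (low word of Flat Scratch Init).
    flat_scratch_init_hi: 32-bit per-work-item scratch size in bytes
        (second word of Flat Scratch Init).
    Returns (FLAT_SCRATCH_LO, FLAT_SCRATCH_HI).
    """
    # Per-wavefront byte offset; the 32-bit wrap is an assumption.
    wave_byte_offset = (flat_scratch_init_lo + scratch_wavefront_offset) & MASK32
    # FLAT_SCRATCH_HI is in units of 256 bytes, hence the right shift by 8.
    flat_scratch_hi = wave_byte_offset >> 8
    # FLAT_SCRATCH_LO holds the size unchanged (used as FLAT SCRATCH SIZE).
    flat_scratch_lo = flat_scratch_init_hi & MASK32
    return flat_scratch_lo, flat_scratch_hi


def absolute_flat_scratch(flat_scratch_init, scratch_wavefront_offset):
    """Model the prolog for "Absolute flat scratch" targets.

    flat_scratch_init: 64-bit base address of the queue's scratch backing memory.
    Returns the 64-bit value moved to the FLAT_SCRATCH register pair.
    """
    # The 64-bit wrap is an assumption about the add.
    return (flat_scratch_init + scratch_wavefront_offset) & MASK64


if __name__ == "__main__":
    lo, hi = offset_flat_scratch(0x10000, 64, 0x300)
    print(hex(lo), hex(hi))
    print(hex(absolute_flat_scratch(0x100000000, 0x400)))
```

In the offset case the hardware later multiplies FLAT_SCRATCH_HI by 256 and adds ``SH_HIDDEN_PRIVATE_BASE_VIMID``, so any low 8 bits of the summed offset are dropped by the shift.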