platform/kernel/linux-starfive.git
16 months agodrm/amdkfd: Flush TLB after unmapping for GFX v9.4.3
Philip Yang [Thu, 9 Feb 2023 23:23:16 +0000 (18:23 -0500)]
drm/amdkfd: Flush TLB after unmapping for GFX v9.4.3

kfd_flush_tlb_after_unmap should return true for GFX v9.4.3, to do TLB
heavyweight flush after unmapping from GPU to guarantee that the GPU
will not access pages after they have been unmapped. This also helps
improve the mapping to GPU performance.

Without this, KFD accidentally flushes the TLB after mapping to GPU because
the VM update sequence number is increased by the previous unmapping.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
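
The sequence-number effect described in this commit can be illustrated with a toy model. This is only a sketch under simplifying assumptions: every name below (toy_vm, toy_map, toy_unmap, toy_cycle) is hypothetical and not part of the KFD code, and the model assumes that only unmapping bumps the update sequence number.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model of a GPU VM; all names here are hypothetical. */
struct toy_vm {
	uint64_t update_seq;         /* bumped by every unmap in this model */
	uint64_t flushed_seq;        /* sequence covered by the last TLB flush */
	unsigned int flushes_on_map; /* flushes that were paid on the map path */
};

/* Flush only if page-table updates happened since the last flush. */
static void toy_flush_if_stale(struct toy_vm *vm, bool in_map_path)
{
	if (vm->flushed_seq == vm->update_seq)
		return;
	vm->flushed_seq = vm->update_seq;
	if (in_map_path)
		vm->flushes_on_map++;
}

static void toy_unmap(struct toy_vm *vm, bool flush_after_unmap)
{
	vm->update_seq++; /* unmapping invalidates page-table entries */
	if (flush_after_unmap)
		toy_flush_if_stale(vm, false);
}

static void toy_map(struct toy_vm *vm)
{
	/* Assumption: mapping alone does not bump the sequence number. */
	toy_flush_if_stale(vm, true);
}

/* Number of flushes that land on the map path for one unmap + map cycle. */
static unsigned int toy_cycle(bool flush_after_unmap)
{
	struct toy_vm vm = { 0, 0, 0 };

	toy_unmap(&vm, flush_after_unmap);
	toy_map(&vm);
	return vm.flushes_on_map;
}
```

In this model, toy_cycle(false) charges one flush to the map path, while toy_cycle(true) charges none, which mirrors the mapping performance benefit the message mentions.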
16 months agodrm/amdgpu: Add fallback path for discovery info
Lijo Lazar [Mon, 30 Jan 2023 04:18:39 +0000 (09:48 +0530)]
drm/amdgpu: Add fallback path for discovery info

If SOC doesn't expose dedicated vram, discovery region may be
available through system memory. Rename the existing interface to
generic read_binary_from_mem and add a fallback path to read from system
memory.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
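
A fallback read of this kind might look roughly like the sketch below. The structure and function names (toy_dev, toy_read_binary_from_mem) are hypothetical stand-ins, not the amdgpu interface; the only behaviour taken from the message is "prefer VRAM, fall back to system memory".

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical description of where a discovery binary can be read from. */
struct toy_dev {
	const uint8_t *vram;   /* NULL when the SOC exposes no dedicated VRAM */
	const uint8_t *sysmem; /* NULL when no system-memory region is known */
	size_t size;           /* size of the discovery region in bytes */
};

/*
 * Copy the discovery binary into buf.
 * Returns 0 on success, -1 if no source exists or buf is too small.
 */
static int toy_read_binary_from_mem(const struct toy_dev *dev,
				    uint8_t *buf, size_t len)
{
	/* Prefer VRAM; fall back to the system-memory region. */
	const uint8_t *src = dev->vram ? dev->vram : dev->sysmem;

	if (!src || len < dev->size)
		return -1;
	memcpy(buf, src, dev->size);
	return 0;
}
```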
16 months agodrm/amdgpu: Read discovery info from system memory
Lijo Lazar [Mon, 30 Jan 2023 04:08:09 +0000 (09:38 +0530)]
drm/amdgpu: Read discovery info from system memory

On certain ASICs, discovery info is available in a reserved region of system
memory. The location is reported through the ACPI interface. Add an API to
read discovery info from there.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
16 months agodrm/amdgpu: Add API to get tmr info from acpi
Lijo Lazar [Fri, 27 Jan 2023 13:10:14 +0000 (18:40 +0530)]
drm/amdgpu: Add API to get tmr info from acpi

In certain configs, TMR information is available from ACPI. Add API to
fetch the information.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
16 months agodrm/amdgpu: Add parsing of acpi xcc objects
Lijo Lazar [Fri, 27 Jan 2023 12:48:17 +0000 (18:18 +0530)]
drm/amdgpu: Add parsing of acpi xcc objects

Add parsing of ACPI xcc objects and fill in relevant info from them by
invoking the DSM methods.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-and-tested-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
16 months agodrm/amdkfd: Enable SVM on Native mode
Mukul Joshi [Tue, 31 Jan 2023 16:23:50 +0000 (11:23 -0500)]
drm/amdkfd: Enable SVM on Native mode

This patch enables SVM capability on GFX9.4.3 when
run in Native mode. It also sets best_prefetch and
best_restore locations to CPU as there is no VRAM.

Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
Acked-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
16 months agodrm/amdgpu: Add FGCG for GFX v9.4.3
Lijo Lazar [Thu, 2 Feb 2023 09:43:12 +0000 (15:13 +0530)]
drm/amdgpu: Add FGCG for GFX v9.4.3

It's not fine grain; it behaves similarly to MGCG.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Le Ma <le.ma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
16 months agodrm/amdgpu: Use transient mode during xcp switch
Lijo Lazar [Fri, 20 Jan 2023 10:23:47 +0000 (15:53 +0530)]
drm/amdgpu: Use transient mode during xcp switch

During partition switch, keep the state as transient mode. Fetch the
latest state if switch fails.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
16 months agodrm/amdgpu: Add flags for partition mode query
Lijo Lazar [Mon, 16 Jan 2023 05:25:38 +0000 (10:55 +0530)]
drm/amdgpu: Add flags for partition mode query

It's not required to take the lock in all cases while querying partition
mode. Querying partition mode during the KFD init process doesn't need to
take a lock. The init process after a switch will already be happening under
the lock. Control the behaviour by adding flags to xcp_query_partition_mode.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
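
The flag-controlled locking can be sketched as follows. The flag names, the structure, and the boolean stand-in for a mutex are all hypothetical and chosen only to keep the example self-contained; they are not the definitions used by the driver.

```c
#include <stdbool.h>

#define TOY_XCP_FL_NONE   0u
#define TOY_XCP_FL_LOCKED (1u << 0) /* caller already holds the lock */

struct toy_xcp_mgr {
	bool lock_held; /* stand-in for a non-recursive mutex */
	int mode;       /* current partition mode */
};

/* Returns the partition mode, or -1 if taking the lock would deadlock. */
static int toy_query_partition_mode(struct toy_xcp_mgr *mgr,
				    unsigned int flags)
{
	int mode;

	if (flags & TOY_XCP_FL_LOCKED)
		return mgr->mode; /* no locking needed on this path */

	if (mgr->lock_held)
		return -1;        /* a real mutex would block here */
	mgr->lock_held = true;
	mode = mgr->mode;
	mgr->lock_held = false;
	return mode;
}
```

A query issued from an init path that already runs under the lock would pass TOY_XCP_FL_LOCKED; any other caller would pass TOY_XCP_FL_NONE.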
16 months agodrm/amd/pm: fix wrong smu socclk value
Yang Wang [Thu, 27 Apr 2023 02:36:51 +0000 (10:36 +0800)]
drm/amd/pm: fix wrong smu socclk value

Fix a typo in the SMU socclk value.

Signed-off-by: Yang Wang <KevinYang.Wang@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
16 months agodrm/amdgpu: Add mode-2 reset in SMU v13.0.6
Lijo Lazar [Thu, 9 Mar 2023 07:34:56 +0000 (13:04 +0530)]
drm/amdgpu: Add mode-2 reset in SMU v13.0.6

Modifications to mode-2 reset flow for SMU v13.0.6 ASICs.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Le Ma <le.ma@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
16 months agodrm/amd/pm: Notify PMFW about driver unload cases
Lijo Lazar [Fri, 10 Mar 2023 06:03:37 +0000 (11:33 +0530)]
drm/amd/pm: Notify PMFW about driver unload cases

On SMU v13.0.6 APUs, FW will need to take some actions if driver is going
to halt RLC. Notify PMFW that driver is not going to manage device so
that FW takes care of the required actions.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Le Ma <le.ma@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
16 months agodrm/amd/pm: Update PMFW headers for version 85.54
Lijo Lazar [Fri, 10 Mar 2023 12:41:25 +0000 (18:11 +0530)]
drm/amd/pm: Update PMFW headers for version 85.54

It adds message support for FW notification on driver unload.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Le Ma <le.ma@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
16 months agodrm/amd/pm: Expose mem temperature for GC version 9.4.3
Asad Kamal [Wed, 8 Mar 2023 14:30:58 +0000 (22:30 +0800)]
drm/amd/pm: Expose mem temperature for GC version 9.4.3

Add mem temperature as part of hw mon attributes for GC version 9.4.3

Signed-off-by: Asad Kamal <asad.kamal@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
16 months agodrm/amd/pm: Update hw mon attributes for GC version 9.4.3
Asad Kamal [Fri, 3 Mar 2023 04:20:21 +0000 (12:20 +0800)]
drm/amd/pm: Update hw mon attributes for GC version 9.4.3

Update hw mon attributes for GC Version 9.4.3 to valid ones
on APU and Non APU systems

v2: Group checks along existing one
Added power limit & mclock for gc version 9.4.3

Signed-off-by: Asad Kamal <asad.kamal@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
16 months agodrm/amd/pm: Initialize power limit for SMU v13.0.6
Lijo Lazar [Mon, 27 Feb 2023 11:21:16 +0000 (16:51 +0530)]
drm/amd/pm: Initialize power limit for SMU v13.0.6

PMFW will initialize the power limit values even if PPT throttler
feature is disabled. Fetch the limit value from FW.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
16 months agodrm/amd/pm: Keep interface version in PMFW header
Lijo Lazar [Tue, 21 Feb 2023 09:17:51 +0000 (14:47 +0530)]
drm/amd/pm: Keep interface version in PMFW header

Use the interface version directly from PMFW interface header file rather
than keeping another definition in common smu13 file.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Asad kamal <asad.kamal@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
16 months agodrm/amd/pm: Add ih for SMU v13.0.6 thermal throttling
Asad kamal [Wed, 15 Feb 2023 07:53:15 +0000 (15:53 +0800)]
drm/amd/pm: Add ih for SMU v13.0.6 thermal throttling

Add interrupt handler for thermal throttler events from
PMFW on SMUv13.0.6

Signed-off-by: Asad kamal <asad.kamal@amd.com>
Acked-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
16 months agodrm/amd/pm: Update pmfw header files for SMU v13.0.6
Asad kamal [Mon, 13 Feb 2023 11:52:56 +0000 (19:52 +0800)]
drm/amd/pm: Update pmfw header files for SMU v13.0.6

Update driver interface for SMU v13.0.6 to be
compatible with PMFW v85.48 version

Signed-off-by: Asad kamal <asad.kamal@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
16 months agodrm/amd/pm: Update gfx clock frequency for SMU v13.0.6
Asad kamal [Wed, 8 Feb 2023 15:04:25 +0000 (23:04 +0800)]
drm/amd/pm: Update gfx clock frequency for SMU v13.0.6

Update gfx clock frequency from metric table for SMU v13.0.6

Signed-off-by: Asad kamal <asad.kamal@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
16 months agodrm/amd/pm: Update pmfw header files for SMU v13.0.6
Asad kamal [Wed, 8 Feb 2023 12:19:13 +0000 (20:19 +0800)]
drm/amd/pm: Update pmfw header files for SMU v13.0.6

Update driver metrics table for SMU v13.0.6 to be
compatible with PMFW v85.47 version

Signed-off-by: Asad kamal <asad.kamal@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
16 months agodrm/amdgpu: fix sdma instance
Stanley.Yang [Wed, 22 Mar 2023 03:16:53 +0000 (11:16 +0800)]
drm/amdgpu: fix sdma instance

The logical instance should be converted to the device instance
to query RAS info.

Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
16 months agodrm/amdgpu: change the print level to warn for ip block disabled
Le Ma [Thu, 16 Mar 2023 09:42:49 +0000 (17:42 +0800)]
drm/amdgpu: change the print level to warn for ip block disabled

Avoid misleading users, as it's not a real error.

Signed-off-by: Le Ma <le.ma@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Reviewed-by: Amber Lin <Amber.Lin@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
16 months agodrm/amdgpu: Increase Max GPU instance to 64
Mukul Joshi [Fri, 5 May 2023 15:54:38 +0000 (11:54 -0400)]
drm/amdgpu: Increase Max GPU instance to 64

Increase Max GPU instances to 64 to handle multi-socket
system with GFX 9.4.3 asic.

Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
16 months agodrm/amdgpu: increase AMDGPU_MAX_RINGS
Le Ma [Thu, 16 Mar 2023 03:08:06 +0000 (11:08 +0800)]
drm/amdgpu: increase AMDGPU_MAX_RINGS

On newer GPUs, the number of kernel rings is increased.

Signed-off-by: Le Ma <le.ma@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
16 months agodrm/amdgpu: Create VRAM BOs on GTT for GFXIP9.4.3
Rajneesh Bhardwaj [Sat, 28 Jan 2023 02:48:06 +0000 (21:48 -0500)]
drm/amdgpu: Create VRAM BOs on GTT for GFXIP9.4.3

On GFXIP9.4.3 APP APU, where there is no dedicated VRAM domain, handle
VRAM BO allocation requests on the CPU domain and validate them on GTT.

Support for handling multi-socket and multi-NUMA partitions within a
socket will be added by future patches. This enables the 1P NPS1 ASIC
bringup configuration.

Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
16 months agodrm/amdgpu: Implement new dummy vram manager
Rajneesh Bhardwaj [Sat, 28 Jan 2023 02:46:59 +0000 (21:46 -0500)]
drm/amdgpu: Implement new dummy vram manager

This adds a dummy VRAM manager to support ASICs that do not have a
dedicated or carved-out VRAM domain.

Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
16 months agodrm/amdgpu: Handle VRAM dependencies on GFXIP9.4.3
Rajneesh Bhardwaj [Sat, 28 Jan 2023 02:57:00 +0000 (21:57 -0500)]
drm/amdgpu: Handle VRAM dependencies on GFXIP9.4.3

[For 1P NPS1 mode driver bringup]

Changes required to initialize the amdgpu driver with frontdoor firmware
loading and discovery=2 with the native mode SBIOS that enables CPU GPU
unified interleaved memory.

sudo modprobe amdgpu discovery=2

Once PSP TMR region is reported via the ACPI interface, the dependency
on the ip_discovery.bin will be removed.

Choice of where to allocate driver table is given to each IP version. In
general, both GTT and VRAM domains will be considered. If one of the
tables has a strict restriction for VRAM domain, then only VRAM domain
is considered.

Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
(lijo: Modified the handling for SMU Tables)
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
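
The domain selection rule for driver tables described in this commit can be sketched as below. The domain flags and the function name are hypothetical; the sketch only encodes the stated rule that both GTT and VRAM are considered unless any table strictly requires VRAM.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_DOMAIN_GTT  (1u << 0)
#define TOY_DOMAIN_VRAM (1u << 1)

/*
 * needs_vram[i] is true when table i has a strict VRAM restriction.
 * Returns the set of domains the allocation may use.
 */
static unsigned int toy_table_domains(const bool *needs_vram, size_t count)
{
	size_t i;

	for (i = 0; i < count; i++) {
		if (needs_vram[i])
			return TOY_DOMAIN_VRAM; /* one strict table decides */
	}
	return TOY_DOMAIN_GTT | TOY_DOMAIN_VRAM;
}
```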
16 months agodrm/amdgpu: Enable CG for IH v4.4.2
Asad kamal [Tue, 7 Feb 2023 12:55:24 +0000 (20:55 +0800)]
drm/amdgpu: Enable CG for IH v4.4.2

Enable clock gating on IH v4.4.2 versions.

Signed-off-by: Asad kamal <asad.kamal@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
16 months agodrm/amdgpu: Enable persistent edc harvesting in APP APU
Hawking Zhang [Sun, 29 Jan 2023 14:48:15 +0000 (22:48 +0800)]
drm/amdgpu: Enable persistent edc harvesting in APP APU

Persistent edc harvesting is supported in APP APU

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
16 months agodrm/amdgpu: Initialize mmhub v1_8 ras function
Hawking Zhang [Sun, 22 Jan 2023 15:26:40 +0000 (23:26 +0800)]
drm/amdgpu: Initialize mmhub v1_8 ras function

Initialize mmhub v1_8 ras function.

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
16 months agodrm/amdgpu: Add reset_ras_error_status for mmhub v1_8
Hawking Zhang [Sun, 22 Jan 2023 15:20:09 +0000 (23:20 +0800)]
drm/amdgpu: Add reset_ras_error_status for mmhub v1_8

Add reset_ras_error_status callback for mmhub
v1_8. It will be used to reset mmhub error status.

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
16 months agodrm/amdgpu: Add query_ras_error_status for mmhub v1_8
Hawking Zhang [Sun, 22 Jan 2023 15:36:25 +0000 (23:36 +0800)]
drm/amdgpu: Add query_ras_error_status for mmhub v1_8

Add query_ras_error_status callback for mmhub
v1_8. It will be used to log mmhub error status.

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
16 months agodrm/amdgpu: Add reset_ras_error_count for mmhub v1_8
Hawking Zhang [Sun, 22 Jan 2023 14:22:06 +0000 (22:22 +0800)]
drm/amdgpu: Add reset_ras_error_count for mmhub v1_8

Add reset_ras_error_count callback for mmhub
v1_8. It will be used to reset mmhub ras error
count.

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
16 months agodrm/amdgpu: Add query_ras_error_count for mmhub v1_8
Hawking Zhang [Thu, 2 Feb 2023 13:00:39 +0000 (21:00 +0800)]
drm/amdgpu: Add query_ras_error_count for mmhub v1_8

Add query_ras_error_count callback for mmhub v1_8.
It will be used to query and log mmhub error count
and memory block.

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
16 months agodrm/amdgpu: Add mmhub v1_8_0 ras err status registers
Hawking Zhang [Wed, 28 Dec 2022 10:18:38 +0000 (18:18 +0800)]
drm/amdgpu: Add mmhub v1_8_0 ras err status registers

add new ras error status registers introduced in
mmhub v1_8_0 to log mmea and mm_cane ras err, including
MMEAx_UE|CE_ERR_STATUS_LO|HI
MM_CANE_UE|CE_ERR_STATUS_LO|HI

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
16 months agodrm/amdgpu: Initialize sdma v4_4_2 ras function
Hawking Zhang [Sun, 22 Jan 2023 15:29:28 +0000 (23:29 +0800)]
drm/amdgpu: Initialize sdma v4_4_2 ras function

Initialize sdma v4_4_2 ras function and interrupt
handler.

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
16 months agodrm/amdgpu: Add reset_ras_error_count for sdma v4_4_2
Hawking Zhang [Sun, 22 Jan 2023 04:19:57 +0000 (12:19 +0800)]
drm/amdgpu: Add reset_ras_error_count for sdma v4_4_2

Add reset_ras_error_count callback for sdma
v4_4_2. It will be used to reset sdma ras error
count.

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
16 months agodrm/amdgpu: Add query_ras_error_count for sdma v4_4_2
Hawking Zhang [Sun, 5 Feb 2023 14:54:50 +0000 (22:54 +0800)]
drm/amdgpu: Add query_ras_error_count for sdma v4_4_2

Add query_ras_error_count callback for sdma
v4_4_2. It will be used to query and log sdma
uncorrectable error count and memory block.

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
16 months agodrm/amdgpu: Add sdma v4_4_2 ras registers
Hawking Zhang [Fri, 23 Dec 2022 07:54:43 +0000 (15:54 +0800)]
drm/amdgpu: Add sdma v4_4_2 ras registers

SDMA_UE_ERR_STATUS_HI|LO are introduced in v4_4_2
to replace SDMA_EDC_COUNTER/COUNTER2 registers to
log SDMA RAS errors

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
16 months agodrm/amdgpu: Add common helper to reset ras error
Hawking Zhang [Fri, 3 Feb 2023 08:10:37 +0000 (16:10 +0800)]
drm/amdgpu: Add common helper to reset ras error

Add common helper to reset ras error status. It
applies to IP blocks that follow the new ras error
logging register design, and need to write 0 to
reset the error status. For IP blocks that don't
support the new design, please still implement ip
specific helper.

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
16 months agodrm/amdgpu: Add common helper to query ras error (v2)
Hawking Zhang [Thu, 2 Feb 2023 12:54:08 +0000 (20:54 +0800)]
drm/amdgpu: Add common helper to query ras error (v2)

Add common helper to query ras error status and
log error information, including memory block id
and error count. The helpers are applicable to IP
blocks that follow the new ras error logging design.
For IP blocks that don't support the new design,
please still implement ip specific helper to query
ras error.

v2: optimize struct amdgpu_ras_err_status_reg_entry
and the implementation in helper (Lijo/Tao)

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
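
A common query helper of this shape could be sketched as follows. The bit layout (valid flag, memory block id, error count) is a made-up assumption for illustration; the real field positions come from the hardware register headers, and none of the names below exist in the driver.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical field layout of an error status LO/HI register pair. */
#define TOY_LO_VALID        (1u << 0)
#define TOY_LO_MEM_ID_SHIFT 8
#define TOY_LO_MEM_ID_MASK  0xffu
#define TOY_HI_ERR_CNT_MASK 0x7u

struct toy_err_status {
	uint32_t lo;
	uint32_t hi;
};

struct toy_err_info {
	uint32_t memory_id; /* memory block that logged the error */
	uint32_t err_cnt;   /* number of errors logged */
};

/* Decode one register pair; returns false if nothing valid is logged. */
static bool toy_decode(const struct toy_err_status *reg,
		       struct toy_err_info *info)
{
	if (!(reg->lo & TOY_LO_VALID))
		return false;
	info->memory_id = (reg->lo >> TOY_LO_MEM_ID_SHIFT) & TOY_LO_MEM_ID_MASK;
	info->err_cnt = reg->hi & TOY_HI_ERR_CNT_MASK;
	return true;
}

/* Sum the error counts of all valid entries in a register list. */
static uint32_t toy_query_error_count(const struct toy_err_status *regs,
				      size_t count)
{
	uint32_t total = 0;
	size_t i;

	for (i = 0; i < count; i++) {
		struct toy_err_info info;

		if (toy_decode(&regs[i], &info))
			total += info.err_cnt;
	}
	return total;
}
```

Keeping the per-register decode separate from the loop is what lets several IP blocks share the same walker with different register lists.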
16 months agodrm/amdgpu: Enable MGCG on SDMAv4.4.2
Lijo Lazar [Fri, 3 Feb 2023 07:47:51 +0000 (13:17 +0530)]
drm/amdgpu: Enable MGCG on SDMAv4.4.2

Enable clock gating on SDMAv4.4.2 versions. Leave memory light sleep to
default.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
16 months agodrm/amdgpu: enable context empty interrupt on sdma v4.4.2
Le Ma [Fri, 3 Feb 2023 06:38:33 +0000 (14:38 +0800)]
drm/amdgpu: enable context empty interrupt on sdma v4.4.2

With SDMA_CNTL.CTXEMPTY_INT_ENABLE set, the F32 clock can be gated when
SDMA finishes all jobs and goes to idle.

No specific interrupt handling is required in the driver.

Signed-off-by: Le Ma <le.ma@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
16 months agodrm/amdgpu: add vcn_4_0_3 codec query
Sonny Jiang [Tue, 31 Jan 2023 21:44:28 +0000 (16:44 -0500)]
drm/amdgpu: add vcn_4_0_3 codec query

Add support for vcn_4_0_3 video codec query

Signed-off-by: Sonny Jiang <sonny.jiang@amd.com>
Reviewed-by: James Zhu <James.Zhu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
16 months agodrm/amdkfd: bind cpu and hiveless gpu to a hive if xgmi connected
Jonathan Kim [Thu, 2 Feb 2023 16:10:08 +0000 (11:10 -0500)]
drm/amdkfd: bind cpu and hiveless gpu to a hive if xgmi connected

If a CPU and GPU are xGMI connected but the GPU is hiveless with
respect to other GPUs, create a new CPU-GPU hive using the GPU's PCI
device location ID as the new hive ID to maintain fine grain memory
access usage.

Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
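
The hive ID choice described in this commit can be sketched as a small decision function. The function and parameter names are hypothetical, and a hive ID of 0 is assumed here to mean "no hive".

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Pick the hive ID to report for a GPU.
 * gpu_hive_id == 0 is assumed to mean the GPU is hiveless.
 */
static uint64_t toy_hive_id(bool xgmi_to_cpu, uint64_t gpu_hive_id,
			    uint32_t pci_location_id)
{
	if (gpu_hive_id)
		return gpu_hive_id;     /* already in a GPU-GPU hive */
	if (xgmi_to_cpu)
		return pci_location_id; /* new CPU-GPU hive */
	return 0;                       /* stays hiveless */
}
```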
16 months agodrm/amdkfd: Cleanup KFD nodes creation
Philip Yang [Tue, 24 Jan 2023 15:10:14 +0000 (10:10 -0500)]
drm/amdkfd: Cleanup KFD nodes creation

kfd node allocation outside the kfd->num_nodes loop is not needed and causes
a memory leak because kfd->num_nodes is at least equal to 1.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
16 months agodrm/ttm: add NUMA node id to the pool
Rajneesh Bhardwaj [Thu, 13 Oct 2022 01:58:29 +0000 (21:58 -0400)]
drm/ttm: add NUMA node id to the pool

This allows backing ttm_tt structure with pages from different NUMA
pools.

Tested-by: Graham Sider <graham.sider@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
16 months agodrm/amdgpu: Fix mqd init on GFX v9.4.3
Lijo Lazar [Fri, 20 Jan 2023 07:12:00 +0000 (12:42 +0530)]
drm/amdgpu: Fix mqd init on GFX v9.4.3

For MQD init, an XCC's queue is selected with GRBM select. However, for
initialization of MQD, values read from logical XCC0 registers are used.
This results in garbage values being read from XCC0 whose queue is not
selected. Change to read from the right XCC for MQD initialization.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
16 months agodrm/amd: fix compiler error to support older compilers
Harish Kasiviswanathan [Sat, 21 Jan 2023 20:47:11 +0000 (15:47 -0500)]
drm/amd: fix compiler error to support older compilers

‘for’ loop initial declarations are only allowed in C99 or C11 mode

Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Reviewed-by: Mukul Joshi <mukul.joshi@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
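
The quoted compiler error is triggered by declaring the loop variable inside the for statement, as in for (int i = 0; ...), when building in a pre-C99 mode. A minimal sketch of the portable form, with a hypothetical helper name, moves the declaration to the start of the enclosing block:

```c
#include <stddef.h>

/* Sum an array using a loop that also compiles in pre-C99 modes. */
static unsigned int toy_sum(const unsigned int *vals, size_t n)
{
	size_t i; /* declared at block scope instead of in the for header */
	unsigned int sum = 0;

	for (i = 0; i < n; i++)
		sum += vals[i];
	return sum;
}
```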
16 months agodrm/amdgpu: Enable CGCG/LS for GC 9.4.3
Lijo Lazar [Thu, 19 Jan 2023 09:30:45 +0000 (15:00 +0530)]
drm/amdgpu: Enable CGCG/LS for GC 9.4.3

Enable coarse grain clockgating/light sleep for GC v9.4.3. Remove
programming that is not meant for GC 9.4.3.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
16 months agodrm/amdgpu: Use unique doorbell range per xcc
Lijo Lazar [Thu, 19 Jan 2023 09:17:22 +0000 (14:47 +0530)]
drm/amdgpu: Use unique doorbell range per xcc

Program different ranges in each XCC with MEC_DOORBELL_RANGE_LOWER/HIGHER.
Keeping the same range causes CPF in other XCCs also to be busy when an IB
packet is submitted to KCQ. Only the XCC which processes the packet
comes back to idle afterwards, and this keeps the other CPs from going idle.
This in turn affects clockgating behavior as RLC doesn't get the idle
interrupt.

LOWER/HIGHER covers only KIQ/KCQs which are per XCC queues. Assigning
different ranges doesn't seem to have any side effect as user queue ranges
are outside of this range. User queue tests - PM4 through KFD and AQL
through rocr - have the same results after this change.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
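
Assigning a disjoint doorbell window to each XCC can be sketched as below. The window size and all names are hypothetical; the actual values programmed into MEC_DOORBELL_RANGE_LOWER/HIGHER are not given in this message.

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_DOORBELLS_PER_XCC 32u /* hypothetical window size */

struct toy_doorbell_range {
	uint32_t lower;
	uint32_t upper; /* inclusive */
};

/* Compute the doorbell window owned by one XCC. */
static struct toy_doorbell_range toy_xcc_range(uint32_t base, uint32_t xcc_id)
{
	struct toy_doorbell_range r;

	r.lower = base + xcc_id * TOY_DOORBELLS_PER_XCC;
	r.upper = r.lower + TOY_DOORBELLS_PER_XCC - 1;
	return r;
}

/* True when two inclusive ranges share at least one doorbell. */
static bool toy_ranges_overlap(struct toy_doorbell_range a,
			       struct toy_doorbell_range b)
{
	return a.lower <= b.upper && b.lower <= a.upper;
}
```

With non-overlapping windows, a doorbell write for one XCC's kernel queue falls outside every other XCC's range, so the other CPs are not woken by it.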
16 months agodrm/amdgpu: Keep SDMAv4.4.2 active during reset
Lijo Lazar [Tue, 17 Jan 2023 11:24:49 +0000 (16:54 +0530)]
drm/amdgpu: Keep SDMAv4.4.2 active during reset

During ASIC wide reset, SDMA shouldn't be clockgated and be ready to
accept freeze requests from PMFW. For that, don't stop SDMA engine
during reset and keep the clocks active.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
16 months agodrm/amdkfd: Report XGMI IOLINKs for GFXIP9.4.3
Rajneesh Bhardwaj [Thu, 5 Jan 2023 16:39:34 +0000 (11:39 -0500)]
drm/amdkfd: Report XGMI IOLINKs for GFXIP9.4.3

GFXIP 9.4.3 could be in APU or carveout mode but we cannot use the
xgmi.connected_to_cpu flag to identify the iolinks type. Use appropriate
APU or Carveout mode based condition to report xgmi connection in kfd
topology.

Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
16 months agodrm/amdgpu: add num_xcps return
James Zhu [Tue, 10 Jan 2023 14:05:35 +0000 (09:05 -0500)]
drm/amdgpu: add num_xcps return

Add num_xcps return.

Signed-off-by: James Zhu <James.Zhu@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
16 months agodrm/amdgpu: increase AMDGPU_MAX_HWIP_RINGS
James Zhu [Tue, 10 Jan 2023 14:01:33 +0000 (09:01 -0500)]
drm/amdgpu: increase AMDGPU_MAX_HWIP_RINGS

[WA] Increase AMDGPU_MAX_HWIP_RINGS to 64 to support more compute
ring resources. A later redesign that takes queue/priority/scheduler
factors into account is needed to reduce AMDGPU_MAX_HWIP_RINGS.

Signed-off-by: James Zhu <James.Zhu@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
16 months agodrm/amdgpu: vcn_v4_0_3 load vcn fw once for all AIDs
James Zhu [Tue, 20 Dec 2022 01:11:11 +0000 (20:11 -0500)]
drm/amdgpu: vcn_v4_0_3 load vcn fw once for all AIDs

Signed-off-by: James Zhu <James.Zhu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
16 months agodrm/amdgpu: Populate VCN/JPEG harvest information
Lijo Lazar [Tue, 10 Jan 2023 04:22:53 +0000 (09:52 +0530)]
drm/amdgpu: Populate VCN/JPEG harvest information

Certain instances of VCN/JPEG IPs may not be usable. Fetch the information
from harvest table.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
16 months agodrm/amdgpu: Correct dGPU MTYPE settings for gfx943
Graham Sider [Thu, 5 Jan 2023 15:58:07 +0000 (10:58 -0500)]
drm/amdgpu: Correct dGPU MTYPE settings for gfx943

Revert temporary dGPU VRAM MTYPE setting and align with expected
coherency protocol.

Signed-off-by: Graham Sider <Graham.Sider@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
16 months agodrm/amdgpu: Remove SMU powergate message call for SDMA
Asad kamal [Tue, 3 Jan 2023 05:14:58 +0000 (13:14 +0800)]
drm/amdgpu: Remove SMU powergate message call for SDMA

SDMA v4.4.2 doesn't need explicit power gating control through PMFW

Signed-off-by: Asad kamal <asad.kamal@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
16 months agodrm/amdgpu: enable vcn/jpeg on vcn_v4_0_3
James Zhu [Sun, 18 Dec 2022 00:44:15 +0000 (19:44 -0500)]
drm/amdgpu: enable vcn/jpeg on vcn_v4_0_3

Enable vcn/jpeg on vcn_v4_0_3.

Signed-off-by: James Zhu <James.Zhu@amd.com>
Acked-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
16 months agodrm/amdgpu: enable indirect_sram mode on vcn_v4_0_3
James Zhu [Mon, 12 Dec 2022 18:14:05 +0000 (13:14 -0500)]
drm/amdgpu: enable indirect_sram mode on vcn_v4_0_3

Enable indirect_sram mode on vcn_v4_0_3.

Signed-off-by: James Zhu <James.Zhu@amd.com>
Acked-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
16 months agodrm/amdgpu: add unified queue support on vcn_v4_0_3
James Zhu [Sat, 17 Dec 2022 15:51:04 +0000 (10:51 -0500)]
drm/amdgpu: add unified queue support on vcn_v4_0_3

Add unified queue support on vcn_v4_0_3.

Signed-off-by: James Zhu <James.Zhu@amd.com>
Acked-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
16 months agodrm/amdgpu: add fwlog support on vcn_v4_0_3
James Zhu [Mon, 12 Dec 2022 17:31:23 +0000 (12:31 -0500)]
drm/amdgpu: add fwlog support on vcn_v4_0_3

Add fwlog support on vcn_v4_0_3.

Signed-off-by: James Zhu <James.Zhu@amd.com>
Acked-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
16 months agodrm/amdgpu: increase MAX setting to hold more jpeg instances
James Zhu [Mon, 12 Dec 2022 17:29:04 +0000 (12:29 -0500)]
drm/amdgpu: increase MAX setting to hold more jpeg instances

vcn_v4_0_3 increased the number of jpeg instances, so the
MAX resources setting needs to be increased accordingly.

Signed-off-by: James Zhu <James.Zhu@amd.com>
Acked-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
16 months agodrm/amdgpu: Use discovery to get XCC/SDMA mask
Lijo Lazar [Mon, 28 Nov 2022 06:32:14 +0000 (12:02 +0530)]
drm/amdgpu: Use discovery to get XCC/SDMA mask

Get information about active XCC and SDMAs from discovery table.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Le Ma <le.ma@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
16 months agodrm/amdgpu: Make VRAM discovery read optional
Lijo Lazar [Thu, 1 Dec 2022 11:57:47 +0000 (17:27 +0530)]
drm/amdgpu: Make VRAM discovery read optional

When overridden with module param, directly read discovery info
from discovery binary instead of reading from VRAM.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
16 months agodrm/amdgpu: Allocate GART table in RAM for AMD APU
Felix Kuehling [Tue, 29 Nov 2022 17:45:26 +0000 (12:45 -0500)]
drm/amdgpu: Allocate GART table in RAM for AMD APU

Some AMD APUs may not have dedicated VRAM. On such platforms the GART
table should be allocated in system memory. When real vram size is
zero, place the GART table in system memory and create an SG BO to make
it GPU accessible.

v2: fix includes

Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
(rajneesh: removed set_memory_wc workaround)
Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
16 months agodrm/amdgpu: Add FGCG logic for GFX v9.4.3
Lijo Lazar [Tue, 20 Dec 2022 08:51:57 +0000 (14:21 +0530)]
drm/amdgpu: Add FGCG logic for GFX v9.4.3

Add fine grain clock gating (FGCG) logic for GFX v9.4.3. The feature
is controlled using CG flags. Also, make a change so that RLC safe
mode entry/exit is done only once during the CG update sequence.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
16 months agodrm/amdgpu: Make UTCL2 snoop CPU caches
Rajneesh Bhardwaj [Tue, 20 Dec 2022 20:37:57 +0000 (15:37 -0500)]
drm/amdgpu: Make UTCL2 snoop CPU caches

On AMD APP APUs, to make UTCL2 snoop CPU caches, it's not sufficient to
rely on the xgmi connected flag, so add logic to use is_app_apu to
program the PDE_REQUEST_PHYSICAL bit correctly for both gfxhub and
mmhub.

Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
16 months agoamd/amdgpu: Set MTYPE_UC for access over PCIe
Amber Lin [Mon, 28 Nov 2022 16:26:02 +0000 (11:26 -0500)]
amd/amdgpu: Set MTYPE_UC for access over PCIe

For GFX v9_4_3, set MTYPE_UC for memory access over PCIe.

v4 - add missing indentation pointed out by Felix and add his
reviewed-by tag.
v3 - add missing logic for the svm path.
v2 - add amdgpu_xgmi_same_hive to separate access over xgmi from pcie

Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Amber Lin <Amber.Lin@amd.com>
Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
16 months agodrm/amdgpu: Fix GFX v9.4.3 EOP buffer allocation
Lijo Lazar [Mon, 19 Dec 2022 12:09:42 +0000 (17:39 +0530)]
drm/amdgpu: Fix GFX v9.4.3 EOP buffer allocation

Each compute cluster gets 8 compute queues in GFX v9.4.3. Fix the EOP
buffer allocation so that each compute queue on every XCC gets a unique
address.
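
One way to picture the unique-address requirement is a flat buffer with one
slot per (XCC, queue) pair. The following is only a sketch under stated
assumptions: EOP_SIZE and the helper names are hypothetical and are not taken
from the driver; only the figure of 8 queues per cluster comes from the text
above.

```c
#include <stddef.h>

/* 8 compute queues per cluster, as stated in the commit text. */
#define QUEUES_PER_XCC 8
/* Assumed per-queue EOP slot size in bytes (illustrative value only). */
#define EOP_SIZE       2048

/* Total bytes needed so every queue on every XCC owns a separate slot. */
static inline size_t eop_total_size(unsigned int num_xcc)
{
	return (size_t)num_xcc * QUEUES_PER_XCC * EOP_SIZE;
}

/* Byte offset of the slot for queue `queue` on cluster `xcc`. */
static inline size_t eop_offset(unsigned int xcc, unsigned int queue)
{
	return ((size_t)xcc * QUEUES_PER_XCC + queue) * EOP_SIZE;
}
```

Indexing by queue alone would hand the same offset to queue 0 of every XCC;
folding the XCC index into the slot number avoids that overlap.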

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Tested-and-Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
16 months agodrm/amdgpu: Fix GFX 9.4.3 dma address capability
Lijo Lazar [Thu, 15 Dec 2022 07:43:29 +0000 (13:13 +0530)]
drm/amdgpu: Fix GFX 9.4.3 dma address capability

ASICs with GFX 9.4.3 support 48-bit addressing.
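
In kernel drivers an addressing width is commonly expressed as a DMA bit
mask. The sketch below reproduces the kernel's DMA_BIT_MASK() macro in user
space to show what a 48-bit mask looks like; dma_addr_fits() is a
hypothetical helper added for illustration and is not part of the driver.

```c
#include <stdint.h>

/* Same definition as the kernel's DMA_BIT_MASK() in <linux/dma-mapping.h>. */
#define DMA_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL << (n)) - 1))

/* Hypothetical helper: nonzero if `addr` is reachable with an n-bit mask. */
static inline int dma_addr_fits(uint64_t addr, unsigned int bits)
{
	return (addr & ~DMA_BIT_MASK(bits)) == 0;
}
```

With 48 bits the mask is 0xFFFFFFFFFFFF, so any address with bit 48 or above
set falls outside the device's reach.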

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
16 months agodrm/amdgpu: Fix semaphore release
Lijo Lazar [Wed, 14 Dec 2022 04:58:50 +0000 (10:28 +0530)]
drm/amdgpu: Fix semaphore release

Use the right register for semaphore release during invalidation.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Le Ma <le.ma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
16 months agodrm/amdkfd: Setup current_logical_xcc_id in MQD
Mukul Joshi [Fri, 9 Dec 2022 14:03:01 +0000 (09:03 -0500)]
drm/amdkfd: Setup current_logical_xcc_id in MQD

Set up a rolling current_logical_xcc_id in the MQD for GFX9.4.3
to ensure each queue starts at a different place and to prevent
hotspotting issues. Also, remove updating current_logical_xcc_id
during queue update.
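
The rolling assignment can be sketched as a simple round-robin counter. This
is an illustration only: struct xcc_roller and next_logical_xcc_id() are
hypothetical names, and the sketch assumes num_xcc is nonzero.

```c
/* Hypothetical per-device state; not the driver's actual structure. */
struct xcc_roller {
	unsigned int num_xcc; /* number of XCCs available, assumed > 0 */
	unsigned int next;    /* logical XCC id handed to the next queue */
};

/* Return the starting logical XCC id for a new queue and advance the counter. */
static inline unsigned int next_logical_xcc_id(struct xcc_roller *r)
{
	unsigned int id = r->next;

	r->next = (r->next + 1) % r->num_xcc;
	return id;
}
```

Because the value is chosen once when the queue is set up, successive queues
start on different XCCs instead of all starting on the same one.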

Suggested-by: Joseph Greathouse <Joseph.Greathouse@amd.com>
Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
16 months agodrm/amdgpu: Remove unnecessary return value check
Lijo Lazar [Thu, 1 Dec 2022 11:52:01 +0000 (17:22 +0530)]
drm/amdgpu: Remove unnecessary return value check

There is no need to check return value, as the function internally
used - amdgpu_discovery_read_binary_from_vram() - returns void.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Le Ma <le.ma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
16 months agodrm/amdgpu: correct the vmhub index when page fault occurs
Le Ma [Fri, 9 Dec 2022 11:44:05 +0000 (19:44 +0800)]
drm/amdgpu: correct the vmhub index when page fault occurs

The AMDGPU_GFXHUB is bound to each xcc in logical order.
Thus convert the node_id to the logical xcc_id to index the
correct AMDGPU_GFXHUB. "node_id / 4" gives the correct
AMDGPU_MMHUB0 index.
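
A rough sketch of the two lookups follows. The "node_id / 4" rule is taken
from the text above; the node_id to logical xcc_id table is supplied by the
caller here because the actual mapping is not given, and both function names
are hypothetical.

```c
/*
 * Hypothetical lookup: translate a hardware node_id to a logical XCC id
 * using a caller-supplied table. Returns -1 if node_id is out of range.
 */
static inline int node_id_to_logical_xcc(const int *map, unsigned int map_len,
					 unsigned int node_id)
{
	if (node_id >= map_len)
		return -1;
	return map[node_id];
}

/* MMHUB index derived from node_id, using the "node_id / 4" rule. */
static inline unsigned int node_id_to_mmhub_index(unsigned int node_id)
{
	return node_id / 4;
}
```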

Signed-off-by: Le Ma <le.ma@amd.com>
Tested-by: Asad Kamal <asad.kamal@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
16 months agodrm/amdkfd: Update packet manager for GFX9.4.3
Mukul Joshi [Thu, 8 Dec 2022 17:08:17 +0000 (12:08 -0500)]
drm/amdkfd: Update packet manager for GFX9.4.3

In GFX 9.4.3, there can be more than 8 SDMA engines.
As a result, extended_engine_sel and engine_sel fields
in MAP_QUEUES packet need to be updated to allow correct
mapping of SDMA queues to these SDMA engines.
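
To illustrate why two fields are needed once there are more than 8 engines,
the sketch below splits an engine id into a bank and an index within the
bank. The split by 8 and the meaning given to each field are assumptions made
for illustration; the real MAP_QUEUES field encodings are not reproduced
here.

```c
/* Hypothetical selector pair, named after the two packet fields. */
struct sdma_sel {
	unsigned int extended_engine_sel; /* assumed: which bank of 8 engines */
	unsigned int engine_sel;          /* assumed: engine index within the bank */
};

#define SDMA_ENGINES_PER_BANK 8

/* Split a flat SDMA engine id into (bank, index-in-bank). */
static inline struct sdma_sel sdma_engine_to_sel(unsigned int sdma_engine_id)
{
	struct sdma_sel s;

	s.extended_engine_sel = sdma_engine_id / SDMA_ENGINES_PER_BANK;
	s.engine_sel = sdma_engine_id % SDMA_ENGINES_PER_BANK;
	return s;
}
```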

Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
16 months agodrm/amdgpu: set MTYPE in PTE for GFXIP 9.4.3
Rajneesh Bhardwaj [Wed, 7 Dec 2022 05:29:40 +0000 (00:29 -0500)]
drm/amdgpu: set MTYPE in PTE for GFXIP 9.4.3

Apply the GFXIP 9.4.3 specific snoop and mtype settings for various
scenarios such as APU, APU in Carveout mode and dGPU mode.

Note: This is expected to change due to:
1 - NPS > 1 support in future
2 - Hardware bugs found during initial asic bringup.

Cc: Graham Sider <graham.sider@amd.com>
Cc: Hawking Zhang <hawking.zhang@amd.com>
Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
16 months agodrm/amdgpu: Use mask for active clusters
Lijo Lazar [Tue, 29 Nov 2022 08:30:37 +0000 (14:00 +0530)]
drm/amdgpu: Use mask for active clusters

Use a mask of available active clusters instead of using only the number
of active clusters.
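
A mask carries more information than a count: it also records which cluster
ids are active, so gaps can be represented. A minimal sketch, with
hypothetical helper names and an assumed 32-bit mask:

```c
#include <stdint.h>

/* Number of active clusters encoded in the mask (portable popcount). */
static inline unsigned int active_cluster_count(uint32_t mask)
{
	unsigned int n = 0;

	while (mask) {
		mask &= mask - 1; /* clear the lowest set bit */
		n++;
	}
	return n;
}

/* Nonzero if cluster `id` (assumed < 32) is marked active in the mask. */
static inline int cluster_is_active(uint32_t mask, unsigned int id)
{
	return (mask >> id) & 1u;
}
```

For example, a mask of 0x0B describes three active clusters (0, 1 and 3),
which a bare count of 3 could not distinguish from clusters 0, 1 and 2.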

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
16 months agodrm/amdgpu: Derive active clusters from SDMA
Lijo Lazar [Mon, 28 Nov 2022 05:47:15 +0000 (11:17 +0530)]
drm/amdgpu: Derive active clusters from SDMA

SDMA instances per active cluster and SDMA instance mask are used
to find the number of active clusters.
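
The derivation can be sketched as a population count divided by the number of
instances per cluster. This assumes every active cluster contributes all of
its SDMA instances to the mask; the function name is hypothetical.

```c
#include <stdint.h>

/*
 * Derive the number of active clusters from the SDMA instance mask,
 * assuming each active cluster exposes `sdma_per_cluster` instances.
 * Returns 0 if sdma_per_cluster is 0.
 */
static inline unsigned int clusters_from_sdma(uint32_t sdma_mask,
					      unsigned int sdma_per_cluster)
{
	unsigned int active_sdma = 0;

	if (!sdma_per_cluster)
		return 0;

	while (sdma_mask) {
		sdma_mask &= sdma_mask - 1; /* clear the lowest set bit */
		active_sdma++;
	}
	return active_sdma / sdma_per_cluster;
}
```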

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
16 months agodrm/amdgpu: Move generic logic to soc config
Lijo Lazar [Mon, 28 Nov 2022 04:27:51 +0000 (09:57 +0530)]
drm/amdgpu: Move generic logic to soc config

Move soc specific configuration details to an aqua vanjaram specific file.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
16 months agodrm/amdgpu: Fix the KCQ hang when binding back
Shiwu Zhang [Fri, 18 Nov 2022 06:21:15 +0000 (14:21 +0800)]
drm/amdgpu: Fix the KCQ hang when binding back

Just like the KIQ, the KCQ needs to clear the doorbell related regs as
well, to avoid hangs when the driver is loaded again after unloading.

Signed-off-by: Shiwu Zhang <shiwu.zhang@amd.com>
Reviewed-by: Le Ma <le.ma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
16 months agodrm/amdgpu: Skip TMR allocation if not required
Lijo Lazar [Thu, 24 Nov 2022 08:53:58 +0000 (14:23 +0530)]
drm/amdgpu: Skip TMR allocation if not required

On ASICs with PSPv13.0.6, TMR is reserved at boot time. There is no need
to allocate TMR region by driver. However, it's still required to send
SETUP_TMR command to PSP.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
16 months agodrm/amdgpu: Add XCP IP callback funcs for each IP
Lijo Lazar [Fri, 23 Sep 2022 11:45:15 +0000 (17:15 +0530)]
drm/amdgpu: Add XCP IP callback funcs for each IP

Initialize with the IP specific functions needed for GFXHUB, GFX and
SDMA.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
16 months agodrm/amdgpu: Add XCP functions for GFX v9.4.3
Lijo Lazar [Fri, 23 Sep 2022 11:18:43 +0000 (16:48 +0530)]
drm/amdgpu: Add XCP functions for GFX v9.4.3

Add functions to suspend/resume GFX instances belonging to an XCP.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
16 months agodrm/amdgpu: Add SDMA v4.4.2 XCP funcs
Lijo Lazar [Fri, 23 Sep 2022 10:10:15 +0000 (15:40 +0530)]
drm/amdgpu: Add SDMA v4.4.2 XCP funcs

Add functions required to suspend/resume instances of SDMA which
are part of an XCP.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
16 months agodrm/amdgpu: Add GFXHUB v1.2 XCP funcs
Lijo Lazar [Fri, 23 Sep 2022 09:50:08 +0000 (15:20 +0530)]
drm/amdgpu: Add GFXHUB v1.2 XCP funcs

Add functions required for suspend/resume of GFXHUB instances which are
part of an XCP.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
16 months agodrm/amdgpu: Switch to SOC partition funcs
Lijo Lazar [Wed, 16 Nov 2022 11:45:47 +0000 (17:15 +0530)]
drm/amdgpu: Switch to SOC partition funcs

For GFXv9.4.3, use SOC level partition switch implementation rather than
keeping them at GFX IP level. Change the existing implementation in
GFX IP for keeping partition mode and restrict it to only GFX related
switch.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
16 months agodrm/amdgpu: Add soc config init for GC9.4.3 ASICs
Lijo Lazar [Fri, 23 Sep 2022 09:13:17 +0000 (14:43 +0530)]
drm/amdgpu: Add soc config init for GC9.4.3 ASICs

Add function to initialize soc configuration information for GC 9.4.3
ASICs. Use it to map IPs and other SOC related information once IP
configuration information is available through discovery.

For GC9.4.3 compute partition related callbacks are initialized as part
of configuration init.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
16 months agodrm/amdgpu: Add SOC partition funcs for GC v9.4.3
Lijo Lazar [Mon, 19 Sep 2022 12:04:02 +0000 (17:34 +0530)]
drm/amdgpu: Add SOC partition funcs for GC v9.4.3

Switching the partition mode configuration of an ASIC is a SOC
level function rather than something at GFX core level. Add
partition mode switch functions as SOC specific callbacks.
Implement the XCP manager callbacks needed for partition
switch for GC 9.4.3 based ASICs.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
16 months agodrm/amdgpu: Add initial version of XCP routines
Lijo Lazar [Fri, 16 Sep 2022 07:13:35 +0000 (12:43 +0530)]
drm/amdgpu: Add initial version of XCP routines

Within a device, an accelerator core partition can be constituted with
different IP instances. These partitions are spatial in nature. Number
of partitions which can exist at the same time depends on the 'partition
mode'. Add a manager entity which is responsible for switching between
different partition modes and maintaining partitions. It is also
responsible for suspend/resume of different partitions.
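
The manager idea can be sketched as a small structure that records the
current mode and delegates the actual switch to a SOC specific callback. All
names, modes and partition counts below are hypothetical and only illustrate
the shape of such an entity.

```c
#include <stddef.h>

/* Hypothetical partition modes; names and counts are illustrative only. */
enum xcp_mode { XCP_MODE_SINGLE, XCP_MODE_DUAL, XCP_MODE_QUAD, XCP_MODE_COUNT };

struct xcp_mgr;

/* SOC specific callbacks supplied by whoever owns the hardware details. */
struct xcp_mgr_funcs {
	int (*switch_mode)(struct xcp_mgr *mgr, enum xcp_mode mode);
};

struct xcp_mgr {
	enum xcp_mode mode;                /* current partition mode */
	unsigned int num_xcps;             /* partitions existing in this mode */
	const struct xcp_mgr_funcs *funcs; /* may be NULL in this sketch */
};

/* How many partitions exist at the same time in a given mode. */
static inline unsigned int xcp_count_for_mode(enum xcp_mode mode)
{
	switch (mode) {
	case XCP_MODE_SINGLE: return 1;
	case XCP_MODE_DUAL:   return 2;
	case XCP_MODE_QUAD:   return 4;
	default:              return 0;
	}
}

/* Switch modes: validate, call the SOC callback, then record the new state. */
static inline int xcp_mgr_switch_mode(struct xcp_mgr *mgr, enum xcp_mode mode)
{
	unsigned int n = xcp_count_for_mode(mode);
	int ret;

	if (!n)
		return -1; /* unknown mode, state left untouched */

	if (mgr->funcs && mgr->funcs->switch_mode) {
		ret = mgr->funcs->switch_mode(mgr, mode);
		if (ret)
			return ret;
	}

	mgr->mode = mode;
	mgr->num_xcps = n;
	return 0;
}
```

Keeping the hardware specific step behind a callback lets the generic manager
stay independent of any one SOC.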

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
16 months agodrm/amdgpu: Add sdma instance specific functions
Lijo Lazar [Wed, 14 Sep 2022 07:18:08 +0000 (12:48 +0530)]
drm/amdgpu: Add sdma instance specific functions

SDMA 4.4.2 supports multiple instances. Add functions to support
handling of each SDMA instance separately.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
16 months agodrm/amdgpu: Add xcc specific functions for gfxhub
Lijo Lazar [Wed, 14 Sep 2022 06:46:48 +0000 (12:16 +0530)]
drm/amdgpu: Add xcc specific functions for gfxhub

GFXHUB 1.2 supports multiple XCC instances. Add XCC specific functions
to handle XCC instances separately.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
16 months agodrm/amdgpu: Add xcc specific functions
Lijo Lazar [Wed, 16 Nov 2022 05:17:18 +0000 (10:47 +0530)]
drm/amdgpu: Add xcc specific functions

Add more XCC specific functions and use them from IP block functions.
RLC, CP functions are further split to have xcc specific versions.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
16 months agodrm/amdgpu: Rename xcc specific functions
Lijo Lazar [Wed, 20 Jul 2022 08:15:30 +0000 (13:45 +0530)]
drm/amdgpu: Rename xcc specific functions

Add 'xcc' prefix to xcc specific functions to distinguish from IP block
functions.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
16 months agodrm/amdgpu: Check APU supports true APP mode
Rajneesh Bhardwaj [Wed, 9 Nov 2022 04:04:30 +0000 (23:04 -0500)]
drm/amdgpu: Check APU supports true APP mode

On a GFXIP 9.4.3 APU in no carveout mode there is no real vram heap; one
could be emulated by the driver over the interleaved NUMA system memory.
The APU could also be in carveout mode during the early development
stage or otherwise for debugging purposes. Introduce a new member in
amdgpu_gmc to indicate whether the APU is in native mode as per
the production configuration. AMD_IS_APU cannot be used for Accelerated
Processing Platform APUs as it might be used in a different context on
previous generations or on small APUs.

Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Tested-by: Graham Sider <graham.sider@amd.com>
Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
16 months agodrm/amdgpu: more GPU page fault info for GC v9.4.3
Philip Yang [Mon, 14 Nov 2022 22:35:43 +0000 (17:35 -0500)]
drm/amdgpu: more GPU page fault info for GC v9.4.3

Output IH cookie node_id and translate it to the corresponding AID id
and XCC id, to help debug the GPU page fault.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
16 months agodrm/amdgpu: remove partition attributes sys file for gfx_v9_4_3
Shiwu Zhang [Mon, 14 Nov 2022 07:52:19 +0000 (15:52 +0800)]
drm/amdgpu: remove partition attributes sys file for gfx_v9_4_3

For driver de-init operations such as rmmod, the partition specific
attributes need to be removed as well.

Signed-off-by: Shiwu Zhang <shiwu.zhang@amd.com>
Reviewed-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
16 months agodrm/amdgpu: fix kcq mqd_backup buffer double free for multi-XCD
Shiwu Zhang [Fri, 11 Nov 2022 07:54:52 +0000 (15:54 +0800)]
drm/amdgpu: fix kcq mqd_backup buffer double free for multi-XCD

For gfx_v9_4_3 and beyond, struct kiq has its own mqd_backup pointer
rather than using the last pointer from the mec struct. The kfree
operation on the pointer from the mec struct should therefore be removed;
otherwise it causes a double free of the first kcq's mqd_backup buffer on XCD1.

Signed-off-by: Shiwu Zhang <shiwu.zhang@amd.com>
Reviewed-by: Le Ma <le.ma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>