platform/kernel/linux-rpi.git
17 months agodrm/amdgpu: Add support EEPROM table v2.1
Stanley.Yang [Wed, 31 May 2023 02:37:09 +0000 (10:37 +0800)]
drm/amdgpu: Add support EEPROM table v2.1

Add ras info to EEPROM table, app can analyse device ECC
status without GPU driver through EEPROM table ras info.

Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
17 months agodrm/amdgpu: Support setting EEPROM table version
Stanley.Yang [Tue, 30 May 2023 14:48:34 +0000 (22:48 +0800)]
drm/amdgpu: Support setting EEPROM table version

Add setting EEPROM table version interface for umcv8.10,
Add EEPROM table v2.1 to UMC v8.10.

Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
17 months agodrm/amdgpu: Add RAS table v2.1 macro definition
Stanley.Yang [Tue, 30 May 2023 14:38:17 +0000 (22:38 +0800)]
drm/amdgpu: Add RAS table v2.1 macro definition

Add RAS EEPROM table version 2.1 macro definition.

Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
17 months agodrm/amdgpu: Rename ras table version
Stanley.Yang [Tue, 30 May 2023 13:48:17 +0000 (21:48 +0800)]
drm/amdgpu: Rename ras table version

Rename RAS_TABLE_VER to RAS_TABLE_VER_V1,
move RAS_TABLE_VER_V1 from amdgpu_ras_eeprom.c to amdgpu_ras_eeprom.h.

Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
17 months agodrm/amdgpu: Implement gfx9 patch functions for resubmission
Jiadong Zhu [Thu, 25 May 2023 10:42:15 +0000 (18:42 +0800)]
drm/amdgpu: Implement gfx9 patch functions for resubmission

Patch the packages including CONTEXT_CONTROL and WRITE_DATA for gfx9
during the resubmission scenario.

Signed-off-by: Jiadong Zhu <Jiadong.Zhu@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
17 months agodrm/amdgpu: Modify indirect buffer packages for resubmission
Jiadong Zhu [Thu, 25 May 2023 08:52:55 +0000 (16:52 +0800)]
drm/amdgpu: Modify indirect buffer packages for resubmission

When the preempted IB frame resubmitted to cp, we need to modify the frame
data including:
1. set PRE_RESUME 1 in CONTEXT_CONTROL.
2. use meta data(DE and CE) read from CSA in WRITE_DATA.

Add functions to save the location the first time IBs emitted and callback
to patch the package when resubmission happens.

Signed-off-by: Jiadong Zhu <Jiadong.Zhu@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
17 months agodrm/amdgpu/mmsch: Correct the definition for mmsch init header
Emily Deng [Tue, 6 Jun 2023 06:27:04 +0000 (14:27 +0800)]
drm/amdgpu/mmsch: Correct the definition for mmsch init header

For the header, it is version related, shouldn't use MAX_VCN_INSTANCES.

Signed-off-by: Emily Deng <Emily.Deng@amd.com>
Reviewed-by: Feifei Xu <Feifei.Xu@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
17 months agodrm/amdkfd: potential error pointer dereference in ioctl
Dan Carpenter [Tue, 6 Jun 2023 23:29:51 +0000 (19:29 -0400)]
drm/amdkfd: potential error pointer dereference in ioctl

The "target" either comes from kfd_create_process() which returns error
pointers on error or kfd_lookup_process_by_pid() which returns NULL on
error.  So we need to check for both types of errors.

Fixes: 0ab2d7532b05 ("drm/amdkfd: prepare per-process debug enable and disable")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Jonathan Kim <jonathan.kim@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
17 months agodrm/amd/display: Only use ODM2:1 policy for high pixel rate displays
Aurabindo Pillai [Tue, 6 Jun 2023 15:42:53 +0000 (11:42 -0400)]
drm/amd/display: Only use ODM2:1 policy for high pixel rate displays

We only gain a benefit of using the ODM2:1 dynamic policy if it allow us
to decrease DISPCLK to use the VMIN freq.  If the display config can
already achieve VMIN DISPCLK freq without ODM2:1, don't apply the
policy.

This patch was reverted but that causes some IGT regressions. To
unblock, the patch is being applied again until IGT failures are
fixed.

Signed-off-by: Alvin Lee <Alvin.Lee2@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Reviewed-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
17 months agodrm/amd/pm: Fix memory some memory corruption
Dan Carpenter [Tue, 6 Jun 2023 08:33:46 +0000 (11:33 +0300)]
drm/amd/pm: Fix memory some memory corruption

The "od_table" is a pointer to a large struct, but this code is doing
pointer math as if it were pointing to bytes.  It results in writing
far outside the struct.

Fixes: 2e8452ea4ef6 ("drm/amd/pm: fulfill the OD support for SMU13.0.0")
Fixes: 2a9aa52e4617 ("drm/amd/pm: fulfill the OD support for SMU13.0.7")
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
17 months agodrm/amdgpu: display/Kconfig: replace leading spaces with tab
Sui Jingfeng [Tue, 6 Jun 2023 13:33:28 +0000 (21:33 +0800)]
drm/amdgpu: display/Kconfig: replace leading spaces with tab

This patch replace the leading spaces with tab, make them keep aligned with
the rest of the config options. No functional change.

Signed-off-by: Sui Jingfeng <suijingfeng@loongson.cn>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
17 months agodrm/amd/display: mark dml314's UseMinimumDCFCLK() as noinline_for_stack
Hamza Mahfooz [Mon, 5 Jun 2023 18:18:49 +0000 (14:18 -0400)]
drm/amd/display: mark dml314's UseMinimumDCFCLK() as noinline_for_stack

clang reports:
drivers/gpu/drm/amd/amdgpu/../display/dc/dml/dcn314/display_mode_vba_314.c:3892:6: error: stack frame size (2632) exceeds limit (2048) in 'dml314_ModeSupportAndSystemConfigurationFull' [-Werror,-Wframe-larger-than]
 3892 | void dml314_ModeSupportAndSystemConfigurationFull(struct display_mode_lib *mode_lib)
      |      ^
1 error generated.

So, since UseMinimumDCFCLK() consumes a lot of stack space, mark it as
noinline_for_stack to prevent it from blowing up
dml314_ModeSupportAndSystemConfigurationFull()'s stack size.

Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
17 months agodrm/amd/display: mark dml31's UseMinimumDCFCLK() as noinline_for_stack
Hamza Mahfooz [Mon, 5 Jun 2023 17:59:08 +0000 (13:59 -0400)]
drm/amd/display: mark dml31's UseMinimumDCFCLK() as noinline_for_stack

clang reports:
drivers/gpu/drm/amd/amdgpu/../display/dc/dml/dcn31/display_mode_vba_31.c:3797:6: error: stack frame size (2632) exceeds limit (2048) in 'dml31_ModeSupportAndSystemConfigurationFull' [-Werror,-Wframe-larger-than]
 3797 | void dml31_ModeSupportAndSystemConfigurationFull(struct display_mode_lib *mode_lib)
      |      ^
1 error generated.

So, since UseMinimumDCFCLK() consumes a lot of stack space, mark it as
noinline_for_stack to prevent it from blowing up
dml31_ModeSupportAndSystemConfigurationFull()'s stack size.

Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
17 months agodrm/amd/display: Fix unused variable ‘should_lock_all_pipes’
Srinivasan Shanmugam [Tue, 6 Jun 2023 12:06:45 +0000 (17:36 +0530)]
drm/amd/display: Fix unused variable ‘should_lock_all_pipes’

Fix below compilation error:

drivers/gpu/drm/amd/amdgpu/../display/dc/core/dc.c:3524:7: error: unused variable 'should_lock_all_pipes' [-Werror,-Wunused-variable]
        bool should_lock_all_pipes = (update_type != UPDATE_TYPE_FAST);

Cc: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Cc: Aurabindo Pillai <aurabindo.pillai@amd.com>
Cc: Harry Wentland <harry.wentland@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Hamza Mahfooz <hamza.mahfooz@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
17 months agodrm/amd/display: Reduce sdp bw after urgent to 90%
Alvin Lee [Fri, 19 May 2023 15:38:15 +0000 (11:38 -0400)]
drm/amd/display: Reduce sdp bw after urgent to 90%

[Description]
Reduce expected SDP bandwidth due to poor QoS and
arbitration issues on high bandwidth configs

Cc: Mario Limonciello <mario.limonciello@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Acked-by: Stylon Wang <stylon.wang@amd.com>
Signed-off-by: Alvin Lee <alvin.lee2@amd.com>
Reviewed-by: Nevenko Stupar <Nevenko.Stupar@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
17 months agodrm/amd/display: Add control flag to dc_stream_state to skip eDP BL off/link off
Max Tseng [Tue, 25 Apr 2023 07:05:17 +0000 (15:05 +0800)]
drm/amd/display: Add control flag to dc_stream_state to skip eDP BL off/link off

Add control flag to dc_stream_state to skip eDP BL off/link off.

Acked-by: Stylon Wang <stylon.wang@amd.com>
Signed-off-by: Max Tseng <max.tseng@amd.com>
Reviewed-by: Anthony Koo <Anthony.Koo@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
17 months agodrm/amd/display: Wrong index type for pipe iterator
Saaem Rizvi [Thu, 18 May 2023 16:12:20 +0000 (12:12 -0400)]
drm/amd/display: Wrong index type for pipe iterator

[Why and How]
Type mismatch in index and pipe count might cause an infinite loop. code
Change should resolve this issue.

Acked-by: Stylon Wang <stylon.wang@amd.com>
Signed-off-by: Saaem Rizvi <syedsaaem.rizvi@amd.com>
Reviewed-by: Josip Pavic <Josip.Pavic@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
17 months agodrm/amd/display: Refactor fast update to use new HWSS build sequence
Alvin Lee [Thu, 18 May 2023 15:30:44 +0000 (11:30 -0400)]
drm/amd/display: Refactor fast update to use new HWSS build sequence

[Description]
- Refactor HW sequencer to use a build / execute sequence
- Also move gamma updates to become fast

v2: squash in build fix ("drm/amd/display: Fix guarding of 'if (dc->debug.visual_confirm)'")

Acked-by: Stylon Wang <stylon.wang@amd.com>
Signed-off-by: Alvin Lee <alvin.lee2@amd.com>
Reviewed-by: Jun Lei <jun.lei@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
17 months agodrm/amd/display: fix dcn315 single stream crb allocation
Dmytro Laktyushkin [Tue, 16 May 2023 19:50:40 +0000 (15:50 -0400)]
drm/amd/display: fix dcn315 single stream crb allocation

Change to improve avoiding asymetric crb calculations for single stream
scenarios.

Cc: Mario Limonciello <mario.limonciello@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Acked-by: Stylon Wang <stylon.wang@amd.com>
Signed-off-by: Dmytro Laktyushkin <dmytro.laktyushkin@amd.com>
Reviewed-by: Charlene Liu <Charlene.Liu@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
17 months agodrm/amdgpu: change reserved vram info print
YiPeng Chai [Wed, 24 May 2023 09:14:15 +0000 (17:14 +0800)]
drm/amdgpu: change reserved vram info print

The link object of mgr->reserved_pages is the blocks
variable in struct amdgpu_vram_reservation, not the
link variable in struct drm_buddy_block.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
17 months agodrm/amdgpu: fix xclk freq on CHIP_STONEY
Chia-I Wu [Thu, 1 Jun 2023 21:48:08 +0000 (14:48 -0700)]
drm/amdgpu: fix xclk freq on CHIP_STONEY

According to Alex, most APUs from that time seem to have the same issue
(vbios says 48Mhz, actual is 100Mhz).  I only have a CHIP_STONEY so I
limit the fixup to CHIP_STONEY

Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
17 months agodrm/radeon: fix race condition UAF in radeon_gem_set_domain_ioctl
Min Li [Sat, 3 Jun 2023 07:43:45 +0000 (15:43 +0800)]
drm/radeon: fix race condition UAF in radeon_gem_set_domain_ioctl

Userspace can race to free the gobj(robj converted from), robj should not
be accessed again after drm_gem_object_put, otherwith it will result in
use-after-free.

Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Min Li <lm0963hack@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
17 months agoRevert "drm/amdgpu: switch to golden tsc registers for raven/raven2"
Alex Deucher [Fri, 2 Jun 2023 18:37:13 +0000 (14:37 -0400)]
Revert "drm/amdgpu: switch to golden tsc registers for raven/raven2"

This reverts commit f03eb1d26c2739b75580f58bbab4ab2d5d3eba46.

This results in inconsistent timing reported via asynchronous
GPU queries.

Link: https://lists.freedesktop.org/archives/amd-gfx/2023-May/093731.html
Cc: Jesse.Zhang@amd.com
Cc: michel@daenzer.net
Reviewed-by: Michel Dänzer <mdaenzer@redhat.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
17 months agoRevert "drm/amdgpu: Differentiate between Raven2 and Raven/Picasso according to revis...
Alex Deucher [Fri, 2 Jun 2023 18:34:12 +0000 (14:34 -0400)]
Revert "drm/amdgpu: Differentiate between Raven2 and Raven/Picasso according to revision id"

This reverts commit 9d2d1827af295fd6971786672c41c4dba3657154.

This results in inconsistent timing reported via asynchronous
GPU queries.

Link: https://lists.freedesktop.org/archives/amd-gfx/2023-May/093731.html
Cc: Jesse.Zhang@amd.com
Cc: michel@daenzer.net
Reviewed-by: Michel Dänzer <mdaenzer@redhat.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
17 months agoRevert "drm/amdgpu: change the reference clock for raven/raven2"
Alex Deucher [Fri, 2 Jun 2023 18:32:55 +0000 (14:32 -0400)]
Revert "drm/amdgpu: change the reference clock for raven/raven2"

This reverts commit fbc24293ca16b3b9ef891fe32ccd04735a6f8dc1.

This results in inconsistent timing reported via asynchronous
GPU queries.

Link: https://lists.freedesktop.org/archives/amd-gfx/2023-May/093731.html
Cc: Jesse.Zhang@amd.com
Cc: michel@daenzer.net
Reviewed-by: Michel Dänzer <mdaenzer@redhat.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
17 months agodrm/amdgpu: convert vcn/jpeg logical mask to physical mask
Stanley.Yang [Fri, 21 Apr 2023 12:58:39 +0000 (20:58 +0800)]
drm/amdgpu: convert vcn/jpeg logical mask to physical mask

 Changed from V1:
  Remove amdgpu_ras_logical_mask_to_physical_mask
due to GET_MASK provides same feature.
Support convert VCN/JPEG logical mask to physical
mask.

Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
17 months agodrm/amdgpu: support check vcn jpeg block mask
Stanley.Yang [Fri, 21 Apr 2023 13:14:36 +0000 (21:14 +0800)]
drm/amdgpu: support check vcn jpeg block mask

Support VCN/JPEG instance mask checking, pass logical
mask directly except GFX/SDMA/VCN/JPEG blocks.

Changed from V1:
correct a typo

Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
17 months agodrm/amdgpu: pass xcc mask to ras ta
Stanley.Yang [Wed, 29 Mar 2023 14:03:09 +0000 (22:03 +0800)]
drm/amdgpu: pass xcc mask to ras ta

pass xcc mask to ras ta, ras ta will compare
the mask with the one from chiplet topology.

Changed from V1:
Remove IP version checking.
Set ras_cmd->ras_init_message.init_flags.xcc_mask
directly due to xcc_mask is common structres to
all the devices.

Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
17 months agodrm/amd/pm: update smu-driver if header for smu 13.0.0 and smu 13.0.10
Kenneth Feng [Mon, 5 Jun 2023 03:15:34 +0000 (11:15 +0800)]
drm/amd/pm: update smu-driver if header for smu 13.0.0 and smu 13.0.10

update smu-driver if header for smu 13.0.0 and smu 13.0.10

Signed-off-by: Kenneth Feng <kenneth.feng@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
17 months agodrm/amdgpu/pm: notify driver unloading to PMFW for SMU v13.0.6 dGPU
Le Ma [Wed, 31 May 2023 08:08:50 +0000 (16:08 +0800)]
drm/amdgpu/pm: notify driver unloading to PMFW for SMU v13.0.6 dGPU

Per requested, follow the same sequence as APU to send only
PPSMC_MSG_PrepareForDriverUnload to PMFW during driver unloading.

Signed-off-by: Le Ma <le.ma@amd.com>
Reviewed-by: Shiwu Zhang <shiwu.zhang@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
17 months agodrm/amdgpu: Mark 'kgd_gfx_aldebaran_clear_address_watch' & 'kgd_gfx_v11_clear_address...
Srinivasan Shanmugam [Fri, 2 Jun 2023 15:44:21 +0000 (21:14 +0530)]
drm/amdgpu: Mark 'kgd_gfx_aldebaran_clear_address_watch' & 'kgd_gfx_v11_clear_address_watch' functions as static

Below two functions cause a warning because they lack a prototype:

drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c:164:10: warning: no previous prototype for ‘kgd_gfx_aldebaran_clear_address_watch’ [-Wmissing-prototypes]
drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v11.c:782:10: warning: no previous prototype for ‘kgd_gfx_v11_clear_address_watch’ [-Wmissing-prototypes]

There are no callers from other files, so just mark them as 'static'.

Also fixes the following checks:

CHECK: Alignment should match open parenthesis +static uint32_t
kgd_gfx_aldebaran_clear_address_watch(struct amdgpu_device *adev,
uint32_t watch_id)

CHECK: Alignment should match open parenthesis +static uint32_t
kgd_gfx_v11_clear_address_watch(struct amdgpu_device *adev, uint32_t
watch_id)

Cc: Felix Kuehling <Felix.Kuehling@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
17 months agodrm/amd/display: Program OTG vtotal min/max selectors unconditionally for DCN1+
Aurabindo Pillai [Thu, 1 Jun 2023 17:09:44 +0000 (13:09 -0400)]
drm/amd/display: Program OTG vtotal min/max selectors unconditionally for DCN1+

For FPO/FAMS, DMCUB will try to change the output timings by writing to
the OTG registers. However, the timings written directly to the OTG
registers will not be honoured unless VMIN/VMAX selector registers are
programmed with the right bits and trigger source is selected correctly.
Proper solution needs to go into DMCUB but will require additional state
tracking to ensure that the selectors are set and reset correctly as per
driver state. Until fix is merged into firmware, apply the workaround in
driver to unconditionally write OTG vmin/vmax selectors.

Reviewed-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
17 months agoRevert "drm/amd/display: Only use ODM2:1 policy for high pixel rate displays"
Aurabindo Pillai [Tue, 4 Apr 2023 16:53:54 +0000 (12:53 -0400)]
Revert "drm/amd/display: Only use ODM2:1 policy for high pixel rate displays"

This reverts commit 047783cdd5f604d87398236beb4971abb4d43293 since it
causes higher power consumption for single display use case (4k60).

Also, this patch introduced a 35% performance drop in a Vulkan benchmark.

* The patch disabled the ODM-combination on most popular monitors, including 4K, 2K and FHD monitors at 60Hz.

* ODM-combination can halve the DPP clock to save power, that is the reason why we introduce ODM-combination, and the PM log shows single pipe consumes more power at 4K@60Hz.

* ODM-combination has 2 de-tiled buffer involved, which provides longer self-sustained time, that benefit to the memory power optimization.

Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Reviewed-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
17 months agodrm/amd/display: Add gnu_printf format attribute for snprintf_count()
Srinivasan Shanmugam [Fri, 2 Jun 2023 15:38:49 +0000 (21:08 +0530)]
drm/amd/display: Add gnu_printf format attribute for snprintf_count()

Fix the following W=1 kernel build warning:

display/dc/dcn10/dcn10_hw_sequencer_debug.c: In function ‘snprintf_count’:
display/dc/dcn10/dcn10_hw_sequencer_debug.c:56:2: warning: function ‘snprintf_count’ might be a candidate for ‘gnu_printf’ format attribute [-Wsuggest-attribute=format]

Use the __printf() attribute to let the compiler warn if
invalid format strings are passed in.

And fix the following checks:

CHECK: Avoid CamelCase: <pBuf> +unsigned int __printf(3, 4)
snprintf_count(char *pBuf, unsigned int bufSize, char *fmt, ...)

CHECK: Avoid CamelCase: <bufSize> +unsigned int __printf(3, 4)
snprintf_count(char *pBuf, unsigned int bufSize, char *fmt, ...)

Cc: Hamza Mahfooz <hamza.mahfooz@amd.com>
Cc: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Cc: Harry Wentland <harry.wentland@amd.com>
Cc: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
17 months agodrm/amd/display: Address kdoc warnings in dcn30_fpu.c
Srinivasan Shanmugam [Fri, 2 Jun 2023 08:32:45 +0000 (14:02 +0530)]
drm/amd/display: Address kdoc warnings in dcn30_fpu.c

Fixes the following gcc with W=1:

display/dc/dml/dcn30/dcn30_fpu.c:677: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst
 * Finds dummy_latency_index when MCLK switching using firmware based
display/dc/dml/dcn30/dcn30_fpu.c:688: warning: Function parameter or member 'dc' not described in 'dcn30_find_dummy_latency_index_for_fw_based_mclk_switch'
display/dc/dml/dcn30/dcn30_fpu.c:688: warning: Function parameter or member 'context' not described in 'dcn30_find_dummy_latency_index_for_fw_based_mclk_switch'
display/dc/dml/dcn30/dcn30_fpu.c:688: warning: Function parameter or member 'pipes' not described in 'dcn30_find_dummy_latency_index_for_fw_based_mclk_switch'
display/dc/dml/dcn30/dcn30_fpu.c:688: warning: Function parameter or member 'pipe_cnt' not described in 'dcn30_find_dummy_latency_index_for_fw_based_mclk_switch'
display/dc/dml/dcn30/dcn30_fpu.c:688: warning: Function parameter or member 'vlevel' not described in 'dcn30_find_dummy_latency_index_for_fw_based_mclk_switch'

Cc: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Cc: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
17 months agodrm/amd/display: fix compilation error due to shifting negative value
GONG, Ruiqi [Fri, 2 Jun 2023 10:12:33 +0000 (18:12 +0800)]
drm/amd/display: fix compilation error due to shifting negative value

Currently compiling linux-next with allmodconfig triggers the following
error:

./drivers/gpu/drm/amd/amdgpu/../display/include/fixed31_32.h: In function ‘dc_fixpt_truncate’:
./drivers/gpu/drm/amd/amdgpu/../display/include/fixed31_32.h:528:22: error: left shift of negative value [-Werror=shift-negative-value]
  528 |  arg.value &= (~0LL) << (FIXED31_32_BITS_PER_FRACTIONAL_PART - frac_bits);
      |                      ^~

Use `unsigned long long` instead.

Signed-off-by: GONG, Ruiqi <gongruiqi@huaweicloud.com>
Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
17 months agodrm/amdgpu/discovery: Replace fake flex-arrays with flexible-array members
Gustavo A. R. Silva [Sun, 28 May 2023 20:26:37 +0000 (14:26 -0600)]
drm/amdgpu/discovery: Replace fake flex-arrays with flexible-array members

Zero-length and one-element arrays are deprecated, and we are moving
towards adopting C99 flexible-array members, instead.

Use the DECLARE_FLEX_ARRAY() helper macro to transform zero-length
arrays in a union into flexible-array members. And replace a one-element
array with a C99 flexible-array member.

Address the following warnings found with GCC-13 and
-fstrict-flex-arrays=3 enabled:
drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c:1009:89: warning: array subscript kk is outside array bounds of ‘uint32_t[0]’ {aka ‘unsigned int[]’} [-Warray-bounds=]
drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c:1007:94: warning: array subscript kk is outside array bounds of ‘uint64_t[0]’ {aka ‘long long unsigned int[]’} [-Warray-bounds=]
drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c:1310:94: warning: array subscript k is outside array bounds of ‘uint64_t[0]’ {aka ‘long long unsigned int[]’} [-Warray-bounds=]
drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c:1309:57: warning: array subscript k is outside array bounds of ‘uint32_t[0]’ {aka ‘unsigned int[]’} [-Warray-bounds=]

This helps with the ongoing efforts to tighten the FORTIFY_SOURCE
routines on memcpy() and help us make progress towards globally
enabling -fstrict-flex-arrays=3 [1].

This results in no differences in binary output.

Link: https://github.com/KSPP/linux/issues/21
Link: https://github.com/KSPP/linux/issues/193
Link: https://github.com/KSPP/linux/issues/300
Link: https://gcc.gnu.org/pipermail/gcc-patches/2022-October/602902.html
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
17 months agoamdgpu: validate offset_in_bo of drm_amdgpu_gem_va
Chia-I Wu [Thu, 1 Jun 2023 22:44:12 +0000 (15:44 -0700)]
amdgpu: validate offset_in_bo of drm_amdgpu_gem_va

This is motivated by OOB access in amdgpu_vm_update_range when
offset_in_bo+map_size overflows.

v2: keep the validations in amdgpu_vm_bo_map
v3: add the validations to amdgpu_vm_bo_map/amdgpu_vm_bo_replace_map
    rather than to amdgpu_gem_va_ioctl

Fixes: 9f7eb5367d00 ("drm/amdgpu: actually use the VM map parameters")
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
17 months agodrm/amdgpu: fix debug wait on idle for gfx9.4.1
Jonathan Kim [Fri, 2 Jun 2023 17:52:04 +0000 (13:52 -0400)]
drm/amdgpu: fix debug wait on idle for gfx9.4.1

Wait calls for amd_ip_block_type not amd_hw_ip_block_type.

Reported-by: Hamza Mahfooz <hamza.mahfooz@amd.com>
Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
17 months agodrm/amd/display: fix seamless odm transitions
Dmytro Laktyushkin [Tue, 18 Apr 2023 14:11:56 +0000 (10:11 -0400)]
drm/amd/display: fix seamless odm transitions

Add missing programming and function pointers

Cc: Mario Limonciello <mario.limonciello@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Acked-by: Stylon Wang <stylon.wang@amd.com>
Signed-off-by: Dmytro Laktyushkin <dmytro.laktyushkin@amd.com>
Reviewed-by: Charlene Liu <Charlene.Liu@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
17 months agodrm/amd/display: add ODM case when looking for first split pipe
Samson Tam [Tue, 9 May 2023 20:40:19 +0000 (16:40 -0400)]
drm/amd/display: add ODM case when looking for first split pipe

[Why]
When going from ODM 2:1 single display case to max displays, second
odm pipe needs to be repurposed for one of the new single displays.
However, acquire_first_split_pipe() only handles MPC case and not
ODM case

[How]
Add ODM conditions in acquire_first_split_pipe()
Add commit_minimal_transition_state() in commit_streams() to handle
odm 2:1 exit first, and then process new streams
Handle ODM condition in commit_minimal_transition_state()

Cc: Mario Limonciello <mario.limonciello@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Acked-by: Stylon Wang <stylon.wang@amd.com>
Signed-off-by: Samson Tam <samson.tam@amd.com>
Reviewed-by: Alvin Lee <Alvin.Lee2@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
17 months agodrm/amd/display: clean up some inconsistent indenting
Jiapeng Chong [Fri, 2 Jun 2023 06:17:53 +0000 (14:17 +0800)]
drm/amd/display: clean up some inconsistent indenting

No functional modification involved.

drivers/gpu/drm/amd/amdgpu/../display/dc/link/link_dpms.c:2377 link_set_dpms_on() warn: inconsistent indenting.

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=5376
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
17 months agodrm/amd/display: Fix dc/dcn20/dcn20_optc.c kdoc
Srinivasan Shanmugam [Fri, 2 Jun 2023 09:49:22 +0000 (15:19 +0530)]
drm/amd/display: Fix dc/dcn20/dcn20_optc.c kdoc

Fix all kdoc warnings in dc/dcn20/dcn20_optc.c:

display/dc/dcn20/dcn20_optc.c:41: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst
 * Enable CRTC
display/dc/dcn20/dcn20_optc.c:76: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst
 *For the below, I'm not sure how your GSL parameters are stored in your
  env,
display/dc/dcn20/dcn20_optc.c:85: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst
 * There are (MAX_OPTC+1)/2 gsl groups available for use.

Cc: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Cc: Aurabindo Pillai <aurabindo.pillai@amd.com>
Cc: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
17 months agodrm/amd/pm: fulfill the OD support for SMU13.0.7
Evan Quan [Tue, 9 May 2023 02:21:52 +0000 (10:21 +0800)]
drm/amd/pm: fulfill the OD support for SMU13.0.7

Fulfill the interfaces for OD settings retrieving and setting.

Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
17 months agodrm/amd/pm: Fill metrics data for SMUv13.0.6
Lijo Lazar [Thu, 1 Jun 2023 12:23:45 +0000 (17:53 +0530)]
drm/amd/pm: Fill metrics data for SMUv13.0.6

Populate metrics data table for SMU v13.0.6. Add PCIe link speed/width
information also.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Le Ma <le.ma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
17 months agodrm/amd/pm: fulfill the OD support for SMU13.0.0
Evan Quan [Mon, 8 May 2023 08:57:02 +0000 (16:57 +0800)]
drm/amd/pm: fulfill the OD support for SMU13.0.0

Fulfill the interfaces for OD settings retrieving and setting.

Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
17 months agodrm/amd/pm: fulfill SMU13 OD settings init and restore
Evan Quan [Tue, 11 Apr 2023 03:49:09 +0000 (11:49 +0800)]
drm/amd/pm: fulfill SMU13 OD settings init and restore

Gfxclk fmin/fmax, Uclk fmin/fmax and Gfx v/f curve voltage offset
OD settings are supported for SMU13.

Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
17 months agodrm/amdkfd: bump kfd ioctl minor version for debug api availability
Jonathan Kim [Tue, 10 May 2022 16:51:26 +0000 (12:51 -0400)]
drm/amdkfd: bump kfd ioctl  minor version for debug api availability

Bump the minor version to declare debugging capability is now
available.

Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
17 months agodrm/amdkfd: add debug device snapshot operation
Jonathan Kim [Tue, 10 May 2022 16:47:45 +0000 (12:47 -0400)]
drm/amdkfd: add debug device snapshot operation

Similar to queue snapshot, return an array of device information using
an entry_size check and return.
Unlike queue snapshots, the debugger needs to pass to correct number of
devices that exist.  If it fails to do so, the KFD will return the
number of actual devices so that the debugger can make a subsequent
successful call.

Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
17 months agodrm/amdkfd: add debug queue snapshot operation
Jonathan Kim [Tue, 10 May 2022 15:15:29 +0000 (11:15 -0400)]
drm/amdkfd: add debug queue snapshot operation

Allow the debugger to get a snapshot of a specified number of queues
containing various queue property information that is copied to the
debugger.

Since the debugger doesn't know how many queues exist at any given time,
allow the debugger to pass the requested number of snapshots as 0 to get
the actual number of potential snapshots to use for a subsequent snapshot
request for actual information.

To prevent future ABI breakage, pass in the requested entry_size.
The KFD will return it's own entry_size in case the debugger still wants
log the information in a core dump on sizing failure.

Also allow the debugger to clear exceptions when doing a snapshot.

Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
17 months agodrm/amdkfd: add debug query exception info operation
Jonathan Kim [Mon, 9 May 2022 17:37:36 +0000 (13:37 -0400)]
drm/amdkfd: add debug query exception info operation

Allow the debugger to query additional info based on an exception code.
For device exceptions, it's currently only memory violation information.
For process exceptions, it's currently only runtime information.
Queue exception only report the queue exception status.

The debugger has the option of clearing the target exception on query.

Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
17 months agodrm/amdkfd: add debug query event operation
Jonathan Kim [Mon, 9 May 2022 15:10:32 +0000 (11:10 -0400)]
drm/amdkfd: add debug query event operation

Allow the debugger to query a single queue, device and process
exception.
The KFD should also return the GPU or Queue id of the exception.
The debugger also has the option of clearing exceptions after
being queried.

Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
17 months agodrm/amdkfd: add debug set flags operation
Jonathan Kim [Mon, 9 May 2022 14:51:56 +0000 (10:51 -0400)]
drm/amdkfd: add debug set flags operation

Allow the debugger to set single memory and single ALU operations.

Some exceptions are imprecise (memory violations, address watch) in the
sense that a trap occurs only when the exception interrupt occurs and
not at the non-halting faulty instruction.  Trap temporaries 0 & 1 save
the program counter address, which means that these values will not point
to the faulty instruction address but to whenever the interrupt was
raised.

Setting the Single Memory Operations flag will inject an automatic wait
on every memory operation instruction forcing imprecise memory exceptions
to become precise at the cost of performance.  This setting is not
permitted on debug devices that support only a global setting of this
option.

Return the previous set flags to the debugger as well.

Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
17 months agodrm/amdkfd: add debug set and clear address watch points operation
Jonathan Kim [Fri, 6 May 2022 18:58:55 +0000 (14:58 -0400)]
drm/amdkfd: add debug set and clear address watch points operation

Shader read, write and atomic memory operations can be alerted to the
debugger as an address watch exception.

Allow the debugger to pass in a watch point to a particular memory
address per device.

Note that there exists only 4 watch points per devices to date, so have
the KFD keep track of what watch points are allocated or not.

Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
17 months agodrm/amdkfd: add debug suspend and resume process queues operation
Jonathan Kim [Thu, 5 May 2022 20:15:37 +0000 (16:15 -0400)]
drm/amdkfd: add debug suspend and resume process queues operation

In order to inspect waves from the saved context at any point during a
debug session, the debugger must be able to preempt queues to trigger
context save by suspending them.

On queue suspend, the KFD will copy the context save header information
so that the debugger can correctly crawl the appropriate size of the saved
context. The debugger must then also be allowed to resume suspended queues.

A queue that is newly created cannot be suspended because queue ids are
recycled after destruction so the debugger needs to know that this has
occurred.  Query functions will be later added that will clear a given
queue of its new queue status.

A queue cannot be destroyed while it is suspended to preserve its saved
context during debugger inspection.  Have queue destruction block while
a queue is suspended and unblocked when it is resumed.  Likewise, if a
queue is about to be destroyed, it cannot be suspended.

Return the number of queues successfully suspended or resumed along with
a per queue status array where the upper bits per queue status show that
the request was invalid (new/destroyed queue suspend request, missing
queue) or an error occurred (HWS in a fatal state so it can't suspend or
resume queues).

Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
17 months agodrm/amdkfd: add debug wave launch mode operation
Jonathan Kim [Mon, 2 May 2022 15:45:05 +0000 (11:45 -0400)]
drm/amdkfd: add debug wave launch mode operation

Allow the debugger to set wave behaviour on to either normally operate,
halt at launch, trap on every instruction, terminate immediately or
stall on allocation.

Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
17 months agodrm/amdkfd: add debug wave launch override operation
Jonathan Kim [Wed, 27 Apr 2022 17:18:10 +0000 (13:18 -0400)]
drm/amdkfd: add debug wave launch override operation

This operation allows the debugger to override the enabled HW
exceptions on the device.

On debug devices that only support the debugging of a single process,
the HW exceptions are global and set through the SPI_GDBG_TRAP_MASK
register.
Because they are global, only address watch exceptions are allowed to
be enabled.  In other words, the debugger must preserve all non-address
watch exception states in normal mode operation by barring a full
replacement override or a non-address watch override request.

For multi-process debugging, all HW exception overrides are per-VMID so
all exceptions can be overridden or fully replaced.

In order for the debugger to know what is permissible, returned the
supported override mask back to the debugger along with the previously
enable overrides.

Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
17 months agodrm/amdkfd: add debug set exceptions enabled operation
Jonathan Kim [Wed, 27 Apr 2022 14:24:37 +0000 (10:24 -0400)]
drm/amdkfd: add debug set exceptions enabled operation

The debugger subscibes to nofication for requested exceptions on attach.
Allow the debugger to change its subsciption later on.

Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
17 months agodrm/amdkfd: update process interrupt handling for debug events
Jonathan Kim [Fri, 22 Apr 2022 16:26:18 +0000 (12:26 -0400)]
drm/amdkfd: update process interrupt handling for debug events

The debugger must be notified by any debugger subscribed exception
that comes from hardware interrupts.

If a debugger session exits, any exceptions it subscribed to may still
have interrupts in the interrupt ring buffer or KGD/KFD pipeline.
To prevent a new session from inheriting stale interrupts, when a new
queue is created, open an interrupt drain and allow the IH ring to drain
from a timestamped checkpoint.  Then inject a custom IV so that once
the custom IV is picked up by the KFD, it's safe to close the drain
and proceed with queue creation.

The drain must also be on debug disable as SW interrupts may still
be processed.  Drain at this time and clear all the exception status.

The debugger may also not be attached nor subscibed to certain
exceptions so forward them directly to the runtime.

GFX10 also requires its own IV processing, hence the creation of
kfd_int_process_v10.c.  This is because the IV from SQ interrupts are
packed into a new continguous format unlike GFX9. To make this clear,
a separate interrupting handling code file was created.

Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
17 months agodrm/amd/pm: update SMU13 header files for coming OD support
Evan Quan [Tue, 11 Apr 2023 03:25:52 +0000 (11:25 +0800)]
drm/amd/pm: update SMU13 header files for coming OD support

Correct the data structures for OD feature support.

Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
17 months agodrm/amdkfd: add debug trap enabled flag to tma
Jay Cornwall [Tue, 2 Mar 2021 00:34:39 +0000 (18:34 -0600)]
drm/amdkfd: add debug trap enabled flag to tma

Trap handler behavior will differ when a debugger is attached.

Make the debug trap flag available in the trap handler TMA.
Update it when the debug trap ioctl is invoked.

Signed-off-by: Jay Cornwall <jay.cornwall@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
17 months agodrm/amdkfd: add runtime enable operation
Jonathan Kim [Fri, 8 Apr 2022 17:12:24 +0000 (13:12 -0400)]
drm/amdkfd: add runtime enable operation

The debugger can attach to a process prior to HSA enablement (i.e.
inferior is spawned by the debugger and attached to immediately before
target process has been enabled for HSA dispatches) or it
can attach to a running target that is already HSA enabled.  Either
way, the debugger needs to know the enablement status to know when
it can inspect queues.

For the scenario where the debugger spawns the target process,
it will have to wait for ROCr's runtime enable request from the target.
The runtime enable request will be able to see that its process has been
debug attached.  ROCr raises an EC_PROCESS_RUNTIME signal to the
debugger then blocks the target process while waiting the debugger's
response. Once the debugger has received the runtime signal, it will
unblock the target process.

For the scenario where the debugger attaches to a running target
process, ROCr will set the target process' runtime status as enabled so
that on an attach request, the debugger will be able to see this
status and will continue with debug enablement as normal.

A secondary requirement is to conditionally enable the trap tempories only
if the user requests it (env var HSA_ENABLE_DEBUG=1) or if the debugger
attaches with HSA runtime enabled.  This is because setting up the trap
temporaries incurs a performance overhead that is unacceptable for
microbench performance in normal mode for certain customers.

In the scenario where the debugger spawns the target process, when ROCr
detects that the debugger has attached during the runtime enable
request, it will enable the trap temporaries before it blocks the target
process while waiting for the debugger to respond.

In the scenario where the debugger attaches to a running target process,
it will enable to trap temporaries itself.

Finally, there is an additional restriction that is required to be
enforced with runtime enable and HW debug mode setting. The debugger must
first ensure that HW debug mode has been enabled before permitting HW debug
mode operations.

With single process debug devices, allowing the debugger to set debug
HW modes prior to trap activation means that debug HW mode setting can
occur before the KFD has reserved the debug VMID (0xf) from the hardware
scheduler's VMID allocation resource pool.  This can result in the
hardware scheduler assigning VMID 0xf to a non-debugged process and
having that process inherit debug HW mode settings intended for the
debugged target process instead, which is both incorrect and potentially
fatal for normal mode operation.

With multi process debug devices, allowing the debugger to set debug
HW modes prior to trap activation means that non-debugged processes
migrating to a new VMID could inherit unintended debug settings.

All debug operations that touch HW settings must require trap activation
where trap activation is triggered by both debug attach and runtime
enablement (target has KFD opened and is ready to dispatch work).

Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
17 months agodrm/amdkfd: add send exception operation
Jonathan Kim [Fri, 8 Apr 2022 16:49:48 +0000 (12:49 -0400)]
drm/amdkfd: add send exception operation

Add a debug operation that allows the debugger to send an exception
directly to runtime through a payload address.

For memory violations, normal vmfault signals will be applied to
notify runtime instead after passing in the saved exception data
when a memory violation was raised to the debugger.

For runtime exceptions, this will unblock the runtime enable
function which will be explained and implemented in a follow up
patch.

Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
17 months agodrm/amdkfd: add raise exception event function
Jonathan Kim [Wed, 6 Apr 2022 16:03:31 +0000 (12:03 -0400)]
drm/amdkfd: add raise exception event function

Exception events can be generated from interrupts or queue activitity.

The raise event function will save exception status of a queue, device
or process then notify the debugger of the status change by writing to
a debugger polled file descriptor that the debugger provides during
debug attach.

For memory violation exceptions, extra exception data will be saved.

The debugger will be able to query the saved exception states by query
operation that will be provided by follow up patches.

Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
17 months agodrm/amdkfd: apply trap workaround for gfx11
Jonathan Kim [Thu, 1 Sep 2022 15:27:15 +0000 (11:27 -0400)]
drm/amdkfd: apply trap workaround for gfx11

Due to a HW bug, waves in only half the shader arrays can enter trap.

When starting a debug session, relocate all waves to the first shader
array of each shader engine and mask off the 2nd shader array as
unavailable.

When ending a debug session, re-enable the 2nd shader array per
shader engine.

User CU masking per queue cannot be guaranteed to remain functional
if requested during debugging (e.g. user cu mask requests only 2nd shader
array as an available resource leading to zero HW resources available)
nor can runtime be alerted of any of these changes during execution.

Make user CU masking and debugging mutual exclusive with respect to
availability.

If the debugger tries to attach to a process with a user cu masked
queue, return the runtime status as enabled but busy.

If the debugger tries to attach and fails to reallocate queue waves to
the first shader array of each shader engine, return the runtime status
as enabled but with an error.

In addition, like any other mutli-process debug supported devices,
disable trap temporary setup per-process to avoid performance impact from
setup overhead.

Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
17 months agodrm/amdkfd: add per process hw trap enable and disable functions
Jonathan Kim [Tue, 5 Apr 2022 16:34:55 +0000 (12:34 -0400)]
drm/amdkfd: add per process hw trap enable and disable functions

To enable HW debug mode per process, all devices must be debug enabled
successfully.  If a failure occures, rewind the enablement of debug mode
on the enabled devices.

A power management scenario that needs to be considered is HW
debug mode setting during GFXOFF.  During GFXOFF, these registers
will be unreachable so we have to transiently disable GFXOFF when
setting.  Also, some devices don't support the RLC save restore
function for these debug registers so we have to disable GFXOFF
completely during a debug session.

Cooperative launch also has debugging restriction based on HW/FW bugs.
If such bugs exists, the debugger cannot attach to a process that uses GWS
resources nor can GWS resources be requested if a process is being
debugged.

Multi-process debug devices can only enable trap temporaries based
on certain runtime scenerios, which will be explained when the
runtime enable functions are implemented in a follow up patch.

Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
17 months agodrm/amdgpu: expose debug api for mes
Jonathan Kim [Sat, 27 Aug 2022 02:04:15 +0000 (22:04 -0400)]
drm/amdgpu: expose debug api for mes

Similar to the F32 HWS, the RS64 HWS for GFX11 now supports a multi-process
debug API.

The skip_process_ctx_clear ADD_QUEUE requirement is to prevent the MES
from clearing the process context when the first queue is added to the
scheduler in order to maintain debug mode settings during queue preemption
and restore.  The MES clears the process context in this case due to an
unresolved FW caching bug during normal mode operations.
During debug mode, the KFD will hold a reference to the target process
so the process context should never go stale and MES can afford to skip
this requirement.

Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
17 months agodrm/amdgpu: prepare map process for multi-process debug devices
Jonathan Kim [Mon, 4 Apr 2022 17:38:11 +0000 (13:38 -0400)]
drm/amdgpu: prepare map process for multi-process debug devices

Unlike single process debug devices, multi-process debug devices allow
debug mode setting per-VMID (non-device-global).

Because the HWS manages PASID-VMID mapping, the new MAP_PROCESS API allows
the KFD to forward the required SPI debug register write requests.

To request a new debug mode setting change, the KFD must be able to
preempt all queues then remap all queues with these new setting
requests for MAP_PROCESS to take effect.

Note that by default, trap enablement in non-debug mode must be disabled
for performance reasons for multi-process debug devices due to setup
overhead in FW.

Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
17 months agodrm/amdkfd: prepare map process for single process debug devices
Jonathan Kim [Mon, 4 Apr 2022 16:27:43 +0000 (12:27 -0400)]
drm/amdkfd: prepare map process for single process debug devices

Older HW only supports debugging on a single process because the
SPI debug mode setting registers are device global.

The HWS has supplied a single pinned VMID (0xf) for MAP_PROCESS
for debug purposes. To pin the VMID, the KFD will remove the VMID from
the HWS dynamic VMID allocation via SET_RESOUCES so that a debugged
process will never migrate away from its pinned VMID.

The KFD is responsible for reserving and releasing this pinned VMID
accordingly whenever the debugger attaches and detaches respectively.

Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
17 months agodrm/amdgpu: add configurable grace period for unmap queues
Jonathan Kim [Thu, 23 Mar 2023 21:17:20 +0000 (17:17 -0400)]
drm/amdgpu: add configurable grace period for unmap queues

The HWS schedule allows a grace period for wave completion prior to
preemption for better performance by avoiding CWSR on waves that can
potentially complete quickly. The debugger, on the other hand, will
want to inspect wave status immediately after it actively triggers
preemption (a suspend function to be provided).

To minimize latency between preemption and debugger wave inspection, allow
immediate preemption by setting the grace period to 0.

Note that setting the preepmtion grace period to 0 will result in an
infinite grace period being set due to a CP FW bug so set it to 1 for now.

Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
17 months agodrm/amdgpu: add gfx11 hw debug mode enable and disable calls
Jonathan Kim [Sat, 27 Aug 2022 02:35:50 +0000 (22:35 -0400)]
drm/amdgpu: add gfx11 hw debug mode enable and disable calls

Implement the per-device calls to enable or disable HW debug mode
for GFX11.

Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
17 months agodrm/amdgpu: add gfx9.4.2 hw debug mode enable and disable calls
Jonathan Kim [Fri, 1 Apr 2022 17:31:57 +0000 (13:31 -0400)]
drm/amdgpu: add gfx9.4.2 hw debug mode enable and disable calls

GFX9.4.2 now supports per-VMID debug mode controls registers
(SPI_GDBG_PER_VMID_CNTL).

Because the KFD lets the HWS handle PASID-VMID mapping, the KFD will
forward all debug mode setting register writes to the HWS scheduler
using a new MAP_PROCESS API, so instead of writing to registers, return
the required register values that the HWS needs to write on debug enable
and disable.

Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
17 months agodrm/amdgpu: add gfx10 hw debug mode enable and disable calls
Jonathan Kim [Thu, 31 Mar 2022 17:14:01 +0000 (13:14 -0400)]
drm/amdgpu: add gfx10 hw debug mode enable and disable calls

Similar to GFX9 debug devices, set the hardware debug mode by draining
the SPI appropriately prior the mode setting request.

Because GFX10 has waves allocated by the work group boundary and each
SE's SPI instances do not communicate, the SPI drain time is much longer.
This long drain time will be fixed for GFX11 onwards.

Also remove a bunch of deprecated misplaced references for GFX10.3.

Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
17 months agodrm/amdkfd: fix kfd_suspend_all_processes
Jonathan Kim [Fri, 24 Mar 2023 20:19:27 +0000 (16:19 -0400)]
drm/amdkfd: fix kfd_suspend_all_processes

Flush delayed restore work in kfd_suspend_all_queues instead of
cancelling. Cancelling the work before it runs results in the queues
becoming permanently disabled. Flushing the work ensures that the
queue suspend/resume state stays balanced.

Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
17 months agodrm/amdgpu: add gfx9.4.1 hw debug mode enable and disable calls
Jonathan Kim [Wed, 30 Mar 2022 19:31:00 +0000 (15:31 -0400)]
drm/amdgpu: add gfx9.4.1 hw debug mode enable and disable calls

On GFX9.4.1, the implicit wait count instruction on s_barrier is
disabled by default in the driver during normal operation for
performance requirements.

There is a hardware bug in GFX9.4.1 where if the implicit wait count
instruction after an s_barrier instruction is disabled, any wave that
hits an exception may step over the s_barrier when returning from the
trap handler with the barrier logic having no ability to be
aware of this, thereby causing other waves to wait at the barrier
indefinitely resulting in a shader hang.  This bug has been corrected
for GFX9.4.2 and onward.

Since the debugger subscribes to hardware exceptions, in order to avoid
this bug, the debugger must enable implicit wait count on s_barrier
for a debug session and disable it on detach.

In order to change this setting in the in the device global SQ_CONFIG
register, the GFX pipeline must be idle.  GFX9.4.1 as a compute device
will either dispatch work through the compute ring buffers used for
image post processing or through the hardware scheduler by the KFD.

Have the KGD suspend and drain the compute ring buffer, then suspend the
hardware scheduler and block any future KFD process job requests before
changing the implicit wait count setting.  Once set, resume all work.

Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
17 months agodrm/amdgpu: add gfx9 hw debug mode enable and disable calls
Jonathan Kim [Wed, 30 Mar 2022 19:09:11 +0000 (15:09 -0400)]
drm/amdgpu: add gfx9 hw debug mode enable and disable calls

Implement the per-device calls to enable or disable HW debug mode for
GFX9 prior to GFX9.4.1.

GFX9.4.1 and onward will require their own enable/disable sequence as
follow on patches.

When hardware debug mode setting is requested, waves will inherit
these settings in the Shader Processor Input's (SPI) Sequencer Global
Block (SQG). This means that the KGD must drain all waves from the SPI
into SQG (approximately 96 SPI clock cycles) prior to debug mode setting
to ensure that the order of operations that the debugger expects with
regards to debug mode setting transaction requests and wave inheritence
of that mode is upheld.

Also ensure that exception overrides are reset to their original state
prior to debug enable or disable.

Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
17 months agodrm/amdkfd: clean up one inconsistent indenting
Yang Li [Wed, 31 May 2023 02:08:11 +0000 (10:08 +0800)]
drm/amdkfd: clean up one inconsistent indenting

drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_device.c:1036 kgd2kfd_interrupt() warn: inconsistent indenting

Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
17 months agodrm/amd/display: Drop unused DCN_BASE variable in dcn314_resource.c
Srinivasan Shanmugam [Wed, 31 May 2023 03:33:27 +0000 (09:03 +0530)]
drm/amd/display: Drop unused DCN_BASE variable in dcn314_resource.c

Fixes the following W=1 kernel build warning:

drivers/gpu/drm/amd/amdgpu/../display/dc/dcn314/dcn314_resource.c:128:29: warning: ‘DCN_BASE’ defined but not used [-Wunused-const-variable=]
  128 | static const struct IP_BASE DCN_BASE = { { { { 0x00000012, 0x000000C0, 0x000034C0, 0x00009000, 0x02403C00, 0, 0, 0 } },
      |                             ^~~~~~~~

Suggested-by: Roman Li <Roman.Li@amd.com>
Cc: Hamza Mahfooz <hamza.mahfooz@amd.com>
Cc: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Cc: Harry Wentland <harry.wentland@amd.com>
Cc: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Roman Li <Roman.Li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
17 months agodrm/amd: Make lack of `ACPI_FADT_LOW_POWER_S0` or `CONFIG_AMD_PMC` louder during...
Mario Limonciello [Tue, 30 May 2023 17:44:30 +0000 (12:44 -0500)]
drm/amd: Make lack of `ACPI_FADT_LOW_POWER_S0` or `CONFIG_AMD_PMC` louder during suspend path

Users have reported that s2idle wasn't working on OEM Phoenix systems,
but it was root caused to be because `CONFIG_AMD_PMC` wasn't set in
the distribution kernel config.

To make this more apparent, raise the messaging to err instead of warn.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=217497
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
17 months agodrm/amdgpu: setup hw debug registers on driver initialization
Jonathan Kim [Thu, 31 Mar 2022 16:05:00 +0000 (12:05 -0400)]
drm/amdgpu: setup hw debug registers on driver initialization

Add missing debug trap registers references and initialize all debug
registers on boot by clearing the hardware exception overrides and the
wave allocation ID index.

The debugger requires that TTMPs 6 & 7 save the dispatch ID to map
waves onto dispatch during compute context inspection.
In order to correctly set this up, set the special reserved CP bit by
default whenever the MQD is initailized.

Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
17 months agodrm/amdgpu: add kgd hw debug mode setting interface
Jonathan Kim [Wed, 30 Mar 2022 18:54:16 +0000 (14:54 -0400)]
drm/amdgpu: add kgd hw debug mode setting interface

Introduce the require KGD debug calls that will execute hardware debug
mode setting.

Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
17 months agodrm/amdkfd: prepare per-process debug enable and disable
Jonathan Kim [Fri, 25 Mar 2022 18:55:30 +0000 (14:55 -0400)]
drm/amdkfd: prepare per-process debug enable and disable

The ROCm debugger will attach to a process to debug by PTRACE and will
expect the KFD to prepare a process for the target PID, whether the
target PID has opened the KFD device or not.

This patch is to explicity handle this requirement.  Further HW mode
setting and runtime coordination requirements will be handled in
following patches.

In the case where the target process has not opened the KFD device,
a new KFD process must be created for the target PID.
The debugger as well as the target process for this case will have not
acquired any VMs so handle process restoration to correctly account for
this.

To coordinate with HSA runtime, the debugger must be aware of the target
process' runtime enablement status and will copy the runtime status
information into the debugged KFD process for later query.

On enablement, the debugger will subscribe to a set of exceptions where
each exception events will notify the debugger through a pollable FIFO
file descriptor that the debugger provides to the KFD to manage.

Finally on process termination of either the debugger or the target,
debugging must be disabled if it has not been done so.

Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
17 months agodrm/amdkfd: display debug capabilities
Jonathan Kim [Fri, 25 Mar 2022 16:39:06 +0000 (12:39 -0400)]
drm/amdkfd: display debug capabilities

Expose debug capabilities in the KFD topology node's HSA capabilities and
debug properties flags.

Ensure correct capabilities are exposed based on firmware support.

Flag definitions can be referenced in uapi/linux/kfd_sysfs.h.

Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
17 months agodrm/amdkfd: add debug and runtime enable interface
Jonathan Kim [Wed, 2 Mar 2022 19:30:12 +0000 (14:30 -0500)]
drm/amdkfd: add debug and runtime enable interface

Introduce the GPU debug operations interface.

For ROCm-GDB to extend the GNU Debugger's ability to inspect the AMD GPU
instruction set, provide the necessary interface to allow the debugger
to HW debug-mode set and query exceptions per HSA queue, process or
device.

The runtime_enable interface coordinates exception handling with the
HSA runtime.

Usage is available in the kern docs at uapi/linux/kfd_ioctl.h.

Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
17 months agoamd/amdkfd: drop unused KFD_IOCTL_SVM_FLAG_UNCACHED flag
Alex Deucher [Fri, 2 Jun 2023 16:58:05 +0000 (12:58 -0400)]
amd/amdkfd: drop unused KFD_IOCTL_SVM_FLAG_UNCACHED flag

Was leftover from GC 9.4.3 bring up and is currently
unused.  Drop it for now.

Cc: Philip.Yang@amd.com
Cc: rajneesh.bhardwaj@amd.com
Cc: Felix.Kuehling@amd.com
Reviewed-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
17 months agodrm/amd/pm: conditionally disable pcie lane switching for some sienna_cichlid SKUs
Evan Quan [Thu, 6 Apr 2023 04:08:21 +0000 (12:08 +0800)]
drm/amd/pm: conditionally disable pcie lane switching for some sienna_cichlid SKUs

Disable the pcie lane switching for some sienna_cichlid SKUs since it
might not work well on some platforms.

Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
17 months agodrm/amd/pm: Fix power context allocation in SMU13
Lijo Lazar [Fri, 31 Mar 2023 11:00:01 +0000 (16:30 +0530)]
drm/amd/pm: Fix power context allocation in SMU13

Use the right data structure for allocation.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
17 months agodrm/amd/pm: add unique serial number support for smu_v13_0_6
Yang Wang [Wed, 24 May 2023 05:54:26 +0000 (13:54 +0800)]
drm/amd/pm: add unique serial number support for smu_v13_0_6

add unique serial number support for smu_v13_0_6.
(use aid0 serial number by default)

Signed-off-by: Yang Wang <KevinYang.Wang@amd.com>
Signed-off-by: Asad Kamal <asad.kamal@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
17 months agodrm/amd/pm: Fix SMUv13.0.6 throttle status report
Lijo Lazar [Fri, 31 Mar 2023 11:04:15 +0000 (16:34 +0530)]
drm/amd/pm: Fix SMUv13.0.6 throttle status report

Add throttle status in power context
Keep throttle status indicator in SMUv13 power context

v2: Removed Dummy definition

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Asad Kamal <asad.kamal@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
17 months agodrm/amd/pm: Update SMUv13.0.6 PMFW headers
Lijo Lazar [Mon, 3 Apr 2023 06:08:17 +0000 (11:38 +0530)]
drm/amd/pm: Update SMUv13.0.6 PMFW headers

Update PMFW interface headers to for new metrics table format and
throttling information.

v2: Added dummy definition for compilation error

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Asad Kamal <asad.kamal@amd.com>
Reviewed-by: Le Ma <le.ma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
17 months agodrm/amdgpu: fix Null pointer dereference error in amdgpu_device_recover_vram
Horatio Zhang [Mon, 29 May 2023 18:23:37 +0000 (14:23 -0400)]
drm/amdgpu: fix Null pointer dereference error in amdgpu_device_recover_vram

Use the function of amdgpu_bo_vm_destroy to handle the resource release
of shadow bo. During the amdgpu_mes_self_test, shadow bo released, but
vmbo->shadow_list was not, which caused a null pointer reference error
in amdgpu_device_recover_vram when GPU reset.

Fixes: 6c032c37ac3e ("drm/amdgpu: Fix vram recover doesn't work after whole GPU reset (v2)")
Signed-off-by: xinhui pan <xinhui.pan@amd.com>
Signed-off-by: Horatio Zhang <Hongkun.Zhang@amd.com>
Acked-by: Feifei Xu <Feifei.Xu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
17 months agodrm/amdgpu: Add function parameter 'event' to kdoc in svm_range_evict()
Srinivasan Shanmugam [Wed, 31 May 2023 18:03:35 +0000 (23:33 +0530)]
drm/amdgpu: Add function parameter 'event' to kdoc in svm_range_evict()

Fixes the following gcc with W=1:

drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_svm.c:1841: warning: Function parameter or member 'event' not described in 'svm_range_evict'

Cc: Felix Kuehling <Felix.Kuehling@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
17 months agodrm/amdgpu: Fix up kdoc in amdgpu_device.c
Srinivasan Shanmugam [Thu, 25 May 2023 17:26:17 +0000 (22:56 +0530)]
drm/amdgpu: Fix up kdoc in amdgpu_device.c

Fix these warnings by deleting the deviant arguments.

gcc with W=1
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:799: warning: Excess function parameter 'pcie_index' description in 'amdgpu_device_indirect_wreg'
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:799: warning: Excess function parameter 'pcie_data' description in 'amdgpu_device_indirect_wreg'
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:870: warning: Excess function parameter 'pcie_index' description in 'amdgpu_device_indirect_wreg64'
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:870: warning: Excess function parameter 'pcie_data' description in 'amdgpu_device_indirect_wreg64'

Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
17 months agodrm/amdgpu: Fix up kdoc 'ring' parameter in sdma_v6_0_ring_pad_ib
Srinivasan Shanmugam [Tue, 30 May 2023 18:42:22 +0000 (00:12 +0530)]
drm/amdgpu: Fix up kdoc 'ring' parameter in sdma_v6_0_ring_pad_ib

Fix this warning by adding 'ring' arguments to kdoc.

gcc with W=1
drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c:1128: warning: Function parameter or member 'ring' not described in 'sdma_v6_0_ring_pad_ib'

Cc: Felix Kuehling <Felix.Kuehling@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
17 months agodrm/amd/display: Fix up kdoc formatting in display_mode_vba.c
Srinivasan Shanmugam [Tue, 30 May 2023 19:04:10 +0000 (00:34 +0530)]
drm/amd/display: Fix up kdoc formatting in display_mode_vba.c

Fixes the following W=1 kernel build warning:

drivers/gpu/drm/amd/amdgpu/../display/dc/dml/display_mode_vba.c:936: warning: Cannot understand  * *************************************************************************

Cc: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
17 months agodrm/amd/amdgpu: introduce DRM_AMDGPU_WERROR
Hamza Mahfooz [Wed, 24 May 2023 18:59:32 +0000 (14:59 -0400)]
drm/amd/amdgpu: introduce DRM_AMDGPU_WERROR

We want to do -Werror builds on our CI. However, non-amdgpu breakages
have prevented us from doing so thus far. Also, there are a number of
additional checks that we should enable, that the community cares about
and are hidden behind -Wextra. So, define DRM_AMDGPU_WERROR to only
enable -Werror for the amdgpu kernel module and enable -Wextra while
disabling all of the checks that are too noisy.

Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Kenny Ho <kenny.ho@amd.com>
Suggested-by: Jani Nikula <jani.nikula@linux.intel.com>
Reviewed-by: Kenny Ho <Kenny.Ho@amd.com>
Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
17 months agodrm/amdkfd: remove unused sq_int_priv variable
Tom Rix [Thu, 30 Mar 2023 15:20:40 +0000 (11:20 -0400)]
drm/amdkfd: remove unused sq_int_priv variable

clang with W=1 reports
drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_int_process_v11.c:282:38: error: variable
  'sq_int_priv' set but not used [-Werror,-Wunused-but-set-variable]
        uint8_t sq_int_enc, sq_int_errtype, sq_int_priv;
                                            ^
This variable is not used so remove it.

Signed-off-by: Tom Rix <trix@redhat.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
17 months agodrm/amd: Disallow s0ix without BIOS support again
Mario Limonciello [Tue, 30 May 2023 16:57:59 +0000 (11:57 -0500)]
drm/amd: Disallow s0ix without BIOS support again

commit cf488dcd0ab7 ("drm/amd: Allow s0ix without BIOS support") showed
improvements to power consumption over suspend when s0ix wasn't enabled in
BIOS and the system didn't support S3.

This patch however was misguided because the reason the system didn't
support S3 was because SMT was disabled in OEM BIOS setup.
This prevented the BIOS from allowing S3.

Also allowing GPUs to use the s2idle path actually causes problems if
they're invoked on systems that may not support s2idle in the platform
firmware. `systemd` has a tendency to try to use `s2idle` if `deep` fails
for any reason, which could lead to unexpected flows.

The original commit also fixed a problem during resume from suspend to idle
without hardware support, but this is no longer necessary with commit
ca4751866397 ("drm/amd: Don't allow s0ix on APUs older than Raven")

Revert commit cf488dcd0ab7 ("drm/amd: Allow s0ix without BIOS support")
to make it match the expected behavior again.

Cc: Rafael Ávila de Espíndola <rafael@espindo.la>
Link: https://github.com/torvalds/linux/blob/v6.1/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c#L1060
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/2599
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
17 months agodrm/amd/display: Correct kdoc formatting for DCN32_CRB_SEGMENT_SIZE_KB in dcn32_hubbub.c
Srinivasan Shanmugam [Wed, 31 May 2023 03:52:36 +0000 (09:22 +0530)]
drm/amd/display: Correct kdoc formatting for DCN32_CRB_SEGMENT_SIZE_KB in dcn32_hubbub.c

Fixes the following W=1 kernel build warning:

drivers/gpu/drm/amd/amdgpu/../display/dc/dcn32/dcn32_hubbub.c:45: warning: Cannot understand  * @DCN32_CRB_SEGMENT_SIZE_KB: Maximum Configurable Return Buffer size for
 on line 45 - I thought it was a doc line

Cc: Hamza Mahfooz <hamza.mahfooz@amd.com>
Cc: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Cc: Harry Wentland <harry.wentland@amd.com>
Cc: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
17 months agodrm/amd/display: Fix up missing 'dc' & 'pipe_ctx' kdoc parameters in delay_cursor_unt...
Srinivasan Shanmugam [Wed, 31 May 2023 05:11:59 +0000 (10:41 +0530)]
drm/amd/display: Fix up missing 'dc' & 'pipe_ctx' kdoc parameters in delay_cursor_until_vupdate()

Fixes the following gcc with W=1:

drivers/gpu/drm/amd/amdgpu/../display/dc/dcn10/dcn10_hw_sequencer.c:1904: warning: Function parameter or member 'dc' not described in 'delay_cursor_until_vupdate'
drivers/gpu/drm/amd/amdgpu/../display/dc/dcn10/dcn10_hw_sequencer.c:1904: warning: Function parameter or member 'pipe_ctx' not described in 'delay_cursor_until_vupdate'

Cc: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Cc: Harry Wentland <harry.wentland@amd.com>
Cc: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>