platform/kernel/linux-starfive.git
3 years agodrm/amdgpu: Fix spelling mistake "disabed" -> "disabled"
Colin Ian King [Thu, 11 Mar 2021 09:28:30 +0000 (09:28 +0000)]
drm/amdgpu: Fix spelling mistake "disabed" -> "disabled"

There is a spelling mistake in a drm debug message. Fix it.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agodrm/amdgpu/smu8: return an error rather than 50% if busy query fails
Alex Deucher [Wed, 10 Mar 2021 17:08:42 +0000 (12:08 -0500)]
drm/amdgpu/smu8: return an error rather than 50% if busy query fails

For consistency with SMU10.

Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agodrm/amdgpu/powerplay/smu10: add support for gpu busy query (v2)
Alex Deucher [Tue, 9 Mar 2021 16:29:54 +0000 (11:29 -0500)]
drm/amdgpu/powerplay/smu10: add support for gpu busy query (v2)

Was added in newer versions of the firmware.  Add support
for it.

v2: return an error in SMU error, drop needless break.

Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agodrm/amd/pm: workaround for audio noise issue
Kenneth Feng [Thu, 11 Mar 2021 04:19:57 +0000 (12:19 +0800)]
drm/amd/pm: workaround for audio noise issue

On some Intel platforms, audio noise can be detected due to
high pcie speed switch latency.
This patch leaverages ppfeaturemask to fix to the highest pcie
speed then disable pcie switching.

v2:
coding style fix

Signed-off-by: Kenneth Feng <kenneth.feng@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agodrm/amdgpu: update secure display TA header
Jinzhou Su [Tue, 9 Mar 2021 02:52:14 +0000 (10:52 +0800)]
drm/amdgpu: update secure display TA header

update secure display TA header file.

Signed-off-by: Jinzhou Su <Jinzhou.Su@amd.com>
Reviewed-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agodrm/amdgpu:disable XGMI TA unload for A+A aldebaran
Feifei Xu [Wed, 10 Mar 2021 14:04:32 +0000 (22:04 +0800)]
drm/amdgpu:disable XGMI TA unload for A+A aldebaran

In gpu reset, XGMI TA unload will cause gpu hang.
Skip it on A+A aldebaran.

Signed-off-by: Feifei Xu <Feifei.Xu@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agodrm/amd/display: remove duplicate include in amdgpu_dm.c
Anson Jacob [Wed, 10 Mar 2021 15:24:20 +0000 (10:24 -0500)]
drm/amd/display: remove duplicate include in amdgpu_dm.c

'drm/drm_hdcp.h' included in 'amdgpu_dm.c' is duplicated.

Reported-by: Zhang Yunkai <zhang.yunkai@zte.com.cn>
Signed-off-by: Anson Jacob <Anson.Jacob@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agoRevert "drm/amd/display: remove duplicate include in amdgpu_dm.c"
Anson Jacob [Wed, 10 Mar 2021 15:20:42 +0000 (10:20 -0500)]
Revert "drm/amd/display: remove duplicate include in amdgpu_dm.c"

This reverts commit 51713e4e540b1adb49d4024323e43abb39a89577.

The duplicate from #79 should be removed instead.

Signed-off-by: Anson Jacob <Anson.Jacob@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agodrm/amdgpu: Enable light SBR for SMU on passthrough and XGMI configuration
shaoyunl [Wed, 10 Mar 2021 17:51:39 +0000 (12:51 -0500)]
drm/amdgpu: Enable light SBR for SMU on passthrough and XGMI configuration

SMU introduce the new interface to enable light Secondary Bus Reset mode, driver
enable it on passthrough + XGMI configuration

Signed-off-by: shaoyunl <shaoyun.liu@amd.com>
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agodrm/amd/pm: Add LightSBR SMU MSG support
shaoyunl [Wed, 10 Mar 2021 17:03:37 +0000 (12:03 -0500)]
drm/amd/pm: Add LightSBR SMU MSG support

This new MSG provide the interface for driver to enable/disable the Light Secondary Bus Reset
support from SMU. When enabled, SMU will only do minimum NBIO response to the SBR request and
leave the real HW reset to be handled by driver later. When disabled (default state),SMU will
pass the request to PSP for a HW reset

Signed-off-by: shaoyunl <shaoyun.liu@amd.com>
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agodrm/amdgpu: skip read eeprom for device that pending on XGMI reset
shaoyunl [Wed, 10 Mar 2021 01:02:42 +0000 (20:02 -0500)]
drm/amdgpu: skip read eeprom for device that pending on XGMI reset

Read eeprom through SMU doesn't works stable on XGMI reset during test.
skip it for now

Signed-off-by: shaoyunl <shaoyun.liu@amd.com>
Reviewed-by: Feifei Xu <Feifei.Xu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agodrm/amd/display: Remove unused defines
Qingqing Zhuo [Wed, 10 Mar 2021 17:15:29 +0000 (12:15 -0500)]
drm/amd/display: Remove unused defines

[Why]
CONFIG_DRM_AMD_DC_DCN3_0 has been folded into
CONFIG_DRM_AMD_DC_DCN and is not needed.

[How]
Drop CONFIG_DRM_AMD_DC_DCN3_0.

Signed-off-by: Qingqing Zhuo <qingqing.zhuo@amd.com>
Reviewed-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agodrm/amdgpu: fix S0ix handling when the CONFIG_AMD_PMC=m
Alex Deucher [Wed, 10 Mar 2021 03:58:47 +0000 (22:58 -0500)]
drm/amdgpu: fix S0ix handling when the CONFIG_AMD_PMC=m

Need to check the module variant as well.

Acked-by: Prike Liang <Prike.Liang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agodrm/radeon: keep __user during cast
Christian König [Mon, 8 Mar 2021 18:15:42 +0000 (19:15 +0100)]
drm/radeon: keep __user during cast

Silence static checker warning.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agodrm/radeon: fix AGP dependency
Christian König [Mon, 8 Mar 2021 18:22:13 +0000 (19:22 +0100)]
drm/radeon: fix AGP dependency

When AGP is compiled as module radeon must be compiled as module as
well.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agodrm/radeon: also init GEM funcs in radeon_gem_prime_import_sg_table
Christian König [Mon, 8 Mar 2021 18:35:14 +0000 (19:35 +0100)]
drm/radeon: also init GEM funcs in radeon_gem_prime_import_sg_table

Otherwise we will run into a NULL ptr deref.

Signed-off-by: Christian König <christian.koenig@amd.com>
Bug: https://bugzilla.kernel.org/show_bug.cgi?id=212137
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agodrm/amd/pm: correct the watermark settings for Polaris
Evan Quan [Fri, 5 Mar 2021 06:21:26 +0000 (14:21 +0800)]
drm/amd/pm: correct the watermark settings for Polaris

The "/ 10" should be applied to the right-hand operand instead of
the left-hand one.

Signed-off-by: Evan Quan <evan.quan@amd.com>
Noticed-by: Georgios Toptsidis <gtoptsid@gmail.com>
Reviewed-by: Feifei Xu <Feifei.Xu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agodrm/amdgpu : Fix asic reset regression issue introduce by 8f211fe8ac7c4f
shaoyunl [Tue, 9 Mar 2021 15:30:15 +0000 (10:30 -0500)]
drm/amdgpu : Fix asic reset regression issue introduce by 8f211fe8ac7c4f

This recent change introduce SDMA interrupt info printing with irq->process function.
These functions do not require a set function to enable/disable the irq

Signed-off-by: shaoyunl <shaoyun.liu@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agodrm/amd/pm: bug fix for pcie dpm
Kenneth Feng [Tue, 9 Mar 2021 13:10:16 +0000 (21:10 +0800)]
drm/amd/pm: bug fix for pcie dpm

Currently the pcie dpm has two problems.
1. Only the high dpm level speed/width can be overrided
if the requested values are out of the pcie capability.
2. The high dpm level is always overrided though sometimes
it's not necesarry.

Signed-off-by: Kenneth Feng <kenneth.feng@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agodrm/amdgpu: add ih waiter on process until checkpoint
Jonathan Kim [Tue, 23 Feb 2021 20:10:33 +0000 (15:10 -0500)]
drm/amdgpu: add ih waiter on process until checkpoint

Add IH function to allow caller to wait until ring entries are processed
until the checkpoint write pointer.

This will be primarily used by HMM to drain pending page fault interrupts
before memory unmap to prevent HMM from handling stale interrupts.

Suggested-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agodrm/amdgpu/display: Implement functions to let DC allocate GPU memory
Zhan Liu [Tue, 9 Mar 2021 01:20:44 +0000 (20:20 -0500)]
drm/amdgpu/display: Implement functions to let DC allocate GPU memory

[Why]
DC needs to communicate with PM FW through GPU memory. In order
to do so we need to be able to allocate memory from within DC.

[How]
Call amdgpu_bo_create_kernel to allocate GPU memory and use a
list in amdgpu_display_manager to track our allocations so we
can clean them up later.

Signed-off-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Zhan Liu <zhan.liu@amd.com>
Reviewed-by: Charlene Liu <charlene.liu@amd.com>
Acked-by: Zhan Liu <zhan.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agodrm/amdgpu/display: Use wm_table.entries for dcn301 calculate_wm
Zhan Liu [Tue, 9 Mar 2021 01:28:22 +0000 (20:28 -0500)]
drm/amdgpu/display: Use wm_table.entries for dcn301 calculate_wm

[Why]
For DGPU Navi, the wm_table.nv_entries are used. These entires are not
populated for DCN301 Vangogh APU, but instead wm_table.entries are.

[How]
Use DCN21 Renoir style wm calculations.

Signed-off-by: Leo Li <sunpeng.li@amd.com>
Signed-off-by: Zhan Liu <zhan.liu@amd.com>
Reviewed-by: Dmytro Laktyushkin <Dmytro.Laktyushkin@amd.com>
Acked-by: Zhan Liu <zhan.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agodrm/amdgpu: fb BO should be ttm_bo_type_device
Nirmoy Das [Mon, 8 Mar 2021 14:22:22 +0000 (15:22 +0100)]
drm/amdgpu: fb BO should be ttm_bo_type_device

FB BO should not be ttm_bo_type_kernel type and
amdgpufb_create_pinned_object() pins the FB BO anyway.

Signed-off-by: Nirmoy Das <nirmoy.das@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agodrm/amdgpu: Reset the devices in the XGMI hive duirng probe
shaoyunl [Tue, 16 Feb 2021 17:50:42 +0000 (12:50 -0500)]
drm/amdgpu: Reset the devices in the XGMI hive duirng probe

In passthrough configuration, hypervisior will trigger the SBR(Secondary bus reset) to the devices
without sync to each other. This could cause device hang since for XGMI configuration, all the devices
within the hive need to be reset at a limit time slot. This serial of patches try to solve this issue
by co-operate with new SMU which will only do minimum house keeping to response the SBR request but don't
do the real reset job and leave it to driver. Driver need to do the whole sw init and minimum HW init
to bring up the SMU and trigger the reset(possibly BACO) on all the ASICs at the same time

Signed-off-by: shaoyunl <shaoyun.liu@amd.com>
Acked-by: Andrey Grodzovsky andrey.grodzovsky@amd.com
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agodrm/amdgpu: Add reset_list for device list used for reset
shaoyunl [Fri, 5 Mar 2021 02:58:29 +0000 (21:58 -0500)]
drm/amdgpu: Add reset_list for device list used for reset

The gmc.xgmi.head list originally is designed for device list in the XGMI hive. Mix use it
for reset purpose will prevent the reset function to adjust XGMI device list which is required
in next change

Signed-off-by: shaoyunl <shaoyun.liu@amd.com>
Reviewed-by: Andrey Grodzovsky andrey.grodzovsky@amd.com
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agodrm/amdgpu: Init the cp MQD if it's not be initialized before
shaoyunl [Tue, 16 Feb 2021 16:27:04 +0000 (11:27 -0500)]
drm/amdgpu: Init the cp MQD if it's not be initialized before

The MQD might not be initialized duirng first init period if the device need to be reset
druing probe. Driver need to proper init them in gpu recovery period

Signed-off-by: shaoyunl <shaoyun.liu@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agodrm/amdgpu: Add kfd init_complete flag to check from amdgpu side
shaoyunl [Tue, 16 Feb 2021 15:57:17 +0000 (10:57 -0500)]
drm/amdgpu: Add kfd init_complete flag to check from amdgpu side

amdgpu driver may be in reset state during init which will not initialize the kfd,
driver need to initialize the KFD after reset by check the flag

Signed-off-by: shaoyunl <shaoyun.liu@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agoRevert "drm/amdgpu: add psp RAP L0 check support"
Kevin Wang [Tue, 9 Mar 2021 10:28:59 +0000 (18:28 +0800)]
Revert "drm/amdgpu: add psp RAP L0 check support"

This reverts commit d86fd724e59af96ae5ab6630f4f07a076e9b80cd.

Disable PSP RAP L0 self test until to RAP feature ready.

Signed-off-by: Kevin Wang <kevin1.wang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agodrm/amdgpu: Verify bo size can fit framebuffer size on init.
Mark Yacoub [Mon, 8 Mar 2021 21:36:22 +0000 (16:36 -0500)]
drm/amdgpu: Verify bo size can fit framebuffer size on init.

To initialize the framebuffer, call drm_gem_fb_init_with_funcs which
verifies that the BO size can fit the FB size by calculating the minimum
expected size of each plane.

The bug was caught using igt-gpu-tools test: kms_addfb_basic.too-high
and kms_addfb_basic.bo-too-small

Tested on ChromeOS Zork by turning on the display and running a YT
video.

=== Changes from v1 ===
1. Added new line under declarations.
2. Use C style comment.

Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: "Christian König" <christian.koenig@amd.com>
Cc: Sean Paul <seanpaul@chromium.org>
Signed-off-by: Mark Yacoub <markyacoub@chromium.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agodrm/amd/display: remove duplicate include in dcn21 and gpio
Zhang Yunkai [Sat, 6 Mar 2021 11:05:25 +0000 (03:05 -0800)]
drm/amd/display: remove duplicate include in dcn21 and gpio

'dce110_resource.h' included in 'dcn21_resource.c' is duplicated.
'hw_gpio.h' included in 'hw_factory_dce110.c' is duplicated.

Signed-off-by: Zhang Yunkai <zhang.yunkai@zte.com.cn>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agodrm/amd/display: remove duplicate include in amdgpu_dm.c
Zhang Yunkai [Sat, 6 Mar 2021 10:47:20 +0000 (02:47 -0800)]
drm/amd/display: remove duplicate include in amdgpu_dm.c

'drm/drm_hdcp.h' included in 'amdgpu_dm.c' is duplicated.
It is also included in the 79th line.

Signed-off-by: Zhang Yunkai <zhang.yunkai@zte.com.cn>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agodrm/amdgpu/swsmu: fix error return code of smu_v11_0_set_allowed_mask()
Jia-Ju Bai [Fri, 5 Mar 2021 03:54:28 +0000 (19:54 -0800)]
drm/amdgpu/swsmu: fix error return code of smu_v11_0_set_allowed_mask()

When bitmap_empty() or feature->feature_num triggers an error,
no error return code of smu_v11_0_set_allowed_mask() is assigned.
To fix this bug, ret is assigned with -EINVAL as error return code.

Reviewed-by: Evan Quan <evan.quan@amd.com>
Reported-by: TOTE Robot <oslab@tsinghua.edu.cn>
Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agodrm/amd/display: Align cursor cache address to 2KB
Joshua Aberback [Sat, 27 Feb 2021 00:44:24 +0000 (19:44 -0500)]
drm/amd/display: Align cursor cache address to 2KB

[Why]
The registers for the address of the cursor are aligned to 2KB, so all
cursor surfaces also need to be aligned to 2KB. Currently, the
provided cursor cache surface is not aligned, so we need a workaround
until alignment is enforced by the surface provider.

[How]
 - round up surface address to nearest multiple of 2048
 - current policy is to provide a much bigger cache size than
   necessary,so this operation is safe

Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Joshua Aberback <joshua.aberback@amd.com>
Reviewed-by: Jun Lei <Jun.Lei@amd.com>
Acked-by: Eryk Brol <eryk.brol@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agodrm/amd/display: Revert dram_clock_change_latency for DCN2.1
Sung Lee [Fri, 26 Feb 2021 18:20:43 +0000 (13:20 -0500)]
drm/amd/display: Revert dram_clock_change_latency for DCN2.1

[WHY & HOW]
Using values provided by DF for latency may cause hangs in
multi display configurations. Revert change to previous value.

Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Sung Lee <sung.lee@amd.com>
Reviewed-by: Haonan Wang <Haonan.Wang2@amd.com>
Acked-by: Eryk Brol <eryk.brol@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agodrm/amd/display: 3.2.126
Aric Cyr [Mon, 1 Mar 2021 14:10:45 +0000 (09:10 -0500)]
drm/amd/display: 3.2.126

DC version 3.2.126 brings improvements in multiple areas.
In summary, we highlight:

- DMUB fixes
- Firmware relase 0.0.55
- Expanded dmub_cmd documentation
- Enhancements in DCN30

Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Aric Cyr <aric.cyr@amd.com>
Acked-by: Eryk Brol <eryk.brol@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agodrm/amd/display: Added multi instance support for panel control
Jake Wang [Tue, 23 Feb 2021 22:46:55 +0000 (17:46 -0500)]
drm/amd/display: Added multi instance support for panel control

[Why]
Panel control always programs instance 0. With multi eDP we need to
support multiple instances.

[How]
Use link index to set different instances for panel control.
Refactored LVTMA control to support multiple instances.

Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Jake Wang <haonan.wang2@amd.com>
Reviewed-by: Aric Cyr <Aric.Cyr@amd.com>
Acked-by: Eryk Brol <eryk.brol@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agodrm/amd/display: [FW Promotion] Release 0.0.55
Anthony Koo [Fri, 26 Feb 2021 23:15:55 +0000 (18:15 -0500)]
drm/amd/display: [FW Promotion] Release 0.0.55

Add comments to better describe the function of different cmds
and parameters in the dmub interface

Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Anthony Koo <Anthony.Koo@amd.com>
Acked-by: Eryk Brol <eryk.brol@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agodrm/amd/display: Fixed read/write pointer issue for get dmub trace
Yongqiang Sun [Fri, 26 Feb 2021 16:07:37 +0000 (11:07 -0500)]
drm/amd/display: Fixed read/write pointer issue for get dmub trace

[Why]
Driver get wrap around dmub trace data due to read pointer being
increased incorrectly when there are multiple interrupt
queues with very short interval

[How]
Check read/write pointer before copying data from ring buffer

Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Yongqiang Sun <yongqiang.sun@amd.com>
Reviewed-by: Tony Cheng <Tony.Cheng@amd.com>
Acked-by: Eryk Brol <eryk.brol@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agodrm/amd/display: Fix warning
Qingqing Zhuo [Wed, 24 Feb 2021 04:05:20 +0000 (23:05 -0500)]
drm/amd/display: Fix warning

[Why]
- Wrong scope for ifdef
- Missing struct description

[How]
Move ifdef and add comment

Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Qingqing Zhuo <qingqing.zhuo@amd.com>
Reviewed-by: Nicholas Kazlauskas <Nicholas.Kazlauskas@amd.com>
Acked-by: Eryk Brol <eryk.brol@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agodrm/amd/display: Read all the trace entry if it is not empty
Yongqiang Sun [Tue, 23 Feb 2021 20:16:44 +0000 (15:16 -0500)]
drm/amd/display: Read all the trace entry if it is not empty

[Why]
If interval of two interrupt from dmub outbox0 is too short,
some event might be skipped

[How]
Compare read pointer and write pointer until all the event
entry is processed

Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Yongqiang Sun <yongqiang.sun@amd.com>
Reviewed-by: Tony Cheng <Tony.Cheng@amd.com>
Acked-by: Eryk Brol <eryk.brol@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agodrm/amd/display: Enable pflip interrupt upon pipe enable
Qingqing Zhuo [Fri, 19 Feb 2021 22:17:50 +0000 (17:17 -0500)]
drm/amd/display: Enable pflip interrupt upon pipe enable

[Why]
pflip interrupt would not be enabled promptly if a pipe is disabled
and re-enabled, causing flip_done timeout error during DP
compliance tests

[How]
Enable pflip interrupt upon pipe enablement

Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Qingqing Zhuo <qingqing.zhuo@amd.com>
Reviewed-by: Nicholas Kazlauskas <Nicholas.Kazlauskas@amd.com>
Acked-by: Eryk Brol <eryk.brol@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agodrm/amd/display: Fix dmub trace event not update issue
Yongqiang Sun [Tue, 23 Feb 2021 14:57:21 +0000 (09:57 -0500)]
drm/amd/display: Fix dmub trace event not update issue

[Why & How]
Reference to read pointer which is incorrect.
Change to reference to write pointer.

Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Yongqiang Sun <yongqiang.sun@amd.com>
Reviewed-by: Tony Cheng <Tony.Cheng@amd.com>
Acked-by: Eryk Brol <eryk.brol@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agodrm/amd/display: Move define from internal header to dmub_cmd.h
Yongqiang Sun [Mon, 22 Feb 2021 17:30:18 +0000 (12:30 -0500)]
drm/amd/display: Move define from internal header to dmub_cmd.h

[Why & How]
Fix linux compile error

Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Yongqiang Sun <yongqiang.sun@amd.com>
Reviewed-by: Nicholas Kazlauskas <Nicholas.Kazlauskas@amd.com>
Acked-by: Eryk Brol <eryk.brol@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agodrm/amd/display: Fix typo when retrieving dppclk from UEFI config
Martin Leung [Mon, 22 Feb 2021 18:46:43 +0000 (13:46 -0500)]
drm/amd/display: Fix typo when retrieving dppclk from UEFI config

[why]
In some boot configurations we need to retrieve the currently
UEFI-set dppclk, but there was a typo in the calculation

[how]
Fix typo to make dpp_clk calculate off dpp_clk divider instead of
disp_clk

Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Martin Leung <martin.leung@amd.com>
Reviewed-by: Sung Lee <Sung.Lee@amd.com>
Acked-by: Eryk Brol <eryk.brol@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agodrm/amd/display: Skip powerstate DC hw access if virtual dal
Martin Leung [Fri, 19 Feb 2021 22:40:09 +0000 (17:40 -0500)]
drm/amd/display: Skip powerstate DC hw access if virtual dal

[Why]
On baco-enabled systems running virtual dal, can get set power
state when hw is not initialized

[How]
Skip DC hw part of setPowerState when hw not available

Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Martin Leung <martin.leung@amd.com>
Reviewed-by: Aric Cyr <Aric.Cyr@amd.com>
Acked-by: Eryk Brol <eryk.brol@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agodrm/amd/display: Enabled pipe harvesting in dcn30
Dillon Varone [Fri, 19 Feb 2021 23:15:30 +0000 (18:15 -0500)]
drm/amd/display: Enabled pipe harvesting in dcn30

[Why & How]
Ported logic from dcn21 for reading in pipe fusing to dcn30.
Supported configurations are 1 and 6 pipes. Invalid fusing
will revert to 1 pipe being enabled.

Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Dillon Varone <dillon.varone@amd.com>
Reviewed-by: Jun Lei <Jun.Lei@amd.com>
Acked-by: Eryk Brol <eryk.brol@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agodrm/amdgpu: Check if FB BAR is enabled for ROM read
Lijo Lazar [Tue, 2 Mar 2021 08:28:43 +0000 (16:28 +0800)]
drm/amdgpu: Check if FB BAR is enabled for ROM read

Some configurations don't have FB BAR enabled. Avoid reading ROM image
from FB BAR region in such cases.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agodrm/amd/pm: Remove min/max overload of pp_dpm_sclk
Lijo Lazar [Mon, 8 Mar 2021 06:16:32 +0000 (14:16 +0800)]
drm/amd/pm: Remove min/max overload of pp_dpm_sclk

To maintain consistency with legacy usage, remove min/max clock overload
of pp_dpm_sclk node.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agodrm/amd/pm: Enable pp_od_clk_voltage node on aldebaran
Lijo Lazar [Mon, 8 Mar 2021 05:57:17 +0000 (13:57 +0800)]
drm/amd/pm: Enable pp_od_clk_voltage node on aldebaran

Use pp_od_clk_voltage node to enable performance determinism and GFX
clock min/max range for aldebaran. This is to avoid overload of
pp_dpm_sclk and maintain consistency in user lib interfaces.

Ex: To enable perf determinism at 900MHz max gfx clock

1) echo perf_determinism > /sys/bus/pci/devices/.../power_dpm_force_performance_level
2) echo s 1 900 > /sys/bus/pci/devices/.../pp_od_clk_voltage
3) echo c > /sys/bus/pci/devices/.../pp_od_clk_voltage

Ex: To enable min 500MHz/max 900MHz gfx clocks

1) echo manual > "/sys/bus/pci/devices/.../power_dpm_force_performance_level"
2) echo s 0 500 > "/sys/bus/pci/devices/.../pp_od_clk_voltage"
3) echo s 1 900 > "/sys/bus/pci/devices/.../pp_od_clk_voltage”
4) echo c > "/sys/bus/pci/devices/.../pp_od_clk_voltage”

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agodrm/amdgpu: Set GTT_USWC flag to enable freesync v2
Shashank Sharma [Sat, 13 Feb 2021 16:37:24 +0000 (22:07 +0530)]
drm/amdgpu: Set GTT_USWC flag to enable freesync v2

This patch sets 'AMDGPU_GEM_CREATE_CPU_GTT_USWC' as input
parameter flag, during object creation of an imported DMA
buffer.

In absence of this flag:
1. Function amdgpu_display_supported_domains() doesn't add
   AMDGPU_GEM_DOMAIN_GTT as supported domain.
2. Due to which, Function amdgpu_display_user_framebuffer_create()
   refuses to create framebuffer for imported DMA buffers.
3. Due to which, AddFB() IOCTL fails.
4. Due to which, amdgpu_present_check_flip() check fails in DDX
5. Due to which DDX driver doesn't allow flips (goes to blitting)
6. Due to which setting Freesync/VRR property fails for PRIME buffers.

So, this patch finally enables Freesync with PRIME buffer offloading.

v2 (chk): instead of just checking the flag we copy it over if the
          exporter is an amdgpu device as well.

Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agodrm/amdgpu: clean-up unused variable
Shashank Sharma [Sat, 13 Feb 2021 16:37:25 +0000 (22:07 +0530)]
drm/amdgpu: clean-up unused variable

Variable 'bp' seems to be unused residue from previous
logic, and is not required anymore.

Cc: Koenig Christian <christian.koenig@amd.com>
Cc: Deucher Alexander <alexander.deucher@amd.com>
Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agoRevert freesync video patches temporarily
Aurabindo Pillai [Sun, 7 Mar 2021 01:59:14 +0000 (20:59 -0500)]
Revert freesync video patches temporarily

This temporarily reverts freesync video patches since it causes regression with
eDP displays. This patch is a squashed revert of the following patches:

6f59f229f8ed ("drm/amd/display: Skip modeset for front porch change")
d10cd527f5e5 ("drm/amd/display: Add freesync video modes based on preferred modes")
0eb1af2e8205 ("drm/amd/display: Add module parameter for freesync video mode")

Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Reviewed-by: Anson Jacob <anson.jacob@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agodrm/amdkfd: Fix UBSAN shift-out-of-bounds warning
Anson Jacob [Wed, 3 Mar 2021 17:33:15 +0000 (12:33 -0500)]
drm/amdkfd: Fix UBSAN shift-out-of-bounds warning

If get_num_sdma_queues or get_num_xgmi_sdma_queues is 0, we end up
doing a shift operation where the number of bits shifted equals
number of bits in the operand. This behaviour is undefined.

Set num_sdma_queues or num_xgmi_sdma_queues to ULLONG_MAX, if the
count is >= number of bits in the operand.

Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1472

Reported-by: Lyude Paul <lyude@redhat.com>
Signed-off-by: Anson Jacob <Anson.Jacob@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Tested-by: Lyude Paul <lyude@redhat.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agodrm/amdgpu: Increase PSP runtime TMR region size
Oak Zeng [Mon, 1 Mar 2021 23:36:19 +0000 (17:36 -0600)]
drm/amdgpu: Increase PSP runtime TMR region size

Aldebaran uses more than 4M runtime TMR. The current
hard coded 4M TMR is not big enough for Aldebaran.
Increase it to 8M.

v2: Only do 8M size for ALDEBARAN (Hawking)

Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agodrm/amdkfd: apply uncached flag for aldebaran
Eric Huang [Wed, 13 May 2020 18:29:59 +0000 (14:29 -0400)]
drm/amdkfd: apply uncached flag for aldebaran

The flag is only applied on fine-grained memory.

Signed-off-by: Eric Huang <jinhuieric.huang@amd.com>
Reviewed-by: Oak Zeng <Oak.Zeng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agodrm/amdgpu: set snoop bit in pde/pte entries for A+A
Eric Huang [Sat, 27 Feb 2021 22:46:44 +0000 (17:46 -0500)]
drm/amdgpu: set snoop bit in pde/pte entries for A+A

Page tables in vram mapping to cpu is changed from uncached to
cached in A+A, the snoop bit in VM_CONTEXTx_PAGE_TABLE_BASE_ADDR/
PDE0s/PDE1s/PDE2s/PTE.TFs has to be set so gpuvm walker snoop
page table data out of CPU cache.

Signed-off-by: Eric Huang <jinhuieric.huang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agodrm/amdgpu: set CPU mapping of vram as cached for A+A mode
Eric Huang [Sat, 27 Feb 2021 21:51:19 +0000 (16:51 -0500)]
drm/amdgpu: set CPU mapping of vram as cached for A+A mode

New A+A HW supports cached vram mapped to cpu.

Signed-off-by: Eric Huang <jinhuieric.huang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agodrm/amdgpu: harvest edc status when connected to host via xGMI
Dennis Li [Thu, 4 Feb 2021 05:32:05 +0000 (13:32 +0800)]
drm/amdgpu: harvest edc status when connected to host via xGMI

When connected to a host via xGMI, system fatal errors may trigger
warm reset, driver has no change to query edc status before reset.
Therefore in this case, driver should harvest previous error loging
registers during boot, instead of only resetting them.

v2:
1. IP's ras_manager object is created when its ras feature is enabled,
so change to query edc status after amdgpu_ras_late_init called

2. change to enable watchdog timer after finishing gfx edc init

Signed-off-by: Dennis Li <Dennis.Li@amd.com>
Reivewed-by: Hawking Zhang <hawking.zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agodrm/amdgpu: Make noretry the default on Aldebaran
Felix Kuehling [Thu, 11 Feb 2021 21:02:05 +0000 (16:02 -0500)]
drm/amdgpu: Make noretry the default on Aldebaran

This is needed for best machine learning performance. XNACK can still
be enabled per-process if needed.

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Philip Yang <Philip.Yang@amd.com>
Tested-by: Alex Sierra <alex.sierra@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agodrm/amdgpu: update default timeout of Aldebaran SQ watchdog
Harish Kasiviswanathan [Tue, 23 Feb 2021 17:42:18 +0000 (12:42 -0500)]
drm/amdgpu: update default timeout of Aldebaran SQ watchdog

Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Reivewed-by: Hawking Zhang <hawking.zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agodrm/amd/pm: add new data in metrics table
Kenneth Feng [Fri, 5 Mar 2021 21:41:45 +0000 (16:41 -0500)]
drm/amd/pm: add new data in metrics table

Export new data in the metrics table for gfx and memory
utilization counter, and each hbm temperature as well.

v2:
change the metrics table version to v1.1

v3:
fix the coding style
v4:
rebase against latest kernel

Signed-off-by: Kenneth Feng <kenneth.feng@amd.com>
Reviewed-by: Kevin Wang <kevin1.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agodrm/amdgpu: add psp RAP L0 check support
Kevin Wang [Mon, 8 Feb 2021 03:00:03 +0000 (11:00 +0800)]
drm/amdgpu: add psp RAP L0 check support

add PSP RAP L0 check when RAP TA is loaded.

Signed-off-by: Kevin Wang <kevin1.wang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agodrm/amdgpu: change psp_rap_invoke() function return value
Kevin Wang [Sun, 7 Feb 2021 13:09:59 +0000 (21:09 +0800)]
drm/amdgpu: change psp_rap_invoke() function return value

RAP TA is an optional firmware. if it doesn’t exist,
the driver should bypass psp_rap_invoke() function.

1. bypass psp_rap_invoke() when RAP TA is not loaded.
2. add new parameter (status) to query RAP TA status.
   (the status value is different with psp_ta_invoke(),
3. fix the 'rap_status' MThread critical problem.
   (used without lock)

Signed-off-by: Kevin Wang <kevin1.wang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agodrm/amd/pm: add aldebaran serial number support
Kevin Wang [Fri, 5 Feb 2021 11:52:24 +0000 (19:52 +0800)]
drm/amd/pm: add aldebaran serial number support

add aldebaran serial number support.
(serial number from metrics table)

Signed-off-by: Kevin Wang <kevin1.wang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agodrm/amdgpu: Let KFD use more VMIDs on Aldebaran
Felix Kuehling [Wed, 10 Feb 2021 02:26:14 +0000 (21:26 -0500)]
drm/amdgpu: Let KFD use more VMIDs on Aldebaran

When there is no graphics support, KFD can use more of the VMIDs. Graphics
VMIDs are only used for video decoding/encoding and post processing. With
two VCE engines, there is no reason to reserve more than 2 VMIDs for that.

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agodrm/amdgpu: enable watchdog feature for SQ of aldebaran
Dennis Li [Fri, 5 Mar 2021 21:30:54 +0000 (16:30 -0500)]
drm/amdgpu: enable watchdog feature for SQ of aldebaran

SQ's watchdog timer monitors forward progress, a mask of which waves
caused the watchdog timeout is recorded into ras status registers and
then trigger a system fatal error event.

v2:
1. change *query_timeout_status to *query_sq_timeout_status.
2. move query_sq_timeout_status into amdgpu_ras_do_recovery.
3. add module parameters to enable/disable fatal error event and modify
the watchdog timer.

v3:
1. remove unused parameters of *enable_watchdog_timer

Signed-off-by: Dennis Li <Dennis.Li@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agodrm/amdgpu: refine ras codes for GC utc of aldebaran
Dennis Li [Wed, 27 Jan 2021 06:36:15 +0000 (14:36 +0800)]
drm/amdgpu: refine ras codes for GC utc of aldebaran

The bank number of both VML2 and ATCL2 are changed to 8, so refine
related codes to avoid defining long name arrays.

Signed-off-by: Dennis Li <Dennis.Li@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agodrm/amdgpu: add ras support for gfx of aldebaran
Dennis Li [Tue, 26 Jan 2021 02:50:41 +0000 (10:50 +0800)]
drm/amdgpu: add ras support for gfx of aldebaran

add edc counter/status reset and query functions for gfx block of
aldebaran.

v2: change to clear edc counter explicitly
aldebaran hardware will not clear edc counter after driver reading them,
so driver should clear them explicitly.

Signed-off-by: Dennis Li <Dennis.Li@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agodrm/amdgpu: add gc powerbrake support (v2)
Kevin Wang [Fri, 15 Jan 2021 06:51:07 +0000 (14:51 +0800)]
drm/amdgpu: add gc powerbrake support (v2)

add GC power brake feature support for Aldebaran.

v2: squash in fixes (Alex)

Signed-off-by: Kevin Wang <kevin1.wang@amd.com>
Reviewed-by: Kenneth Feng <kenneth.feng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agodrm/amdgpu: update TCP_CHAN_STEER_1 golden value for aldebaran
Hawking Zhang [Thu, 12 Nov 2020 08:55:05 +0000 (16:55 +0800)]
drm/amdgpu: update TCP_CHAN_STEER_1 golden value for aldebaran

The golden setting was changed recently. update to
the latest one

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Kevin Wang <kevin1.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agodrm/amdgpu: add common gc golden settings for aldebaran
Hawking Zhang [Wed, 11 Nov 2020 12:07:18 +0000 (20:07 +0800)]
drm/amdgpu: add common gc golden settings for aldebaran

golden settings that should be applied

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Kevin Wang <kevin1.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agodrm/amdgpu: apply gc v9_4_2 golden settings for aldebaran
Hawking Zhang [Mon, 19 Oct 2020 12:54:25 +0000 (20:54 +0800)]
drm/amdgpu: apply gc v9_4_2 golden settings for aldebaran

Those registers should be programmed as one-time initialization

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Kevin Wang <kevin1.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agodrm/amdgpu: restore aldebaran save ttmp and trap config on init (v2)
Jonathan Kim [Fri, 21 Aug 2020 07:02:49 +0000 (15:02 +0800)]
drm/amdgpu: restore aldebaran save ttmp and trap config on init (v2)

Initialization of TRAP_DATA0/1 is still required for the debugger to detect
new waves on Aldebaran.  Also, per-vmid global trap enablement may be
required outside of debugger scope so move to init phase.

v2: just add the gfx 9.4.2 changes (Alex)

Signed-off-by: Jonathan Kim <Jonathan.Kim@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agodrm/amdkfd: add aldebaran kfd2kgd callbacks to kfd device (v2)
Jonathan Kim [Sat, 5 Sep 2020 15:32:59 +0000 (23:32 +0800)]
drm/amdkfd: add aldebaran kfd2kgd callbacks to kfd device (v2)

Create dedicated Aldebaran kfd2kgd callbacks to prepare
for new per-vmid register instructions for debug trap
setting functions and sending host traps.

v2: rebase (Alex)

Signed-off-by: Jonathan Kim <Jonathan.Kim@amd.com>
Reviewed-by: Oak Zeng <Oak.Zeng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agodrm/amdkfd: Check HIQ's MQD for queue preemption status
Oak Zeng [Tue, 7 Jul 2020 23:29:37 +0000 (18:29 -0500)]
drm/amdkfd: Check HIQ's MQD for queue preemption status

MEC firmware can silently fail the queue preemption request
without time out. In this case, HIQ's MQD's queue_doorbell_id
will be set. Check this field to see whether last queue preemption
was successful or not.

Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>
Suggested-by: Jay Cornwall <Jay.Cornwall@amd.com>
Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agodrm/amdkfd: Add kernel parameter to stop queue eviction on vm fault
Oak Zeng [Tue, 23 Jun 2020 00:27:45 +0000 (19:27 -0500)]
drm/amdkfd: Add kernel parameter to stop queue eviction on vm fault

This is to keep wavefront context for debug purpose

Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agodrm/amdgpu: allow use psp to load firmware (v2)
Hawking Zhang [Mon, 13 Apr 2020 07:49:27 +0000 (15:49 +0800)]
drm/amdgpu: allow use psp to load firmware (v2)

Match existing asics.

v2: rebase (Alex)

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Kevin Wang <kevin1.wang@amd.com>
Reviewed-by: Le Ma <Le.Ma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agodrm/amd/pm: Enable user min/max gfxclk on aldebaran
Lijo Lazar [Thu, 4 Feb 2021 10:51:32 +0000 (18:51 +0800)]
drm/amd/pm: Enable user min/max gfxclk on aldebaran

Aldebaran has fine grained DPM for GFXCLK. Instead of a discrete level,
user can specify a min/max range of GFXCLK for any profiling/tuning
purpose.This option is available only in manual performance level mode.
Select "manual" as power_dpm_force_performance_level and specify the
min/max range using pp_dpm_sclk sysfs node. User cannot specify a min/max
range outside of the default min/max range of the ASIC. If specified
outside the range, values will be bound by the default min/max range.

Ex: To use gfxclk min = 600MHz and max = 900MHz

echo manual > /sys/bus/pci/devices/.../power_dpm_force_performance_level
echo min 600 max 900 > /sys/bus/pci/devices/.../pp_dpm_sclk

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agodrm/amd/pm: remove aldebaran serial number support
Kevin Wang [Thu, 4 Feb 2021 09:53:25 +0000 (17:53 +0800)]
drm/amd/pm: remove aldebaran serial number support

the following message is not supported.

PPSMC_MSG_ReadSerialNumTop32
PPSMC_MSG_ReadSerialNumBottom32

Signed-off-by: Kevin Wang <kevin1.wang@amd.com>
Reviewed-by: Kenneth Feng <kenneth.feng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agodrm/amdgpu: use pd addr based on gart level page table
Alex Sierra [Thu, 4 Feb 2021 01:02:20 +0000 (19:02 -0600)]
drm/amdgpu: use pd addr based on gart level page table

With a recent gart page table re-construction, the gart page
table is now 2-level for some ASICs: PDB0->PTB.
In the case of 2-level gart page table, the page_table_base
of vmid0 should point to PDB0 instead of PTB.

Signed-off-by: Alex Sierra <alex.sierra@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Oak Zeng <Oak.Zeng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agodrm/amdgpu: Fix the comment in amdgpu_gmc.h
Oak Zeng [Wed, 3 Feb 2021 17:09:02 +0000 (11:09 -0600)]
drm/amdgpu: Fix the comment in amdgpu_gmc.h

More accurate words are used to address a
code review feedback

Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agodrm/amdgpu: Fix GART page table s-bit
Oak Zeng [Sat, 23 Jan 2021 03:51:39 +0000 (21:51 -0600)]
drm/amdgpu: Fix GART page table s-bit

For the new 2-level GART table, the last PDE0 points
to PTB. Since PTB is in vram and right now we are
runing under s=0 mode (vram is treated as FB carveout),
so the s bit of this PDE0 should be set to 0.

Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agodrm/amdgpu: update mmhub client ids for Aldebaran
Alex Sierra [Tue, 2 Feb 2021 18:24:30 +0000 (12:24 -0600)]
drm/amdgpu: update mmhub client ids for Aldebaran

update mmhub client id table for Aldebaran.

Signed-off-by: Alex Sierra <alex.sierra@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agodrm/amdgpu: enable sram initialization for aldebaran
Dennis Li [Mon, 1 Feb 2021 08:18:31 +0000 (16:18 +0800)]
drm/amdgpu: enable sram initialization for aldebaran

Aldebaran can share the same initializing shader code witn
arcturus.

Signed-off-by: Dennis Li <Dennis.Li@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agodrm/amdgpu: workaround the TMR MC address issue (v2)
Oak Zeng [Tue, 26 Jan 2021 19:51:36 +0000 (13:51 -0600)]
drm/amdgpu: workaround the TMR MC address issue (v2)

With the 2-level gart page table,  vram is squeezed into gart aperture
and FB aperture is disabled. Therefore all VRAM virtual addresses are
 in the GART aperture. However currently PSP requires TMR addresses
in FB aperture. So we need some design change at PSP FW level to support
this 2-level gart table driver change. Right now this PSP FW support
doesn't exist. To workaround this issue temporarily, FB aperture is
added back and the gart aperture address is converted back to FB aperture
for this PSP TMR address.

Will revert it after we get a fix from PSP FW.

v2: squash in tmr fix for other asics (Kevin)

Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agodrm/amdgpu: HW setup of 2-level vmid0 page table
Oak Zeng [Fri, 18 Sep 2020 04:12:56 +0000 (23:12 -0500)]
drm/amdgpu: HW setup of 2-level vmid0 page table

Set up HW for 2-level vmid0 page table: 1. Set up
PAGE_TABLE_START/END registers. Currently only plan
to do 2-level page table for ALDEBARAN, so only gfxhub1.0
and mmhub1.7 is changed. 2. Set page table base register.
For 2-level page table, the page table base should point
to PDB0. 3. Disable AGP and FB aperture as they are not
used.

Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>
Reviewed-by: Christian Konig <christian.koenig@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agodrm/amdgpu: Set up vmid0 PDB0
Oak Zeng [Fri, 18 Sep 2020 04:04:29 +0000 (23:04 -0500)]
drm/amdgpu: Set up vmid0 PDB0

If use gart for FB translation, allocate and fill
PDB0.

Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>
Reviewed-by: Christian Konig <christian.koenig@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agodrm/amdgpu: Add function to allocate and fill PDB0
Oak Zeng [Fri, 18 Sep 2020 03:53:54 +0000 (22:53 -0500)]
drm/amdgpu: Add function to allocate and fill PDB0

Add functions to allocate PDB0, map it for CPU access,
and fill it.

Those functions are only used for 2-level vmid0 page
table construction

Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agodrm/amdgpu: Use different gart table parameters for 2-level gart table
Oak Zeng [Fri, 18 Sep 2020 01:32:56 +0000 (20:32 -0500)]
drm/amdgpu: Use different gart table parameters for 2-level gart table

If use gart for FB translation, we will squeeze vram into
sysvm aperture. This requires 2 level gart table. Add
page table depth and page table block size parameters
to gmc. This is prepare work to 2-level gart table
construction

Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>
Reviewed-by: Christian Konig <christian.koenig@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agodrm/amdgpu: Placement of gart and vram in sysvm aperture
Oak Zeng [Wed, 16 Sep 2020 02:08:50 +0000 (21:08 -0500)]
drm/amdgpu: Placement of gart and vram in sysvm aperture

If use GART for FB translation, place both vram and gart to sysvm
aperture. AGP aperture is not set up in this case because it
is not used

Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>
Reviewed-by: Christian Konig <christian.koenig@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agodrm/amdgpu: Modify comments of vram_start/end
Oak Zeng [Tue, 15 Sep 2020 19:47:30 +0000 (14:47 -0500)]
drm/amdgpu: Modify comments of vram_start/end

Modify the comment to reflect the fact that, if
use GART for vram address translation for vmid0,
[vram_start, vram_end] will be placed inside SYSVM
aperture, together with GART.

Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>
Reviewed-by: Christian Konig <christian.koenig@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agodrm/amdgpu: Moved gart_size calculation to mc_init functions
Oak Zeng [Sat, 3 Oct 2020 01:03:11 +0000 (20:03 -0500)]
drm/amdgpu: Moved gart_size calculation to mc_init functions

In amdgpu_gmc_gart_location function, gart_size is adjusted
by a smu_prv_buffer_size. This logic shouldn't belong to
this function. Move the logic to the mc_init functions

Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>
Reviewed-by: Christian Konig <christian.koenig@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agodrm/amdgpu: Use physical translation mode to access page table
Oak Zeng [Sat, 23 Jan 2021 17:34:45 +0000 (11:34 -0600)]
drm/amdgpu: Use physical translation mode to access page table

On A+A platform, CPU write page directory and page table in cached
mode. So it is necessary for page table walker to snoop CPU cache.
This setting is necessary for page walker to snoop page directory
and page table data out of CPU cache.

Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>
Acked-by: Christian Konig <christian.koenig@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agodrm/amdgpu: Don't reserve vram as WC for A+A
Oak Zeng [Fri, 22 Jan 2021 19:00:06 +0000 (13:00 -0600)]
drm/amdgpu: Don't reserve vram as WC for A+A

On A+A platform, vram can be mapped as WB. Not necessarily
to always map vram as WC on such platform.

Calling function arch_io_reserve_memtype_wc will mark the
whole vram region as WC. So don't call it for A+A platform.

Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>
Suggested-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian Konig <christian.koenig@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agodrm/amd/pm: Correct msg status check for powerlimit
Lijo Lazar [Fri, 29 Jan 2021 08:24:05 +0000 (16:24 +0800)]
drm/amd/pm: Correct msg status check for powerlimit

Status 0 indicates success, fix the check before using PPTable limit

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Kevin Wang <kevin1.wang@amd.com>`
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agodrm/amd/pm: Enable performance determinism on aldebaran
Lijo Lazar [Fri, 5 Mar 2021 21:02:49 +0000 (16:02 -0500)]
drm/amd/pm: Enable performance determinism on aldebaran

Performance Determinism is a new mode in Aldebaran where PMFW tries to
maintain sustained performance level. It can be enabled on a per-die
basis on aldebaran. To guarantee that it remains within the power cap,
a max GFX frequency needs to be specified in this mode. A new
power_dpm_force_performance_level, "perf_determinism", is defined to enable
this mode in amdgpu. The max frequency (in MHz) can be specified through
pp_dpm_sclk. The mode will be disabled once any other performance level
is chosen.

Ex: To enable perf determinism at 900Mhz max gfx clock

echo perf_determinism > /sys/bus/pci/devices/.../power_dpm_force_performance_level
echo max 900 > /sys/bus/pci/devices/.../pp_dpm_sclk

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Kenneth Feng <kenneth.feng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agodrm/amd/pm: Add DCBTC support for aldebaran
Lijo Lazar [Thu, 28 Jan 2021 10:53:27 +0000 (18:53 +0800)]
drm/amd/pm: Add DCBTC support for aldebaran

On aldebaran DCBTC should be run after enabling DPM. DCBTC won't be run
if support is not enabled in PPTable. Without PPTable support the message
is dummy and will return success always.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agodrm/amd/pm: Fix power limit query on aldebaran
Lijo Lazar [Thu, 28 Jan 2021 10:20:06 +0000 (18:20 +0800)]
drm/amd/pm: Fix power limit query on aldebaran

Aldebaran doesn't have AC/DC power limits. Separate the implementation
from SMU13. Max power limit is queried from PPTable.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Kevin Wang <kevin1.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agodrm/amdgpu: mask the xgmi number of hops reported from psp to kfd
Jonathan Kim [Wed, 27 Jan 2021 20:24:59 +0000 (15:24 -0500)]
drm/amdgpu: mask the xgmi number of hops reported from psp to kfd

The psp supplies the link type in the upper 2 bits of the psp xgmi node
information num_hops field.  With a new link type, Aldebaran has these
bits set to a non-zero value (1 = xGMI3) so the KFD topology will report
the incorrect IO link weights without proper masking.
The actual number of hops is located in the 3 least significant bits of
this field so mask if off accordingly before passing it to the KFD.

Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Amber Lin <amber.lin@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years agodrm/amdgpu: enable 48-bit IH timestamp counter
Alex Sierra [Fri, 15 Jan 2021 23:03:18 +0000 (17:03 -0600)]
drm/amdgpu: enable 48-bit IH timestamp counter

By default this timestamp is 32 bit counter. It gets
overflowed in around 10 minutes.

Signed-off-by: Alex Sierra <alex.sierra@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>