platform/kernel/linux-starfive.git
3 years ago drm/amdgpu: set snoop bit in pde/pte entries for A+A
Eric Huang [Sat, 27 Feb 2021 22:46:44 +0000 (17:46 -0500)]
drm/amdgpu: set snoop bit in pde/pte entries for A+A

The CPU mapping of page tables in vram is changed from uncached to
cached in A+A, so the snoop bit in VM_CONTEXTx_PAGE_TABLE_BASE_ADDR/
PDE0s/PDE1s/PDE2s/PTE.TFs has to be set so that the gpuvm walker snoops
page table data out of the CPU cache.
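
As a rough illustration of what setting the snoop bit in an entry amounts to, the sketch below ORs a flag into a 64-bit page-table entry. The bit position and the names `EX_PTE_SNOOPED` and `ex_make_entry` are assumptions made for this example and are not taken from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit position of the snoop flag, chosen for illustration. */
#define EX_PTE_SNOOPED (1ULL << 2)

/* Build a PDE/PTE value; request snooping when the CPU mapping is cached. */
static uint64_t ex_make_entry(uint64_t addr_and_flags, bool cpu_cached)
{
    if (cpu_cached)
        addr_and_flags |= EX_PTE_SNOOPED;
    return addr_and_flags;
}
```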

Signed-off-by: Eric Huang <jinhuieric.huang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years ago drm/amdgpu: set CPU mapping of vram as cached for A+A mode
Eric Huang [Sat, 27 Feb 2021 21:51:19 +0000 (16:51 -0500)]
drm/amdgpu: set CPU mapping of vram as cached for A+A mode

New A+A HW supports cached vram mapped to cpu.

Signed-off-by: Eric Huang <jinhuieric.huang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years ago drm/amdgpu: harvest edc status when connected to host via xGMI
Dennis Li [Thu, 4 Feb 2021 05:32:05 +0000 (13:32 +0800)]
drm/amdgpu: harvest edc status when connected to host via xGMI

When connected to a host via xGMI, system fatal errors may trigger a
warm reset, so the driver has no chance to query edc status before the
reset. Therefore, in this case the driver should harvest the previous
error logging registers during boot instead of only resetting them.

v2:
1. IP's ras_manager object is created when its ras feature is enabled,
so change to query edc status after amdgpu_ras_late_init is called

2. change to enable watchdog timer after finishing gfx edc init

Signed-off-by: Dennis Li <Dennis.Li@amd.com>
Reviewed-by: Hawking Zhang <hawking.zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years ago drm/amdgpu: Make noretry the default on Aldebaran
Felix Kuehling [Thu, 11 Feb 2021 21:02:05 +0000 (16:02 -0500)]
drm/amdgpu: Make noretry the default on Aldebaran

This is needed for best machine learning performance. XNACK can still
be enabled per-process if needed.
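
A minimal sketch of how such a default could be resolved, assuming a tri-state setting where -1 means "auto". The function name, the -1 convention and the `is_aldebaran` flag are assumptions of this example, and the fallback of 0 for other ASICs is only a placeholder.

```c
#include <stdbool.h>

/* Resolve the effective noretry setting.
 * param: -1 = auto (assumed convention), 0 = retry enabled, 1 = noretry. */
static int ex_resolve_noretry(int param, bool is_aldebaran)
{
    if (param != -1)
        return param ? 1 : 0;    /* an explicit user choice wins */
    return is_aldebaran ? 1 : 0; /* auto: noretry by default on Aldebaran */
}
```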

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Philip Yang <Philip.Yang@amd.com>
Tested-by: Alex Sierra <alex.sierra@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years ago drm/amdgpu: update default timeout of Aldebaran SQ watchdog
Harish Kasiviswanathan [Tue, 23 Feb 2021 17:42:18 +0000 (12:42 -0500)]
drm/amdgpu: update default timeout of Aldebaran SQ watchdog

Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Reviewed-by: Hawking Zhang <hawking.zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years ago drm/amd/pm: add new data in metrics table
Kenneth Feng [Fri, 5 Mar 2021 21:41:45 +0000 (16:41 -0500)]
drm/amd/pm: add new data in metrics table

Export new data in the metrics table for gfx and memory
utilization counter, and each hbm temperature as well.

v2:
change the metrics table version to v1.1

v3:
fix the coding style

v4:
rebase against latest kernel

Signed-off-by: Kenneth Feng <kenneth.feng@amd.com>
Reviewed-by: Kevin Wang <kevin1.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years ago drm/amdgpu: add psp RAP L0 check support
Kevin Wang [Mon, 8 Feb 2021 03:00:03 +0000 (11:00 +0800)]
drm/amdgpu: add psp RAP L0 check support

add PSP RAP L0 check when RAP TA is loaded.

Signed-off-by: Kevin Wang <kevin1.wang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years ago drm/amdgpu: change psp_rap_invoke() function return value
Kevin Wang [Sun, 7 Feb 2021 13:09:59 +0000 (21:09 +0800)]
drm/amdgpu: change psp_rap_invoke() function return value

RAP TA is an optional firmware. If it doesn't exist,
the driver should bypass the psp_rap_invoke() function.

1. bypass psp_rap_invoke() when RAP TA is not loaded.
2. add new parameter (status) to query RAP TA status.
   (the status value is different from psp_ta_invoke())
3. fix the 'rap_status' multi-thread critical section problem.
   (used without lock)
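
The three points above can be sketched roughly as follows. This is a user-space illustration, not the driver code: `struct ex_rap_ctx`, `ex_rap_invoke` and the choice to return 0 on bypass while leaving `*status` untouched are all assumptions of the example, and a pthread mutex stands in for whatever lock the driver uses.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

struct ex_rap_ctx {
    bool loaded;          /* is the optional RAP TA present? */
    pthread_mutex_t lock; /* protects ta_status */
    int ta_status;        /* status last reported by the (simulated) TA */
};

/* Returns 0 on success or bypass; the TA status is written to *status. */
static int ex_rap_invoke(struct ex_rap_ctx *ctx, int *status)
{
    if (!ctx->loaded)
        return 0; /* optional firmware missing: bypass */

    pthread_mutex_lock(&ctx->lock);
    /* A real driver would submit the command to the TA here. */
    if (status)
        *status = ctx->ta_status;
    pthread_mutex_unlock(&ctx->lock);
    return 0;
}
```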

Signed-off-by: Kevin Wang <kevin1.wang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years ago drm/amd/pm: add aldebaran serial number support
Kevin Wang [Fri, 5 Feb 2021 11:52:24 +0000 (19:52 +0800)]
drm/amd/pm: add aldebaran serial number support

add aldebaran serial number support.
(serial number from metrics table)

Signed-off-by: Kevin Wang <kevin1.wang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years ago drm/amdgpu: Let KFD use more VMIDs on Aldebaran
Felix Kuehling [Wed, 10 Feb 2021 02:26:14 +0000 (21:26 -0500)]
drm/amdgpu: Let KFD use more VMIDs on Aldebaran

When there is no graphics support, KFD can use more of the VMIDs. Graphics
VMIDs are only used for video decoding/encoding and post processing. With
two VCE engines, there is no reason to reserve more than 2 VMIDs for that.
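
To make the split concrete, here is a small sketch that computes which VMIDs go to KFD for a given first KFD VMID. The total of 16 VMIDs and the helper name `ex_kfd_vmid_mask` are assumptions of this example.

```c
#include <stdint.h>

#define EX_NUM_VMIDS 16u /* assumed total number of VMIDs */

/* Bitmask of VMIDs handed to KFD: everything from first_kfd_vmid up. */
static uint32_t ex_kfd_vmid_mask(unsigned int first_kfd_vmid)
{
    uint32_t all = (1u << EX_NUM_VMIDS) - 1;
    uint32_t below = (1u << first_kfd_vmid) - 1;
    return all & ~below;
}
```

Under these assumptions, keeping VMID 0 for the kernel and reserving two more gives `ex_kfd_vmid_mask(3)`, which leaves 13 VMIDs for KFD, compared with 8 for `ex_kfd_vmid_mask(8)`.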

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years ago drm/amdgpu: enable watchdog feature for SQ of aldebaran
Dennis Li [Fri, 5 Mar 2021 21:30:54 +0000 (16:30 -0500)]
drm/amdgpu: enable watchdog feature for SQ of aldebaran

SQ's watchdog timer monitors forward progress. A mask of which waves
caused the watchdog timeout is recorded into the ras status registers,
and a system fatal error event is then triggered.

v2:
1. change *query_timeout_status to *query_sq_timeout_status.
2. move query_sq_timeout_status into amdgpu_ras_do_recovery.
3. add module parameters to enable/disable fatal error event and modify
the watchdog timer.

v3:
1. remove unused parameters of *enable_watchdog_timer

Signed-off-by: Dennis Li <Dennis.Li@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years ago drm/amdgpu: refine ras codes for GC utc of aldebaran
Dennis Li [Wed, 27 Jan 2021 06:36:15 +0000 (14:36 +0800)]
drm/amdgpu: refine ras codes for GC utc of aldebaran

The bank number of both VML2 and ATCL2 is changed to 8, so refine
the related code to avoid defining long name arrays.

Signed-off-by: Dennis Li <Dennis.Li@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years ago drm/amdgpu: add ras support for gfx of aldebaran
Dennis Li [Tue, 26 Jan 2021 02:50:41 +0000 (10:50 +0800)]
drm/amdgpu: add ras support for gfx of aldebaran

add edc counter/status reset and query functions for gfx block of
aldebaran.

v2: change to clear edc counter explicitly
aldebaran hardware does not clear the edc counters after the driver reads
them, so the driver should clear them explicitly.
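
A minimal sketch of the read-then-clear pattern described in v2. The register is simulated with a plain variable, and the names `ex_edc_cnt_reg` and `ex_query_and_clear_edc` are hypothetical.

```c
#include <stdint.h>

/* Simulated counter register; real hardware access would go through MMIO. */
static uint32_t ex_edc_cnt_reg;

/* Read the error count, then clear the register explicitly,
 * because a read alone does not reset it. */
static uint32_t ex_query_and_clear_edc(void)
{
    uint32_t count = ex_edc_cnt_reg;
    ex_edc_cnt_reg = 0;
    return count;
}
```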

Signed-off-by: Dennis Li <Dennis.Li@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years ago drm/amdgpu: add gc powerbrake support (v2)
Kevin Wang [Fri, 15 Jan 2021 06:51:07 +0000 (14:51 +0800)]
drm/amdgpu: add gc powerbrake support (v2)

add GC power brake feature support for Aldebaran.

v2: squash in fixes (Alex)

Signed-off-by: Kevin Wang <kevin1.wang@amd.com>
Reviewed-by: Kenneth Feng <kenneth.feng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years ago drm/amdgpu: update TCP_CHAN_STEER_1 golden value for aldebaran
Hawking Zhang [Thu, 12 Nov 2020 08:55:05 +0000 (16:55 +0800)]
drm/amdgpu: update TCP_CHAN_STEER_1 golden value for aldebaran

The golden setting was changed recently. update to
the latest one

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Kevin Wang <kevin1.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years ago drm/amdgpu: add common gc golden settings for aldebaran
Hawking Zhang [Wed, 11 Nov 2020 12:07:18 +0000 (20:07 +0800)]
drm/amdgpu: add common gc golden settings for aldebaran

golden settings that should be applied

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Kevin Wang <kevin1.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years ago drm/amdgpu: apply gc v9_4_2 golden settings for aldebaran
Hawking Zhang [Mon, 19 Oct 2020 12:54:25 +0000 (20:54 +0800)]
drm/amdgpu: apply gc v9_4_2 golden settings for aldebaran

Those registers should be programmed as one-time initialization

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Kevin Wang <kevin1.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years ago drm/amdgpu: restore aldebaran save ttmp and trap config on init (v2)
Jonathan Kim [Fri, 21 Aug 2020 07:02:49 +0000 (15:02 +0800)]
drm/amdgpu: restore aldebaran save ttmp and trap config on init (v2)

Initialization of TRAP_DATA0/1 is still required for the debugger to detect
new waves on Aldebaran.  Also, per-vmid global trap enablement may be
required outside of debugger scope so move to init phase.

v2: just add the gfx 9.4.2 changes (Alex)

Signed-off-by: Jonathan Kim <Jonathan.Kim@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years ago drm/amdkfd: add aldebaran kfd2kgd callbacks to kfd device (v2)
Jonathan Kim [Sat, 5 Sep 2020 15:32:59 +0000 (23:32 +0800)]
drm/amdkfd: add aldebaran kfd2kgd callbacks to kfd device (v2)

Create dedicated Aldebaran kfd2kgd callbacks to prepare
for new per-vmid register instructions for debug trap
setting functions and sending host traps.

v2: rebase (Alex)

Signed-off-by: Jonathan Kim <Jonathan.Kim@amd.com>
Reviewed-by: Oak Zeng <Oak.Zeng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years ago drm/amdkfd: Check HIQ's MQD for queue preemption status
Oak Zeng [Tue, 7 Jul 2020 23:29:37 +0000 (18:29 -0500)]
drm/amdkfd: Check HIQ's MQD for queue preemption status

MEC firmware can silently fail the queue preemption request
without time out. In this case, HIQ's MQD's queue_doorbell_id
will be set. Check this field to see whether last queue preemption
was successful or not.

Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>
Suggested-by: Jay Cornwall <Jay.Cornwall@amd.com>
Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years ago drm/amdkfd: Add kernel parameter to stop queue eviction on vm fault
Oak Zeng [Tue, 23 Jun 2020 00:27:45 +0000 (19:27 -0500)]
drm/amdkfd: Add kernel parameter to stop queue eviction on vm fault

This is to keep wavefront context for debug purpose

Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years ago drm/amdgpu: allow use psp to load firmware (v2)
Hawking Zhang [Mon, 13 Apr 2020 07:49:27 +0000 (15:49 +0800)]
drm/amdgpu: allow use psp to load firmware (v2)

Match existing asics.

v2: rebase (Alex)

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Kevin Wang <kevin1.wang@amd.com>
Reviewed-by: Le Ma <Le.Ma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years ago drm/amd/pm: Enable user min/max gfxclk on aldebaran
Lijo Lazar [Thu, 4 Feb 2021 10:51:32 +0000 (18:51 +0800)]
drm/amd/pm: Enable user min/max gfxclk on aldebaran

Aldebaran has fine grained DPM for GFXCLK. Instead of a discrete level,
user can specify a min/max range of GFXCLK for any profiling/tuning
purpose. This option is available only in manual performance level mode.
Select "manual" as power_dpm_force_performance_level and specify the
min/max range using pp_dpm_sclk sysfs node. User cannot specify a min/max
range outside of the default min/max range of the ASIC. If specified
outside the range, values will be bound by the default min/max range.

Ex: To use gfxclk min = 600MHz and max = 900MHz

echo manual > /sys/bus/pci/devices/.../power_dpm_force_performance_level
echo min 600 max 900 > /sys/bus/pci/devices/.../pp_dpm_sclk
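
The bounding behaviour described above (out-of-range requests are limited to the default range) can be pictured with the following sketch. The struct and function names are made up for this example and the actual driver logic may differ.

```c
#include <stdint.h>

struct ex_clk_range {
    uint32_t min_mhz;
    uint32_t max_mhz;
};

static uint32_t ex_clamp(uint32_t v, uint32_t lo, uint32_t hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

/* Bound a user-requested range by the ASIC default range. */
static struct ex_clk_range ex_bound_range(struct ex_clk_range req,
                                          struct ex_clk_range def)
{
    struct ex_clk_range out = {
        .min_mhz = ex_clamp(req.min_mhz, def.min_mhz, def.max_mhz),
        .max_mhz = ex_clamp(req.max_mhz, def.min_mhz, def.max_mhz),
    };
    return out;
}
```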

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years ago drm/amd/pm: remove aldebaran serial number support
Kevin Wang [Thu, 4 Feb 2021 09:53:25 +0000 (17:53 +0800)]
drm/amd/pm: remove aldebaran serial number support

The following messages are not supported.

PPSMC_MSG_ReadSerialNumTop32
PPSMC_MSG_ReadSerialNumBottom32

Signed-off-by: Kevin Wang <kevin1.wang@amd.com>
Reviewed-by: Kenneth Feng <kenneth.feng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years ago drm/amdgpu: use pd addr based on gart level page table
Alex Sierra [Thu, 4 Feb 2021 01:02:20 +0000 (19:02 -0600)]
drm/amdgpu: use pd addr based on gart level page table

With a recent gart page table re-construction, the gart page
table is now 2-level for some ASICs: PDB0->PTB.
In the case of 2-level gart page table, the page_table_base
of vmid0 should point to PDB0 instead of PTB.
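
In sketch form, the selection looks like this. `struct ex_gart` and `ex_vmid0_pt_base` are hypothetical names used only to illustrate the rule stated above.

```c
#include <stdbool.h>
#include <stdint.h>

struct ex_gart {
    bool two_level;     /* true when the table is PDB0->PTB */
    uint64_t pdb0_addr; /* address of the page directory block */
    uint64_t ptb_addr;  /* address of the page table block */
};

/* Pick the page_table_base value to program for vmid0. */
static uint64_t ex_vmid0_pt_base(const struct ex_gart *g)
{
    return g->two_level ? g->pdb0_addr : g->ptb_addr;
}
```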

Signed-off-by: Alex Sierra <alex.sierra@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Oak Zeng <Oak.Zeng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years ago drm/amdgpu: Fix the comment in amdgpu_gmc.h
Oak Zeng [Wed, 3 Feb 2021 17:09:02 +0000 (11:09 -0600)]
drm/amdgpu: Fix the comment in amdgpu_gmc.h

More accurate words are used to address a
code review feedback

Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years ago drm/amdgpu: Fix GART page table s-bit
Oak Zeng [Sat, 23 Jan 2021 03:51:39 +0000 (21:51 -0600)]
drm/amdgpu: Fix GART page table s-bit

For the new 2-level GART table, the last PDE0 points
to PTB. Since PTB is in vram and right now we are
running under s=0 mode (vram is treated as FB carveout),
the s bit of this PDE0 should be set to 0.

Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years ago drm/amdgpu: update mmhub client ids for Aldebaran
Alex Sierra [Tue, 2 Feb 2021 18:24:30 +0000 (12:24 -0600)]
drm/amdgpu: update mmhub client ids for Aldebaran

update mmhub client id table for Aldebaran.

Signed-off-by: Alex Sierra <alex.sierra@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years ago drm/amdgpu: enable sram initialization for aldebaran
Dennis Li [Mon, 1 Feb 2021 08:18:31 +0000 (16:18 +0800)]
drm/amdgpu: enable sram initialization for aldebaran

Aldebaran can share the same initializing shader code with
arcturus.

Signed-off-by: Dennis Li <Dennis.Li@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years ago drm/amdgpu: workaround the TMR MC address issue (v2)
Oak Zeng [Tue, 26 Jan 2021 19:51:36 +0000 (13:51 -0600)]
drm/amdgpu: workaround the TMR MC address issue (v2)

With the 2-level gart page table, vram is squeezed into gart aperture
and FB aperture is disabled. Therefore all VRAM virtual addresses are
in the GART aperture. However, currently PSP requires TMR addresses
in FB aperture. So we need some design change at PSP FW level to support
this 2-level gart table driver change. Right now this PSP FW support
doesn't exist. To work around this issue temporarily, FB aperture is
added back and the gart aperture address is converted back to FB aperture
for this PSP TMR address.
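
One way to picture the address conversion is an offset-preserving translation between the two apertures. The sketch below assumes that the offset from the start of vram is the same in both apertures; the struct, field and function names are hypothetical.

```c
#include <stdint.h>

struct ex_apertures {
    uint64_t vram_start; /* start of vram inside the GART aperture */
    uint64_t fb_start;   /* start of the re-enabled FB aperture */
};

/* Translate a vram address from the GART aperture into the FB aperture
 * by preserving its offset from the start of vram. */
static uint64_t ex_gart_to_fb(const struct ex_apertures *a, uint64_t gart_addr)
{
    return a->fb_start + (gart_addr - a->vram_start);
}
```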

Will revert it after we get a fix from PSP FW.

v2: squash in tmr fix for other asics (Kevin)

Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years ago drm/amdgpu: HW setup of 2-level vmid0 page table
Oak Zeng [Fri, 18 Sep 2020 04:12:56 +0000 (23:12 -0500)]
drm/amdgpu: HW setup of 2-level vmid0 page table

Set up HW for 2-level vmid0 page table:

1. Set up PAGE_TABLE_START/END registers. Currently we only plan
   to do 2-level page table for ALDEBARAN, so only gfxhub1.0
   and mmhub1.7 are changed.
2. Set page table base register. For 2-level page table, the
   page table base should point to PDB0.
3. Disable AGP and FB aperture as they are not used.

Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>
Reviewed-by: Christian Konig <christian.koenig@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years ago drm/amdgpu: Set up vmid0 PDB0
Oak Zeng [Fri, 18 Sep 2020 04:04:29 +0000 (23:04 -0500)]
drm/amdgpu: Set up vmid0 PDB0

If gart is used for FB translation, allocate and fill
PDB0.

Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>
Reviewed-by: Christian Konig <christian.koenig@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years ago drm/amdgpu: Add function to allocate and fill PDB0
Oak Zeng [Fri, 18 Sep 2020 03:53:54 +0000 (22:53 -0500)]
drm/amdgpu: Add function to allocate and fill PDB0

Add functions to allocate PDB0, map it for CPU access,
and fill it.

Those functions are only used for 2-level vmid0 page
table construction

Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years ago drm/amdgpu: Use different gart table parameters for 2-level gart table
Oak Zeng [Fri, 18 Sep 2020 01:32:56 +0000 (20:32 -0500)]
drm/amdgpu: Use different gart table parameters for 2-level gart table

If gart is used for FB translation, we will squeeze vram into
sysvm aperture. This requires a 2-level gart table. Add
page table depth and page table block size parameters
to gmc. This is preparatory work for 2-level gart table
construction.

Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>
Reviewed-by: Christian Konig <christian.koenig@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years ago drm/amdgpu: Placement of gart and vram in sysvm aperture
Oak Zeng [Wed, 16 Sep 2020 02:08:50 +0000 (21:08 -0500)]
drm/amdgpu: Placement of gart and vram in sysvm aperture

If GART is used for FB translation, place both vram and gart in the sysvm
aperture. AGP aperture is not set up in this case because it
is not used.

Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>
Reviewed-by: Christian Konig <christian.koenig@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years ago drm/amdgpu: Modify comments of vram_start/end
Oak Zeng [Tue, 15 Sep 2020 19:47:30 +0000 (14:47 -0500)]
drm/amdgpu: Modify comments of vram_start/end

Modify the comment to reflect the fact that, if
GART is used for vram address translation for vmid0,
[vram_start, vram_end] will be placed inside SYSVM
aperture, together with GART.

Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>
Reviewed-by: Christian Konig <christian.koenig@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years ago drm/amdgpu: Moved gart_size calculation to mc_init functions
Oak Zeng [Sat, 3 Oct 2020 01:03:11 +0000 (20:03 -0500)]
drm/amdgpu: Moved gart_size calculation to mc_init functions

In amdgpu_gmc_gart_location function, gart_size is adjusted
by a smu_prv_buffer_size. This logic shouldn't belong to
this function. Move the logic to the mc_init functions

Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>
Reviewed-by: Christian Konig <christian.koenig@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years ago drm/amdgpu: Use physical translation mode to access page table
Oak Zeng [Sat, 23 Jan 2021 17:34:45 +0000 (11:34 -0600)]
drm/amdgpu: Use physical translation mode to access page table

On A+A platform, the CPU writes page directories and page tables in cached
mode, so the page table walker has to snoop the CPU cache.
This setting is necessary for the page walker to snoop page directory
and page table data out of the CPU cache.

Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>
Acked-by: Christian Konig <christian.koenig@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years ago drm/amdgpu: Don't reserve vram as WC for A+A
Oak Zeng [Fri, 22 Jan 2021 19:00:06 +0000 (13:00 -0600)]
drm/amdgpu: Don't reserve vram as WC for A+A

On A+A platform, vram can be mapped as WB. It is not necessary
to always map vram as WC on such a platform.

Calling function arch_io_reserve_memtype_wc will mark the
whole vram region as WC. So don't call it for A+A platform.

Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>
Suggested-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian Konig <christian.koenig@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years ago drm/amd/pm: Correct msg status check for powerlimit
Lijo Lazar [Fri, 29 Jan 2021 08:24:05 +0000 (16:24 +0800)]
drm/amd/pm: Correct msg status check for powerlimit

Status 0 indicates success, fix the check before using PPTable limit
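
The kind of check being corrected can be sketched as below, assuming the PPTable limit serves as the fallback when the query message fails. The function and parameter names are hypothetical and the real code path may be structured differently.

```c
#include <stdint.h>

/* Use the queried limit only when the message succeeded (status 0);
 * any non-zero status falls back to the PPTable limit. */
static uint32_t ex_power_limit(int msg_status, uint32_t queried,
                               uint32_t pptable_limit)
{
    if (msg_status == 0)
        return queried;
    return pptable_limit;
}
```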

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Kevin Wang <kevin1.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years ago drm/amd/pm: Enable performance determinism on aldebaran
Lijo Lazar [Fri, 5 Mar 2021 21:02:49 +0000 (16:02 -0500)]
drm/amd/pm: Enable performance determinism on aldebaran

Performance Determinism is a new mode in Aldebaran where PMFW tries to
maintain sustained performance level. It can be enabled on a per-die
basis on aldebaran. To guarantee that it remains within the power cap,
a max GFX frequency needs to be specified in this mode. A new
power_dpm_force_performance_level, "perf_determinism", is defined to enable
this mode in amdgpu. The max frequency (in MHz) can be specified through
pp_dpm_sclk. The mode will be disabled once any other performance level
is chosen.

Ex: To enable perf determinism at 900MHz max gfx clock

echo perf_determinism > /sys/bus/pci/devices/.../power_dpm_force_performance_level
echo max 900 > /sys/bus/pci/devices/.../pp_dpm_sclk

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Kenneth Feng <kenneth.feng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years ago drm/amd/pm: Add DCBTC support for aldebaran
Lijo Lazar [Thu, 28 Jan 2021 10:53:27 +0000 (18:53 +0800)]
drm/amd/pm: Add DCBTC support for aldebaran

On aldebaran DCBTC should be run after enabling DPM. DCBTC won't be run
if support is not enabled in PPTable. Without PPTable support the message
is dummy and will return success always.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years ago drm/amd/pm: Fix power limit query on aldebaran
Lijo Lazar [Thu, 28 Jan 2021 10:20:06 +0000 (18:20 +0800)]
drm/amd/pm: Fix power limit query on aldebaran

Aldebaran doesn't have AC/DC power limits. Separate the implementation
from SMU13. Max power limit is queried from PPTable.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Kevin Wang <kevin1.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years ago drm/amdgpu: mask the xgmi number of hops reported from psp to kfd
Jonathan Kim [Wed, 27 Jan 2021 20:24:59 +0000 (15:24 -0500)]
drm/amdgpu: mask the xgmi number of hops reported from psp to kfd

The psp supplies the link type in the upper 2 bits of the psp xgmi node
information num_hops field.  With a new link type, Aldebaran has these
bits set to a non-zero value (1 = xGMI3) so the KFD topology will report
the incorrect IO link weights without proper masking.
The actual number of hops is located in the 3 least significant bits of
this field so mask it off accordingly before passing it to the KFD.
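
The bit layout described above can be sketched as follows. The field is assumed to be 8 bits wide, so the "upper 2 bits" are bits 7:6; that width and the macro and function names are assumptions of this example.

```c
#include <stdint.h>

#define EX_HOPS_MASK 0x07u   /* 3 least significant bits: hop count */
#define EX_LINK_TYPE_SHIFT 6 /* upper 2 bits of an 8-bit field (assumed width) */

/* Number of hops with the link-type bits masked off. */
static uint8_t ex_num_hops(uint8_t raw)
{
    return raw & EX_HOPS_MASK;
}

/* Link type stored in the upper two bits. */
static uint8_t ex_link_type(uint8_t raw)
{
    return raw >> EX_LINK_TYPE_SHIFT;
}
```

For a raw value of 0x42, this yields link type 1 and 2 hops, whereas using the raw value directly would report 66 hops.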

Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Amber Lin <amber.lin@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years ago drm/amdgpu: enable 48-bit IH timestamp counter
Alex Sierra [Fri, 15 Jan 2021 23:03:18 +0000 (17:03 -0600)]
drm/amdgpu: enable 48-bit IH timestamp counter

By default this timestamp is a 32-bit counter. It
overflows in around 10 minutes.
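
As a back-of-the-envelope check of what the wider counter buys, a 48-bit counter at the same tick rate wraps 2^16 times later than a 32-bit one. The helper below is a hypothetical illustration; the 10-minute figure is the approximate value quoted above.

```c
#include <stdint.h>

/* Seconds until a 48-bit counter wraps, given how long a 32-bit counter
 * takes to wrap at the same tick rate: 2^48 / 2^32 = 2^16 times longer. */
static uint64_t ex_wrap48_seconds(uint64_t wrap32_seconds)
{
    return wrap32_seconds << 16;
}
```

With 600 seconds as input this gives 39321600 seconds, which is roughly 455 days.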

Signed-off-by: Alex Sierra <alex.sierra@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years ago drm/amdgpu: enable retry fault wptr overflow
Philip Yang [Tue, 22 Sep 2020 17:09:33 +0000 (13:09 -0400)]
drm/amdgpu: enable retry fault wptr overflow

If xnack is on, VM retry fault interrupts are sent to IH ring1, and ring1
will be full quickly. IH then cannot receive other interrupts, which causes
a deadlock if migrating a buffer using sdma and waiting for sdma done while
handling a retry fault.

Remove VMC from IH storm client, enable ring1 write pointer overflow,
then IH will drop retry fault interrupts and be able to receive other
interrupts while driver is handling retry fault.

The IH ring1 write pointer is not written back to memory by IH, and the
ring1 write pointer recorded by self-irq is not updated, so always read
the latest ring1 write pointer from the register.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years ago drm/amdgpu: Use free system memory size for kfd memory accounting
Oak Zeng [Mon, 18 Jan 2021 20:55:34 +0000 (14:55 -0600)]
drm/amdgpu: Use free system memory size for kfd memory accounting

With the current kfd memory accounting scheme, kfd applications
can use up to 15/16 of total system memory. On a system with a
small total system memory size, this leaves little system memory
for the OS. For example, if the system has 16GB of system memory
in total, this scheme leaves the OS and non-kfd applications only 1GB
of system memory. In many cases, this triggers the OOM killer.

This patch changes the KFD system memory accounting scheme to
15/16 of the free system memory at the time the kfd driver loads.
This deducts the system memory that the OS already uses.

Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>
Suggested-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years ago drm/amdgpu: apply new pmfw loading sequence to arcturus and onwards
Hawking Zhang [Wed, 20 Jan 2021 11:49:05 +0000 (19:49 +0800)]
drm/amdgpu: apply new pmfw loading sequence to arcturus and onwards

Arcturus and onwards products should follow the same sequence
that has pmfw loading ahead of tmr setup

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Kevin Wang <kevin1.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years ago drm/amdgpu: Fix aldebaran MMHUB CG/LS logic
Lijo Lazar [Mon, 18 Jan 2021 14:44:16 +0000 (22:44 +0800)]
drm/amdgpu: Fix aldebaran MMHUB CG/LS logic

Aldebaran MMHUB CG/LS logic is controlled by VBIOS. Enable the state
change logic only if driver is used for control.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Feifei Xu <Feifei.Xu@amd.com>
Reviewed-by: Kevin Wang <kevin1.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years ago drm/amdgpu: Enable CP idle interrupts
Lijo Lazar [Sat, 16 Jan 2021 05:57:55 +0000 (13:57 +0800)]
drm/amdgpu: Enable CP idle interrupts

v1: The interrupts need to be enabled to move to DS clocks.
v2: Don't enable GFX IDLE interrupts if there are no GFX rings.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years ago drm/amdgpu/pm: Remove redundant generic message index
Lijo Lazar [Tue, 12 Jan 2021 11:21:04 +0000 (19:21 +0800)]
drm/amdgpu/pm: Remove redundant generic message index

Remove SMU_MSG_GfxDriverReset generic index.
Always use SMU_MSG_GfxDeviceDriverReset as the generic index for reset.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Feifei Xu <Feifei.Xu@amd.com>
Reviewed-by: Kevin Wang <kevin1.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years ago drm/amdgpu/pm: Fix reset message mapping on aldebaran
Lijo Lazar [Tue, 12 Jan 2021 11:19:52 +0000 (19:19 +0800)]
drm/amdgpu/pm: Fix reset message mapping on aldebaran

Use the correct mapping for mode-reset messages on aldebaran

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Feifei Xu <Feifei.Xu@amd.com>
Reviewed-by: Kevin Wang <kevin1.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years ago drm/amdgpu/pm: Remove unsupported MP1 messages from aldebaran
Lijo Lazar [Mon, 11 Jan 2021 05:51:52 +0000 (13:51 +0800)]
drm/amdgpu/pm: Remove unsupported MP1 messages from aldebaran

PrepareMp1Reset and SoftReset messages are not supported on aldebaran.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Feifei Xu <Feifei.Xu@amd.com>
Reviewed-by: Kevin Wang <kevin1.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years ago drm/amdgpu: Add clock gating support for aldebaran
Lijo Lazar [Fri, 5 Mar 2021 20:58:04 +0000 (15:58 -0500)]
drm/amdgpu: Add clock gating support for aldebaran

Aldebaran clock gating support for GFX, SDMA and IH blocks.
VCN/JPEG blocks are excluded in this patch, to be enabled later.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Acked-by: Feifei Xu <Feifei.Xu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years ago drm/amdgpu: add mmhub client ids for aldebaran
Alex Deucher [Tue, 5 Jan 2021 21:21:31 +0000 (16:21 -0500)]
drm/amdgpu: add mmhub client ids for aldebaran

Add the mmhub client id table for aldebaran.

Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years ago drm/amdgpu: enable dpg indirect sram mode on aldebaran
James Zhu [Thu, 17 Dec 2020 21:18:14 +0000 (16:18 -0500)]
drm/amdgpu: enable dpg indirect sram mode on aldebaran

Enable dpg indirect sram mode on aldebaran.

Signed-off-by: James Zhu <James.Zhu@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years ago drm/amdgpu: enable vcn dpg mode on aldebaran
James Zhu [Thu, 17 Dec 2020 21:14:08 +0000 (16:14 -0500)]
drm/amdgpu: enable vcn dpg mode on aldebaran

Enable vcn dpg mode on aldebaran

Signed-off-by: James Zhu <James.Zhu@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years ago drm/amdgpu: enable vcn and jpeg on aldebaran
James Zhu [Thu, 17 Dec 2020 21:12:08 +0000 (16:12 -0500)]
drm/amdgpu: enable vcn and jpeg on aldebaran

Enable vcn and jpeg 2.6 on aldebaran.

Signed-off-by: James Zhu <James.Zhu@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years ago drm/amdgpu: Enable swsmu block on aldebaran
Lijo Lazar [Tue, 22 Dec 2020 13:43:49 +0000 (21:43 +0800)]
drm/amdgpu: Enable swsmu block on aldebaran

Enable smu13 block on aldebaran

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years ago drm/amdgpu: switch to cached noretry setting for aldebaran
Hawking Zhang [Thu, 31 Dec 2020 04:50:56 +0000 (12:50 +0800)]
drm/amdgpu: switch to cached noretry setting for aldebaran

global noretry setting now is cached to gmc.noretry

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Kevin Wang <kevin1.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years ago drm/amdkfd: Fix saving the ACC vgprs for Aldebaran
Laurent Morichetti [Tue, 22 Dec 2020 19:42:46 +0000 (11:42 -0800)]
drm/amdkfd: Fix saving the ACC vgprs for Aldebaran

get_num_acc_vgprs does not set status.scc if the number of acc vgprs
is 0, so use an and instruction to set the condition code.

The Aldebaran handler binary was not based on the latest version of
the sources, so this update to the binary is the minimal change only
adding two instructions to set the condition code.

A newer version of the handler should be generated and tested in
another commit.

Signed-off-by: Laurent Morichetti <laurent.morichetti@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years ago drm/amd/pm: Set no fan control flag as needed.
Lijo Lazar [Wed, 9 Dec 2020 13:06:16 +0000 (21:06 +0800)]
drm/amd/pm: Set no fan control flag as needed.

For GPUs that don't support fan control, set the no fan control flag so
that they don't appear in hwmon sensors.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years ago drm/amdgpu: bypass hdp read cache invalidation for aldebaran (v2)
Hawking Zhang [Tue, 22 Dec 2020 07:55:35 +0000 (15:55 +0800)]
drm/amdgpu: bypass hdp read cache invalidation for aldebaran (v2)

The hdp read cache is removed in aldebaran, so don't issue
an mmio write or write data packet to hardware.

v2: rebase

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Feifei Xu <Feifei.Xu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years ago drm/amdgpu: Aldebaran doesn't use semaphore
Amber Lin [Mon, 14 Dec 2020 20:21:23 +0000 (15:21 -0500)]
drm/amdgpu: Aldebaran doesn't use semaphore

Simplify all Aldebaran DIDs into one ASIC type.

Signed-off-by: Amber Lin <Amber.Lin@amd.com>
Reviewed-by: Kevin Wang <kevin1.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years ago drm/amdgpu: UTLC1 RB SDMA timeout on Aldebaran
Alex Sierra [Tue, 15 Dec 2020 00:15:42 +0000 (18:15 -0600)]
drm/amdgpu: UTLC1 RB SDMA timeout on Aldebaran

[Why]
This causes infinite retries on the UTCL1 RB, blocking
higher-priority RBs such as the paging RB.

[How]
Set the SDMAx_UTCL1_TIMEOUT registers to one for all SDMAs.

Signed-off-by: Alex Sierra <alex.sierra@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years ago drm/amdpgu: add ATOM_DGPU_VRAM_TYPE_HBM2E vram type
Feifei Xu [Wed, 16 Dec 2020 04:41:27 +0000 (12:41 +0800)]
drm/amdpgu: add ATOM_DGPU_VRAM_TYPE_HBM2E vram type

0x61 is assigned to HBM2E in atom_dgpu_vram_type.

Signed-off-by: Feifei Xu <Feifei.Xu@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years ago drm/amdgpu: retire aldebaran gpu_info firmware
Hawking Zhang [Mon, 16 Nov 2020 08:15:30 +0000 (16:15 +0800)]
drm/amdgpu: retire aldebaran gpu_info firmware

The driver should use the gfx_info atomfirmware interface instead.

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Feifei Xu <Feifei.Xu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years ago drm/amdgpu: query aldebaran gfx_config through atomfirmware i/f
Hawking Zhang [Fri, 13 Nov 2020 06:35:39 +0000 (14:35 +0800)]
drm/amdgpu: query aldebaran gfx_config through atomfirmware i/f

For ASICs that don't support ip discovery feature, query
gfx configuration through atomfirmware interface, rather
than gpu_info firmware.

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Feifei Xu <Feifei.Xu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years ago drm/amd/pm: Remove CPU virtual address notification in aldebaran
Lijo Lazar [Sat, 28 Nov 2020 10:09:55 +0000 (18:09 +0800)]
drm/amd/pm: Remove CPU virtual address notification in aldebaran

PPSMC_MSG_SetSystemVirtualDramAddrHigh/Low messages are not handled by
PMFW in aldebaran

Signed-off-by: Lijo Lazar <Lijo.Lazar@amd.com>
Reviewed-by: Kenneth Feng <Kenneth.Feng@amd.com>
Reviewed-by: Feifei Xu <Feifei.Xu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years ago drm/amd/pm: Add support to override pptable id for aldebaran
Lijo Lazar [Sat, 28 Nov 2020 09:31:08 +0000 (17:31 +0800)]
drm/amd/pm: Add support to override pptable id for aldebaran

Temporarily force use of the BU PPTable defined in VBIOS. Add support for
overriding the PPTable via a module parameter. Add the FW-reported version
to the kernel log.

Signed-off-by: Lijo Lazar <Lijo.Lazar@amd.com>
Reviewed-by: Kenneth Feng <Kenneth.Feng@amd.com>
Reviewed-by: Kevin Wang <kevin1.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years ago drm/amd/amdgpu: Add smu_pptable module parameter
Lijo Lazar [Sat, 28 Nov 2020 09:06:54 +0000 (17:06 +0800)]
drm/amd/amdgpu: Add smu_pptable module parameter

Temporarily add smu_pptable module parameter for aldebaran. This is used
to force soft PPTable use, overriding any VBIOS PPTable.

Signed-off-by: Lijo Lazar <Lijo.Lazar@amd.com>
Reviewed-by: Kenneth Feng <kenneth.feng@amd.com>
Reviewed-by: Kevin Wang <kevin1.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years ago drm/amd/pm: Add atom_smc_dpm_info_v4_10 for aldebaran
Lijo Lazar [Sat, 28 Nov 2020 08:38:56 +0000 (16:38 +0800)]
drm/amd/pm: Add atom_smc_dpm_info_v4_10 for aldebaran

Add atom_smc_dpm_info_v4_10 that defines board parameters for aldebaran

Signed-off-by: Lijo Lazar <Lijo.Lazar@amd.com>
Reviewed-by: Kenneth Feng <Kenneth.Feng@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years ago drm/amdgpu: Don't do FB resize under A+A config
Oak Zeng [Sun, 22 Nov 2020 03:13:19 +0000 (21:13 -0600)]
drm/amdgpu: Don't do FB resize under A+A config

Disable PCIe BAR resizing on A+A config. It's not needed because we won't use the
PCIe BAR, but it breaks the PCI BAR configuration with the current SBIOS.

Error message of FB BAR resize failure under A+A:

[  154.913731] [drm:amdgpu_device_resize_fb_bar [amdgpu]] *ERROR* Problem resizing BAR0 (-22).

Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>
Reviewed-by: Amber Lin <Amber.Lin@amd.com>
Reviewed-by: Felix Kuehling <Felix.kuehling@amd.com>
Reviewed-by: Christian Koenig <Christian.Koenig@amd.com>
Tested-by: Amber Lin <Amber.Lin@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years ago drm/amdgpu: pre-map device buffer as cached for A+A config
Oak Zeng [Sat, 21 Nov 2020 04:18:10 +0000 (22:18 -0600)]
drm/amdgpu: pre-map device buffer as cached for A+A config

For A+A configuration, device memory is supposed to be mapped as
cacheable from the CPU side. The kernel therefore pre-maps gpu device
memory using ioremap_cache.

Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>
Reviewed-by: Christian Koenig <Christian.Koenig@amd.com>
Tested-by: Amber Lin <Amber.Lin@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years ago drm/amdgpu: update atom_firmware_info_v3_4 (v2)
Feifei Xu [Tue, 8 Dec 2020 15:51:40 +0000 (23:51 +0800)]
drm/amdgpu: update atom_firmware_info_v3_4 (v2)

v1: Added some pspbl parameters
v2: fix fallthrough issue

Signed-off-by: Feifei Xu <Feifei.Xu@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Kevin Wang <kevin1.wang@amd.com>
Reviewed-by: Lazar Lijo <Lijo.Lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years ago drm/amd/pm:add aldebaran support for getting bootup values
Feifei Xu [Thu, 26 Nov 2020 11:04:51 +0000 (19:04 +0800)]
drm/amd/pm:add aldebaran support for getting bootup values

for SMU config.

Signed-off-by: Feifei Xu <Feifei.Xu@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years ago drm/amdgpu: disallow use semaphore on aldebaran
Hawking Zhang [Mon, 23 Nov 2020 21:23:36 +0000 (05:23 +0800)]
drm/amdgpu: disallow use semaphore on aldebaran

shall revisit the change later

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years ago drm/amdgpu: switch to vega20 ih block for aldebaran
Hawking Zhang [Mon, 30 Nov 2020 16:20:35 +0000 (00:20 +0800)]
drm/amdgpu: switch to vega20 ih block for aldebaran

replace vega10 ih block with vega20 ih block for
aldebaran.

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Dennis Li <Dennis.Li@amd.com>
Reviewed-by: Feifei Xu <Feifei.Xu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years ago drm/amdgpu: correct IH_CHICKEN programming for aldebaran
Hawking Zhang [Tue, 1 Dec 2020 15:50:51 +0000 (23:50 +0800)]
drm/amdgpu: correct IH_CHICKEN programming for aldebaran

For aldebaran, psp firmware won't program IH_CHICKEN.
The driver must now program it properly so that
either a bus address or a gpu virtual address works
for the ih ring.

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Dennis Li <Dennis.Li@amd.com>
Reviewed-by: Feifei Xu <Feifei.Xu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years ago drm/amdgpu: add mmhub error status query callback for aldebaran
Hawking Zhang [Thu, 19 Nov 2020 08:44:55 +0000 (16:44 +0800)]
drm/amdgpu: add mmhub error status query callback for aldebaran

The callback will be invoked to query mmea error
status when needed.

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Dennis Li <Dennis.Li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years ago drm/amdgpu: add mmhub ras error reset callback for aldebaran
Hawking Zhang [Thu, 19 Nov 2020 08:40:16 +0000 (16:40 +0800)]
drm/amdgpu: add mmhub ras error reset callback for aldebaran

The callback will be invoked to reset mmhub ras error
counters when needed.

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Dennis Li <Dennis.Li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years ago drm/amdgpu: add mmhub ras error query callback for aldebaran
Hawking Zhang [Thu, 19 Nov 2020 08:35:51 +0000 (16:35 +0800)]
drm/amdgpu: add mmhub ras error query callback for aldebaran

The callback will be invoked to harvest all kinds
of mmhub ras errors.

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Dennis Li <Dennis.Li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years ago drm/amdgpu: add sdma ras error reset callback for aldebaran
Hawking Zhang [Wed, 18 Nov 2020 16:25:09 +0000 (00:25 +0800)]
drm/amdgpu: add sdma ras error reset callback for aldebaran

The callback will be invoked to reset sdma ras error
counters when needed.

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Dennis Li <Dennis.Li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years ago drm/amdgpu: add sdma ras error query callback for aldebaran
Hawking Zhang [Wed, 18 Nov 2020 15:55:11 +0000 (23:55 +0800)]
drm/amdgpu: add sdma ras error query callback for aldebaran

The callback will be invoked to harvest all kinds
of sdma ras errors.

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Dennis Li <Dennis.Li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years ago drm/amdgpu: add sdma v4_4 ras function
Hawking Zhang [Wed, 18 Nov 2020 13:14:59 +0000 (21:14 +0800)]
drm/amdgpu: add sdma v4_4 ras function

The sdma ras function is the main structure to support
sdma ras on aldebaran. The patch initializes the late_init
and late_fini callbacks.

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Dennis Li <Dennis.Li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years ago drm/amdgpu: apply sdma golden settings for aldebaran
Hawking Zhang [Wed, 18 Nov 2020 10:28:08 +0000 (18:28 +0800)]
drm/amdgpu: apply sdma golden settings for aldebaran

perform one-time initialization for sdma registers

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Kevin Wang <kevin1.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years ago drm/amdgpu: use physical_node_id to calculate aper_base
Hawking Zhang [Tue, 17 Nov 2020 07:51:29 +0000 (15:51 +0800)]
drm/amdgpu: use physical_node_id to calculate aper_base

As with xgmi-connected gpu nodes, physical_node_id
* segment_size should be used to calculate the offset
of aper_base.

The asic type check is redundant. Once physical_node_id
and segment_size are initialized, they can be relied
on.
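
A minimal sketch of the offset calculation described above, not the
driver's actual code. The struct xgmi_info type, the hive_base argument
and the function name are hypothetical stand-ins; only physical_node_id
and segment_size come from the commit message.

```c
#include <stdint.h>

/* Hypothetical stand-in for the per-device xgmi state. */
struct xgmi_info {
    uint32_t physical_node_id; /* index of this node among connected nodes */
    uint64_t segment_size;     /* bytes of address space owned by each node */
};

/*
 * Each node owns one segment_size-sized slice of the shared address
 * space, so its aperture starts physical_node_id slices past the base.
 * No asic type check is needed: node 0 simply yields a zero offset.
 */
static uint64_t aper_base_for_node(uint64_t hive_base,
                                   const struct xgmi_info *xgmi)
{
    return hive_base + (uint64_t)xgmi->physical_node_id * xgmi->segment_size;
}
```

The multiplication is done in 64 bits so that a large segment_size
cannot overflow a 32-bit intermediate.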

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Kevin Wang <kevin1.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years ago drm/amdgpu: skip gds ras workaround for aldebaran
Hawking Zhang [Mon, 16 Nov 2020 08:00:59 +0000 (16:00 +0800)]
drm/amdgpu: skip gds ras workaround for aldebaran

There won't be any gds usage in either kernel or
pm4 anymore for aldebaran.

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Feifei Xu <Feifei.Xu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years ago drm/amdgpu: init gds for aldebaran
Hawking Zhang [Mon, 16 Nov 2020 07:54:36 +0000 (15:54 +0800)]
drm/amdgpu: init gds for aldebaran

aldebaran removed gds internal memory for atomic usage.
It only supports gws opcodes in kernel, like barrier,
semaphore, etc. There won't be usage of gds in either
kernel or pm4 packet. max_wave_id should also be marked
as deprecated for aldebaran.

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Feifei Xu <Feifei.Xu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years ago drm/amdgpu: correct vram_info for HBM2E
Feifei Xu [Mon, 30 Nov 2020 10:57:19 +0000 (18:57 +0800)]
drm/amdgpu: correct vram_info for HBM2E

correct atom_vram_info_header_v2_6 and its vram_module.

Signed-off-by: Feifei Xu <Feifei.Xu@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years ago drm/amdgpu: support get_vram_info atomfirmware i/f for aldebaran
Hawking Zhang [Fri, 13 Nov 2020 10:03:07 +0000 (18:03 +0800)]
drm/amdgpu: support get_vram_info atomfirmware i/f for aldebaran

Query vram_type, channel_num, channel_width
information through atomfirmware i/f

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Feifei Xu <Feifei.Xu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years ago drm/amdgpu:return true for mode1_reset_support on aldebaran
Feifei Xu [Thu, 19 Nov 2020 12:04:37 +0000 (20:04 +0800)]
drm/amdgpu:return true for mode1_reset_support on aldebaran

Will remove once validation finished.

Signed-off-by: Feifei Xu <Feifei.Xu@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years ago drm/amdgpu:add smu mode1/2 support for aldebaran
Feifei Xu [Thu, 19 Nov 2020 10:12:26 +0000 (18:12 +0800)]
drm/amdgpu:add smu mode1/2 support for aldebaran

Use MSG_GfxDriverReset for mode reset and retire MSG_Mode1Reset.
Centralize the soc15_asic_mode1_reset() and nv_asic_mode1_reset() functions.
Add mode2_reset_is_support() for smu->ppt_funcs.

Signed-off-by: Feifei Xu <Feifei.Xu@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years ago drm/amdgpu: Add DID for aldebaran
Feifei Xu [Thu, 12 Nov 2020 06:24:51 +0000 (14:24 +0800)]
drm/amdgpu: Add DID for aldebaran

Add 0x7408, 0x740C, 0x740F to pciidlist.
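
As a rough illustration of what such a table does, the sketch below
maps the three device IDs to a single ASIC type. The struct, enum and
lookup_asic() helper are hypothetical and much simpler than the kernel's
pci_device_id machinery; only the three DIDs come from this commit, and
0x1002 is the AMD/ATI PCI vendor ID.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical ASIC enumeration for this sketch only. */
enum asic_type { ASIC_UNKNOWN = 0, ASIC_ALDEBARAN };

/* Hypothetical, simplified ID table entry. */
struct pci_id_entry {
    uint16_t vendor;
    uint16_t device;
    enum asic_type asic;
};

static const struct pci_id_entry pciidlist[] = {
    { 0x1002, 0x7408, ASIC_ALDEBARAN },
    { 0x1002, 0x740C, ASIC_ALDEBARAN },
    { 0x1002, 0x740F, ASIC_ALDEBARAN },
};

/* Linear scan: return the ASIC type for a vendor/device pair, if listed. */
static enum asic_type lookup_asic(uint16_t vendor, uint16_t device)
{
    for (size_t i = 0; i < sizeof(pciidlist) / sizeof(pciidlist[0]); i++) {
        if (pciidlist[i].vendor == vendor && pciidlist[i].device == device)
            return pciidlist[i].asic;
    }
    return ASIC_UNKNOWN;
}
```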

Signed-off-by: Feifei Xu <Feifei.Xu@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years ago drm/amdgpu: added support for register list loading (v2)
John Clements [Wed, 18 Nov 2020 06:25:40 +0000 (14:25 +0800)]
drm/amdgpu: added support for register list loading (v2)

call host to psp cmd to load reg list

v2: update to latest interface (Alex)

Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: John Clements <john.clements@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years ago drm/amdgpu: added register list driver ctx (v2)
John Clements [Wed, 18 Nov 2020 06:24:52 +0000 (14:24 +0800)]
drm/amdgpu: added register list driver ctx (v2)

updated psp bin parsing and load register list

v2: update to latest interface (Alex)

Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: John Clements <john.clements@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years ago drm/amdgpu: updated host to psp mailbox cmd (v2)
John Clements [Wed, 18 Nov 2020 06:24:12 +0000 (14:24 +0800)]
drm/amdgpu: updated host to psp mailbox cmd (v2)

added host to psp cmd for register list

v2: update to new interface (Alex)

Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: John Clements <john.clements@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years ago drm/amdgpu: declare smuio v13_0 callbacks as static
Hawking Zhang [Mon, 7 Dec 2020 16:46:18 +0000 (00:46 +0800)]
drm/amdgpu: declare smuio v13_0 callbacks as static

fix -Wmissing-prototypes warning
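
The general pattern behind this kind of fix can be sketched as follows.
The struct and function names are hypothetical, not the real smuio v13_0
symbols. A callback reached only through a function-pointer table needs
no external linkage, and marking it static means the compiler no longer
expects a prior prototype for it.

```c
#include <stdint.h>

/* Hypothetical callback table, loosely modelled on a per-IP funcs struct. */
struct example_smuio_funcs {
    uint32_t (*get_die_id)(void);
};

/*
 * static: internal linkage, so -Wmissing-prototypes does not fire even
 * though no header declares this function.
 */
static uint32_t example_get_die_id(void)
{
    return 0; /* placeholder value for the sketch */
}

/* Only the table is exported; callers go through the pointer. */
const struct example_smuio_funcs example_smuio_funcs = {
    .get_die_id = example_get_die_id,
};
```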

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Likun Gao <Likun.Gao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years ago drm/amdgpu: initialize external rev_id for aldebaran
Hawking Zhang [Thu, 12 Nov 2020 02:34:58 +0000 (10:34 +0800)]
drm/amdgpu: initialize external rev_id for aldebaran

add external rev_id for aldebaran

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Kevin Wang <kevin1.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 years ago drm/amdgpu: declare sdma firmware binary file for aldebaran
Kevin Wang [Wed, 9 Sep 2020 05:56:44 +0000 (13:56 +0800)]
drm/amdgpu: declare sdma firmware binary file for aldebaran

declare sdma firmware binary file for aldebaran

Signed-off-by: Kevin Wang <kevin1.wang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>