Hawking Zhang [Thu, 30 Dec 2021 11:21:43 +0000 (19:21 +0800)]
drm/amdgpu: add osssys v6_0_0 ip headers v4
Add osssys v6_0_0 register offset and shift masks
header files (Hawking)
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Likun Gao <Likun.Gao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Likun Gao [Mon, 4 Apr 2022 20:55:18 +0000 (16:55 -0400)]
drm/amdgpu/discovery: add NBIO 4.3 Support
Enable NBIO 4.3 on asics where it is present.
Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Stanley.Yang [Mon, 4 Apr 2022 21:28:13 +0000 (17:28 -0400)]
drm/amdgpu: add nbio v4_3_0 ip block v2
This adds nbio v4_3_0 ip block support
Changed from v1:
use WREG32_SOC15/RREG32_SOC15 instead of
WREG32_PCIE/RREG32_PCIE
remove the programming of PCIE_CONFIG_CNTL
Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Hawking Zhang [Thu, 30 Dec 2021 11:18:08 +0000 (19:18 +0800)]
drm/amdgpu: add nbio v4_3_0 ip headers v6
Add nbio v4_3_0 register offset and shift masks
header files (Hawking)
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Likun Gao <Likun.Gao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Likun Gao [Mon, 4 Apr 2022 21:04:55 +0000 (17:04 -0400)]
drm/amdgpu/discovery: add soc21 common Support
Enable soc21 common support on asics where it is present.
Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Harry Wentland [Tue, 19 Apr 2022 17:03:12 +0000 (13:03 -0400)]
drm/amd/display: Avoid reading audio pattern past AUDIO_CHANNELS_COUNT
A faulty receiver might report an erroneous channel count. We
should guard against reading beyond AUDIO_CHANNELS_COUNT as
that would overflow the dpcd_pattern_period array.
Signed-off-by: Harry Wentland <harry.wentland@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Philip Yang [Wed, 27 Apr 2022 22:50:49 +0000 (18:50 -0400)]
drm/amdgpu: Free user pages if amdgpu_cs_parser_bos failed
Otherwise userspace resubmit the BOs again will trigger kernel WARNING
and fail the command submission.
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Tested-by: Robert Święcki <robert@swiecki.net>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Candice Li [Wed, 27 Apr 2022 10:02:45 +0000 (18:02 +0800)]
drm/amdgpu: Fix build warning for TA debugfs interface
Remove the redundant codes to fix build warning
when CONFIG_DEBUG_FS is disabled.
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Candice Li <candice.li@amd.com>
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Stanley.Yang [Fri, 1 Apr 2022 18:41:06 +0000 (14:41 -0400)]
drm/amdgpu: add soc21 common ip block v2
This adds soc21 common ip block support
Changed from v1:
Switch WREG32/RREG32_PCIE to use indirect reg access
helper for sco15 and onwards
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Stanley.Yang [Wed, 4 Aug 2021 07:43:17 +0000 (15:43 +0800)]
drm/amdgpu: add new write field for soc21
add new write field macro to handle soc21
registers with reg prefix
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Hawking Zhang [Sat, 8 Jan 2022 09:18:37 +0000 (17:18 +0800)]
drm/amdgpu: add nbio callback to query rom offset
Add nbio callback func used to query rom offset.
Used to query the rom offset for fetching the vbios.
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Hawking Zhang [Thu, 30 Dec 2021 08:56:09 +0000 (16:56 +0800)]
drm/amdgpu: add gc v11_0_0 ip headers v11
Add gc v11_0_0 register offset and shift masks
header files (Hawking)
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Likun Gao <Likun.Gao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Hawking Zhang [Thu, 30 Dec 2021 11:12:28 +0000 (19:12 +0800)]
drm/amdgpu: add mp v13_0_0 ip headers v7
Add mp v13_0_0 register offset and shift masks
header files (Hawking)
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Likun Gao <Likun.Gao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Hawking Zhang [Sun, 23 Jan 2022 10:47:47 +0000 (18:47 +0800)]
drm/amdgpu: update query ref clk from bios
Handle atom_gfx_info_v3_0 structure.
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Likun Gao <Likun.Gao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Hawking Zhang [Thu, 7 Apr 2022 05:49:55 +0000 (01:49 -0400)]
drm/amdgpu: update gc info from bios table
Handle newer gc info tables.
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Likun Gao <Likun.Gao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Hawking Zhang [Sun, 23 Jan 2022 09:40:30 +0000 (17:40 +0800)]
drm/amdgpu: add atom_gfx_info_v3_0 structure
atomfirmware table used for newer gfx IPs.
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Likun Gao <Likun.Gao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Hawking Zhang [Sat, 5 Feb 2022 09:24:11 +0000 (17:24 +0800)]
drm/amdgpu: support query vram_info v3_0
vram_info table provides various vram information
including vram_vendor, vram_type, vram_width, etc.
v2: correct the calculation of vram_width
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Likun Gao <Likun.Gao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Hawking Zhang [Sat, 5 Feb 2022 07:50:18 +0000 (15:50 +0800)]
drm/amdgpu: add vram_info v3_0 structure
To support query vram_width, vram_type, vram_vendor
information
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Likun Gao <Likun.Gao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Hawking Zhang [Mon, 28 Feb 2022 10:35:02 +0000 (18:35 +0800)]
drm/amdgpu: switch to atomfirmware_asic_init
Some initial settings now are not available from
the atom data table. The assumption that !ps[0]
|| !ps[1] in amdgpu_atom_asic_init is not valid.
In addition, driver needs to strictly follow
atomfirmware structure (asic_init_parameters) to
initialize parameters used to execute asic_init
function, otherwise, the execution of asic_init
would fail.
This shall be applicable to all soc15 adapters,but
let make the transition on soc21 first.
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Hawking Zhang [Tue, 1 Mar 2022 08:29:21 +0000 (16:29 +0800)]
drm/amdgpu: add helper to execute atomfirmware asic_init
Add helper function to execute atomfirmware asic_init
from the cmd table
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Deucher [Wed, 30 Mar 2022 05:09:48 +0000 (01:09 -0400)]
drm/amdgpu/discovery: move all table parsing into amdgpu_discovery.c
This data has no dependencies, so encapsulate it all within
amdgpu_discovery.c.
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Deucher [Wed, 30 Mar 2022 04:48:25 +0000 (00:48 -0400)]
drm/amdgpu/discovery: add a function to parse the vcn info table
To get the codec disable fuse mask.
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Deucher [Wed, 30 Mar 2022 05:23:33 +0000 (01:23 -0400)]
drm/amdgpu/discovery: add additional validation
Check the table signatures and checksums and verify that
the tables exist before accessing them.
v2: disable MALL table for now
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Deucher [Tue, 29 Mar 2022 21:23:49 +0000 (17:23 -0400)]
drm/amdgpu/discovery: add a function to get the mall_size
Add a function to fetch the mall size from the IP discovery
table. Properly handle harvest configurations where more
or less cache may be available.
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Deucher [Tue, 29 Mar 2022 22:04:15 +0000 (18:04 -0400)]
drm/amdgpu/discovery: handle UMC harvesting in IP discovery
Check the harvesting table to determing if any UMC blocks have
been harvested.
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Deucher [Thu, 31 Mar 2022 22:12:47 +0000 (18:12 -0400)]
drm/amdgpu/discovery: store the number of UMC IPs on the asic
For chips with IP discovery get this from the table,
hardcode it for older asics.
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Deucher [Thu, 31 Mar 2022 22:11:51 +0000 (18:11 -0400)]
drm/amdgpu: store the mall size in the gmc structure
This will be useful in future patches.
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Deucher [Wed, 30 Mar 2022 05:21:34 +0000 (01:21 -0400)]
drm/amdgpu/discovery: fix byteswapping in gc info parsing
The table is in little endian format.
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Guchun Chen [Wed, 27 Apr 2022 07:51:02 +0000 (15:51 +0800)]
drm/amdgpu: disable runtime pm on several sienna cichlid cards(v2)
Disable runtime power management on several sienna cichlid
cards, otherwise SMU will possibly fail to be resumed from
runtime suspend. Will drop this after a clean solution between
kernel driver and SMU FW is available.
amdgpu 0000:63:00.0: amdgpu: GECC is enabled
amdgpu 0000:63:00.0: amdgpu: SECUREDISPLAY: securedisplay ta ucode is not available
amdgpu 0000:63:00.0: amdgpu: SMU is resuming...
amdgpu 0000:63:00.0: amdgpu: SMU: I'm not done with your command: SMN_C2PMSG_66:0x0000000E SMN_C2PMSG_82:0x00000080
amdgpu 0000:63:00.0: amdgpu: Failed to SetDriverDramAddr!
amdgpu 0000:63:00.0: amdgpu: Failed to setup smc hw!
[drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR* resume of IP block <smu> failed -62
amdgpu 0000:63:00.0: amdgpu: amdgpu_device_ip_resume failed (-62)
v2: seperate to a function.
Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Deucher [Tue, 29 Mar 2022 18:40:28 +0000 (14:40 -0400)]
drm/amdgpu/discovery: populate additional GC info
From the GC info table to the gfx config structure in the
driver. The driver will use this data to configure the
card correctly.
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Deucher [Tue, 29 Mar 2022 18:15:46 +0000 (14:15 -0400)]
drm/amdgpu: update latest IP discovery table structures
Add new tables.
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Likun Gao [Fri, 10 Dec 2021 07:05:57 +0000 (15:05 +0800)]
drm/amdgpu: add function to decode ip version
Add function to decode IP version.
Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Likun Gao [Thu, 7 Nov 2019 08:27:16 +0000 (16:27 +0800)]
drm/amdgpu: increase HWIP MAX INSTANCE
Extend HWIP MAX INSTANCE to 11.
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Marek Marczykowski-Górecki [Tue, 26 Apr 2022 23:57:15 +0000 (01:57 +0200)]
drm/amdgpu: do not use passthrough mode in Xen dom0
While technically Xen dom0 is a virtual machine too, it does have
access to most of the hardware so it doesn't need to be considered a
"passthrough". Commit
b818a5d37454 ("drm/amdgpu/gmc: use PCI BARs for
APUs in passthrough") changed how FB is accessed based on passthrough
mode. This breaks amdgpu in Xen dom0 with message like this:
[drm:dc_dmub_srv_wait_idle [amdgpu]] *ERROR* Error waiting for DMUB idle: status=3
While the reason for this failure is unclear, the passthrough mode is
not really necessary in Xen dom0 anyway. So, to unbreak booting affected
kernels, disable passthrough mode in this case.
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/1985
Fixes: b818a5d37454 ("drm/amdgpu/gmc: use PCI BARs for APUs in passthrough")
Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Evan Quan [Mon, 25 Apr 2022 02:16:46 +0000 (10:16 +0800)]
drm/amd/pm: fix the compile warning
Fix the compile warning below:
drivers/gpu/drm/amd/amdgpu/../pm/legacy-dpm/kv_dpm.c:1641
kv_get_acp_boot_level() warn: always true condition '(table->entries[i]->clk >= 0) => (0-u32max >= 0)'
Reported-by: kernel test robot <lkp@intel.com>
CC: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Mukul Joshi [Fri, 22 Apr 2022 15:39:20 +0000 (11:39 -0400)]
drm/amdkfd: Fix circular lock dependency warning
[ 168.544078] ======================================================
[ 168.550309] WARNING: possible circular locking dependency detected
[ 168.556523] 5.16.0-kfd-fkuehlin #148 Tainted: G E
[ 168.562558] ------------------------------------------------------
[ 168.568764] kfdtest/3479 is trying to acquire lock:
[ 168.573672]
ffffffffc0927a70 (&topology_lock){++++}-{3:3}, at:
kfd_topology_device_by_id+0x16/0x60 [amdgpu] [ 168.583663]
but task is already holding lock:
[ 168.589529]
ffff97d303dee668 (&mm->mmap_lock#2){++++}-{3:3}, at:
vm_mmap_pgoff+0xa9/0x180 [ 168.597755]
which lock already depends on the new lock.
[ 168.605970]
the existing dependency chain (in reverse order) is:
[ 168.613487]
-> #3 (&mm->mmap_lock#2){++++}-{3:3}:
[ 168.619700] lock_acquire+0xca/0x2e0
[ 168.623814] down_read+0x3e/0x140
[ 168.627676] do_user_addr_fault+0x40d/0x690
[ 168.632399] exc_page_fault+0x6f/0x270
[ 168.636692] asm_exc_page_fault+0x1e/0x30
[ 168.641249] filldir64+0xc8/0x1e0
[ 168.645115] call_filldir+0x7c/0x110
[ 168.649238] ext4_readdir+0x58e/0x940
[ 168.653442] iterate_dir+0x16a/0x1b0
[ 168.657558] __x64_sys_getdents64+0x83/0x140
[ 168.662375] do_syscall_64+0x35/0x80
[ 168.666492] entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 168.672095]
-> #2 (&type->i_mutex_dir_key#6){++++}-{3:3}:
[ 168.679008] lock_acquire+0xca/0x2e0
[ 168.683122] down_read+0x3e/0x140
[ 168.686982] path_openat+0x5b2/0xa50
[ 168.691095] do_file_open_root+0xfc/0x190
[ 168.695652] file_open_root+0xd8/0x1b0
[ 168.702010] kernel_read_file_from_path_initns+0xc4/0x140
[ 168.709542] _request_firmware+0x2e9/0x5e0
[ 168.715741] request_firmware+0x32/0x50
[ 168.721667] amdgpu_cgs_get_firmware_info+0x370/0xdd0 [amdgpu]
[ 168.730060] smu7_upload_smu_firmware_image+0x53/0x190 [amdgpu]
[ 168.738414] fiji_start_smu+0xcf/0x4e0 [amdgpu]
[ 168.745539] pp_dpm_load_fw+0x21/0x30 [amdgpu]
[ 168.752503] amdgpu_pm_load_smu_firmware+0x4b/0x80 [amdgpu]
[ 168.760698] amdgpu_device_fw_loading+0xb8/0x140 [amdgpu]
[ 168.768412] amdgpu_device_init.cold+0xdf6/0x1716 [amdgpu]
[ 168.776285] amdgpu_driver_load_kms+0x15/0x120 [amdgpu]
[ 168.784034] amdgpu_pci_probe+0x19b/0x3a0 [amdgpu]
[ 168.791161] local_pci_probe+0x40/0x80
[ 168.797027] work_for_cpu_fn+0x10/0x20
[ 168.802839] process_one_work+0x273/0x5b0
[ 168.808903] worker_thread+0x20f/0x3d0
[ 168.814700] kthread+0x176/0x1a0
[ 168.819968] ret_from_fork+0x1f/0x30
[ 168.825563]
-> #1 (&adev->pm.mutex){+.+.}-{3:3}:
[ 168.834721] lock_acquire+0xca/0x2e0
[ 168.840364] __mutex_lock+0xa2/0x930
[ 168.846020] amdgpu_dpm_get_mclk+0x37/0x60 [amdgpu]
[ 168.853257] amdgpu_amdkfd_get_local_mem_info+0xba/0xe0 [amdgpu]
[ 168.861547] kfd_create_vcrat_image_gpu+0x1b1/0xbb0 [amdgpu]
[ 168.869478] kfd_create_crat_image_virtual+0x447/0x510 [amdgpu]
[ 168.877884] kfd_topology_add_device+0x5c8/0x6f0 [amdgpu]
[ 168.885556] kgd2kfd_device_init.cold+0x385/0x4c5 [amdgpu]
[ 168.893347] amdgpu_amdkfd_device_init+0x138/0x180 [amdgpu]
[ 168.901177] amdgpu_device_init.cold+0x141b/0x1716 [amdgpu]
[ 168.909025] amdgpu_driver_load_kms+0x15/0x120 [amdgpu]
[ 168.916458] amdgpu_pci_probe+0x19b/0x3a0 [amdgpu]
[ 168.923442] local_pci_probe+0x40/0x80
[ 168.929249] work_for_cpu_fn+0x10/0x20
[ 168.935008] process_one_work+0x273/0x5b0
[ 168.940944] worker_thread+0x20f/0x3d0
[ 168.946623] kthread+0x176/0x1a0
[ 168.951765] ret_from_fork+0x1f/0x30
[ 168.957277]
-> #0 (&topology_lock){++++}-{3:3}:
[ 168.965993] check_prev_add+0x8f/0xbf0
[ 168.971613] __lock_acquire+0x1299/0x1ca0
[ 168.977485] lock_acquire+0xca/0x2e0
[ 168.982877] down_read+0x3e/0x140
[ 168.987975] kfd_topology_device_by_id+0x16/0x60 [amdgpu]
[ 168.995583] kfd_device_by_id+0xa/0x20 [amdgpu]
[ 169.002180] kfd_mmap+0x95/0x200 [amdgpu]
[ 169.008293] mmap_region+0x337/0x5a0
[ 169.013679] do_mmap+0x3aa/0x540
[ 169.018678] vm_mmap_pgoff+0xdc/0x180
[ 169.024095] ksys_mmap_pgoff+0x186/0x1f0
[ 169.029734] do_syscall_64+0x35/0x80
[ 169.035005] entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 169.041754]
other info that might help us debug this:
[ 169.053276] Chain exists of:
&topology_lock --> &type->i_mutex_dir_key#6 --> &mm->mmap_lock#2
[ 169.068389] Possible unsafe locking scenario:
[ 169.076661] CPU0 CPU1
[ 169.082383] ---- ----
[ 169.088087] lock(&mm->mmap_lock#2);
[ 169.092922] lock(&type->i_mutex_dir_key#6);
[ 169.100975] lock(&mm->mmap_lock#2);
[ 169.108320] lock(&topology_lock);
[ 169.112957]
*** DEADLOCK ***
This commit fixes the deadlock warning by ensuring pm.mutex is not
held while holding the topology lock. For this, kfd_local_mem_info
is moved into the KFD dev struct and filled during device init.
This cached value can then be used instead of querying the value
again and again.
Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Mukul Joshi [Wed, 20 Apr 2022 15:34:53 +0000 (11:34 -0400)]
drm/amdkfd: Fix updating IO links during device removal
The logic to update the IO links when a KFD device
is removed was not correct as it would miss updating
the proximity domain values for some nodes where the
node_from and node_to both were greater values than the
proximity domain value of the KFD device being removed
from topology.
Fixes: 46d18d510d7831 ("drm/amdkfd: Cleanup IO links during KFD device removal")
Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Christophe JAILLET [Sun, 28 Nov 2021 16:46:15 +0000 (17:46 +0100)]
drm/amdkfd: Use non-atomic bitmap functions when possible
All uses of the 'kfd->gtt_sa_bitmap' bitmap are protected with the
'kfd->gtt_sa_lock' mutex.
So:
- prefer the non-atomic '__set_bit()' function
- use the non-atomic 'bitmap_[set|clear]()' functions instead of
equivalent 'for' loops. These functions can work on several bits at a
time
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Christophe JAILLET [Sun, 28 Nov 2021 16:45:55 +0000 (17:45 +0100)]
drm/amdkfd: Use bitmap_zalloc() when applicable
'kfd->gtt_sa_bitmap' is a bitmap. So use 'bitmap_zalloc()' to simplify
code, improve the semantic and avoid some open-coded arithmetic in
allocator arguments.
Also change the corresponding 'kfree()' into 'bitmap_free()' to keep
consistency.
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Melissa Wen [Wed, 30 Mar 2022 23:02:04 +0000 (22:02 -0100)]
drm/amd/display: protect remaining FPU-code calls on dcn3.1.x
From [1], I realized two other calls to dcn30 code are associated with
FPU operations and are not protected by DC_FP_* macros:
* dcn30_populate_dml_writeback_from_context()
* dcn30_set_mcif_arb_params()
So, since FPU-associated code is not fully isolated in dcn30, and
dcn3.1.x reuses them, let's wrap their calls properly.
Note: this patch complements the fix from [1].
[1] https://lore.kernel.org/amd-gfx/
20220329082957.
1662655-1-chandan.vurdigerenataraj@amd.com/
Reviewed-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Melissa Wen <mwen@igalia.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
oushixiong [Tue, 26 Apr 2022 10:06:16 +0000 (18:06 +0800)]
drm/amd: Fix spelling typo in comment
Signed-off-by: oushixiong <oushixiong@kylinos.cn>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Rongguang Wei [Tue, 26 Apr 2022 08:15:29 +0000 (16:15 +0800)]
drm/amdgpu: fix typo
Fix spelling mistake:
"differnt" -> "different"
"commond" -> "common"
Signed-off-by: Rongguang Wei <weirongguang@kylinos.cn>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Dan Carpenter [Tue, 26 Apr 2022 08:49:20 +0000 (11:49 +0300)]
drm/amdgpu: debugfs: fix NULL dereference in ta_if_invoke_debugfs_write()
If the kzalloc() fails then this code will crash. Return -ENOMEM instead.
Fixes: e50d9ba0d2cd ("drm/amdgpu: Add debugfs TA load/unload/invoke support")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Dan Carpenter [Tue, 26 Apr 2022 08:48:03 +0000 (11:48 +0300)]
drm/amdgpu: debugfs: fix error codes in write functions
There are two error code bugs here. The copy_to/from_user() functions
return the number of bytes remaining (a positive number). We should
return -EFAULT if the copy fails.
Second if we fail because "context.resp_status" is non-zero then return
-EINVAL instead of zero.
Fixes: e50d9ba0d2cd ("drm/amdgpu: Add debugfs TA load/unload/invoke support")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Zhenneng Li [Tue, 26 Apr 2022 08:49:59 +0000 (16:49 +0800)]
gpu/drm/radeon: Fix typo in comments
Signed-off-by: Zhenneng Li <lizhenneng@kylinos.cn>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
David Zhang [Mon, 25 Apr 2022 18:33:16 +0000 (14:33 -0400)]
drm/amd: add dc feature mask flags for PSR allow smu and multi-display optimizations
[Why]
Allow for PSR SMU optimization and PSR multiple display optimization.
[How]
Add feature flags of PSR smu optimization and PSR multiple display
optimiztaion, and set them during init sequence. By default, flags
are disabled.
Signed-off-by: David Zhang <dingchen.zhang@amd.com>
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Reviewed-by: Roman Li <Roman.Li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Prike Liang [Tue, 19 Apr 2022 09:22:34 +0000 (17:22 +0800)]
drm/amdgpu: keep mmhub clock gating being enabled during s2idle suspend
Without MMHUB clock gating being enabled then MMHUB will not disconnect
from DF and will result in DF C-state entry can't be accessed during S2idle
suspend, and eventually s0ix entry will be blocked.
Signed-off-by: Prike Liang <Prike.Liang@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Guo Zhengkui [Sun, 24 Apr 2022 09:06:23 +0000 (17:06 +0800)]
drm/amd/display: fix if == else warning
Fix the following coccicheck warning:
drivers/gpu/drm/amd/display/dc/dcn201/dcn201_hwseq.c:98:8-10:
WARNING: possible condition with no effect (if == else)
Reviewed-by: Zhan Liu <zhan.liu@amd.com>
Signed-off-by: Guo Zhengkui <guozhengkui@vivo.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Deucher [Mon, 25 Apr 2022 16:05:32 +0000 (12:05 -0400)]
drm/amdgpu/display: Make dcn31_set_low_power_state static
It's not used outside of dcn31_clk_mgr.c.
Reported-by: kernel test robot <lkp@intel.com>
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Haohui Mai [Mon, 25 Apr 2022 08:56:05 +0000 (16:56 +0800)]
drm/amdgpu: Fix out-of-bound access for gfx_v10_0_ring_test_ib()
The gfx_v10_0_ring_test_ib() function uses 20 bytes instead of 16
bytes during the test. The patch sets the size of the allocation to be
4-byte larger to match the actual usage.
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Haohui Mai <ricetons@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Haohui Mai [Mon, 25 Apr 2022 12:23:38 +0000 (20:23 +0800)]
drm/amdgpu/sdma: Remove redundant lower_32_bits() calls when settings SDMA doorbell
Updated the patch for the pre-vega hardware. I kept the clamping code
to be safe.
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Haohui Mai <ricetons@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Haohui Mai [Mon, 25 Apr 2022 12:41:38 +0000 (20:41 +0800)]
drm/amdgpu/sdma: Fix incorrect calculations of the wptr of the doorbells
This patch fixes the issue where the driver miscomputes the 64-bit
values of the wptr of the SDMA doorbell when initializing the
hardware. SDMA engines v4 and later on have full 64-bit registers for
wptr thus they should be set properly.
Older generation hardwares like CIK / SI have only 16 / 20 / 24bits
for the WPTR, where the calls of lower_32_bits() will be removed in a
following patch.
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Haohui Mai <ricetons@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Tom Rix [Sat, 23 Apr 2022 20:01:55 +0000 (16:01 -0400)]
drm/radeon: change cac_weights_* to static
Sparse reports these issues
si_dpm.c:332:26: warning: symbol 'cac_weights_pitcairn' was not declared. Should it be static?
si_dpm.c:1088:26: warning: symbol 'cac_weights_oland' was not declared. Should it be static?
Both of these variables are only used in si_dpm.c. Single file variables
should be static, so change their storage-class specifiers to static.
Signed-off-by: Tom Rix <trix@redhat.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Tom Rix [Sat, 23 Apr 2022 13:44:02 +0000 (09:44 -0400)]
drm/radeon: change cik_default_state table from global to static
Sparse reports these issues
cik_blit_shaders.c:31:11: warning: symbol 'cik_default_state' was not declared. Should it be static?
cik_blit_shaders.c:246:11: warning: symbol 'cik_default_size' was not declared. Should it be static?
cik_default_state and cik_default_size are only used in cik.c. Single file symbols
should be static. So move their definitions to cik_blit_shaders.h and change their
storage-class-specifier to static.
Remove unneeded cik_blit_shader.c
Signed-off-by: Tom Rix <trix@redhat.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Randy Dunlap [Sat, 23 Apr 2022 01:29:43 +0000 (18:29 -0700)]
drm/amd/display: fix non-kernel-doc comment warnings
Fix kernel-doc warnings for a comment that should not use
kernel-doc notation:
dmub_psr.c:235: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst
* Set PSR power optimization flags.
dmub_psr.c:235: warning: missing initial short description on line:
* Set PSR power optimization flags.
Fixes: e5dfcd272722 ("drm/amd/display: dc_link_set_psr_allow_active refactoring")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Reported-by: kernel test robot <lkp@intel.com>
Cc: Robin Chen <po-tchen@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Anthony Koo <Anthony.Koo@amd.com>
Cc: amd-gfx@lists.freedesktop.org
Cc: dri-devel@lists.freedesktop.org
Cc: David Airlie <airlied@linux.ie>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Harry Wentland <harry.wentland@amd.com>
Cc: Leo Li <sunpeng.li@amd.com>
Cc: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Philip Yang [Tue, 19 Apr 2022 01:38:43 +0000 (21:38 -0400)]
drm/amdkfd: Update mapping if range attributes changed
Change SVM range mapping flags or access attributes don't trigger
migration, if range is already mapped on GPUs we should update GPU
mapping and pass flush_tlb flag true to amdgpu vm.
Change SVM range preferred_loc or migration granularity don't need
update GPU mapping, skip the validate_and_map.
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Philip Yang [Tue, 19 Apr 2022 01:32:14 +0000 (21:32 -0400)]
drm/amdkfd: Add SVM range mapped_to_gpu flag
To avoid unnecessary unmap SVM range from GPUs if range is not mapped on
GPUs when migrating the range. This flag will also be used to flush TLB
when updating the existing mapping on GPUs.
It is protected by prange->migrate_mutex and mmap read lock in MMU
notifier callback.
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Aric Cyr [Mon, 18 Apr 2022 05:12:10 +0000 (01:12 -0400)]
drm/amd/display: 3.2.183
This version brings along following fixes:
- Keep tracking of DSC packed PPS for future use
- Maintain current link settings in link loss interrupt
- Remove DDC write and read size check
- Read PSR-SU cap DPCD for specific panel
- Don't pass HostVM by default on DCN3.1
- Reset cached PSR parameters after hibernate
- Add audio readback registers
- Update dcn315 clk table read
Acked-by: Tom Chung <chiahsuan.chung@amd.com>
Signed-off-by: Aric Cyr <aric.cyr@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Ilya Bakoulin [Mon, 14 Mar 2022 21:59:01 +0000 (17:59 -0400)]
drm/amd/display: Keep track of DSC packed PPS
[Why]
Store current packed PPS data in dc_stream_state for future use.
Reviewed-by: Aric Cyr <Aric.Cyr@amd.com>
Acked-by: Tom Chung <chiahsuan.chung@amd.com>
Signed-off-by: Ilya Bakoulin <Ilya.Bakoulin@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Dillon Varone [Wed, 13 Apr 2022 21:54:19 +0000 (17:54 -0400)]
drm/amd/display: Remove unused integer
Integer no longer needed.
Reviewed-by: Martin Leung <Martin.Leung@amd.com>
Acked-by: Tom Chung <chiahsuan.chung@amd.com>
Signed-off-by: Dillon Varone <dillon.varone@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Gary Li [Thu, 14 Apr 2022 13:01:48 +0000 (09:01 -0400)]
drm/amd/display: Maintain current link settings in link loss interrupt
[Why]
DP compliance test case 400.3.2.3 is failed because in link loss interrupt
the current link settings is not used in the DP link training.
[How]
In link loss interrupt, use the current link settings in the following DP
link training.
Reviewed-by: Wenjing Liu <Wenjing.Liu@amd.com>
Acked-by: Tom Chung <chiahsuan.chung@amd.com>
Signed-off-by: Gary Li <garyli12@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Leo Ma [Wed, 23 Mar 2022 16:00:53 +0000 (12:00 -0400)]
drm/amd/display: Remove ddc write and read size checking
[Why]
Customer found I2C over AUX using ADL_Display_DDCBlockAccess_Get
will fail when sending more than 256 bytes of data;
[How]
Remove the write and read size checking to allow sending data more
than 256 bytes;
Reviewed-by: Martin Leung <Martin.Leung@amd.com>
Acked-by: Tom Chung <chiahsuan.chung@amd.com>
Signed-off-by: Leo Ma <hanghong.ma@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
David Zhang [Mon, 11 Apr 2022 20:04:39 +0000 (16:04 -0400)]
drm/amd/display: read PSR-SU cap DPCD for specific panel
[why & how]
For some specific eDP panel, we'd check the PSR-SU cap during boot
by reading the vendor specific DPCD, otherwise it will cause to
false report the eDP panel which supports PSR-SU as an non-PSR-SU
panel.
- add the vendor specific DPCD address in ddc_service_types header
- if specific eDP panel detected, check vendor specific DPCD for
PSR-SU cap
Reviewed-by: Aurabindo Jayamohanan Pillai <Aurabindo.Pillai@amd.com>
Acked-by: Tom Chung <chiahsuan.chung@amd.com>
Signed-off-by: David Zhang <dingchen.zhang@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Michael Strauss [Tue, 5 Apr 2022 19:56:04 +0000 (15:56 -0400)]
drm/amd/display: Don't pass HostVM by default on DCN3.1
[WHY]
Roll back previous change to stop passing this value by default, instead
add a debug flag to override to previous behaviour (or force HostVM calcs)
Reviewed-by: Nicholas Kazlauskas <Nicholas.Kazlauskas@amd.com>
Acked-by: Tom Chung <chiahsuan.chung@amd.com>
Signed-off-by: Michael Strauss <michael.strauss@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Evgenii Krasnikov [Tue, 5 Apr 2022 14:59:27 +0000 (10:59 -0400)]
drm/amd/display: Reset cached PSR parameters after hibernate
[WHY]
After hibernate system might be using old invalid psr_power_opt and
psr_allow_active that never get reset
[HOW]
Reset cached Panel Self Refresh parameters when PSR is first configured
for eDP in dc_link_setup_psr.
Reviewed-by: Harry Vanzylldejong <harry.vanzylldejong@amd.com>
Acked-by: Tom Chung <chiahsuan.chung@amd.com>
Signed-off-by: Evgenii Krasnikov <Evgenii.Krasnikov@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Ilya Bakoulin [Mon, 28 Mar 2022 21:43:52 +0000 (17:43 -0400)]
drm/amd/display: Add Audio readback registers
[Why]
Can be useful for verifying the correctness of audio output.
Reviewed-by: Aric Cyr <Aric.Cyr@amd.com>
Acked-by: Tom Chung <chiahsuan.chung@amd.com>
Signed-off-by: Ilya Bakoulin <Ilya.Bakoulin@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Dmytro Laktyushkin [Fri, 8 Apr 2022 14:10:04 +0000 (10:10 -0400)]
drm/amd/display: update dcn315 clk table read
Clean up the sequence by making sure clk_mgr always builds a
reasonable clock table regardless of what we read from smu
by moving all defaults from resource soc struct to clk_mgr.
Now the only thing resource soc update does is read
the clock table and apply any DC specific policy decisions
to how clocks are populated in dml soc.
Reviewed-by: Qingqing Zhuo <Qingqing.Zhuo@amd.com>
Acked-by: Tom Chung <chiahsuan.chung@amd.com>
Signed-off-by: Dmytro Laktyushkin <Dmytro.Laktyushkin@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Aric Cyr [Mon, 11 Apr 2022 01:53:26 +0000 (21:53 -0400)]
drm/amd/display: 3.2.182
This version brings along following improvements:
- Fix HDCP QUERY Error for eDP and Tiled
- Insert smu busy status before sending another request
Acked-by: Tom Chung <chiahsuan.chung@amd.com>
Signed-off-by: Aric Cyr <aric.cyr@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Mustapha Ghaddar [Thu, 7 Apr 2022 13:29:16 +0000 (09:29 -0400)]
drm/amd/display: Fix HDCP QUERY Error for eDP and Tiled
[WHY]
For dio_output_encoder ID we are relying on SW concept which is
invisible to HW
[HOW]
Needed to create separate cases for when DPIA and non DPIA for
dio link encoder ID
Reviewed-by: Wenjing Liu <Wenjing.Liu@amd.com>
Reviewed-by: James Zhang <james.zhang@amd.com>
Acked-by: Tom Chung <chiahsuan.chung@amd.com>
Signed-off-by: Mustapha Ghaddar <mghaddar@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Oliver Logush [Fri, 1 Apr 2022 14:40:30 +0000 (10:40 -0400)]
drm/amd/display: Insert smu busy status before sending another request
[why]
Need to check if result register is busy before sending another request
[how]
Call method to check if result register is busy
Reviewed-by: Charlene Liu <Charlene.Liu@amd.com>
Acked-by: Tom Chung <chiahsuan.chung@amd.com>
Signed-off-by: Oliver Logush <oliver.logush@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Felix Kuehling [Thu, 7 Apr 2022 22:53:56 +0000 (18:53 -0400)]
drm/amdkfd: Ignore bogus signals from MEC efficiently
MEC firmware sometimes sends signal interrupts without a valid context ID
on end of pipe events that don't intend to signal any HSA signals.
This triggers the slow path in kfd_signal_event_interrupt that scans the
entire event page for signaled events. Detect these signals in the top
half interrupt handler to stop processing them as early as possible.
Because we now always treat event ID 0 as invalid, reserve that ID during
process initialization.
v2: Update firmware version checks to support more GPUs
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Philip Yang <Philip.Yang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Haowen Bai [Fri, 22 Apr 2022 06:03:57 +0000 (14:03 +0800)]
drm/amdgpu: Remove useless kfree
After alloc fail, we do not need to kfree.
Signed-off-by: Haowen Bai <baihaowen@meizu.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
David Yu [Fri, 22 Apr 2022 14:43:41 +0000 (10:43 -0400)]
drm/amdgpu: Ta fw needs to be loaded for SRIOV aldebaran
Load ta fw during psp_init_sriov_microcode to enable XGMI.
It is required to be loaded by both guest and host starting
from Arcturus. Cap fw needs to be loaded first.
Signed-off-by: David Yu <David.Yu@amd.com>
Reviewed-by: Shaoyun.liu <Shaoyun.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Evan Quan [Fri, 8 Apr 2022 11:51:34 +0000 (19:51 +0800)]
drm/amd/pm: fix the deadlock issue observed on SI
The adev->pm.mutx is already held at the beginning of
amdgpu_dpm_compute_clocks/amdgpu_dpm_enable_uvd/amdgpu_dpm_enable_vce.
But on their calling path, amdgpu_display_bandwidth_update will be
called and thus its sub functions amdgpu_dpm_get_sclk/mclk. They
will then try to acquire the same adev->pm.mutex and deadlock will
occur.
By placing amdgpu_display_bandwidth_update outside of adev->pm.mutex
protection(considering logically they do not need such protection) and
restructuring the call flow accordingly, we can eliminate the deadlock
issue. This comes with no real logics change.
Fixes: 3712e7a49459 ("drm/amd/pm: unified lock protections in amdgpu_dpm.c")
Reported-by: Paul Menzel <pmenzel@molgen.mpg.de>
Reported-by: Arthur Marsh <arthur.marsh@internode.on.net>
Link: https://lore.kernel.org/all/9e689fea-6c69-f4b0-8dee-32c4cf7d8f9c@molgen.mpg.de/
BugLink: https://gitlab.freedesktop.org/drm/amd/-/issues/1957
Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Tao Zhou [Tue, 19 Apr 2022 06:45:09 +0000 (14:45 +0800)]
drm/amdgpu: add RAS fatal error interrupt handler
The fatal error handler is independent from general ras interrupt
handler since there is no related IH ring.
Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Tao Zhou [Tue, 19 Apr 2022 03:04:19 +0000 (11:04 +0800)]
drm/amdgpu: add RAS poison consumption handler (v2)
Add support for general RAS poison consumption handler.
v2: remove callback function for poison consumption.
Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Tao Zhou [Fri, 8 Apr 2022 11:51:20 +0000 (19:51 +0800)]
drm/amdgpu: add RAS poison creation handler (v2)
Prepare for the implementation of poison consumption handler.
v2: separate umc handler from poison creation.
Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Yang Wang [Thu, 21 Apr 2022 12:24:55 +0000 (20:24 +0800)]
drm/amdkfd: use kvcalloc() instead of kvmalloc() in kfd_migrate
simplify programming with existing functions.
Signed-off-by: Yang Wang <KevinYang.Wang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Bokun Zhang [Thu, 21 Apr 2022 17:57:03 +0000 (13:57 -0400)]
drm/amd/amdgpu: Update PF2VF header
- In the latest version of the header, there is a variable name change.
This should not cause any backward compatibility since the variable is
at the same offset in the struct.
Signed-off-by: Bokun Zhang <Bokun.Zhang@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Bokun Zhang [Thu, 21 Apr 2022 17:56:26 +0000 (13:56 -0400)]
drm/amd/amdgpu: Properly indent PF2VF header
- Clean up the identation in the header file
Signed-off-by: Bokun Zhang <Bokun.Zhang@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Bokun Zhang [Thu, 21 Apr 2022 17:55:45 +0000 (13:55 -0400)]
drm/amd/amdgpu: Update MIT license in SRIOV msg header
- Update MIT license header
Signed-off-by: Bokun Zhang <Bokun.Zhang@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Deucher [Thu, 21 Apr 2022 14:09:08 +0000 (10:09 -0400)]
drm/amdgpu/display: make hubp31_program_extended_blank static
It's not used outside of dcn31_hubp.c.
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Miaoqian Lin [Thu, 21 Apr 2022 09:03:09 +0000 (17:03 +0800)]
drm/amd/display: Fix memory leak in dcn21_clock_source_create
When dcn20_clk_src_construct() fails, we need to release clk_src.
Fixes: 6f4e6361c3ff ("drm/amd/display: Add Renoir resource (v2)")
Signed-off-by: Miaoqian Lin <linmq006@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Haowen Bai [Thu, 21 Apr 2022 10:28:58 +0000 (18:28 +0800)]
drm/amd/display: Remove useless code
aux_rep only memset but no use at all, so we drop it.
Signed-off-by: Haowen Bai <baihaowen@meizu.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Deucher [Tue, 28 Dec 2021 22:26:24 +0000 (17:26 -0500)]
drm/amdgpu: don't runtime suspend if there are displays attached (v3)
We normally runtime suspend when there are displays attached if they
are in the DPMS off state, however, if something wakes the GPU
we send a hotplug event on resume (in case any displays were connected
while the GPU was in suspend) which can cause userspace to light
up the displays again soon after they were turned off.
Prior to
commit
087451f372bf76 ("drm/amdgpu: use generic fb helpers instead of setting up AMD own's."),
the driver took a runtime pm reference when the fbdev emulation was
enabled because we didn't implement proper shadowing support for
vram access when the device was off so the device never runtime
suspended when there was a console bound. Once that commit landed,
we now utilize the core fb helper implementation which properly
handles the emulation, so runtime pm now suspends in cases where it did
not before. Ultimately, we need to sort out why runtime suspend in not
working in this case for some users, but this should restore similar
behavior to before.
v2: move check into runtime_suspend
v3: wake ups -> wakeups in comment, retain pm_runtime behavior in
runtime_idle callback
Fixes: 087451f372bf76 ("drm/amdgpu: use generic fb helpers instead of setting up AMD own's.")
Link: https://lore.kernel.org/r/20220403132322.51c90903@darkstar.example.org/
Tested-by: Michele Ballabio <ballabio.m@gmail.com>
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Lang Yu [Wed, 20 Apr 2022 02:24:31 +0000 (10:24 +0800)]
Revert "drm/amdkfd: only allow heavy-weight TLB flush on some ASICs for SVM too"
This reverts commit
36bf93216ecbe399c40c5e0486f0f0e3a4afa69e.
It causes SVM regressions on Vega10 with XNACK-ON. Just revert it
at the moment.
./kfdtest --gtest_filter=KFDSVMRangeTest.MigratePolicyTest
Signed-off-by: Lang Yu <Lang.Yu@amd.com>
Reviewed-by: Philip Yang<Philip.Yang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Candice Li [Sun, 17 Apr 2022 10:50:27 +0000 (18:50 +0800)]
drm/amdgpu: Add debugfs TA load/unload/invoke support
v1:
Add debugfs support to load/unload/invoke TA in runtime.
v2:
1. Update some variables to static.
2. Use PAGE_ALIGN to calculate shared buf size directly.
3. Remove fp check.
4. Update debugfs from read to write.
Signed-off-by: John Clements <john.clements@amd.com>
Signed-off-by: Candice Li <candice.li@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Candice Li [Sun, 17 Apr 2022 09:39:46 +0000 (17:39 +0800)]
drm/amdgpu: Use indirect buffer and save response status for TA load/invoke
The upcoming TA debugfs interface needs to use indirect buffer
when performing TA invoke and check psp response status for TA
load and invoke.
Signed-off-by: John Clements <john.clements@amd.com>
Signed-off-by: Candice Li <candice.li@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
David Yat Sin [Wed, 13 Apr 2022 15:37:53 +0000 (11:37 -0400)]
drm/amdkfd: CRIU add support for GWS queues
Add support to checkpoint/restore GWS (Global Wave Sync) queues.
Signed-off-by: David Yat Sin <david.yatsin@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
David Yat Sin [Mon, 18 Apr 2022 15:55:58 +0000 (11:55 -0400)]
drm/amdkfd: Fix GWS queue count
dqm->gws_queue_count and pdd->qpd.mapped_gws_queue need to be updated
each time the queue gets evicted.
Fixes: b8020b0304c8 ("drm/amdkfd: Enable over-subscription with >1 GWS queue")
Signed-off-by: David Yat Sin <david.yatsin@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Tales Lelo da Aparecida [Fri, 15 Apr 2022 19:50:27 +0000 (16:50 -0300)]
MAINTAINERS: add docs entry to AMDGPU
To make sure maintainers of amdgpu drivers are aware of any changes
in their documentation, add its entry to MAINTAINERS.
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Tales Lelo da Aparecida <tales.aparecida@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Tales Lelo da Aparecida [Fri, 15 Apr 2022 19:50:26 +0000 (16:50 -0300)]
Documentation/gpu: Add entries to amdgpu glossary
Add missing acronyms to the amdgppu glossary.
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/1939
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Tales Lelo da Aparecida <tales.aparecida@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Tom Rix [Sat, 16 Apr 2022 18:47:36 +0000 (14:47 -0400)]
drm/radeon/kms: change evergreen_default_state table from global to static
evergreen_default_state and evergreen_default_size are only
used in evergreen.c. Single file symbols should be static.
So move their definitions to evergreen_blit_shaders.h
and change their storage-class-specifier to static.
Remove unneeded evergreen_blit_shader.c
evergreen_ps/vs definitions were removed with
commit
4f8629675800 ("drm/radeon/kms: remove r6xx+ blit copy routines")
So their declarations in evergreen_blit_shader.h
are not needed, so remove them.
Signed-off-by: Tom Rix <trix@redhat.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Tom Rix [Mon, 18 Apr 2022 19:48:30 +0000 (15:48 -0400)]
drm/amd/display: add virtual_setup_stream_attribute decl to header
Smatch reports this issue
virtual_link_hwss.c:32:6: warning: symbol
'virtual_setup_stream_attribute' was not declared.
Should it be static?
virtual_setup_stream_attribute is only used in
virtual_link_hwss.c, but the other functions in the
file are declared in the header file and used elsewhere.
For consistency, add the virtual_setup_stream_attribute
decl to virtual_link_hwss.h.
Signed-off-by: Tom Rix <trix@redhat.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Keita Suzuki [Tue, 19 Apr 2022 10:37:19 +0000 (10:37 +0000)]
drm/amd/pm: fix double free in si_parse_power_table()
In function si_parse_power_table(), array adev->pm.dpm.ps and its member
is allocated. If the allocation of each member fails, the array itself
is freed and returned with an error code. However, the array is later
freed again in si_dpm_fini() function which is called when the function
returns an error.
This leads to potential double free of the array adev->pm.dpm.ps, as
well as leak of its array members, since the members are not freed in
the allocation function and the array is not nulled when freed.
In addition adev->pm.dpm.num_ps, which keeps track of the allocated
array member, is not updated until the member allocation is
successfully finished, this could also lead to either use after free,
or uninitialized variable access in si_dpm_fini().
Fix this by postponing the free of the array until si_dpm_fini() and
increment adev->pm.dpm.num_ps everytime the array member is allocated.
Signed-off-by: Keita Suzuki <keitasuzuki.park@sslab.ics.keio.ac.jp>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Tales Lelo da Aparecida [Fri, 15 Apr 2022 18:20:14 +0000 (15:20 -0300)]
drm/amd/display: make hubp1_wait_pipe_read_start() static
It's a local function, let's make it static.
AGD: remove prototype in dcn10_hubp.h
Signed-off-by: Tales Lelo da Aparecida <tales.aparecida@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Darren Powell [Wed, 6 Apr 2022 02:17:08 +0000 (22:17 -0400)]
amdgpu/pm: Clarify documentation of error handling in send_smc_mesg
Clarify the smu_cmn_send_smc_msg_with_param documentation to mention two
cases exist where messages are silently dropped with no error returned.
These cases occur in unusual situations where either:
1. the message type is not allowed to a virtual GPU, or
2. a PCI recovery is underway and the HW is not yet in sync with the SW
For more details see
commit
4ea5081c82c4 ("drm/amd/powerplay: enable SMC message filter")
commit
bf36b52e781d ("drm/amdgpu: Avoid accessing HW when suspending SW state")
(v2)
Reworked with suggestions from Luben & Paul
(v3)
Updated wording as per Luben's feedback
Corrected error stating all messages denied on virtual GPU
(each GPU has mask of which messages are allowed)
Signed-off-by: Darren Powell <darren.powell@amd.com>
Reviewed-by: Luben Tuikov <luben.tuikov@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Huang Rui [Thu, 14 Apr 2022 13:04:59 +0000 (21:04 +0800)]
drm/amdgpu/pm: fix the null pointer while the smu is disabled
It needs to check if the pp_funcs is initialized while release the
context, otherwise it will trigger null pointer panic while the software
smu is not enabled.
[ 1109.404555] BUG: kernel NULL pointer dereference, address:
0000000000000078
[ 1109.404609] #PF: supervisor read access in kernel mode
[ 1109.404638] #PF: error_code(0x0000) - not-present page
[ 1109.404657] PGD 0 P4D 0
[ 1109.404672] Oops: 0000 [#1] PREEMPT SMP NOPTI
[ 1109.404701] CPU: 7 PID: 9150 Comm: amdgpu_test Tainted: G OEL 5.16.0-custom #1
[ 1109.404732] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[ 1109.404765] RIP: 0010:amdgpu_dpm_force_performance_level+0x1d/0x170 [amdgpu]
[ 1109.405109] Code: 5d c3 44 8b a3 f0 80 00 00 eb e5 66 90 0f 1f 44 00 00 55 48 89 e5 41 57 41 56 41 55 41 54 53 48 83 ec 08 4c 8b b7 f0 7d 00 00 <49> 83 7e 78 00 0f 84 f2 00 00 00 80 bf 87 80 00 00 00 48 89 fb 0f
[ 1109.405176] RSP: 0018:
ffffaf3083ad7c20 EFLAGS:
00010282
[ 1109.405203] RAX:
0000000000000000 RBX:
ffff9796b1c14600 RCX:
0000000002862007
[ 1109.405229] RDX:
ffff97968591c8c0 RSI:
0000000000000001 RDI:
ffff9796a3700000
[ 1109.405260] RBP:
ffffaf3083ad7c50 R08:
ffffffff9897de00 R09:
ffff979688d9db60
[ 1109.405286] R10:
0000000000000000 R11:
ffff979688d9db90 R12:
0000000000000001
[ 1109.405316] R13:
ffff9796a3700000 R14:
0000000000000000 R15:
ffff9796a3708fc0
[ 1109.405345] FS:
00007ff055cff180(0000) GS:
ffff9796bfdc0000(0000) knlGS:
0000000000000000
[ 1109.405378] CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
[ 1109.405400] CR2:
0000000000000078 CR3:
000000000a394000 CR4:
00000000000506e0
[ 1109.405434] Call Trace:
[ 1109.405445] <TASK>
[ 1109.405456] ? delete_object_full+0x1d/0x20
[ 1109.405480] amdgpu_ctx_set_stable_pstate+0x7c/0xa0 [amdgpu]
[ 1109.405698] amdgpu_ctx_fini.part.0+0xcb/0x100 [amdgpu]
[ 1109.405911] amdgpu_ctx_do_release+0x71/0x80 [amdgpu]
[ 1109.406121] amdgpu_ctx_ioctl+0x52d/0x550 [amdgpu]
[ 1109.406327] ? _raw_spin_unlock+0x1a/0x30
[ 1109.406354] ? drm_gem_handle_delete+0x81/0xb0 [drm]
[ 1109.406400] ? amdgpu_ctx_get_entity+0x2c0/0x2c0 [amdgpu]
[ 1109.406609] drm_ioctl_kernel+0xb6/0x140 [drm]
Signed-off-by: Huang Rui <ray.huang@amd.com>
Reviewed-by: Aaron Liu <aaron.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Lang Yu [Fri, 15 Apr 2022 07:35:44 +0000 (15:35 +0800)]
drm/amdkfd: only allow heavy-weight TLB flush on some ASICs for SVM too
The idea is from
commit
a50fe7078035 ("drm/amdkfd: Only apply heavy-weight TLB flush on Aldebaran")
and
commit
f61c40c0757a ("drm/amdkfd: enable heavy-weight TLB flush on Arcturus").
At the moment, heavy-weight TLB could cause problems on ASICs except
Aldebaran and Arcturus.
A simple hipMallocManaged/hipFree program could trigger this issue.
[ 97.787657] amdgpu 0000:01:00.0: amdgpu: wait for kiq fence error: 0.
[ 106.868758] amdgpu: qcm fence wait loop timeout expired
[ 106.868966] amdgpu: The cp might be in an unrecoverable state due to an unsuccessful queues preemption
[ 106.869203] amdgpu: Failed to evict process queues
[ 106.869261] amdgpu: Failed to quiesce KFD
Signed-off-by: Lang Yu <Lang.Yu@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Lang Yu [Thu, 14 Apr 2022 07:16:25 +0000 (15:16 +0800)]
drm/amdkfd: move kfd_flush_tlb_after_unmap into kfd_priv.h
To make kfd_flush_tlb_after_unmap visible in kfd_svm.c,
move it into kfd_priv.h. And change it to an inline function.
Signed-off-by: Lang Yu <Lang.Yu@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>