When amdgpu_ib_ring_tests failed, the reset logic called
amdgpu_device_ip_suspend twice, then deadlock occurred.
Deadlock log:
[ 805.655192] amdgpu 0000:04:00.0: amdgpu: ib ring test failed (-110).
[ 806.290952] [drm] free PSP TMR buffer
[ 806.319406] ============================================
[ 806.320315] WARNING: possible recursive locking detected
[ 806.321225] 5.11.0-custom #1 Tainted: G W OEL
[ 806.322135] --------------------------------------------
[ 806.323043] cat/2593 is trying to acquire lock:
[ 806.323825]
ffff888136b1cdc8 (&adev->dm.dc_lock){+.+.}-{3:3}, at: dm_suspend+0xb8/0x1d0 [amdgpu]
[ 806.325668]
but task is already holding lock:
[ 806.326664]
ffff888136b1cdc8 (&adev->dm.dc_lock){+.+.}-{3:3}, at: dm_suspend+0xb8/0x1d0 [amdgpu]
[ 806.328430]
other info that might help us debug this:
[ 806.329539] Possible unsafe locking scenario:
[ 806.330549] CPU0
[ 806.330983] ----
[ 806.331416] lock(&adev->dm.dc_lock);
[ 806.332086] lock(&adev->dm.dc_lock);
[ 806.332738]
*** DEADLOCK ***
[ 806.333747] May be due to missing lock nesting notation
[ 806.334899] 3 locks held by cat/2593:
[ 806.335537] #0:
ffff888100d3f1b8 (&attr->mutex){+.+.}-{3:3}, at: simple_attr_read+0x4e/0x110
[ 806.337009] #1:
ffff888136b1fd78 (&adev->reset_sem){++++}-{3:3}, at: amdgpu_device_lock_adev+0x42/0x94 [amdgpu]
[ 806.339018] #2:
ffff888136b1cdc8 (&adev->dm.dc_lock){+.+.}-{3:3}, at: dm_suspend+0xb8/0x1d0 [amdgpu]
[ 806.340869]
stack backtrace:
[ 806.341621] CPU: 6 PID: 2593 Comm: cat Tainted: G W OEL 5.11.0-custom #1
[ 806.342921] Hardware name: AMD Celadon-CZN/Celadon-CZN, BIOS WLD0C23N_Weekly_20_12_2 12/23/2020
[ 806.344413] Call Trace:
[ 806.344849] dump_stack+0x93/0xbd
[ 806.345435] __lock_acquire.cold+0x18a/0x2cf
[ 806.346179] lock_acquire+0xca/0x390
[ 806.346807] ? dm_suspend+0xb8/0x1d0 [amdgpu]
[ 806.347813] __mutex_lock+0x9b/0x930
[ 806.348454] ? dm_suspend+0xb8/0x1d0 [amdgpu]
[ 806.349434] ? amdgpu_device_indirect_rreg+0x58/0x70 [amdgpu]
[ 806.350581] ? _raw_spin_unlock_irqrestore+0x47/0x50
[ 806.351437] ? dm_suspend+0xb8/0x1d0 [amdgpu]
[ 806.352437] ? rcu_read_lock_sched_held+0x4f/0x80
[ 806.353252] ? rcu_read_lock_sched_held+0x4f/0x80
[ 806.354064] mutex_lock_nested+0x1b/0x20
[ 806.354747] ? mutex_lock_nested+0x1b/0x20
[ 806.355457] dm_suspend+0xb8/0x1d0 [amdgpu]
[ 806.356427] ? soc15_common_set_clockgating_state+0x17d/0x19 [amdgpu]
[ 806.357736] amdgpu_device_ip_suspend_phase1+0x78/0xd0 [amdgpu]
[ 806.360394] amdgpu_device_ip_suspend+0x21/0x70 [amdgpu]
[ 806.362926] amdgpu_device_pre_asic_reset+0xb3/0x270 [amdgpu]
[ 806.365560] amdgpu_device_gpu_recover.cold+0x679/0x8eb [amdgpu]
Signed-off-by: Lang Yu <Lang.Yu@amd.com>
Acked-by: Christian KÃnig <christian.koenig@amd.com>
Reviewed-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>