drm/amdkfd: rm BO resv on validation to avoid deadlock
authorAlex Sierra <alex.sierra@amd.com>
Thu, 7 Oct 2021 17:04:09 +0000 (12:04 -0500)
committerGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Thu, 18 Nov 2021 18:16:14 +0000 (19:16 +0100)
commit5425f404b527b04ca6ce5eb01e2b9b92e9d1853b
tree1366fbe268adbb46f11107f92b35d7b949452782
parentb3aa4e54add8d2bf32a8a060688751970289f04f
drm/amdkfd: rm BO resv on validation to avoid deadlock

[ Upstream commit ec6abe831a843208e99a59adf108adba22166b3f ]

This fix the deadlock with the BO reservations during SVM_BO evictions
while allocations in VRAM are concurrently performed. More specific,
while the ttm waits for the fence to be signaled (ttm_bo_wait), it
already has the BO reserved. In parallel, the restore worker might be
running, prefetching memory to VRAM. This also requires to reserve the
BO, but blocks the mmap semaphore first. The deadlock happens when the
SVM_BO eviction worker kicks in and waits for the mmap semaphore held
in restore worker. Preventing signal the fence back, causing the
deadlock until the ttm times out.

We don't need to hold the BO reservation anymore during validation
and mapping. Now the physical addresses are taken from hmm_range_fault.
We also take migrate_mutex to prevent range migration while
validate_and_map update GPU page table.

Signed-off-by: Alex Sierra <alex.sierra@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Philip Yang <philip.yang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
drivers/gpu/drm/amd/amdkfd/kfd_svm.c