drm/amdkfd: fix some race conditions in vram buffer alloc/free of svm code
author Xiaogang Chen <xiaogang.chen@amd.com>
Wed, 20 Sep 2023 16:02:51 +0000 (11:02 -0500)
committer Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Mon, 20 Nov 2023 10:52:00 +0000 (11:52 +0100)
[ Upstream commit 7bfaa160caed8192f8262c4638f552cad94bcf5a ]

This patch fixes two problems:
1: The ref count of a prange's svm_bo can be decreased by an async call from
hmm. When waiting for the prange's svm_bo to be released, we should also wait
for prange->svm_bo to become NULL; otherwise prange->svm_bo may be set to NULL
after a new vram buffer has been allocated.

2: While waiting in a while loop for the prange's svm_bo to be released, we
should reschedule the current task to give other tasks an opportunity to run,
especially the workqueue task that handles the svm_bo ref release; otherwise
we may hit a soft lockup.

Signed-off-by: Xiaogang.Chen <xiaogang.chen@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
drivers/gpu/drm/amd/amdkfd/kfd_svm.c

index 63feea08904cbd3a0694cdee26abf55b8a3db581..d7e758c86a0b86b967326695dddd8238179f3a0c 100644
@@ -487,11 +487,11 @@ svm_range_validate_svm_bo(struct amdgpu_device *adev, struct svm_range *prange)
 
        /* We need a new svm_bo. Spin-loop to wait for concurrent
         * svm_range_bo_release to finish removing this range from
-        * its range list. After this, it is safe to reuse the
-        * svm_bo pointer and svm_bo_list head.
+        * its range list and set prange->svm_bo to null. After this,
+        * it is safe to reuse the svm_bo pointer and svm_bo_list head.
         */
-       while (!list_empty_careful(&prange->svm_bo_list))
-               ;
+       while (!list_empty_careful(&prange->svm_bo_list) || prange->svm_bo)
+               cond_resched();
 
        return false;
 }
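
To illustrate why the loop condition has to cover both the list head and the pointer, here is a minimal single-threaded model in plain C. It is only a sketch and not kernel code: `struct model_range`, `model_yield()` and the two `wait_*` helpers are hypothetical names, and the model assumes the release path first unlinks the range from the list and only later clears the pointer, which is the ordering the commit message implies. `model_yield()` stands in for the release worker getting CPU time (in the kernel, that is what `cond_resched()` makes possible on a busy CPU).

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the two svm_range fields the loop polls. */
struct model_range {
	bool on_bo_list;   /* models !list_empty_careful(&prange->svm_bo_list) */
	void *svm_bo;      /* models prange->svm_bo */
	int release_step;  /* progress of the modelled release worker */
};

/* Build a range whose buffer object is in the middle of being released. */
static struct model_range model_range_in_release(void *bo)
{
	struct model_range r = {
		.on_bo_list = true,
		.svm_bo = bo,
		.release_step = 0,
	};
	return r;
}

/*
 * Models one time slice of the release worker (assumed ordering):
 * step 0 unlinks the range from the list, step 1 clears the pointer.
 */
static void model_yield(struct model_range *r)
{
	if (r->release_step == 0)
		r->on_bo_list = false;
	else if (r->release_step == 1)
		r->svm_bo = NULL;
	r->release_step++;
}

/* Old condition: stop as soon as the list head is empty. */
static bool wait_list_only(struct model_range *r)
{
	while (r->on_bo_list)
		model_yield(r);
	/* true only if the stale pointer is already gone */
	return r->svm_bo == NULL;
}

/* New condition: also wait until the pointer has been cleared. */
static bool wait_list_and_pointer(struct model_range *r)
{
	while (r->on_bo_list || r->svm_bo)
		model_yield(r);
	return r->svm_bo == NULL;
}
```

In this model, `wait_list_only()` returns while `svm_bo` is still set, so a later step of the worker could clear a pointer that the caller has since reassigned. `wait_list_and_pointer()` returns only after both steps have run.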