drm/amdgpu: fix double gpu_recovery for NV of SRIOV
authorMonk Liu <Monk.Liu@amd.com>
Tue, 17 Dec 2019 10:18:31 +0000 (18:18 +0800)
committerAlex Deucher <alexander.deucher@amd.com>
Wed, 18 Dec 2019 21:09:12 +0000 (16:09 -0500)
issues:
gpu_recover() is re-entered by the mailbox interrupt
handler mxgpu_nv.c

fix:
we need to bypass the gpu_recover() invoke in mailbox
interrupt as long as the timeout is not infinite (thus the TDR
will be triggered automatically after time out, no need to invoke
gpu_recover() through mailbox interrupt.

Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Reviewed-by: Emily Deng <Emily.Deng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c

index 0d8767e..1c3a7d4 100644 (file)
@@ -269,7 +269,11 @@ flr_done:
        }
 
        /* Trigger recovery for world switch failure if no TDR */
-       if (amdgpu_device_should_recover_gpu(adev))
+       if (amdgpu_device_should_recover_gpu(adev)
+               && (adev->sdma_timeout == MAX_SCHEDULE_TIMEOUT ||
+               adev->gfx_timeout == MAX_SCHEDULE_TIMEOUT ||
+               adev->compute_timeout == MAX_SCHEDULE_TIMEOUT ||
+               adev->video_timeout == MAX_SCHEDULE_TIMEOUT))
                amdgpu_device_gpu_recover(adev, NULL);
 }