drm/amd/amdgpu: consider kernel job always not guilty
author Jingwen Chen <Jingwen.Chen2@amd.com>
Tue, 20 Jul 2021 10:35:35 +0000 (18:35 +0800)
committer Alex Deucher <alexander.deucher@amd.com>
Fri, 23 Jul 2021 14:08:00 +0000 (10:08 -0400)
commit ff99849b00fef595ae46681ce0c2217a9f834332
tree 5837b66f83264ba3923b2ed0a6ba73302b8dd4b7
parent 410e302ea53f095f5d94dc14efefe8191bde901b

[Why]
Currently, every timed-out job is considered guilty. In the SR-IOV
multi-VF use case, the VF FLR happens first and the job timeout is
detected afterwards. Several jobs can time out within a very small time
slice. If an innocent SDMA job's timeout is detected before that of the
real bad job, the innocent SDMA job is marked guilty. This leads to a
page fault after the job is resubmitted.

[How]
If the job is a kernel job, always consider it not guilty.
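
The rule can be illustrated with a minimal, standalone sketch. This is
not the driver code from this patch: the sketch_* types and functions
are hypothetical, and the sketch assumes that a kernel job can be
recognized by having no VM attached (vm == NULL).

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the driver's job bookkeeping. */
struct sketch_vm {
	int id;
};

struct sketch_job {
	const struct sketch_vm *vm; /* assumption: NULL means "kernel job" */
	int karma;                  /* guilt counter, bumped on a timeout */
};

/* Assumption: a job without an attached VM was submitted by the kernel. */
static bool sketch_job_is_kernel(const struct sketch_job *job)
{
	return job->vm == NULL;
}

/*
 * Called for a timed-out job. Kernel jobs are never blamed, so their
 * karma stays untouched. Returns true if the job was marked guilty.
 */
static bool sketch_mark_guilty_on_timeout(struct sketch_job *job)
{
	if (job == NULL || sketch_job_is_kernel(job))
		return false;

	job->karma++;
	return true;
}
```

With this check, an innocent kernel SDMA job whose timeout is detected
first keeps a clean karma count, while a timed-out user job is still
marked guilty.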

Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Jingwen Chen <Jingwen.Chen2@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c