drm/amd/amdkfd: Fix kernel panic when reset failed and been triggered again
authorshaoyunl <shaoyun.liu@amd.com>
Sun, 14 Nov 2021 17:38:18 +0000 (12:38 -0500)
committerGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Wed, 8 Dec 2021 08:04:39 +0000 (09:04 +0100)
[ Upstream commit 2cf49e00d40d5132e3d067b5aa6d84791929ab15 ]

In SRIOV configuration, the reset may failed to bring asic back to normal but stop cpsch
already been called, the start_cpsch will not be called since there is no resume in this
case.  When reset been triggered again, driver should avoid to do uninitialization again.

Signed-off-by: shaoyunl <shaoyun.liu@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c

index f8fce9d..4f2e0cc 100644 (file)
@@ -1225,6 +1225,11 @@ static int stop_cpsch(struct device_queue_manager *dqm)
        bool hanging;
 
        dqm_lock(dqm);
+       if (!dqm->sched_running) {
+               dqm_unlock(dqm);
+               return 0;
+       }
+
        if (!dqm->is_hws_hang)
                unmap_queues_cpsch(dqm, KFD_UNMAP_QUEUES_FILTER_ALL_QUEUES, 0);
        hanging = dqm->is_hws_hang || dqm->is_resetting;