drm/amdgpu: Fix crash when hot unplug in BACO
author:    Andrey Grodzovsky <andrey.grodzovsky@amd.com>
           Fri, 21 May 2021 20:41:22 +0000 (16:41 -0400)
committer: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
           Tue, 25 May 2021 15:56:48 +0000 (11:56 -0400)

Problem:
When the device goes into runtime suspend due to prolonged
inactivity (e.g. BACO sleep) and is then hot-unplugged,
the PCI core will try to wake the device up as part of the
unplug process. Since the device is gone, all HW
programming during rpm resume fails, leading
to a bad SW state later during pci remove handling.

Fix:
Reuse the flag already used for PCIe error recovery to avoid
accessing registers when the device is no longer present.
This allows the rpm resume sequence to complete successfully
and pci remove to finish.
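To illustrate the guard pattern, here is a minimal userspace sketch in C. It is a model under stated assumptions, not the amdgpu code: struct fake_dev, fake_rreg(), fake_wreg(), fake_runtime_resume() and REG_COUNT are hypothetical names, the present field stands in for pci_device_is_present(), and reads from the missing device are assumed to return all ones, as PCIe reads to an absent device typically do.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define REG_COUNT 4 /* hypothetical register file size */

/* Hypothetical device model; not the real struct amdgpu_device. */
struct fake_dev {
	bool present;         /* false once the device is hot-unplugged */
	bool no_hw_access;    /* plays the role of adev->no_hw_access */
	int failed_accesses;  /* accesses that reached a missing device */
	uint32_t regs[REG_COUNT];
};

/* Register read: skipped entirely when HW access is blocked. */
static uint32_t fake_rreg(struct fake_dev *d, unsigned int idx)
{
	assert(idx < REG_COUNT);
	if (d->no_hw_access)
		return 0;
	if (!d->present) {
		d->failed_accesses++;
		return 0xFFFFFFFFu; /* assumed all-ones read from a gone device */
	}
	return d->regs[idx];
}

/* Register write: dropped when HW access is blocked. */
static void fake_wreg(struct fake_dev *d, unsigned int idx, uint32_t val)
{
	assert(idx < REG_COUNT);
	if (d->no_hw_access)
		return;
	if (!d->present) {
		d->failed_accesses++;
		return; /* the write is lost */
	}
	d->regs[idx] = val;
}

/*
 * Simplified runtime resume: program register 0 and read it back.
 * use_guard selects whether the presence check from the patch runs.
 */
static int fake_runtime_resume(struct fake_dev *d, bool use_guard)
{
	if (use_guard && !d->present)
		d->no_hw_access = true;

	fake_wreg(d, 0, 0x1u);
	if (!d->no_hw_access && fake_rreg(d, 0) != 0x1u)
		return -1; /* HW programming failed */
	return 0;
}
```

Calling fake_runtime_resume(&d, false) on a device with present == false fails and records two failed accesses. Passing true sets no_hw_access first, so every register access is skipped and the call returns 0, which mirrors how the patch lets rpm resume and pci remove complete.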

v2: Renamed HW access block flag

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/1081
Link: https://patchwork.freedesktop.org/patch/msgid/20210521204122.762288-2-andrey.grodzovsky@amd.com
drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index 3a0890c..e8bbcde 100644
@@ -1557,6 +1557,10 @@ static int amdgpu_pmops_runtime_resume(struct device *dev)
        if (!adev->runpm)
                return -EINVAL;
 
+       /* Avoids registers access if device is physically gone */
+       if (!pci_device_is_present(adev->pdev))
+               adev->no_hw_access = true;
+
        if (amdgpu_device_supports_px(drm_dev)) {
                drm_dev->switch_power_state = DRM_SWITCH_POWER_CHANGING;