drm/i915/reset: Handle reset timeouts under unrelated kernel hangs
author Chris Wilson <chris@chris-wilson.co.uk>
Thu, 30 Jun 2022 04:39:59 +0000 (21:39 -0700)
committer Matthew Auld <matthew.auld@intel.com>
Fri, 15 Jul 2022 13:16:29 +0000 (14:16 +0100)
commit 1dab4561a341afdbaafe0ce6091106d0c63c79e0
tree 582edf6ec307052bb9f2fd6b5f54dd14764ced37
parent 17cd10a44a8962860ff4ba351b2a290e752dbbde

When resuming after hibernate, we sometimes see hangs in unrelated kernel
subsystems. These hangs often result in the following i915 trace:

i915 0000:00:02.0: [drm] *ERROR* \
intel_gt_reset_global timed out, cancelling all in-flight rendering

implying our reset task has been starved by the hanging kernel subsystem,
causing us to inappropriately declare the system as wedged beyond recovery.

The trace would be caused by our synchronize_srcu_expedited() taking more
than the allowed 5s due to the unrelated kernel hang. But we neither need
to perform that synchronisation inside the reset watchdog, nor do we need
such a short timeout before declaring the device as unrecoverable.
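
The effect of where the synchronisation sits relative to the watchdog can be
illustrated with a small userspace sketch. This is not the driver code: the
simulated clock and the names now_ms, flush_readers(), do_reset(),
reset_with_flush_inside() and reset_with_flush_outside() are all hypothetical
stand-ins, and the only value taken from the text above is the 5s budget.

```c
#include <stdbool.h>

/* Hypothetical simulated clock in milliseconds; not a kernel API. */
static unsigned long now_ms;

/* Watchdog budget for the guarded region, matching the 5s mentioned above. */
#define WATCHDOG_BUDGET_MS 5000UL

/* Stand-in for a flush whose duration depends on the rest of the system. */
void flush_readers(unsigned long stall_ms)
{
	now_ms += stall_ms;
}

/* Stand-in for the device reset itself. */
void do_reset(unsigned long reset_ms)
{
	now_ms += reset_ms;
}

/*
 * Flush inside the guarded region: a stall caused by an unrelated
 * subsystem is charged against the reset budget.
 * Returns true if the watchdog would have declared the device wedged.
 */
bool reset_with_flush_inside(unsigned long stall_ms, unsigned long reset_ms)
{
	unsigned long start = now_ms;

	flush_readers(stall_ms);
	do_reset(reset_ms);

	return now_ms - start > WATCHDOG_BUDGET_MS;
}

/*
 * Flush before the guarded region: only the reset itself is timed,
 * so an external stall delays the reset but cannot trip the watchdog.
 */
bool reset_with_flush_outside(unsigned long stall_ms, unsigned long reset_ms)
{
	unsigned long start;

	flush_readers(stall_ms);

	start = now_ms;
	do_reset(reset_ms);

	return now_ms - start > WATCHDOG_BUDGET_MS;
}
```

With a simulated 8000ms stall and a 100ms reset, reset_with_flush_inside()
returns true (wedged) while reset_with_flush_outside() returns false, even
though the reset work is identical in both cases.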

v2: Restore watchdog timeout to the previous 5 seconds (Ashutosh)

Bug: https://gitlab.freedesktop.org/drm/intel/-/issues/3575
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220630043959.5708-1-ashutosh.dixit@intel.com
drivers/gpu/drm/i915/gt/intel_reset.c