drm/i915/gt: Be defensive in the face of false CS events
author Chris Wilson <chris@chris-wilson.co.uk>
Fri, 10 Jul 2020 13:31:25 +0000 (14:31 +0100)
committer Chris Wilson <chris@chris-wilson.co.uk>
Fri, 10 Jul 2020 14:24:17 +0000 (15:24 +0100)
commit b2295e2ecc04d189477cb08a96129ff1b3606f3a
tree 8d1ad0a04325936e005b064bede20333488ddc4e
parent ed2690a9ca896882a124ee0bd4eaff9678ed1162

If the HW throws a curve ball and reports either an event before it is
possible, or just a completely impossible event, we have to grin and
bear it. The first few events, we will likely not notice as we would be
expecting some event, but as soon as we stop expecting an event and yet
they still keep coming, then we enter into undefined state territory.
In which case, bail out, stop processing the events, and reset the
engine and our set of queued requests to recover.
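
The bail-out described above can be illustrated with a small standalone
model. This is only a sketch under stated assumptions, not the code touched
by this patch: the names engine_model, cs_event and process_events are
hypothetical, and the real driver tracks far more state. It shows the idea
of consuming events only while one is expected, and flagging a reset as
soon as an impossible event arrives.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical event kinds; not the driver's actual event encoding. */
enum cs_event { CS_EVENT_PROMOTE, CS_EVENT_COMPLETE };

/* Hypothetical, heavily simplified view of what the engine expects. */
struct engine_model {
	int pending;          /* requests submitted, awaiting promotion */
	int active;           /* requests currently running on the HW */
	bool reset_requested; /* set when an impossible event is seen */
};

/*
 * Consume up to n events. Stop at the first event that cannot be
 * explained by the tracked state, request a reset, and return the
 * number of events that were processed before bailing out.
 */
size_t process_events(struct engine_model *e,
		      const enum cs_event *ev, size_t n)
{
	size_t i;

	assert(e != NULL);

	for (i = 0; i < n; i++) {
		switch (ev[i]) {
		case CS_EVENT_PROMOTE:
			if (e->pending == 0)
				goto err; /* nothing was waiting to start */
			e->pending--;
			e->active++;
			break;
		case CS_EVENT_COMPLETE:
			if (e->active == 0)
				goto err; /* nothing was running */
			e->active--;
			break;
		default:
			goto err; /* completely unknown event */
		}
	}
	return i;

err:
	/* Undefined state territory: stop processing and ask for a reset. */
	e->reset_requested = true;
	return i;
}
```

In this model the caller would check reset_requested after each batch and,
if it is set, discard the remaining events and rebuild its queue state
rather than trusting anything the event stream said afterwards.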

The sporadic hangs and warnings will continue to plague CI, but at least
system stability should not be compromised.

v2: Commentary and force the reset-on-error.
v3: Customised user facing message for forced resets from internal errors.

Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2045
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200710133125.30194-1-chris@chris-wilson.co.uk
drivers/gpu/drm/i915/gt/intel_engine_types.h
drivers/gpu/drm/i915/gt/intel_gt_irq.c
drivers/gpu/drm/i915/gt/intel_lrc.c