drm/i915: Restore the kernel context after a GPU reset on an idle engine
author Chris Wilson <chris@chris-wilson.co.uk>
Sat, 16 Dec 2017 00:03:34 +0000 (00:03 +0000)
committer Chris Wilson <chris@chris-wilson.co.uk>
Sat, 16 Dec 2017 09:37:35 +0000 (09:37 +0000)
Part of the system requirement for powersaving is that we always have
a context loaded. Upon boot and resume, we load the kernel_context to
ensure that some valid state is set before powersaving kicks in; we
should do so after a full GPU reset as well. We only need to do so for
an idle engine, as any active engine will restart by executing the
stuck request, loading its context. For the idle engine, we create a
new request to load the kernel_context instead.
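
The rule described above can be illustrated with a small standalone model.
This is only a sketch under stated assumptions, not the driver code: the
toy_engine structure, the ctx_kind enum and toy_restore_after_reset() are
hypothetical names invented for illustration, and a simple counter stands
in for the engine's request timeline.

```c
#include <stddef.h>

/* Hypothetical stand-ins for illustration; these are not i915 structures. */
enum ctx_kind { CTX_NONE, CTX_USER, CTX_KERNEL };

struct toy_engine {
	const char *name;
	size_t pending;         /* requests still queued on the engine */
	enum ctx_kind next_ctx; /* context the next executed request loads */
};

/*
 * Model of the post-reset step: every engine left idle by the reset gets
 * one request that loads the kernel context. Engines with pending work
 * are left untouched, since replaying their stuck request loads a context.
 * Returns the number of kernel-context requests queued.
 */
size_t toy_restore_after_reset(struct toy_engine *engines, size_t count)
{
	size_t queued = 0;

	for (size_t i = 0; i < count; i++) {
		struct toy_engine *e = &engines[i];

		if (e->pending != 0)
			continue; /* active engine: its own request restores state */

		e->pending = 1;
		e->next_ctx = CTX_KERNEL;
		queued++;
	}

	return queued;
}
```

In this model, calling toy_restore_after_reset() on one busy and one idle
engine queues a single request, on the idle engine only.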

For whatever reason, performing a dummy execute on the idle engine after
reset papers over a subsequent GPU hang in rare circumstances, even on
machines not using contexts (e.g. Pineview).

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104259
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104261
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Michel Thierry <michel.thierry@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171216000334.8197-1-chris@chris-wilson.co.uk
drivers/gpu/drm/i915/i915_gem.c

index 4a7f557..dca15c1 100644
@@ -3119,6 +3119,25 @@ void i915_gem_reset(struct drm_i915_private *dev_priv)
                ctx = fetch_and_zero(&engine->last_retired_context);
                if (ctx)
                        engine->context_unpin(engine, ctx);
+
+               /*
+                * Ostensibly, we always want a context loaded for powersaving,
+                * so if the engine is idle after the reset, send a request
+                * to load our scratch kernel_context.
+                *
+                * More mysteriously, if we leave the engine idle after a reset,
+                * the next userspace batch may hang, with what appears to be
+                * an incoherent read by the CS (presumably stale TLB). An
+                * empty request appears sufficient to paper over the glitch.
+                */
+               if (list_empty(&engine->timeline->requests)) {
+                       struct drm_i915_gem_request *rq;
+
+                       rq = i915_gem_request_alloc(engine,
+                                                   dev_priv->kernel_context);
+                       if (!IS_ERR(rq))
+                               __i915_add_request(rq, false);
+               }
        }
 
        i915_gem_restore_fences(dev_priv);