drm/i915/execlists: Workaround switching back to a completed context
author Chris Wilson <chris@chris-wilson.co.uk>
Fri, 27 Mar 2020 20:14:33 +0000 (20:14 +0000)
committer Chris Wilson <chris@chris-wilson.co.uk>
Fri, 27 Mar 2020 20:53:26 +0000 (20:53 +0000)
In what seems remarkably similar to the w/a required to not reload an
idle context with HEAD==TAIL, it appears we must prevent the HW from
switching to an idle context in ELSP[1], while simultaneously trying to
preempt the HW to run another context and a continuation of the idle
context (which is no longer idle).

We can achieve this by preventing the context from completing while we
reload a new ELSP (by applying ring_set_paused(1) across the whole of
dequeue), except this eventually fails because a lite-restore into a
waiting semaphore does not generate an ACK. Instead, we try to avoid
making the GPU do anything too challenging and do not submit a new ELSP
while the interrupts + CSB events appear to have fallen behind the
completed contexts. We expect it to catch up shortly so we queue another
tasklet execution and hope for the best.
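
As a rough illustration of the decision described above, here is a minimal
standalone sketch in C. It is not the driver code: struct model_request,
struct model_engine, enum dequeue_action and model_dequeue() are hypothetical
names invented for this example, and need_preempt() is reduced to a plain
priority comparison, which is an assumption. It only shows the shape of the
workaround: if we would preempt or timeslice away from a request that has
already completed, submit nothing and ask for the tasklet to run again.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a request; not the i915 structure. */
struct model_request {
	bool completed;	/* models the result of i915_request_completed() */
	int prio;	/* simplified scheduling priority (assumption) */
};

/* Hypothetical stand-in for the engine; only counts tasklet requeues. */
struct model_engine {
	unsigned int tasklet_reschedules; /* models tasklet_hi_schedule() calls */
};

enum dequeue_action {
	DEQUEUE_CONTINUE,	/* no switch wanted; carry on with normal dequeue */
	DEQUEUE_PREEMPT,	/* safe to submit a new ELSP over the active context */
	DEQUEUE_RETRY_LATER,	/* CSB events lag behind; requeue and submit nothing */
};

static enum dequeue_action
model_dequeue(struct model_engine *engine,
	      const struct model_request *last,
	      int queue_prio, bool timeslice_expired)
{
	assert(engine != NULL);

	/* Nothing is active, so there is nothing to switch away from. */
	if (!last)
		return DEQUEUE_CONTINUE;

	/* Simplified need_preempt() / need_timeslice() check (assumption). */
	if (queue_prio <= last->prio && !timeslice_expired)
		return DEQUEUE_CONTINUE;

	/*
	 * The active request has finished but its completion event has not
	 * been processed yet. Do not write a new ELSP now; wait for the
	 * events to catch up by running the tasklet again.
	 */
	if (last->completed) {
		engine->tasklet_reschedules++;
		return DEQUEUE_RETRY_LATER;
	}

	return DEQUEUE_PREEMPT;
}
```

In this model a completed request only blocks submission when a preemption or
timeslice switch is actually wanted; otherwise dequeue proceeds as before,
which matches where the two completed-request checks sit in the diff.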

Closes: https://gitlab.freedesktop.org/drm/intel/issues/1501
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200327201433.21864-1-chris@chris-wilson.co.uk
drivers/gpu/drm/i915/gt/intel_lrc.c

index b12355048501a61da4e066f62e9897389630bb6e..9104796673dca70a9ee13dad57c32f02458c5d90 100644 (file)
@@ -1915,11 +1915,26 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
         * of trouble.
         */
        active = READ_ONCE(execlists->active);
-       while ((last = *active) && i915_request_completed(last))
-               active++;
 
-       if (last) {
+       /*
+        * In theory we can skip over completed contexts that have not
+        * yet been processed by events (as those events are in flight):
+        *
+        * while ((last = *active) && i915_request_completed(last))
+        *      active++;
+        *
+        * However, the GPU cannot handle this as it will ultimately
+        * find itself trying to jump back into a context it has just
+        * completed and barf.
+        */
+
+       if ((last = *active)) {
                if (need_preempt(engine, last, rb)) {
+                       if (i915_request_completed(last)) {
+                               tasklet_hi_schedule(&execlists->tasklet);
+                               return;
+                       }
+
                        ENGINE_TRACE(engine,
                                     "preempting last=%llx:%lld, prio=%d, hint=%d\n",
                                     last->fence.context,
@@ -1947,6 +1962,11 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
                        last = NULL;
                } else if (need_timeslice(engine, last) &&
                           timer_expired(&engine->execlists.timer)) {
+                       if (i915_request_completed(last)) {
+                               tasklet_hi_schedule(&execlists->tasklet);
+                               return;
+                       }
+
                        ENGINE_TRACE(engine,
                                     "expired last=%llx:%lld, prio=%d, hint=%d\n",
                                     last->fence.context,