drm/i915/guc: Split hw submission for replay after GPU reset
Something I missed before sending off the partial series was that the
non-scheduler guc reset path was broken (in the full series, this is
pushed to the execlists reset handler). The issue is that after a reset,
we have to refill the GuC workqueues, which we do by resubmitting the
requests. However, if we already have submitted them, the fences within
them have already been used and triggering them again is an error.
Instead, just repopulate the guc workqueue.
[ 115.858560] [IGT] gem_busy: starting subtest hang-render
[ 135.839867] [drm] GPU HANG: ecode 9:0:0xe757fefe, in gem_busy [1716], reason: Hang on render ring, action: reset
[ 135.839902] drm/i915: Resetting chip after gpu hang
[ 135.839957] [drm] RC6 on
[ 135.858351] ------------[ cut here ]------------
[ 135.858357] WARNING: CPU: 2 PID: 45 at drivers/gpu/drm/i915/i915_sw_fence.c:108 i915_sw_fence_complete+0x25/0x30
[ 135.858357] Modules linked in: rfcomm bnep binfmt_misc nls_iso8859_1 input_leds snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hda_core btusb btrtl snd_hwdep snd_pcm 8250_dw snd_seq_midi hid_lenovo snd_seq_midi_event snd_rawmidi iwlwifi x86_pkg_temp_thermal coretemp snd_seq crct10dif_pclmul snd_seq_device hci_uart snd_timer crc32_pclmul ghash_clmulni_intel idma64 aesni_intel virt_dma btbcm snd btqca aes_x86_64 btintel lrw cfg80211 bluetooth gf128mul glue_helper ablk_helper cryptd soundcore intel_lpss_pci intel_pch_thermal intel_lpss_acpi intel_lpss acpi_als mfd_core kfifo_buf acpi_pad industrialio autofs4 hid_plantronics usbhid dm_mirror dm_region_hash dm_log sdhci_pci ahci sdhci libahci i2c_hid hid
[ 135.858389] CPU: 2 PID: 45 Comm: kworker/2:1 Tainted: G W 4.9.0-rc4+ #238
[ 135.858389] Hardware name: /NUC6i3SYB, BIOS SYSKLi35.86A.0024.2015.1027.2142 10/27/2015
[ 135.858392] Workqueue: events_long i915_hangcheck_elapsed
[ 135.858394]
ffffc900001bf9b8 ffffffff812bb238 0000000000000000 0000000000000000
[ 135.858396]
ffffc900001bf9f8 ffffffff8104f621 0000006c00000000 ffff8808296137f8
[ 135.858398]
0000000000000a00 ffff8808457a0000 ffff880845764e60 ffff880845760000
[ 135.858399] Call Trace:
[ 135.858403] [<
ffffffff812bb238>] dump_stack+0x4d/0x65
[ 135.858405] [<
ffffffff8104f621>] __warn+0xc1/0xe0
[ 135.858406] [<
ffffffff8104f748>] warn_slowpath_null+0x18/0x20
[ 135.858408] [<
ffffffff813f8c15>] i915_sw_fence_complete+0x25/0x30
[ 135.858410] [<
ffffffff813f8fad>] i915_sw_fence_commit+0xd/0x30
[ 135.858412] [<
ffffffff8142e591>] __i915_gem_request_submit+0xe1/0xf0
[ 135.858413] [<
ffffffff8142e5c8>] i915_gem_request_submit+0x28/0x40
[ 135.858415] [<
ffffffff814433e7>] i915_guc_submit+0x47/0x210
[ 135.858417] [<
ffffffff81443e98>] i915_guc_submission_enable+0x468/0x540
[ 135.858419] [<
ffffffff81442495>] intel_guc_setup+0x715/0x810
[ 135.858421] [<
ffffffff8142b6b4>] i915_gem_init_hw+0x114/0x2a0
[ 135.858423] [<
ffffffff813eeaa8>] i915_reset+0xe8/0x120
[ 135.858424] [<
ffffffff813f3937>] i915_reset_and_wakeup+0x157/0x180
[ 135.858426] [<
ffffffff813f79db>] i915_handle_error+0x1ab/0x230
[ 135.858428] [<
ffffffff812c760d>] ? scnprintf+0x4d/0x90
[ 135.858430] [<
ffffffff81435985>] i915_hangcheck_elapsed+0x275/0x3d0
[ 135.858432] [<
ffffffff810668cf>] process_one_work+0x12f/0x410
[ 135.858433] [<
ffffffff81066bf3>] worker_thread+0x43/0x4d0
[ 135.858435] [<
ffffffff81066bb0>] ? process_one_work+0x410/0x410
[ 135.858436] [<
ffffffff81066bb0>] ? process_one_work+0x410/0x410
[ 135.858438] [<
ffffffff8106bbb4>] kthread+0xd4/0xf0
[ 135.858440] [<
ffffffff8106bae0>] ? kthread_park+0x60/0x60
v2: Only resubmit submitted requests
v3: Don't forget the pending requests have reserved space.
Fixes:
d55ac5bf97c6 ("drm/i915: Defer transfer onto execution timeline to actual hw submission")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161129121024.22650-6-chris@chris-wilson.co.uk