drm/i915: Check that each request phase is completed before retiring
Trying to chase an impossible bug (ivb):
[ 207.765411] [drm:i915_reset_and_wakeup [i915]] resetting chip
[ 207.765734] [drm:i915_gem_reset [i915]] resetting render ring to restart from tail of request 0x4ee834
[ 207.765791] [drm:intel_print_rc6_info [i915]] Enabling RC6 states: RC6 on RC6p on RC6pp off
[ 207.767213] [drm:intel_guc_setup [i915]] GuC fw status: path (null), fetch NONE, load NONE
[ 207.767515] kernel BUG at drivers/gpu/drm/i915/i915_gem_request.c:203!
[ 207.767551] invalid opcode: 0000 [#1] PREEMPT SMP
[ 207.767576] Modules linked in: snd_hda_intel i915 cdc_ncm usbnet mii x86_pkg_temp_thermal coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel lpc_ich snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic snd_hda_codec snd_hwdep snd_hda_core mei_me mei snd_pcm sdhci_pci sdhci mmc_core e1000e ptp pps_core [last unloaded: i915]
[ 207.767808] CPU: 3 PID: 8855 Comm: gem_ringfill Tainted: G U 4.9.0-rc5-CI-Patchwork_3052+ #1
[ 207.767854] Hardware name: LENOVO 2356GCG/2356GCG, BIOS G7ET31WW (1.13 ) 07/02/2012
[ 207.767894] task:
ffff88012c82a740 task.stack:
ffffc9000383c000
[ 207.767927] RIP: 0010:[<
ffffffffa00a0a3a>] [<
ffffffffa00a0a3a>] i915_gem_request_retire+0x2a/0x4b0 [i915]
[ 207.767999] RSP: 0018:
ffffc9000383fb20 EFLAGS:
00010293
[ 207.768027] RAX:
00000000004ee83c RBX:
ffff880135dcb480 RCX:
00000000004ee83a
[ 207.768062] RDX:
ffff88012fea42a8 RSI:
0000000000000001 RDI:
ffff88012c82af68
[ 207.768095] RBP:
ffffc9000383fb48 R08:
0000000000000000 R09:
0000000000000000
[ 207.768129] R10:
0000000000000000 R11:
0000000000000000 R12:
ffff880135dcb480
[ 207.768163] R13:
ffff88012fea42a8 R14:
0000000000000000 R15:
00000000000001d8
[ 207.768200] FS:
00007f955f658740(0000) GS:
ffff88013e2c0000(0000) knlGS:
0000000000000000
[ 207.768239] CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
[ 207.768258] CR2:
0000555899725930 CR3:
00000001316f6000 CR4:
00000000001406e0
[ 207.768286] Stack:
[ 207.768299]
ffff880135dcb480 ffff880135dcbe00 ffff88012fea42a8 0000000000000000
[ 207.768350]
00000000000001d8 ffffc9000383fb70 ffffffffa00a1339 0000000000000000
[ 207.768402]
ffff88012f296c88 00000000000003f0 ffffc9000383fbb0 ffffffffa00b582d
[ 207.768453] Call Trace:
[ 207.768493] [<
ffffffffa00a1339>] i915_gem_request_retire_upto+0x49/0x90 [i915]
[ 207.768553] [<
ffffffffa00b582d>] intel_ring_begin+0x15d/0x2d0 [i915]
[ 207.768608] [<
ffffffffa00b59cb>] intel_ring_alloc_request_extras+0x2b/0x40 [i915]
[ 207.768667] [<
ffffffffa00a2fd9>] i915_gem_request_alloc+0x359/0x440 [i915]
[ 207.768723] [<
ffffffffa008bd03>] i915_gem_do_execbuffer.isra.15+0x783/0x1a10 [i915]
[ 207.768766] [<
ffffffff811a6a2e>] ? __might_fault+0x3e/0x90
[ 207.768816] [<
ffffffffa008d380>] i915_gem_execbuffer2+0xc0/0x250 [i915]
[ 207.768854] [<
ffffffff815532a6>] drm_ioctl+0x1f6/0x480
[ 207.768900] [<
ffffffffa008d2c0>] ? i915_gem_execbuffer+0x330/0x330 [i915]
[ 207.768939] [<
ffffffff81202f6e>] do_vfs_ioctl+0x8e/0x690
[ 207.768972] [<
ffffffff818193ac>] ? retint_kernel+0x2d/0x2d
[ 207.769004] [<
ffffffff810d6ef2>] ? trace_hardirqs_on_caller+0x122/0x1b0
[ 207.769039] [<
ffffffff812035ac>] SyS_ioctl+0x3c/0x70
[ 207.769068] [<
ffffffff818189ae>] entry_SYSCALL_64_fastpath+0x1c/0xb1
[ 207.769103] Code: 90 55 48 89 e5 41 57 41 56 41 55 41 54 49 89 fc 53 8b 35 fa 7b e1 e1 85 f6 0f 85 55 03 00 00 41 8b 84 24 80 02 00 00 85 c0 75 02 <0f> 0b 49 8b 94 24 a8 00 00 00 48 8b 8a e0 01 00 00 8b 89 c0 00
[ 207.769400] RIP [<
ffffffffa00a0a3a>] i915_gem_request_retire+0x2a/0x4b0 [i915]
[ 207.769463] RSP <
ffffc9000383fb20>
Let's add a couple more BUG_ONs before this to ascertain that the request
did make it to hardware. The impossible part of this stacktrace is that
request must have been considered completed by the i915_request_wait()
before we tried to retire it.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161118143412.26508-1-chris@chris-wilson.co.uk
Reviewed-by: Matthew Auld <matthew.auld@intel.com>