drm/i915: Use the active reference on the vma while capturing
During error capture, we need to take a reference to the vma from before
the reset in order to catpure the contents of the vma later. Currently
we are using both an active reference and a kref, but due to nature of
the i915_vma reference handling, that kref is on the vma->obj and not
the vma itself. This means the vma may be destroyed as soon as it is
idle, that is in between the i915_active_release(&vma->active) and the
i915_vma_put(vma):
<3> [197.866181] BUG: KASAN: use-after-free in intel_engine_coredump_add_vma+0x36c/0x4a0 [i915]
<3> [197.866339] Read of size 8 at addr
ffff8881258cb800 by task gem_exec_captur/1041
<3> [197.866467]
<4> [197.866512] CPU: 2 PID: 1041 Comm: gem_exec_captur Not tainted 5.9.0-g5e4234f97efba-kasan_200+ #1
<4> [197.866521] Hardware name: Intel Corp. Broxton P/Apollolake RVP1A, BIOS APLKRVPA.X64.0150.B11.
1608081044 08/08/2016
<4> [197.866530] Call Trace:
<4> [197.866549] dump_stack+0x99/0xd0
<4> [197.866760] ? intel_engine_coredump_add_vma+0x36c/0x4a0 [i915]
<4> [197.866783] print_address_description.constprop.8+0x3e/0x60
<4> [197.866797] ? kmsg_dump_rewind_nolock+0xd4/0xd4
<4> [197.866819] ? lockdep_hardirqs_off+0xd4/0x120
<4> [197.867037] ? intel_engine_coredump_add_vma+0x36c/0x4a0 [i915]
<4> [197.867249] ? intel_engine_coredump_add_vma+0x36c/0x4a0 [i915]
<4> [197.867270] kasan_report.cold.10+0x1f/0x37
<4> [197.867492] ? intel_engine_coredump_add_vma+0x36c/0x4a0 [i915]
<4> [197.867710] intel_engine_coredump_add_vma+0x36c/0x4a0 [i915]
<4> [197.867949] i915_gpu_coredump.part.29+0x150/0x7b0 [i915]
<4> [197.868186] i915_capture_error_state+0x5e/0xc0 [i915]
<4> [197.868396] intel_gt_handle_error+0x6eb/0xa20 [i915]
<4> [197.868624] ? intel_gt_reset_global+0x370/0x370 [i915]
<4> [197.868644] ? check_flags+0x50/0x50
<4> [197.868662] ? __lock_acquire+0xd59/0x6b00
<4> [197.868678] ? register_lock_class+0x1ad0/0x1ad0
<4> [197.868944] i915_wedged_set+0xcf/0x1b0 [i915]
<4> [197.869147] ? i915_wedged_get+0x90/0x90 [i915]
<4> [197.869371] ? i915_wedged_get+0x90/0x90 [i915]
<4> [197.869398] simple_attr_write+0x153/0x1c0
<4> [197.869428] full_proxy_write+0xee/0x180
<4> [197.869442] ? __sb_start_write+0x1f3/0x310
<4> [197.869465] vfs_write+0x1a3/0x640
<4> [197.869492] ksys_write+0xec/0x1c0
<4> [197.869507] ? __ia32_sys_read+0xa0/0xa0
<4> [197.869525] ? lockdep_hardirqs_on_prepare+0x32b/0x4e0
<4> [197.869541] ? syscall_enter_from_user_mode+0x1c/0x50
<4> [197.869566] do_syscall_64+0x33/0x80
<4> [197.869579] entry_SYSCALL_64_after_hwframe+0x44/0xa9
<4> [197.869590] RIP: 0033:0x7fd8b7aee281
<4> [197.869604] Code: c3 0f 1f 84 00 00 00 00 00 48 8b 05 59 8d 20 00 c3 0f 1f 84 00 00 00 00 00 8b 05 8a d1 20 00 85 c0 75 16 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 57 f3 c3 0f 1f 44 00 00 41 54 55 49 89 d4 53
<4> [197.869613] RSP: 002b:
00007ffea3b72008 EFLAGS:
00000246 ORIG_RAX:
0000000000000001
<4> [197.869625] RAX:
ffffffffffffffda RBX:
0000000000000000 RCX:
00007fd8b7aee281
<4> [197.869633] RDX:
0000000000000002 RSI:
00007fd8b81a82e7 RDI:
000000000000000d
<4> [197.869641] RBP:
0000000000000002 R08:
0000000000000000 R09:
0000000000000034
<4> [197.869650] R10:
0000000000000000 R11:
0000000000000246 R12:
00007fd8b81a82e7
<4> [197.869658] R13:
000000000000000d R14:
0000000000000000 R15:
0000000000000000
<3> [197.869707]
<3> [197.869757] Allocated by task 1041:
<4> [197.869833] kasan_save_stack+0x19/0x40
<4> [197.869843] __kasan_kmalloc.constprop.5+0xc1/0xd0
<4> [197.869853] kmem_cache_alloc+0x106/0x8e0
<4> [197.870059] i915_vma_instance+0x212/0x1930 [i915]
<4> [197.870270] eb_lookup_vmas+0xe06/0x1d10 [i915]
<4> [197.870475] i915_gem_do_execbuffer+0x131d/0x4080 [i915]
<4> [197.870682] i915_gem_execbuffer2_ioctl+0x103/0x5d0 [i915]
<4> [197.870701] drm_ioctl_kernel+0x1d2/0x270
<4> [197.870710] drm_ioctl+0x40d/0x85c
<4> [197.870721] __x64_sys_ioctl+0x10d/0x170
<4> [197.870731] do_syscall_64+0x33/0x80
<4> [197.870740] entry_SYSCALL_64_after_hwframe+0x44/0xa9
<3> [197.870748]
<3> [197.870798] Freed by task 22:
<4> [197.870865] kasan_save_stack+0x19/0x40
<4> [197.870875] kasan_set_track+0x1c/0x30
<4> [197.870884] kasan_set_free_info+0x1b/0x30
<4> [197.870894] __kasan_slab_free+0x111/0x160
<4> [197.870903] kmem_cache_free+0xcd/0x710
<4> [197.871109] i915_vma_parked+0x618/0x800 [i915]
<4> [197.871307] __gt_park+0xdb/0x1e0 [i915]
<4> [197.871501] ____intel_wakeref_put_last+0xb1/0x190 [i915]
<4> [197.871516] process_one_work+0x8dc/0x15d0
<4> [197.871525] worker_thread+0x82/0xb30
<4> [197.871535] kthread+0x36d/0x440
<4> [197.871545] ret_from_fork+0x22/0x30
<3> [197.871553]
<3> [197.871602] The buggy address belongs to the object at
ffff8881258cb740
which belongs to the cache i915_vma of size 968
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2553
Fixes:
2850748ef876 ("drm/i915: Pull i915_vma_pin under the vm->mutex")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: <stable@vger.kernel.org> # v5.5+
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201016092527.29039-1-chris@chris-wilson.co.uk