drm/i915: Check for rq->hwsp validity after acquiring RCU lock
authorChris Wilson <chris@chris-wilson.co.uk>
Fri, 18 Dec 2020 12:24:21 +0000 (12:24 +0000)
committerChris Wilson <chris@chris-wilson.co.uk>
Fri, 18 Dec 2020 18:49:40 +0000 (18:49 +0000)
commit9bb36cf66091ddf2d8840e5aa705ad3c93a6279b
tree4f181372f33c646db2d2816baa329e6baf3bde84
parent0a982c15711ec03d77a6f11b33559be76323fb16
drm/i915: Check for rq->hwsp validity after acquiring RCU lock

Since we allow removing the timeline map at runtime, there is a risk
that rq->hwsp points into a stale page. To control that risk, we hold
the RCU read lock while reading *rq->hwsp, but we missed a couple of
important barriers. First, the unpinning / removal of the timeline map
must be after all RCU readers into that map are complete, i.e. after an
rcu barrier (in this case courtesy of call_rcu()). Secondly, we must
make sure that the rq->hwsp we are about to dereference under the RCU
lock is valid. In this case, we make the rq->hwsp pointer safe during
i915_request_retire() and so we know that rq->hwsp may become invalid
only after the request has been signaled. Therefore is the request is
not yet signaled when we acquire rq->hwsp under the RCU, we know that
rq->hwsp will remain valid for the duration of the RCU read lock.

This is a very small window that may lead to either considering the
request not completed (causing a delay until the request is checked
again, any wait for the request is not affected) or dereferencing an
invalid pointer.

Fixes: 3adac4689f58 ("drm/i915: Introduce concept of per-timeline (context) HWSP")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: <stable@vger.kernel.org> # v5.1+
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201218122421.18344-1-chris@chris-wilson.co.uk
drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
drivers/gpu/drm/i915/gt/intel_timeline.c
drivers/gpu/drm/i915/i915_request.h