drm/i915: add guard page to ggtt->error_capture
author Andrzej Hajda <andrzej.hajda@intel.com>
Fri, 10 Mar 2023 09:23:50 +0000 (10:23 +0100)
committer Andrzej Hajda <andrzej.hajda@intel.com>
Thu, 16 Mar 2023 16:14:41 +0000 (17:14 +0100)
Write-combining memory allows speculative reads by the CPU.
ggtt->error_capture is WC-mapped to the CPU, so the CPU/MMU can try
to prefetch memory beyond error_capture, i.e. it tries to read
the memory pointed to by the next PTE in the GGTT.
If that PTE points to an invalid address, DMAR errors occur.
This behaviour was observed on ADL and RPL platforms.
To avoid it, add a guard scratch page after error_capture.
The patch fixes the most annoying issue with error capture, but
since WC reads are also used in other places, there is a risk that a
similar problem affects them as well.
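
The effect of the guard page can be illustrated with a toy model. This
is only a sketch, not driver code: struct toy_ggtt, toy_scratch_range()
and toy_prefetch_is_safe() are hypothetical names, the PTE table is a
plain array, and a "prefetch" is modelled as a lookup of the PTE that
follows the capture page.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SIZE    4096u
#define TOY_NUM_PTES     16u
#define TOY_PTE_INVALID  0u       /* hypothetical encoding of an unmapped PTE */
#define TOY_SCRATCH_ADDR 0x1000u  /* hypothetical address of the scratch page */

/* Toy stand-in for a GGTT: one PTE per page, nothing hardware-specific. */
struct toy_ggtt {
	uint64_t pte[TOY_NUM_PTES];
};

/* Point every PTE covering [start, start + size) at the scratch page. */
static void toy_scratch_range(struct toy_ggtt *ggtt, uint64_t start,
			      uint64_t size)
{
	size_t first = start / TOY_PAGE_SIZE;
	size_t count = size / TOY_PAGE_SIZE;
	size_t i;

	assert(start % TOY_PAGE_SIZE == 0 && size % TOY_PAGE_SIZE == 0);
	assert(first + count <= TOY_NUM_PTES);

	for (i = 0; i < count; i++)
		ggtt->pte[first + i] = TOY_SCRATCH_ADDR;
}

/*
 * Model a WC read of the capture page followed by a speculative read of
 * the next page. Returns 1 only if both PTEs are valid, i.e. the
 * speculative access would not hit an unmapped address.
 */
static int toy_prefetch_is_safe(const struct toy_ggtt *ggtt,
				uint64_t capture_start)
{
	size_t idx = capture_start / TOY_PAGE_SIZE;

	if (idx + 1 >= TOY_NUM_PTES)
		return 0;

	return ggtt->pte[idx] != TOY_PTE_INVALID &&
	       ggtt->pte[idx + 1] != TOY_PTE_INVALID;
}
```

In this model, reserving a single page leaves the following PTE invalid,
so the speculative lookup fails. Reserving two pages and pointing both
at scratch makes it succeed.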

v2:
  - modified commit message (I hope the diagnosis is correct),
  - added bug checks to ensure scratch is initialized on gen3 platforms.
    CI produces strange stacktrace for it suggesting scratch[0] is NULL,
    to be removed after resolving the issue with gen3 platforms.
v3:
  - removed bug checks, replaced with gen check.
v4:
  - change code for scratch page insertion to support all platforms,
  - add a note to the commit message that there could be more similar issues
v5:
  - check for nop_clear_range instead of gen8 (Tvrtko),
  - re-insert scratch pages on resume (Tvrtko)
v6:
  - use scratch_range callback to set scratch pages (Chris)

Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Acked-by: Nirmoy Das <nirmoy.das@intel.com>
Signed-off-by: Andrzej Hajda <andrzej.hajda@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230308-guard_error_capture-v6-2-1b5f31422563@intel.com
drivers/gpu/drm/i915/gt/intel_ggtt.c

index 8a45624203b84d1b9087cd64003b2198cf59e601..c1dfb219075a45dc2248ccbb6e4cd6a22c89bfbe 100644
@@ -571,8 +571,12 @@ static int init_ggtt(struct i915_ggtt *ggtt)
                 * paths, and we trust that 0 will remain reserved. However,
                 * the only likely reason for failure to insert is a driver
                 * bug, which we expect to cause other failures...
+                *
+                * Since CPU can perform speculative reads on error capture
+                * (write-combining allows it) add scratch page after error
+                * capture to avoid DMAR errors.
                 */
-               ggtt->error_capture.size = I915_GTT_PAGE_SIZE;
+               ggtt->error_capture.size = 2 * I915_GTT_PAGE_SIZE;
                ggtt->error_capture.color = I915_COLOR_UNEVICTABLE;
                if (drm_mm_reserve_node(&ggtt->vm.mm, &ggtt->error_capture))
                        drm_mm_insert_node_in_range(&ggtt->vm.mm,
@@ -582,11 +586,15 @@ static int init_ggtt(struct i915_ggtt *ggtt)
                                                    0, ggtt->mappable_end,
                                                    DRM_MM_INSERT_LOW);
        }
-       if (drm_mm_node_allocated(&ggtt->error_capture))
+       if (drm_mm_node_allocated(&ggtt->error_capture)) {
+               u64 start = ggtt->error_capture.start;
+               u64 size = ggtt->error_capture.size;
+
+               ggtt->vm.scratch_range(&ggtt->vm, start, size);
                drm_dbg(&ggtt->vm.i915->drm,
                        "Reserved GGTT:[%llx, %llx] for use by error capture\n",
-                       ggtt->error_capture.start,
-                       ggtt->error_capture.start + ggtt->error_capture.size);
+                       start, start + size);
+       }
 
        /*
         * The upper portion of the GuC address space has a sizeable hole
@@ -1279,6 +1287,10 @@ void i915_ggtt_resume(struct i915_ggtt *ggtt)
 
        flush = i915_ggtt_resume_vm(&ggtt->vm);
 
+       if (drm_mm_node_allocated(&ggtt->error_capture))
+               ggtt->vm.scratch_range(&ggtt->vm, ggtt->error_capture.start,
+                                      ggtt->error_capture.size);
+
        ggtt->invalidate(ggtt);
 
        if (flush)