drm/amdgpu: workaround for TLB seq race
author Christian König <christian.koenig@amd.com>
Wed, 2 Nov 2022 13:55:13 +0000 (14:55 +0100)
committer Alex Deucher <alexander.deucher@amd.com>
Wed, 9 Nov 2022 22:22:48 +0000 (17:22 -0500)
The TLB sequence value can be queried before the fence callback that
updates it has had a chance to run.

Work around that by grabbing the fence lock and releasing it again.
This should be replaced by hw handling soon.

Signed-off-by: Christian König <christian.koenig@amd.com>
CC: stable@vger.kernel.org # 5.19+
Fixes: 5255e146c99a6 ("drm/amdgpu: rework TLB flushing")
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2113
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Philip Yang <Philip.Yang@amd.com>
Tested-by: Stefan Springer <stefanspr94@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
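diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
index 83acb7b..1d31771 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h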
@@ -492,6 +492,21 @@ void amdgpu_debugfs_vm_bo_info(struct amdgpu_vm *vm, struct seq_file *m);
  */
 static inline uint64_t amdgpu_vm_tlb_seq(struct amdgpu_vm *vm)
 {
+       unsigned long flags;
+       spinlock_t *lock;
+
+       /*
+        * Workaround to stop racing between the fence signaling and handling
+        * the cb. The lock is static after initially setting it up, just make
+        * sure that the dma_fence structure isn't freed up.
+        */
+       rcu_read_lock();
+       lock = vm->last_tlb_flush->lock;
+       rcu_read_unlock();
+
+       spin_lock_irqsave(lock, flags);
+       spin_unlock_irqrestore(lock, flags);
+
        return atomic64_read(&vm->tlb_seq);
 }
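
The lock-then-unlock idiom in the hunk above can be illustrated with a small userspace analogy. This is only a sketch using pthreads, not the kernel code: `demo_fence`, `demo_signal`, `demo_read_seq` and `demo_run` are hypothetical names, and it assumes the signaling side marks the fence as signaled and runs its callback while holding the fence lock. Under that assumption, a reader that has seen the fence signaled only needs to acquire and release the same lock to be sure the callback has finished.

```c
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for a fence plus the counter its callback bumps. */
struct demo_fence {
	pthread_mutex_t lock;		/* plays the role of the fence lock */
	atomic_int signaled;		/* set before the callback runs */
	atomic_uint_fast64_t seq;	/* plays the role of vm->tlb_seq */
};

/* Signaler: mark the fence signaled, then run the "callback" under the lock. */
static void *demo_signal(void *arg)
{
	struct demo_fence *f = arg;

	pthread_mutex_lock(&f->lock);
	atomic_store(&f->signaled, 1);
	sched_yield();			/* widen the window between signal and callback */
	atomic_fetch_add(&f->seq, 1);	/* the callback */
	pthread_mutex_unlock(&f->lock);
	return NULL;
}

/* Reader: lock + unlock waits for any in-flight signaler, then read. */
static uint64_t demo_read_seq(struct demo_fence *f)
{
	pthread_mutex_lock(&f->lock);
	pthread_mutex_unlock(&f->lock);
	return atomic_load(&f->seq);
}

/* Run one signaler against one reader; return the value the reader saw. */
static uint64_t demo_run(void)
{
	struct demo_fence f;
	pthread_t t;
	uint64_t seen;

	pthread_mutex_init(&f.lock, NULL);
	atomic_init(&f.signaled, 0);
	atomic_init(&f.seq, 0);

	pthread_create(&t, NULL, demo_signal, &f);
	while (!atomic_load(&f.signaled))
		;			/* wait until the fence looks signaled */
	seen = demo_read_seq(&f);
	pthread_join(t, NULL);
	pthread_mutex_destroy(&f.lock);
	return seen;
}
```

Calling `demo_run()` from a test harness should return 1 in this sketch: the reader cannot take the lock until the signaler has released it, and the signaler only releases it after the increment. Reading `seq` directly after seeing `signaled` set, without the lock round trip, could still return 0, which is the race the patch works around.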