drm/amdgpu: Reset CP_VMID_PREEMPT after trailing fence signaled
author Jiadong Zhu <Jiadong.Zhu@amd.com>
Wed, 24 May 2023 03:42:19 +0000 (11:42 +0800)
committer Alex Deucher <alexander.deucher@amd.com>
Fri, 9 Jun 2023 15:02:05 +0000 (11:02 -0400)
When MEC executes unmap_queue for mid-command-buffer preemption, it kicks
the write pointer of the gfx ring, sets CP_VMID_PREEMPT to trigger the
preemption, and waits for CP_VMID_PREEMPT to become zero once the
preemption is done. There is a race condition: PFP may execute the reset
command before MEC sets CP_VMID_PREEMPT. As a result, the GPU hangs
because CP_VMID_PREEMPT stays at 0xffff.

To avoid this, send the command that resets CP_VMID_PREEMPT only after the
trailing fence is signaled, and update the gfx write pointer explicitly.
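
The ordering problem can be illustrated with a small standalone model. This
is only a sketch under simplifying assumptions: it treats the set and the
reset as two atomic events on one register, and every toy_* name is
hypothetical and not part of amdgpu. A nonzero final value stands for the
case where MEC's wait for zero never completes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy model; these names are not part of amdgpu. */
enum toy_event {
	TOY_MEC_SET_PREEMPT,   /* MEC writes 0xffff to trigger preemption */
	TOY_PFP_RESET_PREEMPT, /* PFP executes the "write 0" reset command */
};

/* Replay the events in order and return the final register value. */
static uint32_t toy_final_vmid_preempt(const enum toy_event *order, int n)
{
	uint32_t cp_vmid_preempt = 0;

	assert(order != NULL && n >= 0);
	for (int i = 0; i < n; i++) {
		if (order[i] == TOY_MEC_SET_PREEMPT)
			cp_vmid_preempt = 0xffff;
		else
			cp_vmid_preempt = 0;
	}
	return cp_vmid_preempt;
}

/* MEC waits for the register to read zero; nonzero models the hang. */
static int toy_mec_wait_hangs(const enum toy_event *order, int n)
{
	return toy_final_vmid_preempt(order, n) != 0;
}

/* Ordering possible before this patch: PFP resets before MEC sets. */
static const enum toy_event toy_racy_order[] = {
	TOY_PFP_RESET_PREEMPT, TOY_MEC_SET_PREEMPT,
};

/* Ordering intended by this patch: reset runs after the trailing fence. */
static const enum toy_event toy_fixed_order[] = {
	TOY_MEC_SET_PREEMPT, TOY_PFP_RESET_PREEMPT,
};
```

In this model toy_racy_order leaves the register at 0xffff, while
toy_fixed_order returns it to zero, which is the state the MEC wait needs.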

Signed-off-by: Jiadong Zhu <Jiadong.Zhu@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c

index fe090eafa3665254db56755b9c297f8852c7a150..45fa02063491d745734f50f00d55148cc47bf30f 100644
@@ -5400,10 +5400,6 @@ static int gfx_v9_0_ring_preempt_ib(struct amdgpu_ring *ring)
        amdgpu_ring_alloc(ring, 13);
        gfx_v9_0_ring_emit_fence(ring, ring->trail_fence_gpu_addr,
                                 ring->trail_seq, AMDGPU_FENCE_FLAG_EXEC | AMDGPU_FENCE_FLAG_INT);
-       /*reset the CP_VMID_PREEMPT after trailing fence*/
-       amdgpu_ring_emit_wreg(ring,
-                             SOC15_REG_OFFSET(GC, 0, mmCP_VMID_PREEMPT),
-                             0x0);
 
        /* assert IB preemption, emit the trailing fence */
        kiq->pmf->kiq_unmap_queues(kiq_ring, ring, PREEMPT_QUEUES_NO_UNMAP,
@@ -5426,6 +5422,10 @@ static int gfx_v9_0_ring_preempt_ib(struct amdgpu_ring *ring)
                DRM_WARN("ring %d timeout to preempt ib\n", ring->idx);
        }
 
+       /*reset the CP_VMID_PREEMPT after trailing fence*/
+       amdgpu_ring_emit_wreg(ring,
+                             SOC15_REG_OFFSET(GC, 0, mmCP_VMID_PREEMPT),
+                             0x0);
        amdgpu_ring_commit(ring);
 
        /* deassert preemption condition */