anv: set ComputeMode.PixelAsyncComputeThreadLimit = 4
author Felix DeGrood <felix.j.degrood@intel.com>
Wed, 13 Sep 2023 20:56:59 +0000 (20:56 +0000)
committer Marge Bot <emma+marge@anholt.net>
Tue, 17 Oct 2023 18:09:29 +0000 (18:09 +0000)
Heuristic-based optimization that throttles CCS work (async compute).
Without throttling, background compute work consumes all threads,
diminishing the performance gained by running dispatches in parallel
with 3D work.

The optimization is heuristic-based, so some workloads might still
slow down when using async compute.

Best value: PixelAsyncComputeThreadLimit = 4. On DG2, this
equates to a max CCS thread occupancy of 37.5%.

Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25508>

src/intel/vulkan/genX_init_state.c

index ff53419..deb9007 100644
@@ -654,7 +654,10 @@ init_compute_queue_state(struct anv_queue *queue)
           ANV_PIPE_HDC_PIPELINE_FLUSH_BIT);
    }
 
-   anv_batch_emit(&batch, GENX(STATE_COMPUTE_MODE), zero);
+   anv_batch_emit(&batch, GENX(STATE_COMPUTE_MODE), cm) {
+      cm.PixelAsyncComputeThreadLimit = 4;
+      cm.PixelAsyncComputeThreadLimitMask = 0x7;
+   }
 #endif
 
    init_common_queue_state(queue, &batch);
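
The hunk above sets both PixelAsyncComputeThreadLimit and its companion PixelAsyncComputeThreadLimitMask. A minimal sketch of why both are needed, assuming the usual masked-write convention in which only the bits enabled by the mask are updated and the rest keep their previous value. The helper names and the standalone 3-bit field below are hypothetical illustrations, not Mesa or hardware definitions, and the sketch does not model the real STATE_COMPUTE_MODE bit layout.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: width mask of the 3-bit limit field (0x7 in the patch). */
#define THREAD_LIMIT_FIELD_MASK 0x7u

/* Hypothetical helper, not a Mesa function: checks that a limit value
 * fits in the 3-bit field and returns it unchanged. */
static uint32_t
pack_thread_limit(uint32_t limit)
{
   assert(limit <= THREAD_LIMIT_FIELD_MASK);
   return limit & THREAD_LIMIT_FIELD_MASK;
}

/* Assumed masked-write semantics: bits set in `mask` are taken from
 * `new_value`, all other bits keep what was in `old_value`. */
static uint32_t
masked_field_write(uint32_t old_value, uint32_t new_value, uint32_t mask)
{
   return (old_value & ~mask) | (new_value & mask);
}
```

Under that assumption, `masked_field_write(old, pack_thread_limit(4), 0x7)` replaces all three bits of the field with 4, while a mask of 0 leaves the previous value in place. That matches the removed line, which emitted STATE_COMPUTE_MODE with every field zeroed and so programmed no thread limit.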