Heuristic-based optimization that throttles CCS work (async compute).
Without throttling, background compute work consumes all threads,
diminishing the performance gained by running dispatches in parallel
with 3D work.
The optimization is heuristic-based, so some workloads might still
slow down when using async compute.
Best value: PixelAsyncComputeThreadLimit = 4. On DG2, this
equates to a max CCS thread occupancy of 37.5%.
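
The limit is programmed together with a mask field. A minimal sketch of
the masked-write behaviour this relies on, assuming that
PixelAsyncComputeThreadLimitMask acts as a per-bit write enable for
PixelAsyncComputeThreadLimit. The helper masked_field_update() is
hypothetical and exists only for illustration; it is not part of Mesa:

```c
#include <stdint.h>

/* Hypothetical model of a masked field update (an assumption, not the
 * hardware specification): bits whose mask bit is set take the new
 * value, all other bits keep their previous contents. */
static uint32_t
masked_field_update(uint32_t old_value, uint32_t new_value, uint32_t mask)
{
   return (old_value & ~mask) | (new_value & mask);
}
```

Under that assumption, a mask of 0x7 lets all three bits it covers take
the new value of 4, while a mask of 0 would leave the previous setting
untouched.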
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25508>
ANV_PIPE_HDC_PIPELINE_FLUSH_BIT);
}
- anv_batch_emit(&batch, GENX(STATE_COMPUTE_MODE), zero);
+ anv_batch_emit(&batch, GENX(STATE_COMPUTE_MODE), cm) {
+ cm.PixelAsyncComputeThreadLimit = 4;
+ cm.PixelAsyncComputeThreadLimitMask = 0x7;
+ }
#endif
init_common_queue_state(queue, &batch);