intel/compiler/gfx12.5+: Lower 64-bit cluster_broadcast with 32-bit ops
author Jordan Justen <jordan.l.justen@intel.com>
Wed, 19 Apr 2023 00:11:41 +0000 (20:11 -0400)
committer Jordan Justen <jordan.l.justen@intel.com>
Thu, 20 Apr 2023 18:41:10 +0000 (11:41 -0700)
commit fcb72ffd0c61e2b3226306fae37b85ab4982a39e
tree b60e3954e75d539dbea07887614fdef69a5bfbad
parent 74ab9401561c5d5bef62330c0b1264f42bfe52da

For MTL (verx10 == 125), float64 is supported, but int64 is not.
Therefore we need to lower cluster broadcast using 32-bit int ops.
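The reason this lowering is exact can be shown with a small scalar model. This is a hypothetical C sketch, not Mesa's implementation: `SUBGROUP_SIZE`, the array-per-lane representation and all function names are illustrative assumptions, and it assumes that a cluster broadcast hands every lane the value of lane `idx` inside its power-of-two cluster. Because a broadcast only moves data and performs no arithmetic, splitting each 64-bit value into low and high dwords, broadcasting each half with a 32-bit operation, and reassembling them gives a bit-identical result.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical SIMD16 subgroup modeled as one array element per lane. */
#define SUBGROUP_SIZE 16

/* Reference semantics (assumed): each lane receives the value held by lane
 * `idx` of its cluster. cluster_size is a power of two <= SUBGROUP_SIZE and
 * idx < cluster_size. */
static void cluster_broadcast_u64(const uint64_t src[SUBGROUP_SIZE],
                                  uint64_t dst[SUBGROUP_SIZE],
                                  unsigned idx, unsigned cluster_size)
{
   for (unsigned lane = 0; lane < SUBGROUP_SIZE; lane++) {
      unsigned base = lane & ~(cluster_size - 1u); /* first lane of cluster */
      dst[lane] = src[base + idx];
   }
}

/* The same data movement on 32-bit values, the only width assumed usable. */
static void cluster_broadcast_u32(const uint32_t src[SUBGROUP_SIZE],
                                  uint32_t dst[SUBGROUP_SIZE],
                                  unsigned idx, unsigned cluster_size)
{
   for (unsigned lane = 0; lane < SUBGROUP_SIZE; lane++) {
      unsigned base = lane & ~(cluster_size - 1u);
      dst[lane] = src[base + idx];
   }
}

/* Lowered form: split into dwords, broadcast each half, then reassemble. */
static void cluster_broadcast_u64_lowered(const uint64_t src[SUBGROUP_SIZE],
                                          uint64_t dst[SUBGROUP_SIZE],
                                          unsigned idx, unsigned cluster_size)
{
   uint32_t lo[SUBGROUP_SIZE], hi[SUBGROUP_SIZE];
   uint32_t lo_out[SUBGROUP_SIZE], hi_out[SUBGROUP_SIZE];

   for (unsigned lane = 0; lane < SUBGROUP_SIZE; lane++) {
      lo[lane] = (uint32_t)src[lane];
      hi[lane] = (uint32_t)(src[lane] >> 32);
   }

   cluster_broadcast_u32(lo, lo_out, idx, cluster_size);
   cluster_broadcast_u32(hi, hi_out, idx, cluster_size);

   for (unsigned lane = 0; lane < SUBGROUP_SIZE; lane++)
      dst[lane] = ((uint64_t)hi_out[lane] << 32) | lo_out[lane];
}

/* True if the lowered form matches the 64-bit reference for every cluster
 * size and every broadcast index. */
static bool lowering_matches_reference(void)
{
   uint64_t src[SUBGROUP_SIZE], ref[SUBGROUP_SIZE], low[SUBGROUP_SIZE];

   /* Distinct per-lane values whose high and low dwords both vary
    * (unsigned wraparound is well defined). */
   for (unsigned lane = 0; lane < SUBGROUP_SIZE; lane++)
      src[lane] = 0x0123456789abcdefull * (lane + 1);

   for (unsigned cs = 1; cs <= SUBGROUP_SIZE; cs *= 2) {
      for (unsigned idx = 0; idx < cs; idx++) {
         cluster_broadcast_u64(src, ref, idx, cs);
         cluster_broadcast_u64_lowered(src, low, idx, cs);
         for (unsigned lane = 0; lane < SUBGROUP_SIZE; lane++)
            if (ref[lane] != low[lane])
               return false;
      }
   }
   return true;
}
```

Calling `lowering_matches_reference()` from a `main` of your own exercises every cluster size from 1 to 16. The same split-and-recombine reasoning works for float64 sources too, since the broadcast never interprets the bits it copies.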

For gfx12.5+ platforms that support int64, the register regions
used by cluster broadcast aren't supported by the 64-bit pipeline.

On MTL, dEQP-VK.subgroups.clustered.*_double* and
dEQP-VK.subgroups.clustered.*_dvec* failed validation of the compiled
shader in debug builds and reportedly caused GPU hangs in release
builds.

With this change dEQP-VK.subgroups.clustered.*_double* passed all 48
tests and dEQP-VK.subgroups.clustered.*_dvec* passed all 140 tests on
MTL.

Rework:
 * Move from generator to brw_fs_lower_regioning.cpp. (Suggested by
   Francisco)
 * Apply to verx10 >= 125. (Suggested by Francisco)

Cc: 23.1 <mesa-stable>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com> (v1)
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22569>
src/intel/compiler/brw_fs_lower_regioning.cpp