Comparisons that produce 32-bit boolean results (0 or 0xFFFFFFFF) but
operate on 16-bit types first generate a CMP instruction with W or HF
types, and the result is then expanded to 32 bits. This CMP is a
partial write, which leads us to think the register may still contain
some prior contents. When the CMP is placed in a loop, this causes the
register's live range to extend beyond its real lifetime.

Mark the register with UNDEF first so that we know no prior contents
exist that need to be preserved.
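
The effect on liveness can be illustrated with a toy backward dataflow
pass. This is only a sketch under stated assumptions, not the backend's
actual liveness analysis: the Inst struct, the register numbers, and the
functions live_in_at_loop_head, loop_without_undef and loop_with_undef
are all hypothetical, and the loop is modelled as a single block whose
live-out set feeds back into its live-in set.

```cpp
#include <set>
#include <vector>

// Hypothetical register numbers used only for this illustration.
constexpr int A = 0, B = 1, TMP = 2, RESULT = 3;

// Hypothetical, simplified instruction: one destination, some sources.
struct Inst {
    int dst;                // destination register
    std::vector<int> srcs;  // source registers
    bool full_write;        // true if the write covers the whole register
};

// Registers live at the top of a single-block loop body. The back edge
// makes the body's live-out equal to its live-in, so iterate the
// backward pass until the set stops changing.
std::set<int> live_in_at_loop_head(const std::vector<Inst> &body)
{
    std::set<int> live_in;
    for (;;) {
        std::set<int> live = live_in;  // live-out via the back edge
        for (auto it = body.rbegin(); it != body.rend(); ++it) {
            // Only a complete write kills the previous value. A partial
            // write leaves the register live, because the untouched
            // part might still be needed.
            if (it->full_write)
                live.erase(it->dst);
            live.insert(it->srcs.begin(), it->srcs.end());
        }
        if (live == live_in)
            return live_in;
        live_in = live;
    }
}

// Loop body before the fix: partial-write CMP, then a use of TMP.
std::vector<Inst> loop_without_undef()
{
    return {
        {TMP, {A, B}, false},   // CMP tmp, a, b   (partial write)
        {RESULT, {TMP}, true},  // expand tmp into the 32-bit result
    };
}

// Loop body after the fix: UNDEF is a full definition with no sources.
std::vector<Inst> loop_with_undef()
{
    return {
        {TMP, {}, true},        // UNDEF tmp
        {TMP, {A, B}, false},   // CMP tmp, a, b   (partial write)
        {RESULT, {TMP}, true},  // expand tmp into the 32-bit result
    };
}
```

In this model, TMP is in the live-in set of loop_without_undef(), so it
stays live around the back edge for the whole loop. With the UNDEF in
front of the CMP, TMP drops out of the set and is live only between the
UNDEF and its last use.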

This affects:

   flt32, fge32, feq32, fneu32, ilt32, ult32, ige32, uge32, ieq32, ine32

On one of Cyberpunk 2077's most complex compute shaders, this reduces
the maximum number of live registers from 696 to 537 (22.8%). Together
with the next patch, Cyberpunk's spills and fills are cut by 10.23% and
9.19%, respectively.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22835>
       fs_reg dest = result;
       const uint32_t bit_size = nir_src_bit_size(instr->src[0].src);
-      if (bit_size != 32)
+      if (bit_size != 32) {
          dest = bld.vgrf(op[0].type, 1);
+         bld.UNDEF(dest);
+      }
       bld.CMP(dest, op[0], op[1], brw_cmod_for_nir_comparison(instr->op));
       fs_reg dest = result;
       const uint32_t bit_size = type_sz(op[0].type) * 8;
-      if (bit_size != 32)
+      if (bit_size != 32) {
          dest = bld.vgrf(op[0].type, 1);
+         bld.UNDEF(dest);
+      }
       bld.CMP(dest, op[0], op[1],
               brw_cmod_for_nir_comparison(instr->op));