From db204121686726c74dd0aba2d1c1790d40e7baba Mon Sep 17 00:00:00 2001 From: Ian Romanick Date: Tue, 8 Feb 2022 13:26:13 -0800 Subject: [PATCH] intel/fs: Fix constant propagation into 32x16 integer multiplication Don't copy propagate the constant in situations like mov(8) g8<1>D 0x7fffffffD mul(8) g16<1>D g8<8,8,1>D g15<16,8,2>W On platforms that only have a 32x16 multiplier, this will result in lowering the multiply to mul(8) g15<1>D g14<8,8,1>D 0xffffUW mul(8) g16<1>D g14<8,8,1>D 0x7fffUW add(8) g15.1<2>UW g15.1<16,8,2>UW g16<16,8,2>UW On Gfx8 and Gfx9, which have the full 32x32 multiplier, it results in mul(8) g16<1>D g15<16,8,2>W 0x7fffffffD Volume 2a of the Skylake PRM says: When multiplying a DW and any lower precision integer, the DW operand must on src0. See also https://gitlab.freedesktop.org/mesa/crucible/-/merge_requests/104. Previous to INTEL_shader_integer_functions2 (in Vulkan or OpenGL), I don't think it would be possible to create a situation where this could occur. I discovered this via some optimizations that can determine that the non-constant source must be able to fit in 16-bits. The case listed above came from piglit's "ext_transform_feedback-order arrays points" with those optimizations in place. No shader-db or fossil-db changes on any Intel platform. Fixes: de6c0f84879 ("intel/fs: Implement support for NIR opcodes for INTEL_shader_integer_functions2") Reviewed-by: Caio Oliveira Part-of: --- src/intel/compiler/brw_fs_copy_propagation.cpp | 27 ++++++++++++++++++++++++++ 1 file changed, 27 insertions(+) diff --git a/src/intel/compiler/brw_fs_copy_propagation.cpp b/src/intel/compiler/brw_fs_copy_propagation.cpp index 8535b31..9819b5d 100644 --- a/src/intel/compiler/brw_fs_copy_propagation.cpp +++ b/src/intel/compiler/brw_fs_copy_propagation.cpp @@ -836,6 +836,33 @@ fs_visitor::try_constant_propagate(fs_inst *inst, acp_entry *entry) inst->src[i] = val; progress = true; } else if (i == 0 && inst->src[1].file != IMM) { + /* Don't copy propagate the constant in situations like + * + * mov(8) g8<1>D 0x7fffffffD + * mul(8) g16<1>D g8<8,8,1>D g15<16,8,2>W + * + * On platforms that only have a 32x16 multiplier, this will + * result in lowering the multiply to + * + * mul(8) g15<1>D g14<8,8,1>D 0xffffUW + * mul(8) g16<1>D g14<8,8,1>D 0x7fffUW + * add(8) g15.1<2>UW g15.1<16,8,2>UW g16<16,8,2>UW + * + * On Gfx8 and Gfx9, which have the full 32x32 multiplier, it + * results in + * + * mul(8) g16<1>D g15<16,8,2>W 0x7fffffffD + * + * Volume 2a of the Skylake PRM says: + * + * When multiplying a DW and any lower precision integer, the + * DW operand must on src0. + */ + if (inst->opcode == BRW_OPCODE_MUL && + type_sz(inst->src[1].type) < 4 && + type_sz(val.type) == 4) + break; + /* Fit this constant in by commuting the operands. * Exception: we can't do this for 32-bit integer MUL/MACH * because it's asymmetric. -- 2.7.4